HybridInference

Developer Guide:

  • Installation
    • Production (Docker)
    • Development Setup
      • System Requirements
      • Using uv (Recommended)
      • Using conda
    • Configuration
      • Environment Variables
    • Verification
    • Documentation Structure
    • Troubleshooting
  • Deployment Guide
    • Quick Start (Docker)
    • Prerequisites
    • Service Architecture
    • Common Operations
    • Configuration
      • Environment Variables
      • Local GPU Endpoints
    • Nginx and HTTPS
      • Cloudflare
    • Subscription Account Management
      • Import Claude credentials
      • Import Codex credentials
      • Check account health
      • Account state persistence
      • Verify Anthropic-native surface
    • Monitoring
      • Health Checks
      • Prometheus Metrics
      • Grafana Dashboards
      • Alerting
    • Database
    • Troubleshooting
      • Service won’t start
      • Rebuild after code changes
      • Full reset (preserves data)
      • Full reset (destroys data)
  • Architecture Overview
    • System Architecture
    • Network Layer
    • Core Components
      • Serving Layer (serving/)
      • Routing Layer (routing/)
      • Configuration (config/)
      • Infrastructure (infrastructure/)
      • Subscription Adapters
      • Northbound API Surfaces
    • Key Design Principles
    • Data Flow
  • Hybrid Inference Routing System
    • Architecture
      • Decision Layer (routing/manager.py + routing/strategies.py)
      • Execution Layer (routing/executor.py)
    • Features
    • Configuration
      • Required Files
      • Optional Files
      • Example Configuration (60/40 split):
    • Running the Server
    • API Endpoints
    • Extending the System
      • Adding New Strategies
      • Health Monitoring
    • Migration Notes
  • Adding a New Model (OpenRouter-Compatible)
    • Overview
    • Quick Start
      • Adding a Model with an Existing Provider
    • Adding a New Provider
      • Step 1: Create Provider Adapter
      • Step 2: Register the Adapter
      • Step 3: Add Model Configuration
      • Step 4: Configure Environment Variables
      • Step 5: Test the Integration
    • Configuration Reference
      • ModelConfig Fields
      • Route Configuration
      • Hybrid Routing & OFFLOAD
    • BaseAdapter API Reference
      • Required Methods
      • Utility Methods
      • Available Attributes
    • Advanced Features
      • Multi-Modal Support
      • Tool/Function Calling
      • Structured Output (JSON Mode)
      • Rate Limiting (Optional)
    • Examples
      • Example 1: OpenAI-Compatible Provider
      • Example 2: Custom API Format
      • Example 3: Local Deployment
    • Troubleshooting
      • Model Not Appearing in /v1/models
      • Authentication Failures
      • Response Format Errors
      • Streaming Issues
    • Best Practices
    • See Also
  • Configuration Guide
    • 1. Environment Variables and Priority
      • Variable Substitution
      • Configuration Priority
    • 2. models.yaml (Required)
      • Key Points:
    • 3. routing.yaml (Optional)
      • Fixed-Ratio Strategy
      • Health Checking (Optional)
      • Example: Hybrid Deployment (60% local / 40% remote)
      • How It Works:
      • Local-Only Deployment
    • 4. Running the System
      • Set Environment Variables:
      • Start the Server:
      • Verify Operation:
    • 5. Subscription Adapters (Claude / Codex)
      • 5.1 Claude Subscription Setup
      • 5.2 Account Lifecycle
      • 5.3 Credential File Format (v2)
      • 5.4 Codex Subscription
    • 6. FAQ
  • PostgreSQL Admin Playbook
    • Environment Variables
    • Restart Admin Stack
    • Access pgAdmin Securely
    • Register Primary Database
    • Post-Restart Checks
  • OpenRouter-Compatible API Gateway
    • Architecture
      • Key Components
    • Features
    • Development Setup
      • Prerequisites
      • Create Environment
      • Local Environment Variables
      • Run Locally
      • Quick Checks
    • Production Deployment
    • API Surface
      • Example Requests
    • Logging and Metrics
    • Testing
    • Troubleshooting
    • Related Docs
  • FreeInference Deployment
    • Cloudflare + Nginx + FastAPI (current)
      • Deployment
      • Runtime Operations
      • Why Nginx Is Back
    • Legacy Architectures
      • FastAPI direct (v3, abandoned)
      • Nginx (v2, abandoned)
      • Nginx + Lua via OpenResty (v1, abandoned)
        • Overview
        • Installation Notes
      • Nginx (v0, abandoned)
  • Claude Code Setup
    • Quick Setup (macOS / Linux)
    • Manual Setup
    • Available Models
    • Usage
    • Troubleshooting
    • Uninstall
  • FASRC Deployment
    • Docker
  • Contributing
    • Development Setup
    • Code Quality Standards
      • Pre-commit Hooks
    • Development Workflow
    • Testing
    • Documentation
    • Pull Request Guidelines
  • Staging Guide
    • Why this staging shape
    • Default ports
    • First-time server bootstrap
    • Required env for a typical staging run
    • Start staging
    • SSH forwarding
    • Common operations
    • Notes for model monitor
© Copyright 2025, Harvard System Lab.