LLM

Configure providers, route between models, cache responses, and track cost.

Pages

Providers — 30+ built-in LLM providers and how to add your own.
Routing — Pick the right model per request (size, cost, capability).
Caching — Prompt cache, response cache.
Cost Tracking — Per-agent and per-session cost accounting.
Observability — Capture every LLM call as a typed LLMCallLog event with prompt, response, usage, cost, and streaming timing.
Audio — Speech-to-text and text-to-speech.
Advanced — Request hooks, custom transports, rate limiting.
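As a rough illustration of how the Observability and Cost Tracking pages fit together, the sketch below models a call log as a typed record and rolls cost up per agent and per session. The field names (`agent_id`, `session_id`, `input_tokens`, `cost_usd`, `time_to_first_token_ms`) and the `cost_by` helper are assumptions made for this example; the actual `LLMCallLog` schema may differ.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LLMCallLog:
    """Hypothetical shape of a call-log event; field names are assumptions."""
    agent_id: str
    session_id: str
    prompt: str
    response: str
    input_tokens: int               # usage
    output_tokens: int              # usage
    cost_usd: float                 # cost of this single call
    time_to_first_token_ms: float   # streaming timing


def cost_by(logs, key):
    """Sum cost_usd over logs, grouped by an attribute such as 'agent_id'."""
    totals = defaultdict(float)
    for log in logs:
        totals[getattr(log, key)] += log.cost_usd
    return dict(totals)


logs = [
    LLMCallLog("planner", "s1", "plan", "ok", 100, 20, 0.25, 180.0),
    LLMCallLog("planner", "s2", "plan", "ok", 200, 40, 0.5, 210.0),
    LLMCallLog("coder", "s1", "code", "ok", 50, 10, 0.125, 95.0),
]

per_agent = cost_by(logs, "agent_id")
per_session = cost_by(logs, "session_id")
```

Grouping by an attribute name keeps per-agent and per-session accounting on the same code path; a real implementation would more likely subscribe to log events than iterate a list.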

RAG Pipeline

The server provides a per-session Retrieval-Augmented Generation pipeline that indexes local codebases, chunks source files by language boundaries, and retrieves relevant context using hybrid BM25 + vector search with Reciprocal Rank Fusion.
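To make the fusion step concrete, here is a minimal sketch of Reciprocal Rank Fusion over two ranked result lists, one from BM25 and one from vector search. It shows the general technique, not the server's implementation: the function name, the file names, and the smoothing constant `k = 60` (a commonly used default) are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed default, not a value taken
    from the server's configuration.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical retrieval results for one query.
bm25_hits = ["a.py", "b.py", "c.py"]
vector_hits = ["b.py", "d.py", "a.py"]

fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
# fused == ["b.py", "a.py", "d.py", "c.py"]
```

Because RRF uses only rank positions, the BM25 and vector similarity scores never need to be normalized onto a common scale, which is the usual reason for choosing it in hybrid retrieval.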

LLM Providers

The LLM module provides a unified interface to 30+ language-model providers. Every provider implements the same LLMClient interface, so switching providers means changing the model name and provider key in one place, not rewriting your agent code.
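The sketch below illustrates why a shared interface keeps provider changes local. Everything in it is hypothetical: the `complete` method, the two toy clients, the `PROVIDERS` registry, and `make_client` stand in for the real `LLMClient` API, which is documented on the Providers page and may look different.

```python
from dataclasses import dataclass
from typing import Protocol


class LLMClient(Protocol):
    """Assumed minimal shape of the shared interface (illustrative only)."""

    def complete(self, prompt: str) -> str: ...


@dataclass
class EchoClient:
    """Toy provider that returns the prompt unchanged."""
    model: str

    def complete(self, prompt: str) -> str:
        return f"[{self.model}] {prompt}"


@dataclass
class UpperClient:
    """Second toy provider that upper-cases the prompt."""
    model: str

    def complete(self, prompt: str) -> str:
        return f"[{self.model}] {prompt.upper()}"


# Hypothetical registry mapping a provider key to a client class.
PROVIDERS = {"echo": EchoClient, "upper": UpperClient}


def make_client(provider: str, model: str) -> LLMClient:
    return PROVIDERS[provider](model=model)


def run_agent(client: LLMClient, task: str) -> str:
    # Agent code depends only on the interface, never on a concrete provider.
    return client.complete(f"Task: {task}")


# Switching providers touches only this line.
client = make_client("echo", "echo-small")
result = run_agent(client, "summarize the README")
```

Replacing `make_client("echo", "echo-small")` with `make_client("upper", "upper-large")` changes the backend while `run_agent` stays untouched.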
