LLM
Configure providers, route between models, cache responses, and track cost.
Pages
- Providers — 30+ built-in LLM providers and how to add your own.
- Routing — Pick the right model per request (size, cost, capability).
- Caching — Prompt cache, response cache.
- Cost Tracking — Per-agent and per-session cost accounting.
- Observability — Capture every LLM call as a typed LLMCallLog event with prompt, response, usage, cost, and streaming timing.
- Audio — Speech-to-text and text-to-speech.
- Advanced — Request hooks, custom transports, rate limiting.
RAG Pipeline
The server provides a per-session Retrieval-Augmented Generation pipeline that indexes local codebases, chunks source files by language boundaries, and retrieves relevant context using hybrid BM25 + vector search with Reciprocal Rank Fusion.
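To make the fusion step concrete, here is a minimal sketch of Reciprocal Rank Fusion over two ranked result lists. It is an illustration, not the server's actual code: the function name `reciprocal_rank_fusion`, the sample file names, and the constant `k=60` (a commonly used default for RRF) are all assumptions, and the BM25 and vector rankings are hard-coded rather than computed.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    where rank is 1-based. The constant k=60 is an assumed default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from the two retrievers (best match first).
bm25_hits = ["auth.py", "session.py", "utils.py"]
vector_hits = ["session.py", "tokens.py", "auth.py"]

fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
print(fused)  # ['session.py', 'auth.py', 'tokens.py', 'utils.py']
```

Because RRF uses only rank positions, it can combine BM25 scores and vector similarities without normalizing them to a common scale.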
LLM Providers
The LLM module provides a unified interface to 30+ language-model providers. Every provider implements the same LLMClient interface, so switching providers means changing one line — the model name and provider key — not your agent code.
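The idea of a shared interface can be sketched as follows. Only the name LLMClient comes from the text above; the `complete` method, the `Completion` type, the two toy providers, and the `make_client` factory are hypothetical stand-ins chosen for illustration, so the real module's signatures may differ.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Completion:
    """Hypothetical response type shared by all providers."""
    text: str
    provider: str
    model: str


class LLMClient(Protocol):
    """Assumed shape of the shared interface: one method, same signature."""
    def complete(self, prompt: str) -> Completion: ...


class EchoProvider:
    """Toy provider that returns the prompt unchanged (no network calls)."""
    def __init__(self, model: str) -> None:
        self.model = model

    def complete(self, prompt: str) -> Completion:
        return Completion(text=prompt, provider="echo", model=self.model)


class UpperProvider:
    """Toy provider that upper-cases the prompt (no network calls)."""
    def __init__(self, model: str) -> None:
        self.model = model

    def complete(self, prompt: str) -> Completion:
        return Completion(text=prompt.upper(), provider="upper", model=self.model)


# Hypothetical registry mapping provider keys to implementations.
PROVIDERS = {"echo": EchoProvider, "upper": UpperProvider}


def make_client(provider: str, model: str) -> LLMClient:
    return PROVIDERS[provider](model)


def run_agent(client: LLMClient, question: str) -> str:
    """Agent code depends only on the interface, never on a provider."""
    return client.complete(question).text


# Switching providers touches only this line.
client = make_client("echo", "demo-small")
print(run_agent(client, "hello"))  # hello
```

In this sketch, replacing `"echo"` with `"upper"` changes the output while `run_agent` stays untouched, which is the property the unified interface is meant to provide.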