LLM Routing
Route requests across multiple LLM providers with built-in strategies. Routing enables failover, cost optimization, latency reduction, and capability-based model selection.
Fallback Router
The simplest routing strategy: tries each provider in the order you list them, moving to the next only if the current one fails. Use this when you want high availability with a clear priority order.
LLMRouter router = FallbackRouter.of(
new OpenAIClient("gpt-4o"),
new AnthropicClient("claude-sonnet-4-20250514"),
new GroqClient("llama-3.3-70b-versatile")
);
ChatResponse response = router.chat("Hello");
// If OpenAI fails → tries Anthropic → then Groq
Cost-Based Router
Automatically selects the cheapest provider for each request based on model pricing data. Use this when you want to minimize LLM spend while keeping multiple providers available.
LLMRouter router = CostBasedRouter.of(
new OpenAIClient("gpt-4o-mini"), // $0.15/$0.60 per 1M tokens
new GroqClient("llama-3.3-70b"), // Free tier
new AnthropicClient("claude-3.5-haiku") // $0.80/$4.00
);
Latency-Based Router
Tracks the actual response time of each provider and routes new requests to the fastest one. The router continuously updates its latency measurements, so it adapts if a provider speeds up or slows down.
LLMRouter router = LatencyBasedRouter.of(
new OpenAIClient("gpt-4o"),
new GroqClient("llama-3.3-70b") // Typically faster
);
Capability Router
Inspects each request's requirements (vision, tool calling, etc.) and routes to a provider that supports them. This lets you use cheaper text-only models for simple requests while reserving expensive multimodal models for requests that need them.
LLMRouter router = CapabilityRouter.of(
new OpenAIClient("gpt-4o"), // Vision + tools
new OllamaClient("llama3") // Text only
);
// Vision requests go to OpenAI, text-only to Ollama
Round-Robin Router
Distributes requests evenly across providers in a rotating order. This is useful for spreading rate limit usage across multiple API keys or providers.
LLMRouter router = RoundRobinRouter.of(
new OpenAIClient("gpt-4o"),
new AnthropicClient("claude-sonnet-4-20250514"),
new GeminiClient("gemini-2.5-flash")
);
Task-Based Router
Routes based on automatic task type classification. The router analyzes prompt content using keyword matching, regex patterns, and conversation history to select the best model for each request.
TaskBasedRouter router = TaskBasedRouter.builder()
.forTask(TaskType.CODING, new AnthropicClient("claude-sonnet-4"))
.forTask(TaskType.CREATIVE, new OpenAIClient("gpt-4o"))
.forTask(TaskType.MATH, new OpenAIClient("o1-preview"))
.forTask(TaskType.FAST, new GroqClient("llama-3.3-70b"))
.forTask(TaskType.VISION, new OpenAIClient("gpt-4o"))
.defaultClient(new OpenAIClient("gpt-4o-mini"))
.build();
// Auto-routing -- no manual model selection
router.chat("Write a Python function to sort a list"); // -> Claude (CODING)
router.chat("Solve: integral of x squared dx"); // -> o1 (MATH)
router.chat("Write a poem about autumn"); // -> GPT-4o (CREATIVE)
router.chat("What is 2+2?"); // -> GPT-4o-mini (default)
TaskType Enum
Ten task categories cover the most common LLM use cases. Each type has built-in keywords for automatic classification, and you can add custom keywords for your domain.
| TaskType | Description | Recommended Models | Example Keywords |
|---|---|---|---|
| CODING | Code generation, debugging, refactoring | Claude Sonnet, GPT-4o, Codestral | code, function, debug, python, java, algorithm |
| CREATIVE | Stories, poetry, marketing copy | GPT-4o, Claude Opus | poem, creative, fiction, blog post, narrative |
| MATH | Calculations, proofs, equations | o1, Gemini 2.0, Claude with CoT | calculate, solve, equation, integral, theorem |
| ANALYSIS | Data analysis, summarization | GPT-4o, Gemini, Claude | analyze, summarize, extract, compare, metrics |
| TRANSLATION | Language translation | GPT-4o, Gemini | translate, in english, in french |
| CHAT | Conversational Q&A | Any capable model | hello, what is, explain, tell me |
| FAST | Quick, simple tasks | GPT-4o-mini, Groq, Gemini Flash | quick, simple, brief, yes or no |
| REASONING | Deep reasoning and logic | o1, o1-pro, Claude extended thinking | think step by step, logic, deduce, infer |
| VISION | Image understanding | GPT-4o, Gemini, Claude (vision) | image, picture, screenshot, diagram |
| GENERAL | Default fallback | Configured default client | (no keywords) |
Each TaskType provides matches(text) for boolean matching and matchScore(text) returning the number of keyword hits.
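A minimal sketch of what this keyword-hit counting could look like; the keyword set below is invented for illustration, and the real enum ships its own per-type lists:

```java
import java.util.Set;

// Sketch of matches()/matchScore() in the spirit of TaskType
// (illustrative only; the real enum carries built-in keyword lists).
public class KeywordScore {
    static int matchScore(String text, Set<String> keywords) {
        String lower = text.toLowerCase();
        int hits = 0;
        for (String kw : keywords) {
            if (lower.contains(kw)) hits++;  // one hit per keyword present
        }
        return hits;
    }

    static boolean matches(String text, Set<String> keywords) {
        return matchScore(text, keywords) > 0;
    }

    public static void main(String[] args) {
        Set<String> coding = Set.of("code", "function", "debug", "python");
        // "python" and "function" both occur, so the score is 2
        System.out.println(matchScore("Write a Python function", coding));
    }
}
```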
TaskClassifier
TaskClassifier analyzes the text of each prompt and assigns it to a task type. It uses a multi-signal scoring system that combines regex patterns, keyword matching, custom keywords, and conversation history to make accurate classifications.
Classification strategy (in order):
- Pattern detection -- Regex patterns for code blocks, function signatures, file extensions, math equations, math symbols, and image references. Highest scoring priority.
- Keyword scoring -- Each keyword match from TaskType.getKeywords() adds 2 points.
- Custom keyword scoring -- User-added keywords add 3 points each.
- History context boost -- The last 3 history messages give a 1.5-point boost per matching task type.
- Best match selection -- The highest total score wins. The score is converted to a confidence value (0.0-1.0). Falls back to the default type if the confidence is below minConfidenceThreshold.
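The combination of these signals can be sketched as a weighted sum. The keyword, custom-keyword, and history weights below come from the list above; the pattern weight and the exact way the signals are combined are assumptions for illustration:

```java
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

// Hedged sketch of the multi-signal score. +2 per built-in keyword,
// +3 per custom keyword, and +1.5 for a history match follow the docs;
// the +5 pattern weight is an assumption.
public class ScoreSketch {
    static double score(String text, Set<String> builtIn, Set<String> custom,
                        List<Pattern> patterns, boolean historyMatch) {
        String lower = text.toLowerCase();
        double s = 0;
        for (Pattern p : patterns) if (p.matcher(text).find()) s += 5;  // assumed weight
        for (String kw : builtIn) if (lower.contains(kw)) s += 2;       // built-in keyword hit
        for (String kw : custom)  if (lower.contains(kw)) s += 3;       // custom keyword hit
        if (historyMatch) s += 1.5;                                     // recent-history boost
        return s;
    }

    public static void main(String[] args) {
        // Two built-in keyword hits: 2 + 2 = 4.0
        System.out.println(score("Write a python function",
                Set.of("python", "function"), Set.of(), List.of(), false));
    }
}
```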
TaskClassifier classifier = TaskClassifier.defaultClassifier();
// Simple classification
TaskType type = classifier.classify("Write a Python function to sort a list");
// Returns: CODING
// With confidence score
ClassificationResult result = classifier.classifyWithConfidence("Solve x^2 + 2x + 1 = 0");
result.taskType(); // MATH
result.confidence(); // 0.85
result.reason(); // "Pattern and keyword match"
// With conversation history for context
List<String> history = List.of("I'm working on a React app", "Help me debug");
TaskType type = classifier.classify("What's wrong with this code?", history);
// Returns: CODING (boosted by history context)
Custom classifier:
TaskClassifier classifier = TaskClassifier.builder()
.addKeywords(TaskType.CODING, Set.of("backend", "frontend", "docker"))
.addPattern(TaskType.MATH, Pattern.compile("[0-9]+"))
.setDefaultType(TaskType.CHAT)
.setMinConfidenceThreshold(0.2)
.build();
TaskBasedRouter Builder
Configure which LLM client handles each task type. A default client is required as a fallback for unclassified requests.
TaskBasedRouter router = TaskBasedRouter.builder()
.forTask(TaskType.CODING, codingClient) // Assign client per task type
.forTask(TaskType.MATH, mathClient)
.defaultClient(generalClient) // Required: fallback for unmatched tasks
.classifier(customClassifier) // Custom TaskClassifier (optional)
.confidenceThreshold(0.3) // Below this, use default (default: 0.3)
.build();
Convenience setup for common mappings:
TaskBasedRouter router = TaskBasedRouter.builder()
.withCommonSetup(
codingClient, // CODING
reasoningClient, // MATH + REASONING
fastClient, // FAST + CHAT
generalClient // default
)
.build();
Manual Override
When you know the task type in advance, you can bypass automatic classification and route directly to the appropriate client.
// Force a specific task type (bypasses classification)
ChatResponse response = router.chatAs(TaskType.CODING, "Explain this concept");
// Get the client for a task type directly
LLMClient codingClient = router.getClientForTask(TaskType.CODING);
// Inspect classification without routing
ClassificationResult result = router.classifyTask("Write a poem");
Routing Statistics
Track how requests are distributed across task types to understand your usage patterns and estimate cost savings from intelligent routing.
TaskRoutingStats stats = router.getTaskStats();
// Requests per task type
Map<TaskType, Long> requests = stats.requestsPerTask();
// Token usage per task type
Map<TaskType, Long> tokens = stats.tokensPerTask();
// Estimated cost savings vs. sending everything to a premium model
double savings = stats.estimatedSavings();
// Most common task type
Optional<TaskType> common = stats.mostCommonTask();
// Task distribution as percentages
Map<TaskType, Double> distribution = stats.taskDistribution();
// General routing stats (total, success, failure, per-provider)
RoutingStats routing = stats.routingStats();
// Reset all counters
router.resetStats();
All Strategies
Choose a routing strategy based on your primary concern: availability, cost, speed, or task complexity. You can also combine strategies by nesting routers.
| Router | Strategy | Best For |
|---|---|---|
| FallbackRouter | Try next on failure | High availability |
| CostBasedRouter | Cheapest provider | Budget optimization |
| LatencyBasedRouter | Fastest measured latency | Real-time applications |
| CapabilityRouter | Model capabilities match | Mixed workloads (vision + text) |
| RoundRobinRouter | Even load distribution | Rate limit management |
| TaskBasedRouter | Task type classification | Varied complexity tasks |
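Nesting works because a router can be handed to another router wherever a client is expected. A self-contained sketch of that composition pattern, with a single-method `Client` interface standing in for the library's real `LLMClient`/`LLMRouter` interfaces (all names here are assumed):

```java
import java.util.List;

// Stand-in for the library's LLMClient/LLMRouter interfaces (assumed shape).
interface Client { String chat(String prompt); }

// Fallback: try each client in order, moving on when one throws.
class Fallback implements Client {
    private final List<Client> clients;
    Fallback(Client... clients) { this.clients = List.of(clients); }
    public String chat(String prompt) {
        RuntimeException last = null;
        for (Client c : clients) {
            try { return c.chat(prompt); } catch (RuntimeException e) { last = e; }
        }
        throw last;  // every provider failed
    }
}

// Round-robin: rotate through clients on successive calls.
class RoundRobin implements Client {
    private final List<Client> clients;
    private int next = 0;
    RoundRobin(Client... clients) { this.clients = List.of(clients); }
    public String chat(String prompt) {
        Client c = clients.get(next);
        next = (next + 1) % clients.size();  // advance for the next call
        return c.chat(prompt);
    }
}

public class NestedRouting {
    public static void main(String[] args) {
        Client keyA = p -> "keyA: " + p;
        Client keyB = p -> "keyB: " + p;
        Client backup = p -> "backup: " + p;
        // Round-robin across two API keys, falling back to a backup provider --
        // the inner router is passed to the outer one like any other client.
        Client router = new Fallback(new RoundRobin(keyA, keyB), backup);
        System.out.println(router.chat("Hello"));  // keyA: Hello
        System.out.println(router.chat("Hello"));  // keyB: Hello
    }
}
```

The same shape applies to the real API, e.g. wrapping a cost-based router inside a fallback router so budget routing still degrades gracefully when its providers are down.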
LLM Providers
The LLM module provides a unified interface to 14 language model providers. All providers implement the `LLMClient` interface from Core, so you can swap providers without changing your agent code.
MCP Client
The `McpClient` connects to any MCP-compatible server and exposes its tools, resources, and prompts.