LLM Routing

Route requests across multiple LLM providers with built-in strategies. Routing enables failover, cost optimization, latency reduction, and capability-based model selection.

Fallback Router

The simplest routing strategy: tries each provider in the order you list them, moving to the next only if the current one fails. Use this when you want high availability with a clear priority order.

LLMRouter router = FallbackRouter.of(
    new OpenAIClient("gpt-4o"),
    new AnthropicClient("claude-sonnet-4-20250514"),
    new GroqClient("llama-3.3-70b-versatile")
);

ChatResponse response = router.chat("Hello");
// If OpenAI fails → tries Anthropic → then Groq
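
Conceptually, fallback routing is a loop with a try/catch per provider. The sketch below is illustrative only (the `route` helper and `Supplier`-based provider model are assumptions, not the library's internals):

```java
import java.util.List;
import java.util.function.Supplier;

public class FallbackSketch {
    // Tries each provider in listed order; returns the first successful result.
    static String route(List<Supplier<String>> providers) {
        RuntimeException last = null;
        for (Supplier<String> p : providers) {
            try {
                return p.get();
            } catch (RuntimeException e) {
                last = e; // remember the failure, move to the next provider
            }
        }
        throw new IllegalStateException("All providers failed", last);
    }

    public static void main(String[] args) {
        Supplier<String> down = () -> { throw new RuntimeException("rate limited"); };
        Supplier<String> up = () -> "hello from backup";
        System.out.println(route(List.of(down, up))); // prints "hello from backup"
    }
}
```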

Cost-Based Router

Automatically selects the cheapest provider for each request based on per-model pricing data. Use this when you want to minimize LLM spend while keeping multiple providers available.

LLMRouter router = CostBasedRouter.of(
    new OpenAIClient("gpt-4o-mini"),        // $0.15/$0.60 per 1M tokens
    new GroqClient("llama-3.3-70b"),        // Free tier
    new AnthropicClient("claude-3.5-haiku") // $0.80/$4.00
);
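
The selection itself amounts to a minimum over known prices. A minimal sketch, assuming a hypothetical `PricedProvider` record (the library's actual pricing model is not shown here):

```java
import java.util.Comparator;
import java.util.List;

public class CostSketch {
    // Hypothetical view of a provider's pricing (USD per 1M input tokens).
    record PricedProvider(String name, double inputPricePerMTok) {}

    // Selects the cheapest available provider by input-token price.
    static PricedProvider cheapest(List<PricedProvider> providers) {
        return providers.stream()
                .min(Comparator.comparingDouble(PricedProvider::inputPricePerMTok))
                .orElseThrow();
    }

    public static void main(String[] args) {
        List<PricedProvider> pool = List.of(
                new PricedProvider("gpt-4o-mini", 0.15),
                new PricedProvider("llama-3.3-70b", 0.0),    // free tier
                new PricedProvider("claude-3.5-haiku", 0.80));
        System.out.println(cheapest(pool).name()); // prints "llama-3.3-70b"
    }
}
```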

Latency-Based Router

Tracks the actual response time of each provider and routes new requests to the fastest one. The router continuously updates its latency measurements, so it adapts if a provider speeds up or slows down.

LLMRouter router = LatencyBasedRouter.of(
    new OpenAIClient("gpt-4o"),
    new GroqClient("llama-3.3-70b")  // Typically faster
);
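
One common way to track per-provider latency is an exponential moving average, which lets recent samples dominate. This is an illustrative sketch (the smoothing factor and the library's actual bookkeeping are assumptions):

```java
public class LatencySketch {
    private double ema = -1;                 // no samples recorded yet
    private static final double ALPHA = 0.3; // smoothing factor (assumed)

    // Folds a new latency sample into an exponential moving average,
    // so the estimate adapts if a provider speeds up or slows down.
    void record(double millis) {
        ema = (ema < 0) ? millis : ALPHA * millis + (1 - ALPHA) * ema;
    }

    double estimate() { return ema; }

    public static void main(String[] args) {
        LatencySketch provider = new LatencySketch();
        provider.record(100);
        provider.record(50); // provider sped up; estimate drifts down
        System.out.println(provider.estimate()); // 0.3*50 + 0.7*100 = 85.0
    }
}
```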

Capability Router

Inspects each request's requirements (vision, tool calling, etc.) and routes to a provider that supports them. This lets you use cheaper text-only models for simple requests while reserving expensive multimodal models for requests that need them.

LLMRouter router = CapabilityRouter.of(
    new OpenAIClient("gpt-4o"),       // Vision + tools
    new OllamaClient("llama3")        // Text only
);
// Vision requests go to OpenAI, text-only to Ollama
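
Capability matching boils down to a subset check: a provider qualifies if its supported capabilities contain everything the request requires. A self-contained sketch with hypothetical `Capability` and `Provider` types:

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

public class CapabilitySketch {
    enum Capability { TEXT, VISION, TOOLS }

    // Hypothetical provider descriptor: a name plus its supported capabilities.
    record Provider(String name, Set<Capability> supports) {}

    // Returns the first provider that supports everything the request needs.
    static Provider select(List<Provider> pool, Set<Capability> required) {
        return pool.stream()
                .filter(p -> p.supports().containsAll(required))
                .findFirst()
                .orElseThrow(() -> new IllegalStateException("no capable provider"));
    }

    public static void main(String[] args) {
        List<Provider> pool = List.of(
                new Provider("llama3", EnumSet.of(Capability.TEXT)),
                new Provider("gpt-4o", EnumSet.allOf(Capability.class)));
        System.out.println(select(pool, EnumSet.of(Capability.VISION)).name()); // gpt-4o
        System.out.println(select(pool, EnumSet.of(Capability.TEXT)).name());   // llama3
    }
}
```

Listing the cheaper text-only provider first means it wins whenever it qualifies, which mirrors the cost behavior described above.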

Round-Robin Router

Distributes requests evenly across providers in a rotating order. This is useful for spreading rate limit usage across multiple API keys or providers.

LLMRouter router = RoundRobinRouter.of(
    new OpenAIClient("gpt-4o"),
    new AnthropicClient("claude-sonnet-4-20250514"),
    new GeminiClient("gemini-2.5-flash")
);
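
The rotation is typically an atomic counter taken modulo the provider count, which keeps selection thread-safe without locks. A minimal sketch (not the library's implementation):

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class RoundRobinSketch {
    private final AtomicInteger counter = new AtomicInteger();
    private final List<String> providers;

    RoundRobinSketch(List<String> providers) { this.providers = providers; }

    // Rotates through the provider list; floorMod guards against counter overflow.
    String next() {
        int i = Math.floorMod(counter.getAndIncrement(), providers.size());
        return providers.get(i);
    }

    public static void main(String[] args) {
        RoundRobinSketch rr = new RoundRobinSketch(List.of("openai", "anthropic", "gemini"));
        System.out.println(rr.next()); // openai
        System.out.println(rr.next()); // anthropic
        System.out.println(rr.next()); // gemini
        System.out.println(rr.next()); // openai (wraps around)
    }
}
```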

Task-Based Router

Routes based on automatic task type classification. The router analyzes prompt content using keyword matching, regex patterns, and conversation history to select the best model for each request.

TaskBasedRouter router = TaskBasedRouter.builder()
    .forTask(TaskType.CODING, new AnthropicClient("claude-sonnet-4"))
    .forTask(TaskType.CREATIVE, new OpenAIClient("gpt-4o"))
    .forTask(TaskType.MATH, new OpenAIClient("o1-preview"))
    .forTask(TaskType.FAST, new GroqClient("llama-3.3-70b"))
    .forTask(TaskType.VISION, new OpenAIClient("gpt-4o"))
    .defaultClient(new OpenAIClient("gpt-4o-mini"))
    .build();

// Auto-routing -- no manual model selection
router.chat("Write a Python function to sort a list");  // -> Claude (CODING)
router.chat("Solve: integral of x squared dx");          // -> o1 (MATH)
router.chat("Write a poem about autumn");                // -> GPT-4o (CREATIVE)
router.chat("What is 2+2?");                             // -> GPT-4o-mini (default)

TaskType Enum

Ten task categories cover the most common LLM use cases. Each type has built-in keywords for automatic classification, and you can add custom keywords for your domain.

| TaskType | Description | Recommended Models | Example Keywords |
|---|---|---|---|
| CODING | Code generation, debugging, refactoring | Claude Sonnet, GPT-4o, Codestral | code, function, debug, python, java, algorithm |
| CREATIVE | Stories, poetry, marketing copy | GPT-4o, Claude Opus | poem, creative, fiction, blog post, narrative |
| MATH | Calculations, proofs, equations | o1, Gemini 2.0, Claude with CoT | calculate, solve, equation, integral, theorem |
| ANALYSIS | Data analysis, summarization | GPT-4o, Gemini, Claude | analyze, summarize, extract, compare, metrics |
| TRANSLATION | Language translation | GPT-4o, Gemini | translate, in english, in french |
| CHAT | Conversational Q&A | Any capable model | hello, what is, explain, tell me |
| FAST | Quick, simple tasks | GPT-4o-mini, Groq, Gemini Flash | quick, simple, brief, yes or no |
| REASONING | Deep reasoning and logic | o1, o1-pro, Claude extended thinking | think step by step, logic, deduce, infer |
| VISION | Image understanding | GPT-4o, Gemini, Claude (vision) | image, picture, screenshot, diagram |
| GENERAL | Default fallback | Configured default client | (no keywords) |

Each TaskType provides matches(text) for boolean matching and matchScore(text) returning the number of keyword hits.
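
The matchScore(text) contract described above (count of keyword hits) can be sketched as a case-insensitive substring count. The helper below is illustrative, not the enum's actual code:

```java
import java.util.Locale;
import java.util.Set;

public class KeywordScoreSketch {
    // Counts how many of a task type's keywords occur in the prompt,
    // mirroring the matchScore(text) contract described above.
    static int matchScore(String text, Set<String> keywords) {
        String lower = text.toLowerCase(Locale.ROOT);
        return (int) keywords.stream().filter(lower::contains).count();
    }

    // Boolean form, mirroring matches(text): true if any keyword hits.
    static boolean matches(String text, Set<String> keywords) {
        return matchScore(text, keywords) > 0;
    }

    public static void main(String[] args) {
        Set<String> coding = Set.of("code", "function", "debug", "python");
        System.out.println(matchScore("Write a Python function", coding)); // 2
        System.out.println(matches("Tell me a story", coding));            // false
    }
}
```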

TaskClassifier

TaskClassifier analyzes the text of each prompt and assigns it to a task type. It uses a multi-signal scoring system that combines regex patterns, keyword matching, custom keywords, and conversation history to make accurate classifications.

Classification strategy (in order):

  1. Pattern detection -- Regex patterns for code blocks, function signatures, file extensions, math equations, math symbols, and image references. Highest scoring priority.
  2. Keyword scoring -- Each keyword match from TaskType.getKeywords() adds 2 points.
  3. Custom keyword scoring -- User-added keywords add 3 points each.
  4. History context boost -- Last 3 history messages give a 1.5-point boost per matching task type.
  5. Best match selection -- Highest total score wins. Score is converted to confidence (0.0-1.0). Falls back to default type if below minConfidenceThreshold.

TaskClassifier classifier = TaskClassifier.defaultClassifier();

// Simple classification
TaskType type = classifier.classify("Write a Python function to sort a list");
// Returns: CODING

// With confidence score
ClassificationResult result = classifier.classifyWithConfidence("Solve x^2 + 2x + 1 = 0");
result.taskType();     // MATH
result.confidence();   // 0.85
result.reason();       // "Pattern and keyword match"

// With conversation history for context
List<String> history = List.of("I'm working on a React app", "Help me debug");
TaskType type = classifier.classify("What's wrong with this code?", history);
// Returns: CODING (boosted by history context)
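
The multi-signal scoring can be sketched with the weights listed above (keyword 2 points, custom keyword 3, history boost 1.5). This is illustrative only; the real classifier also scores regex patterns, which are omitted here:

```java
import java.util.List;
import java.util.Set;

public class ScoringSketch {
    // Combines keyword, custom-keyword, and history signals into one score.
    static double score(String text, Set<String> keywords, Set<String> custom,
                        List<String> history) {
        String lower = text.toLowerCase();
        double s = 0;
        for (String k : keywords) if (lower.contains(k)) s += 2; // built-in: 2 pts
        for (String k : custom)   if (lower.contains(k)) s += 3; // custom: 3 pts
        // Boost from the last 3 history messages that also hit this task's keywords.
        s += 1.5 * history.stream().skip(Math.max(0, history.size() - 3))
                .filter(m -> keywords.stream().anyMatch(m.toLowerCase()::contains))
                .count();
        return s;
    }

    public static void main(String[] args) {
        Set<String> coding = Set.of("code", "debug", "python");
        double s = score("Help me debug this code", coding, Set.of("docker"),
                         List.of("My Python script crashed"));
        System.out.println(s); // 2 (debug) + 2 (code) + 1.5 (history) = 5.5
    }
}
```

Dividing the winning score by the total across all task types is one plausible way to obtain the 0.0-1.0 confidence value.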

Custom classifier:

TaskClassifier classifier = TaskClassifier.builder()
    .addKeywords(TaskType.CODING, Set.of("backend", "frontend", "docker"))
    .addPattern(TaskType.MATH, Pattern.compile("[0-9]+"))
    .setDefaultType(TaskType.CHAT)
    .setMinConfidenceThreshold(0.2)
    .build();

TaskBasedRouter Builder

Configure which LLM client handles each task type. A default client is required as a fallback for unclassified requests.

TaskBasedRouter router = TaskBasedRouter.builder()
    .forTask(TaskType.CODING, codingClient)      // Assign client per task type
    .forTask(TaskType.MATH, mathClient)
    .defaultClient(generalClient)                 // Required: fallback for unmatched tasks
    .classifier(customClassifier)                 // Custom TaskClassifier (optional)
    .confidenceThreshold(0.3)                     // Below this, use default (default: 0.3)
    .build();

Convenience setup for common mappings:

TaskBasedRouter router = TaskBasedRouter.builder()
    .withCommonSetup(
        codingClient,     // CODING
        reasoningClient,  // MATH + REASONING
        fastClient,       // FAST + CHAT
        generalClient     // default
    )
    .build();

Manual Override

When you know the task type in advance, you can bypass automatic classification and route directly to the appropriate client.

// Force a specific task type (bypasses classification)
ChatResponse response = router.chatAs(TaskType.CODING, "Explain this concept");

// Get the client for a task type directly
LLMClient codingClient = router.getClientForTask(TaskType.CODING);

// Inspect classification without routing
ClassificationResult result = router.classifyTask("Write a poem");

Routing Statistics

Track how requests are distributed across task types to understand your usage patterns and estimate cost savings from intelligent routing.

TaskRoutingStats stats = router.getTaskStats();

// Requests per task type
Map<TaskType, Long> requests = stats.requestsPerTask();

// Token usage per task type
Map<TaskType, Long> tokens = stats.tokensPerTask();

// Estimated cost savings vs. sending everything to a premium model
double savings = stats.estimatedSavings();

// Most common task type
Optional<TaskType> common = stats.mostCommonTask();

// Task distribution as percentages
Map<TaskType, Double> distribution = stats.taskDistribution();

// General routing stats (total, success, failure, per-provider)
RoutingStats routing = stats.routingStats();

// Reset all counters
router.resetStats();
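
The taskDistribution() percentages follow directly from the per-task request counts. A sketch of that derivation, using plain strings in place of the TaskType enum:

```java
import java.util.Map;
import java.util.stream.Collectors;

public class DistributionSketch {
    // Converts raw per-task request counts into percentages of total traffic,
    // as taskDistribution() is described as returning.
    static Map<String, Double> distribution(Map<String, Long> requests) {
        long total = requests.values().stream().mapToLong(Long::longValue).sum();
        return requests.entrySet().stream().collect(Collectors.toMap(
                Map.Entry::getKey,
                e -> total == 0 ? 0.0 : 100.0 * e.getValue() / total));
    }

    public static void main(String[] args) {
        System.out.println(distribution(Map.of("CODING", 3L, "CHAT", 1L)));
        // {CODING=75.0, CHAT=25.0} (map iteration order may vary)
    }
}
```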

All Strategies

Choose a routing strategy based on your primary concern: availability, cost, speed, or task complexity. You can also combine strategies by nesting routers.

| Router | Strategy | Best For |
|---|---|---|
| FallbackRouter | Try next on failure | High availability |
| CostBasedRouter | Cheapest provider | Budget optimization |
| LatencyBasedRouter | Fastest measured latency | Real-time applications |
| CapabilityRouter | Model capabilities match | Mixed workloads (vision + text) |
| RoundRobinRouter | Even load distribution | Rate limit management |
| TaskBasedRouter | Task type classification | Varied complexity tasks |
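
Because a router exposes the same chat interface as a single client, composites can nest. The generic sketch below (hypothetical `Router` interface, not the library's types) shows a round-robin whose second leg is itself a fallback chain:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class NestedRoutingSketch {
    // Minimal router abstraction; anything that turns a prompt into a response.
    interface Router { String chat(String prompt); }

    // Fallback composition: try each leg in order until one succeeds.
    static Router fallback(List<Router> legs) {
        return prompt -> {
            RuntimeException last = null;
            for (Router r : legs) {
                try { return r.chat(prompt); } catch (RuntimeException e) { last = e; }
            }
            throw new IllegalStateException("all legs failed", last);
        };
    }

    // Round-robin composition: rotate across legs, each of which may itself
    // be a composite router.
    static Router roundRobin(List<Router> legs) {
        AtomicInteger i = new AtomicInteger();
        return prompt -> legs.get(Math.floorMod(i.getAndIncrement(), legs.size())).chat(prompt);
    }

    public static void main(String[] args) {
        Router flaky = p -> { throw new RuntimeException("quota exceeded"); };
        Router primary = p -> "primary";
        Router backup = p -> "backup";
        Router combined = roundRobin(List.of(primary, fallback(List.of(flaky, backup))));
        System.out.println(combined.chat("hi")); // primary
        System.out.println(combined.chat("hi")); // backup (flaky failed, fell back)
    }
}
```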
