LLM Providers
The LLM module provides a unified interface to 13 language model providers. All providers implement the `LLMClient` interface from Core, so you can swap providers without changing your agent code.
Quick Start
Create a client for any supported provider with a single factory call. The client handles authentication, serialization, retries, and streaming automatically.
LLMClient client = LLMClientFactory.create("openai", "gpt-4o", 0.7f);
ChatResponse response = client.chat("What is quantum computing?");
API keys are resolved from environment variables automatically:
export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...
Supported Providers
TnsAI supports 13 LLM providers out of the box. Each provider implements the same LLMClient interface, so switching between providers requires changing only the creation call -- no other code changes needed.
| Provider | Class | Models | Features |
|---|---|---|---|
| OpenAI | OpenAIClient | gpt-4o, gpt-4o-mini, gpt-4-turbo, o1, o3-mini | Streaming, tools, JSON mode, vision |
| Anthropic | AnthropicClient | claude-sonnet-4, claude-3.5-sonnet, claude-3-opus | Streaming, tools, vision, prompt caching |
| Google Gemini | GeminiClient | gemini-2.5-pro, gemini-2.5-flash, gemini-2.0-flash | Streaming, tools, multimodal |
| Mistral | MistralClient | mistral-large-latest, codestral-latest | Streaming, tools, multimodal |
| Groq | GroqClient | llama-3.3-70b, mixtral-8x7b | Streaming, tools, ultra-fast inference |
| Ollama | OllamaClient | llama3, mistral, any local model | Streaming, tools, local, no API key |
| AWS Bedrock | BedrockClient | claude-3, llama-3 via AWS | Streaming, tools, AWS-managed |
| Azure OpenAI | AzureOpenAIClient | gpt-4, gpt-3.5-turbo via Azure | Streaming, tools, Azure-hosted |
| Cohere | CohereClient | command-r-plus, command-r | Streaming, tools |
| HuggingFace | HuggingFaceClient | 100k+ community models | Streaming, tools |
| OpenRouter | OpenRouterClient | 200+ models via aggregator | Streaming, tools, multi-provider |
| ZhipuAI | ZhipuAIClient | glm-4, glm-4v | Streaming, tools, vision |
| MiniMax | MiniMaxClient | abab6.5-chat, abab6-chat | Streaming, tools |
Creating Clients
Using the Factory
LLMClientFactory is the recommended way to create clients. It resolves API keys from environment variables, selects the correct provider class, and applies default settings. Use this unless you need fine-grained control over client construction.
// Basic — provider name, model, temperature
LLMClient client = LLMClientFactory.create("openai", "gpt-4o", 0.7f);
// With max tokens
LLMClient client = LLMClientFactory.create("anthropic", "claude-sonnet-4-20250514", 0.7f, 4096);
// With topP (nucleus sampling)
LLMClient client = LLMClientFactory.create("gemini", "gemini-2.5-flash", 0.7f, 2048, 0.95f);
// From @RoleSpec annotation
LLMClient client = LLMClientFactory.fromAnnotation(MyRole.class);
Provider name aliases (case-insensitive):
| Alias | Provider |
|---|---|
| "openai" | OpenAI |
| "anthropic", "claude" | Anthropic |
| "gemini", "google" | Google Gemini |
| "groq" | Groq |
| "mistral" | Mistral |
| "ollama" | Ollama |
| "cohere" | Cohere |
| "openrouter" | OpenRouter |
| "azure", "azure-openai" | Azure OpenAI |
| "bedrock", "aws" | AWS Bedrock |
| "huggingface", "hf" | HuggingFace |
| "zhipu", "glm" | ZhipuAI |
| "minimax" | MiniMax |
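Conceptually, alias resolution is just a case-insensitive table lookup. The sketch below is a hypothetical illustration (the factory's real resolution logic is internal); the class name `AliasSketch` and the subset of aliases shown are assumptions for demonstration.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of case-insensitive alias lookup. The real factory
// covers every alias in the table above; only a subset is shown here.
class AliasSketch {
    private static final Map<String, String> ALIASES = Map.of(
        "openai", "OpenAI",
        "anthropic", "Anthropic",
        "claude", "Anthropic",
        "gemini", "Google Gemini",
        "google", "Google Gemini",
        "hf", "HuggingFace",
        "huggingface", "HuggingFace");

    static String resolve(String alias) {
        // Lowercase first, so "Claude", "CLAUDE", and "claude" all match
        String provider = ALIASES.get(alias.toLowerCase(Locale.ROOT));
        if (provider == null) {
            throw new IllegalArgumentException("Unknown provider alias: " + alias);
        }
        return provider;
    }
}
```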
Direct Construction
When you need to pass a custom API key, a non-standard base URL, or provider-specific settings, construct the client class directly.
// OpenAI with custom settings
LLMClient client = new OpenAIClient("gpt-4o", 0.7f, 0.95f, 4096);
// Anthropic with custom API key
LLMClient client = new AnthropicClient("claude-sonnet-4-20250514", "sk-ant-...");
// Ollama with custom base URL
LLMClient client = new OllamaClient("http://gpu-server:11434", "llama3", 0.7f, 4096, null);
// Azure with endpoint
LLMClient client = new AzureOpenAIClient("gpt-4", "your-api-key", "https://myresource.openai.azure.com/");
Environment Variables
Set your API keys as environment variables. The factory and direct constructors will pick them up automatically. Base URL overrides are optional and only needed for proxies or self-hosted deployments.
| Provider | API Key | Base URL (optional) |
|---|---|---|
| OpenAI | OPENAI_API_KEY | OPENAI_BASE_URL |
| Anthropic | ANTHROPIC_API_KEY | ANTHROPIC_BASE_URL |
| Gemini | GEMINI_API_KEY | GEMINI_BASE_URL |
| Groq | GROQ_API_KEY | GROQ_BASE_URL |
| Mistral | MISTRAL_API_KEY | MISTRAL_BASE_URL |
| Ollama | OLLAMA_API_KEY | OLLAMA_BASE_URL |
| Cohere | COHERE_API_KEY | COHERE_BASE_URL |
| OpenRouter | OPENROUTER_API_KEY | OPENROUTER_BASE_URL |
| Azure | AZURE_OPENAI_API_KEY | AZURE_OPENAI_ENDPOINT |
| Bedrock | AWS credentials | AWS_REGION |
| HuggingFace | HUGGINGFACE_API_KEY | HUGGINGFACE_BASE_URL |
| ZhipuAI | ZHIPU_API_KEY | ZHIPU_BASE_URL |
| MiniMax | MINIMAX_API_KEY | MINIMAX_BASE_URL |
Ollama does not require an API key by default (runs locally on http://localhost:11434).
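The fail-fast behavior you want from key resolution can be sketched in a few lines. This is an illustration only, not the factory's actual implementation; the class name `EnvKeyResolver` is hypothetical.

```java
// Hedged sketch of environment-based API key resolution. The factory's real
// lookup may differ (e.g. it also consults the optional base-URL overrides).
class EnvKeyResolver {
    static String resolveApiKey(String envVar) {
        String key = System.getenv(envVar);
        if (key == null || key.isBlank()) {
            // Fail fast with an actionable message instead of a later auth error
            throw new IllegalStateException(
                "Missing API key: set the " + envVar + " environment variable");
        }
        return key;
    }
}
```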
Streaming
All providers support streaming, which lets you display tokens to the user as they are generated rather than waiting for the complete response. Three streaming patterns are available depending on how much control you need.
// Text stream
Stream<String> tokens = client.streamChat("Tell me a story");
// ChatChunk stream
Stream<ChatChunk> chunks = client.streamChatWithSpec(request);
// Handler-based
client.streamChatWithHandler(request, chunk -> { ... });
Resilience
TnsAI includes built-in resilience features so your application keeps working even when LLM providers have temporary issues.
Circuit Breaker
A circuit breaker prevents your application from repeatedly calling a failing provider. After a configurable number of consecutive failures, it stops sending requests ("opens the circuit") and returns errors immediately, giving the provider time to recover.
LLMClient resilient = CircuitBreakerClient.builder()
.client(openaiClient)
.failureThreshold(3)
.recoveryTimeout(Duration.ofSeconds(60))
.build();
States: CLOSED (normal) -> OPEN (fast-fail) -> HALF_OPEN (probe recovery).
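The state cycle above can be sketched as a small standalone state machine. This is an illustration of the pattern, not CircuitBreakerClient's internals; the class name `SimpleCircuitBreaker` is hypothetical.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.function.Supplier;

// Minimal illustration of the CLOSED -> OPEN -> HALF_OPEN cycle.
// The real CircuitBreakerClient wraps an LLMClient; this sketch wraps any Supplier.
class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private State state = State.CLOSED;
    private int consecutiveFailures = 0;
    private Instant openedAt;
    private final int failureThreshold;
    private final Duration recoveryTimeout;

    SimpleCircuitBreaker(int failureThreshold, Duration recoveryTimeout) {
        this.failureThreshold = failureThreshold;
        this.recoveryTimeout = recoveryTimeout;
    }

    <T> T call(Supplier<T> request) {
        if (state == State.OPEN) {
            if (Duration.between(openedAt, Instant.now()).compareTo(recoveryTimeout) >= 0) {
                state = State.HALF_OPEN;   // timeout elapsed: allow one probe through
            } else {
                throw new IllegalStateException("Circuit open: failing fast");
            }
        }
        try {
            T result = request.get();
            consecutiveFailures = 0;       // success closes the circuit
            state = State.CLOSED;
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
                state = State.OPEN;        // trip: fast-fail until the timeout elapses
                openedAt = Instant.now();
            }
            throw e;
        }
    }

    State state() { return state; }
}
```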
Built-in Retry
All providers include automatic retry with exponential backoff for transient errors like rate limits and server errors. This is enabled by default with no configuration needed.
- Max retries: 3
- Initial delay: 1 second
- Retriable HTTP codes: 408, 425, 429, 500, 502, 503, 504, 529
- Retriable exceptions: ConnectException, SocketTimeoutException
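The built-in retry needs no setup, but the defaults above are easy to picture as code. The sketch below mirrors them (3 retries, 1s initial delay, doubling each attempt); it is illustrative only, and the class name `RetrySketch` is hypothetical.

```java
import java.util.Set;
import java.util.concurrent.Callable;

// Hand-rolled sketch of retry with exponential backoff.
class RetrySketch {
    // HTTP status codes the built-in retry treats as transient
    static final Set<Integer> RETRIABLE = Set.of(408, 425, 429, 500, 502, 503, 504, 529);

    static <T> T withRetry(Callable<T> call, int maxRetries, long initialDelayMs) throws Exception {
        long delay = initialDelayMs;
        for (int attempt = 0; ; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                if (attempt >= maxRetries) throw e;  // attempts exhausted: surface the error
                Thread.sleep(delay);
                delay *= 2;                          // backoff: 1s, 2s, 4s, ...
            }
        }
    }
}
```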
Observability
Understanding what your LLM calls are doing in production is critical for debugging, cost tracking, and compliance. The observability layer lets you wrap any LLMClient with logging, metrics, and custom observers without changing your application code.
ObservableLLMClient
ObservableLLMClient is a decorator that wraps an existing client and intercepts every call, forwarding lifecycle events to one or more observers. Your application code uses the wrapped client exactly like the original -- the observability is completely transparent.
// Single observer
LLMClient observable = new ObservableLLMClient(client, metrics);
// Multiple observers (varargs)
LLMClient observable = new ObservableLLMClient(client, metrics, promptLogger, auditObserver);
Internally, multiple observers are merged into a CompositeObserver. Null observers and LLMObserver.NOOP are filtered out automatically.
LLMObserver Interface (6 hooks)
LLMObserver is the callback interface for monitoring LLM operations. All six methods have default no-op implementations, so you only override the hooks you care about. For example, you might only need onResponse for latency tracking.
| Hook | When it fires | Parameters |
|---|---|---|
| onRequest | Before sending a request | client, message, systemPrompt, history, tools |
| onResponse | After a successful response | client, response, latencyMs |
| onError | When a request fails | client, error, latencyMs |
| onStreamChunk | For each streaming chunk | client, chunk, chunkIndex |
| onStreamComplete | When streaming finishes | client, totalChunks, latencyMs |
| onStreamError | When streaming fails | client, error, chunksReceived, latencyMs |
LLMObserver myObserver = new LLMObserver() {
@Override
public void onRequest(LLMClient client, String message,
Optional<String> systemPrompt,
Optional<List<Map<String, Object>>> history,
Optional<List<Map<String, Object>>> tools) {
log.info("Request to {}: {}", client.getModel(), message);
}
@Override
public void onResponse(LLMClient client, ChatResponse response, long latencyMs) {
log.info("Response from {} ({}ms)", client.getModel(), latencyMs);
}
};
LLMClient observed = new ObservableLLMClient(client, myObserver);
A pre-built no-op sentinel is available as LLMObserver.NOOP.
Compose multiple observers with CompositeObserver:
LLMObserver combined = CompositeObserver.of(metricsObserver, loggingObserver, auditObserver);
PromptLogger (PII Filtering + MDC Context)
PromptLogger is a production-ready observer that logs every LLM request and response with automatic PII redaction. It prevents sensitive data (emails, credit cards, API keys) from appearing in your logs and adds correlation IDs so you can trace requests through distributed systems.
PII filtering. When enabled (default), the logger redacts common sensitive patterns before writing to logs:
| Pattern | Replacement |
|---|---|
| Email addresses | [REDACTED_EMAIL] |
| Phone numbers | [REDACTED_PHONE] |
| Credit card numbers | [REDACTED_CC] |
| Social Security Numbers | [REDACTED_SSN] |
| API keys and tokens | [REDACTED_KEY] |
| IP addresses | [REDACTED_IP] |
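Redaction of this kind boils down to applying a list of regex rules in order. The patterns below are illustrative only; PromptLogger's real patterns are internal and may match more (or fewer) formats, and the class name `PiiRedactorSketch` is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Illustrative regex-based redaction in the spirit of the table above.
class PiiRedactorSketch {
    private static final Map<Pattern, String> RULES = new LinkedHashMap<>();
    static {
        // Order matters: redact keys before emails so overlapping text is handled once
        RULES.put(Pattern.compile("sk-[A-Za-z0-9-]{8,}"), "[REDACTED_KEY]");
        RULES.put(Pattern.compile("[\\w.+-]+@[\\w-]+\\.[\\w.]+"), "[REDACTED_EMAIL]");
        RULES.put(Pattern.compile("\\b\\d{3}-\\d{2}-\\d{4}\\b"), "[REDACTED_SSN]");
        RULES.put(Pattern.compile("\\b(?:\\d[ -]?){13,16}\\b"), "[REDACTED_CC]");
    }

    static String redact(String text) {
        String out = text;
        for (var rule : RULES.entrySet()) {
            out = rule.getKey().matcher(out).replaceAll(rule.getValue());
        }
        return out;
    }
}
```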
MDC context. The logger populates SLF4J MDC with correlation fields so downstream log infrastructure (ELK, Datadog, etc.) can group related events:
| MDC Key | Value |
|---|---|
| llm.requestId | Unique 8-char ID per request |
| llm.provider | Provider name (e.g. OpenAI) |
| llm.model | Model name (e.g. gpt-4o) |
| llm.latencyMs | Request latency (set on response) |
MDC fields are cleared automatically after each request/response cycle.
Builder API:
PromptLogger promptLogger = PromptLogger.builder()
.filterPII(true) // default: true
.logLevel(PromptLogger.LogLevel.INFO) // DEBUG, INFO, or WARN
.maxContentLength(500) // default: 200 chars
.logFullContent(false) // true to disable truncation
.build();
LLMClient observed = new ObservableLLMClient(client, promptLogger);
Factory methods for common configurations:
PromptLogger.withPIIFiltering(); // PII enabled, defaults
PromptLogger.withoutFiltering(); // PII disabled, defaults
Log output format:
[INFO] LLM Request [req-abc123] OpenAI/gpt-4o: "My email is [REDACTED_EMAIL]"
[INFO] LLM Response [req-abc123] OpenAI/gpt-4o (245ms): "The answer is 4"
Tool calls within responses are logged individually:
[INFO] Tool call: calculator({"expression":"2+2"})
LLMMetrics (Performance Tracking)
LLMMetrics is an observer that automatically collects performance data -- request counts, token usage, latency percentiles (p50/p95/p99), error rates, and estimated costs -- across all providers. Use it to monitor your LLM spending and identify performance bottlenecks.
Setup:
LLMMetrics metrics = new LLMMetrics();
LLMClient observed = new ObservableLLMClient(client, metrics);
// Use the client normally
observed.chat("Hello!");
Global report via getReport():
LLMMetrics.Report report = metrics.getReport();
report.totalRequests(); // total request count
report.totalResponses(); // successful responses
report.totalErrors(); // error count
report.totalInputTokens(); // estimated input tokens
report.totalOutputTokens(); // estimated output tokens
report.totalEstimatedCost(); // cost in USD (based on provider pricing)
report.avgLatencyMs(); // average latency
report.p50LatencyMs(); // median latency
report.p95LatencyMs(); // 95th percentile latency
report.p99LatencyMs(); // 99th percentile latency
report.successRate(); // percentage (0-100)
report.errorRate(); // percentage (0-100)
report.timestamp(); // Instant of report generation
Per-provider breakdown via getMetricsByProvider():
Map<String, LLMMetrics.ProviderMetrics> byProvider = metrics.getMetricsByProvider();
for (var entry : byProvider.entrySet()) {
String providerKey = entry.getKey(); // e.g. "OpenAI/gpt-4o"
LLMMetrics.ProviderMetrics pm = entry.getValue();
pm.requests(); // request count for this provider
pm.responses(); // successful responses
pm.errors(); // errors
pm.inputTokens(); // estimated input tokens
pm.outputTokens(); // estimated output tokens
pm.estimatedCost(); // cost in USD
pm.avgLatencyMs(); // average latency
pm.streamChunks(); // total streaming chunks
pm.successRate(); // percentage (0-100)
pm.errorRate(); // percentage (0-100)
}
Token counts are estimated at ~4 characters per token. Cost is calculated using LLMCapabilities.getInputCostPer1KTokens() and getOutputCostPer1KTokens() from the provider.
Call metrics.reset() to clear all counters.
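The two heuristics above are simple enough to reproduce by hand, which is useful for sanity-checking reported costs. This sketch is a back-of-envelope illustration; the prices passed in below are hypothetical placeholders, not real provider pricing, and the class name `CostSketch` is an assumption.

```java
// Back-of-envelope reproduction of the metrics layer's heuristics:
// tokens at ~4 characters each, cost from per-1K-token prices.
class CostSketch {
    static long estimateTokens(String text) {
        return Math.max(1, text.length() / 4);   // ~4 chars per token
    }

    static double estimateCostUsd(long inputTokens, long outputTokens,
                                  double inputPer1K, double outputPer1K) {
        return (inputTokens / 1000.0) * inputPer1K
             + (outputTokens / 1000.0) * outputPer1K;
    }
}
```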
Combining observers. Use metrics alongside prompt logging:
LLMMetrics metrics = new LLMMetrics();
PromptLogger logger = PromptLogger.withPIIFiltering();
LLMClient observed = new ObservableLLMClient(client, metrics, logger);
JSON Mode
When you need the LLM to return valid JSON instead of free-form text, wrap your client with JsonModeClient. It uses provider-native JSON mode when available (OpenAI, Gemini) and falls back to prompt engineering for providers that lack native support (Anthropic, Ollama).
Quick Wrap
The simplest way to get JSON output -- just wrap your existing client.
// Simple wrap -- uses JSON_OBJECT format, auto-detects native support
JsonModeClient client = JsonModeClient.wrap(baseClient);
ChatResponse response = client.chat("List 3 programming languages");
// {"languages": ["Python", "Java", "JavaScript"]}
Builder
For advanced control, the builder lets you specify a custom JSON schema, provide your own ObjectMapper, or force prompt engineering mode even when native JSON mode is available.
JsonModeClient client = JsonModeClient.builder()
.client(baseClient) // Required: LLM client to wrap
.responseFormat(format) // ResponseFormat (default: jsonObject())
.objectMapper(customMapper) // Custom Jackson ObjectMapper
.forcePromptEngineering(true) // Skip native JSON mode, use prompt injection
.schemaFromClass("Person", Person.class) // Auto-generate schema from class
.build();
chatAs -- Type-Safe JSON Parsing
The chatAs method combines JSON generation and deserialization in one step, returning a strongly-typed Java object instead of a raw JSON string.
record LanguageList(List<String> languages) {}
// Simple
LanguageList list = client.chatAs(LanguageList.class, "List 3 programming languages");
// With system prompt
LanguageList list = client.chatAs(LanguageList.class, "List 3 languages",
Optional.of("You are a helpful assistant"));
// Full parameters
LanguageList list = client.chatAs(LanguageList.class, message, systemPrompt, history, tools);
If JSON parsing fails, JsonModeClient.JsonParseException is thrown with getRawContent() for debugging.
ResponseFormat
Controls the structure of LLM output. Use text() for default behavior, jsonObject() for generic JSON, or jsonSchema() to enforce a specific schema.
// Plain text (default LLM behavior)
ResponseFormat text = ResponseFormat.text();
// JSON object (valid JSON, structure not enforced)
ResponseFormat json = ResponseFormat.jsonObject();
// JSON Schema (valid JSON conforming to a schema)
ResponseFormat schema = ResponseFormat.jsonSchema("Person", Map.of(
"type", "object",
"properties", Map.of(
"name", Map.of("type", "string"),
"age", Map.of("type", "integer")
),
"required", List.of("name", "age")
));
// JSON Schema from a Java class (uses SchemaGenerator)
ResponseFormat schema = ResponseFormat.jsonSchema("Person", Person.class);
SchemaGenerator
Automatically generates a JSON Schema from any Java class using reflection. This saves you from writing schemas by hand -- just pass your record or POJO and it produces a valid schema that the LLM can follow.
public record Person(String name, int age, List<String> hobbies) {}
Map<String, Object> schema = SchemaGenerator.generateSchema(Person.class);
// {"type":"object","properties":{"name":{"type":"string"},"age":{"type":"integer"},
// "hobbies":{"type":"array","items":{"type":"string"}}},"required":["name","age","hobbies"]}
// Record-specific (all components required)
Map<String, Object> schema = SchemaGenerator.generateRecordSchema(Person.class);
Provider Support
Not all providers support native JSON mode. When native support is unavailable, JsonModeClient falls back to prompt engineering (injecting JSON instructions into the prompt).
| Provider | JSON_OBJECT | JSON_SCHEMA |
|---|---|---|
| OpenAI | Yes | Yes (GPT-4o, GPT-4-turbo) |
| Anthropic | No (use tool_use) | No (use tool_use) |
| Gemini | Yes | Yes |
| Ollama | Depends on model | No |
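For providers in the "No" column, the fallback amounts to appending JSON instructions to the prompt. The sketch below shows the general shape of such an injection; the exact wording JsonModeClient uses is an internal detail and will differ, and the class name `JsonFallbackSketch` is hypothetical.

```java
// Sketch of a prompt-engineering fallback for providers without native JSON mode.
class JsonFallbackSketch {
    static String withJsonInstructions(String userMessage, String schemaJson) {
        StringBuilder sb = new StringBuilder(userMessage);
        sb.append("\n\nRespond with valid JSON only, no markdown fences or commentary.");
        if (schemaJson != null) {
            // When a schema is available, pin the output structure to it
            sb.append("\nThe JSON must conform to this schema:\n").append(schemaJson);
        }
        return sb.toString();
    }
}
```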
Model Capabilities
Before sending a request that requires specific features (vision, tool calling, large context), you should check whether the model supports them. The LLMCapabilities interface provides a standardized way to query any model's features, limits, and pricing.
LLMClient client = new OpenAIClient("gpt-4o");
LLMCapabilities caps = client.getCapabilities();
// Check before use
if (caps.supportsVision()) {
response = client.chat(List.of(textPart, imagePart), system, history, tools);
}
// Context window check
if (estimatedTokens > caps.getMaxInputTokens()) {
// Truncate or summarize
}
Core Capability Methods
These boolean methods tell you what the model can do. Check them before using advanced features to avoid runtime errors.
| Method | Return | Description |
|---|---|---|
| supportsStreaming() | boolean | Streaming responses (most modern LLMs) |
| supportsVision() | boolean | Image/visual input (GPT-4o, Claude 3, Gemini) |
| supportsFunctionCalling() | boolean | Tool/function calling for agents |
| supportsStructuredOutput() | boolean | JSON mode / structured output |
| supportsSystemPrompt() | boolean | System prompt distinction (default true) |
| supportsParallelFunctionCalls() | boolean | Multiple tool calls in one response |
Token Limits
Know your model's context window to avoid truncation errors and plan your context management strategy.
| Method | Return | Description |
|---|---|---|
| getMaxInputTokens() | int | Maximum input tokens (context window) |
| getMaxOutputTokens() | int | Maximum output tokens |
| getContextWindow() | int | Total context window (defaults to maxInputTokens) |
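A pre-flight budget check against these limits can be sketched with the same ~4-chars-per-token estimate used by the metrics layer. In real code the limit would come from `caps.getMaxInputTokens()`; the class name `ContextBudgetSketch` is a hypothetical illustration.

```java
// Sketch of a pre-flight context check: does the prompt plus a reserved
// output budget fit in the model's input window?
class ContextBudgetSketch {
    static boolean fitsContext(String prompt, int maxInputTokens, int reservedOutputTokens) {
        long estimatedInput = prompt.length() / 4;   // rough ~4-chars-per-token estimate
        return estimatedInput + reservedOutputTokens <= maxInputTokens;
    }
}
```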
Modality
Modalities describe what types of input a model can process. Use this to check whether a model supports image, audio, or video input before sending multimodal content.
Set<LLMCapabilities.Modality> modalities = caps.getSupportedModalities();
boolean canHandleAudio = caps.supportsModality(Modality.AUDIO);
Provider & Cost Information
Access pricing and provider metadata to estimate costs before making calls or to build cost-tracking dashboards.
| Method | Return | Description |
|---|---|---|
| getProviderName() | String | Provider name (OpenAI, Anthropic, Google, etc.) |
| getModelId() | String | Model identifier |
| getModelVersion() | Optional<String> | Model version |
| getInputCostPer1KTokens() | Optional<Double> | Input cost in USD per 1K tokens |
| getOutputCostPer1KTokens() | Optional<Double> | Output cost in USD per 1K tokens |
| getEstimatedLatencyMs() | Optional<Long> | Estimated time-to-first-token in ms |
Special Capabilities
Some models support advanced features beyond standard chat. Check these before using specialized functionality.
| Method | Default | Description |
|---|---|---|
| supportsCodeExecution() | false | Code Interpreter support |
| supportsWebBrowsing() | false | Web browsing support |
| supportsFileUpload() | From modalities | File upload support |
| supportsReasoning() | false | Reasoning/thinking (o1-style) |
meetsRequirements
A convenience method that checks multiple capability requirements at once, so you can verify a model is suitable for your use case in a single call.
boolean suitable = caps.meetsRequirements(
true, // requiresVision
true, // requiresTools
32000 // minContextTokens
);
Validation Methods
When a capability is required (not optional), use these methods to fail fast at startup with a clear error message rather than getting cryptic errors at runtime.
caps.requireToolCalling(); // throws ToolCallNotSupportedException
caps.requireStreaming(); // throws LLMCapabilityException
caps.requireVision(); // throws LLMCapabilityException
Model Capability Profiles
A quick reference for the most commonly used models and their supported features.
| Model | Vision | Tools | JSON | Context |
|---|---|---|---|---|
| GPT-4o | Yes | Yes | Yes | 128K |
| GPT-4-turbo | Yes | Yes | Yes | 128K |
| GPT-3.5-turbo | No | Yes | Yes | 16K |
| Claude Sonnet 4 | Yes | Yes | Yes | 200K |
| Claude 3 Opus | Yes | Yes | Yes | 200K |
| Gemini 2.5 Flash | Yes | Yes | Yes | 1M |
| Llama 3.2 | No | Yes | No | 128K |
| Mistral Large | No | Yes | Yes | 128K |
Multimodal Input
Some models can process images, audio, and video alongside text. TnsAI uses a ContentPart system to represent mixed-media messages, so you can combine text with images or audio in a single request.
| Class | Type | Description |
|---|---|---|
| TextPart | "text" | Plain text content |
| ImagePart | "image" | Image data (Base64 encoded) |
| AudioPart | "audio" | Audio data (Base64, URL, or file) |
| VideoPart | "video" | Video data (Gemini) |
Sending Images
Create an ImagePart from Base64-encoded data and include it alongside text in a multimodal message.
// Create image part from Base64 data
ImagePart image = ImagePart.fromBase64(base64Data, "image/png");
// Build multimodal message
List<ContentPart> parts = List.of(
new TextPart("What do you see in this image?"),
image
);
// Send to a vision-capable model
ChatResponse response = client.chat(parts,
Optional.of("You are a helpful assistant"),
Optional.empty(),
Optional.empty()
);
Sending Audio
Create an AudioPart from a file, byte array, Base64 string, or URL. The model will process the audio alongside any text you include.
// From file
AudioPart audio = AudioPart.fromFile(new File("recording.mp3"));
// From Base64
AudioPart audio = AudioPart.fromBase64(base64String, "audio/wav");
// From byte array
AudioPart audio = AudioPart.fromBytes(rawBytes, "audio/mp3");
// From URL
AudioPart audio = AudioPart.fromUrl("https://example.com/audio.mp3");
// Send as multimodal message
List<ContentPart> parts = List.of(
new TextPart("Transcribe this audio"),
audio
);
ChatResponse response = client.chat(parts, systemPrompt, history, tools);
Capability Check Before Multimodal
Always check the model's capabilities before sending multimodal content. If the model does not support vision or audio, fall back to a text-only alternative.
LLMCapabilities caps = client.getCapabilities();
if (caps.supportsVision()) {
// Safe to send ImagePart
client.chat(List.of(new TextPart("Describe this"), image), system, history, tools);
} else {
// Fall back to text-only
client.chat("Describe the concept", system, history, tools);
}
SPI Registration
The LLM module uses Java's ServiceLoader mechanism to register itself automatically. You do not need to configure this manually -- just add the tnsai-llm dependency to your project and the providers become available through LLMClientFactory.
# META-INF/services/com.tnsai.llm.LLMClientProvider
com.tnsai.llm.LLMClientFactoryProvider
Cost Tracking
Monitor and control LLM spending across providers with built-in cost tracking, budget management, and model pricing data for 100+ models.
LLM Routing
Route requests across multiple LLM providers with built-in strategies. Routing enables failover, cost optimization, latency reduction, and capability-based model selection.