
LLM Providers

The LLM module provides a unified interface to 13 language model providers. All providers implement the `LLMClient` interface from Core, so you can swap providers without changing your agent code.

Quick Start

Create a client for any supported provider with a single factory call. The client handles authentication, serialization, retries, and streaming automatically.

LLMClient client = LLMClientFactory.create("openai", "gpt-4o", 0.7f);
ChatResponse response = client.chat("What is quantum computing?");

API keys are resolved from environment variables automatically:

export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...

Supported Providers

TnsAI supports 13 LLM providers out of the box. Each provider implements the same LLMClient interface, so switching between providers requires changing only the creation call -- no other code changes needed.

| Provider | Class | Models | Features |
|---|---|---|---|
| OpenAI | OpenAIClient | gpt-4o, gpt-4o-mini, gpt-4-turbo, o1, o3-mini | Streaming, tools, JSON mode, vision |
| Anthropic | AnthropicClient | claude-sonnet-4, claude-3.5-sonnet, claude-3-opus | Streaming, tools, vision, prompt caching |
| Google Gemini | GeminiClient | gemini-2.5-pro, gemini-2.5-flash, gemini-2.0-flash | Streaming, tools, multimodal |
| Mistral | MistralClient | mistral-large-latest, codestral-latest | Streaming, tools, multimodal |
| Groq | GroqClient | llama-3.3-70b, mixtral-8x7b | Streaming, tools, ultra-fast inference |
| Ollama | OllamaClient | llama3, mistral, any local model | Streaming, tools, local, no API key |
| AWS Bedrock | BedrockClient | claude-3, llama-3 via AWS | Streaming, tools, AWS-managed |
| Azure OpenAI | AzureOpenAIClient | gpt-4, gpt-3.5-turbo via Azure | Streaming, tools, Azure-hosted |
| Cohere | CohereClient | command-r-plus, command-r | Streaming, tools |
| HuggingFace | HuggingFaceClient | 100k+ community models | Streaming, tools |
| OpenRouter | OpenRouterClient | 200+ models via aggregator | Streaming, tools, multi-provider |
| ZhipuAI | ZhipuAIClient | glm-4, glm-4v | Streaming, tools, vision |
| MiniMax | MiniMaxClient | abab6.5-chat, abab6-chat | Streaming, tools |

Creating Clients

Using the Factory

LLMClientFactory is the recommended way to create clients. It resolves API keys from environment variables, selects the correct provider class, and applies default settings. Use this unless you need fine-grained control over client construction.

// Basic — provider name, model, temperature
LLMClient client = LLMClientFactory.create("openai", "gpt-4o", 0.7f);

// With max tokens
LLMClient client = LLMClientFactory.create("anthropic", "claude-sonnet-4-20250514", 0.7f, 4096);

// With topP (nucleus sampling)
LLMClient client = LLMClientFactory.create("gemini", "gemini-2.5-flash", 0.7f, 2048, 0.95f);

// From @RoleSpec annotation
LLMClient client = LLMClientFactory.fromAnnotation(MyRole.class);

Provider name aliases (case-insensitive):

| Alias | Provider |
|---|---|
| "openai" | OpenAI |
| "anthropic", "claude" | Anthropic |
| "gemini", "google" | Google Gemini |
| "groq" | Groq |
| "mistral" | Mistral |
| "ollama" | Ollama |
| "cohere" | Cohere |
| "openrouter" | OpenRouter |
| "azure", "azure-openai" | Azure OpenAI |
| "bedrock", "aws" | AWS Bedrock |
| "huggingface", "hf" | HuggingFace |
| "zhipu", "glm" | ZhipuAI |
| "minimax" | MiniMax |

Direct Construction

When you need to pass a custom API key, a non-standard base URL, or provider-specific settings, construct the client class directly.

// OpenAI with custom settings
LLMClient client = new OpenAIClient("gpt-4o", 0.7f, 0.95f, 4096);

// Anthropic with custom API key
LLMClient client = new AnthropicClient("claude-sonnet-4-20250514", "sk-ant-...");

// Ollama with custom base URL
LLMClient client = new OllamaClient("http://gpu-server:11434", "llama3", 0.7f, 4096, null);

// Azure with endpoint
LLMClient client = new AzureOpenAIClient("gpt-4", "your-api-key", "https://myresource.openai.azure.com/");

Environment Variables

Set your API keys as environment variables. The factory and direct constructors will pick them up automatically. Base URL overrides are optional and only needed for proxies or self-hosted deployments.

| Provider | API Key | Base URL (optional) |
|---|---|---|
| OpenAI | OPENAI_API_KEY | OPENAI_BASE_URL |
| Anthropic | ANTHROPIC_API_KEY | ANTHROPIC_BASE_URL |
| Gemini | GEMINI_API_KEY | GEMINI_BASE_URL |
| Groq | GROQ_API_KEY | GROQ_BASE_URL |
| Mistral | MISTRAL_API_KEY | MISTRAL_BASE_URL |
| Ollama | OLLAMA_API_KEY | OLLAMA_BASE_URL |
| Cohere | COHERE_API_KEY | COHERE_BASE_URL |
| OpenRouter | OPENROUTER_API_KEY | OPENROUTER_BASE_URL |
| Azure | AZURE_OPENAI_API_KEY | AZURE_OPENAI_ENDPOINT |
| Bedrock | AWS credentials | AWS_REGION |
| HuggingFace | HUGGINGFACE_API_KEY | HUGGINGFACE_BASE_URL |
| ZhipuAI | ZHIPU_API_KEY | ZHIPU_BASE_URL |
| MiniMax | MINIMAX_API_KEY | MINIMAX_BASE_URL |

Ollama does not require an API key by default (runs locally on http://localhost:11434).

Streaming

All providers support streaming, which lets you display tokens to the user as they are generated rather than waiting for the complete response. Three streaming patterns are available depending on how much control you need.

// Text stream
Stream<String> tokens = client.streamChat("Tell me a story");

// ChatChunk stream
Stream<ChatChunk> chunks = client.streamChatWithSpec(request);

// Handler-based
client.streamChatWithHandler(request, chunk -> { ... });
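The text-stream pattern is typically consumed token by token for live display, then joined into the full response. This sketch shows that consumption pattern with a stand-in `Stream<String>` (no TnsAI types involved):

```java
import java.util.stream.Stream;

// Sketch of consuming a token stream like the one streamChat returns:
// each token can be displayed as it arrives, then joined into the
// complete response text.
public class StreamConsumeSketch {
    public static String assemble(Stream<String> tokens) {
        StringBuilder sb = new StringBuilder();
        tokens.forEach(sb::append);  // display each token here for live output
        return sb.toString();
    }
}
```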

Resilience

TnsAI includes built-in resilience features so your application keeps working even when LLM providers have temporary issues.

Circuit Breaker

A circuit breaker prevents your application from repeatedly calling a failing provider. After a configurable number of consecutive failures, it stops sending requests ("opens the circuit") and returns errors immediately, giving the provider time to recover.

LLMClient resilient = CircuitBreakerClient.builder()
    .client(openaiClient)
    .failureThreshold(3)
    .recoveryTimeout(Duration.ofSeconds(60))
    .build();

States: CLOSED (normal) -> OPEN (fast-fail) -> HALF_OPEN (probe recovery).
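The state cycle can be sketched as a small state machine. This is an illustrative model of the behavior described above, not the library's actual CircuitBreakerClient internals; thresholds and method names are placeholders:

```java
// Minimal sketch of the CLOSED -> OPEN -> HALF_OPEN cycle.
public class CircuitBreakerSketch {
    public enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;
    private final long recoveryTimeoutMs;
    private State state = State.CLOSED;
    private int consecutiveFailures = 0;
    private long openedAtMs = 0;

    public CircuitBreakerSketch(int failureThreshold, long recoveryTimeoutMs) {
        this.failureThreshold = failureThreshold;
        this.recoveryTimeoutMs = recoveryTimeoutMs;
    }

    // Returns false while the circuit is open and the recovery timeout
    // has not elapsed; callers fast-fail in that case.
    public boolean allowRequest(long nowMs) {
        if (state == State.OPEN && nowMs - openedAtMs >= recoveryTimeoutMs) {
            state = State.HALF_OPEN;  // allow one probe request through
        }
        return state != State.OPEN;
    }

    public void recordSuccess() {
        consecutiveFailures = 0;
        state = State.CLOSED;
    }

    public void recordFailure(long nowMs) {
        consecutiveFailures++;
        if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
            state = State.OPEN;  // trip the breaker
            openedAtMs = nowMs;
        }
    }

    public State state() { return state; }
}
```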

Built-in Retry

All providers include automatic retry with exponential backoff for transient errors like rate limits and server errors. This is enabled by default with no configuration needed.

  • Max retries: 3
  • Initial delay: 1 second
  • Retriable HTTP codes: 408, 425, 429, 500, 502, 503, 504, 529
  • Retriable exceptions: ConnectException, SocketTimeoutException
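The defaults above imply delays of 1s, 2s, and 4s across the three retries (exponential backoff doubles the delay each attempt). A self-contained sketch of that policy, with illustrative names:

```java
import java.util.Set;

// Sketch of the retry policy described above: 3 retries, 1s initial
// delay doubling per attempt, retrying only on the listed HTTP codes.
public class RetryPolicySketch {
    static final int MAX_RETRIES = 3;
    static final long INITIAL_DELAY_MS = 1000;
    static final Set<Integer> RETRIABLE =
        Set.of(408, 425, 429, 500, 502, 503, 504, 529);

    // Delay before the given retry attempt (1-based): 1s, 2s, 4s.
    public static long delayForAttempt(int attempt) {
        return INITIAL_DELAY_MS * (1L << (attempt - 1));
    }

    public static boolean isRetriable(int httpStatus) {
        return RETRIABLE.contains(httpStatus);
    }
}
```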

Observability

Understanding what your LLM calls are doing in production is critical for debugging, cost tracking, and compliance. The observability layer lets you wrap any LLMClient with logging, metrics, and custom observers without changing your application code.

ObservableLLMClient

ObservableLLMClient is a decorator that wraps an existing client and intercepts every call, forwarding lifecycle events to one or more observers. Your application code uses the wrapped client exactly like the original -- the observability is completely transparent.

// Single observer
LLMClient observable = new ObservableLLMClient(client, metrics);

// Multiple observers (varargs)
LLMClient observable = new ObservableLLMClient(client, metrics, promptLogger, auditObserver);

Internally, multiple observers are merged into a CompositeObserver. Null observers and LLMObserver.NOOP are filtered out automatically.

LLMObserver Interface (6 hooks)

LLMObserver is the callback interface for monitoring LLM operations. All six methods have default no-op implementations, so you only override the hooks you care about. For example, you might only need onResponse for latency tracking.

| Hook | When it fires | Parameters |
|---|---|---|
| onRequest | Before sending a request | client, message, systemPrompt, history, tools |
| onResponse | After a successful response | client, response, latencyMs |
| onError | When a request fails | client, error, latencyMs |
| onStreamChunk | For each streaming chunk | client, chunk, chunkIndex |
| onStreamComplete | When streaming finishes | client, totalChunks, latencyMs |
| onStreamError | When streaming fails | client, error, chunksReceived, latencyMs |

LLMObserver myObserver = new LLMObserver() {
    @Override
    public void onRequest(LLMClient client, String message,
                          Optional<String> systemPrompt,
                          Optional<List<Map<String, Object>>> history,
                          Optional<List<Map<String, Object>>> tools) {
        log.info("Request to {}: {}", client.getModel(), message);
    }

    @Override
    public void onResponse(LLMClient client, ChatResponse response, long latencyMs) {
        log.info("Response from {} ({}ms)", client.getModel(), latencyMs);
    }
};

LLMClient observed = new ObservableLLMClient(client, myObserver);

A pre-built no-op sentinel is available as LLMObserver.NOOP.

Compose multiple observers with CompositeObserver:

LLMObserver combined = CompositeObserver.of(metricsObserver, loggingObserver, auditObserver);

PromptLogger (PII Filtering + MDC Context)

PromptLogger is a production-ready observer that logs every LLM request and response with automatic PII redaction. It prevents sensitive data (emails, credit cards, API keys) from appearing in your logs and adds correlation IDs so you can trace requests through distributed systems.

PII filtering. When enabled (default), the logger redacts common sensitive patterns before writing to logs:

| Pattern | Replacement |
|---|---|
| Email addresses | [REDACTED_EMAIL] |
| Phone numbers | [REDACTED_PHONE] |
| Credit card numbers | [REDACTED_CC] |
| Social Security Numbers | [REDACTED_SSN] |
| API keys and tokens | [REDACTED_KEY] |
| IP addresses | [REDACTED_IP] |
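The redaction step amounts to running a set of regex substitutions over the content before it reaches the log. A simplified sketch for two of the patterns (the library's actual regexes are likely broader; these are illustrative):

```java
import java.util.regex.Pattern;

// Illustrative redaction sketch: replace emails and simple US-style
// phone numbers with the placeholders from the table above.
public class PiiRedactionSketch {
    private static final Pattern EMAIL =
        Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");
    private static final Pattern PHONE =
        Pattern.compile("\\b\\d{3}[-.]\\d{3}[-.]\\d{4}\\b");

    public static String redact(String text) {
        String out = EMAIL.matcher(text).replaceAll("[REDACTED_EMAIL]");
        return PHONE.matcher(out).replaceAll("[REDACTED_PHONE]");
    }
}
```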

MDC context. The logger populates SLF4J MDC with correlation fields so downstream log infrastructure (ELK, Datadog, etc.) can group related events:

| MDC Key | Value |
|---|---|
| llm.requestId | Unique 8-char ID per request |
| llm.provider | Provider name (e.g. OpenAI) |
| llm.model | Model name (e.g. gpt-4o) |
| llm.latencyMs | Request latency (set on response) |

MDC fields are cleared automatically after each request/response cycle.

Builder API:

PromptLogger promptLogger = PromptLogger.builder()
    .filterPII(true)            // default: true
    .logLevel(PromptLogger.LogLevel.INFO)  // DEBUG, INFO, or WARN
    .maxContentLength(500)      // default: 200 chars
    .logFullContent(false)      // true to disable truncation
    .build();

LLMClient observed = new ObservableLLMClient(client, promptLogger);

Factory methods for common configurations:

PromptLogger.withPIIFiltering();   // PII enabled, defaults
PromptLogger.withoutFiltering();   // PII disabled, defaults

Log output format:

[INFO] LLM Request [req-abc123] OpenAI/gpt-4o: "My email is [REDACTED_EMAIL]"
[INFO] LLM Response [req-abc123] OpenAI/gpt-4o (245ms): "The answer is 4"

Tool calls within responses are logged individually:

[INFO]   Tool call: calculator({"expression":"2+2"})

LLMMetrics (Performance Tracking)

LLMMetrics is an observer that automatically collects performance data -- request counts, token usage, latency percentiles (p50/p95/p99), error rates, and estimated costs -- across all providers. Use it to monitor your LLM spending and identify performance bottlenecks.

Setup:

LLMMetrics metrics = new LLMMetrics();
LLMClient observed = new ObservableLLMClient(client, metrics);

// Use the client normally
observed.chat("Hello!");

Global report via getReport():

LLMMetrics.Report report = metrics.getReport();
report.totalRequests();      // total request count
report.totalResponses();     // successful responses
report.totalErrors();        // error count
report.totalInputTokens();   // estimated input tokens
report.totalOutputTokens();  // estimated output tokens
report.totalEstimatedCost(); // cost in USD (based on provider pricing)
report.avgLatencyMs();       // average latency
report.p50LatencyMs();       // median latency
report.p95LatencyMs();       // 95th percentile latency
report.p99LatencyMs();       // 99th percentile latency
report.successRate();        // percentage (0-100)
report.errorRate();          // percentage (0-100)
report.timestamp();          // Instant of report generation
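The p50/p95/p99 figures are latency percentiles over recorded request durations. As a reference point, here is a nearest-rank percentile computation over a latency sample; the library may use a different estimator, so treat this as an illustrative definition only:

```java
import java.util.Arrays;

// Sketch of nearest-rank percentile computation over recorded latencies.
public class PercentileSketch {
    public static long percentile(long[] latenciesMs, double p) {
        long[] sorted = latenciesMs.clone();
        Arrays.sort(sorted);
        // Nearest-rank: smallest value with at least p% of samples at or below it.
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(0, rank - 1)];
    }
}
```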

Per-provider breakdown via getMetricsByProvider():

Map<String, LLMMetrics.ProviderMetrics> byProvider = metrics.getMetricsByProvider();
for (var entry : byProvider.entrySet()) {
    String providerKey = entry.getKey(); // e.g. "OpenAI/gpt-4o"
    LLMMetrics.ProviderMetrics pm = entry.getValue();
    pm.requests();       // request count for this provider
    pm.responses();      // successful responses
    pm.errors();         // errors
    pm.inputTokens();    // estimated input tokens
    pm.outputTokens();   // estimated output tokens
    pm.estimatedCost();  // cost in USD
    pm.avgLatencyMs();   // average latency
    pm.streamChunks();   // total streaming chunks
    pm.successRate();    // percentage (0-100)
    pm.errorRate();      // percentage (0-100)
}

Token counts are estimated at ~4 characters per token. Cost is calculated using LLMCapabilities.getInputCostPer1KTokens() and getOutputCostPer1KTokens() from the provider.
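That estimation and cost arithmetic can be expressed directly. The rates below are placeholders, not real provider pricing; in the library they come from LLMCapabilities:

```java
// Sketch of the estimation described above: ~4 characters per token,
// cost computed from per-1K-token rates.
public class CostEstimateSketch {
    public static long estimateTokens(String text) {
        return Math.round(text.length() / 4.0);
    }

    public static double estimateCostUsd(long inputTokens, long outputTokens,
                                         double inputCostPer1K, double outputCostPer1K) {
        return (inputTokens / 1000.0) * inputCostPer1K
             + (outputTokens / 1000.0) * outputCostPer1K;
    }
}
```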

Call metrics.reset() to clear all counters.

Combining observers. Use metrics alongside prompt logging:

LLMMetrics metrics = new LLMMetrics();
PromptLogger logger = PromptLogger.withPIIFiltering();

LLMClient observed = new ObservableLLMClient(client, metrics, logger);

JSON Mode

When you need the LLM to return valid JSON instead of free-form text, wrap your client with JsonModeClient. It uses provider-native JSON mode when available (OpenAI, Gemini) and falls back to prompt engineering for providers that lack native support (Anthropic, Ollama).
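The prompt-engineering fallback amounts to appending a JSON-only instruction to the system prompt. The wording below is illustrative, not the library's exact template:

```java
// Sketch of the fallback for providers without native JSON mode:
// inject a JSON-only instruction into the system prompt.
public class JsonFallbackSketch {
    public static String injectJsonInstruction(String systemPrompt) {
        String instruction =
            "Respond ONLY with a valid JSON object. No prose, no code fences.";
        return (systemPrompt == null || systemPrompt.isBlank())
            ? instruction
            : systemPrompt + "\n\n" + instruction;
    }
}
```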

Quick Wrap

The simplest way to get JSON output -- just wrap your existing client.

// Simple wrap -- uses JSON_OBJECT format, auto-detects native support
JsonModeClient client = JsonModeClient.wrap(baseClient);

ChatResponse response = client.chat("List 3 programming languages");
// {"languages": ["Python", "Java", "JavaScript"]}

Builder

For advanced control, the builder lets you specify a custom JSON schema, provide your own ObjectMapper, or force prompt engineering mode even when native JSON mode is available.

JsonModeClient client = JsonModeClient.builder()
    .client(baseClient)                      // Required: LLM client to wrap
    .responseFormat(format)                   // ResponseFormat (default: jsonObject())
    .objectMapper(customMapper)              // Custom Jackson ObjectMapper
    .forcePromptEngineering(true)            // Skip native JSON mode, use prompt injection
    .schemaFromClass("Person", Person.class) // Auto-generate schema from class
    .build();

chatAs -- Type-Safe JSON Parsing

The chatAs method combines JSON generation and deserialization in one step, returning a strongly-typed Java object instead of a raw JSON string.

record LanguageList(List<String> languages) {}

// Simple
LanguageList list = client.chatAs(LanguageList.class, "List 3 programming languages");

// With system prompt
LanguageList list = client.chatAs(LanguageList.class, "List 3 languages",
    Optional.of("You are a helpful assistant"));

// Full parameters
LanguageList list = client.chatAs(LanguageList.class, message, systemPrompt, history, tools);

If JSON parsing fails, JsonModeClient.JsonParseException is thrown with getRawContent() for debugging.

ResponseFormat

Controls the structure of LLM output. Use text() for default behavior, jsonObject() for generic JSON, or jsonSchema() to enforce a specific schema.

// Plain text (default LLM behavior)
ResponseFormat text = ResponseFormat.text();

// JSON object (valid JSON, structure not enforced)
ResponseFormat json = ResponseFormat.jsonObject();

// JSON Schema (valid JSON conforming to a schema)
ResponseFormat schema = ResponseFormat.jsonSchema("Person", Map.of(
    "type", "object",
    "properties", Map.of(
        "name", Map.of("type", "string"),
        "age", Map.of("type", "integer")
    ),
    "required", List.of("name", "age")
));

// JSON Schema from a Java class (uses SchemaGenerator)
ResponseFormat schema = ResponseFormat.jsonSchema("Person", Person.class);

SchemaGenerator

Automatically generates a JSON Schema from any Java class using reflection. This saves you from writing schemas by hand -- just pass your record or POJO and it produces a valid schema that the LLM can follow.

public record Person(String name, int age, List<String> hobbies) {}

Map<String, Object> schema = SchemaGenerator.generateSchema(Person.class);
// {"type":"object","properties":{"name":{"type":"string"},"age":{"type":"integer"},
//  "hobbies":{"type":"array","items":{"type":"string"}}},"required":["name","age","hobbies"]}

// Record-specific (all components required)
Map<String, Object> schema = SchemaGenerator.generateRecordSchema(Person.class);
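The reflection approach can be sketched with the JDK's record metadata. This is a simplified stand-in for SchemaGenerator, handling only a few primitive types and treating every component as required; the real generator also handles arrays and nested objects:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Minimal sketch of reflection-based schema generation for records.
public class RecordSchemaSketch {
    // Example record used below (illustrative).
    public record ExamplePerson(String name, int age) {}

    public static Map<String, Object> generate(Class<? extends Record> type) {
        Map<String, Object> properties = new LinkedHashMap<>();
        List<String> required = new ArrayList<>();
        for (var component : type.getRecordComponents()) {
            properties.put(component.getName(), Map.of("type", jsonType(component.getType())));
            required.add(component.getName());  // record components are all required
        }
        return Map.of("type", "object", "properties", properties, "required", required);
    }

    private static String jsonType(Class<?> t) {
        if (t == int.class || t == long.class || t == Integer.class || t == Long.class) return "integer";
        if (t == double.class || t == float.class) return "number";
        if (t == boolean.class || t == Boolean.class) return "boolean";
        return "string";  // fallback; real generators also handle arrays and nesting
    }
}
```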

Provider Support

Not all providers support native JSON mode. When native support is unavailable, JsonModeClient falls back to prompt engineering (injecting JSON instructions into the prompt).

| Provider | JSON_OBJECT | JSON_SCHEMA |
|---|---|---|
| OpenAI | Yes | Yes (GPT-4o, GPT-4-turbo) |
| Anthropic | No (use tool_use) | No (use tool_use) |
| Gemini | Yes | Yes |
| Ollama | Depends on model | No |

Model Capabilities

Before sending a request that requires specific features (vision, tool calling, large context), you should check whether the model supports them. The LLMCapabilities interface provides a standardized way to query any model's features, limits, and pricing.

LLMClient client = new OpenAIClient("gpt-4o");
LLMCapabilities caps = client.getCapabilities();

// Check before use
if (caps.supportsVision()) {
    response = client.chat(List.of(textPart, imagePart), system, history, tools);
}

// Context window check
if (estimatedTokens > caps.getMaxInputTokens()) {
    // Truncate or summarize
}

Core Capability Methods

These boolean methods tell you what the model can do. Check them before using advanced features to avoid runtime errors.

| Method | Return | Description |
|---|---|---|
| supportsStreaming() | boolean | Streaming responses (most modern LLMs) |
| supportsVision() | boolean | Image/visual input (GPT-4o, Claude 3, Gemini) |
| supportsFunctionCalling() | boolean | Tool/function calling for agents |
| supportsStructuredOutput() | boolean | JSON mode / structured output |
| supportsSystemPrompt() | boolean | System prompt distinction (default true) |
| supportsParallelFunctionCalls() | boolean | Multiple tool calls in one response |

Token Limits

Know your model's context window to avoid truncation errors and plan your context management strategy.

| Method | Return | Description |
|---|---|---|
| getMaxInputTokens() | int | Maximum input tokens (context window) |
| getMaxOutputTokens() | int | Maximum output tokens |
| getContextWindow() | int | Total context window (defaults to maxInputTokens) |

Modality

Modalities describe what types of input a model can process. Use this to check whether a model supports image, audio, or video input before sending multimodal content.

Set<LLMCapabilities.Modality> modalities = caps.getSupportedModalities();
boolean canHandleAudio = caps.supportsModality(Modality.AUDIO);

Provider & Cost Information

Access pricing and provider metadata to estimate costs before making calls or to build cost-tracking dashboards.

| Method | Return | Description |
|---|---|---|
| getProviderName() | String | Provider name (OpenAI, Anthropic, Google, etc.) |
| getModelId() | String | Model identifier |
| getModelVersion() | Optional<String> | Model version |
| getInputCostPer1KTokens() | Optional<Double> | Input cost in USD per 1K tokens |
| getOutputCostPer1KTokens() | Optional<Double> | Output cost in USD per 1K tokens |
| getEstimatedLatencyMs() | Optional<Long> | Estimated time-to-first-token in ms |

Special Capabilities

Some models support advanced features beyond standard chat. Check these before using specialized functionality.

| Method | Default | Description |
|---|---|---|
| supportsCodeExecution() | false | Code Interpreter support |
| supportsWebBrowsing() | false | Web browsing support |
| supportsFileUpload() | From modalities | File upload support |
| supportsReasoning() | false | Reasoning/thinking (o1-style) |

meetsRequirements

A convenience method that checks multiple capability requirements at once, so you can verify a model is suitable for your use case in a single call.

boolean suitable = caps.meetsRequirements(
    true,   // requiresVision
    true,   // requiresTools
    32000   // minContextTokens
);
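Conceptually, such a check just combines the individual capability flags. This sketch shows plausible combining logic under that assumption; it is not the library's actual implementation:

```java
// Illustrative combination of capability flags into one suitability check.
public class RequirementsSketch {
    public static boolean meetsRequirements(boolean supportsVision, boolean supportsTools,
                                            int maxInputTokens,
                                            boolean requiresVision, boolean requiresTools,
                                            int minContextTokens) {
        if (requiresVision && !supportsVision) return false;
        if (requiresTools && !supportsTools) return false;
        return maxInputTokens >= minContextTokens;
    }
}
```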

Validation Methods

When a capability is required (not optional), use these methods to fail fast at startup with a clear error message rather than getting cryptic errors at runtime.

caps.requireToolCalling();   // throws ToolCallNotSupportedException
caps.requireStreaming();     // throws LLMCapabilityException
caps.requireVision();        // throws LLMCapabilityException

Model Capability Profiles

A quick reference for the most commonly used models and their supported features.

| Model | Vision | Tools | JSON | Context |
|---|---|---|---|---|
| GPT-4o | Yes | Yes | Yes | 128K |
| GPT-4-turbo | Yes | Yes | Yes | 128K |
| GPT-3.5-turbo | No | Yes | Yes | 16K |
| Claude Sonnet 4 | Yes | Yes | Yes | 200K |
| Claude 3 Opus | Yes | Yes | Yes | 200K |
| Gemini 2.5 Flash | Yes | Yes | Yes | 1M |
| Llama 3.2 | No | Yes | No | 128K |
| Mistral Large | No | Yes | Yes | 128K |

Multimodal Input

Some models can process images, audio, and video alongside text. TnsAI uses a ContentPart system to represent mixed-media messages, so you can combine text with images or audio in a single request.

| Class | Type | Description |
|---|---|---|
| TextPart | "text" | Plain text content |
| ImagePart | "image" | Image data (Base64 encoded) |
| AudioPart | "audio" | Audio data (Base64, URL, or file) |
| VideoPart | "video" | Video data (Gemini) |

Sending Images

Create an ImagePart from Base64-encoded data and include it alongside text in a multimodal message.

// Create image part from Base64 data
ImagePart image = ImagePart.fromBase64(base64Data, "image/png");

// Build multimodal message
List<ContentPart> parts = List.of(
    new TextPart("What do you see in this image?"),
    image
);

// Send to a vision-capable model
ChatResponse response = client.chat(parts,
    Optional.of("You are a helpful assistant"),
    Optional.empty(),
    Optional.empty()
);
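The Base64 payload passed to ImagePart.fromBase64 can be produced with the JDK's standard encoder. These helpers are not part of the library, just a small sketch of preparing the data:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

// Helpers for producing the Base64 string an ImagePart expects.
public class ImageEncodingSketch {
    public static String encodeBytes(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    public static String encodeImage(Path path) {
        try {
            return encodeBytes(Files.readAllBytes(path));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```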

Sending Audio

Create an AudioPart from a file, byte array, Base64 string, or URL. The model will process the audio alongside any text you include.

// From file
AudioPart audio = AudioPart.fromFile(new File("recording.mp3"));

// From Base64
AudioPart audio = AudioPart.fromBase64(base64String, "audio/wav");

// From byte array
AudioPart audio = AudioPart.fromBytes(rawBytes, "audio/mp3");

// From URL
AudioPart audio = AudioPart.fromUrl("https://example.com/audio.mp3");

// Send as multimodal message
List<ContentPart> parts = List.of(
    new TextPart("Transcribe this audio"),
    audio
);
ChatResponse response = client.chat(parts, systemPrompt, history, tools);

Capability Check Before Multimodal

Always check the model's capabilities before sending multimodal content. If the model does not support vision or audio, fall back to a text-only alternative.

LLMCapabilities caps = client.getCapabilities();

if (caps.supportsVision()) {
    // Safe to send ImagePart
    client.chat(List.of(new TextPart("Describe this"), image), system, history, tools);
} else {
    // Fall back to text-only
    client.chat("Describe the concept", system, history, tools);
}

SPI Registration

The LLM module uses Java's ServiceLoader mechanism to register itself automatically. You do not need to configure this manually -- just add the tnsai-llm dependency to your project and the providers become available through LLMClientFactory.

# META-INF/services/com.tnsai.llm.LLMClientProvider
com.tnsai.llm.LLMClientFactoryProvider
