
LLM Providers

The LLM module provides a unified interface to 13 language model providers. All providers implement the `LLMClient` interface from Core, so you can swap providers without changing your agent code.

Quick Start

Create a client for any supported provider with a single factory call. The client handles authentication, serialization, retries, and streaming automatically.

LLMClient client = LLMClientFactory.create("openai", "gpt-4o", 0.7f);
ChatResponse response = client.chat("What is quantum computing?");

API keys are resolved from environment variables automatically:

export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...

Supported Providers

TnsAI supports 13 LLM providers out of the box. Each provider implements the same LLMClient interface, so switching between providers requires changing only the creation call -- no other code changes needed.

| Provider | Class | Models | Features |
|---|---|---|---|
| OpenAI | OpenAIClient | gpt-4o, gpt-4o-mini, gpt-4-turbo, o1, o3-mini | Streaming, tools, JSON mode, vision |
| Anthropic | AnthropicClient | claude-sonnet-4, claude-3.5-sonnet, claude-3-opus | Streaming, tools, vision, prompt caching |
| Google Gemini | GeminiClient | gemini-2.5-pro, gemini-2.5-flash, gemini-2.0-flash | Streaming, tools, multimodal |
| Mistral | MistralClient | mistral-large-latest, codestral-latest | Streaming, tools, multimodal |
| Groq | GroqClient | llama-3.3-70b, mixtral-8x7b | Streaming, tools, ultra-fast inference |
| Ollama | OllamaClient | llama3, mistral, any local model | Streaming, tools, local, no API key |
| AWS Bedrock | BedrockClient | claude-3, llama-3 via AWS | Streaming, tools, AWS-managed |
| Azure OpenAI | AzureOpenAIClient | gpt-4, gpt-3.5-turbo via Azure | Streaming, tools, Azure-hosted |
| Cohere | CohereClient | command-r-plus, command-r | Streaming, tools |
| HuggingFace | HuggingFaceClient | 100k+ community models | Streaming, tools |
| OpenRouter | OpenRouterClient | 200+ models via aggregator | Streaming, tools, multi-provider |
| ZhipuAI | ZhipuAIClient | glm-4, glm-4v | Streaming, tools, vision |
| MiniMax | MiniMaxClient | abab6.5-chat, abab6-chat | Streaming, tools |

Creating Clients

Using the Factory

LLMClientFactory is the recommended way to create clients. It resolves API keys from environment variables, selects the correct provider class, and applies default settings. Use this unless you need fine-grained control over client construction.

// Basic — provider name, model, temperature
LLMClient client = LLMClientFactory.create("openai", "gpt-4o", 0.7f);

// With max tokens
LLMClient client = LLMClientFactory.create("anthropic", "claude-sonnet-4-20250514", 0.7f, 4096);

// With topP (nucleus sampling)
LLMClient client = LLMClientFactory.create("gemini", "gemini-2.5-flash", 0.7f, 2048, 0.95f);

// From @RoleSpec annotation
LLMClient client = LLMClientFactory.fromAnnotation(MyRole.class);

Provider name aliases (case-insensitive):

| Alias | Provider |
|---|---|
| "openai" | OpenAI |
| "anthropic", "claude" | Anthropic |
| "gemini", "google" | Google Gemini |
| "groq" | Groq |
| "mistral" | Mistral |
| "ollama" | Ollama |
| "cohere" | Cohere |
| "openrouter" | OpenRouter |
| "azure", "azure-openai" | Azure OpenAI |
| "bedrock", "aws" | AWS Bedrock |
| "huggingface", "hf" | HuggingFace |
| "zhipu", "glm" | ZhipuAI |
| "minimax" | MiniMax |

Direct Construction

When you need to pass a custom API key, a non-standard base URL, or provider-specific settings, construct the client class directly.

// OpenAI with custom settings
LLMClient client = new OpenAIClient("gpt-4o", 0.7f, 0.95f, 4096);

// Anthropic with custom API key
LLMClient client = new AnthropicClient("claude-sonnet-4-20250514", "sk-ant-...");

// Ollama with custom base URL
LLMClient client = new OllamaClient("http://gpu-server:11434", "llama3", 0.7f, 4096, null);

// Azure with endpoint
LLMClient client = new AzureOpenAIClient("gpt-4", "your-api-key", "https://myresource.openai.azure.com/");

Environment Variables

Set your API keys as environment variables. The factory and direct constructors will pick them up automatically. Base URL overrides are optional and only needed for proxies or self-hosted deployments.

| Provider | API Key | Base URL (optional) |
|---|---|---|
| OpenAI | OPENAI_API_KEY | OPENAI_BASE_URL |
| Anthropic | ANTHROPIC_API_KEY | ANTHROPIC_BASE_URL |
| Gemini | GEMINI_API_KEY | GEMINI_BASE_URL |
| Groq | GROQ_API_KEY | GROQ_BASE_URL |
| Mistral | MISTRAL_API_KEY | MISTRAL_BASE_URL |
| Ollama | OLLAMA_API_KEY | OLLAMA_BASE_URL |
| Cohere | COHERE_API_KEY | COHERE_BASE_URL |
| OpenRouter | OPENROUTER_API_KEY | OPENROUTER_BASE_URL |
| Azure | AZURE_OPENAI_API_KEY | AZURE_OPENAI_ENDPOINT |
| Bedrock | AWS credentials | AWS_REGION |
| HuggingFace | HUGGINGFACE_API_KEY | HUGGINGFACE_BASE_URL |
| ZhipuAI | ZHIPU_API_KEY | ZHIPU_BASE_URL |
| MiniMax | MINIMAX_API_KEY | MINIMAX_BASE_URL |

Ollama does not require an API key by default (runs locally on http://localhost:11434).

Streaming

All providers support streaming, which lets you display tokens to the user as they are generated rather than waiting for the complete response. Three streaming patterns are available depending on how much control you need.

// Text stream
Stream<String> tokens = client.streamChat("Tell me a story");

// ChatChunk stream
Stream<ChatChunk> chunks = client.streamChatWithSpec(request);

// Handler-based
client.streamChatWithHandler(request, chunk -> { ... });
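The text-stream pattern is typically consumed token by token for live display, then joined into the full response. This sketch shows that consumption pattern with a stand-in `Stream<String>` (no TnsAI types involved):

```java
import java.util.stream.Stream;

// Sketch of consuming a token stream like the one streamChat returns:
// each token can be displayed as it arrives, then joined into the
// complete response text.
public class StreamConsumeSketch {
    public static String assemble(Stream<String> tokens) {
        StringBuilder sb = new StringBuilder();
        tokens.forEach(sb::append);  // display each token here for live output
        return sb.toString();
    }
}
```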

Resilience

TnsAI includes built-in resilience features so your application keeps working even when LLM providers have temporary issues.

Circuit Breaker

A circuit breaker prevents your application from repeatedly calling a failing provider. After a configurable number of consecutive failures, it stops sending requests ("opens the circuit") and returns errors immediately, giving the provider time to recover.

LLMClient resilient = CircuitBreakerClient.builder()
    .client(openaiClient)
    .failureThreshold(3)
    .recoveryTimeout(Duration.ofSeconds(60))
    .build();

States: CLOSED (normal) -> OPEN (fast-fail) -> HALF_OPEN (probe recovery).
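The state cycle can be sketched as a small state machine. This is an illustrative model of the behavior described above, not the library's actual CircuitBreakerClient internals; thresholds and method names are placeholders:

```java
// Minimal sketch of the CLOSED -> OPEN -> HALF_OPEN cycle.
public class CircuitBreakerSketch {
    public enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;
    private final long recoveryTimeoutMs;
    private State state = State.CLOSED;
    private int consecutiveFailures = 0;
    private long openedAtMs = 0;

    public CircuitBreakerSketch(int failureThreshold, long recoveryTimeoutMs) {
        this.failureThreshold = failureThreshold;
        this.recoveryTimeoutMs = recoveryTimeoutMs;
    }

    // Returns false while the circuit is open and the recovery timeout
    // has not elapsed; callers fast-fail in that case.
    public boolean allowRequest(long nowMs) {
        if (state == State.OPEN && nowMs - openedAtMs >= recoveryTimeoutMs) {
            state = State.HALF_OPEN;  // allow one probe request through
        }
        return state != State.OPEN;
    }

    public void recordSuccess() {
        consecutiveFailures = 0;
        state = State.CLOSED;
    }

    public void recordFailure(long nowMs) {
        consecutiveFailures++;
        if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
            state = State.OPEN;  // trip the breaker
            openedAtMs = nowMs;
        }
    }

    public State state() { return state; }
}
```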

Built-in Retry

All providers include automatic retry with exponential backoff for transient errors like rate limits and server errors. This is enabled by default with no configuration needed.

  • Max retries: 3
  • Initial delay: 1 second
  • Retriable HTTP codes: 408, 425, 429, 500, 502, 503, 504, 529
  • Retriable exceptions: ConnectException, SocketTimeoutException
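The defaults above imply delays of 1s, 2s, and 4s across the three retries (exponential backoff doubles the delay each attempt). A self-contained sketch of that policy, with illustrative names:

```java
import java.util.Set;

// Sketch of the retry policy described above: 3 retries, 1s initial
// delay doubling per attempt, retrying only on the listed HTTP codes.
public class RetryPolicySketch {
    static final int MAX_RETRIES = 3;
    static final long INITIAL_DELAY_MS = 1000;
    static final Set<Integer> RETRIABLE =
        Set.of(408, 425, 429, 500, 502, 503, 504, 529);

    // Delay before the given retry attempt (1-based): 1s, 2s, 4s.
    public static long delayForAttempt(int attempt) {
        return INITIAL_DELAY_MS * (1L << (attempt - 1));
    }

    public static boolean isRetriable(int httpStatus) {
        return RETRIABLE.contains(httpStatus);
    }
}
```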

Observability

Understanding what your LLM calls are doing in production is critical for debugging, cost tracking, and compliance. The observability layer lets you wrap any LLMClient with logging, metrics, and custom observers without changing your application code.

ObservableLLMClient

ObservableLLMClient is a decorator that wraps an existing client and intercepts every call, forwarding lifecycle events to one or more observers. Your application code uses the wrapped client exactly like the original -- the observability is completely transparent.

// Single observer
LLMClient observable = new ObservableLLMClient(client, metrics);

// Multiple observers (varargs)
LLMClient observable = new ObservableLLMClient(client, metrics, promptLogger, auditObserver);

Internally, multiple observers are merged into a CompositeObserver. Null observers and LLMObserver.NOOP are filtered out automatically.

LLMObserver Interface (6 hooks)

LLMObserver is the callback interface for monitoring LLM operations. All six methods have default no-op implementations, so you only override the hooks you care about. For example, you might only need onResponse for latency tracking.

| Hook | When it fires | Parameters |
|---|---|---|
| onRequest | Before sending a request | client, message, systemPrompt, history, tools |
| onResponse | After a successful response | client, response, latencyMs |
| onError | When a request fails | client, error, latencyMs |
| onStreamChunk | For each streaming chunk | client, chunk, chunkIndex |
| onStreamComplete | When streaming finishes | client, totalChunks, latencyMs |
| onStreamError | When streaming fails | client, error, chunksReceived, latencyMs |

LLMObserver myObserver = new LLMObserver() {
    @Override
    public void onRequest(LLMClient client, String message,
                          Optional<String> systemPrompt,
                          Optional<List<Map<String, Object>>> history,
                          Optional<List<Map<String, Object>>> tools) {
        log.info("Request to {}: {}", client.getModel(), message);
    }

    @Override
    public void onResponse(LLMClient client, ChatResponse response, long latencyMs) {
        log.info("Response from {} ({}ms)", client.getModel(), latencyMs);
    }
};

LLMClient observed = new ObservableLLMClient(client, myObserver);

A pre-built no-op sentinel is available as LLMObserver.NOOP.

Compose multiple observers with CompositeObserver:

LLMObserver combined = CompositeObserver.of(metricsObserver, loggingObserver, auditObserver);

PromptLogger (PII Filtering + MDC Context)

PromptLogger is a production-ready observer that logs every LLM request and response with automatic PII redaction. It prevents sensitive data (emails, credit cards, API keys) from appearing in your logs and adds correlation IDs so you can trace requests through distributed systems.

PII filtering. When enabled (default), the logger redacts common sensitive patterns before writing to logs:

| Pattern | Replacement |
|---|---|
| Email addresses | [REDACTED_EMAIL] |
| Phone numbers | [REDACTED_PHONE] |
| Credit card numbers | [REDACTED_CC] |
| Social Security Numbers | [REDACTED_SSN] |
| API keys and tokens | [REDACTED_KEY] |
| IP addresses | [REDACTED_IP] |
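The redaction step amounts to running a set of regex substitutions over the content before it reaches the log. A simplified sketch for two of the patterns (the library's actual regexes are likely broader; these are illustrative):

```java
import java.util.regex.Pattern;

// Illustrative redaction sketch: replace emails and simple US-style
// phone numbers with the placeholders from the table above.
public class PiiRedactionSketch {
    private static final Pattern EMAIL =
        Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");
    private static final Pattern PHONE =
        Pattern.compile("\\b\\d{3}[-.]\\d{3}[-.]\\d{4}\\b");

    public static String redact(String text) {
        String out = EMAIL.matcher(text).replaceAll("[REDACTED_EMAIL]");
        return PHONE.matcher(out).replaceAll("[REDACTED_PHONE]");
    }
}
```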

MDC context. The logger populates SLF4J MDC with correlation fields so downstream log infrastructure (ELK, Datadog, etc.) can group related events:

| MDC Key | Value |
|---|---|
| llm.requestId | Unique 8-char ID per request |
| llm.provider | Provider name (e.g. OpenAI) |
| llm.model | Model name (e.g. gpt-4o) |
| llm.latencyMs | Request latency (set on response) |

MDC fields are cleared automatically after each request/response cycle.

Builder API:

PromptLogger promptLogger = PromptLogger.builder()
    .filterPII(true)            // default: true
    .logLevel(PromptLogger.LogLevel.INFO)  // DEBUG, INFO, or WARN
    .maxContentLength(500)      // default: 200 chars
    .logFullContent(false)      // true to disable truncation
    .build();

LLMClient observed = new ObservableLLMClient(client, promptLogger);

Factory methods for common configurations:

PromptLogger.withPIIFiltering();   // PII enabled, defaults
PromptLogger.withoutFiltering();   // PII disabled, defaults

Log output format:

[INFO] LLM Request [req-abc123] OpenAI/gpt-4o: "My email is [REDACTED_EMAIL]"
[INFO] LLM Response [req-abc123] OpenAI/gpt-4o (245ms): "The answer is 4"

Tool calls within responses are logged individually:

[INFO]   Tool call: calculator({"expression":"2+2"})

LLMMetrics (Performance Tracking)

LLMMetrics is an observer that automatically collects performance data -- request counts, token usage, latency percentiles (p50/p95/p99), error rates, and estimated costs -- across all providers. Use it to monitor your LLM spending and identify performance bottlenecks.

Setup:

LLMMetrics metrics = new LLMMetrics();
LLMClient observed = new ObservableLLMClient(client, metrics);

// Use the client normally
observed.chat("Hello!");

Global report via getReport():

LLMMetrics.Report report = metrics.getReport();
report.totalRequests();      // total request count
report.totalResponses();     // successful responses
report.totalErrors();        // error count
report.totalInputTokens();   // estimated input tokens
report.totalOutputTokens();  // estimated output tokens
report.totalEstimatedCost(); // cost in USD (based on provider pricing)
report.avgLatencyMs();       // average latency
report.p50LatencyMs();       // median latency
report.p95LatencyMs();       // 95th percentile latency
report.p99LatencyMs();       // 99th percentile latency
report.successRate();        // percentage (0-100)
report.errorRate();          // percentage (0-100)
report.timestamp();          // Instant of report generation
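The p50/p95/p99 figures are latency percentiles over recorded request durations. As a reference point, here is a nearest-rank percentile computation over a latency sample; the library may use a different estimator, so treat this as an illustrative definition only:

```java
import java.util.Arrays;

// Sketch of nearest-rank percentile computation over recorded latencies.
public class PercentileSketch {
    public static long percentile(long[] latenciesMs, double p) {
        long[] sorted = latenciesMs.clone();
        Arrays.sort(sorted);
        // Nearest-rank: smallest value with at least p% of samples at or below it.
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(0, rank - 1)];
    }
}
```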

Per-provider breakdown via getMetricsByProvider():

Map<String, LLMMetrics.ProviderMetrics> byProvider = metrics.getMetricsByProvider();
for (var entry : byProvider.entrySet()) {
    String providerKey = entry.getKey(); // e.g. "OpenAI/gpt-4o"
    LLMMetrics.ProviderMetrics pm = entry.getValue();
    pm.requests();       // request count for this provider
    pm.responses();      // successful responses
    pm.errors();         // errors
    pm.inputTokens();    // estimated input tokens
    pm.outputTokens();   // estimated output tokens
    pm.estimatedCost();  // cost in USD
    pm.avgLatencyMs();   // average latency
    pm.streamChunks();   // total streaming chunks
    pm.successRate();    // percentage (0-100)
    pm.errorRate();      // percentage (0-100)
}

Token counts are estimated at ~4 characters per token. Cost is calculated using LLMCapabilities.getInputCostPer1KTokens() and getOutputCostPer1KTokens() from the provider.
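That estimation and cost arithmetic can be expressed directly. The rates below are placeholders, not real provider pricing; in the library they come from LLMCapabilities:

```java
// Sketch of the estimation described above: ~4 characters per token,
// cost computed from per-1K-token rates.
public class CostEstimateSketch {
    public static long estimateTokens(String text) {
        return Math.round(text.length() / 4.0);
    }

    public static double estimateCostUsd(long inputTokens, long outputTokens,
                                         double inputCostPer1K, double outputCostPer1K) {
        return (inputTokens / 1000.0) * inputCostPer1K
             + (outputTokens / 1000.0) * outputCostPer1K;
    }
}
```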

Call metrics.reset() to clear all counters.

Combining observers. Use metrics alongside prompt logging:

LLMMetrics metrics = new LLMMetrics();
PromptLogger logger = PromptLogger.withPIIFiltering();

LLMClient observed = new ObservableLLMClient(client, metrics, logger);

JSON Mode

When you need the LLM to return valid JSON instead of free-form text, wrap your client with JsonModeClient. It uses provider-native JSON mode when available (OpenAI, Gemini) and falls back to prompt engineering for providers that lack native support (Anthropic, Ollama).
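The prompt-engineering fallback amounts to appending a JSON-only instruction to the system prompt. The wording below is illustrative, not the library's exact template:

```java
// Sketch of the fallback for providers without native JSON mode:
// inject a JSON-only instruction into the system prompt.
public class JsonFallbackSketch {
    public static String injectJsonInstruction(String systemPrompt) {
        String instruction =
            "Respond ONLY with a valid JSON object. No prose, no code fences.";
        return (systemPrompt == null || systemPrompt.isBlank())
            ? instruction
            : systemPrompt + "\n\n" + instruction;
    }
}
```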

Quick Wrap

The simplest way to get JSON output -- just wrap your existing client.

// Simple wrap -- uses JSON_OBJECT format, auto-detects native support
JsonModeClient client = JsonModeClient.wrap(baseClient);

ChatResponse response = client.chat("List 3 programming languages");
// {"languages": ["Python", "Java", "JavaScript"]}

Builder

For advanced control, the builder lets you specify a custom JSON schema, provide your own ObjectMapper, or force prompt engineering mode even when native JSON mode is available.

JsonModeClient client = JsonModeClient.builder()
    .client(baseClient)                      // Required: LLM client to wrap
    .responseFormat(format)                   // ResponseFormat (default: jsonObject())
    .objectMapper(customMapper)              // Custom Jackson ObjectMapper
    .forcePromptEngineering(true)            // Skip native JSON mode, use prompt injection
    .schemaFromClass("Person", Person.class) // Auto-generate schema from class
    .build();

chatAs -- Type-Safe JSON Parsing

The chatAs method combines JSON generation and deserialization in one step, returning a strongly-typed Java object instead of a raw JSON string.

record LanguageList(List<String> languages) {}

// Simple
LanguageList list = client.chatAs(LanguageList.class, "List 3 programming languages");

// With system prompt
LanguageList list = client.chatAs(LanguageList.class, "List 3 languages",
    Optional.of("You are a helpful assistant"));

// Full parameters
LanguageList list = client.chatAs(LanguageList.class, message, systemPrompt, history, tools);

If JSON parsing fails, JsonModeClient.JsonParseException is thrown with getRawContent() for debugging.

ResponseFormat

Controls the structure of LLM output. Use text() for default behavior, jsonObject() for generic JSON, or jsonSchema() to enforce a specific schema.

// Plain text (default LLM behavior)
ResponseFormat text = ResponseFormat.text();

// JSON object (valid JSON, structure not enforced)
ResponseFormat json = ResponseFormat.jsonObject();

// JSON Schema (valid JSON conforming to a schema)
ResponseFormat schema = ResponseFormat.jsonSchema("Person", Map.of(
    "type", "object",
    "properties", Map.of(
        "name", Map.of("type", "string"),
        "age", Map.of("type", "integer")
    ),
    "required", List.of("name", "age")
));

// JSON Schema from a Java class (uses SchemaGenerator)
ResponseFormat schema = ResponseFormat.jsonSchema("Person", Person.class);

SchemaGenerator

Automatically generates a JSON Schema from any Java class using reflection. This saves you from writing schemas by hand -- just pass your record or POJO and it produces a valid schema that the LLM can follow.

public record Person(String name, int age, List<String> hobbies) {}

Map<String, Object> schema = SchemaGenerator.generateSchema(Person.class);
// {"type":"object","properties":{"name":{"type":"string"},"age":{"type":"integer"},
//  "hobbies":{"type":"array","items":{"type":"string"}}},"required":["name","age","hobbies"]}

// Record-specific (all components required)
Map<String, Object> schema = SchemaGenerator.generateRecordSchema(Person.class);
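The reflection approach can be sketched with the JDK's record metadata. This is a simplified stand-in for SchemaGenerator, handling only a few primitive types and treating every component as required; the real generator also handles arrays and nested objects:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Minimal sketch of reflection-based schema generation for records.
public class RecordSchemaSketch {
    // Example record used below (illustrative).
    public record ExamplePerson(String name, int age) {}

    public static Map<String, Object> generate(Class<? extends Record> type) {
        Map<String, Object> properties = new LinkedHashMap<>();
        List<String> required = new ArrayList<>();
        for (var component : type.getRecordComponents()) {
            properties.put(component.getName(), Map.of("type", jsonType(component.getType())));
            required.add(component.getName());  // record components are all required
        }
        return Map.of("type", "object", "properties", properties, "required", required);
    }

    private static String jsonType(Class<?> t) {
        if (t == int.class || t == long.class || t == Integer.class || t == Long.class) return "integer";
        if (t == double.class || t == float.class) return "number";
        if (t == boolean.class || t == Boolean.class) return "boolean";
        return "string";  // fallback; real generators also handle arrays and nesting
    }
}
```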

Provider Support

Not all providers support native JSON mode. When native support is unavailable, JsonModeClient falls back to prompt engineering (injecting JSON instructions into the prompt).

| Provider | JSON_OBJECT | JSON_SCHEMA |
|---|---|---|
| OpenAI | Yes | Yes (GPT-4o, GPT-4-turbo) |
| Anthropic | No (use tool_use) | No (use tool_use) |
| Gemini | Yes | Yes |
| Ollama | Depends on model | No |

Model Capabilities

Before sending a request that requires specific features (vision, tool calling, large context), you should check whether the model supports them. The LLMCapabilities interface provides a standardized way to query any model's features, limits, and pricing.

LLMClient client = new OpenAIClient("gpt-4o");
LLMCapabilities caps = client.getCapabilities();

// Check before use
if (caps.supportsVision()) {
    response = client.chat(List.of(textPart, imagePart), system, history, tools);
}

// Context window check
if (estimatedTokens > caps.getMaxInputTokens()) {
    // Truncate or summarize
}

Core Capability Methods

These boolean methods tell you what the model can do. Check them before using advanced features to avoid runtime errors.

| Method | Return | Description |
|---|---|---|
| supportsStreaming() | boolean | Streaming responses (most modern LLMs) |
| supportsVision() | boolean | Image/visual input (GPT-4o, Claude 3, Gemini) |
| supportsFunctionCalling() | boolean | Tool/function calling for agents |
| supportsStructuredOutput() | boolean | JSON mode / structured output |
| supportsSystemPrompt() | boolean | System prompt distinction (default true) |
| supportsParallelFunctionCalls() | boolean | Multiple tool calls in one response |

Token Limits

Know your model's context window to avoid truncation errors and plan your context management strategy.

| Method | Return | Description |
|---|---|---|
| getMaxInputTokens() | int | Maximum input tokens (context window) |
| getMaxOutputTokens() | int | Maximum output tokens |
| getContextWindow() | int | Total context window (defaults to maxInputTokens) |

Modality

Modalities describe what types of input a model can process. Use this to check whether a model supports image, audio, or video input before sending multimodal content.

Set<LLMCapabilities.Modality> modalities = caps.getSupportedModalities();
boolean canHandleAudio = caps.supportsModality(Modality.AUDIO);

Provider & Cost Information

Access pricing and provider metadata to estimate costs before making calls or to build cost-tracking dashboards.

| Method | Return | Description |
|---|---|---|
| getProviderName() | String | Provider name (OpenAI, Anthropic, Google, etc.) |
| getModelId() | String | Model identifier |
| getModelVersion() | Optional<String> | Model version |
| getInputCostPer1KTokens() | Optional<Double> | Input cost in USD per 1K tokens |
| getOutputCostPer1KTokens() | Optional<Double> | Output cost in USD per 1K tokens |
| getEstimatedLatencyMs() | Optional<Long> | Estimated time-to-first-token in ms |

Special Capabilities

Some models support advanced features beyond standard chat. Check these before using specialized functionality.

| Method | Default | Description |
|---|---|---|
| supportsCodeExecution() | false | Code Interpreter support |
| supportsWebBrowsing() | false | Web browsing support |
| supportsFileUpload() | From modalities | File upload support |
| supportsReasoning() | false | Reasoning/thinking (o1-style) |

meetsRequirements

A convenience method that checks multiple capability requirements at once, so you can verify a model is suitable for your use case in a single call.

boolean suitable = caps.meetsRequirements(
    true,   // requiresVision
    true,   // requiresTools
    32000   // minContextTokens
);
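Conceptually, such a check just combines the individual capability flags. This sketch shows plausible combining logic under that assumption; it is not the library's actual implementation:

```java
// Illustrative combination of capability flags into one suitability check.
public class RequirementsSketch {
    public static boolean meetsRequirements(boolean supportsVision, boolean supportsTools,
                                            int maxInputTokens,
                                            boolean requiresVision, boolean requiresTools,
                                            int minContextTokens) {
        if (requiresVision && !supportsVision) return false;
        if (requiresTools && !supportsTools) return false;
        return maxInputTokens >= minContextTokens;
    }
}
```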

Validation Methods

When a capability is required (not optional), use these methods to fail fast at startup with a clear error message rather than getting cryptic errors at runtime.

caps.requireToolCalling();   // throws ToolCallNotSupportedException
caps.requireStreaming();     // throws LLMCapabilityException
caps.requireVision();        // throws LLMCapabilityException

Model Capability Profiles

A quick reference for the most commonly used models and their supported features.

| Model | Vision | Tools | JSON | Context |
|---|---|---|---|---|
| GPT-4o | Yes | Yes | Yes | 128K |
| GPT-4-turbo | Yes | Yes | Yes | 128K |
| GPT-3.5-turbo | No | Yes | Yes | 16K |
| Claude Sonnet 4 | Yes | Yes | Yes | 200K |
| Claude 3 Opus | Yes | Yes | Yes | 200K |
| Gemini 2.5 Flash | Yes | Yes | Yes | 1M |
| Llama 3.2 | No | Yes | No | 128K |
| Mistral Large | No | Yes | Yes | 128K |

Multimodal Input

Some models can process images, audio, and video alongside text. TnsAI uses a ContentPart system to represent mixed-media messages, so you can combine text with images or audio in a single request.

| Class | Type | Description |
|---|---|---|
| TextPart | "text" | Plain text content |
| ImagePart | "image" | Image data (Base64 encoded) |
| AudioPart | "audio" | Audio data (Base64, URL, or file) |
| VideoPart | "video" | Video data (Gemini) |

Sending Images

Create an ImagePart from Base64-encoded data and include it alongside text in a multimodal message.

// Create image part from Base64 data
ImagePart image = ImagePart.fromBase64(base64Data, "image/png");

// Build multimodal message
List<ContentPart> parts = List.of(
    new TextPart("What do you see in this image?"),
    image
);

// Send to a vision-capable model
ChatResponse response = client.chat(parts,
    Optional.of("You are a helpful assistant"),
    Optional.empty(),
    Optional.empty()
);
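The Base64 payload passed to ImagePart.fromBase64 can be produced with the JDK's standard encoder. These helpers are not part of the library, just a small sketch of preparing the data:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

// Helpers for producing the Base64 string an ImagePart expects.
public class ImageEncodingSketch {
    public static String encodeBytes(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    public static String encodeImage(Path path) {
        try {
            return encodeBytes(Files.readAllBytes(path));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```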

Sending Audio

Create an AudioPart from a file, byte array, Base64 string, or URL. The model will process the audio alongside any text you include.

// From file
AudioPart audio = AudioPart.fromFile(new File("recording.mp3"));

// From Base64
AudioPart audio = AudioPart.fromBase64(base64String, "audio/wav");

// From byte array
AudioPart audio = AudioPart.fromBytes(rawBytes, "audio/mp3");

// From URL
AudioPart audio = AudioPart.fromUrl("https://example.com/audio.mp3");

// Send as multimodal message
List<ContentPart> parts = List.of(
    new TextPart("Transcribe this audio"),
    audio
);
ChatResponse response = client.chat(parts, systemPrompt, history, tools);

Capability Check Before Multimodal

Always check the model's capabilities before sending multimodal content. If the model does not support vision or audio, fall back to a text-only alternative.

LLMCapabilities caps = client.getCapabilities();

if (caps.supportsVision()) {
    // Safe to send ImagePart
    client.chat(List.of(new TextPart("Describe this"), image), system, history, tools);
} else {
    // Fall back to text-only
    client.chat("Describe the concept", system, history, tools);
}

SPI Registration

The LLM module uses Java's ServiceLoader mechanism to register itself automatically. You do not need to configure this manually -- just add the tnsai-llm dependency to your project and the providers become available through LLMClientFactory.

# META-INF/services/com.tnsai.llm.LLMClientProvider
com.tnsai.llm.LLMClientFactoryProvider
