Error Handling
TnsAI.Core provides a structured exception hierarchy rooted in `TnsAIException`. Every exception carries an error code, retryability flag, and suggested retry parameters, enabling automated recovery decisions across the framework.
TnsAIException (Base Class)
All TnsAI exceptions extend `TnsAIException`, which itself extends `RuntimeException`, so every TnsAI exception is unchecked.
public class TnsAIException extends RuntimeException {
    public boolean isRetryable();
    public String getErrorCode();
    public long getSuggestedRetryDelayMs();
    public int getMaxRetryAttempts();
}

| Method | Description |
|---|---|
| isRetryable() | true for transient errors (network, rate limits, server errors) |
| getErrorCode() | Auto-derived code in format TNSAI-CLASSNAME (e.g., TNSAI-NETWORK, TNSAI-RATELIMIT) |
| getSuggestedRetryDelayMs() | Default 1000ms for retryable, 0 for non-retryable. Subclasses override with specific delays. |
| getMaxRetryAttempts() | Default 3 for retryable, 0 for non-retryable |
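The methods above form a self-describing retry contract: an exception tells the caller whether to retry, how long to wait, and how many attempts to allow. The helper below is an illustrative sketch, not part of the library; the `StubException` class only mirrors the documented defaults (1000ms delay, 3 attempts for retryable errors) so the example is self-contained, and in real code you would catch `TnsAIException` itself.

```java
public class RetryHelper {

    /** Stand-in for TnsAIException, mirroring the documented retry defaults. */
    public static class StubException extends RuntimeException {
        private final boolean retryable;
        public StubException(String message, boolean retryable) {
            super(message);
            this.retryable = retryable;
        }
        public boolean isRetryable() { return retryable; }
        public long getSuggestedRetryDelayMs() { return retryable ? 1000L : 0L; }
        public int getMaxRetryAttempts() { return retryable ? 3 : 0; }
    }

    public interface RetryableOp<T> { T run(); }

    /** Retries the operation according to the exception's own hints. */
    public static <T> T withRetry(RetryableOp<T> op) throws InterruptedException {
        int attempt = 0;
        while (true) {
            try {
                return op.run();
            } catch (StubException e) {
                // Give up on non-retryable errors or once the budget is spent.
                if (!e.isRetryable() || attempt >= e.getMaxRetryAttempts()) {
                    throw e;
                }
                attempt++;
                Thread.sleep(e.getSuggestedRetryDelayMs());
            }
        }
    }
}
```

Because the exception carries its own retry parameters, the helper needs no per-error-type configuration.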
The error code is derived from the class name: the `Exception` suffix is dropped and the remainder is uppercased using `Locale.ROOT`:
// TnsAIException -> "TNSAI-TNSAI"
// LLMException -> "TNSAI-LLM"
// NetworkException -> "TNSAI-NETWORK"

LLMException
When something goes wrong during a call to an LLM provider (invalid API key, context too long, server outage), the framework throws an LLMException. Each exception carries an ErrorType that tells you exactly what happened and whether it makes sense to retry.
public class LLMException extends TnsAIException {
    public String getModel();
    public ErrorType getErrorType();
    public long getSuggestedRetryDelayMs();
}

ErrorType Enum
The ErrorType enum classifies the root cause of an LLM failure. Use it to decide how your application should react -- for example, retrying a transient SERVER_ERROR but surfacing a permanent AUTHENTICATION_FAILED to the user.
| Value | Retryable | Description |
|---|---|---|
| MODEL_NOT_FOUND | No | Model not found or unavailable |
| AUTHENTICATION_FAILED | No | Invalid API key or auth failure |
| CONTENT_FILTERED | No | Content policy violation |
| INVALID_REQUEST | No | Invalid request format |
| MODEL_OVERLOADED | Yes | Model overloaded, try again |
| CONTEXT_TOO_LONG | No | Context length exceeded |
| SERVER_ERROR | Yes | Generic server error |
| MALFORMED_TOOL_CALL | No | Bad JSON or missing fields in tool call from model |
| CAPABILITY_MISMATCH | No | Model does not support a required capability |
| UNKNOWN | Yes | Unknown error |
Retry delays are error-type specific:
- MODEL_OVERLOADED -- 5000ms
- SERVER_ERROR -- 2000ms
- All others -- 1000ms
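These type-specific delays can seed an application-side backoff policy. The sketch below is a hypothetical policy (exponential backoff layered on top of the documented base delays), not part of the TnsAI API:

```java
public class LlmBackoff {

    /** Base delays per the documented ErrorType-specific values. */
    public static long baseDelayMs(String errorType) {
        switch (errorType) {
            case "MODEL_OVERLOADED": return 5000L;
            case "SERVER_ERROR":     return 2000L;
            default:                 return 1000L;
        }
    }

    /**
     * Hypothetical policy: double the type-specific base delay on each
     * successive attempt (attempt 0, 1, 2, ...).
     */
    public static long delayForAttempt(String errorType, int attempt) {
        return baseDelayMs(errorType) * (1L << attempt);
    }
}
```

A second MODEL_OVERLOADED retry would then wait 10000ms rather than a flat 5000ms.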
Factory Methods
Instead of calling the constructor directly, use these static factory methods to create LLMException instances with the correct ErrorType and retry parameters already set.
LLMException.modelNotFound("gpt-5")
LLMException.authenticationFailed("claude-sonnet-4", "Invalid API key")
LLMException.contentFiltered("gpt-4o", "Violates content policy")
LLMException.contextTooLong("gpt-4o-mini", 128000, 150000)
LLMException.modelOverloaded("claude-sonnet-4")
LLMException.serverError("gemini-2.5-flash", cause)
LLMException.malformedToolCall("gpt-4o", "search", "Invalid JSON in arguments", cause)
LLMException.malformedToolCall("gpt-4o", "search", "Missing required field 'query'")

RateLimitException
LLM providers enforce request quotas and return HTTP 429 ("Too Many Requests") when you exceed them. RateLimitException wraps these responses and is always retryable, carrying the provider-suggested wait time so your code can back off automatically.
public class RateLimitException extends TnsAIException {
    public Long getRetryAfterMs();
    public String getService();
    public Integer getRemainingQuota();
    public long getSuggestedRetryDelayMs(); // Uses retryAfterMs, defaults to 60000ms
    public int getMaxRetryAttempts(); // Returns 5
}

Factory Methods
These factory methods create RateLimitException instances from common rate-limit scenarios, automatically setting the correct retry delay and max retry count.
// Parse HTTP 429 Retry-After header (seconds -> ms conversion)
RateLimitException.fromHttp429("openai", "30")
// LLM quota exceeded (defaults to 300000ms / 5 minutes)
RateLimitException.llmQuotaExceeded("claude-sonnet-4")
// API endpoint rate limit with explicit delay
RateLimitException.apiRateLimit("/api/v1/chat", 10000L)

ActionExecutionException
When an agent action fails at runtime (a web service call times out, a parameter is missing, an MCP tool returns an error), the framework throws an ActionExecutionException. It includes the action name, type, and error category so you can programmatically decide whether to retry, fix parameters, or escalate.
public class ActionExecutionException extends TnsAIException {
    public String getActionName();
    public ActionType getActionType();
    public ErrorCategory getCategory();
    public String getDetailedMessage();
}

ErrorCategory Enum
Each ActionExecutionException is tagged with an ErrorCategory that groups the failure by root cause. This makes it straightforward to write a switch block that handles transient network errors differently from permanent validation errors.
| Category | Retryable (default) | Description |
|---|---|---|
| NETWORK | Yes | Connection timeout, DNS failure |
| PARAMETER | No | Missing parameter, wrong type |
| CLIENT_ERROR | No | HTTP 4xx status codes |
| SERVER_ERROR | Yes | HTTP 5xx status codes |
| VALIDATION | No | Contract violations |
| LLM | Yes | Model errors, quota exceeded |
| MCP | Yes | MCP tool errors |
| INVOCATION | No | Reflection, method not found |
| UNKNOWN | No | Unclassified errors |
The getDetailedMessage() method produces a structured log line:
[WEB_SERVICE] Action 'fetchWeather' failed: Network error | Category: Network error | Retryable: true | Cause: SocketTimeoutException (Connect timed out)

Factory Methods
Use these static factories to create ActionExecutionException instances with the correct category, retryability flag, and detailed message already populated.
ActionExecutionException.fromNetworkError("fetchWeather", ActionType.WEB_SERVICE, ioException)
ActionExecutionException.fromParameterError("search", ActionType.LOCAL, "query is required", cause)
ActionExecutionException.fromApiError("createIssue", ActionType.WEB_SERVICE, 503, "Service Unavailable", cause)
ActionExecutionException.fromLLMError("summarize", ActionType.LLM, llmException)
ActionExecutionException.fromMCPError("mcp-tool-name", mcpException)
ActionExecutionException.fromInvocationError("calculate", ActionType.LOCAL, reflectionException)

Other Exceptions
Beyond the main exception types above, TnsAI provides several specialized exceptions for network failures, timeouts, validation errors, capability mismatches, and control-flow signals. The table below summarizes their retry behavior and key fields.
| Exception | Retryable | Retry Delay | Max Retries | Key Fields |
|---|---|---|---|---|
| NetworkException | Yes | 2000ms | 5 | host, port |
| TimeoutException | Yes | min(timeoutMs/2, 5000) | 3 | timeoutMs, operation |
| ValidationException | No | -- | -- | -- |
| ApprovalRequiredException | No | -- | -- | actionName, reason |
| TaskCompleteException | No | -- | -- | summary, result, success, metadata |
| LLMCapabilityException | No | -- | -- | provider, capability |
| ToolCallNotSupportedException | No | -- | -- | model, provider |
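The TimeoutException delay rule in the table is worth spelling out: the suggested delay is half the configured timeout, capped at 5000ms. A minimal sketch of that formula (our own restatement, not library code):

```java
public class RetryDelays {

    /** Mirrors the documented TimeoutException rule: min(timeoutMs / 2, 5000ms). */
    public static long timeoutRetryDelayMs(long timeoutMs) {
        return Math.min(timeoutMs / 2, 5000L);
    }
}
```

So a 30-second LLM timeout suggests a 5000ms retry delay, while a 4-second HTTP timeout suggests 2000ms.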
NetworkException Factories
Create NetworkException instances for common connectivity failures like refused connections, DNS resolution problems, and connection timeouts.
NetworkException.connectionRefused("api.example.com", 443)
NetworkException.dnsResolutionFailed("api.example.com", cause)
NetworkException.connectionTimeout("api.example.com", 443, 5000L)

TimeoutException Factories
Create TimeoutException instances for operations that exceed their time budget, whether that is an LLM call, an HTTP request, or an action execution.
TimeoutException.llmTimeout(30000L)
TimeoutException.httpTimeout("https://api.example.com/v1/chat", 10000L)
TimeoutException.actionTimeout("fetchData", 5000L)

LLMCapabilityException Factories
Thrown when you request a feature (streaming, vision, structured output) that the selected model or provider does not support. These are never retryable because the model simply lacks the capability.
LLMCapabilityException.streamingNotSupported("phi", "Ollama")
LLMCapabilityException.visionNotSupported("gpt-3.5-turbo", "OpenAI")
LLMCapabilityException.structuredOutputNotSupported("llama-2", "Ollama")

TaskCompleteException (Control Flow)
TaskCompleteException is not an error -- it is a control flow signal used to indicate that a task has been completed and the agent loop should terminate.
// Simple completion
throw new TaskCompleteException("Analysis complete");
// With result data
throw TaskCompleteException.withResult("Task done", Map.of("filesCreated", 5));
// Failed completion
throw TaskCompleteException.failed("Could not complete", "API unavailable");
// With metadata
throw TaskCompleteException.withMetadata("Done", Map.of("duration", "45s"));

Handling in the agent loop:
try {
    agent.run(task);
} catch (TaskCompleteException e) {
    System.out.println("Summary: " + e.getSummary());
    System.out.println("Success: " + e.isSuccess());
    if (e.hasResult()) {
        MyResult result = e.getResultAs(MyResult.class);
    }
}

Code Examples
These examples show common patterns for handling TnsAI exceptions in your application code.
Catching and Classifying Errors
The recommended approach is to catch exceptions from most specific to least specific. This lets you handle rate limits, LLM-specific errors, and generic TnsAI errors each in the most appropriate way.
try {
    String response = agent.chat("Analyze this data");
} catch (RateLimitException e) {
    // Wait for the provider-specified delay
    Thread.sleep(e.getSuggestedRetryDelayMs());
    // Retry...
} catch (LLMException e) {
    if (e.getErrorType() == LLMException.ErrorType.CONTEXT_TOO_LONG) {
        // Truncate context and retry
    } else if (e.isRetryable()) {
        // Retry with backoff
    } else {
        // Log and fail
        logger.error("LLM error [{}]: {}", e.getErrorCode(), e.getMessage());
    }
} catch (TnsAIException e) {
    if (e.isRetryable()) {
        logger.warn("Retryable error [{}], retrying in {}ms",
            e.getErrorCode(), e.getSuggestedRetryDelayMs());
    } else {
        logger.error("Non-retryable error [{}]: {}", e.getErrorCode(), e.getMessage());
    }
}

Handling Action Execution Errors
When an action fails, you can use the error category to decide on recovery. Transient errors (network, server) can be retried with backoff, while parameter or validation errors need to be fixed before retrying.
try {
    executor.execute(action, params);
} catch (ActionExecutionException e) {
    logger.error(e.getDetailedMessage());
    switch (e.getCategory()) {
        case NETWORK, SERVER_ERROR -> {
            // Transient -- retry with backoff
            Thread.sleep(e.getSuggestedRetryDelayMs());
        }
        case PARAMETER, VALIDATION -> {
            // Fix parameters and retry
            logger.warn("Fix parameters for action: {}", e.getActionName());
        }
        case LLM -> {
            // Check nested LLMException for details
            if (e.getCause() instanceof LLMException llm) {
                logger.warn("LLM error type: {}", llm.getErrorType());
            }
        }
        default -> throw e;
    }
}

Using Error Codes for Monitoring
Every TnsAIException carries a stable error code (like TNSAI-NETWORK or TNSAI-RATELIMIT) that you can use as a metric tag in your monitoring system. This example shows how to increment a counter on each failure for dashboards and alerting.
try {
    agent.chat("query");
} catch (TnsAIException e) {
    metrics.counter("tnsai.errors",
        "code", e.getErrorCode(),
        "retryable", String.valueOf(e.isRetryable())
    ).increment();
    throw e;
}

Agents
An `Agent` is the top-level orchestrator in TnsAI. It owns an LLM client, one or more roles, a memory store, and an event system. Agents handle the full chat loop: receiving a message, consulting their roles for available actions, calling the LLM, executing tool calls, and returning a response.
Event System
The event system provides full observability into the agent lifecycle. Events use a sealed interface hierarchy with 20+ event types, enabling type-safe pattern matching.
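A sealed hierarchy lets the compiler verify that a switch covers every event type. The sketch below illustrates the pattern with hypothetical event names (`LlmCallStarted` and `ToolCallFinished` are illustrative, not actual TnsAI types) and requires Java 21 for switch pattern matching:

```java
public class EventDemo {

    // Hypothetical two-member hierarchy; the real one has 20+ event types.
    public sealed interface AgentEvent permits LlmCallStarted, ToolCallFinished {}
    public record LlmCallStarted(String model) implements AgentEvent {}
    public record ToolCallFinished(String tool, boolean success) implements AgentEvent {}

    // The sealed interface makes this switch exhaustive: if a new event type
    // is added, code like this fails to compile until it handles the new case.
    public static String describe(AgentEvent event) {
        return switch (event) {
            case LlmCallStarted e -> "LLM call started: " + e.model();
            case ToolCallFinished e ->
                "Tool " + e.tool() + (e.success() ? " succeeded" : " failed");
        };
    }
}
```

Note the absence of a `default` branch: exhaustiveness is checked at compile time, which is the "type-safe" part of the claim above.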