Error Handling
TnsAI.Core provides a structured exception hierarchy rooted in `TnsAIException`. Every exception carries an error code, retryability flag, and suggested retry parameters, enabling automated recovery decisions across the framework.
TnsAIException (Base Class)
All TnsAI exceptions extend `TnsAIException`, which itself extends `RuntimeException`, so every TnsAI exception is unchecked.
public class TnsAIException extends RuntimeException {
    public boolean isRetryable();
    public String getErrorCode();
    public long getSuggestedRetryDelayMs();
    public int getMaxRetryAttempts();
}

| Method | Description |
|---|---|
| isRetryable() | true for transient errors (network, rate limits, server errors) |
| getErrorCode() | Auto-derived code in format TNSAI-CLASSNAME (e.g., TNSAI-NETWORK, TNSAI-RATELIMIT) |
| getSuggestedRetryDelayMs() | Default 1000ms for retryable, 0 for non-retryable. Subclasses override with specific delays. |
| getMaxRetryAttempts() | Default 3 for retryable, 0 for non-retryable |
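The methods above form a self-describing retry contract: an exception tells the caller whether to retry, how long to wait, and how many attempts to allow. The helper below is an illustrative sketch, not part of the library; the `StubException` class only mirrors the documented defaults (1000ms delay, 3 attempts for retryable errors) so the example is self-contained, and in real code you would catch `TnsAIException` itself.

```java
public class RetryHelper {

    /** Stand-in for TnsAIException, mirroring the documented retry defaults. */
    public static class StubException extends RuntimeException {
        private final boolean retryable;
        public StubException(String message, boolean retryable) {
            super(message);
            this.retryable = retryable;
        }
        public boolean isRetryable() { return retryable; }
        public long getSuggestedRetryDelayMs() { return retryable ? 1000L : 0L; }
        public int getMaxRetryAttempts() { return retryable ? 3 : 0; }
    }

    public interface RetryableOp<T> { T run(); }

    /** Retries the operation according to the exception's own hints. */
    public static <T> T withRetry(RetryableOp<T> op) throws InterruptedException {
        int attempt = 0;
        while (true) {
            try {
                return op.run();
            } catch (StubException e) {
                // Give up on non-retryable errors or once the budget is spent.
                if (!e.isRetryable() || attempt >= e.getMaxRetryAttempts()) {
                    throw e;
                }
                attempt++;
                Thread.sleep(e.getSuggestedRetryDelayMs());
            }
        }
    }
}
```

Because the exception carries its own retry parameters, the helper needs no per-error-type configuration.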
The error code is derived from the class name: the `Exception` suffix is dropped and the remainder is uppercased using `Locale.ROOT`:
// TnsAIException -> "TNSAI-TNSAI"
// LLMException -> "TNSAI-LLM"
// NetworkException -> "TNSAI-NETWORK"

LLMException
When something goes wrong during a call to an LLM provider (invalid API key, context too long, server outage), the framework throws an LLMException. Each exception carries an ErrorType that tells you exactly what happened and whether it makes sense to retry.
public class LLMException extends TnsAIException {
    public String getModel();
    public ErrorType getErrorType();
    public long getSuggestedRetryDelayMs();
}

ErrorType Enum
The ErrorType enum classifies the root cause of an LLM failure. Use it to decide how your application should react -- for example, retrying a transient SERVER_ERROR but surfacing a permanent AUTHENTICATION_FAILED to the user.
| Value | Retryable | Description |
|---|---|---|
| MODEL_NOT_FOUND | No | Model not found or unavailable |
| AUTHENTICATION_FAILED | No | Invalid API key or auth failure |
| CONTENT_FILTERED | No | Content policy violation |
| INVALID_REQUEST | No | Invalid request format |
| MODEL_OVERLOADED | Yes | Model overloaded, try again |
| CONTEXT_TOO_LONG | No | Context length exceeded |
| SERVER_ERROR | Yes | Generic server error |
| MALFORMED_TOOL_CALL | No | Bad JSON or missing fields in tool call from model |
| CAPABILITY_MISMATCH | No | Model does not support a required capability |
| UNKNOWN | Yes | Unknown error |
Retry delays are error-type specific:
- MODEL_OVERLOADED -- 5000ms
- SERVER_ERROR -- 2000ms
- All others -- 1000ms
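These type-specific delays can seed an application-side backoff policy. The sketch below is a hypothetical policy (exponential backoff layered on top of the documented base delays), not part of the TnsAI API:

```java
public class LlmBackoff {

    /** Base delays per the documented ErrorType-specific values. */
    public static long baseDelayMs(String errorType) {
        switch (errorType) {
            case "MODEL_OVERLOADED": return 5000L;
            case "SERVER_ERROR":     return 2000L;
            default:                 return 1000L;
        }
    }

    /**
     * Hypothetical policy: double the type-specific base delay on each
     * successive attempt (attempt 0, 1, 2, ...).
     */
    public static long delayForAttempt(String errorType, int attempt) {
        return baseDelayMs(errorType) * (1L << attempt);
    }
}
```

A second MODEL_OVERLOADED retry would then wait 10000ms rather than a flat 5000ms.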
Factory Methods
Instead of calling the constructor directly, use these static factory methods to create LLMException instances with the correct ErrorType and retry parameters already set.
LLMException.modelNotFound("gpt-5")
LLMException.authenticationFailed("claude-sonnet-4", "Invalid API key")
LLMException.contentFiltered("gpt-4o", "Violates content policy")
LLMException.contextTooLong("gpt-4o-mini", 128000, 150000)
LLMException.modelOverloaded("claude-sonnet-4")
LLMException.serverError("gemini-2.5-flash", cause)
LLMException.malformedToolCall("gpt-4o", "search", "Invalid JSON in arguments", cause)
LLMException.malformedToolCall("gpt-4o", "search", "Missing required field 'query'")

RateLimitException
LLM providers enforce request quotas and return HTTP 429 ("Too Many Requests") when you exceed them. RateLimitException wraps these responses and is always retryable, carrying the provider-suggested wait time so your code can back off automatically.
public class RateLimitException extends TnsAIException {
    public Long getRetryAfterMs();
    public String getService();
    public Integer getRemainingQuota();
    public long getSuggestedRetryDelayMs(); // Uses retryAfterMs, defaults to 60000ms
    public int getMaxRetryAttempts(); // Returns 5
}

Factory Methods
These factory methods create RateLimitException instances from common rate-limit scenarios, automatically setting the correct retry delay and max retry count.
// Parse HTTP 429 Retry-After header (seconds -> ms conversion)
RateLimitException.fromHttp429("openai", "30")
// LLM quota exceeded (defaults to 300000ms / 5 minutes)
RateLimitException.llmQuotaExceeded("claude-sonnet-4")
// API endpoint rate limit with explicit delay
RateLimitException.apiRateLimit("/api/v1/chat", 10000L)

ActionExecutionException
When an agent action fails at runtime (a web service call times out, a parameter is missing, an MCP tool returns an error), the framework throws an ActionExecutionException. It includes the action name, type, and error category so you can programmatically decide whether to retry, fix parameters, or escalate.
public class ActionExecutionException extends TnsAIException {
    public String getActionName();
    public ActionType getActionType();
    public ErrorCategory getCategory();
    public String getDetailedMessage();
}

ErrorCategory Enum
Each ActionExecutionException is tagged with an ErrorCategory that groups the failure by root cause. This makes it straightforward to write a switch block that handles transient network errors differently from permanent validation errors.
| Category | Retryable (default) | Description |
|---|---|---|
| NETWORK | Yes | Connection timeout, DNS failure |
| PARAMETER | No | Missing parameter, wrong type |
| CLIENT_ERROR | No | HTTP 4xx status codes |
| SERVER_ERROR | Yes | HTTP 5xx status codes |
| VALIDATION | No | Contract violations |
| LLM | Yes | Model errors, quota exceeded |
| MCP | Yes | MCP tool errors |
| INVOCATION | No | Reflection, method not found |
| UNKNOWN | No | Unclassified errors |
The getDetailedMessage() method produces a structured log line:
[WEB_SERVICE] Action 'fetchWeather' failed: Network error | Category: Network error | Retryable: true | Cause: SocketTimeoutException (Connect timed out)

Factory Methods
Use these static factories to create ActionExecutionException instances with the correct category, retryability flag, and detailed message already populated.
ActionExecutionException.fromNetworkError("fetchWeather", ActionType.WEB_SERVICE, ioException)
ActionExecutionException.fromParameterError("search", ActionType.LOCAL, "query is required", cause)
ActionExecutionException.fromApiError("createIssue", ActionType.WEB_SERVICE, 503, "Service Unavailable", cause)
ActionExecutionException.fromLLMError("summarize", ActionType.LLM, llmException)
ActionExecutionException.fromMCPError("mcp-tool-name", mcpException)
ActionExecutionException.fromInvocationError("calculate", ActionType.LOCAL, reflectionException)

Other Exceptions
Beyond the main exception types above, TnsAI provides several specialized exceptions for network failures, timeouts, validation errors, capability mismatches, and control-flow signals. The table below summarizes their retry behavior and key fields.
| Exception | Retryable | Retry Delay | Max Retries | Key Fields |
|---|---|---|---|---|
| NetworkException | Yes | 2000ms | 5 | host, port |
| TimeoutException | Yes | min(timeoutMs/2, 5000) | 3 | timeoutMs, operation |
| ValidationException | No | -- | -- | -- |
| ApprovalRequiredException | No | -- | -- | actionName, reason |
| TaskCompleteException | No | -- | -- | summary, result, success, metadata |
| LLMCapabilityException | No | -- | -- | provider, capability |
| ToolCallNotSupportedException | No | -- | -- | model, provider |
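The TimeoutException delay rule in the table is worth spelling out: the suggested delay is half the configured timeout, capped at 5000ms. A minimal sketch of that formula (our own restatement, not library code):

```java
public class RetryDelays {

    /** Mirrors the documented TimeoutException rule: min(timeoutMs / 2, 5000ms). */
    public static long timeoutRetryDelayMs(long timeoutMs) {
        return Math.min(timeoutMs / 2, 5000L);
    }
}
```

So a 30-second LLM timeout suggests a 5000ms retry delay, while a 4-second HTTP timeout suggests 2000ms.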
NetworkException Factories
Create NetworkException instances for common connectivity failures like refused connections, DNS resolution problems, and connection timeouts.
NetworkException.connectionRefused("api.example.com", 443)
NetworkException.dnsResolutionFailed("api.example.com", cause)
NetworkException.connectionTimeout("api.example.com", 443, 5000L)

TimeoutException Factories
Create TimeoutException instances for operations that exceed their time budget, whether that is an LLM call, an HTTP request, or an action execution.
TimeoutException.llmTimeout(30000L)
TimeoutException.httpTimeout("https://api.example.com/v1/chat", 10000L)
TimeoutException.actionTimeout("fetchData", 5000L)

LLMCapabilityException Factories
Thrown when you request a feature (streaming, vision, structured output) that the selected model or provider does not support. These are never retryable because the model simply lacks the capability.
LLMCapabilityException.streamingNotSupported("phi", "Ollama")
LLMCapabilityException.visionNotSupported("gpt-3.5-turbo", "OpenAI")
LLMCapabilityException.structuredOutputNotSupported("llama-2", "Ollama")

TaskCompleteException (Control Flow)
TaskCompleteException is not an error -- it is a control flow signal used to indicate that a task has been completed and the agent loop should terminate.
// Simple completion
throw new TaskCompleteException("Analysis complete");
// With result data
throw TaskCompleteException.withResult("Task done", Map.of("filesCreated", 5));
// Failed completion
throw TaskCompleteException.failed("Could not complete", "API unavailable");
// With metadata
throw TaskCompleteException.withMetadata("Done", Map.of("duration", "45s"));

Handling in the agent loop:
try {
    agent.run(task);
} catch (TaskCompleteException e) {
    System.out.println("Summary: " + e.getSummary());
    System.out.println("Success: " + e.isSuccess());
    if (e.hasResult()) {
        MyResult result = e.getResultAs(MyResult.class);
    }
}

Code Examples
These examples show common patterns for handling TnsAI exceptions in your application code.
Catching and Classifying Errors
The recommended approach is to catch exceptions from most specific to least specific. This lets you handle rate limits, LLM-specific errors, and generic TnsAI errors each in the most appropriate way.
try {
    String response = agent.chat("Analyze this data");
} catch (RateLimitException e) {
    // Wait for the provider-specified delay
    Thread.sleep(e.getSuggestedRetryDelayMs());
    // Retry...
} catch (LLMException e) {
    if (e.getErrorType() == LLMException.ErrorType.CONTEXT_TOO_LONG) {
        // Truncate context and retry
    } else if (e.isRetryable()) {
        // Retry with backoff
    } else {
        // Log and fail
        logger.error("LLM error [{}]: {}", e.getErrorCode(), e.getMessage());
    }
} catch (TnsAIException e) {
    if (e.isRetryable()) {
        logger.warn("Retryable error [{}], retrying in {}ms",
            e.getErrorCode(), e.getSuggestedRetryDelayMs());
    } else {
        logger.error("Non-retryable error [{}]: {}", e.getErrorCode(), e.getMessage());
    }
}

Handling Action Execution Errors
When an action fails, you can use the error category to decide on recovery. Transient errors (network, server) can be retried with backoff, while parameter or validation errors need to be fixed before retrying.
try {
    executor.execute(action, params);
} catch (ActionExecutionException e) {
    logger.error(e.getDetailedMessage());
    switch (e.getCategory()) {
        case NETWORK, SERVER_ERROR -> {
            // Transient -- retry with backoff
            Thread.sleep(e.getSuggestedRetryDelayMs());
        }
        case PARAMETER, VALIDATION -> {
            // Fix parameters and retry
            logger.warn("Fix parameters for action: {}", e.getActionName());
        }
        case LLM -> {
            // Check nested LLMException for details
            if (e.getCause() instanceof LLMException llm) {
                logger.warn("LLM error type: {}", llm.getErrorType());
            }
        }
        default -> throw e;
    }
}

Using Error Codes for Monitoring
Every TnsAIException carries a stable error code (like TNSAI-NETWORK or TNSAI-RATELIMIT) that you can use as a metric tag in your monitoring system. This example shows how to increment a counter on each failure for dashboards and alerting.
try {
    agent.chat("query");
} catch (TnsAIException e) {
    metrics.counter("tnsai.errors",
        "code", e.getErrorCode(),
        "retryable", String.valueOf(e.isRetryable())
    ).increment();
    throw e;
}

Agents
An `Agent` is the top-level orchestrator in TnsAI. It owns an LLM client, one or more roles, a memory store, and an event system. Agents handle the full chat loop: receiving a message, consulting their roles for available actions, calling the LLM, executing tool calls, and returning a response.
Event System
The event system provides full observability into the agent lifecycle. Events use a sealed interface hierarchy with 20+ event types, enabling type-safe pattern matching.
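A sealed hierarchy lets the compiler verify that a switch covers every event type. The sketch below illustrates the pattern with hypothetical event names (`LlmCallStarted` and `ToolCallFinished` are illustrative, not actual TnsAI types) and requires Java 21 for switch pattern matching:

```java
public class EventDemo {

    // Hypothetical two-member hierarchy; the real one has 20+ event types.
    public sealed interface AgentEvent permits LlmCallStarted, ToolCallFinished {}
    public record LlmCallStarted(String model) implements AgentEvent {}
    public record ToolCallFinished(String tool, boolean success) implements AgentEvent {}

    // The sealed interface makes this switch exhaustive: if a new event type
    // is added, code like this fails to compile until it handles the new case.
    public static String describe(AgentEvent event) {
        return switch (event) {
            case LlmCallStarted e -> "LLM call started: " + e.model();
            case ToolCallFinished e ->
                "Tool " + e.tool() + (e.success() ? " succeeded" : " failed");
        };
    }
}
```

Note the absence of a `default` branch: exhaustiveness is checked at compile time, which is the "type-safe" part of the claim above.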