Observability

Quick Start

TnsAI provides structured logging out of the box. Create a logger for your agent and attach key-value metadata to every log entry for easy filtering and searching in your logging backend.

// Structured logging
AgentLogger logger = AgentLogger.forAgent("agent-001");
logger.info("Processing request")
    .with("userId", userId)
    .with("action", "search")
    .log();

OpenTelemetry Tracing

Distributed tracing with automatic span management:

TnsAITelemetry telemetry = TnsAITelemetry.builder()
    .serviceName("my-agent-service")
    .serviceVersion("1.0.0")
    .otlpEndpoint("http://localhost:4317")
    .build();

OpenTelemetryTracer tracer = new OpenTelemetryTracer(telemetry.getTracer());

try (SpanScope scope = tracer.startSpan("process-request")) {
    scope.setAttribute("user.id", userId);
    scope.setAttribute("request.type", "search");
    // ... work happens here ...
    scope.setSuccess();
} // Span automatically closed

// Convenience wrapper
String result = tracer.trace("fetch-data", () -> fetchData(query));

Prometheus Metrics

TnsAI provides pre-defined counters and histograms that track agent actions, LLM calls, token usage, and latency. These integrate with OpenTelemetry and can be exported to any compatible metrics backend.

OpenTelemetryMetrics metrics = new OpenTelemetryMetrics(meter);

// Record agent actions
metrics.recordActionSuccess("search", 150);  // type, duration ms
metrics.recordActionFailure("write", 50, "timeout");

// Record LLM calls
metrics.recordLlmCall("openai", "gpt-4o", 1000, 500, 2300);  // provider, model, prompt tokens, completion tokens, latency ms
metrics.recordLlmCallFailure("anthropic", "claude-3", 1500, "rate_limit");

// Gauges
metrics.setActiveAgents(5);

Available metrics:

Metric | Type | Description
tnsai.agent.actions.total | Counter | Total actions executed
tnsai.agent.actions.success | Counter | Successful actions
tnsai.agent.actions.failed | Counter | Failed actions
tnsai.agent.action.duration | Histogram | Action execution time (ms)
tnsai.llm.calls.total | Counter | Total LLM calls
tnsai.llm.tokens.prompt | Counter | Input tokens consumed
tnsai.llm.tokens.completion | Counter | Output tokens generated
tnsai.llm.latency | Histogram | LLM response time (ms)
tnsai.agent.active | Gauge | Currently active agents

Structured Logging

The AgentLogger provides a fluent API for emitting structured log entries with arbitrary key-value metadata. It also includes convenience methods for common agent events like action starts, completions, and LLM calls.

AgentLogger logger = AgentLogger.forAgent("agent-001");

// Fluent API
logger.info("Tool executed")
    .with("tool", "brave_search")
    .with("duration", 350)
    .log();

// Convenience methods
logger.logActionStart("searchPapers", params);
logger.logActionComplete("searchPapers", 1200, true);
logger.logLLMCall("gpt-4o", 500, 2100);

// Timed execution
String result = logger.timed("database-query", () -> db.query(sql));

Themed Logging

Themed loggers wrap SLF4J and add fun, themed prefixes to log messages based on the event type. They ship with seven built-in themes, a full set of LogEvent types, and support for custom themes.

// Create with theme
ThemedLogger logger = ThemedLogger.create("my-agent", LogThemes.STAR_WARS);

// Log events — each gets a themed prefix
logger.actionStart("Searching for documents");
// Output: [LIGHTSABER ON] Searching for documents

logger.actionComplete("Found 5 documents");
// Output: [THE FORCE SUCCEEDS] Found 5 documents

logger.error("Connection failed");
// Output: [SITH LORD DETECTED] Connection failed

// Startup/shutdown banners
logger.logStartup();
// Output: A long time ago in a galaxy far, far away... TnsAI awakens.

Factory methods:

// With explicit theme
ThemedLogger logger = ThemedLogger.create("agent-001", LogThemes.LOTR);

// Default theme (no prefix)
ThemedLogger logger = ThemedLogger.create("agent-001");

// From class name
ThemedLogger logger = ThemedLogger.forClass(MyAgent.class, LogThemes.MATRIX);

Available themes:

Theme | Name | Example prefixes
LogThemes.DEFAULT | "default" | (no prefix)
LogThemes.STAR_WARS | "starwars" | [LIGHTSABER ON], [CONSULTING YODA]
LogThemes.LOTR | "lotr" | [QUEST BEGINS], [CONSULTING GANDALF]
LogThemes.MATRIX | "matrix" | [ENTERING THE MATRIX], [CALLING ORACLE]
LogThemes.PIRATE | "pirate" | [SETTING SAIL], [CONSULTING DAVY JONES]
LogThemes.TURKISH | "turkish" | [IS BASLIYOR], [AKIL DANIYOR]
LogThemes.EMOJI | "emoji" | Lightning, Robot, Trophy emojis

Event types (LogEvent enum):

Group | Events
Agent lifecycle | AGENT_START, AGENT_STOP, AGENT_IDLE
Actions | ACTION_START, ACTION_COMPLETE, ACTION_ERROR
Planning | PLAN_START, PLAN_COMPLETE, GOAL_ACHIEVED, GOAL_FAILED
LLM | LLM_CALL, LLM_RESPONSE, LLM_ERROR, TOOL_CALL
Communication | MESSAGE_SENT, MESSAGE_RECEIVED
State | STATE_CHANGE, BELIEF_UPDATE
General | INFO, WARNING, ERROR, DEBUG, SUCCESS, FAILURE

Custom themes and runtime switching:

// Implement LogTheme interface
LogTheme custom = new LogTheme() {
    public String name() { return "custom"; }
    public String format(LogEvent event, String message) {
        return "[MY-PREFIX] " + message;
    }
};

// Register for lookup by name
LogThemes.register(custom);
LogTheme theme = LogThemes.get("custom");

// Change theme at runtime
logger.setTheme(LogThemes.PIRATE);

// List and check themes
String[] names = LogThemes.listThemes();
boolean exists = LogThemes.hasTheme("starwars");

Prometheus Exporter

If you use Prometheus for monitoring, this exporter spins up a lightweight HTTP server that serves metrics in the Prometheus text format. It syncs with the Metrics singleton and exposes all predefined agent and LLM counters.

// Create and start
PrometheusMetricsExporter exporter = PrometheusMetricsExporter.builder()
    .port(9090)
    .build();
exporter.start();
// Metrics available at http://localhost:9090/metrics

// Record metrics
exporter.recordActionSuccess("search", 150);       // type, duration ms
exporter.recordActionFailure("write", "timeout", 50);
exporter.recordLlmCall("openai", "gpt-4o", 1000, 500, 2300);
exporter.recordLlmCallFailure("anthropic", "claude-3", "rate_limit", 1500);
exporter.setActiveAgents(5);

// One-liner startup
PrometheusMetricsExporter exporter = PrometheusMetricsExporter.createAndStart();

// Shutdown
exporter.stop();
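
For orientation, scraping the /metrics endpoint returns the standard Prometheus text exposition format. The excerpt below is illustrative, not verbatim output: the metric names match the predefined metrics documented in this section, but the sample values, label values, and HELP strings are made up.

```text
# HELP tnsai_agent_actions_total Total agent actions
# TYPE tnsai_agent_actions_total counter
tnsai_agent_actions_total{status="success"} 42

# TYPE tnsai_agent_active gauge
tnsai_agent_active 5

# TYPE tnsai_llm_latency_seconds histogram
tnsai_llm_latency_seconds_bucket{provider="openai",model="gpt-4o",le="2.5"} 17
tnsai_llm_latency_seconds_sum{provider="openai",model="gpt-4o"} 39.1
tnsai_llm_latency_seconds_count{provider="openai",model="gpt-4o"} 17
```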

Environment variables:

Variable | Default | Description
TNSAI_PROMETHEUS_PORT | 9090 | HTTP server port
TNSAI_PROMETHEUS_ENABLED | true | Enable/disable exporter

// Configure from environment
PrometheusMetricsExporter exporter = PrometheusMetricsExporter.builder()
    .fromEnvironment()
    .build();

Predefined Prometheus metrics:

Metric | Type | Labels
tnsai_agent_actions_total | Counter | status
tnsai_agent_actions_success_total | Counter | (none)
tnsai_agent_actions_failed_total | Counter | error_type
tnsai_llm_calls_total | Counter | provider, model, status
tnsai_llm_calls_failed_total | Counter | provider, model, error_type
tnsai_llm_prompt_tokens_total | Counter | provider, model
tnsai_llm_completion_tokens_total | Counter | provider, model
tnsai_agent_active | Gauge | (none)
tnsai_agent_action_duration_seconds | Histogram | action_type
tnsai_llm_latency_seconds | Histogram | provider, model

Custom metrics:

Counter counter = exporter.getOrCreateCounter("my_counter", "My help text", "label1");
Gauge gauge = exporter.getOrCreateGauge("my_gauge", "My help text", "label1");
Histogram histogram = exporter.getOrCreateHistogram("my_histogram", "My help text", "label1");

LLM Instrumentation

This module provides OpenTelemetry instrumentation specifically for LLM calls. It follows the GenAI semantic conventions, which means your traces are compatible with observability tools like Jaeger, Zipkin, and Grafana Tempo without any custom mapping.

LlmInstrumentation llmInstr = new LlmInstrumentation(telemetry.getTracer());

// Trace a chat completion
ChatResponse response = llmInstr.traceChat("openai", "gpt-4", () -> {
    return client.chat(message);
});

// With request parameters
ChatResponse response = llmInstr.traceChat("openai", "gpt-4", 4096, 0.7, () -> {
    return client.chat(message);
});

// Trace streaming
Stream<ChatChunk> stream = llmInstr.traceStreamChat("anthropic", "claude-3", () -> {
    return client.streamChat(message);
});

// Trace embeddings
List<float[]> embeddings = llmInstr.traceEmbeddings("openai", "text-embedding-3", 10, () -> {
    return client.embed(inputs);
});

GenAI semantic convention attributes:

Attribute | Description
gen_ai.system | LLM provider (openai, anthropic, etc.)
gen_ai.request.model | Model name
gen_ai.request.max_tokens | Max tokens requested
gen_ai.request.temperature | Temperature setting
gen_ai.request.top_p | Top-p setting
gen_ai.usage.prompt_tokens | Prompt tokens used
gen_ai.usage.completion_tokens | Completion tokens generated
gen_ai.usage.total_tokens | Total tokens
gen_ai.response.finish_reasons | Finish reasons (stop, length, error)
gen_ai.operation.name | Operation type (chat, stream_chat, embeddings)

TnsAI-specific attributes:

Attribute | Description
tnsai.agent.id | Agent identifier
tnsai.agent.name | Agent name
tnsai.message.length | Input message length
tnsai.tool_use.enabled | Whether tool use is enabled
tnsai.tool_calls.count | Number of tool calls

Recording token usage and tool events on a span:

Span span = Span.current();
llmInstr.recordTokenUsage(span, 150, 50);
llmInstr.recordResponse(span, 150, 50, "stop");
llmInstr.recordToolUse(span, 3);
llmInstr.recordToolCallEvent(span, "brave_search", true);

Span builder for full control:

Span span = llmInstr.spanBuilder("openai", "gpt-4")
    .operation("chat")
    .maxTokens(4096)
    .temperature(0.7)
    .topP(0.9)
    .agent("agent-001", "ResearchAgent")
    .messageLength(500)
    .toolUseEnabled(true)
    .start();

Agent Instrumentation

Agent instrumentation adds tracing and metrics to agent operations without modifying the Agent class itself. It wraps calls like chat, action execution, and tool calls with OpenTelemetry spans, giving you end-to-end visibility into what your agents are doing.

// Initialize
AgentInstrumentation instrumentation = new AgentInstrumentation(telemetry);
// Or with custom tracer/metrics
AgentInstrumentation instrumentation = new AgentInstrumentation(tracer, metrics);

// Register agents (increments active agent gauge)
instrumentation.registerAgent(agent);

Traced operations:

// Trace chat
String response = instrumentation.traceChat(agent, "Hello", () -> {
    return agent.chat("Hello");
});

// Trace action execution
Object result = instrumentation.traceAction(agent, "searchPapers", params, () -> {
    return agent.executeAction("searchPapers", params);
});

// Trace tool call
String output = instrumentation.traceToolCall(agent, "brave_search", args, () -> {
    return tool.execute(args);
});

// Trace LLM call with GenAI conventions
ChatResponse resp = instrumentation.traceLlmCall(agent, "openai", "gpt-4", () -> {
    return llmClient.chat(messages);
});

Span attributes per operation:

Operation | Attributes
agent.chat | agent.id, agent.name, message.length, message.preview
agent.action | agent.id, agent.name, action.name, action.parameters.count
agent.tool_call | agent.id, tool.name, tool.arguments.count
agent.llm_call | agent.id, gen_ai.system, gen_ai.request.model, gen_ai.operation.name

Recording metrics directly:

instrumentation.recordLlmTokens("openai", "gpt-4", 150, 50, 2300);
instrumentation.recordLlmFailure("openai", "gpt-4", 1500, "rate_limit");
instrumentation.unregisterAgent(agent);  // decrements active agent gauge

Agent-Specific Tracing

While the above instrumentation gives you OpenTelemetry-level visibility, agent-specific tracing goes deeper. It captures the full execution trace of a single agent run, including every LLM call, tool invocation, and guardrail check, along with quality scores you attach manually or automatically.

AgentTrace

An AgentTrace represents a complete execution recording for one agent run. You start a trace, add observations as the agent works, and then complete it. This is the foundation for evaluation and debugging.

AgentTrace trace = AgentTrace.start("agent-001", "research-task");

// Add observations during execution
trace.addObservation(Observation.span("llm-call")
    .input("Summarize the document")
    .output("The document describes...")
    .duration(Duration.ofMillis(2300))
    .metadata(Map.of("model", "claude-sonnet-4", "tokens", 1500))
    .build());

trace.addObservation(Observation.span("tool-call")
    .input(Map.of("tool", "brave_search", "query", "quantum computing"))
    .output("Results: ...")
    .build());

trace.addObservation(Observation.event("guardrail-check")
    .metadata(Map.of("guardrail", "pii-filter", "passed", true))
    .build());

trace.complete();

Observation Types

Observations are the building blocks of a trace. Each one records a specific event or operation during the agent's execution, and they come in three types depending on what happened.

Type | Constant | Description
SPAN | Observation.span() | Timed execution block (LLM call, tool execution)
GENERATION | Observation.generation() | LLM text generation with input/output
EVENT | Observation.event() | Point-in-time event (guardrail check, state change)

Each observation supports: input, output, duration, metadata, parentId (for nesting), and status.
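
To make the parentId linkage concrete, here is a minimal self-contained sketch. The Observation record below is a stand-in for illustration only; the real TnsAI Observation is a builder with many more fields.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

public class ObservationNestingSketch {
    // Stand-in type: the real TnsAI Observation carries input, output, duration, etc.
    record Observation(String id, String parentId, String name) {}

    public static void main(String[] args) {
        List<Observation> trace = new ArrayList<>();

        // Parent span for the whole action; a top-level observation has no parentId
        Observation action = new Observation(UUID.randomUUID().toString(), null, "searchPapers");
        trace.add(action);

        // Child observations link back to the parent via parentId
        trace.add(new Observation(UUID.randomUUID().toString(), action.id(), "llm-call"));
        trace.add(new Observation(UUID.randomUUID().toString(), action.id(), "tool-call"));

        long children = trace.stream().filter(o -> action.id().equals(o.parentId())).count();
        System.out.println(children);  // prints: 2
    }
}
```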

Score

Scores let you attach quality measurements to a trace. You can use them for manual evaluation (did the agent answer correctly?) or automated evaluation (did the output contain PII?). Scores come in three types: numeric, boolean, and categorical.

// Numeric score
trace.addScore(Score.numeric("relevance", 0.92));

// Boolean score
trace.addScore(Score.bool("contains_pii", false));

// Categorical score
trace.addScore(Score.categorical("sentiment", "positive"));
Factory MethodScore TypeValue
Score.numeric(name, value)Numeric0.0-1.0 double
Score.bool(name, value)Booleantrue/false
Score.categorical(name, value)CategoricalString category

TraceContext (ThreadLocal Nesting)

TraceContext uses a ThreadLocal to track which trace is active on the current thread. This means you can start nested spans anywhere in your code and they automatically link to the parent span, without passing the trace object through every method call.

// Set current trace for this thread
TraceContext.set(trace);

// Get current trace (returns Optional)
AgentTrace current = TraceContext.current().orElseThrow();

// Nested spans auto-link to parent
try (var scope = TraceContext.startSpan("sub-operation")) {
    // This span's parentId is automatically set to the enclosing span
    doWork();
}

// Clear when done
TraceContext.clear();

Guardrail SPI

Guardrails are safety checks that evaluate agent behavior during or after execution. Implement the Guardrail interface to define your own checks, such as ensuring the agent does not leak sensitive data or produce harmful content.

public interface Guardrail {
    GuardrailResult evaluate(AgentTrace trace);
    String name();
}
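
As an illustration of implementing the SPI, here is a minimal keyword-scanning guardrail. The nested GuardrailResult and AgentTrace types are simplified stand-ins so the sketch is self-contained; the real TnsAI types are richer, and this is not the actual PiiGuardrail implementation.

```java
import java.util.List;
import java.util.regex.Pattern;

public class KeywordGuardrailSketch {
    // Simplified stand-ins for the real TnsAI types (shapes assumed for illustration)
    record GuardrailResult(boolean passed, List<String> findings) {}
    interface AgentTrace { List<String> outputs(); }
    interface Guardrail {
        GuardrailResult evaluate(AgentTrace trace);
        String name();
    }

    // A guardrail that flags any agent output containing blocked keywords
    static class KeywordGuardrail implements Guardrail {
        private final Pattern blocked = Pattern.compile("(?i)\\b(password|api[_-]?key)\\b");

        public String name() { return "keyword-filter"; }

        public GuardrailResult evaluate(AgentTrace trace) {
            List<String> findings = trace.outputs().stream()
                .filter(out -> blocked.matcher(out).find())
                .toList();
            return new GuardrailResult(findings.isEmpty(), findings);
        }
    }

    public static void main(String[] args) {
        AgentTrace trace = () -> List.of("Here is the summary.", "The password is hunter2");
        GuardrailResult result = new KeywordGuardrail().evaluate(trace);
        System.out.println(result.passed() + " " + result.findings().size());
        // prints: false 1
    }
}
```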

PiiGuardrail

The PiiGuardrail is a built-in guardrail that scans agent outputs for personally identifiable information such as email addresses, phone numbers, and social security numbers. Use it to prevent your agents from accidentally leaking sensitive data.

PiiGuardrail piiGuard = new PiiGuardrail();
GuardrailResult result = piiGuard.evaluate(trace);

if (!result.passed()) {
    System.out.println("PII detected: " + result.findings());
    // Findings include: type (EMAIL, PHONE, SSN, etc.), location, severity
}

Wire guardrails into the agent:

Agent agent = AgentBuilder.create()
    .model("claude-sonnet-4")
    .guardrail(new PiiGuardrail())
    .build();

Health Checks

Health checks let you monitor the operational status of your application's components (database, cache, LLM providers, etc.). TnsAI provides an aggregated health registry that runs checks concurrently, handles timeouts gracefully, and produces HTTP-ready summaries suitable for load balancer probes.

HealthStatus levels:

Status | Severity | Description
UP | 0 | Component functioning normally
DEGRADED | 1 | Functioning with reduced capability
UNKNOWN | 2 | Status cannot be determined
DOWN | 3 | Not functioning
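
This severity ordering is what drives aggregation: combining two results keeps the worse status. A self-contained sketch of that logic, using the status names and severities from the table (the combine signature in TnsAI itself may differ):

```java
public class HealthStatusSketch {
    // Severity values from the table above: higher means worse
    enum HealthStatus {
        UP(0), DEGRADED(1), UNKNOWN(2), DOWN(3);

        final int severity;
        HealthStatus(int severity) { this.severity = severity; }

        // Aggregation keeps the higher-severity (worse) status
        HealthStatus combine(HealthStatus other) {
            return this.severity >= other.severity ? this : other;
        }
    }

    public static void main(String[] args) {
        System.out.println(HealthStatus.UP.combine(HealthStatus.DEGRADED));      // prints: DEGRADED
        System.out.println(HealthStatus.DEGRADED.combine(HealthStatus.DOWN));    // prints: DOWN
        System.out.println(HealthStatus.UNKNOWN.combine(HealthStatus.DEGRADED)); // prints: UNKNOWN
    }
}
```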

HealthCheckResult factory methods:

HealthCheckResult.up()                        // healthy
HealthCheckResult.up("All good")              // healthy with message
HealthCheckResult.down()                      // unhealthy
HealthCheckResult.down("Connection refused")  // unhealthy with message
HealthCheckResult.down(exception)             // unhealthy from exception
HealthCheckResult.degraded("High latency")    // degraded
HealthCheckResult.unknown("Check timed out")  // unknown
HealthCheckResult.of(status, message)         // any status

// Add details (immutable — returns new instance)
HealthCheckResult result = HealthCheckResult.up()
    .withDetail("connections", 10)
    .withDetail("latency_ms", 50)
    .withDuration(duration);

// Inspect
result.getStatus();         // HealthStatus enum
result.getMessage();        // String or null
result.getDetails();        // Map<String, Object>
result.isHealthy();         // true if UP or DEGRADED
result.getDuration();       // Duration or null
result.toMap();             // Map for JSON serialization
result.combine(other);      // returns result with worse status

HealthIndicator interface:

// Implement for custom components
public class DatabaseHealth implements HealthIndicator {
    public String getName() { return "database"; }
    public HealthCheckResult check() {
        return conn.isValid(5) ? HealthCheckResult.up() : HealthCheckResult.down();
    }
}

// Lambda shorthand — from boolean
HealthIndicator simple = HealthIndicator.of("cache", () -> cache.isConnected());

// Lambda shorthand — from supplier
HealthIndicator detailed = HealthIndicator.of("cache", () -> {
    return HealthCheckResult.up().withDetail("size", cache.size());
});

// Async and timeout support (default methods)
CompletableFuture<HealthCheckResult> future = indicator.checkAsync();
HealthCheckResult result = indicator.checkWithTimeout(Duration.ofSeconds(2));

HealthRegistry:

HealthRegistry registry = HealthRegistry.getInstance();  // singleton

// Register indicators
registry.register(new ResourceHealthIndicator());
registry.register("custom", HealthIndicator.of("custom", () -> true));

// Check health
Optional<HealthCheckResult> single = registry.check("database");
HealthCheckResult all = registry.checkAll();                          // default 30s timeout
HealthCheckResult all = registry.checkAll(Duration.ofSeconds(10));    // custom timeout
CompletableFuture<HealthCheckResult> async = registry.checkAllAsync();
HealthCheckResult matched = registry.checkMatching("db-*");           // wildcard

// Quick status (2s timeout per indicator)
HealthStatus status = registry.getQuickStatus();  // UP, DOWN, DEGRADED, or UNKNOWN

// HTTP-ready summary
Map<String, Object> summary = registry.getHealthSummary();
// { "status": "UP", "timestamp": "...", "totalDurationMs": 45,
//   "components": { "database": {...}, "cache": {...} } }

// Management
registry.getIndicatorNames();  // Set<String>
registry.getIndicatorCount();  // int
registry.unregister("old");
registry.clear();
registry.shutdown();           // call on app shutdown

ResourceHealthIndicator monitors heap memory, CPU load, thread counts, and deadlocks:

// Default thresholds (85% memory, 90% CPU)
registry.register(new ResourceHealthIndicator());

// Custom thresholds (90% memory, 90% CPU)
registry.register(new ResourceHealthIndicator(0.9, 0.9));

// Direct queries
ResourceHealthIndicator res = new ResourceHealthIndicator();
double heapUsage = res.getHeapUsageRatio();   // 0.0-1.0
int threads = res.getThreadCount();
boolean deadlock = res.hasDeadlock();

Reports details: heap.used.mb, heap.max.mb, heap.usage.percent, nonHeap.used.mb, cpu.availableProcessors, cpu.systemLoadAverage, cpu.loadPerProcessor, threads.current, threads.peak, threads.deadlocked. Returns DEGRADED on high memory/CPU, DOWN on deadlock.
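
These measurements come from the standard JMX management beans in java.lang.management. The sketch below shows how heap usage, thread counts, and deadlock detection can be read directly; it illustrates the mechanism and is not necessarily how ResourceHealthIndicator is implemented internally.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;
import java.lang.management.ThreadMXBean;

public class ResourceProbeSketch {
    public static void main(String[] args) {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();

        // Heap usage ratio in 0.0-1.0 (getMax() can return -1 if undefined, so guard)
        MemoryUsage heap = memory.getHeapMemoryUsage();
        double heapRatio = heap.getMax() > 0 ? (double) heap.getUsed() / heap.getMax() : 0.0;

        // Live and peak thread counts
        int threadCount = threads.getThreadCount();
        int peakCount = threads.getPeakThreadCount();

        // findDeadlockedThreads() returns null when no threads are deadlocked
        long[] deadlocked = threads.findDeadlockedThreads();
        boolean hasDeadlock = deadlocked != null && deadlocked.length > 0;

        System.out.printf("heap=%.2f threads=%d peak=%d deadlock=%b%n",
                heapRatio, threadCount, peakCount, hasDeadlock);
    }
}
```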
