
Streaming

TnsAI supports three streaming modes for real-time token delivery from LLM providers.

Token Streaming

Returns text tokens as they are generated — simplest mode:

Stream<String> tokens = agent.streamChat("Explain relativity");
tokens.forEach(System.out::print);

ChatChunk Streaming

Returns typed chunks with metadata (token counts, finish reason, tool calls):

llmClient.streamChatWithSpec(request).forEach(chunk -> {
    switch (chunk.getType()) {
        case START -> System.out.println("Stream started: " + chunk.getModel());
        case CONTENT -> System.out.print(chunk.getContent());
        case TOOL_CALL -> handleToolCall(chunk.getToolCall().orElseThrow());
        case DONE -> System.out.println("\nTokens: " + chunk.getTokenCount());
        case ERROR -> System.err.println("Error: " + chunk.getContent());
    }
});

Chunk Types

Each ChatChunk has a type that tells you what kind of data it carries. Your code should handle each type to respond appropriately as the stream progresses.

Type            Description
START           Stream initialization with model info
CONTENT         Text content delta
TOOL_CALL       Tool/function invocation request
DONE            Stream complete with finish reason
ERROR           Error occurred during streaming

Finish Reasons

When a stream ends, the DONE chunk includes a finish reason explaining why the LLM stopped generating. This helps you decide what to do next: if the reason is TOOL_CALLS, for example, you need to execute the requested tool and feed the result back.

Reason          Description
STOP            Natural completion
LENGTH          Max tokens reached
TOOL_CALLS      LLM wants to call tools
CONTENT_FILTER  Content was filtered
ERROR           Error during generation
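As a concrete sketch of acting on finish reasons, the snippet below maps each reason to a next step. The local enum and helper are illustrative stand-ins that mirror the table; real code would switch on the value returned by the DONE chunk's getFinishReason() instead.

```java
// Illustrative only: a local enum mirroring the table above, plus a helper
// that picks the next step for each reason. Real code would switch on
// chunk.getFinishReason() from the DONE chunk instead.
public class FinishReasonDemo {
    enum FinishReason { STOP, LENGTH, TOOL_CALLS, CONTENT_FILTER, ERROR }

    static String nextStep(FinishReason reason) {
        return switch (reason) {
            case STOP           -> "deliver the response";             // natural completion
            case LENGTH         -> "warn: truncated at max tokens";    // consider a higher limit
            case TOOL_CALLS     -> "execute tools, feed results back"; // continue the loop
            case CONTENT_FILTER -> "tell the user content was filtered";
            case ERROR          -> "retry or surface the error";
        };
    }

    public static void main(String[] args) {
        System.out.println(nextStep(FinishReason.TOOL_CALLS));
    }
}
```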

Handler-Based Streaming

Callback pattern with full tool-call loop — ideal for UI integration:

llmClient.streamChatWithHandler(request, chunk -> {
    if (chunk.isContent()) {
        System.out.print(chunk.getContent());
    } else if (chunk.isToolCall()) {
        // Framework handles tool execution automatically
    } else if (chunk.isDone()) {
        System.out.println("\nFinish reason: " + chunk.getFinishReason());
    }
});

Convenience Methods

ChatChunk provides static factory methods so you can create chunks without calling constructors directly. These are useful when you build custom streaming pipelines or write tests that simulate LLM output.

// ChatChunk factory methods
ChatChunk.start(model, requestId);
ChatChunk.content("Hello", tokenCount, index);
ChatChunk.content("Hello");
ChatChunk.toolCall(toolCallObject);
ChatChunk.done(FinishReason.STOP, totalTokens);
ChatChunk.error("Something went wrong");

Which Mode to Use?

TnsAI offers three streaming modes at different levels of abstraction. Pick the simplest one that meets your needs.

Mode              Use When
Token Stream      Simple text display, CLI output
ChatChunk Stream  Need metadata (tokens, model), manual tool handling
Handler-Based     UI integration, automatic tool execution loop

Async Execution

The AsyncAgent interface (com.tnsai.agents.async.AsyncAgent) provides non-blocking chat operations with multiple consumption patterns.

Methods

AsyncAgent exposes several ways to consume responses. Choose based on whether you need simple text, typed events, or reactive backpressure control.

Method                       Return Type                  Description
chatAsync(message)           CompletableFuture<String>    Async chat, completes with full response
chatAsync(message, options)  CompletableFuture<String>    Async chat with ChatOptions
chatStream(message)          Stream<String>               Streaming tokens as a Java Stream
chatEventStream(message)     Stream<ChatEvent>            Typed event stream (tokens, tool calls, etc.)
chatPublisher(message)       Flow.Publisher<ChatEvent>    Reactive Streams publisher for backpressure-aware consumers
cancel()                     void                         Cancels any ongoing async operation
isProcessing()               boolean                      True if an async operation is in progress
getProgress()                double                       Execution progress (0.0 - 1.0)
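The lifecycle methods at the bottom of the table combine naturally: start a chat with chatAsync, then consult isProcessing() and getProgress() while it runs. The sketch below is self-contained; StubAgent is a stand-in for a real AsyncAgent implementation, modeling only the methods listed above.

```java
import java.util.concurrent.CompletableFuture;

// Self-contained sketch of progress polling. StubAgent is a stand-in for a
// real AsyncAgent; only methods from the table above are modeled.
public class ProgressDemo {
    static class StubAgent {
        private volatile double progress = 0.0;

        CompletableFuture<String> chatAsync(String message) {
            return CompletableFuture.supplyAsync(() -> {
                for (int i = 1; i <= 5; i++) {
                    progress = i / 5.0;   // simulate incremental generation
                }
                return "response to: " + message;
            });
        }

        boolean isProcessing() { return progress < 1.0; }
        double getProgress()   { return progress; }
    }

    public static void main(String[] args) {
        StubAgent agent = new StubAgent();
        CompletableFuture<String> future = agent.chatAsync("Summarize this doc");
        while (agent.isProcessing()) {
            Thread.onSpinWait();          // a UI would render getProgress() here
        }
        System.out.println(future.join());
    }
}
```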

CompletableFuture

The simplest async pattern. chatAsync returns a CompletableFuture that completes with the full response string once the LLM finishes generating. Use this when you do not need to show partial results to the user.

AsyncAgent agent = new MyAsyncAgent();

agent.chatAsync("Tell me about Java")
     .thenAccept(System.out::println)
     .exceptionally(e -> { e.printStackTrace(); return null; });

Token Stream

Returns a Stream<String> that emits each text token as it arrives. This lets you print tokens to the console (or a UI) incrementally instead of waiting for the full response.

agent.chatStream("Tell me a story")
     .forEach(System.out::print);
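Because chatStream returns an ordinary java.util.stream.Stream, the usual stream operations apply; for instance, tokens can be collected back into the full response. In this sketch Stream.of stands in for the agent call:

```java
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Stream.of stands in for agent.chatStream(...); any Stream<String> works.
public class JoinTokensDemo {
    public static void main(String[] args) {
        Stream<String> tokens = Stream.of("Once", " upon", " a", " time");
        String full = tokens.collect(Collectors.joining());
        System.out.println(full);  // prints "Once upon a time"
    }
}
```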

Typed Event Stream

ChatEvent subtypes distinguish tokens from tool calls and other events:

agent.chatEventStream("Complex task")
     .forEach(event -> {
         if (event instanceof ChatEvent.Token t) {
             System.out.print(t.content());
         } else if (event instanceof ChatEvent.ToolCall tc) {
             System.out.println("Calling tool: " + tc.toolName());
         }
     });

Reactive Publisher

For backpressure-aware consumers using java.util.concurrent.Flow:

agent.chatPublisher("Generate a report")
     .subscribe(new Flow.Subscriber<>() {
         private Flow.Subscription subscription;

         @Override
         public void onSubscribe(Flow.Subscription s) {
             this.subscription = s;
             s.request(1);
         }

         @Override
         public void onNext(ChatEvent event) {
             process(event);
             subscription.request(1);
         }

         @Override
         public void onError(Throwable t) { t.printStackTrace(); }

         @Override
         public void onComplete() { System.out.println("Done"); }
     });

Cancellation

You can cancel a running async operation at any time. This is useful for timeout handling or when the user navigates away from a page before the response finishes.

CompletableFuture<String> future = agent.chatAsync("Long running task");

// Cancel if still running
if (agent.isProcessing()) {
    agent.cancel();
}
