
Streaming

TnsAI supports three streaming modes for real-time token delivery from LLM providers.

Token Streaming

Returns text tokens as they are generated — simplest mode:

Stream<String> tokens = agent.streamChat("Explain relativity");
tokens.forEach(System.out::print);

ChatChunk Streaming

Returns typed chunks with metadata (token counts, finish reason, tool calls):

llmClient.streamChatWithSpec(request).forEach(chunk -> {
    switch (chunk.getType()) {
        case START -> System.out.println("Stream started: " + chunk.getModel());
        case CONTENT -> System.out.print(chunk.getContent());
        case TOOL_CALL -> handleToolCall(chunk.getToolCall().orElseThrow());
        case DONE -> System.out.println("\nTokens: " + chunk.getTokenCount());
        case ERROR -> System.err.println("Error: " + chunk.getContent());
    }
});

Chunk Types

Each ChatChunk has a type that tells you what kind of data it carries. Your code should handle each type to respond appropriately as the stream progresses.

Type            Description
START           Stream initialization with model info
CONTENT         Text content delta
TOOL_CALL       Tool/function invocation request
DONE            Stream complete with finish reason
ERROR           Error occurred during streaming

Finish Reasons

When a stream ends, the DONE chunk includes a finish reason explaining why the LLM stopped generating. This helps you decide what to do next: if the reason is TOOL_CALLS, for example, you need to execute the requested tool and feed the result back.

Reason          Description
STOP            Natural completion
LENGTH          Max tokens reached
TOOL_CALLS      LLM wants to call tools
CONTENT_FILTER  Content was filtered
ERROR           Error during generation
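As a concrete sketch of acting on finish reasons, the snippet below maps each reason to a next step. The local enum and helper are illustrative stand-ins that mirror the table; real code would switch on the value returned by the DONE chunk's getFinishReason() instead.

```java
// Illustrative only: a local enum mirroring the table above, plus a helper
// that picks the next step for each reason. Real code would switch on
// chunk.getFinishReason() from the DONE chunk instead.
public class FinishReasonDemo {
    enum FinishReason { STOP, LENGTH, TOOL_CALLS, CONTENT_FILTER, ERROR }

    static String nextStep(FinishReason reason) {
        return switch (reason) {
            case STOP           -> "deliver the response";             // natural completion
            case LENGTH         -> "warn: truncated at max tokens";    // consider a higher limit
            case TOOL_CALLS     -> "execute tools, feed results back"; // continue the loop
            case CONTENT_FILTER -> "tell the user content was filtered";
            case ERROR          -> "retry or surface the error";
        };
    }

    public static void main(String[] args) {
        System.out.println(nextStep(FinishReason.TOOL_CALLS));
    }
}
```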

Handler-Based Streaming

Callback pattern with full tool-call loop — ideal for UI integration:

llmClient.streamChatWithHandler(request, chunk -> {
    if (chunk.isContent()) {
        System.out.print(chunk.getContent());
    } else if (chunk.isToolCall()) {
        // Framework handles tool execution automatically
    } else if (chunk.isDone()) {
        System.out.println("\nFinish reason: " + chunk.getFinishReason());
    }
});

Convenience Methods

ChatChunk provides static factory methods so you can create chunks without calling constructors directly. These are useful when you build custom streaming pipelines or write tests that simulate LLM output.

// ChatChunk factory methods
ChatChunk.start(model, requestId);
ChatChunk.content("Hello", tokenCount, index);
ChatChunk.content("Hello");
ChatChunk.toolCall(toolCallObject);
ChatChunk.done(FinishReason.STOP, totalTokens);
ChatChunk.error("Something went wrong");

Which Mode to Use?

TnsAI offers three streaming modes at different levels of abstraction. Pick the simplest one that meets your needs.

Mode              Use When
Token Stream      Simple text display, CLI output
ChatChunk Stream  Need metadata (tokens, model), manual tool handling
Handler-Based     UI integration, automatic tool execution loop

Async Execution

The AsyncAgent interface (com.tnsai.agents.async.AsyncAgent) provides non-blocking chat operations with multiple consumption patterns.

Methods

AsyncAgent exposes several ways to consume responses. Choose based on whether you need simple text, typed events, or reactive backpressure control.

Method                       Return Type                  Description
chatAsync(message)           CompletableFuture<String>    Async chat, completes with full response
chatAsync(message, options)  CompletableFuture<String>    Async chat with ChatOptions
chatStream(message)          Stream<String>               Streaming tokens as a Java Stream
chatEventStream(message)     Stream<ChatEvent>            Typed event stream (tokens, tool calls, etc.)
chatPublisher(message)       Flow.Publisher<ChatEvent>    Reactive Streams publisher for backpressure-aware consumers
cancel()                     void                         Cancels any ongoing async operation
isProcessing()               boolean                      True if an async operation is in progress
getProgress()                double                       Execution progress (0.0 - 1.0)
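The lifecycle methods at the bottom of the table combine naturally: start a chat with chatAsync, then consult isProcessing() and getProgress() while it runs. The sketch below is self-contained; StubAgent is a stand-in for a real AsyncAgent implementation, modeling only the methods listed above.

```java
import java.util.concurrent.CompletableFuture;

// Self-contained sketch of progress polling. StubAgent is a stand-in for a
// real AsyncAgent; only methods from the table above are modeled.
public class ProgressDemo {
    static class StubAgent {
        private volatile double progress = 0.0;

        CompletableFuture<String> chatAsync(String message) {
            return CompletableFuture.supplyAsync(() -> {
                for (int i = 1; i <= 5; i++) {
                    progress = i / 5.0;   // simulate incremental generation
                }
                return "response to: " + message;
            });
        }

        boolean isProcessing() { return progress < 1.0; }
        double getProgress()   { return progress; }
    }

    public static void main(String[] args) {
        StubAgent agent = new StubAgent();
        CompletableFuture<String> future = agent.chatAsync("Summarize this doc");
        while (agent.isProcessing()) {
            Thread.onSpinWait();          // a UI would render getProgress() here
        }
        System.out.println(future.join());
    }
}
```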

CompletableFuture

The simplest async pattern. chatAsync returns a CompletableFuture that completes with the full response string once the LLM finishes generating. Use this when you do not need to show partial results to the user.

AsyncAgent agent = new MyAsyncAgent();

agent.chatAsync("Tell me about Java")
     .thenAccept(System.out::println)
     .exceptionally(e -> { e.printStackTrace(); return null; });

Token Stream

Returns a Stream<String> that emits each text token as it arrives. This lets you print tokens to the console (or a UI) incrementally instead of waiting for the full response.

agent.chatStream("Tell me a story")
     .forEach(System.out::print);
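Because chatStream returns an ordinary java.util.stream.Stream, the usual stream operations apply; for instance, tokens can be collected back into the full response. In this sketch Stream.of stands in for the agent call:

```java
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Stream.of stands in for agent.chatStream(...); any Stream<String> works.
public class JoinTokensDemo {
    public static void main(String[] args) {
        Stream<String> tokens = Stream.of("Once", " upon", " a", " time");
        String full = tokens.collect(Collectors.joining());
        System.out.println(full);  // prints "Once upon a time"
    }
}
```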

Typed Event Stream

ChatEvent subtypes distinguish tokens from tool calls and other events:

agent.chatEventStream("Complex task")
     .forEach(event -> {
         if (event instanceof ChatEvent.Token t) {
             System.out.print(t.content());
         } else if (event instanceof ChatEvent.ToolCall tc) {
             System.out.println("Calling tool: " + tc.toolName());
         }
     });

Reactive Publisher

For backpressure-aware consumers using java.util.concurrent.Flow:

agent.chatPublisher("Generate a report")
     .subscribe(new Flow.Subscriber<>() {
         private Flow.Subscription subscription;

         @Override
         public void onSubscribe(Flow.Subscription s) {
             this.subscription = s;
             s.request(1);
         }

         @Override
         public void onNext(ChatEvent event) {
             process(event);
             subscription.request(1);
         }

         @Override
         public void onError(Throwable t) { t.printStackTrace(); }

         @Override
         public void onComplete() { System.out.println("Done"); }
     });

Cancellation

You can cancel a running async operation at any time. This is useful for timeout handling or when the user navigates away from a page before the response finishes.

CompletableFuture<String> future = agent.chatAsync("Long running task");

// Cancel if still running
if (agent.isProcessing()) {
    agent.cancel();
}
