
RAG Strategy SPI

TnsAI.Intelligence provides a pluggable Retrieval-Augmented Generation (RAG) framework with three built-in strategies and a composable pipeline. Package: `com.tnsai.intelligence.rag`.

RAGStrategy Interface

Every retrieval strategy implements this interface. Call `retrieve(query, topK)` with a user query and the number of results you want; the strategy returns its most relevant matches, ranked by score.

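import java.util.List;

public interface RAGStrategy {
    // Returns ranked results for the query; topK is the number of results requested.
    List<RetrievalResult> retrieve(String query, int topK);
    String name();
}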

Each strategy returns ranked RetrievalResult objects containing the retrieved text, a relevance score, and source metadata.

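import java.util.Map;

public record RetrievalResult(
    String content,               // the retrieved text
    double score,                 // relevance score; higher means more relevant
    Map<String, Object> metadata  // source metadata
) {}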
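Because this is an SPI, a custom strategy only has to supply the two methods above. The following is a minimal sketch, not part of TnsAI: `TermOverlapRAGStrategy` and its scoring rule are hypothetical, and the interface and record are redeclared locally (assuming the shapes shown above) so the example compiles on its own.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Local mirrors of the types shown above so the sketch compiles standalone;
// in a real project, import them from com.tnsai.intelligence.rag instead.
interface RAGStrategy {
    List<RetrievalResult> retrieve(String query, int topK);
    String name();
}

record RetrievalResult(String content, double score, Map<String, Object> metadata) {}

// Hypothetical strategy: scores a document by the fraction of query terms it contains.
final class TermOverlapRAGStrategy implements RAGStrategy {
    private final List<String> documents;

    TermOverlapRAGStrategy(List<String> documents) {
        this.documents = List.copyOf(documents);
    }

    @Override
    public List<RetrievalResult> retrieve(String query, int topK) {
        String[] terms = query.toLowerCase(Locale.ROOT).trim().split("\\s+");
        List<RetrievalResult> hits = new ArrayList<>();
        for (int i = 0; i < documents.size(); i++) {
            String doc = documents.get(i);
            String lower = doc.toLowerCase(Locale.ROOT);
            int matched = 0;
            for (String term : terms) {
                if (!term.isEmpty() && lower.contains(term)) matched++;
            }
            if (matched > 0) {
                double score = (double) matched / terms.length;
                // "docIndex" is an illustrative metadata key, not a TnsAI convention.
                hits.add(new RetrievalResult(doc, score, Map.<String, Object>of("docIndex", i)));
            }
        }
        // Highest score first; the sort is stable, so ties keep document order.
        hits.sort(Comparator.comparingDouble(RetrievalResult::score).reversed());
        return List.copyOf(hits.subList(0, Math.min(Math.max(topK, 0), hits.size())));
    }

    @Override
    public String name() {
        return "term-overlap";
    }
}
```

Anything that satisfies the interface can then be passed wherever a `RAGStrategy` is expected, which makes a simple in-memory implementation like this one handy as a test double.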

VectorRAGStrategy

Dense vector retrieval using embedding similarity. Best for semantic matching where exact keywords may not appear in the source documents.

RAGStrategy vectorRAG = VectorRAGStrategy.builder()
    .embeddingClient(embeddingClient)
    .vectorStore(vectorStore)
    .similarityThreshold(0.7)
    .build();

List<RetrievalResult> results = vectorRAG.retrieve("How do agents communicate?", 5);

| Parameter | Default | Description |
| --- | --- | --- |
| `embeddingClient` | required | Client for generating embeddings |
| `vectorStore` | required | Vector database for similarity search |
| `similarityThreshold` | 0.7 | Minimum cosine similarity to include |
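To make the threshold concrete, here is a rough sketch of cosine-similarity filtering. It is an illustration of the general technique rather than TnsAI's implementation: the class and method names are hypothetical, and treating the threshold as inclusive is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

final class CosineThresholdSketch {
    // Cosine similarity: dot(a, b) / (|a| * |b|); returns 0 if either vector has zero norm.
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) throw new IllegalArgumentException("dimension mismatch");
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) return 0.0;
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Returns the indices of candidates whose similarity to the query meets the threshold.
    // Inclusive comparison (>=) is an assumption about how "minimum" is interpreted.
    static List<Integer> aboveThreshold(double[] query, List<double[]> candidates, double threshold) {
        List<Integer> kept = new ArrayList<>();
        for (int i = 0; i < candidates.size(); i++) {
            if (cosine(query, candidates.get(i)) >= threshold) kept.add(i);
        }
        return kept;
    }
}
```

A higher threshold trades recall for precision: fewer candidates survive, but those that do are closer to the query in embedding space.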

KeywordRAGStrategy

Sparse retrieval using BM25 scoring. Best for queries with specific technical terms, identifiers, or exact phrases.

RAGStrategy keywordRAG = KeywordRAGStrategy.builder()
    .index(bm25Index)
    .build();

List<RetrievalResult> results = keywordRAG.retrieve("ContextCompactor interface", 5);
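For readers unfamiliar with BM25, the sketch below computes the score for a single document. It is a simplified illustration, not the code behind `KeywordRAGStrategy`: the tokenizer is naive, the IDF form is the non-negative variant used by Lucene, and `K1 = 1.2` and `B = 0.75` are common defaults assumed here because the source does not state the values TnsAI uses.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

final class Bm25Sketch {
    static final double K1 = 1.2;  // term-frequency saturation (common default, assumed)
    static final double B = 0.75;  // document-length normalization (common default, assumed)

    // Naive tokenizer: lowercase and split on non-word characters.
    static List<String> tokenize(String text) {
        return Arrays.stream(text.toLowerCase(Locale.ROOT).split("\\W+"))
                .filter(t -> !t.isEmpty())
                .toList();
    }

    // BM25 score of docs.get(docIndex) for the query, with corpus statistics taken from docs.
    static double score(String query, List<String> docs, int docIndex) {
        List<List<String>> tokenized = docs.stream().map(Bm25Sketch::tokenize).toList();
        double avgLen = tokenized.stream().mapToInt(List::size).average().orElse(0.0);
        List<String> doc = tokenized.get(docIndex);
        double total = 0.0;
        for (String term : tokenize(query)) {
            long tf = doc.stream().filter(term::equals).count();
            if (tf == 0) continue; // term absent from this document contributes nothing
            long docFreq = tokenized.stream().filter(d -> d.contains(term)).count();
            // Rare terms get a higher inverse document frequency.
            double idf = Math.log(1.0 + (docs.size() - docFreq + 0.5) / (docFreq + 0.5));
            // Longer-than-average documents are penalized.
            double lengthNorm = 1.0 - B + B * doc.size() / avgLen;
            total += idf * tf * (K1 + 1.0) / (tf + K1 * lengthNorm);
        }
        return total;
    }
}
```

Because rare terms dominate the score, an identifier such as `ContextCompactor` outweighs a common word like `interface`, which is why sparse retrieval suits exact technical queries.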

HybridRAGStrategy

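Combines vector and keyword strategies using Reciprocal Rank Fusion (RRF) to merge the two result lists. RRF scores each document by its rank in each list rather than by its raw score, so cosine similarities and BM25 scores never have to be put on a common scale, and a document ranked highly by either method can surface in the final list.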

RAGStrategy hybridRAG = HybridRAGStrategy.builder()
    .vectorStrategy(vectorRAG)
    .keywordStrategy(keywordRAG)
    .vectorWeight(0.6)
    .keywordWeight(0.4)
    .fusionK(60)          // RRF constant
    .build();

List<RetrievalResult> results = hybridRAG.retrieve("agent memory persistence", 10);

| Parameter | Default | Description |
| --- | --- | --- |
| `vectorStrategy` | required | Dense retrieval strategy |
| `keywordStrategy` | required | Sparse retrieval strategy |
| `vectorWeight` | 0.6 | Weight for vector results in fusion |
| `keywordWeight` | 0.4 | Weight for keyword results in fusion |
| `fusionK` | 60 | RRF smoothing constant (higher = more equal weighting) |
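Standard RRF gives each document a score of 1 / (k + rank) per list and sums those contributions. The source does not say exactly how TnsAI applies the two weights, so the sketch below assumes the simplest reading: each list's contribution is multiplied by its weight. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class WeightedRrfSketch {
    // Fuses two ranked lists of document ids (best first) into a single ranking.
    static List<String> fuse(List<String> vectorRanked, List<String> keywordRanked,
                             double vectorWeight, double keywordWeight, int fusionK) {
        Map<String, Double> scores = new HashMap<>();
        accumulate(scores, vectorRanked, vectorWeight, fusionK);
        accumulate(scores, keywordRanked, keywordWeight, fusionK);

        List<String> ids = new ArrayList<>(scores.keySet());
        ids.sort((a, b) -> {
            int byScore = Double.compare(scores.get(b), scores.get(a)); // descending score
            return byScore != 0 ? byScore : a.compareTo(b);             // ties: id order
        });
        return ids;
    }

    // Adds weight / (fusionK + rank) for every document in one ranked list.
    private static void accumulate(Map<String, Double> scores, List<String> ranked,
                                   double weight, int fusionK) {
        for (int i = 0; i < ranked.size(); i++) {
            int rank = i + 1; // RRF ranks are 1-based
            scores.merge(ranked.get(i), weight / (fusionK + rank), Double::sum);
        }
    }
}
```

With `fusionK = 60`, the gap between rank 1 and rank 2 is only 1/61 versus 1/62, so a document that appears in both lists usually beats one that tops a single list. Smaller values of `fusionK` make top ranks count for more.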

RAGPipeline

A pipeline wraps a retrieval strategy with optional query rewriting (to improve recall) and result reranking (to improve precision). This lets you build a complete retrieval system by composing simple, testable components.

RAGPipeline pipeline = RAGPipeline.builder()
    .strategy(hybridRAG)
    .queryRewriter(query -> expandAcronyms(query))
    .reranker((results, query) -> crossEncoderRerank(results, query))
    .maxResults(5)
    .build();

List<RetrievalResult> results = pipeline.execute("How does RRF fusion work?");
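The `expandAcronyms` helper in the example above is not defined on this page. One possible shape for it is sketched below; the glossary contents and class name are hypothetical, and a real rewriter would draw on your own domain vocabulary.

```java
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AcronymRewriterSketch {
    // Hypothetical glossary; replace with terms from your own domain.
    private static final Map<String, String> GLOSSARY = Map.of(
            "RRF", "Reciprocal Rank Fusion",
            "RAG", "Retrieval-Augmented Generation");

    // Matches standalone runs of two or more uppercase letters.
    private static final Pattern ACRONYM = Pattern.compile("\\b[A-Z]{2,}\\b");

    // Appends the expansion after each known acronym and keeps the original token,
    // so keyword retrieval can still match the short form.
    static String expandAcronyms(String query) {
        Matcher m = ACRONYM.matcher(query);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String expansion = GLOSSARY.get(m.group());
            String replacement = expansion == null
                    ? m.group()
                    : m.group() + " (" + expansion + ")";
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // The same helper exposed as the Function<String, String> shape a query rewriter uses.
    static final Function<String, String> REWRITER = AcronymRewriterSketch::expandAcronyms;
}
```

Keeping the acronym alongside its expansion helps both halves of a hybrid strategy: the expansion gives the embedding model more context, while the original token still matches documents that use the short form.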

Pipeline Stages

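The pipeline processes a query in up to three stages before a final top-K selection. Only the retrieval strategy is required; the query rewriter and reranker are optional, so you can start with just a strategy and add rewriting and reranking later for better results.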

User Query
    |
    v
Query Rewriter (optional) -- expand, rephrase, or decompose the query
    |
    v
RAGStrategy.retrieve() -- fetch candidates from one or more sources
    |
    v
Reranker (optional) -- re-score and re-order results
    |
    v
Top-K Selection -- return final results

| Stage | Interface | Description |
| --- | --- | --- |
| Query Rewriter | `Function<String, String>` | Transform the query before retrieval |
| Strategy | `RAGStrategy` | Core retrieval (vector, keyword, or hybrid) |
| Reranker | `BiFunction<List<RetrievalResult>, String, List<RetrievalResult>>` | Re-score results using a cross-encoder or other model |
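The flow in the diagram can be sketched with plain functional interfaces. This is an illustration of the stage ordering only, not the `RAGPipeline` source. Several details are assumptions: that the reranker receives the rewritten query, that more candidates are fetched than are finally returned (the hypothetical `candidateCount` parameter), and that absent stages are represented by `null`.

```java
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Function;

final class PipelineFlowSketch {
    // Runs the stages in diagram order; a null rewriter or reranker is skipped.
    // R stands in for RetrievalResult so the sketch has no external dependencies.
    static <R> List<R> execute(String query,
                               Function<String, String> queryRewriter,
                               BiFunction<String, Integer, List<R>> retrieve,
                               BiFunction<List<R>, String, List<R>> reranker,
                               int candidateCount,
                               int maxResults) {
        // Stage 1 (optional): rewrite the query to improve recall.
        String effectiveQuery = queryRewriter == null ? query : queryRewriter.apply(query);

        // Stage 2 (required): fetch candidates from the strategy.
        List<R> candidates = retrieve.apply(effectiveQuery, candidateCount);

        // Stage 3 (optional): re-score and re-order to improve precision.
        List<R> ordered = reranker == null ? candidates : reranker.apply(candidates, effectiveQuery);

        // Final step: keep only the top maxResults.
        return List.copyOf(ordered.subList(0, Math.min(maxResults, ordered.size())));
    }
}
```

Fetching more candidates than you return gives the reranker room to promote a result that the first-stage retriever ranked low.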

Integration with Agents

Once you have a RAG pipeline, you can wire it directly into an agent. The agent will automatically retrieve relevant documents before generating each response, so it can answer questions grounded in your data.

Agent agent = AgentBuilder.create()
    .model("claude-sonnet-4")
    .ragPipeline(pipeline)
    .build();

// The agent automatically retrieves relevant context before generating responses
String response = agent.chat("Explain the memory architecture");
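How the agent injects retrieved documents into the model call is not described here. Conceptually, grounding usually means placing the retrieved passages in the prompt ahead of the question. The sketch below shows that general idea only; the class name, prompt wording, and numbering format are all hypothetical and should not be read as TnsAI's actual prompt template.

```java
import java.util.List;

final class GroundedPromptSketch {
    // Builds a prompt that lists the retrieved passages before the user's question.
    static String buildPrompt(List<String> passages, String question) {
        StringBuilder prompt = new StringBuilder("Answer using only the context below.\n\nContext:\n");
        for (int i = 0; i < passages.size(); i++) {
            // Number each passage so the model can refer back to its sources.
            prompt.append('[').append(i + 1).append("] ").append(passages.get(i)).append('\n');
        }
        return prompt.append("\nQuestion: ").append(question).toString();
    }
}
```

In this picture, the passages would be the `content()` values of the results returned by the pipeline for the user's message.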

Choosing a Strategy

Pick the strategy that matches your data and query patterns. Hybrid is usually the safest default for production systems because it combines semantic understanding with exact term matching, at the cost of running two retrievals per query.

| Strategy | Strengths | Weaknesses | Best For |
| --- | --- | --- | --- |
| Vector | Semantic understanding, handles paraphrasing | Misses exact terms, requires embeddings | Natural language queries |
| Keyword | Fast, exact term matching, no embeddings needed | No semantic understanding | Technical docs, code search |
| Hybrid | Best overall recall, handles both semantic and lexical | Higher latency (two retrievals + fusion) | Production RAG systems |
