RAG Strategy SPI

TnsAI.Intelligence provides a pluggable Retrieval-Augmented Generation (RAG) framework with three built-in strategies and a composable pipeline. Package: `com.tnsai.intelligence.rag`.

RAGStrategy Interface

Every retrieval strategy implements this interface. Call `retrieve(query, topK)` with a user query and the number of results you want; the strategy returns the most relevant documents it finds, ranked by relevance.

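public interface RAGStrategy {
    // Returns the results for the query, ranked by relevance; topK is the number requested.
    List<RetrievalResult> retrieve(String query, int topK);
    // Name identifying this strategy.
    String name();
}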

Each strategy returns ranked RetrievalResult objects containing the retrieved text, a relevance score, and source metadata.

public record RetrievalResult(
    String content,
    double score,
    Map<String, Object> metadata
) {}
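
Because the contract is only two methods, a custom strategy is small. The following is a minimal sketch, not part of the library: `TermOverlapRAGStrategy` and its scoring rule are hypothetical, and the interface and record are redeclared locally only so the example compiles on its own (in a real project you would import them from `com.tnsai.intelligence.rag`).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Local copies of the types shown above, so this sketch is self-contained.
interface RAGStrategy {
    List<RetrievalResult> retrieve(String query, int topK);
    String name();
}

record RetrievalResult(String content, double score, Map<String, Object> metadata) {}

// Hypothetical strategy: scores a document by the fraction of query terms it contains.
final class TermOverlapRAGStrategy implements RAGStrategy {
    private final List<String> documents;

    TermOverlapRAGStrategy(List<String> documents) {
        this.documents = List.copyOf(documents);
    }

    @Override
    public List<RetrievalResult> retrieve(String query, int topK) {
        String[] terms = query.trim().toLowerCase(Locale.ROOT).split("\\s+");
        List<RetrievalResult> hits = new ArrayList<>();
        for (int i = 0; i < documents.size(); i++) {
            String text = documents.get(i).toLowerCase(Locale.ROOT);
            int matched = 0;
            for (String term : terms) {
                if (!term.isEmpty() && text.contains(term)) matched++;
            }
            if (matched > 0) {
                double score = (double) matched / terms.length;
                hits.add(new RetrievalResult(documents.get(i), score, Map.of("docIndex", i)));
            }
        }
        // Highest score first, then keep at most topK results.
        hits.sort(Comparator.comparingDouble(RetrievalResult::score).reversed());
        return hits.subList(0, Math.min(Math.max(topK, 0), hits.size()));
    }

    @Override
    public String name() {
        return "term-overlap";
    }
}
```

A strategy written this way can be passed anywhere a `RAGStrategy` is accepted, which also makes it a convenient stand-in for tests.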

VectorRAGStrategy

Dense vector retrieval using embedding similarity. Best for semantic matching where exact keywords may not appear in the source documents.

RAGStrategy vectorRAG = VectorRAGStrategy.builder()
    .embeddingClient(embeddingClient)
    .vectorStore(vectorStore)
    .similarityThreshold(0.7)
    .build();

List<RetrievalResult> results = vectorRAG.retrieve("How do agents communicate?", 5);

Parameter	Default	Description
`embeddingClient`	required	Client for generating embeddings
`vectorStore`	required	Vector database for similarity search
`similarityThreshold`	0.7	Minimum cosine similarity to include
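
To make the effect of `similarityThreshold` concrete, here is a small sketch of cosine similarity with a cutoff. It is an illustration under assumptions, not the library's implementation: `CosineFilterSketch`, `Scored`, and the inclusive `>=` comparison are all hypothetical, and a real vector store would normally run this search internally.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class CosineFilterSketch {
    record Scored(int docIndex, double similarity) {}

    // Cosine similarity: dot product divided by the product of the vector norms.
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) throw new IllegalArgumentException("dimension mismatch");
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) return 0.0; // a zero vector has no direction
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Keeps documents at or above the threshold (assumed inclusive), best first, at most topK.
    static List<Scored> search(double[] query, List<double[]> docs, double threshold, int topK) {
        List<Scored> kept = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            double sim = cosine(query, docs.get(i));
            if (sim >= threshold) kept.add(new Scored(i, sim));
        }
        kept.sort(Comparator.comparingDouble(Scored::similarity).reversed());
        return kept.subList(0, Math.min(topK, kept.size()));
    }
}
```

One consequence worth noting: with a threshold in place, a query can return fewer than `topK` results, or none, when few documents are similar enough.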

KeywordRAGStrategy

Sparse retrieval using BM25 scoring. Best for queries with specific technical terms, identifiers, or exact phrases.

RAGStrategy keywordRAG = KeywordRAGStrategy.builder()
    .index(bm25Index)
    .build();

List<RetrievalResult> results = keywordRAG.retrieve("ContextCompactor interface", 5);
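
For readers unfamiliar with BM25, the sketch below scores a tiny corpus with the standard formula. It is not the library's index: `Bm25Sketch` is hypothetical, and the tokenizer, the `k1 = 1.2` and `b = 0.75` constants, and the smoothed IDF variant are common defaults assumed here rather than documented TnsAI settings.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

final class Bm25Sketch {
    static final double K1 = 1.2;  // term-frequency saturation (assumed default)
    static final double B = 0.75;  // document-length normalization (assumed default)

    static List<String> tokenize(String text) {
        return Arrays.stream(text.toLowerCase(Locale.ROOT).split("\\W+"))
                .filter(t -> !t.isEmpty())
                .toList();
    }

    // Returns one BM25 score per corpus document, in corpus order.
    static double[] score(String query, List<String> corpus) {
        List<List<String>> docs = corpus.stream().map(Bm25Sketch::tokenize).toList();
        double avgLen = docs.stream().mapToInt(List::size).average().orElse(0.0);
        double[] scores = new double[docs.size()];
        for (String term : tokenize(query)) {
            long docFreq = docs.stream().filter(d -> d.contains(term)).count();
            if (docFreq == 0) continue; // term appears nowhere, contributes nothing
            // Rare terms get a higher inverse document frequency.
            double idf = Math.log(1.0 + (docs.size() - docFreq + 0.5) / (docFreq + 0.5));
            for (int i = 0; i < docs.size(); i++) {
                long tf = docs.get(i).stream().filter(term::equals).count();
                double lengthNorm = 1.0 - B + B * docs.get(i).size() / avgLen;
                scores[i] += idf * tf * (K1 + 1.0) / (tf + K1 * lengthNorm);
            }
        }
        return scores;
    }
}
```

Because scoring depends on exact token matches, an identifier such as `ContextCompactor` only scores in documents that contain that literal token, which is why this approach suits queries with specific technical terms.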

HybridRAGStrategy

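Combines vector and keyword strategies using Reciprocal Rank Fusion (RRF) to merge result lists. RRF works on ranks rather than raw scores, so the two lists can be merged even though cosine similarity and BM25 use different scales, and a document that ranks well in either list can surface in the final result.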

RAGStrategy hybridRAG = HybridRAGStrategy.builder()
    .vectorStrategy(vectorRAG)
    .keywordStrategy(keywordRAG)
    .vectorWeight(0.6)
    .keywordWeight(0.4)
    .fusionK(60)          // RRF constant
    .build();

List<RetrievalResult> results = hybridRAG.retrieve("agent memory persistence", 10);

Parameter	Default	Description
`vectorStrategy`	required	Dense retrieval strategy
`keywordStrategy`	required	Sparse retrieval strategy
`vectorWeight`	0.6	Weight for vector results in fusion
`keywordWeight`	0.4	Weight for keyword results in fusion
`fusionK`	60	RRF smoothing constant (higher values shrink the advantage of top-ranked results over lower-ranked ones)
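
The fusion step can be sketched as follows. In plain RRF each document scores `1 / (k + rank)` per list, with ranks starting at 1, and the per-list scores are summed. Multiplying each term by the list's weight is an assumption about how `vectorWeight` and `keywordWeight` are applied; `WeightedRrfSketch` is hypothetical, and documents are represented by string IDs for brevity.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class WeightedRrfSketch {
    // Merges two ranked ID lists (best first) and returns the fused top-K IDs.
    static List<String> fuse(List<String> vectorRanked, List<String> keywordRanked,
                             double vectorWeight, double keywordWeight,
                             int fusionK, int topK) {
        Map<String, Double> fused = new LinkedHashMap<>();
        accumulate(fused, vectorRanked, vectorWeight, fusionK);
        accumulate(fused, keywordRanked, keywordWeight, fusionK);
        return fused.entrySet().stream()
                .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
                .limit(topK)
                .map(Map.Entry::getKey)
                .toList();
    }

    // Adds weight / (fusionK + rank) for every document in one ranked list.
    private static void accumulate(Map<String, Double> fused, List<String> ranked,
                                   double weight, int fusionK) {
        for (int i = 0; i < ranked.size(); i++) {
            int rank = i + 1; // ranks are 1-based
            fused.merge(ranked.get(i), weight / (fusionK + rank), Double::sum);
        }
    }
}
```

With `fusionK = 60`, rank 1 contributes 1/61 and rank 10 contributes 1/70, so positions within a list matter only mildly and appearing in both lists matters more. With a small constant such as 1, the same ranks contribute 1/2 and 1/11, which makes the top positions dominate.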

RAGPipeline

A pipeline wraps a retrieval strategy with optional query rewriting (to improve recall) and result reranking (to improve precision). This lets you build a complete retrieval system by composing simple, testable components.

RAGPipeline pipeline = RAGPipeline.builder()
    .strategy(hybridRAG)
    .queryRewriter(query -> expandAcronyms(query))                      // expandAcronyms: a helper you supply
    .reranker((results, query) -> crossEncoderRerank(results, query))   // crossEncoderRerank: a helper you supply
    .maxResults(5)
    .build();

List<RetrievalResult> results = pipeline.execute("How does RRF fusion work?");

Pipeline Stages

The pipeline processes a query in three stages, followed by a final top-K cut. Only the retrieval strategy is required -- you can run a strategy on its own, or add query rewriting and reranking around it for better results.

User Query
    |
    v
Query Rewriter (optional) -- expand, rephrase, or decompose the query
    |
    v
RAGStrategy.retrieve() -- fetch candidates from one or more sources
    |
    v
Reranker (optional) -- re-score and re-order results
    |
    v
Top-K Selection -- return final results

Stage	Interface	Description
Query Rewriter	`Function<String, String>`	Transform the query before retrieval
Strategy	`RAGStrategy`	Core retrieval (vector, keyword, or hybrid)
Reranker	`BiFunction<List<RetrievalResult>, String, List<RetrievalResult>>`	Re-score results using a cross-encoder or other model
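
The flow in the diagram and table above can be sketched as one method. This is an illustration under assumptions, not the `RAGPipeline` source: `PipelineSketch` and the `candidateCount` parameter are hypothetical, the types are redeclared as nested types so the example compiles on its own, and passing the rewritten query (rather than the original) to the reranker is a guess.

```java
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.Function;

final class PipelineSketch {
    // Nested copies of the documented types, so this sketch is self-contained.
    record RetrievalResult(String content, double score, Map<String, Object> metadata) {}

    interface RAGStrategy {
        List<RetrievalResult> retrieve(String query, int topK);
        String name();
    }

    static List<RetrievalResult> execute(
            String query,
            Function<String, String> queryRewriter, // optional, may be null
            RAGStrategy strategy,                    // required
            BiFunction<List<RetrievalResult>, String, List<RetrievalResult>> reranker, // optional, may be null
            int candidateCount,                      // hypothetical: how many candidates to fetch
            int maxResults) {
        // Stage 1: rewrite the query if a rewriter is configured.
        String effectiveQuery = queryRewriter == null ? query : queryRewriter.apply(query);
        // Stage 2: fetch candidates from the strategy.
        List<RetrievalResult> candidates = strategy.retrieve(effectiveQuery, candidateCount);
        // Stage 3: re-order candidates if a reranker is configured.
        List<RetrievalResult> ordered =
                reranker == null ? candidates : reranker.apply(candidates, effectiveQuery);
        // Final cut: keep at most maxResults.
        return ordered.stream().limit(maxResults).toList();
    }
}
```

Fetching more candidates than `maxResults` gives the reranker room to promote results that the strategy ranked lower, at the cost of scoring more documents.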

Integration with Agents

Once you have a RAG pipeline, you can wire it directly into an agent. The agent will automatically retrieve relevant documents before generating each response, so it can answer questions grounded in your data.

Agent agent = AgentBuilder.create()
    .model("claude-sonnet-4")
    .ragPipeline(pipeline)
    .build();

// The agent automatically retrieves relevant context before generating responses
String response = agent.chat("Explain the memory architecture");
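
How the agent injects retrieved context is not specified here. As a rough mental model only, grounding usually means placing the retrieved passages in the prompt ahead of the user's question. The sketch below shows that idea; `GroundedPromptSketch` and the prompt wording are hypothetical and do not reflect the agent's actual prompt format.

```java
import java.util.List;

final class GroundedPromptSketch {
    // Builds a prompt that lists numbered passages before the question.
    static String build(String question, List<String> passages) {
        StringBuilder prompt =
                new StringBuilder("Answer using only the context below.\n\nContext:\n");
        for (int i = 0; i < passages.size(); i++) {
            prompt.append('[').append(i + 1).append("] ")
                  .append(passages.get(i)).append('\n');
        }
        return prompt.append("\nQuestion: ").append(question).toString();
    }
}
```

Numbering the passages is a common choice because it lets the model's answer point back to a specific source.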

Choosing a Strategy

Pick the strategy that matches your data and query patterns. Hybrid is a sensible default for most production systems because it combines semantic understanding with exact term matching, at the cost of running two retrievals per query.

Strategy	Strengths	Weaknesses	Best For
Vector	Semantic understanding, handles paraphrasing	Can miss exact terms, requires embeddings	Natural language queries
Keyword	Fast, exact term matching, no embeddings needed	No semantic understanding	Technical docs, code search
Hybrid	Best overall recall, handles both semantic and lexical matches	Higher latency (two retrievals + fusion)	Mixed query workloads, production RAG systems
