Judge Agent Pattern
TnsAI.Coordination provides a judge agent pattern for evaluating and selecting the best output from multiple agents. A judge applies a policy to score candidate outputs and pick a winner. Package: `com.tnsai.coordination.judge`.
Quick Start
```java
// Create a judge with LLM-based evaluation
JudgePolicy policy = new LLMJudgePolicy(llmClient, "Pick the most accurate and complete answer");
JudgeCoordinator judge = JudgeCoordinator.builder()
    .policy(policy)
    .build();

// Evaluate candidate outputs
List<CandidateOutput> candidates = List.of(
    new CandidateOutput("agent-1", "Paris is the capital of France."),
    new CandidateOutput("agent-2", "The capital of France is Paris, located on the Seine river."),
    new CandidateOutput("agent-3", "France's capital is Paris.")
);

JudgeResult result = judge.evaluate("What is the capital of France?", candidates);
System.out.println(result.getWinner().agentId()); // "agent-2"
System.out.println(result.getWinner().score());   // 0.95
System.out.println(result.getReasoning());        // LLM explanation
```

JudgePolicy SPI
A JudgePolicy defines how candidates are scored and ranked. TnsAI ships with two built-in policies (threshold-based and LLM-based), and you can register your own via SPI. All judge policies implement this interface:
```java
public interface JudgePolicy {
    JudgeResult evaluate(String task, List<CandidateOutput> candidates);
}
```

Register custom policies via `META-INF/services/com.tnsai.coordination.judge.JudgePolicy`.
ThresholdJudgePolicy
The simplest judge policy: it scores candidates on keyword overlap and length without calling an LLM, making it fast and free. Use it for automated testing or CI/CD pipelines where you just need a quick quality check.
```java
JudgePolicy policy = new ThresholdJudgePolicy(0.7); // minimum score threshold
JudgeResult result = policy.evaluate(task, candidates);
if (result.hasWinner()) {
    System.out.println("Winner: " + result.getWinner().agentId());
} else {
    System.out.println("No candidate met the threshold");
}
```

Best for: fast evaluation without LLM calls, CI/CD quality gates, automated testing.
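To make the keyword-overlap idea concrete, here is a rough sketch of such a score in plain Java. This is an illustrative approximation only; `ThresholdJudgePolicy`'s actual formula (which also weighs length) is not documented here:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class OverlapScore {
    // Illustrative score: fraction of the task's keywords that appear
    // in the candidate output. Not the library's exact formula.
    static double score(String task, String candidate) {
        Set<String> taskWords = tokenize(task);
        Set<String> candWords = tokenize(candidate);
        if (taskWords.isEmpty()) return 0.0;
        long hits = taskWords.stream().filter(candWords::contains).count();
        return (double) hits / taskWords.size();
    }

    private static Set<String> tokenize(String s) {
        return new HashSet<>(Arrays.asList(s.toLowerCase().split("\\W+")));
    }
}
```

A candidate that echoes every task keyword scores 1.0; one sharing none scores 0.0, which is how a fixed threshold like 0.7 can gate quality without any LLM call.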
LLMJudgePolicy
This policy sends all candidate outputs to an LLM and asks it to score each one against your evaluation criteria. It provides the highest quality judgments but incurs an LLM API call. The LLM returns a score for each candidate and a reasoning explanation.
```java
JudgePolicy policy = new LLMJudgePolicy(llmClient, "Evaluate for accuracy, completeness, and clarity");

// With custom scoring rubric
JudgePolicy detailed = LLMJudgePolicy.builder()
    .llm(llmClient)
    .criteria("Evaluate for accuracy, completeness, and clarity")
    .scoringScale(10)        // score 1-10 instead of 0.0-1.0
    .requireReasoning(true)  // LLM must explain each score
    .model("claude-sonnet-4")
    .build();
```

| Parameter | Default | Description |
|---|---|---|
| `llm` | required | LLMClient for evaluation |
| `criteria` | required | Evaluation criteria prompt |
| `scoringScale` | 10 | Maximum score value |
| `requireReasoning` | true | Whether the LLM must explain scores |
| `model` | (from llm) | Model to use for evaluation |
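Since `ScoredCandidate.score()` is always reported on a 0.0-1.0 range, a raw rubric score from a custom `scoringScale` has to be normalized. The mapping below is an assumption for illustration; the library's exact normalization formula is not documented here:

```java
class ScoreNormalizer {
    // Assumed mapping for illustration: a raw score on a 1..scale rubric
    // is projected linearly onto 0.0-1.0 (1 -> 0.0, scale -> 1.0).
    static double normalize(int raw, int scale) {
        if (scale < 2 || raw < 1 || raw > scale) {
            throw new IllegalArgumentException("raw score out of range");
        }
        return (double) (raw - 1) / (scale - 1);
    }
}
```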
JudgeCoordinator
The JudgeCoordinator is the main entry point for using the judge pattern. It either accepts pre-collected candidate outputs or runs agents itself, then passes the candidates to the policy for scoring.
```java
JudgeCoordinator judge = JudgeCoordinator.builder()
    .policy(policy)
    .timeout(Duration.ofSeconds(30))
    .build();

// Evaluate with pre-collected candidates
JudgeResult result = judge.evaluate(task, candidates);

// Or run agents and judge in one step
JudgeResult runResult = judge.runAndJudge(task, List.of(agent1, agent2, agent3));
```

JudgeResult
The JudgeResult contains the winner, the full ranking of all candidates, the judge's reasoning, and individual scores.
```java
JudgeResult result = judge.evaluate(task, candidates);
result.getWinner();    // ScoredCandidate with the highest score
result.hasWinner();    // true if at least one candidate scored
result.getRankings();  // All candidates sorted by score (descending)
result.getReasoning(); // Judge's explanation (from LLMJudgePolicy)
result.getScores();    // Map<String, Double> of agentId -> score
```

ScoredCandidate
Each candidate in the result is wrapped in a ScoredCandidate that includes the agent ID, the output text, the normalized score, and optional per-candidate reasoning.
```java
ScoredCandidate winner = result.getWinner();
winner.agentId();   // Agent that produced this output
winner.output();    // The candidate output text
winner.score();     // Normalized score (0.0-1.0)
winner.reasoning(); // Per-candidate reasoning (if available)
```

Integration with Topologies
The judge pattern is not a standalone feature; it is designed to combine with any Group Topology. Below are two common integration patterns.
Parallel Generation + Judge
A common pattern is to have a parallel team generate multiple candidate answers, then use a judge to select the best one. This gives you the speed of parallel generation with the quality assurance of evaluation.
```java
Team team = Team.builder()
    .formation(TeamFormation.PARALLEL)
    .addMember(agent1, TeamRole.MEMBER)
    .addMember(agent2, TeamRole.MEMBER)
    .addMember(agent3, TeamRole.MEMBER)
    .build();
team.start();
List<String> outputs = team.executeAll(task);

JudgeCoordinator judge = JudgeCoordinator.builder()
    .policy(new LLMJudgePolicy(llmClient, "Pick the best answer"))
    .build();

List<CandidateOutput> candidates = IntStream.range(0, outputs.size())
    .mapToObj(i -> new CandidateOutput("agent-" + i, outputs.get(i)))
    .toList();
JudgeResult result = judge.evaluate(task, candidates);
```

Iterative Refinement with Judge
For higher quality, run multiple rounds: generate candidates, judge them, feed the best back as context, and repeat. Each round improves on the previous best answer.
```java
String bestOutput = null;
for (int round = 0; round < 3; round++) {
    // Capture the previous best in an effectively final variable so the lambda can use it
    String previousBest = bestOutput;
    List<CandidateOutput> candidates = agents.stream()
        .map(a -> new CandidateOutput(a.getId(),
            a.chat(task + (previousBest != null ? "\nPrevious best: " + previousBest : ""))))
        .toList();
    JudgeResult result = judge.evaluate(task, candidates);
    bestOutput = result.getWinner().output();
}
```

Council and Voting
TnsAI.Coordination provides two complementary systems for group decision-making: `CouncilExecutor` for multi-model deliberation with peer review, and `ConsensusBuilder` / `GroupDecisionFramework` for agent voting.
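To show the voting idea in miniature, here is a plain-Java plurality vote. This is an illustrative sketch only and does not use the `ConsensusBuilder` or `GroupDecisionFramework` APIs, which are documented in their own sections:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PluralityVote {
    // Minimal plurality vote: the option backed by the most agents wins.
    static String winner(List<String> votes) {
        Map<String, Integer> tally = new HashMap<>();
        for (String v : votes) {
            tally.merge(v, 1, Integer::sum);
        }
        return tally.entrySet().stream()
            .max(Map.Entry.comparingByValue())
            .orElseThrow()
            .getKey();
    }
}
```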
Negotiation
TnsAI.Coordination provides a pluggable negotiation framework with four built-in protocols, configurable concession strategies, and a unified `NegotiationExecutor` that resolves the correct protocol from configuration.