Judge Agent Pattern

TnsAI.Coordination provides a judge agent pattern for evaluating and selecting the best output from multiple agents. A judge applies a policy to score candidate outputs and pick a winner. Package: `com.tnsai.coordination.judge`.

Quick Start

// Create a judge with LLM-based evaluation
JudgePolicy policy = new LLMJudgePolicy(llmClient, "Pick the most accurate and complete answer");

JudgeCoordinator judge = JudgeCoordinator.builder()
    .policy(policy)
    .build();

// Evaluate candidate outputs
List<CandidateOutput> candidates = List.of(
    new CandidateOutput("agent-1", "Paris is the capital of France."),
    new CandidateOutput("agent-2", "The capital of France is Paris, located on the Seine river."),
    new CandidateOutput("agent-3", "France's capital is Paris.")
);

JudgeResult result = judge.evaluate("What is the capital of France?", candidates);
System.out.println(result.getWinner().agentId());    // "agent-2"
System.out.println(result.getWinner().score());       // 0.95
System.out.println(result.getReasoning());            // LLM explanation

JudgePolicy SPI

A JudgePolicy defines how candidates are scored and ranked. TnsAI ships with two built-in policies (threshold-based and LLM-based), and you can register your own via SPI. All judge policies implement this interface:

public interface JudgePolicy {
    JudgeResult evaluate(String task, List<CandidateOutput> candidates);
}

Register custom policies via META-INF/services/com.tnsai.coordination.judge.JudgePolicy.
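As a self-contained sketch of the SPI contract, the example below implements a toy custom policy. The `CandidateOutput`, `ScoredCandidate`, `JudgeResult`, and `JudgePolicy` definitions are local stand-ins so the sketch compiles on its own; in a real project you would import the real types from `com.tnsai.coordination.judge` instead, and `ShortestAnswerPolicy` is an illustrative heuristic, not a built-in policy.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Local stand-ins mirroring the library types (assumed shapes; in real code,
// import these from com.tnsai.coordination.judge and delete the stand-ins).
record CandidateOutput(String agentId, String output) {}
record ScoredCandidate(String agentId, String output, double score, String reasoning) {}
record JudgeResult(List<ScoredCandidate> rankings) {
    ScoredCandidate getWinner() { return rankings.get(0); }
    boolean hasWinner() { return !rankings.isEmpty(); }
}
interface JudgePolicy {
    JudgeResult evaluate(String task, List<CandidateOutput> candidates);
}

// A toy custom policy that prefers the shortest answer -- purely to show
// the shape of an SPI implementation, not a useful scoring strategy.
class ShortestAnswerPolicy implements JudgePolicy {
    @Override
    public JudgeResult evaluate(String task, List<CandidateOutput> candidates) {
        int maxLen = candidates.stream()
            .mapToInt(c -> c.output().length())
            .max().orElse(1);
        List<ScoredCandidate> ranked = candidates.stream()
            .map(c -> new ScoredCandidate(
                c.agentId(), c.output(),
                1.0 - (double) c.output().length() / (maxLen + 1),
                "shorter is better"))
            .sorted(Comparator.comparingDouble(ScoredCandidate::score).reversed())
            .collect(Collectors.toList());
        return new JudgeResult(ranked);
    }
}
```

With the implementation class listed in the `META-INF/services` file as described above, a `ServiceLoader` lookup of `JudgePolicy` would discover it alongside the built-in policies.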

ThresholdJudgePolicy

The simplest judge policy: it scores candidates on keyword overlap and length without calling an LLM, making it fast and free. Use it for automated testing or CI/CD pipelines where you just need a quick quality check.

JudgePolicy policy = new ThresholdJudgePolicy(0.7); // minimum score threshold

JudgeResult result = policy.evaluate(task, candidates);
if (result.hasWinner()) {
    System.out.println("Winner: " + result.getWinner().agentId());
} else {
    System.out.println("No candidate met the threshold");
}

Best for: fast evaluation without LLM calls, CI/CD quality gates, automated testing.
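The exact formula ThresholdJudgePolicy applies is internal to the library. As a rough illustration only, keyword-overlap scoring of the kind described above might look like the sketch below -- an assumption about the general idea, not the library's actual code.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Illustrative keyword-overlap scoring: the fraction of the task's words
// that also appear in the candidate output. ThresholdJudgePolicy's real
// formula (which also weighs length) is not documented here.
class KeywordOverlapScore {
    static double score(String task, String output) {
        Set<String> taskWords = tokens(task);
        Set<String> outWords = tokens(output);
        if (taskWords.isEmpty()) return 0.0;
        long overlap = taskWords.stream().filter(outWords::contains).count();
        return (double) overlap / taskWords.size();
    }

    // Lowercase and split on non-word characters.
    private static Set<String> tokens(String s) {
        return new HashSet<>(Arrays.asList(s.toLowerCase().split("\\W+")));
    }
}
```

Under this toy formula, an on-topic answer to "What is the capital of France?" clears a 0.7 threshold while an unrelated answer scores 0.0, which is the kind of cheap quality gate the policy is meant for.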

LLMJudgePolicy

This policy sends all candidate outputs to an LLM and asks it to score each one against your evaluation criteria. It provides the highest quality judgments but incurs an LLM API call. The LLM returns a score for each candidate and a reasoning explanation.

JudgePolicy policy = new LLMJudgePolicy(llmClient, "Evaluate for accuracy, completeness, and clarity");

// With custom scoring rubric
JudgePolicy detailed = LLMJudgePolicy.builder()
    .llm(llmClient)
    .criteria("Evaluate for accuracy, completeness, and clarity")
    .scoringScale(10)         // score 1-10 instead of 0.0-1.0
    .requireReasoning(true)   // LLM must explain each score
    .model("claude-sonnet-4")
    .build();
| Parameter | Default | Description |
| --- | --- | --- |
| `llm` | required | LLMClient for evaluation |
| `criteria` | required | Evaluation criteria prompt |
| `scoringScale` | 10 | Maximum score value |
| `requireReasoning` | true | Whether LLM must explain scores |
| `model` | (from llm) | Model to use for evaluation |
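LLMJudgePolicy's actual prompt is internal to the library, but the sketch below shows how the criteria, scoring scale, and candidates could plausibly be combined into a single judging prompt. `JudgePromptSketch` and its `build` method are hypothetical names for illustration only.

```java
import java.util.List;

// Illustrative sketch of the kind of prompt an LLM judge might assemble.
// The real prompt used by LLMJudgePolicy may differ substantially.
class JudgePromptSketch {
    static String build(String task, String criteria, int scale, List<String> candidates) {
        StringBuilder sb = new StringBuilder();
        sb.append("Task: ").append(task).append("\n");
        sb.append("Criteria: ").append(criteria).append("\n");
        sb.append("Score each candidate from 1 to ").append(scale)
          .append(" and explain each score.\n\n");
        for (int i = 0; i < candidates.size(); i++) {
            sb.append("Candidate ").append(i + 1).append(":\n")
              .append(candidates.get(i)).append("\n\n");
        }
        return sb.toString();
    }
}
```

This also shows why `scoringScale` and `requireReasoning` are prompt-level settings: they change what the LLM is asked to produce, not how the response is post-processed.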

JudgeCoordinator

The JudgeCoordinator is the main entry point for using the judge pattern. It either accepts pre-collected candidate outputs or runs agents itself, then passes the candidates to the policy for scoring.

JudgeCoordinator judge = JudgeCoordinator.builder()
    .policy(policy)
    .timeout(Duration.ofSeconds(30))
    .build();

// Evaluate with pre-collected candidates
JudgeResult result = judge.evaluate(task, candidates);

// Or run agents and judge in one step
JudgeResult result = judge.runAndJudge(task, List.of(agent1, agent2, agent3));

JudgeResult

The JudgeResult contains the winner, the full ranking of all candidates, the judge's reasoning, and individual scores.

JudgeResult result = judge.evaluate(task, candidates);

result.getWinner();           // ScoredCandidate with highest score
result.hasWinner();           // true if at least one candidate scored
result.getRankings();         // All candidates sorted by score (descending)
result.getReasoning();        // Judge's explanation (from LLMJudgePolicy)
result.getScores();           // Map<String, Double> of agentId -> score

ScoredCandidate

Each candidate in the result is wrapped in a ScoredCandidate that includes the agent ID, the output text, the normalized score, and optional per-candidate reasoning.

ScoredCandidate winner = result.getWinner();
winner.agentId();     // Agent that produced this output
winner.output();      // The candidate output text
winner.score();       // Normalized score (0.0-1.0)
winner.reasoning();   // Per-candidate reasoning (if available)

Integration with Topologies

The judge pattern is not a standalone feature; it is designed to combine with any Group Topology. Below are two common integration patterns.

Parallel Generation + Judge

A common pattern is to have a parallel team generate multiple candidate answers, then use a judge to select the best one. This gives you the speed of parallel generation with the quality assurance of evaluation.

Team team = Team.builder()
    .formation(TeamFormation.PARALLEL)
    .addMember(agent1, TeamRole.MEMBER)
    .addMember(agent2, TeamRole.MEMBER)
    .addMember(agent3, TeamRole.MEMBER)
    .build();

team.start();
List<String> outputs = team.executeAll(task);

JudgeCoordinator judge = JudgeCoordinator.builder()
    .policy(new LLMJudgePolicy(llmClient, "Pick the best answer"))
    .build();

List<CandidateOutput> candidates = IntStream.range(0, outputs.size())
    .mapToObj(i -> new CandidateOutput("agent-" + i, outputs.get(i)))
    .toList();

JudgeResult result = judge.evaluate(task, candidates);

Iterative Refinement with Judge

For higher quality, run multiple rounds: generate candidates, judge them, feed the best back as context, and repeat. Each round improves on the previous best answer.

String bestOutput = null;
for (int round = 0; round < 3; round++) {
    // Copy to an effectively final local so the lambda below can capture it
    // (Java forbids capturing the reassigned bestOutput directly).
    final String previousBest = bestOutput;
    List<CandidateOutput> candidates = agents.stream()
        .map(a -> new CandidateOutput(a.getId(),
            a.chat(previousBest != null ? task + "\nPrevious best: " + previousBest : task)))
        .toList();

    JudgeResult result = judge.evaluate(task, candidates);
    bestOutput = result.getWinner().output();
}
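A fixed round count wastes LLM calls once the answer is already good; a natural variant stops as soon as the winner's score clears a target. The sketch below is self-contained, so the judge and agents are stand-ins (a score function and `Function<String, String>` agents); in real code you would use JudgeCoordinator, and it assumes the 0.0-1.0 score scale described above.

```java
import java.util.List;
import java.util.function.Function;

// Refinement loop with an early exit once the best score is good enough.
// The judge is modeled as a plain scoring function for this sketch.
class EarlyExitRefinement {
    record Scored(String output, double score) {}

    static Scored refine(List<Function<String, String>> agents, String task,
                         Function<String, Double> judgeScore,
                         int maxRounds, double goodEnough) {
        Scored best = new Scored("", 0.0);
        for (int round = 0; round < maxRounds; round++) {
            // Feed the previous best back as context, as in the loop above.
            String prompt = best.output().isEmpty()
                ? task
                : task + "\nPrevious best: " + best.output();
            for (Function<String, String> agent : agents) {
                String out = agent.apply(prompt);
                double score = judgeScore.apply(out);
                if (score > best.score()) best = new Scored(out, score);
            }
            if (best.score() >= goodEnough) break;  // early exit: quality target met
        }
        return best;
    }
}
```

The same early-exit check can be layered onto the JudgeCoordinator loop by comparing `result.getWinner().score()` against a threshold before starting the next round.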
