Code Review Harness
Agent-driven, idempotent, fan-out-safe security review pipeline (TNS-291). Takes a codebase, runs static matchers as a pre-filter, invokes an LLM agent on each candidate file, and produces structured ReviewFindings exportable as SARIF, JSON, or Markdown.
It is the framework's answer to a recurring question: how do I run a continuous security review over my repo without hand-rolling the orchestration, idempotency, fan-out, and output format?
The com.tnsai.quality.security.harness package in tnsai-quality ships the primitives:
- CodeReviewPipeline: orchestrator + builder
- PipelineStage: SPI; one implementation per stage (scan / process / revalidate / enrich / export)
- MatcherSpi: pluggable static-analysis matchers (regex, AST, dependency-version, …)
- ReviewAgentSpi: pluggable agent backends (default: ReviewAgent.claudeOpus(LLMClient))
- PipelineStateStore: disk persistence with a deepsec-shaped layout
- FileRecord / ReviewFinding / AgentRun: canonical data records
- Severity / ExportFormat / BuiltInMatchers: supporting types (severity levels, export formats, the built-in matcher catalog)
Inspired by Vercel-Labs deepsec; pattern adapted, code not ported.
Why a separate primitive
Three forces motivate the harness as a first-class layer:
- Continuous security scan needs orchestration, not just a tool call. Wiring "walk repo → filter → call LLM → parse JSON → de-dup → SARIF" by hand for every consumer is friction that buries the use case. The harness ships it.
- Static matchers are cheap pre-filters; agents are expensive deep checks. Sending every file through an LLM is wasteful and noisy. The two-stage scan + process shape keeps cost tractable: matchers seed candidates, agents adjudicate.
- Cross-checking reduces false positives. A single agent's output has a non-trivial false-positive rate. The optional revalidate stage runs a second agent against the first agent's findings and drops agent-only findings that the second model does not confirm; matcher-only findings are always kept.
Quick start
```java
import com.tnsai.quality.security.harness.*;
import com.tnsai.llm.LLMClient;

import java.nio.file.Path;

// Assumes llmClient is an already-configured tnsai-llm LLMClient.
CodeReviewPipeline pipeline = CodeReviewPipeline.builder()
        .projectRoot(Path.of("/work/myrepo"))
        .stateStore(new FileSystemPipelineStateStore(Path.of(".tnsai/security")))
        // Static pre-filters: only files with at least one hit reach the agent.
        .matcher(BuiltInMatchers.sqlInjection())
        .matcher(BuiltInMatchers.commandInjection())
        .matcher(BuiltInMatchers.hardcodedSecrets())
        // The first registered agent backs the process stage.
        .agent(ReviewAgent.claudeOpus(llmClient))
        .stage(new ScanStage())
        .stage(new ProcessStage())
        .stage(new EnrichStage())
        .stage(new ExportStage(ExportFormat.SARIF))
        .build();

pipeline.run(); // executes every registered stage in order
```

After the run, look in `.tnsai/security/<projectId>/`:
```
project.json                # project metadata
files/<path-hash>.json      # one FileRecord per flagged file
runs/<epoch>-<stage>.json   # stage-execution audit trail
reports/findings.sarif      # exporter output
```

Pipeline stages
| Stage | What it does | Idempotent skip key |
|---|---|---|
| scan | Walks the repo, runs every MatcherSpi against every (filtered) file. Persists a FileRecord for files with at least one matcher hit. | (path, contentHash, matcherSet): unchanged content + same matcher set ⇒ skip |
| process | Iterates flagged files, invokes the first registered agent on each, persists findings + an AgentRun per call. | (agentId, agentVersion, inputHash): a successful run with the same tuple ⇒ skip |
| revalidate | Runs the second registered agent (when present) over files that already have findings. Drops agent-only findings the cross-agent rejects; always keeps matcher-only findings. | Same (agentId, agentVersion, inputHash) rule, applied to the cross-agent |
| enrich | Adds cweName + cweUrl to every finding's metadata using the built-in CWE catalog. | Already-enriched findings are skipped |
| export | Serialises every finding to SARIF / JSON / Markdown. Multiple ExportStage(format) instances can be registered. | None: always overwrites the report |
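As an illustration of the scan rule in the table, here is a minimal sketch of a skip-key check. ScanKey and ScanCache are hypothetical names, the cache lives in memory (the harness persists its state through PipelineStateStore), and content is hashed as UTF-8 text rather than raw file bytes for brevity.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashSet;
import java.util.HexFormat;
import java.util.Set;

// Hypothetical skip key for the scan stage: (path, contentHash, matcherSet).
record ScanKey(String path, String contentHash, Set<String> matcherSet) {}

// Hypothetical in-memory cache; the harness persists equivalent state on disk.
final class ScanCache {
    private final Set<ScanKey> seen = new HashSet<>();

    // Hex-encoded SHA-256 of the content (hashed as UTF-8 text for brevity).
    static String sha256(String content) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(content.getBytes(StandardCharsets.UTF_8));
            return HexFormat.of().formatHex(digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required on every JVM", e);
        }
    }

    // True when no earlier scan used the same path, content, and matcher set.
    // Set.copyOf makes the key independent of matcher order and later mutation.
    boolean needsScan(String path, String content, Set<String> matcherIds) {
        return seen.add(new ScanKey(path, sha256(content), Set.copyOf(matcherIds)));
    }
}
```

Because the key holds a set, reordering matcher registrations does not force a rescan; editing the file or adding or removing a matcher does.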
Built-in matchers
Ten regex-based matchers covering common CWEs across Java / Python / JavaScript / TypeScript:
| Matcher id | CWE | Default severity | Targets |
|---|---|---|---|
| sql-injection | CWE-89 | HIGH | Statement.execute*("..." + var), String.format("SELECT ...", ...), Python f-string SQL |
| path-traversal | CWE-22 | HIGH | new File(input + ...), Path.of(input + ...), literal ../ traversal |
| command-injection | CWE-78 | CRITICAL | Runtime.exec(input + ...), subprocess(shell=True), child_process.exec with template literals |
| hardcoded-secrets | CWE-798 | CRITICAL | AWS / GitHub / OpenAI / Anthropic / Google / Slack tokens + password = "..." + bearer |
| weak-crypto | CWE-327 | MEDIUM | MessageDigest.getInstance("MD5"), Cipher.getInstance("DES"), hashlib.md5 |
| cross-site-scripting | CWE-79 | HIGH | .innerHTML = userVar, dangerouslySetInnerHTML, document.write |
| unsafe-deserialization | CWE-502 | CRITICAL | ObjectInputStream, pickle.loads, yaml.load (no SafeLoader) |
| ssrf | CWE-918 | HIGH | HTTP client receiving req.body.url or requests.get(request.json[...]) |
| xxe | CWE-611 | HIGH | XML factories without setFeature(...) / IS_SUPPORTING_EXTERNAL_ENTITIES |
| eval-on-input | CWE-95 | CRITICAL | eval(req.*), exec(input(...)), new Function(var) |
BuiltInMatchers.all() returns the full list in stable order. Custom matchers extend RegexMatcher (regex-based) or implement MatcherSpi directly (AST / dependency-version / import-graph).
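A rough, self-contained sketch of what a regex-based matcher does: compile one pattern, test it line by line, and report hits with 1-based line numbers. LineMatcher, MatcherHit, and SimpleRegexMatcher are hypothetical stand-ins; the real RegexMatcher and MatcherSpi signatures are not shown on this page and may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stand-ins; the harness's MatcherSpi / RegexMatcher may look different.
record MatcherHit(String matcherId, int line, String snippet) {}

interface LineMatcher {
    String id();
    List<MatcherHit> scan(String content);
}

// Flags every line on which the pattern is found.
final class SimpleRegexMatcher implements LineMatcher {
    private final String id;
    private final Pattern pattern;

    SimpleRegexMatcher(String id, String regex) {
        this.id = id;
        this.pattern = Pattern.compile(regex);
    }

    @Override
    public String id() {
        return id;
    }

    @Override
    public List<MatcherHit> scan(String content) {
        List<MatcherHit> hits = new ArrayList<>();
        String[] lines = content.split("\\R", -1); // split on any line terminator
        for (int i = 0; i < lines.length; i++) {
            if (pattern.matcher(lines[i]).find()) {
                hits.add(new MatcherHit(id, i + 1, lines[i].strip()));
            }
        }
        return hits;
    }
}
```

For example, `new SimpleRegexMatcher("weak-md5", "MessageDigest\\.getInstance\\(\\s*\"MD5\"\\s*\\)")` flags MD5 digests much like the weak-crypto row above. A file with a non-empty hit list is the kind of candidate the scan stage persists as a FileRecord.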
Agent backends
The default agent is ReviewAgent.claudeOpus(llmClient) — an instance of ReviewAgent (a concrete ReviewAgentSpi) backed by any tnsai-llm LLMClient. The agent prompts the LLM with the file path, candidate-matcher list, and file content, and asks for a JSON array of findings matching the ReviewFinding schema.
The output is parsed defensively: malformed JSON yields an empty list and a WARN log (not a throw), so one bad response never kills the stage. Markdown ```json ... ``` fences are stripped if the model ignores the "no fences" instruction.
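A sketch of the defensive-parsing contract, under stated assumptions: the JSON parser is injected as a function (the harness's actual parser and finding type are not shown here), a response wrapped in a json code fence is unwrapped first, and any parser exception becomes an empty list plus a WARNING-level java.util.logging record. DefensiveParse is a hypothetical name.

```java
import java.util.List;
import java.util.function.Function;
import java.util.logging.Logger;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper illustrating the "never throw on a bad response" contract.
final class DefensiveParse {
    private static final Logger LOG = Logger.getLogger(DefensiveParse.class.getName());

    // A whole response wrapped in a code fence: three backticks, optional "json",
    // newline, body, optional newline, three backticks.
    private static final Pattern FENCED =
            Pattern.compile("\\s*`{3}(?:json)?\\s*\\n(.*?)\\n?`{3}\\s*", Pattern.DOTALL);

    static String stripFences(String raw) {
        Matcher m = FENCED.matcher(raw);
        return m.matches() ? m.group(1).strip() : raw.strip();
    }

    // Any parser failure yields an empty list and a WARNING log instead of an exception.
    static <T> List<T> parseOrEmpty(String raw, Function<String, List<T>> parser) {
        try {
            return parser.apply(stripFences(raw));
        } catch (RuntimeException e) {
            LOG.warning("Unparseable agent response; treating as no findings: " + e);
            return List.of();
        }
    }
}
```

Catching only RuntimeException keeps genuine JVM errors (such as OutOfMemoryError) visible while still absorbing whatever a JSON library throws on malformed input.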
| Factory | Id | Notes |
|---|---|---|
| ReviewAgent.claudeOpus(LLMClient) | claude-opus | Recommended default |
| ReviewAgent.gpt(LLMClient) | gpt | Useful as the second agent for the revalidate cross-check |
| new ReviewAgent(id, version, LLMClient) | custom | For routing + version-pinning |
The first registered agent is the process backend; the second (when present) is used by revalidate. Mix-and-match across providers is the intended pattern — the cross-check is most valuable when the two agents come from different model families.
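The selection rule and the revalidate filter can be sketched as follows. The Finding record, its agentOnly flag, and matching the cross-agent's verdicts by finding id are assumptions made for illustration; only the first-agent/second-agent split and the keep/drop policy come from this page.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical finding shape: agentOnly = raised by the agent with no matcher backing.
record Finding(String id, boolean agentOnly) {}

final class CrossCheck {
    // The first registered agent drives the process stage.
    static <A> A processAgent(List<A> agents) {
        if (agents.isEmpty()) throw new IllegalStateException("at least one agent is required");
        return agents.get(0);
    }

    // The second registered agent, when present, drives revalidate.
    static <A> Optional<A> revalidateAgent(List<A> agents) {
        return agents.size() > 1 ? Optional.of(agents.get(1)) : Optional.empty();
    }

    // Matcher-backed findings always survive; agent-only findings survive
    // only if the cross-agent confirmed them (matched by id, an assumption).
    static List<Finding> revalidate(List<Finding> findings, Set<String> confirmedIds) {
        return findings.stream()
                .filter(f -> !f.agentOnly() || confirmedIds.contains(f.id()))
                .toList();
    }
}
```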
State store + idempotency
FileSystemPipelineStateStore persists everything under `<baseDir>/<projectId>/`:
- Atomic writes: temp file + Files.move(ATOMIC_MOVE), so a partial write never replaces a good one
- Per-path locking: concurrent writers for the same FileRecord serialise; the rest fan out
- Path-hashed filenames: `files/<sha256-of-path>.json`; the same path always lands in the same file regardless of OS encoding quirks
- Crash-safe restart: interrupting the JVM mid-run leaves the store in a consistent state; the next invocation picks up where it left off
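A minimal sketch of the three write-side guarantees using only java.nio and java.util.concurrent. AtomicRecordWriter is a hypothetical name, not the store's real class. One caveat from the Files.move Javadoc: with ATOMIC_MOVE, whether an existing target is replaced or the move fails is implementation-specific; on POSIX file systems the move is a rename that replaces the target. The temp file is created in the same directory so the rename never crosses file systems.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical writer illustrating atomic writes, per-path locks, and hashed names.
final class AtomicRecordWriter {
    private final Path filesDir;
    // One lock per relative path; never evicted, which is acceptable for a sketch.
    private final ConcurrentHashMap<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    AtomicRecordWriter(Path filesDir) {
        this.filesDir = filesDir;
    }

    // files/<sha256-of-path>.json: the same relative path always maps to the same file.
    Path recordFile(String relativePath) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(relativePath.getBytes(StandardCharsets.UTF_8));
            return filesDir.resolve(HexFormat.of().formatHex(digest) + ".json");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    void write(String relativePath, String json) throws IOException {
        ReentrantLock lock = locks.computeIfAbsent(relativePath, p -> new ReentrantLock());
        lock.lock(); // writers for the same path serialise; other paths proceed in parallel
        try {
            Files.createDirectories(filesDir);
            Path target = recordFile(relativePath);
            Path tmp = Files.createTempFile(filesDir, "record-", ".tmp");
            try {
                Files.writeString(tmp, json, StandardCharsets.UTF_8);
                Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
            } finally {
                Files.deleteIfExists(tmp); // no-op after a successful move
            }
        } finally {
            lock.unlock();
        }
    }
}
```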
FileRecord carries:
- path (relative)
- contentHash (SHA-256 of the file)
- candidateMatchers (ids that fired during scan)
- findings (canonical, deduplicated by id)
- runs (history of every agent invocation; cache key for the process stage)
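A simplified sketch of the dedup-by-id rule for findings. SimpleFileRecord and SimpleFinding are hypothetical, cut-down shapes (the runs history is omitted), and letting a newer finding replace an older one with the same id is an assumed merge policy.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, cut-down shapes; the real records carry more fields (e.g. runs).
record SimpleFinding(String id, String severity, String description) {}

record SimpleFileRecord(String path, String contentHash,
                        List<String> candidateMatchers, List<SimpleFinding> findings) {

    // Returns a new record whose findings are deduplicated by id. A finding whose
    // id already exists replaces the old one in place (assumed merge policy).
    SimpleFileRecord withFindings(List<SimpleFinding> incoming) {
        Map<String, SimpleFinding> byId = new LinkedHashMap<>();
        for (SimpleFinding f : findings) byId.put(f.id(), f);
        for (SimpleFinding f : incoming) byId.put(f.id(), f);
        return new SimpleFileRecord(path, contentHash, candidateMatchers, List.copyOf(byId.values()));
    }
}
```

LinkedHashMap keeps first-seen order when a value is replaced, so report output stays stable across reruns.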
Output formats
| Format | When to use |
|---|---|
| SARIF | Default. Feeds GitHub Code Scanning, Azure DevOps, GitLab SAST. The driver, rules, and result properties are populated for the consumer's UI. |
| JSON | TnsAI's canonical shape: the same ReviewFinding records the SDK uses. Best for in-house tooling. |
| MARKDOWN | Human reading: one section per file with severity, lines, description, evidence code blocks, mitigation, CWE reference link. |
To emit several formats from one run, register one ExportStage per format:

```java
.stage(new ExportStage(ExportFormat.SARIF))
.stage(new ExportStage(ExportFormat.JSON))
.stage(new ExportStage(ExportFormat.MARKDOWN))
```

Stage ids encode the format (export-sarif, export-json, export-markdown), so multiple instances coexist.
SARIF severity mapping
| Severity | SARIF level |
|---|---|
| INFO | note |
| LOW, MEDIUM | warning |
| HIGH, CRITICAL | error |
Cost containment (v1)
process.maxFiles and process.maxFindings config keys cap the work the process stage does in a single invocation:
```java
.config("process.maxFiles", "200")
.config("process.maxFindings", "50")
```

Deeper integration with CostBudget (a pipeline-wide USD cap with a per-stage breakdown, enforced through the `Hook<PreAction>` layer) is a follow-up issue. The lightweight caps above keep a single pipeline run bounded today.
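A sketch of how the two caps could bound one invocation of the process loop. CappedProcess is hypothetical, and the semantics are assumptions: an absent key means no cap, and maxFindings stops new agent calls once the running total reaches it rather than truncating the last file's findings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical loop; the real ProcessStage may apply the caps differently.
final class CappedProcess {
    static <F> List<F> run(List<String> flaggedFiles, Map<String, String> config,
                           Function<String, List<F>> reviewFile) {
        // Absent keys mean "no cap" (assumption).
        int maxFiles = Integer.parseInt(
                config.getOrDefault("process.maxFiles", String.valueOf(Integer.MAX_VALUE)));
        int maxFindings = Integer.parseInt(
                config.getOrDefault("process.maxFindings", String.valueOf(Integer.MAX_VALUE)));

        List<F> findings = new ArrayList<>();
        int processed = 0;
        for (String file : flaggedFiles) {
            // Stop before starting another agent call once either cap is reached.
            if (processed >= maxFiles || findings.size() >= maxFindings) break;
            findings.addAll(reviewFile.apply(file)); // one agent call per file
            processed++;
        }
        return findings; // may exceed maxFindings by the last file's findings
    }
}
```

Files left unprocessed by a cap are simply picked up by the next invocation, since the process stage skips anything that already has a successful AgentRun for the same inputs.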
Pairs with
- Sandbox: agent invocations can run inside the framework's sandbox primitive when the consumer wraps the LLMClient accordingly. The harness itself is process-local; the sandbox is the inner ring.
- Cost Governance: full CostBudget integration is the in-flight piece (see "Cost containment" above).
- Accountability: every AgentRun carries the agent id + version; downstream consumers can join on correlationId to merge the harness audit with the rest of the agent activity stream.
What's not in v1 (deferred to follow-ups)
- Custom matcher DSL: the built-in catalog ships, and custom matchers can already be written in Java against RegexMatcher or MatcherSpi (see "Built-in matchers"); a dedicated DSL for user-defined matchers is v2 work.
- Multi-language full coverage: Java / Python / JS / TS / Go / Ruby / PHP are scanned; Rust / C++ / C# / Kotlin matchers come as additive extensions.
- Auto-fix generation: findings include a mitigation suggestion, but a PR-applying agent is its own issue.
- GitHub Action wrapper: the harness ships a CLI-shaped surface; turning it into a published Action is separate tooling.
- CodeQL / Semgrep rule import: only the native catalog in v1; importing community rule sets is v2 work.
- Cloud / hosted variant: self-hosted only.
- Full CostBudget Hook integration: see "Cost containment".
See also
- Sandbox: isolated execution primitive used as the inner ring for any per-agent code execution.
- Cost Governance: CostBudget shape; full integration is planned.
- Accountability: AgentLiabilityRecord for joining the harness audit with the rest of the agent timeline.