Agent-driven, idempotent, fan-out-safe security review pipeline (TNS-291). Takes a codebase, runs static matchers as a pre-filter, invokes an LLM agent on each candidate file, and produces structured ReviewFindings exportable as SARIF, JSON, or Markdown.

The framework's answer to: how do I run a continuous security review over my repo without hand-rolling the orchestration, idempotency, fan-out, and output format?

The com.tnsai.quality.security.harness package in tnsai-quality ships the primitives:

CodeReviewPipeline — orchestrator + builder
PipelineStage — SPI; one impl per scan / process / revalidate / enrich / export
MatcherSpi — pluggable static-analysis matchers (regex, AST, dependency-version, …)
ReviewAgentSpi — pluggable agent backends (default: ReviewAgent.claudeOpus(LLMClient))
PipelineStateStore — disk persistence with deepsec-shaped layout
FileRecord / ReviewFinding / AgentRun — canonical data records
Severity / ExportFormat / BuiltInMatchers

Inspired by Vercel-Labs deepsec; pattern adapted, code not ported.

Why a separate primitive

Three forces motivate the harness as a first-class layer:

Continuous security scan needs orchestration, not just a tool call. Wiring "walk repo → filter → call LLM → parse JSON → de-dup → SARIF" by hand for every consumer is friction that buries the use case. The harness ships it.
Static matchers are cheap pre-filters; agents are expensive deep checks. Sending every file through an LLM is wasteful and noisy. The two-stage scan + process shape keeps cost tractable: matchers seed candidates, agents adjudicate.
Cross-checking reduces false positives. A single agent's output has a non-trivial false-positive rate. The optional revalidate stage runs a second agent over the first agent's findings and drops the agent-only findings that the second agent rejects; matcher-only findings are always kept.

Quick start

import com.tnsai.quality.security.harness.*;
import com.tnsai.llm.LLMClient;
import java.nio.file.Path;

// llmClient is assumed to be an already-configured LLMClient instance.
CodeReviewPipeline pipeline = CodeReviewPipeline.builder()
    .projectRoot(Path.of("/work/myrepo"))
    .stateStore(new FileSystemPipelineStateStore(Path.of(".tnsai/security")))
    .matcher(BuiltInMatchers.sqlInjection())
    .matcher(BuiltInMatchers.commandInjection())
    .matcher(BuiltInMatchers.hardcodedSecrets())
    .agent(ReviewAgent.claudeOpus(llmClient))
    .stage(new ScanStage())
    .stage(new ProcessStage())
    .stage(new EnrichStage())
    .stage(new ExportStage(ExportFormat.SARIF))
    .build();

pipeline.run();   // executes every registered stage in order

After the run, look in .tnsai/security/<projectId>/:

project.json                 # project metadata
files/<path-hash>.json       # one FileRecord per flagged file
runs/<epoch>-<stage>.json    # stage-execution audit trail
reports/findings.sarif       # exporter output

Pipeline stages

Stage	What it does	Idempotent skip key
`scan`	Walks the repo, runs every `MatcherSpi` against every (filtered) file. Persists a `FileRecord` for files with at least one matcher hit.	`(path, contentHash, matcherSet)` — unchanged content + same matcher set ⇒ skip
`process`	Iterates flagged files, invokes the first registered agent on each, persists findings + an `AgentRun` per call.	`(agentId, agentVersion, inputHash)` — successful run with the same tuple ⇒ skip
`revalidate`	Runs the second registered agent (when present) over files that already have findings. Drops agent-only findings the cross-agent rejects; keeps matcher-only findings always.	Same `(agentId, agentVersion, inputHash)` rule applied to the cross-agent
`enrich`	Adds `cweName` + `cweUrl` to every finding's metadata using the built-in CWE catalog.	Already-enriched findings are skipped
`export`	Serialises every finding to SARIF / JSON / Markdown. Multiple `ExportStage(format)` instances can be registered.	Always overwrites the report
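
The skip rule for `process` can be pictured as a lookup over prior runs. The following is a minimal sketch, not the harness implementation: `RunKey` and `PriorRun` are hypothetical stand-ins, since the real `AgentRun` shape is not shown on this page.

```java
import java.util.List;

/** Hypothetical stand-ins; the real AgentRun record may look different. */
final class SkipKeySketch {

    /** The tuple the process stage is described as keying on. */
    record RunKey(String agentId, String agentVersion, String inputHash) {}

    /** A past agent invocation and whether it completed successfully. */
    record PriorRun(RunKey key, boolean succeeded) {}

    /** Skip only when a successful run with the identical tuple already exists. */
    static boolean shouldSkip(RunKey current, List<PriorRun> history) {
        return history.stream()
                .anyMatch(run -> run.succeeded() && run.key().equals(current));
    }
}
```

Under this reading, bumping the agent version or changing the file content produces a new tuple, so the file is reviewed again; a failed earlier run never suppresses a retry.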

Built-in matchers

Ten regex-based matchers covering common CWEs across Java / Python / JavaScript / TypeScript:

Matcher id	CWE	Default severity	Targets
`sql-injection`	CWE-89	HIGH	`Statement.execute*("..." + var)`, `String.format("SELECT ...", ...)`, Python f-string SQL
`path-traversal`	CWE-22	HIGH	`new File(input + ...)`, `Path.of(input + ...)`, literal `../` traversal
`command-injection`	CWE-78	CRITICAL	`Runtime.exec(input + ...)`, `subprocess(shell=True)`, `child_process.exec` with template literals
`hardcoded-secrets`	CWE-798	CRITICAL	AWS / GitHub / OpenAI / Anthropic / Google / Slack tokens, `password = "..."` assignments, and bearer tokens
`weak-crypto`	CWE-327	MEDIUM	`MessageDigest.getInstance("MD5")`, `Cipher.getInstance("DES")`, `hashlib.md5`
`cross-site-scripting`	CWE-79	HIGH	`.innerHTML = userVar`, `dangerouslySetInnerHTML`, `document.write`
`unsafe-deserialization`	CWE-502	CRITICAL	`ObjectInputStream`, `pickle.loads`, `yaml.load` (no `SafeLoader`)
`ssrf`	CWE-918	HIGH	HTTP client receiving `req.body.url` or `requests.get(request.json[...])`
`xxe`	CWE-611	HIGH	XML factories created without a `setFeature(...)` hardening call or an `IS_SUPPORTING_EXTERNAL_ENTITIES` setting
`eval-on-input`	CWE-95	CRITICAL	`eval(req.*)`, `exec(input(...))`, `new Function(var)`

BuiltInMatchers.all() returns the full list in stable order. Custom matchers extend RegexMatcher (regex-based) or implement MatcherSpi directly (AST / dependency-version / import-graph).
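
To give a feel for what a custom regex matcher involves, here is a self-contained sketch. It does not use the real `RegexMatcher` or `MatcherSpi` types, whose signatures are not shown on this page; `LineMatcher`, `InsecureRandomMatcher`, and the `insecure-random` id are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Illustrative only: a stand-in for the shape a regex matcher might take. */
final class CustomMatcherSketch {

    /** Hypothetical interface; the real MatcherSpi signature may differ. */
    interface LineMatcher {
        String id();
        List<Integer> matchingLines(String fileContent);
    }

    /** Flags lines that construct java.util.Random directly. */
    static final class InsecureRandomMatcher implements LineMatcher {
        private static final Pattern PATTERN = Pattern.compile("new\\s+Random\\s*\\(");

        @Override
        public String id() {
            return "insecure-random";
        }

        @Override
        public List<Integer> matchingLines(String fileContent) {
            List<Integer> hits = new ArrayList<>();
            String[] lines = fileContent.split("\\R", -1);
            for (int i = 0; i < lines.length; i++) {
                if (PATTERN.matcher(lines[i]).find()) {
                    hits.add(i + 1); // report 1-based line numbers
                }
            }
            return hits;
        }
    }
}
```

A matcher like this only seeds candidates; whether a hit is a real issue is left to the agent in the `process` stage.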

Agent backends

The default agent is ReviewAgent.claudeOpus(llmClient) — an instance of ReviewAgent (a concrete ReviewAgentSpi) backed by any tnsai-llm LLMClient. The agent prompts the LLM with the file path, candidate-matcher list, and file content, and asks for a JSON array of findings matching the ReviewFinding schema.

The output is parsed defensively: malformed JSON yields an empty list and a WARN log (not a throw), so one bad response never kills the stage. Markdown ```json ... ``` fences are stripped if the model ignores the "no fences" instruction.
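
The fence-stripping and fall-back-to-empty behaviour described above might look roughly like the sketch below. It is an assumption-laden illustration, not the harness code: `FenceStripSketch` and its methods are hypothetical, and the shape check stands in for a real JSON parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch of the fence-stripping step; real JSON parsing is out of scope here. */
final class FenceStripSketch {

    // Matches a response wrapped in a three-backtick fence, optionally tagged "json".
    private static final Pattern FENCE =
            Pattern.compile("^\\s*`{3}(?:json)?\\s*(.*?)\\s*`{3}\\s*$", Pattern.DOTALL);

    /** Returns the payload inside the fence, or the trimmed input when unfenced. */
    static String stripFences(String response) {
        Matcher m = FENCE.matcher(response);
        return m.matches() ? m.group(1).trim() : response.trim();
    }

    /** Anything that does not look like a JSON array is treated as "no findings". */
    static String arrayOrEmpty(String response) {
        String body = stripFences(response);
        boolean looksLikeArray = body.startsWith("[") && body.endsWith("]");
        return looksLikeArray ? body : "[]";
    }
}
```

A production version would hand the stripped body to a JSON parser and log a WARN on failure; the point here is only that a bad response degrades to an empty result instead of an exception.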

Factory	Id	Notes
`ReviewAgent.claudeOpus(LLMClient)`	`claude-opus`	Recommended default
`ReviewAgent.gpt(LLMClient)`	`gpt`	Useful as the second agent for `revalidate` cross-check
`new ReviewAgent(id, version, LLMClient)`	custom	For routing + version-pinning

The first registered agent is the process backend; the second (when present) is used by revalidate. Mix-and-match across providers is the intended pattern — the cross-check is most valuable when the two agents come from different model families.
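
The keep/drop rule that `revalidate` applies can be sketched as a simple filter. This is a sketch under stated assumptions: the `Origin` enum, the `Finding` record, and matching findings by id are all hypothetical, since the page does not say how the two agents' findings are correlated.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

/** Sketch of the revalidate rule; types and id-based matching are assumptions. */
final class RevalidateSketch {

    /** Where a finding came from (hypothetical field). */
    enum Origin { MATCHER, AGENT }

    record Finding(String id, Origin origin) {}

    /**
     * Keeps every matcher-only finding, and keeps an agent-only finding
     * only when the cross-check agent confirmed it.
     */
    static List<Finding> revalidate(List<Finding> findings, Set<String> confirmedIds) {
        return findings.stream()
                .filter(f -> f.origin() == Origin.MATCHER || confirmedIds.contains(f.id()))
                .collect(Collectors.toList());
    }
}
```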

State store + idempotency

FileSystemPipelineStateStore persists everything under <baseDir>/<projectId>/:

Atomic writes — temp file + Files.move(ATOMIC_MOVE) so a partial write never replaces a good one
Per-path locking — concurrent writers for the same FileRecord serialise; writers for different paths fan out
Path-hashed filenames — files/<sha256-of-path>.json; same path always lands in the same file regardless of OS encoding quirks
Crash-safe restart — interrupting the JVM mid-run leaves the store in a consistent state; the next invocation picks up where it left off
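
The atomic-write and path-hashing points above can be illustrated with plain `java.nio`. This is a minimal sketch, not the `FileSystemPipelineStateStore` source: `AtomicStoreSketch` and its method names are hypothetical, hashing the UTF-8 bytes of the relative path is an assumption, and per-path locking is left out.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Sketch of temp-file-plus-atomic-move persistence with path-hashed names. */
final class AtomicStoreSketch {

    /** Lower-case hex SHA-256 of the UTF-8 bytes of the given text. */
    static String sha256Hex(String text) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 should be available on every JVM", e);
        }
    }

    /** Writes json to <filesDir>/<sha256-of-path>.json via a temp file and an atomic move. */
    static Path writeAtomically(Path filesDir, String relativePath, String json) {
        try {
            Files.createDirectories(filesDir);
            Path target = filesDir.resolve(sha256Hex(relativePath) + ".json");
            // The temp file lives in the same directory so the move stays on one filesystem.
            Path temp = Files.createTempFile(filesDir, "record-", ".tmp");
            try {
                Files.writeString(temp, json, StandardCharsets.UTF_8);
                // Whether an existing target is replaced is platform-specific per the JDK docs.
                Files.move(temp, target, StandardCopyOption.ATOMIC_MOVE);
            } finally {
                Files.deleteIfExists(temp); // no-op after a successful move
            }
            return target;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the content is fully written before the move, a reader sees either the old record or the new one, never a half-written file.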

FileRecord carries:

path (relative)
contentHash (SHA-256 of the file)
candidateMatchers (ids that fired during scan)
findings (canonical, deduplicated by id)
runs (history of every agent invocation; cache key for the process stage)
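
As a rough picture of the record shape and the "deduplicated by id" rule, consider the sketch below. The field types, the omission of `runs`, and the choice that a later duplicate replaces the earlier one are all assumptions; the page does not specify the merge policy.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch of a FileRecord-like shape; types and merge policy are assumptions. */
final class FileRecordSketch {

    record Finding(String id, String title) {}

    /** Mirrors the listed fields, with run history omitted for brevity. */
    record FileRecord(String path, String contentHash,
                      List<String> candidateMatchers, List<Finding> findings) {}

    /**
     * Keeps one finding per id in first-seen order; when an id repeats,
     * the later finding replaces the earlier one in place.
     */
    static List<Finding> dedupById(List<Finding> findings) {
        Map<String, Finding> byId = new LinkedHashMap<>();
        for (Finding f : findings) {
            byId.put(f.id(), f);
        }
        return List.copyOf(byId.values());
    }
}
```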

Output formats

Format	When to use
`SARIF`	Default — feeds GitHub Code Scanning, Azure DevOps, GitLab SAST. The driver, rules, and result properties are populated for the consumer's UI.
`JSON`	TnsAI's canonical shape — the same `ReviewFinding` records the SDK uses. Best for in-house tooling.
`MARKDOWN`	Human reading — one section per file with severity, lines, description, evidence code blocks, mitigation, CWE reference link.

.stage(new ExportStage(ExportFormat.SARIF))
.stage(new ExportStage(ExportFormat.JSON))
.stage(new ExportStage(ExportFormat.MARKDOWN))

Stage ids encode the format (export-sarif, export-json, export-markdown), so multiple instances coexist.

SARIF severity mapping

`Severity`	SARIF level
`INFO`	`note`
`LOW`, `MEDIUM`	`warning`
`HIGH`, `CRITICAL`	`error`
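
The mapping in the table translates directly into a switch. In this sketch the local `Severity` enum is a stand-in that mirrors the five values listed above; it is not the harness's own type, and `sarifLevel` is a hypothetical helper name.

```java
/** Sketch of the Severity-to-SARIF-level mapping from the table above. */
final class SarifLevelSketch {

    /** Local stand-in mirroring the severities named on this page. */
    enum Severity { INFO, LOW, MEDIUM, HIGH, CRITICAL }

    /** Returns the SARIF result level for a severity. */
    static String sarifLevel(Severity severity) {
        return switch (severity) {
            case INFO -> "note";
            case LOW, MEDIUM -> "warning";
            case HIGH, CRITICAL -> "error";
        };
    }
}
```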

Cost containment (v1)

The process.maxFiles and process.maxFindings config keys cap how much work the process stage does in a single invocation:

.config("process.maxFiles", "200")
.config("process.maxFindings", "50")

Deeper integration with CostBudget — pipeline-wide USD cap with per-stage breakdown via the Hook<PreAction> enforcement layer — is a follow-up issue. The lightweight cap above keeps a single pipeline run bounded today.
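
One way to picture how two such caps bound a run is the loop below. The exact semantics are an assumption: the sketch stops before starting a new file once either the file count or the running finding count has reached its limit, and `ProcessCapSketch` and `processWithCaps` are hypothetical names.

```java
import java.util.List;
import java.util.function.ToIntFunction;

/** Sketch of cap enforcement; the real stage's stop conditions may differ. */
final class ProcessCapSketch {

    /**
     * Reviews files in order and returns how many were processed.
     * reviewFile returns the number of findings produced for one file.
     */
    static <F> int processWithCaps(List<F> files, ToIntFunction<F> reviewFile,
                                   int maxFiles, int maxFindings) {
        int processed = 0;
        int findings = 0;
        for (F file : files) {
            // Check both caps before spending an agent call on the next file.
            if (processed >= maxFiles || findings >= maxFindings) {
                break;
            }
            findings += reviewFile.applyAsInt(file);
            processed++;
        }
        return processed;
    }
}
```

Files left unprocessed when a cap trips would simply be picked up by a later invocation, which fits the idempotent skip rule of the process stage.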

Pairs with

Sandbox — agent invocations can run inside the framework's sandbox primitive when the consumer wraps the LLMClient accordingly. The harness itself is process-local; sandbox is the inner ring.
Cost Governance — full CostBudget integration is the in-flight piece (see "Cost containment" above).
Accountability — every AgentRun carries the agent id + version; downstream consumers can join on correlationId to merge harness audit with the rest of the agent activity stream.

What's not in v1 (deferred to follow-ups)

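Custom matcher DSL — the built-in catalog ships, and custom matchers can be written in Java against RegexMatcher or MatcherSpi; a declarative DSL for user-defined matchers is a v2.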
Multi-language full coverage — Java / Python / JS / TS / Go / Ruby / PHP are scanned; Rust / C++ / C# / Kotlin matchers come as additive extensions.
Auto-fix generation — findings include a mitigation suggestion, but a PR-applying agent is its own issue.
GitHub Action wrapper — the harness ships a CLI-shaped surface; turning it into a published Action is separate tooling.
CodeQL / Semgrep rule import — only the native catalog in v1; importing community rule sets is a v2.
Cloud / hosted variant — self-hosted only.
Full CostBudget Hook integration — see "Cost containment".

Code Review Harness