Code Review Harness

Agent-driven, idempotent, fan-out-safe security review pipeline (TNS-291). Takes a codebase, runs static matchers as a pre-filter, invokes an LLM agent on each candidate file, and produces structured ReviewFindings exportable as SARIF, JSON, or Markdown.

It is the framework's answer to the question: how do I run a continuous security review over my repo without hand-rolling the orchestration, idempotency, fan-out, and output formats?

The com.tnsai.quality.security.harness package in tnsai-quality ships the primitives:

  • CodeReviewPipeline — orchestrator + builder
  • PipelineStage — SPI; one impl per scan / process / revalidate / enrich / export
  • MatcherSpi — pluggable static-analysis matchers (regex, AST, dependency-version, …)
  • ReviewAgentSpi — pluggable agent backends (default: ReviewAgent.claudeOpus(LLMClient))
  • PipelineStateStore — disk persistence with deepsec-shaped layout
  • FileRecord / ReviewFinding / AgentRun — canonical data records
  • Severity / ExportFormat / BuiltInMatchers

Inspired by Vercel-Labs deepsec; pattern adapted, code not ported.

Why a separate primitive

Three forces motivate the harness as a first-class layer:

  1. Continuous security scanning needs orchestration, not just a tool call. Wiring "walk repo → filter → call LLM → parse JSON → de-dup → SARIF" by hand for every consumer adds friction that buries the use case. The harness ships that wiring.
  2. Static matchers are cheap pre-filters; agents are expensive deep checks. Sending every file through an LLM is wasteful and noisy. The two-stage scan + process shape keeps cost tractable: matchers seed candidates, agents adjudicate.
  3. Cross-checking reduces false positives. A single agent's output has a non-trivial false-positive rate. The optional revalidate stage runs a second agent against the first agent's findings and keeps only the ones a different model also agrees with.

Quick start

import java.nio.file.Path;

import com.tnsai.llm.LLMClient;
import com.tnsai.quality.security.harness.*;

// Assumes llmClient is an already-configured LLMClient from tnsai-llm.

CodeReviewPipeline pipeline = CodeReviewPipeline.builder()
    .projectRoot(Path.of("/work/myrepo"))
    .stateStore(new FileSystemPipelineStateStore(Path.of(".tnsai/security")))
    .matcher(BuiltInMatchers.sqlInjection())
    .matcher(BuiltInMatchers.commandInjection())
    .matcher(BuiltInMatchers.hardcodedSecrets())
    .agent(ReviewAgent.claudeOpus(llmClient))
    .stage(new ScanStage())
    .stage(new ProcessStage())
    .stage(new EnrichStage())
    .stage(new ExportStage(ExportFormat.SARIF))
    .build();

pipeline.run();   // executes every registered stage in order

After the run, look in .tnsai/security/<projectId>/:

project.json                 # project metadata
files/<path-hash>.json       # one FileRecord per flagged file
runs/<epoch>-<stage>.json    # stage-execution audit trail
reports/findings.sarif       # exporter output

Pipeline stages

| Stage | What it does | Idempotent skip key |
| --- | --- | --- |
| scan | Walks the repo and runs every MatcherSpi against every (filtered) file. Persists a FileRecord for each file with at least one matcher hit. | (path, contentHash, matcherSet): unchanged content + same matcher set ⇒ skip |
| process | Iterates flagged files, invokes the first registered agent on each, and persists findings + an AgentRun per call. | (agentId, agentVersion, inputHash): successful run with the same tuple ⇒ skip |
| revalidate | Runs the second registered agent (when present) over files that already have findings. Drops agent-only findings the cross-agent rejects; always keeps matcher-only findings. | Same (agentId, agentVersion, inputHash) rule, applied to the cross-agent |
| enrich | Adds cweName + cweUrl to every finding's metadata using the built-in CWE catalog. | Already-enriched findings are skipped |
| export | Serialises every finding to SARIF / JSON / Markdown. Multiple ExportStage(format) instances can be registered. | Always overwrites the report |
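
The scan-stage skip rule in the table can be illustrated with a small JDK-only sketch. It assumes the key is derived by hashing the relative path, the content hash, and the sorted matcher ids; the document does not say how the harness encodes the key, and the names ScanSkipKeySketch, sha256Hex, and skipKey are hypothetical, not harness API.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical illustration of a (path, contentHash, matcherSet) skip key; not the harness API. */
final class ScanSkipKeySketch {

    /** Hex-encoded SHA-256 of the given bytes. */
    static String sha256Hex(byte[] data) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest(data)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    /** Builds a key that changes when the path, the content, or the matcher set changes. */
    static String skipKey(String relativePath, byte[] content, Set<String> matcherIds) {
        // Sort the ids so that registration order does not affect the key.
        String matchers = String.join(",", new TreeSet<>(matcherIds));
        String raw = relativePath + "\n" + sha256Hex(content) + "\n" + matchers;
        return sha256Hex(raw.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        byte[] content = "String q = \"SELECT \" + id;".getBytes(StandardCharsets.UTF_8);
        String first = skipKey("src/Dao.java", content, Set.of("sql-injection", "xxe"));
        String again = skipKey("src/Dao.java", content, Set.of("xxe", "sql-injection"));
        // Equal keys mean the file can be skipped; any difference forces a rescan.
        System.out.println(first.equals(again) ? "skip" : "rescan");
    }
}
```

Sorting the matcher ids is a design choice of this sketch: it keeps the key stable when the same matchers are registered in a different order.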

Built-in matchers

Ten regex-based matchers covering common CWEs across Java / Python / JavaScript / TypeScript:

| Matcher id | CWE | Default severity | Targets |
| --- | --- | --- | --- |
| sql-injection | CWE-89 | HIGH | Statement.execute*("..." + var), String.format("SELECT ...", ...), Python f-string SQL |
| path-traversal | CWE-22 | HIGH | new File(input + ...), Path.of(input + ...), literal ../ traversal |
| command-injection | CWE-78 | CRITICAL | Runtime.exec(input + ...), subprocess(shell=True), child_process.exec with template literals |
| hardcoded-secrets | CWE-798 | CRITICAL | AWS / GitHub / OpenAI / Anthropic / Google / Slack tokens + password = "..." + bearer |
| weak-crypto | CWE-327 | MEDIUM | MessageDigest.getInstance("MD5"), Cipher.getInstance("DES"), hashlib.md5 |
| cross-site-scripting | CWE-79 | HIGH | .innerHTML = userVar, dangerouslySetInnerHTML, document.write |
| unsafe-deserialization | CWE-502 | CRITICAL | ObjectInputStream, pickle.loads, yaml.load (no SafeLoader) |
| ssrf | CWE-918 | HIGH | HTTP client receiving req.body.url or requests.get(request.json[...]) |
| xxe | CWE-611 | HIGH | XML factories without setFeature(...) / IS_SUPPORTING_EXTERNAL_ENTITIES |
| eval-on-input | CWE-95 | CRITICAL | eval(req.*), exec(input(...)), new Function(var) |

BuiltInMatchers.all() returns the full list in stable order. Custom matchers extend RegexMatcher (regex-based) or implement MatcherSpi directly (AST / dependency-version / import-graph).
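
To make the idea of a regex-based matcher concrete, here is a minimal JDK-only sketch. The real RegexMatcher and MatcherSpi signatures are not shown in this page, so the class RegexMatcherSketch, its methods, and the custom-weak-digest id are hypothetical stand-ins rather than the harness API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical stand-in for a regex matcher; the real RegexMatcher / MatcherSpi API may differ. */
final class RegexMatcherSketch {

    private final String id;
    private final Pattern pattern;

    RegexMatcherSketch(String id, String regex) {
        this.id = id;
        this.pattern = Pattern.compile(regex);
    }

    String id() {
        return id;
    }

    /** Returns the 1-based line numbers on which the pattern occurs. */
    List<Integer> matchingLines(String source) {
        List<Integer> hits = new ArrayList<>();
        String[] lines = source.split("\\R", -1);
        for (int i = 0; i < lines.length; i++) {
            if (pattern.matcher(lines[i]).find()) {
                hits.add(i + 1);
            }
        }
        return hits;
    }

    /** Example rule: flags MD5 or SHA-1 requested through MessageDigest (CWE-327). */
    static RegexMatcherSketch weakDigest() {
        return new RegexMatcherSketch(
            "custom-weak-digest",
            "MessageDigest\\.getInstance\\(\\s*\"(MD5|SHA-?1)\"\\s*\\)");
    }

    public static void main(String[] args) {
        String source = String.join("\n",
            "import java.security.MessageDigest;",
            "var ok = MessageDigest.getInstance(\"SHA-256\");",
            "var bad = MessageDigest.getInstance(\"MD5\");");
        RegexMatcherSketch matcher = weakDigest();
        System.out.println(matcher.id() + " hits: " + matcher.matchingLines(source));
    }
}
```

A line-oriented regex like this is cheap but shallow, which fits the pre-filter role described above: it only nominates candidate files, and the agent decides whether a hit is a real finding.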

Agent backends

The default agent is ReviewAgent.claudeOpus(llmClient) — an instance of ReviewAgent (a concrete ReviewAgentSpi) backed by any tnsai-llm LLMClient. The agent prompts the LLM with the file path, candidate-matcher list, and file content, and asks for a JSON array of findings matching the ReviewFinding schema.

The output is parsed defensively: malformed JSON yields an empty list and a WARN log (not a throw), so one bad response never kills the stage. Markdown ```json ... ``` fences are stripped if the model ignores the "no fences" instruction.
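
The defensive handling described above can be sketched as follows. This is an illustration under assumptions, not the harness's parser: FenceStripSketch and its methods are hypothetical names, and because the JDK ships no JSON parser the sketch only checks that the payload is array-shaped before handing it on, where a real implementation would run a full JSON parse.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of fence stripping before JSON parsing; not the harness's actual parser. */
final class FenceStripSketch {

    // Opening fence with an optional language tag, the payload, then a closing fence.
    private static final Pattern FENCED = Pattern.compile(
        "^\\s*```[A-Za-z]*\\s*\\R(.*?)\\R?\\s*```\\s*$", Pattern.DOTALL);

    /** Returns the payload inside a single Markdown fence, or the trimmed input if there is none. */
    static String stripFences(String response) {
        if (response == null) {
            return "";
        }
        Matcher m = FENCED.matcher(response);
        return m.matches() ? m.group(1).trim() : response.trim();
    }

    /** Returns the payload if it is shaped like a JSON array; otherwise warns and returns "[]". */
    static String payloadOrEmptyArray(String response) {
        String payload = stripFences(response);
        boolean arrayShaped = payload.startsWith("[") && payload.endsWith("]");
        if (!arrayShaped) {
            // Warn instead of throwing, so one bad response does not abort the stage.
            System.err.println("WARN: agent response is not a JSON array; treating as no findings");
            return "[]";
        }
        return payload;
    }

    public static void main(String[] args) {
        System.out.println(payloadOrEmptyArray("```json\n[{\"id\":\"f1\"}]\n```"));
        System.out.println(payloadOrEmptyArray("Sorry, I could not review this file."));
    }
}
```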

| Factory | Id | Notes |
| --- | --- | --- |
| ReviewAgent.claudeOpus(LLMClient) | claude-opus | Recommended default |
| ReviewAgent.gpt(LLMClient) | gpt | Useful as the second agent for the revalidate cross-check |
| new ReviewAgent(id, version, LLMClient) | custom | For routing + version-pinning |

The first registered agent is the process backend; the second (when present) is used by revalidate. Mix-and-match across providers is the intended pattern — the cross-check is most valuable when the two agents come from different model families.
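
The revalidate rule (drop agent-only findings the second agent rejects, keep everything else) can be sketched as a simple filter. The Finding class, its agentOnly flag, and the set of confirmed ids are hypothetical constructs for this illustration; how the harness actually tracks a finding's origin and the cross-agent's verdict is not described here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of the revalidate filter; not the harness's data model. */
final class RevalidateSketch {

    /** Minimal stand-in for a finding; agentOnly marks findings with no matcher backing (assumed flag). */
    static final class Finding {
        final String id;
        final boolean agentOnly;

        Finding(String id, boolean agentOnly) {
            this.id = id;
            this.agentOnly = agentOnly;
        }
    }

    /**
     * Keeps every finding that is not agent-only, plus the agent-only findings
     * whose ids the cross-agent confirmed.
     */
    static List<Finding> revalidate(List<Finding> findings, Set<String> confirmedByCrossAgent) {
        List<Finding> kept = new ArrayList<>();
        for (Finding finding : findings) {
            if (!finding.agentOnly || confirmedByCrossAgent.contains(finding.id)) {
                kept.add(finding);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Finding> findings = List.of(
            new Finding("matcher-hit", false),
            new Finding("agent-confirmed", true),
            new Finding("agent-rejected", true));
        for (Finding kept : revalidate(findings, Set.of("agent-confirmed"))) {
            System.out.println(kept.id);
        }
    }
}
```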

State store + idempotency

FileSystemPipelineStateStore persists everything under <baseDir>/<projectId>/:

  • Atomic writes: temp file + Files.move(ATOMIC_MOVE), so a partial write never replaces a good one
  • Per-path locking: concurrent writers for the same FileRecord serialise; the rest fan out
  • Path-hashed filenames: files/<sha256-of-path>.json; the same path always lands in the same file regardless of OS encoding quirks
  • Crash-safe restart: interrupting the JVM mid-run leaves the store in a consistent state; the next invocation picks up where it left off
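
The atomic-write and path-hashing properties above can be illustrated with plain java.nio. This is a minimal sketch, not the FileSystemPipelineStateStore implementation: AtomicStoreSketch and its methods are hypothetical, per-path locking is omitted, and normalising path separators before hashing is an assumption of the sketch.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical sketch of path-hashed, atomically written records; not the harness implementation. */
final class AtomicStoreSketch {

    /** Maps a relative path to "<sha256-of-path>.json". */
    static String recordFileName(String relativePath) {
        // Assumption of this sketch: separators are normalised before hashing.
        String normalised = relativePath.replace('\\', '/');
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest(normalised.getBytes(StandardCharsets.UTF_8))) {
                hex.append(String.format("%02x", b));
            }
            return hex + ".json";
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Writes to a temp file in the target directory, then moves it over the target. */
    static void writeAtomically(Path target, String json) {
        Path dir = target.toAbsolutePath().getParent();
        try {
            Files.createDirectories(dir);
            // Same directory as the target, so the move never crosses file systems.
            Path tmp = Files.createTempFile(dir, "record-", ".tmp");
            try {
                Files.write(tmp, json.getBytes(StandardCharsets.UTF_8));
                // Whether ATOMIC_MOVE replaces an existing target is implementation specific per the Javadoc.
                Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
            } finally {
                Files.deleteIfExists(tmp); // no-op after a successful move
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads a record back as UTF-8 text. */
    static String read(Path target) {
        try {
            return new String(Files.readAllBytes(target), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path base = Path.of(System.getProperty("java.io.tmpdir"), "tnsai-sketch-" + System.nanoTime());
        Path record = base.resolve("files").resolve(recordFileName("src/main/java/Dao.java"));
        writeAtomically(record, "{\"findings\":[]}");
        writeAtomically(record, "{\"findings\":[\"f1\"]}"); // second write goes through the same temp-then-move path
        System.out.println(read(record));
    }
}
```

Because readers only ever see the old file or the fully written new one, a crash between the temp write and the move leaves at worst a stray .tmp file, never a truncated record.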

FileRecord carries:

  • path (relative)
  • contentHash (SHA-256 of the file)
  • candidateMatchers (ids that fired during scan)
  • findings (canonical, deduplicated by id)
  • runs (history of every agent invocation; cache key for the process stage)

Output formats

| Format | When to use |
| --- | --- |
| SARIF | Default. Feeds GitHub Code Scanning, Azure DevOps, GitLab SAST. The driver, rules, and result properties are populated for the consumer's UI. |
| JSON | TnsAI's canonical shape: the same ReviewFinding records the SDK uses. Best for in-house tooling. |
| MARKDOWN | Human reading: one section per file with severity, lines, description, evidence code blocks, mitigation, and a CWE reference link. |

To emit several formats in one run, register one ExportStage per format:

.stage(new ExportStage(ExportFormat.SARIF))
.stage(new ExportStage(ExportFormat.JSON))
.stage(new ExportStage(ExportFormat.MARKDOWN))

Stage ids encode the format (export-sarif, export-json, export-markdown), so multiple instances coexist.

SARIF severity mapping

| Severity | SARIF level |
| --- | --- |
| INFO | note |
| LOW, MEDIUM | warning |
| HIGH, CRITICAL | error |
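
The mapping in the table is small enough to express as a single switch. In this sketch the nested Severity enum is a local stand-in with the five levels listed above, and SarifLevelSketch.sarifLevel is a hypothetical helper, not the exporter's actual method.

```java
/** Hypothetical sketch of the severity-to-SARIF-level mapping; not the exporter's code. */
final class SarifLevelSketch {

    /** Local stand-in for the harness Severity enum, using the levels from the table. */
    enum Severity { INFO, LOW, MEDIUM, HIGH, CRITICAL }

    /** Maps a severity to the SARIF result level shown in the table. */
    static String sarifLevel(Severity severity) {
        switch (severity) {
            case INFO:
                return "note";
            case LOW:
            case MEDIUM:
                return "warning";
            case HIGH:
            case CRITICAL:
                return "error";
            default:
                throw new IllegalArgumentException("Unknown severity: " + severity);
        }
    }

    public static void main(String[] args) {
        for (Severity severity : Severity.values()) {
            System.out.println(severity + " -> " + sarifLevel(severity));
        }
    }
}
```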

Cost containment (v1)

process.maxFiles and process.maxFindings config keys cap the work the process stage does in a single invocation:

.config("process.maxFiles", "200")
.config("process.maxFindings", "50")

Deeper integration with CostBudget — pipeline-wide USD cap with per-stage breakdown via the Hook<PreAction> enforcement layer — is a follow-up issue. The lightweight cap above keeps a single pipeline run bounded today.
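
One plausible reading of the two caps is "stop processing once either limit is reached". The sketch below illustrates that reading only; the exact semantics of process.maxFiles and process.maxFindings are not specified here, and ProcessCapSketch with its process method is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical sketch of a per-invocation cap on agent work; the harness's exact rule may differ. */
final class ProcessCapSketch {

    /**
     * Sends files to the agent in order and stops before the next file once either cap is reached.
     * The findings cap is checked between files, so the last file may push the total past it.
     */
    static List<String> process(List<String> files,
                                Function<String, List<String>> agent,
                                int maxFiles,
                                int maxFindings) {
        List<String> findings = new ArrayList<>();
        int processed = 0;
        for (String file : files) {
            if (processed >= maxFiles || findings.size() >= maxFindings) {
                break;
            }
            findings.addAll(agent.apply(file));
            processed++;
        }
        return findings;
    }

    public static void main(String[] args) {
        // Fake agent for the demo: reports two findings for every file.
        Function<String, List<String>> agent = file -> List.of(file + "#1", file + "#2");
        List<String> files = List.of("A.java", "B.java", "C.java", "D.java");
        System.out.println(process(files, agent, 200, 3));
    }
}
```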

Pairs with

  • Sandbox — agent invocations can run inside the framework's sandbox primitive when the consumer wraps the LLMClient accordingly. The harness itself is process-local; sandbox is the inner ring.
  • Cost Governance — full CostBudget integration is the in-flight piece (see "Cost containment" above).
  • Accountability — every AgentRun carries the agent id + version; downstream consumers can join on correlationId to merge harness audit with the rest of the agent activity stream.

What's not in v1 (deferred to follow-ups)

  • Custom matcher DSL — built-in catalog ships; user-defined matchers are a v2.
  • Multi-language full coverage — Java / Python / JS / TS / Go / Ruby / PHP are scanned; Rust / C++ / C# / Kotlin matchers come as additive extensions.
  • Auto-fix generation — findings include a mitigation suggestion, but a PR-applying agent is its own issue.
  • GitHub Action wrapper — the harness ships a CLI-shaped surface; turning it into a published Action is separate tooling.
  • CodeQL / Semgrep rule import — only the native catalog in v1; importing community rule sets is a v2.
  • Cloud / hosted variant — self-hosted only.
  • Full CostBudget Hook integration — see "Cost containment".

See also

  • Sandbox: isolated execution primitive used as the inner ring for any per-agent code execution.
  • Cost Governance: CostBudget shape; full integration is planned.
  • Accountability: AgentLiabilityRecord for joining harness audit with the rest of the agent timeline.
