Redaction

The Quality module ships a pluggable redaction layer that scrubs PII and secrets at every framework boundary that could leak them: log lines, trace attributes, memory writes, captured LLM prompts, and agent events. Redaction is always on once the decorator is wired in; opt-out is per tenant and per sink, never the default.

The redactor SPI is Redactor in com.tnsai.quality.redaction. The default implementation PatternRedactor ships with a 14-pattern catalog covering email, phone, credit card, SSN, TC Kimlik, API keys, JWT, AWS keys, IBAN, IP, bearer tokens, and SSH private-key blocks. Operators wire the redactor into framework sinks via decorator types (RedactingMemoryStore, RedactingAgentEventPublisher).

Quick start

import com.tnsai.quality.redaction.*;

// 1. Build a redactor with the default pattern catalog.
Redactor redactor = PatternRedactor.withDefaults();

// 2. Wrap the agent's memory store and event publisher.
MemoryStore           memory = new RedactingMemoryStore(new InMemoryStore(), redactor);
AgentEventPublisher   events = new RedactingAgentEventPublisher(new Slf4jAgentEventPublisher(), redactor);

// 3. From this point on, every write through these sinks is scrubbed.
agent.setMemoryStore(memory);
agent.setEventPublisher(events);

Pattern catalog

RedactionPatterns.defaults() returns the patterns below. Each is exported as a constant on RedactionPatterns (e.g. RedactionPatterns.EMAIL) so you can subset or augment.

| Pattern | Severity | Example |
|---|---|---|
| email | MEDIUM | user@example.com |
| phone_e164 | MEDIUM | +15551234567 |
| credit_card | CRITICAL | 4111-1111-1111-1111 (Luhn-validated) |
| us_ssn | CRITICAL | 123-45-6789 |
| tc_kimlik | CRITICAL | 11-digit Turkish national ID (checksum-validated) |
| openai_api_key | HIGH | sk-... |
| anthropic_api_key | HIGH | sk-ant-... |
| github_pat | HIGH | ghp_... |
| aws_access_key | HIGH | AKIA... |
| jwt | HIGH | three base64url segments separated by . |
| iban | MEDIUM | DE89370400440532013000 |
| ipv4 | LOW | 10.0.0.1 |
| bearer_token | HIGH | Bearer <token> in headers |
| ssh_private_key_block | CRITICAL | -----BEGIN ... PRIVATE KEY----- blocks |

Numeric patterns (credit card, TC Kimlik) include checksum validators so order IDs that look like credit-card numbers don't get scrubbed. The placeholder shape is [REDACTED:<pattern_name>] and is itself inert against every default pattern — a placeholder cannot be re-redacted on a later pass.
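The framework's own checksum validators are not shown on this page. As a rough illustration of why a checksum filters out look-alike order IDs, here is a minimal sketch of a Luhn check. The class name LuhnSketch, the tolerated separators, and the 13 to 19 digit bound are assumptions made for this example, not part of the TnsAI API.

```java
// Illustrative stand-in only; not the validator shipped with PatternRedactor.
final class LuhnSketch {
    private LuhnSketch() {}

    /** Returns true when the digits in the candidate pass the Luhn checksum. */
    static boolean passesLuhn(String candidate) {
        int sum = 0;
        int digits = 0;
        boolean doubleIt = false;                  // every second digit from the right is doubled
        for (int i = candidate.length() - 1; i >= 0; i--) {
            char c = candidate.charAt(i);
            if (c == '-' || c == ' ') {
                continue;                          // assumption: tolerate common separators
            }
            if (c < '0' || c > '9') {
                return false;                      // any other character disqualifies the match
            }
            int d = c - '0';
            if (doubleIt) {
                d *= 2;
                if (d > 9) {
                    d -= 9;                        // same as summing the two digits of the product
                }
            }
            sum += d;
            doubleIt = !doubleIt;
            digits++;
        }
        // Assumption: card numbers are 13 to 19 digits long.
        return digits >= 13 && digits <= 19 && sum % 10 == 0;
    }
}
```

A sequential-looking 16-digit order ID such as 1234-5678-9012-3456 fails the checksum, so a validator of this kind would leave it untouched while still catching 4111-1111-1111-1111.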

Where redaction fires

| Sink | Decorator | What gets scrubbed |
|---|---|---|
| Conversation memory writes | RedactingMemoryStore | addMessage(role, content), addMessage(Map), search(query, …) |
| Agent event log | RedactingAgentEventPublisher | All 13 TnsAIEvent variants: every user-input-bearing field rebuilt with redacted values |
| Search queries against memory | RedactingMemoryStore | search(query, limit): query scrubbed before backend dispatch |

Reads (getHistory, getRecentHistory) pass through unchanged because content stored via these decorators is already redacted at write time; double-scrubbing wastes cycles.
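The scrub-on-write, pass-through-on-read split can be pictured with a small decorator. This is a sketch under stated assumptions: SketchStore, ListStore, and ScrubOnWriteStore are hypothetical stand-ins written for this example, and the real MemoryStore interface has a different and larger surface.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical, much-reduced stand-in for a memory store.
interface SketchStore {
    void add(String content);
    List<String> history();
}

// Plain in-memory backend used only by this sketch.
final class ListStore implements SketchStore {
    private final List<String> items = new ArrayList<>();

    @Override public void add(String content) { items.add(content); }
    @Override public List<String> history() { return List.copyOf(items); }
}

// Decorator: scrubs on the write path, delegates reads untouched.
final class ScrubOnWriteStore implements SketchStore {
    private final SketchStore delegate;
    private final UnaryOperator<String> scrubber;

    ScrubOnWriteStore(SketchStore delegate, UnaryOperator<String> scrubber) {
        this.delegate = delegate;
        this.scrubber = scrubber;
    }

    @Override public void add(String content) {
        delegate.add(scrubber.apply(content));   // content is redacted before it reaches the backend
    }

    @Override public List<String> history() {
        return delegate.history();               // already redacted at write time, so no second pass
    }
}
```

The read path stays cheap only as long as every writer goes through the decorator; content written directly to the backend would be returned unredacted.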

Composition

Every redactor type below implements the same Redactor SPI, so they slot into each other freely.

Multiple redactor pipeline

CompositeRedactor chains redactors in order; the output text of one feeds the next. Useful for stacking a fast pattern matcher with a slower LLM-classifier or Presidio bridge.

Redactor pipeline = CompositeRedactor.of(
        PatternRedactor.withDefaults(),                    // fast, regex-based
        new LLMClassifierRedactor(haiku, "pii-policy"));   // slower, NER-based (when shipped)

Order matters: the first redactor's [REDACTED:...] placeholders pass through every later redactor unchanged, so a downstream classifier only ever sees the placeholder, never the original value that was scrubbed.
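To make the chaining behavior concrete, here is a minimal sketch that models each stage as a plain string function. ChainSketch, its two simplified regexes, and the stage and runChain helpers are assumptions for illustration; they are not how CompositeRedactor is implemented.

```java
import java.util.List;
import java.util.function.UnaryOperator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative stand-in for a redactor chain; not the CompositeRedactor implementation.
final class ChainSketch {
    // Simplified patterns for the demo, looser than a production catalog would use.
    static final Pattern EMAIL = Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");
    static final Pattern SSN = Pattern.compile("\\b\\d{3}-\\d{2}-\\d{4}\\b");

    private ChainSketch() {}

    /** Builds one stage that replaces every match with a named placeholder. */
    static UnaryOperator<String> stage(Pattern pattern, String name) {
        String placeholder = Matcher.quoteReplacement("[REDACTED:" + name + "]");
        return text -> pattern.matcher(text).replaceAll(placeholder);
    }

    /** Runs the stages in order; the output text of one stage feeds the next. */
    static String runChain(List<UnaryOperator<String>> stages, String input) {
        String text = input;
        for (UnaryOperator<String> stage : stages) {
            text = stage.apply(text);
        }
        return text;
    }
}
```

Because the second stage receives text that already contains [REDACTED:email], it has no access to the original address; it can only add its own placeholders alongside.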

Per-tenant policies

TenantPolicyRedactor dispatches each call to a per-tenant redactor based on the active RedactionContext. Resolution order:

  1. RedactionContext.tenantPolicyId(): explicit override
  2. EventContext.tenantId() from the framework's correlation context
  3. Configured default redactor: fallback for unmatched / blank ids

Redactor everything   = PatternRedactor.withDefaults();
Redactor emailOnly    = new PatternRedactor(List.of(RedactionPatterns.EMAIL));
Redactor pciOnly      = new PatternRedactor(List.of(RedactionPatterns.CREDIT_CARD,
                                                   RedactionPatterns.US_SSN));

Redactor dispatcher = TenantPolicyRedactor.builder()
        .tenant("acme",   emailOnly)        // acme: minimal redaction
        .tenant("globex", pciOnly)          // globex: card numbers and SSNs only
        .defaultRedactor(everything)        // everyone else: full catalog
        .build();

The dispatch happens on every call (no per-thread cache) so a single agent group serving multiple tenants doesn't leak across tenant boundaries.
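The resolution order can be sketched with plain maps and functions. Everything here is a stand-in: TenantDispatchSketch and its resolve method are hypothetical, and the sketch assumes that an explicit policy id which matches no tenant falls straight through to the default rather than retrying with the context tenant id. The page does not say which of those the real dispatcher does.

```java
import java.util.Map;
import java.util.function.UnaryOperator;

// Illustrative stand-in for per-tenant dispatch; not the TenantPolicyRedactor implementation.
final class TenantDispatchSketch {
    private final Map<String, UnaryOperator<String>> byTenant;
    private final UnaryOperator<String> fallback;

    TenantDispatchSketch(Map<String, UnaryOperator<String>> byTenant,
                         UnaryOperator<String> fallback) {
        this.byTenant = Map.copyOf(byTenant);
        this.fallback = fallback;
    }

    /**
     * Resolves on every call, with no caching:
     * 1. the explicit policy id, if non-blank; 2. the context tenant id; 3. the default.
     */
    UnaryOperator<String> resolve(String explicitPolicyId, String contextTenantId) {
        String id = isBlank(explicitPolicyId) ? contextTenantId : explicitPolicyId;
        if (isBlank(id)) {
            return fallback;
        }
        // Assumption: an id that matches no tenant goes to the default redactor.
        return byTenant.getOrDefault(id, fallback);
    }

    private static boolean isBlank(String s) {
        return s == null || s.isBlank();
    }
}
```

Resolving per call keeps the lookup a pure function of the current context, which is what prevents one tenant's policy from being reused for another tenant's request on the same thread.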

Audit trail

AuditingRedactor decorates any redactor and emits a RedactionAuditEvent to a RedactionAuditListener whenever findings exist. The event aggregates pattern counts, the highest severity, and the framework correlation context; the redacted content itself never appears in the audit event.

Redactor base    = PatternRedactor.withDefaults();
Redactor audited = new AuditingRedactor(base, event -> {
    metrics.counter("redaction.applied",
            "scope", event.scope().name(),
            "severity", event.highestSeverity().name())
           .increment(event.totalFindings());
    if (event.isCritical()) {
        slack.alert("CRITICAL PII redaction in tenant " + event.eventContext().tenantId());
    }
});

AuditingRedactor is itself a Redactor, so it stacks anywhere: inside TenantPolicyRedactor for per-tenant audit, or wrapping the dispatcher itself for cross-tenant audit.

A broken audit listener cannot break redaction itself: exceptions thrown inside onRedaction(...) are caught and logged at WARN.
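The isolation guarantee amounts to a try/catch around the listener call. The sketch below assumes a much-reduced model: SafeAuditSketch is hypothetical, the redactor is a plain string function, the listener is a Runnable, and "findings exist" is approximated by the text having changed. The real AuditingRedactor works with RedactionResult and RedactionAuditEvent instead.

```java
import java.util.function.UnaryOperator;
import java.util.logging.Level;
import java.util.logging.Logger;

// Illustrative stand-in for an auditing decorator; not the AuditingRedactor implementation.
final class SafeAuditSketch implements UnaryOperator<String> {
    private static final Logger LOG = Logger.getLogger(SafeAuditSketch.class.getName());

    private final UnaryOperator<String> delegate;
    private final Runnable onFindings;

    SafeAuditSketch(UnaryOperator<String> delegate, Runnable onFindings) {
        this.delegate = delegate;
        this.onFindings = onFindings;
    }

    @Override
    public String apply(String input) {
        String scrubbed = delegate.apply(input);
        // Assumption for this sketch: changed text stands in for "findings exist".
        if (!scrubbed.equals(input)) {
            try {
                onFindings.run();                 // the listener never receives the content itself
            } catch (RuntimeException e) {
                // A failing listener is logged and swallowed; the scrubbed text is still returned.
                LOG.log(Level.WARNING, "audit listener failed", e);
            }
        }
        return scrubbed;
    }
}
```

Redaction runs before the listener is invoked, so even a listener that always throws cannot cause unredacted text to be returned.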

Custom patterns

Build your own RedactionPattern and pass it to a PatternRedactor:

import java.util.List;
import com.tnsai.quality.redaction.*;

RedactionPattern medicalRecordId = RedactionPattern.of(
        "medical_record_id",
        "MRN-\\d{8}",                  // pattern (compiled to Pattern internally)
        Severity.CRITICAL);

RedactionPattern licensePlate = RedactionPattern.of(
        "license_plate_tr",
        "\\b\\d{2}\\s?[A-Z]{1,3}\\s?\\d{2,4}\\b",
        Severity.MEDIUM,
        plate -> plate.length() >= 7);  // optional validator filters false positives

List<RedactionPattern> myCatalog = new java.util.ArrayList<>(RedactionPatterns.defaults());
myCatalog.add(medicalRecordId);
myCatalog.add(licensePlate);

Redactor custom = new PatternRedactor(myCatalog);

The validator runs after the regex matches and lets you check rules (Luhn checksum, numeric range, minimum length) that a regex alone can't express or expresses poorly. Patterns without a validator accept every regex match.
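The regex-then-validator flow can be sketched with java.util.regex directly. ValidatedPatternSketch and its scrub method are hypothetical names for this example; the sketch only shows the control flow of "match, validate, then replace or keep", not PatternRedactor's actual internals.

```java
import java.util.function.Predicate;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative stand-in for validator-gated replacement; not the PatternRedactor implementation.
final class ValidatedPatternSketch {
    private ValidatedPatternSketch() {}

    /** Replaces only those regex matches that the validator accepts. */
    static String scrub(String text, Pattern regex, Predicate<String> validator, String name) {
        Matcher m = regex.matcher(text);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String match = m.group();
            // Rejected matches are written back verbatim, so false positives survive unchanged.
            String replacement = validator.test(match) ? "[REDACTED:" + name + "]" : match;
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

With the license-plate regex from the snippet above and a length check of at least 7 characters, "34 ABC 1234" is replaced while the shorter "12A34" still matches the regex but is left in place by the validator.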

SPI summary

| Type | Role |
|---|---|
| Redactor | The SPI. Three methods: scrubString, scrubValue, maybeContainsSensitive. |
| PatternRedactor | Default implementation: regex-driven, stateless, thread-safe. |
| RedactionPatterns | Constants for the 14 default patterns + defaults() list. |
| RedactionResult | Output of scrubString: text + findings. |
| RedactionFinding | One match: pattern name, offsets, placeholder, severity. |
| RedactionContext | Per-call context: EventContext, scope, tenant policy id. |
| RedactionScope | Enum of sinks: LOG_ATTR, MEMORY_WRITE, LLM_PROMPT, etc. |
| Severity | LOW / MEDIUM / HIGH / CRITICAL. |
| CompositeRedactor | Chain of redactors. |
| TenantPolicyRedactor | Per-tenant dispatcher. |
| AuditingRedactor | Decorator that fires RedactionAuditEvents. |
| RedactingMemoryStore | MemoryStore decorator. |
| RedactingAgentEventPublisher | AgentEventPublisher decorator. |

Invariants

The framework's property test suite (RedactorPropertyTest) verifies four invariants that every Redactor implementation must hold:

  1. Idempotency: redact(redact(x)).text == redact(x).text. A sink that double-scrubs (memory write then log emit) doesn't drift.
  2. Pattern survival: no input PII fragment appears in the redacted output.
  3. Placeholder safety: the literal placeholder [REDACTED:foo] does not match any default pattern.
  4. Length bounded: output length ≤ input length + (max placeholder length × findings count).

Custom redactors that ship into framework sinks should add coverage to the same suite.
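RedactorPropertyTest itself is not reproduced here. As a rough idea of what such checks look like, the sketch below runs idempotency, placeholder-safety, and length-bound checks against a single-pattern regex scrubber. InvariantSketch, its simplified SSN regex, and the helper names are assumptions for illustration only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative stand-in for invariant checks; not the framework's RedactorPropertyTest.
final class InvariantSketch {
    static final Pattern SSN = Pattern.compile("\\b\\d{3}-\\d{2}-\\d{4}\\b");
    static final String PLACEHOLDER = "[REDACTED:us_ssn]";

    private InvariantSketch() {}

    /** Minimal single-pattern scrubber used as the system under test. */
    static String scrub(String text) {
        return SSN.matcher(text).replaceAll(Matcher.quoteReplacement(PLACEHOLDER));
    }

    static int countFindings(String text) {
        Matcher m = SSN.matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    /** Idempotency: scrubbing already-scrubbed text changes nothing. */
    static boolean idempotent(String x) {
        String once = scrub(x);
        return scrub(once).equals(once);
    }

    /** Placeholder safety: the placeholder literal is not itself a match. */
    static boolean placeholderInert() {
        return !SSN.matcher(PLACEHOLDER).find();
    }

    /** Length bound: output grows by at most one placeholder per finding. */
    static boolean lengthBounded(String x) {
        return scrub(x).length() <= x.length() + PLACEHOLDER.length() * countFindings(x);
    }
}
```

A real property test would feed generated inputs through these predicates rather than a handful of fixed strings; the fixed checks here only show the shape of each assertion.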

Trade-offs

  • False positives: the phone regex can match tracking IDs that look like phone numbers. The default placeholder preserves visual shape; tighten individual patterns with a validator.
  • False negatives: rare PII formats (driver's licence, medical record IDs) are not in the default catalog. Plug in custom patterns or the (forthcoming) classifier redactor.
  • Performance cost: on hot paths with huge payloads (1 MB+ LLM responses), redaction adds measurable latency. Large-payload paths should go through asynchronous or buffered publishing.
  • Data loss risk: aggressive redaction may scrub legitimate content (base64 images, opaque IDs that look like tokens). Tune per tenant.

Roadmap

The redaction SPI lives in tnsai-quality and is stable. Pending work tracked under issue #80:

  • LLMClassifierRedactor — opt-in classifier using any LLMClient for non-pattern PII (names, addresses, health info).
  • PresidioRedactor — bridge to Microsoft Presidio for state-of-the-art NER.
  • YAML policy loader — tenant-redaction-policies.yaml parser to populate TenantPolicyRedactor from config.
  • OnNotification hook integration — fire on critical-severity findings (depends on the hook system in #60).
