Cap LLM spend per tenant, agent, or capability with declarative budgets backed by a pluggable spend store. Pairs with LLM observability — every captured LLMCallLog carries a cost estimate; the cost-governance layer reads cumulative spend and enforces a policy when a budget is exhausted.

See also: Cost Tracking — the older CostAwareLLMClient system focused on client-side spend control. Use cost-tracking for self-contained client-edge enforcement; use cost-governance when budgets need to span multiple clients / agents / processes against a shared spend ledger.

Why a separate layer

Capturing cost (LLMCallLog.cost()) tells you what's been spent. Cost governance answers the next question: what should happen when a tenant's daily budget is exhausted? Three policies cover the realistic answers:

Policy	Behavior	Use case
`SOFT_WARN`	Emit warning event + Prometheus alert; let the call through	Budget is informational (analytics, not throttling)
`HARD_STOP`	Block any new LLM call that would push spend over budget	Interactive chat where user should see "budget exceeded" immediately
`DEFER`	Queue the call for the next period; notify via `OnNotification` hook	Non-interactive workloads that can wait (overnight batch, async research agents)

The framework ships the data model and the evaluator. Wiring it into the LLM call hot path uses the hook system from issue #60.
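Once a budget is exhausted, the table reduces to a small decision function. The sketch below is illustrative only. The `Status`, `Policy`, and `Decision` enums are hypothetical local mirrors of the names in this section. It also ignores pre-call cost estimates, so a HARD_STOP budget still lets calls through while it is in WARNING.

```java
// Hypothetical mirror of the three policies and the resulting call decision.
final class PolicyDecisionSketch {

    enum Status { HEALTHY, WARNING, EXHAUSTED }

    enum Policy { SOFT_WARN, HARD_STOP, DEFER }

    enum Decision { ALLOW, ALLOW_WITH_WARNING, BLOCK, DEFER_TO_NEXT_PERIOD }

    // Below the cap every policy lets the call through (WARNING only flags it).
    // At or over the cap, the budget's onExceed policy decides.
    static Decision decide(Status status, Policy onExceed) {
        return switch (status) {
            case HEALTHY -> Decision.ALLOW;
            case WARNING -> Decision.ALLOW_WITH_WARNING;
            case EXHAUSTED -> switch (onExceed) {
                case SOFT_WARN -> Decision.ALLOW_WITH_WARNING;
                case HARD_STOP -> Decision.BLOCK;
                case DEFER -> Decision.DEFER_TO_NEXT_PERIOD;
            };
        };
    }

    public static void main(String[] args) {
        for (Policy p : Policy.values()) {
            System.out.println(p + " when exhausted -> " + decide(Status.EXHAUSTED, p));
        }
    }
}
```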

Building blocks

package com.tnsai.quality.cost;

// Declarative budget — what the limit is + what to do when exhausted.
public record CostBudget(
        String tenantId,
        Optional<String> agentId,
        Optional<String> capabilityId,
        BigDecimal periodBudgetUSD,
        Duration period,
        BudgetPolicy onExceed) { ... }

// Where spend is bucketed for accumulation.
public record CostScope(
        String tenantId,
        Optional<String> agentId,
        Optional<String> capabilityId) { ... }

// SPI — store cumulative spend per scope per rolling window.
public interface CostBudgetStore {
    void record(CostScope scope, BigDecimal amountUSD);
    BigDecimal currentSpend(CostScope scope, Duration period);
    void clear();
}

// Stateless evaluator — query the store, package as a snapshot.
public final class CostBudgetEvaluator {
    public BudgetState evaluate(CostBudget budget);
}

// Snapshot — budget + spend + status (HEALTHY / WARNING / EXHAUSTED).
public record BudgetState(
        CostBudget budget,
        BigDecimal spentUSD,
        BigDecimal remainingUSD,
        double fractionConsumed,
        Status status,           // HEALTHY < 80%, WARNING 80–100%, EXHAUSTED ≥ 100%
        Instant evaluatedAt) { ... }
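The snapshot's derived fields follow directly from the cap and the recorded spend. Below is a minimal sketch of that arithmetic. It uses a hypothetical `Snapshot` record in place of `BudgetState`. It assumes a positive cap and assumes that `remainingUSD` is floored at zero. The real evaluator also reads spend from the store and stamps `evaluatedAt`.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

final class BudgetStateSketch {

    enum Status { HEALTHY, WARNING, EXHAUSTED }

    // Hypothetical stand-in for BudgetState's derived fields.
    record Snapshot(BigDecimal spentUSD, BigDecimal remainingUSD,
                    double fractionConsumed, Status status) { }

    static final double WARNING_THRESHOLD = 0.80;

    // Assumes capUSD > 0 (a zero cap would divide by zero).
    static Snapshot snapshot(BigDecimal capUSD, BigDecimal spentUSD) {
        // Remaining is floored at zero in this sketch (an assumption).
        BigDecimal remaining = capUSD.subtract(spentUSD).max(BigDecimal.ZERO);
        // Compute the ratio in BigDecimal, then convert for the double field.
        double fraction = spentUSD.divide(capUSD, 6, RoundingMode.HALF_UP).doubleValue();
        Status status = fraction >= 1.0 ? Status.EXHAUSTED
                : fraction >= WARNING_THRESHOLD ? Status.WARNING
                : Status.HEALTHY;
        return new Snapshot(spentUSD, remaining, fraction, status);
    }

    public static void main(String[] args) {
        // $42 of a $50 cap: 84% consumed, $8.00 remaining, status WARNING.
        System.out.println(snapshot(new BigDecimal("50.00"), new BigDecimal("42.00")));
    }
}
```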

Quick start

Define a tenant-wide daily cap, record spend after each LLM call, and read the status before each call:

import com.tnsai.quality.cost.*;
import java.math.BigDecimal;
import java.time.Duration;

CostBudgetStore store = new InMemoryCostBudgetStore();
CostBudgetEvaluator eval = new CostBudgetEvaluator(store);

CostBudget acmeDaily = CostBudget.tenantWide(
        "acme",
        new BigDecimal("50.00"),     // $50/day cap
        Duration.ofDays(1),
        BudgetPolicy.HARD_STOP);

// After each captured LLMCallLog (e.g. from CapturingLLMClient),
// fold the cost into the store.
LLMCallLog log = ...; // from your LLMCallPublisher
store.record(
        CostScope.tenant(log.context().tenantId().orElse("default")),
        log.cost().totalUSD());

// Before the next call, check the budget. logger is assumed to be an
// SLF4J-style Logger ({} is its placeholder syntax).
BudgetState state = eval.evaluate(acmeDaily);
switch (state.status()) {
    case HEALTHY -> { /* proceed */ }
    case WARNING -> logger.warn("Budget at {}% — consider throttling",
            (int)(state.fractionConsumed() * 100));
    case EXHAUSTED -> throw new IllegalStateException(
            "Tenant acme over $" + acmeDaily.periodBudgetUSD() + " daily cap");
}

Hierarchical scopes

Budgets nest. A tenant-wide budget caps the whole tenant, a per-agent budget caps a single agent inside it, and a per-capability budget caps a specific @Capability:

CostBudget tenantCap = CostBudget.tenantWide("acme",
        new BigDecimal("500.00"), Duration.ofDays(1), BudgetPolicy.HARD_STOP);

CostBudget summarizerCap = CostBudget.forAgent("acme", "summarizer-agent",
        new BigDecimal("50.00"), Duration.ofDays(1), BudgetPolicy.SOFT_WARN);

// Spend recorded at the finest scope rolls up to coarser-scoped queries.
store.record(CostScope.full("acme", "summarizer-agent", "extractive-summary"),
        new BigDecimal("0.42"));

// Both budgets see the $0.42. The tenant query wildcards agent and
// capability; the agent query matches tenant and agent and wildcards capability.
eval.evaluate(tenantCap).spentUSD();      // $0.42
eval.evaluate(summarizerCap).spentUSD();  // $0.42

CostScope.contains(other) defines the rollup rule. Every field populated on the query scope must equal the corresponding field of the recorded scope, and absent fields act as wildcards.
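The rollup rule fits in a few lines. `Scope` below is a hypothetical stand-in for `CostScope` that implements the rule exactly as stated: populated fields must match, and empty fields match anything. The framework's own `contains` may differ in its details.

```java
import java.util.Optional;

final class ScopeRollupSketch {

    // Hypothetical stand-in for CostScope: empty Optionals act as wildcards.
    record Scope(String tenantId, Optional<String> agentId, Optional<String> capabilityId) {

        // True when every field populated on this (query) scope equals the
        // corresponding field on the recorded scope.
        boolean contains(Scope recorded) {
            return tenantId.equals(recorded.tenantId)
                    && matches(agentId, recorded.agentId)
                    && matches(capabilityId, recorded.capabilityId);
        }

        private static boolean matches(Optional<String> query, Optional<String> recorded) {
            return query.isEmpty() || query.equals(recorded);
        }
    }

    public static void main(String[] args) {
        Scope recorded = new Scope("acme", Optional.of("summarizer-agent"), Optional.of("extractive-summary"));
        Scope tenant = new Scope("acme", Optional.empty(), Optional.empty());
        Scope agent = new Scope("acme", Optional.of("summarizer-agent"), Optional.empty());
        Scope otherAgent = new Scope("acme", Optional.of("router-agent"), Optional.empty()); // hypothetical agent
        System.out.println(tenant.contains(recorded));     // true
        System.out.println(agent.contains(recorded));      // true
        System.out.println(otherAgent.contains(recorded)); // false
    }
}
```

Under this rule, spend recorded only at tenant scope is not attributed to any agent or capability budget. A populated agent field never matches an empty recorded one.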

Status thresholds

BudgetState.Status follows the SRE 80% rule of thumb:

HEALTHY — spend under 80% of the cap
WARNING — 80% – 100% (raise the flag)
EXHAUSTED — 100%+ (cap reached or breached)

Map these statuses onto your alerting policy. WARNING typically becomes a Prometheus alert. EXHAUSTED triggers the budget's onExceed policy (for example BudgetPolicy.HARD_STOP) once the enforcement hook is wired.

Stores

InMemoryCostBudgetStore ships as the default. It is process-local, so it suits development and single-instance deployments. The SPI accepts custom backends:

Redis — distributed across multiple TnsAI server instances; respected globally
Postgres — strong durability + queryable history for audit
ClickHouse / DuckDB — analytical workloads where you also want time-series cost reports

A Redis adapter ships as a focused follow-up. The SPI is stable so consumer-side adapters can be written today.
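The SPI has only three methods, so a consumer-side adapter stays small. The sketch below is a process-local, rolling-window store. It is written against hypothetical local copies of the SPI (`SpendStore`, `Scope`) so that it compiles on its own. The injected `Supplier<Instant>` clock is an addition for testability and is not part of the documented SPI. A Redis or Postgres adapter would replace the in-memory queue with the backend's own storage and time-range queries.

```java
import java.math.BigDecimal;
import java.time.Duration;
import java.time.Instant;
import java.util.Optional;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Supplier;

final class RollingWindowStoreSketch {

    // Hypothetical local copies of CostScope and the CostBudgetStore SPI.
    record Scope(String tenantId, Optional<String> agentId, Optional<String> capabilityId) {
        boolean contains(Scope recorded) {
            return tenantId.equals(recorded.tenantId)
                    && (agentId.isEmpty() || agentId.equals(recorded.agentId))
                    && (capabilityId.isEmpty() || capabilityId.equals(recorded.capabilityId));
        }
    }

    interface SpendStore {
        void record(Scope scope, BigDecimal amountUSD);
        BigDecimal currentSpend(Scope scope, Duration period);
        void clear();
    }

    // One timestamped entry per recorded LLM call.
    record Entry(Scope scope, BigDecimal amountUSD, Instant at) { }

    static final class InMemoryRollingStore implements SpendStore {
        private final Queue<Entry> entries = new ConcurrentLinkedQueue<>();
        private final Supplier<Instant> now;

        InMemoryRollingStore(Supplier<Instant> now) { this.now = now; }

        @Override
        public void record(Scope scope, BigDecimal amountUSD) {
            entries.add(new Entry(scope, amountUSD, now.get()));
        }

        // Sum every entry inside the trailing window whose scope the query contains.
        @Override
        public BigDecimal currentSpend(Scope scope, Duration period) {
            Instant cutoff = now.get().minus(period);
            return entries.stream()
                    .filter(e -> e.at().isAfter(cutoff))
                    .filter(e -> scope.contains(e.scope()))
                    .map(Entry::amountUSD)
                    .reduce(BigDecimal.ZERO, BigDecimal::add);
        }

        @Override
        public void clear() { entries.clear(); }
    }

    public static void main(String[] args) {
        SpendStore store = new InMemoryRollingStore(Instant::now);
        store.record(new Scope("acme", Optional.of("summarizer-agent"), Optional.of("extractive-summary")),
                new BigDecimal("0.42"));
        Scope tenant = new Scope("acme", Optional.empty(), Optional.empty());
        System.out.println(store.currentSpend(tenant, Duration.ofDays(1))); // 0.42
    }
}
```

This sketch never evicts old entries. A production adapter would prune anything older than the longest configured budget period.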

Enforcement

CostBudgetEnforcer is a Hook<PreAction> that reads recorded spend before every ActionType.LLM dispatch and applies the budget's onExceed policy. It runs at priority -100 (before normal-priority hooks) so a budget-blocked call never reaches argument-scrubbing or observability hooks that would do real work.

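import com.tnsai.quality.cost.*;
import java.math.BigDecimal;
import java.time.Duration;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;
// PreAction and the hook dispatcher come from the hook system; their
// imports are not shown here.

CostBudgetStore store = new InMemoryCostBudgetStore(Duration.ofHours(24));

// Budgets keyed by tenant ID (acmeDaily is the budget from the quick start).
Map<String, CostBudget> budgetByTenant = Map.of("acme", acmeDaily);

// Resolve the budget for an in-flight LLM call (this resolver matches on
// tenant only). Optional.map yields an empty result when the event has no
// tenant or the tenant has no budget. It never calls get(null), which
// immutable maps such as Map.of reject with a NullPointerException.
Function<PreAction, Optional<CostBudget>> resolver = event ->
        event.context().tenantId().map(budgetByTenant::get);

// Optional pre-call estimator. When zero (default), the enforcer fires
// after the cap is crossed by recorded spend; with a real estimator it
// can block borderline calls before they go out.
Function<PreAction, BigDecimal> estimator = event -> BigDecimal.ZERO;

CostBudgetEnforcer enforcer = new CostBudgetEnforcer(store, resolver, estimator);

// Wire alongside security/audit hooks via the standard dispatcher
// (hookDispatcher is your application's dispatcher instance).
hookDispatcher.register(enforcer);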
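A real estimator only needs a plausible upper bound on the next call's cost. Below is a minimal sketch. It assumes the prompt token count and the output-token cap are known before dispatch, and it uses hypothetical per-1K-token prices. `wouldExceed` shows the borderline check, comparing recorded spend plus the estimate against the cap. Reading token counts from a `PreAction` event is not covered here.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

final class EstimatorSketch {

    // Hypothetical prices; substitute your provider's published rates.
    static final BigDecimal INPUT_USD_PER_1K = new BigDecimal("0.0025");
    static final BigDecimal OUTPUT_USD_PER_1K = new BigDecimal("0.0100");

    // Upper-bound estimate: assume the model uses its full output-token cap.
    static BigDecimal estimateUSD(int promptTokens, int maxOutputTokens) {
        BigDecimal in = INPUT_USD_PER_1K.multiply(BigDecimal.valueOf(promptTokens));
        BigDecimal out = OUTPUT_USD_PER_1K.multiply(BigDecimal.valueOf(maxOutputTokens));
        return in.add(out).divide(BigDecimal.valueOf(1000), 6, RoundingMode.HALF_UP);
    }

    // Borderline check: block when this call would push spend past the cap.
    static boolean wouldExceed(BigDecimal spentUSD, BigDecimal estimateUSD, BigDecimal capUSD) {
        return spentUSD.add(estimateUSD).compareTo(capUSD) > 0;
    }

    public static void main(String[] args) {
        BigDecimal estimate = estimateUSD(4_000, 1_000); // $0.01 input + $0.01 output
        System.out.println(estimate);
        System.out.println(wouldExceed(new BigDecimal("49.99"), estimate, new BigDecimal("50.00")));
    }
}
```

In the enforcer wiring, a function like this would sit behind the `estimator` lambda in place of `BigDecimal.ZERO`.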

Policy outcomes:

SOFT_WARN → log a warning and allow the call (budget is informational)
HARD_STOP → return HookResult.Block with Severity.HIGH and a structured "budget exceeded" reason
DEFER → also blocks, because the framework owns no queue. The block reason says so explicitly ("DEFER policy requested but no framework queue is configured; consumer must catch and reschedule"). Consumers typically catch the block and route the call onto a consumer-owned background worker.
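Because DEFER blocks rather than queues, the consumer owns the retry. Below is a minimal sketch of such a consumer-owned worker. `BudgetBlockedException` is a hypothetical exception onto which the consumer maps the enforcer's block. The retry delay of one full period is a conservative choice for a rolling window, because by then every spend entry currently in the window has aged out.

```java
import java.time.Duration;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

final class DeferSketch {

    enum Policy { SOFT_WARN, HARD_STOP, DEFER }

    // Hypothetical exception a consumer might map the enforcer's block onto.
    static final class BudgetBlockedException extends RuntimeException {
        final Policy policy;

        BudgetBlockedException(Policy policy, String reason) {
            super(reason);
            this.policy = policy;
        }
    }

    private final ScheduledExecutorService scheduler;
    private final Duration period;

    DeferSketch(ScheduledExecutorService scheduler, Duration period) {
        this.scheduler = scheduler;
        this.period = period;
    }

    // Runs the call now. On a DEFER block, schedules a single retry one full
    // period later. Other blocks and failures complete the future
    // exceptionally, and a blocked retry is not deferred again.
    <T> CompletableFuture<T> submit(Callable<T> call) {
        CompletableFuture<T> result = new CompletableFuture<>();
        try {
            result.complete(call.call());
        } catch (BudgetBlockedException blocked) {
            if (blocked.policy != Policy.DEFER) {
                result.completeExceptionally(blocked);
            } else {
                scheduler.schedule(() -> {
                    try {
                        result.complete(call.call());
                    } catch (Exception retryFailure) {
                        result.completeExceptionally(retryFailure);
                    }
                }, period.toMillis(), TimeUnit.MILLISECONDS);
            }
        } catch (Exception other) {
            result.completeExceptionally(other);
        }
        return result;
    }

    public static void main(String[] args) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        DeferSketch deferrer = new DeferSketch(scheduler, Duration.ofMillis(50));
        int[] attempts = {0};
        CompletableFuture<String> f = deferrer.submit(() -> {
            if (++attempts[0] == 1) {
                throw new BudgetBlockedException(Policy.DEFER, "deferred");
            }
            return "ran on attempt " + attempts[0];
        });
        System.out.println(f.join());
        scheduler.shutdown();
    }
}
```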

The enforcer's matcher restricts it to ActionType.LLM; tool / web-service / local-method dispatches never invoke it.

Composing with sampling and redaction

All three observability decorators (SamplingAgentEventPublisher, RedactingAgentEventPublisher, custom LLMCallPublisher) compose freely. A typical production stack:

LLM call → CapturingLLMClient (LLMCallLog with cost)
                    ↓
       publish to LLMCallPublisher chain:
                    ↓
       Spend recorder → store.record(scope, cost)
                    ↓
       Sampling decorator → drop high-volume noise
                    ↓
       Redaction decorator → scrub PII before sink
                    ↓
       Final sink (LangFuse / Loki / S3 / Slack)

Cost recording happens before sampling so the spend ledger is always complete — sampling decisions are about log volume, not budget accuracy.
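A toy chain is enough to check the ordering argument. In this sketch, `CallLog` and `Publisher` are hypothetical stand-ins for `LLMCallLog` and `LLMCallPublisher`, and the sampler keeps one log in every n. The redaction and sink stages are collapsed into a list. The sketch shows only that a recorder placed before the sampler sees every call.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;

final class PublisherChainSketch {

    // Hypothetical minimal stand-ins for LLMCallLog and LLMCallPublisher.
    record CallLog(String tenantId, BigDecimal costUSD) { }

    interface Publisher {
        void publish(CallLog log);
    }

    // First in the chain: records every call's cost, then always forwards.
    static final class SpendRecorder implements Publisher {
        private final List<BigDecimal> ledger = new ArrayList<>();
        private final Publisher next;

        SpendRecorder(Publisher next) { this.next = next; }

        @Override
        public void publish(CallLog log) {
            ledger.add(log.costUSD());
            next.publish(log);
        }

        BigDecimal total() {
            return ledger.stream().reduce(BigDecimal.ZERO, BigDecimal::add);
        }
    }

    // Keeps one log in every n; affects log volume only, never the ledger.
    // Not thread-safe: a real sampler would use an atomic counter or hashing.
    static final class EveryNthSampler implements Publisher {
        private final int n;
        private final Publisher next;
        private long seen;

        EveryNthSampler(int n, Publisher next) {
            this.n = n;
            this.next = next;
        }

        @Override
        public void publish(CallLog log) {
            if (seen++ % n == 0) {
                next.publish(log);
            }
        }
    }

    public static void main(String[] args) {
        List<CallLog> sink = new ArrayList<>();
        SpendRecorder recorder = new SpendRecorder(new EveryNthSampler(2, sink::add));
        for (int i = 0; i < 4; i++) {
            recorder.publish(new CallLog("acme", new BigDecimal("0.10")));
        }
        // Ledger total covers all four calls; the sink holds only two logs.
        System.out.println(recorder.total() + " recorded, " + sink.size() + " published");
    }
}
```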

Future work

RedisCostBudgetStore — distributed store for multi-process deployments
Prometheus metrics derivation: tnsai_llm_cost_usd_total{tenant,agent,capability}, tnsai_llm_budget_utilized_ratio, tnsai_llm_budget_exceeded_total
Grafana dashboard JSON bundled with the framework
Prometheus Alertmanager rule examples (80% / 100% utilization)
YAML budget config loader (avoids building each tenant's budget programmatically)
