Cost Governance
Cap LLM spend per tenant, agent, or capability with declarative budgets backed by a pluggable spend store. Pairs with LLM observability — every captured LLMCallLog carries a cost estimate; the cost-governance layer reads cumulative spend and enforces a policy when a budget is exhausted.
See also: Cost Tracking, the older CostAwareLLMClient system focused on client-side spend control. Use cost-tracking for self-contained client-edge enforcement; use cost-governance when budgets need to span multiple clients, agents, or processes against a shared spend ledger.
Why a separate layer
Capturing cost (LLMCallLog.cost()) tells you what's been spent. Cost governance answers the next question: what should happen when a tenant's daily budget is exhausted? Three policies cover the realistic answers:
| Policy | Behavior | Use case |
|---|---|---|
| SOFT_WARN | Emit warning event + Prometheus alert; let the call through | Budget is informational (analytics, not throttling) |
| HARD_STOP | Block any new LLM call that would push spend over budget | Interactive chat where user should see "budget exceeded" immediately |
| DEFER | Queue the call for the next period; notify via OnNotification hook | Non-interactive workloads that can wait (overnight batch, async research agents) |
The framework ships the data model and the evaluator. Wiring it into the LLM call hot path uses the hook system from issue #60.
Building blocks
package com.tnsai.quality.cost;
// Declarative budget — what the limit is + what to do when exhausted.
public record CostBudget(
String tenantId,
Optional<String> agentId,
Optional<String> capabilityId,
BigDecimal periodBudgetUSD,
Duration period,
BudgetPolicy onExceed) { ... }
// Where spend is bucketed for accumulation.
public record CostScope(
String tenantId,
Optional<String> agentId,
Optional<String> capabilityId) { ... }
// SPI — store cumulative spend per scope per rolling window.
public interface CostBudgetStore {
void record(CostScope scope, BigDecimal amountUSD);
BigDecimal currentSpend(CostScope scope, Duration period);
void clear();
}
// Stateless evaluator — query the store, package as a snapshot.
public final class CostBudgetEvaluator {
public CostBudgetEvaluator(CostBudgetStore store) { ... }
public BudgetState evaluate(CostBudget budget) { ... }
}
// Snapshot — budget + spend + status (HEALTHY / WARNING / EXHAUSTED).
public record BudgetState(
CostBudget budget,
BigDecimal spentUSD,
BigDecimal remainingUSD,
double fractionConsumed,
Status status, // HEALTHY < 80%, WARNING 80–100%, EXHAUSTED ≥ 100%
Instant evaluatedAt) { ... }
Quick start
Define a tenant-wide daily cap, record spend after each LLM call, and read the status before each call:
import com.tnsai.quality.cost.*;
import java.math.BigDecimal;
import java.time.Duration;
CostBudgetStore store = new InMemoryCostBudgetStore();
CostBudgetEvaluator eval = new CostBudgetEvaluator(store);
CostBudget acmeDaily = CostBudget.tenantWide(
"acme",
new BigDecimal("50.00"), // $50/day cap
Duration.ofDays(1),
BudgetPolicy.HARD_STOP);
// After each captured LLMCallLog (e.g. from CapturingLLMClient),
// fold the cost into the store.
LLMCallLog log = ...; // from your LLMCallPublisher
store.record(
CostScope.tenant(log.context().tenantId().orElse("default")),
log.cost().totalUSD());
// Before the next call, check the budget.
BudgetState state = eval.evaluate(acmeDaily);
switch (state.status()) {
case HEALTHY -> { /* proceed */ }
case WARNING -> logger.warn("Budget at {}% — consider throttling",
(int)(state.fractionConsumed() * 100));
case EXHAUSTED -> throw new IllegalStateException(
"Tenant acme over $" + acmeDaily.periodBudgetUSD() + " daily cap");
}
Hierarchical scopes
Budgets nest: tenant-wide caps the whole tenant, per-agent caps a single agent inside it, per-capability caps a specific @Capability:
CostBudget tenantCap = CostBudget.tenantWide("acme",
new BigDecimal("500.00"), Duration.ofDays(1), BudgetPolicy.HARD_STOP);
CostBudget summarizerCap = CostBudget.forAgent("acme", "summarizer-agent",
new BigDecimal("50.00"), Duration.ofDays(1), BudgetPolicy.SOFT_WARN);
// Spend recorded at finest scope rolls up to coarser-scoped queries.
store.record(CostScope.full("acme", "summarizer-agent", "extractive-summary"),
new BigDecimal("0.42"));
// Both budgets see the $0.42: the tenant query rolls up across agents and
// capabilities; the agent query rolls up across that agent's capabilities.
eval.evaluate(tenantCap).spentUSD(); // $0.42
eval.evaluate(summarizerCap).spentUSD(); // $0.42
CostScope.contains(other) defines the rollup rule: every field the querying scope populates must match the recorded scope; fields it leaves absent act as wildcards.
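The rollup rule can be sketched in a few lines. This is a minimal illustration, not the framework's implementation: `ScopeRollupSketch` and its `Scope` record are hypothetical stand-ins for `CostScope`, and it assumes absent fields are modeled as empty `Optional`s.

```java
import java.util.Optional;

final class ScopeRollupSketch {

    // Hypothetical stand-in for CostScope; field names mirror the record above.
    record Scope(String tenantId, Optional<String> agentId, Optional<String> capabilityId) {

        // A query scope contains a recorded scope when every field the query
        // populates matches; fields the query leaves empty act as wildcards.
        boolean contains(Scope recorded) {
            return tenantId.equals(recorded.tenantId())
                    && matches(agentId, recorded.agentId())
                    && matches(capabilityId, recorded.capabilityId());
        }

        private static boolean matches(Optional<String> query, Optional<String> recorded) {
            return query.isEmpty() || query.equals(recorded);
        }
    }

    public static void main(String[] args) {
        Scope recorded = new Scope("acme",
                Optional.of("summarizer-agent"), Optional.of("extractive-summary"));
        Scope tenantQuery = new Scope("acme", Optional.empty(), Optional.empty());
        Scope agentQuery = new Scope("acme", Optional.of("summarizer-agent"), Optional.empty());
        Scope otherAgent = new Scope("acme", Optional.of("planner-agent"), Optional.empty());

        System.out.println(tenantQuery.contains(recorded)); // true
        System.out.println(agentQuery.contains(recorded));  // true
        System.out.println(otherAgent.contains(recorded));  // false
    }
}
```

Under this reading, a store answers `currentSpend` by summing every recorded entry whose scope is contained by the query scope.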
Status thresholds
BudgetState.Status follows the SRE 80% rule of thumb:
- HEALTHY: spend under 80% of the cap
- WARNING: 80% to 100% (raise the flag)
- EXHAUSTED: 100%+ (cap reached or breached)
Match thresholds to your alerting policy: WARNING typically becomes a Prometheus alert; EXHAUSTED triggers BudgetPolicy.HARD_STOP enforcement when you wire the enforcement hook.
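The thresholds translate into a small classification function. The sketch below is an assumption about how such a check could look, not the evaluator's actual code: `BudgetStatusSketch`, `classify`, and `fractionConsumed` are hypothetical names, and the 0.80 constant is taken from the rule of thumb above.

```java
import java.math.BigDecimal;
import java.math.MathContext;

final class BudgetStatusSketch {

    enum Status { HEALTHY, WARNING, EXHAUSTED }

    // Assumed warning threshold, taken from the 80% rule described above.
    private static final BigDecimal WARNING_FRACTION = new BigDecimal("0.80");

    // Fraction of the budget already spent, e.g. 0.5 for $25 of a $50 cap.
    static double fractionConsumed(BigDecimal spentUSD, BigDecimal budgetUSD) {
        requirePositive(budgetUSD);
        return spentUSD.divide(budgetUSD, MathContext.DECIMAL64).doubleValue();
    }

    // Compares in BigDecimal so the 80% and 100% boundaries are exact.
    static Status classify(BigDecimal spentUSD, BigDecimal budgetUSD) {
        requirePositive(budgetUSD);
        if (spentUSD.compareTo(budgetUSD) >= 0) {
            return Status.EXHAUSTED;
        }
        if (spentUSD.compareTo(budgetUSD.multiply(WARNING_FRACTION)) >= 0) {
            return Status.WARNING;
        }
        return Status.HEALTHY;
    }

    private static void requirePositive(BigDecimal budgetUSD) {
        if (budgetUSD.signum() <= 0) {
            throw new IllegalArgumentException("budget must be positive");
        }
    }

    public static void main(String[] args) {
        BigDecimal cap = new BigDecimal("50.00");
        System.out.println(classify(new BigDecimal("39.99"), cap)); // HEALTHY
        System.out.println(classify(new BigDecimal("40.00"), cap)); // WARNING
        System.out.println(classify(new BigDecimal("50.00"), cap)); // EXHAUSTED
    }
}
```

Doing the boundary comparison on `BigDecimal` values rather than on the `double` fraction avoids floating-point drift exactly at the 80% and 100% marks.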
Stores
InMemoryCostBudgetStore ships as the default — process-local, fine for dev / single-tenant. The SPI accepts custom backends:
- Redis: shared across multiple TnsAI server instances, so budgets are enforced globally
- Postgres: strong durability + queryable history for audit
- ClickHouse / DuckDB: analytical workloads where you also want time-series cost reports
A Redis adapter is planned as a focused follow-up (see Future work). The SPI is stable, so consumer-side adapters can be written today.
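To give a feel for what a custom backend involves, here is a minimal rolling-window store. It is a sketch under stated assumptions: the nested `Scope` and `Store` types are simplified, hypothetical copies of `CostScope` and `CostBudgetStore` (tenant-only, exact-match, no rollup), and the injected time source exists only to make the window testable.

```java
import java.math.BigDecimal;
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Supplier;

final class RollingWindowStoreSketch {

    // Hypothetical, simplified stand-ins for CostScope and the CostBudgetStore SPI.
    record Scope(String tenantId) {}

    interface Store {
        void record(Scope scope, BigDecimal amountUSD);
        BigDecimal currentSpend(Scope scope, Duration period);
        void clear();
    }

    // One timestamped ledger entry.
    record Entry(Scope scope, BigDecimal amountUSD, Instant at) {}

    static final class InMemoryStore implements Store {
        private final List<Entry> entries = new CopyOnWriteArrayList<>();
        private final Supplier<Instant> now; // injected so tests can control time

        InMemoryStore(Supplier<Instant> now) {
            this.now = now;
        }

        @Override
        public void record(Scope scope, BigDecimal amountUSD) {
            entries.add(new Entry(scope, amountUSD, now.get()));
        }

        // Sums entries for the scope recorded strictly after (now - period).
        @Override
        public BigDecimal currentSpend(Scope scope, Duration period) {
            Instant cutoff = now.get().minus(period);
            return entries.stream()
                    .filter(e -> e.scope().equals(scope) && e.at().isAfter(cutoff))
                    .map(Entry::amountUSD)
                    .reduce(BigDecimal.ZERO, BigDecimal::add);
        }

        @Override
        public void clear() {
            entries.clear();
        }
    }

    public static void main(String[] args) {
        InMemoryStore store = new InMemoryStore(Instant::now);
        Scope acme = new Scope("acme");
        store.record(acme, new BigDecimal("1.50"));
        store.record(acme, new BigDecimal("0.25"));
        System.out.println(store.currentSpend(acme, Duration.ofDays(1))); // 1.75
    }
}
```

A distributed backend would keep the same three operations but move the ledger into shared storage; this sketch never evicts old entries, which a real store would need to do.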
Enforcement
CostBudgetEnforcer is a Hook<PreAction> that reads recorded spend before every ActionType.LLM dispatch and applies the budget's onExceed policy. It runs at priority -100 (before normal-priority hooks) so a budget-blocked call never reaches argument-scrubbing or observability hooks that would do real work.
package com.tnsai.quality.cost;
CostBudgetStore store = new InMemoryCostBudgetStore(Duration.ofHours(24));
// Resolve the most-specific budget for an in-flight LLM call.
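// Optional.map yields an empty Optional when the tenant is absent or has no budget,
// and avoids calling get(null) on maps that reject null keys.
Function<PreAction, Optional<CostBudget>> resolver = event ->
event.context().tenantId().map(budgetByTenant::get);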
// Optional pre-call estimator. When zero (default), the enforcer fires
// after the cap is crossed by recorded spend; with a real estimator it
// can block borderline calls before they go out.
Function<PreAction, BigDecimal> estimator = event -> BigDecimal.ZERO;
CostBudgetEnforcer enforcer = new CostBudgetEnforcer(store, resolver, estimator);
// Wire alongside security/audit hooks via the standard dispatcher.
hookDispatcher.register(enforcer);
Policy outcomes:
- SOFT_WARN → log a warning and allow the call (budget is informational)
- HARD_STOP → return HookResult.Block with Severity.HIGH and a structured "budget exceeded" reason
- DEFER → also blocks (the framework owns no queue), but the block reason explicitly tells the consumer "DEFER policy requested but no framework queue is configured; consumer must catch and reschedule". Consumers typically catch it and route the call onto a consumer-owned background worker.
The enforcer's matcher restricts it to ActionType.LLM; tool / web-service / local-method dispatches never invoke it.
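The decision the enforcer makes can be illustrated without the hook machinery. The following is a hedged sketch, not `CostBudgetEnforcer` itself: the `Policy`, `Outcome`, and `Decision` types are hypothetical, and the exact comparison (recorded spend at or above the cap, or spend plus estimate above it) is an assumption drawn from the estimator comment above.

```java
import java.math.BigDecimal;

final class EnforcementDecisionSketch {

    enum Policy { SOFT_WARN, HARD_STOP, DEFER }

    enum Outcome { ALLOW, ALLOW_WITH_WARNING, BLOCK }

    // Hypothetical result type; the real enforcer returns a hook result instead.
    record Decision(Outcome outcome, String reason) {}

    static Decision decide(BigDecimal spentUSD, BigDecimal estimateUSD,
                           BigDecimal budgetUSD, Policy policy) {
        BigDecimal projected = spentUSD.add(estimateUSD);
        // Assumption: the budget counts as exceeded once recorded spend reaches
        // the cap, or when the estimated call would push spend past it.
        boolean exceeded = spentUSD.compareTo(budgetUSD) >= 0
                || projected.compareTo(budgetUSD) > 0;
        if (!exceeded) {
            return new Decision(Outcome.ALLOW, "within budget");
        }
        return switch (policy) {
            case SOFT_WARN -> new Decision(Outcome.ALLOW_WITH_WARNING,
                    "budget exceeded; SOFT_WARN lets the call through");
            case HARD_STOP -> new Decision(Outcome.BLOCK, "budget exceeded");
            case DEFER -> new Decision(Outcome.BLOCK,
                    "DEFER policy requested but no framework queue is configured; "
                            + "consumer must catch and reschedule");
        };
    }

    public static void main(String[] args) {
        BigDecimal cap = new BigDecimal("50.00");
        // A zero estimate only blocks after recorded spend has reached the cap.
        System.out.println(decide(new BigDecimal("49.90"), BigDecimal.ZERO,
                cap, Policy.HARD_STOP).outcome()); // ALLOW
        // A real estimate can block a borderline call before it goes out.
        System.out.println(decide(new BigDecimal("49.90"), new BigDecimal("0.20"),
                cap, Policy.HARD_STOP).outcome()); // BLOCK
    }
}
```

The two calls in `main` show why the estimator matters: with a zero estimate the same borderline call is allowed, and the overshoot is only noticed on the following call.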
Composing with sampling and redaction
All three observability decorators (SamplingAgentEventPublisher, RedactingAgentEventPublisher, custom LLMCallPublisher) compose freely. A typical production stack:
LLM call → CapturingLLMClient (LLMCallLog with cost)
↓
publish to LLMCallPublisher chain:
↓
Spend recorder → store.record(scope, cost)
↓
Sampling decorator → drop high-volume noise
↓
Redaction decorator → scrub PII before sink
↓
Final sink (LangFuse / Loki / S3 / Slack)
Cost recording happens before sampling so the spend ledger is always complete: sampling decisions are about log volume, not budget accuracy.
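The ordering argument is easy to demonstrate with plain decorators. This is an illustrative sketch only: `Publisher`, `CallLog`, and the three factory methods are hypothetical stand-ins, not the framework's `LLMCallPublisher`, `SamplingAgentEventPublisher`, or `RedactingAgentEventPublisher` types.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

final class PublisherChainSketch {

    // Hypothetical, trimmed-down call log: just enough to show the ordering.
    record CallLog(String tenantId, BigDecimal costUSD, String prompt) {}

    interface Publisher {
        void publish(CallLog log);
    }

    // Records spend first, then forwards every log unconditionally.
    static Publisher recording(List<BigDecimal> ledger, Publisher next) {
        return log -> {
            ledger.add(log.costUSD());
            next.publish(log);
        };
    }

    // Forwards only the logs the predicate keeps; dropped logs never reach the sink.
    static Publisher sampling(Predicate<CallLog> keep, Publisher next) {
        return log -> {
            if (keep.test(log)) {
                next.publish(log);
            }
        };
    }

    // Replaces the prompt with a placeholder before the log reaches the sink.
    static Publisher redacting(Publisher next) {
        return log -> next.publish(new CallLog(log.tenantId(), log.costUSD(), "[REDACTED]"));
    }

    public static void main(String[] args) {
        List<BigDecimal> ledger = new ArrayList<>();
        List<CallLog> sink = new ArrayList<>();
        Publisher chain = recording(ledger,
                sampling(log -> log.costUSD().compareTo(BigDecimal.ONE) >= 0,
                        redacting(sink::add)));

        chain.publish(new CallLog("acme", new BigDecimal("0.10"), "hello"));
        chain.publish(new CallLog("acme", new BigDecimal("2.00"), "summarize this"));

        System.out.println(ledger.size()); // 2
        System.out.println(sink.size());   // 1
    }
}
```

Because the recorder sits outermost, the ledger sees both calls even though sampling drops one of them before the sink.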
Future work
- RedisCostBudgetStore: distributed store for multi-process deployments
- Prometheus metrics derivation: tnsai_llm_cost_usd_total{tenant,agent,capability}, tnsai_llm_budget_utilized_ratio, tnsai_llm_budget_exceeded_total
- Grafana dashboard JSON bundled with the framework
- Prometheus Alertmanager rule examples (80% / 100% utilization)
- YAML budget config loader (avoids programmatic builder per tenant)
See Also
- LLM Observability: LLMCallLog is the source of cost data; without capture, there's nothing to govern
- Cost Tracking: older client-edge CostAwareLLMClient + BudgetManager system
- Sampling: companion observability decorator; sampling decisions never affect spend recording
- Redaction: pair cost-governance dashboards with redaction so spend reports don't leak prompt content