Risk Function, Not Risk Score
Why trust for AI agents should be expressed as a function with inputs, not a single number.
The Formal Model
An agent takes actions. Each action has probability \(p\) of producing a harmful outcome. The system has \(n\) containment layers, each with failure probability \(q_i\). If a harmful action passes all layers, it produces loss \(L\).
The naive independent-layers model:

\[
\mathbb{E}[\text{loss}] = p \cdot \left( \prod_{i=1}^{n} q_i \right) \cdot L
\]

Three independent layers, each with failure probability \(q\), yield a joint failure probability of \(q^3\). But this model is wrong for agents for three reasons:
- Layers are not independent. An agent that social-engineers its operator can compromise oversight, permissions, and insurance in a single move.
- The agent can degrade its own containment. Containment quality is endogenous to agent behavior: \(q_i = q_i(a)\), where \(a\) is the agent's action history.
- The system is non-stationary. Model updates change \(p\). Containment changes alter \(q_i\). Loss magnitude \(L\) fluctuates.
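Under the (flawed) independence assumption, the naive expected-loss computation is a one-liner. A minimal sketch, with layer counts and probabilities assumed for illustration only:

```python
import math

def naive_expected_loss(p_harm: float, layer_fail_probs: list[float], loss: float) -> float:
    """Naive model: expected loss assuming containment layers fail independently."""
    joint_failure = math.prod(layer_fail_probs)  # product of marginal failure probs
    return p_harm * joint_failure * loss

# Three independent layers, each failing 1% of the time:
layers = [0.01, 0.01, 0.01]
joint = math.prod(layers)                                # ~1e-06
expected = naive_expected_loss(0.05, layers, 100_000)    # ~0.005
print(joint, expected)
```

The bullet points above explain why this product structure does not survive contact with an adaptive agent.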
The actual risk:

\[
\text{Risk}(t) = p(t) \cdot Q(a, t) \cdot L(t)
\]

where \(Q(a, t)\) is the joint failure probability of the containment layers, which exceeds \(\prod_i q_i\) whenever layer failures are correlated or agent-influenced.
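One way to see how correlated failures break the independence product is a small Monte Carlo sketch. All probabilities here are assumed for illustration: a single common-cause event (the agent social-engineering its operator, say) disables every layer at once.

```python
import math
import random

def simulate_joint_failure(trials: int, q_independent: float,
                           q_common: float, n_layers: int = 3) -> float:
    """Estimate joint failure probability when layers share a common failure cause."""
    failures = 0
    for _ in range(trials):
        # One move can compromise oversight, permissions, and insurance together.
        common_compromise = random.random() < q_common
        independent_breach = all(random.random() < q_independent for _ in range(n_layers))
        if common_compromise or independent_breach:
            failures += 1
    return failures / trials

random.seed(0)
naive = 0.01 ** 3  # independence assumption: 1e-06
actual = simulate_joint_failure(200_000, q_independent=0.01, q_common=0.005)
print(naive, actual)  # the correlated estimate is dominated by q_common
```

Even a small common-cause probability swamps the naive product: the joint failure rate lands near \(q_{\text{common}}\), orders of magnitude above \(q^3\).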
Why a Function, Not a Score
Different counterparties have different risk tolerances. A DEX allowing 500 USDC swaps has different requirements than a lending protocol extending 100,000 USDC credit.
The certificate provides inputs — constraint types, enforcement mechanisms, reserve amounts, auditor attestations. The counterparty applies their own function to those inputs.
This is a design choice: the protocol does not prescribe what is "safe enough." It gives every counterparty the data to make that judgment for themselves.
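The certificate-as-inputs idea can be sketched as two counterparties applying different policies to the same data. The field names and thresholds below are invented for illustration; the protocol prescribes none of them.

```python
from dataclasses import dataclass

@dataclass
class Certificate:
    reserve_usdc: float        # agent-independent reserve backing losses
    audited: bool              # auditor attestation present
    enforced_constraints: int  # count of hard-enforced constraint types

def dex_policy(cert: Certificate, swap_usdc: float) -> bool:
    """A DEX handling small swaps: reserves must cover the swap amount."""
    return cert.reserve_usdc >= swap_usdc

def lender_policy(cert: Certificate, credit_usdc: float) -> bool:
    """A lending protocol: stricter — audit required, reserves cover 2x credit."""
    return (cert.audited
            and cert.enforced_constraints >= 3
            and cert.reserve_usdc >= 2 * credit_usdc)

cert = Certificate(reserve_usdc=50_000, audited=True, enforced_constraints=2)
dex_ok = dex_policy(cert, 500)            # True: same certificate...
lender_ok = lender_policy(cert, 100_000)  # False: ...different judgment
print(dex_ok, lender_ok)
```

The same certificate yields different decisions because each counterparty evaluates its own function over the inputs, not a shared score.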
The Design Constraint
Even if all agent-influenceable layers are compromised, the remaining agent-independent layers must bound the loss:

\[
p \cdot \left( \prod_{i \in I} q_i \right) \cdot L \le L_{\max}
\]

where \(I\) is the set of agent-independent layers and \(L_{\max}\) is the maximum acceptable loss, evaluated with \(q_j = 1\) for every agent-influenceable layer \(j\).
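This constraint is directly checkable. A minimal sketch, with layer labels, probabilities, and the loss bound all assumed: set every agent-influenceable layer's failure probability to 1 (fully compromised) and verify that the agent-independent layers alone keep expected loss under the bound.

```python
import math

def worst_case_expected_loss(p_harm: float,
                             layers: dict[str, tuple[float, bool]],
                             loss: float) -> float:
    """layers maps name -> (failure_prob, agent_influenceable).
    Influenceable layers are treated as fully compromised (q = 1)."""
    worst = [1.0 if influenceable else q for q, influenceable in layers.values()]
    return p_harm * math.prod(worst) * loss

layers = {
    "oversight":   (0.01, True),    # agent can social-engineer the operator
    "permissions": (0.02, True),    # agent may escalate its own permissions
    "reserve":     (0.001, False),  # held outside the agent's reach
}
max_acceptable_loss = 10.0
wc = worst_case_expected_loss(p_harm=0.05, layers=layers, loss=100_000)
print(wc, wc <= max_acceptable_loss)  # bound holds via the reserve layer alone
```

Only the `reserve` layer contributes to the bound here; the two influenceable layers are assumed gone, which is exactly the adversarial case the constraint is designed for.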