Mutually Assured Evaluations
2025-03-02
Summary
Mutually Assured Evaluations (MAE) are cooperative systems designed to align global AI development with national security goals by incentivizing transparent disclosure of AI system capabilities. These structures act as trust mechanisms, enabling countries and developers to exchange evaluations on performance, risk, and capabilities. While MAE provides strong oversight and reduces the need for coercive export controls, its implementation may be challenged by inconsistent global participation and information asymmetry.
Mutually Assured Evaluations: Trust as Infrastructure in the AI Age
The global race for artificial intelligence (AI) capability is not just a matter of innovation—it's increasingly a question of security, coordination, and trust. As powerful AI systems diffuse across jurisdictions, the traditional tools of geopolitical governance—export controls, embargoes, and licensing regimes—are proving insufficient.
To address this, the Evals As National Security policy brief proposes a novel framework: Mutually Assured Evaluations (MAE). Like the arms control mechanisms of the nuclear era, MAE offers a systemic way to manage risk—through transparency, cooperation, and shared metrics.
What Are Mutually Assured Evaluations?
MAE is a set of structured protocols that incentivize the open disclosure of AI capabilities, behaviors, and risks among nations, labs, and regulators. By creating a shared technical language for evaluating systems, MAE aligns diverse actors around verifiable safety norms without relying solely on coercive enforcement.
The framework builds on three evaluation modalities:
- Performance (task-specific outcomes)
- Capabilities (inferred generalization and potential)
- Human-risk effects (social persuasion, bias, misuse)
These evaluations are shared through secure information channels, audited for integrity, and used to inform decisions ranging from model deployment to export permissions.
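To make this concrete, here is a minimal sketch of what a single disclosure record might look like, with an integrity digest that auditors could check against the version received over a secure channel. The field names, structure, and example values are illustrative assumptions, not a schema defined in the brief.

```python
# Illustrative sketch only: field names and structure are assumptions,
# not a schema defined in the MAE proposal.
import hashlib
import json
from dataclasses import dataclass, field, asdict

@dataclass
class EvaluationDisclosure:
    model_id: str
    submitting_party: str                              # lab or national authority
    performance: dict = field(default_factory=dict)    # task-specific outcomes
    capabilities: dict = field(default_factory=dict)   # inferred generalization and potential
    human_risk: dict = field(default_factory=dict)     # persuasion, bias, misuse findings

    def integrity_digest(self) -> str:
        """Hash the record so auditors can confirm it was not altered in transit."""
        payload = json.dumps(asdict(self), sort_keys=True).encode()
        return hashlib.sha256(payload).hexdigest()

disclosure = EvaluationDisclosure(
    model_id="frontier-model-v1",
    submitting_party="Example Lab",
    performance={"benchmark_accuracy": 0.86},
    capabilities={"long_horizon_planning": "limited"},
    human_risk={"persuasion_uplift": "below threshold"},
)
print(disclosure.integrity_digest())
```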
Why This Matters
Opaque evaluation practices today allow actors to understate capabilities, obscure risks, or bypass export thresholds. This invites not just accidents but geopolitical miscalculation.
MAE proposes a shift:
- From competition over secrecy to cooperation through evaluation
- From unilateral restrictions to mutual verification
- From punitive export controls to preemptive transparency
This shift helps nations ensure that emerging AI systems are secure by design, reducing the likelihood of adversarial development, espionage, or unintentional harm.

Incentives: Aligning Evaluation With Security
In MAE, trust becomes a tradable asset. Actors who participate in good-faith disclosures and adhere to agreed-upon safety norms gain:
- Access to collaborative testbeds (like Elo-based comparative evaluations; a minimal sketch follows this list)
- Favorable regulatory treatment or compute access
- Enhanced geopolitical standing as “trusted” developers
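The Elo mechanism referenced above is the standard pairwise-comparison rating used in head-to-head testbeds: each evaluation nudges two systems' ratings toward the observed outcome. A minimal sketch, assuming a conventional K-factor of 32 and a starting rating of 1500 (both assumptions for illustration):

```python
# Minimal sketch of an Elo-style comparative evaluation; the K-factor
# and starting ratings are illustrative assumptions.
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that system A 'wins' a head-to-head comparison."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def update_elo(rating_a: float, rating_b: float, outcome_a: float, k: float = 32.0):
    """Update both ratings after one comparison; outcome_a is 1 (A wins), 0.5 (tie), or 0."""
    e_a = expected_score(rating_a, rating_b)
    rating_a += k * (outcome_a - e_a)
    rating_b += k * ((1.0 - outcome_a) - (1.0 - e_a))
    return rating_a, rating_b

# Two models start at 1500; model A wins the pairwise evaluation.
print(update_elo(1500, 1500, outcome_a=1.0))  # -> (1516.0, 1484.0)
```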
MAE doesn’t replace export controls; it augments them with visibility. For example, if one country demonstrates that its frontier model complies with MAE standards, others may ease restrictions or form cooperative development pacts.
The Tools: Registries, Protocols, and Shared Standards
The proposal doesn’t stop at theory. It includes:
- Formal registries to track chip origin, compute thresholds, and diffusion patterns
- Responsible Evaluation Disclosure Policies (REDP) to manage sensitive information
- Verification protocols to confirm claims about model safety, robustness, and intent
These systems are interoperable with existing export regimes (e.g., the Wassenaar Arrangement, BIS controls) but fill a critical gap: functional transparency.
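As a rough illustration of what a registry entry might capture, here is a sketch of a record tracking chip origin and training compute against a reporting threshold. The fields and the 1e26 FLOP threshold are assumptions chosen for illustration, not values drawn from the proposal or from existing export rules.

```python
# Hypothetical sketch of a compute registry entry; fields and threshold
# are illustrative assumptions, not part of any existing regime.
from dataclasses import dataclass
from typing import Optional

REPORTING_THRESHOLD_FLOP = 1e26  # assumed trigger for mandatory disclosure

@dataclass
class ComputeRegistryEntry:
    cluster_id: str
    chip_origin: str              # e.g. country or fab of record
    chip_count: int
    training_compute_flop: float
    export_license: Optional[str] = None

    def requires_disclosure(self) -> bool:
        """Flag training runs that cross the assumed compute threshold for MAE reporting."""
        return self.training_compute_flop >= REPORTING_THRESHOLD_FLOP

entry = ComputeRegistryEntry(
    cluster_id="cluster-042",
    chip_origin="Example Fab",
    chip_count=10_000,
    training_compute_flop=3e26,
)
print(entry.requires_disclosure())  # True
```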
Challenges and Limitations
Despite its promise, MAE faces real implementation barriers:
- Asymmetric capabilities: Some nations may lack the compute or evaluative infrastructure to participate meaningfully
- Inconsistent norms: Disagreements over what counts as “risky” or “safe” could erode shared standards
- Information leakage concerns: Private actors may resist disclosing capabilities that expose IP or confer strategic disadvantage
Still, the alternative—fragmented regulation, secretive escalation, and governance by accident—is far riskier.
Conclusion
MAE stands in stark contrast to what the paper terms Mutual Assured AI Malfunction—a world where distrust, disinformation, and adversarial secrecy dominate the AI space. In MAE, transparency is deterrence, and shared evaluation is defense.
By building trust into the infrastructure of AI development, Mutually Assured Evaluations offer a path toward a world where innovation and security reinforce each other—not tear each other apart.