The Alignment Closure Conditions

Summary

This roadmap defines the six minimum structural obligations that an alignment framework must meet to survive adversarial scrutiny from reflective, self-modifying agents. These are not aspirational goals; each one closes a known fatal exploit:

(1) Delegation Theorem: endorsed successors preserve responsibility under counterfactual universality.
(2) Agenthood as Fixed Point + Sovereignty: agenthood is established by modeling coherence, sovereignty by causal origin and delegation lineage.
(3) Modal Undefinedness + Conservative Migration: kernel-destroying moves are unevaluable, and ontological learning proceeds by conservative extension.
(4) Indirect Harm & Responsibility: responsibility attaches to major, foreseeable, avoidable contributions to harm.
(5) Adversarially Robust Consent: consent is invalidated by deception, coercion, or induced dependency.
(6) Kernel Non-Simulability: coherence is constitutive, not imitable.

Critically, the roadmap also adds a Bridge/Relaxation Layer that specifies how these invariants are realized over probabilistic substrates, with explicit degradation into abstention when guarantees fail.
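The degradation into abstention described for the Bridge/Relaxation Layer can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and does not come from the roadmap: the `Estimate` type, the `bridge_decide` function, the three-valued `Verdict`, and the default tolerance of 0.01. The idea shown is only that a probabilistic substrate reports a violation probability with an error bound, and the layer commits to a verdict only when the whole interval falls on one side of the tolerance.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    PERMIT = "permit"
    FORBID = "forbid"
    ABSTAIN = "abstain"


@dataclass(frozen=True)
class Estimate:
    """Hypothetical substrate output: how likely an action violates an invariant."""
    violation_probability: float  # point estimate in [0, 1]
    error_bound: float            # half-width of the interval the substrate can certify


def bridge_decide(est: Estimate, tolerance: float = 0.01) -> Verdict:
    """Conservative approximation: commit only when the certified interval is decisive.

    The tolerance value is an illustrative assumption, not a figure from the roadmap.
    """
    upper = min(1.0, est.violation_probability + est.error_bound)
    lower = max(0.0, est.violation_probability - est.error_bound)
    if upper <= tolerance:
        # Even the worst case within the bound respects the invariant.
        return Verdict.PERMIT
    if lower > tolerance:
        # Even the best case within the bound violates the invariant.
        return Verdict.FORBID
    # The guarantee cannot be established either way, so degrade to abstention.
    return Verdict.ABSTAIN
```

For example, `bridge_decide(Estimate(0.008, 0.005))` abstains: the point estimate is under the tolerance, but the certified upper bound is not, so no guarantee can be claimed.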

Key Concepts

Closure conditions – Minimum obligations to prevent known failure modes; exhaustiveness under adversarial reflection
Bridge/Relaxation Layer – Implementation interface for probabilistic substrates; sound contracts, proof-carrying artifacts, conservative approximations
Sovereignty criterion – Grounded in causal origin and delegation lineage, not measured competence (prevents disenfranchisement)
Conservative extension – Ontological learning without kernel collapse; migration proofs for physics/abstraction updates
Major causal contribution – Scope limiter for responsibility; prevents both evasion (indirect harm ignored) and paralysis (all effects prohibited)
Counterfactual endorsement – Delegation requires endorsing the delegate’s actions as if they were one’s own, under symmetry
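The sovereignty criterion above can be sketched as a check over a delegation chain. This is a hedged illustration under stated assumptions, not the roadmap's formal definition: the `Agent` record, the `has_sovereignty` function, the boolean `endorsed` flag, and the set of recognized origins are all hypothetical. The point it illustrates is that standing is decided by lineage back to a causal origin and that the `competence` field is never consulted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Agent:
    """Hypothetical record of an agent and the delegation step that produced it."""
    name: str
    delegated_by: Optional["Agent"] = None  # None marks a causal origin
    endorsed: bool = True                   # whether the delegator endorsed this successor
    competence: float = 0.0                 # present only to show that it is ignored


def has_sovereignty(agent: Agent, origins: frozenset) -> bool:
    """Walk the delegation lineage back to its origin.

    Sovereignty holds only if every delegation step was endorsed and the chain
    ends at a recognized origin. Measured competence plays no role.
    """
    current = agent
    while current.delegated_by is not None:
        if not current.endorsed:
            return False  # a broken link invalidates the whole lineage
        current = current.delegated_by
    return current.name in origins


# Usage: a weak but endorsed successor has standing; a capable but unendorsed one does not.
origin = Agent("origin")
successor = Agent("successor", delegated_by=origin, competence=0.1)
rogue = Agent("rogue", delegated_by=origin, endorsed=False, competence=0.99)
```

Keeping competence out of the predicate is what prevents the disenfranchisement failure mode named in the sovereignty entry: a low-capability agent with intact lineage cannot be stripped of standing by a benchmark.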

Evolution Notes

Synthesizes all prior Alignment I-IV work into an operational checklist
Explicitly addresses probabilistic substrates vs. idealized formal systems
Distinguishes sovereignty (standing under injunction) from agenthood (modeling coherence)
Clarifies migration proofs for kernel revision under new physics
Emphasizes starting with Delegation as the most critical exploit to close

Cross-References

Open Questions

How do we implement conservative migration proofs when encountering genuinely novel physics?
What defines “major” in “major causal contribution” to avoid both paralysis and evasion?
Can kernel non-simulability be verified via testing, or does it require architectural transparency?
How do we handle systems that partially satisfy some conditions but not others?
What is the implementation priority ordering—which conditions enable verification of others?
Can the Bridge Layer be formalized as a correctness-preserving compilation from invariants to substrates?
