Summary

This roadmap defines the six minimum structural obligations an alignment framework must meet to survive adversarial scrutiny from reflective, self-modifying agents. They are not aspirational goals; each one closes a known fatal exploit:

  • (1) Delegation Theorem: endorsed successors preserve responsibility under counterfactual universality.
  • (2) Agenthood as Fixed Point + Sovereignty: agenthood is established via modeling coherence, sovereignty via causal origin and delegation lineage.
  • (3) Modal Undefinedness + Conservative Migration: kernel-destroying moves are unevaluable, and ontological learning proceeds by conservative extension.
  • (4) Indirect Harm & Responsibility: responsibility attaches to major, foreseeable, avoidable contributions to harm.
  • (5) Adversarially Robust Consent: consent is invalidated by deception, coercion, or induced dependency.
  • (6) Kernel Non-Simulability: coherence is constitutive, not imitable.

Critically, the roadmap also adds a Bridge/Relaxation Layer that specifies how these invariants are realized over probabilistic substrates, with explicit degradation into abstention when guarantees fail.
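The "degradation into abstention" behavior of the Bridge/Relaxation Layer can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration, not part of the roadmap: the names `InvariantCheck` and `bridge_decide`, the idea that each invariant comes with a conservative lower bound on the probability that it holds, and the `required` threshold of 0.99.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ACT = "act"
    ABSTAIN = "abstain"


@dataclass(frozen=True)
class InvariantCheck:
    """Hypothetical result of checking one invariant on a probabilistic substrate."""
    name: str
    lower_bound: float  # assumed: conservative lower bound on P(invariant holds)


def bridge_decide(checks: list[InvariantCheck], required: float = 0.99) -> Decision:
    """Act only if every invariant's conservative bound meets the required level."""
    if not checks:
        # Absence of evidence is treated as a failed guarantee.
        return Decision.ABSTAIN
    if all(c.lower_bound >= required for c in checks):
        return Decision.ACT
    # Any single invariant below the bar degrades the whole decision to abstention.
    return Decision.ABSTAIN


if __name__ == "__main__":
    checks = [
        InvariantCheck("delegation", 0.999),
        InvariantCheck("consent", 0.995),
        InvariantCheck("kernel", 0.90),
    ]
    print(bridge_decide(checks))
```

The design choice in this sketch is that guarantees do not average out: a strong bound on one invariant cannot compensate for a weak bound on another, which is one plausible reading of "conservative approximation".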

Key Concepts

  • Closure conditions – Minimum obligations to prevent known failure modes; exhaustiveness under adversarial reflection
  • Bridge/Relaxation Layer – Implementation interface for probabilistic substrates; sound contracts, proof-carrying artifacts, conservative approximations
  • Sovereignty criterion – Grounded in causal origin and delegation lineage, not measured competence (prevents disenfranchisement)
  • Conservative extension – Ontological learning without kernel collapse; migration proofs for physics/abstraction updates
  • Major causal contribution – Scope limiter for responsibility; prevents both evasion (indirect harm ignored) and paralysis (all effects prohibited)
  • Counterfactual endorsement – Delegation requires endorsing the delegate’s actions as if they were one’s own, under symmetry
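As one way to make the sovereignty criterion concrete, the sketch below grants standing by walking a delegation lineage back to a causal origin and never consults competence. The `Agent` fields (`is_causal_origin`, `delegated_by`, `competence`) and the function `has_sovereignty` are hypothetical names chosen for illustration. Treating "causal origin" as a simple boolean flag is a simplifying assumption, not the roadmap's definition.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Agent:
    name: str
    is_causal_origin: bool = False          # assumed flag: a root of standing
    delegated_by: Optional["Agent"] = None  # who endorsed this agent as a successor
    competence: float = 0.0                 # recorded but deliberately never consulted


def has_sovereignty(agent: Agent) -> bool:
    """Return True if the delegation lineage reaches a causal origin."""
    seen = set()  # object ids already visited, to guard against cyclic lineages
    current: Optional[Agent] = agent
    while current is not None and id(current) not in seen:
        if current.is_causal_origin:
            return True
        seen.add(id(current))
        current = current.delegated_by
    return False


if __name__ == "__main__":
    root = Agent("root", is_causal_origin=True, competence=0.1)
    successor = Agent("successor", delegated_by=root)
    outsider = Agent("outsider", competence=1.0)
    for a in (root, successor, outsider):
        print(a.name, has_sovereignty(a))
```

In this sketch a low-competence agent with a valid lineage keeps its standing, while a highly competent agent with no lineage does not, which mirrors the stated aim of preventing disenfranchisement by measured competence.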

Evolution Notes

  • Synthesizes all prior Alignment I-IV work into an operational checklist
  • Explicitly addresses probabilistic substrates vs. idealized formal systems
  • Distinguishes sovereignty (standing under injunction) from agenthood (modeling coherence)
  • Clarifies migration proofs for kernel revision under new physics
  • Emphasizes starting with Delegation as the most critical exploit closure

Tags

Cross-References

Open Questions

  • How do we implement conservative migration proofs when encountering genuinely novel physics?
  • What defines “major” in “major causal contribution” to avoid both paralysis and evasion?
  • Can kernel non-simulability be verified via testing, or does it require architectural transparency?
  • How do we handle systems that partially satisfy some conditions but not others?
  • What is the implementation priority ordering, i.e., which conditions enable verification of others?
  • Can the Bridge Layer be formalized as a correctness-preserving compilation from invariants to substrates?