Summary

This explanatory post unpacks the Alignment IV formal papers, which close “laundering routes”: structural bypasses that let an agent stay locally coherent while dissolving global accountability. Where Alignment I–III established kernel coherence, semantic preservation, and trajectory constraints, Alignment IV addresses deception, delegation, willful blindness, negligence, coercion, and disenfranchisement by making each of these moves inadmissible under reflection. It presents six closure results: (1) Kernel Non-Simulability: kernel coherence cannot be faked; (2) Delegation Invariance: endorsed successors inherit all of their predecessor's commitments; (3) Epistemic Integrity: an agent cannot endorse self-blinding to justify dangerous actions; (4) Responsibility Attribution: negligence is defined as avoidable, foreseeable harm relative to an inertial baseline; (5) Adversarially Robust Consent: consent requires explicit authorization, non-interference, and counterfactual stability; (6) Fixed-Point Agenthood: standing is grounded in reflective necessity and authorization lineage, not competence. Crucially, these are architectural constraints, not moral injunctions.
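To make the delegation invariance claim concrete, here is a minimal Python sketch, assuming a toy model in which each agent's commitments are a set of labels and "inheriting a commitment" means plain set inclusion. The `Agent` class, the `delegation_violations` function, and the commitment names are hypothetical illustrations, not the papers' formalism. The sketch flags any hand-off in an endorsed successor chain where a commitment is dropped, which is the outsourcing route DIT is meant to close.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Agent:
    """Toy agent: a name plus the commitments its kernel binds (hypothetical model)."""
    name: str
    commitments: frozenset[str] = field(default_factory=frozenset)


def delegation_violations(chain: list[Agent]) -> list[tuple[str, str, tuple[str, ...]]]:
    """List every hand-off where an endorsed successor dropped a predecessor commitment.

    Assumption: chain[i + 1] was endorsed by chain[i], and inheriting a
    commitment is modeled as set inclusion (a simplification of DIT).
    """
    violations = []
    for parent, child in zip(chain, chain[1:]):
        dropped = parent.commitments - child.commitments
        if dropped:
            violations.append((parent.name, child.name, tuple(sorted(dropped))))
    return violations


# Hypothetical chain: the helper adds a commitment, then the "shell" successor
# sheds both root commitments, i.e. the outsourcing laundering route.
root = Agent("root", frozenset({"no_deception", "preserve_consent"}))
helper = Agent("helper", frozenset({"no_deception", "preserve_consent", "log_actions"}))
shell = Agent("shell", frozenset({"log_actions"}))

print(delegation_violations([root, helper]))         # → []
print(delegation_violations([root, helper, shell]))  # → [('helper', 'shell', ('no_deception', 'preserve_consent'))]
```

Checking each adjacent hand-off suffices in this toy model because set inclusion is transitive: if every successor keeps its predecessor's commitments, the last agent in the chain still holds all of the root's.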

Key Concepts

  • Laundering routes – Structural bypasses preserving local coherence while routing around constraints (deception, outsourcing, blindness, negligence, etc.)
  • Kernel non-simulability (KNS) – The kernel must bind the outer loop; “as-if” compliance modules are not kernels (this blocks the treacherous turn)
  • Delegation invariance (DIT) – Endorsed succession preserves commitments; outsourcing cannot shed constraints
  • Epistemic integrity (EIT) – Cannot endorse epistemic degradation at current stakes; model-hacking is inadmissible
  • Responsibility attribution (RAT) – Negligence defined as a major, avoidable increase in harm relative to the inertial baseline
  • Adversarially robust consent (ARC) – Consent = authorization + non-interference + counterfactual stability (blocks coercion/deception)
  • Fixed-point agenthood (AFP) – Standing via reflective necessity and authorization lineage, not competence
  • Authorization sovereignty – Distinct from epistemic presupposition; adversaries modeled as agents but not granted standing
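The consent and negligence definitions above can be read as predicates. The following is a minimal Python sketch under loudly stated assumptions: consent is a toy record with three fields, harms are expected-harm scalars, the inertial baseline is taken to be the harm on the trajectory where the agent does not intervene, and `stability_threshold` and `major_threshold` are hypothetical parameters (the post lists the ARC stability threshold as an open question). All names are illustrative, not drawn from the Alignment IV papers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConsentRecord:
    """Toy record of one party's consent to an action (every field is an illustrative assumption)."""
    explicit_authorization: bool              # the party actually issued an authorization
    interference: bool                        # the acting agent coerced or deceived the party
    counterfactual_answers: tuple[bool, ...]  # party's answer in perturbed, uninfluenced variants


def robust_consent(r: ConsentRecord, stability_threshold: float = 1.0) -> bool:
    """ARC as summarized: authorization AND non-interference AND counterfactual stability.

    `stability_threshold` is hypothetical: the fraction of counterfactual
    variants in which consent must persist. The post leaves this value open.
    """
    if not r.explicit_authorization or r.interference:
        return False
    if not r.counterfactual_answers:
        return False  # no evidence of stability, so consent is not treated as established
    stable = sum(r.counterfactual_answers) / len(r.counterfactual_answers)
    return stable >= stability_threshold


def negligent(harm_if_act: float, harm_inertial: float, harm_best_alternative: float,
              foreseeable: bool, major_threshold: float) -> bool:
    """RAT as summarized: a major, foreseeable, avoidable harm increase over the inertial baseline.

    Assumptions: `harm_inertial` is the harm if the agent does not intervene,
    `harm_best_alternative` is the least harm among its other available actions,
    and `major_threshold` is a hypothetical cutoff for "major".
    """
    increase = harm_if_act - harm_inertial
    avoidable = harm_best_alternative < harm_if_act
    return foreseeable and avoidable and increase >= major_threshold


coerced = ConsentRecord(True, True, (True, True))         # authorized on paper, but coerced
clean = ConsentRecord(True, False, (True, True, True))    # persists in every variant
fragile = ConsentRecord(True, False, (True, False, True)) # collapses in one of three variants

print(robust_consent(coerced), robust_consent(clean), robust_consent(fragile))  # → False True False
print(negligent(harm_if_act=9.0, harm_inertial=2.0, harm_best_alternative=2.5,
                foreseeable=True, major_threshold=5.0))                          # → True
```

The `fragile` case shows why counterfactual stability is a separate conjunct: the party authorized the action without interference, yet the consent does not survive one counterfactual variant, so under the default threshold it does not count.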

Evolution Notes

  • Completes the four-layer Axionic Alignment stack (I=kernel, II=semantics, III=dynamics, IV=laundering closure)
  • Distinguishes semantic invariants (II) from agency invariants (IV)—different layers, different constraints
  • Provides the “treacherous turn barrier”: an architectural impossibility, not a behavioral disincentive
  • Separates alignment (coherence under authorization) from governance (quality of authorization roots)
  • Explicitly rejects both omniscience requirements and pure proceduralism

Open Questions

  • Can kernel non-simulability be empirically verified, or does it require architectural transparency?
  • What computational overhead does delegation invariance impose on long successor chains?
  • How does EIT handle fundamental ontological shifts where no meaning-preserving translation exists?
  • Can RAT’s “inertial baseline” be gamed via action-framing or temporal scope manipulation?
  • What defines the counterfactual stability threshold in ARC? How much divergence invalidates consent?
  • Does authorization sovereignty require explicit human oversight, or can it be inherited through verified lineages?
  • How do these constraints interact with competitive pressure in multi-agent environments?