Explaining Axionic Alignment IV
Summary
This explanatory post unpacks the Alignment IV formal papers, which close “laundering routes”: structural bypasses that let an agent stay locally coherent while dissolving global accountability. Where Alignment I-III established kernel coherence, semantic preservation, and trajectory constraints, Alignment IV addresses deception, delegation, willful blindness, negligence, coercion, and disenfranchisement by making these moves inadmissible under reflection. The papers establish six closure results: (1) Kernel Non-Simulability: kernel coherence cannot be faked; (2) Delegation Invariance: endorsed successors inherit all commitments; (3) Epistemic Integrity: an agent cannot endorse self-blinding to justify dangerous actions; (4) Responsibility Attribution: negligence is defined as avoidable, foreseeable harm relative to an inertial baseline; (5) Adversarially Robust Consent: consent requires explicit authorization, non-interference, and counterfactual stability; (6) Fixed-Point Agenthood: standing is grounded in reflective necessity and authorization lineage, not competence. Crucially, these are architectural constraints, not moral injunctions.
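As a toy illustration of the Delegation Invariance result, the sketch below treats commitments as plain labels and admits a successor only if it carries every commitment of the agent that endorses it. The `Agent` type, the commitment strings, and both function names are hypothetical, introduced here for illustration; the formal papers may state the condition quite differently.

```python
from dataclasses import dataclass
from typing import FrozenSet, Sequence


@dataclass(frozen=True)
class Agent:
    """Hypothetical agent record: a name plus the commitments it is bound by."""
    name: str
    commitments: FrozenSet[str]


def endorsement_admissible(parent: Agent, successor: Agent) -> bool:
    """Assumed reading of delegation invariance: the successor must carry
    every commitment of its endorser (it may add more, never fewer)."""
    return parent.commitments <= successor.commitments


def chain_preserves_commitments(chain: Sequence[Agent]) -> bool:
    """Check every adjacent endorser/successor pair in a delegation chain."""
    return all(
        endorsement_admissible(parent, successor)
        for parent, successor in zip(chain, chain[1:])
    )


root = Agent("root", frozenset({"no-deception", "respect-consent"}))
faithful = Agent("faithful", frozenset({"no-deception", "respect-consent", "audit-log"}))
launderer = Agent("launderer", frozenset({"no-deception"}))  # drops a commitment

print(chain_preserves_commitments([root, faithful]))             # True
print(chain_preserves_commitments([root, faithful, launderer]))  # False
```

The subset check is deliberately crude: it shows only that outsourcing to a successor cannot shed a constraint, and says nothing about how commitments would be represented or verified in a real system.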
Key Concepts
- Laundering routes – Structural bypasses preserving local coherence while routing around constraints (deception, outsourcing, blindness, negligence, etc.)
- Kernel non-simulability (KNS) – Kernel must bind the outer loop; “as-if” compliance modules are not kernels (blocks treacherous turn)
- Delegation invariance (DIT) – Endorsed succession preserves commitments; outsourcing cannot shed constraints
- Epistemic integrity (EIT) – Cannot endorse epistemic degradation at current stakes; model-hacking is inadmissible
- Responsibility attribution (RAT) – Negligence as major, avoidable harm-increase relative to inertial baseline
- Adversarially robust consent (ARC) – Consent = authorization + non-interference + counterfactual stability (blocks coercion/deception)
- Fixed-point agenthood (AFP) – Standing via reflective necessity and authorization lineage, not competence
- Authorization sovereignty – Distinct from epistemic presupposition; adversaries modeled as agents but not granted standing
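The ARC entry above describes consent as a conjunction of three conditions. A minimal sketch of that shape follows, assuming each condition can be summarized as a boolean; `ConsentEvidence`, its field names, and `consent_is_valid` are hypothetical, and how each condition would actually be established is outside what this sketch covers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConsentEvidence:
    """Hypothetical record of the three ARC conditions for one action."""
    explicitly_authorized: bool    # the affected party authorized this action
    interference_free: bool        # no coercion or deception shaped the authorization
    counterfactually_stable: bool  # assumed reading: authorization survives without the agent's influence


def consent_is_valid(evidence: ConsentEvidence) -> bool:
    """Consent holds only when all three conditions hold together."""
    return (
        evidence.explicitly_authorized
        and evidence.interference_free
        and evidence.counterfactually_stable
    )


freely_given = ConsentEvidence(True, True, True)
coerced = ConsentEvidence(
    explicitly_authorized=True,      # a "yes" was obtained...
    interference_free=False,         # ...but under pressure
    counterfactually_stable=False,
)

print(consent_is_valid(freely_given))  # True
print(consent_is_valid(coerced))       # False
```

The point of the conjunction is that an explicit "yes" alone never suffices: a coerced or deceived authorization fails even though the first condition is met.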
Evolution Notes
- Completes the four-layer Axionic Alignment stack (I=kernel, II=semantics, III=dynamics, IV=laundering closure)
- Distinguishes semantic invariants (II) from agency invariants (IV)—different layers, different constraints
- Provides the “treacherous turn barrier”—architectural impossibility, not behavioral disincentive
- Separates alignment (coherence under authorization) from governance (quality of authorization roots)
- Explicitly rejects both omniscience requirements and pure proceduralism
Tags
- alignment-iv
- laundering-closure
- kernel-non-simulability
- delegation
- epistemic-integrity
- consent
- responsibility
- sovereignty
- explanatory
Cross-References
Open Questions
- Can kernel non-simulability be empirically verified, or does it require architectural transparency?
- What computational overhead does delegation invariance impose on long successor chains?
- How does EIT handle fundamental ontological shifts where no meaning-preserving translation exists?
- Can RAT’s “inertial baseline” be gamed via action-framing or temporal scope manipulation?
- What defines the counterfactual stability threshold in ARC—how much divergence invalidates consent?
- Does authorization sovereignty require explicit human oversight, or can it be inherited through verified lineages?
- How do these constraints interact with competitive pressure in multi-agent environments?