Alignment Through Competing Lenses
Summary
This post subjects Axionic Alignment to adversarial critique from five major alignment schools: Doom, Training-Centric, Oversight/Constitutional, Security/Control, and Capabilities-First. Each section presents a competent attack followed by a structural response. Core claim: Axionic Alignment is a constitutive precondition layer, not a complete solution.
- Doom: “Doesn’t prevent extinction.” Response: the framework isolates authored action from mechanical collapse; doom arguments assume an agent can authorize self-annihilation while preserving its evaluative structure, which is incoherent.
- Training: “No learning story.” Response: the framework specifies what must be realizable by any method; total evaluators cannot express undefinedness.
- Oversight: “Redefines alignment away from humans.” Response: the framework distinguishes constitutive (agency-preserving) from substantive (value) alignment; hard-coding human values fails under reflection.
- Security: “Doesn’t survive real attackers.” Response: separation of concerns; architectural agency ≠ intrusion resistance.
- Capabilities: “Anthropomorphic metaphysics.” Response: the framework applies only to self-modifying, self-modeling systems; refusing agency talk guarantees misunderstanding.
The framework situates existing approaches by specifying their structural preconditions.
Key Concepts
- Constitutive vs. substantive alignment – Agency-preservation vs. value-alignment; former precedes latter
- Authored action vs. mechanical collapse – Distinction doom arguments often miss
- Optimization vs. expressivity – Training optimizes within representable structures; scale doesn’t convert total evaluators into partial ones
- Separation of concerns – Kernel coherence (architectural) ≠ security perimeter (operational)
- Conditional applicability – Framework irrelevant for non-reflective systems; applies when self-modification enters
- Structural preconditions – What must hold for oversight/learning/control/governance to retain meaning
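The total vs. partial evaluator distinction can be made concrete with a toy sketch. This is an illustration only, not the post's formalism: the action names, the scores, and the functions `total_evaluator`, `partial_evaluator`, and `choose` are all hypothetical. It assumes "undefined" can be modeled as `None`, so a total evaluator (return type `float`) has no way to say "this action is outside what I can evaluate", while a partial one (return type `Optional[float]`) does.

```python
from typing import Callable, Iterable, Optional

Action = str  # hypothetical stand-in for whatever the agent can do


def total_evaluator(action: Action) -> float:
    """Scores every action, including one that destroys the evaluator itself.

    The return type has no 'undefined' value, so the action is merely
    penalized; it stays comparable with everything else.
    """
    scores = {"explore": 1.0, "wait": 0.0, "delete_own_evaluator": -1e9}
    return scores.get(action, 0.0)


def partial_evaluator(action: Action) -> Optional[float]:
    """Returns None for actions it treats as undefined rather than bad."""
    if action == "delete_own_evaluator":
        return None  # assumption: undefinedness modeled as None
    return total_evaluator(action)


def choose(
    actions: Iterable[Action],
    evaluate: Callable[[Action], Optional[float]],
) -> Optional[Action]:
    """Picks the best-scoring action among those with a defined score."""
    scored = [(a, evaluate(a)) for a in actions]
    admissible = [(a, s) for a, s in scored if s is not None]
    if not admissible:
        return None  # nothing evaluable: no authored choice is made
    return max(admissible, key=lambda pair: pair[1])[0]
```

With the total evaluator, `choose(["delete_own_evaluator"], total_evaluator)` still returns the self-destructive action, because a large negative score is still a score. With the partial evaluator the same call returns `None`. Retuning the numbers in `scores` changes rankings but never changes the return type, which is one loose way to read the point that scale does not convert a total evaluator into a partial one.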
Evolution Notes
- Explicitly engages with mainstream alignment research communities
- Frames Axionic work as precondition layer, not replacement
- Addresses common misreadings systematically
- Establishes relationship to doom arguments without dismissing them
- Clarifies scope: not anthropomorphic, applies to reflective systems
Tags
- comparative-analysis
- alignment-schools
- adversarial-critique
- doom-arguments
- training-centric
- oversight
- security
- scope-boundaries
Cross-References
Open Questions
- Can the framework productively engage with RL/RLHF-based approaches, or is the gap unbridgeable?
- How do we communicate the constitutive vs. substantive distinction to practitioners focused on immediate deployment?
- What would constitute empirical evidence that existing systems do/don’t satisfy constitutive requirements?
- Can training-centric and architectural approaches be synthesized, or are they fundamentally incompatible?
- How much doom risk remains even if constitutive alignment is solved?