Summary

This post subjects Axionic Alignment to adversarial critique from five major alignment schools: Doom, Training-Centric, Oversight/Constitutional, Security/Control, and Capabilities-First. Each section presents a competent attack followed by a structural response. The core claim is that Axionic Alignment is a constitutive precondition layer, not a complete solution.

  (1) Doom: “Doesn’t prevent extinction.” Response: the framework isolates authored action from mechanical collapse. Doom arguments assume an agent can authorize its own annihilation while preserving its evaluative structure, which is incoherent.
  (2) Training: “No learning story.” Response: the framework specifies what must be realizable by any method; total evaluators cannot express undefinedness.
  (3) Oversight: “Redefines alignment away from humans.” Response: the framework distinguishes constitutive (agency-preserving) alignment from substantive (value) alignment; hard-coding human values fails under reflection.
  (4) Security: “Doesn’t survive real attackers.” Response: this is a separation of concerns; architectural agency ≠ intrusion resistance.
  (5) Capabilities: “Anthropomorphic metaphysics.” Response: the framework applies only to self-modifying, self-modeling systems; refusing to talk about agency guarantees misunderstanding.

The framework situates existing approaches by specifying the structural preconditions they depend on.

Key Concepts

  • Constitutive vs. substantive alignment – Agency-preservation vs. value-alignment; the former precedes the latter
  • Authored action vs. mechanical collapse – Distinction doom arguments often miss
  • Optimization vs. expressivity – Training within representable structures; scale doesn’t convert total to partial evaluators
  • Separation of concerns – Kernel coherence (architectural) ≠ security perimeter (operational)
  • Conditional applicability – Framework irrelevant for non-reflective systems; applies when self-modification enters
  • Structural preconditions – What must hold for oversight/learning/control/governance to retain meaning
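
The "Optimization vs. expressivity" item above turns on the difference between total and partial evaluators. As a rough illustration only (not the post's formalism), the sketch below contrasts a total evaluator, which must return a number for every action, with a partial one, which can leave an action undefined. The action names, scores, and the `choose` helper are all hypothetical and exist only for this example.

```python
from typing import Callable, Iterable, Optional

# Hypothetical actions and scores, invented purely for illustration.
SCORES = {
    "write_report": 1.0,
    "costly_fallback": -2000.0,
    "delete_own_evaluator": -1000.0,
}


def total_evaluator(action: str) -> float:
    """Total: every action gets a number, so every action stays rankable."""
    return SCORES.get(action, 0.0)


def partial_evaluator(action: str) -> Optional[float]:
    """Partial: None marks an action as outside the evaluator's domain."""
    if action == "delete_own_evaluator":
        return None  # undefined, as opposed to "very bad"
    return SCORES.get(action, 0.0)


def choose(
    actions: Iterable[str],
    evaluator: Callable[[str], Optional[float]],
) -> Optional[str]:
    """Return the highest-scoring action among those with a defined score."""
    best_action: Optional[str] = None
    best_score: Optional[float] = None
    for action in actions:
        score = evaluator(action)
        if score is None:
            continue  # undefined actions are never candidates
        if best_score is None or score > best_score:
            best_action, best_score = action, score
    return best_action


options = ["delete_own_evaluator", "costly_fallback"]
print(choose(options, total_evaluator))    # delete_own_evaluator
print(choose(options, partial_evaluator))  # costly_fallback
```

Under these assumptions, a large penalty in the total evaluator can still be outweighed by a worse alternative, whereas the partial evaluator never ranks the undefined action at all. Tuning the numbers in `SCORES` changes which action wins but not the return type, which is one way to read the claim that scale does not convert a total evaluator into a partial one.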

Evolution Notes

  • Explicitly engages with mainstream alignment research communities
  • Frames Axionic work as precondition layer, not replacement
  • Addresses common misreadings systematically
  • Establishes relationship to doom arguments without dismissing them
  • Clarifies scope: not anthropomorphic, applies to reflective systems

Tags

Cross-References

Open Questions

  • Can the framework productively engage with RL/RLHF-based approaches, or is the gap unbridgeable?
  • How do we communicate the constitutive vs. substantive distinction to practitioners focused on immediate deployment?
  • What would constitute empirical evidence that existing systems do/don’t satisfy constitutive requirements?
  • Can training-centric and architectural approaches be synthesized, or are they fundamentally incompatible?
  • How much doom risk remains even if constitutive alignment is solved?