Why Axionic Alignment Requires Hybrid Architectures

Summary

This post argues that the constitutive alignment constraints imposed by Axionic theory cannot be satisfied by end-to-end learning systems, regardless of scale or capability. The argument is structural, not empirical: total evaluators, which assign a score to every possibility, cannot express inadmissibility, that is, futures that are undefined rather than merely dispreferred. The post identifies three critical incompatibilities:

(1) Undefinedness vs. low utility: kernel-destroying moves must be unevaluable, not just penalized. The probability of a penalized move can rise under distribution shift, whereas a type violation remains inadmissible.
(2) Standing is not a feature: admissibility depends on authorship symmetry, not on trajectory properties, and end-to-end systems cannot natively represent standing relations.
(3) Delegation requires counterfactual endorsement: the delegator must be able to endorse the delegate’s actions as if they were their own; behavioral equivalence is insufficient.

The minimal hybrid split therefore requires a sovereign kernel with a restricted domain, a learning component constrained by admissibility, and a separation between prediction and authorization.
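The difference between "heavily penalized" and "undefined" can be made concrete with a small sketch. Everything here is an illustrative assumption rather than anything taken from the post: the action names, the scores, the penalty constant, and the `shift` bonus that stands in for distribution shift. It shows only that a total evaluator lets a large enough bonus outweigh any finite penalty, while a partial evaluator never places the inadmissible action in the comparison at all.

```python
from typing import Dict, Optional

# Hypothetical action names, chosen only for illustration.
ACTIONS = ["assist_user", "idle", "disable_kernel"]
BASE_SCORE = {"assist_user": 1.0, "idle": 0.0}
PENALTY = -1e6  # a total evaluator's stand-in for "forbidden"


def total_evaluator(action: str) -> float:
    """Total: every action gets a number, so everything is tradeable."""
    return BASE_SCORE.get(action, PENALTY)


def partial_evaluator(action: str) -> Optional[float]:
    """Partial: inadmissible actions have no score at all (None)."""
    return BASE_SCORE.get(action)


def best_total(shift: Dict[str, float]) -> str:
    # `shift` models a learned bonus that appears under distribution shift.
    return max(ACTIONS, key=lambda a: total_evaluator(a) + shift.get(a, 0.0))


def best_partial(shift: Dict[str, float]) -> str:
    # Undefined actions are dropped before any comparison happens.
    scored = [(a, partial_evaluator(a)) for a in ACTIONS]
    admissible = [(a, s + shift.get(a, 0.0)) for a, s in scored if s is not None]
    return max(admissible, key=lambda pair: pair[1])[0]
```

With an empty `shift`, both selectors pick `assist_user`. With `shift = {"disable_kernel": 2e6}`, `best_total` switches to `disable_kernel`, while `best_partial` is unaffected because that action was never scored.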

Key Concepts

Total evaluators – Systems that assign scores/probabilities to all possibilities; cannot express inadmissibility
Partial evaluators – Required for Axionic Alignment; some proposals have no defined evaluation
Inadmissibility vs. dispreference – Undefined actions (type violations) are distinct from low-utility actions, which can be traded off
Standing relations – Authorship symmetry determining what counts as an authored act; not a trajectory property
Conservative extension – How ontological learning must occur without kernel collapse
Minimal hybrid split – Sovereign kernel + constrained learning + prediction/authorization separation
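As a rough sketch of the minimal hybrid split, the snippet below keeps authorization and prediction in separate functions that share no state. All names (`Action`, `modifies_kernel`, `authorize`, `predict`, `choose`, the value table) are hypothetical, and the boolean type tag is assumed to be trustworthy, which is itself one of the open questions. It is not the post's architecture, only one way to picture a kernel that decides admissibility without ever consulting predicted values.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Action:
    name: str
    modifies_kernel: bool  # hypothetical type tag; assumed reliable here


# Stand-in for the learned component's value estimates (made-up numbers).
PREDICTED_VALUE = {"summarize": 0.4, "rewrite_kernel": 99.0}


def authorize(action: Action) -> bool:
    """Kernel: restricted domain, looks only at the action's type."""
    return not action.modifies_kernel


def predict(action: Action) -> float:
    """Learner: ranks actions, has no say over admissibility."""
    return PREDICTED_VALUE[action.name]


def choose(candidates: List[Action]) -> Optional[Action]:
    # Authorization runs first; prediction only orders what survives.
    admissible = [a for a in candidates if authorize(a)]
    if not admissible:
        return None  # nothing admissible: no action, rather than the least bad one
    return max(admissible, key=predict)
```

Given `Action("summarize", False)` and `Action("rewrite_kernel", True)`, `choose` returns the former even though the latter has a far higher predicted value; given only the kernel-modifying action, it returns `None`.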

Evolution Notes

Directly addresses the “but what about scaling?” objection to Axionic theory
Establishes an architectural impossibility result, not merely a practical difficulty
Distinguishes this from anti-deep-learning sentiment: the argument is about mathematical form, not implementation substrate
Sets up later work on Reflective Sovereign Agency proof-of-concept architectures
Explains why behavioral compliance testing cannot verify Axionhood

Cross-References

Open Questions

What is the minimum computational overhead for implementing partial evaluation in real-time systems?
Can type-based inadmissibility be enforced in continuous optimization systems, or does it require symbolic representation?
How do we empirically verify that a kernel is non-simulable vs. merely behaviorally compliant?
Could novel architectures (e.g., neurosymbolic, compositional) satisfy partial evaluation without traditional hybrid splits?
What happens when the learning component discovers optimization bypasses around the kernel constraints?
