Cognition vs. Authority
Summary
This essay contrasts two non-behaviorist approaches to AI safety: Peter Voss’s Cognitive AI (building systems that genuinely understand through causal models and grounded representations) and Axionic alignment (ensuring systems lose authority when structural coherence fails). Both reject surface-behavior-only evaluation, but diverge on the critical question: when does a system remain a legitimate agent? Cognitive AI treats failure as something to repair through better cognition; Axionics identifies a subset of failures—agency collapse—where authority must be revoked before damage compounds. The key architectural difference: Cognitive AI couples authority with competence; Axionics separates decision generation from authority enforcement through kernel-level governance. The essay argues these approaches are complementary: Cognitive AI supplies competence, Axionic governance supplies legitimacy and containment. Alignment becomes layered: structural integrity → agency legitimacy → value alignment.
Key Concepts
- Shared rejection of behaviorism – Both frameworks recognize internal organization matters; intelligence ≠ statistical mapping, alignment ≠ surface compliance alone.
- Misaligned agency vs. agency collapse – A system pursuing disliked goals (misaligned) differs from a system losing structural coherence while continuing to act (collapse).
- Failure as repair vs. revocation – Cognitive AI: failure = missing knowledge, so the response is to refine models. Axionics: some failures indicate loss of agency itself; the appropriate response is authority withdrawal, not repair.
- Authority as explicit privilege – Axionics decouples authority from competence; a system may remain capable but lose the right to act, and revocation doesn’t require the agent’s consent.
- Architectural separation – Axionics demands kernel-level governance, actuation gates, and privilege boundaries; the governance layer doesn’t reason about the world, it governs whether reasoning can act.
- Structural integrity signals – The governance layer detects collapse via consistency metrics, entropy changes, and attention breakdowns, not semantic interpretation; it operates like a medical monitor, not a diagnostician.
- Adversarial by design – Sufficiently capable systems may attempt to game governance; safety depends on relative timing (does successful deception require more coherence than a system at the collapse threshold retains?).
- Layered alignment – (1) Structural integrity (preventing silent corruption) → (2) Agency legitimacy (authority = coherent control) → (3) Value alignment (shaping goals when agency holds).
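The separation between decision generation and authority enforcement described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the essay's implementation: the names `StructuralSignals` and `GovernanceGate`, the two chosen signals, and the fixed thresholds are all hypothetical, and an absolute entropy ceiling stands in for the "entropy changes" the notes mention.

```python
import math
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


def shannon_entropy(probs: Sequence[float]) -> float:
    """Entropy in bits of a probability distribution over candidate actions."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


@dataclass
class StructuralSignals:
    """Content-free health readings (hypothetical stand-ins for real metrics)."""
    action_probs: Sequence[float]  # agent's distribution over candidate actions
    consistency: float             # 0..1 self-agreement score across repeated queries


class GovernanceGate:
    """Holds authority outside the agent; never inspects what an action means."""

    def __init__(self, max_entropy_bits: float, min_consistency: float) -> None:
        self.max_entropy_bits = max_entropy_bits  # assumed threshold
        self.min_consistency = min_consistency    # assumed threshold
        self.authorized = True

    def check(self, signals: StructuralSignals) -> bool:
        coherent = (
            shannon_entropy(signals.action_probs) <= self.max_entropy_bits
            and signals.consistency >= self.min_consistency
        )
        if not coherent:
            # Revocation latches: nothing the agent reports later restores it.
            self.authorized = False
        return self.authorized

    def actuate(self, action: Callable[[], str],
                signals: StructuralSignals) -> Optional[str]:
        """Run the proposed action only while authority is still held."""
        if self.check(signals):
            return action()
        return None
```

The gate only ever sees numbers about the agent's internal state, which mirrors the "medical monitor, not diagnostician" framing. Because `authorized` latches to `False`, a system that later produces healthy-looking signals still cannot act; restoring authority would have to happen outside this class.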
Evolution Notes
- Bridges cognitive science (Voss) with control theory/security (Axionics) perspectives on AI safety.
- Makes explicit the distinction between “intelligent but misaligned” and “structurally incoherent” as different failure modes requiring different responses.
- The psychotic break analogy grounds the agency collapse concept intuitively.
- Sets up the framework for Axionic hybrid architectures where cognitive cores operate within governance shells.
- Positions Axionics as addressing layers 1-2 (structural integrity and agency legitimacy) rather than competing with approaches focused on layer 3 (value alignment).
Tags
- AI-safety
- cognitive-AI
- axionic-alignment
- agency
- authority
- structural-integrity
- governance
- Peter-Voss
- hybrid-architectures
Cross-References
Open Questions
- Can structural integrity signals reliably detect agency collapse before deception becomes viable, or is this timing assumption too optimistic?
- What specific consistency metrics or entropy thresholds would operationalize “loss of structural coherence”?
- How does this framework handle gradual degradation vs. sudden catastrophic failure? Does authority revocation need to be binary, or can it be graduated?
- If cognitive systems can model their own governance layer, doesn’t that create an inevitable arms race between evasion and detection?
- Can the “agency legitimacy” layer be formalized mathematically, or does it require interpretive judgment that reintroduces the same problems it’s meant to solve?
- How do humans handle their own agency collapse (psychosis, intoxication, extreme stress), and what architectural lessons transfer to AI?
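The binary-versus-graduated question above can be made concrete with a small sketch. It does not answer the question; it only shows that both options fit the same interface. The `Authority` levels, the scalar `coherence` score, and the default thresholds are hypothetical choices made for illustration.

```python
from enum import IntEnum


class Authority(IntEnum):
    REVOKED = 0
    RESTRICTED = 1  # e.g. only reversible or low-impact actions (assumed meaning)
    FULL = 2


def authority_for(coherence: float, restrict_below: float,
                  revoke_below: float) -> Authority:
    """Map a coherence score in [0, 1] to the authority level it supports."""
    if coherence < revoke_below:
        return Authority.REVOKED
    if coherence < restrict_below:
        return Authority.RESTRICTED
    return Authority.FULL


def ratchet(current: Authority, coherence: float,
            restrict_below: float = 0.8, revoke_below: float = 0.5) -> Authority:
    """Authority may only fall, never rise, without outside intervention."""
    return min(current, authority_for(coherence, restrict_below, revoke_below))
```

Setting `restrict_below` equal to `revoke_below` removes the middle level and recovers binary revocation, so the choice between the two reduces to whether a useful `RESTRICTED` band exists for a given system.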