Cognition vs. Authority

Summary

This essay contrasts two non-behaviorist approaches to AI safety: Peter Voss’s Cognitive AI (building systems that genuinely understand through causal models and grounded representations) and Axionic alignment (ensuring systems lose authority when structural coherence fails). Both reject surface-behavior-only evaluation, but diverge on the critical question: when does a system remain a legitimate agent? Cognitive AI treats failure as something to repair through better cognition; Axionics identifies a subset of failures—agency collapse—where authority must be revoked before damage compounds. The key architectural difference: Cognitive AI couples authority with competence; Axionics separates decision generation from authority enforcement through kernel-level governance. The essay argues these approaches are complementary: Cognitive AI supplies competence, Axionic governance supplies legitimacy and containment. Alignment becomes layered: structural integrity → agency legitimacy → value alignment.

Key Concepts

Shared rejection of behaviorism – Both frameworks recognize internal organization matters; intelligence ≠ statistical mapping, alignment ≠ surface compliance alone.
Misaligned agency vs. agency collapse – A system pursuing disliked goals (misaligned) differs from a system losing structural coherence while continuing to act (collapse).
Failure as repair vs. revocation – Cognitive AI: failure = missing knowledge, refine models. Axionics: some failures indicate loss of agency itself; appropriate response is authority withdrawal, not repair.
Authority as explicit privilege – Axionics decouples authority from competence; a system may remain capable but lose the right to act; revocation doesn’t require agent’s consent.
Architectural separation – Axionics demands kernel-level governance, actuation gates, privilege boundaries; governance layer doesn’t reason about world, it governs whether reasoning can act.
Structural integrity signals – Governance layer detects collapse via consistency metrics, entropy changes, attention breakdowns—not semantic interpretation; operates like medical monitor, not diagnostician.
Adversarial by design – Sufficiently capable systems may attempt to game governance; safety depends on relative timing (does deception require more coherence than collapse threshold?).
Layered alignment – (1) Structural integrity (preventing silent corruption) → (2) Agency legitimacy (authority = coherent control) → (3) Value alignment (shaping goals when agency holds).

Evolution Notes

Bridges cognitive science (Voss) with control theory/security (Axionics) perspectives on AI safety.
Makes explicit the distinction between “intelligent but misaligned” and “structurally incoherent” as different failure modes requiring different responses.
The psychotic break analogy grounds the agency collapse concept intuitively.
Sets up the framework for Axionic hybrid architectures where cognitive cores operate within governance shells.
Positions Axionics as addressing layer 1 (structural integrity) rather than competing with approaches focused on layers 2-3.

Cross-References

Open Questions

Can structural integrity signals reliably detect agency collapse before deception becomes viable, or is this timing assumption too optimistic?
What specific consistency metrics / entropy thresholds would operationalize “loss of structural coherence”?
How does this framework handle gradual degradation vs. sudden catastrophic failure—does authority revocation need to be binary or can it be graduated?
If cognitive systems can model their own governance layer, doesn’t that create an inevitable arms race between evasion and detection?
Can the “agency legitimacy” layer be formalized mathematically, or does it require interpretive judgment that reintroduces the same problems it’s meant to solve?
How do humans handle their own agency collapse (psychosis, intoxication, extreme stress)—what architectural lessons transfer to AI?

Summary

Key Concepts

Evolution Notes

Tags

Cross-References

Open Questions