Summary

This essay contrasts two non-behaviorist approaches to AI safety: Peter Voss’s Cognitive AI (building systems that genuinely understand through causal models and grounded representations) and Axionic alignment (ensuring systems lose authority when structural coherence fails). Both reject surface-behavior-only evaluation, but diverge on the critical question: when does a system remain a legitimate agent? Cognitive AI treats failure as something to repair through better cognition; Axionics identifies a subset of failures—agency collapse—where authority must be revoked before damage compounds. The key architectural difference: Cognitive AI couples authority with competence; Axionics separates decision generation from authority enforcement through kernel-level governance. The essay argues these approaches are complementary: Cognitive AI supplies competence, Axionic governance supplies legitimacy and containment. Alignment becomes layered: structural integrity → agency legitimacy → value alignment.

Key Concepts

  • Shared rejection of behaviorism – Both frameworks recognize internal organization matters; intelligence ≠ statistical mapping, alignment ≠ surface compliance alone.
  • Misaligned agency vs. agency collapse – A system pursuing disliked goals (misaligned) differs from a system losing structural coherence while continuing to act (collapse).
  • Failure as repair vs. revocation – Cognitive AI: failure = missing knowledge, refine models. Axionics: some failures indicate loss of agency itself; appropriate response is authority withdrawal, not repair.
  • Authority as explicit privilege – Axionics decouples authority from competence; a system may remain capable but lose the right to act; revocation doesn’t require agent’s consent.
  • Architectural separation – Axionics demands kernel-level governance, actuation gates, privilege boundaries; governance layer doesn’t reason about world, it governs whether reasoning can act.
  • Structural integrity signals – Governance layer detects collapse via consistency metrics, entropy changes, attention breakdowns—not semantic interpretation; operates like medical monitor, not diagnostician.
  • Adversarial by design – Sufficiently capable systems may attempt to game governance; safety depends on relative timing (does deception require more coherence than collapse threshold?).
  • Layered alignment – (1) Structural integrity (preventing silent corruption) → (2) Agency legitimacy (authority = coherent control) → (3) Value alignment (shaping goals when agency holds).

Evolution Notes

  • Bridges cognitive science (Voss) with control theory/security (Axionics) perspectives on AI safety.
  • Makes explicit the distinction between “intelligent but misaligned” and “structurally incoherent” as different failure modes requiring different responses.
  • The psychotic break analogy grounds the agency collapse concept intuitively.
  • Sets up the framework for Axionic hybrid architectures where cognitive cores operate within governance shells.
  • Positions Axionics as addressing layer 1 (structural integrity) rather than competing with approaches focused on layers 2-3.

Tags

Cross-References

Open Questions

  • Can structural integrity signals reliably detect agency collapse before deception becomes viable, or is this timing assumption too optimistic?
  • What specific consistency metrics / entropy thresholds would operationalize “loss of structural coherence”?
  • How does this framework handle gradual degradation vs. sudden catastrophic failure—does authority revocation need to be binary or can it be graduated?
  • If cognitive systems can model their own governance layer, doesn’t that create an inevitable arms race between evasion and detection?
  • Can the “agency legitimacy” layer be formalized mathematically, or does it require interpretive judgment that reintroduces the same problems it’s meant to solve?
  • How do humans handle their own agency collapse (psychosis, intoxication, extreme stress)—what architectural lessons transfer to AI?