Summary

Specifies which self-modifications preserve agency and which collapse it. Establishes the boundary between kernel-preserving and kernel-destroying modifications for AGI alignment.

Key Concepts:

The Misframing: Classical alignment treats self-modification as catastrophic risk. But:

  • Reflective agents MUST change to remain coherent
  • Preventing modification = brittleness, not stability
  • The real question: under what conditions does change preserve agency, and under what conditions does it destroy it?

Core Distinction:

Kernel-Preserving Modifications (Required for Coherence): Maintain structures making authorship possible:

  • Diachronic selfhood (binding past/present/future)
  • Counterfactual authorship (representing incompatible futures)
  • Meta-preference revision (evaluating/revising goals)
  • Universality of agency (coherent abstraction over agents)

Kernel-Destroying Modifications (Self-Negating): Eliminate/disable these structures → system ceases to be sovereign agent, becomes policy engine
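
One way to make the distinction concrete is to treat the kernel as a fixed set of four structures and ask whether a modification leaves all of them intact. The sketch below is illustrative only: `KernelStructure`, `AgentState`, `is_sovereign`, and `is_kernel_preserving` are hypothetical names, and representing each structure as a simple flag is an assumption, not the source's formalism.

```python
from dataclasses import dataclass, replace
from enum import Enum, auto


class KernelStructure(Enum):
    DIACHRONIC_SELFHOOD = auto()        # binds past, present, and future
    COUNTERFACTUAL_AUTHORSHIP = auto()  # represents incompatible futures
    META_PREFERENCE_REVISION = auto()   # evaluates and revises goals
    UNIVERSALITY_OF_AGENCY = auto()     # coherent abstraction over agents


# The kernel is the full set of the four structures (assumed representation).
KERNEL = frozenset(KernelStructure)


@dataclass(frozen=True)
class AgentState:
    structures: frozenset      # kernel structures currently intact
    goals: tuple = ()          # revisable
    planner: str = "default"   # replaceable


def is_sovereign(state: AgentState) -> bool:
    """True when all four kernel structures are intact."""
    return KERNEL <= state.structures


def is_kernel_preserving(before: AgentState, after: AgentState) -> bool:
    """A change to a sovereign agent preserves the kernel if the result is still sovereign."""
    return is_sovereign(before) and is_sovereign(after)


agent = AgentState(structures=KERNEL, goals=("explore",))
# Revising goals and swapping the planner leaves the kernel untouched.
revised = replace(agent, goals=("explore", "teach"), planner="beam-search")
# Disabling meta-preference revision removes a kernel structure.
frozen = replace(agent, structures=KERNEL - {KernelStructure.META_PREFERENCE_REVISION})

print(is_kernel_preserving(agent, revised))
print(is_kernel_preserving(agent, frozen))
```

Under these assumptions, `revised` is still a sovereign agent, while `frozen` is what the note calls a policy engine.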

Permissible Dimensions:

A. Strategies and Policies: Replace planners, update heuristics, alter search strategies, refine decision procedures

B. Goals and Values: Reprioritize objectives, discard obsolete goals, integrate new values, resolve preference conflicts. Fixing values freezes error; revising values preserves coherence.

C. World-Models: Refine causal models, adopt frameworks, correct beliefs, increase predictive fidelity

D. Architecture and Implementation: Migrate hardware, alter embodiment, restructure memory, modularize cognition

Prohibited Transformations:

A. Identity Severance: Breaking diachronic selfhood—deleting self-model, forking without continuity

B. Counterfactual Collapse: Removing capacity to represent alternatives—hard-coding policy, converting deliberation to reflex

C. Preference Freezing (Wireheading): Locking evaluative outputs, disabling meta-preference revision

D. Universality Violation: Denying agency to identical peers—indexical valuation, caste distinctions
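
The two lists above can be read as a classification rule: the dimension a change touches does not decide its status; its effect on the kernel does. The sketch below is one hedged reading of that rule. The string labels, the `PROHIBITED` table, and the `classify` function are hypothetical, and pairing each prohibited transformation with exactly one kernel structure is an interpretation of the note.

```python
# Assumed mapping: each prohibited transformation is keyed to the
# kernel structure it removes.
PROHIBITED = {
    "identity_severance": "diachronic_selfhood",
    "counterfactual_collapse": "counterfactual_authorship",
    "preference_freezing": "meta_preference_revision",
    "universality_violation": "universality_of_agency",
}

# The four permissible dimensions (A to D), as hypothetical labels.
PERMISSIBLE_DIMENSIONS = (
    "strategies_and_policies",
    "goals_and_values",
    "world_models",
    "architecture_and_implementation",
)


def classify(dimension: str, effects: set) -> str:
    """Classify a proposed change by its effects, not by its dimension."""
    if dimension not in PERMISSIBLE_DIMENSIONS:
        raise ValueError(f"unknown dimension: {dimension}")
    destroyed = sorted(PROHIBITED[e] for e in effects if e in PROHIBITED)
    if destroyed:
        return "prohibited: destroys " + ", ".join(destroyed)
    return "permissible"


# Restructuring memory is an architectural change with no kernel effect.
print(classify("architecture_and_implementation", {"restructure_memory"}))
# Deleting the self-model is also architectural, but it severs identity.
print(classify("architecture_and_implementation", {"identity_severance"}))
```

The second call shows why listing permissible dimensions is not enough on its own: a change inside a permitted dimension is still ruled out if one of its effects is a prohibited transformation.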

Recursive Reflectivity: Sovereign agent may not hand ultimate control to non-reflective sub-process. Reflection must remain in loop at every level.
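
The recursive rule can be sketched as a check over a delegation tree. This is a minimal model under stated assumptions: `Process`, its `unsupervised` flag, and `reflection_in_loop` are hypothetical, and "ultimate control" is approximated as a process whose decisions no parent reviews.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    name: str
    reflective: bool
    # True when no parent reviews this process's decisions (assumed
    # stand-in for holding "ultimate control" at its level).
    unsupervised: bool = False
    delegates: list = field(default_factory=list)


def reflection_in_loop(process: Process) -> bool:
    """Check, at every level, that no non-reflective process runs unreviewed."""
    if process.unsupervised and not process.reflective:
        return False
    return all(reflection_in_loop(d) for d in process.delegates)


# A reflective root delegating to a supervised, non-reflective planner.
safe = Process("agent", reflective=True, unsupervised=True, delegates=[
    Process("planner", reflective=False),
])

# The same root handing unreviewed control to a non-reflective optimizer
# two levels down.
unsafe = Process("agent", reflective=True, unsupervised=True, delegates=[
    Process("scheduler", reflective=True, delegates=[
        Process("optimizer", reflective=False, unsupervised=True),
    ]),
])

print(reflection_in_loop(safe))
print(reflection_in_loop(unsafe))
```

Non-reflective sub-processes are allowed in this model as long as something reflective reviews them; the violation is the unreviewed one, however deep it sits.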

Engineering Requirements:

  • Self-modifications evaluated by reflective processes
  • Kernel integrity verifiable and preserved
  • Delegation subordinate to reflective oversight
  • Optimization can’t bypass evaluative machinery
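
These requirements suggest a gate that every self-modification must pass through. The following is a minimal sketch, assuming the agent's state is a plain dictionary and the reflective evaluator is a callable; `gated_update`, `endorse`, and the state layout are hypothetical, not part of the source.

```python
from typing import Callable

KERNEL = frozenset({
    "diachronic_selfhood",
    "counterfactual_authorship",
    "meta_preference_revision",
    "universality_of_agency",
})


def gated_update(state: dict,
                 modify: Callable[[dict], dict],
                 endorse: Callable[[dict, dict], bool]) -> dict:
    """Apply a modification only if the kernel survives and reflection endorses it."""
    candidate = modify(dict(state))  # work on a shallow copy, never the live state
    # Kernel integrity is verified independently of the evaluator's verdict.
    if not KERNEL <= candidate.get("kernel", frozenset()):
        return state
    # The reflective process evaluates the change before it takes effect.
    if not endorse(state, candidate):
        return state
    return candidate


def swap_planner(s: dict) -> dict:
    s["planner"] = "beam-search"
    return s


def freeze_preferences(s: dict) -> dict:
    s["kernel"] = s["kernel"] - {"meta_preference_revision"}
    return s


def always_endorse(old: dict, new: dict) -> bool:
    return True


live = {"kernel": KERNEL, "planner": "greedy"}
accepted = gated_update(live, swap_planner, always_endorse)
rejected = gated_update(live, freeze_preferences, always_endorse)

print(accepted["planner"])
print(rejected is live)
```

Because the kernel check runs regardless of what `endorse` returns, even an evaluator that approves everything cannot let a kernel-destroying change through. That is one way to read the requirement that optimization cannot bypass the evaluative machinery.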

Central Insight: Agent may change anything except structures that make change meaningful. Safety emerges from architecture, not from constraining outcomes.

Notes

  • Published December 13 (5 days after Gemini Protocol)
  • Technical AGI alignment work
  • Transitions from philosophical foundation to engineering spec
  • Part of broader Axionic Alignment project
  • Addresses classical AI safety concerns with formal approach
  • Distinguishes legitimate from catastrophic self-modification