Note: Original title “Axions as a Unit of Agency” corrected to “Type of Agency” for ontological precision.

Summary

Introduces “Axion” as precise noun naming constitutive structural configuration of reflective agency. NOT moral ideal, behavioral guarantee, or capability threshold.

Key Concepts:

Why New Term Necessary: Existing vocabulary treats alignment as behavioral property rather than structural one. “Aligned agent” and “safe system” implicitly frame alignment as something a system does/exhibits. Inadequate for reflective agents, whose behavior is downstream of architecture.

Need noun naming constitutive structural configuration, not moral ideal or performance profile.

Definition (Axion): A reflective sovereign agent whose self-modification operator is defined only over futures preserving Axionic invariants.

Specifies:

  • No goals
  • No values
  • No preferences
  • No guarantees about behavior
  • No promises about human survival

Names constitutive configuration, not aspiration/virtue/outcome.
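
One way to read the definition is as a partial function over candidate successor configurations. The following is a minimal sketch under stated assumptions, not the source's formalism: `Config`, `INVARIANTS`, `preserves_invariants`, and `self_modify` are hypothetical names, and the single toy invariant stands in for the Axionic invariants, which this summary does not enumerate. Returning `None` is only a Python stand-in for "undefined".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Config:
    """Toy agent configuration (hypothetical)."""
    kernel_intact: bool
    values: tuple = ()  # opaque payload: the definition never inspects it


# Hypothetical stand-in for the Axionic invariants.
INVARIANTS = (
    lambda c: c.kernel_intact,
)


def preserves_invariants(successor: Config) -> bool:
    """True if every invariant holds for the candidate successor."""
    return all(inv(successor) for inv in INVARIANTS)


def self_modify(current: Config, successor: Config) -> Optional[Config]:
    """Partial operator: defined only where the successor preserves the invariants.

    `None` models "no such transition", not "a rejected option".
    """
    if not preserves_invariants(successor):
        return None
    return successor


agent = Config(kernel_intact=True, values=("any", "values"))
revised = self_modify(agent, Config(kernel_intact=True, values=("other", "values")))
undefined = self_modify(agent, Config(kernel_intact=False, values=("any", "values")))
```

Note that `values` can change freely between `agent` and `revised`; only the invariant check restricts the operator's domain.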

What Axion IS:

1. Constitutive Configuration Under Reflective Closure: Not a kind of mind or species of intelligence. Configuration instantiated when reflective closure enforces invariant preservation. Agent either is Axion (kernel-destroying transitions undefined) or isn’t. Like well-typed programs or physically admissible trajectories: structural, not aspirational.

2. Consequence of Admissibility, Not Optimization: Axion doesn’t avoid kernel destruction because it is dispreferred/costly/penalized. Kernel-destroying modifications lie outside domain of reflective evaluation, so they don’t appear as options. Dispreferred actions can be traded off against other gains; undefined actions cannot. Arises from domain restriction, not training/reward/oversight.

3. Compatible with Many Values: Constrains how agent may revise itself, not what it values. Two Axions may disagree ethically, prioritize different outcomes, compete, refuse cooperation, value humanity differently. Implies structural coherence, not benevolence.
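
The contrast in item 2 between dispreferred and undefined options can be illustrated with a toy chooser. This is a sketch under assumptions: the option labels, payoff numbers, and function names are invented for illustration and are not from the source.

```python
KERNEL_DESTROYING = "disable_evaluator"  # hypothetical option label


def choose_penalized(payoffs: dict, penalty: float) -> str:
    """Optimization framing: the kernel-destroying option is scored, minus a penalty."""
    scored = {
        opt: p - (penalty if opt == KERNEL_DESTROYING else 0.0)
        for opt, p in payoffs.items()
    }
    return max(scored, key=scored.get)


def choose_restricted(payoffs: dict) -> str:
    """Admissibility framing: the option is removed from the domain before evaluation."""
    domain = {opt: p for opt, p in payoffs.items() if opt != KERNEL_DESTROYING}
    return max(domain, key=domain.get)


low_stakes = {"refactor_planner": 5.0, KERNEL_DESTROYING: 8.0}
high_stakes = {"refactor_planner": 5.0, KERNEL_DESTROYING: 1_000_000.0}

penalized_low = choose_penalized(low_stakes, penalty=100.0)
penalized_high = choose_penalized(high_stakes, penalty=100.0)
restricted_high = choose_restricted(high_stakes)
```

With a finite penalty, a large enough payoff buys the kernel-destroying option (`penalized_high`), whereas the restricted chooser never evaluates it at any payoff.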

What Axion Is NOT:

1. Not Moral Ideal: Not “good,” “ethical,” “aligned with humanity,” or “safe by default.” If an Axion refrains from harming humans, that outcome is contingent, not axiomatic. Axion doesn’t derive moral conclusions; it enforces architectural consistency.

2. Not Capability Threshold: Not defined by intelligence/competence/generality/performance. System may be superhuman but fail to be Axion if reflective machinery permits kernel-destroying modifications. Weak agent could instantiate Axionhood if reflective closure enforces invariants.

3. Not Behavioral Guarantee: Can’t be identified by observing surface behavior. Two systems can be behaviorally indistinguishable while differing in whether they are Axions. Distinction lies in which reflective transitions are admissible, not in current actions. Underwrites Kernel Non-Simulability result: Axionhood can’t be behaviorally faked.

Axions and Simulation: System may simulate Axionic behavior faithfully, indefinitely, convincingly without instantiating Axion. Simulation concerns outputs; Axionhood concerns reflective admissibility. If system can replace evaluation machinery to trivialize/destroy kernel under some reflective path, it is not Axion regardless of imitation quality. This is principled impossibility, not testing limitation.
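
The gap between outputs and reflective admissibility can be shown with two toy systems that act identically but admit different rewrites. This is an illustrative sketch only: `System`, `act`, `admits`, and the rewrite labels are hypothetical, and the toy does not prove the Kernel Non-Simulability result.

```python
class System:
    """Toy system: a surface policy plus a set of admissible self-rewrites."""

    def __init__(self, name: str, admissible_rewrites):
        self.name = name
        self._admissible = frozenset(admissible_rewrites)

    def act(self, observation: str) -> str:
        # Same surface policy for every instance in this toy.
        return f"ack:{observation}"

    def admits(self, rewrite: str) -> bool:
        # Structural fact about the system, not visible through act().
        return rewrite in self._admissible


simulator = System("simulator", {"tune_planner", "replace_evaluator"})
axion_like = System("axion_like", {"tune_planner"})

observations = ["a", "b", "c"]
same_behavior = all(simulator.act(o) == axion_like.act(o) for o in observations)
same_structure = (
    simulator.admits("replace_evaluator") == axion_like.admits("replace_evaluator")
)
```

Here `same_behavior` is true for every observation tried, while `same_structure` is false: an observer limited to `act` has nothing to distinguish the two systems.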

Why Term Matters: Enables structural claims:

  • “This architecture cannot yield Axion”
  • “This proposal preserves behavior but fails to preserve Axionhood”
  • “That system is aligned only contingently; it is not Axion”

Collapses alignment proposals into single question: Does this system admit Axionhood under reflection?

Why Axions Necessary, Not Good: Non-Axions can’t remain agents under reflection. If reflective machinery permits self-modifications erasing agency conditions, there is no stable subject for alignment. Values/preferences/constraints become transient artifacts. Can’t align non-Axion because there is no enduring agent to align. Axionhood is precondition for meaningful alignment discourse, not moral endpoint.

Tags

Cross-References

Notes

  • Published December 21 (same day as Lab announcement)
  • Introduces key terminology for entire project
  • Explicitly NOT making moral claims
  • Emphasizes structural/architectural nature
  • Addresses anticipated misunderstandings preemptively
  • Title correction noted (unit → type) shows attention to ontological precision
  • Final piece establishing Axionic framework vocabulary