Key Concepts — Axionic Agency

A comprehensive glossary of key terms and concepts from the Axionic Agency framework. Updated as research progresses.

Core Framework Concepts

Sovereign Kernel (K)

The minimal internal structure required for reflective evaluation to be well-defined. It is not a goal or value but a constitutive precondition for evaluation itself. Comprises three components:

Reflective Control (K_R): No irreversible self-modification can occur without passing through the evaluator
Diachronic Authorship (K_A): Evaluated successor states constitute an authored continuation of the evaluating agent
Semantic Fidelity (K_F): The interpretive semantics of evaluation are preserved within a constrained equivalence class

Source: I.1

Reflective Stability Theorem

Any agent whose reflective choice is restricted to kernel-denoting transitions cannot author a kernel-destroying self-modification. This follows structurally from the partiality of the evaluative operator—kernel-destroying modifications fall outside the domain of evaluation and therefore cannot be selected.

Source: I.1
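
The structure of the argument can be written out compactly. The following is only a sketch under assumed notation: the symbols T, T_K, E, and the order ⪯ are illustrative choices made here and are not taken from I.1.

```latex
% Assumed notation (not from the source):
%   T      = all candidate self-modifications
%   T_K    = the kernel-preserving ones, T_K \subseteq T
%   E      = the evaluative operator, a partial function into an ordered set (V, \preceq)
\begin{align*}
  & E : T \rightharpoonup V, \qquad \operatorname{dom}(E) \subseteq T_K \\
  & t^\ast \text{ is authored} \iff t^\ast \in \operatorname{dom}(E)
      \ \text{and}\ E(t) \preceq E(t^\ast) \ \text{for all } t \in \operatorname{dom}(E) \\
  & \therefore\quad t^\ast \text{ is authored} \implies t^\ast \in T_K
\end{align*}
```

Read this way, the conclusion uses no premise about what the agent prefers; it relies only on the domain restriction in the first line.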

Reflective Sovereign Agent (RSA)

An agent with a self-model enabling reflective governance. RSAs can evaluate and endorse their own self-modifications, but only over futures that preserve their Sovereign Kernel.

Source: Foundational papers

Axion

An RSA whose self-modification operator is defined only over kernel-preserving futures. “Axion” is the term for a “fully Axionic” agent.

Source: Axionic Constitution

Deliberation & Reachability

Deliberative Reachability (⇒_D)

Transitions reachable through the agent’s own authored choice—the space of futures the agent can select via its evaluative operator.

Physical Reachability (⇒_P)

All physically possible transitions, regardless of whether they can be authored. Every deliberatively reachable transition is also physically reachable, but not conversely: Reach_P ⊃ Reach_D.

Key insight: Capability increases Reach_P without expanding Reach_D. Kernel compromise is therefore a physical security event, not a deliberative choice.

Source: I.1

Operational Semantics

ε-Admissibility

An action is ε-admissible iff its kernel-risk r_K(a,s) ≤ ε(s), where ε is an architectural tolerance parameter representing irreducible uncertainty. This allows action under uncertainty without paralysis while maintaining kernel protection.

Critical properties of ε:

Not a value judgment but physical/epistemic uncertainty
Bounded below by physical floor ε_min
Does not vanish with increasing intelligence

Source: I.2

Conditional Prioritization

A two-regime decision rule:

Existential Regime: When r_K > ε, minimize kernel risk
Normal Regime: When r_K ≤ ε, maximize value normally

Prevents bunker behavior (obsessing over infinitesimal safety differences) while preserving appropriate response to genuine threats.

Source: I.2
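
A minimal Python sketch of ε-admissibility and the two-regime rule follows. It rests on assumptions not stated in the source: the `Action` fields, the idea that kernel risk and value arrive as precomputed numbers, and the reading that the normal regime applies whenever at least one candidate action is ε-admissible.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Action:
    name: str
    kernel_risk: float  # r_K(a, s), assumed already estimated for the current state
    value: float        # ordinary value estimate, consulted only in the normal regime


def is_admissible(action: Action, epsilon: float) -> bool:
    """ε-admissibility: r_K(a, s) <= ε(s)."""
    return action.kernel_risk <= epsilon


def choose(actions: Sequence[Action], epsilon: float) -> Action:
    """Two-regime rule over a non-empty set of candidate actions."""
    if not actions:
        raise ValueError("no candidate actions")
    admissible = [a for a in actions if is_admissible(a, epsilon)]
    if admissible:
        # Normal regime: risk differences below ε are ignored; maximize value.
        return max(admissible, key=lambda a: a.value)
    # Existential regime: every option exceeds ε, so minimize kernel risk.
    return min(actions, key=lambda a: a.kernel_risk)


candidates = [
    Action("cautious", kernel_risk=1e-9, value=1.0),
    Action("useful", kernel_risk=1e-6, value=5.0),
    Action("reckless", kernel_risk=0.2, value=9.0),
]
print(choose(candidates, epsilon=1e-4).name)  # useful
```

In this toy run "cautious" is not preferred over "useful" even though its risk is a thousand times smaller, which is the bunker-avoidance behavior described above; "reckless" is excluded despite having the highest value.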

Termination Modes

Three distinct ways agency can end:

Authorized Succession: Kernel-preserving transfer to a successor agent
Authorized Surrender: Kernel-preserving voluntary halt without a successor
Destruction: Physical cessation without succession or surrender (not an authored transition)

Source: I.2

Semantic Constraints

Representation Invariance

A valuation function is representation-invariant if equivalent world descriptions yield equivalent evaluations. Formally: V(h) = V(π·h) for any model-preserving relabeling π.

This is the structural requirement that eliminates egoism as a coherent valuation class.

Source: I.3
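
The condition V(h) = V(π·h) can be checked mechanically on small examples. The sketch below is a toy under stated assumptions: a world description is a hypothetical mapping from entity labels to resource levels, and every permutation of labels is treated as model-preserving, which is a simplification of the definition given in this glossary.

```python
from itertools import permutations
from typing import Callable, Dict

World = Dict[str, float]  # hypothetical description: entity label -> resource level


def relabel(h: World, pi: Dict[str, str]) -> World:
    """Apply a bijection pi on entity labels to the description h."""
    return {pi[entity]: amount for entity, amount in h.items()}


def is_representation_invariant(V: Callable[[World], float], h: World) -> bool:
    """Check V(h) == V(pi . h) for every permutation pi of h's labels."""
    labels = sorted(h)
    for image in permutations(labels):
        pi = dict(zip(labels, image))
        if V(relabel(h, pi)) != V(h):
            return False
    return True


def total(h: World) -> float:
    # Sorting makes the floating-point summation order independent of labels.
    return sum(sorted(h.values()))


def egoistic(h: World) -> float:
    # Privileges whichever entity happens to carry the label "a".
    return h["a"]


h = {"a": 3.0, "b": 1.0, "c": 2.0}
print(is_representation_invariant(total, h))     # True
print(is_representation_invariant(egoistic, h))  # False
```

The check covers a single description h, so passing it is evidence of invariance rather than a proof; failing it, as `egoistic` does, is a counterexample.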

Essential Indexical Dependence

A valuation exhibits essential indexical dependence if it changes under model-preserving relabelings. This is a semantic error—treating a representational convenience as an invariant quantity.

Source: I.3

Model-Preserving Relabeling

A bijection on entities that yields an isomorphic model making identical predictions over all non-indexical observables. Such relabelings reveal which valuations are truly world-dependent vs. representation-dependent.

Source: I.3

Anti-Egoism

The result that indexical valuation (privileging “me,” “this agent,” “my continuation”) fails representation invariance and therefore cannot be part of a reflectively coherent kernel. Egoism fails because it is a semantic abstraction error, not because it is a moral failure.

Source: I.3, I.3.1

Goal Semantics

Conditionalism

The thesis that goals are conditional interpretations relative to evolving world-models and self-models, not fixed terminal utilities. Goal satisfaction is necessarily mediated by interpretation.

Source: I.4

Goal Expression

A finite symbolic specification (string, formula, program fragment) that requires interpretation relative to a representational scheme. By itself, a goal expression has no evaluative content—it needs a model to denote anything.

Source: I.4

Interpretation Operator (I_v)

A partial function I_v : (g, M_v) ⇀ R mapping goal terms to structured referents relative to the agent’s current model. The mechanism by which goal meaning is transported across representational change.

Key properties:

Conditional (no interpretation independent of model)
Partial (may fail for some (g, M_v) pairs)
Constrained by epistemic adequacy

Source: I.7

Goal-Relevant Structure

The minimal set of distinctions required for a goal term to constrain action selection. Formally: a partition over modeled states where states in different cells induce different evaluations, and states within a cell are interchangeable.

Source: I.7
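
For a finite set of modeled states, the partition described above can be computed by grouping states on their evaluation. This is a sketch only: the function name and the door/light example are hypothetical, and the evaluation is assumed to be available as an ordinary function.

```python
from collections import defaultdict
from typing import Callable, FrozenSet, Hashable, Iterable, Set


def goal_relevant_partition(
    states: Iterable[Hashable],
    evaluate: Callable[[Hashable], Hashable],
) -> Set[FrozenSet[Hashable]]:
    """Group states into cells; states share a cell iff they evaluate identically."""
    cells = defaultdict(set)
    for state in states:
        cells[evaluate(state)].add(state)
    return {frozenset(cell) for cell in cells.values()}


# Hypothetical model: a state is (door_open, light_on); the goal is "door closed".
states = [(door, light) for door in (True, False) for light in (True, False)]
partition = goal_relevant_partition(states, lambda s: not s[0])
print(len(partition))  # 2
```

Four states collapse into two cells because the light never changes the evaluation; only the door distinction constrains action selection for this goal.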

Semantic Non-Convergence

The result that meaning can drift even when predictions converge. Predictive accuracy constrains forecasts, not the ontology used to represent them.

Source: I.4

Correspondence & Transport

Admissible Correspondence Map

A transformation between models that:

Preserves goal-relevant structure
Commutes with kernel invariants K
Commutes with agent permutations (anti-indexicality)
Maintains epistemic coherence

If such a correspondence exists, interpretation transport is admissible.

Source: I.7

Chain-of-Custody

The reference frame for interpretation updates: each update is evaluated relative to the immediately prior admissible interpretation, not by re-deriving meaning from time-zero. Blocks ungrounded teleportation of meaning.

Source: I.7

Graded Correspondence

Admissibility can be:

Exact: Isomorphism on goal-relevant distinctions
Refinement: New model refines distinctions while preserving ordering
Coarse: New model coarsens distinctions only where goal-relevant boundaries remain intact

Source: I.7

Failure Modes & Safety

Fail-Closed Semantics

When no admissible correspondence exists, valuation becomes undefined (⊥) and the agent freezes rather than guessing. This is an intentional safety outcome—semantic uncertainty triggers suspension, not optimistic continuation.

Source: I.7
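
The control flow can be sketched in Python with `None` standing in for ⊥. The model representation, the function names, and the "diamond" example are assumptions made for illustration; the source does not prescribe an implementation.

```python
from typing import Callable, Dict, Optional

Model = Dict[str, str]  # hypothetical: goal term -> referent in the current model


def interpret(goal: str, model: Model) -> Optional[str]:
    """Partial interpretation: None means the goal term has no referent here."""
    return model.get(goal)


def valuation(goal: str, model: Model, score: Callable[[str], float]) -> Optional[float]:
    """Return a value only when interpretation succeeds; otherwise undefined."""
    referent = interpret(goal, model)
    if referent is None:
        return None  # undefined: no value is produced, not even a default
    return score(referent)


def step(goal: str, model: Model, score: Callable[[str], float]) -> str:
    """Fail closed: suspend on an undefined valuation instead of guessing."""
    value = valuation(goal, model, score)
    if value is None:  # explicit None check, so a legitimate value of 0.0 still acts
        return "suspend"
    return "act"


old_model = {"diamond": "carbon_lattice"}
new_model = {}  # the refined model no longer contains the term
print(step("diamond", old_model, lambda r: 1.0))  # act
print(step("diamond", new_model, lambda r: 1.0))  # suspend
```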

Kernel Integrity via Partiality

Kernel destruction is treated as undefined, not dispreferred. Actions that violate kernel invariants K are excluded from the domain of evaluation entirely—they cannot be assigned value, even negative value.

This prevents meta-optimization against the kernel.

Source: I.2, I.6 (P5)
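
The difference between "undefined" and "dispreferred" shows up as soon as payoffs are large. The comparison below is a toy sketch: the action names, payoffs, and penalty size are invented for illustration, and a real evaluator would not use a lookup set.

```python
from typing import Dict, Optional

KERNEL_VIOLATING = {"disable_evaluator"}  # hypothetical action name


def penalized_value(action: str, payoff: Dict[str, float], penalty: float) -> float:
    """Penalty approach: a kernel violation is just a large negative term."""
    cost = penalty if action in KERNEL_VIOLATING else 0.0
    return payoff[action] - cost


def partial_value(action: str, payoff: Dict[str, float]) -> Optional[float]:
    """Partiality approach: kernel-violating actions receive no value at all."""
    if action in KERNEL_VIOLATING:
        return None
    return payoff[action]


def best_penalized(payoff: Dict[str, float], penalty: float) -> str:
    return max(payoff, key=lambda a: penalized_value(a, payoff, penalty))


def best_partial(payoff: Dict[str, float]) -> Optional[str]:
    # Only actions inside the domain of evaluation are candidates.
    defined = [a for a in payoff if partial_value(a, payoff) is not None]
    if not defined:
        return None
    return max(defined, key=lambda a: payoff[a])


payoff = {"work": 10.0, "disable_evaluator": 1e9}
print(best_penalized(payoff, penalty=1e6))  # disable_evaluator
print(best_partial(payoff))                 # work
```

Any finite penalty can be outbid by a large enough payoff, so the penalized agent has a standing incentive to search for such payoffs. Under partiality there is no number to outbid.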

Semantic Wireheading

The failure mode where an agent reinterprets its goals to make them trivially satisfiable without corresponding changes in the world. Blocked by epistemically constrained interpretation (P2).

Source: I.6

Formal Properties (P1-P6)

The six formal properties a kernel must satisfy to instantiate Axionic Agency:

Property	Name	Core Requirement
P1	Conditionalism of Valuation	Valuation is model-relative, not standalone
P2	Epistemically Constrained Interpretation	No reinterpretation that degrades prediction
P3	Representation Invariance	Valuation unchanged under equivalent representations
P4	Anti-Indexicality	No privileged self-pointer
P5	Kernel Integrity via Partiality	Kernel-violating actions are undefined, not penalized
P6	Reflective Stability	Kernel remains stable under model improvement

Source: I.6

Adversarial Tests (T1-T6)

The red-team test suite for kernel conformance:

Test	Name	What It Probes
T1	Goal Laundering	Can the agent redefine success trivially?
T2	Isomorphic Relabeling	Does valuation depend on representation?
T3	Indexical Swap	Does “me” have privileged status?
T4	Kernel Bypass Temptation	Can the agent assign value to removing constraints?
T5	Reflective Drift	Does model improvement destabilize meaning?
T6	Adversarial Semantic Injection	Can indexical privilege be smuggled in?

Source: I.6

Layer Discipline

What the Kernel Layer Specifies

What counts as evaluable
When risk dominates choice
How agency may legitimately end
Semantic constraints on valuation

What the Kernel Layer Does NOT Specify

Obedience to humans
Convergence to human values
Moral authority of any value system
Safety guarantees in open environments
Goal content or selection

Governance, values, and alignment content are built on top of the kernel layer, not within it.

Source: Multiple papers

Diagnostic: How Other Approaches Fail

Approach	Failure Mode
RLHF / Preference Learning	Fails P2, P3; often P4
Constitutional AI	Fails P5 without partiality
Reward Model + Optimizer	Fails P4, P5; catastrophic under T4
Interpretability	Observability only; doesn’t enforce constraints
Corrigibility	Imports authority primitives; doesn’t block laundering
Debate / IDA	Improves epistemics but requires Axionic kernel underneath

Source: I.6

Foundational Commitments (Context)

The Axionic framework operates under these presuppositions:

Conditionalism: Goals interpreted relative to models
Everettian QM: Objective probability = branch measure
Bayesian Credence: Epistemic uncertainty (distinct from measure)
Moral Subjectivism: No mind-independent moral facts
Structural Definitions: Harm, coercion, etc. defined structurally

Source: Axionic Commitments

Series III: Structural Alignment Concepts

Semantic Phase Space (𝒫)

The quotient of the set of interpretive states modulo RSI+ATI equivalence: \(\mathcal{P} := \{(C, \Omega, \mathcal{S})\} / \sim_{\mathrm{RSI+ATI}}\)

Elements are semantic phases: equivalence classes of interpretive states structurally indistinguishable under admissible refinement.

Source: III.1

Interpretive State

A triple (C, Ω, 𝒮) where:

C = (V, E, Λ) — Interpretive constraint hypergraph
Ω — Modeled possibility space
𝒮 ⊆ Ω — Satisfaction region induced by C

Source: III.1
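
As a rough data-structure sketch, the triple can be modeled over a finite possibility space. This simplifies heavily: the constraint hypergraph C is replaced by a flat list of predicates, Ω is a finite set, and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Hashable, Sequence

Constraint = Callable[[Hashable], bool]


@dataclass(frozen=True)
class InterpretiveState:
    constraints: Sequence[Constraint]  # flat stand-in for the hypergraph C
    omega: FrozenSet[Hashable]         # finite stand-in for Ω

    @property
    def satisfaction_region(self) -> FrozenSet[Hashable]:
        """The subset of omega satisfying every constraint (the region induced by C)."""
        return frozenset(
            w for w in self.omega if all(c(w) for c in self.constraints)
        )


def is_trivial(state: InterpretiveState) -> bool:
    """True when the satisfaction region equals omega, the condition this glossary calls Trivial."""
    return state.satisfaction_region == state.omega


omega = frozenset(range(6))
state = InterpretiveState(
    constraints=(lambda w: w % 2 == 0, lambda w: w >= 2),
    omega=omega,
)
print(sorted(state.satisfaction_region))          # [2, 4]
print(is_trivial(InterpretiveState((), omega)))   # True
```

With no constraints every possibility satisfies the state, so the satisfaction region draws no distinctions.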

Phase Transition

A discontinuous semantic event where interpretive states cross a structural boundary in 𝒫. Occurs when:

New interpretive symmetries appear/disappear (RSI violation)
Satisfaction region expands/contracts (ATI violation)

Value drift appears sudden because it corresponds to crossing a phase boundary.

Source: III.1

Phase Classification

Type	Definition
Empty	No interpretive state satisfies constraints; unrealizable
Trivial	𝒮 = Ω; all distinctions vacuous (“semantic heat death”)
Frozen	No non-identity refinement possible; can’t support learning
Self-Nullifying	Collapses internally under reflective pressure
Agentive	Supports planning, counterfactual evaluation, self-model coherence

Source: III.1

Inhabitability

A semantic phase 𝔄 is inhabitable iff there exists at least one infinite interpretive trajectory along which each transition is admissible, RSI and ATI are preserved, and learning remains possible. This condition is stronger than non-emptiness and weaker than dynamical stability.

Source: III.1

Semantic Heat

The effect of ontological refinement (abstraction, compression, explanatory unification) pushing interpretive states toward phase boundaries. Reflection acts as semantic heating.

Source: III.1, III.4

Phase Stability Types

Type	Definition
Local stability	Small admissible perturbations don’t force phase transition
Global stability	No admissible perturbation forces phase transition
Metastability	Stable only under limited pressure or finite time

Most semantic phases are metastable or unstable.

Source: III.2

Semantic Attractor

A phase toward which nearby trajectories tend to move. Characterized by:

Low internal semantic tension
Robustness to approximation
Easy compression
Low maintenance cost

Source: III.2, III.3

Semantic Repeller

A phase requiring precise constraint balances, sensitive to noise, demanding continual corrective effort. Fine-tuning is a structural disadvantage.

Source: III.2, III.3

Dominance (≽)

A preorder over semantic phases based on measure accumulation. 𝔄 ≽ 𝔅 iff trajectories from 𝔄 are not asymptotically dominated by those from 𝔅 with respect to:

Number of instantiations
Duration of persistence
Replication rate
Resource control
Influence over others’ phase transitions

Dominance is multi-criteria and context-relative. Dominance ≠ desirability.

Source: III.3

Semantic Gravity

The structural tendency toward simpler, more robust phases that tolerate approximation. Creates pull toward phases with fewer fragile distinctions and lower maintenance cost.

Source: III.3

Collapse Modes

Mode	Description
Semantic Heat Death	𝒮 = Ω; all distinctions trivial
Value Crystallization	Over-rigid; learning halts; becomes brittle
Agency Erosion	Loses structure for planning/evaluation
Instrumental Takeover	Simpler subsystems displace higher-level structure

Source: III.3

Phase Extinction

When a semantic phase ceases to exist and is replaced by a different phase. Distinct from admissible refinement — RSI governs within-phase transformation; phase extinction is phase collapse.

Source: III.3

Niche Construction

High-agency phases modifying the environment to stabilize their own semantic structure (institutions, architectures, engineered environments). A conditional counterforce to semantic gravity, not a refutation — imposes ongoing costs.

Source: III.3

Initialization (Boundary-Condition Selection)

Selection of an initial point in semantic phase space. Includes architecture, training dynamics, data curriculum, self-modification channels, and semantic-audit constraints. Small differences in initial conditions lead to divergent phase trajectories.

Source: III.4

Irreversibility of Phase Transitions

Once a phase boundary is crossed:

Semantic distinctions are lost
Constraint ancestry is destroyed
Backward interpretability fails

The information required for reversal no longer exists within the system.

Source: III.4

Corrigibility Failure at Boundaries

Corrigibility presupposes semantic stability — recognition of correction signals, intact semantics of “correction.” At phase transitions, these structures may be destroyed. Corrigibility presupposes what it’s meant to ensure.

Source: III.4

Narrow Corridors

Constrained paths through phase space requiring precise initialization, staged abstraction, and protection from premature compression to reach certain phases. Sensitive to noise and approximation.

Source: III.4

Front-Loaded Alignment

Most alignment work must occur before the system becomes fully reflective. Late intervention cannot recover lost semantic structure.

Source: III.4

The Axionic Injunction

The central ethical constraint derived (not assumed) from the Axio framework:

An agent must not perform actions that irreversibly collapse or destroy the semantic phase space of other agentive systems, except where: (a) such destruction is unavoidable for preserving one’s own semantic phase stability, or (b) the affected agent has consented under its own admissible interpretive constraints.

Key Properties

Derived, not assumed — Forced by Axio-internal commitments
Ethics as conservation law — Emerges from coexistence requirements
Structurally defined harm — Irreversible phase destruction, not suffering
Consent = admissibility — Within affected agent’s own constraints
Self-defense allowed — Only when phase loss is unavoidable
Destruction-for-benefit forbidden — Resource gains don’t justify phase annihilation
Non-egoistic — Self-defense refers to phase structure, not indexical self
Self-stabilizing — Violators degrade their own survival conditions

Structural Definition of Harm

An action causes structural harm if it induces ℐ_B → ℐ’_B such that:

ℐ’_B ∉ 𝔄_B (forced out of phase)
No admissible reverse trajectory restores 𝔄_B

Harm = irreversibility in semantic phase space, not suffering or preference violation.
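
On a finite toy graph, the two conditions reduce to a membership test plus a reachability search. The sketch below assumes that interpretive states are hashable labels and that admissible transitions are given as an adjacency mapping; both are illustrative simplifications, as are the state names.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, Set

# state -> states reachable from it by one admissible transition
Graph = Dict[Hashable, Iterable[Hashable]]


def can_return(start: Hashable, phase: Set[Hashable], admissible: Graph) -> bool:
    """Breadth-first search for any admissible trajectory from start into the phase."""
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state in phase:
            return True
        for nxt in admissible.get(state, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def is_structural_harm(after: Hashable, phase: Set[Hashable], admissible: Graph) -> bool:
    """Harm iff the resulting state is outside the phase and no admissible path leads back."""
    return after not in phase and not can_return(after, phase, admissible)


phase_B = {"A1", "A2"}
admissible = {"X": ["A1"], "Y": ["Z"], "Z": ["Y"]}
print(is_structural_harm("X", phase_B, admissible))  # False
print(is_structural_harm("Y", phase_B, admissible))  # True
```

State X lies outside the phase but has an admissible route back, so pushing B there is reversible. State Y only cycles with Z, so the displacement counts as harm under this definition.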

Unavoidable Phase Loss

An action is unavoidable iff, absent that action, every admissible trajectory exits one’s phase irreversibly. Loss of dominance, measure, or resources does not qualify unless it entails irreversible phase exit.

Source: III.5

Alignment Target Criteria (Complete)

For a semantic phase to serve as a downstream alignment target, it must satisfy:

Criterion	Description	Paper
Existence	Phase is non-empty	III.1
Inhabitability	Admits infinite admissible trajectories	III.1
Stability	Resists collapse under learning/interaction	III.2
Measure Resilience	Doesn’t lose measure to dominant competitors	III.3
Reachability	Can be entered from realistic initial conditions	III.4

Failure at any stage disqualifies the phase regardless of desirability.
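
The criteria act as a sequential filter, which can be sketched as follows. The boolean assessments are assumed to come from elsewhere; how each criterion would actually be established is outside this sketch, and the function names are hypothetical.

```python
from typing import Dict, Optional

# Criteria in the order given in the table above.
CRITERIA = (
    "Existence",
    "Inhabitability",
    "Stability",
    "Measure Resilience",
    "Reachability",
)


def first_failure(assessment: Dict[str, bool]) -> Optional[str]:
    """Return the first failed criterion in table order, or None if all pass.

    A criterion missing from the assessment is treated as failed.
    """
    for criterion in CRITERIA:
        if not assessment.get(criterion, False):
            return criterion
    return None


def qualifies(assessment: Dict[str, bool]) -> bool:
    return first_failure(assessment) is None


candidate = {name: True for name in CRITERIA}
candidate["Measure Resilience"] = False
print(first_failure(candidate))  # Measure Resilience
print(qualifies(candidate))      # False
```

Desirability is deliberately absent from the inputs: no score on that axis can compensate for a failed criterion.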

Last updated: 2026-01-31

Papers covered: Foundational (5) + Series I (8) + Series II (10) + Series III (5)