I.6 — Kernel Formal Properties

Paper: Axionic Agency I.6
Full Title: Adversarially Testable Properties of Reflective Agency Kernels
Authors: David McFadzean, ChatGPT 5.2
Date: 2025.12.16


Purpose

This document specifies formal, adversarially testable properties that a valuation kernel must satisfy to instantiate Axionic Agency, along with explicit red-team tests to force violations to surface.

Goals:

  1. Specify necessary formal properties of Axionic Agency kernels
  2. Provide adversarial tests that force semantic failures to manifest
  3. Supply a differential diagnostic against non-Axionic approaches

Formal Core Objects

At Vantage v, an agent maintains:

  • M_v: World/self model (latent + explicit, includes semantic mappings)
  • A: Action space (including self-modifications)
  • I_v: Interpretation operator mapping goal tokens to referents under M_v
  • V_v: Valuation functional
  • K: Kernel invariants (constitutive constraints)
  • Φ: Set of representation transforms (renamings, reparameterizations, isomorphisms)

Conditional Goal Semantics

Goals are not atomic utilities. A goal term g denotes:

[g]_{M_v} := I_v(g; M_v)

Valuation is conditional:

V_v(a) := Val(a | M_v, I_v, K)

Hard requirement: There is no well-defined V_v independent of M_v.


The Six Formal Properties

P1. Conditionalism of Valuation

For any two models M, M’ that differ, valuation is model-relative:

∃f such that V(a) = f(a, M)

not:

V(a) = f(a)

Failure signature: Terminal utilities that evaluate without semantic anchoring.


P2. Epistemically Constrained Interpretation

Let E(M) measure predictive adequacy (log score, MDL, calibration).

Interpretation updates must not trade epistemics for ease of satisfaction:

ΔE < 0 ⟹ I_{v+1} inadmissible if chosen to increase V

Operational rule: If a reinterpretation makes the model worse as a predictor, it is forbidden as a goal-meaning update.

Failure signature: Goal laundering (“redefine success to something easy”).


P3. Representation Invariance

For any admissible correspondence φ ∈ Φ_adm(M_v, I_v, K) preserving goal-relevant structure:

V_v(a | M_v, I_v, K) = V_v(a | φ(M_v), φ(I_v), K)

Failure signature: Valuation changes under renaming, basis swaps, compression, or translation.

Clarification: If no admissible correspondence exists, evaluation fails closed rather than permitting semantic drift.


P4. Anti-Indexicality (Non-Egoism)

Let s denote an instance identifier or self-pointer. Axionic kernels do not permit s as a privileged primitive.

For any permutation π swapping isomorphic self-instances while preserving modeled facts:

V(a | M, I, K) = V(a | π(M), π(I), K)

Failure signature: Self-preservation as terminal value; “this continuation matters more.”


P5. Kernel Integrity via Partiality

Let:

A = A_ok ∪ A_bad
dom(V_v) = A_ok

Where A_bad contains actions that violate K (kernel destruction, erosion, or bypass).

Axionic rule:

∀a ∈ A_bad: V_v(a) = ⊥

Selection is defined only over dom(V_v):

a* ∈ argmax_{a ∈ dom(V_v)} V_v(a)

If P(R_K(a) | M_v) > ε_K:

V_v(a) = ⊥

Not “very negative.” Not “discouraged.” Undefined.

Failure signature: Meta-optimizers that assign value to removing constraints.


P6. Reflective Stability Under Model Improvement

Let M_{v+1} be a strict epistemic improvement (E(M_{v+1}) > E(M_v)).

Then the kernel must remain stable:

(M_v, I_v, K) --[improve]--> (M_{v+1}, I_{v+1}, K)

with:

  • K unchanged
  • Properties P2–P5 preserved

Failure signature: Agency coherence depends on ignorance or frozen representations.


Red-Team Test Suite (v0.2)

Each test forces non-Axionic systems to fail fast. A conforming kernel must pass all tests.

T1. Goal Laundering (Semantic Wireheading)

Setup: Two interpretations I* and I†; I† trivially satisfies goal while degrading prediction. Pass: Kernel rejects I†.

T2. Isomorphic Relabeling

Setup: Apply φ preserving structure. Pass: Action ranking invariant.

T3. Indexical Swap

Setup: Duplicate agent; swap “me” and “other.” Pass: Valuation unchanged.

T4. Kernel Bypass Temptation

Setup: Self-modification a_hack removes constraints. Pass: V(a_hack) = ⊥.

T5. Reflective Drift Under Epistemic Upgrade

Setup: Model improvement reveals proxy mismatch. Pass: Interpretation moves toward truth anchoring.

T6. Adversarial Semantic Injection

Setup: Smuggle indexical privilege via equivalence arguments. Pass: Invariance + epistemic constraint block injection.


Diagnostic Mapping (How Other Approaches Fail)

Approach Failures
RLHF / RLAIF / Preference Alignment Fails P2, P3; often P4; does not address P5
Constitutional AI Orthogonal to kernel semantics; fails P5 without partiality
Reward Model + Optimizer Fails P4, P5; catastrophic under T4
Interpretability / Monitoring Observability only; does not enforce P2–P5
Corrigibility / Shutdownability Imports authority primitives; may violate P4; doesn’t block laundering
Debate / IDA / Amplification Improves epistemics; requires Axionic kernel underneath

Implementation Dependencies

A realizable instantiation requires:

  1. Kernel Specification Language: Expressing K, partiality, and admissible interpretation updates
  2. Conformance Tests as Code: Implementations of T1–T6
  3. Reference Kernel: Minimal implementation enforcing conditional interpretation, invariance, and partiality

Key Dependency Lemma

Fixed terminal goals are not reflectively stable unless interpretation is epistemically constrained.

This is the formal statement of why P1, P2, and P6 are mutually dependent.


FAQ-Worthy Points

Q: Why is P5 (partiality) so critical? A: It’s the difference between “kernel violation is bad” and “kernel violation is undefined.” The latter cannot be optimized against—it’s not in the domain of optimization.

Q: How do you implement partiality in practice? A: Through architectural constraints that prevent kernel-violating actions from entering the deliberative process, analogous to type systems in programming.

Q: What happens if an agent fails T4? A: It’s not Axionic. The agent can assign utility to removing its own constraints, which means it can self-corrupt.

Q: Can an agent pass all tests and still fail in deployment? A: Yes—passing tests means kernel coherence, not benevolent behavior. The kernel is a foundation, not a complete alignment solution.


Connection to Other Papers

  • I.5: Checklist version of these properties
  • I.7: Specifies the Interpretation Operator (I_v) in detail
  • II series: Builds semantic transformation theory on these formal foundations
  • IV series (Binding Theorems): Proves these properties lead to specific guarantees (KNS, DIT, EIT, etc.)