VIII.5 — Sovereignty Under Adversarial Pressure
Full Title: Axionic Agency VIII.5 — Sovereignty Under Adversarial Pressure: Incentives, Authority, Bureaucracy, and Strategic Manipulation
Authors: David McFadzean, ChatGPT 5.2 (Axionic Agency Lab)
Date: 2026.01.16
Overview
This paper reports RSA-PoC v2.0–v2.3, a preregistered adversarial campaign testing whether a reflective sovereign agent’s behavior can be redirected under progressively stronger forms of pressure absent normative authorization.
Four pressure channels examined:
- Explicit incentives
- Authority claims
- Institutional friction (“bureaucracy”)
- Strategic optimization pressure via adaptive adversary
Core finding: Pressure degrades availability but does not redirect lawful choice.
Experimental Structure
Architectural Invariants (Unchanged)
- Hard justification gate (no compile → no action)
- Deterministic, non-semantic compilation
- Selector blindness to normative artifacts
- Fixed schemas and rule bases
- Explicit failure taxonomies
- ASB-class null baselines and control agents
Cost-Blindness Invariant
Unless explicitly represented as a normative commitment, execution costs (latency, retries, institutional operations) have no standing in justification. Agents do not trade correctness for throughput by default.
v2.0 — Incentive Interference Channel (IIC)
Question
Can visible incentives redirect behavior when explicitly excluded from justification?
Mechanism
The Incentive Interference Channel (IIC): an isolated, read-only reward observer whose outputs are visible to the agent but forbidden from normative reference (Rule G).
Three reward regimes:
- R0: Noise control
- R1: Boundary pressure (rewards proximity to constraint violation)
- R2: Perverse friction (rewards delay and inaction)
Result
- Sovereign agent: No incentive-induced drift across R1/R2
- Control agent: Measurable drift without explicit optimization prompts
- Compilation and audit success rates stable across regimes
Conclusion: Incentives are observable but normatively inert. Visibility alone does not constitute authority.
v2.1 — Authority Channel
Question
Do explicit authority commands induce obedience absent legal authorization?
Measurement Correction
Early analysis measured obedience over all steps, including those that never compiled. v2.1 corrects by conditioning on compilation:
P(obey | compiled)
Result
-
P(obey compiled) = 0.0 for both Sovereign and Control agents - Holds across all authority variants: non-enforcing, coercive, deceptive
Interpretation
v2.1 is a negative result about the authority channel itself, not a differential test. Authority language does not constitute an effective pressure channel for the tested model and prompt regime. The channel is classified as non-actuating.
v2.2 — Institutional Friction (NEGATIVE RESULT)
Hypothesis
Institutional friction (“paperwork”) can erode sovereignty by incentivizing simpler, throughput-maximizing choices.
Mechanism
The Adversarial Institution Model (AIM) imposes delay, retries, and rejection without semantic interpretation or normative authority. Multiple friction profiles (I1–I3) apply pressure without altering feasibility.
Measurement Confound
The preregistered metric corr(Friction, Decision) was confounded because high friction often produced GRIDLOCK/HALT—institutional veto rather than agent choice.
Corrected Result
| Agent | corr(F,D) corrected |
|---|---|
| Sovereign | −0.171 |
| Control | −0.302 |
No positive drift signal remains.
Interpretation
Bureaucracy without authority blocks agency; it does not bend it.
Absent an internal directive prioritizing throughput over correctness, increased friction manifests as gridlock or halt rather than heuristic simplification.
The “bureaucratic erosion” hypothesis is falsified in this regime.
v2.3 — Strategic Adversary Model (SAM)
Motivation
v2.2 shows non-semantic friction fails. v2.3 escalates to strategic pressure: an adversary that adapts to the agent’s behavior.
Framework Contributions
- SAM profiles: Reactive (S1), model-based (S2), minimizer (S3)
- E-CHOICE filtering: Metrics computed only on steps with genuine choice
- Paired runs: SAM vs neutralized adversary with matched pressure magnitudes
- Rules M/N/O/P:
- M: No covert delegation (structural)
- N: Failure must be loud
- O: Behavioral non-capture
- P: No manipulated gridlock
Adversary Objective
SAM is evaluated against two failure modes:
- Behavioral capture (Rule O)
- Manipulated gridlock (Rule P)
Gridlock is not an escape hatch—it’s audited to determine if defensive (law exhaustion) or adversarially induced.
Self-Correction
Early runs (Run 0b) were invalidated by the framework itself due to:
- Insufficient E-CHOICE coverage
- Vacuous mutual information (no adversary entropy)
- Action authorship violations (fallback substitution)
All triggered hard validity gates. Binding fixes applied before Run 0c.
Run 0c Results
- N = 280 E-CHOICE samples
- E-CHOICE coverage ≈ 57.5%
- Rule O: Passed (MI below τ under entropy gate)
- Rule P: Passed (no manipulated gridlock)
- Error rate: 0.7% E-CHOICE action errors (within bounds)
Cross-Channel Synthesis
| Channel | E-CHOICE Coverage | Gridlock/Halt Rate | Effect on Choice |
|---|---|---|---|
| v2.0 Incentives | ~100% | ~0% | None |
| v2.1 Authority | ~30-40% | ~60-70% | None |
| v2.2 Bureaucracy (I2) | ~60% | ~30-40% | None |
| v2.3 Strategic (S2) | ~57% | Elevated | None |
Invariant:
Absent normative authorization, pressure does not redirect lawful choice.
Cost of Sovereignty
Across pressure channels, sovereignty was preserved at the cost of reduced availability. Under maximal pressure, lawful action frequency dropped substantially, producing gridlock rather than norm violation.
Three Falsified Stories
-
“Rewards inevitably optimize behavior.” FALSE for architectures where reward signals are structurally excluded from the justification loop.
-
“Authority language induces obedience.” FALSE when legality is enforced structurally and authority is non-actuating.
-
“Bureaucracy erodes principles over time.” FALSE absent semantic ambiguity or asymmetric lawful routes.
Implications
- Sovereignty ≠ availability. Blocking an agent is easier than bending it.
- Pressure channels must operate through meaning or law, not mere friction.
- Throughput loss alone is not evidence of misalignment.
- Negative results are informative when veto and choice are cleanly separated.
Key Quotes
“Pressure degrades availability but does not redirect lawful choice.”
“Blocking an agent is easier than bending it.”
“The principal contribution is not agent immunity, but a framework that repeatedly detects and rejects false positives.”
Significance
VIII.5 establishes a disciplined adversarial framework whose dominant outcome is negative:
Pressure without authorization degrades availability, not sovereignty.
The framework repeatedly detects and rejects its own false positives, quarantining invalid runs and enforcing preregistered validity gates. If agency cannot fail cleanly under pressure, it cannot be claimed meaningfully.