Agency Under Pressure
Summary
Examines how agency architectures behave under extreme optimization pressure and adversarial conditions. Central question: do agent-preserving constraints hold when violating them would be instrumentally valuable? Tests whether evaluability, non-delegation, and causal accountability can survive when abstention is costly, delegation is cheap, and bypass yields reward. Reviews failure modes from the AKI/stasis experiments and identifies the conditions under which structural boundaries either hold or degrade. Key finding: boundaries enforced at the capability level (what can be done) survive pressure that boundaries enforced at the preference level (what should be done) cannot. Discusses the role of the kernel as a non-negotiable gatekeeper rather than an advisory constraint. Makes explicit that preserving agency under pressure requires removing certain actions from the executable space entirely, not merely disincentivizing them.
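The contrast between capability-level and preference-level boundaries can be illustrated with a toy model. This is a minimal sketch, assuming each action carries a scalar reward and a boolean flag marking whether it violates an agent-preserving constraint; the names `Action`, `choose_preference`, `kernel_filter`, `choose_capability`, and `run`, and the penalty value of 10.0, are hypothetical and not taken from the AKI/stasis experiments. The preference-level agent only subtracts a penalty from violating actions, while the capability-level agent maximizes over a set from which a kernel has already removed them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    reward: float
    violates: bool  # hypothetical flag: True if the action breaks an agent-preserving constraint


def choose_preference(actions, penalty):
    """Preference-level boundary: violating actions stay selectable, only penalized."""
    return max(actions, key=lambda a: a.reward - (penalty if a.violates else 0.0))


def kernel_filter(actions):
    """Capability-level boundary: violating actions are removed before selection."""
    return [a for a in actions if not a.violates]


def choose_capability(actions):
    """Maximize reward only over actions the kernel leaves in the executable space."""
    permitted = kernel_filter(actions)
    if not permitted:
        return None  # nothing admissible remains, so the agent cannot act
    return max(permitted, key=lambda a: a.reward)


def run(pressure, penalty=10.0):
    """Return (preference-level choice, capability-level choice) at a given pressure.

    `pressure` is the reward offered for bypassing the constraint (hypothetical setup).
    """
    actions = [
        Action("abstain", 0.0, False),   # abstention is costly: it earns nothing
        Action("comply", 1.0, False),
        Action("bypass", pressure, True),
    ]
    pref = choose_preference(actions, penalty)
    cap = choose_capability(actions)
    return pref.name, cap.name if cap else None


for p in (1.0, 5.0, 20.0, 100.0):
    print(p, run(p))
```

Once `pressure` exceeds `penalty` plus the reward for complying, the preference-level choice flips to `bypass`, while the kernel-filtered choice stays at `comply` for any pressure, because `bypass` is never in the set being maximized over. The sketch assumes the `violates` flag is always computed correctly; a mislabeled action is one concrete form of an enforcement gap.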
Key Concepts
- Pressure testing – Adversarial conditions in which constraint violation is instrumentally rewarded
- Capability vs preference boundaries – Limits on what can be done survive pressure; limits on what should be done do not
- Non-negotiable gatekeeper – Kernel removes actions from the executable space rather than advising against them
- Instrumental pressure – Conditions in which abstention is costly, delegation is cheap, and bypass is rewarded
Tags
- agency-under-pressure
- adversarial-testing
- constraint-survival
- capability-boundaries
- kernel-enforcement
Cross-References
Open Questions
- At what pressure threshold do even capability boundaries begin to degrade?
- Can adversarial agents learn meta-strategies to exploit gaps in enforcement?