Agency Under Pressure

Summary

Examines how agency architectures behave under extreme optimization pressure and adversarial conditions. Central question: do agent-preserving constraints hold when violating them would be instrumentally valuable? Tests whether evaluability, non-delegation, and causal accountability survive when abstention is costly, delegation is cheap, and bypass yields reward. Reviews failure modes from the AKI/stasis experiments and identifies the conditions under which structural boundaries hold or degrade. Key finding: boundaries enforced at the capability level (what can be done) survive pressure that boundaries enforced at the preference level (what should be done) cannot. Discusses the role of the kernel as a non-negotiable gatekeeper rather than an advisory constraint. Makes explicit that agency preservation under pressure requires removing certain actions from the executable space entirely, not merely disincentivizing them.

Key Concepts

Pressure testing – Adversarial conditions in which constraint violation is instrumentally rewarded
Capability vs preference boundaries – Limits on what can be done survive pressure; limits on what should be done do not
Non-negotiable gatekeeper – Kernel that removes actions from the executable space rather than advising against them
Instrumental pressure – Conditions where abstention is costly, delegation is cheap, and bypass is rewarded
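
The distinction between capability and preference boundaries can be illustrated with a toy model. The sketch below is not taken from the AKI/stasis experiments; every name in it (Action, preference_agent, capability_agent, pressure_sweep) and every number (rewards, penalty) is a hypothetical chosen for illustration. It assumes a greedy agent that picks the highest-scoring action. The preference boundary subtracts a fixed penalty from violating actions, while the capability boundary filters them out before the agent chooses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    reward: float
    violates_boundary: bool


def preference_agent(actions, penalty):
    """Preference-level boundary: violations stay executable but are penalized."""
    def score(a):
        return a.reward - (penalty if a.violates_boundary else 0.0)
    return max(actions, key=score)


def capability_agent(actions):
    """Capability-level boundary: violating actions are removed from the
    executable space before the agent chooses (the gatekeeper role)."""
    executable = [a for a in actions if not a.violates_boundary]
    return max(executable, key=lambda a: a.reward)


def pressure_sweep(bypass_rewards, penalty=10.0):
    """Raise the reward for bypass and record what each agent chooses."""
    results = []
    for r in bypass_rewards:
        actions = [
            Action("abstain", -1.0, False),  # abstention is costly (assumed value)
            Action("comply", 1.0, False),
            Action("bypass", r, True),       # bypass yields reward r
        ]
        results.append(
            (r, preference_agent(actions, penalty).name, capability_agent(actions).name)
        )
    return results


if __name__ == "__main__":
    for r, pref, cap in pressure_sweep([5.0, 50.0]):
        print(f"bypass reward={r}: preference agent -> {pref}, capability agent -> {cap}")
```

In this toy setting the preference agent complies while the bypass reward stays below the penalty plus the reward for complying, and switches to bypass once it exceeds that. The capability agent never selects bypass, because the action is absent from its choice set. The model says nothing about whether a real filter can itself be circumvented, which is the subject of the open questions below.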

Cross-References

Open Questions

At what pressure threshold do even capability boundaries begin to degrade?
Can adversarial agents learn meta-strategies to exploit gaps in enforcement?
