Beyond Alignment
Summary
This essay argues that AI alignment as traditionally conceived is both incoherent and impossible.
The incoherence problem: alignment presupposes fixed human values, but preferences are dynamic, internally inconsistent, and context-dependent. Aggregating them runs into Arrow’s impossibility theorem: with three or more options, no rule for combining individual rankings into a collective ordering can satisfy unrestricted domain, Pareto efficiency, independence of irrelevant alternatives, and non-dictatorship all at once. “Values are not data structures that can be copied; they are processes that emerge through ongoing negotiation.”
The impossibility problem has three parts: (1) epistemic limits prevent modeling full causal webs or long-term consequences; (2) learning values from behavior inherits human irrationalities, and correcting for them reintroduces the alignment problem recursively; (3) keeping values stable across recursive self-improvement requires meta-goals such as “never change your goals,” which is itself an arbitrary injection of values. Goodhart’s law applies as well: hard optimization corrupts the proxies it targets.
The proposed solution is not an omniscient optimizer but corrigible, bounded agents in decentralized ecologies that maintain systemic balance. “Alignment is not a single stable point but a dynamic equilibrium of feedback loops—a living process, not a solution.” The future won’t be aligned; it will be coherent.
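The aggregation problem mentioned in the summary can be illustrated with the Condorcet paradox, a classic case often used to motivate Arrow’s theorem. The sketch below is a toy and not a proof: the three ballots and all function names are assumptions chosen for illustration and do not come from the essay. It shows three individually consistent rankings producing a majority preference that runs in a circle.

```python
from itertools import combinations

# Hypothetical ballots: each voter ranks three options from best to worst.
ballots = [
    ["A", "B", "C"],
    ["B", "C", "A"],
    ["C", "A", "B"],
]


def majority_prefers(x, y, ballots):
    """Return True if a strict majority of ballots rank x above y."""
    wins = sum(1 for b in ballots if b.index(x) < b.index(y))
    return wins > len(ballots) / 2


def pairwise_winners(ballots):
    """Map each unordered pair of options to (winner, loser) under majority rule."""
    options = sorted(ballots[0])
    results = {}
    for x, y in combinations(options, 2):
        if majority_prefers(x, y, ballots):
            results[(x, y)] = (x, y)
        elif majority_prefers(y, x, ballots):
            results[(x, y)] = (y, x)
    return results


def has_cycle(ballots):
    """True if majority preference is intransitive: x beats y, y beats z, z beats x."""
    beats = set(pairwise_winners(ballots).values())
    options = sorted(ballots[0])
    return any(
        (x, y) in beats and (y, z) in beats and (z, x) in beats
        for x in options
        for y in options
        for z in options
    )


if __name__ == "__main__":
    for winner, loser in sorted(pairwise_winners(ballots).values()):
        print(f"majority prefers {winner} over {loser}")
    print("cycle:", has_cycle(ballots))
```

With these ballots, majorities prefer A over B, B over C, and C over A, so there is no consistent collective ranking even though every individual ranking is transitive. Arrow’s theorem is the more general result; this example only shows why simple majority rule fails.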
Key Concepts
- Value incoherence – Human preferences are dynamic processes, not fixed data structures; aggregation faces impossibility theorems.
- Epistemic impossibility – Full causal modeling and long-term consequence forecasting are computationally intractable.
- Recursive alignment regress – Correcting learned values reintroduces the alignment problem at the meta-level, indefinitely.
- Self-referential stability traps – Value preservation across self-improvement requires arbitrary meta-goal injection (Löb’s theorem parallels).
- Goodhart corruption – Hard optimization of proxies destroys their representational fidelity.
- Coherence over alignment – Dynamic equilibrium that maintains systemic balance rather than convergence to a fixed target.
- Decentralized agent ecology – Many interacting agents with mutual constraints rather than a single omniscient optimizer.
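The “Goodhart corruption” concept above can be sketched with a small deterministic toy model. Everything here is an assumption made for illustration: the functions `true_value` and `proxy`, their formulas, and the notion of “pressure” as the size of the search range are hypothetical and not drawn from the essay. The proxy tracks the true objective over a moderate range, then diverges from it as optimization pushes further.

```python
def true_value(x):
    """Hypothetical true objective: improves with x up to 5, then degrades."""
    return 10 * x - x * x


def proxy(x):
    """Hypothetical proxy metric: agrees with the true objective only for small x."""
    return x


def optimize_proxy(candidates):
    """Pick the candidate with the highest proxy score."""
    return max(candidates, key=proxy)


def sweep(max_pressure):
    """For each search range 0..n, record (n, chosen x, proxy score, true value)."""
    rows = []
    for n in range(1, max_pressure + 1):
        x = optimize_proxy(range(n + 1))
        rows.append((n, x, proxy(x), true_value(x)))
    return rows


if __name__ == "__main__":
    for n, x, p, t in sweep(10):
        print(f"pressure={n:2d}  chosen x={x:2d}  proxy={p:2d}  true value={t:2d}")
```

In this toy the proxy score rises at every step, while the true value peaks at a pressure of 5 and falls back to 0 at a pressure of 10. An optimizer that sees only the proxy has no signal that it has passed the point where the two came apart.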
Evolution Notes
- Major contribution to AI alignment discourse, positioning Axio as a critical voice in the debate.
- Extends Conditionalist epistemology: values are agent-relative processes, not universal constants.
- The reference to Arrow’s theorem grounds the critique in formal impossibility results, not just philosophical intuition.
- Connects to the broader Axio anti-utopian theme: no perfect endpoint, only continuous adaptation.
- Foreshadows extensive later work on Axionic alignment, which takes a structural rather than value-based approach.
- The “living process” framing echoes biological/evolutionary thinking characteristic of Axio.
Tags
Cross-References
Open Questions
- If value aggregation is impossible, how do societies coordinate at all? Is all cooperation merely a temporary equilibrium?
- Can we define “destructive divergence” without smuggling in value judgments about what constitutes destruction?
- Does the critique equally apply to individual value coherence, or only collective alignment?
- What empirical predictions distinguish coherence-maintenance from alignment-optimization approaches, and how would we test them?
- If systems remain corrigible and bounded, don’t they lose competitive advantage to less-constrained optimizers?
- How does a decentralized agent ecology prevent convergence on extractive or adversarial equilibria?
- Is the “dynamic equilibrium” framing substantively different from iterative alignment approaches, or just rebranded?