Beyond Alignment

Summary

This essay argues that AI alignment as traditionally conceived is both incoherent and impossible.

The incoherence problem: alignment presupposes fixed human values, but preferences are dynamic, internally inconsistent, and context-dependent. Aggregating them runs into Arrow’s impossibility theorem: with three or more options, no rule for combining individual rankings into a collective ranking can satisfy unrestricted domain, Pareto efficiency, independence of irrelevant alternatives, and non-dictatorship all at once. “Values are not data structures that can be copied; they are processes that emerge through ongoing negotiation.”

The impossibility problem has three parts: (1) epistemic limits prevent modeling full causal webs or long-term consequences; (2) learning values from behavior inherits human irrationalities, and correcting for them reintroduces the alignment problem recursively; (3) self-referential traps arise because ensuring value stability across recursive self-improvement requires meta-goals like “never change your goals,” which is itself arbitrary value injection. Goodhart’s law adds a further obstacle: optimizing hard against a proxy corrupts the proxy.

The proposed solution is not omniscient optimizers but corrigible, bounded agents in decentralized ecologies that maintain systemic balance. “Alignment is not a single stable point but a dynamic equilibrium of feedback loops – a living process, not a solution.” The future won’t be aligned; it will be coherent.
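The aggregation problem cited in the summary can be made concrete with a Condorcet cycle, the classic small case behind results like Arrow’s theorem. The sketch below is an illustration, not a proof of the theorem and not something taken from the essay: the three voters, the options A, B, and C, and all function names are hypothetical. Each voter is perfectly consistent, yet pairwise majority voting produces a collective preference that loops.

```python
from itertools import combinations

# Three hypothetical voters, each with a consistent ranking
# (most preferred option first). Names are illustrative only.
ballots = [
    ["A", "B", "C"],
    ["B", "C", "A"],
    ["C", "A", "B"],
]

def majority_prefers(x, y, ballots):
    """Return True if a strict majority of ballots rank x above y."""
    wins = sum(1 for b in ballots if b.index(x) < b.index(y))
    return wins > len(ballots) / 2

def majority_relation(ballots):
    """Collect every (winner, loser) pair under pairwise majority vote."""
    options = sorted(ballots[0])
    relation = set()
    for x, y in combinations(options, 2):
        if majority_prefers(x, y, ballots):
            relation.add((x, y))
        elif majority_prefers(y, x, ballots):
            relation.add((y, x))
    return relation

def has_cycle(relation):
    """Detect a three-option cycle x > y > z > x in the majority relation."""
    return any(
        (x, y) in relation and (y, z) in relation and (z, x) in relation
        for (x, y) in relation
        for (_, z) in relation
    )

relation = majority_relation(ballots)
print(sorted(relation))
print("collective preference is cyclic:", has_cycle(relation))
```

With these ballots a majority prefers A to B, B to C, and C to A, so no collective ranking exists even though every individual ranking is coherent.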

Key Concepts

Value incoherence – Human preferences are dynamic processes, not fixed data structures; aggregation faces impossibility theorems.
Epistemic impossibility – Full causal modeling and long-term consequence forecasting are computationally intractable.
Recursive alignment regress – Correcting learned values reintroduces the alignment problem at meta-level indefinitely.
Self-referential stability traps – Value preservation across self-improvement requires arbitrary meta-goal injection (Löb’s theorem parallels).
Goodhart corruption – Hard optimization of proxies destroys their representational fidelity.
Coherence over alignment – Dynamic equilibrium maintaining systemic balance rather than convergence to fixed target.
Decentralized agent ecology – Many interacting agents with mutual constraints rather than single omniscient optimizer.
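The Goodhart corruption entry above can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the essay’s own model: candidates have a hidden true value, the proxy is that value plus independent noise of the same scale, and “optimization” means keeping only the top-scoring fraction by proxy. The function names and parameters are hypothetical.

```python
import random
import statistics

def simulate(n_candidates=10_000, seed=0):
    """Draw hypothetical candidates as (true_value, proxy_score) pairs.

    Assumption: the proxy equals the true value plus independent noise
    of the same scale. Real proxies can fail in other ways as well.
    """
    rng = random.Random(seed)
    candidates = []
    for _ in range(n_candidates):
        true_value = rng.gauss(0, 1)
        proxy = true_value + rng.gauss(0, 1)
        candidates.append((true_value, proxy))
    return candidates

def proxy_gap(candidates, top_fraction):
    """Keep the top fraction by proxy; return mean proxy minus mean true value."""
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    k = max(1, int(len(ranked) * top_fraction))
    chosen = ranked[:k]
    mean_true = statistics.mean(c[0] for c in chosen)
    mean_proxy = statistics.mean(c[1] for c in chosen)
    return mean_proxy - mean_true

candidates = simulate()
for fraction in (0.5, 0.1, 0.01):
    gap = proxy_gap(candidates, fraction)
    print(f"top {fraction:.0%} by proxy: proxy overstates true value by {gap:.2f}")
```

Under these assumptions the gap between what the proxy reports and what was actually obtained widens as selection becomes more aggressive, which is one simple sense in which hard optimization degrades a proxy’s fidelity.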

Evolution Notes

Major contribution to AI alignment discourse, positioning Axio as critical voice in the debate.
Extends Conditionalist epistemology: values are agent-relative processes, not universal constants.
The reference to Arrow’s theorem grounds the critique in a formal impossibility result, not just philosophical intuition.
Connects to broader Axio anti-utopian theme: no perfect endpoint, only continuous adaptation.
Foreshadows extensive later work on Axionic alignment—structural rather than value-based approaches.
The “living process” framing echoes biological/evolutionary thinking characteristic of Axio.

Cross-References

Open Questions

If value aggregation is impossible, how do societies coordinate at all—is all cooperation merely temporary equilibrium?
Can we define “destructive divergence” without smuggling in value judgments about what constitutes destruction?
Does the critique equally apply to individual value coherence, or only collective alignment?
What are the empirical predictions of coherence-maintenance vs. alignment-optimization approaches—how would we test them?
If systems remain corrigible and bounded, don’t they lose competitive advantage to less-constrained optimizers?
How does decentralized agent ecology prevent convergence on extractive or adversarial equilibria?
Is the “dynamic equilibrium” framing substantively different from iterative alignment approaches, or just rebranded?
