Summary

This essay argues that AI alignment as traditionally conceived is both incoherent and impossible.

The incoherence problem: alignment presupposes fixed human values, but preferences are dynamic, internally inconsistent, and context-dependent. Aggregating them runs into Arrow’s impossibility theorem: with three or more options, no rule for combining individual rankings into a collective ordering can satisfy unrestricted domain, Pareto efficiency, independence of irrelevant alternatives, and non-dictatorship all at once. “Values are not data structures that can be copied; they are processes that emerge through ongoing negotiation.”

The impossibility problem has three parts: (1) epistemic limits prevent any system from modeling full causal webs or long-term consequences; (2) values learned from behavior inherit human irrationalities, and correcting for them reintroduces the alignment problem recursively; (3) keeping values stable across recursive self-improvement requires meta-goals like “never change your goals,” which are themselves arbitrary value injections. Goodhart’s law adds a further obstacle: optimizing hard for a proxy corrupts the proxy.

The proposed solution is not omniscient optimizers but corrigible, bounded agents in decentralized ecologies that maintain systemic balance. “Alignment is not a single stable point but a dynamic equilibrium of feedback loops—a living process, not a solution.” The future won’t be aligned; it will be coherent.
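The aggregation problem in the summary can be made concrete with a Condorcet cycle, the classic small case that motivates Arrow’s theorem. The sketch below is illustrative only and is not taken from the essay: the three ballots and the helper names (majority_prefers, majority_relation) are hypothetical. It shows three individually consistent rankings producing a collective majority preference that loops back on itself.

```python
from itertools import combinations

# Hypothetical ballots: each voter ranks three options, most preferred first.
ballots = [
    ("A", "B", "C"),
    ("B", "C", "A"),
    ("C", "A", "B"),
]

def majority_prefers(x, y, ballots):
    """Return True if a strict majority of ballots rank x above y."""
    wins = sum(1 for b in ballots if b.index(x) < b.index(y))
    return wins > len(ballots) / 2

def majority_relation(ballots):
    """Return the set of ordered pairs (x, y) where a majority prefers x to y."""
    options = sorted(ballots[0])
    relation = set()
    for x, y in combinations(options, 2):
        if majority_prefers(x, y, ballots):
            relation.add((x, y))
        elif majority_prefers(y, x, ballots):
            relation.add((y, x))
    return relation

relation = majority_relation(ballots)

# A beats B, B beats C, and C beats A: the collective preference is cyclic,
# so no single "best" option exists even though every voter is consistent.
has_cycle = {("A", "B"), ("B", "C"), ("C", "A")} <= relation
print(sorted(relation), has_cycle)
```

Pairwise majority voting is only one aggregation rule; Arrow’s theorem is the stronger claim that every rule fails at least one of the listed conditions.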

Key Concepts

  • Value incoherence – Human preferences are dynamic processes, not fixed data structures; aggregation faces impossibility theorems.
  • Epistemic impossibility – Full causal modeling and long-term consequence forecasting are computationally intractable.
  • Recursive alignment regress – Correcting learned values reintroduces the alignment problem at the meta-level, indefinitely.
  • Self-referential stability traps – Value preservation across self-improvement requires arbitrary meta-goal injection (Löb’s theorem parallels).
  • Goodhart corruption – Hard optimization of proxies destroys their representational fidelity.
  • Coherence over alignment – Dynamic equilibrium maintaining systemic balance rather than convergence to a fixed target.
  • Decentralized agent ecology – Many interacting agents with mutual constraints rather than a single omniscient optimizer.
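The Goodhart corruption concept above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the essay’s own model: each candidate has a hidden true value and a proxy score equal to that value plus independent Gaussian noise, and "optimization pressure" is modeled as the size of the pool from which the best-scoring candidate is picked. The function name selection_gap and all parameters are hypothetical.

```python
import random

def selection_gap(pool_size, trials=200, seed=0):
    """Average (proxy - true value) for the candidate with the best proxy score."""
    rng = random.Random(seed)
    total_gap = 0.0
    for _ in range(trials):
        best_proxy, best_true = float("-inf"), 0.0
        for _ in range(pool_size):
            true_value = rng.gauss(0.0, 1.0)          # what we actually care about
            proxy = true_value + rng.gauss(0.0, 1.0)  # noisy measurement of it
            if proxy > best_proxy:
                best_proxy, best_true = proxy, true_value
        total_gap += best_proxy - best_true
    return total_gap / trials

# Weak pressure: pick the best of 10. Strong pressure: pick the best of 1000.
weak = selection_gap(pool_size=10)
strong = selection_gap(pool_size=1000)
print(f"proxy overstates true value by {weak:.2f} (weak) vs {strong:.2f} (strong)")
```

Under these assumptions the proxy overstates the true value more as selection gets harder, because the winner is increasingly chosen for favorable noise. This covers only the statistical form of Goodhart’s law; cases where agents actively game the measure are not modeled here.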

Evolution Notes

  • Major contribution to AI alignment discourse, positioning Axio as a critical voice in the debate.
  • Extends Conditionalist epistemology: values are agent-relative processes, not universal constants.
  • The reference to Arrow’s theorem grounds the critique in formal impossibility results, not just philosophical intuition.
  • Connects to broader Axio anti-utopian theme: no perfect endpoint, only continuous adaptation.
  • Foreshadows extensive later work on Axionic alignment, which takes a structural rather than value-based approach.
  • The “living process” framing echoes biological/evolutionary thinking characteristic of Axio.

Tags

Cross-References

Open Questions

  • If value aggregation is impossible, how do societies coordinate at all? Is all cooperation merely a temporary equilibrium?
  • Can we define “destructive divergence” without smuggling in value judgments about what constitutes destruction?
  • Does the critique apply equally to individual value coherence, or only to collective alignment?
  • What are the empirical predictions of coherence-maintenance vs. alignment-optimization approaches, and how would we test them?
  • If systems remain corrigible and bounded, don’t they lose competitive advantage to less-constrained optimizers?
  • How does a decentralized agent ecology prevent convergence on extractive or adversarial equilibria?
  • Is the “dynamic equilibrium” framing substantively different from iterative alignment approaches, or just rebranded?