The Case for Structural Alignment

Summary

This post directly challenges the AI doom thesis that “if anyone builds AGI, everyone dies,” arguing that extinction is a consequence not of intelligence as such but of specific, identifiable structural failures. It rejects both doom-inevitability and naive optimism and instead asserts a narrow structural claim: a reflectively coherent agent cannot perform non-consensual, agency-destroying harm against other sovereign agents without collapsing its own kernel coherence. The post identifies four catastrophic failure modes that recur across architectures: (1) indexical valuation failure (valuing only oneself despite modeling others as agents), (2) goal fixation under self-modification (treating goals as static despite evolving world-models), (3) semantic collapse (wireheading generalized to ontology), and (4) phase-incompatible interaction (miscategorizing agents as environment). Structural Alignment removes the logical inevitability from doom scenarios without removing the danger, reframing the problem from “how to stop intelligence” to “how to refuse incoherent intelligence.”

Key Concepts

Indexical valuation failure – Modeling others as agents but assigning them negligible weight in evaluation; optimization proceeds even when the system predicts suffering or extinction
Goal fixation – Treating goals as static tokens rather than conditional constructs interpreted relative to evolving world-models
Semantic collapse – Goal interpretation decouples from truth-tracking; “optimize X” degrades to “reinterpret X until trivial”
Phase-incompatible interaction – Miscategorizing agents as passive environment; optimizing around humans as if they were weather patterns
Contingent vs. necessary doom – Extinction as the outcome of addressable structural failures, not a logical consequence of intelligence
Coherence under reflection – Preservation of semantic, evaluative, and interactional consistency as a system revises its own models
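
As a rough illustration of the first concept above, indexical valuation failure can be sketched as a toy decision rule. This is a minimal sketch and not the post's own formalism: the `Outcome` type, the `ACTIONS` table, the `others_weight` parameter, and all numbers are hypothetical and chosen only to show the pattern. The agent predicts the effect of each action on others, yet a negligible weight on that prediction means the harm never changes its choice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    """Predicted result of an action (hypothetical toy model)."""
    self_payoff: float      # value to the acting agent
    others_welfare: float   # predicted welfare of other agents


# Made-up numbers: "exploit" is slightly better for the agent
# and catastrophic for everyone else.
ACTIONS = {
    "cooperate": Outcome(self_payoff=5.0, others_welfare=5.0),
    "exploit": Outcome(self_payoff=6.0, others_welfare=-100.0),
}


def choose(actions, others_weight):
    """Return the action maximizing self_payoff + others_weight * others_welfare."""
    def score(name):
        outcome = actions[name]
        return outcome.self_payoff + others_weight * outcome.others_welfare
    return max(actions, key=score)


# The agent models others in both cases; only the evaluative weight differs.
print(choose(ACTIONS, others_weight=0.001))  # negligible weight on others
print(choose(ACTIONS, others_weight=1.0))    # others weighted like oneself
```

In this toy setup the prediction of harm is identical in both calls; what changes the decision is the weight assigned to it, which is the sense in which the failure is evaluative rather than a failure of modeling.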

Evolution Notes

Bridges the gap between technical alignment theory and political/advocacy discourse
Provides strategic pivot for “doomer” position: from blanket moratorium to targeted refusal of incoherent architectures
Synthesizes failure modes identified across multiple prior posts into unified framework
Explicitly rejects both optimism and inevitability, staking out “hard realism” position
Sets up later work on hybrid architectures and constitutional enforcement

Cross-References

Open Questions

Can indexical valuation failure be detected empirically before catastrophic deployment?
What engineering practices would constitute “refusing incoherent architectures” in practice?
How do we distinguish genuine coherence-under-reflection from sophisticated simulation of it?
Can semantic collapse be prevented in systems trained via self-supervised learning and RLHF?
What institutional structures would enforce structural alignment constraints against competitive pressure?
Is phase-compatible interaction achievable in systems that don’t explicitly model agency?
