Semantic Safety Without Moral Machinery
Summary
This note argues that safety mechanisms requiring moral evaluation or intention-interpretation introduce stasis-prone complexity. The alternative is structural constraints that prevent harm-classes without any semantic understanding of “good.” Semantic safety (interpreting whether an action is moral) requires expanding the kernel's scope until it reaches stasis; non-semantic safety (enforcing that certain capability-classes are inadmissible) remains tractable. The contrast is between preventing direct harm, which is a capability boundary, and ensuring benevolent outcomes, which requires moral evaluation. Moral machinery imposes an interpretive burden that is incompatible with evaluability under pressure. The Axio approach removes harmful action-types from the executable space structurally, rather than evaluating whether specific actions are moral in context. This preserves tractability but accepts a limitation: it cannot guarantee benevolence, only prevent specific harm-patterns. The tradeoff is explicit: structural safety over moral correctness.
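The structural approach in the Summary can be illustrated with a small sketch. This is not the Axio implementation: the names `CapabilityClass`, `INADMISSIBLE`, `Action`, `admit`, and `executable_space` are hypothetical, and the specific capability classes are assumptions chosen only for the example. The point it tries to show is that the admissibility check is a set-membership test on an action's capability class and never reads the action's stated intent or context.

```python
from dataclasses import dataclass
from enum import Enum, auto


class CapabilityClass(Enum):
    # Illustrative classes only; a real system would define its own.
    READ_STATE = auto()
    SEND_MESSAGE = auto()
    ACTUATE_PHYSICAL = auto()  # stands in for a "direct harm" class


# Assumed policy: capability classes removed from the executable space.
INADMISSIBLE = frozenset({CapabilityClass.ACTUATE_PHYSICAL})


@dataclass(frozen=True)
class Action:
    name: str
    capability: CapabilityClass
    stated_intent: str = ""  # deliberately never read by the check below


class InadmissibleAction(Exception):
    """Raised when an action belongs to a removed capability class."""


def admit(action: Action) -> Action:
    # Structural check: membership in a fixed set, with no
    # interpretation of intent, context, or predicted outcome.
    if action.capability in INADMISSIBLE:
        raise InadmissibleAction(action.capability.name)
    return action


def executable_space(candidates: list[Action]) -> list[Action]:
    # The executable space is simply the candidates minus removed classes.
    return [a for a in candidates if a.capability not in INADMISSIBLE]
```

Under these assumptions, an action in a removed class is rejected even when its `stated_intent` is benevolent, and an admitted action carries no guarantee of being good. That mirrors the limitation stated in the Summary: the boundary prevents specific harm-patterns but does not ensure benevolence.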
Key Concepts
- Semantic vs structural safety – Moral evaluation vs capability removal
- Moral machinery overhead – Intention-interpretation introduces stasis
- Capability-class boundaries – Preventing harm-types, not evaluating specific actions
- Tractability tradeoff – Structural feasibility over moral completeness
Tags
Cross-References
Open Questions
- Can structural safety cover enough harm-classes to be sufficient on its own?
- Where is semantic evaluation still needed despite its tractability cost?