Semantic Safety Without Moral Machinery

Summary

This piece argues that safety mechanisms requiring moral evaluation or intention-interpretation introduce stasis-prone complexity. The alternative is structural constraints that prevent harm-classes without any semantic understanding of “good.” Semantic safety (interpreting whether an action is moral) requires expanding kernel scope until it reaches stasis; non-semantic safety (enforcing that certain capability-classes are inadmissible) remains tractable. The contrast is between preventing direct harm, which is a capability boundary, and ensuring benevolent outcomes, which requires moral evaluation. Moral machinery introduces an interpretive burden that is incompatible with evaluability under pressure. The Axio approach removes harmful action-types from the executable space structurally rather than evaluating whether specific actions are moral in context. This preserves tractability but accepts a limitation: it cannot guarantee benevolence, only prevent specific harm-patterns. The tradeoff is explicit: structural safety over moral correctness.
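The idea of removing action-types from the executable space can be illustrated with a minimal sketch. Everything named here (ActionType, build_executor, InadmissibleAction, and the example action-types) is hypothetical and chosen for illustration; it is not the Axio implementation, only one way the structural approach might look under the assumption that actions are dispatched by type.

```python
from enum import Enum, auto
from typing import Callable, Dict


class ActionType(Enum):
    # Hypothetical action-types, for illustration only.
    READ_SENSOR = auto()
    SEND_MESSAGE = auto()
    APPLY_FORCE_TO_HUMAN = auto()  # stands in for a direct-harm class


class InadmissibleAction(Exception):
    """Raised when an action-type is outside the executable space."""


def build_executor(handlers: Dict[ActionType, Callable[[str], str]]):
    # Copy the dispatch table so the executable space is fixed at build time.
    table = dict(handlers)

    def execute(action: ActionType, payload: str) -> str:
        handler = table.get(action)
        if handler is None:
            # No intent, context, or outcome is consulted here:
            # the action-type simply has no handler to run.
            raise InadmissibleAction(action.name)
        return handler(payload)

    return execute


# The harm-class is never registered, so it cannot be executed,
# whatever justification accompanies the request.
execute = build_executor({
    ActionType.READ_SENSOR: lambda p: f"read:{p}",
    ActionType.SEND_MESSAGE: lambda p: f"sent:{p}",
})
```

In this sketch the admissibility check is a single table lookup whose cost does not depend on context, which is the tractability property the summary describes. It also shows the accepted limitation: nothing here judges whether an admissible action such as SEND_MESSAGE is benevolent in a given situation.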

Key Concepts

Semantic vs structural safety – Moral evaluation vs capability removal
Moral machinery overhead – Intention-interpretation introduces stasis
Capability-class boundaries – Preventing harm-types rather than evaluating specific actions
Tractability tradeoff – Structural feasibility over moral completeness

Cross-References

Open Questions

Can structural safety cover enough harm-classes to be sufficient?
Where do we need semantic evaluation despite tractability cost?
