When is intent alignment sufficient or necessary to reduce AGI conflict?
In this post, we look at conditions under which intent alignment is not sufficient, or not necessary, for interventions on AGI systems to be effective at reducing the risks of (unendorsed) conflict. We then conclude this sequence by listing what we currently think are relatively promising directions for technical research and intervention to reduce AGI conflict.

Contents
Intent alignment is not sufficient to prevent unendorsed conflict
When would consultation with overseers fail to prevent catastrophic decisions?
Conflict-causing capabilities failures
Failures of cooperative capabilities
Failures to understand cooperation-relevant preferences
Why not delegate work on conflict reduction?
Intent alignment may not be necessary to reduce the risk of conflict
Tentative conclusions about directions for research & intervention
References

Intent alignment is not sufficient to prevent unendorsed conflict
In the previous post, we outlined […]