Self-improvement races

Just as human factions may race toward AI and thereby risk misalignment, AIs may race toward superior abilities by modifying and improving themselves in risky ways.

Read more

Backup Utility Functions: A Fail-Safe AI Technique

Designing the goal systems of advanced AIs so that they produce benevolent behavior is expected to be difficult. We should therefore account for the possibility that an AI's goal system fails to implement our values as originally intended. In this paper, we propose the idea of backup utility functions: secondary utility functions that take over in case the primary ones "fail".
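One way the core idea could be sketched in code (this is an illustrative toy, not the paper's proposal: the failure signal, function names, and fallback choice are all assumptions): the agent evaluates outcomes with its primary utility function, but if a sanity check flags the result as corrupted, it falls back to a conservative secondary utility function.

```python
def primary_utility(outcome):
    """Hypothetical primary utility; may return nonsense if the goal system fails."""
    if outcome.get("corrupted"):
        return float("nan")  # simulated goal-system failure
    return outcome.get("paperclips", 0) * 1.0

def backup_utility(outcome):
    """Conservative fallback, e.g. preferring low-impact outcomes."""
    return -abs(outcome.get("impact", 0))

def sanity_check(value):
    """One possible failure signal: a non-finite or implausibly large utility."""
    return value == value and abs(value) < 1e12  # NaN fails `value == value`

def evaluate(outcome):
    u = primary_utility(outcome)
    if sanity_check(u):
        return u
    return backup_utility(outcome)  # primary "failed" -> use the backup

# evaluate({"paperclips": 3})                          -> 3.0 (primary used)
# evaluate({"corrupted": True, "impact": 2})           -> -2  (backup used)
```

The hard part, of course, is the sanity check itself: detecting that a goal system has "failed" is a substantive open problem, and this sketch only illustrates the switching mechanism.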

Read more