Surrogate goals and safe Pareto improvements
Caspar Oesterheld proposed surrogate goals in unpublished work while working at CLR in 2016. Tobias Baumann first wrote about them publicly in a 2017 blog post, which also coined the term “surrogate goals”. Later, Caspar published a more rigorous, formal discussion of the idea under the term “safe Pareto improvements”, which is also intended to be more general. Eliezer Yudkowsky independently proposed a similar idea in an article about “Separation from hyperexistential risk”. The following articles are fully dedicated to the idea:
- Tobias Baumann (2017): Using surrogate goals to deflect threats (runner-up at the AI Alignment Prize)
- Tobias Baumann (2018): Challenges to implementing surrogate goals
- Tobias Baumann (2019): Surrogate goals under uncertainty
- Tobias Baumann (2019): Surrogate goals and private information
- Caspar Oesterheld (2021): Safe Pareto improvements for delegated game playing. Published in JAAMAS 36. Short version published at AAMAS 2021.
- Vojta Kovarik (2021): Formalizing Objections against Surrogate Goals
- Nicolas Macé, Anthony DiGiovanni, Jesse Clifton (2024): Individually incentivized safe Pareto improvements in open-source bargaining
- Anthony DiGiovanni, Jesse Clifton, Nicolas Macé (2024): Safe Pareto Improvements for Expected Utility Maximizers in Program Games
Surrogate goals have also been discussed, or at least mentioned, in other places, including Section 4.2 of CLR’s research agenda and the 80,000 Hours podcast (guest: Paul Christiano).