Publications

The following are selected publications from our researchers.


Cooperation, conflict, and transformative AI

Multi-agent systems

Oesterheld, Caspar; Conitzer, Vincent. Safe Pareto Improvements for Delegated Game Playing. AAMAS, 2021.
Stastny, Julian; Riché, Maxime; Lyzhov, Alexander; Treutlein, Johannes; Dafoe, Allan; Clifton, Jesse. Normative Disagreement as a Challenge for Cooperative AI. Cooperative AI workshop and the Strategic ML workshop at NeurIPS, 2021.
DiGiovanni, Anthony; Clifton, Jesse. Commitment games with conditional information revelation. AAAI 2023, 2022.
DiGiovanni, Anthony; Macé, Nicolas; Clifton, Jesse. Evolutionary Stability of Other-Regarding Preferences Under Complexity Costs. Learning, Evolution, and Games, 2022.
Clifton, Jesse. Collaborative game specification: arriving at common models in bargaining. Working paper, March 2021.
Clifton, Jesse. Weak identifiability and its consequences in strategic settings. Working paper, February 2021.
Clifton, Jesse; Riché, Maxime. Towards cooperation in learning games. Working paper, October 2020.
Oesterheld, Caspar. Robust program equilibrium. Theory and Decision, 86 (1), 2018.

Strategic considerations

Clifton, Jesse. CLR's Research Agenda on Cooperation, Conflict, and TAI. Alignment Forum, December 2019.
Clifton, Jesse. Equilibrium and prior selection problems in multipolar deployment. AI Alignment Forum, April 2020.
Kokotajlo, Daniel. The "Commitment Races" problem. Alignment Forum, August 2019.

Decision theory

Sauerberg, Nathaniel; Oesterheld, Caspar. Computing Optimal Commitments to Strategies and Outcome-conditional Utility Transfers. 2024.
Treutlein, Johannes. Modeling evidential cooperation in large worlds. 2023.
MacAskill, William; Vallinder, Aron; Oesterheld, Caspar; Shulman, Carl; Treutlein, Johannes. The Evidentialist’s Wager. The Journal of Philosophy, 2021.
Bell, James; Linsefors, Linda; Oesterheld, Caspar; Skalse, Joar. Reinforcement Learning in Newcomblike Environments. NeurIPS, 2021.
Oesterheld, Caspar. Approval-directed agency and the decision theory of Newcomb-like problems. Synthese, 2019 (runner-up in the "AI alignment prize").
Oesterheld, Caspar. Doing what has worked well in the past leads to evidential decision theory. 2018.
Oesterheld, Caspar. Multiverse-wide Cooperation via Correlated Decision Making. 2017.
Oesterheld, Caspar. Decision Theory and the Irrelevance of Impossible Outcomes. 2017.
Treutlein, Johannes. Anthropic uncertainty in the Evidential Blackmail. 2017.

Malevolence

Althaus, David; Baumann, Tobias. Reducing long-term risks from malevolent actors. Effective Altruism Forum, April 2020.

Ethics & meta-ethics

Gloor, Lukas. Sequence on moral anti-realism. Effective Altruism Forum, June 2020.
Gloor, Lukas. Tranquilism. CLR Website, July 2017.
Knutsson, Simon; Munthe, Christian. A Virtue of Precaution Regarding the Moral Status of Animals with Uncertain Sentience. Journal of Agricultural and Environmental Ethics, 30 (2), 2017.
Daniel, Max. Bibliography of Suffering-Focused Views. CLR Website, August 2016.
Tomasik, Brian. The Importance of Wild-Animal Suffering. Relations, 3 (2), 2015.
Tomasik, Brian. Should We Base Moral Judgments on Intentions or Outcomes? CLR Website, July 2013.
Tomasik, Brian. Dealing with Moral Multiplicity. CLR Website, December 2013.

Prioritization & macrostrategy

Cook, Tristan. Replicating and extending the grabby aliens model. Effective Altruism Forum, April 2022.
Cook, Tristan; Corlouer, Guillaume. The optimal timing of spending on AGI safety work; why we should probably be spending more now. Effective Altruism Forum, November 2022.
Baum, Seth D; Armstrong, Stuart; Ekenstedt, Timoteus; Häggström, Olle; Hanson, Robin; Kuhlemann, Karin; Maas, Matthijs M; Miller, James D; Salmela, Markus; Sandberg, Anders; Sotala, Kaj; Torres, Phil; Turchin, Alexey; Yampolskiy, Roman V. Long-term trajectories of human civilization. Foresight, 21 (1), pp. 53-83, 2019.
Kokotajlo, Daniel. Soft takeoff can still lead to decisive strategic advantage. Alignment Forum, August 2019.
Gloor, Lukas. Rebuttal of Christiano and AI Impacts on takeoff speeds? LessWrong, April 2019.
Gloor, Lukas. Cause prioritization for downside-focused value systems. Effective Altruism Forum, January 2018.
Althaus, David. Descriptive Population Ethics and Its Relevance for Cause Prioritization. Effective Altruism Forum, April 2018.
Sotala, Kaj. How feasible is the rapid development of artificial superintelligence? Physica Scripta, 92 (11), 2017.
Sotala, Kaj; Gloor, Lukas. Superintelligence as a Cause or Cure for Risks of Astronomical Suffering. Informatica, 41 (4), 2017.
Oesterheld, Caspar. Complications in evaluating neglectedness. The Universe from an Intentional Stance Blog, June 2017.
Tomasik, Brian. How the Simulation Argument Dampens Future Fanaticism. CLR Website, June 2016.

AI Forecasting

Kokotajlo, Daniel. What 2026 looks like. LessWrong, August 2021.
Kokotajlo, Daniel. Fun with +12 OOMs of Compute. LessWrong, March 2021.
Kokotajlo, Daniel. Birds, Brains, Planes, and AI: Against Appeals to the Complexity/Mysteriousness/Efficiency of the Brain. LessWrong, January 2021.
Kokotajlo, Daniel. Against GDP as a metric for timelines and takeoff speeds. LessWrong, December 2020.

Other

DiGiovanni, Anthony. Beginner’s guide to reducing s-risks. CLR Website, September 2023.
Kokotajlo, Daniel. Persuasion Tools: AI takeover without AGI or agency? LessWrong, November 2020.
Althaus, David; Kokotajlo, Daniel. Incentivizing forecasting via social media. Effective Altruism Forum, December 2020.
Sotala, Kaj. Sequence on non-agent and multiagent models of mind. LessWrong, January 2019.
Oesterheld, Caspar. Moral realism and AI alignment. LessWrong, September 2018.
Gloor, Lukas. Suffering-Focused AI Safety: In Favor of “Fail-Safe” Measures. CLR Website, June 2016.
Gloor, Lukas. Room for Other Things: How to adjust if EA seems overwhelming. Effective Altruism Forum, March 2015.