Publications
Cooperation, conflict, and transformative AI
Multi-agent systems
Oesterheld, Caspar; Conitzer, Vincent. Safe Pareto Improvements for Delegated Game Playing. AAMAS, 2021. HTML: https://longtermrisk.org/safe-pareto-improvements-for-delegated-game-playing/ | PDF: https://www.ifaamas.org/Proceedings/aamas2021/pdfs/p983.pdf
Stastny, Julian; Riché, Maxime; Lyzhov, Alexander; Treutlein, Johannes; Dafoe, Allan; Clifton, Jesse. Normative Disagreement as a Challenge for Cooperative AI. Cooperative AI workshop and the Strategic ML workshop at NeurIPS, 2021. HTML: https://longtermrisk.org/normative-disagreement-as-a-challenge-for-cooperative-ai/ | PDF: https://arxiv.org/pdf/2111.13872.pdf
Abstract: Cooperation in settings where agents have both common and conflicting interests (mixed-motive environments) has recently received considerable attention in multi-agent learning. However, the mixed-motive environments typically studied have a single cooperative outcome on which all agents can agree. Many real-world multi-agent environments are instead bargaining problems (BPs): they have several Pareto-optimal payoff profiles over which agents have conflicting preferences. We argue that typical cooperation-inducing learning algorithms fail to cooperate in BPs when there is room for normative disagreement, which results in multiple competing cooperative equilibria, and we illustrate this problem empirically. To remedy the issue, we introduce the notion of norm-adaptive policies. Norm-adaptive policies are capable of behaving according to different norms in different circumstances, creating opportunities for resolving normative disagreement. We develop a class of norm-adaptive policies and show in experiments that these significantly increase cooperation. However, norm-adaptiveness cannot address residual bargaining failure arising from a fundamental tradeoff between exploitability and cooperative robustness.
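The failure mode this abstract describes can be reproduced in miniature. The sketch below is an illustrative toy example (mine, not the paper's experimental setup): a 2x2 demand game with two Pareto-optimal equilibria over which the players' preferences conflict, where agents that each insist on their own preferred norm jointly land on the conflict outcome.

```python
# Toy bargaining problem: both (L, S) and (S, L) are Pareto-optimal
# equilibria, but the players disagree about which is better.
PAYOFFS = {  # (row_demand, col_demand) -> (row_payoff, col_payoff)
    ("L", "L"): (0, 0),  # incompatible large demands: bargaining failure
    ("L", "S"): (3, 1),  # row player's preferred cooperative outcome
    ("S", "L"): (1, 3),  # column player's preferred cooperative outcome
    ("S", "S"): (1, 1),  # compatible but Pareto-dominated
}

def best_response(player, other_action):
    def payoff(action):
        profile = (action, other_action) if player == 0 else (other_action, action)
        return PAYOFFS[profile][player]
    return max(["L", "S"], key=payoff)

# Verify the two competing cooperative equilibria.
for profile in [("L", "S"), ("S", "L")]:
    is_eq = (best_response(0, profile[1]) == profile[0]
             and best_response(1, profile[0]) == profile[1])
    print(profile, "is an equilibrium:", is_eq)

# If each learner independently converges to the norm it prefers, joint play
# is (L, L) and both get the conflict payoff (0, 0): the miscoordination
# that norm-adaptive policies are meant to detect and resolve.
print("joint play under clashing norms:", PAYOFFS[("L", "L")])
```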
DiGiovanni, Anthony; Clifton, Jesse. Commitment games with conditional information revelation. AAAI 2023. HTML: https://longtermrisk.org/commitment-games-with-conditional-information-revelation/ | PDF: https://arxiv.org/pdf/2204.03484.pdf
Abstract: The conditional commitment abilities of mutually transparent computer agents have been studied in previous work on commitment games and program equilibrium. This literature has shown how these abilities can help resolve Prisoner's Dilemmas and other failures of cooperation in complete information settings. But inefficiencies due to private information have been neglected thus far in this literature, despite the fact that these problems are pervasive and might also be addressed by greater mutual transparency. In this work, we introduce a framework for commitment games with a new kind of conditional commitment device, which agents can use to conditionally reveal private information. We prove a folk theorem for this setting that provides sufficient conditions for ex post efficiency, and thus represents a model of ideal cooperation between agents without a third-party mediator. Connecting our framework with the literature on strategic information revelation, we explore cases where conditional revelation can be used to achieve full cooperation while unconditional revelation cannot. Finally, extending previous work on program equilibrium, we develop an implementation of conditional information revelation. We show that this implementation forms program ϵ-Bayesian Nash equilibria corresponding to the Bayesian Nash equilibria of these commitment games.
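As a rough intuition for what a conditional information-revelation device does, here is a schematic sketch (my own simplification, with assumed names like Device and run; it is not the paper's formal construction or folk-theorem setting): each agent commits to reveal its private type only if the counterpart has made the matching commitment, and both then act on the revealed information.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Device:
    policy_id: str     # public, comparable commitment ("reveal-if-you-reveal")
    private_type: int  # hidden information, e.g. a valuation or cost

def run(dev_a: Device, dev_b: Device) -> str:
    # Conditional revelation: types are disclosed only under matching
    # commitments, so neither side can extract information for free.
    if dev_a.policy_id == dev_b.policy_id == "reveal-if-you-reveal":
        # With the private information now common, cooperate exactly when
        # it is efficient: here, trade iff the buyer's value covers the cost.
        return "trade" if dev_a.private_type >= dev_b.private_type else "no trade"
    # Mismatched commitments: nothing is revealed; fall back to a cautious default.
    return "no deal (nothing revealed)"

buyer = Device("reveal-if-you-reveal", private_type=7)   # value of the good
seller = Device("reveal-if-you-reveal", private_type=5)  # cost of the good
print(run(buyer, seller))               # trade
print(run(buyer, Device("opaque", 5)))  # no deal (nothing revealed)
```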
DiGiovanni, Anthony; Macé, Nicolas; Clifton, Jesse. Evolutionary Stability of Other-Regarding Preferences Under Complexity Costs. Learning, Evolution, and Games, 2022. HTML: https://longtermrisk.org/evolutionary-stability-of-other-regarding-preferences-under-complexity-costs/ | PDF: https://arxiv.org/pdf/2207.03178
Abstract: The evolution of preferences that account for other agents' fitness, or other-regarding preferences, has been modeled with the "indirect approach" to evolutionary game theory. Under the indirect evolutionary approach, agents make decisions by optimizing a subjective utility function. Evolution may select for subjective preferences that differ from the fitness function, and in particular, subjective preferences for increasing or reducing other agents' fitness. However, indirect evolutionary models typically artificially restrict the space of strategies that agents might use (assuming that agents always play a Nash equilibrium under their subjective preferences), and dropping this restriction can undermine the finding that other-regarding preferences are selected for. Can the indirect evolutionary approach still be used to explain the apparent existence of other-regarding preferences, like altruism, in humans? We argue that it can, by accounting for the costs associated with the complexity of strategies, giving (to our knowledge) the first account of the relationship between strategy complexity and the evolution of preferences. Our model formalizes the intuition that agents face tradeoffs between the cognitive costs of strategies and how well they interpolate across contexts. For a single game, these complexity costs lead to selection for a simple fixed-action strategy, but across games, when there is a sufficiently large cost to a strategy's number of context-specific parameters, a strategy of maximizing subjective (other-regarding) utility is stable again. Overall, our analysis provides a more nuanced picture of when other-regarding preferences will evolve.
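The core tradeoff in the abstract can be illustrated with back-of-the-envelope arithmetic (the numbers and functional form below are mine, not the paper's model): a fixed-action strategy pays a cost per context-specific parameter, while a single utility-maximizing rule pays one fixed cost and generalizes across games.

```python
def fitness_fixed_actions(num_games, payoff_per_game, cost_per_parameter):
    # One tailored action (one parameter) per game/context.
    return num_games * payoff_per_game - num_games * cost_per_parameter

def fitness_utility_maximizer(num_games, payoff_per_game, rule_cost):
    # One rule ("maximize subjective utility") reused in every context.
    return num_games * payoff_per_game - rule_cost

for k in [1, 2, 5, 20]:
    fixed = fitness_fixed_actions(k, payoff_per_game=10, cost_per_parameter=1)
    maximizer = fitness_utility_maximizer(k, payoff_per_game=10, rule_cost=3)
    winner = "utility-maximizer" if maximizer > fixed else "fixed-action"
    print(f"{k:>2} game(s): fixed={fixed:>4}  maximizer={maximizer:>4}  -> {winner}")

# For a single game the simple fixed-action strategy wins; once enough
# contexts accumulate, the per-parameter costs dominate and the general
# (other-regarding) utility-maximizing rule is favored, as in the abstract.
```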
Clifton, Jesse. Collaborative game specification: arriving at common models in bargaining. Working paper, March 2021. HTML: https://longtermrisk.org/collaborative-game-specification/
Clifton, Jesse. Weak identifiability and its consequences in strategic settings. Working paper, February 2021. HTML: https://longtermrisk.org/weak-identifiability-and-its-consequences-in-strategic-settings/
Clifton, Jesse; Riché, Maxime. Towards cooperation in learning games. Working paper, October 2020. PDF: https://longtermrisk.org/cooperation-conflict-and-transformative-artificial-intelligence-a-research-agenda-3/
Oesterheld, Caspar. Robust program equilibrium. Theory and Decision, 86 (1), 2018. URL: https://link.springer.com/article/10.1007/s11238-018-9679-3 | PDF: https://longtermrisk.org/files/Oesterheld2018_RobustProgramEquilibrium.pdf | DOI: https://doi.org/10.1007/s11238-018-9679-3
Strategic considerations
Taylor, Mia. Measurement Research Agenda. CLR Website, June 2024. URL: https://longtermrisk.org/measurement-research-agenda/
Clifton, Jesse. CLR's Research Agenda on Cooperation, Conflict, and TAI. Alignment Forum, December 2019. URL: https://longtermrisk.org/research-agenda
Clifton, Jesse. Equilibrium and prior selection problems in multipolar deployment. Alignment Forum, April 2020. URL: https://www.alignmentforum.org/posts/Tdu3tGT4i24qcLESh/equilibrium-and-prior-selection-problems-in-multipolar-1
Kokotajlo, Daniel. The "Commitment Races" problem. Alignment Forum, August 2019. URL: https://www.alignmentforum.org/posts/brXr7PJ2W4Na2EW2q/the-commitment-races-problem
Decision theory
Sauerberg, Nathaniel; Oesterheld, Caspar. Computing Optimal Commitments to Strategies and Outcome-conditional Utility Transfers. 2024. URL: https://arxiv.org/abs/2402.06626
Treutlein, Johannes. Modeling evidential cooperation in large worlds. 2023. PDF: https://arxiv.org/pdf/2307.04879.pdf
MacAskill, William; Vallinder, Aron; Oesterheld, Caspar; Shulman, Carl; Treutlein, Johannes. The Evidentialist’s Wager. The Journal of Philosophy, 2021. URL: https://globalprioritiesinstitute.org/the-evidentialists-wager/ (Global Priorities Institute)
Bell, James; Linsefors, Linda; Oesterheld, Caspar; Skalse, Joar. Reinforcement Learning in Newcomblike Environments. NeurIPS, 2021. PDF: https://proceedings.neurips.cc/paper/2021/file/b9ed18a301c9f3d183938c451fa183df-Paper.pdf
Oesterheld, Caspar. Approval-directed agency and the decision theory of Newcomb-like problems. Synthese, 2019. Runner-up in the "AI alignment prize" (https://www.lesswrong.com/posts/4WbNGQMvuFtY3So7s/announcement-ai-alignment-prize-winners-and-next-round). URL: https://link.springer.com/article/10.1007/s11229-019-02148-2 | DOI: https://doi.org/10.1007/s11229-019-02148-2
Oesterheld, Caspar. Doing what has worked well in the past leads to evidential decision theory. 2018. PDF: https://casparoesterheld.files.wordpress.com/2018/01/learning-dt.pdf
Oesterheld, Caspar. Multiverse-wide Cooperation via Correlated Decision Making. CLR Website, August 2017. HTML: https://longtermrisk.org/multiverse-wide-cooperation-via-correlated-decision-making/ | PDF: https://longtermrisk.org/files/Multiverse-wide-Cooperation-via-Correlated-Decision-Making.pdf
Oesterheld, Caspar. Decision Theory and the Irrelevance of Impossible Outcomes. 2017. URL: https://casparoesterheld.com/2017/01/17/decision-theory-and-the-irrelevance-of-impossible-outcomes/
Treutlein, Johannes. Anthropic uncertainty in the Evidential Blackmail. 2017. URL: https://casparoesterheld.com/2017/05/12/anthropic-uncertainty-in-the-evidential-blackmail/
Malevolence
Althaus, David; Baumann, Tobias. Reducing long-term risks from malevolent actors. Effective Altruism Forum, April 2020. Summary: https://longtermrisk.org/reducing-long-term-risks-from-malevolent-actors/ | PDF: https://longtermrisk.org/files/Reducing_long_term_risks_from_malevolent_actors.pdf | EA Forum post: https://forum.effectivealtruism.org/posts/LpkXtFXdsRd4rG8Kb/reducing-long-term-risks-from-malevolent-actors
Ethics & meta-ethics
Gloor, Lukas. Sequence on moral anti-realism. Effective Altruism Forum, June 2020.
#1: What Is Moral Realism? https://forum.effectivealtruism.org/posts/TwJb75GtbD4LvGiku/moral-anti-realism-sequence-1-what-is-moral-realism
#2: Why Realists and Anti-Realists Disagree: https://forum.effectivealtruism.org/posts/6nPnqXCaYsmXCtjTk/moral-anti-realism-sequence-2-why-realists-and-anti-realists
#3: Against Irreducible Normativity: https://forum.effectivealtruism.org/posts/C2GpA894CfLcTXL2L/moral-anti-realism-sequence-3-against-irreducible
#4: Why the Moral Realism Wager Fails: https://forum.effectivealtruism.org/posts/G9ASsCfsNghevtghF/moral-anti-realism-sequence-4-why-the-moral-realism-wager-1
#5: Metaethical Fanaticism (Dialogue): https://forum.effectivealtruism.org/posts/BYjj4WdrxgPJxMre9/moral-anti-realism-sequence-5-metaethical-fanaticism
Gloor, Lukas. Tranquilism. CLR Website, July 2017. URL: https://longtermrisk.org/tranquilism/
Knutsson, Simon; Munthe, Christian. A Virtue of Precaution Regarding the Moral Status of Animals with Uncertain Sentience. Journal of Agricultural and Environmental Ethics, 30 (2), 2017. URL: https://link.springer.com/article/10.1007/s10806-017-9662-y | DOI: https://doi.org/10.1007/s10806-017-9662-y
Daniel, Max. Bibliography of Suffering-Focused Views. CLR Website, August 2016. URL: https://longtermrisk.org/bibliography-of-suffering-focused-views/ (Last update: 2016-08)
Tomasik, Brian. The Importance of Wild-Animal Suffering. Relations, 3 (2), 2015. URL: http://www.ledonline.it/index.php/Relations/article/view/880/717 | PDF: https://longtermrisk.org/files/the-importance-of-wild-animal-suffering.pdf | DOI: https://doi.org/10.7358/rela-2015-002-toma
Tomasik, Brian. Should We Base Moral Judgments on Intentions or Outcomes? CLR Website, July 2013. URL: https://longtermrisk.org/should-we-base-moral-judgments-on-intentions-or-outcomes/ (Last update: 2013-11-28)
Tomasik, Brian. Dealing with Moral Multiplicity. CLR Website, December 2013. URL: https://longtermrisk.org/dealing-with-moral-multiplicity/ (Last update: 2017-11-15)
Prioritization & macrostrategy
Cook, Tristan. Replicating and extending the grabby aliens model. Effective Altruism Forum, April 2022. HTML: https://longtermrisk.org/replicating-and-extending-the-grabby-aliens-model/ | EA Forum: https://forum.effectivealtruism.org/posts/7bc54mWtc7BrpZY9e/replicating-and-extending-the-grabby-aliens-model | LessWrong: https://www.lesswrong.com/posts/iwWjBH2rBF7ExXeEG/replicating-and-extending-the-grabby-aliens-model
Cook, Tristan; Corlouer, Guillaume. The optimal timing of spending on AGI safety work; why we should probably be spending more now. Effective Altruism Forum, November 2022. HTML: https://longtermrisk.org/the-optimal-timing-of-spending-on-agi-safety-work-why-we-should-probably-be-spending-more-now/ | EA Forum: https://forum.effectivealtruism.org/posts/Ne8ZS6iJJp7EpzztP/the-optimal-timing-of-spending-on-agi-safety-work-why-we
Baum, Seth D.; Armstrong, Stuart; Ekenstedt, Timoteus; Häggström, Olle; Hanson, Robin; Kuhlemann, Karin; Maas, Matthijs M.; Miller, James D.; Salmela, Markus; Sandberg, Anders; Sotala, Kaj; Torres, Phil; Turchin, Alexey; Yampolskiy, Roman V. Long-term trajectories of human civilization. Foresight, 21 (1), pp. 53-83, 2019. URL: https://www.emerald.com/insight/content/doi/10.1108/FS-04-2018-0037/full/html | DOI: https://doi.org/10.1108/FS-04-2018-0037
Kokotajlo, Daniel. Soft takeoff can still lead to decisive strategic advantage. Alignment Forum, August 2019. URL: https://www.alignmentforum.org/posts/PKy8NuNPknenkDY74/soft-takeoff-can-still-lead-to-decisive-strategic-advantage
Gloor, Lukas. Rebuttal of Christiano and AI Impacts on takeoff speeds? LessWrong, April 2019. URL: https://www.lesswrong.com/posts/PzAnWgqvfESgQEvdg/any-rebuttals-of-christiano-and-ai-impacts-on-takeoff-speeds#zFEhTxNqEp3eZbjLZ
Gloor, Lukas. Cause prioritization for downside-focused value systems. Effective Altruism Forum, January 2018. URL: https://forum.effectivealtruism.org/posts/225Aq4P4jFPoWBrb5/cause-prioritization-for-downside-focused-value-systems
Althaus, David. Descriptive Population Ethics and Its Relevance for Cause Prioritization. Effective Altruism Forum, April 2018. URL: https://forum.effectivealtruism.org/posts/CmNBmSf6xtMyYhvcs/descriptive-population-ethics-and-its-relevance-for-cause
Sotala, Kaj. How feasible is the rapid development of artificial superintelligence? Physica Scripta, 92 (11), 2017. URL: https://iopscience.iop.org/article/10.1088/1402-4896/aa90e8 | PDF: http://kajsotala.fi/assets/2017/10/how_feasible.pdf | DOI: https://doi.org/10.1088/1402-4896/aa90e8
Sotala, Kaj; Gloor, Lukas. Superintelligence as a Cause or Cure for Risks of Astronomical Suffering. Informatica, 41 (4), 2017. URL: https://longtermrisk.org/superintelligence-cause-cure-risks-astronomical-suffering/ | PDF: http://www.informatica.si/index.php/informatica/article/view/1877/1098
Oesterheld, Caspar. Complications in evaluating neglectedness. The Universe from an Intentional Stance Blog, June 2017. URL: https://casparoesterheld.com/2017/06/25/complications-in-evaluating-neglectedness/
Tomasik, Brian. How the Simulation Argument Dampens Future Fanaticism. CLR Website, June 2016. URL: https://longtermrisk.org/how-the-simulation-argument-dampens-future-fanaticism (Last update: 2018-03-15)
AI Forecasting
Kokotajlo, Daniel. What 2026 looks like. LessWrong, August 2021. URL: https://www.lesswrong.com/posts/6Xgy6CAf2jqHhynHL/what-2026-looks-like
Kokotajlo, Daniel. Fun with +12 OOMs of Compute. LessWrong, March 2021. URL: https://www.lesswrong.com/s/5Eg2urmQjA4ZNcezy/p/rzqACeBGycZtqCfaX
Kokotajlo, Daniel. Birds, Brains, Planes, and AI: Against Appeals to the Complexity/Mysteriousness/Efficiency of the Brain. LessWrong, January 2021. URL: https://www.lesswrong.com/s/5Eg2urmQjA4ZNcezy/p/HhWhaSzQr6xmBki8F
Kokotajlo, Daniel. Against GDP as a metric for timelines and takeoff speeds. LessWrong, December 2020. URL: https://www.lesswrong.com/s/dZMDxPBZgHzorNDTt/p/aFaKhG86tTrKvtAnT
Other
DiGiovanni, Anthony. Beginner’s guide to reducing s-risks. CLR Website, September 2023. URL: https://longtermrisk.org/beginners-guide-to-reducing-s-risks/
Kokotajlo, Daniel. Persuasion Tools: AI takeover without AGI or agency? LessWrong, November 2020. URL: https://www.lesswrong.com/s/dZMDxPBZgHzorNDTt/p/qKvn7rxP2mzJbKfcA
Althaus, David; Kokotajlo, Daniel. Incentivizing forecasting via social media. Effective Altruism Forum, December 2020. URL: https://forum.effectivealtruism.org/posts/842uRXWoS76wxYG9C/incentivizing-forecasting-via-social-media
Sotala, Kaj. Sequence on non-agent and multiagent models of mind. LessWrong, January 2019.
Sequence introduction: non-agent and multiagent models of mind: https://www.lesswrong.com/posts/M4w2rdYgCKctbADMn/sequence-introduction-non-agent-and-multiagent-models-of
Book Summary: Consciousness and the Brain: https://www.lesswrong.com/posts/x4n4jcoDP7xh5LWLq/book-summary-consciousness-and-the-brain
Building up to an Internal Family Systems model: https://www.lesswrong.com/posts/5gfqG3Xcopscta3st/building-up-to-an-internal-family-systems-model
Subagents, introspective awareness, and blending: https://www.lesswrong.com/posts/AhcEaqWYpa2NieNsK/subagents-introspective-awareness-and-blending
Subagents, akrasia, and coherence in humans: https://www.lesswrong.com/posts/oJwJzeZ6ar2Hr7KAX/subagents-akrasia-and-coherence-in-humans
Integrating disagreeing subagents: https://www.lesswrong.com/posts/hnLutdvjC8kPScPAj/integrating-disagreeing-subagents
Oesterheld, Caspar. Moral realism and AI alignment. LessWrong, September 2018. URL: https://www.lesswrong.com/posts/DRmoA7Nqu85Sbuo7t/moral-realism-and-ai-alignment
Gloor, Lukas. Suffering-Focused AI Safety: In Favor of “Fail-Safe” Measures. CLR Website, June 2016. URL: https://longtermrisk.org/suffering-focused-ai-safety/ | PDF: https://longtermrisk.org/files/fail-safe-ai.pdf
Gloor, Lukas. Room for Other Things: How to adjust if EA seems overwhelming. Effective Altruism Forum, March 2015. URL: https://forum.effectivealtruism.org/posts/4fPxQjq6GFZgurSsf/room-for-other-things-how-to-adjust-if-ea-seems-overwhelming