Center on Long-Term Risk

Our goal is to address worst-case risks from the development and deployment of advanced AI systems. We are currently focused on conflict scenarios as well as technical and philosophical aspects of cooperation.

We do interdisciplinary research, make and recommend grants, and build a community of professionals and other researchers around our priorities.

More about us

Measurement Research Agenda

Author: Mia Taylor 1 Motivation The Center on Long-Term Risk aims to reduce risks of astronomical suffering (s-risk) from advanced AI systems. We’re primarily concerned with threat models involving the deliberate creation of suffering during conflict between advanced agentic AI systems. To mitigate these risks, we are interested in tracking properties of AI systems that make them more likely to be involved in catastrophic conflict. Thus, we propose the following research priorities: Identify and describe properties of AI systems that would robustly make them more likely to contribute to s-risk (section 2.1) Design measurement methods to detect whether systems have these properties (section 2.2) Use these measurements on contemporary systems to learn what aspects of training, prompting, or scaffolding influence […]

Read online

Cooperation, Conflict, and Transformative Artificial Intelligence: A Research Agenda

Note: This research agenda was published in January 2020. For an update on our work in multi-agent systems as of March 2021, see this post. For an update on our plans for  empirical work as of June 2024, see our measurement agenda. Author: Jesse Clifton, Center on Long-Term Risk, and Polaris Research Institute This research agenda on Cooperation, Conflict, and Transformative Artificial Intelligence outlines what we think are the most promising avenues for developing technical and governance interventions aimed at avoiding conflict between transformative AI systems. We draw on international relations, game theory, behavioral economics, machine learning, decision theory, and formal epistemology. While our research agenda captures many topics we are interested in, the focus of CLR's research is broader. We […]

Download Read online

Reducing long-term risks from malevolent actors

Summary Dictators who exhibited highly narcissistic, psychopathic, or sadistic traits were involved in some of the greatest catastrophes in human history.  Malevolent individuals in positions of power could negatively affect humanity’s long-term trajectory by, for example, exacerbating international conflict or other broad risk factors. Malevolent humans with access to advanced technology—such as whole brain emulation or other forms of transformative AI—could cause serious existential risks and suffering risks. We therefore consider interventions to reduce the expected influence of malevolent humans on the long-term future. The development of manipulation-proof measures of malevolence seems valuable, since they could be used to screen for malevolent humans in high-impact settings, such as heads of government or CEOs. We also explore possible future technologies that […]

Read online
Browse all CLR research

From our blog

Overview of Transformative AI Misuse Risks: What Could Go Wrong Beyond Misalignment

This  post provides an overview of this report. Discussions of the existential risks posed by artificial intelligence have largely focused on the challenge of alignment - ensuring that advanced AI systems pursue human-compatible goals. However, even if we solve alignment, humanity could still face catastrophic outcomes from how humans choose to use transformative AI technologies. A new analysis examines these "misuse risks" - scenarios where human decisions about AI deployment, rather than AI systems acting against human interests, lead to existential catastrophe. This includes both intentional harmful uses (like developing AI-enabled weapons) and reckless deployment without adequate safeguards. The analysis maps out how such human-directed applications of AI, even when technically aligned, could lead to permanent loss of human potential. […]

Read more

Individually incentivized safe Pareto improvements in open-source bargaining

Summary Agents might fail to peacefully trade in high-stakes negotiations. Such bargaining failures can have catastrophic consequences, including great power conflicts, and AI flash wars. This post is a distillation of DiGiovanni et al. (2024) (DCM), whose central result is that agents that are sufficiently transparent to each other have individual incentives to avoid catastrophic bargaining failures. More precisely, DCM constructs strategies that are plausibly individually incentivized, and, if adopted by all, guarantee each player no less than their least preferred trade outcome. Figure 0 below illustrates this. This result is significant because artificial general intelligences (AGIs) might (i) be involved in high-stakes negotiations, (ii) be designed with the capabilities required for the type of strategy we’ll present, and (iii) bargain poorly by default (since bargaining competence isn’t […]

Read more

Making AIs less likely to be spiteful

This report is also posted to LessWrong here. Which forms of misalignment might result in particularly bad outcomes? And to what extent can we prevent them even if we fail at ​​intent alignment? We define spite as a terminal preference for frustrating others’ preferences, at least under some conditions. Reducing the chances that an AI system is spiteful is a candidate class of interventions for reducing risks of AGI conflict, as well as risks from malevolence. This post summarizes some of our thinking on the topic. We give an overview of why spite might lead to catastrophic conflict; how we might intervene to reduce it; ways in which the intervention could fail to be impactful, or have negative impact; and things we could learn […]

Read more
Browse CLR blog