How can humanity best reduce suffering?

Emerging technologies such as artificial intelligence could radically change the trajectory of our civilization. We are building a global community of researchers and professionals working to ensure that this technological transformation does not risk causing suffering on an unprecedented scale.

We do research, award grants and scholarships, and host workshops. Our work focuses on advancing the safety and governance of artificial intelligence as well as understanding other long-term risks.


Approval-directed agency and the decision theory of Newcomb-like problems

The quest for artificial intelligence poses questions relating to decision theory: How can we implement any given decision theory in an AI? Which decision theory (if any) describes the behavior of any existing AI design? This paper examines which decision theory (in particular, evidential or causal) is implemented by an approval-directed agent, i.e., an agent whose goal it is to maximize the score it receives from an overseer.


Robust program equilibrium

One approach to achieving cooperation in the one-shot prisoner’s dilemma is Tennenholtz’s program equilibrium, in which the players of a game submit programs instead of strategies. These programs are then allowed to read each other’s source code to decide which action to take. Unfortunately, existing cooperative equilibria are either fragile or computationally challenging and therefore unlikely to be realized in practice. This paper proposes a new, simple, more efficient program to achieve more robust cooperative program equilibria.
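The setup described above can be illustrated with a minimal sketch. It is not the program proposed in the paper; it only shows the basic idea of programs that read each other's source code, using a simple "cooperate only with an exact copy of myself" rule. All names here (`MIRROR_SOURCE`, `DEFECT_SOURCE`, `run_program`, `play`) are hypothetical and chosen for illustration.

```python
# Each submitted "program" is a source string defining act(my_source, opponent_source).
# "C" stands for cooperate, "D" for defect.

MIRROR_SOURCE = '''
def act(my_source, opponent_source):
    # Cooperate only if the opponent submitted exactly the same text.
    return "C" if opponent_source == my_source else "D"
'''

DEFECT_SOURCE = '''
def act(my_source, opponent_source):
    # Always defect, regardless of the opponent.
    return "D"
'''


def run_program(source, opponent_source):
    """Execute one submitted program, giving it both source texts."""
    namespace = {}
    exec(source, namespace)  # illustration only: never exec untrusted code
    return namespace["act"](source, opponent_source)


def play(source_a, source_b):
    """Return the pair of actions chosen by the two submitted programs."""
    return run_program(source_a, source_b), run_program(source_b, source_a)


# A behaviorally identical program that differs only by a trailing comment.
MIRROR_VARIANT = MIRROR_SOURCE + "# harmless comment\n"

print(play(MIRROR_SOURCE, MIRROR_SOURCE))   # both cooperate
print(play(MIRROR_SOURCE, DEFECT_SOURCE))   # both defect
print(play(MIRROR_SOURCE, MIRROR_VARIANT))  # both defect: equality check is fragile
```

The last call hints at the fragility mentioned above: two programs that behave identically fail to cooperate because their source texts differ by a single comment.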


Cause prioritization for downside-focused value systems

This post discusses cause prioritization from the perspective of downside-focused value systems, i.e., views whose primary concern is the reduction of bads such as suffering. According to such value systems, interventions that reduce risks of astronomical suffering are likely more promising than interventions that primarily reduce extinction risks.


From our blog

7 July 2020

Reducing long-term risks from malevolent actors

Summary: Dictators who exhibited highly narcissistic, psychopathic, or sadistic traits were involved in some of the greatest catastrophes in human history. Malevolent individuals in positions of power could negatively affect humanity’s long-term trajectory by, for example, exacerbating international conflict or other broad risk factors. Malevolent humans with access to advanced technology, such as whole brain emulation […]

22 February 2019

Risk factors for s-risks

Traditional disaster risk prevention has a concept of risk factors. These factors are not risks in and of themselves, but they increase either the probability or the magnitude of a risk. For instance, inadequate governance structures do not cause a specific disaster, but if a disaster strikes it may impede an effective response, thus increasing the damage. Rather than considering individual scenarios of how s-risks could occur, which tends to be highly speculative, this post instead looks at risk factors – i.e. factors that would make s-risks more likely or more severe.

3 July 2018

Challenges to implementing surrogate goals

Surrogate goals might be one of the most promising approaches to reducing threats, or at least the disvalue resulting from them. The idea is to add to one’s current goals a surrogate goal that one did not initially care about, in the hope that any potential threats will target this surrogate goal rather than what one initially cared about. In this post, I will outline two key obstacles to a successful implementation of surrogate goals.


New research

The Evidentialist's Wager

Suppose that an altruistic and morally motivated agent who is uncertain between evidential decision theory (EDT) and causal decision theory (CDT) finds herself in a situation in which the two theories give conflicting verdicts. We argue that even if she has significantly higher credence in CDT, she should nevertheless act in accordance with EDT.


Get involved