How can humanity best reduce suffering?

Emerging technologies such as artificial intelligence could radically change the trajectory of our civilization. We are building a global community of researchers and professionals working to ensure that this technological transformation does not risk causing suffering on an unprecedented scale.

We do research, award grants and scholarships, and host workshops. Our work focuses on advancing the safety and governance of artificial intelligence as well as understanding other long-term risks.

Learn more

Approval-directed agency and the decision theory of Newcomb-like problems

The quest for artificial intelligence poses questions relating to decision theory: How can we implement any given decision theory in an AI? Which decision theory (if any) describes the behavior of any existing AI design? This paper examines which decision theory (in particular, evidential or causal) is implemented by an approval-directed agent, i.e., an agent whose goal it is to maximize the score it receives from an overseer.

Download | Read online
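
The evidential/causal distinction the abstract turns on is easiest to see in the standard Newcomb's problem. As a refresher, here is a minimal worked example in Python; the payoffs and the 99% predictor accuracy are illustrative assumptions, not figures from the paper.

```python
# Newcomb's problem: a predictor puts $1,000,000 in an opaque box only
# if it predicts the agent will take just that box; a transparent box
# always holds $1,000. The 0.99 accuracy and the payoffs are
# assumptions chosen for illustration.

ACCURACY = 0.99
BIG, SMALL = 1_000_000, 1_000

def edt_value(action):
    """EDT treats the chosen action as evidence about the prediction."""
    p_big = ACCURACY if action == "one-box" else 1 - ACCURACY
    return BIG * p_big + (SMALL if action == "two-box" else 0)

def cdt_value(action, p_big_fixed=0.5):
    """CDT holds the box contents fixed, so two-boxing dominates."""
    return BIG * p_big_fixed + (SMALL if action == "two-box" else 0)

for name, value in [("EDT", edt_value), ("CDT", cdt_value)]:
    print(name, "recommends", max(["one-box", "two-box"], key=value))
# EDT recommends one-box; CDT recommends two-box.
```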

Robust program equilibrium

One approach to achieving cooperation in the one-shot prisoner’s dilemma is Tennenholtz’s program equilibrium, in which the players of a game submit programs instead of strategies. These programs are then allowed to read each other’s source code to decide which action to take. Unfortunately, existing cooperative equilibria are either fragile or computationally challenging and therefore unlikely to be realized in practice. This paper proposes a new, simple, more efficient program to achieve more robust cooperative program equilibria.

Download | Read online
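
To make the source-code-reading idea concrete, here is a rough Python sketch of a simulation-based program of the kind the program-equilibrium literature studies. The construction below is my own illustration under simplifying assumptions (programs modeled as Python functions that receive the opponent's program), not the paper's exact proposal.

```python
import random

EPSILON = 0.05  # small grounding probability; expected simulation
                # depth between two copies is 1 / EPSILON

def epsilon_grounded_bot(opponent):
    """Prisoner's dilemma program: returns "C" (cooperate) or "D" (defect)."""
    # With small probability, cooperate unconditionally. This grounds
    # the recursion when two copies simulate each other.
    if random.random() < EPSILON:
        return "C"
    # Otherwise, simulate the opponent playing against this program
    # and copy whatever it does.
    return opponent(epsilon_grounded_bot)

def defect_bot(opponent):
    return "D"

# Two copies cooperate: the mutual simulation almost surely bottoms
# out in a grounding step that returns "C", which propagates back up.
print(epsilon_grounded_bot(epsilon_grounded_bot))  # "C"
# Against an unconditional defector it defects with prob. 1 - EPSILON.
print(epsilon_grounded_bot(defect_bot))
```

The grounding step is what makes cooperation robust in this sketch: it does not depend on the opponent's source code being syntactically identical, only on its behavior under simulation.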

Cause prioritization for downside-focused value systems

This post discusses cause prioritization from the perspective of downside-focused value systems, i.e., views whose primary concern is the reduction of bads such as suffering. According to such value systems, interventions that reduce risks of astronomical suffering are likely more promising than interventions that primarily reduce extinction risks.

Read online
Browse all CLR research

From our blog

6 March 2021

Collaborative game specification: arriving at common models in bargaining

Conflict is often an inefficient outcome of a bargaining problem. This is true in the sense that, for a given game-theoretic model of a strategic interaction, there is often some equilibrium in which all agents are better off than in the conflict outcome. But real-world agents may not make decisions according to game-theoretic models, and when they do, they may use different models. This makes it harder to guarantee that real-world agents will avoid bargaining failure than the observation that conflict is often inefficient suggests. In another post, I described the "prior selection problem", in which different agents having different models of their situation can lead to bargaining failure. Moreover, techniques for addressing bargaining problems like coordination on […]

Read more
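
As a toy illustration of the failure mode described in the post excerpted above (my construction, not the post's), consider two agents who each play the Pareto-best equilibrium of their own model of a simple coordination game; because their models rank the equilibria differently, their joint play miscoordinates:

```python
# model[(row_action, col_action)] -> (row_payoff, col_payoff)
model_1 = {("A", "A"): (3, 3), ("B", "B"): (2, 2),
           ("A", "B"): (0, 0), ("B", "A"): (0, 0)}
model_2 = {("A", "A"): (2, 2), ("B", "B"): (3, 3),
           ("A", "B"): (0, 0), ("B", "A"): (0, 0)}

def pareto_best_equilibrium(model):
    # The pure equilibria of this game are the matching profiles;
    # pick the one this model says is best for both players.
    return max([("A", "A"), ("B", "B")], key=lambda p: model[p])

row = pareto_best_equilibrium(model_1)[0]  # row's model favors "A"
col = pareto_best_equilibrium(model_2)[1]  # column's model favors "B"
print((row, col))           # ("A", "B"): miscoordination
print(model_1[(row, col)])  # (0, 0): worse for both, in either model
```
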
13 February 2021

Weak identifiability and its consequences in strategic settings

One way that agents might become involved in catastrophic conflict is if they have mistaken beliefs about one another. Maybe I think you are bluffing when you threaten to launch the nukes, but you are dead serious. So we should understand why agents might sometimes have such mistaken beliefs. In this post I'll discuss one obstacle to the formation of accurate beliefs about other agents, which has to do with identifiability. As with my post on equilibrium and prior selection problems, this is a theme that keeps cropping up in my thinking about AI cooperation and conflict, so I thought it might be helpful to have it written up. We say that a model is unidentifiable if there are several […]

Read more
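
A minimal sketch of what unidentifiability can look like here (an illustrative example of mine, not taken from the post): if an opponent's observed behavior depends only on the product of two underlying parameters, then behavior alone cannot distinguish them.

```python
import math

def threat_probability(rationality, resolve):
    # Assume we only observe how often the agent threatens, and that
    # this probability is a logistic function of rationality * resolve.
    return 1 / (1 + math.exp(-rationality * resolve))

# A weakly motivated but highly deliberate agent...
p1 = threat_probability(rationality=4.0, resolve=0.5)
# ...and a dead-serious but noisy agent...
p2 = threat_probability(rationality=0.5, resolve=4.0)

print(p1 == p2)  # True: the data cannot tell bluffing from resolve,
                 # since only the product of the parameters matters.
```
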
18 January 2021

Birds, Brains, Planes, and AI: Against Appeals to the Complexity / Mysteriousness / Efficiency of the Brain

[Epistemic status: Strong opinions lightly held, this time with a cool graph.] I argue that an entire class of common arguments against short timelines is bogus, and provide weak evidence that anchoring to the human-brain-human-lifetime milestone is reasonable. In a sentence, my argument is that the complexity, mysteriousness, and efficiency of the human brain (compared to artificial neural nets) are almost zero evidence that building TAI will be difficult, because evolution typically makes things complex, mysterious, and efficient even when there are simple, easily understood, inefficient designs that work almost as well (or even better!) for human purposes. In slogan form: If all we had to do to get TAI was make a simple neural net 10x the […]

Read more
Browse CLR blog

New research

The Evidentialist's Wager

Suppose that an altruistic and morally motivated agent who is uncertain between evidential decision theory (EDT) and causal decision theory (CDT) finds herself in a situation in which the two theories give conflicting verdicts. We argue that even if she has significantly higher credence in CDT, she should nevertheless act in accordance with EDT.

Download | Read online
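
The wager's structure can be sketched with a crude expected-value calculation; the credences and the number of correlated agents below are illustrative assumptions, not figures from the paper.

```python
credence_cdt = 0.9        # much higher credence in CDT
credence_edt = 0.1
local_stake = 1.0         # value at stake in the agent's own decision
n_correlated = 1_000_000  # assumed number of agents whose decisions
                          # correlate with hers, if EDT is right

# If EDT is right, acting on EDT's verdict is evidence about what all
# the correlated agents do, so the altruistic stakes scale with N; if
# CDT is right, only the local decision matters.
follow_edt = credence_edt * local_stake * n_correlated
follow_cdt = credence_cdt * local_stake

print(follow_edt > follow_cdt)  # True: EDT's verdict dominates
```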

Subscribe to receive monthly updates