Individually incentivized safe Pareto improvements in open-source bargaining

Summary: Agents might fail to peacefully trade in high-stakes negotiations. Such bargaining failures can have catastrophic consequences, including great power conflicts and AI flash wars. This post is a distillation of DiGiovanni et al. (2024) (DCM), whose central result is that agents that are sufficiently transparent to each other have individual incentives to avoid catastrophic bargaining failures. More precisely, DCM constructs strategies that are plausibly individually incentivized and that, if adopted by all, guarantee each player no less than their least preferred trade outcome. Figure 0 below illustrates this. This result is significant because artificial general intelligences (AGIs) might (i) be involved in high-stakes negotiations, (ii) be designed with the capabilities required for the type of strategy we’ll present, and (iii) bargain poorly by default (since bargaining competence isn’t […]
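To make the guarantee concrete, here is a minimal toy sketch (our illustration, not taken from DCM; the payoff numbers and outcome names are hypothetical): in a bargaining game with a set of feasible trade outcomes and a conflict outcome that is worse for everyone, the floor the result speaks to is each player's least preferred trade payoff, which still beats conflict.

```python
# Toy sketch (not from DCM): two players bargain over a finite set of trade
# outcomes, with a costly conflict outcome if they fail to agree. The claim
# being illustrated: strategies that, if adopted by all, guarantee each
# player at least their least preferred *trade* payoff.

# Hypothetical payoffs: (player 1, player 2) for each feasible trade.
trade_outcomes = {
    "split_favoring_1": (9, 4),
    "even_split":       (6, 6),
    "split_favoring_2": (4, 9),
}
conflict_payoff = (0, 0)  # assumed worse for both players than any trade

# Each player's guaranteed floor under the kind of strategy DCM constructs:
# no less than their least preferred trade outcome.
floor_1 = min(p1 for p1, _ in trade_outcomes.values())
floor_2 = min(p2 for _, p2 in trade_outcomes.values())

assert floor_1 > conflict_payoff[0] and floor_2 > conflict_payoff[1]
print(f"Guaranteed floors: player 1 >= {floor_1}, player 2 >= {floor_2}")
```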

Read more

Making AIs less likely to be spiteful

This report is also posted to LessWrong here. Which forms of misalignment might result in particularly bad outcomes? And to what extent can we prevent them even if we fail at intent alignment? We define spite as a terminal preference for frustrating others’ preferences, at least under some conditions. Reducing the chances that an AI system is spiteful is a candidate class of interventions for reducing risks of AGI conflict, as well as risks from malevolence. This post summarizes some of our thinking on the topic. We give an overview of why spite might lead to catastrophic conflict; how we might intervene to reduce it; ways in which the intervention could fail to be impactful, or have negative impact; and things we could learn […]
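One stylized way to formalize the definition above (our illustration, not necessarily the formalization used in the report) is a terminal utility function that places negative weight on another agent's payoff, so the agent gains from frustrating the other's preferences even at some cost to itself:

```python
# Stylized sketch (our illustration): spite modeled as a terminal utility
# with a negative weight on another agent's payoff. The name `spite_weight`
# is hypothetical.

def spiteful_utility(own_payoff: float, other_payoff: float,
                     spite_weight: float = 0.5) -> float:
    """Utility = own payoff minus spite_weight times the other's payoff."""
    return own_payoff - spite_weight * other_payoff

# A spiteful agent may prefer an outcome that is materially worse for
# itself if it is sufficiently bad for the other agent:
print(spiteful_utility(own_payoff=3, other_payoff=10))  # 3 - 0.5*10 = -2.0
print(spiteful_utility(own_payoff=2, other_payoff=0))   # 2 - 0.5*0  =  2.0
```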

Read more

Open-minded updatelessness

Summary: Bounded agents might be unaware of possibilities relevant to their decision-making. That is, they may not just be uncertain but fail to conceive of some relevant hypotheses entirely. What’s more, commitment races might pressure early AGIs into adopting an updateless policy from a position of limited awareness. What happens, then, when a committed AGI becomes aware of a possibility that would have changed which commitment it wanted to make in the first place? Motivated by this question, we develop "open-minded" extensions of updatelessness, where agents revise their priors upon experiencing awareness growth and reevaluate their commitment to a plan relative to the revised prior. Introduction: Bounded agents may be unaware of propositions relevant to the decision problem they face. […]
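As a minimal sketch of the idea (our illustration, using assumed hypothesis names and payoffs rather than anything from the post): an "open-minded" updateless agent picks the plan that maximizes expected utility under its current prior, and, upon awareness growth, revises the prior over the enlarged hypothesis space and reevaluates its commitment relative to the revised prior rather than sticking with the old plan.

```python
# Minimal sketch (our illustration, hypothetical names and numbers): an
# open-minded updateless agent re-selects its plan relative to a prior
# revised after awareness growth.

def best_plan(prior, payoffs, plans):
    """Plan maximizing prior-expected utility over currently conceived hypotheses."""
    return max(plans, key=lambda plan: sum(prior[h] * payoffs[h][plan] for h in prior))

plans = ["commit_A", "commit_B"]

# Initially the agent is only aware of hypotheses h1 and h2.
prior = {"h1": 0.6, "h2": 0.4}
payoffs = {"h1": {"commit_A": 10, "commit_B": 4},
           "h2": {"commit_A": 2,  "commit_B": 6}}
print(best_plan(prior, payoffs, plans))  # commit_A under the limited-awareness prior

# Awareness growth: hypothesis h3 is newly conceived; the agent revises its
# prior over the enlarged space and reevaluates its commitment.
prior = {"h1": 0.3, "h2": 0.2, "h3": 0.5}
payoffs["h3"] = {"commit_A": -20, "commit_B": 5}
print(best_plan(prior, payoffs, plans))  # commit_B relative to the revised prior
```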

Read more