Overview of Transformative AI Misuse Risks: What Could Go Wrong Beyond Misalignment

This post provides an overview of the report. Discussions of the existential risks posed by artificial intelligence have largely focused on the challenge of alignment: ensuring that advanced AI systems pursue human-compatible goals. However, even if alignment is solved, humanity could still face catastrophic outcomes from how humans choose to use transformative AI technologies. The report examines these "misuse risks": scenarios where human decisions about AI deployment, rather than AI systems acting against human interests, lead to existential catastrophe. This includes both intentional harmful uses (such as developing AI-enabled weapons) and reckless deployment without adequate safeguards. The analysis maps out how such human-directed applications of AI, even when technically aligned, could lead to permanent loss of human potential. […]

Read more

Individually incentivized safe Pareto improvements in open-source bargaining

Summary: Agents might fail to trade peacefully in high-stakes negotiations. Such bargaining failures can have catastrophic consequences, including great power conflicts and AI flash wars. This post is a distillation of DiGiovanni et al. (2024) (DCM), whose central result is that agents that are sufficiently transparent to each other have individual incentives to avoid catastrophic bargaining failures. More precisely, DCM constructs strategies that are plausibly individually incentivized and that, if adopted by all, guarantee each player no less than their least preferred trade outcome. Figure 0 below illustrates this. This result is significant because artificial general intelligences (AGIs) might (i) be involved in high-stakes negotiations, (ii) be designed with the capabilities required for the type of strategy we'll present, and (iii) bargain poorly by default (since bargaining competence isn't […]
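As a toy illustration of the kind of guarantee described above (and emphatically not DCM's actual construction), here is a minimal Python sketch of open-source bargaining over a divisible surplus. Naive strategies with incompatible demands fall into conflict, the worst outcome for both; "safe" strategies that recognize each other renegotiate a would-be conflict into a compatible split, so each side still gets at least its least preferred trade outcome. All names and payoff values here (`SafeStrategy`, `PIE`, the `floor` parameter) are illustrative assumptions, not anything from the paper.

```python
# Toy model: two agents divide a pie of size 1. Each submits a strategy
# that can inspect the other's strategy ("open-source" bargaining).
CONFLICT_PAYOFF = 0.0   # assumed: catastrophic bargaining failure
PIE = 1.0               # total surplus to divide

def payoff(demand_a: float, demand_b: float) -> tuple[float, float]:
    """Compatible demands are honored; incompatible demands mean conflict."""
    if demand_a + demand_b <= PIE + 1e-9:  # tolerance for floating point
        return demand_a, demand_b
    return CONFLICT_PAYOFF, CONFLICT_PAYOFF

class NaiveStrategy:
    """Always demands its default share, regardless of the opponent."""
    def __init__(self, default: float):
        self.default = default
    def demand(self, opponent) -> float:
        return self.default

class SafeStrategy:
    """Demands `default`, but against another SafeStrategy it scales a
    would-be conflict down to a compatible split, never accepting less
    than `floor` (its least preferred trade outcome). This toy assumes
    the floors are low enough that the proportional split respects them."""
    def __init__(self, default: float, floor: float):
        self.default, self.floor = default, floor
    def demand(self, opponent) -> float:
        if isinstance(opponent, SafeStrategy):  # mutual transparency check
            total = self.default + opponent.default
            if total > PIE:  # renegotiate instead of triggering conflict
                return max(self.floor, self.default * PIE / total)
        return self.default

# Incompatible naive demands end in conflict: (0.0, 0.0).
na, nb = NaiveStrategy(0.7), NaiveStrategy(0.6)
print(payoff(na.demand(nb), nb.demand(na)))

# The same demands under mutually recognized safe strategies trade
# instead: roughly (0.54, 0.46), a Pareto improvement over conflict.
a = SafeStrategy(default=0.7, floor=0.3)
b = SafeStrategy(default=0.6, floor=0.3)
print(payoff(a.demand(b), b.demand(a)))
```

The feature this sketch tries to mirror from the summary is conditionality: the safe strategy deviates from its default demand only when it verifies the opponent is also safe, so adopting it costs nothing against naive opponents, which is what could make such a strategy individually incentivized.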

Read more