Samuel Martin, Author at Center on Long-Term Risk

9 December 2024

Overview of Transformative AI Misuse Risks: What Could Go Wrong Beyond Misalignment

This post provides an overview of this report. Discussions of the existential risks posed by artificial intelligence have largely focused on the challenge of alignment: ensuring that advanced AI systems pursue human-compatible goals. However, even if we solve alignment, humanity could still face catastrophic outcomes from how humans choose to use transformative AI technologies. A new analysis examines these "misuse risks": scenarios where human decisions about AI deployment, rather than AI systems acting against human interests, lead to existential catastrophe. This includes both intentional harmful uses (like developing AI-enabled weapons) and reckless deployment without adequate safeguards. The analysis maps out how such human-directed applications of AI, even when technically aligned, could lead to permanent loss of human potential. […]

14 October 2022

When is intent alignment sufficient or necessary to reduce AGI conflict?

In this post, we look at conditions under which Intent Alignment isn't Sufficient or Intent Alignment isn't Necessary for interventions on AGI systems to reduce the risks of (unendorsed) conflict to be effective. We then conclude this sequence by listing what we currently think are relatively promising directions for technical research and intervention to reduce AGI conflict.

Contents
Intent alignment is not sufficient to prevent unendorsed conflict
When would consultation with overseers fail to prevent catastrophic decisions?
Conflict-causing capabilities failures
Failures of cooperative capabilities
Failures to understand cooperation-relevant preferences
Why not delegate work on conflict reduction?
Intent alignment may not be necessary to reduce the risk of conflict
Tentative conclusions about directions for research & intervention
References

Intent alignment is not sufficient to prevent unendorsed conflict

In the previous post, we outlined […]

13 October 2022

When would AGIs engage in conflict?

Here we will look at two of the claims introduced in the previous post: AGIs might not avoid conflict that is costly by their lights (Capabilities aren’t Sufficient) and conflict that is costly by our lights might not be costly by the AGIs’ (Conflict isn’t Costly).

Contents
Explaining costly conflict
Avoiding conflict via commitment and disclosure ability?
What if conflict isn’t costly by the agents’ lights?
Candidate directions for research and intervention
Appendix: Full rational conflict taxonomy
Equilibrium-compatible cases
Equilibrium-incompatible cases
Reasons agents don’t disclose private information
References

Explaining costly conflict

First we’ll focus on conflict that is costly by the AGIs’ lights. We’ll define “costly conflict” as (ex post) inefficiency: There is an outcome that all of the agents involved in the interaction prefer to the one that […]

12 October 2022

When does technical work to reduce AGI conflict make a difference?: Introduction

This is a pared-down version of a longer draft report. We went with a more concise version to get it out faster, so it ended up being more of an overview of definitions and concepts, and is thin on concrete examples and details. Hopefully subsequent work will help fill those gaps.

Contents
Sequence Summary
Necessary Conditions for Technical Work on AGI Conflict to Have a Counterfactual Impact
Conflict isn't Costly
Capabilities aren't Sufficient
Intent Alignment isn't Sufficient
Intent Alignment isn't Necessary
Note on scope
Acknowledgments
References

Sequence Summary

Some researchers are focused on reducing the risks of conflict between AGIs. In this sequence, we’ll present several necessary conditions for technical work on AGI conflict reduction to be effective, and survey circumstances under which these conditions hold. We’ll also present […]
