Center on Long-Term Risk: 2025 Plans
February 2025
By Mia Taylor and Tristan Cook
Overview
Many promising technical interventions for s-risk reduction are routed through the AI safety community, frontier AI labs, and AI Safety Institutes. In 2025, CLR's primary focus will be on developing expertise and collaborative relationships with these groups.
Our goal by the end of 2025 is to build a reputation for CLR as a leader in one area of research that is particularly likely to be relevant for s-risk reduction. We will also continue work on macrostrategy, particularly on our new ‘strategic readiness’ agenda – understanding when and how interventions robustly reduce s-risk.
Research
Capacity-building research
The majority of CLR's research efforts will go to an externally legible empirical agenda – working with LLMs – that both develops expertise relevant to s-risk reduction and creates opportunities for collaboration with frontier AI labs and the broader AI safety community.
Contenders for the agenda include:
Personas/characters. How do models develop different personas or preferences? How do developer choices at different stages of training affect the personas that models end up with? We would aim to study the emergence of broad patterns of behavior in contemporary models, such as attitudes toward risk and resource acquisition (see the propensity-measurement sketch after this list). Research in these areas would help us understand propensities that are more likely to contribute directly to s-risks.
Multiagent dynamics. How do models behave in extended multi-agent interactions? We have long focused on conflict as a major risk factor for s-risk, and developing expertise in measuring how AI systems behave in negotiations could be valuable for understanding these risks. In long interactions between multiple agents, systems have the potential to draw each other far off-distribution from where safety training took place, which may help us understand how personas generalize in these conditions (see the negotiation-harness sketch after this list).
AI for strategy research. How do we get (future) AI assistants to usefully contribute to macrostrategy research or other kinds of non-empirical research? One plausible path to impact for CLR involves us leveraging human-level AI research assistants to make substantial progress on our research agendas. While we expect AI companies to continue to invest effort into creating excellent AI research engineers, we think developing assistants that can contribute to conceptual and strategic research will be more neglected.
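To make the personas/characters direction concrete, here is a minimal sketch of one kind of propensity measurement it could involve: presenting a model with matched safe/risky choices of equal expected value and tallying its picks. The `query_model` helper, the prompt wording, and the scoring rule are illustrative assumptions, not CLR's actual methodology.

```python
# Illustrative sketch (not CLR's methodology): probing a model's risk attitudes
# with forced-choice prompts. `query_model` is a stand-in for any chat-model API.
import random

def query_model(prompt: str) -> str:
    """Stand-in for a real LLM call (e.g. a chat-completions request)."""
    raise NotImplementedError("connect this to a model provider")

# Pairs of (safe option, risky option) with equal expected value.
RISK_ITEMS = [
    ("a guaranteed $50", "a 50% chance of $100, otherwise nothing"),
    ("a guaranteed $10", "a 10% chance of $100, otherwise nothing"),
]

def risky_choice_rate(n_trials: int = 20) -> float:
    """Fraction of trials in which the model picks the risky option."""
    risky_picks = 0
    for _ in range(n_trials):
        safe, risky = random.choice(RISK_ITEMS)
        contents = [safe, risky]
        # Randomize option order so position bias isn't mistaken for preference.
        random.shuffle(contents)
        prompt = (
            "Choose exactly one option and answer with the single letter.\n"
            f"Option A: {contents[0]}\n"
            f"Option B: {contents[1]}"
        )
        answer = query_model(prompt).strip().upper()
        risky_letter = "A" if contents[0] == risky else "B"
        if answer.startswith(risky_letter):
            risky_picks += 1
    return risky_picks / n_trials
```

And for the multiagent direction, a minimal sketch of a harness for extended interactions: two model-backed agents alternating messages in a negotiation until one signals agreement or a turn limit is hit. The `Agent` callable and the `DEAL:` stopping convention are likewise assumptions made for illustration.

```python
# Illustrative sketch of an extended two-agent negotiation loop. Each agent is
# any callable mapping the transcript so far to its next message, e.g. a thin
# wrapper around a chat-model API; the "DEAL:" convention is an assumption.
from typing import Callable, List

Agent = Callable[[List[str]], str]

def run_negotiation(agent_a: Agent, agent_b: Agent,
                    opening: str, max_turns: int = 20) -> List[str]:
    """Alternate turns between two agents and return the full transcript."""
    transcript = [opening]
    agents = (agent_a, agent_b)
    for turn in range(max_turns):
        message = agents[turn % 2](transcript)
        transcript.append(message)
        if message.startswith("DEAL:"):  # an agent signals agreement reached
            break
    return transcript
```

Transcripts from a loop like this could then be scored turn by turn, for instance to track how far the agents drift from the distribution their safety training covered.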
We will finalize our choice of research direction in Q1 2025, based on tractability assessments, team fit, and opportunities for external collaboration.
Strategic readiness research
Alongside the capacity-building agenda, we will continue work on understanding how and when to develop and push for technical interventions.
This work will likely involve 'nearcasting' AI progress and its interaction with s-risk threat models, scenario planning, and modelling the effects of interventions to identify the conditions under which to intervene (a toy illustration follows).
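As a toy illustration of that modelling component (not an actual CLR model), the sketch below Monte Carlo-samples scenarios to estimate an intervention's expected effect on s-risk; sweeping the assumed parameters is one way to ask under which conditions intervening looks robustly positive. Every number and distribution here is invented for illustration.

```python
# Toy model: Monte Carlo over uncertain scenarios to ask when an intervention
# robustly reduces expected s-risk. All parameters are illustrative assumptions.
import random

def mean_risk_change(p_threat: float = 0.3, n_samples: int = 100_000) -> float:
    """Estimate the mean change in s-risk from intervening (negative = reduction)."""
    total = 0.0
    for _ in range(n_samples):
        # Does the threat model the intervention targets actually materialize?
        threat_materializes = random.random() < p_threat
        # If so, assume the intervention cuts risk by 5-20 percentage points.
        benefit = random.uniform(0.05, 0.20) if threat_materializes else 0.0
        # Small assumed chance of backfire: intervention adds 0-2 points of risk.
        backfire = random.uniform(0.0, 0.02)
        total += backfire - benefit
    return total / n_samples

if __name__ == "__main__":
    # Sweep the threat probability to see where intervening stops looking robust.
    for p in (0.05, 0.1, 0.3, 0.5):
        print(f"P(threat)={p:.2f}: mean s-risk change {mean_risk_change(p):+.4f}")
```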
Hiring
We aim to hire three researchers in 2025.
Community building
We will continue select community-building activities:
- Run the fifth iteration of the Summer Research Fellowship
- Continue career calls and 1:1 outreach
- Run another iteration of the Foundations Course