Research Overview
This page gives an overview of our research at the Center on Long-Term Risk. For published work, see our Publications page and Blog.
Current focuses
Our two primary research agendas are:
- Model Personas. This agenda studies and steers the emergence of malicious propensities in LLMs, such as spitefulness, sadism, and punitiveness. We treat personas (bundles of correlated traits) as a useful abstraction for how propensities generalise out-of-distribution, and as a target for interventions.
- Safe Pareto Improvements (SPIs). SPIs are modifications to agents’ bargaining strategies that make all parties better off, regardless of their original strategies. They are an unusually robust approach to preventing catastrophic conflict between AI systems, but aren’t guaranteed to be adopted in practice. This agenda addresses the risk that early AI development forecloses the option to adopt SPIs.
Outside of these agendas, we have active interests in:
- S-risk macrostrategy, with a focus on determining when and how interventions robustly reduce s-risk. For background on this area, see Beginner's guide to reducing s-risks.
- Automating conceptual work relevant to s-risk reduction.
Historical research focuses
Our 2020 research agenda Cooperation, Conflict, and Transformative Artificial Intelligence remains a major influence on our current priorities, especially our work on SPIs.
Our older Priority areas page, first published in August 2020, is now outdated, but several of the areas it described remain relevant to our current prioritisation.