Summer Update 2025
We're writing to share some important organizational developments and research progress as we move through 2025.
Contents
Research Leadership Transition
After six months as interim research director, Mia Taylor has decided to leave CLR at the end of August. Mia will be joining Forethought as a researcher.
This was a difficult decision for Mia, but after much reflection, she feels that she’ll have more positive impact working on other priorities within longtermism.1 While s-risk reduction won't be her main research focus going forward, she intends to stay engaged with the s-risk community and look for opportunities to contribute to s-risk reduction in her future work.
At CLR we’re deeply grateful for Mia's leadership during this transition period and her commitment to seeing our summer research fellowship through to completion. During Mia’s time as interim research director, the research team have made substantial progress on two new agendas: the personas agenda and the strategic readiness agenda (more below).
Working closely with the board, we've identified several promising options for CLR's research priorities and leadership going forward, and plan to finalize and announce our plans by mid August. Tristan Cook, who has served as Managing Director for the last six months, will lead CLR after the transition.
Research Updates
Empirical Research: Studying the Emergence of Undesirable LLM Personas
We're investigating how LLMs can develop concerning behavioral traits or "personas" during training. This research builds on growing evidence that models can exhibit robust behavioral patterns that weren't explicitly intended by developers - from Bing Chat to more recent work on emergent misalignment, where models develop broadly misaligned personas from training to produce misaligned outputs in a narrow domain (e.g., writing insecure code). Understanding how and why models develop these personas is critical for preventing future AI systems from exhibiting harmful traits that weren't explicitly trained. This agenda builds on our previous Measurement Research Agenda.
Our team has been at the forefront of this research area. Niels was a lead author on the original paper on emergent misalignment and presented at ICML earlier this month. We've also contributed to several other papers in this space.
Current work focuses on developing better tools for systematically measuring model preferences and investigating how different training environments might promote the emergence of undesirable traits. We're also investigating how depictions of undesirable behavior in the pre-training data could negatively influence model behavior.
Conceptual Research: Strategic Readiness
The conceptual research stream is developing frameworks for robust s-risk interventions. Since many intuitive approaches backfire through unintended consequences,2 we're building concrete decision tools that identify which interventions actually reduce s-risks and when to deploy them. This framework primarily supports our empirical research by ensuring that interventions targeting the harmful traits we discover—like spite in AI systems—can be implemented without creating worse outcomes.
Summer Research Fellowship
We're excited to welcome four summer research fellows who joined at the end of June: Anders Cairns-Woodruff, Arun Jose, and Daniel Tan working on empirical research on model personas, and Matt Hampton working on the strategic readiness agenda.
We expect to share more updates on our research leadership and organizational direction in early autumn, once new arrangements are in place. In the meantime, please reach out to Tristan Cook (Managing Director) if you have questions or would like to get involved with our work.
- This assessment was driven in large part by Mia's ethical differences with CLR's values; Mia is not SFE.
- For background on how we think about backfire risk, see Section 4 of Beginner’s guide to reducing s-risks.