Plans for 2021 & Review of 2020

8 December 2020 by Stefan Torges

Summary

Plans for 2021

Our first focus area will be cooperation & conflict in the context of transformative AI (TAI). In addition to improving our prioritization within this area, we plan to build a field around bargaining in artificial learners using tools from game theory and multi-agent reinforcement learning (MARL) and to make initial publications on related governance aspects.
Our second focus area will be malevolence. We plan to assess how important this area is relative to our other work and investigate how preferences to create suffering could arise in TAI systems.
Research will remain CLR’s main activity in 2021. We will continue trying to grow our research team.
We will increase our grantmaking efforts across our focus areas. Some of our staff will continue advising the Center for Emerging Risk Research (CERR), a newly founded nonprofit with the mission to improve the quality of life of future generations.
We will continue our routine community-building activities in 2021 (e.g., scouting at events, 1:1 calls). We plan to rerun our summer research fellowship program and test more efficient ways of getting people up to speed on our thinking, e.g., an s-risk intro seminar.

Review of 2020

A major theme of our work was the idea of reducing the risks of bargaining failure via coordination by AI developers on certain aspects of their systems, e.g., to address prior and equilibrium selection problems. This led us to scale up our research and grantmaking efforts at the intersection of game theory and MARL (e.g., here, here). We believe this is a promising avenue for increasing awareness of technical hurdles for successful cooperation among AI systems and constructing candidate technical solutions.
We did initial work on long-term risks from malevolent actors. Internally, we have been exploring how preferences to create suffering could arise in TAI systems.

We hired six people for our research team: Alex Lyzhov, Emery Cooper, Daniel Kokotajlo, and Julian Stastny as full-time research staff; Maxime Riché as a research engineer; Jia Yuan Loke as a part-time research assistant. Another offer is still pending.

With the CLR Fund, we made three grants designed to help junior researchers skill up. The recipients were Anthony DiGiovanni, Rory Svarc, and Johannes Treutlein.

We ran a summer research fellowship for the first time. Nine fellows participated in the three-month program. We were able to make at least four hires and two grants as a direct result. A thorough evaluation is ongoing.
We gave a series of talks at various EA and AI safety organizations: 80,000 Hours, CHAI, CSER, FHI, GPI, OpenAI, and the Open Philanthropy Project.

About us

We are building a global community of researchers and professionals working on reducing risks of astronomical suffering (s-risks). (Read more about us here.)

Earlier this year, we consolidated the activities related to s-risks from the Effective Altruism Foundation and the Foundational Research Institute under one name: the Center on Long-Term Risk (CLR). We have been based in London since late 2019. Our team currently comprises about 10 full-time equivalents, with most of our employees working full time.

Plans for 2021

By focus areas

Cooperation, conflict, and transformative artificial intelligence

At the end of last year, we published a research agenda on this topic. After significant progress in 2020 (see Review section), work in this area will continue to be our main priority in 2021.

We plan to further refine our prioritization between different research directions and intervention types within this broad area. Interventions differ across a multitude of dimensions. For instance, some are multilateral in that they require technical solutions to be implemented by multiple actors, whereas others are unilateral. Some interventions primarily address acausal conflict; others address causal conflict. We want to better prioritize along these dimensions. This will often require object-level work, e.g., to learn more about the tractability of a given intervention type.

We plan to build a field around bargaining in artificial learners (see the related sections 3-6 of our research agenda) using mainly tools from game theory and multi-agent reinforcement learning (MARL). We want to draw both from the relevant machine learning sub-community and the longtermist effective altruism community. Through our research this year (see below), we now have a good understanding of what work we consider valuable in this field. We plan to publish original research explaining foundational technical problems in this area, finish a repository of tools for easily running experiments, and make grants to encourage others to do similar work. We plan to publish a post on this forum explaining the reasoning behind our focus on this area.

We plan to take initial steps in the field of AI governance related to cooperation & conflict involving AI systems. Following our analysis of problems in multipolar deployment scenarios, we plan to publish a post outlining the governance challenges associated with addressing these problems.

Malevolence

We first wrote about this cause in early 2020 in an EA Forum post. Since then, we have completed additional work internally, parts of which we plan to publish next year.

We plan to assess how important this area is relative to our other work because this is a new cause area, and we are still uncertain how it compares to our existing priorities. We will do this by learning more about the relevant scientific fields, technologies, and policy levers. We will also conduct or support technical work on how preferences to create suffering could arise in TAI systems. We plan to publish a post introducing this idea. We might make some targeted grants to experts who could help us improve our understanding of this area.

Exploration of other areas

Because work on s-risks is still in its infancy, it could be valuable to explore entirely new areas. This will not be a systematic effort in 2021. Individual researchers will investigate new areas if they find them sufficiently promising. Current contenders include (among other things): political polarization (or at least specific manifestations of it) and collective epistemic breakdown (e.g., as a result of increasingly powerful persuasion tools).

By organizational function

Research

Research will remain CLR’s focus in 2021 because there remain many open questions about s-risks and how to address them. Through our efforts this year, we have also placed ourselves in a good position to scale up our research efforts (see “Review of 2020” below).

During the first half of 2021, we expect to publish primarily on technical and governance aspects of catastrophic bargaining failures involving AI systems. See the previous section for more details.
We will continue trying to grow our research team. We want to be in a position where we have a research lead and at least two full-time researchers for each priority research area (e.g., technical aspects of bargaining, malevolence, acausal trade, AI governance).

Grantmaking

We will grow our grantmaking efforts in 2021. We will focus increasingly on proactive grants following investigations of specific fields. We have found general application rounds not to be very valuable so far.

Last year, some of our staff began advising the Center for Emerging Risk Research (CERR), a new nonprofit with the mission to improve the quality of life of future generations. CERR will make an announcement about its work soon. We are very excited about this opportunity to leverage our work for further impact. We will begin recommending potential grants to this foundation as we build up our grantmaking capabilities. The CLR Fund will continue to operate within CLR and independently of CERR.
Through our explorations of bargaining in artificial learners and malevolence in 2020, we realized that we need to lay further groundwork in these areas via in-house research before ramping up our grantmaking. For bargaining in artificial learners, we plan to produce technical research to serve as an example for potential grantees of the kind of work we would find valuable. For malevolence, we plan to improve our general understanding of the area to determine what types of grants we should recommend. Our grantmaking in those areas will scale up as we make progress on these fronts.
Depending on our capacity, we plan to conduct shallow investigations of other aspects relevant to AI conflict & cooperation (e.g., behavioral game theory of human-AI interaction or AI governance).

Community-building

We will continue our routine community-building activities in 2021 while running tests of more efficient ways of getting people up to speed on our thinking. This work has been important for cultivating hires at CLR. We expect to invest about as many resources into this as in 2020.

We are mostly satisfied with our current setup: For learning about potential contributors, we scout at events, regularly ask for referrals, and maintain multiple channels for people to get in touch. We do 1:1 calls with people we think are particularly likely to contribute to our priorities. To increase their involvement, we ran a summer research fellowship for the first time in 2020 (see “Review of 2020”). Since it was very successful, we expect to rerun it in 2021.
At this point, we believe that a key bottleneck is for more people to get up to speed with our thinking once they have become interested. So we will experiment with formats to address that bottleneck at scale without requiring 1:1 calls. For instance, we plan to run an introduction seminar for people interested in s-risks.
Another challenge is the lack of diversity among people interested in s-risks. Our impression is that this group skews male and white, possibly more so than the general EA community. Several team members have expressed that a more diverse team and community would increase how much they thrive at work. We also find it plausible that an organization's diversity affects some people's decisions about whether to apply or to accept an offer. We have already taken significant steps to ensure that our hiring processes lead to fewer false negatives for people from underrepresented groups (e.g., blinding while scoring and unblinding before making decisions, lowering the bar for advancing in initial rounds). We have also taken significant steps to encourage people from underrepresented groups to apply in the first place. However, we noticed that we often had fewer ties to potential applicants from these groups and that our ties tended to be weaker. We tentatively concluded that it should be a priority to make our hiring pool more diverse. We will consider how to achieve this through our community-building work. This may include asking people from underrepresented groups what we can do better, relying less on informal networking and more on formal programs or events, and finding somebody from an underrepresented group to lead our community-building efforts.

Dissemination & advocacy

We are still uncertain what we will do to disseminate our research and advocate for our priorities. First, we plan to review several key decisions that have influenced our past efforts. For instance, we will evaluate the effects of the communication guidelines written in collaboration with Nick Beckstead from the Open Philanthropy Project. (For more details on these guidelines, see this section of our review from last year.) We had originally planned to do so at the end of this year but postponed it by a few months. Second, the development of the COVID-19 pandemic will determine whether we can run in-person events and travel to important EA hubs like Oxford and the San Francisco Bay Area. In any case, we expect to continue to give talks and to share our work through targeted channels.

High-leverage projects

We will continue exploring the possibility of high-leverage projects that could enable many more people to work in our priority areas.

We are considering spinning off a machine learning lab focused on multi-agent AI safety research. Its singular focus on machine learning could attract a different, and possibly larger, set of potential hires than CLR. At the current stage, we expect our default research efforts to be the best way to lay the groundwork for pursuing this in the future.

We are exploring options to found an academic institute. Such an institute affiliated with a university would be more prestigious than an independent research organization. There are also serious downsides, such as less research flexibility, which is why we would not transfer all of our work. If founded, the institute would be an additional entity that could attract different kinds of researchers. We are still deliberating about the focus such an institute should have. One potential direction is the intersection of decision theory and artificial intelligence. We are already in contact with several universities and will pursue those leads further.

Evaluation

We plan to improve how we evaluate our work and impact. Currently, we only do systematic annual reviews of our activities internally. We plan to elicit feedback from outside experts to assess the quality and impact of our work. We are considering survey work, in-depth assessment of specific research output, and qualitative interviews.

Review of 2020

Last year, we wrote that the most appropriate way to review our work each year would be to answer “a set of deliberately vague performance questions” (inspired by GiveWell’s self-evaluation questions). We put these questions to our team and used their input to write the overall assessment below. We plan to improve this procedure further next year.

This year was a year of transition for CLR, both in terms of staff changes and building out new research directions in malevolence and bargaining. Our successes consisted mostly of building long-term capacity and making internal research progress, rather than public research dissemination. The work we have done this year has laid the groundwork for more public research to be released in 2021 (see above).

Building long-term capacity

Have we made progress towards becoming a research group and community that will have an outsized impact on the research landscape and relevant actors shaping the future? (This question tracks whether we are building the right long-term capacity to produce excellent research and make it applicable to the real world. It also includes whether we are focusing on the correct fields, questions, and activities to begin with.)

We have increased our capacity substantially across most functions of the organization.

Research

We hired six people for our research team: Alex Lyzhov, Emery Cooper, Daniel Kokotajlo, and Julian Stastny as full-time research staff; Maxime Riché as a research engineer; Jia Yuan Loke as a part-time research assistant. Another offer is still pending.

With the CLR Fund, we made three grants designed to help junior researchers skill up. The recipients were Anthony DiGiovanni, Rory Svarc, and Johannes Treutlein.

Much of our research this year constitutes capacity-building. It opened up a lot of opportunities for further study, grantmaking, and strategy progress. For instance, the post on Reducing long-term risks from malevolent actors created a novel cause area for CLR and others in the community. This has already led to internal research progress, some of which we will publicize early next year. Another example is our work on an internal research repository of tools for our machine learning research that will facilitate future work in this area.

Grantmaking

In 2020, we completed three shallow investigations related to our grantmaking efforts: moral circle expansion, malevolence, and technical research at the intersection of machine learning and bargaining. We are actively pursuing grant opportunities in the last area.

Community-building

We ran a three-month summer research fellowship for the first time. We received 67 applications and made 11 offers, all of which were accepted. Two of the fellows will do their fellowship in 2021 instead of this year. We were able to make at least four hires and two grants as a direct result, which we think is a good indication of the program’s success. We are still conducting a more rigorous evaluation of the program, focusing on the experience of the fellows and how the program benefitted them. The experience we gained this year will make it easier to rerun an improved program with fewer resources.

Operations

The only function where our capacity shrank is operations. Our COO, Alfredo Parra, and our part-time operations analyst, Daniel Kestenholz, left. Their responsibilities were taken over by Stefan Torges and Amrit Sidhu-Brar, who joined our team earlier this year in a part-time capacity. This has not been enough to compensate for Alfredo’s and Daniel’s departures, so we decided to bring on Jia Yuan Loke, who will start in early 2021 (splitting his work between operations and research). At that point, we expect to be at a capacity level similar to that at the beginning of 2020.

Research progress

Has our work resulted in research progress that helps reduce s-risks (both in-house and elsewhere)?

Cooperation, conflict, and transformative artificial intelligence

A major theme of our work this year has been that risks of bargaining failure might be reduced via coordination by AI developers on certain aspects of their systems, e.g., to address prior and equilibrium selection problems. This suggests potential interventions in both AI governance and technical AI safety, some of which we plan to write on publicly in the first half of 2021 (see above).

Our work on bargaining failure has also led us to scale up our efforts at the intersection of game theory and multi-agent reinforcement learning (e.g., here, here). We have identified this as a promising avenue for increasing awareness of technical hurdles for successful cooperation among AI systems and constructing candidate technical solutions to some of these problems. Our ongoing work includes building a repository of algorithms, environments, and other tools to facilitate machine learning research in multi-agent environments. This repository better captures the kinds of cooperation problems we are interested in than the environments currently studied in the literature and allows for better evaluation of multi-agent machine learning methods.

Reducing risks from malevolent actors

Beginning with our post on reducing long-term risks from malevolent actors, we have been investigating possible pathways to s-risks from both malevolent humans and analogous phenomena in AI systems. This includes an ongoing investigation of possible grantmaking to reduce the influence of malevolent humans and a post introducing the risk of preferences to create suffering arising in TAI systems.

List of public research in 2020

David Althaus, Tobias Baumann (Center for Reducing Suffering): Reducing long-term risks from malevolent actors (Effective Altruism Forum)
Jesse Clifton: Equilibrium and prior selection problems in multipolar deployment (Alignment Forum)
Jesse Clifton, Maxime Riché: Towards cooperation in learning games (paper draft)
Lukas Gloor: Moral anti-realism sequence (#2, #3, #4, #5) (Effective Altruism Forum)
Adrian Hutter (Google Zurich): Learning in two-player games between transparent opponents (arXiv)³
Daniel Kokotajlo: The date of AI Takeover is not the day the AI takes over (LessWrong)
Daniel Kokotajlo: How Roodman's GWP model translates to TAI timelines (LessWrong)
Daniel Kokotajlo: Persuasion tools: AI takeover without AGI or agency? (LessWrong)
Anni Leskelä: Commitment and credibility in multipolar AI scenarios (LessWrong)
Caspar Oesterheld (Duke University), Vincent Conitzer (Duke University): Safe Pareto improvements for delegated game-playing (NeurIPS Cooperative AI Workshop)⁴

Grantees of the CLR Fund also published research over the course of 2020. Kaj Sotala expanded his sequence on multi-agent models of mind. Arif Ahmed published two articles on evidential decision theory in the journal Mind. The Wild Animal Initiative published a post on long-term design considerations of wild animal welfare interventions.

Research dissemination

Have we communicated our research to our target audience, and has the target audience engaged with our ideas?

The main effort to disseminate our work was a series of talks at various EA and AI safety organizations in the second half of this year: 80,000 Hours, CHAI, CSER, FHI, GPI, OpenAI, and the Open Philanthropy Project. We did not give our planned talk at EAG San Francisco because that conference was canceled.

Contrary to our plans for this year, we did not run any research workshops because of the COVID-19 pandemic. We decided against hosting any virtual ones because we lacked capacity and did not consider the reduced value from a virtual event worth the effort.

Organizational health

Are we a healthy organization with an effective board, staff in appropriate roles, appropriate evaluation of our work, reliable policies and procedures, adequate financial reserves and reporting, high morale, and so forth?

It is our impression that the people on our team are in the appropriate roles. We are currently trialing a new person as our Director after Jonas Vollmer left CLR in June. We will complete the evaluation of their fit soon.

We believe that most of our policies and procedures are sound. However, many people joined our team this year. This requires us to be more explicit about some policies than we have been in the past, e.g., compensation policy, team retreat participation. We are addressing these issues as they come up, which has worked well so far.

Our financial reserves decreased significantly this year, which we are trying to address with our December fundraiser (see “Financials” below). We are glad that CERR (see above) committed to contribute roughly their “fair share” to CLR. However, this is not enough to cover all of our expenses.

Financials

Budget 2021: $1,830,000 (13.7 expected full-time equivalent employees).
CLR reserves as of October 2020: $950,000. (This corresponds to about 6.5 months of runway projecting from the 2021 budget and not accounting for commitments from CERR. Including commitments from CERR, our runway is 7.3 months.)
CLR Fund balance as of November 30, 2020: $576,879. (We cannot use these funds for CLR operations.)
Room for more funding: $550,000 (to attain 12 months of reserves). Stretch goal: $1,500,000 (to attain 18 months of reserves).
We invest funds that we are unlikely to deploy soon in the global stock market as per our investment policy.

How to contribute

Make a donation. As we detail above, we aim to raise $550,000 for CLR (stretch goal: $1,500,000). Your contribution makes a difference.
Stay up to date. You can subscribe to monthly or biannual updates.
Work with us. We are always looking for capable people to join our team. You can express your interest in working with us on our website.
Get career advice. If you are interested in our priorities, we are happy to discuss your career plans with you. You can register your interest in a call here.

³ Adrian Hutter developed this paper inspired by conversations at one of our research workshops in 2019.
⁴ Caspar Oesterheld developed some of the conceptual ideas of this paper while he worked at the Foundational Research Institute, one of the predecessor organizations of the Center on Long-Term Risk.
