Commenting on MSR, Part 1: Multiverse-wide cooperation in a nutshell
Published on the CLR blog, where researchers are free to explore their own ideas on how humanity can best reduce suffering.
This is a post I wrote about Caspar Oesterheld’s long paper Multiverse-wide cooperation via coordinated decision-making. Because I have found the idea tricky to explain – which unfortunately makes it difficult to get feedback from others on whether the thinking behind it makes sense – I decided to write a shorter summary. While I am hoping that my text can serve as a standalone piece, for additional introductory content I also recommend reading the beginning of Caspar’s paper, or watching the short video introduction here (requires basic knowledge of the “CDT, EDT or something else” debate in decision theory).
- 0. Elevator pitch
- 1. A primer on non-causal decision theory
- 2. A multiverse ensures the existence of agents with decision algorithms extremely similar to ours
- 3. We are playing a multiverse-wide prisoner’s dilemma against (close) copies of our decision algorithm
- 4. Interlude for preventing misunderstandings: Multiverse-wide cooperation is different from acausal trade!
- 5. MSR represents a shift in one’s ontology; it is not just some “trick” we can attempt for extra credit
- 6. Lack of knowledge about aliens is no obstacle because a minimally viable version of MSR can be based on what we observe on earth
0. Elevator pitch
(Disclaimer: Especially for the elevator pitch section here, I am sacrificing accuracy and precision for brevity. References can be found in Caspar’s paper.)
It would be an uncanny coincidence if the observable universe made up everything that exists. The reason we cannot find any evidence of anything beyond the edge of the observable universe is not that there is likely nothing there, but that photons from farther away simply have not had sufficient time since the big bang to reach us. This means that the universe we find ourselves in may well be vastly larger than what we can observe, perhaps even infinitely large. In addition, inflationary cosmology hints at the existence of other universe bubbles with different fundamental constants, forming or disappearing under certain conditions and somehow co-existing with our universe in parallel. The umbrella term multiverse captures the idea that the observable universe is just a tiny portion of everything that exists. The multiverse may contain myriads of worlds like ours, including other worlds with intelligent life and civilization. An infinite multiverse (of one sort or another) is actually amongst the most popular cosmological hypotheses, arguably even favored by the majority of experts.
Many ethical theories (in particular most versions of consequentialism) do not consider geographical distance relevant to moral value. After all, suffering and the frustration of one’s preferences are bad for someone regardless of where (or when) they happen. This principle should apply even when we consider worlds so far away from us that we can never receive any information from there. Moral concern over what happens elsewhere in the multiverse is one requirement for the idea I am now going to discuss.
Multiverse-wide cooperation via superrationality (abbreviation: MSR) is the idea that, if I think about different value systems and their respective priorities in the world, I should not work on the highest priority according to my own values, but on whatever my comparative advantage is amongst all the interventions favored by the value systems of agents interested in multiverse-wide cooperation. (Another route to gains from trade is to focus on convergent interests, pursuing interventions that may not be the top priority for any particular value system, but are valuable from a maximally broad range of perspectives.) For simplicity, I will refer to this as “cooperating” from now on.
A decision to cooperate, according to some views in decision theory, gives me rational reason to believe that agents in similar decision situations elsewhere in the multiverse, especially the ones who are most similar to myself in how they reason about decision problems, are likely to cooperate as well. After all, if two very similar reasoners think about the same decision problem, they are likely to reach identical answers. This suggests that they will end up either both cooperating or both defecting. Assuming that the way agents reach decisions is not strongly constrained or otherwise affected by their values, we can expect there to be agents with different values who reason about decision problems the same way we do and therefore come to identical conclusions. Cooperation then produces gains from trade between value systems.
While each party would want to be the sole defector, the mechanism behind multiverse-wide cooperation – namely that we have to think of ourselves as being coupled with those agents in the multiverse who are most similar to us in their reasoning – ensures that defection is disincentivized: Any party that defects would now have to expect that their highly similar counterparts would also defect.
The best way to approximate the value systems of agents in other parts of the multiverse, given our ignorance about what the multiverse looks like, is to assume that substantial parts of it are at least going to be similar to how things are here, where we can study them. A minimally viable version of multiverse-wide cooperation can therefore be thought of as all-out “ordinary” cooperation with value systems we know well (and especially ones that include proponents sympathetic to MSR reasoning). This suggests that, while MSR combines speculative-sounding ideas such as non-standard decision theory and the existence of a multiverse, its implications may not be all that strange and largely boil down to the proposal that we should be “maximally” cooperative towards other value systems.
1. A primer on non-causal decision theory
Leaving aside for the moment the whole part about the multiverse, MSR is fundamentally about cooperating in a prisoner’s-dilemma-like situation with agents who are very similar to ourselves in the way they reason about decision problems. Douglas Hofstadter coined the term superrationality for the idea that one should cooperate in a prisoner’s dilemma if one expects the other party to follow the same style of reasoning. If they reason the same way I do, and the problem they are facing is the same kind of problem I am facing, then I must expect that they will likely come to the same conclusion I will come to. This suggests that the prisoner’s dilemma in question is unlikely to end with an asymmetric outcome, (cooperate | defect) or (defect | cooperate), but likely to end with a symmetric outcome, (cooperate | cooperate) or (defect | defect). Because (cooperate | cooperate) is the best outcome for both parties amongst the symmetric outcomes, superrationality suggests one is best served by cooperating.
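To make the contrast concrete, here is a minimal Python sketch. It assumes the conventional textbook ordering of prisoner’s dilemma payoffs (temptation > reward > punishment > sucker’s payoff); the specific numbers 5, 3, 1, 0 and the function names are illustrative choices of mine, not taken from Caspar’s paper. Treating the other party’s move as fixed makes defection the best reply either way, whereas restricting attention to the symmetric outcomes, as superrationality does, favors cooperation.

```python
# Illustrative prisoner's dilemma payoffs for "my" side (T > R > P > S).
# The specific numbers are assumptions chosen only for this example.
PAYOFF = {
    ("C", "C"): 3,  # R: reward for mutual cooperation
    ("C", "D"): 0,  # S: sucker's payoff
    ("D", "C"): 5,  # T: temptation to defect
    ("D", "D"): 1,  # P: punishment for mutual defection
}
ACTIONS = ("C", "D")


def best_response_if_independent(other_action: str) -> str:
    """Treat the other party's action as fixed and pick the best reply."""
    return max(ACTIONS, key=lambda mine: PAYOFF[(mine, other_action)])


def best_action_if_mirrored() -> str:
    """Assume a relevantly similar reasoner reaches the same decision,
    so only the symmetric outcomes (C, C) and (D, D) are reachable."""
    return max(ACTIONS, key=lambda mine: PAYOFF[(mine, mine)])


print([best_response_if_independent(other) for other in ACTIONS])  # ['D', 'D']
print(best_action_if_mirrored())  # C
```

The payoff table is identical in both functions; the only thing that differs is which outcomes are treated as reachable.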
At this point, readers may be skeptical whether this reasoning works. There seems to be some kind of shady action at a distance involved, where my choice to cooperate is somehow supposed to affect the other party’s choice, even though we are assuming that no information about my decision reaches said other party. But we can think of it this way: If reasoners are deterministic systems, and two reasoners follow the exact same decision algorithm in a highly similar decision situation, it at some point becomes logically contradictory to assume that the two reasoners will end up with diametrically opposed conclusions.
Side note: By decision situations having to be “highly similar,” I do not mean that the situations agents find themselves in have to be particularly similar with respect to little details in the background. What I mean is that they should be highly similar in terms of all decision-relevant variables, the variables that are likely to make a difference to an agent’s decision. If we imagine a simplified decision situation where agents have to choose between two options, either press a button or not (and then something happens or not), it probably matters little whether one agent has the choice to press a red button and another agent is faced with pressing a blue button. As long as both buttons do the same thing, and as long as the agents are not (emotionally or otherwise) affected by the color differences, we can safely assume that the color of the button is highly unlikely to play a decision-relevant role. What is more likely relevant are things such as the payoffs (value according to what an agent cares about) the agents expect from the available options. If one agent believes they stand to receive positive utility from pressing the button, and the other stands to receive negative utility, then that is guaranteed to make a relevant difference as to whether the agents will want to press their buttons. Payoff differentials may sometimes be relevant as well, or at least have some probability of being relevant: If one agent only gains a tiny bit of utility, whereas the other agent has an enormous amount of utility to win, the latter agent might be much more motivated to avoid taking a suboptimal decision.
While payoffs and payoff structures certainly matter, it is unlikely that it matters what qualifies as a payoff for a given agent: If an agent who happens to really like apples will be rewarded with tasty apples after pressing a button, and another agent who really likes money is rewarded with money, their decision situations seem the same provided that they each care equally strongly about receiving the desired reward. (This is the intuition behind the irrelevance of specific value systems for whether two decision algorithms or decision situations are relevantly similar or not. Whether one prefers apples, money, carrots or whatever, math is still math and decision theory is still decision theory.)
A different objection that readers may have at this point concerns the idea of superrationally “fixing” other agents’ decisions. Namely, critics may point out that we are thereby only ever talking about updating our own models, our prediction of what happens elsewhere, and that this does not actually change what was going to happen elsewhere. While this sounds like an accurate observation, the force of the statement rests on a loaded definition of “actually changing things elsewhere” (or anywhere for that matter). If we applied the same rigor to a straightforward instance of causally or directly changing the position of a light switch in our room, a critic may in the same vein object that we only changed our expectation of what was going to happen, not what actually was going to happen. The universe is lawful: nothing ever happens that was not going to happen. What we do when we want to have an impact and accomplish something with our actions is never to actually change what was going to happen; instead, it is to act in the way that best shifts our predictions favorably towards our goals. (This is not to be confused with cheating at prediction: We don’t want to make ourselves optimistic for no good reason, because the decision to bias oneself towards optimism does not actually correlate with our goals getting accomplished – it only correlates with a deluded future self believing that we will be accomplishing our goals.)
For more reading on this topic, I recommend this paper on functional decision theory, the book Evidence, Decision and Causality or the article On Correlation and Causation Part 1: Evidential decision theory is correct. For an overview on different decision theories, see also this summary.
To keep things simple and as uncontroversial as possible, I will follow Caspar’s terminology for the rest of my post here and use the term superrationality in a very broad sense that is independent of any specific flavor of decision theory, referring to a fuzzy category of arguments from similarity of decision algorithms that favor cooperating in certain prisoner’s-dilemma-like situations.
2. A multiverse ensures the existence of agents with decision algorithms extremely similar to ours
The existence of a multiverse would virtually guarantee that there are many agents out there who fulfill the criteria of “relevant similarity” compared to us with regard to their decision algorithm and decision situations – whatever these criteria may boil down to in detail.
Insertion: Technically, if the multiverse is indeed infinite, there will likely be infinitely many such agents, and infinite amounts of everything in general, which admittedly poses some serious difficulties for formalizing decisions: If there is already an infinite amount of value or disvalue, it seems like all our actions should be ranked the same in terms of the value of the outcome they result in. This leads to so-called infinitarian paralysis, where all actions are rated as equally good or bad. Perhaps infinitarian paralysis is a strong counterargument to MSR. But in that case, we should be consistent: Infinitarian paralysis would then also be a strong counterargument to aggregative consequentialism in general. Because it affects nearly everything (for consequentialists), and because of how drastic its implications would be if there were no convenient solution, I am basically hoping that someone will find a solution that makes everything work again in the face of infinities. For this reason, I think we should not think of MSR as being particularly in danger of failing for reasons of infinitarian paralysis.
Back to object-level MSR: We noted that the multiverse guarantees that there are agents out there very similar to us who are likely to tackle decision problems the same way we do. To prevent confusion, note that MSR is not based on the naive assumption that all humans who find the concept of superrationality convincing are therefore strongly correlated with each other across all possible decision situations. Superrationality only motivates cooperation if one has good reason to believe that another party’s decision algorithm is indeed extremely similar to one’s own. Human reasoning processes differ in many ways, and sympathy towards superrationality represents only one small dimension of one’s reasoning process. It may very well be extremely rare that two people’s reasoning is sufficiently similar that, having common knowledge of this similarity, they should rationally cooperate in a prisoner’s dilemma.
But out there somewhere, maybe on Earth already in a few instances among our eight-or-so billion inhabitants, but certainly somewhere in the multiverse if a multiverse indeed exists, there must be evolved intelligent beings who are sympathetic towards superrationality in the same way we are, who in addition also share a whole bunch of other structural similarities with us in the way they reason about decision problems. These agents would construe decision problems related to cooperating with other value systems in the same way we do, and pay attention to the same factors weighted according to the same decision-normative criteria. When these agents think about MSR, they would be reasonably likely to reach similar conclusions with regard to the idea’s practical implications. These are our potential cooperation partners.
I have to admit that it seems very difficult to tell which aspects of one’s reasoning are more or less important for the kind of decision-relevant similarity we are looking for. There are many things left to be figured out, and it is far from clear whether MSR works at all in the sense of having action-guiding implications for how we should pursue our goals. But the underlying idea here is that once we pile up enough similarities of the relevant kind in one’s reasoning processes (and a multiverse would ensure that there are agents out there who do indeed fulfill these criteria), at some point it becomes logically contradictory to treat the output of our decisions as independent from the decisional outputs of these other agents. This insight seems hard to avoid, and it seems quite plausible that it has implications for our actions.
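One simple way to model the “at some point” in this argument, assuming a purely evidential reading and illustrative textbook payoffs (T, R, P, S) = (5, 3, 1, 0), is to let p be the probability that a counterpart’s decision matches mine. Cooperation then has the higher expected payoff once p exceeds (T − S) / ((T − S) + (R − P)). The sketch below, including its function names, is hypothetical and only illustrates the shape of such a threshold; it says nothing about how to estimate p for real agents.

```python
from fractions import Fraction

# Illustrative payoffs (T > R > P > S); assumptions, not values from the paper.
T, R, P, S = 5, 3, 1, 0


def expected_payoffs(p_same: Fraction) -> tuple[Fraction, Fraction]:
    """Expected payoff of cooperating and of defecting, given probability
    p_same that the counterpart's decision matches my own."""
    ev_cooperate = p_same * R + (1 - p_same) * S  # match: (C, C); mismatch: (C, D)
    ev_defect = p_same * P + (1 - p_same) * T     # match: (D, D); mismatch: (D, C)
    return ev_cooperate, ev_defect


def cooperation_threshold() -> Fraction:
    """Value of p_same at which cooperating and defecting are equally good."""
    return Fraction(T - S, (T - S) + (R - P))


print(cooperation_threshold())  # 5/7
for p in (Fraction(1, 2), Fraction(9, 10)):
    ev_c, ev_d = expected_payoffs(p)
    print(p, "cooperate" if ev_c > ev_d else "defect")
# prints "1/2 defect", then "9/10 cooperate"
```

Exact fractions are used so that the break-even point can be checked without rounding error.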
If I were to decide to cooperate in the sense implied by MSR, I would have to then update my model of what is likely to happen in other parts of the multiverse where decision algorithms highly similar to my own are at play. Superrationality says that this update in my model, assuming it is positive for my goal achievement because I now predict more agents to be cooperative towards other value systems (including my own), in itself gives me reason to go ahead and act cooperatively. If we manage to form even a crude model of some of the likely goals of these other agents and how we can benefit them in our own part of the multiverse, then cooperation can already get off the ground and we might be able to reap gains from trade.
Alternatively, if we decided against becoming more cooperative, we would have to conclude that we are suffering costs from mutual defection. This includes both opportunity costs and direct costs from cases where other parties’ favored interventions may hurt our values.
3. We are playing a multiverse-wide prisoner’s dilemma against (close) copies of our decision algorithm
We are assuming that we care about what happens in other parts of the multiverse. For instance, we might care about increasing total happiness. If we further assume that decision algorithms and the values/goals of agents are distributed orthogonally – meaning that one cannot infer someone’s values simply by seeing how they reason practically about epistemic matters – then we arrive at the conceptualization of a multiverse-wide prisoner’s dilemma.
(Note that we can already observe empirically that effective altruists who share the same values sometimes disagree strongly about decision theory (or more generally reasoning styles/epistemics), and effective altruists who agree on decision theory sometimes disagree strongly about values. In addition, as pointed out in section one, there appears to be no logical reason as to why agents with different values would necessarily have different decision algorithms.)
The cooperative action in our prisoner’s dilemma would now be to take other value systems into account in proportion to how prevalent they are in the multiverse-wide compromise. We would thus try to benefit them whenever we encounter opportunities to do so efficiently, that is, whenever we find ourselves with a comparative advantage to strongly benefit a particular value system. By contrast, the action that corresponds to defecting in the prisoner’s dilemma would be to pursue one’s personal values with zero regard for other value systems. The payoff structure is such that an outcome where everyone cooperates is better for everyone than an outcome where everyone defects, but each party would prefer to be a sole defector.
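The payoff structure described here can be illustrated with a toy n-party model. The model and its parameters are my own assumptions, not the paper’s: each cooperator gives up `cost` units of its own value and, by acting on its comparative advantage, creates `benefit` units (with benefit > cost) that are spread evenly over the other parties. The final assertion checks exactly the ordering the paragraph describes.

```python
def payoff(cooperates: bool, others_cooperating: int, n: int,
           benefit: float = 3.0, cost: float = 1.0) -> float:
    """Toy n-party model (hypothetical): each cooperator gives up `cost` of its
    own value and creates `benefit` value, spread evenly over the other n - 1
    parties. Defectors pay nothing and create nothing for others."""
    received = benefit * others_cooperating / (n - 1)
    return received - (cost if cooperates else 0.0)


n = 10
all_cooperate = payoff(True, n - 1, n)    # 3.0 - 1.0 = 2.0
all_defect = payoff(False, 0, n)          # 0.0
sole_defector = payoff(False, n - 1, n)   # 3.0
sole_cooperator = payoff(True, 0, n)      # -1.0

# Everyone cooperating beats everyone defecting, yet each party would
# individually prefer to be the sole defector.
assert sole_defector > all_cooperate > all_defect > sole_cooperator
```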
Consider for example someone who is in an influential position to give advice to others. This person can either tailor their advice to their own specific values, discouraging others from working on things that are unimportant according to their personal value system, or they can give advice that is tailored towards producing an outcome that is maximally positive for the value systems of all superrationalists, perhaps even investing substantial effort researching the implications of value systems different from their own. MSR provides a strong argument for maximally cooperative behavior, because by cooperating, the person in question ensures that there is more such cooperation in other parts of the multiverse, which in expectation also strongly benefits their own values.
Of course there are many other reasons to be nice to other value systems (in particular reasons that do not involve aliens and infinite worlds). What is special about MSR is mostly that it gives an argument for taking the value systems of other superrationalists into account maximally and without worries of getting exploited for being too forthcoming. With MSR, mutual cooperation is achieved by treating one’s own decision as a simulation/prediction for agents relevantly similar to oneself. Beyond this, there is no need to guess the reasoning of agents who are different. The updates one has to make based on MSR considerations are always symmetrical for one’s own actions and the actions of other parties. This mechanism makes it impossible to enter asymmetrical (cooperate-defect or defect-cooperate) outcomes.
(Note that the way MSR works does not guarantee direct reciprocity in terms of who benefits whom: I should not choose to benefit value system X in my part of the multiverse in the hope that advocates of value system X in particular will, in reverse, be nice to my values here or in other parts of the multiverse. Instead, I should simply benefit whichever value system I can benefit most, in the expectation that whichever agents can benefit my values the most – and possibly that turns out to be someone with value system X – will actually cooperate and benefit my values. To summarize, hoping to be helped by value system X for MSR-reasons does not necessarily mean that I should help value system X myself – it only implies that I should conscientiously follow MSR and help whoever benefits most from my resources.)
4. Interlude for preventing misunderstandings: Multiverse-wide cooperation is different from acausal trade!
Before we can continue with the main body of explanation, I want to proactively point out that MSR is different from acausal trade, which has been discussed in the context of artificial superintelligences reasoning about each other’s decision procedures. There is a danger that people lump the two ideas together, because MSR does share some similarities with acausal trade (and can arguably be seen as a special case of it). Namely, both MSR and acausal trade are standardly discussed in a multiverse context and rely crucially on acausal decision theories. There are, however, several important differences: In the acausal trade scenario, two parties simulate each other’s decision procedures to prove that one’s own cooperation ensures cooperation in the other party. MSR, by contrast, does not involve reasoning about the decision procedures of parties different from oneself. In particular, MSR does not involve reasoning about whether a specific party’s decisions have a logical connection with one’s own decisions or not, i.e., whether the choices in a prisoner’s-dilemma-like situation can only result in symmetrical outcomes or not. MSR works through the simple mechanism that one’s own decision is assumed to already serve as the simulation/prediction for the reference class of agents with relevantly similar decision procedures.
In most respects, MSR therefore rests on looser assumptions than acausal trade, because it does not require having the technological capability to accurately simulate another party’s decision algorithm. There is one aspect in which MSR is based on stronger assumptions than acausal trade. Namely, MSR is based on the assumption that one’s own decision can function as a prediction/simulation not just for identical copies of oneself in a boring twin universe where everything plays out exactly the same way as in our universe, but also for an interesting spectrum of similar-but-not-completely-identical parts of the multiverse that include agents who reason the same way about their decisions as we do, but may not share our goals. This is far from a trivial assumption, and I strongly recommend doing some further thinking about it. But if the assumption does go through, it has vast implications not just for the possibility of superintelligences trading with each other, but also for a form of multiverse-wide cooperation that current-day humans could already engage in.
5. MSR represents a shift in one’s ontology; it is not just some “trick” we can attempt for extra credit
The line of reasoning employed in MSR is very similar to the reasoning employed in anthropic decision problems. For comparison, take the idea that there are numerous copies of ourselves across many ancestor simulations. If we thought this was the case, reasoning anthropically as though we control all our copies at once could, for certain decisions, change our prioritization: If my decision to reduce short-term suffering plays out the same way in millions of short-lived, simulated versions of earth where focusing on the far future cannot pay off, I have more reason to focus on short-term suffering than I thought.
MSR applies a similar kind of reasoning where we shift our thinking from being a single instance of something to thinking in terms of deciding for an entire class of agents. MSR is what follows when one extends/generalizes the anthropic slogan “Acting as though you are all your (subjectively identical) copies at once” to “Acting as though you are all copies of your (subjective probability distribution over your) decision algorithm at once.”
Rather than identifying solely with one’s subjective experiences and one’s goals/values, MSR also involves “identifying with” – on the level of predicting consequences relevant to one’s decision – one’s general decision algorithm. If the assumptions behind MSR are sound, then deciding not to change one’s actions based on MSR has to cause an update in one’s world model, an update about other agents in one’s reference class also not cooperating. So the underlying reasoning that motivates MSR is something that has to permeate our thinking about how to have an impact on the world, whether we decide to let it affect our decisions or not. MSR is a claim about what is rational to do given that our actions have an impact in a broader sense than we may initially think, spanning across all instances of one’s decision algorithm. It changes our EV calculations and may in some instances even flip the sign – net positive/negative – of certain interventions. Ignoring MSR is therefore not necessarily the default, “safe” option.
6. Lack of knowledge about aliens is no obstacle because a minimally viable version of MSR can be based on what we observe on earth
Once we start deliberating whether to account for the goals of other agents in the multiverse, we run into the problem that we have a very poor idea of what the multiverse looks like. The multiverse may contain all kinds of strange things, including worlds where physical constants are different from the ones in our universe, or worlds where highly improbable things keep happening, for the same reason that among infinitely many runs of fair coin tosses, some will produce uncanny sequences like “always heads” or “always tails.”
Because it seems difficult and intractable to envision all the possible landscapes in different parts of the multiverse, what kind of agents we might find there, and how we can benefit the goals of these agents with our resources here, one might be tempted to dismiss MSR for being too impractical a consideration. However, I think this would be a premature dismissal. We may not know anything about strange corners of the multiverse, but we know at the very least how things are in our observable universe. As long as we feel like we cannot say anything substantial about how, specifically, the parts of the multiverse that are completely different from the things we know differ from our environment, we may as well ignore these other parts. For practical purposes, we do not have to speculate about parts of the multiverse that would be completely alien to us (yay!), and can instead focus on what we already know from direct experience. After all, our world is likely to be representative of some other worlds in the multiverse. (This holds for the same reason that a randomly chosen television channel is more likely than not to be somewhat representative of some other television channels, rather than being completely unlike any other channel.) Therefore, we can be reasonably confident that out there somewhere, there are planets with an evolutionary history that, although different from ours in some ways, also produced intelligent observers who built a technologically advanced civilization. And while many of these civilizations may contain agents with value systems we have never thought about, some of these civilizations will also contain earth-like value systems.
In any case, it seems plausible that our comparative advantage lies in helping those value systems about which we can attain the most information. If we survey the values of people on earth, and perhaps also how much these values correlate with sympathies for the concept of superrationality and taking weird arguments to their logical conclusion, this already gives us highly useful information about the values of potential cooperators in the multiverse. MSR then implies strong cooperation with value systems that we already know (perhaps adjusted by the degree to which their proponents are receptive to MSR ideas).
By “strong cooperation,” I mean that one should ideally pick interventions based on considerations of personal comparative advantages: If there is a value system for which I could create an extraordinary amount of (variance-adjusted; see chapter 3 of this dissertation for an introduction) value given my talents and position in the world, I should perhaps exclusively focus on benefitting specifically that value system. Meta interventions that are positive for many value systems at once also receive a strong boost from MSR considerations and should plausibly be pursued with high effort even if they do not come out as the top priority absent MSR considerations. (Examples of such interventions include making sure that any superintelligent AIs that are built can cooperate with other AIs, or making sure that people who are uncertain about their values do not waste time on philosophy and instead try to benefit existing value systems MSR-style.) Finally, one should also look for more cooperative alternatives when considering interventions that, although positive for one’s own value system, may in expectation cause harm to other value systems.
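As a rough illustration of picking interventions by comparative advantage, the following sketch scores each intervention by the value it creates for each value system, weighted by that system’s share in the compromise. Every value system, weight and payoff here is a hypothetical placeholder, and weighting by prevalence times receptiveness to MSR is just one simple reading of the adjustment mentioned above, not a method from Caspar’s paper. A “defecting” agent picks the intervention that is best for its own values alone; a cooperating agent picks the one with the highest weighted total.

```python
# All value systems, weights and payoffs below are hypothetical placeholders.
value_systems = {
    # name: (estimated prevalence, estimated receptiveness to MSR)
    "mine": (0.3, 0.8),
    "system_a": (0.5, 0.5),
    "system_b": (0.2, 0.9),
}

# Value each intervention would create for each value system, per unit of effort.
interventions = {
    "own_top_priority": {"mine": 10.0, "system_a": 0.0, "system_b": 0.0},
    "comparative_advantage_for_a": {"mine": 0.0, "system_a": 30.0, "system_b": 0.0},
    "broad_meta_intervention": {"mine": 6.0, "system_a": 6.0, "system_b": 6.0},
}


def compromise_weight(name: str) -> float:
    """Weight of a value system in the compromise: prevalence x receptiveness."""
    prevalence, receptiveness = value_systems[name]
    return prevalence * receptiveness


def compromise_value(effects: dict[str, float]) -> float:
    """Weighted sum of the value an intervention creates across value systems."""
    return sum(compromise_weight(name) * value for name, value in effects.items())


defect_choice = max(interventions, key=lambda k: interventions[k]["mine"])
cooperate_choice = max(interventions, key=lambda k: compromise_value(interventions[k]))

print(defect_choice)     # own_top_priority
print(cooperate_choice)  # comparative_advantage_for_a
```

Under these made-up weights, the broad meta intervention also outscores the agent’s own top priority, which mirrors the boost for meta interventions described above.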