Commenting on MSR, Part 2: Cooperation heuristics

Published on the CLR blog, where researchers are free to explore their own ideas on how humanity can best reduce suffering.


This post was originally written for internal discussions only; it is half-baked and unpolished. The post assumes familiarity with the ideas discussed in Caspar Oesterheld’s paper Multiverse-wide cooperation via coordinated decision-making. I wrote a short introduction to multiverse-wide cooperation in an earlier post (but I still recommend reading parts of Caspar’s original paper, or this more advanced introduction, because several of the points that follow below build on topics not covered in my introduction). With that out of the way: In this post, I will comment on what I think might be interesting aspects of multiverse-wide cooperation via superrationality (abbreviation: MSR) and what I think might be its practical implications – if the idea works at all. I will focus particularly on aspects where I place more emphasis on certain considerations than Caspar does in his paper, though most of the issues I discuss are already noted by Caspar. A major theme in my comments will be exploring how the multiverse-wide compromise changes shape once we go from a formal, idealized conception of how to think about it to real-world policy suggestions for humans. For the perhaps most interesting part of the post, skip to the section "How to trade, practically."

(Epistemic status: Highly speculative. I am outlining practical implications not because I am convinced that they are what we should do, but as an exercise in what I think would follow given certain assumptions.)

Decision heuristics for cooperation

Under idealized conditions, each MSR participant would attempt to follow the same multiverse-wide compromise utility function (MCUF) reflecting the distribution of values amongst all superrational cooperators. In practice, trying to formalize a detailed, probabilistic model of what the MCUF would look like, and consulting it for every decision, is much too noisy and effortful. A more practical strategy for implementing MSR is to come up with heuristics that approximate the MCUF reasonably well for the purposes needed. Let’s call these cooperation heuristics (CHs). A simple example of such a heuristic might be “Perform actions that benefit the value systems of other superrational cooperators considerably if you can do so at low cost, and refrain from hurting these value systems if you would only expect comparatively small gains from doing so.” This example heuristic is easy to follow and unlikely to go wrong. In fact, aside from the part about superrational cooperators, it sounds like a great decision rule even for people who do not buy into MSR but are interested in low-effort ways of cooperating with other people for all the "normal" reasons (the many reasons in favor of cooperation that do not involve aliens). The primary caveat about this particular CH is that it is very vague, and that the gains from trade it would produce if everyone were to follow it are far from maximal. MSR is attractive because it may make it possible for us to give other value systems even more weight through further-reaching CHs, and thereby, in expectation, to get more gains from trade in return.

Asymmetries amongst potential MSR participants

In the standard prisoner’s dilemma, the participants have symmetrical information and payoff structures. MSR can be viewed as a prisoner's-dilemma-like decision problem, but one that is a lot more messy than any traditional formulation of a prisoner’s dilemma. In MSR, different instantiations of superrational cooperators find themselves with information asymmetries, different goals, different biases and different resources at their disposal. Consider this non-exhaustive list of examples for potentially asymmetric features between MSR participants:

  • Frequency: How common a value is amongst superrational reasoners.
  • Sunk costs, risk aversion: Proponents of different value systems may differ with regard to how much MSR would change their priorities. Potential MSR participants may have differing levels of sunk costs, or different risk-reward tradeoffs, when they consider changing their personal priorities more towards the priorities favored by the MCUF.
  • Cooperation saliency: MSR considerations may be more salient to proponents of certain value systems than to others. (For instance, some people might be thinking about cooperation very frequently, e.g. because their priorities pre-MSR are in part opposite to what others are pursuing, which makes it more likely that they would discover MSR early on and perhaps be more drawn towards making strong updates based on it.)
  • Knowledge about other value systems: Value systems may differ in how much they know about other potential MSR participants. (For instance, there could be worlds where all evolved intelligent beings hold the same values, which would make it difficult to speculate about other values that might be part of the MCUF.)
  • Degree of being known: Value systems may also differ in how much others know about them: Simple/elegant value systems such as variants of utilitarianism are presumably known and understood by many; whereas parochial value systems are known and understood only by few.
  • Benefitability: Some value systems may allow (for both structural or empirical reasons) for easier ways of value creation than others (i.e., they are easier to benefit).
  • Civilizational maturity: Agents who reason about the nature of the MCUF may come to radically different conclusions regarding MSR depending on the stage of civilizational development they find themselves in. For instance, perhaps later civilizations will contain more agents who reason about MSR.
  • Expertise: MSR participants may differ greatly in the type of knowledge and expertise they have. Participants may often have comparative advantages for interventions favored by their own value system, simply because they may be more viscerally motivated for pursuing those interventions or may know more about the relevant prioritization implied by their value system.
  • Mainstream bias: Minority value systems could be biased in favor of joining the things that are regarded as high status in the larger community. (Conversely, value systems that are attractive to contrarians may come with a bias against joining mainstream-favored interventions.)

Asymmetries amongst potential MSR participants call into question whether we can indeed assume that our potential cooperators are finding themselves in sufficiently similar decision situations as we find ourselves in. To recap: We are (for the sake of entertaining the argument) assuming that MSR works when two agents operate on highly similar decision algorithms and find themselves in highly similar decision situations. Under these conditions, certain approaches to decision theory, which I’m for the sake of simplicity referring to with the umbrella term superrationality, recommend reasoning as though the decision outputs of the agents in question are logically entangled and are going to output the same decisions. Asymmetric features amongst potential MSR participants now make it non-obvious whether we can still talk of the decision situations different agents find themselves in as being “relevantly similar,” or whether they break the similarity because the conclusions that participants will come to, for instance with regard to whether to incorporate MSR into their behavior or not, might be affected by these asymmetries.

Asymmetries do not break MSR, but they make it messy

So does MSR break down because decision situations are never relevantly similar? I think the answer depends on the level of abstraction at which the agents are looking at a decision, i.e., how they come to construe their decision problem. We can assume that agents interested in MSR have an incentive to pick whichever general and collectively followed process for selecting cooperation heuristics produces the largest gains from trade.

Because the correct priorities for MSR participants may in many cases depend on the exact nature and distribution of asymmetric features, we can expect that on the level of concrete execution (but not at the level of the more general decision problem) “implementing MSR” could look fairly different for different agents. Even though everyone would try to use cooperation heuristics that produce optimal benefits, individual cooperators' cooperation heuristics would recommend different types of actions depending on whether an agent finds themselves with one type of comparative advantage or another.

To illustrate this point, consider agents who expect the highest returns from MSR from a focus on convergent priorities where they would work on interventions that are positive for their own value systems, but (absent MSR considerations) not maximally positive. Selecting interventions this way produces a compromise cluster where a few different value systems mutually benefit each other through a focus on some shared priority. Value systems A, B, C and D may for instance have a shared priority x, and value systems E, F, G and H may share priority y. (By “priority,” I mean an intervention such as “reducing existential risk” or “promoting consequentialist thinking about morality.”) By focusing on a priority that one’s own value system shares with other value systems, one only benefits a subset of all value systems directly (only the ones one shares such convergent priorities with). However, through the general mechanism of doing something because it is informed by non-causal effects of one’s decision algorithm, we in theory should now also expect there to be increased coordination between value systems that form a cooperation cluster around their convergent priorities.

Similarly, following a cooperation heuristic of mostly cooperating with those value systems we know best (e.g. only with value systems we are directly familiar with) makes it more likely that civilizations full of completely alien value systems will also only cooperate with the value systems they already know (which sounds reasonable and efficient).

Focusing on convergent priorities is of course not the only strategy for generating gains from trade. Whether MSR participants should spend all their resources on convergent priorities, or whether they should rather work on (other) comparative advantages they may have at greatly benefitting particular value systems, depends on the specifics of the empirical circumstances that the agents find themselves in. The tricky part about focusing on comparative advantages rather than (just) convergent priorities is that it might be one’s comparative advantage to do something that is neutral or negative according to one’s value system, or at the very least has high opportunity costs. In such a case, one needs to be particularly confident that MSR works well enough, and that one’s CH is chosen/executed diligently enough, to generate sufficient benefits.

In practice, the whole picture of who benefits whom quickly becomes extremely complicated. A map of how different agents in MSR benefit each other’s value systems would likely contain all of the following features:

  • Compromise clusters around convergent priorities: E.g. value systems [A, B, C, D] cluster around intervention x, and value systems [E, F, G, H] cluster around intervention y.
  • Partial overlap between some of these compromise clusters: E.g. value systems [A, F, H] may share a common intervention z, on which they spend part of their attention.
  • Arrows away from some of the convergent priorities, representing agents from whichever value system focusing on personal comparative advantages some of the time, doing things they are particularly well-suited for (with regard to their talent or position). Perhaps this includes especially benefitting e.g. value systems [Q, P, R], which are particularly hard to benefit absent finding oneself with a comparative advantage for doing that.
  • Some arrows that are crossed out, representing interventions that proponents of a particular value system would endorse pre-MSR, but refrain from pursuing because they may hurt other MSR-participating value systems.

It is apparent that things would start to look confusing pretty quickly, and it seems legitimate to question whether humans can get the details of this picture right enough to reap any gains from trade at all (as opposed to shooting oneself in the foot). On the other hand, the CHs behind how each agent of a given value system should select their priorities could be kept simple. This can work if one picks cooperation heuristics in such a way that, assuming they are universally applied, maximizes the gains from trade for all value systems. (If done properly and if the assumptions behind MSR are correct, this then corresponds to maximizing the gains from trade for one’s own value system.)

Thinking in terms of cooperation heuristics

For figuring out what MSR implies for humans, I think it is important to think in terms of agents of limited intelligence and rationality executing practical CHs, as opposed to ideal reasoners computing a maximally detailed MCUF for all decisions. Using heuristics means accepting a tradeoff between accuracy loss and practicality concerns. Accuracy in following the MCUF is important, because whether rare or hard-to-benefit value systems actually benefit from a cooperation heuristic depends on whether the heuristic is sensitive enough to notice situations where one’s comparative advantage is indeed to benefit these rare value systems. This makes it challenging to find simple heuristics that nevertheless react well to the ways in which all the features in decision situations can vary.

For these reasons, I recommend being careful with talk such as the following:

“Intervention X [insert: global warming reduction, existential risk reduction, AI safety, etc.] is good from an MSR perspective.”

To be clear, there is a sense in which this way of talking can be perfectly reasonable. From the perspective of the MCUF, majority-favored interventions indeed receive a boost in how valuable they should be regarded as being, as compared to their evaluation from any single value system. However, this picture risks missing out on important nuances.

For instance, interventions that benefit value systems that are unusually hard to benefit must also receive a boost if the MCUF incorporates variance normalization (see chapter 3 here for an introduction). Using variance normalization means, roughly, that one looks at the variance of how much value or disvalue is commonly at stake for each value system and compensates for certain value systems being hard to benefit. If a value system is (for whatever structural reasons) particularly difficult to benefit, then for any of the rare instances where one is able to actually benefit said value system a great deal, doing so becomes especially important and one wants the MCUF and any CHs to be such that they recommend actually pursuing these rare instances.
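As a rough illustration of the normalization step described above, the following Python sketch rescales each value system's utilities over a shared set of options to zero mean and unit variance before summing them. The option list, the raw numbers, the equal weights and the function names are all assumptions made up for this example; the actual proposal may differ in its details.

```python
from statistics import mean, pstdev

def variance_normalize(utilities):
    """Rescale one value system's utilities to zero mean and unit variance."""
    mu = mean(utilities)
    sigma = pstdev(utilities)
    if sigma == 0:
        # A value system indifferent between all options has nothing at stake.
        return [0.0 for _ in utilities]
    return [(u - mu) / sigma for u in utilities]

def compromise_scores(raw, weights):
    """Weighted sum of variance-normalized utilities for each option."""
    normalized = {name: variance_normalize(us) for name, us in raw.items()}
    n_options = len(next(iter(raw.values())))
    return [
        sum(weights[name] * normalized[name][i] for name in raw)
        for i in range(n_options)
    ]

# Hypothetical options: [majority-favored intervention, rare opportunity, do nothing]
raw = {
    "easy_to_benefit": [10.0, 0.0, 0.0],
    "hard_to_benefit": [0.0, 0.1, 0.0],
}
# Equal weights are an assumption; in the post, weights would reflect
# how prevalent each value system is among cooperators.
weights = {"easy_to_benefit": 0.5, "hard_to_benefit": 0.5}

raw_sum = [sum(us[i] for us in raw.values()) for i in range(3)]
scores = compromise_scores(raw, weights)
print(raw_sum)
print(scores)
```

In the raw sum, the rare opportunity looks negligible next to the majority-favored intervention. After normalization the two options score equally, because each is the best available option for one of the two equally weighted value systems.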

These considerations paint a complicated picture. The worry with phrases like “intervention x is positive for MSR” is that it may tempt us to overlook that sometimes pursuing these interventions is heavily suboptimal if the MSR-participating agent actually has a strong comparative advantage for benefitting a value system that is normally unusually hard to benefit. When someone hears “Intervention x is positive for MSR,” they may do more of intervention x without ever checking what other interventions are positive too, and potentially more positive for their given situation. As soon as people start to take shortcuts, there is a danger that these shortcuts will disproportionately and predictably benefit some value systems and neglect others. (We can think of shortcuts as cooperation heuristics produced by a dangerously low amount of careful thinking.)

The crucial theme here is that even if everyone always does things that are "positive according to the MCUF," if people often fail to do what is maximally positive, then it is possible for some value systems to predictably lose a lot of value in expectation or even suffer expected harm overall. Therefore, this cannot be how we should in practice implement MSR. The variance-voting or “equal gains from trade” MCUF – which I describe in more detail in the section “How to trade, ideally” – is set up such that if everyone tries to maximize it to the best of their ability, then it distributes gains equally. There is no guarantee that if everyone just picks random stuff that is positive according to this MCUF, this will be good for everyone. Cooperation heuristics have to be selected with the same principle in mind: We want to pick CHs which ensure equal gains from trade provided that everyone follows them diligently to the best of their ability.
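The point that "positive" is not the same as "maximally positive" can be made concrete with a toy calculation. The sketch below is not the MCUF from this post. It assumes two value systems A and B, two hypothetical options with made-up payoffs, and a compromise score that simply adds the two (already normalized) payoffs with equal weight.

```python
# Each option maps to (gain for value system A, gain for value system B).
# All numbers are hypothetical.
MAJORITY = (1.0, -0.2)   # favored by A, slightly negative for B
RARE = (0.0, 3.0)        # special opportunity to benefit B

def mcuf_score(payoff):
    """Toy compromise score: equal weights, utilities assumed already normalized."""
    return payoff[0] + payoff[1]

def totals(choices):
    """Sum the gains for A and B over all agents' choices."""
    return (sum(c[0] for c in choices), sum(c[1] for c in choices))

# Ten agents; only two of them are positioned to take the rare opportunity.
available = [[MAJORITY, RARE]] * 2 + [[MAJORITY]] * 8

# Shortcut policy: everyone does the salient thing that is merely positive.
shortcut = [MAJORITY for _ in available]
# Diligent policy: everyone picks the best-scoring option open to them.
diligent = [max(options, key=mcuf_score) for options in available]

shortcut_totals = totals(shortcut)
diligent_totals = totals(diligent)
print(shortcut_totals)
print(diligent_totals)
```

Under these assumptions, every choice in the shortcut policy has a positive compromise score, yet value system B ends up with a net loss. Only when the two well-positioned agents actually take the rare opportunity do both value systems come out ahead.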

All of this suggests that whether hearing an intervention being performed somewhere in the multiverse is “positive news” in expectation or not for a given MSR participant is actually not only a feature of that intervention itself, but also of whether the cooperation heuristic behind the decision was chosen and executed wisely. That is, it can in practice depend on things such as whether the agent who performed the intervention in question had a sufficiently large comparative advantage for it, or whether the intervention was chosen correctly for reasons of convergent priorities. With this in mind, it might in many contexts be epistemically safer to talk about CHs rather than concrete interventions being what is (without caveats) “positive from an MSR perspective.”

A failure mode to avoid

Asymmetries amongst MSR participants and the issue with choosing CHs in a way that distributes the gains from trade equally make it tricky to pick cooperation heuristics wisely. One failure mode I am particularly concerned about is the following:

Superrationalizing: When the CH you think you follow is different from the CH that actually guides your behavior.

For instance, you might think your CH produces the largest expected gains given practical concerns, but, unbeknownst to you, you only chose it the way you did because of asymmetric features that, if followed universally, would give you a disproportionate benefit. Others, who you thought will arrive at the same CH, will then adopt a different CH than the one you think you are following (perhaps biased in favor of their own benefit). You therefore lose out on the gains from trade you thought your CH would produce.

Relatedly, another manifestation of superrationalizing is that one might think one is following a CH that produces large gains from trade for one’s value system, but if the de facto execution of the CH one thinks one is following is sloppy, one has no reason to assume that the predicted gains from trade would actually materialize.

For better illustration, I am going to list some examples for different kinds of superrationalizing in a more concrete context. For this purpose, let me first introduce two hypothetical value systems held by MSR participants: Straightforwardism and Complicatedism.

Straightforwardists have practical priorities that are largely shared by the majority of value systems interested in MSR. Proponents of Complicatedism on the other hand are not excited about the canon of majority-favored interventions.

For an example of superrationalizing, let us assume that the Straightforwardists pick their CH according to the following, implicit reasoning: “When MSR participants reason very crudely about the MCUF and only draw the most salient implications with a very simple CH, such as looking for things that benefit many other value systems, this will be greatly beneficial for us. Therefore, we do not have to think too much about the specifics of the MCUF and can just focus on what is beneficial for many value systems including ours.”

By contrast, proponents of Complicatedism may worry about getting skipped in the compromise if people only perform the most salient, majority-favored interventions. So they might adopt a policy of paying extra careful attention to value systems never getting harmed by MSR in expectation, and therefore focus their own efforts disproportionately on benefitting the value system Supercomplicatedism, which only has few proponents and whose prioritization is very difficult to take into account.

Of course, MSR does not work that way, and the proponents of the two value systems above are making mistakes by, perhaps unconsciously/unthinkingly, assuming that other MSR participants will be affected symmetrically by features that are specific to only their own situation. The mistake is that if one pays extra careful attention to value systems never getting harmed by MSR because one’s own value system is in a minority that seems more at risk than the average value system, then the reasoning process at work is not “No matter the circumstances, be extra careful about value systems getting harmed.” Instead, the proper description of what is going on then would be that one unfairly privileges features that are only important for one’s own value system. To put it differently, if proponents of Straightforwardism think “I allow myself to reason crudely about MSR partners, therefore other agents are likely to think crudely about it, too – which is good for me!” they are failing to see that the reason they were tempted to think crudely is not a reason that is shared by all other compromise participants.

In order to maximize the gains from trade, proponents of both value systems, Straightforwardism and Complicatedism, have to make sure that they use a decision procedure that, in expectation, benefits both value systems equally much (weighted in proportion to how prevalent and powerful the proponents are). Straightforwardists have reason to pick a CH that also helps Complicatedists sufficiently much, and Complicatedists are incentivized to not be overly cautious and risk averse. If implemented properly, asymmetries between potential MSR participants cannot be used to gain an unfair advantage. (But maybe it is simply extremely complicated to implement MSR cooperation heuristics properly.)

For a slightly different example of superrationalizing, consider a case where Complicatedists naively place too much faith into the diligence of the Straightforwardists. They may reason as follows:

“The majority of compromise participants benefit from intervention Z. Even though intervention Z is slightly negative or at best neutral for my own values, I should perform intervention Z. This is because if I am diligent enough to support Z for the common good, as it seems best for a majority of compromise participants and therefore an obvious low-hanging fruit for doing my part in maximizing the MCUF, other agents will also be diligent in the way they implement MSR. Others being diligent then implies that whichever agents are in the best position to reward my own value system will indeed do so.”

This reasoning is sound in theory. But it is also risky. Whether the Complicatedists reap gains from trade, or whether the decision procedure they actually follow (as opposed to the decision procedure they think they follow) implies that they are shooting themselves in the foot, depends on their own level of diligence in picking their MSR implications. The Complicatedists have to, through the level of diligence in the CH they de facto follow, ensure that the agents who are in fact in the best position to help Complicatedism will be diligent enough to notice this and therefore act accordingly.

It seems to me that, if the Complicatedists put all their resources into intervention Z and never spend attention researching whether they themselves might be in a particularly good position to help rare value systems or value systems whose prioritization is particularly complicated, then the reasoning process they are de facto following is itself not as diligent as they require their superrational cooperators to be. If even the Complicatedists (who themselves do not benefit from the majority-favored interventions) end up working on the majority-favored interventions because they seem like the easiest and most salient thing to pick out, why would one expect agents who actually benefit from this “low-hanging fruit” to ever work on anything else? The Complicatedists (and everyone else for that matter, at least in order to maximize the gains from trade that MSR can provide) have to make sure that they work on majority-favored interventions if and only if it is actually their multiverse-wide comparative advantage. This may be difficult to ensure, because one has to expect that people often rationalize, especially when majority-favored interventions tend to be associated with high status, or tend to draw in Complicatedists high in agreeableness who are bothered by lack of convergence in people’s prioritization.

In order to allocate personal comparative advantages in a way that reliably produces the greatest gains from trade, one has to find the right mix between exploration and exploitation. It is plausible that MSR participants should often focus on majority-favored interventions, because after all, the fact that they are majority-favored means that they make up a large portion of the MCUF. But next to that, everyone should be on the lookout for special opportunities to benefit value systems with idiosyncratic priorities. This should happen especially often for value systems that are well-represented in the MCUF, but perhaps one should also make use of randomization procedures to sometimes spend time exploring the prioritization of comparatively rare value systems (see also the proposal in “How to trade, practically”).

Randomization procedures of course also come with a danger of superrationalizing. It can be difficult to properly commit to doing something that may cost social capital or is difficult to follow through with for other reasons. Illusory low-probability commitments that one would not actually follow through on if the die came up “6” five times in a row weaken or even destroy the gains from trade one in expectation receives from this aspect of MSR. Proper introspection and very high levels of commitment to one's chosen CH become important for not shooting oneself in the foot when attempting to get MSR implications right.

An intuition I got from writing this section is that it tentatively seems to me that cooperation heuristics that exploit convergent priorities, in particular when the resulting intervention benefits one’s own value system, are less risky (in the sense of it being harder to mess things up through superrationalizing) than trades based on comparative advantages. The overall gains from trade one can achieve with such (arguably) risk-averse cooperation heuristics are certainly not maximal, but if one is sufficiently pessimistic about getting things right otherwise, then they may be the overall better deal. This is bad news for value systems that don't benefit as much from trades focused on convergent priorities.

Having said that, it seems to me also that exploiting comparative advantages can produce particularly large gains from trade, and that getting things right enough might be within what we can expect careful reasoners to manage. While it would seem incredibly intractable to attempt to estimate one’s comparative advantage at benefitting a particular value system when compared to agents in unknown parts of the multiverse, what looks fairly tractable by contrast (and is similarly impactful overall) is evaluating one’s comparative advantage compared to other people on earth. Following a CH that models comparative advantages among people on earth would be a pretty good start and likely better than a status quo of not considering comparative advantages at all.

Inclusivity is not always better

Which value systems in particular MSR participants should benefit depends on their situations and especially their comparative advantages. In the last section of my introduction to MSR, I advocated for the principle that we should largely limit our cooperation heuristics to considering value systems we know well.

One might be tempted to assume that this would give suboptimal results, as limiting how inclusive one is with benefitting value systems different from one’s own determines how many value systems will be incentivized to join our compromise in total. So perhaps low inclusivity (in the sense of not speculating about the values of aliens whose value systems differ from ours) in this way means that one’s decisions now only influence a smaller number (or lower measure, given that we might be dealing with infinities) of agents in the multiverse. However, it is important to note that MSR never manages to bring other agents to follow one’s own priorities exclusively; it only grants you a proportionate share of the attention and resources of some other agents. The more types of compromise participants are added, the smaller the share of extra attention one receives per participant. (Consider: If I have to think about what my comparative advantage is amongst three value systems, that takes less time than figuring out my comparative advantage amongst three hundred value systems.) This means that there is no overriding incentive to choose maximally inclusive cooperation heuristics, i.e. ones that in expectation benefit maximally many value systems of superrationalists in the multiverse.

Note that this also implies that one cannot make a strong wager in favor of MSR of the sort that, if MSR works, our decisions have a vastly wider scope than if it does not work.1 While it is true, strictly speaking, that our decisions have a “wider scope” if MSR works, this is counterbalanced by us having to devote attention to more value systems in order to make it work. MSR’s gains from trade do not come from the large total numbers of participants, but from exploiting convergent priorities and comparative advantages. So while it is not important to consider maximally many plausible value systems in one’s compromise, it is important that we do include whichever value systems we expect large gains from trade from (as this superrationally ensures that others follow similarly high-impact cooperation heuristics).

If one had infinite computing power and could at any point download and execute the precise implications of an ideal MCUF containing all agents interested in MSR, then a maximally inclusive compromise would give the highest gains from trade, because for every ever-so-specific situation, the ideal MCUF would find exactly the best way of ensuring equal gains from trade for all participants. However, given that thinking about the prioritization of other value systems (especially obscure ones that only make up a tiny portion of the MCUF) comes with a cost, it may not be worthwhile to invest resources into ever-more-sophisticated CHs solely with the goal of making sure that we do not forget value systems we could in theory benefit. This reasoning supports the intuition that the best way to draw implications from MSR is by cooperating with proponents of value systems that one already causally interacts with, because these are the value systems we likely know best and are therefore in a particularly good position to benefit. Direct acquaintance is a comparative advantage!

Updateless compromise

(Epistemic status, update: This section is badly structured and probably confused at least in parts. Also it won’t be relevant to the sections below, so feel free to skip this.)

So far, I have been assuming that agents only follow cooperation heuristics that, at the stage of execution, the agent believes will generate positive utility according to their own value system. This sounds like a reasonable assumption, but there is a case to be made for exceptions to it. This concerns updateless versions of compromise.

Suppose I am eating dinner with my brother and we have to agree on a fair way of dividing one pizza. Ordinarily, the fair way to divide the pizza is to give each person one half. However, suppose I like pizza a lot more than my brother does, and that I am also much more hungry. Here, we might have the intuition that, whether person A or person B likes the pizza in question more, or is more hungry on that specific occasion, was a matter of chance that could just as well have gone one way or the other. Sure, one brother was born with genes that favor the taste of pizza more (or experienced things in life that led him to develop such a taste), but there is a sense in which it could also have gone the other way. Updatelessness is the idea that we should act as though we actually made irreversible commitments to our notion of bargaining that locked in the highest expected reward in all cases where failing to have done so would predictably lower our expected reward. Applied to the specific pizza example, it is the idea that learning more information about "who is hungrier" should not lower the total utility we would both have gotten in expectation if we had agreed to a fair compromise early enough from an original position of ignorance. So it could mean that my brother and I should disregard (= “choose not to update on”) the knowledge that one specific and now known person happens to have the less fortunate pizza preferences in this instance we are in. Why? Because there were points in the past where we could – and arguably should – have agreed on a method for future compromise on things such as pizza eating that in expectation does better than just dividing goods equally. Not knowing whether we ourselves will be hungrier or less hungry, it seems rational to commit to a compromise where the hungrier person receives more food. (There is also a more contested, even stronger sense of updatelessness that is not based on pre-commitments.)
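
The pizza example can be turned into a small calculation. The following Python sketch rests on assumptions that are mine and not part of the argument above: a brother with hunger level h who receives a share x of the pizza gets utility h * sqrt(x), exactly one brother is hungrier, and each brother is equally likely to be that one. The names `equal_split` and `hungrier_gets_more` are hypothetical labels for the two sharing rules.

```python
import math

# Hypothetical hunger levels for the hungrier and the less hungry brother.
HUNGRY, NOT_HUNGRY = 2.0, 1.0


def utility(hunger, share):
    """Assumed utility model: hunger times the square root of the share received."""
    return hunger * math.sqrt(share)


def equal_split(h_me, h_other):
    """My share under the rule 'always divide the pizza in half'."""
    return 0.5


def hungrier_gets_more(h_me, h_other):
    """My share under the rule that maximizes the sum of both utilities.

    For the sqrt model above this share is h_me^2 / (h_me^2 + h_other^2).
    """
    return h_me**2 / (h_me**2 + h_other**2)


def ex_ante_utility(rule):
    """Expected utility before knowing who will be hungrier (50/50 chance)."""
    as_hungry = utility(HUNGRY, rule(HUNGRY, NOT_HUNGRY))
    as_not_hungry = utility(NOT_HUNGRY, rule(NOT_HUNGRY, HUNGRY))
    return 0.5 * as_hungry + 0.5 * as_not_hungry


def ex_post_utility_less_hungry(rule):
    """Utility of the brother who has learned that he is the less hungry one."""
    return utility(NOT_HUNGRY, rule(NOT_HUNGRY, HUNGRY))


for rule in (equal_split, hungrier_gets_more):
    print(rule.__name__,
          "ex ante:", round(ex_ante_utility(rule), 3),
          "ex post (less hungry):", round(ex_post_utility_less_hungry(rule), 3))
```

Under these assumptions the hunger-sensitive rule has the higher expected utility from the original position of ignorance, while the brother who later learns that he is the less hungry one would prefer the equal split. That gap between the ex ante and ex post perspectives is what updatelessness asks us not to act on.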

Updatelessness applied to MSR would mean optimizing for an MCUF where variance normalization is based not on everything we currently know about the strategic position of proponents of different value systems, but on what was known at a hypothetical “point of precommitment.” Depending on the version of updatelessness at play, this could be the point in time where someone started to understand decision theory well enough to consider the benefits of updatelessness, or it could even mean going back to the “logical prior” over how much different value systems can or cannot be benefitted. (Whatever that means; I do not understand this business about either logical priors or how to distinguish different versions of updatelessness, so I will just leave it at that and hope that others may or may not do some more thinking here, following the links above.)

As I understand it, the inspiration for updateless compromise is that the gains in case one ends up being on the lucky side weigh more than the losses in case one does not. Maybe it is not apparent from the start which value systems correspond more to something like Complicatedism or something like Straightforwardism, and the sides could in theory also be reversed in at least some world-situations across the multiverse, depending on the sort of crazy things that may happen in the worlds in question. Arguably, then, there is a case to be made for committing towards updateless compromise before thinking more about MSR implications in further detail. (Or more generally, a case for a general precommitment towards updatelessness in all future decision-situations where this has benefits given the knowledge at the time of precommitment.)

While I think the arguments for updatelessness are intriguing, I am skeptical about whether humans can and should try to trick their brains into reasoning completely in updateless terms in decision-situations where the instances for precommitment either have passed or are difficult to reconstruct and assess in retrospect. And I am even more skeptical about using updateless compromise for MSR in particular:

Besides the psychological difficulties with updatelessness, and worries about whether humans are even capable of following through with the implications after learning that they are on the losing end of an updateless compromise, another problem with updateless MSR is the apparent lack of a true original position (apart from the extreme view where one just goes with a logical prior). We have previously discussed asymmetric features amongst potential MSR participants. Even someone who has not given much thought to the relative prioritization of different value systems will probably have a rough idea whether their value system is more likely to benefit from updateless compromise or not. Small asymmetries can break the entanglement of decision algorithms: If I commit to be updateless because I have a good feeling about being on the winning side, I cannot expect other agents who may not share said feeling to commit as well. I feel that MSR is already hard enough without updatelessness, and adding yet another speculative consideration to it makes me think I should start drawing the line.

Having said all that, I guess it might be reasonable though to already commit to having precommitted to be updateless in case that, after thinking more about the merits and drawbacks of the idea, one concludes that a past commitment would in fact have been the rational thing to do. (I think that’s actually the way one should think about updatelessness in general, assuming one should try updatelessness at all, so I probably misrepresented a few things above.)

How to trade, ideally

Without (strong versions of) updatelessness, the way we ensure that our actions lead to MSR benefits is to diligently follow cooperation heuristics that do not disproportionately favor our own values. (Otherwise we would have to conclude that others are disproportionately benefiting their values, which defeats the purpose.) This means that, in expectation, all the value systems should receive a substantial portion of attention somewhere in the multiverse. Ideally, assuming there were no time or resource constraints to computing a compromise strategy, an ideal reasoner would execute something like the following strategy:

1) Set up a weighted sum of the value functions of superrationalists in the multiverse.

2) Set the weights such that when universally adopted, everyone gets the same expected gains from compromise (perhaps relative to the agents’ power).

3) Maximize the resulting utility function.

Put this way, this may look simple. But it really isn’t. The way to coordinate for each value system to have resources allocated to its priorities is to maximally incorporate comparative advantages in terms of expertise and the strategic situation of the participating agents. Step 2) in the algorithm above is therefore extremely complicated, because it requires thinking about all the ways in which situations across the multiverse differ, where agents are in an especially good position to benefit certain value systems, and how likely they would be to notice this and comply depending on how the weights in the MCUF are being set. To illustrate this complexity, we can break down step 2) from above into further steps. Note that the following only gives an approximate rather than exact way to solve the problem, because a proper formalization for how to solve step 2) would twist knots into my brain:

2.1) Outline the value systems of all superrationalists and explore strategic prioritization for each value system in all world situations to come up with a ranking of promising interventions per world situation per value system.

2.2) Adjust all these interventions according to empirical compromise considerations where one can get more value out of a given intervention by tweaking it in certain ways: For instance, if two or more value systems would all agree to change each other’s promising interventions to different packages of compromise interventions that are overall preferable, perform said change.

2.3) Construct a preliminary multiverse-wide compromise utility function (pMCUF) that represents value systems weighted according to how prevalent they are amongst superrationalists, and how influential its proponents are.

2.4) Compare the world situations of all participants in MSR, predict which interventions from 2.2) will be taken by these agents under the assumption that they are approximating the pMCUF while being partly irrational in different ways, and calculate the total utility this generates for each value system in the preliminary compromise.

2.5) Adjust the weights in the pMCUF with the help of a fair bargaining solution in such a way that eventually, when applied to all possible world situations where the newly weighted MCUF will get approximated, all value systems will get (roughly) equal, variance-normalized benefits. This eventually gives you (a crude version of) the final MCUF to use.

(Step 2.5 ensures that value systems that are hard to benefit also end up receiving some attention. Without this step, hard-to-benefit value systems would often end up neglected, because MSR participants would solely be on the lookout for options to create the most total value per value system, which disproportionately favors benefitting value systems that are easy to benefit.)
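
Steps 1) to 3) above, together with the variance normalization mentioned in step 2.5, can be illustrated with a deliberately tiny toy model. The Python sketch below is only that: the two value systems "x" and "y", the four joint policies and all utility numbers are made up, and a grid search for the weight that equalizes gains relative to a no-compromise baseline is a crude stand-in for a proper fair bargaining solution.

```python
from statistics import mean, pstdev

# Hypothetical toy data: four joint policies, scored by two value systems.
# Index 0 is the no-compromise baseline. All numbers are made up.
POLICIES = ["baseline", "favor_x", "favor_y", "pooled"]
RAW_UTILITIES = {
    "x": [0.0, 4.0, 1.0, 3.0],
    "y": [0.0, 1.0, 4.0, 3.0],
}


def variance_normalize(values):
    """Rescale a utility function to zero mean and unit variance across policies."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


NORMALIZED = {name: variance_normalize(vals) for name, vals in RAW_UTILITIES.items()}


def best_policy(weight_x):
    """Steps 1 and 3: maximize the weighted sum of the normalized utilities."""
    def score(i):
        return weight_x * NORMALIZED["x"][i] + (1 - weight_x) * NORMALIZED["y"][i]
    return max(range(len(POLICIES)), key=score)


def gains(policy_index):
    """Gain of each value system relative to the baseline policy."""
    return {name: vals[policy_index] - vals[0] for name, vals in NORMALIZED.items()}


def fair_weight(steps=100):
    """Step 2: grid-search the weight whose optimal policy gives the most equal gains."""
    def gap(weight_x):
        g = gains(best_policy(weight_x))
        return abs(g["x"] - g["y"])
    return min((i / steps for i in range(steps + 1)), key=gap)


w = fair_weight()
chosen = best_policy(w)
print("weight on x:", w, "-> policy:", POLICIES[chosen], gains(chosen))
```

Even in this toy version, the fair weight can only be found by looking at which policy each candidate weighting would actually lead to, which hints at why step 2) becomes so demanding once world situations, comparative advantages and partial irrationality are added.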

How to trade, practically

Needless to say, the analysis above is much too impractical for humans to even attempt to approximate with steps of the same structure. So please don't even try!

Now, in order to produce actionable compromise plans, we have to come up with a simpler proposal. In the following, I’ll try to come up with a practical proposal that, if anything, tries to err on the side of being too simple. The idea is that if the practical proposal below seems promising, we gain confidence that implementing MSR in a way that incentivizes sufficiently many other potential participants to join is realistically feasible. Here is the proposal in very sketchy terms:

  1. Only include value systems in the MCUF that we can observe on earth. Preliminarily weight these value systems according to how many proponents of said value system seriously interested in MSR there are, and how influential these proponents are.
  2. Flatten the distribution of value systems from step 1 based on prior expectations of how represented a value system would be without founder effects or general path dependencies. This could be done based on survey data on people’s moral intuitions (perhaps also including the intuitions of people who are skeptical about superrationality).
  3. Figure out which interventions are particularly valuable for each value system, e.g. by communicating with its proponents, or checking/correcting their reasoning if you think you might be better-suited than them for drawing correct implications.
  4. For value systems whose prioritization one is not very familiar with, randomize and only spend time exploring their prioritization with some non-zero probability that seems appropriate (i.e., is sensible from an all-things-considered exploration vs. exploitation tradeoff and is proportional to how prevalent the value system is). Note that, if the randomization procedure made you explore the priorities of a particularly rare value system, you now are much more likely to have a comparative advantage at benefitting it.
  5. Adjust the interventions from above to make them more “positive-sum:” Deprioritize interventions where proponents of different value systems would be harming each other; adjust interventions to make them more beneficial for other value systems if the cost is low enough; adjust interventions to make them less harmful for other value systems if the cost is low enough; highlight interventions that are positive for many different value systems, etc. (The strong version of step 5 is to merge interventions together to make them more generally beneficial.)
  6. Think of interventions that are good for everyone (or most value systems at least) but not good enough to make it on anyone’s list.
  7. Get a rough sense, perhaps just intuition or based on some quick calculations, on which value systems lose a disproportionate amount of value in step 5, and take a note to give them extra weight. Also give extra weight to interventions that are positive for many different value systems as identified in step 5.
  8. Think about your comparative advantages as compared to other proponents of MSR at following all the (adjusted) interventions you got out of the previous steps. If one thing clearly sticks out as your comparative advantage amongst people interested in MSR, focus largely on that. If multiple interventions might plausibly be your comparative advantage, use randomization with weights that represent the weights from step 2 plus adjustments in step 6.
  9. Keep an eye out for low-effort ways to benefit value systems other than the ones you’re currently focusing on. Perhaps even institutionalize thinking about this by e.g. scheduling time every month where you randomly receive one value system and spend one hour thinking about how to benefit it, and if you have an idea that seems promising enough and is not prohibitively costly, follow through implementing it. (The idea is that this heuristic makes use of sharply diminishing returns for low-effort interventions.)
  10. If possible, coordinate with other proponents of MSR to allocate resources in a better-coordinated fashion. Make use of gains from scale by gathering people interested in MSR or generally “strong cooperation,” rank them in terms of various comparative advantages, and coordinate who focuses on which interventions (this makes it much easier to figure out what one’s coalition-wide comparative advantages are).
  11. Sanity check: Go through the heuristics Caspar lists in his MSR paper to see whether the procedure you are set on following has somehow led you to something crazy.

Note that step 5 also includes very general or “meta” interventions such as encouraging people who have not made up their minds on ethical questions to simply follow MSR rather than waste time with ethical deliberation.

Admittedly, the above proposal is vague in many of the steps and things often boil down to intuition-based judgment calls, which generates a lot of room for biases to creep in. It is not obvious that this procedure still generates gains from trade if we factor in all the ways in which it could go wrong.

However, if people genuinely try to implement a cooperation heuristic that is impartially best for the compromise overall, then biases that creep in should at least be equally likely to give too much or too little weight to any given value system. In other words: There is hope even if we expect to make a few mistakes (after all, normal, non-MSR consequentialism is far from easy either).
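
Some of the more mechanical parts of the proposal, namely the preliminary weighting in step 1, the flattening in step 2 and the randomization in steps 4 and 9, can be sketched in a few lines of Python. This is a rough illustration under assumptions of my own: the value-system names and all numbers are placeholders, and the mixing parameter `alpha` is a hypothetical way of moving the observed weights towards a survey-based prior.

```python
import random

# Hypothetical inputs; the value-system names and all numbers are placeholders.
# Step 1: proponents seriously interested in MSR, and a rough influence score.
OBSERVED = {
    "value_system_a": {"proponents": 60, "influence": 1.0},
    "value_system_b": {"proponents": 30, "influence": 1.0},
    "value_system_c": {"proponents": 10, "influence": 1.0},
}
# Step 2: shares one might expect without founder effects (e.g. from surveys).
PRIOR = {"value_system_a": 0.4, "value_system_b": 0.3, "value_system_c": 0.3}


def normalize(weights):
    """Rescale weights so that they sum to one."""
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def preliminary_weights(observed):
    """Step 1: weight by number of proponents times influence."""
    return normalize({n: d["proponents"] * d["influence"] for n, d in observed.items()})


def flatten(weights, prior, alpha=0.5):
    """Step 2: move the observed weights part of the way towards the prior.

    alpha is a hypothetical mixing parameter: 0 keeps the observed weights,
    1 replaces them with the prior.
    """
    return normalize({n: (1 - alpha) * weights[n] + alpha * prior[n] for n in weights})


def pick_value_system(weights, rng):
    """Steps 4 and 9: randomly pick a value system to explore, proportional to weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


weights = flatten(preliminary_weights(OBSERVED), PRIOR)
rng = random.Random(0)  # fixed seed so the sketch is reproducible
print(weights)
print("explore this month:", pick_value_system(weights, rng))
```

The judgment calls in the remaining steps (which interventions are valuable, how to make them more positive-sum, where one's comparative advantage lies) are exactly the parts that do not reduce to a few lines of code.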

The relationship between “causal” cooperation and MSR

Note that while causal interaction and cooperation with proponents of other value systems interested in MSR can be highly useful as an integral part of a sensible cooperation heuristic, this should however not be confused with thinking of these other people as “actual” MSR compromise partners. It is highly debatable whether one’s own decision-making is likely to be relevantly logically entangled with the decision-making of some humans on earth. Maybe the earth is not large enough for that. But whether this is the case or not, MSR certainly does not require it. Besides, even if such entanglement were likely, the possibility of checking up on whether others are in fact reciprocating the compromise may break the entanglement of decision algorithms (cf. the EDT slogan “ignorance is evidential power”). (Although note that decision theories that incorporate updatelessness might continue to cooperate even after observing the other party’s action, if the reasons from similarity of decision algorithms were strong enough initially.)

So the idea behind focusing on cooperating with proponents of other value systems that we know and can interact with is not that we are superrationally ensuring that no one defects in causal interactions. Rather, the idea is that, if MSR works, each party has rational reason to act as though they are correlated with agents in other parts of the multiverse, where defection in expectation hurts their own values. This is what ensures that there are no incentives to defect. If one were to defect, one may gain an unfair advantage locally in causal interactions with others, yet one loses all the benefits from MSR in other parts of the multiverse.

Note that this leaves the problem that agents can pretend to buy into MSR epistemically even though they may be highly skeptical of the idea. If one is confident that MSR would never work, one may be incentivized to lie about it and fake excitement. (Though I think this sounds like a terrible idea, both for the epistemic damage it would do to the community and for all the non-MSR arguments against naive consequentialism.)

Some open research questions

Overall I'm not convinced that MSR has strong action-guiding implications. To figure out whether we can trust the reasoning behind MSR, there are many things to potentially look into in more detail. Personally, I am particularly interested in the following questions:

  • Underlying assumptions: Is MSR based on the correct decision theoretical assumptions? What exactly are the things we want to be “relevantly similar” between us and other agents elsewhere in the multiverse in order for MSR to work? Are values and decision procedures distributed orthogonally among agents in the multiverse? Or does the overwhelming majority of copies of my own decision algorithm share my specific values? I expect progress on these questions to come from naturalized induction and decision theory, and from getting a better idea on how one should think about the multiverse (or different multiverse proposals).
  • Estimating the gains from trade: Whether it is warranted to change one’s actions to incorporate MSR considerations depends on both one’s credence in the underlying assumptions behind MSR being correct, and on the expected gains from trade in case the assumptions are correct (as well as the losses if not). It would be very valuable for people to think more about how large the potential gains from compromise would be for a well-executed cooperation heuristic such as “How to trade, practically.” Perhaps there are useful examples in the economics literature?
  • How to think about comparative advantages: How can we estimate whether a perceived comparative advantage for helping a given value system is strong enough or not? Are there prudential reasons for favoring erring on the side of conservatism vs. experimenting with comparative advantages, or maybe the other way around? Perhaps some of the research on portfolio approaches to global prioritization could be informative for MSR as well.
  • Descriptive ethics: What are people’s intuitions about cause-prioritization-relevant ethical questions and how would alternative histories have played out in cases where there were different founder effects for groups such as LessWrong or effective altruists?
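
The dependence described under “Estimating the gains from trade” above can be written down as a simple expected-value calculation. The Python sketch below assumes a bare-bones two-outcome model that is my own simplification: with some credence the assumptions behind MSR hold and acting on it yields a gain, otherwise it yields a loss. The function names and the numbers are placeholders.

```python
def expected_value_of_msr(credence, gain_if_correct, loss_if_wrong):
    """Expected value of acting on MSR in a simple two-outcome model."""
    return credence * gain_if_correct - (1 - credence) * loss_if_wrong


def break_even_credence(gain_if_correct, loss_if_wrong):
    """Credence above which acting on MSR has positive expected value."""
    return loss_if_wrong / (gain_if_correct + loss_if_wrong)


# Placeholder numbers, in units of value according to one's own value system.
gain, loss = 3.0, 1.0
threshold = break_even_credence(gain, loss)
print("break-even credence:", threshold)
for p in (0.1, threshold, 0.5):
    print("credence", p, "-> expected value", expected_value_of_msr(p, gain, loss))
```

The larger the potential gains are relative to the losses, the lower the credence in MSR's assumptions needs to be for acting on it to look worthwhile, which is why better estimates of the gains from trade would be so informative.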


Thanks to Caspar Oesterheld, Johannes Treutlein, Max Daniel, Tobias Baumann and David Althaus for comments and discussions that helped inform my thinking. I first heard the term “superrationalizing” be used in the context of Hofstadter’s superrationality by Carl Shulman.


  1. Caspar added an interesting comment:
    “There is, however, a wager for views of the is-strongly-correlated-with-me property that include more agents with my values. This is basically a generalization of the EDT wager. And then these broader views may also automatically include more agents with other value systems. (E.g., if view 1 says there are 10 correlated agents with my values and 0 correlated agents with other values and view 2 says that there are 1 million correlated agents with my values and 200k correlated agents with different values, then there is a strong wager for view 2 over view 1.)”
