Responses to apparent rationalist confusions about game / decision theory

I’ve encountered various claims about how AIs would approach game theory and decision theory that seem importantly mistaken. Some of these confusions probably aren’t a big deal on their own, and I’m definitely not the first to point out several of them, even publicly. But collectively, I think they add up to a common worldview that underestimates the value of technical work to reduce risks of AGI conflict. I expect smart agents will probably avoid catastrophic conflict overall; it’s just that the specific arguments for expecting this that I’m responding to here aren’t compelling (and seem overconfident).

For each section, I include in the footnotes some examples of the claims I’m pushing back on (or note whether I’ve primarily seen these claims in personal communication). This is not to call out those particular authors; in each case, they’re saying something that seems to be a relatively common meme in this community.

Summary:

  • The fact that conflict is costly for all the agents involved in the conflict, ex post, doesn’t itself imply AGIs won’t end up in conflict. Under their uncertainty about each other, agents with sufficiently extreme preferences or priors might find the risk of conflict worth it ex ante. (more)
  • Solutions to collective action problems, where agents agree on a Pareto-optimal outcome they’d take if they coordinated to do so, don’t necessarily solve bargaining problems, where agents may insist on different Pareto-optimal outcomes. (more)
  • We don’t have strong reasons to expect AGIs to converge on sufficiently similar decision procedures for bargaining, such that they coordinate on fair demands despite committing under uncertainty. Existing proposals for mitigating conflict given incompatible demands, while promising, face some problems with incentives and commitment credibility. (more)
  • The commitment races problem is not just about AIs making commitments that fail to account for basic contingencies. Updatelessness (or conditional commitments generally) seems to solve the latter, but it doesn’t remove agents’ incentives to limit how much their decisions depend on each other’s decisions (leading to incompatible demands). (more)
  • AIs don’t need to follow acausal decision theories in order to (causally) cooperate via conditioning on each other’s source code. (more)
  • Most supposed examples of Newcomblike problems in everyday life don’t seem to actually be Newcomblike, once we account for “screening off” by certain information, per the Tickle Defense. (more)
  • The fact that following acausal decision theories maximizes expected utility with respect to conditional probabilities, or counterfactuals with the possibility of logical causation, doesn’t imply that agents with acausal decision theories are selected for (e.g., acquire more material resources). (more)
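As a sketch of the source-code point above: two purely causal agents can reach mutual cooperation in a one-shot Prisoner’s Dilemma if each submits a program that can read the other’s source, in the spirit of Tennenholtz’s “program equilibrium.” The bot names, payoffs, and string-based model of “source code” below are all illustrative assumptions, not anything from the post.

```python
# A "program" is modeled as a pair: its source (here just a string) and its
# behavior, a function from (own source, opponent's source) to an action.
# Cooperation here is entirely causal: each bot's action depends only on
# the source text it can actually read.

PAYOFFS = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
           ("D", "C"): (5, 0), ("D", "D"): (1, 1)}  # Prisoner's Dilemma

def clique_rule(my_src, opp_src):
    # Cooperate iff the opponent's source is identical to mine.
    return "C" if opp_src == my_src else "D"

def defect_rule(my_src, opp_src):
    return "D"

clique_bot = ("cooperate iff opp_src == my_src", clique_rule)
defect_bot = ("always defect", defect_rule)

def play(bot1, bot2):
    (s1, f1), (s2, f2) = bot1, bot2
    return PAYOFFS[(f1(s1, s2), f2(s2, s1))]

print(play(clique_bot, clique_bot))  # (3, 3): mutual cooperation
print(play(clique_bot, defect_bot))  # (1, 1): mutual defection
```

Against a copy of itself the conditional cooperator gets the cooperative payoff, and it isn’t exploitable by an unconditional defector, all without any acausal decision theory.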

Ex post optimal ≠ ex ante optimal

...
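A toy numerical sketch of this distinction (all numbers are made up for illustration): an agent uncertain about its counterpart’s resolve can prefer a risky “greedy” demand in expectation, even though the conflict that demand risks is worse for everyone ex post than any peaceful split.

```python
# Toy bargaining model over a pie of size 1 (illustrative numbers).
# A "fair" demand is always accepted; a "greedy" demand is rejected by a
# "tough" opponent type, triggering conflict that is Pareto-dominated ex post.

P_TOUGH = 0.25         # assumed prior that the opponent rejects greedy demands
FAIR, GREEDY = 0.5, 0.8
CONFLICT_PAYOFF = 0.0  # ex post worse for both sides than any agreement

eu_fair = FAIR  # fair demand is always accepted
eu_greedy = (1 - P_TOUGH) * GREEDY + P_TOUGH * CONFLICT_PAYOFF

print(f"EU(fair)   = {eu_fair:.2f}")    # prints "EU(fair)   = 0.50"
print(f"EU(greedy) = {eu_greedy:.2f}")  # prints "EU(greedy) = 0.60"
```

With this prior, the greedy demand maximizes expected utility ex ante, so the mere fact that conflict is mutually costly ex post doesn’t rule it out; whether the risk is “worth it” depends on the agent’s preferences and priors.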

This post continues on LessWrong here.