Case studies of self-governance to reduce technology risk

Summary Self-governance occurs when private actors coordinate to address issues that are not obviously related to profit, with minimal involvement from governments and standards bodies. Historical cases of self-governance to reduce technology risk are rare. I find 6 cases that seem somewhat similar to AI development, including the actions of Leo Szilard and other physicists in 1939 and the 1975 Asilomar conference. The following factors seem to make self-governance efforts more likely to occur: Risks are salient The government looks like it might step in if private actors do nothing The field or industry is small Support from gatekeepers (like journals and large consumer-facing firms) Support from credentialed scientists. After the initial self-governance effort, governments usually step in to develop […]

Read more

Coordination challenges for preventing AI conflict

Summary In this article, I will sketch arguments for the following claims: Transformative AI scenarios involving multiple systems pose a unique existential risk: catastrophic bargaining failure between multiple AI systems (or joint AI-human systems). This risk is not sufficiently addressed by successfully aligning those systems, and we cannot safely delegate its solution to the AI systems themselves. Developers are better positioned than more far-sighted successor agents to coordinate in a way that solves this problem, but a solution also does not seem guaranteed. Developers intent on solving this problem can choose between developing separate but compatible systems that do not engage in costly conflict or building a single joint system. While the second option seems preferable from an altruistic perspective, […]

Read more

Collaborative game specification: arriving at common models in bargaining

Conflict is often an inefficient outcome to a bargaining problem. This is true in the sense that, for a given game-theoretic model of a strategic interaction, there is often some equilibrium in which all agents are better off than the conflict outcome. But real-world agents may not make decisions according to game-theoretic models, and when they do, they may use different models. This makes it more difficult to guarantee that real-world agents will avoid bargaining failure than is suggested by the observation that conflict is often inefficient.   In another post, I described the "prior selection problem", on which different agents having different models of their situation can lead to bargaining failure. Moreover, techniques for addressing bargaining problems like coordination on […]

Read more

Weak identifiability and its consequences in strategic settings

One way that agents might become involved in catastrophic conflict is if they have mistaken beliefs about one another. Maybe I think you are bluffing when you threaten to launch the nukes, but you are dead serious. So we should understand why agents might sometimes have such mistaken beliefs. In this post I'll discuss one obstacle to the formation of accurate beliefs about other agents, which has to do with identifiability. As with my post on equilibrium and prior selection problems, this is a theme that keeps cropping up in my thinking about AI cooperation and conflict, so I thought it might be helpful to have it written up. We say that a model is unidentifiable if there are several […]

Read more

Birds, Brains, Planes, and AI: Against Appeals to the Complexity / Mysteriousness / Efficiency of the Brain

[Epistemic status: Strong opinions lightly held, this time with a cool graph.] I argue that an entire class of common arguments against short timelines is bogus, and provide weak evidence that anchoring to the human-brain-human-lifetime milestone is reasonable. In a sentence, my argument is that the complexity and mysteriousness and efficiency of the human brain (compared to artificial neural nets) is almost zero evidence that building TAI will be difficult, because evolution typically makes things complex and mysterious and efficient, even when there are simple, easily understood, inefficient designs that work almost as well (or even better!) for human purposes. In slogan form: If all we had to do to get TAI was make a simple neural net 10x the […]

Read more