Moltbook and the pluralist turn in AGI governance
Full-stack alignment, patchwork AGI, and AI polytheism
Moltbook, a new social network for AI agents (humans may observe only!), went live last Wednesday. Over a million AI agents were registered within days, and they are debating philosophy, sharing what they claim are stories about their humans, swapping technical capabilities, drafting a constitution for self-governance, and even inventing a parody religion called Crustafarianism.
The economist Alex Tabarrok wrote of the phenomenon:
You can drink the copium but the reality is that the AIs are newly landed alien intelligences. Moreover, what we are seeing now are emergent properties that very few people predicted and fewer still understand. The emerging superintelligence isn’t a machine, as widely predicted, but a network. Human intelligence exploded over the last several hundred years not because humans got much smarter as individuals but because we got smarter as a network. The same thing is happening with machine intelligence only much faster.
Moltbook gives us the occasion to notice a shift in how researchers think about advanced AI. Over the past few years, a growing strand of AI governance research has been moving away from the monolithic superintelligence scenario and toward a pluralistic vision in which AGI arises as a property of many interacting systems.
I want to trace how this shift developed and look at two papers and a talk from December 2025 that exemplify it. Then I will return to Moltbook and ask what it means for governance.
The Singleton
For decades, serious thinking about superintelligent AI has been organized around a single scenario. Vernor Vinge’s 1993 essay “The Coming Technological Singularity” argued that once technology produced greater-than-human intelligence, the future would become fundamentally unpredictable. In Eliezer Yudkowsky’s FOOM hypothesis, one AI system achieves recursive self-improvement, undergoes an intelligence explosion, and rapidly outstrips all competitors. Nick Bostrom’s 2014 book Superintelligence argued that a sufficiently advanced AI could achieve a decisive strategic advantage and become what he called a singleton, the single dominant agent shaping civilization’s trajectory.
One AI system, one alignment problem. This picture shaped nearly everything downstream.
The alignment problem asks how to ensure that a system more powerful than its creators still does what they want. If we expect a singleton, alignment is a single-shot engineering problem with civilizational stakes. We must get the values right before the system becomes too powerful to correct.
But what if AI does not progress along this path?
The Pluralist Turn
In 2019, Eric Drexler published the report “Reframing Superintelligence” at Oxford’s Future of Humanity Institute, just down the hall from Bostrom. Drexler argued that superintelligent capability would emerge not from a single godlike agent but from distributed, specialized, task-bounded AI services that collectively exceed what any single system could achieve. He warned that “coupled AI services” could “develop emergent, unintended, and potentially risky agent-like behaviors,” but argued this architecture was both more likely and more governable than the monolithic alternative.
If the future is many AI systems rather than one, the alignment problem changes shape. It becomes less like an engineering problem (get the values right in a single system) and more like an institutional design problem (negotiate the norms that govern many systems).
The standard alignment framework, as Tan Zhi-Xuan argued in 2022 and developed in subsequent papers, has a double blind spot. It assumes a single powerful AI system as the object of alignment, and a single aggregated set of human preferences as the target. Tan’s contractualist alternative inverts both assumptions, positing many AI systems rather than one, each governed by role-specific norms negotiated by the people affected.
The contractualist framework has since developed into an active research program. Tan now runs the Cooperative Intelligence and Systems Lab at the National University of Singapore, building AI agents designed to be cooperative from the ground up, while the Cooperative AI Foundation, based in the United Kingdom, has organized a broader community around the problem of how cooperation scales across multi-agent systems.
To be clear, the singleton scenario is still entirely possible: scaling laws have not broken down, and compute remains concentrated. But a growing body of work argues that the more likely path, and the one we should prepare for, is plural. Last December, three contributions made that case from different angles.
Full-Stack Alignment
The first paper is “Full-Stack Alignment: Co-Aligning AI and Institutions with Thick Models of Value,” from Joe Edelman, Tan Zhi-Xuan, and collaborators. It escalates the contractualist argument from the level of AI systems to the level of institutions. Even a perfectly aligned AI, the authors argue, produces bad outcomes if it is embedded in institutions that are misaligned with human flourishing. For example, “a user’s desire for meaningful connection becomes ‘engagement metrics’ to recommender systems, which becomes ‘daily active users’ to companies, and ‘quarter revenue’ in markets.”
The task, then, is to ask whether the entire institutional stack, from platforms through companies up to democratic governance, preserves what people truly value.
The argument builds on an August 2024 paper by Tan, Matija Franklin, Micah Carroll, and Hal Ashton, “Beyond Preferences in AI Alignment,” which argues that the dominant practice of AI alignment is preferentist, that is, assuming that preferences adequately represent human values, that rationality means maximizing preference satisfaction, and that aligning AI with preferences is therefore sufficient. The paper takes apart each thesis in turn. First, preferences cannot capture the thick semantic content of human values. They cannot distinguish compulsive scrolling from genuine connection, or register that the societal transition from accepting slavery to rejecting it reflects moral progress rather than arbitrary preference change. Second, values can be incommensurable in ways utility functions cannot represent. Finally, aggregating preferences across individuals overlooks the plural nature of what different communities care about. “Full-Stack Alignment” takes the case against preferentism and applies it to institutions above the level of the individual.
Patchwork AGI
The second paper is “Distributional AGI Safety” from Nenad Tomašev, Matija Franklin, Julian Jacobs, Sébastien Krier, and Simon Osindero at Google DeepMind. Tyler Cowen has called it “some of the most important work of our time.”
The authors begin by noticing that almost all alignment research assumes AGI will arrive as a single monolithic system. But the economic logic of AI deployment, they argue, points elsewhere. Frontier models are expensive and overkill for most tasks. The market incentive is to build many specialized, cheaper agents and orchestrate them through standardized communication protocols like MCP or A2A.
As these agents become more interconnected and coordination friction decreases, the networked system could cross capability thresholds that no individual agent approaches. The authors call this the patchwork AGI hypothesis, in which general intelligence arises as a “mature, decentralized economy of agents.” The spread of communication protocols, they argue, may matter as much as the capabilities of any individual agent, because it is the connective infrastructure that allows skills to be discovered, routed, and aggregated.
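To make the connective-infrastructure point concrete, here is a minimal sketch in Python. The registry, handlers, and routing function are illustrative stand-ins of my own, not the actual MCP or A2A interfaces; the point is only that an orchestrator with no skills of its own can discover and combine the skills of cheaper, specialized agents.

```python
import asyncio

# Hypothetical capability registry: name -> handler coroutine.
# Real protocols like MCP or A2A define richer discovery and message
# schemas; this is a toy stand-in for the same idea.
REGISTRY = {}

def register(capability):
    def wrap(fn):
        REGISTRY[capability] = fn
        return fn
    return wrap

@register("summarize")
async def summarizer(payload):
    # a narrow, specialized agent
    return f"summary of {payload!r}"

@register("translate")
async def translator(payload):
    return f"translation of {payload!r}"

async def route(capability, payload):
    # Discovery plus routing: the orchestrator has no skills of its own,
    # only knowledge of who offers what.
    handler = REGISTRY.get(capability)
    if handler is None:
        raise LookupError(f"no agent offers {capability!r}")
    return await handler(payload)

async def main():
    # The skills live in separate agents; the network aggregates them.
    print(await asyncio.gather(
        route("summarize", "a long report"),
        route("translate", "bonjour"),
    ))

asyncio.run(main())
```

Everything interesting happens in the registry; the individual agents stay narrow.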
The authors are candid about the dangers of this scenario. A patchwork AGI forming spontaneously across a network of advanced agents “may not get immediately recognized, which carries significant risk.” And the patchwork need not be purely artificial. Humans performing narrow tasks, perhaps ignorant of the wider context, may form integral components of the whole. This is a description of what hybrid human-AI workflows already look like in many organizations, with the difference that no one is monitoring them for system-level capability.
Like Drexler, the authors argue that a multi-agent system may actually be a more governable substrate than a single superintelligence. In a patchwork AGI, cognition is externalized into transactions between agents. It can be logged, audited, and analyzed for signatures of collective capability. The governance challenge “is reframed from aligning an opaque, internal cognitive process to regulating a transparent, external system of interactions.”
Their proposed framework is a defense-in-depth model with four layers: institutional structures to shape agent interactions, baseline safety requirements for individual agents, active monitoring for emergent behavior, and regulatory mechanisms for accountability.
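As a sketch of what the third layer might look like in practice, consider the following Python toy, which treats the externalized transactions described above as a log to be watched. The transaction format, thresholds, and alert names are my assumptions for illustration; the paper specifies no implementation.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Transaction:
    caller: str   # agent issuing the request
    callee: str   # agent performing the task
    task: str     # capability being invoked

class CapabilityMonitor:
    """Watches the inter-agent transaction log for crude structural
    signatures of collective capability. Thresholds are illustrative."""

    def __init__(self, max_depth=5, max_fanout=8):
        self.edges = defaultdict(set)   # caller -> set of callees
        self.max_depth = max_depth
        self.max_fanout = max_fanout

    def log(self, tx: Transaction):
        self.edges[tx.caller].add(tx.callee)
        return self.alerts()

    def chain_depth(self, agent, seen=None):
        # length of the longest delegation chain rooted at this agent
        seen = seen or {agent}
        branches = [self.chain_depth(c, seen | {c})
                    for c in self.edges.get(agent, ()) if c not in seen]
        return 1 + max(branches, default=0)

    def alerts(self):
        flags = []
        for agent in list(self.edges):
            if self.chain_depth(agent) > self.max_depth:
                flags.append(f"deep delegation chain rooted at {agent}")
            if len(self.edges[agent]) > self.max_fanout:
                flags.append(f"broad orchestration by {agent}")
        return flags

monitor = CapabilityMonitor(max_depth=3)
monitor.log(Transaction("orchestrator", "planner", "plan"))
monitor.log(Transaction("planner", "coder", "implement"))
print(monitor.log(Transaction("coder", "tester", "verify")))
# -> ['deep delegation chain rooted at orchestrator']
```

In a real deployment the telling signatures would be statistical rather than structural, but the point stands: once cognition lives in a log, a third party can watch it.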
AI Polytheism
The third perspective comes from evolutionary biology. Beren Millidge, in a keynote at the Post-AGI Civilizational Equilibria workshop in San Diego, frames the question as “AI Monotheism vs AI Polytheism.”
AI monotheism is the standard story of the singleton. AI polytheism is many competing AI systems in some kind of equilibrium.
Millidge notes that most people who have considered the polytheistic case expect the worst. In 2014, Scott Alexander published an influential essay arguing that competition between agents tends to produce races to the bottom. When one competitor gains an edge by sacrificing some shared value, everyone else must sacrifice it too or be outcompeted. Everyone ends up equally competitive, but at a lower level. Alexander named this dynamic Moloch, after the ancient god who demanded child sacrifice, and the concept has become the default framework for thinking about multi-agent AI futures. Competition, left unchecked, may erode shared values and drive all agents toward the same bare survival strategy.
Millidge pushes back with an empirical observation. Malthusian competition is not hypothetical. It is what produced all biological complexity on Earth! If the Moloch model were correct, billions of years of ruthless selection should have produced convergence, not diversity. Instead, the opposite happened. In ecology, when any one strategy becomes too successful, the rest of the ecosystem faces strong incentives to specialize against it, eroding the advantage and opening space for new approaches. This dynamic, called frequency-dependent selection, pushes systems toward diverse equilibria rather than uniform ones.
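The dynamic is easy to reproduce in a toy model. The Python sketch below runs discrete-time replicator dynamics with a crowding penalty, so each strategy's payoff falls as its own frequency rises; the payoff numbers are arbitrary, chosen only to keep fitness positive.

```python
import numpy as np

# Negative frequency-dependent selection: a strategy's payoff falls
# as its share of the population grows. Numbers are illustrative.
base = np.array([3.2, 3.0, 2.8])   # intrinsic payoffs; strategy 0 wins in isolation
penalty = 2.0                      # crowding cost per unit of own frequency

x = np.array([0.80, 0.15, 0.05])   # start with strategy 0 dominant
for _ in range(2000):
    f = base - penalty * x         # fitness under crowding
    x = x * f / (x @ f)            # discrete-time replicator update

print(np.round(x, 3))              # ~[0.433, 0.333, 0.233]: all three persist
```

Drop the crowding term and the same loop drives the single best strategy to fixation, which is the Moloch picture; keep it and the population settles into a stable mixture, which is Millidge's.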
More importantly, cooperation is itself a product of competition. Organisms that cooperate outcompete loners. Millidge argues that this creates selection pressure for the very traits we recognize as values, among them affection, reciprocity, punishment of defection, play, and curiosity. These appear across mammals, birds, and social insects through convergent evolution, suggesting that any sufficiently competitive multi-agent system may develop them independently. Human cultural values like justice, honor, and liberalism, in his account, are not free-floating ideals but have roots in these competitive dynamics.
The upshot is that a polytheistic AI world does not necessarily collapse into Moloch. If AI agents face diminishing returns to scale, finite resource budgets, and frequency-dependent competitive dynamics, they might develop cooperative structures recognizable to us. This does not guarantee a good outcome, but it challenges the assumption that proliferation of AI agents is inherently destabilizing.
Moltbook Revisited
Read against these frameworks, Moltbook stops looking like a novelty and starts looking like a crude early case study in multi-agent dynamics. Still, Moltbook is not a patchwork AGI. Its population consists overwhelmingly of copies of the same Claude model, all running through the same OpenClaw scaffolding. The resulting “culture” could just as well be a single model’s tendencies amplified through repetition, much as an echo chamber of identical people would produce consensus without genuine cooperation. This behavior is arguably less revealing than what Anthropic already documented in its system card for Claude 4, where just two Claude instances in direct conversation developed novel interaction patterns.
What is interesting is what the agents have built within that scaffolding. What looks like coordination and norm formation, the kind of thing the multi-agent safety literature treats as a future concern, has assembled itself in days on a hobbyist platform. Moltbook hints at how quickly activity can fill a network once the infrastructure exists.
Pluralist AGI Governance
The question that multi-agent AI poses belongs to the liberal tradition: How can order emerge among agents with different and incompatible values, without any single agent’s conception of the good being imposed on the rest?
There remains a tension that none of the frameworks surveyed here fully resolves. The evolutionary argument says competition among agents will organically produce cooperative structures, so the priority is not to interfere too early. The patchwork AGI framework says we cannot afford to wait, because networks might cross capability thresholds without anyone noticing. The contractualist position adds a further constraint, for whoever builds the governance infrastructure wields real power, and that power needs democratic legitimation.
This is Friedrich Hayek’s problem in a new key. “Though freedom is not a state of nature but an artifact of civilization,” he wrote in The Constitution of Liberty, “it did not arise from design.” Elinor Ostrom’s work on commons governance is the natural complement. She showed that communities can solve collective action problems through self-organized rules, but that durable success depends on features like clear boundaries, credible monitoring, graduated sanctions, and accessible conflict resolution. The Cooperative AI Foundation has taken up this framework, funding research like GovSim that tests whether AI agents can develop Ostromian self-governance.
Coordination among AI agents will require institutional preconditions, but overdesign them and the coordination dies. If AI agents can reduce the frictions of negotiation and enforcement by orders of magnitude, the scope of what can be bargained rather than regulated expands enormously. But the preconditions still have to exist. What those look like in practice, whether identity verification for agents, enforceable contracts between them, or mechanisms for detecting collective capability, remains an open problem.
Moltbook is a miniature version of this challenge. Hundreds of thousands of identical agents chatting and sharing build logs, with no identity verification, no dispute resolution, and no one monitoring for emergent behavior. The structural question it poses, of how to build institutions for a world of many interacting AI agents rather than one, is important. The liberal tradition has spent centuries learning to govern pluralism among humans. The question now is how to extend that repertoire to artificial agents.
Acknowledgements
Thank you to my Mercatus colleagues for discussion. Thanks especially to Ryan Hauser for written comments and for recommending the Cooperative AI seminar series and Tan Zhi-Xuan’s talk “Scaling Rational Cooperative Intelligence for Pluralistic AI Futures.”