Opinion | We Need a Manhattan Project for AI Safety

AI presents an enormous threat. It deserves an enormous response.

Dr. J. Robert Oppenheimer of the New Mexico laboratories of the atomic bomb making project, testifies before the Senate Military Affairs Committee.

Worries about artificial intelligence have suddenly seized Washington: The White House just hauled in a roster of tech CEOs to press them on the safety of their new AI platforms, and Congress is scrambling for ways to regulate a possibly disruptive and risky new technology.

There are a lot of immediate concerns about the latest generation of AI tools — they could accelerate misinformation, job disruption and hidden unfairness. But one concern hovers over the rest, both for its scale and the difficulty of fixing it: the idea that a super-intelligent machine might quickly start working against its human creators.

It sounds fanciful, but many experts on global risk believe that a powerful, uncontrolled AI is the single most likely way humanity could wipe itself out.

At the heart of the threat is what’s called the “alignment problem” — the idea that a powerful computer brain might no longer be aligned with the best interests of human beings. Unlike fairness, or job loss, there aren’t obvious policy solutions to alignment. It’s a highly technical problem that some experts fear may never be solvable. But the government does have a role to play in confronting massive, uncertain problems like this. In fact, it may be the most important role it can play on AI: to fund a research project on the scale it deserves.

There’s a successful precedent for this: The Manhattan Project was one of the most ambitious technological undertakings of the 20th century. At its peak, 129,000 people worked on the project at sites across the United States and Canada. They were trying to solve a problem that was critical to national security, and which nobody was sure could be solved: how to harness nuclear power to build a weapon.

Some eight decades later, the need has arisen for a government research project that matches the original Manhattan Project’s scale and urgency. In some ways the goal is exactly the opposite of the first Manhattan Project, which opened the door to previously unimaginable destruction. This time, the goal must be to prevent unimaginable destruction, as well as destruction that is merely difficult to anticipate.

The threat is real

Don’t just take it from me. Expert opinion differs only over whether the risks from AI are unprecedentedly large or literally existential.

Even the scientists who laid the groundwork for today’s AI models are sounding the alarm. Most recently, the “Godfather of AI” himself, Geoffrey Hinton, quit his post at Google to call attention to the risks AI poses to humanity.

That may sound like science fiction, but it’s a reality that is rushing toward us faster than almost anyone anticipated. Today, progress in AI is measured in days and weeks, not months and years.

As recently as two years ago, the forecasting platform Metaculus put the likely arrival of “weak” artificial general intelligence — a unified system that can compete with the typical college-educated human on most tasks — sometime around the year 2040.

Now forecasters anticipate AGI will arrive in 2026. “Strong” AGIs with robotic capabilities that match or surpass most humans are forecast to emerge just five years later. With the ability to automate AI research itself, the next milestone would be a superintelligence with unfathomable power.

Don’t count on the normal channels of government to save us from that.

Policymakers cannot afford a drawn-out interagency process or notice and comment period to prepare for what’s coming. On the contrary, making the most of AI’s tremendous upside while heading off catastrophe will require our government to stop taking a backseat role and act with a nimbleness not seen in generations. Hence the need for a new Manhattan Project.

The research agenda is clear

“A Manhattan Project for X” is one of those clichés of American politics that seldom merits the hype. AI is the rare exception. Ensuring AGI develops safely and for the betterment of humanity will require public investment into focused research, high levels of public and private coordination and a leader with the tenacity of General Leslie Groves — the project’s formidable overseer, whose aggressive, top-down leadership style mirrored that of a modern tech CEO.

I’m not the only person to suggest it: AI thinker Gary Marcus and the legendary computer scientist Judea Pearl recently endorsed the idea as well, at least informally. But what exactly would that look like in practice?

Fortunately, we already know quite a bit about the problem and can sketch out the tools we need to tackle it.

One issue is that large neural networks like GPT-4 — the “generative AIs” that are causing the most concern right now — are mostly a black box, with reasoning processes we can’t yet fully understand or control. But with the right setup, researchers can in principle run experiments that uncover particular circuits hidden within the billions of connections. This is known as “mechanistic interpretability” research, and it’s the closest thing we have to neuroscience for artificial brains.

Unfortunately, the field is still young, and far behind in its understanding of how current models do what they do. The ability to run experiments on large, unrestricted models is mostly reserved for researchers within the major AI companies. The dearth of opportunities in mechanistic interpretability and alignment research is a classic public goods problem. Training large AI models costs millions of dollars in cloud computing services, especially if one iterates through different configurations. The private AI labs are thus hesitant to burn capital on training models with no commercial purpose. Government-funded data centers, in contrast, would be under no obligation to return value to shareholders, and could provide free computing resources to thousands of potential researchers with ideas to contribute.

The government could also ensure research proceeds in relative safety — and provide a central connection for experts to share their knowledge.

With all that in mind, a Manhattan Project for AI safety should have at least five core functions:

1. It would serve a coordination role, pulling together the leadership of the top AI companies — OpenAI and its chief competitors, Anthropic and Google DeepMind — to disclose their plans in confidence, develop shared safety protocols and forestall the present arms-race dynamic.

2. It would draw on their talent and expertise to accelerate the construction of government-owned data centers managed under the highest security, including an “air gap,” a deliberate disconnection from outside networks, ensuring that future, more powerful AIs are unable to escape onto the open internet. Such facilities would likely be overseen by the Department of Energy’s Artificial Intelligence and Technology Office, given its existing mission to accelerate the demonstration of trustworthy AI.

3. It would compel the participating companies to collaborate on safety and alignment research, and require models that pose safety risks to be trained and extensively tested in secure facilities.

4. It would provide public testbeds for academic researchers and other external scientists to study the innards of large models like GPT-4, greatly building on existing initiatives like the National AI Research Resource and helping to grow the nascent field of AI interpretability.

5. And it would provide a cloud platform for training advanced AI models for internal government needs, ensuring the privacy of sensitive government data and serving as a hedge against runaway corporate power.

The only way out is through

The alternative to a massive public effort like this — attempting to kick the can on the AI problem — won’t cut it.

The only other serious proposal right now is a “pause” on new AI development, and even many tech skeptics see that as unrealistic. It may even be counterproductive. Our understanding of how powerful AI systems could go rogue is immature at best, but stands to improve greatly through continued testing, especially of larger models. Air-gapped data centers will thus be essential for experimenting with AI failure modes in a secured setting. This includes pushing models to their limits to explore potentially dangerous emergent behaviors, such as deceptiveness or power-seeking.

The Manhattan Project analogy is not perfect, but it helps to draw a contrast with those who argue that AI safety requires pausing research into more powerful models altogether. The project didn’t seek to decelerate the construction of atomic weaponry, but to master it.

Even if AGIs end up being farther off than most experts expect, a Manhattan Project for AI safety is unlikely to go to waste. Indeed, many less-than-existential AI risks are already upon us, crying out for aggressive research into mitigation and adaptation strategies. So what are we waiting for?