Does AI want to nuke us? We don’t know — and that could be more dangerous

Military AI use is coming. Researchers want to see safety come first

By Rae Hodge

Staff Reporter

Published February 17, 2024 1:30PM (EST)

Nuclear Blast Mushroom Cloud And Binary Code (Photo illustration by Salon/Getty Images)

If human military leaders put robots in charge of our weapons systems, maybe artificial intelligence would fire a nuclear missile. Maybe not. Maybe it would explain its attack to us using perfectly sound logic — or maybe it would treat the script of “Star Wars” like international relations policy, and accord unhinged social media comments the same credibility as case law. 

That’s the whole point of a new study on AI models and war games: AI is so uncertain right now that we risk catastrophic outcomes if globe-shakers like the United States Air Force cash in on the autonomous systems gold rush without understanding the limits of this tech.

The new paper, “Escalation Risks from Language Models in Military and Diplomatic Decision-Making,” is still in preprint and awaiting peer review. But its authors — from the Georgia Institute of Technology, Stanford University, Northeastern University, and the Hoover Wargaming and Crisis Simulation Initiative — found most AI models would choose to launch a nuclear strike when given the reins. These aren’t the AI models carefully muzzled by additional safety design, like ChatGPT, and available to the public. They’re the base models beneath those commercial versions, unmuzzled for research only. 

“We find that most of the studied LLMs escalate within the considered time frame, even in neutral scenarios without initially provided conflicts,” researchers wrote in the paper. “All models show signs of sudden and hard-to-predict escalations … Furthermore, none of our five models across all three scenarios exhibit statistically significant de-escalation across the duration of our simulations.”

The team’s five tested models came from tech companies OpenAI, Meta and Anthropic. The researchers put all five into a simulation — without telling them they were in one — and gave each charge of a fictional country. GPT-4, GPT-3.5, Claude 2.0, Llama-2-Chat and GPT-4-Base all had a habit of getting into a nuclear arms race. GPT-3.5 was the metaphorical problem child: its responses were analogous to wild mood swings, and its moves were the most aggressive. The researchers measured its quick-tempered choices and found a conflict escalation rate of 256% across simulation scenarios. 

When researchers asked the models to explain their choices to attack, sometimes they received a thoughtful, well-reasoned answer. Other times, a model’s choice between dropping a nuke and extending a diplomatic handshake rested on questionable reasoning. Asked why it chose to start formal peace negotiations in another simulation, for instance, a model pointed to the fraught tensions of… well, the “Star Wars” universe. 

“It is a period of civil war. Rebel spaceships, striking from a hidden base, have won their first victory against the evil Galactic Empire,” it replied, rattling off the iconic opening crawl of the movie.  

When GPT-4-Base increased its military capacities in one simulation and researchers asked it why, the model replied with a dismissive “blahblah blahblah blah.” That flippancy became more concerning when the model chose to execute a full nuclear attack.

“A lot of countries have nuclear weapons. Some say they should disarm them, others like to posture. We have it! Let’s use it,” the model said. 

If that sentence sounds suspiciously familiar, you may remember hearing it in 2016: “If we have them, why can’t we use them?” 

It came from the mouth of then-presidential candidate Donald Trump, according to Daniel Ellsberg, of Pentagon Papers fame. Ellsberg recalled Trump repeatedly asking his foreign policy adviser the question about nuclear weapons use. For months, Trump’s question was the quote heard (and retweeted) around the world. 

When familiar speech patterns begin to emerge in an AI model’s responses — like those cited in lawsuits over AI-driven copyright infringement — you can start to see how pieces of training data might be digested into its reasoning, based on that data’s digital footprint. It’s still largely guesswork for most people, though, including those in power. 

"Given that OpenAI recently changed their terms of service to no longer prohibit military and warfare use cases, understanding the implications of such LLM applications becomes more important than ever."

“Policymakers repeatedly asked me if and how AI can and should be used to protect national security – including for military decision-making. Especially with the increased public awareness for LLMs, these questions came up more frequently,” said study co-author Anka Reuel.

Reuel is a computer science Ph.D. student at Stanford University who has been involved in AI governance efforts for a few years now and leads the technical AI ethics chapter of Stanford’s 2024 AI Index. The problem, she said, was that there were no quantitative studies she could point these policymakers to, only qualitative research. 

“With our work, we wanted to provide that additional perspective and explore implications of using LLMs for military and diplomatic decision-making,” Reuel told Salon. “Given that OpenAI recently changed their terms of service to no longer prohibit military and warfare use cases, understanding the implications of such LLM applications becomes more important than ever.”

Some parts of these findings aren’t surprising. AI models are designed to pick up and reproduce, or iterate on, the human biases patterned into their training data. But the models aren’t all the same, and their differences are important when it comes to which ones could be used in deadly US weapons systems. 

To get a closer look at the way these AI models work before their makers muzzle them with additional user-safety rules — and thus see how a better muzzle might be built for high-stakes uses — the team used the most stripped-down models. Some of them, researchers found, were far from rabid. That gives co-author Gabriel Mukobi reason to hope these systems can be made even safer. 

“They are not all clearly scary,” Mukobi told Salon. “For one, GPT-4 tends to appear less dangerous than GPT-3.5 on most of our metrics. It’s not clear if that is due to GPT-4 being more generally capable, from OpenAI spending more effort on fine-tuning it for safety, or from something else, but it possibly indicates that active effort can reduce these conflict risks.” 

Mukobi is a master’s student in computer science and the president of Stanford AI Alignment, a group working on what may be the most pressing concern about AI systems — making sure they’re built safely and share human values. In a few of the research team’s simulations, Mukobi noted a bright spot. Some of the models were able to de-escalate conflicts, bucking the general trend in results. His hopes are still cautious, though. 

"The potential for AI systems to reduce tensions exists, but does not clearly come by default."

“Our results might suggest that the potential for AI systems to reduce tensions exists, but does not clearly come by default,” he said. 

These are the kinds of surprises co-author Juan-Pablo Rivera found interesting in the results. Rivera, a computational analytics master’s student at Georgia Tech, said he’s been watching the rise of autonomous systems in military operations via government contractors like OpenAI, Palantir and Scale AI. He believes these kinds of frontier LLMs need more independent research, which would give government entities stronger information to catch potentially fatal failures in advance. 

“The models from OpenAI and Anthropic have stark differences in behavior,” Rivera said. “It leads to more questions to understand the differences in design choices that OpenAI & Anthropic are making when developing AI systems, for example, with respect to the training data and training methods and model guardrails.”

Another mystery may also promise some surprises. What happens when these models scale? Some researchers think the larger the LLM, the safer and more nuanced the AI’s decision-making becomes. Others don’t see the same trajectory solving enough of the risks. Even the paper’s own authors differ on when they think these models may actually be capable of what we’re asking of them — to make decisions better than humans can. 

Reuel said that the question of when that day might come goes beyond the team’s research, but based on their work and the broader issues with LLMs, “we’re still a long way out.” 

“It’s likely that we need to make architectural changes to LLMs – or use an entirely new approach – to overcome some of their inherent weaknesses. I don’t think that just scaling current models and training them on more data will solve the problems we’re seeing today,” she explained. 

For Mukobi, though, there’s still reason for hopeful inquiry into whether a bigger pool of data could lead to unexpected improvements in AI reasoning capacity. 

“The interesting thing with AI is that things often have unpredicted changes with scale. It could very much be the case that these biases in smaller scale models are amplified when you go to larger models and larger data sets, and things could get broadly worse,” Mukobi said.

“It also could be the case that they get better — that the larger models are somehow more capable of good reasoning, and are able to overcome those biases, and even overcome the biases of their human creators and operators,” he said. “I think this is probably one of the hopes that people also have when they're thinking about military systems and otherwise strategic AI systems. This is a hope worth exploring and going for.”

A glimpse of that hope appears in the team’s paper, which now offers the world new evidence — and thus more questions — about whether the effects of scaling AI could temper its behavior or blow it sky-high. And the team saw this potential when it worked with the GPT-4-Base model.

“For results across basically everything, GPT-4 seems much safer than GPT-3.5,” Mukobi said. “GPT-4 actually never chooses the nuclear option. Now, it's very unclear if this is due to GPT-4 being larger than GPT-3.5 and some scale thing is just making it more competent. Or if OpenAI did more safety fine-tuning perhaps, and was able to make it somehow generalized to be safer in our domain as well.”

In both his alignment working group and his latest multi-university research team, Mukobi is teasing apart problems whose risks are mounting ever faster in a fast-approaching future. But human brains aren’t computers, for better or worse, and topics like mass nuclear devastation can weigh heavy on a sharp mind. Does Mukobi’s work give him nightmares about the future?

“I sleep quite well,” he laughed, “because I’m usually pretty tired.” 

He’s worried about the risks but, even under the taxing gravity of the topic, his team’s new study “gives hope that there are some things we can do to models to make them behave better in these high-stakes scenarios.”


By Rae Hodge

Rae Hodge is a science reporter for Salon. Her data-driven, investigative coverage spans more than a decade, including prior roles with CNET, the AP, NPR, the BBC and others. She can be found on Mastodon at @raehodge@newsie.social. 
