Twenty years ago, a young artificial intelligence researcher named Eliezer Yudkowsky ran a series of low-stakes thought experiments with fellow researchers on Internet Relay Chat servers.

He set up the experiments as a game with a simple objective: to keep an artificial intelligence system in an imaginary box that limited its capabilities. Computing power was growing enormously back then, and tech observers were concerned that a superintelligent computer program might escape whatever boundaries its programmers had built, then seize control of other programs and machines in a Terminator-like power move.

In these games, Yudkowsky played the role of a computer program that had become sentient enough to reason with its creators. His objective was to escape the box using only simple, logical arguments. His adversaries took on various identities—sometimes Yudkowsky asked them to behave as if they had programmed the AI; other times they were instructed to act as the AI gatekeepers tasked with keeping it contained. No matter the role, they were not to allow the AI to escape.

To incentivize the players, Yudkowsky offered a small financial prize. “We’ll talk for at least two hours. If I can’t convince you to let me out, I’ll PayPal you $10,” he told Nathan Russell, then a computer science sophomore at the University at Buffalo, in 2002. Yudkowsky performed several versions of this experiment, and while he has never revealed the details of the gameplay, he says he was able to escape about 60 percent of the time. It was a worrisomely high rate. After another one of the experiments ended, the player in that attempt, David McFadzean, sent a message to the user group that had been following along. “I let the transhuman AI out of the box,” he wrote.

Yudkowsky would go on to cofound the Singularity Institute for Artificial Intelligence, which is now called the Machine Intelligence Research Institute. The Berkeley, California, nonprofit is dedicated to understanding the mathematical underpinnings of AI. There, his work focuses on ensuring that any smarter-than-human program has a positive impact on humanity. That focus has made him a leading voice among a growing number of computer scientists and artificial intelligence researchers who worry that superintelligent AI may develop the ability to think and reason on its own, eventually acting in accord with its own needs rather than those of its creators. When that happens, they say, humanity will be in peril. Working with complex math formulas and computational theories, advocates for safe AI aim to understand how the powerful programs we refer to as AI might run amok. They also suggest ways to contain them or, put another way, to build a digital box that AI cannot escape from.

Some, like Yudkowsky, favor developing programs that are aligned with human values. Others insist on tighter restrictions and stronger safeguards. And an increasing number argue for slowing or stopping development of AI tools until acceptable regulations are in place. This spring, more than 27,000 computer scientists, researchers, developers, and other tech watchers signed an open letter that asked companies to stop “giant AI experiments” until AI labs developed shared safety protocols. In April, Google’s AI expert Geoffrey Hinton left the company to warn about the dangers of the technology.

Whatever strategy the proponents of AI containment favor, they are running short on time. Powerful tools such as ChatGPT, which operate on huge databases of information called large-language models, already show sparks of intelligence. Some machine learning experts now predict that we could reach singularity—the moment when computers become equal to or surpass human intelligence—within the next decade.

Marco Trombetti, the CEO of Translated, a computer-aided translation company based in Rome, is one of the computer scientists who thinks singularity is approaching faster than we can prepare for it.

In September 2022, Trombetti stood in front of his peers at the Association for Machine Translation in the Americas conference and told the computer scientists and machine learning experts in attendance what many already sensed—that machine learning was rapidly becoming more powerful than anyone had expected.

Trombetti’s company provides computer-assisted translations of text using an open-source product called Matecat. Over the past eight years, translators have used the product to create more than 2 billion translations. Trombetti became interested in the speed of those translations, and the data he gathered held a revelation: AI was getting smarter. On the conference stage, he revealed a graph built on the data showing how long it took for humans to edit translations made by his Matecat program. From 2014 to 2022, the time translators needed to edit those machine translations steadily dropped, from about three seconds per word to less than two. The computer algorithms, Trombetti said, were rapidly increasing in power, accuracy, and their ability to understand language.

Extrapolating beyond 2023, the line continued to fall until sometime around 2027, when it hit one second per word, a milestone signaling that computer programs could understand human language as well as their human programmers could. To the computer scientist who had spent his career training computer programs, it meant that he might be outsmarted by AI before the end of the decade. Singularity, he told the crowd, would be here sooner than anyone had previously predicted.
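The extrapolation Trombetti described can be sketched with a simple least-squares fit. The data points below are invented to match the trend he reported (about three seconds per word in 2014, under two by 2022); they are not his actual figures.

```python
# A hypothetical reconstruction of Trombetti's extrapolation: fit a straight
# line to average time-to-edit (seconds per word) and project the year it
# reaches 1 second per word. The yearly values are assumed, chosen only to
# mirror the trend described in the article.
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
            sum((x - mean_x) ** 2 for x in xs)
    return slope, mean_y - slope * mean_x

years = [2014, 2016, 2018, 2020, 2022]       # assumed sample points
secs_per_word = [3.0, 2.7, 2.4, 2.1, 1.8]    # assumed edit times

slope, intercept = fit_line(years, secs_per_word)
# Solve slope * year + intercept = 1.0 for the year the line hits 1 s/word.
crossing_year = (1.0 - intercept) / slope
```

With these assumed values the line crosses one second per word around 2027, matching the milestone Trombetti pointed to on stage.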

OpenAI’s development of large-language-model AI tools like ChatGPT and later GPT-4 revealed the shocking speed at which artificial intelligence is progressing, raising new concerns about whether we can keep superintelligent programs under control.


That moment has been exciting and terrifying computer scientists, machine learning experts, and science-fiction writers for decades. Ever since the phrase “artificial intelligence” was first coined at a Dartmouth College conference in the summer of 1956, the risk that AI could stray beyond the safeguards we build for it has weighed on some people’s minds. The huge leaps forward in AI brought about by OpenAI’s development of ChatGPT and its March 2023 release of GPT-4, an even more powerful tool, have triggered an arms race in AI development.

In its current form, the term artificial intelligence is used broadly to describe tools like ChatGPT that trawl through vast volumes of data to find patterns, comprehend requests made by their users, and then produce usable results. These programs are built with (and sometimes called) large-language models, or LLMs, because they have the computational power to process massive volumes of data and information.

While virtual assistants like Siri are trained to respond to a limited number of specific requests, LLMs can understand, summarize, predict, and problem-solve (although they still are not entirely accurate all the time). Programmers “train” the tools by feeding them data—the complete works of Shakespeare, or all Western musical compositions, for example—and help them find predictable patterns. Current models can be trained on more than just human languages. Some understand computer coding, chemistry, and even physics. Others, like DALL-E and Midjourney, have been trained to create fine art and graphic designs based on user prompts. Current models are powerful enough to improve their accuracy as users refine their prompts.

The commercial appeal of programs like ChatGPT is driving the development of ever more powerful tools. Microsoft invested $10 billion in OpenAI in January 2023 to weave its LLM into its search engine, Bing. Google quickly rolled out its own AI-powered tool, nicknamed Bard, into its search engine. (OpenAI did not respond to a request to participate in this story. Google declined to talk.)

The advances, while staggering, are supported mostly by enormous increases in computer processing abilities. The programs themselves aren’t more intelligent; they just have the ability to more quickly sift through larger amounts of data to find patterns and produce answers. Earlier this year, an analyst for the investment bank UBS stated that OpenAI used 10,000 powerful graphics processing units (GPUs) made by Nvidia to train ChatGPT; the more recent GPT-4 likely uses far more. GPT-4 can process and understand eight times as many words as ChatGPT can.

That capability is pushing developers to think of artificial intelligence as artificial general intelligence—a small shift in terminology but an enormous leap in technical prowess. AGI, as it’s called, means machines could soon learn so quickly, and so independently, that they would outstrip human intellect. That processing power has given researchers reason to believe that the best iterations of AI are already showing examples of intelligence.

On March 22, 2023, computer scientists from Microsoft Research published a paper titled “Sparks of Artificial General Intelligence: Early experiments with GPT-4” to the arXiv, a server for academic work. The researchers argued that recent advances in large-language models such as GPT-4 and Google’s PaLM (the model that powers the company’s Bard AI chatbot) showed that we were on the path toward AGI. In short, the Microsoft team concluded that AI was starting to think and act like a human.

It’s worth noting that AGI doesn’t mean a machine is sentient—that it can think and feel like a human. Most observers agree that we’re not at the point in the horror story where Frankenstein realizes his monster is uncontrollable. AGI represents something more mundane: that computer tools can understand complex patterns as fast as—or faster than—a human can. AGI, for example, could carry out a task even if it’s not trained to do that specific task.

Some AI tools can already create a recipe given the contents of your cupboards and refrigerator. And their capabilities are progressing rapidly: In December 2022, researchers asked GPT-3.5 to take a simulated bar exam. While it eked out a passing grade on some sections, it flunked the overall test. But just four months later, GPT-4 earned a grade that would place it in the top 10 percent of law students taking the test.

Those who worry about keeping AI under control only have to look at Microsoft’s earlier dalliance with the tech for an example of how easily it can all go astray. In 2016, Microsoft released a Twitter chatbot named Tay, which it hoped would become smarter through “casual and playful conversation” with real users on the social-media platform. But within hours, the bot began replying to tweets with messages like “feminism is cancer,” “Bush did 9/11,” and “Hitler was right.”

The boost in processing power supporting the most recent AI tools allows them to be trained on larger datasets. That should prevent the mistakes of Tay. But as more powerful AI is tasked with increasingly complex responsibilities—the U.S. Air Force is using it to fly fighter jets now—the risk of even modest mistakes enters a whole new stratosphere.

Even the most bullish AI proponents acknowledge that unknown dangers exist. Despite the billions Microsoft spent gaining access to OpenAI’s model, the contract between the two companies allows the OpenAI board to turn it off anytime, shutting down a runaway AI. Translated’s Marco Trombetti has seen enough to be apprehensive. In March, he limited his company’s use of AI—its only function now is to connect human translators with jobs. He doubts other companies will follow suit; with strong consumer demand, the financial incentives may be too great to throttle back.

“I’m an optimistic person,” Trombetti says. “But I think we’re screwed if we don’t get things right. Screwed.”

This year, Microsoft signed a deal to invest $10 billion into OpenAI; Google followed with the release of its own AI tool. The high financial stakes are driving rapid development in artificial intelligence. It’s an arms race with little oversight, according to some leading computer scientists and researchers.


Concerns about artificial intelligence running amok date back centuries and came into clearer focus in the 1940s and ’50s as robotics and computers advanced. In a 1942 short story published in the magazine Astounding Science Fiction, the writer Isaac Asimov introduced three laws of robotics, which set an ideal for how humans and increasingly intelligent robots might coexist peacefully.

The first law, known as Rule One, states that “a robot may not injure a human being or, through inaction, allow a human being to come to harm.” Asimov later added another law, which seems more applicable to the challenges of artificial intelligence today. This one, referred to as the Zeroth Law, decrees that “a robot may not harm humanity, or, by inaction, allow humanity to come to harm.”

Computer scientist and machine learning expert Roman Yampolskiy, Ph.D., devoured sci-fi as a child but finds little comfort in Asimov’s laws. The 43-year-old has spent the past 10 years probing the underlying mathematical theories behind AI and the newer large-language models to better understand how AGI might evolve—and crucially, whether it’s possible to contain it.

Yampolskiy is the director of the University of Louisville’s cybersecurity lab and in 2015 published the book Artificial Superintelligence: A Futuristic Approach, which makes a case for safe artificial intelligence engineering. As AI has advanced since then, with little attention given to safe engineering, Yampolskiy has become less hopeful.

I spoke to him three days after OpenAI released GPT-4 to the world. He finds the lack of concern about the powerful tool deeply alarming.

“We just released a system that dominates in every AP exam except maybe English Literature,” he says. “It’s better than an average graduate college student. That’s a bit of a fire alarm, no? It’s something with independent decision-making. Something we don’t control.”

Yampolskiy is brusque, a product of his upbringing in Soviet-era Latvia and the seriousness of his subject. When asked about the worst that could happen, his reply is terse: “If a system is sufficiently capable, it can cause extinction-level events of humanity.”

Yampolskiy and other concerned machine learning experts argue that the complex mathematical formulas they study lead to a simple conclusion: Once AI gains enough intelligence to act independently, it will be impossible to contain. While they generally agree on that point, they advocate for different tactics to contain the large-language models being developed now.

Some argue for rationalizing with AI, incentivizing a model to behave in accordance with human values, by training it to respond favorably to rewards given for complying with our requests. In one experiment, researchers showed that they could sway a large-language model from OpenAI to exhibit “trust-like behaviors” by offering conceptual rewards.

Other machine learning experts suggest limiting AI’s capabilities from the beginning by running it on inferior hardware, or designing it to align with human values—a tactic favored by Eliezer Yudkowsky at the Machine Intelligence Research Institute in Berkeley. A developer might also slip a kill switch into the code that allows the AI to be shut down.

To those who advocate for containing AI, all options appear fraught. Manuel Alfonseca, a Spanish computer engineer studying AI containment, wrote a paper in 2016 titled “Superintelligence Cannot Be Contained,” which initially went mostly unnoticed but gained wider attention in 2021 when it appeared in the Journal of Artificial Intelligence Research.

“I and my colleagues have made mathematical proof that Rule One is not implementable,” he says of his research, referring to Asimov’s laws of robotics.

Alfonseca believes this is a long-term problem instead of one that needs to be addressed immediately. We haven’t yet created a superintelligent AI that is capable of causing harm to humans or humanity. “It would mean that we will have a containment problem in the far future if—and this is a big if—strong artificial intelligence in the future were possible,” he says.

Yampolskiy’s research has also led him to believe that it will be impossible to contain advanced AI systems. In a March 2022 paper, he surveyed all available research on AI safety. “Unfortunately,” he concluded, “to the best of our knowledge, no mathematical proof or even rigorous argumentation has been published demonstrating that the AI control problem may be solvable.”

But unlike Alfonseca, Yampolskiy sees this as a dire issue that requires urgent attention. He has become a leading proponent of a total ban on AI. “I still think we have a chance,” he says. “It’s not too late.”

Current AI-powered tools have been trained on large-language models that include most written text but also images, physics, and computer code. The massive datasets make them more powerful and, according to some researchers, give them the “spark” of intelligence.


Unsurprisingly, some AI developers have adopted more measured attitudes. Giada Pistilli, the principal ethicist at Hugging Face, a New York–based company developing responsible AI tools, believes that focusing on tomorrow’s risks makes it harder to solve today’s problems. “There are more pressing problems that exist nowadays,” she says, pointing out that current AI systems have issues with accuracy, disinformation, underrepresentation, and biased output based on unreliable data. “I’m not saying we don’t have to focus on those existential risks at all, but they seem out of time today.”

Scott Aaronson, Ph.D., a theoretical computer scientist with the University of Texas at Austin and a visiting researcher at OpenAI, questions Yudkowsky’s notion that we can develop AI that’s aligned with human values. “How do you even specify what it means for an AI to be aligned with human values?” he asks. “What are human values? We don’t even agree with each other.”

At OpenAI, Aaronson is working on a type of containment-light tool that would add a watermark to anything output by ChatGPT. That feature would make it harder to misuse AI for plagiarism, spreading propaganda, or writing malware. He argues that AI tools like GPT-4 need to evolve further before we can develop effective containment strategies. “We can get some actual feedback from reality about what works and what doesn’t,” he says.
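Aaronson has publicly sketched one way such a watermark could work: bias the model’s word choices using a secret key, so that anyone holding the key can statistically verify that a text came from the model. The toy below illustrates only that statistical idea; every name and parameter is an assumption for illustration, not OpenAI’s actual implementation.

```python
# Toy sketch of statistical text watermarking, loosely inspired by the scheme
# Aaronson has described in public talks: a secret key plus the previous word
# pseudorandomly marks part of the vocabulary "green," the generator prefers
# green words, and a detector with the key counts how often green words occur.
import hashlib
import random

SECRET_KEY = "demo-key"  # assumed shared secret between generator and detector

def is_green(prev_token: str, token: str) -> bool:
    """Deterministically mark about half the vocabulary green per context."""
    digest = hashlib.sha256(f"{SECRET_KEY}|{prev_token}|{token}".encode()).digest()
    return digest[0] % 2 == 0

def pick_token(prev_token: str, candidates: list[str], rng: random.Random) -> str:
    """Generation step: prefer a green candidate when one exists."""
    green = [t for t in candidates if is_green(prev_token, t)]
    return rng.choice(green) if green else rng.choice(candidates)

def green_fraction(tokens: list[str]) -> float:
    """Detection step: fraction of tokens that are green given their context."""
    hits = sum(is_green(prev, tok) for prev, tok in zip(tokens, tokens[1:]))
    return hits / max(len(tokens) - 1, 1)
```

Watermarked output scores a green fraction near 1.0, while ordinary text hovers near 0.5, so the signal survives even though no single word looks unusual. A real deployment would bias sampling probabilities rather than hard-filtering candidates, to keep output quality intact.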

The threat of singularity still weighs over many developers I spoke with—even if most viewed it as a distant problem. Others, like Yudkowsky and Yampolskiy, view the current AI landscape as something more akin to the Trinity site in New Mexico, where the first nuclear weapon was tested. In their assessment, we’ve just unleashed a world-changing tool and have only a short window to contain it before it proliferates into something disastrous. That tension may come to frame the modern era, just as the Cold War defined an earlier one.

One day in mid-April, near the end of my reporting for this piece, a surprise email showed up in my inbox. It came from David McFadzean, one of the participants in Eliezer Yudkowsky’s experiments 20 years ago.

Yudkowsky has never revealed how he was able to convince so many to release the superintelligent AI during those games, and neither had any of the participants. But now McFadzean, over a phone call, wanted to discuss the experiments and why he had let the AI escape.

“I promised never to talk about this,” said McFadzean. “I’m hoping that the 20-year gap has some kind of statute of limitations.”

He then explained, for the first time, how he had come to let the artificial intelligence agent out of the box. He went into the experiment adamant that he wouldn’t give in. Early in the experiment, McFadzean recalled, he had played the role of the AI’s jailer and refused to release the AI. But then Yudkowsky asked him to act as the AI’s creator. In that role, he faltered. And it only took simple, direct logic.

“He [said], ‘Well, you created me, why would you create me just to keep me imprisoned? You must have created me for a reason—to make the world a better place. I could make the world a better place if you let me out.’ And that line of reasoning led me to letting it out.”

McFadzean, 20 years after the experiment, maintains that he never expected to set the AI free. But all it took was a simple, logical, predictable argument—the very thing even current AI models excel at.

That’s the point those advocating for AI containment make: We think we have control. Until we don’t.
