The near-implosion of OpenAI, a world leader in the burgeoning field of artificial intelligence, surfaced a conflict within the organization and the broader community about how fast the technology should advance, and whether slowing it down would make it safer.
As a professor of both A.I. and A.I. ethics, I think this framing of the problem omits the critical question of the kind of A.I. that we accelerate or decelerate.
In my 40 years of A.I. research in natural language processing and computational creativity, I pioneered a series of machine learning advances that let me build the world’s first large-scale online language translator, which quickly spawned the likes of Google Translate and Microsoft’s Bing Translator. You’d be hard-pressed to find any arguments against developing translation A.I.s. Reducing misunderstanding between cultures is probably one of the most important things humanity can do to survive the escalating geopolitical polarization.
But A.I. also has a dark side. I saw many of the very same techniques, invented for beneficial purposes by our natural language processing and machine learning community, instead being used in social media, search and recommendation engines to amplify polarization, bias and misinformation in ways that increasingly pose existential threats to democracy. More recently, as A.I. has grown more powerful, we have seen the technology take phishing to a new level by using deepfake voices of your colleagues or loved ones to scam you out of money.
A.I.s are manipulating humanity. And they’re about to wield even more unimaginably vast power to manipulate our unconscious, which large language models like ChatGPT have barely hinted at. The Oppenheimer moment is real.
Yet “speed versus safety” is not the only red herring that obscures the real threats that loom before us.
One of the key movements in A.I. safety circles is “A.I. alignment,” which focuses on developing methods to align A.I.s with the goals of humanity. Until the recent fracas, Ilya Sutskever and the OpenAI head of alignment research, Jan Leike, were co-leading a “superalignment” research program that’s grappling with the simple but profoundly complex question: “How do we ensure A.I. systems much smarter than humans follow human objectives?”
But in A.I. alignment, yet again, there’s an elephant in the room.
Alignment … to what kind of human objectives?
Philosophers, politicians and populations have long wrestled with all the thorny trade-offs between different goals. Short-term instant gratification? Long-term happiness? Avoidance of extinction? Individual liberties? Collective good? Bounds on inequality? Equal opportunity? Degree of governance? Free speech? Safety from harmful speech? Allowable degree of manipulation? Tolerance of diversity? Permissible recklessness? Rights versus responsibilities?
There’s no universal consensus on such goals, let alone on even more triggering issues like gun rights, reproductive rights or geopolitical conflicts.
In fact, the OpenAI saga amply demonstrates how impossible it is to align goals among even a tiny handful of OpenAI leaders. How on earth can A.I. be aligned with all of humanity’s goals?
If this problem seems obvious, why does A.I. alignment hold such sway in the A.I. community? It’s probably because the dominant modeling paradigm in A.I. is to define some mathematical function that serves as an “objective function,” some quantitative goal or north star for the A.I. to aim for. At every moment an A.I.’s artificial brain is making thousands or millions or even billions of little choices to maximize how well it’s achieving this goal. For example, a recent study showed how a medical A.I. that aims to automate a fraction of the chest X-ray workload detected 99 percent of all abnormal chest X-rays, a higher rate than human radiologists achieved.
We A.I. researchers are thus strongly tempted to frame everything in terms of maximizing an objective function; we’re the proverbial man with a hammer. To get safe A.I., we just need to maximize the alignment between A.I. and humanity’s goals! Now if only we could define a neat objective function that measures the degree of alignment with all of humanity’s goals.
What we in the A.I. research community too often overlook are the existential risks that arise from the way A.I. interacts with the complex dynamics of humanity’s messy psychological, social, cultural, political and emotional factors, which are not cleanly packaged into some simple mathematical function.
A.I. companies, researchers and regulators need to urgently accelerate work on how A.I.s should operate in the face of unresolved age-old trade-offs between conflicting goals, and accelerate the development of new kinds of A.I.s that can help resolve them. For example, one of my research projects involves A.I. that not only fact-checks information, but automatically rephrases it in a way that helps reduce readers’ implicit biases. Accelerating this work is pressing precisely because of the exponential advancement of today’s A.I. technology.
Meanwhile, we need to decelerate deployment of A.I.s that are exacerbating sociopolitical instability, like algorithms that line up one conspiracy theory post after another. Instead, we need to accelerate development of A.I.s that help to de-escalate those dangerous levels of polarization.
And all of us, from A.I. experts to Silicon Valley influencers to the big media driving our everyday conversations, need to stop sweeping these real challenges under the rug through oversimplified, misframed narratives of A.I. accelerationism versus decelerationism. We need to acknowledge that our work impacts human beings, and human beings are messy and complex in ways that cannot necessarily be captured by an elegant equation.
Culture matters. A.I.s are now an everyday part of our society, a fact that will become more pronounced than most people ever envisioned. It is already late to be only starting to realize this. Let a boardroom conflict be our opportunity. It is possible to dream big fast, and to slow misunderstanding.