The past few weeks have brought the issue of AI Safety to the fore in two contrasting ways: first, OpenAI launched GPT-4o, a promising new model that will be free for public use. Around the same time, Ilya Sutskever (OpenAI's renowned co-founder and chief scientist) left the company amidst reports that OpenAI had dissolved its long-term AI Safety team. Although the company reiterated its commitment to safe development, questions linger. Second, the UK's Department for Science, Innovation and Technology released its interim International Scientific Report on the Safety of Advanced AI. The report focuses on general-purpose AI and explores various risks as well as some technical risk mitigation techniques. It also leaves plenty unanswered. With so much uncertainty, it's imperative we seek to resolve the most practically relevant question as soon as possible: who's responsible for AI Safety?
Imagine you’re on a spaceship. Beyond a window, you can make out the multicolour planet that is our home: its vivid blues, deep greens, and the rivers of city lights that run a stunning white-gold in the dark of night.
Before you, you have a big red button. You don’t know exactly what will happen if you push this button. What you do know (and this much is guaranteed) is that if you push it, you’ll make $100,000. You’ve been told there’s a chance that pushing the button will make some other people somewhere down there on Earth better off. There’s just one tiny catch. There’s also a chance that pushing the button could wipe out human life as we know it.
Would you push the button?
What if, instead of wiping out human life, there’s simply a chance that the next push might set humanity on a course towards a dystopian future? We wouldn’t know that we were headed that way. There might be no turning back. The end wouldn’t be today – it might be a hundred years from now – but it wouldn’t have happened but for the push of that button.
What about now?
Maybe knowing there’ll be money in your pocket is enough for you to take any risk.
I’d like to think it’s more likely that (since you’re the sort of measured and thoughtful person taking the time to read this article) you’d want to know a little more before making such a monumental decision. If people are going to benefit, perhaps you’d want to know who those people are. Is it only the upper and middle class who’d be better off? People in “developed” countries? Maybe you’d want to know what form that prosperity would take. Greater wealth? Fulfilling employment? A cure for cancer?
You’ll probably also want to know the odds of a bad outcome.
Unfortunately, the only certainty is that every push of the button will take humanity closer to the tipping point. The next push might be the one to send us over the edge. But it might also do some serious good.
The big red button is the problem of advanced AI development. Does the next groundbreaking innovation usher in a bountiful future or a dystopian end-of-days? No one can seem to agree because, of course, no one actually knows. In any case, far removed from the furthest-reaching consequences of their actions and incentivized to create exciting new technology, innovators may not have the expertise (or, in a worst-case scenario, desire) to grapple with such a complicated risk/reward analysis. They may just push the button.
AI Safety researchers approach the development of artificial intelligence with an interest in mitigating misuse, preventing accidents and bringing to light latent issues that may become more problematic as technology evolves. At face value, this approach might sound like a bit of a buzzkill. In the face of all this exciting technology, a safety-first approach demands additional costs and invariably slows down innovation. There’s certainly a case to be made that we should stop and smell the roses (see for example, Regina Rini’s argument that we ought not to “miss the magic” [1]).
However, I would argue that when you face the sort of tremendous uncertainty that comes with powerful, nascent technology, it is essential to approach each decision as if the worst outcome is just as likely as the best. Where potential outcomes have seismic implications, a cautious approach should really not even be up for debate.
That’s not to say we shouldn’t enjoy the benefits of such incredible developments. Rather, it’s to insist that we must do so responsibly and not put individual benefit, wealth, or the excitement of new discovery in front of significant downside risk.
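To make that intuition concrete, here's a rough back-of-the-envelope sketch in Python. The probability values and the dollar figure attached to catastrophe are invented purely for illustration (nothing in the thought experiment pins them down); the point is only that once the downside is vast enough, even a vanishingly small chance of catastrophe swamps the guaranteed $100,000, and treating the worst case as equally likely makes the answer obvious.

```python
# Toy illustration (not from the article or the DSIT report): how the
# button-pusher's calculus changes once a catastrophic downside is priced in.
# All figures below are invented purely for illustration.

def expected_value(p_catastrophe: float, gain: float, catastrophe_cost: float) -> float:
    """Naive expected value: guaranteed gain minus the probability-weighted downside."""
    return gain - p_catastrophe * catastrophe_cost

GAIN = 100_000            # the guaranteed payout for pushing the button
CATASTROPHE_COST = 1e12   # a stand-in dollar figure for the worst case (hypothetical)

for p in (1e-9, 1e-7, 1e-5, 1e-3):
    ev = expected_value(p, GAIN, CATASTROPHE_COST)
    print(f"p(catastrophe) = {p:.0e}  ->  expected value = {ev:>15,.0f}")

# With a downside ten million times larger than the gain, the expected value
# hits zero at p = 1e-7 and goes sharply negative beyond it. Under the more
# cautious rule argued for above (treat the worst outcome as if it were as
# likely as the best), the calculation is not even close.
```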
You may be wondering why I bring this up now.
Two things happened these past couple of weeks that have brought this issue to the fore: (1) an eventful week at OpenAI; and (2) the release of the interim International Scientific Report on the Safety of Advanced AI by the UK’s DSIT and AI Safety Institute.
As of late, there has rarely been a day in which OpenAI has not been in the news. The past couple of weeks have been no exception. On 13 May 2024, the company grabbed headlines by announcing that its GPT-4o model would be made freely available to the public. The launch video features Mira Murati, OpenAI’s CTO, and others at the company using the system in impressive (and, depending on where you fall on the existential risk spectrum, seriously troubling) ways.
We’ll leave aside the strategic tone-deafness (come on guys, this stuff is just common sense!) of the company’s decision to give the LLM’s interface a creepy-and-oddly-flirtatious-Scarlett-Johansson-from-Her-esque voice (so much so that the actress herself seemed troubled by it [2]), and instead contextualize this launch within what else is going on at the company.
You may recall last year’s OpenAI board drama, which saw Sam Altman removed as CEO in a move led by co-founder Ilya Sutskever [3]. Sutskever quickly expressed regret, Altman was promptly reinstated, and most of the rest of the board was replaced.
On 14 May 2024, Sutskever announced that he was leaving OpenAI, with no clear indication of what prompted the decision [4]. Not long after, news broke that another OpenAI researcher, Daniel Kokotajlo, had left the firm citing a loss of confidence that an Altman-led company would “behave responsibly around the time of AGI” [5]. It was also revealed that OpenAI had axed its long-term AI Safety team – the Superalignment Team – which Sutskever had launched just last year [6]. OpenAI has since responded on X that safety remains a priority; however, dissolving the one team specifically tasked with long-term safety risk leaves some serious questions unanswered. From an organizational perspective, and notwithstanding any good-faith desire to build a company culture that prioritizes safety, it is incredibly difficult to incentivize the same developers towards both innovation and caution. The strength of a distinct alignment team lies in a basic separation of powers: it removes a conflict of interest and lets the safety team do its job with incentives that are independent of the innovation drivers that spur developers forward.
The point here is not to focus on OpenAI as OpenAI-the-company, but as a paradigm of the sort of large, private entity that produces and controls the few general AI models we seem to be settling on. It is concerning that any such company would be driving full force into the future (a shiny new GPT-4o!) without the same guardrails that it had just a few months ago.
Of course, I can’t speak to how OpenAI actually approaches safety, given the opacity of its operational structure. What I can say is that, for anyone watching from the sidelines, these developments serve as a reminder of the ongoing tension between development and safety, particularly in private enterprise, and of why it is essential to advocate for a clear and trustworthy approach to AI Safety across the board. It can't simply be left in company hands.
This brings us to the interim report released by DSIT (the UK’s Department for Science, Innovation and Technology) and the AI Safety Institute on 17 May 2024. Coming in at around 90 pages, the report is a compilation of the work of 75 international experts.
The report focuses on general-purpose AI (not to be confused with AGI, artificial general intelligence). Technically, the authors consider an AI model to be general-purpose if it can “perform, or can be adapted to perform, a wide variety of tasks” [7]. An AI system counts as general-purpose “if based on a general-purpose model, but also if based on a specialized model that was derived from a general-purpose model” [8]. This encompasses, for example, OpenAI’s Sora, Google’s AlphaFold, and large language models like GPT-4 or Meta’s Llama 3. These general-purpose models and systems can be contrasted with ‘narrow AI’, which is domain-specific – i.e., designed to do a single task. The report does not comment on narrow AI tools, their effects, or associated risks.
Is the report worth reading? Yes, so long as you take it for what it is: a summary of the current state of play with respect to general-purpose AI. It’s an excellent primer on the subject.
The first two sections of the report explore the capabilities of general-purpose AI (present and future) and methodologies for assessing and understanding these systems. The remainder of the report is dedicated to an articulation of the various risks that come with this sort of AI, as well as some of the technical approaches being explored to mitigate those risks. Identified risks include:
- malicious use risks, such as disinformation and cyber offence;
- risks from malfunctions, such as bias and loss of control; and
- systemic risks, such as labour market disruption, the global AI divide, market concentration, and environmental and privacy harms.
Almost every section features a familiar refrain: one version or another of a caveat expressing significant uncertainty, whether due to nascent ideas, disagreement between experts, or simply no way of measuring the issue at hand. This shouldn’t be surprising, as much of this work is in its infancy. However, if you’re looking for clear answers, this is not the place to find them.
With that in mind, the report does venture into systemic risks that most users of general-purpose AI might not typically consider. While many readers will have read about labour market concerns (the robots are going to take our jobs!) and loss of control risks (AI is going to take over!), it’s less likely that they will have considered issues such as the global AI divide, market concentration risks, environmental concerns, or even privacy risks. The report also extends to a review of cross-cutting risk factors, such as issues stemming from the speed of proliferation of AI tech or the disincentives AI developers have to mitigate risks due to competition for market share. If for nothing else, I would recommend flipping through these sections (4.3 and 4.4).
The report is descriptive rather than prescriptive. While I was hoping for more non-technical suggestions for how to address the risks identified, the authors determined this to be outside the scope of the report. As such, it seemed like there was a missed opportunity to offer some much-needed substantive guidance with respect to the technical risk mitigation techniques summarized in section 5. But these are early days and getting 75 experts to agree on prescriptive solutions may well be asking for far too much. They alone can't be responsible for AI Safety. That said, guidance has to originate from somewhere.
Finally, it’s worth acknowledging that this is an interim report and was (by the Chair’s own admission) put together extremely rapidly. The next iteration will no doubt be even more comprehensive.
It is reassuring that some of the smartest minds in the world continue to come together to discuss and think seriously about AI Safety issues. However, it is concerning that at the same time a company like OpenAI seems to be bounding forth with innovation whilst being unable to publicly keep its house in order when it comes to questions of safety and alignment.
I don’t want to take away from the magic of a development like GPT-4o, but it’s hard not to fret about how many times a minimally regulated private entity might push the big red button while others sit around discussing the risks of doing so, in a manner that has no real bearing on how those with that tempting button in front of them decide to act.
So who’s responsible for AI Safety? If companies can’t do the work properly due to conflicting internal incentives (compounded by the pressure to make their investors a good return), and it might not be appropriate to place the burden on experts alone, what’s left? Regulatory bodies and legislators? In front of a U.S. Senate subcommittee in 2023, OpenAI's Altman seemed to all but implore the government to regulate AI [9]. Meanwhile, recent attempts at legislation (e.g., the E.U.'s groundbreaking Artificial Intelligence Act), which try to balance flexibility and certainty, are being criticized for not doing enough [10]. What about individual developers, hobbyists, and end-users? A case can be made that, so long as AI is a tool, ultimate responsibility rests with the one who wields it. The question is really: who ought to be responsible for AI Safety? The answer, it seems to me, is everyone. And in order to make sure we get there, all the necessary carrots and sticks need to be in place. But that’s a matter for another article.
[1] Rini, R. (2023, March 16). "Don't Miss the Magic". Philosophers on Next-Generation Large Language Models - Daily Nous. Daily Nous - News for & About the Philosophy Profession. https://dailynous.com/2023/03/16/philosophers-on-next-generation-large-language-models/#rini
[2] Scarlett Johansson hits out at ‘eerily similar’ OpenAI chatbot voice. (n.d.). Financial Times. https://www.ft.com/content/10416e10-3462-45c5-8a8f-c684983b179d
[3] Peters, J., & Heath, A. (2023, November 17). Sam Altman fired as CEO of OpenAI. The Verge. https://www.theverge.com/2023/11/17/23965982/openai-ceo-sam-altman-fired
[4] Metz, C. (2024, May 15). OpenAI’s chief scientist, Ilya Sutskever, is leaving the company. The New York Times. https://www.nytimes.com/2024/05/14/technology/ilya-sutskever-leaving-openai.html
[5] Tangermann, V. (2024, May 13). OpenAI Safety Worker Quit Due to Losing Confidence Company “Would Behave Responsibly Around the Time of AGI.” Futurism. https://futurism.com/openai-safety-worker-quit-confidence-agi
[6] Field, H. (2024, May 18). OpenAI dissolves team focused on long-term AI risks, less than one year after announcing it. CNBC. https://www.cnbc.com/2024/05/17/openai-superalignment-sutskever-leike.html
[7] DSIT & AI Safety Institute. (2024). International Scientific Report on the Safety of Advanced AI (DSIT research paper series number 2024/009). DSIT. https://assets.publishing.service.gov.uk/media/66474eab4f29e1d07fadca3d/international_scientific_report_on_the_safety_of_advanced_ai_interim_report.pdf, page 16
[8] Ibid.
[9] Kang, C. (2023, May 16). Sam Altman, ChatGPT creator and OpenAI CEO, urges Senate for AI regulation. The New York Times. https://www.nytimes.com/2023/05/16/technology/openai-altman-artificial-intelligence-regulation.html
[10] Davies, P. (2024, March 22). EU AI Act reaction: Tech experts say the world’s first AI law is “historic” but “bittersweet.” Euronews. https://www.euronews.com/next/2024/03/16/eu-ai-act-reaction-tech-experts-say-the-worlds-first-ai-law-is-historic-but-bittersweet