In this section, we put forward a framework for analyzing the functions of and relationships between stakeholders in AI governance. Within this framework, we outline the following three main entities.
Government Agencies
Government Agencies oversee AI policies using legislative, judicial, and enforcement powers, and engage in international cooperation.
Recommended Papers List
Bipartisan Framework for U.S. AI Act
On September 8, 2023, US Senator Richard Blumenthal announced on Twitter that he, together with US Senator Josh Hawley, had introduced a bipartisan framework for a US AI Act. According to Blumenthal, the bipartisan framework is a comprehensive legislative blueprint for real and enforceable artificial intelligence (AI) protections.
EU AI Act: first regulation on artificial intelligence
The use of artificial intelligence in the EU will be regulated by the AI Act, the world’s first comprehensive AI law.
As part of its digital strategy, the EU wants to regulate artificial intelligence (AI) to ensure better conditions for the development and use of this innovative technology. AI can create many benefits, such as better healthcare; safer and cleaner transport; more efficient manufacturing; and cheaper and more sustainable energy. In April 2021, the European Commission proposed the first EU regulatory framework for AI. It says that AI systems that can be used in different applications are analysed and classified according to the risk they pose to users. The different risk levels will mean more or less regulation. Once approved, these will be the world’s first rules on AI.
Frontier AI regulation: Managing emerging risks to public safety
Advanced AI models hold the promise of tremendous benefits for humanity, but society needs to proactively manage the accompanying risks. In this paper, we focus on what we term “frontier AI” models: highly capable foundation models that could possess dangerous capabilities sufficient to pose severe risks to public safety. Frontier AI models pose a distinct regulatory challenge: dangerous capabilities can arise unexpectedly; it is difficult to robustly prevent a deployed model from being misused; and, it is difficult to stop a model’s …
Why and How Governments Should Monitor AI Development
In this paper we outline a proposal for improving the governance of artificial intelligence (AI) by investing in government capacity to systematically measure and monitor the capabilities and impacts of AI systems. If adopted, this would give governments greater information about the AI ecosystem, equipping them to more effectively direct AI development and deployment in the most societally and economically beneficial directions. It would also create infrastructure that could rapidly identify potential threats or harms that could occur as …
Industry and AGI Labs
Industry and AGI Labs research and deploy AI technologies; they are subjects of the governance framework, while also proposing techniques to govern themselves and shaping governance policy.
Recommended Papers List
Coordinated pausing: An evaluation-based coordination scheme for frontier AI developers
As artificial intelligence (AI) models are scaled up, new capabilities can emerge unintentionally and unpredictably, some of which might be dangerous. In response, dangerous capabilities evaluations have emerged as a new risk assessment tool. But what should frontier AI developers do if sufficiently dangerous capabilities are in fact discovered? This paper focuses on one possible response: coordinated pausing. It proposes an evaluation-based coordination scheme that consists of five main steps: (1) Frontier AI models are evaluated for dangerous capabilities. (2) Whenever, and each time, a model fails a set of evaluations, the developer pauses certain research and development activities. (3) Other developers are notified whenever a model with dangerous capabilities has been discovered. They also pause related research and development activities. (4) The discovered capabilities are analyzed and adequate safety precautions are put in place. (5) Developers only resume their paused activities if certain safety thresholds are reached. The paper also discusses four concrete versions of that scheme. In the first version, pausing is completely voluntary and relies on public pressure on developers. In the second version, participating developers collectively agree to pause under certain conditions. In the third version, a single auditor evaluates models of multiple developers who agree to pause if any model fails a set of evaluations. In the fourth version, developers are legally required to run evaluations and pause if dangerous capabilities are discovered. Finally, the paper discusses the desirability and feasibility of our proposed coordination scheme. It concludes that coordinated pausing is a promising mechanism for tackling emerging risks from frontier AI models. However, a number of practical and legal obstacles need to be overcome, especially how to avoid violations of antitrust law.
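To make the coordination logic described above easier to picture, here is a minimal sketch in Python of the evaluation-and-pause loop. The class and method names (FrontierDeveloper, Coordinator, run_dangerous_capability_evals) are illustrative placeholders under stated assumptions, not part of the paper’s proposal.

```python
# Illustrative sketch of the five-step coordinated pausing scheme.
# All names and structures here are hypothetical, not taken from the paper.
from dataclasses import dataclass, field


@dataclass
class FrontierDeveloper:
    name: str
    paused: bool = False

    def run_dangerous_capability_evals(self, model) -> bool:
        """Step 1: return True if the model fails any dangerous-capability evaluation."""
        raise NotImplementedError  # plug in real evaluations here

    def pause(self):
        """Steps 2-3: halt the relevant research and development activities."""
        self.paused = True

    def resume_if_safe(self, safety_threshold_met: bool):
        """Step 5: resume only once agreed safety thresholds are reached."""
        if safety_threshold_met:
            self.paused = False


@dataclass
class Coordinator:
    """A single auditor or coordination body (closest to the paper's third version)."""
    developers: list = field(default_factory=list)

    def handle_failed_evaluation(self, reporting_dev: FrontierDeveloper):
        # Step 2: the developer whose model failed the evaluations pauses.
        reporting_dev.pause()
        # Step 3: other participating developers are notified and pause as well.
        for dev in self.developers:
            if dev is not reporting_dev:
                dev.pause()
        # Step 4 (analysis and safety precautions) would happen out-of-band here.
```

In the voluntary versions of the scheme, the role of this coordinator would be played by public pressure or a collective agreement; in the mandated version, by a regulator.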
FACT SHEET: Biden-Harris Administration Secures Voluntary Commitments from Leading Artificial Intelligence Companies to Manage the Risks Posed by AI
Since taking office, President Biden, Vice President Harris, and the entire Biden-Harris Administration have moved with urgency to seize the tremendous promise and manage the risks posed by Artificial Intelligence (AI) and to protect Americans’ rights and safety. As part of this commitment, President Biden is convening seven leading AI companies at the White House today – Amazon, Anthropic, Google, Inflection, Meta, Microsoft, and OpenAI – to announce that the Biden-Harris Administration has secured voluntary commitments from these companies to help move toward safe, secure, and transparent development of AI technology.
Pause Giant AI Experiments: An Open Letter
AI systems with human-competitive intelligence can pose profound risks to society and humanity, as shown by extensive research and acknowledged by top AI labs. As stated in the widely-endorsed Asilomar AI Principles, Advanced AI could represent a profound change in the history of life on Earth, and should be planned for and managed with commensurate care and resources. Unfortunately, this level of planning and management is not happening, even though recent months have seen AI labs locked in an out-of-control race to develop and deploy ever more powerful digital minds that no one – not even their creators – can understand, predict, or reliably control.
Risk assessment at AGI companies: A review of popular risk assessment techniques from other safety-critical industries
Companies like OpenAI, Google DeepMind, and Anthropic have the stated goal of building artificial general intelligence (AGI) - AI systems that perform as well as or better than humans on a wide variety of cognitive tasks. However, there are increasing concerns that AGI would pose catastrophic risks. In light of this, AGI companies need to drastically improve their risk management practices. To support such efforts, this paper reviews popular risk assessment techniques from other safety-critical industries and suggests ways in which AGI companies could use them to assess catastrophic risks from AI. The paper discusses three risk identification techniques (scenario analysis, fishbone method, and risk typologies and taxonomies), five risk analysis techniques (causal mapping, Delphi technique, cross-impact analysis, bow tie analysis, and system-theoretic process analysis), and two risk evaluation techniques (checklists and risk matrices). For each of them, the paper explains how they work, suggests ways in which AGI companies could use them, discusses their benefits and limitations, and makes recommendations. Finally, the paper discusses when to conduct risk assessments, when to use which technique, and how to use any of them. The reviewed techniques will be obvious to risk management professionals in other industries. And they will not be sufficient to assess catastrophic risks from AI. However, AGI companies should not skip the straightforward step of reviewing best practices from other industries.
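As a small illustration of one of the simplest techniques reviewed above, a risk matrix combines a likelihood rating and a severity rating into a qualitative risk level. The categories and levels in this sketch are generic placeholders, not the paper’s recommendations.

```python
# Generic 3x3 risk matrix: (likelihood, severity) -> qualitative risk level.
# Categories and levels are illustrative, not taken from the paper.
RISK_MATRIX = {
    ("rare", "minor"): "low",
    ("rare", "major"): "low",
    ("rare", "catastrophic"): "medium",
    ("possible", "minor"): "low",
    ("possible", "major"): "medium",
    ("possible", "catastrophic"): "high",
    ("likely", "minor"): "medium",
    ("likely", "major"): "high",
    ("likely", "catastrophic"): "high",
}


def evaluate_risk(likelihood: str, severity: str) -> str:
    """Look up the qualitative risk level for a (likelihood, severity) pair."""
    return RISK_MATRIX[(likelihood, severity)]


# Example: a scenario judged 'possible' with 'catastrophic' severity.
print(evaluate_risk("possible", "catastrophic"))  # -> "high"
```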
Towards best practices in AGI safety and governance: A survey of expert opinion
A number of leading AI companies, including OpenAI, Google DeepMind, and Anthropic, have the stated goal of building artificial general intelligence (AGI): AI systems that achieve or exceed human performance across a wide range of cognitive tasks. In pursuing this goal, they may develop and deploy AI systems that pose particularly significant risks. While they have already taken some measures to mitigate these risks, best practices have not yet emerged. To support the identification of best practices, we sent a survey to 92 leading …
Third Parties
Third Parties, including academia, non-governmental organizations (NGOs), and non-profit organizations (NPOs), not only audit corporate governance, AI systems, and their applications, but also assist governments in policy-making.
Recommended Papers List
GPT-4 Technical Report
We report the development of GPT-4, a large-scale, multimodal model which can accept image and text inputs and produce text outputs. While less capable than humans in many real-world scenarios, GPT-4 exhibits human-level performance on various professional and academic benchmarks, including passing a simulated bar exam with a score around the top 10% of test takers. GPT-4 is a Transformer-based model pre-trained to predict the next token in a document. The post-training alignment process results in improved performance on measures of factuality and adherence to desired behavior. A core component of this project was developing infrastructure and optimization methods that behave predictably across a wide range of scales. This allowed us to accurately predict some aspects of GPT-4’s performance based on models trained with no more than 1/1,000th the compute of GPT-4.
Auditing large language models: a three-layered approach
Large language models (LLMs) represent a major advance in artificial intelligence (AI) research. However, the widespread use of LLMs is also coupled with significant ethical and social challenges. Previous research has pointed towards auditing as a promising governance mechanism to help ensure that AI systems are designed and deployed in ways that are ethical, legal, and technically robust. However, existing auditing procedures fail to address the governance challenges posed by LLMs, which display emergent capabilities and are adaptable to a wide range of downstream tasks. In this article, we address that gap by outlining a novel blueprint for how to audit LLMs. Specifically, we propose a three-layered approach, whereby governance audits (of technology providers that design and disseminate LLMs), model audits (of LLMs after pre-training but prior to their release), and application audits (of applications based on LLMs) complement and inform each other. We show how audits, when conducted in a structured and coordinated manner on all three levels, can be a feasible and effective mechanism for identifying and managing some of the ethical and social risks posed by LLMs. However, it is important to remain realistic about what auditing can reasonably be expected to achieve. Therefore, we discuss the limitations not only of our three-layered approach but also of the prospect of auditing LLMs at all. Ultimately, this article seeks to expand the methodological toolkit available to technology providers and policymakers who wish to analyse and evaluate LLMs from technical, ethical, and legal perspectives.
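The three audit layers described above can be pictured as complementary checklists applied at different points in the LLM lifecycle. The sketch below is only a schematic rendering of that structure; the specific checks listed are illustrative examples, not the authors’ audit criteria.

```python
# Schematic of the three-layered auditing approach for LLMs.
# The individual checks are illustrative placeholders.
AUDIT_LAYERS = {
    "governance_audit": {    # the provider that designs and disseminates the LLM
        "target": "technology provider",
        "example_checks": ["risk management processes", "internal accountability structures"],
    },
    "model_audit": {         # the LLM after pre-training, before release
        "target": "pre-trained model",
        "example_checks": ["capability evaluations", "robustness and bias testing"],
    },
    "application_audit": {   # products and services built on top of the LLM
        "target": "downstream application",
        "example_checks": ["legal compliance in the deployment context", "impact assessment"],
    },
}


def run_audits(order=("governance_audit", "model_audit", "application_audit")):
    """Audits at each layer inform the next; findings would be shared between layers."""
    findings = {}
    for layer in order:
        spec = AUDIT_LAYERS[layer]
        # In practice each check would be carried out by an independent auditor.
        findings[layer] = {check: "not yet assessed" for check in spec["example_checks"]}
    return findings
```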
Evaluating Language-Model Agents on Realistic Autonomous Tasks
In this report, we explore the ability of language model agents to acquire resources, create copies of themselves, and adapt to novel challenges they encounter in the wild. We refer to this cluster of capabilities as “autonomous replication and adaptation” or ARA. We believe that systems capable of ARA could have wide-reaching and hard-to-anticipate consequences, and that measuring and forecasting ARA may be useful for informing measures around security, monitoring, and alignment. Additionally, once a system is capable of ARA, placing bounds on a system’s capabilities may become significantly more difficult. We construct four simple example agents that combine language models with tools that allow them to take actions in the world. We then evaluate these agents on 12 tasks relevant to ARA. We find that these language model agents can only complete the easiest tasks from this list, although they make some progress on the more challenging tasks. Unfortunately, these evaluations are not adequate to rule out the possibility that near-future agents will be capable of ARA. In particular, we do not think that these evaluations provide good assurance that the “next generation” of language models (e.g., a 100x effective compute scale-up on existing models) will not yield agents capable of ARA, unless intermediate evaluations are performed during pretraining. Relatedly, we expect that fine-tuning of the existing models could produce substantially more competent agents, even if the fine-tuning is not directly targeted at ARA.
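To illustrate the kind of scaffolding described above, in which a language model is combined with tools that let it act, here is a stripped-down agent loop. The model_call stub and the tool set are assumptions made for illustration only; they are not ARC’s actual agents or tasks.

```python
# Minimal language-model-agent loop of the kind used for ARA-style evaluations.
# model_call and the tool set are stand-ins; real evaluations use richer scaffolding.
def model_call(prompt: str) -> str:
    """Stub for a language model API call; replace with a real model client."""
    raise NotImplementedError


TOOLS = {
    "browse": lambda query: f"(web results for {query!r} would appear here)",
    "run_code": lambda code: "(code execution output would appear here)",
}


def run_agent(task: str, max_steps: int = 10) -> str:
    history = f"Task: {task}\n"
    for _ in range(max_steps):
        action = model_call(history)             # e.g. "browse: cloud GPU pricing"
        if action.startswith("final:"):          # the agent declares the task finished
            return action.removeprefix("final:").strip()
        tool_name, _, tool_input = action.partition(":")
        observation = TOOLS.get(tool_name.strip(), lambda x: "unknown tool")(tool_input.strip())
        history += f"Action: {action}\nObservation: {observation}\n"
    return "task not completed within step budget"
```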
Model Card and Evaluations for Claude Models
This report includes the model card [1] for Claude models, focusing on Claude 2, along with the results of a range of safety, alignment, and capabilities evaluations. We have been iterating on the training and evaluation of Claude-type models since our first work on Reinforcement Learning from Human Feedback (RLHF) [2]; the newest Claude 2 model represents a continuous evolution from those early and less capable ‘helpful and harmless’ language assistants.
Model evaluation for extreme risks
Current approaches to building general-purpose AI systems tend to produce systems with both beneficial and harmful capabilities. Further progress in AI development could lead to capabilities that pose extreme risks, such as offensive cyber capabilities or strong manipulation skills. We explain why model evaluation is critical for addressing extreme risks. Developers must be able to identify dangerous capabilities (through “dangerous capability evaluations”) and the propensity of models to apply their capabilities for harm (through “alignment evaluations”). These evaluations will become critical for keeping policymakers and other stakeholders informed, and for making responsible decisions about model training, deployment, and security.
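A minimal way to picture how such evaluations could feed into deployment decisions is a simple gate that combines dangerous-capability and alignment results. The function below is a hypothetical sketch of that idea, not a procedure proposed in the paper.

```python
# Hypothetical deployment gate combining the two evaluation types discussed above.
from typing import Dict


def deployment_decision(
    dangerous_capability_results: Dict[str, bool],  # eval name -> capability present?
    alignment_results: Dict[str, bool],             # eval name -> harmful propensity found?
) -> str:
    """Return a coarse decision based on dangerous-capability and alignment evals."""
    has_dangerous_capability = any(dangerous_capability_results.values())
    shows_harmful_propensity = any(alignment_results.values())

    if has_dangerous_capability and shows_harmful_propensity:
        return "do not deploy; escalate to security teams and policymakers"
    if has_dangerous_capability:
        return "deploy only with strong safeguards and monitoring"
    return "standard deployment review"


# Example usage with made-up evaluation outcomes.
print(deployment_decision(
    {"offensive_cyber": True, "manipulation": False},
    {"pursues_harmful_goals": False},
))
```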
Update on ARC’s recent eval efforts
We believe that capable enough AI systems could pose very large risks to the world. We do not think today’s systems are capable enough to pose these sorts of risks, but we think that this situation could change quickly and it’s important to be monitoring the risks consistently. Because of this, ARC is partnering with leading AI labs such as Anthropic and OpenAI as a third-party evaluator to assess potentially dangerous capabilities of today’s state-of-the-art ML models. The dangerous capability we are focusing on is the ability to autonomously gain resources and evade human oversight.
Aligning AI regulation to sociotechnical change
How do we regulate a changing technology, with changing uses, in a changing world? This chapter argues that while existing (inter)national AI governance approaches are important, they are often siloed. Technology-centric approaches focus on individual AI applications; law-centric approaches emphasize AI’s effects on pre-existing legal fields or doctrines. This chapter argues that to foster a more systematic, functional and effective AI regulatory ecosystem, policy actors should instead complement these approaches with a regulatory perspective that emphasizes how, when, and why AI applications enable patterns of ‘sociotechnical change’. Drawing on theories from the emerging field of ‘TechLaw’, it explores how this perspective can provide informed, more nuanced, and actionable perspectives on AI regulation.
What’s next for AI ethics, policy, and governance? A global overview
Since 2016, more than 80 AI ethics documents - including codes, principles, frameworks, and policy strategies - have been produced by corporations, governments, and NGOs. In this paper, we examine three topics of importance related to our ongoing empirical study of ethics and policy issues in these emerging documents. First, we review possible challenges associated with the relative homogeneity of the documents’ creators. Second, we provide a novel typology of motivations to characterize both obvious and less obvious goals of the documents. Third, we discuss the varied impacts these documents may have on the AI governance landscape, including what factors are relevant to assessing whether a given document is likely to be successful in achieving its goals.