OpenAI Preparedness Framework 2025: What It Actually Guarantees About AI Safety
Table of Contents
- What Is OpenAI’s Preparedness Framework?
- How Affordance Theory Exposes AI Safety Gaps
- Which AI Risks the Framework Actually Covers
- OpenAI’s Definition of Severe Harm and Why It Matters
- The CEO Override: Unilateral Power Over AI Safety
- Competitive Dynamics: Racing to the Bottom on Safety
- Real-World Consequences of AI Safety Framework Gaps
- Comparing AI Safety Frameworks Across the Industry
- What a Robust AI Safety Framework Should Include
📌 Key Takeaways
- Only 3 of 24 risks evaluated: The framework requests systematic evaluation of just biological/chemical weapons, cybersecurity, and AI self-improvement — deprioritizing 87.5% of identified AI risk categories.
- CEO can override all safety: Sam Altman can unilaterally reject Safety Advisory Group recommendations, control SAG resourcing, and co-leads his own oversight committee — creating a structural conflict of interest.
- Medium harm means deployment: OpenAI removed Low and Medium risk levels as “not operationally involved,” meaning systems with Medium capabilities for enabling thousands of deaths face no additional safeguards.
- Competitors can lower the bar: A competitive dynamics clause allows OpenAI to reduce safety standards when competitors release dangerous AI capabilities — institutionalizing a race to the bottom.
- The framework is voluntary: Researchers conclude the document is “materially nothing more than a PDF file” that does not guarantee any specific AI risk mitigation practices.
What Is OpenAI’s Preparedness Framework?
Published on April 15, 2025, OpenAI’s Preparedness Framework Version 2 is a 22-page document that describes how the company tracks and prepares for frontier AI capabilities that could create new risks of severe harm. It replaced the Beta version from December 2023 and represents OpenAI’s most detailed public statement on how it approaches AI safety evaluation.
The framework positions itself as one component of OpenAI’s broader “safety stack” and was released in the context of the AI Seoul Summit 2024 Frontier AI Safety Commitments — voluntary pledges made by leading AI developers. However, a team of researchers from the Australian National University, MIT FutureTech, and Vanderbilt University subjected the document to rigorous analysis using affordance theory, and their findings reveal significant gaps between what the framework appears to promise and what it actually commits to.
The researchers — Sam Coggins, Alexander Saeri, Katherine Daniell, Lorenn Ruster, Jessie Liu, and Jenny Davis — applied the Mechanisms and Conditions model to systematically evaluate every policy statement in the document. Their conclusion is stark: “OpenAI’s April 2025 Preparedness Framework does not guarantee any AI risk mitigation practices.” This analysis, benchmarked against the MIT AI Risk Repository of over 1,600 identified AI risks, provides the most systematic evaluation of any major AI company’s safety framework to date.
How Affordance Theory Exposes AI Safety Gaps
The analytical framework used by the researchers draws on affordance theory — a concept from design studies pioneered by James Gibson and later developed by Jenny Davis and colleagues. The Mechanisms and Conditions model classifies how artefacts create opportunities for action along a spectrum from strongest to weakest:
Demands are the strongest mechanism: they require action and cannot be worked around. Requests ask for action but can be ignored. Encourages and Discourages make actions easier or harder respectively. Allows is the weakest, a neutral stance that neither promotes nor prevents action. Refuses, the preventive counterpart to Demands, blocks action entirely and cannot be circumvented.
When applied to the Preparedness Framework, this methodology reveals a critical insight: the document contains many requests but virtually no demands. The framework requests evaluation of certain risk categories, but does not demand it. The CEO can ignore, override, or deprioritize any safety recommendation without structural consequence. This distinction between “asking” and “requiring” is precisely where the gap between appearance and reality opens up — a gap that traditional policy reading often misses because the language sounds authoritative even when the commitments are weak.
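To make the distinction concrete, here is a minimal Python sketch of how policy statements might be coded under the Mechanisms and Conditions model. The mechanism names follow the paper's vocabulary, but the ordering scheme and the example assignments are illustrative assumptions, not the researchers' actual coding instrument.

```python
from enum import IntEnum

# Promotion-side mechanisms from the Mechanisms and Conditions model, ordered by
# how strongly they bind the actor. Discourage and refuse sit on the prevention
# side and are left out of this simple ordering for brevity.
class Mechanism(IntEnum):
    ALLOW = 0      # neutral: neither promotes nor prevents the action
    ENCOURAGE = 1  # makes the action easier
    REQUEST = 2    # asks for the action, but can be ignored
    DEMAND = 3     # requires the action and cannot be worked around

# Hypothetical coding of policy statements: (risk category, mechanism assigned).
coded_statements = [
    ("biological and chemical weapons", Mechanism.REQUEST),
    ("cybersecurity",                   Mechanism.REQUEST),
    ("AI self-improvement",             Mechanism.REQUEST),
    ("discrimination and toxicity",     Mechanism.ALLOW),
    ("misinformation",                  Mechanism.ALLOW),
]

# What a document guarantees is capped by its strongest mechanism: a framework
# whose strongest commitment is a request can be complied with by doing nothing.
strongest = max(mechanism for _, mechanism in coded_statements)
print(f"Strongest mechanism found: {strongest.name}")  # REQUEST, not DEMAND
```

The point of the sketch is structural: the guarantee a document offers is set by its strongest mechanism, and a document whose strongest mechanism is a request guarantees no action at all.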
Which AI Risks the OpenAI Preparedness Framework Actually Covers
The MIT AI Risk Repository identifies 7 domains and 24 subdomains of AI risk, spanning everything from discrimination and toxicity to environmental harm and multi-agent risks. The Preparedness Framework’s coverage of these risks is strikingly narrow.
At the highest priority, the framework requests systematic evaluation of just three categories: biological and chemical weapons capabilities, cybersecurity (cyberattack capabilities), and AI self-improvement. These represent only 12.5% of identified risk categories.
At mid-level priority, the framework requests research but not systematic evaluation for several categories including AI pursuing its own goals, nuclear weapons development, competitive dynamics, and governance failure. Notably, competitive dynamics is framed not as a risk to mitigate but as a justification for increasing risk tolerance — a remarkable inversion of the expected safety posture.
Everything else is merely “allowed” — the weakest possible mechanism — leaving 21 of the 24 subdomains with no guaranteed systematic evaluation. All of Discrimination and Toxicity (3 subdomains), all of Privacy and Security (2 subdomains), all of Misinformation (2 subdomains), all of Human-Computer Interaction (2 subdomains), and most Socioeconomic and Environmental risks receive no systematic attention. This means bias, surveillance, deepfakes, job displacement, environmental harm, and power centralisation — issues that affect billions of people today — are not guaranteed any evaluation at all. For a deeper exploration of how AI governance frameworks compare, see our interactive library on AI policy.
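The coverage figures above follow from simple arithmetic over the repository's 24 subdomains. A brief sketch using the tier counts as this article describes them (the mid-tier count of research-only categories is approximate):

```python
# Counts as described in this article; the mid-tier figure is approximate.
total_subdomains = 24
requested_evaluation = 3   # bio/chem weapons, cybersecurity, AI self-improvement
requested_research = 4     # e.g. AI pursuing its own goals, nuclear, competitive dynamics, governance failure
merely_allowed = total_subdomains - requested_evaluation - requested_research  # 17

evaluated_share = requested_evaluation / total_subdomains
print(f"Systematically evaluated: {evaluated_share:.1%}")      # 12.5%
print(f"No systematic evaluation: {1 - evaluated_share:.1%}")  # 87.5%, i.e. 21 of 24
```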
OpenAI’s Definition of Severe Harm and Why It Matters
How a company defines “severe harm” determines when safety mechanisms activate. OpenAI sets the bar at more than 1,000 deaths or grave injuries, or more than $100 billion in economic damage. This threshold has profound implications for what gets evaluated and what gets deployed.
The original Beta framework included four risk levels: Low, Medium, High, and Critical. Version 2 removed Low and Medium, stating these levels “were not operationally involved” — meaning they never triggered any safety action in practice. The implications are extraordinary: AI systems assessed at Medium capability for enabling severe harm face no additional safeguards before deployment.
The researchers highlight a concrete precedent: OpenAI deployed its o1 model after finding it had Medium capabilities for biological and chemical severe harms AND Medium capabilities for persuasion severe harms. In other words, a system assessed as having moderate capability to contribute to thousands of deaths was released to the public with no additional safety measures beyond what was already in place.
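Read as decision logic, removing the Low and Medium levels means that only High and Critical assessments gate deployment. The sketch below is a simplified reading of that logic, not OpenAI's actual process; the level names come from the framework, everything else is assumed for illustration.

```python
from enum import Enum

class CapabilityLevel(Enum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4

def additional_safeguards_required(level: CapabilityLevel) -> bool:
    """Simplified reading of the framework: only High and Critical assessments
    trigger safeguards beyond those already in place."""
    return level in (CapabilityLevel.HIGH, CapabilityLevel.CRITICAL)

# The o1 precedent described above: Medium bio/chem and Medium persuasion capability.
o1_assessments = {"bio_chem": CapabilityLevel.MEDIUM, "persuasion": CapabilityLevel.MEDIUM}
blocked = any(additional_safeguards_required(level) for level in o1_assessments.values())
print("Extra safeguards required before deployment:", blocked)  # False
```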
The persuasion dimension is particularly concerning. The paper notes that AI persuasion capabilities have “allegedly already facilitated numerous deaths”, and that Article 5 of the EU AI Act explicitly prohibits the deployment of manipulative AI systems. Capabilities that European law prohibits outright are, under OpenAI’s framework, cleared for deployment as long as the assessed risk level stays at Medium rather than High.
The CEO Override: Unilateral Power Over AI Safety
The governance structure of the Preparedness Framework concentrates extraordinary power in a single individual. The framework identifies three key actors: OpenAI Leadership (defined as “the CEO or a person designated by them”), the Safety Advisory Group (SAG), and the Safety and Security Committee (SSC) of the Board of Directors.
The CEO’s authority over the process is comprehensive. Sam Altman unilaterally determines who sits on the SAG, what resources they receive, and how much time they have to conduct evaluations. The framework explicitly states that “the SAG does not have the ability to filibuster” — meaning it cannot delay or block deployment decisions even if safety concerns remain unresolved. If the SAG raises safety objections, the CEO can simply reject them.
The oversight structure designed to check CEO power is equally compromised. The Safety and Security Committee of the Board is co-led by CEO Sam Altman himself alongside Chair Bret Taylor, with directors Adam D’Angelo and Nicole Seligman. The CEO co-leads the very committee meant to provide oversight of CEO decisions — a structural conflict of interest that the researchers identify as fundamentally undermining the framework’s governance claims.
Historical precedent reinforces these concerns. In November 2023, OpenAI’s board attempted to hold the CEO accountable and was itself overturned within days. The researchers argue this demonstrates that “the Board has already been tested in its ability to hold the CEO accountable and has failed.” The governance mechanisms that would theoretically constrain unilateral CEO action have already proven inadequate under pressure.
Competitive Dynamics: Racing to the Bottom on AI Safety
Perhaps the most concerning element of the Preparedness Framework is its explicit provision for lowering safety standards in response to competitive pressure. If another AI developer releases a system with High or Critical capability levels, OpenAI’s framework allows the company to reduce its own safeguards — with certain stated conditions.
The researchers argue this clause institutionalises a race to the bottom on AI safety. Rather than maintaining absolute safety standards regardless of competitor behaviour, the framework treats safety as a relative competitive position that can be adjusted downward when the market moves. This creates a dynamic where any single company’s decision to lower standards provides justification for all others to follow.
The three stated conditions for competitive adjustment provide thin protection. The decision ultimately rests with the same CEO who controls every other aspect of the safety evaluation process. There is no external verification mechanism, no independent assessment of whether competitive conditions genuinely warrant reduced safeguards, and no public transparency about when or why such adjustments occur.
This dynamic is not theoretical. The paper references Meta’s influence on OpenAI’s risk prioritisation criteria and notes that the Frontier Model Forum — a consortium of major US tech companies formed in 2023 — operates on similar voluntary principles. If the industry’s leading developers all adopt frameworks that allow competitive lowering of safety standards, the structural incentive is for a continuous erosion of protections rather than their strengthening. Explore more AI governance analysis in our interactive library.
Real-World Consequences of AI Safety Framework Gaps
The gap between OpenAI’s safety framework and actual risk mitigation has tangible consequences for organisations and governments that depend on OpenAI’s models. The researchers highlight the Australian Public Service as a case study: the Australian Digital Transformation Agency uses Microsoft 365 Copilot, which is underpinned by OpenAI’s models. This means an entire national government’s productivity tools rely on AI systems whose safety evaluation framework covers only 12.5% of identified risk categories.
Downstream users — organisations that deploy products built on OpenAI’s models — cannot independently verify that adequate safety evaluations have been conducted. They must rely on the upstream developer’s safety framework as their primary assurance. When that framework is, as the researchers characterise it, “materially nothing more than a PDF file,” downstream users are effectively operating without meaningful safety guarantees.
The persuasion risk illustrates this concretely. OpenAI deployed o1 with Medium persuasion capabilities. Organisations using o1-based products — for customer service, content generation, decision support — are deploying persuasion-capable AI without necessarily understanding the implications. Article 5 of the EU AI Act prohibits manipulative AI practices, yet European organisations using OpenAI-powered tools may be inadvertently operating systems that conflict with their own regulatory obligations.
Comparing AI Safety Frameworks Across the Industry
OpenAI’s Preparedness Framework does not exist in isolation — it reflects a broader pattern across the AI industry. The Seoul Summit Frontier AI Safety Commitments, signed by major developers in 2024, established voluntary pledges that critics argue lack enforcement mechanisms. The researchers note that the affordance analysis method they applied to OpenAI can be replicated for any safety framework, including those from Anthropic, Meta, and Google DeepMind.
The pattern of voluntary self-governance in AI safety echoes earlier industries’ experience with self-regulation. Financial services, pharmaceuticals, and environmental protection all went through periods where industry self-regulation was promoted as sufficient, only for significant failures to demonstrate the need for external oversight. The researchers argue that AI safety is following the same trajectory, with voluntary frameworks serving primarily as public relations instruments rather than genuine safety mechanisms.
America’s AI Action Plan, published by the Executive Office of the President in 2025, and the UK DSIT framework represent governmental approaches that could provide external accountability. The EU AI Act goes further with legally binding requirements including prohibited practices, high-risk categorisation, and mandatory conformity assessments. The contrast between the EU’s binding approach and OpenAI’s voluntary framework illustrates the fundamental question: can AI safety depend on developer goodwill, or does it require regulatory enforcement?
What a Robust AI Safety Framework Should Include
The researchers’ analysis implies clear criteria for what a meaningful AI safety framework would require. Broader risk coverage is essential — evaluating only 3 of 24 risk categories leaves the vast majority of potential harms unaddressed. A comprehensive framework would systematically evaluate all identified risk domains, including discrimination, privacy, misinformation, environmental impact, and power centralisation.
Binding commitments must replace voluntary requests. In affordance terms, the framework should demand safety evaluations rather than merely request them. This means structural mechanisms that cannot be bypassed by any single individual, regardless of their position.
Independent oversight is non-negotiable. The Safety Advisory Group must have guaranteed resources, independence from CEO control, and sufficient time to conduct thorough evaluations. The oversight committee must not include the person it is meant to oversee. External auditors with genuine authority to halt deployment should be part of the governance structure.
Lower thresholds for triggering safeguards would prevent the unrestricted deployment of systems with Medium capabilities for severe harm. If an AI system could contribute to hundreds of deaths — even if not thousands — deployment should trigger additional safeguards, not proceed without restriction. The removal of the Low and Medium risk levels was a step backward from safety, not a simplification of the framework.
Finally, external accountability mechanisms must replace the current self-referential structure. Whether through regulatory requirements like the EU AI Act, independent safety audits, or mandatory disclosure to a supervisory body, the framework must answer to someone other than the developer whose commercial interests it governs. The research team’s affordance analysis provides a replicable method for evaluating whether future frameworks meet these standards. To explore how organisations are adapting to AI governance requirements, visit our interactive library.
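Taken together, these criteria amount to replacing requests with demands. In code terms, that is the difference between a check that can be skipped and one the release path cannot route around. The sketch below is hypothetical: every name, threshold, and behaviour is invented here to illustrate what a demanding gate might look like, and it is not drawn from any existing framework.

```python
from enum import Enum

class CapabilityLevel(Enum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4

class DeploymentBlocked(Exception):
    """Raised when the gate refuses to let a release proceed."""

def gate_deployment(evaluations: dict, required_domains: set) -> None:
    """A 'demand' in affordance terms: release cannot proceed unless every
    required risk domain has been evaluated, and it halts at Medium or above."""
    missing = required_domains - set(evaluations)
    if missing:
        raise DeploymentBlocked(f"Unevaluated risk domains: {sorted(missing)}")
    over_threshold = {domain: level.name for domain, level in evaluations.items()
                      if level.value >= CapabilityLevel.MEDIUM.value}
    if over_threshold:
        raise DeploymentBlocked(f"Safeguards required before release: {over_threshold}")

# Usage: a Medium persuasion assessment halts the release instead of waving it through.
try:
    gate_deployment(
        {"bio_chem": CapabilityLevel.LOW, "persuasion": CapabilityLevel.MEDIUM},
        required_domains={"bio_chem", "persuasion"},
    )
except DeploymentBlocked as reason:
    print("Release halted:", reason)
```

The design choice that matters is that the gate raises rather than warns: a Medium assessment, or a missing evaluation, stops the release until someone provides the demanded evidence.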
Frequently Asked Questions
What is OpenAI’s Preparedness Framework?
OpenAI’s Preparedness Framework Version 2 is a 22-page voluntary self-governance document published April 15, 2025, describing how OpenAI tracks frontier AI capabilities that could create new risks of severe harm defined as over 1,000 deaths or over $100 billion in economic damage. It is not a legally binding commitment but positions itself as part of OpenAI’s broader safety stack.
How many AI risk categories does the Preparedness Framework cover?
The framework requests systematic evaluation of only 3 out of 24 risk categories identified in the MIT AI Risk Repository: biological and chemical weapons capabilities, cybersecurity, and AI self-improvement. The remaining 21 categories, including discrimination, privacy, misinformation, and environmental harm, receive at most a request for research and are not guaranteed any systematic evaluation.
Can OpenAI’s CEO override safety recommendations?
Yes. The framework grants the CEO unilateral authority to reject Safety Advisory Group recommendations, control SAG resourcing and membership, and even lower safety standards if competitors release dangerous capabilities. The CEO also co-leads the Safety and Security Committee meant to provide oversight of CEO decisions, creating a structural conflict of interest.
What does OpenAI consider severe harm from AI?
OpenAI defines severe harm as the death or grave injury of more than 1,000 people or more than $100 billion in economic damage. Critically, the framework removed Low and Medium risk levels because they were not operationally involved, meaning AI systems with Medium capabilities for enabling thousands of deaths can be deployed without additional safeguards.
What is the competitive dynamics clause in OpenAI’s framework?
The Preparedness Framework includes a provision allowing OpenAI to lower its own safety safeguards if another AI developer releases a system with High or Critical capability levels. Critics argue this institutionalizes a race to the bottom on AI safety, where competitive pressure systematically degrades protection standards across the entire industry.