AI Safety Report 2025: Key Findings on Advanced AI Risks and Global Governance Frameworks
Table of Contents
- Why the AI Safety Report 2025 Matters for Every Industry
- How 96 Experts Across 30 Nations Built the AI Safety Framework
- AI Capability Scaling: The 10,000x Compute Projection by 2030
- Malicious Use Risks: From Deepfakes to Biological Threats
- AI Malfunction Risks: Bias, Reliability, and Loss of Control
- Systemic Risks: Labor Markets, Inequality, and Market Concentration
- Technical Safety Mitigations: Current Tools and Their Limitations
- The Evidence Dilemma: Why AI Policymakers Face Impossible Choices
- International AI Governance: Building Safety Frameworks for the Future
- Autonomous AI Agents: The Emerging Frontier of Risk
📌 Key Takeaways
- Unprecedented Expert Consensus: 96 international AI experts across 30 nations produced the most comprehensive scientific assessment of advanced AI risks to date, covering malicious use, malfunctions, and systemic threats.
- Exponential Compute Growth: Training compute is increasing approximately 4x annually, with projections suggesting 100x more compute by end of 2026 and potentially 10,000x by 2030 compared to 2023 levels.
- Three-Pillar Risk Framework: The report categorizes AI risks into malicious use (deepfakes, cyber offense, bio threats), malfunctions (bias, loss of control), and systemic risks (labor disruption, market concentration, environmental impact).
- Safety Tools Are Immature: Current mitigations including interpretability, watermarking, and red-teaming can reduce some risks but none constitute complete solutions — layered approaches are essential.
- Policy Evidence Dilemma: Governments face a fundamental tension between acting early on limited evidence or waiting too long as AI capabilities rapidly advance, requiring new governance frameworks with early-warning mechanisms.
Why the AI Safety Report 2025 Matters for Every Industry
The International Scientific Report on the Safety of Advanced AI, published in January 2025, represents a watershed moment in the global effort to understand and manage the risks posed by rapidly advancing artificial intelligence systems. Commissioned through the AI Seoul Summit process, this landmark report synthesizes the collective knowledge of 96 international experts nominated by 30 countries alongside major international organizations including the United Nations, European Union, and OECD.
Unlike previous assessments that focused on narrow applications, this AI safety report concentrates specifically on general-purpose AI — systems capable of performing a wide variety of tasks across domains. The scope encompasses the most advanced frontier models and plausible near-term future systems that could reshape industries from healthcare to finance within the next decade. For organizations integrating AI into their operations, the findings carry direct implications for risk management, compliance planning, and strategic decision-making.
The report arrives at a critical juncture. As the Libertify Interactive Library has documented through multiple analyses, the pace of AI development has accelerated dramatically, with new models demonstrating capabilities that were considered years away just months earlier. OpenAI’s o3 model, announced after the report’s writing cutoff, showed significantly stronger performance on hard benchmarks for programming, abstract reasoning, and scientific reasoning — in some tests outperforming many human experts.
How 96 Experts Across 30 Nations Built the AI Safety Framework
The report’s credibility stems from its unprecedented international collaboration. An Expert Advisory Panel nominated by 30 countries plus the UN, EU, and OECD guided the research process, ensuring diverse perspectives from nations at different stages of AI development and deployment. The writing cutoff was set at December 5, 2024, following an interim report published in May 2024.
This collaborative approach produced a document that deliberately avoids specific policy prescriptions, instead focusing on scientific evidence to inform policymakers. The methodology reflects a recognition that AI safety is inherently transnational — risks from advanced systems cross borders instantly, while the capacity to develop and deploy these systems remains concentrated in a handful of countries and corporations.
The expert panel identified a critical research gap: the lack of reliable statistics on the frequency and prevalence of AI-related harms including deepfakes, fraud, and privacy breaches. Important domains such as biosecurity and classified cyber capabilities have limited open evidence, while the opacity of training datasets and closed-source models reduces the ability of third-party researchers to evaluate risks independently.
AI Capability Scaling: The 10,000x Compute Projection by 2030
Perhaps the most striking data in the report concerns the trajectory of AI capability growth. Recent state-of-the-art models have experienced roughly 4x annual increases in compute used for training and approximately 2.5x annual increases in training dataset size. These are not marginal improvements — they represent exponential scaling that fundamentally changes what AI systems can accomplish.
The report projects two critical milestones if current trends continue and combine with algorithmic gains. By the end of 2026, some frontier models may be trained using approximately 100x more training compute than 2023’s most compute-intensive models. By 2030, that figure could reach an extraordinary 10,000x increase. These projections are contingent on continued trends in compute availability, data access, and algorithmic efficiency.
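To see how these multipliers follow from the headline growth rate, the short sketch below simply compounds the report's roughly 4x annual increase from a 2023 baseline. The growth rate and baseline year mirror the report's figures; the code itself is purely illustrative back-of-the-envelope arithmetic.

```python
# Illustrative projection of training compute growth, assuming the report's
# ~4x annual increase holds from a 2023 baseline. The growth rate and
# baseline year are the report's estimates; everything else is illustrative.

BASE_YEAR = 2023
ANNUAL_GROWTH = 4.0  # ~4x more training compute per year

def projected_compute_multiplier(year: int) -> float:
    """Multiplier over the most compute-intensive 2023 training run."""
    return ANNUAL_GROWTH ** (year - BASE_YEAR)

for year in (2026, 2030):
    print(f"{year}: ~{projected_compute_multiplier(year):,.0f}x 2023-level compute")
# 2026: ~64x      (close to the report's ~100x once algorithmic gains are added)
# 2030: ~16,384x  (the report's ~10,000x figure is of this order of magnitude)
```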
Two distinct scaling modalities drive this acceleration. Training scaling involves larger models, more compute, and more data. Inference-time scaling allows models to use more compute during problem-solving, including longer chains-of-thought and iterative self-improvement. The combination of both approaches has produced breakthrough results that surprised even the researchers who developed them.
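As a concrete illustration of inference-time scaling, the sketch below shows best-of-N sampling, one of the simplest ways to convert extra inference compute into better answers. The `generate_candidate` and `score_candidate` callables are hypothetical placeholders, not any particular model API.

```python
# Minimal sketch of one inference-time scaling strategy: best-of-N sampling.
# Spending more compute at inference (a larger n) often improves answer
# quality without retraining. Both callables are hypothetical placeholders.

from typing import Callable

def best_of_n(
    prompt: str,
    generate_candidate: Callable[[str], str],
    score_candidate: Callable[[str, str], float],
    n: int = 16,
) -> str:
    """Sample n candidate answers and return the one the scorer rates highest."""
    candidates = [generate_candidate(prompt) for _ in range(n)]
    return max(candidates, key=lambda answer: score_candidate(prompt, answer))
```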
DeepSeek’s R1 model, released in January 2025, highlighted efforts to reduce inference costs while maintaining high performance, illustrating that scaling improvements are not confined to the largest technology companies. This democratization of capability has both positive implications for innovation and concerning implications for misuse potential.
Malicious Use Risks: From Deepfakes to Biological Threats
The report’s first risk category addresses deliberate exploitation by bad actors. Established harms already occurring include targeted deepfakes and non-consensual intimate imagery (NCII), scams and fraud at unprecedented scale, and political manipulation through synthetic media. The National Institute of Standards and Technology (NIST) has documented the growing sophistication of these threats.
More concerning are the emerging risks identified by the panel. Scaling improvements in scientific reasoning and coding capabilities create additional concerns for cyber offense and biological misuse. One major AI company cited in the report raised its biological-risk assessment for its best model from “low” to “medium” after recent capability increases — a notable escalation that signals the pace at which risk profiles are evolving.
The report specifically examines how advanced AI systems could lower barriers to creating biological and chemical weapons. While current systems do not provide complete instructions, the trajectory of capability improvement suggests this threshold could be approached within years rather than decades. The expert consensus is that this category of risk requires particular attention because the consequences of failure are catastrophic and largely irreversible.
Cyber offense capabilities represent another area of acute concern. As AI systems become more proficient at discovering software vulnerabilities and generating exploit code, the asymmetry between offense and defense may shift further in favor of attackers. The report notes that AI-powered attacks could operate at machine speed, overwhelming traditional human-led defense mechanisms.
AI Malfunction Risks: Bias, Reliability, and Loss of Control
The second risk category covers unintended AI behaviors that cause harm even when systems are used as designed. Reliability failures — where models produce confidently stated but incorrect outputs — remain a fundamental challenge. Bias and discrimination embedded in training data perpetuate systemic inequities across hiring, lending, healthcare, and criminal justice applications.
The report gives particular attention to loss-of-control scenarios, where AI systems operate outside meaningful human oversight. As models become more capable and autonomous, the gap between what humans can verify and what systems can do widens. The emergence of AI agents — autonomous systems that can plan, act on computer systems or in the physical world, and delegate tasks — amplifies these concerns substantially.
Interpretability remains a critical challenge. Despite advances in mechanistic interpretability and probing techniques, researchers still cannot reliably explain why large language models produce specific outputs. This opacity makes it difficult to predict when systems will fail, to identify embedded biases, or to verify that safety constraints are being followed. The report characterizes current interpretability tools as useful but fundamentally insufficient for the scale of assurance needed.
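A minimal example of the probing approach mentioned above is sketched below: a linear classifier is trained on stored hidden activations to test whether some property of the input is linearly readable from them. The activation and label files are assumed to have been collected in advance and are purely illustrative.

```python
# Minimal "probing" sketch: fit a linear classifier on a model's hidden
# activations to test whether a property (e.g. input topic) is linearly
# readable from them. The .npy files are assumed inputs, for illustration only.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

activations = np.load("hidden_states.npy")   # shape: (n_examples, hidden_dim)
labels = np.load("labels.npy")               # shape: (n_examples,)

X_train, X_test, y_train, y_test = train_test_split(
    activations, labels, test_size=0.2, random_state=0
)
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"Probe accuracy: {probe.score(X_test, y_test):.2%}")
# High accuracy suggests the property is encoded, but says little about
# whether the model actually relies on it -- one reason probes alone are limited.
```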
Systemic Risks: Labor Markets, Inequality, and Market Concentration
The third and most far-reaching risk category examines macro-scale consequences from widespread AI deployment. Labor market disruption tops the list, with the report noting that general-purpose AI’s ability to automate cognitive tasks across industries could produce displacement at a pace and scale that existing social safety nets are not designed to handle.
The global R&D and compute divide between high-income and low- and middle-income countries (LMICs) threatens to exacerbate existing inequalities. The concentration of AI development capacity in a few nations and corporations creates dependencies that could have geopolitical consequences. The report calls for targeted capacity building and compute access programs for LMICs to prevent an AI development gap from becoming permanent.
Market concentration presents another systemic risk. The enormous capital requirements for training frontier models — running into billions of dollars — create natural monopolies and single points of failure. If critical AI infrastructure is controlled by a handful of companies, disruptions to those systems could cascade across economies. The OECD’s AI Policy Observatory has been tracking these concentration dynamics closely.
Environmental impact receives significant attention in the report. The energy consumption associated with training and deploying large AI models is substantial and growing. As compute scales by orders of magnitude, the carbon footprint of AI development could become a significant factor in global emissions targets unless efficiency improvements keep pace. Organizations exploring AI adoption can find related analyses on environmental sustainability reports in our library.
Technical Safety Mitigations: Current Tools and Their Limitations
The report provides a comprehensive assessment of available safety measures, organized into training-time and deployment-time interventions. Training safer models involves data curation to remove harmful content, fine-tuning for alignment with human values, and reinforcement learning from human feedback (RLHF) methods. These approaches have produced measurable improvements in model behavior but remain imperfect — adversarial users can often bypass safety training through creative prompting.
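For readers unfamiliar with RLHF, the sketch below shows the pairwise preference loss at the heart of a common reward-modeling recipe. It illustrates the general technique rather than any method prescribed by the report, and the tensors are assumed to hold reward-model scores for human-preferred and rejected responses.

```python
# Minimal sketch of the reward-model loss used in a common RLHF recipe
# (Bradley-Terry pairwise preference loss). Inputs are assumed to be
# reward-model scores for preferred vs. rejected responses; illustrative only.

import torch
import torch.nn.functional as F

def preference_loss(chosen_rewards: torch.Tensor,
                    rejected_rewards: torch.Tensor) -> torch.Tensor:
    """-log sigmoid(r(x, y_chosen) - r(x, y_rejected)), averaged over the batch."""
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```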
Runtime controls include sandboxing AI systems to limit their access to external resources, implementing access controls that restrict which users can invoke which capabilities, deploying content moderation filters, and maintaining human-in-the-loop oversight for high-stakes decisions. The report emphasizes that these controls must be layered rather than relied upon individually.
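The sketch below illustrates what such layering can look like in practice, chaining access control, input and output moderation, and a human-in-the-loop gate in sequence. Every helper and policy rule in it is a hypothetical stand-in for real moderation services, sandboxed model calls, and review workflows.

```python
# Minimal sketch of layered deployment-time controls: access control,
# input/output moderation, and a human-in-the-loop gate for high-stakes
# requests. All helpers and policy rules are hypothetical stand-ins.

BLOCKED_TERMS = {"synthesize the pathogen", "build an exploit"}  # toy policy

def violates_content_policy(text: str) -> bool:
    return any(term in text.lower() for term in BLOCKED_TERMS)

def is_high_stakes(prompt: str) -> bool:
    return any(word in prompt.lower() for word in ("medical", "legal", "finance"))

def handle_request(user_permissions: set, prompt: str, generate) -> str:
    if "text-generation" not in user_permissions:     # access control
        return "Access denied."
    if violates_content_policy(prompt):               # input moderation
        return "Request blocked by content policy."
    draft = generate(prompt)                          # sandboxed model call
    if violates_content_policy(draft):                # output moderation
        return "Response withheld by content policy."
    if is_high_stakes(prompt):                        # human-in-the-loop gate
        return "Queued for human review before release."
    return draft
```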
Watermarking AI-generated content has emerged as a promising approach for attribution and detection, but the report notes that current watermarking methods can be circumvented by moderately sophisticated adversaries. Similarly, differential privacy techniques can protect individual data points during training but involve trade-offs with model performance that limit practical adoption.
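As an illustration of how statistical watermarks are detected, the sketch below follows the published "green-list" approach, in which generation secretly favors a keyed subset of the vocabulary and detection reduces to a one-sided z-test. The green-token check here is a stand-in for the real keyed hash, and the parameters are illustrative.

```python
# Sketch of detection for a "green-list" style statistical watermark: during
# generation a known fraction gamma of the vocabulary is secretly favored, so
# watermarked text contains more "green" tokens than chance predicts. The
# is_green callable stands in for the keyed hash used in published schemes.

import math

def watermark_z_score(tokens: list[str], is_green, gamma: float = 0.25) -> float:
    """z-score of the observed green-token count against the chance rate gamma."""
    t = len(tokens)
    green = sum(1 for tok in tokens if is_green(tok))
    return (green - gamma * t) / math.sqrt(t * gamma * (1 - gamma))

# A large z-score (e.g. > 4) indicates the text was very likely watermarked;
# paraphrasing or editing lowers the score, which is one reason the report
# treats watermarking as circumventable by motivated adversaries.
```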
Red-teaming — the practice of adversarially testing AI systems to discover failure modes — is widely adopted by frontier model developers but lacks standardization. The report recommends developing common evaluation frameworks, third-party audit regimes, and metrics that allow meaningful comparison of safety properties across different systems and developers. Additional interactive analyses exploring AI governance are available through the Libertify Interactive Library.
The Evidence Dilemma: Why AI Policymakers Face Impossible Choices
One of the report’s most significant conceptual contributions is the articulation of the “evidence dilemma” facing policymakers worldwide. Governments must decide how to regulate AI systems based on rapidly changing, often contradictory evidence. Acting early on limited evidence risks over-regulation that stifles beneficial innovation or mis-regulation that fails to address actual risks. Waiting for stronger evidence may be too late if rapid capability jumps occur.
The report proposes several governance innovations to navigate this dilemma. Early-warning systems would monitor capability developments and trigger policy responses when predefined thresholds are crossed. Pre-release safety evidence requirements would shift the burden to developers to demonstrate safety before wide deployment. Information-sharing mechanisms between companies, researchers, and regulators would improve collective understanding of emerging risks.
The concept of safety triggering mechanisms — predetermined capability or behavior thresholds that automatically activate stronger mitigations, audits, or restrictions — offers a way to prepare regulatory responses in advance. Rather than reacting to incidents after they occur, policymakers could establish graduated frameworks that scale oversight proportionally to capability increases.
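A toy version of such a triggering scheme is sketched below, mapping a measured capability score to escalating oversight tiers. The thresholds and required actions are invented for illustration and do not come from the report.

```python
# Minimal sketch of a "safety triggering" scheme: predefined capability
# thresholds mapped to escalating oversight tiers. Thresholds and actions
# are illustrative assumptions, not values from the report.

OVERSIGHT_TIERS = [
    # (capability score threshold, required mitigations)
    (0.90, ["halt deployment", "notify regulator", "independent audit"]),
    (0.70, ["third-party evaluation", "staged rollout", "enhanced monitoring"]),
    (0.40, ["internal red-teaming", "incident reporting"]),
    (0.00, ["standard pre-release testing"]),
]

def required_mitigations(capability_score: float) -> list[str]:
    """Return the mitigations triggered by a measured capability score in [0, 1]."""
    for threshold, actions in OVERSIGHT_TIERS:
        if capability_score >= threshold:
            return actions
    return []

print(required_mitigations(0.75))
# ['third-party evaluation', 'staged rollout', 'enhanced monitoring']
```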
International coordination represents perhaps the most challenging governance requirement. AI risks are transnational by nature — a model developed in one country can cause harm worldwide within hours of deployment. Yet regulatory approaches vary dramatically across jurisdictions, from the EU AI Act’s comprehensive regulation to more market-oriented approaches in other regions. The report emphasizes that this fragmentation itself constitutes a risk, as developers may simply deploy from the least regulated jurisdiction.
International AI Governance: Building Safety Frameworks for the Future
The report outlines a multi-layered approach to international AI governance built around five priority areas. First, improving measurement and evaluation through standardized benchmarks and third-party evaluation regimes that track both capability progress and safety improvements. Referenced benchmarks include GPQA, SWE-bench, ARC-AGI, AIME 2024, and FrontierMath, each probing a specific capability, from graduate-level science questions and competition mathematics to software engineering and abstract reasoning.
Second, strengthening developer obligations through pre-release testing requirements, staged deployment protocols, and mandatory red-teaming. The report stops short of prescribing specific regulatory frameworks but makes clear that voluntary safety commitments alone are insufficient given the competitive pressures in AI development.
Third, developing mechanisms for early-warning and incident reporting that enable information sharing between companies, researchers, and regulators. The opacity of frontier AI development currently leaves regulators with inadequate visibility into capability advances that may have safety implications. Cross-border incident response protocols are needed to handle AI-related harms that span jurisdictions.
Fourth, investing substantially in mitigation research and capacity building. The report specifically calls for funding interpretability research, robust evaluation methodologies, adversarial testing techniques, and secure deployment architectures. Building regulatory capacity in LMICs is identified as essential to preventing an AI governance gap from emerging alongside the technology gap.
Fifth, adopting layered risk management that combines technical, legal, and socio-technical safeguards. No single intervention is sufficient — effective safety requires safer model training, runtime monitoring, access controls, legal frameworks, awareness campaigns, education programs, and workforce transition support working in concert.
Autonomous AI Agents: The Emerging Frontier of Risk
The report dedicates particular attention to AI agents — autonomous, goal-oriented systems capable of planning, acting on computer systems or in the physical world, and delegating tasks with minimal human oversight. This emerging architecture represents a qualitative shift from AI as a tool to AI as an independent actor, raising governance challenges that existing frameworks are not designed to address.
Agent systems can chain multiple capabilities together, browse the internet, write and execute code, interact with APIs, and coordinate with other agents. The report notes that these systems introduce risks beyond those of individual models, including cascading failures when agent chains malfunction, difficulty attributing responsibility when autonomous decisions cause harm, and challenges in maintaining meaningful human oversight over multi-step processes that operate faster than human comprehension.
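The sketch below shows one way meaningful human oversight can be wired into an agent loop, gating actions flagged as high-risk behind explicit approval. The planner, tool interface, and risk categories are hypothetical placeholders rather than any specific agent framework.

```python
# Minimal sketch of an agent loop with a human-oversight gate: the agent
# proposes steps, but actions flagged as high-risk require explicit human
# approval before execution. Planner, executor, and risk list are hypothetical.

HIGH_RISK_ACTIONS = {"execute_code", "send_http_request", "transfer_funds"}

def run_agent(goal: str, plan_next_step, execute, ask_human, max_steps: int = 10):
    history = []
    for _ in range(max_steps):
        action, args = plan_next_step(goal, history)      # model proposes a step
        if action == "finish":
            return history
        if action in HIGH_RISK_ACTIONS and not ask_human(action, args):
            history.append((action, "blocked by human reviewer"))
            continue
        history.append((action, execute(action, args)))   # run the approved step
    return history
```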
The report emphasizes that the future path of general-purpose AI is not predetermined: societal choices, regulation, and coordinated risk management will shape outcomes. Non-technical variables matter significantly, as regulatory regimes, capital availability, chip supply, energy constraints, and corporate incentives will all affect capability trajectories and risk profiles. These factors are inherently difficult to predict, which reinforces the need for adaptive governance frameworks that can respond to changing circumstances rather than rigid rules that may become obsolete.
For organizations navigating these challenges, the key takeaway is clear: AI safety is not a technical problem alone but a sociotechnical challenge requiring coordinated action across developers, deployers, policymakers, and civil society. The 2025 report provides the most comprehensive evidence base to date for making informed decisions about AI adoption, risk management, and governance participation.
Frequently Asked Questions
What is the International Scientific Report on AI Safety 2025?
The International Scientific Report on AI Safety 2025 is a landmark document authored by 96 international AI experts and commissioned through the AI Seoul Summit process. It synthesizes the current scientific understanding of risks from advanced general-purpose AI systems, covering malicious use, malfunctions, and systemic risks, drawing on experts nominated by 30 nations and by international organizations including the UN, EU, and OECD.
What are the three main categories of AI risk identified in the report?
The report identifies three primary risk categories: malicious use risks (deliberate exploitation including deepfakes, cyber attacks, and biological misuse), malfunction risks (unintended failures such as bias, discrimination, and loss of control), and systemic risks (large-scale consequences including labor market disruption, market concentration, environmental impact, and global inequality in AI access).
How fast is AI compute scaling according to the 2025 safety report?
The report documents approximately 4x annual increases in compute used for training state-of-the-art models and roughly 2.5x annual increases in training dataset size. If these trends continue alongside algorithmic improvements, frontier models could use 100x more training compute by end of 2026 and up to 10,000x more by 2030, compared to 2023 levels.
What is the evidence dilemma for AI policymakers described in the report?
The evidence dilemma refers to policymakers facing a fundamental tension: acting early on limited evidence risks over-regulation or mis-regulation, while waiting for stronger evidence may be too late if rapid capability jumps occur. The report recommends new governance designs including early-warning systems and pre-release safety evidence requirements to address this challenge.
What technical safety measures does the AI Safety Report 2025 recommend?
The report recommends a layered approach combining multiple mitigations: interpretability tools for understanding model behavior, watermarking for AI-generated content, differential privacy techniques, runtime monitoring and sandboxing, red-teaming and adversarial testing, safer training practices including RLHF, staged deployment protocols, and pre-release safety evaluations. However, the report emphasizes that none of these methods alone constitutes a complete solution.