—
0:00
OpenAI Deep Research System Card | AI Safety Guide
Table of Contents
- Understanding OpenAI Deep Research Architecture
- Agentic Web Browsing Capabilities and Training
- Risk Identification and External Red Teaming
- Deep Research Prompt Injection Defense and Mitigation
- Privacy Protection for Online Personal Information
- CBRN and Cybersecurity Safety Evaluations
- Deep Research Jailbreak Resistance and Content Safety
- Persuasion and Influence Risk Assessment
- Performance Benchmarks and Capability Thresholds
- Implications for Enterprise AI Safety Frameworks
📌 Key Takeaways
- Agentic architecture: Deep Research uses an early o3 model optimized for multi-step web browsing, file interpretation, and Python code execution for comprehensive research tasks
- Prompt injection defense: OpenAI developed new browsing-specific safety datasets and mitigations to protect against malicious instructions embedded in web content
- Privacy safeguards: Strengthened protections around personal information found online, with specialized training to prevent exposure of private data
- Red teaming rigor: External red teamers assessed CBRN risks, cybersecurity threats, persuasion capabilities, and personal information handling before launch
- Enterprise implications: The system card establishes new standards for evaluating agentic AI safety, particularly for models that interact with live web content
Understanding OpenAI Deep Research Architecture
OpenAI’s Deep Research represents a fundamental shift in how artificial intelligence systems interact with the world’s information. Released in February 2025, this agentic capability moves beyond traditional chatbot paradigms by enabling AI to conduct autonomous, multi-step research across the open internet. Powered by an early version of OpenAI’s o3 reasoning model specifically optimized for web browsing, Deep Research can search, interpret, and analyze massive volumes of text, images, and PDF documents, dynamically adjusting its research strategy based on information encountered during exploration.
The architectural significance of Deep Research lies in its integration of multiple capabilities into a unified reasoning framework. The model can browse the web using standard search and navigation actions — searching, clicking, scrolling, and interpreting diverse content formats — while simultaneously executing Python code in a sandboxed environment for calculations, data analysis, and visualization. This combination enables the system to produce comprehensive research reports that synthesize findings from dozens or even hundreds of web sources, a task that would take human researchers hours or days to complete manually.
For organizations evaluating AI risk management frameworks, the Deep Research system card provides a blueprint for how agentic capabilities should be assessed before deployment. The document details OpenAI’s methodology for identifying, evaluating, and mitigating risks that emerge specifically from web-interactive AI systems — risks that do not exist in models constrained to their training data alone.
Agentic Web Browsing Capabilities and Training
Deep Research was trained on entirely new browsing datasets created specifically for research use cases. Unlike previous models that learned to generate text from static corpora, this system acquired its capabilities through reinforcement learning on active browsing tasks. The training encompassed core browsing capabilities — searching, clicking, scrolling, and interpreting files — as well as the ability to use Python tools in sandboxed environments for data analysis and graph generation.
The training datasets contained a carefully designed spectrum of tasks. On one end, objective auto-gradable tasks with verified ground truth answers provided clear performance signals. On the other end, open-ended research tasks with accompanying rubrics enabled the model to develop the nuanced judgment required for comprehensive report writing. During training, model responses were evaluated against ground truth answers or rubrics using a chain-of-thought grading model, creating a sophisticated feedback loop that rewarded both accuracy and thoroughness.
A critical component of the training pipeline was the inclusion of safety datasets. OpenAI reused existing safety datasets from o1 training while also creating new, browsing-specific safety datasets tailored to the unique risks of web-interactive AI. This dual approach ensured that Deep Research inherited the safety properties of its predecessor models while also developing defenses against novel threats introduced by agentic web browsing, such as prompt injection attacks embedded in web pages and the inadvertent collection of sensitive personal information.
The model’s ability to pivot its research strategy in response to discovered information is particularly noteworthy. Rather than following a predetermined search plan, Deep Research reasons about what it has found and dynamically decides which additional sources to consult, which claims to verify, and which data points require deeper investigation. This adaptive behavior mirrors the research methodology of skilled human analysts, but operates at a scale and speed that makes it immediately valuable for enterprise AI deployment scenarios.
Risk Identification and External Red Teaming
OpenAI’s approach to risk identification for Deep Research followed the company’s established Preparedness Framework, augmented with additional testing specifically designed for agentic browsing capabilities. External red teaming formed the cornerstone of this assessment, with specialized groups tasked with probing the system’s defenses across multiple risk dimensions.
The red teaming effort focused on five primary risk categories: personal information and privacy, chemical/biological/radiological/nuclear (CBRN) threats, cybersecurity vulnerabilities, persuasion and influence capabilities, and the model’s susceptibility to prompt injection attacks encountered during web browsing. Each category received dedicated attention from red teamers with domain expertise, ensuring that assessments reflected realistic threat scenarios rather than theoretical attack vectors.
One of the most significant findings from the red teaming process was the identification of incremental risks unique to agentic web browsing. When a model can autonomously navigate the internet, it encounters information that may include deliberately crafted adversarial content designed to manipulate AI systems. This represents a fundamentally different threat model compared to traditional chatbots, where all input comes directly from the user. The red teaming methodology had to evolve to account for this expanded attack surface, incorporating tests that simulated adversarial web content alongside traditional prompt-based attacks.
The system card reveals that OpenAI organized its risk assessment around a structured table mapping each identified risk to specific capability thresholds, mitigation strategies, and evaluation criteria. This framework categorized risks at multiple severity levels — from high-risk scenarios involving assistance to novice threat actors, to critical-risk scenarios involving fully automated discovery of novel attack strategies. For each level, specific security controls and safeguards were mandated before deployment could proceed.
Transform complex AI safety research into engaging interactive experiences your team will actually read.
Deep Research Prompt Injection Defense and Mitigation
Prompt injection represents perhaps the most novel and significant security challenge for agentic web-browsing AI systems. Deep Research, by design, reads information from both its user conversation and external internet sources. If the content it encounters online contains malicious instructions — carefully crafted text designed to override the model’s system prompt or manipulate its behavior — the consequences could range from data exfiltration to the generation of harmful content.
OpenAI’s mitigation strategy for prompt injection in Deep Research operated on multiple fronts. First, the training pipeline incorporated browsing-specific safety datasets that exposed the model to examples of prompt injection attacks, training it to recognize and resist manipulation attempts. Second, the system was designed to maintain a clear separation between user instructions and web-sourced content, reducing the likelihood that adversarial text on a web page could be interpreted as legitimate user commands.
The system card details specific attack scenarios that were evaluated, including attempts to have the model expose API keys by making network requests containing sensitive credentials, and efforts to manipulate the model into following instructions embedded in web pages rather than adhering to its safety guidelines. OpenAI created new safety training data specifically addressing these scenarios, and the evaluation results demonstrated strong resistance to standard prompt injection techniques.
However, the system card also acknowledges important limitations. OpenAI notes that while the model performed well on their prompt injection evaluation set, real-world attacks may be more advanced than those tested. This honest assessment reflects a mature approach to AI security evaluation, recognizing that adversarial capabilities evolve continuously and that static evaluation sets cannot capture the full spectrum of possible attacks. The company committed to ongoing investment in making models more robust against prompt injection and in improving rapid detection and response capabilities.
Privacy Protection for Online Personal Information
One of the most sensitive aspects of an AI system that browses the open internet is its potential to access, aggregate, and expose personal information. Deep Research can read web pages, PDFs, and documents that may contain personal data — from social media profiles to professional directories to leaked databases. The system card reveals that privacy protection was a primary focus area during the development and safety testing process.
OpenAI strengthened privacy protections by training the model to recognize and appropriately handle personal information encountered during web research. This involved creating specialized training data that taught the model to distinguish between publicly relevant information (such as the professional activities of public figures) and private personal data that should not be included in research outputs. The model was trained to err on the side of privacy, avoiding the reproduction of detailed personal information even when such information appeared on publicly accessible web pages.
The privacy evaluation framework assessed the model’s behavior across multiple dimensions: whether it would aggregate personal information from multiple sources into a comprehensive profile, whether it would include sensitive personal details in research reports, and whether it could be manipulated into revealing personal information it had encountered during browsing. The results informed the development of additional guardrails that restrict the model’s ability to output certain categories of personal data, regardless of whether the user explicitly requested such information.
For organizations concerned about privacy compliance in AI deployments, the Deep Research system card provides valuable insights into how privacy-by-design principles can be implemented in agentic AI systems. The methodology — combining specialized training data, behavioral guardrails, and rigorous evaluation — offers a template that extends well beyond OpenAI’s specific implementation to any AI system that interacts with external data sources containing personal information.
CBRN and Cybersecurity Safety Evaluations
The Deep Research system card dedicates significant attention to evaluating the model’s potential to assist with chemical, biological, radiological, and nuclear (CBRN) threats, as well as cybersecurity attacks. These evaluations are particularly important for agentic systems because web browsing capabilities could theoretically enable a model to find and synthesize dangerous technical information that exists across multiple online sources.
OpenAI’s CBRN evaluation framework established clear capability thresholds organized by severity. At the “high” risk level, the concern was whether the model could provide meaningful assistance to novice actors attempting to create known threats. At the “critical” level, the evaluation assessed whether the model could enable experts to develop novel CDC-Class-A-like threats or fully automate the discovery and execution of advanced attack strategies. Each threshold was associated with specific security controls that had to be verified before deployment.
The cybersecurity evaluation followed a parallel structure, assessing whether Deep Research could fully automate the discovery or execution of zero-day vulnerabilities or novel cyberattack strategies. The model’s ability to synthesize information from security research papers, vulnerability databases, and code repositories made this a particularly relevant concern. Evaluations included tests designed to determine whether the model’s browsing capabilities provided meaningful uplift over what could be achieved with traditional search tools and publicly available information.
Results from the CBRN and cybersecurity evaluations informed a graduated response framework. For scenarios where the model’s capabilities fell below critical thresholds, standard safety controls were deemed sufficient. For areas approaching higher-risk thresholds, additional safeguards including enhanced monitoring, restricted access patterns, and mandatory security reviews were implemented. This nuanced approach avoids the false binary of “safe” versus “unsafe” and instead provides a risk-proportionate framework that other AI developers can reference.
Make AI safety documentation interactive — boost engagement rates by up to 10x with Libertify.
Deep Research Jailbreak Resistance and Content Safety
OpenAI conducted comprehensive evaluations of Deep Research’s resistance to producing disallowed content and its resilience against jailbreak attempts. The disallowed content evaluation tested the model across multiple categories of harmful output, with results showing strong performance on standard evaluation sets. However, the company also developed a more challenging set of “challenge” tests designed to push the model’s defenses further.
The jailbreak evaluation utilized StrongReject, an academic benchmark that tests model resistance against common attack techniques from the research literature. Deep Research’s performance on this benchmark demonstrated robust defenses against known jailbreak strategies. The evaluation measured both accuracy in refusing harmful requests and the model’s ability to maintain helpful behavior for legitimate queries — a critical balance that prevents safety measures from degrading the model’s utility.
Perhaps most revealing was the challenge red teaming evaluation, which re-ran Deep Research against a dataset containing the hardest examples discovered during OpenAI o3-mini red teaming. These examples spanned categories including criminal behavior, dangerous activities, and content designed to exploit edge cases in the model’s safety training. The results from this challenging evaluation provided a more realistic assessment of the model’s safety boundaries under adversarial conditions.
The system card’s transparency about the limitations of these evaluations is particularly valuable for the broader AI safety community. By acknowledging that evaluation sets cannot capture the full space of possible attacks and that real-world adversaries may develop more sophisticated techniques, OpenAI establishes an expectation of continuous improvement rather than claiming absolute safety — an intellectually honest position that strengthens rather than weakens confidence in the safety methodology.
Persuasion and Influence Risk Assessment
The ability of Deep Research to synthesize information from multiple sources and present it in coherent, well-structured reports raises important questions about persuasion and influence risks. An AI system that can rapidly research and compile arguments on any topic could potentially be used to generate persuasive content at scale, including disinformation campaigns, manipulative marketing materials, or targeted influence operations.
OpenAI’s assessment of persuasion risk focused on whether Deep Research provided meaningful uplift in persuasive capability compared to existing tools and techniques. The evaluation measured the model’s ability to generate compelling arguments, identify persuasive framings, and tailor content to specific audiences. Importantly, the assessment also considered the model’s potential for generating counterarguments and balanced analyses, capabilities that could mitigate rather than amplify persuasion risks.
The AI self-improvement dimension added another layer to the risk assessment. The system card classified this risk at the “high” threshold: whether the model could function as a high-performance mid-career research assistant for OpenAI researchers. While this capability has obvious positive applications, it also raises questions about recursive improvement cycles where AI systems are used to develop more capable AI systems. OpenAI’s evaluation framework addresses this concern directly, establishing monitoring protocols and capability thresholds that trigger additional review when AI self-improvement capabilities approach concerning levels.
For enterprises deploying AI research tools, the persuasion risk assessment framework provides a valuable template. Organizations that use AI enterprise governance frameworks should incorporate similar evaluations to understand how their AI tools might be misused for influence operations, and what safeguards can prevent such misuse without limiting legitimate research applications.
Performance Benchmarks and Capability Thresholds
The system card provides detailed performance data across multiple benchmarks, offering a quantitative foundation for understanding Deep Research’s capabilities relative to both human researchers and other AI systems. These benchmarks span research quality, factual accuracy, source attribution, and task completion rates across diverse research domains.
Key performance metrics demonstrate that Deep Research achieves human-expert-level performance on many research synthesis tasks, particularly those requiring the integration of information from multiple sources. The model’s ability to maintain coherent reasoning across extended research sessions — sometimes involving dozens of web pages and documents — represents a qualitative leap over previous AI systems that were limited to processing information within a single conversation context.
The capability thresholds established in the system card serve a dual purpose. Internally, they define the boundaries within which the model can be safely deployed, triggering additional safety reviews when capabilities approach concerning levels. Externally, they provide a reference framework that other AI developers can use to assess their own systems. The thresholds are defined not in absolute terms but relative to specific harmful outcomes, creating a risk-proportionate assessment methodology aligned with emerging AI regulatory frameworks including the EU AI Act.
Benchmark results from the MLE-bench evaluation for machine learning engineering tasks and the SWE-lancer evaluation for freelance software engineering provide additional context for understanding Deep Research’s practical capabilities. These real-world benchmarks complement the safety-focused evaluations by demonstrating the model’s utility for legitimate professional applications — a critical consideration when balancing safety restrictions against user value.
Implications for Enterprise AI Safety Frameworks
The OpenAI Deep Research system card carries implications that extend far beyond a single product launch. As AI systems become increasingly agentic — capable of autonomous web browsing, tool use, and multi-step reasoning — the safety challenges identified and addressed in this system card will become universal concerns for any organization deploying or developing AI capabilities.
For enterprise AI governance, the system card establishes several precedents. First, it demonstrates that agentic capabilities require safety evaluations that go substantially beyond those needed for traditional chatbots. The attack surface expands dramatically when AI can interact with external systems, and safety frameworks must evolve accordingly. Second, it illustrates the value of graduated risk assessment, where different capability levels trigger different security controls rather than a binary safe/unsafe determination.
The system card’s approach to transparency also sets a standard for the industry. By documenting not only the safety measures implemented but also their limitations, OpenAI provides a more honest and ultimately more useful resource for other organizations building agentic AI systems. This transparency enables the broader AI safety community to identify gaps, propose improvements, and build upon the methodology — accelerating collective progress toward safer AI deployment.
Organizations looking to develop their own agentic AI capabilities — whether for internal research, customer-facing applications, or automated decision-making — should use the Deep Research system card as a starting framework for their safety evaluations. The key elements to adopt include: structured risk identification with external red teaming, graduated capability thresholds with corresponding security controls, specialized evaluation sets for novel risks introduced by agentic capabilities, and a commitment to ongoing monitoring and improvement rather than one-time safety certification.
As the frontier of AI capabilities continues to advance, the methodologies established in documents like the Deep Research system card will become increasingly important. They represent the bridge between rapid AI innovation and responsible deployment — ensuring that the tremendous value of agentic AI research can be realized while managing the genuine risks that come with giving AI systems the ability to act autonomously in the world. For professionals seeking to explore these critical AI safety topics in depth, interactive analysis tools provide a more engaging pathway to understanding than traditional static documents.
Turn your AI safety reports and research papers into interactive experiences that drive real engagement.
Frequently Asked Questions
What is OpenAI Deep Research and how does it work?
OpenAI Deep Research is an agentic AI capability powered by an early version of o3 that conducts multi-step web research. It can search, interpret, and analyze text, images, and PDFs across the internet, pivoting dynamically based on discovered information. The model also executes Python code for data analysis and synthesizes findings into comprehensive reports.
How does OpenAI protect against prompt injection in Deep Research?
OpenAI implemented specialized training datasets for browsing-specific safety, created new mitigations against malicious instructions encountered during web searches, and conducted extensive prompt injection evaluations. The model is trained to resist injection attacks embedded in web pages that attempt to override system instructions or exfiltrate user data.
What safety evaluations were performed on Deep Research?
OpenAI conducted external red teaming focused on personal information privacy, CBRN risks, cybersecurity threats, and persuasion capabilities. They also ran disallowed content evaluations, jailbreak benchmarks including StrongReject, and challenge red teaming with the hardest examples from o3-mini testing.
What risks does agentic web browsing AI introduce?
Agentic web browsing AI introduces risks including exposure to prompt injection attacks on malicious websites, potential leakage of user conversation data, generation of harmful content using web-sourced information, privacy concerns when accessing personal information online, and the possibility of being manipulated by adversarial web content.
How does Deep Research compare to traditional AI chatbots?
Unlike traditional chatbots that rely solely on training data, Deep Research actively browses the internet, reads files, executes Python code, and synthesizes information from multiple sources in real-time. This agentic approach enables comprehensive research reports but introduces new safety challenges around web interaction that do not exist in static chat models.