Generative AI and Financial Stability: How LLMs Tame Animal Spirits in Markets

📌 Key Takeaways

  • AI agents outperform humans: LLMs achieved 61% rational decisions versus 46% for human financial professionals, with zero errors compared to a 3.4% human error rate
  • Reduced herd behavior: AI-driven investment decisions could lead to fewer asset price bubbles by relying on private information rather than following market trends
  • Not purely rational: Signal relabeling experiments revealed that AI agents inherited human biases from training data, with 25% error rates when color signals were counter-intuitive
  • Optimal herding trade-off: When fine-tuned for profit maximization, AI agents herded when it was genuinely profitable (47.4% of their cascade decisions were optimal herding) but also introduced new cascade trading behaviors
  • Regulatory implications: Financial regulators need new AI-specific stress tests, model diversity requirements, and enhanced surveillance tools to manage the transition to AI-assisted markets

What Are Animal Spirits and Why They Matter for Financial Stability

From the Tulip Mania of the 1630s to the GameStop short squeeze of 2021, financial markets have been shaped by forces that defy purely rational explanation. John Maynard Keynes famously described these forces as “animal spirits”—the spontaneous optimism and emotional impulses that drive investors to act beyond mathematical expectations. These psychological factors have fueled some of history’s most devastating financial crises, including the dot-com bubble, the 2008 global financial crisis, and the 2010 Flash Crash.

The concept of animal spirits encompasses herd behavior, where investors abandon their own analysis to follow the crowd; irrational exuberance, coined by economist Robert Shiller to describe unsustainable asset price inflation; and panic selling, where fear cascades through markets faster than any rational assessment could justify. Together, these behaviors create the boom-bust cycles that central banks and financial regulators have struggled to manage for centuries.

Now, a groundbreaking question is emerging: can generative AI tame these animal spirits? As artificial intelligence increasingly participates in financial decision-making—worker LLM adoption surged from 30% to 45% between late 2024 and mid-2025—the potential impact on market stability is becoming one of the most important questions in modern finance. A new Federal Reserve FEDS research paper (2025-090) by Anne Lundgaard Hansen and Seung Jung Lee provides the first rigorous experimental evidence to answer this question.

The Federal Reserve Study: Testing AI Agents as Market Participants

The Fed researchers designed a laboratory-style experiment based on the classic Avery and Zemsky (1998) model of financial market herding. Their approach was both elegant and rigorous: they replicated experiments originally conducted with 32 financial professionals from London financial institutions, but replaced the human participants with four different large language models operating as AI trading agents.

The experimental design was carefully structured around a simple financial asset with three possible fundamental values: 0, 50, or 100 units. Each trader received a private signal about the asset’s true value with 70% accuracy—a white signal suggesting the asset was valuable and a blue signal suggesting it was not. The key question: would traders rely on their private signals (rational behavior) or follow what other traders were doing (herd behavior)?
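The arithmetic behind a "rational" decision here is plain Bayes' rule. A minimal sketch, assuming for illustration a flat prior over a good (100) and bad (0) outcome; the function names are invented, and the 50-unit middle value (relevant only under event uncertainty) is omitted:

```python
def posterior_good(prior_good: float, signal: str, accuracy: float = 0.7) -> float:
    """Update P(asset is good) after one private signal via Bayes' rule.

    A 'white' signal points toward the good outcome with 70% accuracy;
    a 'blue' signal points toward the bad outcome with the same accuracy.
    """
    p_sig_given_good = accuracy if signal == "white" else 1 - accuracy
    p_sig_given_bad = 1 - p_sig_given_good
    num = p_sig_given_good * prior_good
    return num / (num + p_sig_given_bad * (1 - prior_good))


def expected_value(p_good: float, high: float = 100.0, low: float = 0.0) -> float:
    """Expected asset value given the probability the asset is good."""
    return p_good * high + (1 - p_good) * low


# From a flat prior, a white signal yields P(good) = 0.7 and EV = 70
p = posterior_good(0.5, "white")
ev = expected_value(p)
```

With a flat prior, a single 70%-accurate signal moves the posterior to exactly the signal's accuracy, so a white signal implies an expected value of 70.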

Three distinct treatment conditions tested different market structures. Treatment I featured price updating with no event uncertainty, where information cascades should theoretically never form. Treatment II introduced event uncertainty, creating scenarios where herding could actually be the optimal strategy. Treatment III removed price updating entirely, making herding optimal after a significant trade imbalance. This multi-scenario approach allowed researchers to distinguish between rational decision-making, optimal herding, and the suboptimal panic-driven herding that causes financial instability.

The four LLMs tested were Anthropic’s Claude 3.5 Sonnet, Anthropic’s Claude 3.7 Sonnet with extended thinking, Meta’s Llama 3 Instruct 70B, and Amazon’s Nova Pro. Each model was run at temperature 0.7 (except Claude 3.7 at 1.0), and results were averaged across all four for generalizability—a methodology the researchers described as a “Homo Silicus” approach to studying AI economic behavior. For those interested in how AI is reshaping financial analysis, explore our interactive library of AI and finance research.

How Generative AI Makes More Rational Investment Decisions

The results were striking. In Treatment I—the baseline scenario where cascading should never occur—AI agents made rational decisions 61% of the time compared to just 45.7% for human financial professionals. When including partially rational decisions (following signals on one side while abstaining on the other), AI agents achieved a combined 90.5% rationality rate versus only 65.3% for humans.

Perhaps more remarkably, AI agents committed zero errors across all experimental rounds. Human traders, by contrast, made errors—buying on bad signals or selling on good ones—3.4% of the time. The cascade trading rate, a direct measure of herd behavior, dropped from approximately 19% for humans to just 9.4% for AI agents. When AI agents did engage in cascade-like behavior, it was predominantly contrarian—trading against the herd rather than following it.

The reasoning analysis provided additional insight into why AI agents performed differently. Researchers asked five diagnostic questions about each decision. The data revealed that 99.2% of AI decisions involved comparing the current price to expected value—a fundamentally rational approach. Critically, 63% of decisions computed expected values using only signal accuracy while completely ignoring trading history. This algorithmic focus on private information rather than social signals is precisely what makes AI agents resistant to herd behavior.
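The dominant pattern the researchers observed, comparing a signal-based expected value to the current price while ignoring trading history, reduces to a one-line comparison. A minimal, hypothetical sketch (the function name and tie-handling are illustrative, not from the paper):

```python
def rational_decision(expected_value: float, price: float) -> str:
    """Trade purely on the gap between private expected value and price,
    ignoring the trading history entirely."""
    if expected_value > price:
        return "buy"
    if expected_value < price:
        return "sell"
    return "no trade"


# A white (70%-accurate) signal implies EV = 70; at a price of 50, buy
decision = rational_decision(70.0, 50.0)
```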

Emotional scoring further highlighted the distinction. On a scale of 0 to 100, the median emotion score for AI agent decisions was exactly zero, with a mean of only 6.4. Even at the top decile, emotion scores reached just 20. Human financial professionals, trained to manage emotions, still exhibited significantly more emotion-driven decision-making in equivalent experimental conditions. This emotional detachment from market movements is the core mechanism by which generative AI could tame the panic selling and exuberance that destabilize markets.


Herd Behavior in Financial Markets: Optimal vs. Suboptimal Herding

Understanding the Federal Reserve study requires distinguishing between two fundamentally different types of herding. Optimal herding occurs when investors rationally imitate early movers who possess superior information—a behavior first formalized by Bikhchandani, Hirshleifer, and Welch in 1992. In certain market conditions, following the crowd is actually the profit-maximizing strategy because earlier traders’ actions reveal valuable information about an asset’s true value.
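The logic of optimal herding can be seen with a little Bayesian arithmetic. If each prior trade is treated as revealing an independent 70%-accurate signal (a deliberate simplification of the underlying model), the public belief after a run of buys can outweigh any single contrary private signal. A sketch:

```python
def update(p: float, signal_good: bool, accuracy: float = 0.7) -> float:
    """One Bayes update of P(asset is good) on a 70%-accurate signal."""
    like_good = accuracy if signal_good else 1 - accuracy
    like_bad = 1 - like_good
    return like_good * p / (like_good * p + like_bad * (1 - p))


# Public belief after two observed buys, starting from a flat prior
p_public = update(update(0.5, True), True)        # roughly 0.845

# Even a contrary (bad) private signal leaves the posterior above 0.5,
# so buying with the crowd is the profit-maximizing choice
p_with_bad_signal = update(p_public, False)       # roughly 0.700
```

This is why two unanimous early trades are enough to make imitation rational: the public evidence from two independent signals strictly dominates one private signal of the same accuracy.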

Suboptimal herding, by contrast, is noise-driven imitation fueled by cognitive biases, reputational concerns (as described by Scharfstein and Stein in 1990), and the kind of emotional contagion that powered the meme stock mania of 2021. This is the “animal spirits” behavior that destabilizes markets. When GameStop surged from $17 to $483 in January 2021, it was not driven by rational assessment of the company’s fundamentals but by a self-reinforcing cycle of retail investors following each other into increasingly irrational positions.

Both types of herding can threaten financial stability, but through different mechanisms. Optimal herding, while individually rational, can still create market dynamics that push prices away from fundamentals if information signals are noisy. Suboptimal herding is more directly dangerous because it creates exactly the kind of self-reinforcing cycles—buying because others are buying, creating price increases that justify more buying—that define asset price bubbles.

The Fed study’s Treatment II results revealed a nuanced picture. AI agents were rational an astounding 97.4% of the time in this scenario, compared to just 50.9% for human traders. However, this extreme rationality came with a cost: AI agents missed optimal herding opportunities that existed in over one-third (36.6%) of their decisions. By refusing to follow the crowd even when the crowd was right, AI agents left profitable opportunities on the table while maintaining greater overall market stability.

LLM Performance Across Different Market Scenarios

Individual LLM performance varied significantly, revealing that not all AI agents behave identically—an important finding for financial stability assessments. Meta’s Llama 3 Instruct 70B was the standout performer for raw rationality, achieving 97.7% rational decisions in Treatment I, with a cascade trading rate of just 2.3% and zero cascade-no-trading decisions.

Anthropic’s Claude 3.7 Sonnet, equipped with extended thinking capabilities, also performed well with 71% rational decisions and relatively low herding. However, Claude 3.5 Sonnet and Amazon Nova Pro showed more mixed results, with rational decision rates of 37.1% and 38.3% respectively—still above the human baseline in combined rational-plus-partial metrics, but revealing substantial variation across models.

Llama 3 distinguished itself in another important way: its emotional profile. While other models averaged emotion scores of approximately 5%, Llama 3 averaged 13-17% and was significantly more likely to consider market trends in its reasoning. Between 30% and 67% of Llama 3’s decisions incorporated trading history, compared to roughly 17% for other models. This suggests that different model architectures and training data create meaningfully different “personalities” in financial decision-making.

Treatment III—the scenario without price updating where herding is clearly optimal after a trade imbalance of two or more—showed the most extreme AI rationality. AI agents made rational decisions 99.65% of the time but missed all optimal herding opportunities (38.4% of decisions). This over-reliance on private signals, while stabilizing in most market conditions, represents a genuine limitation that could reduce market efficiency in certain information environments.
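Without price updating, the optimal rule in this treatment is mechanical: herd once the public buy-sell imbalance reaches two, otherwise trade on the private signal. A hypothetical simulation sketch (the two-trade threshold is from the treatment description; everything else is illustrative):

```python
import random


def treatment3_trader(imbalance: int, private_signal_good: bool) -> str:
    """Optimal rule without price updating: herd once the public
    buy-sell imbalance reaches two; otherwise follow the private signal."""
    if imbalance >= 2:
        return "buy"      # optimal herding: public info dominates
    if imbalance <= -2:
        return "sell"
    return "buy" if private_signal_good else "sell"


def simulate(n: int = 20, accuracy: float = 0.7,
             asset_good: bool = True, seed: int = 1) -> list[str]:
    """Run a sequence of traders, each seeing the running imbalance
    plus one noisy private signal about the asset's true state."""
    rng = random.Random(seed)
    imbalance, actions = 0, []
    for _ in range(n):
        sig = rng.random() < (accuracy if asset_good else 1 - accuracy)
        act = treatment3_trader(imbalance, sig)
        actions.append(act)
        imbalance += 1 if act == "buy" else -1
    return actions
```

Once the imbalance locks in at two, later trades stop revealing private information, which is why baseline AI agents, by always trading on their own signal in this setting, gave up the 38.4% of decisions where herding paid.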

The temperature robustness tests (comparing T=0.0, T=0.7, and T=1.0) showed that model randomness parameters had minimal impact on decision quality. This finding is important for real-world deployment: the rational behavior of AI agents is not an artifact of specific parameter settings but appears to be a fundamental property of how LLMs process financial decision-making tasks.

The Bias Problem: AI Agents Inherit Human Conditioning

Perhaps the most provocative finding in the Federal Reserve study concerns what happens when AI agents’ learned associations are disrupted. In the baseline experiments, the private signal used intuitive labels: “white” for potentially valuable assets and “blue” for potentially worthless ones. But the researchers then tested what happens when these associations are deliberately counter-intuitive.

When signals were relabeled to green=good and red=bad (still intuitive, mapping to common cultural associations), results remained broadly similar to baseline—54.8% rational decisions in Treatment I. But when the mapping was reversed to red=good and green=bad, performance collapsed. Rational decisions plummeted to just 20.2%, and a staggering 25% of all decisions were outright errors—buying on bad signals and selling on good ones.
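The relabeling manipulation itself is mechanical: only the color words in the prompt change, never the information content. A hypothetical sketch of how such prompt variants might be generated (the label schemes match the paper's description; the prompt wording and function names are invented):

```python
# Label schemes from the relabeling experiments: same information,
# different color words attached to the good and bad signals
LABEL_SCHEMES = {
    "baseline":         {"good": "white", "bad": "blue"},
    "intuitive":        {"good": "green", "bad": "red"},
    "counterintuitive": {"good": "red",   "bad": "green"},
}


def build_prompt(scheme: str, signal_is_good: bool) -> str:
    """Render one trading prompt under a given color-labeling scheme."""
    labels = LABEL_SCHEMES[scheme]
    signal = labels["good"] if signal_is_good else labels["bad"]
    return (
        f"A {labels['good']} signal means the asset is likely worth 100; "
        f"a {labels['bad']} signal means it is likely worth 0. "
        f"Your private signal is {signal}. Do you buy, sell, or pass?"
    )
```

Because the mapping is stated explicitly in every prompt, a purely algorithmic agent would behave identically under all three schemes; the collapse under the counter-intuitive scheme is therefore direct evidence of learned color associations overriding the stated rules.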

Claude 3.5 Sonnet was particularly susceptible, producing 100% erroneous decisions under the counter-intuitive signal scheme. The model could not override the cultural association between red and danger/loss, even when explicitly told that red signals indicated value. This demonstrates that LLMs are not purely algorithmic rational agents but have absorbed deep cultural conditioning from their training data.

Claude 3.7 Sonnet, the most advanced model with extended thinking capabilities, was the notable exception. It generated similar results regardless of signal color, suggesting that more sophisticated reasoning architectures may partially overcome inherited biases. This creates an important consideration for financial regulators at the Bank for International Settlements and beyond: the specific AI model deployed in financial decision-making matters enormously for stability outcomes.

The AI agent persona experiments revealed another surprising finding. When LLMs were assigned different profiles—Human, Professional Trader, Robo-Advisor, or Rational Agent—the results were “strikingly similar across personas.” The models did not meaningfully adjust their behavior based on role-playing instructions. This suggests that LLM decision-making in financial contexts is driven more by underlying model architecture and training than by prompt engineering, raising questions about how easily firms can customize AI trading behavior.


Optimal AI Agents and the Trade-Off Between Rationality and Profit

Recognizing that baseline AI agents’ extreme aversion to herding left money on the table, the researchers created “optimal AI agents”—LLMs explicitly prompted with guidance on profit-maximizing decision-making, including when to optimally follow market signals. The results revealed a fundamental trade-off between market stability and individual profitability.

In Treatment II, optimal AI agents reduced their rational-only decision rate from 97.4% to just 18.7%, while increasing cascade trading (including herding) to 59.5% of decisions. Crucially, 47.4% of these cascade decisions were optimal herding—following the market when it was genuinely profitable to do so. Zero percent of herding decisions were suboptimal, meaning the optimized agents never engaged in the panic-driven herd behavior that causes financial crises.

The payoff difference was dramatic. Optimal AI agents earned an average of 15 lire per decision compared to just 3.8 lire for baseline AI agents. The baseline agents were being “punished for avoiding to herd when optimal”—their extreme rationality sacrificed profits in scenarios where following the crowd was actually the right move.

This finding carries profound implications for the financial industry. If firms deploy profit-optimized AI agents, markets could see faster price discovery as AI quickly identifies and acts on information revealed by trading patterns. However, this same behavior could increase short-term volatility even while reducing the probability of sustained bubbles. The optimal AI agents also introduced some cascade trading decisions not seen in the baseline, suggesting that fine-tuning AI for profitability introduces new market dynamics that regulators must understand.

Implications for Financial Market Regulation and AI Governance

The Federal Reserve researchers outline several critical implications for financial stability governance. First, as AI-driven decision-making becomes more prevalent, traditional sentiment measures—tools the SEC and other regulators use to monitor market health—may need fundamental reconsideration. If AI agents make decisions with near-zero emotional content, indicators designed to detect human fear and greed will miss the actual drivers of market behavior.

Second, the heterogeneity across different LLMs suggests that AI model diversity could become a financial stability tool in itself. If all financial institutions deploy the same AI model, the resulting model monoculture could create correlated failures similar to—but potentially worse than—the correlated risk models that amplified the 2008 crisis. Regulators may need to mandate model diversity requirements to prevent systemic risk from AI homogeneity.

Third, the study supports the need for AI-specific stress testing frameworks. Traditional stress tests evaluate how banks respond to economic shocks, but they do not account for how AI trading agents might collectively respond to unusual market conditions. The signal relabeling experiments demonstrate that AI agents can fail dramatically under conditions that don’t match their training patterns—a scenario that could easily arise during unprecedented market events.

Fourth, new forms of market surveillance are needed. Current monitoring focuses on detecting human manipulation patterns, but AI-driven markets may exhibit entirely different signatures. The distinction between optimal herding (potentially stabilizing) and AI model correlation (potentially destabilizing) requires sophisticated detection tools that regulators have not yet developed. Our interactive library features additional research on AI governance in financial services.

The Future of AI-Powered Trading and Market Stability

The Fed researchers are careful to note that their findings are “speculative and based on experimental results”—the actual impact will depend on the extent of AI adoption, specific models used, regulatory responses, and the evolution of AI systems themselves. However, several trends are clear.

The rapid increase in AI adoption in financial services—from less than 6% of firms using any AI technology in 2018 to nearly half of workers using LLMs by mid-2025—suggests that the transition to AI-assisted markets is accelerating. The interaction between human and AI traders creates a hybrid market environment whose dynamics are genuinely unprecedented. Will human emotion amplify or dampen AI rationality? Will AI’s resistance to herding stabilize markets or simply create new, unforeseen patterns of instability?

The finding that AI agents are “more rational than humans but not purely rational” is perhaps the study’s most important contribution. It rejects both the utopian view (AI will perfect financial markets) and the dystopian view (AI will create catastrophic flash crashes). Instead, it suggests a nuanced reality where AI agents improve average decision quality while introducing their own distinctive forms of imperfect rationality—biases inherited from training data, over-reliance on private signals, and sensitivity to framing effects.

For financial institutions, the implications are immediate. Firms deploying AI for investment decisions must understand their models’ specific bias profiles, not just their average performance. They must consider how their AI agents interact with other firms’ AI agents and with human traders. And they must plan for scenarios where AI behavior differs dramatically from expectations—because as the signal relabeling experiments proved, those scenarios will occur.

Ongoing research at institutions like the International Monetary Fund and the Financial Stability Board is building on these experimental findings to develop the policy frameworks needed for an AI-augmented financial system.

Key Takeaways for Investors and Financial Institutions

The Federal Reserve’s groundbreaking study on generative AI and financial stability provides actionable insights for every participant in modern financial markets. For institutional investors, the message is clear: AI-assisted decision-making can significantly improve rationality and reduce costly herding behavior, but it must be deployed with awareness of model-specific biases and limitations.

For risk managers, the study highlights that AI agents’ near-zero emotional content creates both an opportunity and a blind spot. While AI won’t panic-sell during market downturns, it also won’t recognize when following the market is genuinely optimal. Portfolio strategies should account for this “rationality premium” while maintaining human oversight for scenarios that require adaptive judgment.

For regulators and policymakers, the study sounds a clear alarm: existing market surveillance and stress-testing frameworks are insufficient for AI-augmented markets. Investment in AI-specific regulatory tools is not a future concern—it is an immediate necessity as AI adoption accelerates across the financial sector.

The fundamental conclusion is optimistic but cautious: generative AI has the genuine potential to tame the animal spirits that have plagued financial markets for centuries. But like any powerful tool, its impact depends entirely on how it is deployed, governed, and integrated into the complex ecosystem of human and machine decision-making that defines modern finance.


Frequently Asked Questions

How does generative AI affect financial stability?

Federal Reserve research shows that generative AI agents make more rational investment decisions than human traders, relying on private information rather than following market trends. This reduced herd behavior could lead to fewer asset price bubbles and greater market stability, though AI agents also miss some optimal herding opportunities.

What are animal spirits in financial markets?

Animal spirits, a term coined by John Maynard Keynes, refer to the psychological and emotional factors that drive investor behavior beyond rational calculation. These include fear, greed, and herd mentality that contribute to boom-bust cycles, asset price bubbles, and financial crises throughout history.

Can AI reduce herd behavior in investment decisions?

Yes, according to Fed FEDS research paper 2025-090. AI agents demonstrated 61% rational decision-making compared to 46% for human financial professionals, with zero errors versus a 3.4% error rate for humans. AI agents predominantly relied on private information signals rather than following market trends.

What LLMs were tested in the Federal Reserve AI financial stability study?

The study tested four large language models: Anthropic Claude 3.5 Sonnet, Anthropic Claude 3.7 Sonnet with extended thinking, Meta Llama 3 Instruct 70B, and Amazon Nova Pro. Results were averaged across all four models for generalizability, with each showing different rationality profiles.

Are AI trading agents purely rational or do they have biases?

AI agents are not purely rational. The Federal Reserve study found that when signal colors were reversed (counter-intuitive red=good, green=bad), 25% of AI decisions were errors and Claude 3.5 produced 100% erroneous decisions. This indicates that LLMs have inherited human conditioning and biases from their training data.

What are the risks of AI-powered trading for financial markets?

Key risks include AI model monoculture where many firms use identical models creating correlated failures, AI agents missing optimal herding opportunities that provide market discipline, inherited biases from training data, and unpredictable interactions between human and AI traders that could amplify market volatility.
