Machine Learning Financial Stress Prediction: How BIS Research Reveals AI Can Forecast Market Crises

📌 Key Takeaways

  • 27% accuracy improvement: Random forest models cut out-of-sample forecast error for tail financial stress events by up to 27% relative to a naive historical-average benchmark, which multivariate linear regression fails to beat.
  • Novel Market Condition Indicators: BIS researchers construct new stress measures for Treasury, FX, and money markets using rolling-window PCA on 239 months of data.
  • 44 predictors analyzed: The ML framework processes fund flows, macro conditions, volatility metrics, and intermediary capacity variables simultaneously.
  • SHAP explainability: Shapley values reveal that funding liquidity and investor overextension are the strongest drivers of future market stress.
  • Cross-market spillovers confirmed: Stress in one market segment predicts stress in others, consistent with liquidity spiral theory and self-reinforcing dynamics.

What Are Market Condition Indicators and Why Traditional Financial Stress Indices Fall Short

Machine learning financial stress prediction represents a paradigm shift in how economists and policymakers monitor risks across global markets. For decades, financial stability assessment relied on single-market proxies — most prominently the CBOE Volatility Index (VIX) — to gauge the overall health of the financial system. While the VIX captures equity-implied volatility effectively, it tells us remarkably little about brewing crises in Treasury markets, foreign exchange, or short-term funding channels. This blind spot proved costly during episodes like the 2019 repo market crisis and the March 2020 Treasury market dislocation, where equity volatility was a lagging rather than leading indicator of systemic stress.

BIS Working Paper 1250, authored by Iñaki Aldasoro, Peter Hördahl, Andreas Schrimpf, and Xingyu Sonya Zhu, addresses this gap by constructing Market Condition Indicators (MCIs) — composite stress measures designed to capture distress across three critical market segments: US Treasuries, foreign exchange, and money markets. Unlike traditional indices that aggregate heterogeneous signals into a single number, MCIs preserve the market-specific nature of stress. A Treasury MCI spike in isolation signals different risks and requires different policy responses than simultaneous stress across all three segments.

The researchers build these indicators using rolling-window Principal Component Analysis (PCA), extracting the dominant source of variation from a curated set of market variables within each segment. This approach captures correlated movements in spreads, volatilities, and liquidity measures that together define “stress” in a statistically rigorous way. The resulting MCIs cover January 2003 through May 2024 — 239 monthly observations spanning the Global Financial Crisis, the European debt crisis, the 2018 volatility selloff, COVID-19, and the 2022–2023 rate hiking cycle. For a deeper look at how machine learning adapts to shifting financial regimes, explore our analysis of evolving machine learning in non-stationary environments.

How BIS Researchers Built Real-Time Stress Indicators for Treasury, FX, and Money Markets

The construction of each Market Condition Indicator follows a systematic methodology grounded in dimensionality reduction. For the Treasury MCI, the researchers select variables that capture different facets of bond market stress: the term premium, bid-ask spreads on benchmark maturities, dealer inventory imbalances, and realized volatility of yields across the curve. Rolling-window PCA is applied with a 60-month lookback, ensuring the indicator adapts to structural changes in market microstructure over time rather than being anchored to a single historical calibration period.
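
The rolling-window step can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the three input series are synthetic, the function names are made up for this example, the first principal component is found by power iteration, and the sign convention (loadings sum to a positive number) is an assumption chosen so that a higher score reads as more stress.

```python
import math

def standardize(window):
    """Z-score each column of a window given as a list of rows."""
    columns = []
    for col in zip(*window):
        mean = sum(col) / len(col)
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (len(col) - 1))
        columns.append([(v - mean) / sd for v in col])
    return [list(row) for row in zip(*columns)]

def first_pc_loadings(z, iters=200):
    """Leading eigenvector of the covariance matrix, found by power iteration."""
    n, k = len(z), len(z[0])
    cov = [[sum(row[i] * row[j] for row in z) / (n - 1) for j in range(k)]
           for i in range(k)]
    v = [1.0 / math.sqrt(k)] * k
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Assumed sign convention: loadings sum to a positive number, so a higher
    # score means more stress when the inputs rise in stressed markets.
    return v if sum(v) >= 0 else [-x for x in v]

def rolling_mci(data, window=60):
    """First-PC score of the latest month within each trailing window."""
    scores = []
    for end in range(window, len(data) + 1):
        z = standardize(data[end - window:end])
        loadings = first_pc_loadings(z)
        scores.append(sum(a * b for a, b in zip(z[-1], loadings)))
    return scores

# Synthetic inputs: three series share one factor that spikes in months 80-83.
factor = [math.sin(t / 6.0) + (3.0 if 80 <= t < 84 else 0.0) for t in range(100)]
data = [[1.0 * f + 0.1 * math.cos(2 * t),
         0.8 * f + 0.1 * math.cos(3 * t),
         1.2 * f + 0.1 * math.cos(5 * t)] for t, f in enumerate(factor)]
mci = rolling_mci(data)
peak_month = 59 + max(range(len(mci)), key=lambda i: mci[i])
print(len(mci), peak_month)
```

Because each window is re-standardized and re-decomposed, the loadings can drift as market structure changes, which is the adaptive property the 60-month lookback is meant to provide.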

The FX MCI draws on currency option-implied volatilities, cross-currency basis swap spreads, and realized volatility of major dollar pairs. These variables collectively capture both hedging costs for global investors and stress in dollar funding markets — a critical channel through which financial crises propagate internationally. The Bank for International Settlements has long documented how FX swap markets serve as a barometer for global dollar liquidity conditions, making this indicator particularly valuable for monitoring international financial stability.

The money market MCI incorporates the TED spread (the gap between LIBOR and Treasury bill rates), overnight repo rate volatility, commercial paper spreads, and Federal Reserve facility usage data. This indicator proved especially adept at flagging the September 2019 repo market disruption — an event that barely registered on the VIX but signaled serious plumbing problems in the financial system. Together, the three MCIs provide a multi-dimensional view of financial stress that no single index can replicate.

Critically, the researchers validate their MCIs against known stress episodes. Each indicator correctly identifies the Global Financial Crisis as the most severe stress event in its respective market, while also flagging intermediate episodes — such as the 2013 Taper Tantrum in Treasury markets and the 2015 Swiss franc shock in FX markets — that sector-agnostic indices often underweight or miss entirely.

Why Random Forest Machine Learning Financial Stress Models Outperform Linear Regression

The central innovation of BIS Working Paper 1250 lies not in the construction of stress indicators themselves, but in applying machine learning financial stress prediction techniques to forecast their future evolution. The researchers assemble a predictor set of 44 variables spanning four categories: fund flow dynamics (mutual fund flows, ETF creation/redemption activity, hedge fund leverage indicators), macroeconomic conditions (PMI, industrial production, yield curve slope), volatility and risk appetite measures (VIX, MOVE index, credit spreads), and intermediary capacity variables (dealer balance sheet data, primary dealer positioning, repo market volumes).

With 44 potential predictors and only 239 monthly observations, classic multivariate linear regression has very few observations per estimated coefficient. The model has enough parameters to fit the training data closely, but this close fit comes at the expense of generalization. In the researchers’ out-of-sample tests, multivariate OLS performs worse than a simple rolling historical average, a devastating indictment of its predictive value. The model memorizes noise rather than extracting signal.

Random forests solve this problem through two complementary mechanisms. First, they build hundreds of individual decision trees, each trained on a bootstrapped subsample of the data, then average their predictions. This ensemble averaging dramatically reduces variance without increasing bias. Second, at each split in each tree, the algorithm considers only a random subset of the available predictors, which decorrelates the individual trees and prevents any single noisy variable from dominating the ensemble. The result is a model that naturally regularizes against overfitting while still capturing the nonlinear interactions and threshold effects that characterize financial markets. As research on central bank AI applications confirms, machine learning is increasingly central to institutional financial analysis.
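
The two mechanisms, bootstrap averaging and per-split feature subsampling, can be made concrete with a toy forest written from scratch. This is a hedged sketch rather than the researchers' code: the data are synthetic (one predictor with a threshold effect plus five noise predictors), the trees are limited to depth 2, and all function names and settings are illustrative assumptions.

```python
import random

def sse(y, rows):
    """Sum of squared errors around the mean of y over the given rows."""
    mean = sum(y[i] for i in rows) / len(rows)
    return sum((y[i] - mean) ** 2 for i in rows)

def build_tree(X, y, rows, rng, max_features, depth=0, max_depth=2, min_leaf=5):
    """Grow a small regression tree; every split sees a random feature subset."""
    leaf_value = sum(y[i] for i in rows) / len(rows)
    if depth == max_depth or len(rows) < 2 * min_leaf:
        return leaf_value
    best = None
    for f in rng.sample(range(len(X[0])), max_features):
        # Every fifth distinct value as a candidate threshold keeps the sketch fast.
        for thr in sorted({X[i][f] for i in rows})[::5]:
            left = [i for i in rows if X[i][f] <= thr]
            right = [i for i in rows if X[i][f] > thr]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = sse(y, left) + sse(y, right)
            if best is None or score < best[0]:
                best = (score, f, thr, left, right)
    if best is None:
        return leaf_value
    _, f, thr, left, right = best
    return (f, thr,
            build_tree(X, y, left, rng, max_features, depth + 1, max_depth, min_leaf),
            build_tree(X, y, right, rng, max_features, depth + 1, max_depth, min_leaf))

def predict_tree(node, x):
    """Walk down the tree until a leaf (a plain float) is reached."""
    while isinstance(node, tuple):
        f, thr, left, right = node
        node = left if x[f] <= thr else right
    return node

def fit_forest(X, y, n_trees=30, max_features=3, seed=0):
    """Each tree is trained on its own bootstrap sample of the rows."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        rows = [rng.randrange(len(X)) for _ in range(len(X))]
        forest.append(build_tree(X, y, rows, rng, max_features))
    return forest

def predict_forest(forest, x):
    """The ensemble forecast is the average of the individual tree forecasts."""
    return sum(predict_tree(tree, x) for tree in forest) / len(forest)

# Synthetic data: the target jumps when predictor 0 crosses 0.5; the rest is noise.
rng = random.Random(42)
X = [[rng.random() for _ in range(6)] for _ in range(150)]
y = [(2.0 if row[0] > 0.5 else 0.0) + rng.gauss(0, 0.2) for row in X]
forest = fit_forest(X, y)
calm = predict_forest(forest, [0.1, 0.5, 0.5, 0.5, 0.5, 0.5])
stressed = predict_forest(forest, [0.9, 0.5, 0.5, 0.5, 0.5, 0.5])
print(round(calm, 2), round(stressed, 2))
```

Trees that never see predictor 0 give both query points the same forecast, so the gap between the two ensemble forecasts comes entirely from trees that found the threshold. A single linear coefficient could only approximate that step.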

Quantile Regression Forests: Predicting the Full Distribution of Financial Market Stress

Standard random forests provide conditional mean forecasts, which are useful for predicting where stress levels are headed on average. But for financial stability monitoring, policymakers care most about tail outcomes: the probability that stress will spike to crisis levels, not where it will settle in a typical month. Quantile regression forests extend the random forest framework to predict any quantile of the conditional distribution of future stress, making them a natural tool for tail risk assessment.

The technical insight is elegant. In a standard random forest, the prediction for a new observation is the average of the target values in the terminal leaf nodes across all trees. In a quantile regression forest, instead of averaging, the algorithm retains all the individual target values in the leaf nodes and computes the requested quantile from this empirical distribution. Want the 90th percentile of future Treasury stress given current conditions? The quantile forest constructs that estimate by looking at the upper tail of the historical stress values that landed in the same terminal nodes as today’s predictor configuration.
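
A minimal sketch of that prediction step follows, assuming a deliberately crude forest: each tree splits a single synthetic predictor at two cut points drawn from a bootstrap sample. The weighting scheme is in the spirit of Meinshausen-style quantile regression forests; the data, function names, and tree construction are illustrative assumptions, not the paper's setup.

```python
import random

def fit_forest(x, n_trees=200, seed=0):
    """Toy forest: each tree is two cut points drawn from a bootstrap sample."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        boot = sorted({x[rng.randrange(len(x))] for _ in range(len(x))})
        # Excluding the largest value guarantees that the top leaf is never empty.
        forest.append(sorted(rng.sample(boot[:-1], 2)))
    return forest

def leaf(cuts, value):
    """Leaf index = number of cut points lying below the value."""
    return sum(value > c for c in cuts)

def leaf_weights(forest, x, x_new):
    """Weight of each training point: how often it shares a leaf with x_new."""
    weights = [0.0] * len(x)
    for cuts in forest:
        target = leaf(cuts, x_new)
        members = [i for i, v in enumerate(x) if leaf(cuts, v) == target]
        for i in members:
            weights[i] += 1.0 / (len(members) * len(forest))
    return weights

def weighted_quantile(y, weights, q):
    """Smallest y whose cumulative weight reaches q."""
    total = 0.0
    for i in sorted(range(len(y)), key=lambda i: y[i]):
        total += weights[i]
        if total >= q:
            return y[i]
    return max(y)

# Synthetic data: the conditional mean is flat, but dispersion grows with x.
rng = random.Random(7)
x = [rng.random() for _ in range(300)]
y = [rng.gauss(0.0, 0.2 + 2.0 * v) for v in x]
forest = fit_forest(x)
results = {}
for x_new in (0.1, 0.9):
    w = leaf_weights(forest, x, x_new)
    mean = sum(wi * yi for wi, yi in zip(w, y))      # standard forest forecast
    fan = [weighted_quantile(y, w, q) for q in (0.10, 0.25, 0.50, 0.75, 0.90)]
    results[x_new] = (w, mean, fan)
    print(x_new, round(mean, 2), [round(v, 2) for v in fan])
```

The same weights produce both the mean forecast and the quantiles. In this toy setup the mean barely distinguishes the two conditions, while the upper quantiles widen sharply at the riskier predictor value.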

This distinction matters enormously for practical financial stability monitoring. A median forecast might suggest calm conditions ahead while the 90th percentile forecast sits at a severe stress level, meaning the model assigns roughly a 10% chance to stress at least that high. It is this tail risk that should trigger macroprudential attention. The BIS researchers exploit this capability fully, generating forecasts at the 10th, 25th, 50th, 75th, and 90th percentiles of each MCI’s conditional distribution. The resulting “fan charts” provide policymakers with a visual, intuitive representation of the range of possible outcomes and the probability mass in each tail.

The paper demonstrates that quantile regression forests excel precisely where they matter most: in the tails. The accuracy gains are modest at the median but grow substantially at the 90th percentile, where the random forest achieves up to 27% better out-of-sample accuracy than the naive benchmark, which linear models fail to beat. This pattern is consistent with the nonlinear, regime-switching nature of tail financial stress: crisis dynamics are qualitatively different from normal-times dynamics, and linear models simply cannot capture this shift.

Out-of-Sample Forecasting Results — How ML Achieves Up to 27% Better Accuracy on Tail Risks

The out-of-sample evaluation framework is deliberately conservative. The researchers use an expanding-window approach: the model is trained on all data up to month t, then produces forecasts for months t+1, t+3, and t+6. The training window expands as new data arrives, but no future information ever leaks into the forecast. Performance is measured using root mean squared error (RMSE) relative to a naive benchmark — the rolling historical average of the respective MCI.
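
The evaluation loop described above can be sketched as follows. The series is a simulated persistent process, and the forecasting model is a simple AR(1) used purely as a hypothetical stand-in for the random forest; the benchmark is taken to be the average of all data available at the forecast date, which is an assumption about how the historical average is computed.

```python
import math
import random

def expanding_window_forecasts(series, fit_predict, horizon, min_train=24):
    """At each month t, use series[:t + 1] only and forecast series[t + horizon]."""
    preds, actuals = [], []
    for t in range(min_train - 1, len(series) - horizon):
        history = series[:t + 1]              # nothing after month t is visible
        preds.append(fit_predict(history, horizon))
        actuals.append(series[t + horizon])
    return preds, actuals

def rmse(preds, actuals):
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(preds, actuals)) / len(preds))

def historical_average(history, horizon):
    """Naive benchmark: mean of everything observed so far."""
    return sum(history) / len(history)

def ar1_forecast(history, horizon):
    """Hypothetical stand-in model: AR(1) fitted by least squares, iterated forward."""
    mu = sum(history) / len(history)
    dev = [v - mu for v in history]
    num = sum(a * b for a, b in zip(dev[1:], dev[:-1]))
    den = sum(a * a for a in dev[:-1])
    phi = num / den if den > 0 else 0.0
    return mu + (phi ** horizon) * dev[-1]

# Simulated stress series with 239 monthly observations.
rng = random.Random(1)
series = [0.0]
for _ in range(238):
    series.append(0.9 * series[-1] + rng.gauss(0, 1))

gains, n_forecasts = {}, {}
for h in (1, 3, 6):
    model_preds, actual = expanding_window_forecasts(series, ar1_forecast, h)
    bench_preds, _ = expanding_window_forecasts(series, historical_average, h)
    gains[h] = 1 - rmse(model_preds, actual) / rmse(bench_preds, actual)
    n_forecasts[h] = len(actual)
    print(h, n_forecasts[h], round(gains[h], 3))
```

A positive gain means the model's RMSE is below the benchmark's. Slicing `series[:t + 1]` before every fit is what enforces the no-look-ahead rule.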

The headline result: random forest models reduce out-of-sample RMSE by 12% to 27% relative to the rolling average benchmark for tail (90th percentile) stress forecasts across all three market segments. The improvement is largest for the Treasury MCI at the 6-month horizon, where the random forest achieves a 27% RMSE reduction. For the FX and money market MCIs, the gains range from 15% to 22% depending on the horizon and quantile.

By contrast, multivariate OLS regression makes forecasts worse: it underperforms the simple historical average. This is not a minor underperformance; in some specifications, OLS increases RMSE by 30–40% relative to the benchmark, confirming severe overfitting. Even regularized linear models (LASSO, ridge regression) tested by the researchers show only marginal improvements over the rolling average, suggesting that the advantage of random forests stems from their ability to capture nonlinearity, not merely from better regularization.

Particularly noteworthy is the temporal pattern of forecasting success. The random forest models are most accurate during and immediately before stress episodes — exactly when accurate forecasts are most valuable. During calm periods, the model’s advantage over naive benchmarks is modest, which makes intuitive sense: when nothing unusual is happening, historical averages are reasonable predictors. The model earns its keep by detecting the early warning signs that precede stress regime shifts, capturing the nonlinear buildup of vulnerabilities that linear models cannot see.

Explaining Machine Learning Financial Stress Predictions with Shapley Values

A common criticism of machine learning in policy applications is the “black box” problem: complex models may deliver accurate predictions, but if policymakers cannot understand why a model is forecasting elevated stress, they cannot craft targeted interventions. The BIS researchers address this directly by applying SHAP (SHapley Additive exPlanations) values — a game-theory-based framework that decomposes each prediction into the additive contribution of each input variable.

SHAP values answer a precise question: how much does each predictor shift the forecast away from the baseline (average) prediction? For any given month, the SHAP decomposition shows which variables are pushing the stress forecast up and which are pulling it down. Aggregated across the full sample, SHAP values reveal the average importance of each variable category, providing a transparent ranking of what drives machine learning financial stress predictions.
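
For intuition, Shapley values can be computed exactly by brute force when a model has only a handful of inputs. The sketch below assumes a made-up three-variable stress model with a threshold interaction and a small hypothetical background sample; production SHAP libraries use much faster algorithms for tree ensembles, so this is an illustration of the definition rather than of any particular tool.

```python
from itertools import combinations
from math import factorial

def toy_model(features):
    """Hypothetical stress forecast with a threshold interaction."""
    funding, leverage, vix = features
    amplifier = 1.5 if (funding > 1.0 and leverage > 1.0) else 0.0
    return 0.5 * funding + 0.1 * vix + amplifier

def shapley_values(model, x, background):
    """Exact Shapley values; absent features are filled in from background rows."""
    n = len(x)

    def value(subset):
        total = 0.0
        for row in background:
            mixed = [x[j] if j in subset else row[j] for j in range(n)]
            total += model(mixed)
        return total / len(background)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contribution = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = value(set(subset) | {i})
                without_i = value(set(subset))
                contribution += weight * (with_i - without_i)
        phi.append(contribution)
    return phi

# Hypothetical background sample and the month being explained.
background = [[0.0, 0.0, 0.0], [0.5, 1.5, 1.0], [-0.5, 0.5, -1.0], [1.5, -0.5, 0.5]]
x = [2.0, 1.8, 0.3]          # funding stress, leverage, VIX (standardized units)
baseline = sum(toy_model(row) for row in background) / len(background)
phi = shapley_values(toy_model, x, background)
for name, contribution in zip(("funding", "leverage", "vix"), phi):
    print(name, round(contribution, 3))
```

The contributions add up to the gap between this month's forecast and the baseline forecast, which is the additivity property that makes the decomposition easy to present to a policy committee.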

The results are illuminating. For Treasury stress, the most important predictors are funding liquidity variables (TED spread, repo market conditions, cross-currency basis) and intermediary capacity measures (dealer positioning, balance sheet constraints). For FX stress, global financial cycle variables (VIX, dollar index, commodity prices) and fund flow dynamics dominate. For money market stress, the combination of funding liquidity and investor overextension indicators (mutual fund outflows, leverage ratios) carries the highest predictive weight.

These findings align with — and provide empirical support for — the theoretical liquidity spiral framework of Brunnermeier and Pedersen (2009), which posits that funding liquidity and market liquidity interact in self-reinforcing feedback loops during crises. The SHAP analysis shows that funding liquidity variables become disproportionately important in the tails of the stress distribution, precisely as the theory predicts.

Funding Liquidity, Investor Overextension, and the Global Financial Cycle as Key Stress Predictors

Diving deeper into the SHAP results reveals a narrative about the anatomy of financial stress that transcends any single model or market. Three predictor categories consistently emerge as the most important across all three MCIs, though their relative ranking shifts depending on the market segment and forecast horizon.

Funding liquidity captures the ease and cost with which financial intermediaries can obtain short-term financing. When funding markets tighten (as reflected in widening TED spreads, elevated repo rates, or dislocated cross-currency bases), the probability of stress across all market segments rises. This is intuitive: intermediaries facing funding pressure must deleverage, reducing market-making capacity and amplifying price dislocations across the assets they trade. The SHAP analysis quantifies this channel, showing that a one-standard-deviation deterioration in funding liquidity conditions raises the 90th percentile Treasury MCI forecast by approximately 0.4 standard deviations.

Investor overextension captures the vulnerability created when asset managers and leveraged investors are positioned aggressively. Variables like mutual fund flow imbalances, hedge fund leverage estimates, and ETF creation/redemption dynamics proxy for the degree to which investors are stretched. When these indicators signal extreme positioning, the system is fragile — vulnerable to shocks that trigger forced selling and self-reinforcing stress. The BIS data shows that investor overextension was elevated in the months preceding both the COVID-19 market crash and the 2022 gilt crisis in the UK.

The global financial cycle, proxied by the VIX, the broad dollar index, and global capital flow measures, captures the macroeconomic environment in which stress events unfold. Tighter global financial conditions — a strong dollar, elevated VIX, and capital flow reversals from emerging markets — create the backdrop against which local stress episodes escalate into broader crises. The interaction between global conditions and local vulnerabilities is precisely the kind of nonlinear dynamic that random forests capture and linear models miss.

Cross-Market Spillovers and Self-Reinforcing Stress Dynamics in Interconnected Financial Markets

One of the paper’s most significant findings concerns cross-market spillovers: stress in one market segment systematically predicts future stress in others. The researchers test this by including lagged MCIs from other market segments as predictors. The results confirm that Treasury stress predicts future money market stress (and vice versa), while FX stress has bidirectional predictive power for both Treasury and money market conditions.
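
Setting up such a test amounts to pairing today's MCIs from the other segments with the target segment's future value. The sketch below uses made-up series in which money market stress simply echoes Treasury stress one month later, and a plain lead-lag correlation stands in as a crude check; the helper names are hypothetical and the paper's own test relies on the forest and SHAP analysis rather than correlations.

```python
import math

def cross_market_design(mcis, target, horizon=1):
    """Predictors: other segments' MCIs at month t. Target: own MCI at t + horizon."""
    others = [name for name in mcis if name != target]
    n = len(mcis[target])
    X = [[mcis[name][t] for name in others] for t in range(n - horizon)]
    y = [mcis[target][t + horizon] for t in range(n - horizon)]
    return others, X, y

def correlation(a, b):
    """Pearson correlation of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    den = math.sqrt(sum((u - ma) ** 2 for u in a) * sum((v - mb) ** 2 for v in b))
    return num / den

# Made-up series: money market stress echoes Treasury stress one month later.
treasury = [math.sin(t / 3.0) for t in range(48)]
mcis = {
    "treasury": treasury,
    "fx": [math.cos(1.7 * t) for t in range(48)],
    "money": [0.0] + treasury[:-1],
}
names, X, y = cross_market_design(mcis, target="money", horizon=1)
lead_lag = {name: correlation([row[j] for row in X], y)
            for j, name in enumerate(names)}
print({name: round(value, 3) for name, value in lead_lag.items()})
```

In a fuller pipeline, the rows of `X` would be appended to the other predictors before the forest is trained, so that the spillover terms compete with every other variable for predictive weight.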

These spillover patterns are consistent with the interconnected nature of modern financial markets. Consider a scenario where Treasury market stress rises due to a sudden repricing of term premiums. Dealers holding large Treasury inventories face mark-to-market losses, which constrain their balance sheets and reduce their capacity to intermediate in repo and FX swap markets. The resulting tightening in funding markets (money market stress) raises hedging costs for international investors (FX stress), who may then reduce Treasury holdings, further amplifying the original stress. For an exploration of how financial system interconnections affect other market segments, see our coverage of stablecoins and Treasury yields.

The SHAP analysis of cross-market terms reveals an asymmetry: Treasury stress is a stronger predictor of money market stress than the reverse, suggesting that the Treasury market plays a central role in propagating stress across the financial system. This finding has direct implications for the Federal Reserve’s financial stability mandate, as it implies that Treasury market disruptions should receive heightened surveillance given their systemic amplification potential.

The self-reinforcing nature of these dynamics also explains why linear models fail: the relationship between stress in different markets is weak during normal times but strengthens dramatically during crises. A linear model estimates a single constant coefficient for these cross-market effects, averaging over calm and stressed periods. The random forest, by contrast, learns that cross-market spillovers matter most when stress is already elevated — precisely the threshold effect that drives crisis amplification.
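
The averaging problem can be shown with a stylized example. The response function below is hypothetical: pass-through from one market to another is weak below a stress threshold and strong above it. A single pooled regression slope then lands between the two regime slopes and describes neither regime well.

```python
def ols_slope(x, y):
    """Slope of a one-variable least-squares regression of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))

def spillover(source_stress, threshold=1.0):
    """Hypothetical response: weak pass-through below the threshold, strong above."""
    if source_stress <= threshold:
        return 0.1 * source_stress
    return 0.1 * threshold + 1.5 * (source_stress - threshold)

# Stress in the source market, from -2.0 to 3.0 standard deviations.
source = [i / 10 for i in range(-20, 31)]
response = [spillover(s) for s in source]

calm = [(s, r) for s, r in zip(source, response) if s <= 1.0]
stressed = [(s, r) for s, r in zip(source, response) if s > 1.0]

pooled_slope = ols_slope(source, response)
calm_slope = ols_slope(*zip(*calm))
stressed_slope = ols_slope(*zip(*stressed))
print(round(calm_slope, 3), round(pooled_slope, 3), round(stressed_slope, 3))
```

A tree-based model can place a split at the threshold and fit each side separately, which is why it does not need the regime boundary to be specified in advance.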

Policy Applications of Machine Learning for Central Banks and Financial Stability Monitoring

The practical implications of this research for central bank operations are substantial. Current financial stability monitoring frameworks rely heavily on a combination of market-based indicators (VIX, credit spreads, term premiums) and survey-based measures (senior loan officer surveys, financial conditions indices). These tools provide valuable information but suffer from three limitations that the ML framework addresses directly.

First, existing indicators are predominantly backward-looking or contemporaneous. The VIX tells you that markets are stressed now, not that they will be stressed in three months. The BIS random forest models provide genuine forward-looking forecasts at 1-, 3-, and 6-month horizons, giving policymakers a window of time to prepare macroprudential responses. A central bank financial stability committee receiving monthly fan charts from quantile regression forests would have a materially more informative picture of evolving risks than one relying solely on current market conditions.

Second, the SHAP-based decomposition provides actionable intelligence about the sources of risk. If the model forecasts elevated tail stress driven primarily by funding liquidity deterioration, the appropriate policy response (e.g., adjusting standing repo facility parameters) differs from a scenario where investor overextension is the primary driver (where communication about risk-taking might be more appropriate). This diagnostic capability transforms the ML model from a simple alarm system into a structured analytical framework for policy deliberation.

Third, the cross-market spillover analysis enables targeted surveillance. If the model identifies Treasury stress as the primary transmission channel to other markets, central bank monitoring can focus resources on Treasury market microstructure — dealer positioning, auction results, settlement fails — rather than spreading attention thinly across all market segments. This prioritization is especially valuable given the limited bandwidth of financial stability teams.

The European Central Bank’s Financial Stability Review has already begun incorporating ML-based stress indicators, and the BIS framework provides a rigorous template for other central banks to follow. The key advantage is that the framework is modular: new MCIs can be constructed for additional market segments (corporate bonds, crypto markets, emerging market debt), and the predictor set can be expanded as new data sources become available.

Limitations, Future Directions, and the Road to Real-Time AI-Powered Financial Risk Assessment

Despite its impressive results, the BIS framework has important limitations that future research should address. The most significant is the relatively small sample size: 239 monthly observations provide limited exposure to truly extreme stress events. The Global Financial Crisis is effectively a single episode in the extreme tail, and the model’s performance during the next unprecedented crisis is inherently uncertain. Researchers working on this frontier should explore data augmentation techniques, synthetic stress scenario generation, and transfer learning approaches that leverage cross-country data to expand the effective sample size.

The monthly frequency of the analysis also limits its operational utility for real-time monitoring. Financial crises unfold over days, not months. Extending the framework to weekly or daily frequency would dramatically increase its value for central bank operations but introduces challenges around data availability (many predictor variables are only available monthly) and noise amplification at higher frequencies. A promising intermediate approach would combine the monthly ML framework with higher-frequency market-based nowcasting models that update between monthly forecast cycles.

Model interpretability, while substantially improved by SHAP values, remains an ongoing challenge. SHAP decompositions are additive and local — they explain individual predictions but may not fully capture the global nonlinear structure of the random forest. Emerging techniques like SHAP interaction values and counterfactual explanations could provide richer narratives about why the model expects stress to rise, making the framework even more suitable for policy communication.

Finally, the transition from academic research to production deployment raises questions about model governance, retraining schedules, and performance monitoring. A central bank deploying this framework would need robust backtesting infrastructure, clear protocols for model updates when new data arrives, and transparent documentation of model limitations for policy committees. These operational considerations are often underemphasized in academic work but are critical for real-world adoption. The BIS team has laid a strong foundation — the next challenge is building the institutional infrastructure to deploy machine learning financial stress prediction at scale.

Frequently Asked Questions

How do machine learning models predict financial market stress better than traditional approaches?

Machine learning models like random forests capture complex nonlinear interactions among dozens of predictors simultaneously, something traditional linear regression cannot do. BIS Working Paper 1250 demonstrates that random forests reduce out-of-sample error in forecasting tail financial stress by up to 27% relative to a naive historical-average benchmark that multivariate linear models fail to beat, largely because forests automatically detect threshold effects and variable interactions that linear models miss entirely.

What are Market Condition Indicators (MCIs) and how do they differ from the VIX?

Market Condition Indicators are composite stress measures built using rolling-window Principal Component Analysis across multiple market segments — specifically US Treasury, foreign exchange, and money markets. Unlike the VIX, which captures equity-implied volatility only, MCIs reflect stress conditions in fixed income, currency, and short-term funding markets. This makes them complementary to the VIX and better at capturing stress episodes that originate outside equity markets, such as the 2019 repo crisis.

What are the most important predictors of future financial market stress?

According to SHAP value analysis in the BIS study, the most important predictors include funding liquidity measures (such as the TED spread and cross-currency basis), investor overextension indicators (like mutual fund flow imbalances and leverage ratios), and global financial cycle variables (including the dollar index and VIX). The relative importance of these predictors shifts depending on the market segment and whether you are forecasting median or tail stress conditions.

Why do multivariate linear regression models fail at out-of-sample financial stress prediction?

With 44 predictors and only 239 monthly observations, multivariate linear regression suffers from severe overfitting. The model fits noise in the training data rather than genuine predictive signals. BIS researchers found that linear models actually perform worse than simple historical averages in out-of-sample tests, while random forests — which use ensemble averaging and feature subsampling — naturally regularize against overfitting and extract robust signals even from high-dimensional datasets.

How can central banks use this ML framework for financial stability monitoring?

Central banks can deploy this framework as an early warning system that complements existing stress indices. The quantile regression forest approach provides not just point forecasts but entire probability distributions of future stress, enabling policymakers to assess tail risks and calibrate macroprudential responses. SHAP-based explanations make the model transparent enough for policy communication, identifying which specific risk factors are driving elevated stress probabilities at any given time.
