BIS Research: How Financial Regulators Can Address AI Explainability Challenges

📌 Key Takeaways

  • Non-Binary Approach: AI explainability exists on a spectrum rather than a yes/no determination
  • Risk-Based Calibration: Explainability requirements should match the criticality of the use case
  • Performance Trade-offs: Completely restricting complex AI models could limit risk management capabilities
  • Multiple Techniques Required: No single explainability method is sufficient for complex AI models
  • LLM Complexity: Large language models present the most acute explainability challenges for regulators

The AI Explainability Challenge in Financial Services

The rapid adoption of artificial intelligence in financial services has created a fundamental tension between innovation and regulation. According to groundbreaking research from the Bank for International Settlements (BIS), AI explainability has emerged as the top concern for financial institutions when engaging with regulators and supervisors.

This challenge isn’t academic. A recent Bank of England and Financial Conduct Authority survey revealed that half of respondents reported having only partial understanding of the AI technologies they use, particularly when relying on third-party models.

The stakes are significant: complex AI models offer superior performance in credit risk assessment, fraud detection, and regulatory compliance, but their “black box” nature makes it difficult for institutions to explain decisions to customers, regulators, and even internal stakeholders. This creates regulatory, prudential, and reputational risks that financial institutions must carefully navigate.

Understanding Explainability vs. Interpretability

The BIS research makes a crucial distinction that shapes how regulators should approach AI oversight. Explainability answers the question “why did the model produce this output?” while interpretability addresses “how did the model arrive at this output?”

Inherently interpretable models, such as decision trees and generalized additive models, have transparent structures: a decision tree's if-then-else splits, for instance, can be traced directly from inputs to output. These models are both explainable and interpretable. Complex AI models are different: a deep neural network with multiple hidden layers and thousands of parameters interacting non-linearly makes it virtually impossible to attribute outputs conclusively to specific input combinations.
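
To make the contrast concrete, here is a minimal sketch (using scikit-learn purely as an illustration; the BIS research does not prescribe any tooling or feature set) of how the complete rule set of a small decision tree can be printed and traced by a reviewer, something that has no practical analogue for a deep neural network:

```python
# Minimal sketch: printing the full if-then-else logic of an inherently
# interpretable model. Data and feature names are illustrative only.
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=500, n_features=4, random_state=0)
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Every path from root to leaf is a human-readable rule that can be audited.
print(export_text(tree, feature_names=["income", "debt_ratio", "tenure", "age"]))
```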

The research emphasizes that explainability is non-binary. Rather than demanding a simple yes/no determination, regulators should assess AI models on a spectrum, recognizing that different levels of explainability may be appropriate for different use cases and risk profiles.

Current Model Risk Management Gaps

Existing Model Risk Management (MRM) frameworks weren’t designed with advanced AI in mind. The Federal Reserve Board and Office of the Comptroller of the Currency guidance, for example, dates from April 2011 — predating modern AI developments entirely.

While current MRM guidelines implicitly address explainability through governance, documentation, validation, and monitoring provisions, they create a paradox: models that lack explainability in high-risk areas receive the highest risk ratings, so explainability requirements become most stringent precisely where they are hardest to meet.

The research identifies several gaps in current frameworks:

  • Limited coverage beyond regulatory capital purposes
  • Lack of specific guidance for AI model validation
  • Insufficient consideration of third-party and proprietary models
  • Ambiguous definitions of what constitutes a “model change” for continuously learning systems

Global Regulatory Landscape and Guidelines

Only five jurisdictions have issued specific MRM guidelines that address AI considerations: Canada (OSFI), Japan (FSA), UAE (CBUAE), UK (PRA), and the United States (FRB/OCC). Notably, the International Association of Insurance Supervisors (IAIS) 2025 guidance provides the most comprehensive international framework covering AI explainability for insurers.

Most existing guidelines focus primarily on models used for regulatory capital purposes, leaving broader AI applications in financial institutions with limited regulatory guidance. This creates uncertainty for institutions deploying AI across customer service, operations, and risk management functions.

The research suggests that regulatory approaches vary significantly in their treatment of explainability requirements, with some jurisdictions taking more prescriptive approaches while others rely on principles-based frameworks that leave implementation details to individual institutions.

Technical Approaches: SHAP, LIME, and Beyond

The BIS research provides detailed analysis of post-hoc explainability techniques that financial institutions can use to make black box models more transparent:

SHAP (Shapley Additive Explanations) uses game-theoretic Shapley values to attribute a model’s prediction to individual input features. For example, in insurance pricing: “Your premium was influenced primarily by your driving history (40% impact), vehicle type (30% impact), and location (20% impact).”
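
As an illustration of how such an attribution might be produced in practice, the sketch below applies the shap library to a tree-based pricing model; the data, feature names, and model choice are assumptions made for the example, not details taken from the BIS research:

```python
# Hypothetical sketch: attributing one policyholder's predicted premium to its
# input features with SHAP. Data, features, and model are illustrative only.
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = pd.DataFrame(rng.random((200, 3)),
                 columns=["driving_history", "vehicle_type", "location"])
y = 200 + 400 * X["driving_history"] + 300 * X["vehicle_type"] + 200 * X["location"]

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X.iloc[[0]])

# How much each feature pushed this premium above or below the average prediction.
print(dict(zip(X.columns, shap_values[0])))
```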

LIME (Local Interpretable Model-agnostic Explanations) fits simpler surrogate models to slightly perturbed data around specific data points to identify the most significant features influencing predictions. This technique is particularly useful for providing local explanations for individual decisions.
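
A sketch of a local LIME explanation for a single applicant might look like the following; again, the model, feature names, and synthetic data are illustrative assumptions rather than anything specified in the research:

```python
# Hypothetical sketch: a local LIME explanation for one credit decision.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from lime.lime_tabular import LimeTabularExplainer

rng = np.random.default_rng(0)
X_train = rng.random((200, 3))
y_train = (X_train[:, 0] + X_train[:, 1] > 1.0).astype(int)

model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)

explainer = LimeTabularExplainer(
    X_train,
    feature_names=["income", "debt_ratio", "credit_history"],
    class_names=["deny", "approve"],
    mode="classification",
)

# Perturb data around one applicant, fit a simple surrogate model locally,
# and report the features that most influenced this particular prediction.
explanation = explainer.explain_instance(X_train[0], model.predict_proba, num_features=3)
print(explanation.as_list())
```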

Counterfactual explanations identify the smallest changes to input features that would alter the prediction. For instance: “You were denied a loan because your annual income was £30,000. If your income had been £45,000, you would have been offered a loan.”
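
In its simplest form, a counterfactual can be found by searching over candidate inputs until the decision flips. The toy scoring rule and single-feature search below are illustrative assumptions; real implementations search over several features and constrain changes to plausible values. With the thresholds chosen here, the search returns 45,000, mirroring the loan example above.

```python
# Minimal sketch: searching for the smallest income increase that flips a denial.
# The scoring rule and thresholds are stand-ins for any underwriting model.
def loan_decision(income: float, debt_ratio: float) -> str:
    score = income * 0.001 - debt_ratio * 20
    return "approve" if score >= 35 else "deny"

def counterfactual_income(income: float, debt_ratio: float, step: float = 500.0):
    """Smallest income, searched upwards in `step` increments, that yields approval."""
    if loan_decision(income, debt_ratio) == "approve":
        return None  # already approved; no counterfactual needed
    candidate = income
    while candidate <= income * 3:  # bounded search
        if loan_decision(candidate, debt_ratio) == "approve":
            return candidate
        candidate += step
    return None

print(counterfactual_income(income=30_000, debt_ratio=0.5))  # -> 45000.0
```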

However, these techniques have significant limitations:

  • They can be inaccurate and unstable
  • They’re susceptible to manipulation
  • No ground truth exists for assessing correctness
  • They may provide misleading but plausible explanations

The Unique Challenge of Large Language Models

Large Language Models (LLMs) present the most acute explainability challenges for financial regulators. OpenAI’s GPT-3, for example, has 175 billion parameters — values adjusted during training on internet-scale data. These models:

  • Produce probabilistic outputs that vary even with identical inputs
  • Are trained on vast, often undisclosed datasets
  • Have propensity for hallucination and generating plausible but incorrect information
  • Rely on proprietary foundation models that firms cannot access or understand

Cognitive load theory suggests humans can hold only around seven rules or nodes in working memory at once, which makes it virtually impossible to fully comprehend the decision-making process of a model with millions or billions of parameters.

Chain-of-thought prompting and attribution graphs offer some promise for LLM explainability, but these techniques are still in their infancy and may not reflect actual internal reasoning processes.
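
To illustrate what chain-of-thought prompting looks like in this setting, here is a hypothetical prompt sketch; the wording and the call_llm placeholder are assumptions rather than a template from the research, and the generated rationale is a post-hoc narrative, not a window into the model's internal computation:

```python
# Hypothetical sketch of a chain-of-thought prompt for reviewing a credit decision.
PROMPT = """You are reviewing an automated credit decision.
Applicant: income 30,000, debt-to-income 45%, two missed payments in 24 months.
Decision: DENY.

Explain the decision step by step:
1. List each factor and whether it supports approval or denial.
2. State which factor was decisive.
3. Summarise the reasoning in one sentence suitable for the applicant.
"""

def call_llm(prompt: str) -> str:
    """Placeholder: wire this to whichever LLM client the institution uses."""
    raise NotImplementedError

# rationale = call_llm(PROMPT)
# Any rationale produced this way should be validated independently; it may be
# plausible without reflecting how the underlying model actually decided.
```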

Balancing Performance with Transparency

The research argues that regulators should explicitly recognize the trade-off between explainability and model performance. Prohibiting complex but high-performing AI models could prevent financial institutions from better managing risks and improving client experiences.

For example, Intesa Sanpaolo has successfully used machine learning to calculate regulatory capital for credit risk, demonstrating superior performance compared to traditional approaches. Requiring full explainability for such models could force institutions back to inferior risk management techniques.

The research suggests that explainability waivers could be appropriate where:

  • The performance gap is substantial and well-documented
  • The use case carries an appropriately low level of risk
  • Adequate safeguards are implemented
  • Enhanced monitoring and governance are in place

Risk-Based Tiering and Safeguards

Rather than applying uniform explainability requirements, the BIS research recommends a risk-based tiering approach. Credit underwriting decisions that significantly impact consumers should have higher explainability standards than email classification systems or operational efficiency tools.
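
One way an institution might operationalise such tiering is a simple mapping from use case to explainability expectations; the tier names and requirements below are illustrative assumptions, not thresholds proposed by the BIS:

```python
# Hypothetical sketch of a risk-based explainability tiering table.
EXPLAINABILITY_TIERS = {
    "credit_underwriting": {          # high consumer impact
        "tier": "high",
        "requirements": ["inherently interpretable model or multiple post-hoc methods",
                         "consumer-facing reason codes",
                         "independent validation"],
    },
    "fraud_detection": {              # material but mostly internal impact
        "tier": "medium",
        "requirements": ["post-hoc attribution (e.g. SHAP)",
                         "enhanced monitoring"],
    },
    "email_classification": {         # operational efficiency tool
        "tier": "low",
        "requirements": ["documented model purpose",
                         "periodic performance review"],
    },
}
```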

When complex models don’t fully meet explainability standards, institutions can implement compensating safeguards:

  • Circuit breakers: Automated mechanisms to halt model use in extreme or unexpected scenarios (a sketch follows this list)
  • Enhanced monitoring: Continuous oversight of model outputs for consistency and unusual patterns
  • Human oversight: Qualified staff review of model decisions, particularly for high-impact use cases
  • Stability testing: Regular checks that model outputs remain stable and reliable, complemented by third-party validation
  • Rapid response capability: Readiness to quickly cease model use when performance flaws are identified
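
The sketch below shows how the circuit breaker listed above might be wired around a model's outputs; the thresholds, anomaly counting, and fallback path are illustrative assumptions:

```python
# Minimal sketch of a circuit breaker: suspend automated use of a model when
# its outputs repeatedly fall outside an expected range.
from dataclasses import dataclass

@dataclass
class CircuitBreaker:
    score_min: float = 0.0
    score_max: float = 1.0
    max_anomalies: int = 3
    anomalies: int = 0
    tripped: bool = False

    def check(self, score: float) -> bool:
        """Return True if the score may be used; trip the breaker otherwise."""
        if not (self.score_min <= score <= self.score_max):
            self.anomalies += 1
        if self.anomalies >= self.max_anomalies:
            self.tripped = True  # route decisions to human review / fallback model
        return not self.tripped

breaker = CircuitBreaker()
for model_score in [0.4, 1.7, -0.2, 2.3]:  # simulated outputs, three out of range
    if not breaker.check(model_score):
        print("Circuit breaker tripped: route decision to manual review.")
```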

For regulatory capital use cases specifically, the research suggests that complex AI models could be restricted to certain risk categories and exposures, with risk weights subject to more stringent output floors than currently envisioned under Basel III frameworks.

Future of AI Regulation in Financial Services

The BIS research concludes that successful AI regulation in financial services will require significant investment in regulatory capacity. Supervisory authorities need staff capable of evaluating AI models and understanding explainability submissions — a challenge when these skills are in high demand across the industry.

Key recommendations for the future include:

  • Extending MRM guidance beyond regulatory capital to cover broader AI applications
  • Requiring institutions to establish acceptable explainability standards for relevant use cases
  • Mandating use of multiple explainability techniques rather than single methods
  • Regular updates to MRM guidelines reflecting evolving industry practices
  • Different explainability expectations for different audiences (supervisors vs. consumers)

The research emphasizes that the goal isn’t to stop AI innovation in financial services, but to ensure it develops within appropriate risk management frameworks. As AI capabilities continue to advance, regulatory approaches must evolve to balance innovation, safety, and transparency.

For financial institutions, this means proactively developing explainability capabilities, investing in appropriate safeguards, and engaging constructively with regulators to shape future frameworks. The institutions that successfully navigate these challenges will be best positioned to harness AI’s benefits while maintaining regulatory compliance and stakeholder trust.

Frequently Asked Questions

What is the difference between AI explainability and interpretability?

Explainability refers to why a model produced a specific output, while interpretability refers to how the model arrived at that output. Inherently interpretable models (like decision trees) are also explainable, but complex AI models may be made partially explainable through post-hoc techniques without ever being interpretable.

What are the main AI explainability techniques mentioned in the BIS research?

The key techniques include SHAP (Shapley Additive Explanations), LIME (Local Interpretable Model-agnostic Explanations), counterfactual explanations, and visualization techniques like ICE (individual conditional expectation) plots. Each has specific strengths and limitations.

Why do Large Language Models present unique explainability challenges?

LLMs have billions of parameters, are trained on internet-scale data, produce probabilistic outputs that vary even with identical inputs, and often rely on proprietary foundation models that firms cannot access or understand.

Should financial regulators require full explainability for all AI models?

The BIS research suggests a nuanced approach: explainability requirements should be risk-based and calibrated to the criticality of the use case. A complete ban on complex models could prevent institutions from better managing risks and improving customer experiences.

What safeguards can compensate for limited AI explainability?

Safeguards include circuit breakers, enhanced monitoring, human oversight, stability testing, frequent third-party validation, and readiness to rapidly cease model use when performance flaws are identified.
