
AI Governance Framework for Financial Services: SR 11-7, Model Risk, and Regulatory Expectations
Financial services AI is governed by SR 11-7, OCC guidance, and increasingly the EU AI Act. Here's how to build a model risk management framework that meets examiner expectations.
Financial services AI governance has a well-established regulatory foundation. SR 11-7 — the Federal Reserve's guidance on model risk management — has governed quantitative model validation since 2011 and applies directly to AI systems used in consequential financial decisions. OCC Bulletin 2011-12 extends equivalent expectations to national banks. CFPB's fair lending guidance applies to AI used in credit decisions. And the EU AI Act's high-risk classification captures most financial AI that affects consumer rights.
This isn't an emerging regulatory area. Financial regulators have clear expectations. Examiners are asking about AI governance. The question isn't whether your AI needs a governance framework — it's whether your framework meets the regulatory standard.
SR 11-7: The Model Risk Management Foundation
SR 11-7 defines a model as "a quantitative method, system, or approach that applies statistical, economic, financial, or mathematical theories, techniques, and assumptions to process input data into quantitative estimates." Most AI systems used in credit, risk, trading, and compliance functions fall squarely within this definition.
SR 11-7's three requirements apply to AI models:
Model development and implementation: The bank must understand the conceptual soundness of the model, the quality of the data used to develop it, the testing performed, and its limitations. For AI models, this means documenting training data provenance, validation methodology, performance metrics, and known failure modes, not just performance on benchmarks.
Model validation: An independent review of the model's conceptual soundness, data quality, and performance. Critically, the validation function must be independent of the model development and use functions. SR 11-7 calls this "effective challenge" — the ability to question model assumptions, data choices, and performance claims without organizational pressure to validate. AI models require the same independence as traditional quantitative models.
Governance and controls: Management oversight of model risk across the organization. This includes maintaining a model inventory, tracking model changes, defining risk tiers (high/medium/low) based on materiality, and ensuring that model changes go through appropriate review before deployment.
Model Inventory Requirements
Every AI model used in consequential financial decisions must be in your model inventory. "Consequential" means the output influences credit decisions, risk assessments, regulatory capital calculations, trading decisions, or customer-facing recommendations.
For each model, the inventory should record:
- Model name and unique identifier
- Owner (business line) and developer (internal or vendor)
- Model purpose and the decisions it informs
- Risk tier (high/medium/low) based on materiality and complexity
- Model type (statistical, ML, deep learning, LLM)
- Training data description and date
- Last validation date and outcome
- Current status (active / under review / retired)
- Known limitations and approved use conditions
- Third-party vendor relationship if applicable (including vendor's model documentation obligations)
Regulators expect the model inventory to be complete, current, and accessible to examiners. Gaps in inventory — models in production that aren't documented — are a significant examination finding.
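A minimal sketch of what an inventory record might look like in code; field names and enums are illustrative, not a regulatory schema, and should be aligned with your institution's MRM system:

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum

class RiskTier(Enum):
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"

class ModelStatus(Enum):
    ACTIVE = "active"
    UNDER_REVIEW = "under_review"
    RETIRED = "retired"

@dataclass
class ModelInventoryRecord:
    """One entry in the SR 11-7 model inventory (illustrative fields)."""
    model_id: str                       # unique identifier
    name: str
    owner: str                          # business line
    developer: str                      # internal team or vendor
    purpose: str                        # decisions the model informs
    risk_tier: RiskTier                 # based on materiality and complexity
    model_type: str                     # statistical / ML / deep learning / LLM
    training_data_description: str
    training_data_date: date
    last_validation_date: date
    last_validation_outcome: str
    status: ModelStatus
    known_limitations: list[str] = field(default_factory=list)
    vendor_relationship: str | None = None  # incl. documentation obligations
```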
AI-Specific Governance Challenges in Financial Services
SR 11-7 was written for statistical models. AI and machine learning systems present additional governance challenges that the 2011 guidance didn't fully anticipate, and that examiners are increasingly focused on.
Explainability: Traditional statistical models (logistic regression, scorecard models) produce interpretable outputs: you can trace a credit decision to specific input variables and their coefficients. Many AI models, particularly deep learning and large language models, don't produce this kind of explainability natively. For consumer credit decisions, ECOA and Regulation B require adverse action notices that identify specific reasons for denial. AI systems used in credit decisions must produce explanations that meet this standard, which may require additional tooling (SHAP values, LIME, attention-based explanations) on top of the base model.
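A sketch of how attribution tooling might feed adverse action reasons, assuming a tree-based scoring model and the open-source shap package; the function name and class handling are illustrative, and translating raw attributions into Regulation B reason codes still needs a compliance-reviewed mapping:

```python
import numpy as np
import shap  # open-source attribution library; one option among several

def adverse_action_reasons(model, applicant_row, feature_names, top_n=4):
    """Illustrative sketch: rank the features pushing one applicant's
    score toward denial. Assumes a tree-based model where a higher
    output means approve; not a complete Regulation B solution."""
    explainer = shap.TreeExplainer(model)
    contrib = explainer.shap_values(applicant_row)
    if isinstance(contrib, list):   # some model types return per-class lists
        contrib = contrib[-1]       # attributions for the positive class
    contrib = np.asarray(contrib).reshape(-1)
    # The most negative contributions are the strongest denial drivers
    order = np.argsort(contrib)[:top_n]
    return [(feature_names[i], float(contrib[i])) for i in order]
```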
Distributional shift: AI models trained on historical data may behave differently when market conditions, customer demographics, or economic conditions change. A credit model trained on pre-pandemic data performed poorly during the pandemic — but the failure was gradual and not immediately visible from aggregate performance metrics. Financial AI governance must include distributional shift monitoring: tracking whether the distribution of inputs to the model is drifting from the training distribution, as a leading indicator of performance degradation.
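One widely used drift metric in credit modeling is the Population Stability Index (PSI). A minimal sketch with numpy; the 0.1 and 0.25 alert thresholds in the docstring are industry rules of thumb, not regulatory values:

```python
import numpy as np

def population_stability_index(expected, actual, bins=10):
    """PSI between the training-time (expected) and production (actual)
    distribution of one input or score. Common rules of thumb: below
    0.1 stable, 0.1-0.25 moderate shift, above 0.25 significant shift
    warranting investigation."""
    # Bin edges taken from the training distribution's quantiles
    edges = np.quantile(expected, np.linspace(0.0, 1.0, bins + 1))
    expected_pct = np.histogram(expected, edges)[0] / len(expected)
    # Clip production values into range so every observation lands in a bin
    actual_clipped = np.clip(actual, edges[0], edges[-1])
    actual_pct = np.histogram(actual_clipped, edges)[0] / len(actual)
    # Floor proportions to avoid log(0) on empty bins
    expected_pct = np.clip(expected_pct, 1e-6, None)
    actual_pct = np.clip(actual_pct, 1e-6, None)
    return float(np.sum((actual_pct - expected_pct)
                        * np.log(actual_pct / expected_pct)))
```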
Vendor model opacity: Many financial services AI products are built on vendor-managed models where the institution doesn't have full access to training data, model architecture, or validation results. SR 11-7 requires the institution to conduct appropriate due diligence on vendor models and cannot fully delegate model risk management to the vendor. If your AI vendor can't provide sufficient model documentation, you cannot validate the model to SR 11-7 standards — and you may not be able to use it for consequential decisions.
Model change management: Cloud AI APIs update their models without formal change control processes equivalent to what SR 11-7 requires. When an API provider updates the model behind your endpoint, you may have a different model in production than the one that was validated. This is a model risk management failure: you are operating an unvalidated model. Contract provisions requiring change notification and testing windows mitigate this risk; owning the model eliminates it entirely.
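For models you host yourself, this control can be enforced mechanically. A minimal sketch (function name hypothetical) that refuses to serve a model artifact whose hash differs from the one recorded at validation sign-off:

```python
import hashlib
from pathlib import Path

def verify_model_artifact(path: str, validated_sha256: str) -> None:
    """Refuse to serve a model artifact that differs from the version
    that went through validation: a simple change-control gate."""
    digest = hashlib.sha256(Path(path).read_bytes()).hexdigest()
    if digest != validated_sha256:
        raise RuntimeError(
            f"Model at {path} does not match the validated version; "
            "re-validation is required before deployment."
        )
```

For vendor APIs, the closest equivalent is logging the provider-reported model version on every call and alerting when it changes.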
The Effective Challenge Requirement
SR 11-7's effective challenge requirement is the governance control most commonly violated in AI deployments. The team that builds or uses the model should not be the team that validates it. The validator must have:
- Independence from the model development team organizationally
- Access to model documentation, training data, and performance records
- Authority to require model changes or withdrawal of approval
- Resources to perform meaningful independent testing
For AI systems, effective challenge requires that validators can re-run the model on independent test data, understand the training pipeline well enough to identify conceptual soundness issues, and assess performance across demographic subgroups for fair lending purposes.
Practical implementation: define the validation function clearly (internal model risk group, external validator, or combination). Define what documentation the development team must provide to the validator. Establish a validation schedule by risk tier (high-risk models: annual; medium: biennial). Document validation findings, management responses, and remediation timelines.
Fair Lending and Algorithmic Bias
AI used in credit decisions is subject to ECOA (Equal Credit Opportunity Act) and the Fair Housing Act, which prohibit credit discrimination based on protected characteristics. The CFPB and federal banking regulators expect financial institutions to monitor AI credit models for disparate impact — when a model produces outcomes that disproportionately harm protected groups, even without discriminatory intent.
Fair lending governance for AI models requires:
Disparate impact testing: Before deployment and at regular intervals, test model outputs across protected classes (race/ethnicity, sex, national origin, age) to identify disproportionate outcomes. The threshold follows fair lending precedent: under the four-fifths rule of thumb, a disparate impact ratio below 80% typically triggers review.
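A minimal sketch of the ratio computation; the 0.80 threshold reflects the four-fifths rule of thumb, and the array semantics are assumptions about your decision data:

```python
import numpy as np

def disparate_impact_ratio(approved, group, protected, reference):
    """Approval rate of a protected group divided by that of a
    reference group. Under the four-fifths rule of thumb, a ratio
    below 0.80 flags the model for fair lending review."""
    approved = np.asarray(approved, dtype=bool)
    group = np.asarray(group)
    protected_rate = approved[group == protected].mean()
    reference_rate = approved[group == reference].mean()
    return float(protected_rate / reference_rate)
```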
Proxy variable analysis: AI models can learn to use proxy variables (zip code, name, shopping patterns) that correlate with protected characteristics. Governance must include analysis of whether the model is effectively using prohibited characteristics through proxies.
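One screening approach, sketched below, is to test how well the model's own feature set predicts the protected characteristic: if a simple classifier achieves an AUC well above 0.5, the features jointly encode a proxy. Assumes scikit-learn; the interpretation threshold is a judgment call, not a standard:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def proxy_risk_auc(X, protected_class) -> float:
    """Cross-validated AUC of a simple classifier predicting the
    protected characteristic from the model's input features. An AUC
    far above 0.5 means the feature set encodes a proxy for it."""
    clf = LogisticRegression(max_iter=1000)
    scores = cross_val_score(clf, X, protected_class,
                             cv=5, scoring="roc_auc")
    return float(np.mean(scores))
```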
Adverse action explanation: Ensure the model can produce adverse action notices that identify specific, accurate reasons for credit denial as required by Regulation B. Generic explanations ("algorithmic factors") may not meet the regulatory standard.
Subgroup performance monitoring: Track model performance (accuracy, false positive rate, false negative rate) separately by demographic subgroup as part of ongoing monitoring. Diverging performance across groups is a bias signal.
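A sketch of per-group monitoring with pandas; the y_true, y_pred, and group column names are assumptions about your monitoring dataset:

```python
import pandas as pd

def subgroup_rates(df: pd.DataFrame) -> pd.DataFrame:
    """Accuracy, false positive rate, and false negative rate per
    demographic group. Expects columns: group, y_true (1 = actual
    default), y_pred (1 = predicted default)."""
    def rates(g: pd.DataFrame) -> pd.Series:
        tp = int(((g.y_pred == 1) & (g.y_true == 1)).sum())
        tn = int(((g.y_pred == 0) & (g.y_true == 0)).sum())
        fp = int(((g.y_pred == 1) & (g.y_true == 0)).sum())
        fn = int(((g.y_pred == 0) & (g.y_true == 1)).sum())
        return pd.Series({
            "accuracy": (tp + tn) / len(g),
            "fpr": fp / max(fp + tn, 1),  # guard against empty cells
            "fnr": fn / max(fn + tp, 1),
        })
    # Diverging rates across groups between validation cycles are a
    # bias signal worth escalating to the model risk function
    return df.groupby("group").apply(rates)
```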
Human-in-the-Loop Requirements for Financial AI
For high-risk financial decisions, human-in-the-loop structures are both a governance requirement and a risk management practice.
Credit decisions: Automated credit decisions at the margin (borderline applicants) should include human review. Define the score range or risk tier where human review is required. Automated decisions at the extremes (clearly approvable or clearly declinable) present lower risk; it's the borderline population where AI errors are most likely and most consequential.
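A minimal sketch of that routing logic; the thresholds are illustrative and would come from your model's validated score distribution:

```python
def route_application(score: float,
                      approve_above: float = 0.80,
                      decline_below: float = 0.40) -> str:
    """Route clearly-scored applications automatically and send the
    borderline band to human review. Thresholds are illustrative."""
    if score >= approve_above:
        return "auto_approve"
    if score < decline_below:
        return "auto_decline"
    return "human_review"   # borderline band: mandatory human review
```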
Market risk and trading: AI-assisted trading systems should have human oversight mechanisms that can identify and interrupt anomalous behavior. The 2010 Flash Crash is the reference case — automated systems operating without meaningful human oversight can amplify market volatility rapidly.
Fraud and AML: AI-generated fraud and AML alerts require human review before a Suspicious Activity Report (SAR) is filed. The BSA requirement for meaningful review applies regardless of how the alert was generated.
Model output monitoring: Assign responsibility for reviewing AI model output quality to a specific function. This is different from validation — it's ongoing operational monitoring that detects performance degradation between formal validation cycles.
Audit Trail Specification
Financial services AI audit trails must satisfy both internal model risk management and regulatory examination requirements. Minimum fields per AI query for consequential decisions:
| Field | Value |
|---|---|
| Decision ID | Unique per application or transaction |
| Timestamp | UTC |
| Model ID | Specific model version from inventory |
| Input summary | Key variables that drove the output (SHAP values for credit models) |
| Model output | Score, recommendation, or classification |
| Decision result | Approved / Denied / Referred for review |
| Human reviewer | If applicable, reviewer identity and decision |
| Adverse action codes | For credit denials |
Retention: ECOA requires retention of credit application records for 25 months; BSA records for 5 years. AI decision records should match the underlying transaction retention requirement.
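As a sketch, one decision record serialized for the audit log might look like the following; field names mirror the table above but are illustrative, not a regulatory schema:

```python
import json
import uuid
from datetime import datetime, timezone

def decision_record(model_id, input_drivers, model_output, result,
                    reviewer=None, adverse_action_codes=None):
    """One audit-trail entry per consequential AI decision, mirroring
    the minimum fields above. Field names are illustrative."""
    return json.dumps({
        "decision_id": str(uuid.uuid4()),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model_id": model_id,                 # version from the inventory
        "input_summary": input_drivers,       # e.g. top SHAP drivers
        "model_output": model_output,         # score or classification
        "decision_result": result,            # approved/denied/referred
        "human_reviewer": reviewer,           # identity and decision if any
        "adverse_action_codes": adverse_action_codes or [],
    })
```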
Vendor Model Due Diligence
For AI vendors whose models are used in consequential financial decisions, SR 11-7 requires due diligence documentation that black-box vendors often cannot or will not provide in full. At minimum, request:
- Model development documentation (methodology, training data description, validation history)
- Performance metrics by demographic subgroup for fair lending relevant models
- Third-party validation reports if available
- Data handling and security practices (SOC 2, ISO 27001)
- Change notification process and version control practices
- Contractual commitment to provide model documentation for examination purposes
If the vendor cannot provide adequate documentation for model risk management, you cannot use that model for SR 11-7 regulated decisions. This is not negotiable with examiners.
The Model Ownership Advantage
For financial institutions that own their fine-tuned models, several SR 11-7 governance challenges simplify substantially:
- Change management: Model versions are under your control. You decide when to update. Validation is triggered by your decision, not by a vendor's API update.
- Documentation completeness: You have access to training data, model architecture, and performance metrics. Validation documentation can be complete.
- Explainability: You can instrument the model with explainability tooling appropriate to your use case.
- Audit trail: Inference runs on your infrastructure, integrated with your existing audit logging.
The governance overhead of owned models is higher than using a cloud API — but it's the right kind of overhead, aligned with SR 11-7's requirements rather than in tension with them.
Ertas Data Suite provides complete audit trails, operator-level logging, and on-premise inference for financial services organizations where data sovereignty and model documentation are non-negotiable. For the fine-tuning pipeline, Ertas Studio handles training on your data and export to deployment-ready GGUF format.