Banking's Model Risk Framework Wasn't Built for LLMs. Regulators Just Admitted It — Now Banks Have a Window to Act. | SuperML.dev

There’s a telling moment buried in the Office of the Comptroller of the Currency’s Spring 2026 Semiannual Risk Perspective. Amid the standard language about credit quality and commercial real estate, regulators flag that the governance challenges of advanced AI — “lack of explainability, data privacy and data poisoning issues, cybersecurity threats, and validation challenges where industry approaches are evolving” — make “appropriate governance and risk management essential for risk mitigation.” And then, in the same breath, they note that the OCC, the FDIC, and the Federal Reserve plan to issue a request for information on model risk management “in the near future” that will specifically address banks’ use of AI, including generative AI and agentic AI.

Read that again: the three principal banking regulators in the United States are about to formally ask the industry how it’s governing AI — because they don’t yet know what good looks like, and neither does the industry. That is not a bureaucratic footnote. That is a starting gun.

The Fed gave an even more pointed signal. Federal Reserve Vice Chair for Supervision Michelle Bowman, speaking on May 1, 2026, said explicitly that the Federal Reserve’s recently amended model risk management guidance “applies only narrowly to traditional models and basic AI applications” and does not yet extend to generative or agentic AI. For anyone who has been telling their board that SR 11-7 compliance covers the bank’s LLM deployments: it doesn’t. The Fed’s top supervisory official just said so.

Why SR 11-7 Was Never Going to Work for LLMs

The Supervisory Guidance on Model Risk Management (SR 11-7 / OCC Bulletin 2011-12) was published in April 2011. The iPhone 4 was out. Excel was the dominant “model” in most bank risk departments. The framework is built on three pillars: model development, implementation, and use; model validation; and governance, policies, and controls. It is rigorous, well-designed, and thoroughly wrong for probabilistic language systems.

The traditional model risk framework assumes that you can specify inputs, document assumptions, validate outputs against known benchmarks, and back-test performance over time. A logistic regression predicting credit default has a defined feature set, a bounded output space, and a performance metric that maps cleanly to business reality. You can stress it, challenge its assumptions, and explain exactly why it predicted what it predicted.

None of that applies cleanly to an LLM deployed in a banking workflow. The “model” is a multi-billion-parameter system pre-trained on data that no bank owns or has audited. Its outputs are non-deterministic: run the same prompt twice and you may get different answers. “Validation” as traditionally defined requires a ground truth you can compare against, but when an LLM is drafting SAR narratives or summarizing credit analysis, the ground truth is often “what a skilled analyst would write,” which is expensive to collect and impractical to back-test at scale. And explainability, the cornerstone of model risk governance in credit and market risk, remains largely unsolved for transformer architectures at inference time.
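One way to make the non-determinism point concrete is to measure it. The sketch below is illustrative only: `agreement_rate`, `make_fake_llm`, and `deterministic_model` are hypothetical names, and the fake model is a seeded random stand-in, not a real LLM client. It runs the same prompt repeatedly and reports the share of runs that return the most common output. Treat it as a rough consistency signal, not as a validation method.

```python
import random
from collections import Counter
from typing import Callable


def agreement_rate(generate: Callable[[str], str], prompt: str, runs: int = 20) -> float:
    """Share of runs that return the single most common output for one prompt."""
    outputs = [generate(prompt) for _ in range(runs)]
    most_common_count = Counter(outputs).most_common(1)[0][1]
    return most_common_count / runs


def deterministic_model(prompt: str) -> str:
    """Stand-in for a traditional model: same input, same output."""
    return "escalate"


def make_fake_llm(seed: int) -> Callable[[str], str]:
    """Hypothetical stand-in for a sampled LLM call (assumption, not a real API)."""
    rng = random.Random(seed)

    def fake_llm(prompt: str) -> str:
        # Mostly returns "escalate", sometimes "close", regardless of the prompt.
        return rng.choice(["escalate", "escalate", "escalate", "close"])

    return fake_llm


print(agreement_rate(deterministic_model, "Triage alert 123"))  # 1.0
print(agreement_rate(make_fake_llm(7), "Triage alert 123"))     # typically below 1.0
```

A score of 1.0 is what the traditional framework implicitly assumes. Anything lower means a single test run says little about what the system will do in production.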

This is not a criticism of LLMs. It is a structural description of why the governance tooling built for one class of model does not transfer to another.

The OCC Just Gave Banks Permission — and a Deadline

The Spring 2026 Risk Perspective contains one piece of language that deserves particular attention: the OCC states that banks “may consider expanding their use of generative AI and agentic AI for material financial decisions.” That is a significant shift in tone from previous supervisory communications. Regulators are no longer treating AI in material financial decisions as a fringe experiment. They are acknowledging it as a coming reality that the existing framework is not equipped to govern.

The same report is clear that the current moment is one of supervised tolerance, not approved practice. The OCC “supports banks’ efforts to integrate AI into core functions, while managing the risk in a safe and sound manner,” but it is “actively reviewing” its own supervisory expectations and guidance. Translation: the current framework is provisional. The RFI is the first step toward replacing it.

For bank AI teams, this creates a specific window — probably 12 to 18 months — before formal guidance arrives and constrains design choices. That window is not a vacation from governance. It is an opportunity to build governance infrastructure around what actually works rather than waiting for regulators to prescribe what they think should work based on testimony from vendors and consultants. Banks that go into the RFI comment period with operational experience running LLMs under documented governance programs will have a structural advantage in shaping the rules. Banks that wait will be shaped by them.

The Agentic AI Layer Makes This Harder

The timing of the OCC report is not coincidental. Banks are moving toward agentic AI at exactly the moment when the governance framework is most uncertain. Fiserv launched agentOS in May 2026 with live deployments across AML triage, loan onboarding, and deposit analysis. FIS announced a Financial Crimes AI Agent built with Anthropic, aimed at compressing AML case investigations from days to minutes. JPMorgan, Citi, and BNY are all building toward agentic infrastructure.

Each of those deployments inherits the model risk gap — and adds new ones. An AI agent doesn’t just produce a prediction; it takes actions. It queries systems, drafts documents, routes cases, and in some architectures, initiates transactions. The governance question is no longer just “was the model output accurate?” It becomes “did the agent take the right action, with the right permissions, under the right conditions, with a recoverable audit trail, at a speed that human oversight cannot fully monitor in real time?”

That last phrase, from American Banker’s reporting on bank AI continuity risks, is worth sitting with. Banking’s core systems were designed for deterministic, auditable processes. A core system is the operational nerve center of a bank, and agents that can read from and write to that environment create a very different risk profile. When an agent makes a decision in 400 milliseconds that would have taken a compliance officer four hours, real-time human oversight has been bypassed by design.

The OCC’s report acknowledges this directly, noting that AI increases “the speed, scale, and sophistication of cyberattacks,” but the same speed-and-scale dynamic applies to operational failure. A misconfigured agentic workflow in a large bank doesn’t fail at human speed. It fails at machine speed, across thousands of cases, before anyone notices the error pattern in the logs.

The SuperML Take

The OCC report and the Bowman speech together represent a specific kind of regulatory communication: a public acknowledgment that existing frameworks are inadequate, combined with a signal that new frameworks are coming. This is how regulators prime an industry for change. The SR 11-7 framework took years to develop after the 2008 crisis exposed model risk failures at scale. The forthcoming AI guidance is being telegraphed earlier, which is a genuine improvement — but it also means the window for proactive positioning is shorter than many banks realize.

The production-ready version of this story is not “regulators are concerned about AI.” That’s been true for three years. The production-ready version is: the governance instruments that banks have relied on since 2011 have been explicitly declared inadequate for the AI systems banks are currently deploying. The gap between current deployment and current governance is not a gray area. It is an acknowledged gap, documented in a federal regulatory publication, with a formal inquiry process incoming.

For senior AI engineers and AI-forward finance executives, the practical question is not whether governance will get harder — it will — but whether your institution is building governance infrastructure that will survive contact with the new rules or one that will need to be rebuilt when the RFI becomes a bulletin. The banks that are instrumenting their LLM pipelines for auditability now, building human-in-the-loop checkpoints into agentic workflows, and documenting their validation approaches for probabilistic models are not just doing good engineering. They are building regulatory defensibility before the specific requirements are known.

Six to twelve months from now, when the OCC/FDIC/Fed RFI drops, the comment period will largely be theater. The real influence will come from institutions that can say “here is what governance looks like in practice, here is what we tested, here is what failed, and here is why our framework accounts for agentic behavior that SR 11-7 was never designed to capture.” That is the institution that gets cited in the final guidance. The institution that waited to read the guidance before building governance gets cited in the exam findings.

The gap between where bank AI governance is today and where it will need to be is not subtle. It is documented, acknowledged, and on a regulatory timetable. The question is who closes that gap on their own terms.

Architecture Impact

What changes in system design? Banks deploying LLMs or agentic AI for material financial decisions must begin treating these systems as model-risk-managed assets even under frameworks not yet designed for them. This means building parallel governance infrastructure: prompt versioning and change management analogous to model version control, human-in-the-loop checkpoints mapped to materiality thresholds, and output logging sufficient to reconstruct the reasoning chain behind any consequential decision. Agentic workflows require event-level audit logs — agent → tool → action → outcome — not just model-level performance dashboards.
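A minimal sketch of what an event-level audit log could look like, assuming a simple in-memory store. Every name here (`AgentEvent`, `AuditLog`, the field names, the example agent and tool identifiers) is hypothetical and not drawn from any vendor's product or regulatory text. Each record captures the agent, the prompt version, the tool, the action, and the outcome, and each record stores the hash of the previous one so that after-the-fact edits become detectable.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass(frozen=True)
class AgentEvent:
    agent_id: str
    prompt_version: str  # ties the event to a versioned prompt
    tool: str
    action: str
    outcome: str
    timestamp: str       # ISO 8601 string supplied by the caller
    prev_hash: str       # digest of the previous event in the chain

    def digest(self) -> str:
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


class AuditLog:
    GENESIS = "0" * 64  # placeholder hash for the first event

    def __init__(self) -> None:
        self._events: List[AgentEvent] = []

    def append(self, agent_id: str, prompt_version: str, tool: str,
               action: str, outcome: str, timestamp: str) -> AgentEvent:
        prev = self._events[-1].digest() if self._events else self.GENESIS
        event = AgentEvent(agent_id, prompt_version, tool, action,
                           outcome, timestamp, prev)
        self._events.append(event)
        return event

    def verify(self) -> bool:
        """Recompute the chain; False if any stored link no longer matches."""
        prev = self.GENESIS
        for event in self._events:
            if event.prev_hash != prev:
                return False
            prev = event.digest()
        return True


log = AuditLog()
log.append("aml-triage-agent", "prompt-v12", "case_db", "read_alert",
           "ok", "2026-05-01T14:00:00Z")
log.append("aml-triage-agent", "prompt-v12", "case_router", "route_case",
           "sent_to_l2_review", "2026-05-01T14:00:01Z")
print(log.verify())  # True
```

One limitation of this sketch: the chain only detects changes to events that have a successor. Protecting the most recent record would require anchoring the latest digest somewhere outside the log, and a production system would also need durable, access-controlled storage.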

What new failure mode appears? The governance gap creates a specific liability pattern: a bank deploys an agentic AI system under informal governance, the system produces a systematic error at scale (misclassified AML alerts, incorrect credit decisions, improperly routed cases), and when regulators ask for the model validation documentation, the bank cannot produce it because the governance framework was never designed for this class of model. This is not a hypothetical. It is the SR 11-7 playbook applied to a new failure domain: the damage is not from the model failing technically, but from the institution failing to govern it.

What enterprise teams should evaluate:

Model risk teams: Begin scoping LLM validation procedures now — what does “challenging model assumptions” mean for a fine-tuned foundation model? What does back-testing mean for generative outputs? Build draft answers before the RFI prescribes them.
AI engineering teams: Instrument every production LLM and agentic workflow with structured output logging, version tracking, and human override audit trails. The audit trail is the governance artifact. If you can’t produce it, you can’t defend the deployment.
Compliance and legal teams: Map each LLM use case against the OCC’s materiality language — “material financial decisions” — and prioritize governance depth accordingly. Not every AI tool needs SR 11-7 treatment; the ones touching credit, AML, or customer-facing risk do.
CISOs and operational risk teams: Account for agentic AI failure modes in business continuity plans. An agent workflow that fails at machine speed across thousands of cases requires a different incident response playbook than a human-operated process.
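The materiality mapping described for compliance and legal teams can be sketched as a small classification rule. This is an assumption-laden illustration: the tier names, the `UseCase` fields, and the middle tier for action-taking agents are hypothetical choices, not language from the OCC. The only rule taken from the text above is that use cases touching credit, AML, or customer-facing risk get the deepest treatment.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    FULL = "full model-risk treatment"
    STANDARD = "audit logging and periodic review"  # hypothetical middle tier
    LIGHT = "inventory entry only"                  # hypothetical lowest tier


@dataclass(frozen=True)
class UseCase:
    name: str
    touches_credit: bool = False
    touches_aml: bool = False
    customer_facing_risk: bool = False
    takes_actions: bool = False  # True for agentic workflows


def governance_tier(use_case: UseCase) -> Tier:
    """Assign governance depth by materiality (illustrative rule only)."""
    if (use_case.touches_credit or use_case.touches_aml
            or use_case.customer_facing_risk):
        return Tier.FULL
    if use_case.takes_actions:
        # Assumption: agents that act get more scrutiny than read-only tools.
        return Tier.STANDARD
    return Tier.LIGHT


inventory = [
    UseCase("SAR narrative drafting", touches_aml=True),
    UseCase("IT ticket routing agent", takes_actions=True),
    UseCase("internal meeting summarizer"),
]
for use_case in inventory:
    print(f"{use_case.name}: {governance_tier(use_case).value}")
```

The value of writing the rule down, even in this crude form, is that the tiering decision itself becomes versioned and reviewable instead of living in a spreadsheet or in someone's head.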

Cost / latency / governance / reliability implications: Adding proper governance to LLM deployments is not free. Structured output logging for a high-volume AI pipeline adds storage and compute overhead — estimates from production deployments suggest 8–15% additional cost for comprehensive audit instrumentation at scale. More significant is the latency of human-in-the-loop checkpoints: inserting a mandatory human review step into an agentic AML triage workflow that currently runs in seconds can collapse throughput by an order of magnitude if not designed carefully. The governance architecture needs to distinguish between synchronous review (required for irreversible actions) and asynchronous audit (sufficient for recoverable decisions), or banks will end up with governance theater that satisfies the letter of future requirements while breaking the operational economics that justified AI deployment in the first place.
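The split between synchronous review and asynchronous audit can be sketched as a routing decision made per action. The names `ProposedAction`, `ReviewMode`, and `dispatch` are hypothetical, and the single `reversible` flag is a simplifying assumption; a real deployment would likely weigh amount, customer impact, and other factors.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class ReviewMode(Enum):
    SYNCHRONOUS = "hold until a human approves"
    ASYNCHRONOUS = "execute now, queue for later audit"


@dataclass(frozen=True)
class ProposedAction:
    case_id: str
    description: str
    reversible: bool  # assumption: set upstream by the workflow designer


def review_mode(action: ProposedAction) -> ReviewMode:
    """Irreversible actions block on a human; recoverable ones are audited later."""
    if action.reversible:
        return ReviewMode.ASYNCHRONOUS
    return ReviewMode.SYNCHRONOUS


def dispatch(actions: List[ProposedAction]) -> Tuple[List[ProposedAction], List[ProposedAction]]:
    """Split a batch into (held for approval, executed and queued for audit)."""
    held: List[ProposedAction] = []
    audit_queue: List[ProposedAction] = []
    for action in actions:
        if review_mode(action) is ReviewMode.SYNCHRONOUS:
            held.append(action)
        else:
            audit_queue.append(action)
    return held, audit_queue


batch = [
    ProposedAction("case-001", "add note to case file", reversible=True),
    ProposedAction("case-002", "file SAR with regulator", reversible=False),
    ProposedAction("case-003", "re-prioritize alert queue", reversible=True),
]
held, audit_queue = dispatch(batch)
print([a.case_id for a in held])         # ['case-002']
print([a.case_id for a in audit_queue])  # ['case-001', 'case-003']
```

In this toy batch only one of three actions waits on a human, which is the throughput argument in miniature: the blocking path is reserved for the cases where a mistake cannot be undone.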

What to Watch

The forthcoming OCC/FDIC/Federal Reserve RFI on model risk management is likely to be the most important regulatory document in US banking AI governance since SR 11-7 itself. Watch for its publication date: the window between publication and comment deadline will be short, and the comment period is when practical architecture choices start hardening into regulatory expectations. Watch also for Federal Reserve Vice Chair Bowman’s follow-up speeches on AI supervision; she has signaled an intent to assess whether the Fed’s own guidance is “fit for the future,” which could produce interim guidance before the tri-agency RFI.

On the industry side, watch how Fiserv’s agentOS governance framework evolves under live banking deployments — specifically whether Salem Five, City National, or the other pilot institutions publish anything about their audit and override procedures. If any of the major banks comment publicly on their agentic AI governance programs, that commentary will shape the regulatory conversation. And watch the OCC’s exam findings over the next 12 months: if the first enforcement actions related to AI model governance start appearing, the urgency of the architecture conversation will change dramatically.
