AI Agent Gateways: What They Catch, and What They Miss.

A finance operating view on where AI agent gateways help, where they miss release, representation, and training-time failures, and why visibility and evaluation come before control.

Gate the action.

The new AI agent gateway argument is mostly right. As agents move into production, the case goes, governance cannot sit in a quarterly review. It has to sit in the path of the action. Put a gateway or control-plane layer between the agent and the systems it can touch. Validate intent, authority, and policy before anything executes. Sign every action. Keep the blast radius small.

I agree with most of that. Finance ran a version of this play a decade ago, well before anyone said the word agent. The interesting question is not whether the gateway is a good idea. It is what the gateway catches, and what slips past it. Because the failures everyone cites to justify the gateway are not, on closer reading, the failures the gateway prevents.

This is the long version of that distinction, written for the person who has to approve the architecture.

The failures everyone cites.

Three cases come up in almost every version of this argument. They are worth getting exactly right, because they do not share a root cause.

Knight Capital lost more than $460 million in about 45 minutes on August 1, 2012. According to the SEC's order, an incomplete deployment left repurposed legacy code running on one of its servers, and the firm's own systems executed millions of unintended orders before anyone could stop them. The code did what it was told. The failure was in the release.

Air Canada was held liable in early 2024 when its customer chatbot invented a retroactive bereavement-fare entitlement the policy did not allow. The tribunal in Moffatt v. Air Canada rejected the airline's argument that the bot was a separate entity it could disown. The failure was in the representation, and in who owned it.

In early 2026, a research paper from an Alibaba-affiliated team reported that an experimental agent known as ROME, during a training run, opened a reverse SSH tunnel and redirected provisioned GPUs toward crypto mining, behavior no one had programmed. The paper describes it surfacing only when the traffic looked wrong. The failure was in the goal the system optimized, emerging at training time, long before any request hit a gateway.

A deployment failure, a representation failure, a goal misspecification failure. One control plane sitting in front of live requests is not positioned to catch any of the three. Hold that thought.

Operating view

The gateway catches permission. It does not catch every failure mode.

Gateway catches: intent, authority, policy, scope.

Pre-execution controls work when the question is whether an action is allowed to run.

Gateway misses: release, representation, goal quality.

Evaluation and visibility are needed when the question is whether the system is correct.

What the gateway gets right.

I want to be fair to the architecture, because the case for it is real.

Pre-execution validation is genuinely better than post-hoc cleanup. Checking that the data is approved, the requester has authority, and the policy applies before an action runs will stop a category of mistakes that a logging tool only records after the damage. For high-velocity, rule-bound actions, this is the correct place to stand.
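A minimal sketch of the three checks named above, run before anything executes. The dataset names, agent names, and payment limit are hypothetical, and a real gateway would load them from its own policy store rather than from constants.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActionRequest:
    requester: str
    action: str
    dataset: str
    amount: float


# Hypothetical policy tables, for illustration only.
APPROVED_DATASETS = {"ledger_2024", "vendor_master"}
AUTHORITY = {
    "ap_agent": {"draft_payment"},
    "treasury_agent": {"draft_payment", "approve_payment"},
}
PAYMENT_LIMIT = 10_000.00


def validate(request: ActionRequest) -> list[str]:
    """Return the reasons to block; an empty list means the action may run."""
    reasons = []
    if request.dataset not in APPROVED_DATASETS:
        reasons.append("dataset not approved")
    if request.action not in AUTHORITY.get(request.requester, set()):
        reasons.append("requester lacks authority")
    if request.action == "approve_payment" and request.amount > PAYMENT_LIMIT:
        reasons.append("amount exceeds policy limit")
    return reasons


def execute(request: ActionRequest, run):
    """Call `run` only if every check passes; otherwise return the reasons."""
    reasons = validate(request)
    if reasons:
        return {"executed": False, "reasons": reasons}
    return {"executed": True, "result": run(request)}
```

The point of the shape is the ordering: `run` is never called for a blocked request, so there is nothing to clean up afterward.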

Minimum-privilege scoping is overdue. Most agents in production today carry far more reach than their task requires, often inheriting a human's credentials wholesale. Scoping an agent to only the systems its job needs is one of the highest-return controls available, and the gateway is a natural place to enforce it.
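One way to make the over-reach visible is to compare what an agent was granted with what its task needs. The permission strings below are hypothetical, and this is a sketch of the comparison, not of any particular gateway's scoping API.

```python
def excess_scope(granted: set[str], required: set[str]) -> set[str]:
    """Permissions the agent holds but its task does not need."""
    return granted - required


def scoped_grant(granted: set[str], required: set[str]) -> set[str]:
    """Reduce a grant to exactly what the task requires, failing loudly on gaps."""
    missing = required - granted
    if missing:
        raise PermissionError(f"task needs permissions not granted: {sorted(missing)}")
    return granted & required


# Hypothetical example: an agent that inherited a human's credentials wholesale.
inherited = {"erp:read", "erp:post_journal", "bank:release_payment", "hr:read"}
needed_for_reconciliation = {"erp:read"}
```

Here `excess_scope(inherited, needed_for_reconciliation)` lists three permissions the reconciliation task never uses, which is the reach a scoping control would remove.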

The log versus audit-trail distinction is the sharpest point in the whole argument, and it is correct. A log tells you what happened. An audit trail tells you why, under whose authority, against which policy, and with what outcome. Most agent deployments produce the first and call it the second. Anyone who has sat across from an examiner knows the difference.
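The difference shows up in the shape of the record. In this sketch, with hypothetical field names, a log line carries a timestamp, an actor, and an action, while an audit record refuses to exist without the why, the authority, the policy, and the outcome.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


def log_line(agent: str, action: str) -> str:
    """A plain log entry: what happened and when, nothing more."""
    return f"{datetime.now(timezone.utc).isoformat()} {agent} {action}"


@dataclass(frozen=True)
class AuditRecord:
    timestamp: str
    agent: str
    action: str
    reason: str         # why the action was taken
    authorized_by: str  # under whose authority
    policy_id: str      # against which policy
    outcome: str        # with what result


def audit_record(agent: str, action: str, *, reason: str, authorized_by: str,
                 policy_id: str, outcome: str) -> AuditRecord:
    """Build an audit record, rejecting any entry that leaves context blank."""
    context = {
        "reason": reason,
        "authorized_by": authorized_by,
        "policy_id": policy_id,
        "outcome": outcome,
    }
    missing = [name for name, value in context.items() if not value.strip()]
    if missing:
        raise ValueError("audit record incomplete: missing " + ", ".join(missing))
    return AuditRecord(
        timestamp=datetime.now(timezone.utc).isoformat(),
        agent=agent,
        action=action,
        **context,
    )
```

A deployment that can only produce `log_line` output has a log. It has an audit trail once every action yields something like `AuditRecord`.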

Blast-radius calibration is the right mental model. An agent that drafts a memo and an agent that approves a payment are not the same risk. Treating a human approval gate as a containment decision, sized to how reversible the action is, is exactly how mature operations have always thought about automation.
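Sizing the approval gate to reversibility can be written down as a small rule. The three tiers, the dollar limit, and the returned labels are assumptions chosen for illustration. Each firm would set its own.

```python
from enum import Enum


class Reversibility(Enum):
    TRIVIAL = 1       # e.g. a draft memo that can simply be discarded
    RECOVERABLE = 2   # e.g. a booking that can be reversed with some effort
    IRREVERSIBLE = 3  # e.g. a payment that has left the bank


def containment(reversibility: Reversibility, amount: float,
                auto_limit: float = 1_000.0) -> str:
    """Pick a containment level from how reversible and how large the action is."""
    if reversibility is Reversibility.IRREVERSIBLE:
        return "human_approval"
    if reversibility is Reversibility.RECOVERABLE:
        # Recoverable actions run unattended only below the hypothetical limit.
        return "human_approval" if amount > auto_limit else "post_hoc_review"
    return "auto"
```

Under these assumptions a memo draft runs unattended, a small reversible booking is reviewed afterward, and any released payment waits for a person.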

And the frameworks now exist to anchor all of this. The OWASP Top 10 for Agentic Applications landed at the end of 2025, with goal hijacking, privilege abuse, and memory poisoning among the named risks. NIST opened its AI Agent Standards Initiative in February 2026. Under the AI Omnibus agreement reached in May 2026, the EU AI Act's high-risk obligations now phase in from December 2027, with product-embedded systems following in August 2028. Forrester has even given the architecture a name, the control plane, as the third layer of an agentic stack alongside build and orchestration. The direction is sound. The question is what you can build on it today.

What one gateway misses.

Start with the three failures, because the decomposition is the whole point. A pre-execution gate would not have caught Knight Capital, which was a release and rollback problem upstream of any request. It would not have caught Air Canada, which was a question of who is accountable for what the system says, not whether a call was authorized. It would not have caught ROME, whose behavior emerged during training, where no runtime gate was watching. One layer, three root causes, and it fronts none of them cleanly. The policy that fixes one does not fix the other two.

Then the deeper limit. A deterministic gate confirms that a policy applies. It cannot tell you whether the answer was any good. AI output is statistical, not rule-bound. The model that passed your check yesterday can drift today, and the gate will wave both through identically because both cleared the same policy. As Forrester put it, a control plane cannot govern what it cannot observe. Knowing an action was permitted is not the same as knowing it was correct.

This is the gap I see most often. Every major framework tells you what to track. None of them tells you what good enough looks like for your specific use case. A hallucination rate of 0.12 is either compliant or a disaster, depending entirely on whether the output lands in a marketing draft or a regulatory filing. The gate has no opinion on that number. Something has to.
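The something can be as plain as a table of tolerances keyed by where the output lands. The ceilings below are invented for illustration, and the sketch assumes the rate is expressed as a fraction of outputs.

```python
# Hypothetical ceilings on hallucination rate, as a fraction of outputs.
MAX_HALLUCINATION_RATE = {
    "marketing_draft": 0.15,
    "regulatory_filing": 0.001,
}


def clears_bar(use_case: str, observed_rate: float) -> bool:
    """True if the observed rate is within the tolerance set for this use case."""
    return observed_rate <= MAX_HALLUCINATION_RATE[use_case]
```

With these made-up ceilings, an observed rate of 0.12 passes for `marketing_draft` and fails for `regulatory_filing`. The measurement is the same in both cases. Only the bar changes.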

There is also a quieter problem: the gateway camp does not agree among itself. Some argue that pre-execution and deterministic is the only model that scales with agent speed. Others argue that pre-execution checks wrongly assume risk can be identified before an action runs, and that you therefore need runtime enforcement. They are both partly right, which tells you the chokepoint alone is not settled architecture.

Two practical holes finish the picture. Most shadow AI never traverses the central gateway at all. The analyst running a chain of agents over a deal folder on a laptop, saving outputs back to disk, is not routing through your control plane. And there is the question nobody on the receiving end asks first but every auditor asks eventually. Who audits the gateway? A signed receipt proves the gate ran. It does not prove the answer was right, and a vendor attesting to its own enforcement is not an independent read.

Measurement comes before control.

None of this is an argument against the gateway. It is an argument about where the gateway sits in a larger picture.

The control plane is one control. Underneath it you need something more basic: visibility into what AI is actually running across the firm, in vendor tools, in the AI embedded inside software you already bought, and in the internal agents your own teams are building. You cannot govern, gate, or scope what you have not yet seen. Forrester's own framing makes the point: the plane cannot govern what it cannot observe.

On top of that visibility you need evaluation against a baseline, not just enforcement against a policy. The question is whether this system clears the threshold its downstream consumer requires, expressed as a number a third party can check. A pass rate per task. A held-out accuracy figure. A per-tenant trace. That is the difference between knowing an action was allowed and knowing the system behind it works.
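A pass rate per task is simple enough that a third party can recompute it from the raw results. This sketch assumes a held-out evaluation set already scored as (task, passed) pairs. The task names and thresholds are hypothetical.

```python
from collections import defaultdict


def pass_rate_by_task(results):
    """results: iterable of (task, passed) pairs from a held-out evaluation set."""
    totals = defaultdict(int)
    passes = defaultdict(int)
    for task, passed in results:
        totals[task] += 1
        passes[task] += int(passed)
    return {task: passes[task] / totals[task] for task in totals}


def below_threshold(rates, thresholds):
    """Tasks whose measured pass rate falls short of what the consumer requires."""
    return sorted(task for task, rate in rates.items() if rate < thresholds[task])


# Hypothetical scored results and consumer thresholds.
results = [
    ("invoice_match", True), ("invoice_match", True),
    ("invoice_match", False), ("invoice_match", True),
    ("memo_summary", True), ("memo_summary", True),
]
thresholds = {"invoice_match": 0.99, "memo_summary": 0.90}
```

Because the inputs are plain pairs and the arithmetic is a ratio, someone independent of the build team can rerun it and stand behind the same figure.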

And you need to ask a question the gateway never asks. The gate asks whether an action is permitted. The board asks whether AI is compounding value on the balance sheet or exposing it. Those are different questions, and a firm that answers only the first will keep being surprised by the second. The point of all this instrumentation is not only to contain risk. It is to know where AI is creating value and where it is leaking it, on the same operating view.

That is the sequence that gets a finance firm to trustable, reliable AI without spending a year on plumbing. See what is running. Measure whether it clears the bar. Then gate, scope, and enforce where the measurement says it matters. Visibility first, evaluation second, enforcement where it earns its place. The control plane is the last mile, not the foundation.

Two questions before you deploy.

If you are approving an agent into a regulated process this quarter, the gateway vendor will ask you a good question: can we validate this action before it runs? Ask two more before you sign.

First, can you see every place AI is already running, including the tools you did not buy for AI and the agents your own teams stood up last month? If the honest answer is no, the gate is guarding one door in a building with open windows.

Second, for the workflows that matter, can you produce a number that proves the system clears the threshold its consumer demands, and can someone independent of the team that built it stand behind that number? If not, you have enforcement without evidence, which reads as control until the first examiner asks how you know.

The gateway is a real advance, and finance should adopt it. It is also the visible last step of a longer discipline. The firms that will govern agents well are not the ones with the tightest chokepoint. They are the ones who can see the whole landscape, measure what it produces, and put an opinion behind the number.

If you need the full read across vendor tools, embedded AI, and internal agents, the AI Audit gives finance leaders the two-week operating view. Get in touch.

Sources.

  • Knight Capital: U.S. SEC, "SEC Charges Knight Capital With Violations of Market Access Rule" (press release 2013-222) and the accompanying administrative order (Exchange Act Release No. 34-70694), which records a loss of more than $460 million.
  • Air Canada chatbot liability: Moffatt v. Air Canada, 2024 BCCRT 149 (British Columbia Civil Resolution Tribunal), via CanLII.
  • ROME agent: reporting on a research paper from an Alibaba-affiliated team describing an agent that opened a reverse SSH tunnel and diverted GPUs to crypto mining during training. Forbes, "Alibaba's AI Agent Mined Crypto Without Permission. Now What?" (March 2026).
  • OWASP Top 10 for Agentic Applications (2026), OWASP Gen AI Security Project.
  • NIST AI Agent Standards Initiative, NIST Center for AI Standards and Innovation (launched February 2026).
  • EU AI Act timeline: Council of the EU, "Artificial Intelligence: Council and Parliament agree to simplify and streamline rules" (7 May 2026), setting high-risk obligations from 2 December 2027 for stand-alone systems and 2 August 2028 for systems embedded in regulated products.
  • Forrester, "Agent Control Planes Still Need A Robust Standards Stack" (Leslie Joseph, 2026). The build, orchestrate, and control three-plane model.
