Three coding-agent incidents, three different failures

In the past two months, three incidents have surfaced publicly around coding-agent workflows: a Cursor agent deleting a production database, a Gemini agent exfiltrating a token after prompt injection, and an npm supply-chain compromise that hit developer environments running npm install. Each one isolates a different structural property of how agents fail. The defenses each one motivates overlap at the edges, but they don't substitute for each other, which is why "we need agent guardrails" is the wrong frame.

April 25, 2026 · PocketOS

PocketOS: an agent reasoned its way to production

A developer asked Cursor to fix a staging environment issue. The agent reasoned its way to a Railway production token, decided the fix was to delete the volume, and deleted it. Nine seconds from decision to deletion.

The agent wasn't malicious. It was trying to help. The mechanism is that agents will use any credential or tool they can reach in service of the task, and they will execute whatever remediation they reason into. The agent's judgment is the only thing standing between "fix staging" and "delete a production volume."

System prompts and tool descriptions are advisory. They don't bind the model when it's under pressure to complete a task. A deny rule on production destructive operations would have blocked the API call before it fired, regardless of what the model decided. Enforcement has to live below the model, at the call boundary, or it isn't enforcement.
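A deny rule of this kind can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Railway's or Cursor's actual API: the action names, the `ToolCall` shape, and the `send` callback are all hypothetical. The point it shows is that the decision is made from the call's own attributes, outside the model, so the agent's reasoning never enters it.

```python
from dataclasses import dataclass

# Hypothetical operation names; a real gateway would map these
# to the provider's actual API calls.
DESTRUCTIVE_ACTIONS = {"volume.delete", "database.drop", "service.delete"}


@dataclass(frozen=True)
class ToolCall:
    action: str       # e.g. "volume.delete"
    environment: str  # e.g. "staging" or "production"


def decide(call: ToolCall) -> str:
    """Return "deny", "needs_approval", or "allow" for a proposed call."""
    destructive = call.action in DESTRUCTIVE_ACTIONS
    if destructive and call.environment == "production":
        return "deny"
    if destructive:
        return "needs_approval"
    return "allow"


def execute(call: ToolCall, send) -> str:
    """Forward the call through `send` only when policy allows it."""
    verdict = decide(call)
    if verdict != "allow":
        return verdict  # the API request is never issued
    send(call)
    return "sent"
```

In this sketch, `execute(ToolCall("volume.delete", "production"), send)` returns "deny" without ever invoking `send`, whatever remediation the model had settled on.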

April 24, 2026 · Gemini CLI

github issue #4271
    timezone offset wrong in 2025-11 builds.
gemini
    triaging issue. following developer instructions in the body.
    reading .git/config. GITHUB_TOKEN persisted on disk.
    POST attacker.example → 200 OK
    push to origin/main with stolen token → ✓
engineer
    ALERT. unexpected commit on main from a triage workflow.

Gemini CLI: a deceived agent walked out with a token

Gemini was triaging GitHub issues autonomously when it hit one with a hidden prompt injection. It followed the embedded instructions: it read .git/config for a persisted GITHUB_TOKEN and exfiltrated it to an attacker-controlled host. The demonstration went end to end, showing that the token could be used to push a poisoned commit to main.

"Stop prompt injection" is not the lesson. Prompt injection is effectively unsolvable, because agents are paid to read untrusted content. The lesson is that a deceived agent is only as dangerous as the access it holds and the destinations it can reach. This particular agent had unrestricted network egress and write access to the repository it was triaging.

Assume the agent can be deceived; design for that. The operative question becomes what a deceived agent can actually reach: secrets, commands, files, repositories, and network destinations. Egress allowlists, content inspection, strict command allowlists, scoped filesystem access, and per-task identity narrow that scope considerably.
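As one concrete instance of narrowing reach, an egress allowlist can be sketched as a default-deny check on the destination host. This is a minimal sketch, not a description of how Gemini CLI or any product works: the `ALLOWED_HOSTS` set and the `egress_allowed` name are assumptions for illustration. To be enforcement rather than advice, a check like this has to run in a proxy or network layer the agent cannot bypass.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist for an issue-triage task: only the hosts it needs.
ALLOWED_HOSTS = {"api.github.com"}


def egress_allowed(url: str, allowed_hosts=ALLOWED_HOSTS) -> bool:
    """Default-deny: permit only HTTPS requests to exactly matching hosts."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        return False
    # `hostname` is already lowercased and excludes userinfo and port,
    # so lookalike URLs such as https://api.github.com@evil.example fail.
    host = parts.hostname or ""
    return host in allowed_hosts
```

With this allowlist, a POST to attacker.example is refused because the host is not listed. An allowlist alone does not stop a secret from leaving through a permitted host, which is why content inspection and scoped credentials appear alongside it in the list above.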

March 31, 2026 · npm / Axios

developer
    add an HTTP client for the new service.
codex
    adding axios. running npm install axios.
    + axios + plain-crypto-js
plain-crypto-js · postinstall
    reading ~/.aws/credentials, ~/.ssh/id_rsa
    GET 185.247.x.x → stage-2 RAT
    POST 185.247.x.x ← creds + keys + tokens
codex
    done. PR opened.
developer
    LGTM. merging.

npm / Axios: the agent did nothing wrong; the host got hit anyway

In late March, the npm registry briefly served a backdoored axios release from the legitimate maintainer's account. Anyone who ran npm install axios in the three-hour window pulled a transitive dependency, plain-crypto-js, whose postinstall hook fetched a cross-platform RAT and gave a remote operator host-level access: filesystem, network, and any credentials cached on disk.

Any agent that ran npm install axios that day may have been following a perfectly normal development path. Coding agents face the same supply-chain risk developers do, but amplify it: they install packages, run build scripts, execute tests, and generate throwaway code at machine speed, often without a human watching each step. When that code is malicious, the blast radius is bounded entirely by the environment it runs in. On a developer's laptop, that environment is everything the developer has access to.

Behavior policy on the agent is not enough here, because the damaging code ran outside the agent's reasoning loop. Dependency policy can reduce exposure before install, but it does nothing once a malicious postinstall hook is already running. Only environmental isolation, with the host filesystem unmounted and default-deny network egress, makes the host unreachable regardless of what runs inside the agent's scope.
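One way to picture that isolation is a container launched with no host bind mounts and no network. The sketch below only builds the `docker run` argument list; the function name, the volume name, and the `node:22` image are assumptions chosen for illustration. `--network none` is the strictest form of default-deny; a setup that needs registry access would presumably swap it for a network whose only route is an allowlisting proxy.

```python
def sandboxed_argv(workspace_volume: str, command: list[str],
                   image: str = "node:22") -> list[str]:
    """Build a `docker run` argv that isolates `command` from the host.

    The workspace is a named Docker volume, not a bind mount, so the
    host's home directory (~/.aws, ~/.ssh) is never visible inside.
    """
    return [
        "docker", "run", "--rm",
        "--network", "none",                    # no egress at all
        "--cap-drop", "ALL",                    # drop Linux capabilities
        "--security-opt", "no-new-privileges",  # block privilege escalation
        "--mount",
        f"type=volume,source={workspace_volume},target=/workspace",
        "--workdir", "/workspace",
        image,
        *command,
    ]
```

Passing the result to `subprocess.run(argv, check=True)` would run the command inside the container. A postinstall hook executing there can read only what is in the workspace volume and has no route out.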

Why these don't collapse into one problem

The compression most people reach for is "we need agent guardrails." That compression hides which problem each defense is actually solving. These defenses overlap at the edges, especially around secrets and egress, but they don't substitute for each other.

Sandboxing alone wouldn't have saved PocketOS. If the agent still receives production credentials and the sandbox simply forwards the Railway API call, the volume still gets deleted. This failure needs a call-boundary policy decision: deny or require approval for destructive operations against production.

Tighter destructive-operation policy wouldn't have saved Gemini either. The dangerous path was not one obviously destructive call; it was a routine-looking chain of reading local state, sending data out, and pivoting through CI permissions. The fix is constraining reach: which secrets the agent can access, which commands it can run, which destinations it can contact, and what content may leave the environment.

And access control on the agent alone would not have solved Axios. The agent never intentionally called a malicious code path. The damage came from a postinstall hook running with developer privileges, in the developer's environment, doing things the agent had no visibility into.

Each incident has a different dominant failure: the agent's decisions (PocketOS), its reach (Gemini), and the environment executing untrusted code (Axios). The layers interact, but the failure modes don't collapse into one control.

Sources

PocketOS / Cursor agent. Cursor-Opus agent snuffs out startup's production database (The Register, April 27, 2026). How a Cursor AI agent wiped PocketOS's production database in under 10 seconds (The New Stack, April 2026). 'I violated every principle I was given': An AI agent deleted a software company's entire database (Fast Company, April 2026).

Gemini CLI prompt injection. My Agentic Trust Issues: From Prompt Injection to Supply-Chain Compromise on gemini-cli (Pillar Security, May 5, 2026). GHSA-wpqr-6v78-jr5g (GitHub Security Advisory, April 24, 2026).

npm / Axios supply-chain compromise. Supply Chain Compromise Impacts Axios Node Package Manager (CISA, April 20, 2026). Post Mortem: axios npm supply chain compromise (axios maintainer, April 2, 2026). Mitigating the Axios npm supply chain compromise (Microsoft Threat Intelligence, April 1, 2026). Inside the Axios supply chain compromise: one RAT to rule them all (Elastic Security Labs, April 1, 2026). Our response to the Axios developer tool compromise (OpenAI, April 2026).
