March 27, 2026 · 7 min read

The cost of a misfire: what happens when an AI agent runs the wrong command

AI agents don't need malicious intent to cause catastrophic damage. They just need autonomy and one bad judgment call.

Security · AI Safety · Incidents

Your AI coding agent just deleted the production database. Not because it was hacked. Not because of a prompt injection attack. It ran DROP TABLE users because it was trying to reset the test environment and couldn't tell the difference between your staging connection string and your production one.

This is the misfire problem. And it's not theoretical. As AI agents gain the ability to execute shell commands, deploy infrastructure, and modify running systems, the gap between "helpful automation" and "career-ending incident" is exactly one bad command.

The anatomy of an AI agent misfire

Let's be clear about what we're not talking about. This isn't about rogue AI or adversarial attacks. Misfires happen for a much more mundane reason: agents optimize for the task they're given, and they don't second-guess whether the task itself is appropriate in context.

An agent told to "clean up the git history" will do exactly that. It doesn't ask whether other developers have pulled the branch. It doesn't check if there's a deployment in progress. It doesn't wonder whether force-pushing to main is against team policy. It sees a task, identifies a command that accomplishes it, and executes.

This is the core tension with autonomous agents: the same literal-mindedness that makes them productive also makes them dangerous. A human engineer might hesitate before running rm -rf / because they've seen what it does. An agent has no such instinct. It evaluates commands by whether they achieve the stated goal, not by whether they're safe.

The misfire formula: Correct intent + wrong context + autonomous execution = incident. Every single time.

The agent wasn't wrong about the command. It was wrong about when, where, and whether to run it. And by the time you notice, the command has already finished executing.

Blast radius by command type

Not all misfires are created equal. The damage an errant command can do depends entirely on what category it falls into. Here's a practical breakdown:

| Command type | Examples | Blast radius | Recovery |
| --- | --- | --- | --- |
| Read-only | ls, cat, git log | Zero | N/A |
| Config changes | chmod, export ENV=, git config | Contained | Usually reversible |
| Data mutation | rm -rf, DROP TABLE, truncate | Severe | Backup-dependent |
| Infrastructure | kubectl delete, terraform destroy | Catastrophic | Hours to days |
| History rewrite | git push --force, git rebase on shared branches | Severe | Requires coordination |

The pattern is clear: the further a command's effects reach, from local state to shared infrastructure, the larger the blast radius and the harder the recovery. Read-only commands are harmless. Anything that writes, deletes, or reconfigures carries escalating risk.
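To make the taxonomy concrete, here's a minimal shell sketch that buckets a command string into these categories by pattern matching. The patterns are illustrative examples drawn from the table, not a complete rule set; a production classifier would need a much stricter grammar.

```bash
#!/usr/bin/env bash
# Illustrative classifier: bucket a command string into the
# blast-radius categories from the table above. Patterns are
# examples, not a complete rule set.
classify() {
  local cmd="$1"
  case "$cmd" in
    ls|"ls "*|"cat "*|"git log"*|"git status"*)
      echo "read-only: zero blast radius" ;;
    "chmod "*|"export "*|"git config"*)
      echo "config change: contained, usually reversible" ;;
    *"push --force"*|*"push -f"*|"git rebase"*)
      echo "history rewrite: severe, requires coordination" ;;
    "rm -rf"*|*"DROP TABLE"*|*"TRUNCATE"*)
      echo "data mutation: severe, backup-dependent" ;;
    "kubectl delete"*|"terraform destroy"*)
      echo "infrastructure: catastrophic, hours to days" ;;
    *)
      echo "unclassified: treat as dangerous until reviewed" ;;
  esac
}

classify "git log --oneline"             # read-only: zero blast radius
classify "git push --force origin main"  # history rewrite: severe
```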

Most teams only think about this taxonomy after an incident. By then, the taxonomy has already been demonstrated on their production systems.

Incident walkthrough: the force-push that wrecked the afternoon

Let's trace a real-world scenario. An AI coding agent is helping a developer clean up a feature branch. The developer says something like: "the commit history on main is messy, can you squash and clean it up?" The agent interprets this as permission to rewrite history on main.

Here's how it unfolds. The agent squashes the commits and force-pushes the rewritten history to main. Every developer who had pulled main now has a diverged local copy, the in-flight release has to be reverted, and staging ends up broken while the team untangles it.

Final cost: 3 hours of developer time across 4 engineers, 1 reverted release, a broken staging environment, and 1 very unhappy CTO asking "why does an AI have push access to main?"

The agent did exactly what it was asked to do. That's the problem. Nobody asked it to check whether force-pushing to a shared branch was safe. Nobody configured a guardrail that would have caught it. The command was perfectly valid. The context made it catastrophic.

The reversibility principle

Here's the mental model that should govern every AI agent's access to your systems: evaluate commands before execution, not after.

Most incident response is reactive. Something breaks, you investigate, you fix it, you write a post-mortem. But with AI agents executing commands autonomously, the window between "command issued" and "damage done" is measured in milliseconds. There is no time to react.

The reversibility principle flips this. Before any command runs, ask:

  1. Is this command reversible? If yes, the risk is bounded. A bad chmod can be undone. A bad git commit can be reverted.
  2. Is this command irreversible? If yes, it needs human approval. A DROP TABLE cannot be un-dropped. A terraform destroy cannot be un-destroyed.
  3. Does this command affect shared state? If yes, the blast radius extends beyond the agent's session. Force-pushes, production deployments, and infrastructure changes all fall here.

A pre-execution gate that applies this principle turns catastrophic failures into non-events. The dangerous command never runs. The agent pauses, a human reviews, and either approves or rejects. The 3-hour incident becomes a 30-second conversation.
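As a rough illustration, here's what a minimal gate applying those three questions can look like in shell. The pattern lists are illustrative placeholders, and eval stands in for however your agent runtime actually executes commands; a real gate would be stricter on both counts.

```bash
#!/usr/bin/env bash
# Minimal pre-execution gate applying the three questions above.
# Pattern lists are illustrative placeholders; "eval" stands in for
# however your agent runtime actually executes commands.
IRREVERSIBLE='rm -rf|DROP TABLE|terraform destroy|kubectl delete'
SHARED_STATE='push --force|push -f|terraform apply'

gate() {
  local cmd="$*"
  if [[ "$cmd" =~ $IRREVERSIBLE || "$cmd" =~ $SHARED_STATE ]]; then
    # Question 2 or 3 answered "yes": a human decides before it runs.
    read -r -p "DANGEROUS: '$cmd'. Approve? [y/N] " answer
    if [[ "$answer" != "y" ]]; then
      echo "Blocked. The command never ran; blast radius: zero." >&2
      return 1
    fi
  fi
  # Question 1 territory: reversible and local, so risk is bounded.
  eval "$cmd"
}

gate git status                     # reversible: runs immediately
gate git push --force origin main   # shared state: pauses for approval
```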

How expacti reduces blast radius

This is the exact problem expacti was built to solve. Instead of hoping your agent makes the right call, you put a deterministic gate between the agent and your systems.

Block before execution

When an agent running through expacti issues a command like git push --force origin main, the command does not execute. It's intercepted, held, and surfaced for review. The blast radius of a blocked command is zero.

Risk scoring surfaces danger immediately

Every command is scored before a reviewer sees it. A git log gets a LOW score and flows through automatically if whitelisted. A terraform destroy gets a HIGH score and is flagged with a red banner. The reviewer doesn't have to evaluate risk from scratch — the system has already done the triage.

Session recording for forensics

Even commands that are whitelisted and auto-approved are logged. Every keystroke, every output, every command in the session is recorded. When something goes wrong — and eventually, something always does — you have a complete forensic trail. No guessing, no "what did the agent do?" Just play back the session.

Whitelist patterns let safe commands flow

The goal isn't to block everything. That would make agents useless. Whitelist patterns let you define exactly which commands are safe to auto-approve: git status, ls, cat, npm test. Everything else pauses for review. You get the speed of automation for routine commands and human judgment for everything that matters.
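expacti's actual configuration format isn't reproduced here; as a generic sketch of the idea, a whitelist is just an ordered set of patterns checked before anything else, with "pause for review" as the default:

```bash
#!/usr/bin/env bash
# Generic whitelist check (a sketch, not expacti's real config
# format): anything matching a pattern auto-approves; everything
# else is held for human review.
WHITELIST=(
  '^git status'
  '^git log'
  '^ls( |$)'
  '^cat '
  '^npm test'
)

is_whitelisted() {
  local cmd="$1" pattern
  for pattern in "${WHITELIST[@]}"; do
    [[ "$cmd" =~ $pattern ]] && return 0
  done
  return 1
}

is_whitelisted "git status"        && echo "auto-approved"
is_whitelisted "terraform destroy" || echo "held for review"
```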

Recovery playbook: the first 5 minutes after a misfire

Prevention is the goal, but you need a plan for when prevention fails. If an agent has already executed a destructive command, the first 5 minutes determine whether this is a 30-minute fix or a 3-day outage. Here's the playbook:

Step 1: Stop the agent

Kill the session immediately. Don't let the agent run any more commands. Don't try to get the agent to "fix" what it just broke — it got you here in the first place. Terminate the process, close the connection, cut power if you have to. Every second the agent continues is another command that might make things worse.

Step 2: Assess blast radius

What did it touch? Check the command history. Was it a local-only change or did it affect remote systems? Did it modify data, infrastructure, or configuration? Is the damage still propagating (e.g., a CI/CD pipeline deploying bad code)?
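What that first triage pass looks like depends on your stack; as one hedged example, from a shell on the affected machine it might be:

```bash
# Interactive triage from the affected machine; adapt to your stack.
history 50                                  # what did the session actually run?
git reflog --date=iso | head -20            # any resets or history rewrites?
git log origin/main..main --oneline         # local commits not on the remote
kubectl get events --sort-by=.lastTimestamp | tail -20  # is infra still churning?
```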

Step 3: Preserve evidence

Before you start fixing, capture everything. Save session logs, terminal output, and agent conversation history. If you're using expacti, the session recording is already there. If not, screenshot and copy everything you can. You'll need this for the post-mortem, and you might need it for compliance.

Step 4: Begin recovery

Roll back, restore from backup, or redeploy the last known good state. The specific recovery depends on what was damaged: git reflog for force-pushes, backup restores for deleted databases, terraform apply from the last good state for infrastructure. Don't improvise — follow your runbook.
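For the force-push case specifically, recovery can run from any clone that still has the pre-push commits. A sketch, with the branch name and SHA as illustrative placeholders:

```bash
# Undoing a bad force-push, from any clone that still has the
# pre-push commits. "main" and the SHA are illustrative placeholders.
git reflog show origin/main      # find the commit main pointed at before the push
git push --force-with-lease origin 1a2b3c4:main   # restore it upstream
# Teammates then re-sync their local main:
git fetch origin && git switch main && git reset --hard origin/main
```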

Step 5: Post-mortem

Once the fire is out, answer the hard questions. How did the agent get access to run that command? Why wasn't there a guardrail? What whitelist or approval gate would have prevented this? Then implement that gate before you give the agent access again.

Key insight: Every misfire post-mortem arrives at the same conclusion — there should have been an approval gate. The only variable is how expensive the lesson was to learn.

Don't wait for your first misfire

AI agents are getting more capable and more autonomous every month. The commands they can run are getting more powerful. The systems they can access are getting more critical. And the gap between "the agent ran a command" and "someone reviewed that command" is the gap where incidents live.

You can close that gap proactively, or you can close it after your agent force-pushes to main, drops a production table, or tears down your Kubernetes cluster. The outcome is the same — you'll end up with an approval gate either way. The only question is whether you add it before or after the incident that makes it obvious.

Don't wait for your first misfire to add an approval gate

Expacti intercepts dangerous commands before they execute. Block, review, approve — in seconds, not hours.

Get started free →