What MCP actually is (and why it matters for security)
Model Context Protocol (MCP) is an open standard for connecting AI models to external tools, data sources, and services. Think of it as a USB-C port for AI: instead of each team building custom integrations, MCP defines a common protocol that any model can use to call any tool.
An MCP server exposes tools — discrete operations the model can invoke. A tool might be read_file, execute_sql, deploy_service, or send_email. The model decides when to call them, with what arguments, based on its reasoning about the task.
This is genuinely useful. It makes AI agents composable and standardized. It's also a significant change in the threat model for anyone running these agents in production environments.
The default MCP execution model
Without additional controls, MCP tool calls flow like this:
Model reasons... selects deploy_service tool
→ deploy_service({ env: "production", version: "1.4.2" })
→ ✗ runs immediately, no confirmation
Model reasons... selects run_migration tool
→ run_migration({ db: "prod-postgres", migration: "add_column_users" })
→ ✗ runs immediately, no confirmation
The model is doing exactly what it was asked. But it's doing it autonomously, at machine speed, without giving you a chance to say "wait, not that version" or "that migration isn't ready."
For read-only operations — search_docs, get_metrics, list_files — this is fine. For operations that modify state, especially in production, this is a problem.
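One way to encode that read-only/mutating split in code is a decorator that gates only the mutating tools. A minimal, dependency-free sketch — the `requires_approval` decorator, the `APPROVED` set, and the tool bodies are all illustrative stand-ins, not part of any MCP SDK:

```python
from functools import wraps

# Toy stand-in for a review queue: a real gate would block on a human
# decision rather than consult a static set.
APPROVED = {"scale_replicas"}

def requires_approval(func):
    """Gate a mutating tool behind the (toy) approval check."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        if func.__name__ not in APPROVED:
            raise PermissionError(f"{func.__name__} denied by reviewer")
        return func(*args, **kwargs)
    wrapper.mutating = True  # lets callers audit which tools are gated
    return wrapper

def list_files(path: str) -> list:  # read-only: runs ungated
    return ["a.txt", "b.txt"]

@requires_approval
def scale_replicas(service: str, count: int) -> dict:  # mutating: gated
    return {"service": service, "count": count}

@requires_approval
def drop_table(name: str) -> str:  # mutating and not approved
    return f"dropped {name}"
```

The useful property is that the gate lives in one place: adding a new mutating tool means adding one decorator line, not reimplementing the review logic.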
Why "just review the request first" isn't enough
A common response is: "I'll review the task before the agent starts." The problem is that multi-step agentic tasks are hard to fully anticipate. You say "deploy the backend," but you don't know in advance:
- Which specific migration the agent will decide to run
- Whether it will restart the entire service or just update the image
- Which config values it will override
- What it will do when it encounters an unexpected error mid-task
The agent's decisions emerge from its reasoning process, which can go in unexpected directions. You approved the intent, not the actual sequence of tool calls that materializes.
Categorizing MCP tools by risk
Not all MCP tools need the same treatment. The question is: what's the blast radius if this call goes wrong?
| Tool category | Examples | Risk | Recommended policy |
|---|---|---|---|
| Read-only queries | list_files, get_logs, search_docs, fetch_metrics | LOW | Auto-allow, log only |
| Non-destructive writes | create_branch, draft_email, add_comment, append_log | LOW | Auto-allow or allow with short TTL |
| Reversible state changes | update_config, set_feature_flag, scale_replicas | MEDIUM | Human approval required, whitelist common patterns |
| Irreversible operations | delete_resource, drop_table, send_email, publish_release | HIGH | Human approval required, 2-reviewer for critical |
| Production deployments | deploy_service, run_migration, restart_cluster | CRITICAL | Human approval required, change window enforcement |
The goal isn't to block everything — that would make agents useless. It's to put humans in the loop for the operations that matter, while letting agents move fast on everything else.
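The risk table above translates directly into policy data. A minimal sketch, with illustrative tool names and tier labels; the one design decision worth copying is that unknown tools fail closed, toward human review:

```python
# Hypothetical mapping of tools to the risk tiers from the table above.
RISK_TIERS = {
    "list_files": "low", "get_logs": "low", "search_docs": "low",
    "create_branch": "low", "draft_email": "low",
    "update_config": "medium", "set_feature_flag": "medium",
    "delete_resource": "high", "drop_table": "high", "send_email": "high",
    "deploy_service": "critical", "run_migration": "critical",
}

POLICY = {
    "low": "auto_allow",
    "medium": "human_approval",
    "high": "human_approval",
    "critical": "human_approval",
}

def policy_for(tool: str) -> str:
    # Unknown tools default to a high tier: fail closed, not open.
    tier = RISK_TIERS.get(tool, "high")
    return POLICY[tier]
```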
What a human oversight gate looks like in practice
The pattern is straightforward: wrap your MCP tool implementations so that before any mutating operation executes, it sends the call to a review queue and waits for explicit approval.
Here's a minimal example wrapping MCP tools with expacti for approval:
```python
# mcp_server_with_gates.py
import asyncio
from mcp.server import FastMCP
from expacti import AsyncExpactiClient

mcp = FastMCP("production-tools")
client = AsyncExpactiClient(
    url="wss://api.expacti.com/reviewer/ws",
    token="your-shell-token"
)

# Read-only: no gate needed
@mcp.tool()
async def get_service_status(service: str) -> dict:
    return fetch_service_status(service)

# Mutating: requires human approval
@mcp.tool()
async def deploy_service(service: str, version: str, env: str) -> dict:
    # This blocks until a reviewer approves or denies
    decision = await client.run(
        f"deploy_service service={service} version={version} env={env}"
    )
    if not decision.approved:
        raise PermissionError(f"Deployment denied: {decision.reason}")
    return execute_deployment(service, version, env)

# Critical: requires 2 approvers (configured in expacti dashboard)
@mcp.tool()
async def run_database_migration(migration_id: str, db: str) -> dict:
    decision = await client.run(
        f"run_migration id={migration_id} db={db}",
        timeout=300  # 5 min window for DB migrations
    )
    if not decision.approved:
        raise PermissionError(f"Migration denied: {decision.reason}")
    return execute_migration(migration_id, db)
```
When the model calls deploy_service, the tool call suspends and sends a notification to your reviewer dashboard (and optionally Slack). The reviewer sees the full call — tool name, arguments, context — and clicks Approve or Deny. Only then does execution continue.
The reviewer sees the full call, not just the intent
This is the key difference from reviewing at the task level. When the agent calls run_database_migration(migration_id="0047_drop_legacy_users", db="prod-postgres"), the reviewer sees:
- The exact tool being invoked
- The exact arguments — including which migration, which database
- The risk score (in this case, critical — it involves "drop" and production)
- The session context: what else the agent has done so far
A reviewer can approve the task intent and still catch "wait, this migration drops a column I thought we were keeping."
Task-level review
- You see: "Deploy backend to prod"
- You say: "Looks fine, go ahead"
- Agent decides all sub-steps autonomously
- No visibility into which migration runs
- No chance to stop mid-task
Tool-call-level gates
- You see each mutating tool call
- You approve or deny per call
- Whitelist safe patterns to reduce friction
- Full audit log of every decision
- Panic button to kill session instantly
Making it practical: whitelists reduce reviewer fatigue
The obvious objection is: "If I have to approve every tool call, I'm not saving time — I'm just adding friction." This is a real concern. The answer is whitelisting.
After you've reviewed and approved scale_replicas(service="api", count=3) a few times, you know it's safe. You add a whitelist rule: scale_replicas * — and it auto-approves from that point on, no human needed.
Over time, your whitelist captures the full "normal operating vocabulary" of your agents. Novel operations — the ones you haven't seen before — are the ones that get flagged. Which is exactly the right behavior: the agent moves fast on familiar ground and pauses on unfamiliar territory.
```
# Whitelist rules (via expacti dashboard or API)
# Pattern                      Type    Risk      Notes
get_service_*                  glob    low       # all read ops, auto-allow
scale_replicas *               glob    medium    # scaling, auto-allow after review
deploy_service * staging       exact   low       # staging deploys, auto-allow
deploy_service * production    glob    critical  # prod deploys, ALWAYS review
run_migration *                glob    critical  # all migrations, ALWAYS review
```
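Rule matching of this kind can be approximated with shell-style globs. A minimal sketch — the rule format and first-match-wins behavior here are assumptions for illustration, not expacti's actual matching engine:

```python
from fnmatch import fnmatch

# Hypothetical rules mirroring the config above: (pattern, decision),
# evaluated top to bottom; first match wins.
RULES = [
    ("deploy_service * production", "review"),  # prod deploys always reviewed
    ("run_migration *", "review"),              # migrations always reviewed
    ("get_service_*", "allow"),                 # read ops auto-allowed
    ("scale_replicas *", "allow"),
    ("deploy_service * staging", "allow"),      # staging deploys auto-allowed
]

def decide(call: str) -> str:
    for pattern, decision in RULES:
        if fnmatch(call, pattern):
            return decision
    return "review"  # fail closed: anything unmatched goes to a human
```

Note the ordering: the production-deploy rule sits above the staging rule, so a glob can never accidentally auto-approve a production call.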
Anomaly detection: when familiar patterns turn suspicious
Whitelists handle the "known safe" case. But what about the "known pattern, unusual context" case? scale_replicas(service="api", count=3) is routine at 2pm on a Tuesday. The same call at 3am, combined with a series of log-reading operations, might be an indicator that something is wrong — a prompt injection attack, a confused agent, or worse.
This is where anomaly detection helps. expacti scores every tool call across multiple dimensions:
- Time of day — production changes at unusual hours get elevated scores
- Command frequency — unusual bursts of calls to the same tool
- Session context — a deploy following a series of credential-reading calls
- Risk score — combining base tool risk with contextual modifiers
Even if a call matches a whitelist rule, anomaly flags surface it for optional review. The whitelist says "this pattern is safe" — the anomaly detector asks "but is this instance actually safe?"
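A toy scorer over those dimensions might look like the following. The weights, threshold, and tool names are purely illustrative; a real system would learn or tune these rather than hard-code them:

```python
# Hypothetical anomaly scorer combining base tool risk with context.
BASE_RISK = {"scale_replicas": 2, "deploy_service": 4, "get_logs": 0}

def anomaly_score(tool: str, hour: int, recent_calls: list) -> int:
    score = BASE_RISK.get(tool, 3)           # unknown tools start risky
    if hour < 6 or hour > 22:                # production change at 3am
        score += 3
    if recent_calls.count(tool) > 5:         # burst of identical calls
        score += 2
    if "read_credentials" in recent_calls:   # suspicious session context
        score += 4
    return score

def needs_review(tool: str, hour: int, recent_calls: list, threshold: int = 6) -> bool:
    return anomaly_score(tool, hour, recent_calls) >= threshold
```

The same `scale_replicas` call scores 2 at 2pm with a clean session, but 9 at 3am following a credential read — which is exactly the "familiar pattern, unusual context" case the whitelist alone cannot catch.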
MCP-specific considerations
Tool argument validation
MCP tools expose typed schemas for their arguments. Use those schemas to build targeted approval rules. A tool with env: "staging" | "production" should have different policies per environment. Don't treat all calls to deploy_service equally — the argument values are part of the risk surface.
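Argument-aware policy can be as simple as branching on the schema-validated value. A minimal sketch, assuming a tool whose schema declares env as "staging" | "production" (the function and return labels are hypothetical):

```python
# Hypothetical per-argument policy for a deploy tool: the same tool name
# gets different treatment depending on its env argument.
def deploy_policy(args: dict) -> str:
    env = args.get("env")
    if env not in ("staging", "production"):
        # Reject values outside the declared schema instead of guessing.
        raise ValueError(f"env must be staging|production, got {env!r}")
    return "human_approval" if env == "production" else "auto_allow"
```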
Tool composition attacks
A sequence of individually low-risk tool calls can combine into a high-risk operation. Reading a config file, then reading an SSH key, then making an outbound request: each call is LOW on its own, but the combination is suspicious. Session-level analysis (not just per-call) catches this.
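Detecting that kind of composition amounts to a subsequence check over the session's call history. A minimal sketch (the tool names in the pattern are illustrative):

```python
# Hypothetical detector for the composition described above: each call is
# low-risk alone, but the ordered combination is an exfiltration shape.
SUSPICIOUS_SEQUENCE = ["read_config", "read_ssh_key", "outbound_request"]

def contains_sequence(session: list, pattern: list) -> bool:
    """True if pattern appears in order (not necessarily adjacent) in session."""
    it = iter(session)
    return all(step in it for step in pattern)
```

Because the iterator is consumed left to right, the check only fires when the steps occur in order — an outbound request *before* the key read does not match.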
Nested MCP servers
As MCP ecosystems grow, you'll have agents calling agents — an orchestrator model calling sub-agent tools. The approval chain should follow the actual execution path. Make sure your gates are at the leaf tools that touch real infrastructure, not just the top-level planner.
What happens when the reviewer isn't available?
This is a real operational question. The answer depends on your risk tolerance and the nature of the operation:
- Auto-deny (safe default): If no reviewer approves within N minutes, the operation is denied. The agent logs this and can retry or escalate.
- Escalate to backup reviewer: Route to a second reviewer or on-call person. Useful for deployments that need to happen during off-hours.
- Auto-approve for low-risk: Operations below a certain risk threshold can be configured to auto-approve after a timeout if no one objects. "Silence is consent" for routine operations, "silence is denial" for critical ones.
The point is to make these policies explicit and configured, not to leave them to chance. Undefined behavior in production is at the root of many incidents.
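The three timeout policies above can be made explicit in the gate itself. A minimal asyncio sketch — the `on_timeout` parameter and its values are an illustration of the idea, not any particular product's API:

```python
import asyncio

# Hypothetical gate with an explicit timeout policy: "deny", "escalate",
# or "allow" decides what happens when no reviewer answers in time.
async def gated_call(request_review, call: str, timeout: float,
                     on_timeout: str = "deny") -> bool:
    try:
        return await asyncio.wait_for(request_review(call), timeout)
    except asyncio.TimeoutError:
        if on_timeout == "allow":      # "silence is consent" (low-risk only)
            return True
        if on_timeout == "escalate":   # hand off to a backup reviewer
            raise RuntimeError(f"escalate to on-call: {call}")
        return False                   # safe default: silence is denial

async def demo() -> bool:
    async def never_answers(call: str) -> bool:
        await asyncio.sleep(10)        # reviewer never responds in time
        return True
    return await gated_call(never_answers, "deploy_service api prod",
                            timeout=0.01)
```

With the default policy, a reviewer who never responds yields a denial rather than a hung agent or a silent approval.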
Getting started in 10 minutes
If you're already running MCP tools in production, adding oversight gates doesn't require a rewrite. The steps are:
- Inventory your tools — list all MCP tools and classify them by the table above (read-only, reversible, irreversible, deployment).
- Wrap the high-risk ones — add a client.run() call before any tool that touches production state. 5–10 lines per tool.
- Run it in staging first — observe which calls get flagged, build your initial whitelist.
- Tune your whitelist — after a week, you'll have a clear picture of "normal." Auto-approve normal, gate everything else.
- Enable anomaly detection — let the system flag unusual patterns even for whitelisted calls.
The result: your agents move at full speed on routine operations and pause for human review on anything novel, critical, or suspicious. That's the right balance.
Add human oversight to your MCP tools today
expacti integrates with any MCP server in minutes. Free tier, no credit card required.