What MCP actually is (and why it matters for security)
Model Context Protocol (MCP) is an open standard for connecting AI models to external tools, data sources, and services. Think of it as a USB-C port for AI: instead of each team building custom integrations, MCP defines a common protocol that any model can use to call any tool.
An MCP server exposes tools — discrete operations the model can invoke. A tool might be read_file, execute_sql, deploy_service, or send_email. The model decides when to call them, with what arguments, based on its reasoning about the task.
This is genuinely useful. It makes AI agents composable and standardized. It's also a significant change in the threat model for anyone running these agents in production environments.
The default MCP execution model
Without additional controls, MCP tool calls flow like this:
Model reasons... selects deploy_service tool
→ deploy_service({ env: "production", version: "1.4.2" })
→ ✗ runs immediately, no confirmation
Model reasons... selects run_migration tool
→ run_migration({ db: "prod-postgres", migration: "add_column_users" })
→ ✗ runs immediately, no confirmation
The model is doing exactly what it was asked. But it's doing it autonomously, at machine speed, without giving you a chance to say "wait, not that version" or "that migration isn't ready."
For read-only operations — search_docs, get_metrics, list_files — this is fine. For operations that modify state, especially in production, this is a problem.
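One way to encode that read-only/mutating split in code is a decorator that gates only the mutating tools. A minimal, dependency-free sketch — the `requires_approval` decorator, the `APPROVED` set, and the tool bodies are all illustrative stand-ins, not part of any MCP SDK:

```python
from functools import wraps

# Toy stand-in for a review queue: a real gate would block on a human
# decision rather than consult a static set.
APPROVED = {"scale_replicas"}

def requires_approval(func):
    """Gate a mutating tool behind the (toy) approval check."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        if func.__name__ not in APPROVED:
            raise PermissionError(f"{func.__name__} denied by reviewer")
        return func(*args, **kwargs)
    wrapper.mutating = True  # lets callers audit which tools are gated
    return wrapper

def list_files(path: str) -> list:  # read-only: runs ungated
    return ["a.txt", "b.txt"]

@requires_approval
def scale_replicas(service: str, count: int) -> dict:  # mutating: gated
    return {"service": service, "count": count}

@requires_approval
def drop_table(name: str) -> str:  # mutating and not approved
    return f"dropped {name}"
```

The useful property is that the gate lives in one place: adding a new mutating tool means adding one decorator line, not reimplementing the review logic.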
Why "just review the request first" isn't enough
A common response is: "I'll review the task before the agent starts." The problem is that multi-step agentic tasks are hard to fully anticipate. You say "deploy the backend," but you don't know in advance:
- Which specific migration the agent will decide to run
- Whether it will restart the entire service or just update the image
- Which config values it will override
- What it will do when it encounters an unexpected error mid-task
The agent's decisions emerge from its reasoning process, which can go in unexpected directions. You approved the intent, not the actual sequence of tool calls that materializes.
Categorizing MCP tools by risk
Not all MCP tools need the same treatment. The question is: what's the blast radius if this call goes wrong?
| Tool category | Examples | Risk | Recommended policy |
|---|---|---|---|
| Read-only queries | list_files, get_logs, search_docs, fetch_metrics | LOW | Auto-allow, log only |
| Non-destructive writes | create_branch, draft_email, add_comment, append_log | LOW | Auto-allow or allow with short TTL |
| Reversible state changes | update_config, set_feature_flag, scale_replicas | MEDIUM | Human approval required, whitelist common patterns |
| Irreversible operations | delete_resource, drop_table, send_email, publish_release | HIGH | Human approval required, 2-reviewer for critical |
| Production deployments | deploy_service, run_migration, restart_cluster | CRITICAL | Human approval required, change window enforcement |
The goal isn't to block everything — that would make agents useless. It's to put humans in the loop for the operations that matter, while letting agents move fast on everything else.
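The risk table above translates directly into policy data. A minimal sketch, with illustrative tool names and tier labels; the one design decision worth copying is that unknown tools fail closed, toward human review:

```python
# Hypothetical mapping of tools to the risk tiers from the table above.
RISK_TIERS = {
    "list_files": "low", "get_logs": "low", "search_docs": "low",
    "create_branch": "low", "draft_email": "low",
    "update_config": "medium", "set_feature_flag": "medium",
    "delete_resource": "high", "drop_table": "high", "send_email": "high",
    "deploy_service": "critical", "run_migration": "critical",
}

POLICY = {
    "low": "auto_allow",
    "medium": "human_approval",
    "high": "human_approval",
    "critical": "human_approval",
}

def policy_for(tool: str) -> str:
    # Unknown tools default to a high tier: fail closed, not open.
    tier = RISK_TIERS.get(tool, "high")
    return POLICY[tier]
```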
What a human oversight gate looks like in practice
The pattern is straightforward: wrap your MCP tool implementations so that before any mutating operation executes, it sends the call to a review queue and waits for explicit approval.
Here's a minimal example wrapping MCP tools with expacti for approval:
```python
# mcp_server_with_gates.py
import asyncio
from mcp.server import FastMCP
from expacti import AsyncExpactiClient

mcp = FastMCP("production-tools")
client = AsyncExpactiClient(
    url="wss://api.expacti.com/reviewer/ws",
    token="your-shell-token"
)

# Read-only: no gate needed
@mcp.tool()
async def get_service_status(service: str) -> dict:
    return fetch_service_status(service)

# Mutating: requires human approval
@mcp.tool()
async def deploy_service(service: str, version: str, env: str) -> dict:
    # This blocks until a reviewer approves or denies
    decision = await client.run(
        f"deploy_service service={service} version={version} env={env}"
    )
    if not decision.approved:
        raise PermissionError(f"Deployment denied: {decision.reason}")
    return execute_deployment(service, version, env)

# Critical: requires 2 approvers (configured in expacti dashboard)
@mcp.tool()
async def run_database_migration(migration_id: str, db: str) -> dict:
    decision = await client.run(
        f"run_migration id={migration_id} db={db}",
        timeout=300  # 5 min window for DB migrations
    )
    if not decision.approved:
        raise PermissionError(f"Migration denied: {decision.reason}")
    return execute_migration(migration_id, db)
```
When the model calls deploy_service, the tool call suspends and sends a notification to your reviewer dashboard (and optionally Slack). The reviewer sees the full call — tool name, arguments, context — and clicks Approve or Deny. Only then does execution continue.
The reviewer sees the full call, not just the intent
This is the key difference from reviewing at the task level. When the agent calls run_database_migration(migration_id="0047_drop_legacy_users", db="prod-postgres"), the reviewer sees:
- The exact tool being invoked
- The exact arguments — including which migration, which database
- The risk score (in this case, critical — it involves "drop" and production)
- The session context: what else the agent has done so far
A reviewer can approve the task intent and still catch "wait, this migration drops a column I thought we were keeping."
Task-level review
- You see: "Deploy backend to prod"
- You say: "Looks fine, go ahead"
- Agent decides all sub-steps autonomously
- No visibility into which migration runs
- No chance to stop mid-task
Tool-call-level gates
- You see each mutating tool call
- You approve or deny per call
- Whitelist safe patterns to reduce friction
- Full audit log of every decision
- Panic button to kill session instantly
Making it practical: whitelists reduce reviewer fatigue
The obvious objection is: "If I have to approve every tool call, I'm not saving time — I'm just adding friction." This is a real concern. The answer is whitelisting.
After you've reviewed and approved scale_replicas(service="api", count=3) a few times, you know it's safe. You add a whitelist rule: scale_replicas * — and it auto-approves from that point on, no human needed.
Over time, your whitelist captures the full "normal operating vocabulary" of your agents. Novel operations — the ones you haven't seen before — are the ones that get flagged. Which is exactly the right behavior: the agent moves fast on familiar ground and pauses on unfamiliar territory.
```
# Whitelist rules (via expacti dashboard or API)
# Pattern                      Type    Risk      Notes
get_service_*                  glob    low       # all read ops, auto-allow
scale_replicas *               glob    medium    # scaling, auto-allow after review
deploy_service * staging       exact   low       # staging deploys, auto-allow
deploy_service * production    glob    critical  # prod deploys, ALWAYS review
run_migration *                glob    critical  # all migrations, ALWAYS review
```
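Rule matching of this kind can be approximated with shell-style globs. A minimal sketch — the rule format and first-match-wins behavior here are assumptions for illustration, not expacti's actual matching engine:

```python
from fnmatch import fnmatch

# Hypothetical rules mirroring the config above: (pattern, decision),
# evaluated top to bottom; first match wins.
RULES = [
    ("deploy_service * production", "review"),  # prod deploys always reviewed
    ("run_migration *", "review"),              # migrations always reviewed
    ("get_service_*", "allow"),                 # read ops auto-allowed
    ("scale_replicas *", "allow"),
    ("deploy_service * staging", "allow"),      # staging deploys auto-allowed
]

def decide(call: str) -> str:
    for pattern, decision in RULES:
        if fnmatch(call, pattern):
            return decision
    return "review"  # fail closed: anything unmatched goes to a human
```

Note the ordering: the production-deploy rule sits above the staging rule, so a glob can never accidentally auto-approve a production call.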
Anomaly detection: when familiar patterns turn suspicious
Whitelists handle the "known safe" case. But what about the "known pattern, unusual context" case? scale_replicas(service="api", count=3) is routine at 2pm on a Tuesday. The same call at 3am, combined with a series of log-reading operations, might be an indicator that something is wrong — a prompt injection attack, a confused agent, or worse.
This is where anomaly detection helps. expacti scores every tool call across multiple dimensions:
- Time of day — production changes at unusual hours get elevated scores
- Command frequency — unusual bursts of calls to the same tool
- Session context — a deploy following a series of credential-reading calls
- Risk score — combining base tool risk with contextual modifiers
Even if a call matches a whitelist rule, anomaly flags surface it for optional review. The whitelist says "this pattern is safe" — the anomaly detector asks "but is this instance actually safe?"
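A toy scorer over those dimensions might look like the following. The weights, threshold, and tool names are purely illustrative; a real system would learn or tune these rather than hard-code them:

```python
# Hypothetical anomaly scorer combining base tool risk with context.
BASE_RISK = {"scale_replicas": 2, "deploy_service": 4, "get_logs": 0}

def anomaly_score(tool: str, hour: int, recent_calls: list) -> int:
    score = BASE_RISK.get(tool, 3)           # unknown tools start risky
    if hour < 6 or hour > 22:                # production change at 3am
        score += 3
    if recent_calls.count(tool) > 5:         # burst of identical calls
        score += 2
    if "read_credentials" in recent_calls:   # suspicious session context
        score += 4
    return score

def needs_review(tool: str, hour: int, recent_calls: list, threshold: int = 6) -> bool:
    return anomaly_score(tool, hour, recent_calls) >= threshold
```

The same `scale_replicas` call scores 2 at 2pm with a clean session, but 9 at 3am following a credential read — which is exactly the "familiar pattern, unusual context" case the whitelist alone cannot catch.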
MCP-specific considerations
Tool argument validation
MCP tools expose typed schemas for their arguments. Use those schemas to build targeted approval rules. A tool with env: "staging" | "production" should have different policies per environment. Don't treat all calls to deploy_service equally — the argument values are part of the risk surface.
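Argument-aware policy can be as simple as branching on the schema-validated value. A minimal sketch, assuming a tool whose schema declares env as "staging" | "production" (the function and return labels are hypothetical):

```python
# Hypothetical per-argument policy for a deploy tool: the same tool name
# gets different treatment depending on its env argument.
def deploy_policy(args: dict) -> str:
    env = args.get("env")
    if env not in ("staging", "production"):
        # Reject values outside the declared schema instead of guessing.
        raise ValueError(f"env must be staging|production, got {env!r}")
    return "human_approval" if env == "production" else "auto_allow"
```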
Tool composition attacks
A sequence of individually low-risk tool calls can combine into a high-risk operation. Reading a config file, then reading an SSH key, then making an outbound request: each call is LOW on its own, but the combination is suspicious. Session-level analysis (not just per-call) catches this.
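Detecting that kind of composition amounts to a subsequence check over the session's call history. A minimal sketch (the tool names in the pattern are illustrative):

```python
# Hypothetical detector for the composition described above: each call is
# low-risk alone, but the ordered combination is an exfiltration shape.
SUSPICIOUS_SEQUENCE = ["read_config", "read_ssh_key", "outbound_request"]

def contains_sequence(session: list, pattern: list) -> bool:
    """True if pattern appears in order (not necessarily adjacent) in session."""
    it = iter(session)
    return all(step in it for step in pattern)
```

Because the iterator is consumed left to right, the check only fires when the steps occur in order — an outbound request *before* the key read does not match.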
Nested MCP servers
As MCP ecosystems grow, you'll have agents calling agents — an orchestrator model calling sub-agent tools. The approval chain should follow the actual execution path. Make sure your gates are at the leaf tools that touch real infrastructure, not just the top-level planner.
What happens when the reviewer isn't available?
This is a real operational question. The answer depends on your risk tolerance and the nature of the operation:
- Auto-deny (safe default): If no reviewer approves within N minutes, the operation is denied. The agent logs this and can retry or escalate.
- Escalate to backup reviewer: Route to a second reviewer or on-call person. Useful for deployments that need to happen during off-hours.
- Auto-approve for low-risk: Operations below a certain risk threshold can be configured to auto-approve after a timeout if no one objects. "Silence is consent" for routine operations, "silence is denial" for critical ones.
The point is to make these policies explicit and configured, not to leave them to chance. Undefined behavior in production is at the root of many incidents.
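The three timeout policies above can be made explicit in the gate itself. A minimal asyncio sketch — the `on_timeout` parameter and its values are an illustration of the idea, not any particular product's API:

```python
import asyncio

# Hypothetical gate with an explicit timeout policy: "deny", "escalate",
# or "allow" decides what happens when no reviewer answers in time.
async def gated_call(request_review, call: str, timeout: float,
                     on_timeout: str = "deny") -> bool:
    try:
        return await asyncio.wait_for(request_review(call), timeout)
    except asyncio.TimeoutError:
        if on_timeout == "allow":      # "silence is consent" (low-risk only)
            return True
        if on_timeout == "escalate":   # hand off to a backup reviewer
            raise RuntimeError(f"escalate to on-call: {call}")
        return False                   # safe default: silence is denial

async def demo() -> bool:
    async def never_answers(call: str) -> bool:
        await asyncio.sleep(10)        # reviewer never responds in time
        return True
    return await gated_call(never_answers, "deploy_service api prod",
                            timeout=0.01)
```

With the default policy, a reviewer who never responds yields a denial rather than a hung agent or a silent approval.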
Getting started in 10 minutes
If you're already running MCP tools in production, adding oversight gates doesn't require a rewrite. The steps are:
- Inventory your tools — list all MCP tools and classify them by the table above (read-only, reversible, irreversible, deployment).
- Wrap the high-risk ones — add a client.run() call before any tool that touches production state. 5–10 lines per tool.
- Run it in staging first — observe which calls get flagged, build your initial whitelist.
- Tune your whitelist — after a week, you'll have a clear picture of "normal." Auto-approve normal, gate everything else.
- Enable anomaly detection — let the system flag unusual patterns even for whitelisted calls.
The result: your agents move at full speed on routine operations and pause for human review on anything novel, critical, or suspicious. That's the right balance.
Add human oversight to your MCP tools today
expacti integrates with any MCP server in minutes. Free tier, no credit card required.