March 27, 2026 · 14 min read

Building a Human-in-the-Loop SSH Proxy in Rust: Lessons from expacti-sshd

How we built a custom SSH proxy that intercepts commands at the PTY level, gates execution behind a real-time approval flow, and bridges bidirectional traffic — all in a single Rust process.

Why We Built a Custom SSH Proxy

Most SSH proxies are about logging. Record what happened, audit it later, maybe raise an alert. That is useful, but it does not prevent anything. By the time your SIEM fires, rm -rf / has already finished.

We needed something different: a proxy that can block a command before it reaches the target server. The operator types DROP TABLE users, the proxy intercepts it, sends it to a reviewer over WebSocket, and holds the SSH session in limbo until a human says "yes" or "no." If denied, the command never executes. The target server never even sees it.

No off-the-shelf SSH proxy does this. OpenSSH's ForceCommand cannot selectively block. Session recording tools like asciinema are post-hoc. Bastion hosts authenticate, but they do not approve. So we built expacti-sshd from scratch in Rust, using the russh crate.

Architecture: Server + Client in One Process

The core insight is that an SSH proxy is both a server (accepting connections from the operator) and a client (connecting to the target host). With russh, we run both roles in the same Tokio runtime:

// Simplified connection flow
//
// SSH Client  →  expacti-sshd (Server Handler)  →  Target SSH Server
//                       ↓
//                 CommandBuffer (PTY parsing)
//                       ↓
//                 Approval Service (WebSocket)

When a client connects, the proxy's russh::server::Handler implementation creates a new SshProxy struct. That struct holds the per-session state: the CommandBuffer for PTY parsing, the awaiting_approval flag, and the sender half of the channel to the target-forwarding task.

The target-side connection runs in a separate spawned task, bridged to the client side through Tokio channels. This separation is critical: the server handler runs inline with russh's per-session message loop (invoked on each SSH message), while the target forwarding runs as an independent async task.

PTY-Level Command Interception

This is where things get messy. When a user types in an interactive SSH session, the proxy does not receive neat, line-delimited commands. It receives raw bytes — one keystroke at a time, mixed with ANSI escape sequences, control characters, and multi-byte UTF-8.

Our CommandBuffer is a state machine that reconstructs the command the user intended to type. Here is a simplified version of the core logic:

pub struct CommandBuffer {
    buf: Vec<char>,
    utf8_buf: Vec<u8>,
    esc_state: u8,  // 0=normal, 1=saw ESC, 2=inside CSI
}

impl CommandBuffer {
    /// Feed a byte. Returns Some(command) on Enter.
    pub fn push(&mut self, b: u8) -> Option<String> {
        match b {
            // Enter → emit the buffered command
            0x0d => {
                let cmd: String = self.buf.iter().collect();
                self.buf.clear();
                return Some(cmd);
            }
            // Backspace / DEL
            0x7f | 0x08 => { self.buf.pop(); }
            // Ctrl+W → delete the previous word
            0x17 => {
                // Drop trailing spaces first, then the word itself
                while self.buf.last() == Some(&' ') {
                    self.buf.pop();
                }
                while let Some(&c) = self.buf.last() {
                    if c == ' ' { break; }
                    self.buf.pop();
                }
            }
            // Ctrl+U → clear line
            0x15 => { self.buf.clear(); }
            // ESC → start escape sequence
            0x1b => { self.esc_state = 1; }
            // Printable ASCII
            0x20..=0x7e if self.esc_state == 0 => {
                self.buf.push(b as char);
            }
            _ => { /* handle escape sequences, UTF-8 */ }
        }
        None
    }
}
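Fed a realistic byte stream, the buffer behaves like a minimal line editor. Here is a self-contained, condensed version of the arms above (escape and UTF-8 handling elided; `MiniBuffer` and `feed` are illustrative names, not the project's actual API) that shows the effect:

```rust
/// Condensed CommandBuffer: just Enter, Backspace, Ctrl+W, Ctrl+U,
/// and printable ASCII — escape/UTF-8 handling elided for brevity.
#[derive(Default)]
struct MiniBuffer {
    buf: Vec<char>,
}

impl MiniBuffer {
    fn push(&mut self, b: u8) -> Option<String> {
        match b {
            // Enter → emit the buffered command
            0x0d => {
                let cmd: String = self.buf.iter().collect();
                self.buf.clear();
                return Some(cmd);
            }
            // Backspace / DEL
            0x7f | 0x08 => { self.buf.pop(); }
            // Ctrl+W → drop trailing spaces, then the previous word
            0x17 => {
                while self.buf.last() == Some(&' ') { self.buf.pop(); }
                while let Some(&c) = self.buf.last() {
                    if c == ' ' { break; }
                    self.buf.pop();
                }
            }
            // Ctrl+U → clear line
            0x15 => { self.buf.clear(); }
            // Printable ASCII
            0x20..=0x7e => { self.buf.push(b as char); }
            _ => {}
        }
        None
    }

    /// Feed a whole byte slice; return the last emitted command, if any.
    fn feed(&mut self, bytes: &[u8]) -> Option<String> {
        bytes.iter().filter_map(|&b| self.push(b)).last()
    }
}

fn main() {
    let mut mb = MiniBuffer::default();
    // "lss" + Backspace + Enter reconstructs "ls"
    assert_eq!(mb.feed(b"lss\x7f\x0d"), Some("ls".into()));
    // "cat foo" + Ctrl+W + "bar" + Enter reconstructs "cat bar"
    assert_eq!(mb.feed(b"cat foo\x17bar\x0d"), Some("cat bar".into()));
    println!("ok");
}
```

The point of reconstructing edits like this is that the approval service sees the command the user actually submitted, not the raw keystroke history.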

The ANSI Escape Problem

Arrow keys, Home/End, F-keys, and terminal resize events all generate multi-byte ANSI escape sequences. If you naively push these into the command buffer, you get garbage like ls[A[A[B instead of ls.

Our parser uses a three-state machine: Normal (collecting printable chars), Saw ESC (just received 0x1b), and Inside CSI (processing a Control Sequence Introducer). In the CSI state, we consume parameter bytes until we hit a final byte (0x40–0x7E), then discard the entire sequence. This correctly handles everything from simple arrow keys (ESC [ A) to complex bracketed paste sequences.
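The three-state machine can be sketched as a standalone filter. This is a simplified model of the escape handling that the `_ => {}` arm above elides (the real CommandBuffer interleaves it with backspace, Ctrl+W, and UTF-8 handling; `strip_csi` is an illustrative name):

```rust
// Three-state ANSI filter: Normal → SawEsc → InsideCsi → Normal.
enum EscState { Normal, SawEsc, InsideCsi }

/// Strip CSI escape sequences from a byte stream, keeping printable ASCII.
fn strip_csi(input: &[u8]) -> String {
    let mut state = EscState::Normal;
    let mut out = String::new();
    for &b in input {
        match state {
            EscState::Normal => match b {
                0x1b => state = EscState::SawEsc,    // ESC starts a sequence
                0x20..=0x7e => out.push(b as char),  // printable byte
                _ => {}                              // other control chars
            },
            EscState::SawEsc => {
                // '[' introduces a CSI sequence; anything else ends it.
                state = if b == b'[' { EscState::InsideCsi } else { EscState::Normal };
            }
            EscState::InsideCsi => {
                // Consume parameter/intermediate bytes until a final byte
                // in 0x40–0x7E terminates the sequence; discard all of it.
                if (0x40..=0x7e).contains(&b) {
                    state = EscState::Normal;
                }
            }
        }
    }
    out
}

fn main() {
    // "ls" followed by two Up-arrows (ESC [ A) comes out as just "ls".
    assert_eq!(strip_csi(b"ls\x1b[A\x1b[A"), "ls");
    // A parameterized sequence (ESC [ 1 ; 5 C) is discarded whole.
    assert_eq!(strip_csi(b"\x1b[1;5Cpwd"), "pwd");
    println!("ok");
}
```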

UTF-8 Multi-Byte Sequences

SSH sends raw bytes. A single emoji or CJK character arrives as 3–4 bytes across potentially multiple data() callbacks. The buffer accumulates bytes with the high bit set in a separate utf8_buf, reads the expected sequence length from the leading byte, and only pushes to the command buffer once the full character is assembled:

// Multi-byte UTF-8 assembly
b if b >= 0x80 => {
    self.utf8_buf.push(b);
    let expected = match self.utf8_buf[0] {
        0xc0..=0xdf => 2,
        0xe0..=0xef => 3,
        0xf0..=0xf7 => 4,
        _ => { self.utf8_buf.clear(); return None; }
    };
    if self.utf8_buf.len() == expected {
        if let Ok(s) = std::str::from_utf8(&self.utf8_buf) {
            for c in s.chars() { self.buf.push(c); }
        }
        self.utf8_buf.clear();
    }
}

The Atomic Approval Flow

When the user presses Enter and the CommandBuffer emits a command, we need to pause the session and wait for a human decision. The challenge: the data() handler is called by russh for every incoming SSH packet. We cannot block it (that would freeze the entire SSH connection). We cannot await the decision inside it either, because russh processes a session's messages sequentially — a multi-second await on a human reviewer would stall the connection just as thoroughly as blocking would. So we use a shared atomic flag.

pub struct SshProxy {
    awaiting_approval: Arc<AtomicBool>,
    cmd_buf: CommandBuffer,
    // ...
}

The flow works like this:

  1. User types a command; CommandBuffer returns it on Enter
  2. Handler sets awaiting_approval.store(true, SeqCst)
  3. Handler spawns an async task that sends the command to the approval service over WebSocket
  4. While the flag is true, all subsequent keystrokes are silently dropped — the user cannot type anything new
  5. The spawned task receives the decision and sets the flag back to false

On approval: the task sends 0x0d (Enter) to the target, causing it to execute the command that was already echoed to its PTY. On denial: the task sends 0x15 (Ctrl+U) to clear the target's line buffer, and sends a denial message back to the client.

// Inside the data() handler's loop over incoming bytes
if let Some(command) = self.cmd_buf.push(byte) {
    let command = command.trim().to_string();
    if command.is_empty() {
        self.send_to_target(vec![0x0d]).await;
        continue;
    }

    self.awaiting_approval.store(true, Ordering::SeqCst);

    let flag = self.awaiting_approval.clone();
    let tx = self.target_tx.clone();
    tokio::spawn(async move {
        match request_approval(&command).await {
            Decision::Approved => {
                let _ = tx.send(TargetCmd::Data(vec![0x0d])).await;
            }
            Decision::Denied(reason) => {
                let _ = tx.send(TargetCmd::Data(vec![0x15])).await;
                // Send denial message to client...
            }
        }
        flag.store(false, Ordering::SeqCst);
    });
}

Why AtomicBool instead of a Mutex? The flag is only ever read or written as a single boolean. An atomic is lock-free, has no poisoning, and is trivially Send + Sync. A mutex would work but adds unnecessary overhead for a single-bit state.
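On the hot path, the gate reduces to a single atomic load. A minimal stdlib-only model of steps 2–5 (the real handler lives inside russh's async data() callback; here a plain thread stands in for the spawned Tokio task, and `gate_keystroke` is an illustrative helper, not the project's code):

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;

/// Drop keystrokes while an approval is pending; pass them through otherwise.
fn gate_keystroke(awaiting: &AtomicBool, byte: u8) -> Option<u8> {
    if awaiting.load(Ordering::SeqCst) {
        None // step 4: silently dropped while a decision is pending
    } else {
        Some(byte)
    }
}

fn main() {
    let awaiting = Arc::new(AtomicBool::new(false));

    // Normal typing passes through.
    assert_eq!(gate_keystroke(&awaiting, b'x'), Some(b'x'));

    // Command submitted: raise the flag (step 2), then resolve it from
    // another thread, as the spawned approval task does in the proxy.
    awaiting.store(true, Ordering::SeqCst);
    assert_eq!(gate_keystroke(&awaiting, b'y'), None);

    let flag = Arc::clone(&awaiting);
    let reviewer = thread::spawn(move || {
        // ...decision arrives over WebSocket in the real system (step 5)...
        flag.store(false, Ordering::SeqCst);
    });
    reviewer.join().unwrap();

    // Typing resumes once the flag is cleared.
    assert_eq!(gate_keystroke(&awaiting, b'z'), Some(b'z'));
    println!("ok");
}
```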

Bidirectional Bridging with tokio::select!

The proxy needs to shuttle data in both directions: client-to-target and target-to-client. This is a classic multiplexing problem, and tokio::select! handles it elegantly.

When the client requests a PTY and shell, we open a channel to the target and spawn a forwarding task:

tokio::spawn(async move {
    loop {
        tokio::select! {
            // Target → Client: forward output
            msg = target_channel.wait() => {
                match msg {
                    Some(ChannelMsg::Data { ref data }) => {
                        let _ = client_handle
                            .data(client_channel_id,
                                  CryptoVec::from_slice(data))
                            .await;
                    }
                    Some(ChannelMsg::Eof) | None => {
                        let _ = client_handle
                            .close(client_channel_id).await;
                        break;
                    }
                    _ => {}
                }
            }
            // Client → Target: receive forwarded commands
            cmd = cmd_rx.recv() => {
                match cmd {
                    Some(TargetCmd::Data(bytes)) => {
                        let _ = target_channel
                            .data(&bytes[..]).await;
                    }
                    None => {
                        let _ = target_channel.close().await;
                        break;
                    }
                    _ => {}
                }
            }
        }
    }
});

The key design choice: the client never writes directly to the target channel. Instead, the server handler sends TargetCmd messages through an mpsc channel. This decoupling lets the approval flow inject or suppress bytes without racing with the forwarding task.
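The snippets above use TargetCmd without defining it. A plausible shape, sketched with the stdlib channel so it runs standalone (the real code uses tokio's mpsc, and any variant beyond Data — which the snippets rely on — is our guess, suggested by the `_ => {}` arms):

```rust
use std::sync::mpsc;

/// Messages from the server handler to the target-forwarding task.
/// `Data` carries raw bytes to write to the target PTY; `Close` is a
/// hypothetical explicit-shutdown variant.
enum TargetCmd {
    Data(Vec<u8>),
    Close,
}

fn main() {
    let (tx, rx) = mpsc::channel::<TargetCmd>();

    // The approval task injects an Enter (0x0d) on approval...
    tx.send(TargetCmd::Data(vec![0x0d])).unwrap();
    // ...or a Ctrl+U (0x15) on denial.
    tx.send(TargetCmd::Data(vec![0x15])).unwrap();
    drop(tx); // all senders gone → the forwarding loop exits

    // Stand-in for the forwarding task's receive loop.
    let mut forwarded = Vec::new();
    while let Ok(cmd) = rx.recv() {
        match cmd {
            TargetCmd::Data(bytes) => forwarded.extend(bytes),
            TargetCmd::Close => break,
        }
    }
    assert_eq!(forwarded, vec![0x0d, 0x15]);
    println!("ok");
}
```

Because every byte destined for the target flows through this one channel, the forwarding task is the single writer, and approval-injected bytes are serialized with ordinary keystrokes for free.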

The auth_none Trick

SSH authentication happens before any channels are opened. A normal SSH proxy would need to handle authentication — checking passwords, validating public keys, managing certificates. That is a lot of complexity and a large attack surface.

Our approach: accept everything at the proxy level.

impl Handler for SshProxy {
    async fn auth_none(
        &mut self, _user: &str
    ) -> Result<Auth, Self::Error> {
        Ok(Auth::Accept)
    }

    async fn auth_password(
        &mut self, _user: &str, _password: &str
    ) -> Result<Auth, Self::Error> {
        Ok(Auth::Accept)
    }

    async fn auth_publickey(
        &mut self, _user: &str, _key: &PublicKey  // russh's public-key type
    ) -> Result<Auth, Self::Error> {
        Ok(Auth::Accept)
    }
}

The proxy blindly accepts any credentials. Real authentication is deferred to the target server. When the proxy opens a client connection to the target, the target performs its own auth. If the target rejects the credentials, the session simply fails — no shell is established.

Security note: This design means the proxy itself does not enforce authentication. It must sit behind a network boundary (VPN, firewall, or mTLS) so that only authorized operators can reach it. The proxy's job is authorization (approving commands), not authentication (verifying identity).

This separation of concerns keeps the proxy simple. We do not need to replicate OpenSSH's authentication stack, manage authorized_keys files, or handle PAM. The target server — which already has that infrastructure — handles it.

Test Strategy

Testing an SSH proxy is hard. You need a real SSH server, a real SSH client, and a way to simulate the approval flow. Our solution: run everything in-process.

Mock SSH Target

We spin up a minimal russh::server that accepts all auth and echoes exec commands with a :ok suffix:

fn spawn_mock_target() -> u16 {
    let port = free_port();
    tokio::spawn(async move {
        // Accept all auth, echo "{command}:ok\r\n"
        // on exec requests
        run_mock_ssh_server(port).await;
    });
    port
}
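free_port() is a test helper whose implementation is not shown; one common approach (an assumption, not necessarily the project's actual code) is to bind port 0 and let the OS pick an ephemeral port:

```rust
use std::net::TcpListener;

/// Ask the OS for a currently-free TCP port by binding port 0.
/// The port could in principle be reclaimed between this call and the
/// mock server binding it — a race that is acceptable in tests.
fn free_port() -> u16 {
    let listener = TcpListener::bind("127.0.0.1:0")
        .expect("failed to bind an ephemeral port");
    listener.local_addr().unwrap().port()
    // `listener` is dropped here, releasing the port for the mock server
}

fn main() {
    let port = free_port();
    assert!(port > 0);
    // The port should be immediately bindable by the mock server.
    let server = TcpListener::bind(("127.0.0.1", port)).unwrap();
    println!("{}", server.local_addr().unwrap().port());
}
```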

In-Process Approval Mock

Instead of running a real WebSocket server, we inject an mpsc::Sender<ApprovalRequest> directly into the proxy. The mock auto-approves everything unless the command contains "DENY":

fn spawn_approval_mock() -> mpsc::Sender<ApprovalRequest> {
    let (tx, mut rx) = mpsc::channel(64);
    tokio::spawn(async move {
        while let Some(req) = rx.recv().await {
            let decision = if req.command.contains("DENY") {
                Decision::Denied("blocked".into())
            } else {
                Decision::Approved
            };
            let _ = req.respond.send(decision);
        }
    });
    tx
}

What We Test

We have unit tests for the PTY parser (backspace handling, Ctrl+W, ANSI stripping, UTF-8 assembly) and the WebSocket client protocol. On top of that, six end-to-end integration tests exercise the full proxy stack.

The full test suite runs in under 3 seconds. No external processes, no Docker containers, no network calls.

Lessons Learned

PTY is Messier Than Expected

We started with a naive "split on newline" approach. It lasted about 10 minutes. Real terminal input is a stream of raw bytes: control characters, multi-byte ANSI escape sequences, and UTF-8 characters split across packets.

The PTY parser went through four rewrites before stabilizing. The current version handles all of the above correctly, but we still discover edge cases occasionally (screen, tmux, and mosh all have their own PTY quirks).

russh API Quirks

The russh crate is excellent — it is the only pure-Rust SSH implementation that supports both server and client. But it has its share of sharp edges.

Testing Async SSH is Hard

The biggest challenge was not the SSH protocol itself but timing. Tests that connect, send a command, and check output need to account for handshake latency, async task scheduling, and the time it takes output to propagate back through the channels.

We solved this with generous timeouts, retry loops on connection attempts, and careful channel synchronization. The in-process approval mock (versus a WebSocket server) eliminates one entire network hop, which makes tests both faster and more deterministic.
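The connection-retry pattern looks roughly like this — a blocking stdlib sketch of the idea (the real tests use Tokio's async equivalents, and the attempt count and delay shown are illustrative):

```rust
use std::net::{TcpListener, TcpStream};
use std::thread;
use std::time::Duration;

/// Retry connecting until the freshly-spawned server is accepting,
/// instead of assuming it is up the instant its task is spawned.
fn connect_with_retries(addr: &str, attempts: u32) -> Option<TcpStream> {
    for _ in 0..attempts {
        if let Ok(stream) = TcpStream::connect(addr) {
            return Some(stream);
        }
        // Generous, test-only backoff between attempts.
        thread::sleep(Duration::from_millis(50));
    }
    None
}

fn main() {
    // Stand-in for the mock SSH target.
    let listener = TcpListener::bind("127.0.0.1:0").unwrap();
    let addr = listener.local_addr().unwrap().to_string();
    thread::spawn(move || {
        let _ = listener.accept(); // accept one connection, then exit
    });

    let stream = connect_with_retries(&addr, 20);
    assert!(stream.is_some());
    println!("connected");
}
```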

What's Next

We are working on recording and replay — capturing the full PTY stream so reviewers can see not just what command was typed but the entire terminal context. We are also exploring AI-assisted risk scoring for SSH commands, similar to what we already do for API-submitted commands in the main expacti backend.

If you are building tools that sit in the data path of infrastructure access, Rust + Tokio + russh is a compelling stack. The type system catches entire classes of concurrency bugs at compile time, and the proxy adds sub-millisecond latency to the SSH session (the approval round-trip dominates, as it should).

Try Expacti

Human-in-the-loop command approval for AI agents and infrastructure access. Free tier available.
