Ch. 10

Sandboxing & Runtime Protection

Part 3 / Security & Authorization

Why containers aren’t a sandbox

A common misconception: “We run our agents in Docker containers, so they’re sandboxed.” Containers provide process isolation, not security isolation. They were designed for deployment consistency, not adversarial containment. The distinction matters because agents should be treated as adversarial workloads: they execute code generated by a probabilistic model that can be manipulated through prompt injection.

| Feature | Container | Security sandbox |
| --- | --- | --- |
| Filesystem isolation | Partial (volumes leak) | Full (explicit allowlist) |
| Network isolation | Configurable | Default deny |
| Syscall filtering | Optional (seccomp) | Mandatory |
| Resource limits | Configurable | Enforced |
| Escape difficulty | Medium | High |
| Designed for | Deployment | Adversarial containment |

A container with default settings shares the host kernel, can access mounted volumes, can make arbitrary network requests, and can reach most of the kernel’s system call surface (Docker’s default seccomp profile blocks only a few dozen of several hundred syscalls). A security sandbox starts from a deny-all posture and explicitly allows only the capabilities the agent needs. The difference is one of design philosophy: containers assume the workload is trusted and provide convenience, while sandboxes assume the workload is untrusted and provide containment.
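As a rough illustration of moving a container closer to a deny-all posture, the sketch below builds a `docker run` command line that turns off the network, makes the root filesystem read-only, and drops all capabilities. The function name, the image name, the `/workspace` mount point, and the specific limits are assumptions chosen for illustration. The container still shares the host kernel, so this narrows the gap with a sandbox without closing it.

```python
from typing import List


def hardened_docker_args(image: str, workdir: str) -> List[str]:
    """Build a `docker run` argv that starts closer to deny-all.

    `image` and `workdir` are hypothetical placeholders supplied by the caller.
    The numeric limits are illustrative, not recommendations.
    """
    return [
        "docker", "run",
        "--rm",                                  # remove the container on exit
        "--network", "none",                     # no network access
        "--read-only",                           # read-only root filesystem
        "--cap-drop", "ALL",                     # drop every Linux capability
        "--security-opt", "no-new-privileges",   # block privilege gain via setuid
        "--pids-limit", "128",                   # cap the number of processes
        "--memory", "512m",                      # cap memory use
        "--volume", f"{workdir}:/workspace:rw",  # the only writable mount
        image,
    ]


args = hardened_docker_args("agent-image:latest", "/tmp/task-1")
```

The list can be handed to `subprocess.run(args)` on a host with Docker installed. Each capability the agent actually needs (a network, an extra mount) is then added back explicitly.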

Ephemeral environment isolation

The strongest sandboxing pattern gives each agent its own ephemeral environment - a fresh, isolated VM or container that is destroyed after the task completes. No persistent state means no accumulated security risks or data leakage between sessions. Each agent starts clean and ends clean.

Ona takes this approach in production: every agent gets an isolated cloud environment with its own filesystem, network, and process space. The environment is pre-configured with the project’s dependencies and tools, then destroyed when the task is done. This eliminates an entire class of cross-session attacks and makes the blast radius of any single agent failure exactly one environment. If an agent is compromised, the attacker gains access to one ephemeral environment that will be destroyed in minutes - not to a persistent server with access to other sessions.

The ephemeral pattern also solves the state accumulation problem. Long-running agent environments accumulate state - temporary files, cached credentials, modified configurations - that can leak between sessions or create unexpected behavior. Ephemeral environments eliminate this by starting fresh every time.
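The lifecycle itself (create, run, destroy) can be sketched in a few lines. The example below uses a temporary directory as a stand-in for the environment. That is an assumption made to keep the sketch runnable anywhere: a real deployment would create and destroy a VM or container instead, and a directory alone provides no isolation. The function names are hypothetical.

```python
import os
import tempfile
from typing import Callable, TypeVar

T = TypeVar("T")


def run_in_ephemeral_workspace(task: Callable[[str], T]) -> T:
    """Run `task` in a fresh directory that is deleted when the task ends.

    The directory is removed even if the task raises, so no state from one
    task can be seen by the next.
    """
    with tempfile.TemporaryDirectory(prefix="agent-task-") as workspace:
        return task(workspace)


def example_task(workspace: str) -> str:
    # Hypothetical task: writes scratch state, then reports where it ran.
    with open(os.path.join(workspace, "scratch.txt"), "w") as fh:
        fh.write("temporary state")
    return workspace


used_path = run_in_ephemeral_workspace(example_task)
```

After the call returns, `used_path` no longer exists on disk. This is the property the ephemeral pattern relies on.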

For teams that can’t use a managed platform, the same pattern can be implemented with OS-level primitives.

OS-level sandboxing

Production agent sandboxing uses OS-level primitives that restrict what a process can do at the kernel level. These are the same primitives that browsers use to sandbox web content and that container runtimes use for isolation - battle-tested, well-understood, and available on modern Linux kernels (seccomp filters since 3.5, Landlock since 5.13).

seccomp (Secure Computing Mode) restricts which system calls a process can make. A typical agent needs file I/O, network sockets, process management, and memory allocation. It doesn’t need raw socket access, kernel module loading, or filesystem mounting. A seccomp profile that allows only the necessary syscalls prevents an entire class of privilege escalation attacks. The profile should be as restrictive as possible - start with a minimal set and add syscalls only when the agent demonstrably needs them.
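One way to express such an allowlist is a seccomp profile in the JSON format accepted by Docker and other OCI runtimes. In this format the default action rejects every syscall and a single rule re-allows the named ones. The sketch below generates a profile of this kind. The syscall list is an illustrative assumption and is far too short for a real agent. A working list should come from observing what the agent actually calls.

```python
import json
from typing import Dict, List

# Illustrative allowlist only; a real agent needs a longer, measured list.
ALLOWED_SYSCALLS: List[str] = [
    "read", "write", "openat", "close", "fstat", "mmap", "munmap", "brk",
    "socket", "connect", "exit_group",
]


def build_seccomp_profile(allowed: List[str]) -> Dict:
    """Return a deny-by-default seccomp profile as a plain dictionary."""
    return {
        "defaultAction": "SCMP_ACT_ERRNO",       # anything unlisted fails
        "architectures": ["SCMP_ARCH_X86_64"],   # assumes an x86-64 host
        "syscalls": [
            {"names": sorted(allowed), "action": "SCMP_ACT_ALLOW"},
        ],
    }


profile_json = json.dumps(build_seccomp_profile(ALLOWED_SYSCALLS), indent=2)
```

Written to a file, the profile can be applied with `docker run --security-opt seccomp=profile.json`. Syscalls such as `mount` or `init_module` never appear in the list, so they fail regardless of what the agent's code attempts.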

Landlock (Linux Security Module) restricts filesystem access at the kernel level. Unlike file permissions, which are identity-based (this user can access this file), Landlock is path-based (this process can access these directories). An agent sandboxed with Landlock can read and write files in its working directory but cannot access /etc/passwd, /root/.ssh, or any other sensitive path - even if it’s running as root. Landlock is particularly useful for agents because it can be applied without changing the agent’s code or requiring root privileges.
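The path-based rule can be illustrated with a small user-space model. The sketch below is not Landlock and enforces nothing. It only mirrors the decision "is this path beneath an allowed directory?" in plain Python, with a hypothetical function name and example paths. Real Landlock enforcement happens in the kernel, which also handles symlinks and open file descriptors that this string-based check ignores.

```python
import posixpath
from typing import Iterable


def is_path_allowed(path: str, allowed_roots: Iterable[str]) -> bool:
    """Return True only if `path` lies beneath one of `allowed_roots`.

    Both `path` and every root must be absolute. `..` segments are
    collapsed before the comparison so traversal cannot escape a root.
    """
    normalized = posixpath.normpath(path)
    if not posixpath.isabs(normalized):
        return False  # relative paths are rejected in this sketch
    for root in allowed_roots:
        root = posixpath.normpath(root)
        # commonpath compares whole path components, so "/work/agent"
        # does not match "/work/agent-other".
        if posixpath.commonpath([root, normalized]) == root:
            return True
    return False


roots = ["/work/agent"]
inside = is_path_allowed("/work/agent/src/main.py", roots)
outside = is_path_allowed("/etc/passwd", roots)
```

The identity of the caller never enters the check: the decision depends only on where the path sits relative to the allowed directories, which is why a Landlock ruleset constrains a process even when it runs as root.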

**Anthropic’s sandbox-runtime** combines multiple isolation layers into a single runtime designed specifically for AI agents. It layers seccomp profiles, filesystem restrictions, network policies, and resource limits into a cohesive sandbox that’s easy to configure and hard to escape. It’s open-source and worth studying even if you don’t use it directly - the architecture demonstrates how to compose multiple isolation primitives into an effective sandbox.

Policy-based runtime protection

Beyond OS-level sandboxing, policy engines enforce business rules that can’t be expressed as syscall filters or filesystem restrictions. A policy engine can enforce rules like “this agent can make at most 100 API calls per session,” “this agent cannot access production databases between 2 AM and 6 AM,” or “this agent must get human approval before modifying any file in the infrastructure directory.”

Policy engines sit between the agent and its tools, intercepting every action and evaluating it against the policy before allowing it to proceed. This adds latency - typically 1-5ms per policy check - but provides a flexible, auditable enforcement layer that can be updated without redeploying the agent.

The key design decision is whether policies are deny-by-default (everything is blocked unless explicitly allowed) or allow-by-default (everything is allowed unless explicitly blocked). For agent workloads, deny-by-default is the right choice. The set of things an agent should be able to do is finite and knowable. The set of things it shouldn’t do is infinite and unknowable. Start from deny-all and add permissions as needed.
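A minimal sketch of a deny-by-default policy check follows, covering three of the rule types mentioned above: a tool allowlist, a human-approval requirement, and a per-session call budget. The class, the field names, and the tool names are hypothetical. A production system would more likely use an existing engine and keep its counters in shared storage rather than in process memory.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Decision:
    allowed: bool
    reason: str


@dataclass
class PolicyEngine:
    allowed_tools: Set[str]                        # everything else is denied
    needs_approval: Set[str] = field(default_factory=set)
    max_calls_per_session: int = 100
    _calls: Dict[str, int] = field(default_factory=dict)

    def check(self, session_id: str, tool: str, approved: bool = False) -> Decision:
        """Evaluate one tool call; only allowed calls count against the budget."""
        if tool not in self.allowed_tools:
            return Decision(False, "tool not on allowlist")
        if tool in self.needs_approval and not approved:
            return Decision(False, "human approval required")
        used = self._calls.get(session_id, 0)
        if used >= self.max_calls_per_session:
            return Decision(False, "session call budget exhausted")
        self._calls[session_id] = used + 1
        return Decision(True, "allowed")


engine = PolicyEngine(
    allowed_tools={"read_file", "edit_infra"},
    needs_approval={"edit_infra"},
    max_calls_per_session=2,
)
```

The agent's tool dispatcher would call `engine.check(...)` before every tool invocation and refuse to proceed on a denied decision. Because an unknown tool fails the first test, adding a new tool to the agent has no effect until someone adds it to the policy.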

Defense in depth architecture

The complete security architecture layers all defenses. Authorization (Zanzibar/OpenFGA) controls what the agent is allowed to do. Sandboxing (seccomp/Landlock/ephemeral environments) controls what it can physically do. Prompt injection defense (input sanitization, output filtering) limits what it can be tricked into doing. Policy engines (OPA/custom) enforce business rules that span all three layers.

No single layer is sufficient. Authorization can be bypassed by prompt injection. Sandboxing can be circumvented if the agent has legitimate access to a tool that can be abused. Prompt injection defense is probabilistic and will miss sophisticated attacks. Policy engines depend on having the right policies. Each layer catches what the previous layers miss. The goal isn’t perfect security - it’s making the cost of a successful attack high enough that the risk is manageable.
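The layering idea can be sketched as a chain of independent checks in which an action proceeds only if every layer lets it through. The three checks below are deliberately simplistic stand-ins with hypothetical names and fields. In a real system each layer lives in a different component (an authorization service, the kernel, a policy engine) rather than in one function chain.

```python
from typing import Callable, List, Optional, Tuple

# A layer returns None to pass the action on, or a string saying why it blocks.
Layer = Tuple[str, Callable[[dict], Optional[str]]]


def evaluate(action: dict, layers: List[Layer]) -> Tuple[bool, str]:
    """Run the action through each layer; the first objection blocks it."""
    for name, check in layers:
        reason = check(action)
        if reason is not None:
            return False, f"{name}: {reason}"
    return True, "all layers passed"


def authorization(action: dict) -> Optional[str]:
    return None if action.get("user_can_access") else "caller lacks permission"


def sandbox(action: dict) -> Optional[str]:
    inside = action.get("path", "").startswith("/workspace/")
    return None if inside else "path outside sandbox"


def policy(action: dict) -> Optional[str]:
    allowed = action.get("tool") in {"read_file", "write_file"}
    return None if allowed else "tool not allowlisted"


LAYERS: List[Layer] = [
    ("authorization", authorization),
    ("sandbox", sandbox),
    ("policy", policy),
]

result = evaluate(
    {"user_can_access": True, "path": "/etc/passwd", "tool": "read_file"},
    LAYERS,
)
```

In this example the action is authorized and uses an allowlisted tool, yet the sandbox layer still blocks it because the path is outside `/workspace/`. An action that slips past one layer is stopped by another.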

Related Concepts: Zanzibar (8.1), Prompt Injection (9.1)
Related Workflows: Security Checklist for Agent Deployment (Chapter 24)
Related Tools: Anthropic sandbox-runtime, OpenFGA, AgentGuard

“Standards enable ecosystems. The wild west phase is ending.”

Protocols are the connective tissue of the agent stack. MCP connects agents to tools. A2A connects agents to each other. AGENTS.md connects agents to codebases. This section covers how they work, how to implement them, and where they’re headed.