Every billboard in San Francisco right now is covered with ads for sandboxes: a way to run your coding agents in an isolated virtual machine.

That’s because everyone is moving from previous-generation “agents-with-tools” (e.g. AI SDK, Mastra) to current-generation “agents-with-Linux,” aka coding agents (e.g. Claude Code, Pi, OpenCode).

Providing the flexibility of Linux and a filesystem helps these agents perform vastly better and gives you a much simpler foundation to build on.

The problem is: Linux needs a heavy virtual machine. So sandboxes ate the industry as the default way to give agents one.

But sandboxes are expensive. For “agents-with-tools,” you were paying for a few megabytes of RAM per agent. Now you’re reserving gigabytes of RAM and a full CPU per agent, most of which sits reserved and unused, all while RAM and CPU are becoming astronomically expensive.

By our estimates, that is roughly 47x more RAM than needed: a basic shell in WebAssembly uses ~22 MB of RAM, while the cheapest mainstream sandbox provider (Daytona) reserves 1 GiB.

There’s a better way.

Breaking Down the Sandbox: The Virtual Operating System

If you take a step back and look at why you need a sandbox, you’ll realize you might not need a full Linux VM after all.

You can roughly break down everything a coding agent needs to do into:

Terminal commands
Filesystem access
Outbound networking
Dev servers

All of these pieces can be replicated securely with significantly less RAM and faster startup times by emulating Linux in your backend.

We call this a “virtual operating system”: an OS that looks like Linux to the agent, but isn’t actually a full Linux sandbox.

This is the premise of agentOS, an open-source project for running lightweight, WebAssembly-based Linux VMs.

Pi already supports virtual operating systems: it provides hooks to swap its system tool calls for a virtual OS like this one.

Agents in Your Existing Backend, No Extra Infrastructure

When running agents in sandboxes, everything is done by executing commands in the sandbox remotely (e.g. spawning Claude, reading files, Git operations).

So where does a virtual operating system actually run?

Inside your existing backend, no extra infrastructure required.

This works just like the “agents-with-tools” (e.g. AI SDK) architecture did: when you need to run an agent, you run the harness in your backend and expose the virtual operating system (e.g. commands, filesystem, networking) the same way you’d expose a traditional tool.
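
Here is a minimal sketch of that wiring. The `VirtualOS` and `Tool` shapes are hypothetical and exist only for illustration; they are not the AI SDK's or agentOS's actual API, and a real harness would most likely make `execute` asynchronous.

```typescript
// Hypothetical shapes for illustration; not the AI SDK or agentOS API.
interface VirtualOS {
  exec(command: string): { stdout: string; exitCode: number };
}

interface Tool {
  name: string;
  description: string;
  execute(input: { command: string }): string;
}

// Wrap a virtual OS as a single "shell" tool that the harness calls
// in-process, the same way it would call any other tool.
function shellTool(os: VirtualOS): Tool {
  return {
    name: "shell",
    description: "Run a command in the agent's virtual Linux environment.",
    execute: ({ command }) => {
      const result = os.exec(command);
      // Surface failures in the text the model reads back.
      return result.exitCode === 0
        ? result.stdout
        : `exit ${result.exitCode}: ${result.stdout}`;
    },
  };
}

// Stand-in virtual OS that only understands `echo`.
const demoOS: VirtualOS = {
  exec: (command) =>
    command.startsWith("echo ")
      ? { stdout: command.slice(5), exitCode: 0 }
      : { stdout: "command not found", exitCode: 127 },
};

const demoShell = shellTool(demoOS);
```

The point of the sketch is that nothing here leaves your process: the "operating system" is an object your backend already holds.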

“But the Agent Needs a Real Shell”

The instinct is that running cat, ls, and grep requires a real operating system. It doesn’t. The agent just needs those commands to behave correctly, not a kernel underneath them.

For simple use cases, these terminal commands can be replaced with a simple lookup map from the command (“ls”) to the underlying code behind it (return a list of files).
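
A rough sketch of that lookup map, assuming the filesystem is just a path-to-contents `Map` and that commands arrive as a single line. The `run` dispatcher and the three handlers are illustrative names, and the whitespace splitting ignores quoting, pipes, and redirects.

```typescript
// A virtual filesystem reduced to a path -> contents map for this sketch.
type Files = Map<string, string>;
type Command = (args: string[], files: Files) => string;

// Each "binary" is just a function keyed by its command name.
const commands = new Map<string, Command>();
commands.set("ls", (_args, files) => Array.from(files.keys()).sort().join("\n"));
commands.set("cat", (args, files) =>
  args
    .map((path) => files.get(path) ?? `cat: ${path}: No such file or directory`)
    .join("")
);
commands.set("echo", (args) => args.join(" "));

// Naive dispatcher: splits on whitespace, so no quoting, pipes, or redirects.
function run(line: string, files: Files): string {
  const [cmd = "", ...args] = line.trim().split(/\s+/);
  const handler = commands.get(cmd);
  return handler ? handler(args, files) : `${cmd}: command not found`;
}
```

Anything not in the map simply does not exist from the agent's point of view, which is also a useful security property.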

Specifically in agentOS, we’ve taken this further and compiled real Linux commands to WebAssembly so they can be executed cheaply without a full sandbox. These commands cover the Linux behavior agents actually rely on, in addition to full shell/Node.js/Python scripting, process trees, $PATH, and shebang support that agents expect.

It’s worth noting that this is where you need to be careful with security: if you allow your agent to run arbitrary code, you need to guard against timing and denial-of-service (DoS) attacks. agentOS already hardens against these exploits, matching the security model of Cloudflare Workers and Chromium.

“But I Need a Real Filesystem”

Most developers believe that if you need a filesystem, you need a sandbox, since the virtual machine is attached to the volume. It’s like attaching a hard drive to your computer: you can’t move the computer without the hard drive. Right?

Wrong.

Almost every filesystem volume you’ve used in the cloud today is network-attached – more like using Google Drive than actually attaching a physical hard drive to your computer.

The reasons for this are interesting and nuanced, but worth a separate blog post.

Attaching a filesystem to a coding agent that you boot up on the fly is not much different from traditional network-attached storage:

Agent reads from filesystem (e.g. “cat billion-dollar-saas.md”)
Your virtual OS receives the read request, forwards to underlying storage (e.g. S3, Archil, Mesa)
Agent sees output from “cat” command

A simple implementation of a filesystem like this can operate over S3: each file is an object, each read is a GET, and each write is a PUT.
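
As a sketch of that idea, the code below puts a tiny file API on top of an object store. The `ObjectStore` interface, the in-memory `MemoryStore`, and the per-agent key prefix are all assumptions made for illustration; a real store would wrap S3 calls and be asynchronous, and this version does no path normalization.

```typescript
// Hypothetical object-store interface. A real one would wrap S3 GET/PUT
// calls and be asynchronous; it is synchronous here to keep the sketch short.
interface ObjectStore {
  get(key: string): string | undefined;
  put(key: string, body: string): void;
}

// In-memory stand-in for a bucket.
class MemoryStore implements ObjectStore {
  private objects = new Map<string, string>();
  get(key: string): string | undefined {
    return this.objects.get(key);
  }
  put(key: string, body: string): void {
    this.objects.set(key, body);
  }
}

// Maps file paths to object keys under a per-agent prefix.
class ObjectFS {
  private store: ObjectStore;
  private prefix: string;

  constructor(store: ObjectStore, prefix: string) {
    this.store = store;
    this.prefix = prefix;
  }

  private key(path: string): string {
    // Strip leading slashes so "/notes.md" becomes "<prefix>/notes.md".
    return `${this.prefix}/${path.replace(/^\/+/, "")}`;
  }

  readFile(path: string): string {
    const body = this.store.get(this.key(path)); // one GET per read
    if (body === undefined) throw new Error(`ENOENT: ${path}`);
    return body;
  }

  writeFile(path: string, body: string): void {
    this.store.put(this.key(path), body); // one PUT per write
  }
}
```

Because every agent gets its own prefix, two agents can share one bucket without seeing each other's files.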

I’d recommend considering other options like Archil and Mesa, since filesystems can be nuanced and heavily dependent on your use case (throughput, latency, cost, forking).

agentOS provides a POSIX-compliant filesystem by default. It allows you to mount filesystems from any source (e.g. S3, SQLite, external FS, providers) and your agents use native Linux commands to read from these filesystems.

“A Sandbox Is More Secure Than My Virtual Operating System” (It’s the Opposite)

The assumption is that the sandbox locks down your agent. The problem is that sandboxes give you no real control over the operating system the agent runs on (networking, filesystem, etc.), so you end up bolting on security after the fact. A virtual operating system gives you more of that control, natively, not less.

When your agent makes a network request, it’s probably calling something like: curl 'https://agentos-sdk.dev/docs/filesystem/'

A simple implementation of this can map that curl command to a “fetch” request.
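
One way that mapping could look, handling only a small subset of curl flags (`-X`, `-H`, `-d`). The `FetchRequest` shape and `curlToFetch` name are made up for this sketch, and it expects arguments that have already been split the way a shell would split them.

```typescript
// Request descriptor a virtual OS could hand to fetch(url, init).
interface FetchRequest {
  url: string;
  method: string;
  headers: Record<string, string>;
  body?: string;
}

// Translates a small subset of curl flags; other flags are ignored.
function curlToFetch(args: string[]): FetchRequest {
  const req: FetchRequest = { url: "", method: "GET", headers: {} };
  const rest = args.slice();
  let explicitMethod = false;
  for (let arg = rest.shift(); arg !== undefined; arg = rest.shift()) {
    if (arg === "-X" || arg === "--request") {
      req.method = (rest.shift() ?? "GET").toUpperCase();
      explicitMethod = true;
    } else if (arg === "-H" || arg === "--header") {
      const header = rest.shift() ?? "";
      const colon = header.indexOf(":");
      if (colon > 0) {
        const key = header.slice(0, colon).trim().toLowerCase();
        req.headers[key] = header.slice(colon + 1).trim();
      }
    } else if (arg === "-d" || arg === "--data") {
      req.body = rest.shift() ?? "";
    } else if (!arg.startsWith("-")) {
      req.url = arg;
    }
  }
  // Like curl, sending data implies POST unless a method was given.
  if (req.body !== undefined && !explicitMethod) req.method = "POST";
  if (!req.url) throw new Error("curl: no URL specified");
  return req;
}
```

The result can be passed straight through as `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })`, which gives you one choke point for every outbound request.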

And here’s where security actually outperforms most sandboxes: you have full programmatic control of the networking layer in your virtual operating system, so you’re able to put more granular constraints on what your agent is able to do. For example: which domains it can access, how much ingress/egress it uses, and rate limiting to prevent abuse. The same granular control applies to every other layer of the operating system: filesystem access, processes, and anything else the agent touches.
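
To make those constraints concrete, here is a small sketch of a guard that every outbound request would pass through. `NetworkPolicy` and `NetworkGuard` are hypothetical names, not agentOS's API. The sketch only checks hostnames, so a real guard would also need to validate resolved IP addresses to stop SSRF.

```typescript
interface NetworkPolicy {
  allowedHosts: string[]; // exact hostnames the agent may reach
  maxEgressBytes: number; // total request-body bytes allowed
  maxRequestsPerMinute: number;
}

class NetworkGuard {
  private policy: NetworkPolicy;
  private egressBytes = 0;
  private recent: number[] = []; // timestamps (ms) of allowed requests

  constructor(policy: NetworkPolicy) {
    this.policy = policy;
  }

  // Returns null if the request may proceed, otherwise a denial reason.
  // Throws if `url` is not a valid absolute URL.
  check(url: string, bodyBytes: number, now: number): string | null {
    const host = new URL(url).hostname;
    if (!this.policy.allowedHosts.includes(host)) {
      return `host not allowed: ${host}`;
    }

    // Sliding one-minute window for rate limiting.
    this.recent = this.recent.filter((t) => now - t < 60_000);
    if (this.recent.length >= this.policy.maxRequestsPerMinute) {
      return "rate limit exceeded";
    }

    if (this.egressBytes + bodyBytes > this.policy.maxEgressBytes) {
      return "egress budget exceeded";
    }

    // Only requests that pass every check count against the budgets.
    this.recent.push(now);
    this.egressBytes += bodyBytes;
    return null;
  }
}
```

Because the guard is ordinary backend code, the policy can differ per customer, per agent, or per request.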

To this day, most sandboxes do not offer that level of network security, even though unrestricted outbound networking is a major attack vector that attackers and botnets exploit.

agentOS provides the entire networking stack (including TCP, UDP, and HTTP) with granular network restrictions out of the box, including SSRF protection.

“The Agent Has to Run a Real Dev Server”

Building custom frontends and API endpoints with coding agents (“just-in-time software”) is becoming ubiquitous across SaaS products and internal tools. It’s easy to assume the agent needs a full server running somewhere to do it. Most of the time it doesn’t.

Traditionally, you’d give the agent a dev server (e.g. Vite, Next) and a private preview URL to reach it.

For simple use cases where the output is a static build, a lot of “dev servers” like this don’t actually need to run a full server. Instead, this can be done statelessly by:

Exposing a virtual command to the agent that runs “vite build” on the host
Writing build output to the virtual filesystem (e.g. at /build/index.html)
Exposing a preview endpoint that routes the request path to the filesystem (e.g. GET /preview/{sandbox}/build/index.html maps to an S3 request for that file)
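
The routing step in that list can be sketched as a pure function from request path to storage key. The `agents/{sandbox}/...` key layout and the `previewKey` name are assumptions for illustration, and the function expects a path that has already been URL-decoded.

```typescript
// Maps "/preview/{sandbox}/build/..." to an object key, or returns null if
// the path is not a valid preview request. The key layout is an assumption.
function previewKey(requestPath: string): string | null {
  const match = /^\/preview\/([A-Za-z0-9_-]+)\/(build\/.*)$/.exec(requestPath);
  if (!match) return null;
  const [, sandbox = "", file = ""] = match;
  // Reject traversal so one agent cannot read another agent's files.
  if (file.split("/").includes("..")) return null;
  // Serve index.html for directory-style paths.
  const resolved = file.endsWith("/") ? `${file}index.html` : file;
  return `agents/${sandbox}/${resolved}`;
}
```

The preview endpoint then does a single storage read for the returned key and streams the result back, with no server process kept alive per agent.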

For anything that needs a live server (hot reloading, SSR, WebSockets), agentOS provides full Node.js and Python support that can run dev servers just like you would on native Linux. It includes native secure preview URLs too.

“But Sandboxes Let Me Not Pay for Idle Resources”

Pausing is genuinely useful, but with a virtual operating system you don’t need it. It’s no different from the previous-generation “agents-with-tools” architecture like the AI SDK. When you need to run a prompt for an agent, just run the agent with access to the virtual operating system in your existing backend code. No need for pausing or special fault tolerance. If something fails, just retry the request like your infrastructure is already set up to do.

agentOS additionally provides an idle timeout that lets you determine how long to keep the instance in memory before pausing it. This lets agentOS also support stateful workloads like dev servers or long-running jobs.

“But I Need to Fork and Snapshot State”

Forking sounds like a VM-level feature. It’s really a filesystem feature, and you can do it without a sandbox.

How you do it depends on your use case: filesystems like Mesa provide forking out of the box, while a simpler approach like S3 can just copy the objects to a new directory.
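
The copy-based approach might look like the sketch below, with an in-memory `Map` standing in for the bucket. With S3 this would be a list call plus one copy per object; `forkPrefix` is an illustrative name.

```typescript
// In-memory stand-in for a bucket: key -> contents.
type Bucket = Map<string, string>;

// Fork an agent's files by copying every object under one prefix to another.
// Returns the number of objects copied.
function forkPrefix(bucket: Bucket, from: string, to: string): number {
  let copied = 0;
  // Snapshot the keys first so newly written entries are not revisited.
  for (const key of Array.from(bucket.keys())) {
    if (!key.startsWith(`${from}/`)) continue;
    const body = bucket.get(key);
    if (body === undefined) continue;
    bucket.set(`${to}/${key.slice(from.length + 1)}`, body);
    copied++;
  }
  return copied;
}
```

The cost grows with the number of files, which is fine for small workspaces and is the main reason larger ones tend to want a layered or natively forkable filesystem.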

agentOS also provides a native concept of forking using a “layering” filesystem like OverlayFS, the same filesystem that powers Docker under the hood.
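
To show why layering makes forking cheap, here is a toy copy-on-write layer in the spirit of OverlayFS. It is not agentOS's implementation; `OverlayLayer` is a hypothetical class, and it assumes the parent layer is treated as read-only once it has been forked.

```typescript
// Reads fall through to a lower layer; writes and deletes stay in the
// upper layer, so the lower layer is never modified by its forks.
class OverlayLayer {
  private upper = new Map<string, string>();
  private deleted = new Set<string>(); // "whiteouts" hiding lower-layer files
  private lower: OverlayLayer | null;

  constructor(lower: OverlayLayer | null = null) {
    this.lower = lower;
  }

  read(path: string): string | undefined {
    if (this.deleted.has(path)) return undefined;
    const own = this.upper.get(path);
    if (own !== undefined) return own;
    return this.lower ? this.lower.read(path) : undefined;
  }

  write(path: string, body: string): void {
    this.deleted.delete(path);
    this.upper.set(path, body);
  }

  remove(path: string): void {
    this.upper.delete(path);
    this.deleted.add(path);
  }

  // Forking copies nothing: the child starts empty and shares this layer
  // for reads. Assumes this layer is no longer written to afterwards.
  fork(): OverlayLayer {
    return new OverlayLayer(this);
  }
}
```

Unlike copying objects, the fork itself takes constant time regardless of how many files the parent holds; only the files a fork actually changes take up new space.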

When You Really Need a Sandbox, Mount It

There are some cases where you truly do need a sandbox: Docker-in-Docker, GPU workloads, or native binaries that don’t have a WebAssembly build. In our experience these are the exception, not the rule.

In this case, you can use a hybrid architecture: run your agent in a virtual OS by default, and escalate to a full sandbox only for the rare operation it can’t handle.
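
A sketch of that escalation logic, assuming both backends share a hypothetical `Executor` interface with a `supports` check. How a real system decides what the virtual OS can handle is left out, and the sandbox factory here stands in for a call to a sandbox provider.

```typescript
// Hypothetical executor shape shared by both backends.
interface Executor {
  supports(command: string): boolean;
  exec(command: string): string;
}

// Route each command to the virtual OS when it can handle it, and fall
// back to a full sandbox otherwise. The sandbox is created lazily, at most
// once, so agents that never need it never pay for it.
function hybrid(virtualOS: Executor, makeSandbox: () => Executor): Executor {
  let booted: Executor | null = null;
  return {
    supports: () => true,
    exec(command: string): string {
      if (virtualOS.supports(command)) return virtualOS.exec(command);
      const target = booted ?? (booted = makeSandbox());
      return target.exec(command);
    },
  };
}
```

The caller sees a single `Executor` either way, which is what lets the same agent code run on both paths.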

In agentOS, we call this “sandbox mounting”: the agent upgrades itself on demand, using existing providers such as E2B and Daytona. You serve the majority of requests in cheap, WebAssembly-based virtual operating systems while the last few edge cases run on expensive sandboxes, all with the exact same code.

Prior Art

The virtual operating system architecture has already been deployed at companies like Mintlify, Upstash, and Turso.

Additionally, since we started building agentOS, a few projects (ours included) have been exploring similar ideas:

Rivet’s agentOS (our project): broad Linux compatibility using WebAssembly, includes Claude Code + Codex + Pi + OpenCode support and full filesystems
Vercel’s just-bash: JavaScript reimplementation of Linux tools
Cloudflare’s Project Think: custom harness built for Cloudflare’s proprietary stack, uses JavaScript instead of Bash for code execution

The Future of Sandboxes

Infrastructure for AI is early, and a lot of new ideas are being tried. I’m confident the industry will move away from “traditional” sandboxes within a few months, especially as RAM prices soar.

We’re building agentOS completely open-source for this upcoming shift.