feat(code-interpreter): add DOCKER driver for self-hosted sandbox execution#20105
feat(code-interpreter): add DOCKER driver for self-hosted sandbox execution#20105krzysztof7363 wants to merge 6 commits intotwentyhq:mainfrom
Conversation
Welcome!
Hello there, congrats on your first PR! We're excited to have you contributing to this project. |
Prepares for a new DOCKER code-interpreter driver that talks to the host Docker daemon.
…cution Adds a fourth value to CodeInterpreterDriverType alongside LOCAL, E_2_B, and DISABLED. The DOCKER driver runs user code in a per-request container on the host Docker daemon, giving self-hosters a production-safe option without the E2B SaaS dependency. Design: - Per-call lifecycle: stage a host work dir, create + start a container with a long-running placeholder cmd, exec python -u -c <code>, stream demuxed stdout/stderr, harvest outputs from the host side of the bind mount, remove the container. - Bind mount a host tempdir at /home/user instead of using putArchive/getArchive: Docker's archive API targets the rootfs layer, which is shadowed by tmpfs/bind mount overlays, so the sandbox process and the archive API see two different views of the same path. Bind mount lets both actors share one filesystem (see moby/moby#40885). - Hardening defaults: ReadonlyRootfs, CapDrop: ALL, no-new-privileges, bounded Memory and PidsLimit, optional NetworkMode for isolated-egress configurations. - Timeout: races exec against a SIGKILL on the placeholder; on timeout returns exit code 124 and an "Execution timed out" error string. - LineEmitter buffers partial lines across byte-chunks so callbacks see whole lines (matching the E2B driver's per-line streaming contract). Configuration (all under ConfigVariablesGroup.CODE_INTERPRETER_CONFIG): - DOCKER_SANDBOX_IMAGE (default twentycrm/sandbox:latest) - DOCKER_SANDBOX_NETWORK (optional; intended for an internal compose network shared with the server container) - DOCKER_SANDBOX_WORK_DIR (default /var/run/twenty-sandbox; must resolve to the same path on the host Docker daemon and inside the server container — typically a named volume mounted at both sides) - DOCKER_SANDBOX_MEMORY_MB (default 512) - DOCKER_SANDBOX_PIDS_LIMIT (default 256) - Reuses existing CODE_INTERPRETER_TIMEOUT_MS. The factory's buildConfigKey now hashes CODE_INTERPRETER_CONFIG for DOCKER (same treatment as E2B) so config changes invalidate the cached driver.
Minimal Python runtime the DOCKER code-interpreter driver boots per
request. Based on python:3.12-slim with pandas, numpy, matplotlib,
pillow, and the office-document libraries (python-docx, python-pptx,
openpyxl, pypdf, pdfplumber, reportlab) so typical user code and the
pre-seeded sandbox-scripts (docx/pdf/pptx/xlsx helpers) work without
pip installs at runtime.
Build from twenty-server as context so the sandbox-scripts COPY
resolves:
cd packages/twenty-server
docker build \
-t twentycrm/sandbox:dev \
-f ../twenty-docker/sandbox/Dockerfile \
src/engine/core-modules/code-interpreter
The TwentyMCP helper injected into the sandbox calls back into Twenty's /mcp endpoint at the URL stored in TWENTY_SERVER_URL, which was hard-coded to the public SERVER_URL. That works for E2B (SaaS sandboxes with public internet access) but breaks for the new DOCKER driver, where sandboxes run on an internal-only network and can't resolve the public hostname. Add a CODE_INTERPRETER_SERVER_URL config var that, when set, overrides SERVER_URL for the sandbox's MCP callback. Operators running the DOCKER driver set it to the internal server hostname (e.g. http://server:3000 under docker-compose). E2B / LOCAL deployments leave it unset and inherit the existing behavior.
…for sandboxes Lets operators route DOCKER-driver sandbox containers through a Docker runtime other than runc (e.g. gVisor's runsc, Sysbox) to get stronger isolation without changing the rest of the driver. DOCKER_SANDBOX_RUNTIME is optional; unset keeps the current behaviour.
1963146 to
b201da3
Compare
|
Review the following changes in direct dependencies. Learn more about Socket for GitHub.
|
There was a problem hiding this comment.
5 issues found across 9 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="packages/twenty-docker/sandbox/README.md">
<violation number="1" location="packages/twenty-docker/sandbox/README.md:38">
P2: README incorrectly states bind-mounting `/home/user` keeps baked `/home/user/scripts` visible; bind mounts obscure image contents at the mount point.</violation>
</file>
<file name="packages/twenty-server/src/engine/core-modules/twenty-config/config-variables.ts">
<violation number="1" location="packages/twenty-server/src/engine/core-modules/twenty-config/config-variables.ts:675">
P1: `DOCKER_SANDBOX_NETWORK` is documented as required for Docker mode but is effectively optional, so sandbox containers can launch without an explicitly isolated network.</violation>
</file>
<file name="packages/twenty-server/src/engine/core-modules/code-interpreter/drivers/docker.driver.ts">
<violation number="1" location="packages/twenty-server/src/engine/core-modules/code-interpreter/drivers/docker.driver.ts:106">
P1: Unsanitized `file.filename` is written to host disk, allowing path traversal/arbitrary file write outside the temporary sandbox directory.</violation>
<violation number="2" location="packages/twenty-server/src/engine/core-modules/code-interpreter/drivers/docker.driver.ts:111">
P2: Directory permissions are set to `0o777` despite comment requiring sticky world-writable directories; use `0o1777` to enforce sticky-bit semantics.</violation>
<violation number="3" location="packages/twenty-server/src/engine/core-modules/code-interpreter/drivers/docker.driver.ts:160">
P2: Forced timeout kill can route execution to generic error handling, causing timeouts to return `exitCode: 1` instead of the intended timeout result.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
📊 API Changes ReportGraphQL Schema ChangesGraphQL Schema Changes[error] Error: Unable to read JSON file: /home/runner/work/twenty/twenty/main-schema-introspection.json: Not valid JSON content GraphQL Metadata Schema ChangesGraphQL Metadata Schema Changes[error] Error: Unable to read JSON file: /home/runner/work/twenty/twenty/main-metadata-schema-introspection.json: Not valid JSON content REST API Analysis ErrorError OutputREST Metadata API Analysis ErrorError Output |
Five fixes flagged by review on PR twentyhq#20105: - Reject input filenames that contain path separators, traversal sequences, or NUL bytes before writing them to the host work dir. Closes a path-traversal write under operator-controlled hostDir (P1). - Make DOCKER_SANDBOX_NETWORK actually required when CODE_INTERPRETER_TYPE=DOCKER. The combo @ValidateIf+@IsOptional was self-cancelling — replace @IsOptional with @isdefined so a sandbox never launches without an explicitly isolated network (P1). - Move the pre-seeded sandbox-scripts/ from /home/user/scripts to /opt/sandbox-scripts so the driver's bind mount on /home/user no longer shadows them. Add PYTHONPATH so user code can import them without juggling sys.path. Update the README to reflect the new layout. - Use 0o1777 (sticky) for the host work dir mode instead of 0o777, to match the comment that already promised sticky semantics. - Restructure the timeout flow so a kill-induced stream rejection doesn't escape into the generic catch and clobber the result with exitCode: 1 — timeouts now reliably return exitCode: 124 with the "Execution timed out" error. Also gracefully handle a failing exec.inspect() after the kill. Plus Prettier formatting on the touched files.
|
Heads-up on the three remaining red checks ( All three fail at the same step — the Two ways to resolve:
I'm happy to adjust the workflow to do (2) if that direction sounds reasonable — let me know. Everything else ( |
|
This is an interesting approach/contribution thanks. Before we get into this direction, which is a pretty significant choice for the project, I'd like to make sure we've explored all options!
Why not use Local Driver in that case? I'd say the main benefit of introducing a new driver is to introduce a secured way for people to enable Code Execution. This should be considered not just in the context of the code interpreter of the AI chat but ideally would be a solution that would also enable Apps and Workflow which can currently only run on AWS Lambda safely, so really not great for self-hosters and much nicer if we can find a great solution to tackle all this at once (safely, for all use-cases). |
|
OK I had a quick look and what you did looks promising. But I think we want to explore having gvisor as the default configuration (document default docker compose etc) and also see how your solution would apply to code execution driver (for apps/code workflow node). I feel like what you did is very interesting but need to push it further for it to make it to the core. I don't have much time to dig into it this week myself (+I'm not too versed on the devops side, will need a second opinion from someone more qualified in our team) but we'll happily review. Thanks a lot for the contribution! |
|
Thanks @FelixMalfait — both points make sense. Why not LOCAL for self-hosters?
gVisor as the defaultAgree. The branch already exposes the toggle (
Happy to push this as a follow-up commit on this branch. Apps / Workflow code-execution driverI haven't dug into the Two reasonable shapes for that consolidation:
I lean toward (1) — keeps this PR's review surface tight, and the shared module is a natural follow-up that you can sequence after the gVisor-default work lands. Let me know which direction you'd prefer; I can adjust scope. No hurry on the DevOps review — happy to wait for the second opinion before the next push. |
|
When I said "then why not local for self-hosters" it was more a rethorical questions, since you said in a previous message docker without gvisor wasn't secured for a multi-tenant env. The real value-add for me here should be providing a good simple secure solutions for all self-hosters. For the question of mixing concerns, it's one of those cases where I'd rather have everything in the same PR so we can have the full picture and make the right architectural decision ; I don't mind the big diff as long as it isn't slop. Let's try to do something elegant and easy/preconfigured the right way by default for self-hosters For the docs we usually update the docs folder (twenty-docs) rather than created isolated READMEs. Thanks a lot! |
|
Ok so we discussed this internally and I wonder if the right direction (so that self-hosting is easy) isn't for us to implement something like this https://github.com/rivet-dev/secure-exec ; and create a VM-like environment rather than a real env. We need to look at this option |
|
For Apps/Workflow ( For the AI code interpreter, the picture's more nuanced. secure-exec maps onto agent code that runs complex queries against the API without moving data through the context window — JS-as-glue-language. That's a real and valuable use case. But the data-analysis ecosystem (pandas, numpy, matplotlib, openpyxl, pypdf, pdfplumber, python-docx, python-pptx, reportlab — the libraries this PR's twentycrm/sandbox image bundles) is harder to put on V8: Pyodide (smolagents uses it) covers the numerics core but is uneven on the long tail; native CPython doesn't have that gap. Context on this PR: I built it because I wanted to play with Twenty's AI features without paying E2B and without running Python on my host. It solves that problem today — the only other options are LOCAL (refused in production), E2B (paid), or DISABLED. Compared to those, a Docker driver (with gVisor as a hardening option) is a meaningful improvement for self-hosters who want the Python data-analysis path now. For small deployments, Docker may be good enough. In a single-tenant environment, using the Docker executor (even without gVisor) is less about security and more about containerizing the execution environment. For a multi-tenant hosted environment, E2B (or a similar solution, e.g., CubeSandbox if one wants to self-host a compatible one) is the way to go. |
|
Hi, DockerDriver enables a read-only root filesystem but does not provide a writable /tmp (tmpfs/bind) or set TMPDIR, so Python tempfile usage will fail and bundled sandbox-scripts/user code will crash. Severity: action required | Category: correctness How to fix: Mount writable /tmp tmpfs Agent prompt to fix - you can give this to your LLM of choice:
We noticed a couple of other issues in this PR as well - happy to share if helpful. Found by Qodo. Free code review for open-source maintainers. |
|
Hi, DOCKER_SANDBOX_NETWORK validation allows empty strings but the factory converts empty string to undefined, silently omitting NetworkMode and running sandboxes on the default Docker network (likely with egress). Severity: action required | Category: security How to fix: Reject empty network strings Agent prompt to fix - you can give this to your LLM of choice:
We noticed a couple of other issues in this PR as well - happy to share if helpful. Found by Qodo code review |
Summary
Adds a
DOCKERoption toCODE_INTERPRETER_TYPEso self-hosters can run Twenty'scode_interpreteragainst a local Docker sandbox instead of E2B (paid SaaS) orLOCAL(refused in production).The driver speaks dockerode against the host Docker socket. Each
execute()call:DOCKER_SANDBOX_WORK_DIR) — the same path bind-mounted into the server container so the host daemon resolves it.CapDrop: ALL,no-new-privileges, configurable memory + PIDs limits) on a configurable Docker network.docker execback through the existingStreamCallbacksinterface; harvests files written underoutput/from the host side of the bind mount.Five commits, each independently revertable:
chore(deps): add dockerode to twenty-serverfeat(code-interpreter): add DOCKER driver for self-hosted sandbox executionfeat(docker): add twentycrm/sandbox image for DOCKER code interpreterfeat(code-interpreter): add CODE_INTERPRETER_SERVER_URL override(sandbox-internal MCP callback URL when sandbox network can't resolve the publicSERVER_URL)feat(code-interpreter): allow selecting a non-default Docker runtime for sandboxes(lets operators route sandboxes through gVisor'srunscor Sysbox; no-op when unset)Notes for review
E2Bdriver in restricted-network deployments and could be split out.twentycrm/sandbox) ships pandas/numpy/matplotlib/pillow/python-docx/python-pptx/openpyxl/pypdf/pdfplumber/reportlab — same tier as E2B's data-analysis template. Built frompackages/twenty-docker/sandbox/Dockerfile.DOCKER_SANDBOX_RUNTIME=runsc, commit 5) or stay on a microVM driver. Documented in the sandbox image README.code_interpretertool (helper routing, UI rendering, prompt) that this stack surfaced. Not a hard dependency for compilation, but operationally pairs with this PR.Test plan
docker build -f packages/twenty-docker/sandbox/Dockerfile -t twentycrm/sandbox:dev ..CODE_INTERPRETER_TYPE=DOCKER,DOCKER_SANDBOX_WORK_DIR=<host-path>,DOCKER_SANDBOX_NETWORK=twenty_sandbox, mount/var/run/docker.sockinto the server container.code_interpreterreturnsstdout+exitCode === 0; matplotlib output renders inline.TwentyMCPhelper to fetch records and compute over them.DOCKER_SANDBOX_RUNTIME=runsc. Inspect a live sandbox:docker inspect --format '{{.HostConfig.Runtime}}' <id>reportsrunsc.CODE_INTERPRETER_SERVER_URL=http://server:3000: sandbox-internal MCP calls reach the server even whenSERVER_URLis a public URL the sandbox network can't resolve.docker ps -ashows no lingering exited sandbox containers after a run.