Why T3Code might want two runtimes

OTP supervision for multi-provider AI agents — an additive architectural exploration

Bastian Venegas Arevalo · Ranvier Technologies · Relates to PR #581

T3Code's PR #581 centralizes harness adapters behind a single HarnessService WebSocket server with event projection via projector.ts. This is the right direction — a unified contract across providers is exactly what multi-agent orchestration needs.

This page explores what happens when you take that same contract and add an OTP supervision layer underneath it. Not replacing the Node server — extending it with structural guarantees for the part that's hardest to get right: managing concurrent, crash-prone agent sessions.

This is an independent experiment. Not affiliated with or endorsed by the T3Code team. The T3Code team has their own roadmap — this is one developer's architectural exploration.

A note on scope. The HarnessService abstraction in PR #581 is itself still under discussion — Theo's own review comment asks what differentiates it from the existing ProviderService / ProviderServiceLive layer. This exploration treats the harness boundary as a useful architectural surface regardless of where it ultimately lives in T3Code's stack. If the team consolidates HarnessService back into ProviderService, the OTP mapping still applies — it's the supervision boundary that matters, not the class name.

The constraint that shapes everything

Anthropic gates subscription-based access to Claude Code (Claude Max billing rather than per-usage API billing) behind its Agent SDK. The Agent SDK is implemented in Node, and there is no Elixir version. So Claude and Codex must stay in Node — that's non-negotiable.

This means any OTP layer has to be additive. It sits underneath the existing Node server, connected by a single WebSocket, handling supervision and lifecycle while Node retains SDK access, persistence, and TypeScript contracts.
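
To make that concrete, here is a minimal sketch of what the additive layer's top-level supervision tree could look like. Everything here is hypothetical: the module names (Harness.Application, Harness.Bridge, Harness.SessionSupervisor) and the bridge URL are illustrative, not taken from the fork.

```elixir
# Hypothetical top-level supervision tree for the additive OTP layer.
# Module names and the bridge URL are illustrative only.
defmodule Harness.Application do
  use Application

  @impl true
  def start(_type, _args) do
    children = [
      # Look up session GenServers by session id
      {Registry, keys: :unique, name: Harness.SessionRegistry},
      # Session GenServers are started under this at runtime
      {DynamicSupervisor, name: Harness.SessionSupervisor, strategy: :one_for_one},
      # The single WebSocket back to the Node server (URL is a placeholder)
      {Harness.Bridge, url: "ws://localhost:4000/harness"}
    ]

    # :one_for_one at the top: a crashed bridge doesn't take sessions down,
    # and a crashed session never touches the bridge.
    Supervisor.start_link(children, strategy: :one_for_one, name: Harness.Supervisor)
  end
end
```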

The convergence everyone's noticing

This isn't a novel observation. George Guimarães documented the same convergence across LangGraph, AutoGen, CrewAI, and Langroid — every major agent framework is independently reinventing the actor model. Isolated state, message passing, supervision hierarchies, fault recovery: all problems Erlang has been solving since 1986.

T3Code's HarnessService centralization follows the same pattern. The question isn't whether you need these primitives — you clearly do, that's what PR #581 is building. The question is whether you build them in application code or get them from the runtime.

Convergence · what frameworks are reinventing
| What you need | Who's reinventing it | OTP primitive |
| --- | --- | --- |
| Isolated agent state | LangGraph, AutoGen 0.4, CrewAI, HarnessService | GenServer |
| Message passing | Langroid, AutoGen 0.4, harnessWs protocol | send / receive |
| Crash recovery | Checkpoints, try/except, (missing in PR #581) | Supervisor |
| Event projection | State reducers, projector.ts | ETS + GenServer |
| Memory isolation | asyncio (shared), V8 shared heap | Process heaps |
| Lifecycle mgmt | Runtime registry, HarnessService | DynamicSupervisor |

The HarnessService, harnessWs, and projector.ts entries are what T3Code PR #581 is already solving; "(missing in PR #581)" marks the structural gap that OTP fills. Framework analysis via Guimarães (2026).

Guimarães argues that you can get about 70% of the way there with enough engineering — the remaining 30%, which includes preemptive scheduling, per-process garbage collection, and true fault isolation, requires runtime-level support.

Guimarães later qualified this claim, noting that the BEAM's advantages vary by workload and honestly naming the ecosystem gaps (immature LLM tooling, smaller talent pool, Python/Node-first SDKs). For T3Code specifically — a desktop tool, not a server — two of the four properties he cites (preemptive scheduling, hot code swapping) don't meaningfully apply. The two that do — per-process garbage collection and true fault isolation — are the ones this exploration focuses on.

The HarnessService pattern is that 70%. It gives you centralized lifecycle, event projection, and a unified contract. What it can't give you — because Node's V8 doesn't support it — is per-session crash isolation and independent memory containment. That's the 30% that an OTP layer fills.

The proposed boundary

Architecture · what stays, what moves
[Architecture diagram: Browser / Electron ↔ Node server (HarnessService + projector.ts from PR #581, Agent SDK for Claude + Codex, SQLite + TypeScript contracts for persistence and types) ↔ single Phoenix WebSocket ↔ OTP engine, the additive layer (supervision, lifecycle, crash isolation): a DynamicSupervisor over Claude, Codex, Cursor, and OpenCode GenServers. The Agent SDK stays in Node per the Anthropic subscription constraint.]
| Concern | Owner | Status |
| --- | --- | --- |
| Agent SDK access (Claude, Codex) | Node | stays |
| Canonical event types + contracts | Node (TypeScript) | stays |
| SQLite persistence | Node | stays |
| Browser / Electron WebSocket | Node | stays |
| Provider process supervision | Elixir (OTP) | new |
| Crash isolation | Elixir (BEAM heaps) | new |
| Per-session memory containment | Elixir (process heaps) | new |
| Event normalization | Elixir | new |
| Event projection + snapshot | Both | shared |

The isolation argument

When you run 4+ concurrent agent sessions — especially with subagents — the V8 shared heap becomes a liability. One leaky or crashing session affects every other session sharing that process. OTP's per-process heaps make this structurally impossible.
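
A minimal, runnable sketch of that claim (paste into iex). The Session module and its simulated leak are illustrative stand-ins, not code from the fork:

```elixir
# Demonstration of per-process heap isolation on the BEAM.
defmodule Session do
  use GenServer

  def start_link(id), do: GenServer.start_link(__MODULE__, id, name: :"session_#{id}")

  @impl true
  def init(id), do: {:ok, %{id: id, hoard: []}}

  # Simulate a leak: this grows only this process's private heap
  def leak(id), do: GenServer.cast(:"session_#{id}", :leak)

  @impl true
  def handle_cast(:leak, state) do
    {:noreply, %{state | hoard: [List.duplicate(0, 500_000) | state.hoard]}}
  end
end

{:ok, sup} = DynamicSupervisor.start_link(strategy: :one_for_one)

for id <- 1..4 do
  # :permanent means the supervisor restarts a crashed session automatically
  DynamicSupervisor.start_child(
    sup,
    %{id: id, start: {Session, :start_link, [id]}, restart: :permanent}
  )
end

Session.leak(1)                                   # only session_1's heap grows
Process.exit(Process.whereis(:session_1), :kill)  # now crash it outright
Process.sleep(100)

# Siblings never noticed, and the supervisor gave session_1 a fresh process:
IO.inspect(for id <- 2..4, do: Process.alive?(Process.whereis(:"session_#{id}")))
IO.inspect(Process.whereis(:session_1))
```

Killing session_1 outright leaves sessions 2 through 4 untouched, and the supervisor's restart policy brings it back without any cleanup code.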

Stress test results

Eight test types, with single-run results framed as architectural demonstrations — not statistical claims. The structural properties (per-process heaps, supervision cleanup) are BEAM guarantees, not observations.

| Metric | Result |
| --- | --- |
| V8 heap growth with 1 leaky session | ~229% (48→158 MB, +110 MB) |
| BEAM total memory growth with same leak | ~2% (bounded per-process) |
| Node p99 event loop lag during leak | 169 ms |
| Sibling sessions affected in OTP | 0 |

The crossover point

We scaled concurrent sessions from 5 to 200, all streaming simultaneously. From 5 to 100, both runtimes are functionally identical — you would not be able to tell them apart. At 150, Node's event loop lag creeps to 199ms. At 200, Node degrades sharply: latency jumps to 3.3 seconds, throughput drops (the event loop is so saturated it processes fewer events, not more), and event loop lag hits 1.2 seconds. Elixir at 200 sessions: 607ms latency, 18,000 events/sec, near-zero scheduler utilization, constant ~268KB per session.

This is the number that matters: if T3Code stays at 1–10 sessions, PR #581's Node approach is simpler and sufficient. If it grows into a control plane with concurrent subagent trees, the structural advantage materializes around 150 sessions — and it's not gradual. It's a cliff.

Real workload: 10 subagents across 2 sessions

Codex in plan mode spawned 10 real subagents. Node oscillated 1.8–54 MB on the shared heap. Elixir held 54–63 MB flat while processing 14,918 events and 824 tool calls. Both completed all work — the difference is in predictability under load.

Honest scorecard

Elixir wins on isolation, observability, and per-session memory attribution. Node wins on raw throughput and SDK ecosystem access. Hence: two runtimes, each handling what it does best.

What adding a new provider costs

Adding a new provider to T3Code today means writing a TypeScript adapter, extending the contracts, adding tests, and wiring it through the orchestration layer. In the OTP architecture, providers that speak a CLI or HTTP protocol (and don't require a Node-specific SDK) can be added as a single GenServer module + a clause in SessionManager.provider_module/1, with no changes to the Node side. For providers that do require Node SDKs — Claude (Agent SDK) and Codex (app server) today — the Node adapter is still necessary. The OTP layer manages supervision and lifecycle, but the SDK call itself stays in Node.

The savings are real but scoped: the more providers that communicate via standard protocols (CLI, HTTP, WebSocket), the more the OTP architecture reduces per-provider integration cost. For SDK-gated providers, both sides need code.
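
As a hedged illustration of that cost model, a CLI-protocol provider could reduce to roughly the following shape. The provider name, the CLI it spawns, and the Harness.Bridge.push/2 helper are all hypothetical:

```elixir
# Sketch of a CLI-protocol provider as a single GenServer. Everything
# named here is illustrative; a real provider needs real protocol handling.
defmodule Harness.Providers.ExampleCLI do
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  @impl true
  def init(opts) do
    # The Port is owned by this process: if the GenServer dies, the BEAM
    # closes the port, and :exit_status tells us when the CLI dies first.
    exe = System.find_executable("example-agent") || raise "CLI not on PATH"
    port = Port.open({:spawn_executable, exe}, [:binary, :exit_status, args: opts[:args] || []])
    {:ok, %{port: port, session_id: opts[:session_id]}}
  end

  @impl true
  def handle_info({port, {:data, chunk}}, %{port: port} = state) do
    # Normalize CLI output into the canonical event shape, forward over the bridge
    Harness.Bridge.push("harness:event", %{session: state.session_id, raw: chunk})
    {:noreply, state}
  end

  def handle_info({port, {:exit_status, code}}, %{port: port} = state) do
    # Let the supervisor's restart policy decide what happens next
    {:stop, {:provider_exited, code}, state}
  end
end

# ...plus the one clause in the existing dispatch described above:
# def provider_module(:example_cli), do: Harness.Providers.ExampleCLI
```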

Why subagents change the calculus

If T3Code stayed a one-agent-at-a-time tool, I would not push hard for Elixir. Subagents change the math. Imagine a future session:

Parent task
  → Codex subagent for implementation
  → OpenCode subagent for search
  → Claude subagent for synthesis

Now ask the annoying questions:

What happens if one subagent crashes mid-turn?
What happens if one subagent is blocked on approval while another keeps streaming?
What happens if the parent cancels and all children must stop cleanly?
What happens if one child leaks memory or gets wedged on reconnect?

A Node-only architecture can answer all of them — but it answers them behaviorally: careful process accounting, careful cleanup chains, careful shared-state discipline. OTP answers them structurally: each subagent is a GenServer under a DynamicSupervisor, parent crash cascades to children via process links, one child's memory leak is contained to its own heap, and restart policies are declarative, not imperative.
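
A hedged sketch of what those structural answers look like. Every name here is illustrative; the point is that the cascade and the cleanup live in links and child specs rather than in application code:

```elixir
# Subagent tree under a DynamicSupervisor. All names are illustrative.
defmodule Subagent do
  use GenServer
  def start_link(role), do: GenServer.start_link(__MODULE__, role)
  @impl true
  def init(role), do: {:ok, role}
end

defmodule ParentSession do
  use GenServer

  def start_link(sup), do: GenServer.start_link(__MODULE__, sup)
  def cancel(pid), do: GenServer.call(pid, :cancel)

  @impl true
  def init(sup) do
    # Trap exits: a crashing child sends this parent a message instead of
    # killing it, so the parent can decide how to react.
    Process.flag(:trap_exit, true)

    children =
      for role <- [:implementation, :search, :synthesis] do
        # :temporary so a child that dies in a cascade isn't resurrected
        {:ok, pid} =
          DynamicSupervisor.start_child(
            sup,
            %{id: role, start: {Subagent, :start_link, [role]}, restart: :temporary}
          )

        # The link is the cascade: children don't trap exits, so an abnormal
        # parent exit takes the whole subtree down with no cleanup code.
        Process.link(pid)
        {role, pid}
      end

    {:ok, %{children: Map.new(children)}}
  end

  @impl true
  def handle_call(:cancel, _from, state) do
    # Clean cancellation: stop each child, then stop the parent normally
    Enum.each(state.children, fn {_role, pid} -> GenServer.stop(pid, :shutdown) end)
    {:stop, :normal, :ok, state}
  end

  @impl true
  def handle_info({:EXIT, pid, _reason}, state) do
    # A child crashed mid-turn; the parent survives and can respawn or abort
    {:noreply, %{state | children: Map.reject(state.children, fn {_r, p} -> p == pid end)}}
  end
end
```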

The more autonomous and unpredictable agents become, the stronger this argument gets. Phrases like "start child, monitor child, restart child, stop subtree" stop being architecture words. They become the product.

Honest tradeoffs

BEAM runtime adds ~60–80 MB to the Electron bundle. That's real. For a desktop app, it matters. The counter-argument is that with 4+ concurrent sessions running subagents, you're already in "supervise a tree of unstable runtimes" territory — and that's literally what OTP was designed for. But the bundle size cost is worth acknowledging upfront.

It's a second runtime to maintain. The Elixir ecosystem is smaller than Node's. Hiring is harder. The mitigation is that the OTP layer is intentionally thin (total ~3,900 LOC across all modules) — it does one thing well and delegates everything else to Node.

The bridge WebSocket is an additional failure surface. In the Node-only approach, events flow directly — no bridge, no extra hop. With the harness, there's a Phoenix Channel between Elixir and Node. That adds ~1ms of latency (invisible for most operations) but also a real failure mode: if the bridge drops during a streaming turn, events are lost until reconnection. The WAL ring buffer mitigates this with replay, but the failure mode does not exist in a single-runtime design.
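
For illustration only, the replay side of that mitigation can be sketched as a small GenServer. This shows the shape of the idea; the fork's actual WAL implementation may differ in every detail:

```elixir
# Bounded buffer of recently sent events keyed by sequence number,
# replayed after the bridge reconnects. Names are illustrative.
defmodule Harness.WAL do
  use GenServer

  @capacity 10_000

  def start_link(_), do: GenServer.start_link(__MODULE__, nil, name: __MODULE__)

  # Every outbound event gets a sequence number and is retained
  def append(event), do: GenServer.call(__MODULE__, {:append, event})

  # After reconnect the Node side says "I last saw seq N"; we send the rest
  def replay_since(last_seen), do: GenServer.call(__MODULE__, {:replay, last_seen})

  @impl true
  def init(_), do: {:ok, %{seq: 0, buffer: %{}}}

  @impl true
  def handle_call({:append, event}, _from, %{seq: seq, buffer: buf} = state) do
    seq = seq + 1
    # Evict the entry that fell out of the window: a ring, not an unbounded log
    buf = buf |> Map.put(seq, event) |> Map.delete(seq - @capacity)
    {:reply, seq, %{state | seq: seq, buffer: buf}}
  end

  def handle_call({:replay, last_seen}, _from, %{seq: seq, buffer: buf} = state) do
    events = for n <- (last_seen + 1)..seq//1, event = buf[n], do: {n, event}
    {:reply, events, state}
  end
end
```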

Cross-runtime debugging is objectively harder. When something fails in Node, you get a continuous stack trace. With the harness, an error can start in Elixir (GenServer crash), manifest in Node (bridge disconnect), and surface in the browser (events stop). Two log systems, two debuggers. This is a cost that compounds over time if the boundary isn't sharp.

The Agent SDK constraint is real and permanent. As long as Anthropic gates subscription-based Claude Code access through their Node SDK, Claude and Codex must route through Node. The OTP layer manages the provider process lifecycle, but the SDK call itself stays in the Node adapter. This is a feature, not a bug — it respects the ecosystem as it exists.


This fork is offered as an architectural exploration, not a merge request. The goal is to contribute to the discussion — and honestly, to learn from it. Theo has more experience shipping Node + Electron at scale than most people writing about this topic. If this whole OTP angle is an over-engineered solution to a problem that doesn't materialize in practice, that's a genuinely useful answer. The worst outcome isn't "no" — it's building something nobody pressure-tested with someone who's shipped this at scale.
View the fork on GitHub

Addendum: failure scenario scorecard

We compared both approaches across ten concrete failure scenarios. Not benchmarks — failure scenarios. Because the OTP argument is not about throughput. It's about what happens when things go wrong.

| # | Scenario | Node | Elixir | Winner |
| --- | --- | --- | --- | --- |
| 1 | Provider CLI hangs | Session stale, others unaffected | GenServer stale, others unaffected | Tie |
| 2 | Provider CLI crashes | child.on("exit") cleans up | Port exit handled, supervisor cleanup | Tie |
| 3 | Memory leak in one session | Shared heap +110 MB, lag +169 ms p99 | Leak bounded per-process (94–352 KB) | Elixir |
| 4 | Unhandled exception | 5/5 survivors continue | 5/5 survivors continue | Tie |
| 5 | 50+ concurrent sessions | Degrades sharply at ~150 (3.3 s p99 at 200) | 607 ms at 200, near-zero scheduler util | Elixir |
| 6 | Process spawn/teardown churn | OS handles it | Port + process links, similar | Tie |
| 7 | Subagent trees | Manual parent-child tracking | DynamicSupervisor, process links | Elixir |
| 8 | Bridge WebSocket disconnect | Does not exist — events flow directly | Events lost until reconnection | Node |
| 9 | Development complexity | One language, one debugger | Two languages, cross-runtime debugging | Node |
| 10 | Desktop packaging | Electron bundles Node, zero extra cost | +60–80 MB BEAM runtime | Node |

Score: Elixir 3, Node 3, Tie 4. But the scores are not equal in weight. Elixir's wins (3, 5, 7) are the ones that scale worst if left unaddressed — memory coupling compounds over hours of use, event loop saturation hits a cliff at ~150 sessions, and subagent cleanup chains grow with tree depth. Node's wins (8, 9, 10) are present-tense shipping costs. Both are real. The question is which set of problems you'd rather have.

A note on the "careful application code" failure class. Macroscope's review of PR #581 found concrete instances worth examining. One is a logic bug independent of runtime choice: adapter-emitted events pass through #publish without re-sequencing, which can regress the global snapshot sequence — this would need fixing in Elixir too.

But the other two are lifecycle-management concerns that OTP handles structurally: shutdownSession aborts the event stream before the adapter completes, potentially losing final session.exited events; and attachSession never calls #startStreaming, leaving attached sessions deaf to events. These are exactly the class of lifecycle gaps where a supervisor's declarative restart and shutdown policies eliminate entire categories of ordering bugs — not because the Node code is bad (it's early WIP and well-structured), but because managing concurrent session lifecycles through application code requires getting every edge case right manually. A supervisor doesn't have edge cases. It has a restart policy.
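
To ground that, here is a hedged sketch of how the shutdown-ordering class of bugs dissolves under a supervisor's shutdown contract. Module names are illustrative, and Harness.Bridge.push/2 is the hypothetical bridge client from the sketches above:

```elixir
# Declarative shutdown instead of a manual abort/await chain.
defmodule Harness.Session do
  # The child spec carries the shutdown contract: on supervisor-initiated
  # shutdown this process gets up to 5 seconds to finish, then is killed.
  use GenServer, restart: :transient, shutdown: 5_000

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  @impl true
  def init(opts) do
    # Trapping exits is what makes terminate/2 run on shutdown
    Process.flag(:trap_exit, true)
    {:ok, %{id: opts[:id], pending: []}}
  end

  @impl true
  def terminate(_reason, state) do
    # Flush whatever the session still owes (final events, exit status)
    # inside the shutdown budget; there is no abort-before-complete race.
    for event <- Enum.reverse(state.pending) do
      Harness.Bridge.push("harness:event", event)
    end

    :ok
  end
end
```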

For the full analysis with crossover benchmarks, methodology notes, and adversarial review: complete writeup.