Overview
Replace the current stdin/stdout-based agent process management in the Town Container with Kilo's built-in HTTP server (`kilo serve`). The container currently spawns `kilo code --non-interactive` as fire-and-forget child processes and communicates via raw stdin pipes. This is fragile and provides no structured observability.

Decision: We are going forward with the `kilo serve` route. See analysis: docs/gt/opencode-server-analysis.md
Context
`kilo serve` starts a headless HTTP server (OpenAPI 3.1) with session management, structured message sending, SSE event streaming, abort/fork/revert, diff inspection, and more. The SDK (`@kilocode/sdk/v2/server`) provides `createOpencodeServer()` to manage the server lifecycle.
Current flow:

```
Container Control Server (port 8080)
└── Bun.spawn('kilo code --non-interactive') × N agents
    └── stdin/stdout pipes (fragile, unstructured)
```

Target flow:

```
Container Control Server (port 8080)
└── kilo serve (port 4096+N) × M server instances (one per worktree)
    └── HTTP API: POST /session/:id/message, GET /event (SSE), etc.
```
Scope
1. Replace `process-manager.ts` internals
   - Instead of `Bun.spawn(['kilo', 'code', '--non-interactive', ...])`, use `createOpencodeServer()` from `@kilocode/sdk/v2/server` (or equivalent) to start `kilo serve` instances (see the sketch after this list)
   - One `kilo serve` instance per worktree/project directory (since a server is scoped to one project)
   - Manage port allocation for multiple server instances within the container
   - Track server instances and their sessions instead of raw child processes
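A minimal sketch of the replacement, assuming `createOpencodeServer()` accepts a hostname/port option and returns a handle exposing `url` and `close()`; the exact signature should be verified against `@kilocode/sdk/v2/server`. `allocatePort()` is a hypothetical helper (sketched under Risks & Notes below).

```ts
import { createOpencodeServer } from "@kilocode/sdk/v2/server";

declare function allocatePort(): number; // hypothetical; see Risks & Notes

// Illustrative shape for a tracked server instance (names are not from the SDK).
interface ServerInstance {
  url: string;
  port: number;
  sessions: Set<string>; // session IDs hosted by this instance
  close: () => void;
}

// One kilo serve instance per worktree directory.
const servers = new Map<string, ServerInstance>();

async function ensureServer(worktreeDir: string): Promise<ServerInstance> {
  const existing = servers.get(worktreeDir);
  if (existing) return existing;

  const port = allocatePort();
  // Assumed options shape; kilo serve is scoped to one project directory,
  // so the server must be started in (or pointed at) worktreeDir.
  const server = await createOpencodeServer({ hostname: "127.0.0.1", port });

  const instance: ServerInstance = {
    url: server.url,
    port,
    sessions: new Set(),
    close: () => server.close(),
  };
  servers.set(worktreeDir, instance);
  return instance;
}
```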
2. Replace stdin-based messaging with HTTP API (a hedged sketch follows this list)
   - `sendMessage(agentId, prompt)` → `POST /session/:id/message` or `POST /session/:id/prompt_async`
   - `getProcessStatus(agentId)` → `GET /session/status` (structured session-level status)
   - Agent abort → `POST /session/:id/abort` (clean abort instead of SIGTERM)
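The messaging swap could look like the sketch below; the endpoint paths are the ones listed above, but the request body shape is an assumption to check against the server's OpenAPI 3.1 spec.

```ts
// An agent's handle is now (server URL, session ID) instead of a PID.
interface AgentHandle {
  serverUrl: string;
  sessionId: string;
}

async function sendMessage(agent: AgentHandle, prompt: string): Promise<void> {
  const res = await fetch(`${agent.serverUrl}/session/${agent.sessionId}/message`, {
    method: "POST",
    headers: { "content-type": "application/json" },
    // Body shape is an assumption; verify against the OpenAPI spec.
    body: JSON.stringify({ parts: [{ type: "text", text: prompt }] }),
  });
  if (!res.ok) throw new Error(`sendMessage failed: HTTP ${res.status}`);
}

// Clean abort through the server API instead of SIGTERM.
async function abortAgent(agent: AgentHandle): Promise<void> {
  await fetch(`${agent.serverUrl}/session/${agent.sessionId}/abort`, {
    method: "POST",
  });
}
```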
3. Replace `agent-runner.ts` startup flow (see the sketch after this list)
   - After git clone/worktree setup, start a `kilo serve` instance for the worktree (if not already running)
   - Create a new session on the server: `POST /session`
   - Send the initial prompt via `POST /session/:id/message` with model/agent/system-prompt configuration
   - Return the session ID as the agent's handle (instead of a process PID)
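A sketch of the revised startup flow, reusing the hypothetical `ensureServer()`/`sendMessage()` helpers above and assuming the `POST /session` response is JSON containing an `id` field (unverified).

```ts
// agent-runner.ts startup: worktree -> server -> session -> initial prompt.
async function startAgent(
  worktreeDir: string,
  initialPrompt: string,
): Promise<AgentHandle> {
  const server = await ensureServer(worktreeDir);

  // Assumes the create-session response is `{ id: string }`.
  const created = await fetch(`${server.url}/session`, { method: "POST" });
  if (!created.ok) throw new Error(`session create failed: HTTP ${created.status}`);
  const session = (await created.json()) as { id: string };
  server.sessions.add(session.id);

  const handle: AgentHandle = { serverUrl: server.url, sessionId: session.id };
  await sendMessage(handle, initialPrompt);

  // The session ID, not a process PID, is the agent's handle from here on.
  return handle;
}
```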
4. Wire up SSE event streaming (a sketch follows this list)
   - Subscribe to `GET /event` on each `kilo serve` instance
   - Forward relevant events (tool calls, completions, errors) to the heartbeat reporter
   - This replaces the raw stdout pipe reading with typed, structured events
   - Enables the future WebSocket streaming endpoint (`/agents/:agentId/stream`) referenced in the TODO
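A hedged sketch of the subscription loop, assuming `GET /event` uses standard SSE framing (`data: <json>` lines separated by blank lines); `onEvent` stands in for forwarding to the heartbeat reporter.

```ts
// Read a server's SSE stream and hand each parsed event to a callback.
async function subscribeToEvents(
  serverUrl: string,
  onEvent: (event: unknown) => void, // e.g. forward to the heartbeat reporter
): Promise<void> {
  const res = await fetch(`${serverUrl}/event`);
  if (!res.ok || !res.body) throw new Error(`SSE subscribe failed: HTTP ${res.status}`);

  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let buffer = "";
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    // SSE messages are separated by a blank line.
    let sep: number;
    while ((sep = buffer.indexOf("\n\n")) !== -1) {
      const message = buffer.slice(0, sep);
      buffer = buffer.slice(sep + 2);
      for (const line of message.split("\n")) {
        // Assumes each event's payload is JSON on a `data:` line.
        if (line.startsWith("data: ")) onEvent(JSON.parse(line.slice(6)));
      }
    }
  }
}
```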
5. Update control server endpoints

| Endpoint | Current | After |
|---|---|---|
| `POST /agents/start` | Spawns kilo process | Creates session on kilo server |
| `POST /agents/:id/message` | Writes to stdin pipe | `POST /session/:id/message` |
| `GET /agents/:id/status` | Process lifecycle (pid, exit code) | Session status (active tools, message count, etc.) |
| `POST /agents/:id/stop` | SIGTERM/SIGKILL on process | `POST /session/:id/abort` + optionally stop server if no more sessions |
| `GET /health` | Process count | Server instance count + session count |
6. Update heartbeat reporter (an illustrative payload shape follows this list)
   - Report session-level status instead of process-level status
   - Include active tool calls and last-message info from SSE events
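What session-level heartbeats might carry, as an illustrative shape (these field names are not an existing schema):

```ts
// Illustrative session-level heartbeat payload; align field names with
// whatever the heartbeat reporter actually expects.
interface AgentHeartbeat {
  agentId: string;
  sessionId: string; // replaces the process PID
  serverPort: number;
  status: "idle" | "working" | "aborted" | "errored";
  activeToolCalls: string[]; // tool names seen in recent SSE events
  lastMessageAt?: string; // ISO timestamp of the most recent message event
}
```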
What stays the same
- Git clone/worktree management (`git-manager.ts`): unchanged
- Container control server (port 8080): same interface for the TownContainer DO
- Agent environment variable setup: still needed for gastown plugin config
- Dockerfile: still needs `kilo` installed globally
Acceptance Criteria
- Container starts `kilo serve` instances instead of `kilo code --non-interactive` processes
- Agents are managed as sessions within kilo server instances
- Follow-up messages use the HTTP API instead of stdin pipes
- Agent status reflects session-level detail (not just process alive/dead)
- SSE event subscription is wired up for observability
- Clean abort via the server API works
- Existing control server endpoints keep the same external contract (no breaking changes for the TownContainer DO)
- All existing container tests pass (or are updated to reflect the new internals)
Risks & Notes
- Port management: Each
kilo serveneeds its own port. Need port allocation strategy (e.g., 4096 + incrementing counter) - One server per worktree: A kilo server is scoped to one project dir. Multiple agents sharing a worktree can share a server with separate sessions; agents in different worktrees need separate servers
- Resource overhead: Marginal —
kilo serveis a single Bun process either way, just with HTTP server overhead instead of raw stdin/stdout - Migration path: Can be done incrementally — start with HTTP messaging, then add SSE, then refine status reporting
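A minimal sketch of the counter-based strategy mentioned above; it reuses freed ports but does not probe whether the OS actually has the port free, so a retry-on-bind-failure loop may still be needed.

```ts
// 4096 + incrementing counter, with reuse of ports freed when a
// kilo serve instance is stopped.
const BASE_PORT = 4096;
let nextOffset = 0;
const freedPorts: number[] = [];

function allocatePort(): number {
  return freedPorts.pop() ?? BASE_PORT + nextOffset++;
}

function releasePort(port: number): void {
  freedPorts.push(port);
}
```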
Parent issue: #204