Streaming responses
Goal: render the assistant's reply token by token as it streams from the model, instead of waiting for the full response.
Difficulty: hard. Time: 3–5 hours. Touches: internal/provider/, internal/agent/, internal/ui/.
What's already in place
The Provider interface in internal/provider/provider.go is three methods:
    type Provider interface {
        Send(ctx context.Context, messages []api.Message, tools []api.ToolDef) (api.Response, error)
        Model() string
        SetModel(name string)
    }
Send is synchronous — it returns the full response only when the model finishes. The Bubble Tea UI in internal/ui/program.go reflects that: the assistant message appears in one shot, not character by character.
The Anthropic SDK supports streaming via client.Messages.NewStreaming. The OpenAI SDK does too. The repo just doesn't use either of those entry points yet.
This is the most invasive exercise in the set, touching the provider interface, the agent loop, and the TUI. It also exposes a real design tension: a tool call can't be dispatched until its block has been fully received and parsed, so a streaming path needs to handle text deltas and tool blocks differently.
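To make that tension concrete, here is a minimal Go sketch. It assumes a simplified, hypothetical `event` type (with kinds `text_delta`, `tool_json_delta`, and `block_stop`) rather than either SDK's real event structs, and the `consume` function is illustrative only. Text fragments are forwarded immediately, while tool input fragments are buffered per block and parsed only when that block stops.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// event is a hypothetical, simplified stand-in for a provider SDK's stream event.
type event struct {
	Kind  string // "text_delta", "tool_json_delta", or "block_stop"
	Index int    // content block the event belongs to
	Data  string // text fragment or partial JSON fragment
}

// consume forwards text fragments to onText as they arrive, but buffers tool
// input fragments per block and parses them only once the block has stopped.
func consume(events []event, onText func(string)) ([]map[string]any, error) {
	pending := map[int]*strings.Builder{}
	var calls []map[string]any
	for _, ev := range events {
		switch ev.Kind {
		case "text_delta":
			onText(ev.Data)
		case "tool_json_delta":
			b, ok := pending[ev.Index]
			if !ok {
				b = &strings.Builder{}
				pending[ev.Index] = b
			}
			b.WriteString(ev.Data)
		case "block_stop":
			b, ok := pending[ev.Index]
			if !ok {
				continue // a text block ended; nothing to parse
			}
			var input map[string]any
			if err := json.Unmarshal([]byte(b.String()), &input); err != nil {
				return nil, fmt.Errorf("tool block %d: %w", ev.Index, err)
			}
			calls = append(calls, input)
			delete(pending, ev.Index)
		}
	}
	return calls, nil
}

func main() {
	events := []event{
		{Kind: "text_delta", Index: 0, Data: "Reading "},
		{Kind: "text_delta", Index: 0, Data: "the file."},
		{Kind: "block_stop", Index: 0},
		{Kind: "tool_json_delta", Index: 1, Data: `{"path": "ma`},
		{Kind: "tool_json_delta", Index: 1, Data: `in.go"}`},
		{Kind: "block_stop", Index: 1},
	}
	calls, err := consume(events, func(s string) { fmt.Print(s) })
	fmt.Println()
	fmt.Println(calls, err)
}
```

Neither tool fragment is valid JSON on its own, which is why the sketch waits for the stop event before unmarshalling. In the real exercise the same buffering would sit inside the provider and surface as a single ToolEnd chunk.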
What to build
A streaming variant of Send and a UI that consumes it.
Suggested steps
- Design the chunk type. Something like:

        type Chunk struct {
            Kind  ChunkKind // TextDelta, ToolStart, ToolDelta, ToolEnd, Stop, Usage
            Text  string    // for TextDelta
            Block api.Block // for ToolEnd
            Usage *api.Usage
        }

- Extend the Provider interface, carefully. Two options:

    a. Add a method: SendStream(ctx, msgs, tools) (<-chan Chunk, error). Pros: clean separation; old Send still works. Cons: every implementation now has two methods doing nearly the same thing.

    b. Make streaming opt-in via a sibling interface:

        type Streamer interface {
            SendStream(ctx, msgs, tools) (<-chan Chunk, error)
        }

       Type-assert in the agent loop. Pros: providers that can't stream don't have to. Cons: two code paths.

    Pick (b). It's the same pattern used by TotalUsage() and EstimatedCostUSD() on Provider (see how /tokens does an assertion in commands.go).

- Implement for Anthropic. Use client.Messages.NewStreaming(ctx, params) in internal/provider/anthropic.go. Translate SDK events into your Chunk type. Buffer content_block_delta events for tool_use blocks until the corresponding content_block_stop arrives, then emit a single ToolEnd chunk with the assembled api.Block.

- Wire it into the agent loop. In internal/agent/agent.go, when the provider implements Streamer, drain the chunk channel and:
    - For TextDelta, forward the text to a UI callback (introduce Agent.OnTextDelta func(string)).
    - For ToolEnd, accumulate blocks in assistantBlocks.
    - For Stop, finalise the message and dispatch tools as today.

- Wire it into the UI. The Bubble Tea program currently appends the full assistant message in one tea.Msg. Add a new message type streamDeltaMsg{text string} that gets dispatched as deltas arrive, and append it to the visible buffer.

- Handle cancellation. Ctrl-C during a stream must close the context, drain the channel, and not deadlock. Test this explicitly.
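The opt-in interface, the drain loop, and the cancellation rule can be sketched together in Go. This is a minimal illustration under stated assumptions, not the repo's code: `Chunk`, `Provider`, and `Streamer` are cut-down stand-ins (a plain prompt string replaces messages and tools), and `syncOnly`, `fakeStreamer`, `run`, and `demo` are hypothetical names introduced only for the example.

```go
package main

import (
	"context"
	"fmt"
	"strings"
)

// Cut-down stand-ins for the repo's types (assumptions, not the real definitions).
type ChunkKind int

const (
	TextDelta ChunkKind = iota
	Stop
)

type Chunk struct {
	Kind ChunkKind
	Text string
}

type Provider interface{ Send(ctx context.Context, prompt string) (string, error) }

// Streamer is the opt-in sibling interface from option (b).
type Streamer interface{ SendStream(ctx context.Context, prompt string) (<-chan Chunk, error) }

// syncOnly answers in one shot, like a provider that was never extended.
type syncOnly struct{ reply string }

func (s syncOnly) Send(context.Context, string) (string, error) { return s.reply, nil }

// fakeStreamer emits its parts one chunk at a time and stops early on cancellation.
type fakeStreamer struct {
	syncOnly
	parts []string
}

func (f fakeStreamer) SendStream(ctx context.Context, _ string) (<-chan Chunk, error) {
	ch := make(chan Chunk)
	go func() {
		defer close(ch) // always close, so the consumer's drain loop terminates
		for _, p := range f.parts {
			select {
			case ch <- Chunk{Kind: TextDelta, Text: p}:
			case <-ctx.Done():
				return
			}
		}
		select {
		case ch <- Chunk{Kind: Stop}:
		case <-ctx.Done():
		}
	}()
	return ch, nil
}

// run streams when the provider supports it and falls back to Send otherwise.
func run(ctx context.Context, p Provider, prompt string, onTextDelta func(string)) (string, error) {
	s, ok := p.(Streamer)
	if !ok {
		return p.Send(ctx, prompt) // non-streaming path, rendered in one shot
	}
	ch, err := s.SendStream(ctx, prompt)
	if err != nil {
		return "", err
	}
	var full strings.Builder
	for {
		if ctx.Err() != nil { // checked first so cancellation beats a ready chunk
			for range ch { // drain until the producer closes the channel
			}
			return full.String(), ctx.Err()
		}
		select {
		case <-ctx.Done(): // handled at the top of the next iteration
		case c, open := <-ch:
			if !open || c.Kind == Stop {
				return full.String(), nil
			}
			full.WriteString(c.Text)
			onTextDelta(c.Text)
		}
	}
}

// demo runs one turn, optionally with an already-cancelled context, and
// returns the reply, the number of deltas delivered, and the error.
func demo(p Provider, cancelFirst bool) (string, int, error) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	if cancelFirst {
		cancel()
	}
	deltas := 0
	reply, err := run(ctx, p, "hi", func(string) { deltas++ })
	return reply, deltas, err
}

func main() {
	fmt.Println(demo(fakeStreamer{parts: []string{"Hel", "lo"}}, false))
	fmt.Println(demo(syncOnly{reply: "Hello"}, false))
	fmt.Println(demo(fakeStreamer{parts: []string{"Hel", "lo"}}, true))
}
```

The deadlock risk lives on both sides of the channel: the producer selects on `ctx.Done()` for every send, and the consumer drains after cancellation. Either one alone can leave a goroutine blocked, so a cancellation test should exercise both.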
Acceptance
- Text appears character by character (or chunk by chunk) in the TUI.
- Tool calls still dispatch correctly after the stream ends.
- Ctrl-C cleanly aborts an in-flight stream and returns control to the REPL.
- /tokens still reports correct counts after a streamed message.
- The OpenAI provider (which you haven't extended) still works through the non-streaming path.
Stretch
- Implement Streamer for OpenAI too (client.Chat.Completions.NewStreaming).
- Add a --no-stream flag that forces the non-streaming path even when available, for debugging.
- Track time-to-first-token in the /debug panel.
- Stream subagent output too; they currently run silently with Quiet: true.