Compaction strategies
Chapter 06 made the API's statelessness explicit: every call ships the entire conversation. That's our problem now.
Conversations grow. Each turn re-sends the whole history. By turn 30 you're paying real money to re-tokenize hours of past chat. By turn 100 you start running into the context window.
This chapter is the first time we have to throw information away — every prior chapter only added. Like providers (chapter 03), compaction is something you'd want to swap and experiment with, so it gets the same treatment: an interface, multiple implementations, one line in main.go to switch.
The interface
type CompactionStrategy interface {
Compact(ctx context.Context, messages []api.Message) ([]api.Message, error)
}
Strategies are called at the top of every agent loop turn. Most of the time they return the input unchanged (the threshold isn't reached). When they do act, they return a shortened slice.
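To make "called at the top of every turn" concrete, here is a minimal sketch of where the call might sit. It is not the harness's actual loop: `Message` is a simplified stand-in for `api.Message`, and `echoSend` and `runTurn` are hypothetical names invented for this example.

```go
package main

import (
	"context"
	"fmt"
)

// Message is a simplified stand-in for the chapter's api.Message (assumption).
type Message struct {
	Role string
	Text string
}

// CompactionStrategy has the same shape as the chapter's interface.
type CompactionStrategy interface {
	Compact(ctx context.Context, messages []Message) ([]Message, error)
}

// NoCompaction returns its input unchanged.
type NoCompaction struct{}

func (NoCompaction) Compact(_ context.Context, messages []Message) ([]Message, error) {
	return messages, nil
}

// echoSend is a hypothetical stand-in for Provider.Send: it echoes the last message.
func echoSend(_ context.Context, messages []Message) (Message, error) {
	last := messages[len(messages)-1]
	return Message{Role: "assistant", Text: "echo: " + last.Text}, nil
}

// runTurn compacts the history first, then appends the user input and sends.
func runTurn(ctx context.Context, c CompactionStrategy, history []Message, input string) ([]Message, error) {
	compacted, err := c.Compact(ctx, history)
	if err != nil {
		// Hand back the uncompacted history so the caller can decide what to do.
		return history, fmt.Errorf("compact: %w", err)
	}
	compacted = append(compacted, Message{Role: "user", Text: input})
	reply, err := echoSend(ctx, compacted)
	if err != nil {
		return compacted, fmt.Errorf("send: %w", err)
	}
	return append(compacted, reply), nil
}

func main() {
	ctx := context.Background()
	var history []Message
	for _, input := range []string{"hello", "again"} {
		var err error
		history, err = runTurn(ctx, NoCompaction{}, history, input)
		if err != nil {
			fmt.Println("error:", err)
			return
		}
	}
	fmt.Println(len(history)) // 4: two user turns, two replies
}
```

Whether a failed compaction should abort the turn or fall back to the uncompacted history is a design choice; this sketch aborts but returns the original history alongside the error.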
Three strategies
| Strategy | What it does | When to use it |
|---|---|---|
| NoCompaction | Returns input unchanged | Default; you control the budget elsewhere |
| SlidingWindow{KeepLast: N} | Drops everything but the last N messages | Cheap, no API call, loses old context |
| Summarize{Threshold, KeepRecent, …} | Asks the model to summarize old turns, replaces them with the summary | Preserves earlier context as a synthetic message; costs one extra API call when it fires |
Plus a decorator:
| Wrapper | What it does |
|---|---|
| WithLogging(inner, path) | Wraps any strategy; logs before/after diffs to a file. Useful for comparing strategies. |
// In main.go — swap the line to swap behavior
compactor = compact.NoCompaction{}
compactor = &compact.SlidingWindow{KeepLast: 10}
compactor = &compact.Summarize{Provider: llm, Threshold: 20, KeepRecent: 6}
compactor = compact.WithLogging(&compact.SlidingWindow{KeepLast: 10}, "compactions.log")
The "safe split" trick
This is the one piece of compaction that's genuinely subtle.
A naive truncation can leave a tool_use block in the dropped portion and its matching tool_result in the kept portion. The next API call then contains a tool_result with no preceding tool_use, and the API rejects it with a 400. The conversation is unrecoverable without manual surgery.
The fix: walk backwards from your desired split point until you find a "clean" boundary — a user message that contains text, not a tool_result reply. That marks the start of a fresh turn, which is always safe to split on.
// SafeSplitPoint walks backward from `desired` to find an index where the
// conversation is in a "clean" state — no tool_use without its tool_result
// on either side of the split.
func SafeSplitPoint(messages []api.Message, desired int) int {
if desired <= 0 { return 0 }
if desired >= len(messages) { return len(messages) }
for i := desired; i > 0; i-- {
if messages[i].Role == api.RoleUser && !messages[i].HasToolResult() {
return i
}
}
return 0
}
Every strategy that drops messages routes through this. If there is no safe boundary at or before the desired split point, SafeSplitPoint returns 0 and the strategy does nothing, which is better than a broken conversation.
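A small worked example may help show what the walk-back does in practice. This is a sketch under stated assumptions: `Message` reduces the real content blocks to two boolean flags, and `keepLast` and `sampleConversation` are hypothetical helpers, not the repo's SlidingWindow code. `SafeSplitPoint` follows the logic shown above.

```go
package main

import "fmt"

// Message is a simplified stand-in for api.Message (assumption): real
// messages carry content blocks, reduced here to two flags.
type Message struct {
	Role       string // "user" or "assistant"
	Text       string
	ToolUse    bool // assistant message containing a tool_use block
	ToolResult bool // user message containing a tool_result block
}

// SafeSplitPoint mirrors the chapter's function on the simplified type.
func SafeSplitPoint(messages []Message, desired int) int {
	if desired <= 0 {
		return 0
	}
	if desired >= len(messages) {
		return len(messages)
	}
	for i := desired; i > 0; i-- {
		if messages[i].Role == "user" && !messages[i].ToolResult {
			return i
		}
	}
	return 0
}

// keepLast is a hypothetical sliding-window core: it aims to keep the last n
// messages but moves the cut back to the nearest safe boundary.
func keepLast(messages []Message, n int) []Message {
	if len(messages) <= n {
		return messages
	}
	split := SafeSplitPoint(messages, len(messages)-n)
	if split == 0 {
		return messages // no safe boundary: do nothing
	}
	return messages[split:]
}

// sampleConversation has two turns, each with one tool round-trip.
func sampleConversation() []Message {
	return []Message{
		{Role: "user", Text: "list the files"},  // 0
		{Role: "assistant", ToolUse: true},      // 1
		{Role: "user", ToolResult: true},        // 2
		{Role: "assistant", Text: "two files"},  // 3
		{Role: "user", Text: "open main.go"},    // 4
		{Role: "assistant", ToolUse: true},      // 5
		{Role: "user", ToolResult: true},        // 6
		{Role: "assistant", Text: "here it is"}, // 7
	}
}

func main() {
	conv := sampleConversation()
	// A naive cut at index 6 would keep a tool_result without its tool_use.
	fmt.Println(SafeSplitPoint(conv, 6)) // 4: start of the second turn
	fmt.Println(len(keepLast(conv, 2)))  // 4: the whole second turn is kept
	fmt.Println(len(keepLast(conv, 7)))  // 8: no safe boundary, unchanged
}
```

Note the side effect: asking for the last 2 messages keeps 4, because the cut moves back to the start of the turn. With a safe split, KeepLast behaves as a lower bound rather than an exact count.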
Summarization
The interesting strategy. When the conversation hits Threshold, we:
- Find a safe split point near len(messages) - KeepRecent.
- Take the old half (everything before the split).
- Ask the model to summarize it. Use the same provider: this is recursive but bounded, because the summarization call is one-shot, takes no tools, and returns a single text response.
- Replace the old half with a synthetic user message: "[earlier conversation summary] ...".
- Keep the recent half as-is.
func (s *Summarize) Compact(ctx context.Context, messages []api.Message) ([]api.Message, error) {
if len(messages) < s.Threshold { return messages, nil }
split := SafeSplitPoint(messages, len(messages)-s.KeepRecent)
if split == 0 { return messages, nil }
old, recent := messages[:split], messages[split:]
resp, err := s.Provider.Send(ctx, []api.Message{{
Role: api.RoleUser,
Content: []api.Block{{Type: api.BlockText, Text: instructions + "\n\n" + api.RenderTranscript(old)}},
}}, nil)
if err != nil { return messages, fmt.Errorf("summarize: %w", err) }
// … extract summary text, prepend as a synthetic message …
return append([]api.Message{{Role: api.RoleUser, ...}}, recent...), nil
}
The model's summarization is bias-shaped by its system prompt — when summarizing in the context of "you are a coding assistant," it tends to preserve file paths, function names, and decisions, and drop chit-chat. That's a happy accident. A more rigorous design would let you override the summarization prompt explicitly.
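One way an explicit prompt override might look is sketched below. Treat all of it as assumption: the `Instructions` field and its default text, the `SummarizeFunc` type (a stand-in for a one-shot, tool-free `Provider.Send`), `renderTranscript`, and `cannedSend` are invented for illustration. To stay short, the sketch splits at a plain index; real code would go through `SafeSplitPoint`.

```go
package main

import (
	"context"
	"fmt"
	"strings"
)

// Message is a simplified stand-in for api.Message (assumption).
type Message struct {
	Role string
	Text string
}

// SummarizeFunc is a hypothetical stand-in for a one-shot Provider.Send call.
type SummarizeFunc func(ctx context.Context, prompt string) (string, error)

// defaultInstructions is an illustrative default, not the repo's prompt.
const defaultInstructions = "Summarize the conversation below. Keep file paths, function names, and decisions."

type Summarize struct {
	Send         SummarizeFunc
	Threshold    int
	KeepRecent   int
	Instructions string // optional override; empty means defaultInstructions
}

func (s *Summarize) instructions() string {
	if strings.TrimSpace(s.Instructions) == "" {
		return defaultInstructions
	}
	return s.Instructions
}

// renderTranscript is a minimal stand-in for api.RenderTranscript.
func renderTranscript(msgs []Message) string {
	var b strings.Builder
	for _, m := range msgs {
		fmt.Fprintf(&b, "%s: %s\n", m.Role, m.Text)
	}
	return b.String()
}

// Compact summarizes everything before the last KeepRecent messages.
// Simplification: the split is a plain index, with no safe-split walk-back.
func (s *Summarize) Compact(ctx context.Context, messages []Message) ([]Message, error) {
	if len(messages) < s.Threshold || len(messages) <= s.KeepRecent {
		return messages, nil
	}
	split := len(messages) - s.KeepRecent
	old, recent := messages[:split], messages[split:]
	summary, err := s.Send(ctx, s.instructions()+"\n\n"+renderTranscript(old))
	if err != nil {
		return messages, fmt.Errorf("summarize: %w", err)
	}
	out := []Message{{Role: "user", Text: "[earlier conversation summary] " + summary}}
	return append(out, recent...), nil
}

// cannedSend records the prompt it receives and replies with a fixed summary.
func cannedSend(reply string, seen *string) SummarizeFunc {
	return func(_ context.Context, prompt string) (string, error) {
		*seen = prompt
		return reply, nil
	}
}

func main() {
	var seen string
	s := &Summarize{
		Send:         cannedSend("user opened main.go", &seen),
		Threshold:    4,
		KeepRecent:   2,
		Instructions: "Summarize for a code reviewer.",
	}
	msgs := []Message{
		{Role: "user", Text: "open main.go"},
		{Role: "assistant", Text: "done"},
		{Role: "user", Text: "rename Foo"},
		{Role: "assistant", Text: "renamed"},
	}
	out, err := s.Compact(context.Background(), msgs)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(len(out))    // 3: one summary message plus two recent ones
	fmt.Println(out[0].Text) // [earlier conversation summary] user opened main.go
}
```

Keeping the default inside an accessor means existing call sites that never set the field keep working unchanged.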
The decorator pattern
The cleanest piece of this design is LoggingStrategy. It implements CompactionStrategy and it wraps one:
type LoggingStrategy struct {
Inner CompactionStrategy
FilePath string
}
func (l *LoggingStrategy) Compact(ctx context.Context, messages []api.Message) ([]api.Message, error) {
before := messages
after, err := l.Inner.Compact(ctx, messages)
if err != nil { return after, err }
if len(after) != len(before) {
l.writeEvent(before, after)
}
return after, nil
}
This is the decorator pattern in one struct. Logging is orthogonal to the strategy choice; it shouldn't require a LogTo string field on every strategy. Instead, anything that implements the interface can be wrapped. Future strategies need no logging-specific code; they get logging for free.
The same pattern shows up everywhere in well-designed systems: HTTP middleware, observability wrappers, retry decorators. Worth recognizing once so you can use it always.
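To see the "logging for free" claim end to end, here is a self-contained sketch that wraps a toy strategy which knows nothing about logging. The names are hypothetical: `dropOldest` is a throwaway strategy with no safe-split handling, and `logged` writes to an `io.Writer` rather than a file path so the example needs no filesystem.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"strings"
)

// Message is a simplified stand-in for api.Message (assumption).
type Message struct {
	Role string
	Text string
}

type CompactionStrategy interface {
	Compact(ctx context.Context, messages []Message) ([]Message, error)
}

// dropOldest is a toy strategy: it keeps only the last Keep messages and
// contains no logging code at all.
type dropOldest struct{ Keep int }

func (d dropOldest) Compact(_ context.Context, m []Message) ([]Message, error) {
	if len(m) <= d.Keep {
		return m, nil
	}
	return m[len(m)-d.Keep:], nil
}

// logged wraps any strategy and reports length changes to W.
type logged struct {
	Inner CompactionStrategy
	W     io.Writer
}

func (l logged) Compact(ctx context.Context, m []Message) ([]Message, error) {
	after, err := l.Inner.Compact(ctx, m)
	if err != nil {
		return after, err
	}
	if len(after) != len(m) {
		fmt.Fprintf(l.W, "compacted %d -> %d messages\n", len(m), len(after))
	}
	return after, nil
}

// demo runs the wrapped toy strategy over n empty messages and returns the
// resulting length together with whatever was logged.
func demo(n, keep int) (int, string) {
	var buf strings.Builder
	var c CompactionStrategy = logged{Inner: dropOldest{Keep: keep}, W: &buf}
	out, _ := c.Compact(context.Background(), make([]Message, n))
	return len(out), buf.String()
}

func main() {
	n, text := demo(5, 2)
	fmt.Println(n)  // 2
	fmt.Print(text) // compacted 5 -> 2 messages
}
```

Because `logged` is itself a CompactionStrategy, wrappers can be stacked: a timing or retry wrapper could wrap `logged` the same way `logged` wraps `dropOldest`.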
The slash command for testing
/compact [strategy] and /verbose exist mostly so you can experiment without restarting the harness:
| Command | Effect |
|---|---|
| /compact | Run the configured strategy now |
| /compact sliding | Run an ad-hoc SlidingWindow{KeepLast: 6} |
| /compact summarize | Run an ad-hoc Summarize{Threshold: 0, …} (Threshold: 0 forces it to fire) |
| /compact none | Run NoCompaction (no-op; useful for "what's my baseline?") |
| /verbose [on\|off] | Toggle live before/after printing on compaction |
This is BYO-shaped: you can interactively probe how each strategy affects the same conversation. The diffs are illuminating. Sliding window drops decisions; summarization mangles syntax; neither is perfect.
Pitfalls
Recursion in summarization. The Summarize strategy calls Provider.Send. If your provider somehow ran Compactor.Compact inside Send, you'd recurse forever. Don't do that — compaction sits in agent.loop, around Provider.Send, never inside it.
The system-prompt bias. Because summarization shares the provider, it inherits the system prompt. If your system prompt is very specific ("you are a code reviewer"), the summary may be filtered through that lens. Accept it or override Instructions on Summarize.
Compaction breaking the prompt cache. If you turn on prompt caching (we don't, but you might), every compaction event invalidates the cache for that prefix — the prefix bytes just changed. Sliding window and summarization both rewrite the prefix. Cache lifetime gets capped at "time between compactions."
In the current repo. Everything compaction-related lives in internal/compact/:
- strategy.go: the CompactionStrategy interface + SafeSplitPoint
- nocompaction.go: the no-op default
- slidingwindow.go: drops oldest
- summarize.go: model-driven summarization
- logging.go: the WithLogging(inner, path) decorator
One file per strategy, one interface they all implement. Adding a new one (e.g. the TokenBudget exercise below) is one new file + one line in main.go to opt in.
Now try
- Run the harness, have a 15+ turn conversation, then run /compact sliding and /compact summarize back-to-back. Compare the resulting messages (use /verbose on first).
- Wrap a strategy with WithLogging and compare two strategies on the same conversation by running each with logging to a different file. Diff the files.
- Write a TokenBudget strategy that drops oldest messages until estimated token count is under a configurable threshold. Start with byte-count as a token proxy; later swap in a real count_tokens call.
Next: 08 · Better input.