Message Compaction
4 strategies to keep conversations within context window limits
AI models have a 'memory limit' — they can only process a certain amount of text at once (the context window). As conversations grow with tool calls, file contents, and responses, they can exceed this limit.
Message compaction is like a smart note-taker sitting in on your conversation. When things get too long, it summarizes older messages while preserving the important context. You never notice it happening, but it's why Claude Code can handle hour-long coding sessions without crashing.
4 Compaction Strategies
Different approaches for different situations
Auto-Compact
Triggered when the conversation exceeds its token budget. Summarizes the oldest messages first, preserving recent context. The most common strategy, handling ~90% of cases.
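A minimal sketch of the idea, not Claude Code's actual implementation: the `summarize` stand-in and the rough 4-characters-per-token estimate are assumptions.

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token (assumption).
    return len(text) // 4

def summarize(messages: list[str]) -> str:
    # Stand-in for an LLM summarization call.
    return f"[summary of {len(messages)} earlier messages]"

def auto_compact(messages: list[str], token_budget: int) -> list[str]:
    total = sum(estimate_tokens(m) for m in messages)
    if total <= token_budget:
        return messages  # under budget: nothing to do

    # Walk from the newest message backwards, keeping as much
    # recent context as fits in (roughly) half the budget.
    kept, kept_tokens = [], 0
    for msg in reversed(messages):
        cost = estimate_tokens(msg)
        if kept_tokens + cost > token_budget // 2:
            break
        kept.append(msg)
        kept_tokens += cost
    kept.reverse()

    # Everything older than the kept tail collapses into one summary.
    older = messages[: len(messages) - len(kept)]
    return [summarize(older)] + kept
```

The key property is that recent messages survive verbatim while only the oldest span is collapsed into a single summary entry.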
Reactive-Compact
Monitors token usage while a response streams in. If the response itself grows too large, it triggers compaction mid-stream to prevent context overflow.
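A hedged sketch of mid-stream monitoring: the 90% threshold, the token heuristic, and the pluggable `compact_fn` are assumptions for illustration.

```python
def stream_with_reactive_compact(chunks, history, budget, compact_fn):
    # Watch the running token total while a response streams in;
    # if it nears the budget, compact the older history in place.
    used = sum(len(m) // 4 for m in history)  # ~4 chars/token (assumption)
    response = []
    for chunk in chunks:
        used += len(chunk) // 4
        response.append(chunk)
        if used > budget * 0.9:  # 90% threshold is an assumption
            history[:] = compact_fn(history)  # in-place, so caller sees it
            used = (sum(len(m) // 4 for m in history)
                    + sum(len(c) // 4 for c in response))
    return "".join(response)
```

Because the history list is mutated in place, the compacted state is immediately visible to whatever owns the conversation, without interrupting the stream.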
Snip-Compact
Uses boundary markers in conversation history to identify safe snip points. Removes entire sections (like large file contents) while keeping a summary marker. More surgical than auto-compact.
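To make the boundary-marker idea concrete, here is a hypothetical sketch; the marker strings are invented for illustration, not Claude Code's real markers.

```python
SNIP_START = "<<<snip-boundary:start>>>"  # marker strings are assumptions
SNIP_END = "<<<snip-boundary:end>>>"

def snip_compact(messages: list[str]) -> list[str]:
    # Remove marked sections (e.g. large file dumps) wholesale,
    # leaving a short stub where the content used to be.
    out, snipping, snipped = [], False, 0
    for msg in messages:
        if msg == SNIP_START:
            snipping, snipped = True, 0
        elif msg == SNIP_END:
            snipping = False
            out.append(f"[snipped {snipped} messages: large content removed]")
        elif snipping:
            snipped += 1
        else:
            out.append(msg)
    return out
```

Unlike auto-compact, nothing outside the marked boundaries is touched, which is what makes this approach surgical.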
Micro-Compact
Summarizes individual tool results inline rather than compacting entire message ranges. Useful when a single tool output (like a large file read) dominates the context.
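A sketch under assumed message shapes (dicts with `role` and `content` keys) and an assumed size threshold; only oversized tool results are rewritten, everything else passes through untouched.

```python
def micro_compact(messages: list[dict], max_result_tokens: int = 500) -> list[dict]:
    # Replace only individual oversized tool results with a short
    # inline summary; all other messages pass through unchanged.
    out = []
    for msg in messages:
        tokens = len(msg["content"]) // 4  # ~4 chars/token (assumption)
        if msg["role"] == "tool" and tokens > max_result_tokens:
            head = msg["content"][:120]  # keep a small preview
            out.append({
                "role": "tool",
                "content": f"[tool result truncated, ~{tokens} tokens]\n{head}...",
            })
        else:
            out.append(msg)
    return out
```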
Context Window Management
How the token budget keeps conversations healthy
Diagram: conversation token usage before and after compaction.
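The four strategies above can be tied back to the token budget with a small dispatcher. This is a hypothetical sketch: the 80% headroom threshold and the selection order are assumptions, not the documented behavior.

```python
def choose_strategy(used: int, limit: int, streaming: bool,
                    has_markers: bool, largest_tool_result: int) -> str:
    # Pick a compaction strategy from the current token budget.
    if used <= limit * 0.8:
        return "none"              # plenty of headroom: do nothing
    if streaming:
        return "reactive-compact"  # response in flight: compact mid-stream
    if largest_tool_result > used // 2:
        return "micro-compact"     # one tool output dominates the context
    if has_markers:
        return "snip-compact"      # boundaries available: surgical removal
    return "auto-compact"          # default: summarize oldest messages
```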