Message Compaction
4 strategies to keep conversations within context window limits
AI models have a 'memory limit' — they can only process a certain amount of text at once (the context window). As conversations grow with tool calls, file contents, and responses, they can exceed this limit.
Message compaction is like a smart note-taker sitting in on your conversation. When things get too long, it summarizes older messages while preserving the important context. You never notice it happening, but it's why Claude Code can handle hour-long coding sessions without crashing.
4 Compaction Strategies
Different approaches for different situations
Auto-Compact
Triggered when the conversation exceeds its token budget. Summarizes the oldest messages first, preserving recent context. The most common strategy, handling ~90% of cases.
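A minimal sketch of the idea, not Claude Code's actual implementation: the `summarize` stand-in and the rough 4-characters-per-token estimate are assumptions.

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token (assumption).
    return len(text) // 4

def summarize(messages: list[str]) -> str:
    # Stand-in for an LLM summarization call.
    return f"[summary of {len(messages)} earlier messages]"

def auto_compact(messages: list[str], token_budget: int) -> list[str]:
    total = sum(estimate_tokens(m) for m in messages)
    if total <= token_budget:
        return messages  # under budget: nothing to do

    # Walk from the newest message backwards, keeping as much
    # recent context as fits in (roughly) half the budget.
    kept, kept_tokens = [], 0
    for msg in reversed(messages):
        cost = estimate_tokens(msg)
        if kept_tokens + cost > token_budget // 2:
            break
        kept.append(msg)
        kept_tokens += cost
    kept.reverse()

    # Everything older than the kept tail collapses into one summary.
    older = messages[: len(messages) - len(kept)]
    return [summarize(older)] + kept
```

The key property is that recent messages survive verbatim while only the oldest span is collapsed into a single summary entry.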
Reactive-Compact
Monitors token usage while a response streams in. If the response itself grows too large, it triggers compaction mid-stream to prevent context overflow.
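A hedged sketch of mid-stream monitoring: the 90% threshold, the token heuristic, and the pluggable `compact_fn` are assumptions for illustration.

```python
def stream_with_reactive_compact(chunks, history, budget, compact_fn):
    # Watch the running token total while a response streams in;
    # if it nears the budget, compact the older history in place.
    used = sum(len(m) // 4 for m in history)  # ~4 chars/token (assumption)
    response = []
    for chunk in chunks:
        used += len(chunk) // 4
        response.append(chunk)
        if used > budget * 0.9:  # 90% threshold is an assumption
            history[:] = compact_fn(history)  # in-place, so caller sees it
            used = (sum(len(m) // 4 for m in history)
                    + sum(len(c) // 4 for c in response))
    return "".join(response)
```

Because the history list is mutated in place, the compacted state is immediately visible to whatever owns the conversation, without interrupting the stream.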
Snip-Compact
Uses boundary markers in conversation history to identify safe snip points. Removes entire sections (like large file contents) while keeping a summary marker. More surgical than auto-compact.
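To make the boundary-marker idea concrete, here is a hypothetical sketch; the marker strings are invented for illustration, not Claude Code's real markers.

```python
SNIP_START = "<<<snip-boundary:start>>>"  # marker strings are assumptions
SNIP_END = "<<<snip-boundary:end>>>"

def snip_compact(messages: list[str]) -> list[str]:
    # Remove marked sections (e.g. large file dumps) wholesale,
    # leaving a short stub where the content used to be.
    out, snipping, snipped = [], False, 0
    for msg in messages:
        if msg == SNIP_START:
            snipping, snipped = True, 0
        elif msg == SNIP_END:
            snipping = False
            out.append(f"[snipped {snipped} messages: large content removed]")
        elif snipping:
            snipped += 1
        else:
            out.append(msg)
    return out
```

Unlike auto-compact, nothing outside the marked boundaries is touched, which is what makes this approach surgical.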
Micro-Compact
Summarizes individual tool results inline rather than compacting entire message ranges. Useful when a single tool output (like a large file read) dominates the context.
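A sketch under assumed message shapes (dicts with `role` and `content` keys) and an assumed size threshold; only oversized tool results are rewritten, everything else passes through untouched.

```python
def micro_compact(messages: list[dict], max_result_tokens: int = 500) -> list[dict]:
    # Replace only individual oversized tool results with a short
    # inline summary; all other messages pass through unchanged.
    out = []
    for msg in messages:
        tokens = len(msg["content"]) // 4  # ~4 chars/token (assumption)
        if msg["role"] == "tool" and tokens > max_result_tokens:
            head = msg["content"][:120]  # keep a small preview
            out.append({
                "role": "tool",
                "content": f"[tool result truncated, ~{tokens} tokens]\n{head}...",
            })
        else:
            out.append(msg)
    return out
```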
Context Window Management
How the token budget keeps conversations healthy
Diagram: conversation token usage before and after compaction.
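The four strategies above can be tied back to the token budget with a small dispatcher. This is a hypothetical sketch: the 80% headroom threshold and the selection order are assumptions, not the documented behavior.

```python
def choose_strategy(used: int, limit: int, streaming: bool,
                    has_markers: bool, largest_tool_result: int) -> str:
    # Pick a compaction strategy from the current token budget.
    if used <= limit * 0.8:
        return "none"              # plenty of headroom: do nothing
    if streaming:
        return "reactive-compact"  # response in flight: compact mid-stream
    if largest_tool_result > used // 2:
        return "micro-compact"     # one tool output dominates the context
    if has_markers:
        return "snip-compact"      # boundaries available: surgical removal
    return "auto-compact"          # default: summarize oldest messages
```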