Defending the Context Window: Fail-Open History Compaction in Go
Token bloat was killing my AI meal planner via API rate limits. Here is how I protected the context window using payload slimming and decoupled, fail-open history compaction.
In a previous post about Taming the Pull, I discussed the dangers of infinite agent loops and context bloat. Recently, that bloat hit a breaking point.
During multi-turn plan revisions in my meal-planner project, the conversation history grew uncontrollably. The LLM was passing massive, nested JSON tool responses back and forth. This token bloat caused frequent Groq API rate limit errors, bringing the whole planner to a halt.
To fix it, I had to stop treating the LLM context window as an infinite bucket and start treating it as a strictly managed resource. Let's be transparent: I could simply pay for a subscription with higher limits, and the problem would be solved (or at least postponed). But since I stick to a "poor man's strategy" of squeezing the most out of free tiers, I had to find a technical solution.
Here is how I solved the token bloat problem using a two-pronged strategy: Payload Slimming and Decoupled, Fail-Open Compaction.
Phase 1: Payload Slimming
Before we even talk about history, we have to look at what we are sending the model right now. When the agent searched for recipes, I was passing the entire recipe object back into the context. This included large arrays of tags, internal IDs, and verbose metadata.
The first step was to strip non-essential fields before the recipe ever entered the LLM context. I created a simplifyForTool projection.
```go
// Slimming the payload to protect the context window
func simplifyForTool(recipes []value.Recipe) []map[string]interface{} {
	simplified := make([]map[string]interface{}, len(recipes))
	for i, r := range recipes {
		simplified[i] = map[string]interface{}{
			"id":          r.ID,
			"title":       r.Title,
			"ingredients": r.Ingredients,
			"prep_time":   r.PrepTime,
			// Notice we intentionally drop Tags and internal metadata here
		}
	}
	return simplified
}
```
This immediately cut the size of the initial tool response in half. But the real problem was the history.
Phase 2: Decoupled History Compaction
As the agent loop iterated, those (even slimmed down) JSON responses piled up. I needed a way to compact previous turns, turning a raw JSON string of a recipe into a dense, token-efficient string like "Title A (30m)".
The dangerous way to do this is to parse the JSON directly inside the generic ExecuteAgentLoop (which I introduced in my post on Refactoring for Autonomy). But that creates tight coupling: if a tool's output format changed, the parsing would fail, and the entire loop would crash.
Instead, I used the Dependency Inversion Principle. I injected a Compactor interface into the llm.Conversation. This delegates the domain-specific compaction logic back to the application layer.
```go
// The decoupled interface inside the generic LLM package
type Compactor interface {
	CompactToolResponse(toolName string, rawJSON string) (string, error)
}
```
The Fail-Open Strategy
This is the most critical part of the architecture. What happens if the Compactor encounters a JSON structure it doesn't understand (e.g., from a newly added tool)?
It fails open.
```go
// Inside the history management logic
compacted, err := compactor.CompactToolResponse(msg.ToolName, msg.Content)
if err != nil {
	// FAIL-OPEN: If we can't compact it, log the error but keep the raw string.
	// We degrade gracefully rather than crashing the loop.
	log.Printf("compaction failed for %s: %v, using raw content", msg.ToolName, err)
	finalHistory = append(finalHistory, msg.Content)
} else {
	finalHistory = append(finalHistory, compacted)
}
```
Prioritizing Resilience Over Optimization
By implementing this architecture, I protected the scarcest resource in an LLM application: the context window.
The payload slimming prevents the immediate bloat, while the decoupled compactor ensures the history remains dense and focused over long interactions. More importantly, the fail-open design prioritizes system resilience. A secondary optimization like history compaction should never take down core functionality. It should degrade gracefully.
This approach keeps the generic agent engine clean, pushes domain logic where it belongs, and ensures I don't hit those Groq API rate limits anymore.