Context Laziness: How Free Credits Created an Expensive Hangover

I had 1,800 kr. in free credits and a dream of building the perfect meal planner. But when the credits expired, I realized my 'vibe-coded' agent was burning through context like a wildfire. Here is how I fixed my expensive architecture.


In my previous posts, I shared how I was engineering my AI environment and giving Gemini CLI a memory. It felt like magic. I was moving fast, "vibecoding" my way through complex flows, and building out my meal-planner project with an autonomous squad of agents.

I had a safety net: 1,872.24 kr. in free Google Cloud credits. My goal was actually to spend as much of it as possible before they expired. I told myself I was "stress-testing" the architecture. In reality, I was kidding myself: I ignored the structural cracks in the CLI setup and focused only on the meal-planner features.

Then, the credits expired.

I didn't even manage to use them all, leaving 1,033.40 kr. on the table. But the moment real billing started, the "magic" turned into a very real invoice.

The Wake-Up Call

I set a conservative budget of 150 kr. per month. I wanted to be "aware" sooner rather than later if I was overspending. I didn't have to wait long. In a single week, I hit 70 kr., nearly half the monthly budget.

The breakdown was terrifying:

A single session: 44 kr.
Another heavy session: 19 kr.

I was no longer "vibecoding" in a sandbox; I was burning cash. I had to stop building features and perform a technical post-mortem on my own agent.

The Anatomy of Context Laziness

When I looked at the logs, the issue was clear. Because the tokens had been "free," I had relied too much on "vibes" to establish my setup. I wasn't optimizing for the most expensive resource in modern AI: the context window.

I found two critical architectural leaks:

1. The Tool-Hook Explosion

In my ~/.gemini/settings.json, I had configured my memory plugin (claude-mem) to run as a BeforeTool and AfterTool hook.

// The "Vibe-Coded" Mistake
"hooks": {
  "BeforeTool": [{ "name": "claude-mem", "command": "... hook session-init" }],
  "AfterTool": [{ "name": "claude-mem", "command": "... hook observation" }]
}

This seemed logical at first: capture everything the tools do. But in an autonomous squad, a single "turn" might involve 10 tool calls. With one hook before and one after each call, the memory hook was firing 20 times per turn, adding 13k tokens of overhead every single time. I was paying for a massive context reload just to read a tiny file or list a directory.
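To make that cost concrete, here is a back-of-the-envelope sketch in Python. It assumes the 13k-token overhead applies to each hook firing and that a turn makes 10 tool calls; the function name is made up for illustration and is not part of any plugin.

```python
def hook_overhead_tokens(tool_calls: int, hooks_per_call: int, tokens_per_firing: int) -> int:
    """Estimate the extra tokens that hooks add to a single turn."""
    firings = tool_calls * hooks_per_call
    return firings * tokens_per_firing


# 10 tool calls, each wrapped by a BeforeTool and an AfterTool hook.
# Assumption: every firing costs roughly 13k tokens.
per_turn = hook_overhead_tokens(tool_calls=10, hooks_per_call=2, tokens_per_firing=13_000)
print(per_turn)  # 260000
```

Under those assumptions a single turn carries about 260k tokens of pure hook overhead before any real work is done.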

2. The "Infinite History" Insight Bug

I had a script, extract-insights.sh, designed to find "aha moments" in my logs. It worked by fetching all new observations since the last processed ID.

The bug was in the SQL:

-- Before: Fetched everything new, regardless of scale
SELECT id, text FROM observations WHERE id > $LAST_ID;

If the state file was missing or I had been working for a long stretch, the agent would try to send the entire project history to Gemini for analysis. In one session, this bug alone generated a 27-million-token payload.
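The failure mode is easy to reproduce with a throwaway database. The sketch below is in Python rather than the original shell script, and it assumes that a missing state file makes the last processed ID fall back to 0; the helper names, the state file name, and the table contents are all hypothetical.

```python
import sqlite3
import tempfile
from pathlib import Path


def read_last_id(state_file: Path) -> int:
    """Return the last processed ID, or 0 when the state file is missing or unreadable."""
    try:
        return int(state_file.read_text().strip())
    except (FileNotFoundError, ValueError):
        return 0  # assumed fallback: start from the very beginning


def fetch_new(conn: sqlite3.Connection, last_id: int) -> list:
    """Unbounded query: everything newer than last_id, however much that is."""
    return conn.execute(
        "SELECT id, text FROM observations WHERE id > ?", (last_id,)
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE observations (id INTEGER PRIMARY KEY, text TEXT)")
conn.executemany(
    "INSERT INTO observations (text) VALUES (?)",
    [(f"observation {i}",) for i in range(1000)],
)

with tempfile.TemporaryDirectory() as tmp:
    state = Path(tmp) / "last_id"
    missing = fetch_new(conn, read_last_id(state))  # no state file yet
    state.write_text("990")
    present = fetch_new(conn, read_last_id(state))  # state file in place

print(len(missing), len(present))  # 1000 10
```

With the state file in place the query returns only the 10 newest rows; without it, the whole table comes back and would be forwarded to the model.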

Paying Down the Debt

To fix this, I had to move from "Vibecoding" back to "Engineering."

First, I consolidated the hooks. I removed claude-mem from the tool level and moved it to the agent level (BeforeAgent and AfterAgent). That meant two hook firings per turn instead of 20, cutting my per-turn overhead by 90%. It was a good first step, but I soon realized it still left me with a "Latency Tax."

Second, I implemented Time-Based Filtering. I updated my SQL queries to include a strict 24-hour window:

-- After: Architectural constraint
SELECT id, text FROM observations 
WHERE id > $LAST_ID 
AND created_at >= datetime('now', '-24 hours');
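The windowed query can be sanity-checked against an in-memory SQLite database. This is a minimal sketch, assuming created_at is stored in SQLite's UTC text format (YYYY-MM-DD HH:MM:SS); timestamps in another format would need converting before the comparison is meaningful. The table layout and sample rows are invented for the test.

```python
import sqlite3

# Same filter as above, with a bound parameter in place of $LAST_ID.
QUERY = """
SELECT id, text FROM observations
WHERE id > ?
  AND created_at >= datetime('now', '-24 hours')
"""


def fetch_recent(conn: sqlite3.Connection, last_id: int) -> list:
    """Return observations newer than last_id that were created in the last 24 hours."""
    return conn.execute(QUERY, (last_id,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE observations (id INTEGER PRIMARY KEY, text TEXT, created_at TEXT)"
)
conn.execute(
    "INSERT INTO observations (text, created_at) "
    "VALUES ('old note', datetime('now', '-3 days'))"
)
conn.execute(
    "INSERT INTO observations (text, created_at) "
    "VALUES ('fresh note', datetime('now', '-1 hours'))"
)

# Even with last_id reset to 0, only the recent row is returned.
rows = fetch_recent(conn, last_id=0)
print(rows)  # [(2, 'fresh note')]
```

Binding last_id as a parameter, rather than interpolating a shell variable into the SQL string, also means a corrupted state file cannot change the shape of the query.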

Third, I transformed how I handle technical reflection. I moved from an automated background hook that "scanned" for insights after every turn to a Manual Slash Command (/insights). This preserves the ability to distill complex sessions into technical takeaways but gives me control. I now trigger this distillation only when a real breakthrough happens, ensuring my project memory is built from intentional reflection rather than automated noise.

The "Turn Tax": Optimizing the Memory Loop

If you're using claude-mem or similar plugins, you've likely felt the "Turn Tax": that 2-5 second delay on every prompt while the agent indexes your latest work into the database.

In a high-flow coding session, this adds up to minutes of just staring at a loading spinner. At 2-5 seconds per prompt, a 60-prompt session costs between 2 and 5 minutes of pure waiting.

The Optimization: From Turn-by-Turn to Lifecycle

Instead of forcing the agent to remember every word as it happens (via BeforeAgent and AfterAgent), I've shifted to a Lifecycle Hook strategy.

Here’s the change I applied to ~/.gemini/settings.json:

Drop BeforeAgent & AfterAgent: No more background calls on every turn.
Keep SessionStart: We still pull in previous context when opening the workspace.
Add SessionEnd (with summarize): We batch-persist everything we did at the very end.
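Those three changes can be sketched as a small Python transformation over the settings. The dictionary shape mirrors the simplified fragment shown earlier in this post, not necessarily the full settings schema, and the "..." commands stay elided on purpose; the function name is hypothetical.

```python
import copy

# Hook events that fire on every turn and are dropped in the lifecycle setup.
PER_TURN_EVENTS = ("BeforeAgent", "AfterAgent")


def to_lifecycle_hooks(settings: dict, session_end_hook: dict) -> dict:
    """Return a copy of settings with per-turn hooks removed and SessionEnd added."""
    updated = copy.deepcopy(settings)
    hooks = updated.setdefault("hooks", {})
    for event in PER_TURN_EVENTS:
        hooks.pop(event, None)  # SessionStart and any other events are left alone
    hooks["SessionEnd"] = [session_end_hook]
    return updated


before = {
    "hooks": {
        "SessionStart": [{"name": "claude-mem", "command": "..."}],
        "BeforeAgent": [{"name": "claude-mem", "command": "..."}],
        "AfterAgent": [{"name": "claude-mem", "command": "..."}],
    }
}
# The real summarize command is not shown in this post, so it stays as "...".
after = to_lifecycle_hooks(before, {"name": "claude-mem", "command": "..."})
print(sorted(after["hooks"]))  # ['SessionEnd', 'SessionStart']
```

The original settings object is left untouched, which makes it easy to diff the two versions before writing anything back to disk.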

The result? The agent feels instant again. Because the entire history is sent during a session anyway, I haven't lost any immediate memory. I've just deferred the "storage" cost to the exit button.

Pro-tip: If your session gets massive, PreCompress still fires to save your memory before the context window is compressed, so a long session doesn't quietly lose its early context.

The Lesson

Vibecoding is a superpower for prototyping, but it creates a specific kind of "Context Laziness." When the tokens are subsidized, you don't notice when your architecture starts to rot.

The hangover was necessary. It forced me to realize that an autonomous squad is only "autonomous" if it can operate within a sustainable budget. If you are building your own AI environment, don't wait for the credits to expire to check your "cost-per-insight."

References & Resources