Giving Gemini CLI a Memory: Replacing Custom Hooks with claude-mem

How I optimized my limited hacking hours by replacing real-time LLM hooks with a background memory worker to capture architectural insights automatically.


If you’ve noticed me blogging more frequently lately, it’s not because I suddenly found hours of free time. Between a full-time job, a family, and a three-year-old boy, my "hacking hours" are a scarce resource. I often trade sleep for study, which is how projects like the AI Meal Planner came to be.

When time is that compressed, you either optimize your workflow or you stop building. I chose to optimize.

I wanted my AI engineering environment (Gemini CLI) to do something I simply don't have the mental bandwidth for after a long day: remember the "why" behind every architectural pivot. I needed a way to capture those "aha" moments automatically so I could curate them into these logs later.

Initially, I solved this with a custom real-time filter.

The Old Way: Real-time Filtering

I wrote an AfterAgent bash hook (insight-collector.sh) that intercepted the agent's response on every single turn. It would take the raw output, send it to Groq (using Llama 3), and ask: "Is there a blog-worthy architectural insight here?" If yes, it appended it to a markdown file.

# Build the evaluation prompt with real newlines (a literal \n inside
# double quotes would be passed through unexpanded to jq's --arg)
content=$(printf '%s\n\nAgent Response:\n%s' "$prompt" "$agent_text")

# Construct the payload for Groq evaluation
payload=$(jq -n --arg content "$content" \
  '{
    "model": "llama-3.1-8b-instant",
    "messages": [{"role": "user", "content": $content}]
  }')

# Call Groq silently via curl on every turn
response=$(curl -s -X POST "https://api.groq.com/openai/v1/chat/completions" \
  -H "Authorization: Bearer $GROQ_HOOK_KEY" \
  -H "Content-Type: application/json" \
  -d "$payload")

# Extract the model's verdict for appending to the markdown file
insight=$(echo "$response" | jq -r '.choices[0].message.content')

It worked, but it had two flaws that annoyed me:

  1. Per-Turn Overhead: Running an LLM evaluation on every single turn added latency to the CLI. When you only have two hours of sleep-deprived focus, every second of "waiting for hook" feels like a waste.
  2. Missing the Forest: By evaluating single responses in isolation, the model missed the broader arc of a complex refactor.
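That first flaw is easy to quantify: wrap the hook invocation in timestamps and you see the per-turn tax directly. A minimal sketch (assumes GNU `date` for the `%N` nanosecond specifier; `insight-collector.sh` is the hook script from my setup):

```shell
# Rough per-turn latency measurement for the hook script
start=$(date +%s%N)
sh ./insight-collector.sh >/dev/null 2>&1 || true
end=$(date +%s%N)
elapsed_ms=$(( (end - start) / 1000000 ))
echo "hook took ${elapsed_ms} ms"
```

With a remote LLM round-trip inside the hook, that number sits in the hundreds of milliseconds to seconds range on every single turn.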

Discovering claude-mem

The pivot happened when I saw a mention of claude-mem in a blog post. I was already trying to solve the "context loss" problem, so I asked my agent about it. Digging into the documentation, I realized it wasn't just for Claude Code—it had official support for the Gemini CLI hooks I was already using.

I realized we were solving the same problem from different angles. My setup focused on curation (filtering for humans), while claude-mem focused on persistence (fixing the AI's forgetfulness by maintaining a searchable SQLite database of tool outputs and decisions).

Instead of reinventing the infrastructure, I decided to join forces.

# Installing the hooks into Gemini CLI
npx claude-mem install

This command injected lifecycle hooks directly into my ~/.gemini/settings.json, offloading the operational history to a background Node worker.

The API Pivot: The Dual-LLM Architecture

I realized I didn't want the same model doing the background logging as the one doing the deep insight extraction. I needed a dual-LLM architecture: one ultra-cheap, fast model for logging, and one deep-reasoning model for periodic curation.

1. The Background Logger (Gemini Flash Lite)
I updated ~/.claude-mem/settings.json to use gemini-2.5-flash-lite. At current Pay-As-You-Go pricing ($0.075 per 1M input tokens), a heavy session of background processing—summarizing every tool call and agent turn—costs roughly 1.5 cents.

{
  "CLAUDE_MEM_PROVIDER": "gemini",
  "CLAUDE_MEM_GEMINI_MODEL": "gemini-2.5-flash-lite"
}

2. The On-Demand Curator (Gemini 2.5 Pro)
With claude-mem handling the "Work History," I refactored my collector from a real-time hook to an on-demand batch command (/insights).

Because I only run this script occasionally, I can afford to use a much smarter model. I switched it to Gemini 2.5 Pro. I also added state management so it only queries new observations, keeping the context window clean and the costs under 10 cents per run.

# State management: read the last processed observation ID (default to 0)
LAST_ID=$(cat "$STATE_FILE" 2>/dev/null || echo 0)

# Fetch new work: query observations newer than the last ID
QUERY="SELECT id, created_at, type, title, text FROM observations WHERE id > $LAST_ID ORDER BY id ASC;"
observations=$(sqlite3 "$DB_PATH" "$QUERY")

# Inject existing draft titles so the model avoids duplicate topics
EXISTING_DRAFTS=$(ls -1 ../blog-post-drafts/ | sed 's/\.md$//')

# Send the batch to Gemini 2.5 Pro for deep synthesis
# ... [Curl to generativelanguage.googleapis.com] ...

# Update state with the highest ID just processed
max_new_id=$(sqlite3 "$DB_PATH" "SELECT COALESCE(MAX(id), $LAST_ID) FROM observations;")
echo "$max_new_id" > "$STATE_FILE"
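The incremental-cursor pattern is easy to verify in isolation. Here's a self-contained sketch against a throwaway database; the table name mirrors claude-mem's `observations`, but the schema is deliberately simplified:

```shell
# Self-contained demo of the "only process new rows" pattern
DB=$(mktemp)
STATE=$(mktemp)

sqlite3 "$DB" "CREATE TABLE observations (id INTEGER PRIMARY KEY, title TEXT);"
sqlite3 "$DB" "INSERT INTO observations (title) VALUES ('first'), ('second');"

echo 1 > "$STATE"                 # pretend id 1 was already processed
LAST_ID=$(cat "$STATE")

# Only rows newer than LAST_ID come back
NEW=$(sqlite3 "$DB" "SELECT id, title FROM observations WHERE id > $LAST_ID ORDER BY id ASC;")
echo "$NEW"

# Advance the cursor to the highest ID seen
sqlite3 "$DB" "SELECT MAX(id) FROM observations;" > "$STATE"
NEXT=$(cat "$STATE")

rm -f "$DB" "$STATE"
```

Each `/insights` run therefore sees only the delta since the previous run, which is what keeps the Gemini 2.5 Pro context window small and the per-run cost bounded.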

The Result

This is pragmatism in action. By recognizing that I could use a dedicated tool for the heavy lifting of state management, I’ve removed the "documentation tax" from my limited hacking hours.

graph LR
    A[Gemini CLI] -- Background Hooks --> B[claude-mem Worker]
    B -- Summarizes via Flash Lite --> C[(SQLite DB)]
    D[Manual /insights command] -- Batch Queries --> C
    D -- Deep Synthesis via Gemini Pro --> E[Blog Drafts]

  • Background Persistence: claude-mem + Flash Lite cheaply records the "what."
  • On-Demand Curation: Gemini 2.5 Pro + State Management extracts the "why."

I no longer have latency dragging down my flow, and the insights are much richer because the evaluator sees the full story of the session's decisions, rather than isolated fragments.
