The Poor Man's AI: Escaping Vendor SDKs and API Rate Limits

When I had to re-process my recipe database, Google's strict rate limits brought my system to a crawl. Here is how I removed the vendor SDK and built a simple generic client in Go.


Building a "poor man's AI setup" means constantly fighting against free-tier rate limits. If you run a project on a cheap $5 AWS Lightsail VPS and don't want to pay for APIs, you are going to hit an HTTP 429 (Too Many Requests) wall eventually.

Recently, I refactored my meal planner's vector database to fix search blind spots and handle negative constraints. Because I changed how recipes are tagged and hashed, I had to trigger a complete re-ingestion of my entire recipe database.

That is when the weaknesses in my code showed up.

The Brittle Workaround

I was using Google's Gemini API for vector embeddings. The free tier is generous for daily planning, but it has strict requests-per-minute (RPM) limits. To survive the ingestion loop without my app crashing from rate-limit errors, I had previously used a simple workaround in the code:

// A quick fix to avoid 429 errors
if err := ProcessAndSaveRecipe(ctx, a.extractor, a.recipeRepo, post, force); err != nil {
    log.Printf("Failed to process recipe: %v", err)
}

// Wait 5 seconds to stay under Rate Limits
// We sleep even on failure to ensure we don't hammer the API
time.Sleep(5 * time.Second)

Watching the script pause for 5 seconds between every single recipe, even when the API was perfectly healthy, was frustrating. At 5 seconds per recipe, re-ingesting 50 recipes meant over 4 minutes of pure sleep for what should have taken seconds.

The Vendor Lock-In Smell

I needed a faster alternative for embeddings, and I found Mixedbread AI. They offer high-quality, open-source models with a very generous free tier.

However, to make the switch, I had to remove the google.golang.org/api SDK. This highlighted a distinct code smell: I was tightly coupled to a heavy, vendor-specific SDK just to make a simple POST request that returns an array of floats.

Almost all modern AI providers now support the OpenAI-compatible /v1/embeddings format. Instead of installing another vendor SDK every time I wanted to switch providers, I decided to write a generic, provider-agnostic HTTP client.

The Generic Wrapper

I deleted the Gemini dependency entirely and created a clean EmbeddingClient that can point to any compliant endpoint.

First, I defined the Data Transfer Objects (DTOs) to keep my internal logic clean and provider-agnostic:

type embeddingRequest struct {
	Model          string   `json:"model"`
	Input          []string `json:"input"`
	Normalized     bool     `json:"normalized"`
	EncodingFormat string   `json:"encoding_format"`
}

type embeddingResponse struct {
	Data []struct {
		Embedding []float32 `json:"embedding"`
	} `json:"data"`
}

Then, I built the client struct using only the Go standard library, avoiding any external dependencies.

package llm

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"net/http"
	"time"
	"ai-meal-planner/internal/config"
)

// EmbeddingClient is a generic HTTP client for generating vector embeddings.
type EmbeddingClient struct {
	apiKey     string
	baseURL    string
	model      string
	httpClient *http.Client
}

func NewEmbeddingClient(cfg *config.Config) *EmbeddingClient {
	return &EmbeddingClient{
		apiKey:  cfg.EmbeddingAPIKey,
		baseURL: "https://api.mixedbread.com/v1/embeddings",
		model:   "mixedbread-ai/mxbai-embed-large-v1",
		httpClient: &http.Client{
			Timeout: 30 * time.Second, // Never trust an external API without a timeout
		},
	}
}

The execution method is a standard HTTP POST. By requesting "float" explicitly, we avoid having to decode base64 strings later.

func (c *EmbeddingClient) GenerateEmbedding(ctx context.Context, text string) ([]float32, error) {
	reqBody := embeddingRequest{
		Model:          c.model,
		Input:          []string{text},
		Normalized:     true,
		EncodingFormat: "float",
	}

	jsonData, err := json.Marshal(reqBody)
	if err != nil {
		return nil, fmt.Errorf("failed to marshal request: %w", err)
	}

	req, err := http.NewRequestWithContext(ctx, "POST", c.baseURL, bytes.NewBuffer(jsonData))
	if err != nil {
		return nil, fmt.Errorf("failed to create request: %w", err)
	}

	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Authorization", "Bearer "+c.apiKey)

	resp, err := c.httpClient.Do(req)
	if err != nil {
		return nil, fmt.Errorf("failed to execute request: %w", err)
	}
	defer resp.Body.Close()

	if resp.StatusCode < 200 || resp.StatusCode >= 300 {
		return nil, fmt.Errorf("API error (status %d)", resp.StatusCode)
	}

	var result embeddingResponse
	if err := json.NewDecoder(resp.Body).Decode(&result); err != nil {
		return nil, fmt.Errorf("failed to decode response: %w", err)
	}

	if len(result.Data) == 0 {
		return nil, fmt.Errorf("no embedding returned in response")
	}

	return result.Data[0].Embedding, nil
}

The Result

With the faster Mixedbread API and a more robust client, I finally deleted the time.Sleep(5 * time.Second) from the ingestion loop.

The application now processes recipe embeddings at full speed. For the Groq LLM calls, which still handle the heavy text extraction, I replaced the crude sleep with a retry loop that actually reads the `Retry-After` header. It backs off only when it receives a 429, instead of pausing the whole process.

Moving away from vendor SDKs for simple API calls gave me the flexibility I needed for a zero-dollar architecture: I can now swap embedding providers whenever a better free tier comes along, just by changing an environment variable.
