The Poor Man's AI: Escaping Vendor SDKs and API Rate Limits
When I had to re-process my recipe database, Google's strict rate limits brought my system to a crawl. Here is how I removed the vendor SDK and built a simple generic client in Go.
Building a "poor man's AI setup" means constantly fighting against free-tier rate limits. If you run a project on a cheap $5 AWS Lightsail VPS and don't want to pay for APIs, you are going to hit an HTTP 429 (Too Many Requests) wall eventually.
Recently, I refactored my meal planner's vector database to fix search blind spots and handle negative constraints. Because I changed how recipes are tagged and hashed, I had to trigger a complete re-ingestion of my entire recipe database.
That is when the weaknesses in my code showed up.
The Brittle Workaround
I was using Google's Gemini API for vector embeddings. The free tier is generous for daily planning, but it has strict requests-per-minute (RPM) limits. To survive the ingestion loop without my app crashing from rate-limit errors, I had previously used a simple workaround in the code:
// A quick fix to avoid 429 errors
if err := ProcessAndSaveRecipe(ctx, a.extractor, a.recipeRepo, post, force); err != nil {
	log.Printf("Failed to process recipe: %v", err)
}

// Wait 5 seconds to stay under rate limits.
// We sleep even on failure to ensure we don't hammer the API.
time.Sleep(5 * time.Second)
Watching a script pause for 5 seconds between every single recipe, even when the API was perfectly healthy, was deeply frustrating. At 5 seconds apiece, re-ingesting 50 recipes meant over four minutes of mandatory sleep for work that should take seconds.
The Vendor Lock-In Smell
I needed a faster alternative for embeddings, and I found Mixedbread AI. They offer high-quality, open-source models with a very generous free tier.
However, to make the switch, I had to remove the google.golang.org/api SDK. This highlighted a distinct code smell: I was tightly coupled to a heavy, vendor-specific SDK just to make a simple POST request that returns an array of floats.
Almost all modern AI providers now support the OpenAI-compatible /v1/embeddings format. Instead of installing another vendor SDK every time I wanted to switch providers, I decided to write a generic, provider-agnostic HTTP client.
The Generic Wrapper
I deleted the Gemini dependency entirely and created a clean EmbeddingClient that can point to any compliant endpoint.
First, I defined the Data Transfer Objects (DTOs) to keep my internal logic clean and provider-agnostic:
type embeddingRequest struct {
	Model          string   `json:"model"`
	Input          []string `json:"input"`
	Normalized     bool     `json:"normalized"`
	EncodingFormat string   `json:"encoding_format"`
}

type embeddingResponse struct {
	Data []struct {
		Embedding []float32 `json:"embedding"`
	} `json:"data"`
}
Then, I built the client struct using only the Go standard library, avoiding any external dependencies.
package llm

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"net/http"
	"time"

	"ai-meal-planner/internal/config"
)

// EmbeddingClient is a generic HTTP client for generating vector embeddings.
type EmbeddingClient struct {
	apiKey     string
	baseURL    string
	model      string
	httpClient *http.Client
}

func NewEmbeddingClient(cfg *config.Config) *EmbeddingClient {
	return &EmbeddingClient{
		apiKey:  cfg.EmbeddingAPIKey,
		baseURL: "https://api.mixedbread.com/v1/embeddings",
		model:   "mixedbread-ai/mxbai-embed-large-v1",
		httpClient: &http.Client{
			Timeout: 30 * time.Second, // Never trust an external API without a timeout
		},
	}
}
The execution method is a standard HTTP POST. By requesting "float" explicitly, we avoid having to decode base64 strings later.
func (c *EmbeddingClient) GenerateEmbedding(ctx context.Context, text string) ([]float32, error) {
	reqBody := embeddingRequest{
		Model:          c.model,
		Input:          []string{text},
		Normalized:     true,
		EncodingFormat: "float",
	}

	jsonData, err := json.Marshal(reqBody)
	if err != nil {
		return nil, fmt.Errorf("failed to marshal request: %w", err)
	}

	req, err := http.NewRequestWithContext(ctx, http.MethodPost, c.baseURL, bytes.NewBuffer(jsonData))
	if err != nil {
		return nil, fmt.Errorf("failed to create request: %w", err)
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Authorization", "Bearer "+c.apiKey)

	resp, err := c.httpClient.Do(req)
	if err != nil {
		return nil, fmt.Errorf("failed to execute request: %w", err)
	}
	defer resp.Body.Close()

	if resp.StatusCode < 200 || resp.StatusCode >= 300 {
		return nil, fmt.Errorf("API error (status %d)", resp.StatusCode)
	}

	var result embeddingResponse
	if err := json.NewDecoder(resp.Body).Decode(&result); err != nil {
		return nil, fmt.Errorf("failed to decode response: %w", err)
	}
	if len(result.Data) == 0 {
		return nil, fmt.Errorf("no embedding returned in response")
	}

	return result.Data[0].Embedding, nil
}
The Result
With the faster Mixedbread API and a more robust client, I finally deleted the time.Sleep(5 * time.Second) from the ingestion loop.
The application now processes recipe embeddings at full speed. For the Groq LLM calls, which still handle the heavy text extraction, I replaced the crude sleep with a retry loop that actually reads the Retry-After header: it waits intelligently only when it receives a 429, rather than pausing the whole process.
Moving away from vendor SDKs for simple API calls gave me the flexibility I needed for a zero-dollar architecture: I can now swap embedding providers whenever a better free tier comes along, just by changing an environment variable.