How I Cut My LLM Costs by 90% Without Changing My App Logic
May 21, 2026

Originally published by Dev.to

There’s a particular kind of dread that comes with checking your OpenAI billing dashboard mid-month.

I’ve been building a news automation hub that runs 14 editorial workspaces, with summarization, rewriting, fact-checking, SEO-tagging, and translation pipelines running around the clock.

The AI layer was already fairly optimized:

  • Groq
  • Gemini Flash
  • DeepSeek
  • OpenRouter
  • provider rotation
  • fallback logic

But the final fallback was still OpenAI, and once the other providers hit their rate limits, costs climbed faster than expected.

What I needed wasn’t more routing logic.

I needed a smarter endpoint.

The Problem

My setup already rotated between multiple providers, but the architecture had a weakness:

Provider exhausted
    -> fallback
        -> OpenAI
            -> credits disappear

The more providers I added, the messier things became:

  • more API keys
  • more retry logic
  • more conditional branches
  • more provider-specific handling
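
To make that concrete, here is a minimal sketch of an in-app fallback chain. The provider objects and their `call` functions are hypothetical stand-ins for provider-specific SDK calls, not code from the actual project:

```javascript
// Hypothetical provider list: each entry is { name, call }, where `call`
// stands in for a provider-specific SDK call that returns the reply text.
// Try each provider in order and return the first successful answer.
async function completeWithFallback(providers, prompt) {
  const errors = [];
  for (const provider of providers) {
    try {
      return { provider: provider.name, text: await provider.call(prompt) };
    } catch (err) {
      // Each provider fails differently (429s, timeouts, quota errors),
      // so real code tends to grow one branch per provider here.
      errors.push(`${provider.name}: ${err.message}`);
    }
  }
  throw new Error(`All providers failed: ${errors.join("; ")}`);
}
```

Even this stripped-down version owns ordering, error collection, and the final failure case, and all of it lives in the application.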

I was optimizing infrastructure with application code.

That was the mistake.

The Fix

After digging through self-hosted AI tooling, I found freellmapi.

It’s a lightweight OpenAI-compatible proxy that automatically routes requests across multiple free-tier LLM providers:

  • Groq
  • Cerebras
  • SambaNova
  • Cloudflare Workers AI
  • GitHub Models
  • OpenRouter free models
  • and others

Combined free-tier capacity: roughly 800M tokens/month.

The interesting part is that the routing happens inside the proxy — not inside your app.

My Integration

The integration took less than an hour.

1. Deploy the proxy

I ran it on my existing VPS:

  • Node.js 20
  • ~40MB idle RAM
  • localhost only

2. Add provider credentials

In the admin panel, I added:

  • Groq key
  • Cloudflare credentials
  • OpenRouter key

3. Point my app to a single endpoint

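import OpenAI from "openai";

// Every request now goes to the local proxy instead of a provider's API.
const client = new OpenAI({
  baseURL: "http://localhost:3001/v1",
  apiKey: process.env.LOCAL_ROUTER_KEY // the local router's key, read from the environment
});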

That was basically it.

The important detail:

I stopped specifying models for non-critical tasks.

Instead of forcing a specific provider, I let the proxy auto-route requests to whatever free provider was currently available.
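
A rough sketch of such a request, using plain `fetch` against the OpenAI-style `/chat/completions` route. The helper names are made up for illustration, and whether the proxy accepts a request with no `model` field (rather than expecting some placeholder value) is an assumption here, so check the project's documentation before relying on it:

```javascript
// Assumed: the proxy listens at the same baseURL as in the client setup
// and exposes the standard OpenAI-style /chat/completions route.
const ROUTER_URL = "http://localhost:3001/v1";

// Build the request body; `model` is only included when the caller pins one.
function buildChatBody(prompt, model) {
  const body = { messages: [{ role: "user", content: prompt }] };
  if (model) body.model = model;
  return body;
}

// `fetchImpl` is injectable so the function can be exercised without a live proxy.
async function complete(prompt, { model, fetchImpl = fetch } = {}) {
  const res = await fetchImpl(`${ROUTER_URL}/chat/completions`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.LOCAL_ROUTER_KEY}`,
    },
    body: JSON.stringify(buildChatBody(prompt, model)),
  });
  if (!res.ok) throw new Error(`Router returned HTTP ${res.status}`);
  const data = await res.json();
  // Standard OpenAI-style response shape: first choice, message content.
  return data.choices[0].message.content;
}
```

A background job would call `complete("Summarize: ...")` with no model, while a critical task could pass `{ model: "..." }` to pin one.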

App
  -> freellmapi
      -> Groq
      -> Cloudflare Workers AI
      -> Cerebras
      -> SambaNova
      -> OpenRouter

If Groq rate-limited:

  • another provider picked up the request

If a provider became slow:

  • routing shifted automatically

My application code never needed to know.
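
For intuition, here is a conceptual sketch of the kind of bookkeeping a routing proxy can do on your behalf. This is not freellmapi's actual code; the class name, the fixed cooldown, and the first-available selection rule are all assumptions made for illustration:

```javascript
// Conceptual sketch only: not freellmapi's actual implementation.
// Tracks a per-provider cooldown and skips providers that recently rate-limited.
class ProviderPool {
  constructor(names, cooldownMs = 60_000) {
    this.cooldownMs = cooldownMs;
    // Map keeps insertion order, so the order of `names` is the priority order.
    this.blockedUntil = new Map(names.map((n) => [n, 0]));
  }

  // Return the first provider whose cooldown has expired, or null if none is free.
  pick(now = Date.now()) {
    for (const [name, until] of this.blockedUntil) {
      if (now >= until) return name;
    }
    return null;
  }

  // Called when a provider answers with HTTP 429: bench it for the cooldown window.
  markRateLimited(name, now = Date.now()) {
    this.blockedUntil.set(name, now + this.cooldownMs);
  }
}
```

The point is where this state lives: inside the proxy, so the application only ever sees one endpoint.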

The Result

Within 24 hours:

  • OpenAI usage dropped by ~90%
  • background AI tasks became almost entirely free-tier
  • no additional retry logic was needed

Most importantly:
I removed provider chaos from my application layer.

What I Learned

When engineers hit rate limits, the instinct is usually:

  • add more providers
  • add more fallback logic
  • add more code

But sometimes the better solution is adding an abstraction layer that absorbs the complexity for you.

Another realization:

Most AI tasks do not require a specific premium model.

For:

  • summaries
  • tagging
  • drafts
  • translations
  • background enrichment

…almost any decent modern 70B model works fine.
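
One way to encode that split is a small lookup that decides which tasks may run on any free model and which stay pinned. The task names mirror the list above; the pinned model name is a placeholder, and treating everything outside the list as critical is an assumption:

```javascript
// Hypothetical split: task types that can run on whatever free model is available.
const AUTO_ROUTED_TASKS = new Set([
  "summary", "tagging", "draft", "translation", "enrichment",
]);

// Placeholder name, not a real model identifier.
const PINNED_MODEL = "your-premium-model";

// Returns undefined for auto-routed tasks so the caller can leave the model unset;
// anything not on the list is treated as critical and gets the pinned model.
function modelForTask(task) {
  return AUTO_ROUTED_TASKS.has(task) ? undefined : PINNED_MODEL;
}
```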

Caveats

Free-tier infrastructure has tradeoffs.

Some providers:

  • have cold starts
  • introduce latency spikes
  • become temporarily unavailable

For real-time user-facing chat systems, you should test failover carefully.

For async pipelines and batch jobs, though, it’s been surprisingly solid.
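
If you do put this in front of users, one simple safeguard is a hard deadline per request, so a cold start or latency spike fails fast instead of hanging. A minimal sketch, where `withDeadline` is a hypothetical helper name and the timeout value is yours to choose:

```javascript
// Run `task(signal)` but reject if it takes longer than `ms` milliseconds.
// The AbortSignal lets a fetch-based task cancel its in-flight request.
async function withDeadline(task, ms) {
  const controller = new AbortController();
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => {
      controller.abort();
      reject(new Error(`Timed out after ${ms} ms`));
    }, ms);
  });
  try {
    return await Promise.race([task(controller.signal), timeout]);
  } finally {
    // Always clear the timer so a fast response does not leave it pending.
    clearTimeout(timer);
  }
}
```

Usage would look like `withDeadline((signal) => fetch(url, { signal }), 3000)`, with the caller deciding what to show the user when the deadline is missed.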

Also:
run this on infrastructure you control.

A proxy like this handles upstream API keys — don’t hand that responsibility to random hosted services.
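
For a setup like mine, where the proxy is bound to localhost, a small startup check can catch a misconfigured base URL before any key leaves the machine. The helper name is hypothetical, and it only recognizes the most common loopback hosts:

```javascript
// True only when the URL's host is a common loopback address,
// i.e. the proxy runs on this machine.
function isLoopbackUrl(rawUrl) {
  const host = new URL(rawUrl).hostname;
  return host === "localhost" || host === "127.0.0.1" || host === "[::1]";
}
```

If your proxy runs on another host you control, swap this for an explicit allowlist of your own hostnames.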

Final Thought

The biggest optimization wasn’t changing models.

It was removing complexity from the layer that had to manage them.
