The title might sound dramatic, and no, I didn’t storm into a boardroom shouting about millions saved.
But what started as a casual read on a clever efficiency hack turned into a personal experiment in one of my projects — and the results were eye-opening. A simple adjustment in how data is formatted led to significant reductions in token usage, and at scale, that translates to real cost savings.
Here’s what I discovered and how you can apply it too.
Why Tokens Add Up
If you’ve used Gen AI APIs like OpenAI or Gemini, you know that every token counts — input and output alike.
For a handful of calls, the difference may seem trivial. But when you scale to thousands or millions of requests, even a small inefficiency quietly stacks into noticeable costs.
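To see how quickly that stacking happens, here’s a rough back-of-the-envelope sketch. Every number in it (request volume, tokens saved, price) is a hypothetical placeholder for illustration, not a figure from my project:

// All numbers below are hypothetical, purely for illustration
const requestsPerMonth = 1_000_000;   // assumed monthly call volume
const tokensSavedPerRequest = 50;     // assumed savings from a leaner format
const pricePerMillionTokens = 10;     // assumed output-token price in USD

// Total tokens saved per month, converted to dollars
const tokensSaved = requestsPerMonth * tokensSavedPerRequest;
const monthlySavings = (tokensSaved / 1_000_000) * pricePerMillionTokens;

console.log(`Estimated monthly savings: $${monthlySavings.toFixed(2)}`); // $500.00

Even with modest assumptions, a few wasted tokens per request turns into real money once the volume climbs.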
This experiment began with a simple question:
Can how data is formatted actually impact token usage?
Turns out, yes — and it’s more impactful than you’d think.
Experimenting With Data Formats
The article I read suggested that a leaner serialization format, YAML, could reduce token usage compared to standard JSON. I wanted to see it in practice, so I set up a test in my project.
Here’s a snapshot of the comparison:
JSON (standard format):
{
  "invoice_id": "INV-342",
  "vendor": "CodeTerra",
  "amount": 4500,
  "currency": "USD"
}
YAML (alternative format):
invoice_id: INV-342
vendor: CodeTerra
amount: 4500
currency: USD
Both contain identical information. But the alternative format eliminates extra punctuation (braces, commas, quotes), which reduces the number of tokens when the AI model processes it.
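If you want to verify the token difference yourself rather than take the claim on faith, a quick sketch like this works. It assumes the gpt-tokenizer npm package and its encode helper (my own choice here, not something from the original article), and the exact counts will differ by model and tokenizer:

const { encode } = require("gpt-tokenizer");

// The same invoice, serialized two ways
const jsonPayload = JSON.stringify(
  { invoice_id: "INV-342", vendor: "CodeTerra", amount: 4500, currency: "USD" },
  null,
  2
);
const yamlPayload = [
  "invoice_id: INV-342",
  "vendor: CodeTerra",
  "amount: 4500",
  "currency: USD",
].join("\n");

// Compare how many tokens each serialization of the same record costs
console.log("JSON tokens:", encode(jsonPayload).length);
console.log("YAML tokens:", encode(yamlPayload).length);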
Results From My Project
In my case, the switch produced roughly a 20% drop in output tokens and a 30% drop in character count. That might sound small, but when you’re processing hundreds of thousands of calls, the cumulative savings are huge.
It’s the kind of optimization that, at scale, feels like “millions saved” — even if it’s metaphorical for your project.
Note: results vary with the shape of your data. Some projects will see bigger gains than others, but it’s worth testing.
Implementing It Without Breaking Your Workflow
Most systems still expect JSON, so I set up a conversion pipeline:
const yaml = require("js-yaml");
// Example AI output in YAML
const yamlData = `
invoice_id: INV-342
vendor: CodeTerra
amount: 4500
currency: USD
`;
// Convert YAML → JSON
const json = yaml.load(yamlData);
console.log(json);
Fast, lightweight, and it preserves the efficiency gains while keeping backend systems happy.
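The same library works in the other direction too. If your records live as JSON but you want to send them to the model as YAML, a minimal sketch (again using js-yaml, with the same sample invoice) looks like this:

const yaml = require("js-yaml");

// Existing JSON record from the backend
const invoice = {
  invoice_id: "INV-342",
  vendor: "CodeTerra",
  amount: 4500,
  currency: "USD",
};

// Convert JSON → YAML before embedding it in the prompt
const yamlForPrompt = yaml.dump(invoice);
console.log(yamlForPrompt);

That way the backend keeps speaking JSON on both ends, and YAML exists only inside the prompt and the model’s response.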
Why This Actually Works
The secret is simple:
- Fewer symbols = fewer tokens
- Compact structure = higher semantic density
- Cleaner data = easier for the model to process
It’s a small change, but one that scales beautifully.
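To make the “fewer symbols” point concrete, you can count the structural punctuation each format spends on the same record. This sketch reuses js-yaml’s dump; the set of characters counted is my own rough choice:

const yaml = require("js-yaml");

const record = { invoice_id: "INV-342", vendor: "CodeTerra", amount: 4500, currency: "USD" };

const asJson = JSON.stringify(record, null, 2);
const asYaml = yaml.dump(record);

// Count braces, quotes, and commas: overhead that YAML mostly avoids
const structural = /[{}",]/g;
console.log("JSON structural characters:", (asJson.match(structural) || []).length);
console.log("YAML structural characters:", (asYaml.match(structural) || []).length);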
Key Takeaways
1) Test ideas yourself — reading about a hack is one thing; validating it in your workflow is another.
2) Serialization matters — it affects both readability and cost.
3) Small tweaks multiply — token savings compound quickly at scale.
4) Practical conversion is easy — you can integrate efficiency improvements without changing your existing backend.
Final Thoughts
This experiment showed me that optimization doesn’t always require rewriting models or infrastructure. Sometimes, it’s about noticing tiny inefficiencies that quietly eat resources, like extra characters in a data format.
The “millions saved” might be metaphorical here, but at scale, the idea is real: small changes in AI workflows can have outsized impacts on costs and performance.
And in my case, it all started with a simple experiment inspired by an article — validating it in a live project turned theory into real insight.
