
10 Seconds to 2: How We Stopped Using AI Wrong

Stoa Logistics

StoaPack is our 3D bin packing API. You send it items and boxes, it tells you how to pack everything efficiently. The core algorithm is deterministic. No AI. Request in, math runs, answer out in milliseconds.

But we also have custom instructions. Natural language packing rules like “keep the candles away from the chocolate” or “put all the sample packs in the small mailers.” Things that are obvious to a human and impossible to express as a standard API parameter.
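
For illustration only, a request with a custom instruction might look something like this. The endpoint path and field names are guesses for the sketch, not StoaPack's documented schema:

import requests

# Illustrative request sketch: endpoint and field names are assumptions.
payload = {
    "items": [
        {"id": "item_47", "name": "scented candle", "quantity": 2},
        {"id": "item_22", "name": "chocolate bar", "quantity": 5},
    ],
    "boxes": [{"id": "box_medium", "dimensions_mm": [300, 200, 150]}],
    "custom_instructions": "keep the candles away from the chocolate",
}

response = requests.post(
    "https://stoapack.stoalogistics.com/v1/pack", json=payload, timeout=30
)
print(response.json())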

For that, we needed AI. The first version worked. Then we checked our logs and noticed the AI-enabled requests were taking 10 seconds.

Two Calls, One Problem

The recommended approach was to send the entire packing problem to a large language model. Items, boxes, warehouse inventory, custom instructions, everything. The LLM generates a complete packing solution. Then a second LLM call reviews the result, because trusting a language model to do spatial optimization unsupervised felt optimistic.

Generate and review. Two calls. Four to six seconds each.
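
In sketch form, the old flow looked roughly like this. The model name and prompts are placeholders; the point is the two sequential round trips:

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

def pack_with_ai_v1(items, boxes, instructions):
    # Call 1: a large model generates a complete packing solution (~4-6 s).
    solution = client.chat.completions.create(
        model="gpt-4o",  # stand-in for "a flagship model"
        messages=[{
            "role": "user",
            "content": f"Pack these items into these boxes. Reply as JSON.\n"
                       f"Items: {items}\nBoxes: {boxes}\nRules: {instructions}",
        }],
        response_format={"type": "json_object"},
    ).choices[0].message.content
    # Call 2: a second call reviews the first call's output (~4-6 s).
    review = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": f"Review this packing solution for mistakes: {solution}",
        }],
    ).choices[0].message.content
    return solution, review

Two round trips to a large model, strictly in sequence. The latency adds up no matter how good the prompts are.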

For context, the non-AI path returns in milliseconds. If you’re integrating a packing API into a fulfillment workflow processing hundreds of orders an hour, 10 seconds per request is not a feature. It’s a bug.

The Wrong Job

We already had an algorithm that handles 3D spatial optimization, weight distribution, fragile item placement, multi-warehouse routing, and hazmat segregation. In milliseconds. Because it’s math.

And we were asking a language model to do that same work. Generate-and-review is a solid pattern when AI is doing the core thinking. In our case, the core thinking was already solved. The only thing the LLM could do that our algorithm couldn’t was read English.

That was the whole insight: we didn’t need an AI packer. We needed an AI translator.

Translator, Not Solver

One LLM call now. Its only job is to read natural language and output structured constraints:

{
  "separation_rules": [
    {
      "group_a": ["item_47"],
      "group_b": ["item_22", "item_23"],
      "reason": "candles away from chocolate"
    }
  ]
}

The LLM reads English. Writes JSON. Done. The deterministic algorithm takes those constraints and does what it always does, except now it knows candles and chocolate aren’t friends.
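
To make that concrete, here's a minimal sketch of the kind of check a deterministic packer can run at placement time. It's an illustration of the idea, not our actual algorithm:

def violates_separation(box_item_ids, candidate_id, separation_rules):
    """Return True if adding candidate_id would mix separated groups."""
    for rule in separation_rules:
        group_a, group_b = set(rule["group_a"]), set(rule["group_b"])
        if candidate_id in group_a and box_item_ids & group_b:
            return True
        if candidate_id in group_b and box_item_ids & group_a:
            return True
    return False

rules = [{"group_a": ["item_47"], "group_b": ["item_22", "item_23"]}]
violates_separation({"item_22"}, "item_47", rules)  # True: no shared box

Pure set arithmetic, so enforcing the constraints adds effectively nothing to the millisecond-scale packing time.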

No review call. If a constraint references an item that doesn’t exist or asks for something impossible, a code validation layer catches it and falls back to default packing. That’s a software problem, not an AI problem.
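
Here's roughly what that layer looks like, with illustrative names (the real one also checks that constraints are physically satisfiable):

def validate_constraints(constraints, known_item_ids):
    """Keep rules whose item IDs all exist; report the rest with reasons."""
    valid, dropped = [], []
    for rule in constraints.get("separation_rules", []):
        referenced = set(rule["group_a"]) | set(rule["group_b"])
        unknown = referenced - known_item_ids
        if unknown:
            dropped.append({"rule": rule, "reason": f"unknown ids: {sorted(unknown)}"})
        else:
            valid.append(rule)
    return valid, dropped

# If nothing survives validation, the caller runs default packing.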

Where the AI Actually Earns Its Keep

The hard part is fuzzy matching. A customer writes “put all the pumpkin stickers in purple mailers.” In their inventory, that’s STCKR-PMPKN-3IN and POLY-MAILER-PURP-10x13.

The LLM sees the full item and box inventory in its prompt, makes the semantic match, and outputs actual IDs. A rules engine would choke on this. A tiny language model handles it in under a second.

We switched from a flagship model to GPT-4.1-nano, OpenAI’s smallest option. For classification and extraction, it’s more than enough. For solving 3D packing problems, nothing is enough. Good thing we stopped asking.
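
For illustration, the whole translation step fits in one call. The prompt wording and output schema here are invented for the sketch, not a spec:

from openai import OpenAI

client = OpenAI()

def translate_instructions(instructions, items, boxes):
    # One small-model call: English in, constraint JSON out, under a second.
    prompt = (
        "Translate these packing instructions into separation constraints.\n"
        f"Item inventory: {items}\nBox inventory: {boxes}\n"
        f"Instructions: {instructions}\n"
        'Reply as JSON: {"separation_rules": '
        '[{"group_a": [], "group_b": [], "reason": ""}]}'
    )
    response = client.chat.completions.create(
        model="gpt-4.1-nano",
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},
    )
    return response.choices[0].message.content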

The Numbers

Before: two LLM calls, ~10 seconds, large model. After: one LLM call, ~2 seconds, nano model.

About 80% faster. About 90% cheaper. More reliable, because the packing solution still comes from the same deterministic algorithm that handles every other request.

We also added full observability. Every response with custom instructions now includes timing breakdowns and constraint diagnostics: what the LLM interpreted, what got validated, what got dropped and why. If the AI misread your instructions, you can see exactly where.
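
As a hypothetical example of the shape (field names invented for this sketch, not our documented schema):

# Hypothetical diagnostics payload; field names invented for this sketch.
diagnostics = {
    "timings_ms": {"llm_translation": 900, "validation": 2, "packing": 14},
    "constraints": {
        "interpreted": [{"group_a": ["item_47"], "group_b": ["item_22"]}],
        "validated": 1,
        "dropped": [],  # each dropped rule carries a reason string
    },
}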

The Takeaway

If you’re building with AI, it’s worth asking regularly: what job is the AI actually doing here?

LLMs are general-purpose enough that they’ll produce something for almost any input. That’s impressive, and it’s a trap. “It works” and “it’s the right architecture” are different statements.

In our case, the AI’s job was never to solve the packing problem. It was to bridge the gap between how humans think (“keep these apart”) and how algorithms think (separation_rules: [{group_a, group_b}]). Translation, not computation. Once we got that right, everything else followed.


StoaPack is a 3D bin packing API for e-commerce fulfillment. Custom instructions, hazmat segregation, multi-warehouse optimization. If you’re curious: stoapack.stoalogistics.com
