Nobody quantifies the cost of second-tier AI: slower outputs, weaker reasoning, financial models that are almost-right. For serious analytical work, running on anything less than the frontier (GPT-5) means leaving real value on the table.
Andrew Grosser
June 3, 2026 • 8 min read
When people talk about AI model quality, they usually talk about benchmark scores and model releases. What they rarely talk about is the quiet, ongoing cost of using a model that's good — but not the best. It doesn't announce itself. There's no error message that says 'this answer would have been 15% more accurate on GPT-5.' The cost is invisible, which makes it easy to ignore. And that's exactly why it compounds.
Second-tier AI isn't bad. That's what makes it insidious. Claude Opus 4.5 is an excellent model. Gemini Ultra is impressive. GPT-4o was transformative when it launched. On most tasks, for most users, most of the time, these models produce useful output.
But 'useful output' and 'best available output' are different things. On the Vals.ai finance benchmark — real financial analysis tasks designed to mirror professional work — Claude Opus 4.5 scored 67%. Sourcetable, running on frontier AI, scored 100%. That 33-point gap doesn't mean Claude is giving you wrong answers. It means that on a third of financial analysis tasks, the best available model is doing something the second-tier one isn't.
In a chat window, you might not notice. In a financial model driving a $10M decision, you might notice too late.
The gap between frontier and second-tier models isn't evenly distributed. It's widest on complex, multi-step analytical tasks: DCF valuations with many assumptions, portfolio optimization across many constraints, natural language queries that require precise SQL generation, regression analysis on noisy data. These are exactly the tasks where analysts need AI most — and where the quality of the model matters most.
For simpler tasks — summarizing a document, writing a straightforward formula, answering a factual question — the gap is narrow. The frontier matters most precisely when the task is hardest. Which means the cost of running on second-tier AI is highest exactly when the stakes are highest.
A single analytical error in isolation is manageable. You catch it in review, correct it, move on. But analytical work builds on itself. A slightly off assumption in a revenue model gets used as an input to a hiring plan, which gets used as an input to a budget, which gets used as an input to a fundraising narrative. By the time the original error surfaces, it's embedded in four downstream documents.
Frontier AI isn't a guarantee against errors — no model is. But better reasoning, more accurate formula generation, and sharper natural language queries reduce the error rate across the entire chain. Over a year of daily analytical work, that reduction compounds into substantially better outcomes.
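To make the compounding concrete, here is a minimal sketch of the arithmetic. It assumes each step in a chain is correct independently with the same probability, which is a simplification, and the accuracy figures (0.95 and 0.98) are illustrative placeholders, not measured values for any model.

```python
def chain_success_probability(step_accuracy: float, steps: int) -> float:
    """Probability that every step in a chain is correct.

    Assumes each step succeeds independently with the same probability,
    so the chance that all of them succeed is step_accuracy ** steps.
    """
    if not 0.0 <= step_accuracy <= 1.0:
        raise ValueError("step_accuracy must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return step_accuracy ** steps


# Four chained documents: revenue model, hiring plan, budget, fundraising narrative.
CHAIN_LENGTH = 4

# Illustrative per-step accuracies (hypothetical, not benchmark results).
for accuracy in (0.95, 0.98):
    clean = chain_success_probability(accuracy, CHAIN_LENGTH)
    print(f"per-step accuracy {accuracy:.2f} -> error-free chain {clean:.1%}")
```

Under these assumptions, a three-point difference in per-step accuracy becomes a gap of roughly eleven points across the four-document chain (about 81% versus 92% error-free), and the gap widens as the chain gets longer.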
Even if you know frontier models matter, the selection problem persists. Which model is the frontier right now? GPT-5 is leading today — but when did that become true, and how confident are you? Six months ago Claude was widely considered better for reasoning tasks. Three months ago GPT-4o was the consensus pick for speed and accuracy. The leaderboard changes faster than most people track it.
The only safe answer is to not track it yourself. Use a platform that handles the selection automatically — always routing to whichever model is currently leading — so you never have to think about whether you're on the frontier or falling behind.
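As a rough illustration of what automatic selection could look like, the sketch below picks whichever model has the highest score in a leaderboard snapshot. It is not a description of how any particular platform implements routing: the `ModelScore` type, the `pick_leading_model` function, and the model names and scores are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelScore:
    """One leaderboard entry (hypothetical structure)."""
    name: str
    score: float  # benchmark score, higher is better


def pick_leading_model(leaderboard: list[ModelScore]) -> ModelScore:
    """Return the entry with the highest score in the snapshot."""
    if not leaderboard:
        raise ValueError("leaderboard is empty")
    return max(leaderboard, key=lambda entry: entry.score)


# Placeholder names and scores, made up for the example.
snapshot = [
    ModelScore("model-a", 0.81),
    ModelScore("model-b", 0.93),
    ModelScore("model-c", 0.88),
]

leader = pick_leading_model(snapshot)
print(f"route requests to: {leader.name}")
```

When the snapshot is refreshed and a different model takes the top score, the same call returns the new leader, so nothing downstream has to change.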
Sourcetable is the first AI spreadsheet purpose-built around frontier intelligence. Not 'AI-assisted' features added to a traditional spreadsheet — an analytical platform where frontier AI is the core interface. You ask questions in plain English and get analysis of your actual data. You describe a financial model and get a working spreadsheet. You ask for a dashboard and get one that pulls live from Salesforce, Stripe, or Postgres.
And because Sourcetable connects to 100+ data sources — not uploads in a chat window — the analysis compounds over time. Better models make your existing work better automatically. The revenue model you built six months ago runs on GPT-5 today. When something better than GPT-5 arrives, it'll run on that. You never fall behind. You never think about it.
The case for always-frontier AI: