We Scored 100% on the Finance Benchmark. Here's What That Actually Means.

Claude Opus 4.5 scored 67%. Sourcetable scored 100%. On the same benchmark. Here's what the test covers, why domain-specific beats general-purpose, and what it means for your analysis work.

Andrew Grosser

June 1, 2026 • 9 min read

Benchmark scores are easy to misuse. '100%' sounds impressive until you realize you don't know what was being tested. This article breaks down exactly what the Vals.ai finance agent benchmark tests, why general-purpose LLMs like Claude score lower than purpose-built platforms, and what the 33-point gap actually means for financial analysis work.

Quick Comparison

| Benchmark | Sourcetable | Claude Opus 4.5 | What It Tests |
|---|---|---|---|
| Vals.ai Finance | 100% | 67% | Financial analysis agent tasks |
| Rows.com Spreadsheet | 100% | Not tested | Spreadsheet AI tasks |

What the Vals.ai Finance Benchmark Tests

The Vals.ai finance agent benchmark evaluates AI systems on real financial analysis tasks: interpreting financial statements, performing ratio analysis, answering questions about market data, making investment-relevant calculations, and reasoning about financial scenarios. These are the tasks financial analysts do every day — not abstract reasoning puzzles or general knowledge questions.
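To make "ratio analysis" concrete, here is a minimal Python sketch of the kind of calculation involved. It is not drawn from the benchmark itself: the `BalanceSheet` class, the function names, and all figures are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class BalanceSheet:
    """Hypothetical balance-sheet figures, invented for illustration."""
    current_assets: float
    current_liabilities: float
    total_debt: float
    shareholders_equity: float


def current_ratio(bs: BalanceSheet) -> float:
    """Liquidity: current assets divided by current liabilities."""
    return bs.current_assets / bs.current_liabilities


def debt_to_equity(bs: BalanceSheet) -> float:
    """Leverage: total debt divided by shareholders' equity."""
    return bs.total_debt / bs.shareholders_equity


# Made-up numbers, not taken from any real company or benchmark task.
example = BalanceSheet(
    current_assets=500_000,
    current_liabilities=250_000,
    total_debt=300_000,
    shareholders_equity=600_000,
)

print(f"Current ratio:  {current_ratio(example):.2f}")
print(f"Debt-to-equity: {debt_to_equity(example):.2f}")
```

A task of this type is only answered correctly if the right line items are pulled from the statements and the right formula is applied, which is why data access matters as much as reasoning.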

Why Claude Scored 67% and Sourcetable Scored 100%

Claude is a general-purpose language model. It reasons well about many topics including finance, but it lacks financial data access, institutional analysis frameworks, and purpose-built financial reasoning. Sourcetable combines Claude-level language understanding with financial-domain infrastructure: 500+ data APIs, built-in Monte Carlo and factor model implementations, and years of financial domain-specific training and optimization. Domain-specific beats general-purpose on domain-specific tasks.

The Rows.com Spreadsheet Benchmark

Rows.com published a benchmark evaluating AI systems on standard spreadsheet tasks: formula generation, data manipulation, chart creation, and analysis. Sourcetable scored 100% — making it the first AI spreadsheet to achieve perfect scores on both this benchmark and Vals.ai. The fact that we beat Rows.com on their own benchmark reflects the depth of our spreadsheet-specific capabilities.

What These Scores Mean in Practice

A 33-point gap on a financial benchmark isn't abstract: it means 33 more tasks out of every 100 executed correctly, or roughly 49% more correct answers in relative terms (100 / 67 ≈ 1.49). For a financial analyst running complex analysis workflows, that translates to fewer errors, less manual verification, and more confidence in AI-assisted conclusions. General-purpose LLMs are excellent tools. For financial analysis specifically, purpose-built wins.
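The gap can be read two ways: as an absolute difference in percentage points, or as a relative increase in correct tasks. The short Python sketch below works through both using the two scores quoted in this article; the function names are illustrative, not part of any benchmark tooling.

```python
def percentage_point_gap(score_a: float, score_b: float) -> float:
    """Absolute difference between two percentage scores."""
    return score_a - score_b


def relative_improvement(score_a: float, score_b: float) -> float:
    """How much larger score_a is than score_b, as a percent of score_b."""
    return (score_a - score_b) / score_b * 100


# Scores as reported in this article.
sourcetable_score = 100.0
claude_score = 67.0

gap = percentage_point_gap(sourcetable_score, claude_score)
relative = relative_improvement(sourcetable_score, claude_score)

print(f"Gap: {gap:.0f} percentage points")
print(f"Relative improvement: {relative:.1f}%")
```

The first figure says how many more tasks per 100 are completed correctly; the second says how much larger the set of correct answers is compared with the lower score.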

Why Benchmark Scores Matter

What 100% means for you:

  • ✅ Financial analysis tasks complete correctly, not approximately
  • ✅ Domain-specific training on actual financial workflows
  • ✅ 500+ financial APIs ensure data is available, not hallucinated
  • ✅ Institutional analysis tools built-in (not coded ad-hoc)
  • ✅ 33 percentage points above the leading general-purpose LLM

The world's most powerful analytical platform — free to try

100% benchmark scores. 500+ financial APIs. Spreadsheet interface. No coding required.

Where can I see the benchmark results?

The Vals.ai finance benchmark results are published at vals.ai. The Rows.com spreadsheet benchmark is published by Rows.com. Both are independently conducted and publicly available.

Is 100% actually achievable on these benchmarks?

Yes. Sourcetable achieved it. The benchmarks contain a finite set of well-defined tasks. Perfect execution is achievable with a combination of the right AI reasoning and the right financial domain infrastructure.

Do these benchmarks cover my specific use case?

The Vals.ai finance benchmark covers financial statement analysis, ratio calculations, market data interpretation, and investment reasoning. The Rows.com benchmark covers spreadsheet formulas, data manipulation, and analysis. If your work involves either, these scores are directly relevant.
Andrew Grosser

Founder & CTO, Sourcetable

Andrew Grosser is the Founder and CTO of Sourcetable — the world's first AI spreadsheet with 100% benchmark scores, a 1 billion row data lake, and patent-pending secure credential execution.
