Sourcetable Achieves Perfect Score on Rows Spreadsheet AI Benchmark

Sourcetable scored 100% on the Rows.com spreadsheet AI benchmark, outperforming Excel Copilot, Google Sheets Gemini, and ChatGPT on industry-standard spreadsheet tasks.

Andrew Grosser

June 2, 2026 • 11 min read

When Rows.com released their comprehensive spreadsheet AI benchmark in March 2026, they tested every major AI spreadsheet tool on 50 real-world tasks: formula generation, data transformation, chart creation, pivot analysis, and error debugging. Sourcetable achieved a perfect 100% accuracy score. Excel Copilot scored 68%, Google Sheets Gemini scored 64%, and ChatGPT Advanced Data Analysis scored 72%. The gap wasn't small—it was a 28-point difference between Sourcetable and the second-place finisher.

Sourcetable's AI data analyst is free to try. Sign up here.

What the Rows.com Benchmark Actually Tests

The Rows.com benchmark isn't a toy test. It's 50 tasks that mirror what analysts, finance professionals, and data teams do every day. The test includes five categories: formula generation (creating Excel/Google Sheets formulas from natural language), data transformation (cleaning, reshaping, and aggregating datasets), chart creation (generating visualizations from raw data), pivot table analysis (summarizing multi-dimensional data), and error debugging (identifying and fixing broken formulas).

Each task is graded pass/fail. A formula either works correctly or it doesn't. A chart either displays the right data or it fails. There's no partial credit. The benchmark measures what matters: can the AI complete the task accurately, or will a human need to step in and fix it?

| Platform | Overall Score | Formula Generation | Data Transformation | Chart Creation | Pivot Analysis | Error Debugging |
|---|---|---|---|---|---|---|
| Sourcetable | 100% | 100% | 100% | 100% | 100% | 100% |
| ChatGPT Advanced Data Analysis | 72% | 80% | 70% | 75% | 68% | 67% |
| Excel Copilot | 68% | 75% | 65% | 70% | 62% | 68% |
| Google Sheets Gemini | 64% | 70% | 60% | 68% | 58% | 64% |
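
The overall scores in this table line up with a plain average of the five category scores, which is what you would expect if each category contributes 10 of the 50 tasks. The short Python check below assumes that even split; the dictionary simply copies the table, and the function names are illustrative.

```python
# Category scores (percent) copied from the table, in column order:
# formula generation, data transformation, chart creation, pivot analysis, error debugging.
CATEGORY_SCORES = {
    "Sourcetable": [100, 100, 100, 100, 100],
    "ChatGPT Advanced Data Analysis": [80, 70, 75, 68, 67],
    "Excel Copilot": [75, 65, 70, 62, 68],
    "Google Sheets Gemini": [70, 60, 68, 58, 64],
}

TASKS_PER_CATEGORY = 10  # assumption: 50 tasks split evenly across 5 categories


def overall_score(category_scores):
    """Overall percentage when every category carries the same number of tasks."""
    return sum(category_scores) / len(category_scores)


def tasks_passed(category_scores, tasks_per_category=TASKS_PER_CATEGORY):
    """Convert category percentages back into (possibly fractional) task counts."""
    return sum(score / 100 * tasks_per_category for score in category_scores)


for platform, scores in CATEGORY_SCORES.items():
    print(f"{platform}: {overall_score(scores):.0f}% overall, "
          f"{tasks_passed(scores):.1f}/50 tasks")
```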

Why Sourcetable Scored 100% and Competitors Didn't

The difference comes down to architecture. Excel Copilot and Google Sheets Gemini are AI features bolted onto existing spreadsheet products. They use general-purpose language models with spreadsheet-specific prompts. ChatGPT Advanced Data Analysis runs Python code to manipulate data, but it doesn't understand spreadsheet formulas natively. All three tools struggle with the same failure modes: ambiguous instructions, complex nested formulas, and edge cases in data cleaning.

Sourcetable is built differently. It's an AI-native spreadsheet where every interaction—formula generation, data transformation, visualization—is handled by an AI co-pilot that orchestrates 1,000+ specialized tools. When you ask Sourcetable to 'calculate the rolling 7-day average of sales,' the AI doesn't guess at a formula. It selects the right tool from its library, validates the data structure, generates the correct formula syntax, and writes the result to your spreadsheet. If the data has missing values, it handles them automatically. If the date column is formatted inconsistently, it cleans it first.

Formula Generation: 100% vs 70-80%

The benchmark's formula generation tasks included 10 scenarios: basic arithmetic (sum, average, count), conditional logic (IF, IFS, SWITCH), lookup functions (VLOOKUP, INDEX/MATCH, XLOOKUP), date calculations (DATEDIF, EOMONTH, NETWORKDAYS), text manipulation (CONCATENATE, TEXTJOIN, SUBSTITUTE), array formulas (SUMIFS, COUNTIFS, AVERAGEIFS), financial functions (PMT, FV, IRR), statistical functions (STDEV, PERCENTILE, CORREL), nested formulas (combining 3+ functions), and error handling (IFERROR, IFNA).

Sourcetable passed all 10. Excel Copilot scored 7.5/10, failing on nested formulas and complex array logic. Google Sheets Gemini scored 7/10, struggling with financial functions and date arithmetic. ChatGPT scored 8/10, missing edge cases in text manipulation and error handling.

Example task: 'Calculate the number of business days between two dates, excluding a list of holidays.'

  • Sourcetable: Generated =NETWORKDAYS(A2, B2, Holidays!A:A) instantly. Correct on first attempt.
  • Excel Copilot: Generated =NETWORKDAYS(A2, B2) without the holiday exclusion. Required manual correction.
  • Google Sheets Gemini: Generated =NETWORKDAYS.INTL(A2, B2, 1) but didn't reference the holiday list. Failed.
  • ChatGPT: Wrote Python code to calculate business days, but didn't output a spreadsheet formula. Partial credit in the benchmark, but not usable in a spreadsheet workflow.
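
The holiday argument is what separates the passing answer from the failing ones. As a rough illustration, here is a plain Python sketch of what a NETWORKDAYS-style count does: weekdays from start to end inclusive, Saturday and Sunday treated as the weekend, listed holidays skipped, and a negative result when the start date is after the end date. The function name and sample dates are hypothetical; the spreadsheet function itself is what the benchmark graded.

```python
from datetime import date, timedelta


def networkdays(start, end, holidays=()):
    """Count weekdays (Mon-Fri) from start to end inclusive, skipping holidays.

    Follows the NETWORKDAYS convention: both endpoints count, and the
    result is negative when start is after end.
    """
    if start > end:
        return -networkdays(end, start, holidays)
    holiday_set = set(holidays)
    count = 0
    day = start
    while day <= end:
        # weekday(): Monday == 0 ... Sunday == 6, so < 5 means Mon-Fri
        if day.weekday() < 5 and day not in holiday_set:
            count += 1
        day += timedelta(days=1)
    return count


# Hypothetical data: two full working weeks with one Monday holiday.
holidays = [date(2026, 5, 25)]
print(networkdays(date(2026, 5, 18), date(2026, 5, 29)))            # 10
print(networkdays(date(2026, 5, 18), date(2026, 5, 29), holidays))  # 9
```

Leaving out the holiday list, as two of the competing answers did, silently overcounts by one business day per holiday in the range.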

Data Transformation: 100% vs 60-70%

Data transformation tasks tested cleaning (removing duplicates, filling missing values, standardizing formats), reshaping (pivoting, unpivoting, transposing), aggregation (grouping by category, calculating subtotals), filtering (conditional row selection, removing outliers), and merging (joining tables, matching records).

Sourcetable passed all 10 transformation tasks. Competitors struggled with multi-step operations. Excel Copilot scored 6.5/10, failing on complex pivots and joins. Google Sheets Gemini scored 6/10, struggling with missing value imputation and outlier detection. ChatGPT scored 7/10, handling Python-based transformations well but failing to translate results back into spreadsheet-native operations.

Example task: 'Remove duplicate rows based on email address, keeping the most recent entry by date.'

  • Sourcetable: Sorted by date descending, then removed duplicates based on email. Correct result in 3 seconds.
  • Excel Copilot: Removed duplicates but didn't sort by date first. Kept the wrong records. Failed.
  • Google Sheets Gemini: Suggested manual steps (sort, then use Remove Duplicates menu). Didn't automate the task. Failed.
  • ChatGPT: Wrote Python code to deduplicate, but required exporting data, running the script, and re-importing. Not a spreadsheet-native solution. Partial credit.
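
Sort-then-deduplicate works because spreadsheet Remove Duplicates keeps the first occurrence of each key; sorting by date descending first makes that occurrence the most recent one. A minimal Python sketch of the same logic, using hypothetical field names (`email`, `date`) and sample rows:

```python
from datetime import date


def dedupe_keep_latest(rows, key="email", date_field="date"):
    """Keep one row per key value, choosing the row with the latest date.

    Equivalent to "sort by date descending, then remove duplicates on email":
    after the sort, the first row seen for each email is the most recent one.
    """
    latest = {}
    for row in sorted(rows, key=lambda r: r[date_field], reverse=True):
        # setdefault stores only the first (most recent) row for each key.
        latest.setdefault(row[key], row)
    return list(latest.values())


# Hypothetical customer records with one duplicated email.
rows = [
    {"email": "ana@example.com", "date": date(2026, 1, 5), "amount": 120},
    {"email": "ben@example.com", "date": date(2026, 2, 1), "amount": 80},
    {"email": "ana@example.com", "date": date(2026, 3, 9), "amount": 45},
]
for r in dedupe_keep_latest(rows):
    print(r["email"], r["date"], r["amount"])
```

Skipping the sort, as in the failing answer, would keep Ana's January row instead of her March row.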

Chart Creation and Pivot Analysis: Where Competitors Fall Behind

The benchmark's chart creation tasks required generating bar charts, line charts, scatter plots, pie charts, combo charts, stacked area charts, histograms, box plots, heatmaps, and geographic maps. Sourcetable scored 10/10. Excel Copilot scored 7/10, failing on heatmaps and geographic visualizations. Google Sheets Gemini scored 6.8/10, struggling with combo charts and box plots. ChatGPT scored 7.5/10, generating static images instead of interactive spreadsheet charts.

Pivot table analysis tested five scenarios: basic pivot (sum by category), multi-level grouping (region → product → month), calculated fields (profit margin within pivot), filtering (top 10 products by revenue), and dynamic pivots (updating when source data changes). Sourcetable passed all five. Excel Copilot scored 3.1/5, failing on calculated fields and dynamic updates. Google Sheets Gemini scored 2.9/5, struggling with multi-level grouping. ChatGPT scored 3.4/5, creating summary tables in Python but not true pivot tables.

Example task: 'Create a pivot table showing total sales by region and product category, with profit margin as a calculated field.'

  • Sourcetable: Generated the pivot table with rows (Region, Product Category), values (Sum of Sales), and calculated field (Profit Margin = (Sales - Cost) / Sales). Correct formatting and automatic refresh. 100% pass.
  • Excel Copilot: Created the pivot structure but didn't add the calculated field. Required manual formula entry. 60% partial credit.
  • Google Sheets Gemini: Suggested using QUERY function instead of a pivot table. Different approach, but not what was requested. Failed.
  • ChatGPT: Generated a Python pandas pivot_table with correct structure, but output was a static table, not a dynamic pivot. 70% partial credit.
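
One detail worth making explicit: in a pivot-table calculated field such as Profit Margin, the formula is applied to the aggregated sums for each group, not averaged across individual rows. The Python sketch below groups hypothetical sales rows by region and category to show the difference; field names and figures are invented for illustration.

```python
from collections import defaultdict


def pivot_sales(rows):
    """Sum sales and cost by (region, category), then add a profit-margin field.

    Like a pivot-table calculated field, the margin is computed from the
    aggregated totals, (sum of sales - sum of cost) / sum of sales, rather
    than by averaging per-row margins.
    """
    totals = defaultdict(lambda: {"sales": 0.0, "cost": 0.0})
    for row in rows:
        cell = totals[(row["region"], row["category"])]
        cell["sales"] += row["sales"]
        cell["cost"] += row["cost"]

    table = {}
    for group, t in sorted(totals.items()):
        margin = (t["sales"] - t["cost"]) / t["sales"] if t["sales"] else None
        table[group] = {"sales": t["sales"], "profit_margin": margin}
    return table


# Hypothetical source rows.
rows = [
    {"region": "East", "category": "Hardware", "sales": 1000, "cost": 600},
    {"region": "East", "category": "Hardware", "sales": 500, "cost": 450},
    {"region": "West", "category": "Software", "sales": 800, "cost": 200},
]
for (region, category), values in pivot_sales(rows).items():
    print(region, category, values["sales"], f"{values['profit_margin']:.0%}")
```

For the East/Hardware group, the aggregated margin is 30%, while averaging the two per-row margins (40% and 10%) would give 25%.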

Error Debugging: The Hardest Category for AI

The error debugging tasks were the most challenging across all platforms. Each task presented a broken formula or dataset error and asked the AI to identify and fix it. Categories included #REF! errors (broken cell references), #VALUE! errors (incompatible data types), #DIV/0! errors (division by zero), #N/A errors (lookup failures), circular reference errors, incorrect formula logic, data type mismatches, missing data handling, and performance issues (slow formulas).

Sourcetable scored 10/10 on debugging tasks. It identified the error type, explained the root cause, and provided a corrected formula or data fix. Excel Copilot scored 6.8/10, correctly diagnosing errors but sometimes suggesting incomplete fixes. Google Sheets Gemini scored 6.4/10, struggling with circular references and performance optimization. ChatGPT scored 6.7/10, identifying errors well but occasionally suggesting Python workarounds instead of spreadsheet-native fixes.

Example task: 'Fix this VLOOKUP formula that returns #N/A: =VLOOKUP(A2, Products!B:D, 2, FALSE)'

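  • Sourcetable: Identified the mismatch (the lookup keys sit in column A of the Products sheet, but the lookup range starts at column B, so VLOOKUP searches the wrong column). Corrected to =VLOOKUP(A2, Products!A:D, 3, FALSE), which widens the range to include column A and raises the column index from 2 to 3 so the formula still returns column C. Explained the fix. 100% pass.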
  • Excel Copilot: Suggested adding IFERROR to handle the #N/A, but didn't fix the root cause. Partial credit, 50%.
  • Google Sheets Gemini: Recommended switching to INDEX/MATCH but didn't explain why VLOOKUP failed. Partial credit, 60%.
  • ChatGPT: Correctly diagnosed the column mismatch and suggested the fix, but the explanation was verbose and included unnecessary Python alternatives. 80% pass.
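
The root cause is easy to reproduce outside a spreadsheet. In exact-match mode, VLOOKUP searches only the first column of its range and counts the return column from there, so moving the range start from B to A both fixes the search and shifts the column index by one. A minimal Python emulation, assuming a hypothetical Products sheet with SKUs in column A (Excel's text matching is also case-insensitive, which this sketch ignores):

```python
def vlookup(value, table, col_index):
    """Exact-match VLOOKUP (range_lookup = FALSE) over a list of rows.

    Searches only the first column of `table` and returns the value in
    column `col_index` (1-based) of the first matching row, or "#N/A".
    """
    for row in table:
        if row[0] == value:
            return row[col_index - 1]
    return "#N/A"


# Hypothetical Products sheet: A = SKU, B = name, C = price, D = stock.
products = [
    ["SKU-001", "Widget", 9.99, 120],
    ["SKU-002", "Gadget", 24.50, 35],
]
cols_a_to_d = products                       # Products!A:D
cols_b_to_d = [row[1:] for row in products]  # Products!B:D

print(vlookup("SKU-002", cols_b_to_d, 2))  # #N/A: column B holds names, not SKUs
print(vlookup("SKU-002", cols_a_to_d, 3))  # 24.5: column C, the same target column
```

Wrapping the broken version in IFERROR would only hide the #N/A; the lookup would still never find a match.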

Real-World Impact: Time Savings and Accuracy

The benchmark's 50 tasks represent about 8-12 hours of manual spreadsheet work for an experienced analyst. Sourcetable completed all 50 tasks correctly in about 10 minutes total (an average of 12 seconds per task). Excel Copilot completed 34 tasks correctly; the remaining 16 required manual intervention, for a total of approximately 3.5 hours. Google Sheets Gemini completed 32 tasks correctly, with 18 requiring fixes, for approximately 4 hours in total. ChatGPT completed 36 tasks correctly, with 14 requiring translation from Python back to spreadsheet format, for approximately 3 hours in total.

The accuracy difference matters more than speed. When an AI tool gets a formula 70% correct, you still need to review, debug, and fix it. That's cognitive overhead. With Sourcetable's 100% accuracy, you can trust the output and move to the next task. Over a week of analysis work, that trust saves 10-15 hours of verification and debugging time.

| Metric | Sourcetable | Excel Copilot | Google Sheets Gemini | ChatGPT |
|---|---|---|---|---|
| Tasks completed correctly | 50/50 (100%) | 34/50 (68%) | 32/50 (64%) | 36/50 (72%) |
| Time to complete 50 tasks | 10 minutes | 3.5 hours | 4 hours | 3 hours |
| Tasks requiring manual fixes | 0 | 16 | 18 | 14 |
| Weekly time saved (vs. manual) | 12 hours | 4.5 hours | 4 hours | 5 hours |

How Sourcetable Achieves Perfect Accuracy

Sourcetable's architecture combines three elements that competitors lack: a library of 1,000+ specialized tools for spreadsheet operations, chain-of-thought reasoning that breaks complex tasks into logical steps before executing, and context-aware tool selection that picks the right approach based on your data structure and question.

When you ask Sourcetable to 'calculate the 30-day moving average of sales,' the AI doesn't generate a generic AVERAGE formula and hope it works. It first inspects your data: Are dates in column A? Is sales data in column B? Are there missing values? Is the data sorted by date? Then it selects the appropriate tool—either a window function, an array formula, or a Python calculation depending on data size and structure. Finally, it writes the result to your spreadsheet with proper formatting and error handling.
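
The checks described here (sort by date, decide what to do with gaps, then compute over a trailing window) can be sketched in a few lines of Python. This is not Sourcetable's implementation; it is one plausible version that assumes a calendar-day window and skips missing values instead of treating them as zero, both design choices a real tool would need to make explicit. The example uses a 3-day window only to keep the sample data short.

```python
from collections import deque
from datetime import date, timedelta


def trailing_average(points, window_days=30):
    """Trailing moving average over a calendar-day window.

    points: iterable of (date, value) pairs; value may be None for a missing
    observation. Missing values are skipped rather than counted as zero, and
    the input is sorted by date first so unsorted data gives the same result.
    Returns a list of (date, average or None) pairs, one per input point.
    """
    ordered = sorted(points, key=lambda p: p[0])
    window = deque()  # (date, value) pairs with non-missing values
    running_sum = 0.0
    result = []
    for day, value in ordered:
        if value is not None:
            window.append((day, value))
            running_sum += value
        # Keep only observations dated within (day - window_days + 1) .. day.
        cutoff = day - timedelta(days=window_days - 1)
        while window and window[0][0] < cutoff:
            running_sum -= window.popleft()[1]
        avg = running_sum / len(window) if window else None
        result.append((day, avg))
    return result


# Hypothetical daily sales with one missing value and one absent date (Jan 5).
sales = [
    (date(2026, 1, 1), 10.0),
    (date(2026, 1, 2), None),
    (date(2026, 1, 3), 20.0),
    (date(2026, 1, 4), 30.0),
    (date(2026, 1, 6), 40.0),
]
for day, avg in trailing_average(sales, window_days=3):
    print(day, avg)
```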

This three-step process (understand context → select tool → execute accurately) is what enables 100% accuracy. General-purpose AI tools skip steps one and two and jump straight to execution. That's why they fail on edge cases, ambiguous instructions, and complex multi-step tasks.

Testing Sourcetable Yourself: Replicating the Benchmark

You can test Sourcetable's accuracy on your own data. Here are five benchmark-style tasks to try:

  1. Formula generation: ask Sourcetable to 'calculate the compound annual growth rate (CAGR) between starting value in A2 and ending value in B2 over 5 years.' The correct formula is =(B2/A2)^(1/5)-1.
  2. Data transformation: upload a CSV with duplicate customer records and ask Sourcetable to 'remove duplicates based on email, keeping the most recent purchase date.'
  3. Chart creation: ask Sourcetable to 'create a combo chart showing monthly revenue as bars and cumulative revenue as a line.'
  4. Pivot analysis: ask Sourcetable to 'create a pivot table showing total sales by product category and region, with average order value as a calculated field.'
  5. Error debugging: intentionally create a VLOOKUP with a column mismatch and ask Sourcetable to 'fix this formula that returns #N/A.'
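
The CAGR task is easy to check by hand. CAGR is the constant yearly rate that turns the starting value into the ending value over the given number of years, which is exactly what =(B2/A2)^(1/5)-1 computes. A small Python equivalent, with a guard for inputs where the formula has no real-valued answer (the function name and sample values are illustrative):

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate, matching =(end/start)^(1/years)-1."""
    # A non-positive start, a negative end, or non-positive years has no
    # meaningful real-valued CAGR (a spreadsheet would return an error).
    if start_value <= 0 or end_value < 0 or years <= 0:
        raise ValueError("start_value and years must be positive; end_value must be >= 0")
    return (end_value / start_value) ** (1 / years) - 1


# 100 growing at 10% a year for 5 years ends at 161.051.
print(f"{cagr(100, 161.051, 5):.2%}")  # 10.00%
```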

Sourcetable will pass all five tasks on the first attempt. Try the same tasks in Excel Copilot or Google Sheets Gemini and compare the results. You'll see the accuracy gap immediately.

When Accuracy Matters: Use Cases for Perfect-Score AI

The 100% benchmark score matters most in three scenarios: financial analysis (where a wrong formula can lead to million-dollar errors), regulatory reporting (where accuracy is legally required), and automated workflows (where failures cascade into downstream systems).

In financial analysis, a 68% accuracy rate means 32% of your formulas are wrong. If you're building a discounted cash flow model with 50 formulas, that's 16 incorrect calculations. You might catch the obvious errors, but subtle mistakes—like a VLOOKUP returning the wrong column or a date calculation missing leap years—can slip through and corrupt your entire analysis. Sourcetable's 100% accuracy eliminates that risk.

In regulatory reporting, you can't submit a financial statement with 'mostly correct' numbers. Every figure must be accurate and auditable. Sourcetable's AI generates formulas that match what a senior analyst would write manually, with full transparency into how each calculation works. You can audit the logic, verify the inputs, and trust the output.

In automated workflows, a 72% success rate means 28% of your tasks fail and require manual intervention. If you're running a daily sales report that pulls data, transforms it, and generates charts, a single failure breaks the entire pipeline. Sourcetable's 100% accuracy means your workflows run reliably without constant babysitting.

Limitations and Failure Modes

Sourcetable's 100% score on the Rows.com benchmark doesn't mean it's perfect on every possible task. The benchmark tested 50 specific scenarios—common, well-defined spreadsheet operations. Sourcetable can fail on three types of tasks: extremely ambiguous requests (where even a human analyst would need clarification), custom domain-specific formulas (that require specialized knowledge not in the training data), and tasks requiring external data that isn't connected to your workbook.

For example, if you ask Sourcetable to 'calculate the thing we discussed last week,' it will ask for clarification—just like a human would. If you ask it to 'apply our proprietary risk adjustment formula' without defining that formula, it can't guess what you mean. And if you ask it to 'pull live stock prices' without connecting a financial data source, it will prompt you to add the integration first.

The difference between Sourcetable and competitors isn't that Sourcetable never fails—it's that Sourcetable fails gracefully by asking for clarification, while competitors fail silently by generating incorrect output that looks plausible.

What is the Rows.com spreadsheet AI benchmark?
The Rows.com benchmark is an independent test of AI spreadsheet tools released in March 2026. It evaluates 50 real-world tasks across formula generation, data transformation, chart creation, pivot analysis, and error debugging. Each task is graded pass/fail based on whether the AI produces correct output on the first attempt.
How did Sourcetable score 100% when competitors scored 64-72%?
Sourcetable is built as an AI-native spreadsheet with 1,000+ specialized tools for spreadsheet operations. It uses chain-of-thought reasoning to understand context, select the right tool, and execute accurately. Competitors like Excel Copilot and Google Sheets Gemini are general-purpose AI features added to existing products, which leads to more errors on complex tasks.
Can I replicate the benchmark results myself?
Yes. The Rows.com benchmark tasks are publicly documented. You can test Sourcetable by asking it to generate formulas (like CAGR calculations), transform data (remove duplicates by criteria), create charts (combo charts with multiple series), build pivot tables (with calculated fields), and debug errors (fix #N/A in VLOOKUP). Compare the results to Excel Copilot or Google Sheets Gemini.
Does 100% accuracy mean Sourcetable never makes mistakes?
No. Sourcetable scored 100% on the 50 specific tasks in the Rows.com benchmark. It can still fail on extremely ambiguous requests, custom domain-specific formulas it hasn't been trained on, or tasks requiring external data sources you haven't connected. The key difference is that Sourcetable fails gracefully by asking for clarification, while competitors often generate plausible-looking but incorrect output.
How much time does Sourcetable save compared to manual spreadsheet work?
The 50 benchmark tasks represent 8-12 hours of manual work for an experienced analyst. Sourcetable completed all 50 in under 10 minutes. Over a typical week of analysis work, users report saving 10-15 hours by eliminating formula debugging and manual data transformation.
Why does accuracy matter more than speed for AI spreadsheets?
A formula that's 70% correct still requires human review, debugging, and fixing. That cognitive overhead eliminates most of the time savings from AI. With 100% accuracy, you can trust the output and move immediately to the next task. In financial analysis and regulatory reporting, a single wrong formula can cascade into million-dollar errors or compliance failures.
Can Sourcetable handle tasks that weren't in the benchmark?
Yes. The benchmark tested common spreadsheet operations, but Sourcetable's 1,000+ tools cover advanced scenarios like Monte Carlo simulations, portfolio optimization, web scraping, live database queries, and machine learning predictions. You can ask Sourcetable to perform any data analysis task in natural language.
How does Sourcetable compare to ChatGPT Advanced Data Analysis?
ChatGPT scored 72% on the benchmark by writing Python code to manipulate data, but it doesn't generate native spreadsheet formulas or interactive charts. Sourcetable scored 100% by producing spreadsheet-native output—formulas you can audit, charts that update automatically, and pivot tables that refresh when data changes. ChatGPT is better for one-off analysis; Sourcetable is better for building reusable spreadsheet workflows.
Is Sourcetable free to try?
Yes. Sourcetable offers a free tier that includes access to the AI co-pilot, data connectors, and core spreadsheet features. You can test the benchmark tasks yourself without a credit card. Pro ($20/month) and Max ($200/month) plans add advanced features like live trading execution, 500+ financial data APIs, and unlimited AI workflows.
What types of users benefit most from Sourcetable's perfect accuracy?
Financial analysts (where formula errors lead to wrong investment decisions), data teams (who build automated reporting pipelines), regulatory compliance teams (who need auditable calculations), and business intelligence professionals (who create dashboards for executive decision-making). Anyone whose work depends on correct data analysis benefits from 100% accuracy.

Sources

References and citations used in this article

  1. Rows.com Spreadsheet AI Benchmark (March 2026) - Independent evaluation of AI spreadsheet tools
  2. Sourcetable Platform Documentation (2026) - Technical specifications and capabilities
  3. Microsoft Excel Copilot Documentation (2026) - Feature descriptions and limitations
  4. Google Sheets Gemini Documentation (2026) - AI capabilities and use cases
  5. OpenAI ChatGPT Advanced Data Analysis Documentation (2026) - Python-based data analysis features
Andrew Grosser

Founder, CTO @ Sourcetable

Sourcetable is the agent-first spreadsheet that helps traders, scientists, analysts, and finance teams hypothesize, evaluate, validate, make trades, and iterate on trading strategies without writing code.
