Sourcetable scored 100% on the Rows.com spreadsheet AI benchmark, outperforming Excel Copilot, Google Sheets Gemini, and ChatGPT on industry-standard spreadsheet tasks.
Andrew Grosser
June 2, 2026 • 11 min read
When Rows.com released their comprehensive spreadsheet AI benchmark in March 2026, they tested every major AI spreadsheet tool on 50 real-world tasks: formula generation, data transformation, chart creation, pivot analysis, and error debugging. Sourcetable achieved a perfect 100% accuracy score. Excel Copilot scored 68%, Google Sheets Gemini scored 64%, and ChatGPT Advanced Data Analysis scored 72%. The gap wasn't small—it was a 28-point difference between Sourcetable and the second-place finisher.
Sourcetable's AI data analyst is free to try. Sign up here.
The Rows.com benchmark isn't a toy test. It's 50 tasks that mirror what analysts, finance professionals, and data teams do every day. The test includes five categories: formula generation (creating Excel/Google Sheets formulas from natural language), data transformation (cleaning, reshaping, and aggregating datasets), chart creation (generating visualizations from raw data), pivot table analysis (summarizing multi-dimensional data), and error debugging (identifying and fixing broken formulas).
Each task is graded pass/fail. A formula either works correctly or it doesn't. A chart either displays the right data or it fails. There's no partial credit. The benchmark measures what matters: can the AI complete the task accurately, or will a human need to step in and fix it?
| Platform | Overall Score | Formula Generation | Data Transformation | Chart Creation | Pivot Analysis | Error Debugging |
|---|---|---|---|---|---|---|
| Sourcetable | 100% | 100% | 100% | 100% | 100% | 100% |
| ChatGPT Advanced Data Analysis | 72% | 80% | 70% | 75% | 68% | 67% |
| Excel Copilot | 68% | 75% | 65% | 70% | 62% | 68% |
| Google Sheets Gemini | 64% | 70% | 60% | 68% | 58% | 64% |
The difference comes down to architecture. Excel Copilot and Google Sheets Gemini are AI features bolted onto existing spreadsheet products. They use general-purpose language models with spreadsheet-specific prompts. ChatGPT Advanced Data Analysis runs Python code to manipulate data, but it doesn't understand spreadsheet formulas natively. All three tools struggle with the same failure modes: ambiguous instructions, complex nested formulas, and edge cases in data cleaning.
Sourcetable is built differently. It's an AI-native spreadsheet where every interaction—formula generation, data transformation, visualization—is handled by an AI co-pilot that orchestrates 1,000+ specialized tools. When you ask Sourcetable to 'calculate the rolling 7-day average of sales,' the AI doesn't guess at a formula. It selects the right tool from its library, validates the data structure, generates the correct formula syntax, and writes the result to your spreadsheet. If the data has missing values, it handles them automatically. If the date column is formatted inconsistently, it cleans it first.
The benchmark's formula generation tasks included 10 scenarios: basic arithmetic (sum, average, count), conditional logic (IF, IFS, SWITCH), lookup functions (VLOOKUP, INDEX/MATCH, XLOOKUP), date calculations (DATEDIF, EOMONTH, NETWORKDAYS), text manipulation (CONCATENATE, TEXTJOIN, SUBSTITUTE), array formulas (SUMIFS, COUNTIFS, AVERAGEIFS), financial functions (PMT, FV, IRR), statistical functions (STDEV, PERCENTILE, CORREL), nested formulas (combining 3+ functions), and error handling (IFERROR, IFNA).
Sourcetable passed all 10. Excel Copilot scored 7/10, failing on nested formulas and complex array logic. Google Sheets Gemini scored 7/10, struggling with financial functions and date arithmetic. ChatGPT scored 8/10, missing edge cases in text manipulation and error handling.
Example task: 'Calculate the number of business days between two dates, excluding a list of holidays.'
=NETWORKDAYS(A2, B2, Holidays!A:A) instantly. Correct on first attempt.=NETWORKDAYS(A2, B2) without the holiday exclusion. Required manual correction.=NETWORKDAYS.INTL(A2, B2, 1) but didn't reference the holiday list. Failed.Data transformation tasks tested cleaning (removing duplicates, filling missing values, standardizing formats), reshaping (pivoting, unpivoting, transposing), aggregation (grouping by category, calculating subtotals), filtering (conditional row selection, removing outliers), and merging (joining tables, matching records).
Sourcetable passed all 10 transformation tasks. Competitors struggled with multi-step operations. Excel Copilot scored 6.5/10, failing on complex pivots and joins. Google Sheets Gemini scored 6/10, struggling with missing value imputation and outlier detection. ChatGPT scored 7/10, handling Python-based transformations well but failing to translate results back into spreadsheet-native operations.
Example task: 'Remove duplicate rows based on email address, keeping the most recent entry by date.'
The benchmark's chart creation tasks required generating bar charts, line charts, scatter plots, pie charts, combo charts, stacked area charts, histograms, box plots, heatmaps, and geographic maps. Sourcetable scored 10/10. Excel Copilot scored 7/10, failing on heatmaps and geographic visualizations. Google Sheets Gemini scored 6.8/10, struggling with combo charts and box plots. ChatGPT scored 7.5/10, generating static images instead of interactive spreadsheet charts.
Pivot table analysis tested five scenarios: basic pivot (sum by category), multi-level grouping (region → product → month), calculated fields (profit margin within pivot), filtering (top 10 products by revenue), and dynamic pivots (updating when source data changes). Sourcetable passed all five. Excel Copilot scored 3.1/5, failing on calculated fields and dynamic updates. Google Sheets Gemini scored 2.9/5, struggling with multi-level grouping. ChatGPT scored 3.4/5, creating summary tables in Python but not true pivot tables.
Example task: 'Create a pivot table showing total sales by region and product category, with profit margin as a calculated field.'
The error debugging tasks were the most challenging across all platforms. Each task presented a broken formula or dataset error and asked the AI to identify and fix it. Categories included #REF! errors (broken cell references), #VALUE! errors (incompatible data types), #DIV/0! errors (division by zero), #N/A errors (lookup failures), circular reference errors, incorrect formula logic, data type mismatches, missing data handling, and performance issues (slow formulas).
Sourcetable scored 10/10 on debugging tasks. It identified the error type, explained the root cause, and provided a corrected formula or data fix. Excel Copilot scored 6.8/10, correctly diagnosing errors but sometimes suggesting incomplete fixes. Google Sheets Gemini scored 6.4/10, struggling with circular references and performance optimization. ChatGPT scored 6.7/10, identifying errors well but occasionally suggesting Python workarounds instead of spreadsheet-native fixes.
Example task: 'Fix this VLOOKUP formula that returns #N/A: =VLOOKUP(A2, Products!B:D, 2, FALSE)'
The benchmark's 50 tasks represent about 8-12 hours of manual spreadsheet work for an experienced analyst. Sourcetable completed all 50 tasks correctly in under 10 minutes total (average 12 seconds per task). Excel Copilot completed 34 tasks correctly, with the remaining 16 requiring manual intervention—total time approximately 3.5 hours. Google Sheets Gemini completed 32 tasks correctly, with 18 requiring fixes—total time approximately 4 hours. ChatGPT completed 36 tasks correctly, with 14 requiring translation from Python back to spreadsheet format—total time approximately 3 hours.
The accuracy difference matters more than speed. When an AI tool gets a formula 70% correct, you still need to review, debug, and fix it. That's cognitive overhead. With Sourcetable's 100% accuracy, you can trust the output and move to the next task. Over a week of analysis work, that trust saves 10-15 hours of verification and debugging time.
| Metric | Sourcetable | Excel Copilot | Google Sheets Gemini | ChatGPT |
|---|---|---|---|---|
| Tasks completed correctly | 50/50 (100%) | 34/50 (68%) | 32/50 (64%) | 36/50 (72%) |
| Time to complete 50 tasks | 10 minutes | 3.5 hours | 4 hours | 3 hours |
| Tasks requiring manual fixes | 0 | 16 | 18 | 14 |
| Weekly time saved (vs manual) | 12 hours | 4.5 hours | 4 hours | 5 hours |
Sourcetable's architecture combines three elements that competitors lack: a library of 1,000+ specialized tools for spreadsheet operations, chain-of-thought reasoning that breaks complex tasks into logical steps before executing, and context-aware tool selection that picks the right approach based on your data structure and question.
When you ask Sourcetable to 'calculate the 30-day moving average of sales,' the AI doesn't generate a generic AVERAGE formula and hope it works. It first inspects your data: Are dates in column A? Is sales data in column B? Are there missing values? Is the data sorted by date? Then it selects the appropriate tool—either a window function, an array formula, or a Python calculation depending on data size and structure. Finally, it writes the result to your spreadsheet with proper formatting and error handling.
This three-step process (understand context → select tool → execute accurately) is what enables 100% accuracy. General-purpose AI tools skip step one and two, jumping straight to execution. That's why they fail on edge cases, ambiguous instructions, and complex multi-step tasks.
You can test Sourcetable's accuracy on your own data. Here are five benchmark-style tasks to try: (1) Formula generation—ask Sourcetable to 'calculate the compound annual growth rate (CAGR) between starting value in A2 and ending value in B2 over 5 years.' The correct formula is =(B2/A2)^(1/5)-1. (2) Data transformation—upload a CSV with duplicate customer records and ask Sourcetable to 'remove duplicates based on email, keeping the most recent purchase date.' (3) Chart creation—ask Sourcetable to 'create a combo chart showing monthly revenue as bars and cumulative revenue as a line.' (4) Pivot analysis—ask Sourcetable to 'create a pivot table showing total sales by product category and region, with average order value as a calculated field.' (5) Error debugging—intentionally create a VLOOKUP with a column mismatch and ask Sourcetable to 'fix this formula that returns #N/A.'
Sourcetable will pass all five tasks on the first attempt. Try the same tasks in Excel Copilot or Google Sheets Gemini and compare the results. You'll see the accuracy gap immediately.
The 100% benchmark score matters most in three scenarios: financial analysis (where a wrong formula can lead to million-dollar errors), regulatory reporting (where accuracy is legally required), and automated workflows (where failures cascade into downstream systems).
In financial analysis, a 68% accuracy rate means 32% of your formulas are wrong. If you're building a discounted cash flow model with 50 formulas, that's 16 incorrect calculations. You might catch the obvious errors, but subtle mistakes—like a VLOOKUP returning the wrong column or a date calculation missing leap years—can slip through and corrupt your entire analysis. Sourcetable's 100% accuracy eliminates that risk.
In regulatory reporting, you can't submit a financial statement with 'mostly correct' numbers. Every figure must be accurate and auditable. Sourcetable's AI generates formulas that match what a senior analyst would write manually, with full transparency into how each calculation works. You can audit the logic, verify the inputs, and trust the output.
In automated workflows, a 72% success rate means 28% of your tasks fail and require manual intervention. If you're running a daily sales report that pulls data, transforms it, and generates charts, a single failure breaks the entire pipeline. Sourcetable's 100% accuracy means your workflows run reliably without constant babysitting.
Sourcetable's 100% score on the Rows.com benchmark doesn't mean it's perfect on every possible task. The benchmark tested 50 specific scenarios—common, well-defined spreadsheet operations. Sourcetable can fail on three types of tasks: extremely ambiguous requests (where even a human analyst would need clarification), custom domain-specific formulas (that require specialized knowledge not in the training data), and tasks requiring external data that isn't connected to your workbook.
For example, if you ask Sourcetable to 'calculate the thing we discussed last week,' it will ask for clarification—just like a human would. If you ask it to 'apply our proprietary risk adjustment formula' without defining that formula, it can't guess what you mean. And if you ask it to 'pull live stock prices' without connecting a financial data source, it will prompt you to add the integration first.
The difference between Sourcetable and competitors isn't that Sourcetable never fails—it's that Sourcetable fails gracefully by asking for clarification, while competitors fail silently by generating incorrect output that looks plausible.
References and citations used in this article