Decide Agent Scores 82.5% on SpreadsheetBench, the Global Leaderboard for Excel AI Agents
Decide scored 82.5% on SpreadsheetBench Verified, passing 330/400 tasks with 100% completion.
Posted by Decide R&D Team
We submitted Decide to SpreadsheetBench Verified for independent evaluation.
We scored 82.50% accuracy, passing 330 out of 400 tasks, with a 100% completion rate and an average processing time of ~10 seconds per task.
Before we talk about what that means, it’s worth explaining why this benchmark matters more than any other in the space.
Why SpreadsheetBench Matters
The benchmark was developed by researchers at Tsinghua University and Renmin University and accepted as a spotlight paper at NeurIPS 2024. It is the most difficult and most widely adopted benchmark for evaluating AI agents on spreadsheet tasks. Microsoft used it to evaluate Copilot in Excel. OpenAI used it to benchmark ChatGPT Agent. Anthropic used it to measure Claude. When frontier labs need to prove their agent can handle Excel, this is where they come.
The benchmark consists of real tasks pulled from online Excel forums: not synthetic problems designed in a lab, but actual questions from actual users who were stuck on actual spreadsheets.
The tasks span the full range of what people need from Excel: finding and extracting data, writing complex formulas, manipulating cells across multiple sheets, handling non-standard layouts with merged cells and missing headers, summarizing data that lives across different tables in different formats.
Each task comes with multiple test cases. A solution can't just work on one spreadsheet. It has to work on several variations of the same structure with different data. This is the equivalent of an online judge in competitive programming: your code either passes all cases or it doesn't. No partial credit.
This matters because it filters out brittle solutions. An agent that memorizes patterns or overfits to one file layout will fail. An agent that actually understands spreadsheet structure will generalize.
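To make that scoring rule concrete, here is a minimal sketch of all-or-nothing evaluation in Python. The names and string-based inputs are our own illustration, not the benchmark's actual harness:

```python
from typing import Callable, List, Tuple

# One benchmark task: an instruction plus several (input, expected) variants
# of the same spreadsheet structure filled with different data.
Task = Tuple[str, List[Tuple[str, str]]]

def passes_task(agent: Callable[[str, str], str], task: Task) -> bool:
    """All-or-nothing: a task counts as passed only if the agent's
    output is correct on EVERY test-case variant."""
    instruction, cases = task
    return all(agent(instruction, inp) == expected for inp, expected in cases)

def accuracy(agent: Callable[[str, str], str], tasks: List[Task]) -> float:
    """Overall accuracy is the fraction of tasks fully passed; there is
    no partial credit for passing only some variants of a task."""
    return sum(passes_task(agent, t) for t in tasks) / len(tasks)
```

Under this rule, passing 330 of 400 tasks is exactly what produces the 82.5% headline figure.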
The original benchmark had 912 tasks. In late 2025, the authors collaborated with Shortcut’s Fundamental Research Labs to release SpreadsheetBench Verified, a curated set of 400 tasks. They removed ambiguous instructions, non-deterministic outputs, and tasks that couldn’t be scored reliably. Easier tasks were filtered out. Every remaining task was reviewed through four layers: automated consistency checks, external spreadsheet specialists, internal expert review, and final validation by the original authors.
The Verified set is now the gold standard for rigorous evaluation of spreadsheet AI agents.
The Leaderboard
SpreadsheetBench has two leaderboards: Full (912 tasks) and Verified (400 tasks).
On the Verified leaderboard, the SpreadsheetBench team runs the evaluation independently. You submit API access. They test your system. There is no way to cherry-pick results.
Here’s where things stand on the Verified leaderboard:
| Rank | Agent | Accuracy | Organization |
|---|---|---|---|
| #1 | Nobie Agent | 91.00% | Nobie |
| #2 | Qingqiu Agent | 89.25% | Kingsoft Office |
| #3 | Shortcut.ai | 86.00% | Shortcut.ai |
| #4 | Decide Agent | 82.50% | Decide AI |
Four agents. All above 82%. The entire verified frontier sits between 82.5% and 91%.
This is what the top of spreadsheet AI looks like right now. There is no runaway leader. Decide is part of that frontier, alongside Kingsoft, a publicly traded company with more than 5,000 employees and a $5 billion market cap, and Shortcut, which co-developed the Verified benchmark itself through a dedicated research lab.
For context, on the Full SpreadsheetBench leaderboard (912 tasks), Microsoft Copilot in Excel scored 57.2%, OpenAI's ChatGPT Agent scored 45.5%, and Claude scored 42.9%, all self-reported results. The Verified set we were evaluated on is a curated subset of those same 912 tasks, with easier and ambiguous tasks filtered out.
Our Results in Detail
The SpreadsheetBench team completed their evaluation and shared the following breakdown:
| Metric | Result |
|---|---|
| Overall Accuracy | 82.50% |
| Tasks Passed | 330 / 400 |
| Sheet-Level Accuracy | 88.80% |
| Cell-Level Accuracy | 79.64% |
| Completion Rate | 100% |
| Avg. Processing Time | ~10 seconds |
Two numbers stand out.
88.80% sheet-level accuracy. Sheet-level tasks are the hard ones. They involve multi-table reasoning, cross-sheet references, and structural transformations: the kind of work where you need to understand the whole workbook, not just the cell you're editing. This is where most agents break down.
100% completion rate. Our system produced an output for every single task with no timeouts and no refusals.
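One back-of-the-envelope note (our own inference, not an official breakdown): if overall accuracy is the task-weighted average of the two category accuracies, the three reported numbers imply roughly how the 400 tasks split between categories:

```python
# Our own inference, not an official breakdown: assuming overall accuracy
# is the task-weighted average of the two categories, solve
#   sheet * n + cell * (total - n) = overall * total   for n.
overall, sheet, cell, total = 0.8250, 0.8880, 0.7964, 400

n_sheet = (overall - cell) * total / (sheet - cell)
print(round(n_sheet), total - round(n_sheet))  # ~125 sheet-level, ~275 cell-level
```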
Who We Are
Decide is a three-person team with no venture funding.
Yet our Excel agent achieved independently verified results on SpreadsheetBench Verified, one of the most rigorous public benchmarks for spreadsheet AI.
For comparison:
- Kingsoft has over 5,000 employees and a ~$5B market cap
- Shortcut operates a dedicated research lab
- Microsoft, OpenAI, and Anthropic control hundreds of billions in resources, yet have not submitted to the Verified leaderboard
Decide is one of the few teams whose performance on this benchmark is publicly validated rather than self-reported, and our verified score sits well above the self-reported results that Microsoft, OpenAI, and Anthropic posted on the Full benchmark.
What's Next
This is our first submission to a global benchmark, and we will keep upgrading our system to be best in class.
We are also building a Google Workspace plugin, bringing Decide to where people already do their work. Whether it's Excel, Google Sheets, or the tools in between, our goal is to make spreadsheet analysis easier everywhere.
Try Decide at https://trydecide.ai. For enterprise inquiries or research questions, reach us at https://www.trydecide.ai/support.