Artificial intelligence just took a big step closer to mimicking how humans think. Researchers at Sapient Intelligence, a Singapore-based AI company, have unveiled a new model, the Hierarchical Reasoning Model (HRM), which they claim outperforms even the most advanced large language models (LLMs), such as ChatGPT, on reasoning benchmarks.
This breakthrough is more than just another model—it represents a different way of thinking for AI, inspired by the human brain itself.
What Is the Hierarchical Reasoning Model (HRM)?
Unlike LLMs, which rely on billions or trillions of parameters to generate responses, HRM is designed to reason the way humans do—by integrating information at multiple levels and timescales.
The researchers explain that HRM mirrors the brain’s approach to processing information:
- High-level module: handles abstract planning over longer timescales.
- Low-level module: focuses on rapid, detail-oriented computations.
Instead of the chain-of-thought (CoT) method—breaking problems into smaller natural language steps—HRM performs sequential reasoning in a single forward pass. It uses iterative refinement, short bursts of “thinking” that decide whether to keep processing or finalize an answer.
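To make the division of labor concrete, here is a minimal sketch of the idea in PyTorch. This is not Sapient's code: the module sizes, step counts, the GRU cells, and the simple threshold-based halting rule are all illustrative assumptions standing in for HRM's actual components.

```python
import torch
import torch.nn as nn

class TwoTimescaleReasoner(nn.Module):
    """Toy two-timescale reasoner: a slow planner plus a fast worker."""

    def __init__(self, dim=128, inner_steps=4, max_outer_steps=8):
        super().__init__()
        self.high = nn.GRUCell(dim, dim)    # slow module: abstract planning
        self.low = nn.GRUCell(dim, dim)     # fast module: detailed computation
        self.halt = nn.Linear(dim, 1)       # decides: keep thinking or stop?
        self.readout = nn.Linear(dim, dim)  # maps the final state to an answer
        self.inner_steps = inner_steps
        self.max_outer_steps = max_outer_steps

    def forward(self, x):
        h_high = torch.zeros_like(x)  # planner state
        h_low = torch.zeros_like(x)   # worker state
        for _ in range(self.max_outer_steps):
            # Fast module: several rapid refinement steps, conditioned on the plan.
            for _ in range(self.inner_steps):
                h_low = self.low(x + h_high, h_low)
            # Slow module: one update that integrates the refined detail work.
            h_high = self.high(h_low, h_high)
            # Halting decision: stop "thinking" once confidence is high enough.
            # (Simplified: a real adaptive scheme would halt per sample.)
            if torch.sigmoid(self.halt(h_high)).mean() > 0.5:
                break
        return self.readout(h_high)

# One forward pass contains the entire iterative reasoning process.
model = TwoTimescaleReasoner()
answer = model(torch.randn(2, 128))
```

The key point of the sketch: the inner loop (fast detail work) and the outer loop (slow planning plus the halting check) both live inside a single `forward` call, rather than being spelled out as separate text-generation steps.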
Here’s the kicker: HRM has only 27 million parameters and was trained on just 1,000 samples. For comparison, GPT-5 is estimated to have between 3 trillion and 5 trillion parameters.
Benchmark Results: HRM vs. Leading AI Models
To test its reasoning power, HRM was evaluated on the ARC-AGI benchmark, a notoriously difficult test designed to measure progress toward Artificial General Intelligence (AGI).
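ARC-AGI tasks are small colored-grid puzzles: a handful of input/output examples demonstrate a hidden transformation rule, and the solver must apply that rule to a new input. Rendered as a Python dict (the grids and the rule here are invented for illustration), a task looks roughly like this:

```python
# One ARC-AGI-style task, simplified. Grids are matrices of color codes (0-9).
# The hidden rule in this made-up example: swap colors 0 and 1.
task = {
    "train": [  # demonstration pairs that reveal the rule
        {"input": [[0, 1], [1, 0]], "output": [[1, 0], [0, 1]]},
        {"input": [[0, 0], [1, 1]], "output": [[1, 1], [0, 0]]},
    ],
    "test": [   # the solver must produce the output for this input
        {"input": [[1, 1], [0, 1]]}  # correct answer: [[0, 0], [1, 0]]
    ],
}
```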
Here’s how HRM performed compared to popular LLMs:
- ARC-AGI-1 Benchmark:
  - HRM: 40.3%
  - o3-mini-high (OpenAI): 34.5%
  - Claude 3.7 (Anthropic): 21.2%
  - DeepSeek R1: 15.8%
- ARC-AGI-2 Benchmark (tougher):
  - HRM: 5.0%
  - o3-mini-high: 3.0%
  - DeepSeek R1: 1.3%
  - Claude 3.7: 0.9%
These margins are striking, especially given HRM's dramatically smaller size. Beyond ARC-AGI, HRM achieved near-perfect performance on complex Sudoku puzzles and outperformed LLMs at maze path-finding, two problem types where conventional models tend to fail.
Why HRM Works Differently
Most advanced LLMs depend on chain-of-thought reasoning. While powerful, CoT is limited by:
- Brittle task decomposition – a solution can fall apart if a single step is mis-specified.
- Heavy data requirements – learning to decompose tasks well demands massive training sets.
- High latency – generating and processing each intermediate step adds response time.
HRM avoids these bottlenecks by blending planning and rapid computation into one process. Think of it as the brain’s prefrontal cortex working with sensory systems to solve problems—fast, flexible, and efficient.
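In schematic terms, the difference shows up in where the loop lives. The sketch below is a rough contrast, not real inference code; `llm` and `reasoner` are hypothetical callables standing in for the two kinds of model:

```python
def cot_solve(llm, problem, max_steps=10):
    """Chain-of-thought: spell out reasoning one text step at a time."""
    transcript = problem
    for _ in range(max_steps):
        step = llm(transcript)        # one full model call per reasoning step
        transcript += "\n" + step
        if "Answer:" in step:         # stop once the model commits to an answer
            break
    return transcript

def hrm_style_solve(reasoner, problem):
    """HRM-style: all iterative refinement happens inside one forward pass."""
    return reasoner(problem)          # single call; the loop is internal
```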
The Catch: Early Days and Questions
While the results are exciting, the research is still a preprint and has not been peer-reviewed. More intriguingly, when the ARC-AGI organizers reproduced the results, they found that the hierarchical architecture itself had less impact than expected; much of HRM's boost came instead from a refinement process applied during training.
This raises critical questions:
- Is HRM’s architecture the real breakthrough, or is its training method the secret sauce?
- Can this approach scale to larger models and real-world applications?
- Will brain-inspired designs become the new frontier for AI?
Why This Matters
If confirmed, HRM could mark a paradigm shift in AI development. Instead of relying on brute-force scale (trillions of parameters, enormous energy costs), models could achieve better reasoning with fewer resources.
That would mean:
- More efficient AI – less energy-hungry than today’s LLMs.
- Smarter reasoning – solving puzzles and logic tasks beyond current models.
- Closer steps toward AGI – a model that thinks more like us.
For now, HRM is still experimental. But its success on ARC-AGI shows that the next wave of AI may come not from scaling up, but from rethinking how machines reason—like the human brain.
Final Thought: AI has learned to speak like us, but can it learn to think like us? The Hierarchical Reasoning Model might be the first serious attempt at answering “yes.”