March 2026
NanoGPT Slowrun is an open effort to implement data-efficient learning algorithms: 5.5x data efficiency in the first week, and improving.
Compute grows much faster than data

Last week we released NanoGPT Slowrun
What we’ve found so far
Muon outperforms every other optimizer we tested (AdamW, SOAP, MAGMA), multi-epoch training matters, and we are building on work by Kotha et al.
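For context, the core idea behind Muon is to orthogonalize the momentum matrix before applying it as an update. A minimal numpy sketch is below; it uses the classic cubic Newton–Schulz iteration for orthogonalization (production implementations use a tuned higher-order polynomial), and the learning rate and momentum values are illustrative, not the project's settings.

```python
import numpy as np

def newton_schulz_orthogonalize(g, steps=5):
    """Approximately orthogonalize a matrix via the cubic
    Newton-Schulz iteration X <- 1.5*X - 0.5*X(X^T)X.
    Scaling by the Frobenius norm first keeps the iteration stable."""
    x = g / (np.linalg.norm(g) + 1e-7)
    for _ in range(steps):
        x = 1.5 * x - 0.5 * (x @ x.T) @ x
    return x

def muon_step(w, grad, momentum, lr=0.02, beta=0.95):
    """One Muon-style update on a 2-D weight matrix:
    accumulate momentum, orthogonalize it, then step."""
    momentum = beta * momentum + grad
    w = w - lr * newton_schulz_orthogonalize(momentum)
    return w, momentum
```

Because the update is (approximately) orthogonal, every direction in the weight matrix moves at a similar rate regardless of the raw gradient's singular-value spread.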
Update: 5.5x Data Efficiency
Since the initial release, community contributions have pushed data efficiency from ~2.4x to 5.5x against modded-nanogpt, more than doubling in a few days. The key changes:
- Shuffling the data at the start of each epoch, which had an outsized impact on multi-epoch training
- Learned projections for value embeddings instead of separate embedding tables
- Swapping squared ReLU for SwiGLU activation
- Ensembling multiple models
10x data efficiency seems reachable in the short term. 100x might be feasible by the end of the year, given how many directions remain unexplored, but it will require serious exploration on the algorithms side.
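The activation swap mentioned above can be sketched concretely. This is a generic numpy illustration of the two activations, not the project's code; the gating layout follows the standard SwiGLU formulation, where separate gate and up projections replace the single up projection of a plain MLP.

```python
import numpy as np

def squared_relu(x):
    """Squared ReLU: max(x, 0)^2, the activation being replaced."""
    return np.maximum(x, 0.0) ** 2

def swiglu(x, w_gate, w_up):
    """SwiGLU: silu(x @ w_gate) * (x @ w_up).
    silu(z) = z * sigmoid(z) = z / (1 + exp(-z)).
    The elementwise product lets one projection gate the other."""
    gate = x @ w_gate
    silu = gate / (1.0 + np.exp(-gate))
    return silu * (x @ w_up)
```

Note that SwiGLU needs two input projections where squared ReLU needs one, so matching parameter count usually means shrinking the hidden width by roughly a third.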

Directions we think are wide open
- Second-order optimizers and natural gradient methods
- Diffusion models
- Curriculum learning
- Gradient descent alternatives like evolutionary search
- Optimizing for compression/model-complexity
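To make one of these directions concrete, the simplest gradient-free alternative is a (1+1) evolution strategy: mutate the current parameters with Gaussian noise and keep the mutant only if it improves the loss. The objective and hyperparameters below are illustrative, and real applications to network training need far more sophisticated variants.

```python
import numpy as np

def evolutionary_search(loss_fn, theta, sigma=0.1, steps=200, seed=0):
    """(1+1) evolution strategy: propose theta + noise, accept if
    the loss decreases. Requires only function evaluations,
    no gradients."""
    rng = np.random.default_rng(seed)
    best_loss = loss_fn(theta)
    for _ in range(steps):
        candidate = theta + sigma * rng.standard_normal(theta.shape)
        cand_loss = loss_fn(candidate)
        if cand_loss < best_loss:
            theta, best_loss = candidate, cand_loss
    return theta, best_loss
```

Acceptance-only updates make the loss monotonically non-increasing, which is part of why such methods are attractive when gradients are noisy or unavailable.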
If you’re working on any of this or something we haven’t thought of, open an issue on the repo, or email research@qlabs.sh.
