MicroGPT-C (enjector/microgpt-c)

A zero-dependency, pure C99 implementation of a GPT-style character-level language model.

The algorithm faithfully matches Andrej Karpathy’s microgpt.py — same architecture, same training loop, same sampling — but compiles to native code with optional compiler-driven SIMD auto-vectorisation for dramatically faster training and inference.

Train a GPT in 20 ms. Generate names in microseconds. No Python. No PyTorch. No GPU.


MicroGPT-C is a minimal, readable implementation of a GPT (Generative Pre-trained Transformer) — the same family of models behind ChatGPT, but stripped down to its essential algorithm. It trains a tiny character-level language model that learns to generate realistic human names from scratch.

The goal is education and experimentation: understand how attention, backpropagation, and the Adam optimiser actually work at the lowest level, without any framework abstractions.

| Audience | Value |
|---|---|
| Students & educators | Study attention, softmax, Adam, and backprop in readable C — no framework magic |
| Embedded / edge engineers | Entire model fits in < 50 KB RAM; runs on MCUs with no runtime dependencies |
| Researchers | Auditable baseline for quantisation, custom layers, or optimiser experiments |
| Rapid prototypers | Train → iterate in milliseconds; test tokenisers, vocabularies, data formats |


```sh
# Linux / macOS
chmod +x build.sh
./build.sh
./build/microgpt
```

```bat
:: Windows
build.bat
build\Release\microgpt.exe
```

The build automatically copies data/names.txt next to the executable.


Measured on the same workload (1,000 training steps, 20 inference samples) — C vs the reference Python:

| Metric | Python | C (fp64) | Speedup |
|---|---|---|---|
| Training time | ~93 s | 0.02 s | ~4,600× |
| Training throughput | ~0.1 k tok/s | ~289 k tok/s | ~2,800× |
| Steps/sec | ~11 | ~40,000 | ~3,600× |
| Inference time | ~0.74 s | < 1 ms | ~700×+ |
| Inference rate | ~27 samples/s | 20,000 samples/s | ~740× |
| Token throughput | | 109,000 tok/s | |

INT8 quantised build: ~25% slower training than fp64 on this tiny model, but ~8× smaller weight storage — ideal for constrained devices.


A single-layer, decoder-only Transformer following the GPT-2 design:

```
Input → Token Embed + Pos Embed → RMSNorm
  → Self-Attention (4 heads, causal) → Residual
  → RMSNorm → MLP (fc1 → ReLU → fc2, 4× width) → Residual
  → Linear (lm_head) → Softmax → next-token probabilities
```
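The causal self-attention step is the least obvious part of that pipeline. Below is a minimal single-head sketch of the idea in plain C99 — illustrative only; the function and variable names (and the fp64 types) are assumptions, not the repo's actual code:

```c
#include <math.h>

/* One causal self-attention head over a sequence of T token vectors.
 * q, k, v: T x d row-major matrices; out: T x d output.
 * Assumes T <= 16, matching this model's context length.
 * Causality is enforced by letting position t attend only to 0..t. */
static void attention_head(const double *q, const double *k, const double *v,
                           double *out, int T, int d) {
    double scale = 1.0 / sqrt((double)d);
    for (int t = 0; t < T; t++) {
        double scores[16];
        double maxs = -1e30, sum = 0.0;
        for (int s = 0; s <= t; s++) {       /* causal mask: past positions only */
            double dot = 0.0;
            for (int i = 0; i < d; i++) dot += q[t * d + i] * k[s * d + i];
            scores[s] = dot * scale;
            if (scores[s] > maxs) maxs = scores[s];
        }
        for (int s = 0; s <= t; s++) {       /* numerically stable softmax */
            scores[s] = exp(scores[s] - maxs);
            sum += scores[s];
        }
        for (int i = 0; i < d; i++) {        /* weighted sum of value vectors */
            double acc = 0.0;
            for (int s = 0; s <= t; s++) acc += (scores[s] / sum) * v[s * d + i];
            out[t * d + i] = acc;
        }
    }
}
```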

| Parameter | Value |
|---|---|
| Embedding dim | 16 |
| Attention heads | 4 |
| Layers | 1 |
| Context length | 16 |
| Total parameters | ~4,600 |
| Weight memory (fp64) | ~37 KB |
| Weight memory (INT8) | ~4.6 KB |
| Training memory | ~144 KB |
| Inference memory | < 50 KB |

Training uses the Adam optimiser with linear learning-rate decay (configurable in microgpt.h).
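For reference, a sketch of an Adam step with linear learning-rate decay in the spirit of that description — the constants and names here are placeholders, not the values defined in microgpt.h:

```c
#include <math.h>

/* One Adam update for a parameter vector w of length n.
 * m, v are first/second moment buffers; step is 1-based.
 * The learning rate decays linearly from lr0 to 0 over total_steps. */
static void adam_step(double *w, const double *grad, double *m, double *v,
                      int n, int step, int total_steps, double lr0) {
    const double beta1 = 0.9, beta2 = 0.999, eps = 1e-8;
    double lr = lr0 * (1.0 - (double)step / (double)total_steps);  /* linear decay */
    for (int i = 0; i < n; i++) {
        m[i] = beta1 * m[i] + (1.0 - beta1) * grad[i];
        v[i] = beta2 * v[i] + (1.0 - beta2) * grad[i] * grad[i];
        double mhat = m[i] / (1.0 - pow(beta1, step));   /* bias correction */
        double vhat = v[i] / (1.0 - pow(beta2, step));
        w[i] -= lr * mhat / (sqrt(vhat) + eps);
    }
}
```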


Build scripts (recommended)

| Platform | Standard | SIMD (faster) |
|---|---|---|
| Linux/macOS | `./build.sh` | `./build.sh --simd` |
| Windows | `build.bat` | `build.bat simd` |

The `--simd` flag enables compiler-driven auto-vectorisation of the core dot products, matrix multiplications, and normalisations. On x86-64 the compiler targets the best available instruction set (SSE4, AVX2, etc.) via `-march=native`; on MSVC it enables `/arch:AVX2`. This gives a measurable speed-up on larger models without any hand-written intrinsics — the compiler rewrites the scalar loops into SIMD instructions automatically.

```sh
# Linux / macOS — auto-detect best ISA
./build.sh --simd

# CMake directly
cmake -DMICROGPT_SIMD=ON ..
cmake --build . --config Release
```
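What the compiler auto-vectorises is ordinary scalar C. A representative loop shape (illustrative; the repo's actual kernels may differ) that `-march=native` or `/arch:AVX2` builds will typically turn into packed SIMD instructions:

```c
/* A dot product written as a plain scalar loop.  With -O2/-O3 and
 * -march=native (or /arch:AVX2 on MSVC) the compiler can emit packed
 * multiply-add instructions; `restrict` tells it the arrays don't alias. */
static double dot(const double *restrict a, const double *restrict b, int n) {
    double acc = 0.0;
    for (int i = 0; i < n; i++)
        acc += a[i] * b[i];
    return acc;
}
```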

Weights are stored as 8-bit integers with per-matrix scales — the forward pass dequantises on the fly; Adam updates an fp64 master copy and requantises each step. This reduces weight storage by ~8× (37 KB → 4.6 KB) at a small accuracy/speed trade-off.
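A minimal sketch of that per-matrix scheme, with hypothetical helper names (the repo's actual layout may differ): quantise the fp64 master weights to int8 with one scale per matrix, and dequantise on the fly in the forward pass.

```c
#include <math.h>
#include <stdint.h>

/* Quantise an n-element fp64 matrix to int8 with a single scale:
 * scale = max|w| / 127, q[i] = round(w[i] / scale). */
static double quantise_int8(const double *w, int8_t *q, int n) {
    double maxabs = 0.0;
    for (int i = 0; i < n; i++)
        if (fabs(w[i]) > maxabs) maxabs = fabs(w[i]);
    double scale = (maxabs > 0.0) ? maxabs / 127.0 : 1.0;
    for (int i = 0; i < n; i++)
        q[i] = (int8_t)lround(w[i] / scale);
    return scale;  /* stored once per matrix, used to dequantise */
}

/* Dequantise one weight on the fly during the forward pass. */
static inline double dequantise_int8(int8_t q, double scale) {
    return (double)q * scale;
}
```

Adam then updates the fp64 master copy and re-runs the quantisation after each step, as described above.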

| Platform | Standard | SIMD |
|---|---|---|
| Linux/macOS | `./build_quantised.sh` | `./build_quantised.sh --simd` |
| Windows | `build_quantised.bat` | `build_quantised.bat simd` |

```sh
mkdir build && cd build
cmake ..
cmake --build . --config Release

# With INT8 quantisation
cmake -DQUANTIZATION_INT8=ON ..

# With SIMD auto-vectorisation
cmake -DMICROGPT_SIMD=ON ..

# Both
cmake -DQUANTIZATION_INT8=ON -DMICROGPT_SIMD=ON ..
```

| Path | Description |
|---|---|
| microgpt.h | Model config, public API declarations |
| microgpt.c | Core engine: model, forward/backward, Adam, data loading |
| main.c | Entry point: load data → train → generate samples |
| microgpt_amalgamated.c | Single-file build — same algorithm, no header needed |
| data/names.txt | Training data (one name per line, ~32k names) |
| CMakeLists.txt | CMake build (C99, Release, optional SIMD / INT8) |
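Because data/names.txt is just one name per line, a character-level vocabulary can be built with nothing but stdio. An illustrative loader, not the repo's actual data-loading code:

```c
#include <stdio.h>

/* Scan names.txt (one name per line) and build a character vocabulary:
 * each distinct byte becomes a token id.  A real model also reserves
 * an id for an end-of-name token. */
int build_vocab(const char *path, char vocab[256]) {
    FILE *f = fopen(path, "r");
    if (!f) return -1;
    int seen[256] = {0}, vocab_size = 0;
    char line[128];
    while (fgets(line, sizeof line, f)) {
        for (char *p = line; *p && *p != '\n'; p++) {
            unsigned char c = (unsigned char)*p;
            if (!seen[c]) { seen[c] = 1; vocab[vocab_size++] = (char)c; }
        }
    }
    fclose(f);
    return vocab_size;  /* number of distinct characters */
}
```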


microgpt_amalgamated.c is a self-contained single file containing the full GPT algorithm — data loading, training, and inference. No header file needed:

```sh
# Compile directly (no CMake required)
cc -O2 -o microgpt microgpt_amalgamated.c -lm
cp data/names.txt . && ./microgpt

# Or via CMake
cmake --build build --config Release --target microgpt_amalgamated
./build/microgpt_amalgamated
```

- C99 compiler (GCC, Clang, MSVC)
- CMake 3.10+
- No other dependencies

MIT — see LICENSE and source file headers.

Author: Ajay Soni (ajay.soni@enjector.com), Enjector Software Ltd.
