Revisiting “Let’s Build a Compiler”

🚀 Explore this awesome post from Hacker News 📖

📂 Category:

✅ Key idea:

There’s an old compiler-building tutorial that has become part of the field’s
lore: the Let’s Build a Compiler
series by Jack Crenshaw (published between 1988 and 1995).

I ran into it in 2003
and was very impressed, but it’s now 2025 and this tutorial is still being mentioned quite
often in Hacker News threads.
Why is that? Why does a tutorial from 35
years ago, built in Pascal and emitting Motorola 68000 assembly – technologies that
are virtually unknown for the new generation of programmers – hold sway over
compiler enthusiasts? I’ve decided to find out.

The tutorial is easily available and readable online, but
just re-reading it seemed insufficient. So I’ve decided on meticulously
translating the compilers built in it to Python and emit a more modern target –
WebAssembly. It was an enjoyable process and I want to share the outcome and
some insights gained along the way.

The result is this code repository.
Of particular interest is the TUTORIAL.md file,
which describes how each part in the original tutorial is mapped to my code. So
if you want to read the original tutorial but play with code you can actually
easily try on your own, feel free to follow my path.

A sample

To get a taste of the input language being compiled and the output my compiler
generates, here’s a sample program in the KISS language designed by Jack
Crenshaw:

var X=0

 ⚡
 procedure addseq(n, ref result)
     var i, sum  What do you think?
     while i < n
         sum = sum + i
         i = i + 1
     end
     result = result + sum
 end

 program testprog
 begin
     addseq(11, X)
 end
 .

It’s from part 13 of the tutorial, so it showcases procedures along with control
constructs like the while loop, and passing parameters both by value and by
reference. Here’s the WASM text generated by my compiler for part 13:

(module
  (memory 8)
  ;; Linear stack pointer. Used to pass parameters by ref.
  ;; Grows downwards (towards lower addresses).
  (global $__sp (mut i32) (i32.const 65536))

  (global $X (mut i32) (i32.const 0))

  (func $ADDSEQ (param $N i32) (param $RESULT i32)
    (local $I i32)
    (local $SUM i32)
    loop $loop1
      block $breakloop1
        local.get $I
        local.get $N
        i32.lt_s
        i32.eqz
        br_if $breakloop1
        local.get $SUM
        local.get $I
        i32.add
        local.set $SUM
        local.get $I
        i32.const 1
        i32.add
        local.set $I
        br $loop1
      end
    end
    local.get $RESULT
    local.get $RESULT
    i32.load
    local.get $SUM
    i32.add
    i32.store
  )

  (func $main (export "main") (result i32)
    i32.const 11
    global.get $__sp      ;; make space on stack
    i32.const 4
    i32.sub
    global.set $__sp
    global.get $__sp
    global.get $X
    i32.store
    global.get $__sp    ;; push address as parameter
    call $ADDSEQ
    ;; restore parameter X by ref
    global.get $__sp
    i32.load offset=0
    global.set $X
    ;; clean up stack for ref parameters
    global.get $__sp
    i32.const 4
    i32.add
    global.set $__sp
    global.get $X
  )
)

You’ll notice that there is some trickiness in the emitted code w.r.t. handling
the by-reference parameter (my previous post
deals with this issue in more detail). In general, though, the emitted code is
inefficient – there is close to 0 optimization applied.

Also, if you’re very diligent you’ll notice something odd about the global
variable X – it seems to be implicitly returned by the generated main
function. This is just a testing facility that makes my compiler easy to test.
All the compilers are extensively tested – usually by running the
generated WASM code and verifying expected results.

Insights – what makes this tutorial so special?

While reading the original tutorial again, I had on opportunity to reminisce on
what makes it so effective. Other than the very fluent and conversational
writing style of Jack Crenshaw, I think it’s a combination of two key
factors:

The tutorial builds a recursive-descent parser step by step, rather than
giving a long preface on automata and table-based parser generators. When
I first encountered it (in 2003), it was taken for granted that if you want
to write a parser then lex + yacc are the way to go . Following the
development of a simple and clean hand-written
parser was a revelation that wholly changed my approach to the subject;
subsequently, hand-written recursive-descent parsers have been my go-to approach
for almost 20 years now.
Rather than getting stuck in front-end minutiae, the tutorial goes straight
to generating working assembly code, from very early on. This was also a
breath of fresh air for engineers who grew up with more traditional courses
where you spend 90% of the time on parsing, type checking and other semantic
analysis and often run entirely out of steam by the time code generation
is taught.

To be honest, I don’t think either of these are a big problem with modern
resources, but back in the day the tutorial clearly hit the right nerve with
many people.

What else does it teach us?

Jack Crenshaw’s tutorial takes the syntax-directed translation
approach, where code is emitted while parsing, without having to divide the
compiler into explicit phases with IRs. As I said above, this is a fantastic
approach for getting started, but in the latter parts of the tutorial it starts
showing its limitations. Especially once we get to types, it becomes painfully
obvious that it would be very nice if we knew the types of expressions before
we generate code for them.

I don’t know if this is implicated in Jack Crenshaw’s abandoning the tutorial
at some point after part 14, but it may very well be. He keeps writing how
the emitted code is clearly sub-optimal and can be improved, but IMHO it’s
just not that easy to improve using the syntax-directed translation strategy.
With perfect hindsight vision, I would probably use Part 14 (types) as a turning
point – emitting some kind of AST from the parser and then doing simple type
checking and analysis on that AST prior to generating code from it.

Conclusion

All in all, the original tutorial remains a wonderfully readable introduction
to building compilers. This post and the GitHub repository
it describes are a modest
contribution that aims to improve the experience of folks reading the original
tutorial today and not willing to use obsolete technologies. As always, let
me know if you run into any issues or have questions!

💬 Tell us your thoughts in comments!

#️⃣ #Revisiting #Lets #Build #Compiler

🕒 Posted on 1765352371

Revisiting “Let’s Build a Compiler”

A sample

Insights – what makes this tutorial so special?

What else does it teach us?

Conclusion

By

Leave a Reply Cancel reply