Why Law Is Law-Shaped — LawVM

I. The Structural Constraint

Law is an incrementally maintained system authored by distributed agents with
partial authority over time, requiring stable fine-grained addresses for
external reference.

Every element of this definition is load-bearing:

Incrementally maintained: Parliament cannot restate the entire legal
corpus each session. Amendments modify specific provisions of existing
statutes. The legal state at any moment is the accumulated result of
thousands of incremental patches applied over decades or centuries.

Distributed agents with partial authority: Different parliaments, at
different times, with different mandates, enacted different provisions.
A subsection added in 2021 coexists with a section from 1995 and a
chapter structure from 1972. Each retains its own authority provenance.
The current text of a statute is a palimpsest of multiple authors across
time.

Stable fine-grained addresses: Other laws say “pursuant to Section
12(2) of Act X.” Court decisions cite specific provisions. Contracts
reference them. These are external pointers into the legal corpus. If
addresses change, external references break silently. The addressing
scheme must survive amendments, which is why law uses hierarchical
structural paths rather than page numbers or byte offsets.

Software codebases evolved the same structural constraint for the same
reason: incremental modification, multiple authors, stable external
references (API contracts, imports, URLs). The resemblance between
git blame and statutory provenance tracing is convergent evolution from
identical structural pressure.

II. The Tree Is a Serialization Format

Statutes are organized as trees: parts contain chapters, chapters contain
sections, sections contain subsections. This hierarchy exists because
paper is linear — a statute must be printed as a sequence of pages,
and the hierarchy provides navigable structure.

But law does not operate as a tree:

Section 12 says “as defined in Section 4”: a cross-reference, a
pointer from one node to another, often across branches.

Section 30 says “notwithstanding Section 15(3)”: a conditional
override, an edge that modifies the meaning of a distant node.

A tax statute says “as specified in Regulation (EU) 2016/679 Article 4”:
a cross-jurisdiction dependency, linking nodes in different
legal corpora entirely.

An EU directive requires member state implementation: the Finnish
implementing statute is a derived node whose existence was caused
by an EU-level obligation.

These are graph relationships. They connect nodes across branches of the
tree, across statutes, across jurisdictions. The tree structure cannot
represent them — it only holds the content of each provision and its
position within one statute’s hierarchy.
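
A minimal sketch of that split, assuming hypothetical names (`Provision`, `Edge`, the path strings, and the `eu:` identifier are illustrative, not LawVM's API). The tree stores containment only; references, overrides, and external dependencies live in a separate edge list.

```python
from dataclasses import dataclass, field


@dataclass
class Provision:
    """A node in one statute's tree: containment and text, nothing else."""
    label: str
    text: str = ""
    children: list["Provision"] = field(default_factory=list)


@dataclass(frozen=True)
class Edge:
    """A relationship between provisions that the tree cannot express."""
    source: str  # path of the referring provision
    target: str  # path of the referenced provision, possibly in another corpus
    kind: str    # e.g. "defines", "overrides", "external"


def paths(node, prefix=""):
    """Yield the path of every node in the tree, depth-first."""
    path = f"{prefix}/{node.label}"
    yield path
    for child in node.children:
        yield from paths(child, path)


statute = Provision("act-x", children=[
    Provision("ch1", children=[Provision("s4", "Definitions ...")]),
    Provision("ch2", children=[Provision("s12", "... as defined in Section 4 ...")]),
    Provision("ch3", children=[
        Provision("s15"),
        Provision("s30", "Notwithstanding Section 15(3) ..."),
    ]),
])

edges = [
    Edge("/act-x/ch2/s12", "/act-x/ch1/s4", "defines"),
    Edge("/act-x/ch3/s30", "/act-x/ch3/s15", "overrides"),
    Edge("/act-x/ch2/s12", "eu:reg/2016/679/art4", "external"),
]

known = set(paths(statute))
# Edges whose target is not a node of this statute's tree at all.
external = [e for e in edges if e.target not in known]
# Edges that connect provisions under different parents of the same tree.
cross_branch = [
    e for e in edges
    if e.target in known
    and e.source.rsplit("/", 1)[0] != e.target.rsplit("/", 1)[0]
]
```

Here `external` holds the EU reference and `cross_branch` holds the Section 12 to Section 4 definition edge; neither could be read off the tree alone.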

This separation of structure from semantics is not new. Akoma Ntoso
(standardized by OASIS as LegalDocML) separated the document structure
(the tree) from legal analysis (references, metadata, lifecycle) in the
early 2000s. ELI and FRBR provide identification frameworks. LegalRuleML
encodes normative content. The contribution here is not the observation
that law has graph structure, which is established, but the argument that
computing the text layer correctly as a reproducible, inspectable replay
process is a prerequisite for computing the semantic layer correctly.

Law is written as a tree because paper demands it. Law operates
as a graph because provisions interact through references, overrides, and
dependencies that ignore hierarchical boundaries. A legal state compiler
operates on the tree (the text) to produce the substrate that semantic
tools operate on.

III. The Amendment Is an Operation, Not an Edit

An amendment act does not say “here is the new text of Section 12.” It says
“Section 12, subsection 2 is amended to read as follows.” This is a
typed operation with:

A target address: Section 12, subsection 2
An action: replace (or: repeal, insert, renumber)
A payload: the new text
A source: which act, enacted when, effective when, by whose authority
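
The four parts above map naturally onto a small record type. This is a sketch under assumptions: `AmendmentOp`, `Source`, `Action`, and the act identifier are hypothetical names chosen for illustration and are not taken from LawVM's `LegalOperation`.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class Action(Enum):
    REPLACE = "replace"
    REPEAL = "repeal"
    INSERT = "insert"
    RENUMBER = "renumber"


@dataclass(frozen=True)
class Source:
    """Provenance: which act, enacted when, effective when, by whose authority."""
    act_id: str
    enacted: date
    effective: date
    authority: str


@dataclass(frozen=True)
class AmendmentOp:
    target: tuple[str, ...]   # structural path, e.g. ("section:12", "subsection:2")
    action: Action
    payload: Optional[str]    # new text; None for a repeal
    source: Source

    def __post_init__(self):
        # A repeal carries no text; replace and insert must carry some.
        if self.action is Action.REPEAL and self.payload is not None:
            raise ValueError("repeal takes no payload")
        if self.action in (Action.REPLACE, Action.INSERT) and not self.payload:
            raise ValueError(f"{self.action.value} requires a payload")


op = AmendmentOp(
    target=("section:12", "subsection:2"),
    action=Action.REPLACE,
    payload="New text of subsection 2.",
    source=Source("2024/123", date(2024, 12, 10), date(2026, 1, 1), "Parliament"),
)
```

Making the record immutable and validating it at construction means a malformed operation fails where it is extracted, not later during replay.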

The vocabulary of text-level operations is small. It includes the
actions named above: replace, repeal, insert, and renumber.

This vocabulary is verified across Finnish, UK, Estonian, and EU amendment
systems. The surface language differs (“muutetaan,” “muudetakse,” “is
amended to read”) but the structural operations on the text are the same.
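
Because the operations are structural, replay can be sketched independently of the surface language. The following is a toy model, assuming a flat mapping from address to text and operations written as `(action, target, payload)` tuples; both are simplifications introduced here, not the author's data model.

```python
def apply_op(state, op):
    """Return a new state with one operation applied; the input is not mutated."""
    action, target, payload = op
    new = dict(state)
    if action == "insert":
        if target in new:
            raise KeyError(f"{target} already exists")
        new[target] = payload
        return new
    if target not in new:
        raise KeyError(f"{target} does not exist")
    if action == "replace":
        new[target] = payload
    elif action == "repeal":
        new[target] = None               # tombstone: address kept, content gone
    elif action == "renumber":
        new[payload] = new.pop(target)   # payload is the new address
    else:
        raise ValueError(f"unknown action {action!r}")
    return new


def replay(base, ops):
    """Fold a sequence of operations over a base state, in order."""
    state = base
    for op in ops:
        state = apply_op(state, op)
    return state


base = {"s1": "Original s1.", "s2": "Original s2."}
ops = [
    ("replace", "s1", "Amended s1."),
    ("insert", "s1a", "New s1a."),
    ("repeal", "s2", None),
    ("renumber", "s1a", "s3"),
]
result = replay(base, ops)
```

An operation that targets a missing address raises instead of being skipped, so a replay that does not fit the prior state surfaces as an explicit failure.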

III.1: What the vocabulary does NOT cover

The text-level operation vocabulary captures how the serialized text
changes. It does not capture how the legal meaning changes. Several
classes of legal action operate on meaning rather than text:

Interpretive overlays. “Section 5 shall be read as if ‘the Board’
meant ‘the Council’.” The text of §5 is unchanged. Its meaning changes.
This is a semantic operation, not a text operation. Common law “deeming
clauses” and reading-down provisions operate this way.

Delegated legislation. An act empowers a minister to make regulations.
This creates authority to produce new law — a meta-operation that
generates future operations, not an operation on existing text.

Conditional applicability. “This section applies only to entities
exceeding a turnover of €10M.” The text exists unconditionally; its legal
effect is conditional. The condition is metadata, not a text operation.

Revivor. If a repealer is itself repealed, does the original provision
revive? The answer is jurisdiction-dependent (no in some common law
systems, yes in others). The tombstone model (repeal = version with null
content) doesn’t inherently resolve this — it requires a policy decision
at the VM level.
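
One way to picture that policy decision is as an explicit flag on the lookup. This is a sketch only: `Version`, `current_text`, and the `revive` parameter are hypothetical, and real revivor rules are more nuanced than a single boolean.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Version:
    content: Optional[str]  # None marks a tombstone (the provision was repealed)
    by_act: str             # the act that produced this version


def current_text(history, repealed_acts, revive):
    """Resolve the text of one provision.

    history: versions in the order they were applied.
    repealed_acts: identifiers of acts that have themselves been repealed.
    revive: policy flag; True means repealing a repealer restores the
            most recent version that still had text.
    """
    if not history:
        return None
    latest = history[-1]
    if latest.content is None and latest.by_act in repealed_acts and revive:
        for version in reversed(history[:-1]):
            if version.content is not None:
                return version.content
        return None
    return latest.content


history = [Version("Original text.", "act-1"), Version(None, "act-2")]

with_revival = current_text(history, {"act-2"}, revive=True)
without_revival = current_text(history, {"act-2"}, revive=False)
```

The same history yields `"Original text."` under one policy and `None` under the other, which is why the choice has to be stated per jurisdiction and cannot be inferred from the data.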

Immutable-base traditions. In systems where the foundational text is
sacred or constitutionally entrenched (Sharia, constitutional provisions
with supermajority requirements), the operation isn’t “replace” — it’s
“layer an interpretation over an immutable base.” The base text cannot be
patched; it can only be wrapped.

These are real phenomena. A compiler that claims to capture “what the law
says” must either model them or explicitly scope itself to the text layer.
LawVM takes the second approach: it compiles the text state — what
each provision literally says at a point in time — and leaves the
normative state (what the law means, how it applies, what obligations
it creates) to downstream semantic tools. This is a deliberate separation
of concerns: get the text right first, then build interpretation on a
correct textual substrate. The alternative — building semantic models on
unverified text — is what produced decades of legal ontology work with no
reliable text layer underneath it.

IV. Multiple Time Axes

An amendment published in December 2024 may take effect on January 1, 2026.
During the intervening 13 months, the amendment exists — enacted,
published, legally valid — but the provisions it modifies have not yet
changed for legal purposes.

Law has at least two temporal dimensions:

Publication/enactment time: when the amendment was officially decided
Legal effect time: when the changed provisions enter into force

These axes are independent. “What has parliament decided?” and “what is
the law?” give different answers during the gap.

It gets harder:

A single amendment act can specify different effective dates for
different provisions (§§1–5 immediately, §6 next year, §7 “when a
decree so provides”).

Retroactive amendments change the legal effect of provisions for a
past period: an amendment published today can declare that it applies
from last year. This retroactively alters the legal state at historical
points in time.

Ultra-active provisions continue to apply to past events even after
repeal. A provision repealed today still governs contracts entered into
while it was in force. The provision is legally dead for new events but
alive for old ones.
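
The two axes can be sketched as a query that takes two dates. This is an illustrative model, assuming one provision, whole-text versions, and a deliberately simple precedence rule (latest effective date wins); `Change` and `text_at` are hypothetical names, and ultra-activity is not modeled.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Change:
    text: str
    published: date  # when the amendment was decided and published
    effective: date  # when it starts to apply (may precede `published`)


def text_at(changes, legal_date, known_as_of):
    """Text in force on legal_date, using only amendments published by known_as_of."""
    visible = [
        c for c in changes
        if c.published <= known_as_of and c.effective <= legal_date
    ]
    if not visible:
        return None
    # Simplifying assumption: latest effective date wins, ties go to the
    # later publication.
    return max(visible, key=lambda c: (c.effective, c.published)).text


changes = [
    Change("v1", date(2000, 1, 1), date(2000, 1, 1)),
    Change("v2 (deferred)", date(2024, 12, 15), date(2026, 1, 1)),
    Change("v1r (retroactive)", date(2025, 6, 1), date(2024, 1, 1)),
]

# Same legal date, different knowledge dates.
before_retro = text_at(changes, date(2025, 3, 1), known_as_of=date(2025, 3, 1))
after_retro = text_at(changes, date(2025, 3, 1), known_as_of=date(2025, 7, 1))
# The deferred amendment only shows up once the legal date passes its start.
later = text_at(changes, date(2026, 2, 1), known_as_of=date(2025, 7, 1))
```

The answer for 1 March 2025 changes from `"v1"` to `"v1r (retroactive)"` once the retroactive amendment is published, so a query with a single date argument cannot express both.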

A third operational axis exists: corpus observation time — when the
compiler ingested the data. If Finlex publishes a correction, the corpus
before and after differs. Reproducibility requires recording which source
version was used.
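
Recording the source version can be as simple as storing a content hash next to the ingestion time. A minimal sketch, assuming a hypothetical `SourceSnapshot` record and an invented source identifier; how LawVM actually records this is not stated in the text.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class SourceSnapshot:
    source_id: str         # identifier of the ingested document (hypothetical format)
    sha256: str            # hash of the exact bytes that were compiled
    observed_at: datetime  # corpus observation time


def snapshot(source_id, raw_bytes, observed_at=None):
    """Fingerprint the bytes that were ingested, with the time of ingestion."""
    return SourceSnapshot(
        source_id=source_id,
        sha256=hashlib.sha256(raw_bytes).hexdigest(),
        observed_at=observed_at or datetime.now(timezone.utc),
    )


original = snapshot("statute-2002-738", b"Section 1. Text as first published.")
corrected = snapshot("statute-2002-738", b"Section 1. Text as corrected.")
source_changed = original.sha256 != corrected.sha256
```

Two compilations are comparable only if their snapshots match; a differing hash for the same `source_id` signals that the publisher changed the source between runs.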

Version control systems like git have one temporal dimension (commit
history). Law requires at minimum two substantive dimensions plus
conditional dimensions (territory, sector, contingency). “What is the
law at date T?” is a multi-dimensional query, not a linear checkout.

V. The Atom Is Not What You Think

The tree structure suggests that provisions (sections, articles) are the
atoms — the smallest units. In practice, the atom is whatever
granularity the amendment system addresses.

Most amendments target structural nodes: replace this section, repeal this
subsection. These are tree operations.

But text-level amendments go below any structural node: “In Section 12(2),
the words ‘Secretary of State’ are replaced by the word ‘Minister’.”
This targets a substring within a leaf node. The phrase being replaced is
not a node in the tree. The amendment operates at a granularity finer than
the tree’s resolution.

Two regimes:

Structural operations: target a node. The tree handles these.

Text operations: target content within a node. The tree represents
the result (leaf text changed) but not the operation (which words and
why).
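
A text operation can be sketched as a checked substring replacement that also returns a record of what it did. The function name, the `expected_count` guard, and the record layout are assumptions made for illustration.

```python
def substitute_words(leaf_text, old, new, expected_count=1):
    """Replace `old` with `new` inside one leaf, failing loudly on a mismatch.

    Returns the new text plus a record of the operation, since the tree
    itself would only show the resulting text.
    """
    found = leaf_text.count(old)
    if found != expected_count:
        raise ValueError(
            f"expected {expected_count} occurrence(s) of {old!r}, found {found}"
        )
    result = leaf_text.replace(old, new)
    record = {"action": "substitute", "old": old, "new": new, "count": found}
    return result, record


leaf = "The Secretary of State may by order designate an authority."
amended, record = substitute_words(leaf, "Secretary of State", "Minister")
```

Checking the occurrence count before replacing guards against an amendment that silently matches nothing, or matches more places than the drafter intended.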

Note that even a “simple” word substitution like replacing “Secretary of
State” with “Minister” shifts statutory powers — it’s not just a string
edit but a reallocation of legal authority. The compiler operates on the
text; downstream normative analysis operates on the meaning. Both layers
matter. The compiler provides the correct text so that normative analysis
can be accurate.

VI. Convergent Evolution of Addressing

Any system requiring stable addressing into a hierarchically maintained
structure under distributed authority converges on the same pattern:
path-based identifiers.

Law: §12(2)(c), a path through the statute tree

OIDs (ITU-T/ISO): 1.3.6.1.4.1.311, a path through the global
object tree

Xanadu tumblers (Nelson, 1960s): hierarchical address down to
character granularity

ELI (European Legislation Identifier): /eli/fi/sd/2002/738/…

FRBR (Functional Requirements for Bibliographic Records): Work →
Expression → Manifestation → Item hierarchy

Filesystem paths: /usr/local/bin/lawvm

These are all the same abstraction: a tuple of (kind, label) pairs
specifying a traversal through a hierarchy.
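
The abstraction is small enough to write down directly. In this sketch an address is a tuple of `(kind, label)` pairs and the tree is a nested dict keyed by those pairs; the helper names and the dict layout are assumptions, not LawVM's `LegalAddress`.

```python
def render(address):
    """Render an address as a slash-separated path string."""
    return "/".join(f"{kind}:{label}" for kind, label in address)


def resolve(tree, address):
    """Walk nested dicts keyed by (kind, label); return None if the path is missing."""
    node = tree
    for step in address:
        children = node.get("children", {})
        if step not in children:
            return None
        node = children[step]
    return node


def is_ancestor(a, b):
    """True if address `a` is a strict prefix of address `b`."""
    return len(a) < len(b) and b[:len(a)] == a


tree = {"children": {
    ("section", "12"): {"children": {
        ("subsection", "2"): {"children": {
            ("item", "c"): {"text": "Text of item (c)."},
        }},
    }},
}}

addr = (("section", "12"), ("subsection", "2"), ("item", "c"))
path = render(addr)
node = resolve(tree, addr)
missing = resolve(tree, (("section", "13"),))
```

Because the address is a plain tuple, prefix checks such as `is_ancestor(addr[:1], addr)` come for free, which is the property external references depend on.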

The convergence is forced by the structural constraint: when multiple
agents independently modify parts of a hierarchical structure, and
external systems reference specific parts by address, the address must
encode the path through the hierarchy.

Path-based addresses have a known weakness: they are positional.
Insertion of §11a between §11 and §12 doesn’t break §12’s path, but
renumbering (§12 becomes §13) does. This is why renumber is an explicit
operation in the vocabulary — it records address changes so that external
references can be updated. FRBR and ELI handle this through abstraction
levels (Work-level identity survives Expression-level renumbering).
LawVM’s ProvisionTimeline preserves identity through renumber events —
the same timeline, new address.
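
The idea of identity surviving renumbering can be illustrated with a small registry that separates a stable identifier from a dated address history. Everything here (`Timeline`, `Registry`, the `uid` values, ISO date strings) is a hypothetical sketch; LawVM's actual `ProvisionTimeline` may look quite different.

```python
from dataclasses import dataclass, field


@dataclass
class Timeline:
    """One provision's identity, independent of its current address."""
    uid: str
    addresses: list = field(default_factory=list)  # (valid_from, address), oldest first


class Registry:
    def __init__(self):
        self.timelines = {}

    def create(self, uid, address, valid_from):
        self.timelines[uid] = Timeline(uid, [(valid_from, address)])

    def renumber(self, uid, new_address, valid_from):
        # Same timeline, new address: identity is untouched.
        self.timelines[uid].addresses.append((valid_from, new_address))

    def address_of(self, uid, on):
        """Address the provision had on ISO date `on`, or None before it existed."""
        current = None
        for valid_from, address in self.timelines[uid].addresses:
            if valid_from <= on:   # ISO date strings sort chronologically
                current = address
        return current

    def lookup(self, address, on):
        """Which provision held `address` on ISO date `on`?"""
        for uid in self.timelines:
            if self.address_of(uid, on) == address:
                return uid
        return None


registry = Registry()
registry.create("p-001", "s12", "1995-01-01")
registry.renumber("p-001", "s13", "2021-07-01")

# A 2010 citation to "s12" can be resolved to today's address.
cited = registry.lookup("s12", "2010-01-01")
now_at = registry.address_of(cited, "2024-01-01")
```

An external reference is resolved in two steps: the cited address and citation date give the identity, and the identity gives the address on any later date.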

Ted Nelson’s Xanadu anticipated this in the 1960s. His tumblers addressed
any content at any granularity, and transclusion meant references were
live links. Legal cross-references (“as defined in §4”) are exactly
Nelson’s transclusions — without the live-update mechanism. When §4 is
amended, every provision referencing it changes meaning silently. The
legal system has unversioned, untyped, silently-breaking transclusions.
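
If references carried the version of the target they were written against, the silent breakage would become detectable. A sketch of that idea, with a hypothetical reference format of `(source, target, version_seen)` and integer version counters invented for the example:

```python
def stale_references(references, current_versions):
    """Return references whose target has changed since the reference was written.

    references: list of (source, target, target_version_seen)
    current_versions: dict mapping target -> current version number
    Targets missing from current_versions are treated as unchanged.
    """
    return [
        (source, target, seen, current_versions[target])
        for source, target, seen in references
        if current_versions.get(target, seen) != seen
    ]


references = [
    ("s12", "s4", 1),   # s12 was drafted against version 1 of s4
    ("s30", "s15", 2),
]
current_versions = {"s4": 3, "s15": 2}

stale = stale_references(references, current_versions)
```

Here only the reference from s12 to s4 is flagged, because s4 has moved from version 1 to version 3 since s12 was written.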

VII. Authority Models Vary

Different jurisdictions assign different legal authority to consolidated
texts, which changes the compiler’s role:

Finland: Finlex consolidated texts are “informational” — not legally
binding. The “real” law is the original statute plus all amendment acts.
If Finlex’s consolidation has a bug, that’s editorial. Nobody is legally
responsible for the computed state of law. The compiler serves as an
independent oracle — replaying amendments to find bugs in the
editorial consolidation.

Estonia: Riigi Teataja consolidated text IS authoritative law. If
there is a discrepancy between amendment acts and consolidation, the
consolidation wins. The compiler serves as a consistency verifier for
binding law — divergences are legally significant findings, not
editorial footnotes.

United Kingdom: legislation.gov.uk publishes versioned texts with
effect metadata. The compiler serves as independent verification of
the official version graph.

The structural operations are the same. The social function of finding a
divergence differs: editorial error (Finland), legal inconsistency
(Estonia), version graph bug (UK).
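
The same comparison can therefore feed a jurisdiction-specific label. A sketch, assuming whitespace-insensitive comparison and two-letter jurisdiction keys; both are choices made for this example only.

```python
FINDING_BY_JURISDICTION = {
    "FI": "editorial error",
    "EE": "legal inconsistency",
    "UK": "version graph bug",
}


def normalize(text):
    """Collapse runs of whitespace so layout differences are not reported."""
    return " ".join(text.split())


def classify_divergence(jurisdiction, compiled_text, official_text):
    """Return None when the texts agree, else the kind of finding a mismatch represents."""
    if normalize(compiled_text) == normalize(official_text):
        return None
    return FINDING_BY_JURISDICTION.get(jurisdiction, "unclassified divergence")


compiled = "The Minister may designate an authority."
official = "The Secretary of State may designate an authority."

finding_fi = classify_divergence("FI", compiled, official)
finding_ee = classify_divergence("EE", compiled, official)
no_finding = classify_divergence("FI", "Same  text.", "Same text.")
```

The comparison logic is shared; only the interpretation of a mismatch depends on the authority model.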

VIII. Why Nobody Built It

The problem is well-defined: parse amendment acts, extract operations,
compile against prior state, produce consolidated text with full
provenance and temporal versioning. It is a compiler problem.
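
The first stage, extracting an operation from amendment language, can be sketched with a single pattern. This is a toy: the regular expression covers only two English phrasings chosen for the example, whereas real amendment language is far more varied and differs per jurisdiction. The function name and output layout are assumptions.

```python
import re

# Matches e.g. "Section 12, subsection 2 is amended to read as follows"
# and "Section 7 is repealed". Anything else is left as a residual.
PATTERN = re.compile(
    r"Section (?P<section>\d+[a-z]?)"
    r"(?:, subsection (?P<subsection>\d+))?"
    r" is (?P<verb>amended to read as follows|repealed)",
    re.IGNORECASE,
)


def extract_operation(clause):
    """Turn one amending clause into a typed operation, or None if unrecognized."""
    match = PATTERN.search(clause)
    if match is None:
        return None  # residual: report it, do not guess
    target = [("section", match["section"])]
    if match["subsection"]:
        target.append(("subsection", match["subsection"]))
    action = "repeal" if match["verb"].lower() == "repealed" else "replace"
    return {"action": action, "target": tuple(target)}


replace_op = extract_operation("Section 12, subsection 2 is amended to read as follows:")
repeal_op = extract_operation("Section 7 is repealed.")
residual = extract_operation("This Act may be cited as the Example Act.")
```

Returning `None` for unrecognized clauses keeps unparsed text visible as a residual, so later stages can count and classify it.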

The closest prior work:

Akoma Ntoso / ELI / FRBR: document standards and identification
frameworks (the schema layer). They define how legal documents should
be structured and identified, but don’t compile amendments.

Semantic Finlex / LawSampo (Aalto/SECO): linked data, semantic
search, faceted browsing over Finnish legislation. World-class
ontology work. No amendment replay, no temporal compilation, no
independent verification. Code not published.

Graphie project (King’s College London): network analysis of UK
legislation. Structural metrics. No budget weighting, no compilation.

LegalRuleML: encoding normative content as rules. Operates on the
semantic layer, not the text layer.

Each addresses a piece. What remains underdeveloped in open research
tooling is the end-to-end replay compiler: amendment text in, typed
operations and timelines out, with residuals classified instead of hidden.
Commercial systems (Xcential, Propylon)
handle parts of this in production contexts; official systems (legislation.gov.uk)
maintain versioned editorial workflows. But the open, cross-jurisdiction,
proof-bearing replay approach remains underexplored. This gap exists because the
problem spans computer science (compilers, VCS, graph theory), legal informatics
(amendment semantics, temporal logic), public administration (authority,
publication), and linguistics (parsing amendment language). Each discipline sees
its part. The tool requires all four.

IX. Therefore LawVM

LawVM is shaped the way it is because law is shaped the way it is:

A tree parser (IRNode) because law is stored as hierarchical trees

Typed operations (LegalOperation) because amendments are structured
operations, not prose edits

Path-based addressing (LegalAddress) because stable fine-grained
addressing is forced by the structural constraint

A graph model (ProvisionTimeline + operation edges) because law
operates as a graph despite being serialized as trees

Multiple temporal axes because law has irreducibly multi-dimensional
time

Scope predicates because applicability varies along dimensions
orthogonal to time and structure

Jurisdiction-agnostic core because the structural operations are
universal even though surface languages and authority models differ

Explicit text/semantics boundary because compiling the text
correctly is prerequisite to interpreting it correctly, and because
the text layer is computationally tractable while the normative layer
requires interpretation

The end state is a semantic version control system for law: a graph
where nodes are provision versions, edges are legislative operations, and
point-in-time materialization, cross-date diff, provision lineage, and
cross-jurisdiction dependency tracking are queries over the graph.
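
Two of those queries, point-in-time materialization and cross-date diff, can be sketched over a simple timeline store. The data layout (address mapped to a list of `(effective_date, text)` pairs, with `None` for a repeal) and the function names are assumptions for illustration, and only the legal-effect axis is modeled.

```python
def materialize(timelines, on):
    """Text of every provision in force on ISO date `on`."""
    state = {}
    for address, versions in timelines.items():
        ordered = sorted(versions, key=lambda v: v[0])
        in_force = [text for effective, text in ordered if effective <= on]
        # Skip provisions that do not exist yet or whose latest version is a repeal.
        if in_force and in_force[-1] is not None:
            state[address] = in_force[-1]
    return state


def diff_dates(timelines, earlier, later):
    """Map each changed address to its (before, after) text; None means absent."""
    a = materialize(timelines, earlier)
    b = materialize(timelines, later)
    changes = {}
    for address in sorted(a.keys() | b.keys()):
        before, after = a.get(address), b.get(address)
        if before != after:
            changes[address] = (before, after)
    return changes


timelines = {
    "s1": [("1995-01-01", "Old s1."), ("2021-07-01", "New s1.")],
    "s2": [("1995-01-01", "Text of s2."), ("2020-01-01", None)],
    "s3": [("1995-01-01", "Unchanged.")],
}

state_2019 = materialize(timelines, "2019-01-01")
changes = diff_dates(timelines, "2019-01-01", "2022-01-01")
```

Both queries are derived views over the same stored versions; nothing is recomputed from amendment text at query time.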

Not a replacement for legal ontologies or normative reasoning systems.
The substrate they need. A correct, verified, temporally versioned text
layer that semantic tools can build on without first having to solve the
compilation problem themselves.

Built because the structural constraints of law demand it, and the
open tooling to do it reliably did not yet exist.

This essay explains the structural constraints behind LawVM, an open-source replay compiler for amendment-driven law. See the Finland showcase for empirical evidence, or the repository.

🕒 **Posted on**: 1777463133
