vladich/pg_jitter: Better JIT for Postgres · GitHub

A lightweight JIT compilation provider for PostgreSQL that adds three alternative JIT backends – sljit, AsmJit and MIR – delivering faster compilation and competitive query execution across PostgreSQL 14–18.

JIT compilation was introduced in Postgres 11 in 2018. It addresses two run-time costs: interpreting expressions, and the inefficient per-row loops Postgres uses for internal data conversions (so-called tuple deforming).
On expression-heavy workloads, or simply on wide tables, it can give a significant performance boost for those operations. However, the standard LLVM-based JIT is notoriously slow at compilation.
Because compilation takes tens to hundreds of milliseconds, it pays off only for very heavy, OLAP-style queries, and not even for all of those.
For typical OLTP queries, LLVM’s JIT overhead can easily exceed the execution time of the query itself.
pg_jitter provides native code generation with microsecond-level compilation times instead of milliseconds, making JIT worthwhile for a much wider range of queries.

Typical compilation time:

  • sljit: tens to low hundreds of microseconds
  • AsmJit: hundreds of microseconds
  • MIR: hundreds of microseconds to single milliseconds
  • LLVM (Postgres default): tens to hundreds of milliseconds

In practice, the cost of JIT compilation is broader than compile time alone: execution can slow down by up to ~1 ms even with sljit because of related effects, mostly cold processor caches and increased memory pressure (rapid allocations and deallocations during code generation and JIT compilation). Therefore, on systems executing many queries per second, it’s recommended to avoid JIT compilation for very fast queries such as point lookups or queries processing only a few records. By default, the jit_above_cost parameter is set to a very high value (100000). This makes sense for LLVM, but not for faster providers.
For pg_jitter, it’s recommended to set this parameter to something between ~200 and the low thousands, depending on the backend you use and your workloads.
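
To get a feel for why very fast queries should stay below the JIT threshold, here is a rough break-even estimate. It is only a sketch under a deliberately simple model that is not taken from pg_jitter: a fixed per-query overhead plus a proportional speedup. The function name `jit_break_even_us` and the model itself are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Simplified cost model (an assumption, not measured behaviour):
 *   interpreted time = t
 *   JIT time         = t * (100 - gain_pct) / 100 + overhead
 * JIT wins once t * gain_pct / 100 exceeds the overhead.
 */

/* Interpreted run time (in microseconds) at which JIT just breaks even. */
int64_t
jit_break_even_us(int64_t overhead_us, int gain_pct)
{
    assert(overhead_us >= 0);
    assert(gain_pct > 0 && gain_pct <= 100);
    return overhead_us * 100 / gain_pct;
}
```

Plugging in the ~1 ms worst-case overhead mentioned above, a 5% speedup breaks even at 20 ms of interpreted run time and a 25% speedup at 4 ms. Under this model, queries that finish faster than that are better left to the interpreter.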

  • sljit is the most consistent: 5–25% faster than the interpreter across all workloads. This, together with its phenomenal compilation speed, makes it the best choice for most scenarios.
  • AsmJit excels on wide-row/deform-heavy queries (up to 32% faster) thanks to specialized tuple deforming.
  • MIR provides solid gains while being the most portable backend.
  • LLVM was expected to be fast at execution time thanks to clang’s optimization advantages, but in most cases it is slower than all three pg_jitter backends, even without counting the difference in compilation time. pg_jitter’s edge comes from zero-cost inlining using compile-time pre-extracted code and manual instruction-level optimization.

The tests folder contains several scripts for running different types of benchmarks, including tests/bench_comprehensive.sh and tests/gen_cross_version_benchmarks.py.
Here are some results for ARM64 (Apple Silicon M1 Pro) and x86_64 (Ryzen AI 9 HX PRO 370) across different versions of Postgres and different backends.
Some of them are quite interesting, for example the “super wide table” section on both ARM and x86, where LLVM’s performance is simply atrocious (10x–30x the baseline).

ARM64 -> PG14 | PG15 | PG16 | PG17 | PG18 * sljit * AsmJit * MIR

x86_64 -> PG14 | PG15 | PG16 | PG17 | PG18 * sljit * AsmJit * MIR

  • Zero-config – set jit_provider and go
  • Three independent backends with different strengths
  • Runtime backend switching via SET pg_jitter.backend = 'sljit' (no restart)
  • PostgreSQL 14–18 support from one codebase
  • Two-tier function optimization – hot-path PG functions compiled as direct native calls
  • No LLVM dependency – pure C/C++ with small, embeddable libraries
  • Precompiled function blobs – optional build-time native code extraction for zero-cost inlining
  • Supported platforms – apart from AsmJit, the providers can (in theory) be used on most platforms supported by Postgres. So far, however, pg_jitter has only been tested on Linux/macOS/ARM64 and Linux/x86_64. Testing on other platforms is planned; if you have had success (or issues) running it elsewhere, please let me know at vladimir@churyukin.com.

The current source code can be considered beta-quality. It passes all standard Postgres regression tests and shows good improvements in performance tests, but it has not yet been verified in large-scale production use.
Stay tuned.

  • PostgreSQL 14–18 (with development headers)
  • CMake >= 3.16
  • C11 and C++17 compilers
  • Backend libraries as sibling directories:
parent/
├── pg_jitter/
├── sljit/        
├── asmjit/       
└── mir/          

SLJIT | AsmJit | MIR

For MIR, use the patched version from MIR-patched – it has a few changes for tracking the size of the generated native code per function, and for per-function memory management.

# Build all backends
./build.sh

# Build a single backend
./build.sh sljit

# Custom PostgreSQL installation
./build.sh --pg-config /opt/pg17/bin/pg_config all

# With precompiled function blobs (optional, pick one)
./build.sh all -DPG_JITTER_USE_LLVM=ON      # requires clang + llvm-objdump
./build.sh all -DPG_JITTER_USE_C2MIR=ON     # uses MIR (no extra deps)

# Custom dependency paths
./build.sh all -DSLJIT_DIR=/path/to/sljit -DMIR_DIR=/path/to/mir

# Install all backends and restart PostgreSQL
./install.sh

# Custom paths
./install.sh --pg-config /opt/pg17/bin/pg_config --pgdata /var/lib/postgresql/data all

-- Use a specific backend directly
-- (jit_provider is read at server start, so a changed value takes effect after a restart)
ALTER SYSTEM SET jit_provider = 'pg_jitter_sljit';
SELECT pg_reload_conf();

-- Or use the meta provider, which allows switching backends at runtime (no restart needed)
ALTER SYSTEM SET jit_provider = 'pg_jitter';
SELECT pg_reload_conf();

SET pg_jitter.backend = 'asmjit';  -- switch on the fly

pg_jitter implements PostgreSQL’s JitProviderCallbacks interface. When PostgreSQL decides to JIT-compile a query, it calls compile_expr() which:

  1. Walks the ExprState->steps[] array (PostgreSQL’s expression evaluation opcodes)
  2. Emits native machine code for ~30 hot-path opcodes (arithmetic, comparisons, variable access, tuple deforming, aggregation, boolean logic, jumps)
  3. Delegates remaining opcodes to pg_jitter_fallback_step() which calls the corresponding ExecEval* C functions
  4. Installs the compiled function with a one-time validation wrapper that catches ALTER COLUMN TYPE invalidation
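
The walk in steps 1–3 can be pictured with a toy model. This is a sketch only: everything prefixed `Toy`/`toy_` is hypothetical, the opcode list is a made-up subset standing in for PostgreSQL’s EEOP_* values, and a real backend would emit machine instructions where this model merely counts steps.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy opcodes standing in for PostgreSQL's EEOP_* values (hypothetical subset). */
typedef enum ToyOpcode
{
    TOY_OP_VAR,         /* variable access */
    TOY_OP_INT4_ADD,    /* arithmetic */
    TOY_OP_INT4_LT,     /* comparison */
    TOY_OP_JUMP,        /* control flow */
    TOY_OP_SUBPLAN,     /* not on the hot path: left to the C fallback */
    TOY_OP_DONE
} ToyOpcode;

typedef struct ToyCompileStats
{
    size_t native_steps;    /* steps that would get dedicated machine code */
    size_t fallback_steps;  /* steps routed to a generic C helper */
} ToyCompileStats;

/* Returns true for opcodes the toy backend knows how to emit natively. */
bool
toy_has_native_emitter(ToyOpcode op)
{
    switch (op)
    {
        case TOY_OP_VAR:
        case TOY_OP_INT4_ADD:
        case TOY_OP_INT4_LT:
        case TOY_OP_JUMP:
        case TOY_OP_DONE:
            return true;
        default:
            return false;
    }
}

/* Walk the step array once and classify every step. */
ToyCompileStats
toy_compile_expr(const ToyOpcode *steps, size_t nsteps)
{
    ToyCompileStats stats = {0, 0};

    assert(steps != NULL || nsteps == 0);
    for (size_t i = 0; i < nsteps; i++)
    {
        if (toy_has_native_emitter(steps[i]))
            stats.native_steps++;       /* a real backend emits instructions here */
        else
            stats.fallback_steps++;     /* a real backend emits a call to the fallback */
    }
    return stats;
}
```

The point of the split is that a single pass stays cheap: the backend only needs emitters for the opcodes that dominate run time, and anything else still executes correctly through the existing C implementation.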

Two-Tier Function Optimization

  • Tier 1: Pass-by-value operations (int, float, bool, date, timestamp, OID) compiled as direct native calls with inline overflow checking. No FunctionCallInfo overhead.
  • Tier 2: Pass-by-reference operations (numeric, text, interval, uuid) called through DirectFunctionCall C wrappers. Optionally LLVM-optimized when built with -DPG_JITTER_USE_LLVM=ON or c2mir-optimized when built with -DPG_JITTER_USE_C2MIR=ON.

| | sljit | AsmJit | MIR |
|---|---|---|---|
| Language | C | C++ | C |
| IR level | Low-level (register machine) | Low-level (native assembler) | Medium-level (typed ops) |
| Register allocation | Manual | Virtual (automatic) | Automatic |
| Architectures | arm64, x86_64, s390x, ppc, mips, riscv | arm64, x86_64 | arm64, x86_64, s390x, ppc, mips, riscv |
| Compilation speed | Fastest (10s to low 100s of μs) | Fast (x3–x5 of sljit) | Still fast (x15–x20 of sljit) |
| Best for | General workloads, lowest overhead | Wide rows, deform-heavy queries | Portability and edge cases |
| Library size | ~100 KB | ~300 KB | ~200 KB |

The meta provider (jit_provider="pg_jitter") is a thin dispatcher that:

  • Exposes a pg_jitter.backend GUC (user-settable, no restart required)
  • Lazily loads backend shared libraries on first use
  • Caches loaded backends for process lifetime
  • Falls back to the next available backend if the selected one isn’t installed

Each backend remains independently usable by setting jit_provider="pg_jitter_sljit" directly.
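
The dispatcher behaviour listed above (lazy loading, caching, fallback) can be modelled in a few lines. This is a simplified sketch and not the actual implementation: the `installed` flag stands in for a shared-library load succeeding or failing, all `Toy`/`toy_` names are hypothetical, and the wrap-around fallback order is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_NBACKENDS 3

typedef struct ToyBackend
{
    const char *name;       /* value accepted by the backend setting */
    int         installed;  /* stand-in for "shared library found on disk" */
    int         loaded;     /* set on first use, then cached */
    int         load_count; /* how many times the loader actually ran */
} ToyBackend;

/* Try to load one backend; later calls hit the cache. */
int
toy_backend_load(ToyBackend *b)
{
    if (b->loaded)
        return 1;           /* cache hit: nothing to do */
    if (!b->installed)
        return 0;           /* a real dispatcher would see the library load fail */
    b->loaded = 1;
    b->load_count++;
    return 1;
}

/*
 * Resolve the requested backend. If it is unknown or unavailable, fall back
 * to the next loadable entry, wrapping around the table.
 * Returns NULL when nothing can be loaded.
 */
ToyBackend *
toy_select_backend(ToyBackend *table, const char *requested)
{
    size_t start = 0;

    assert(table != NULL && requested != NULL);
    for (size_t i = 0; i < TOY_NBACKENDS; i++)
        if (strcmp(table[i].name, requested) == 0)
            start = i;

    for (size_t k = 0; k < TOY_NBACKENDS; k++)
    {
        ToyBackend *b = &table[(start + k) % TOY_NBACKENDS];

        if (toy_backend_load(b))
            return b;
    }
    return NULL;
}
```

Because loading happens only on first use, a session that never switches backends pays for exactly one library, and repeated selections of the same backend cost a string comparison and a flag check.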

JIT-compiled code is tied to PostgreSQL’s ResourceOwner system:

  1. A PgJitterContext is created per query, extending PostgreSQL’s JitContext
  2. Each compiled function is registered on a linked list with a backend-specific free callback
  3. When the query’s ResourceOwner is released, all compiled code is freed:
    • sljit: sljit_free_code(), which releases mmap’d executable memory
    • AsmJit: JitRuntime::release(), which frees the code buffer
    • MIR: MIR_gen_finish() + MIR_finish(), which tear down the entire MIR context
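
The registration-and-release pattern described above can be sketched as a singly linked list of code handles, each carrying its own free callback. This is a standalone illustration, not pg_jitter’s source: `ToyJitContext`, `toy_register_code` and `toy_release_context` are hypothetical names, and plain `malloc`/`free` stand in for PostgreSQL’s memory and ResourceOwner machinery.

```c
#include <assert.h>
#include <stdlib.h>

typedef void (*toy_free_fn)(void *code);

typedef struct ToyCompiledCode
{
    void                   *code;     /* backend-owned handle to generated code */
    toy_free_fn             free_cb;  /* backend-specific release function */
    struct ToyCompiledCode *next;
} ToyCompiledCode;

typedef struct ToyJitContext
{
    ToyCompiledCode *head;            /* everything compiled for this query */
} ToyJitContext;

/*
 * Remember a compiled function so it can be released with the context.
 * Returns 0 on success, -1 if out of memory.
 */
int
toy_register_code(ToyJitContext *ctx, void *code, toy_free_fn free_cb)
{
    ToyCompiledCode *node;

    assert(ctx != NULL && free_cb != NULL);
    node = malloc(sizeof(*node));
    if (node == NULL)
        return -1;
    node->code = code;
    node->free_cb = free_cb;
    node->next = ctx->head;
    ctx->head = node;
    return 0;
}

/* Release every registered function; returns how many were freed. */
int
toy_release_context(ToyJitContext *ctx)
{
    int freed = 0;

    assert(ctx != NULL);
    while (ctx->head != NULL)
    {
        ToyCompiledCode *node = ctx->head;

        ctx->head = node->next;
        node->free_cb(node->code);    /* backend decides how to free its code */
        free(node);
        freed++;
    }
    return freed;
}
```

Storing the callback next to the handle keeps the release path backend-agnostic: the context does not need to know whether a given function came from sljit, AsmJit or MIR.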

A single codebase supports PostgreSQL 14–18 via compile-time #if PG_VERSION_NUM guards in src/pg_jitter_compat.h. Key differences handled:

  • PG17+: Generic ResourceOwner API (ResourceOwnerDesc)
  • PG14–16: JIT-specific ResourceOwner API (ResourceOwnerEnlargeJIT/RememberJIT/ForgetJIT)
  • PG18: CompactAttribute, split EEOP_DONE, CompareType rename, new opcodes
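
A compile-time guard of this kind typically looks like the sketch below. The `TOY_*` macros and `toy_resowner_api()` are hypothetical and are not the names used in src/pg_jitter_compat.h. PG_VERSION_NUM normally comes from PostgreSQL’s headers and is defined here only so that the example compiles on its own.

```c
#include <assert.h>

/* Normally provided by PostgreSQL's pg_config.h; 170002 corresponds to 17.2. */
#ifndef PG_VERSION_NUM
#define PG_VERSION_NUM 170002
#endif

/* Refuse to build against releases older than the supported range. */
static_assert(PG_VERSION_NUM >= 140000, "PostgreSQL 14 or later is required");

/* Hypothetical feature macros derived from the server version. */
#if PG_VERSION_NUM >= 170000
#define TOY_HAVE_GENERIC_RESOWNER 1   /* ResourceOwnerDesc-based API */
#else
#define TOY_HAVE_GENERIC_RESOWNER 0   /* ResourceOwnerEnlargeJIT and friends */
#endif

#if PG_VERSION_NUM >= 180000
#define TOY_HAVE_COMPACT_ATTRIBUTE 1
#else
#define TOY_HAVE_COMPACT_ATTRIBUTE 0
#endif

/* Report which ResourceOwner flavour this build was compiled against. */
const char *
toy_resowner_api(void)
{
#if TOY_HAVE_GENERIC_RESOWNER
    return "generic";
#else
    return "jit-specific";
#endif
}
```

Concentrating the version checks in one header and exposing feature macros keeps the backend sources free of scattered version comparisons.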

Precompiled Function Blobs

Two optional build-time pipelines extract native code for hot functions and embed them directly into the shared library:

  • LLVM pipeline (-DPG_JITTER_USE_LLVM=ON): clang compiles → extract_inlines.py extracts native blobs → embeds in header. Supports deep inlining with PG bitcode.
  • c2mir pipeline (-DPG_JITTER_USE_C2MIR=ON): c2mir compiles → MIR_gen emits native code → embeds in header. No LLVM toolchain required.

Without either pipeline, all three backends still work — Tier 1 functions use direct calls and Tier 2 uses C wrappers.

# Correctness: 203 JIT-compiled functions (all types, overflow, NULL propagation, 100K-row validation)
psql -d postgres -f tests/test_precompiled.sql

# Benchmarks
./tests/bench_all_backends.sh
./tests/gen_cross_version_benchmarks.py

# I-cache impact analysis
./tests/bench_cache_compare.sh

# Memory leak detection (10K queries with RSS trend)
./tests/test_leak_trend.sh [port] [backend]

# Multi-version build + test (PG14–18)
./tests/run_all_versions.sh

Apache License 2.0. See LICENSE.

🕒 **Posted on**: 1772608884

🌟 **Want more?** Click here for more info! 🌟

By

Leave a Reply

Your email address will not be published. Required fields are marked *