By Daroc Alden
October 28, 2025
Fil-C is a memory-safe implementation of C and C++ that aims to let C code —
complete with pointer arithmetic, unions, and other features that are often
cited as a problem for memory-safe languages — run safely, unmodified.
Its dedication to being “fanatically compatible” makes it an attractive
choice for retrofitting memory-safety
into existing applications. Despite the project’s relative youth and single
active contributor, Fil-C is capable of compiling an
entire memory-safe Linux user space (based on
Linux From Scratch),
albeit with some modifications to the more complex programs. It also features
memory-safe signal handling and a concurrent garbage collector.
Fil-C is a fork of
Clang; it’s available under an Apache v2.0
license with LLVM exceptions for the runtime. Changes from the upstream compiler
are occasionally merged in, with Fil-C currently being based on version 20.1.8
from July 2025. The project is a personal passion
of Filip Pizlo, who has previously worked on the runtimes of a number of
managed languages, including Java and JavaScript. When he first began the
project, he was not sure that it was even possible. The initial implementation
was prohibitively slow to run, since it needed to insert a lot of different safety checks. This has
given Fil-C a reputation for slowness. Since
the initial implementation proved viable, however, Pizlo has managed to optimize a number
of common cases, making Fil-C-generated code only a few times slower than
Clang-generated code, although the exact slowdown depends heavily on the
structure of the benchmarked program.
Reliable benchmarking is notoriously finicky, but in order to get some rough feel for
whether that level of performance impact would be problematic, I compiled Bash
version 5.2.32 with Fil-C and tried using it as my shell. Bash is nearly a best
case for Fil-C, because it spends more time running external programs than
running its own code, but I still expected the performance difference to be
noticeable. It wasn’t. So, at least for some programs, the performance overhead
of Fil-C does not seem to be a problem in practice.
In order to support its various run-time safety checks,
Fil-C uses a different internal ABI than Clang does. As a result, objects compiled with Fil-C won’t
link correctly against objects generated by other compilers. Since Fil-C is a
full implementation of C and C++ at the source-code level, however, in practice
this just requires everything to be recompiled with Fil-C. Inter-language
linking, such as with Rust, is not currently supported by the project.
Capabilities
The major challenge of rendering C memory-safe is, of course, pointer handling.
This is especially complicated by the fact that, as the
long road to CHERI-compatibility
has shown, many programs expect a pointer to be 32 or 64 bits, depending on the
architecture.
Fil-C has tried several different ways to represent pointers since the project’s
beginning in 2023. Fil-C’s first pointers were 256 bits, not thread-safe, and
didn’t protect against use-after-free bugs. The current implementation, called
“InvisiCaps”, allows
for pointers that appear to match the natural pointer size of the architecture
(although this requires storing some auxiliary information elsewhere),
with full support for concurrency and
catching use-after-free bugs, at the expense of some run-time overhead.
Fil-C’s documentation
compares InvisiCaps to a software
implementation of CHERI: pointers are separated into a trusted “capability”
piece and an untrusted “address” piece. Since Fil-C controls how the program is
compiled, it can ensure that the program doesn’t have direct
access to the capabilities of any pointers, and therefore the runtime can rely
on them being uncorrupted. The tricky part of the implementation comes from how
these two pieces of information are stored in what looks to the program like 64
bits.
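The split between a trusted capability and an untrusted address can be pictured with a small model. The following is only a sketch under stated assumptions, not Fil-C's actual representation: the type names, the explicit lower bound, and the helper functions are all hypothetical, chosen to show that arithmetic touches only the address while every access is checked against the capability.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical trusted half: the range of addresses the pointer may touch. */
typedef struct {
    uintptr_t lower;  /* first valid address */
    uintptr_t upper;  /* one past the last valid address */
} capability;

/* Hypothetical pair tracked on the program's behalf; the program itself
   only ever observes `address`. */
typedef struct {
    const capability *cap;  /* trusted: never writable by the program */
    uintptr_t address;      /* untrusted: arbitrary arithmetic is allowed */
} checked_ptr;

/* Pointer arithmetic changes only the untrusted address. */
checked_ptr ptr_add(checked_ptr p, ptrdiff_t bytes) {
    p.address += (uintptr_t)bytes;
    return p;
}

/* An access of `size` bytes is permitted only if it lies entirely within
   the bounds recorded in the capability. */
bool access_ok(checked_ptr p, size_t size) {
    if (p.cap == NULL) return false;
    if (p.address < p.cap->lower || p.address > p.cap->upper) return false;
    return size <= p.cap->upper - p.address;
}
```

In this model an out-of-bounds address is not an error by itself; only an access through it is rejected, which is what lets ordinary C pointer arithmetic keep working.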
When Fil-C allocates an object on the heap, it adds two metadata words before
the start of the allocated object: an upper bound, used to check accesses to the
object based on its size, and an “aux word” that is used to store additional
pointer metadata. When the program first writes a pointer value into an object, the
runtime allocates a new auxiliary allocation of the same size as the object being written
into, and puts an actual hardware-level
pointer (i.e., one without an attached capability)
to the new allocation into the aux word of the object. This auxiliary allocation, which is
invisible to the program being compiled, is used to
store the associated capability information for the pointer being stored (and is
also reused for any additional pointers stored into the object later). The address
value is stored into the object as normal, so any C bit-twiddling
techniques that require looking at the stored value of the pointer work as
expected.
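The layout described above can be approximated in ordinary C. This is a minimal sketch, assuming a header of two words placed directly before the payload and a shadow array with one slot per pointer-sized word; the names `header`, `model_alloc()`, `store_ptr()`, and `load_cap()` are hypothetical, and the capability is represented as an opaque pointer rather than whatever Fil-C really stores.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical two-word header that precedes every payload. */
typedef struct {
    uintptr_t upper;   /* address one past the end of the payload */
    const void **aux;  /* shadow allocation, created on first pointer store */
} header;

header *header_of(void *obj) { return (header *)obj - 1; }

/* Allocate a zeroed payload of `size` bytes with its header in front. */
void *model_alloc(size_t size) {
    header *h = calloc(1, sizeof(header) + size);
    if (h == NULL) return NULL;
    h->upper = (uintptr_t)(h + 1) + size;
    return h + 1;
}

/* True if a pointer-sized, aligned slot at `offset` fits in the payload. */
int slot_ok(void *obj, size_t offset) {
    size_t size = header_of(obj)->upper - (uintptr_t)obj;
    return offset % sizeof(void *) == 0 && offset <= size &&
           size - offset >= sizeof(void *);
}

/* Store a pointer: the address goes into the payload as usual, while the
   capability goes into the matching slot of the shadow allocation. */
int store_ptr(void *obj, size_t offset, void *value, const void *cap) {
    header *h = header_of(obj);
    if (!slot_ok(obj, offset)) return -1;
    if (h->aux == NULL) {
        /* First pointer store: create a shadow the same size as the payload. */
        h->aux = calloc(1, h->upper - (uintptr_t)obj);
        if (h->aux == NULL) return -1;
    }
    memcpy((char *)obj + offset, &value, sizeof value);
    h->aux[offset / sizeof(void *)] = cap;
    return 0;
}

/* Fetch the capability for the pointer stored at `offset`, if any. */
const void *load_cap(void *obj, size_t offset) {
    header *h = header_of(obj);
    if (!slot_ok(obj, offset) || h->aux == NULL) return NULL;
    return h->aux[offset / sizeof(void *)];
}

void model_free(void *obj) {
    header *h = header_of(obj);
    free(h->aux);
    free(h);
}
```

Objects that never hold a pointer never pay for the shadow allocation in this model, which matches the article's point that the doubling applies to structures that contain pointers.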
This approach does mean that structures that contain pointers end up using twice
as much memory, and every load of a pointer involves a pointer indirection
through the aux word. In practice, the documentation claims that most programs
run about four times more slowly under this approach, although that number
depends on how heavily the program makes use of pointers. Still, Pizlo has
ideas for several optimizations that he hopes can bring the performance
overhead down over time.
One wrinkle with this approach is atomic access to pointers — i.e. using
_Atomic or volatile. Luckily, there is
no problem that cannot be solved with more pointer indirection: when the program
loads or stores a pointer value atomically, instead of having the auxiliary
allocation contain the capability information directly, it points to a
third 128-bit allocation that stores the capability and pointer value together.
That allocation can be updated with 128-bit atomic instructions, if the platform
supports them, or by creating new allocations and atomically swapping the
pointers to them.
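The fallback path, creating a new allocation and atomically swapping the pointer to it, can be sketched with C11 atomics. This is an illustration under assumptions rather than Fil-C's code: `ptr_box`, `atomic_slot`, and the helper functions are hypothetical names, and the capability is again an opaque pointer.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical immutable box holding both halves of one pointer value. */
typedef struct {
    uintptr_t address;
    const void *cap;
} ptr_box;

/* Hypothetical shadow slot for an atomically accessed pointer: instead of
   holding the capability directly, it points at the current box. */
typedef struct {
    _Atomic(ptr_box *) box;
} atomic_slot;

void slot_init(atomic_slot *slot) { atomic_init(&slot->box, NULL); }

ptr_box *box_new(uintptr_t address, const void *cap) {
    ptr_box *b = malloc(sizeof *b);
    if (b != NULL) {
        b->address = address;
        b->cap = cap;
    }
    return b;
}

/* Publish a new value with a single atomic exchange, so readers see either
   the old address and capability or the new pair, never a mixture.
   Returns the previous box. */
ptr_box *slot_store(atomic_slot *slot, ptr_box *fresh) {
    return atomic_exchange(&slot->box, fresh);
}

/* Read a consistent snapshot of the pair. */
ptr_box slot_load(atomic_slot *slot) {
    ptr_box *b = atomic_load(&slot->box);
    ptr_box empty = { 0, NULL };
    return b != NULL ? *b : empty;
}
```

The sketch hands the old box back to the caller. In a concurrent program it could not be freed right away, since another thread may still be reading it; the assumption here is that a garbage collector, like the one described below, reclaims it once no reader can hold it.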
Since the aux word is used to store a pointer value, Fil-C can use
pointer
tagging to store some additional information there as well; that is used to
indicate special types of objects that need to be handled differently, such as
functions, threads, and
mmap()-backed allocations. It’s also used to
mark freed objects, so that any access results in an error message and a crash.
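Pointer tagging of this kind usually relies on alignment: if the auxiliary allocation is at least 8-byte aligned, the low three bits of its address are always zero and can carry a small tag. The sketch below assumes exactly that; the tag values, their bit positions, and the function names are hypothetical and are not taken from Fil-C.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: aux allocations are 8-byte aligned, leaving 3 free low bits. */
#define TAG_MASK ((uintptr_t)7)

/* Hypothetical object kinds; Fil-C's real encoding may differ. */
typedef enum {
    KIND_PLAIN = 0,
    KIND_FUNCTION = 1,
    KIND_THREAD = 2,
    KIND_MMAP = 3,
    KIND_FREED = 4
} object_kind;

/* Combine an aligned aux pointer and a kind into one word. */
uintptr_t aux_pack(void *aux, object_kind kind) {
    return (uintptr_t)aux | (uintptr_t)kind;
}

object_kind aux_kind(uintptr_t word) {
    return (object_kind)(word & TAG_MASK);
}

void *aux_pointer(uintptr_t word) {
    return (void *)(word & ~TAG_MASK);
}

/* Mark the object as freed and hand back the old aux pointer so the
   caller can release the auxiliary allocation. */
void *aux_mark_freed(uintptr_t *word) {
    void *old = aux_pointer(*word);
    *word = (uintptr_t)KIND_FREED;
    return old;
}
```

A checked access in this model would test `aux_kind(word) == KIND_FREED` before touching the object and report an error if it is set.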
Memory management
When an object is freed, its aux word marks it as a free object, which lets the
auxiliary allocation be reclaimed immediately. The
original object can’t be freed immediately, however.
Otherwise, a program could free an object,
allocate a new object in the same location, and thereby cover up use-after-free bugs.
Instead, Fil-C
uses a garbage collector to free an object’s backing
memory only once all of the pointers to it go away. Unlike other garbage collectors
for C — such as
the Boehm-Demers-Weiser garbage collector —
Fil-C can use the auxiliary
capability information to track live objects precisely.
Fil-C’s garbage collector is both parallel (collection happens faster the more
cores are available) and concurrent (collection happens without pausing the
program). Technically, the garbage collector does require threads to
occasionally pause just long enough to tell it where pointers are located on the
stack, but that only occurs at special “safe points” — otherwise, the program
can load and manipulate pointers without notifying the garbage collector. Safe
points are used as a synchronization barrier: the collector can’t know that an object
is really garbage until every thread has passed at least one safe point since it
finished marking. This synchronization is done with atomic instructions,
however, so in practice threads never need to pause for longer than a few
instructions.
The exception is the implementation of
fork(), which uses the
safe points needed by the garbage collector to temporarily pause all of the threads
in the program in order to prevent race conditions while forking. Fil-C inserts
a safe point at every backward control-flow edge, i.e., whenever code could
execute in a loop. In the common case, the inserted code just needs to load a flag register
and confirm that the garbage collector has not requested anything be done. If
the garbage collector does have a request for the thread, the thread runs a callback to
perform the needed synchronization.
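A safe-point poll at a backward edge might look roughly like the following when written out by hand. It is a sketch only: the real check is emitted by the compiler, and the `thread_state` structure, the in-memory atomic flag, and the callback signature are assumptions made for illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-thread state shared with the collector. */
typedef struct {
    atomic_bool request_pending;  /* set by the collector, cleared by the thread */
    void (*callback)(void *);     /* work to run when a request is pending */
    void *callback_arg;
} thread_state;

/* Fast path: one load and a branch. Slow path: acknowledge and run the
   callback that performs the needed synchronization. */
void safepoint_poll(thread_state *t) {
    if (!atomic_load_explicit(&t->request_pending, memory_order_acquire))
        return;
    atomic_store_explicit(&t->request_pending, false, memory_order_release);
    if (t->callback != NULL)
        t->callback(t->callback_arg);
}

/* Example callback for demonstration: counts how often it was invoked. */
void count_request(void *arg) { ++*(int *)arg; }

/* What a loop could look like after instrumentation: the poll sits on the
   backward edge, so a long-running loop cannot starve the collector. */
long sum_with_safepoints(thread_state *t, const long *values, size_t n) {
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        total += values[i];
        safepoint_poll(t);  /* backward control-flow edge */
    }
    return total;
}
```

Because the common case is a single load and an untaken branch, the cost per iteration stays small, which is consistent with the article's description of the poll.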
Fil-C uses the same safe-point mechanism to implement signal handling. Signal
handlers are only run when the interrupted thread reaches a safe point. That, in
turn, allows signal handlers to allocate and free memory without interfering
with the garbage collector’s operation; Fil-C’s
malloc() is signal-safe.
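Deferring signal handlers to a safe point can be imitated in plain C. In this sketch, which is an assumption-laden model and not Fil-C's runtime, the handler registered with the operating system only records the signal number, and the user's handler runs later from ordinary code. All names are hypothetical, and a single pending slot is a simplification that would lose signals arriving in quick succession.

```c
#include <signal.h>
#include <stddef.h>
#include <stdlib.h>

static volatile sig_atomic_t pending_signal;  /* 0 means nothing pending */
static void (*user_handler)(int);             /* handler the program asked for */

/* The handler the kernel actually invokes: it only records the signal. */
void record_signal(int signo) { pending_signal = signo; }

/* Register `handler` to be run at the next safe point after `signo`. */
int install_deferred(int signo, void (*handler)(int)) {
    user_handler = handler;
    return signal(signo, record_signal) == SIG_ERR ? -1 : 0;
}

/* Called at safe points: runs the user's handler in normal context, where
   allocating memory cannot interrupt the allocator halfway through. */
void signal_safepoint(void) {
    int signo = pending_signal;
    if (signo != 0) {
        pending_signal = 0;
        if (user_handler != NULL)
            user_handler(signo);
    }
}

/* Example user handler that allocates, which would be unsafe inside a
   conventional asynchronous signal handler. */
static int *last_seen;
void example_handler(int signo) {
    int *copy = malloc(sizeof *copy);
    if (copy != NULL) {
        *copy = signo;
        free(last_seen);
        last_seen = copy;
    }
}
```

A program would call `install_deferred(SIGTERM, example_handler)` once and then call `signal_safepoint()` wherever a safe point falls; between delivery and the next safe point, the signal stays pending.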
Memory-safe Linux
Linux From Scratch (LFS) is a tutorial on compiling one’s own complete
Linux user space. It walks through the steps of compiling and installing all of the core
software needed for a typical Linux user space in a
chroot()
environment. Pizlo has successfully
run through LFS with Fil-C to
produce a memory-safe version, although a non-Fil-C compiler is still needed to
build some fundamental components, such as Fil-C’s own runtime,
the GNU C library, and the kernel. (While Fil-C’s runtime relies on a normal
copy of the GNU C library to make system calls, the programs that Fil-C compiles
use a Fil-C-compiled version of the library.)
The process is mostly identical to LFS up through the end of chapter 7, because
everything prior to that point consists of using cross-build tools to obtain a
working compiler in the chroot() environment. The one difference is
that the cross-build tools are built with a different configured prefix, so that
they won’t conflict with Fil-C. At that point, one can
build a copy of Fil-C and use it to mostly replace the existing compiler. The
remaining steps of LFS are unchanged.
Scripts to
automate the process are included in the Fil-C Git repository, including
some steps from
Beyond Linux From Scratch that result in a working graphical
user interface and a handful of more complicated applications such as Emacs.
Overall, Fil-C offers a remarkably complete solution for making existing C
programs memory-safe. While it does nothing for undefined behavior that is not
related to memory safety,
the most pernicious and difficult-to-prevent security
vulnerabilities in C programs tend to rely on exploiting memory-unsafe
behavior. Readers who have already considered and rejected Fil-C for their use
case due to its early performance problems may wish to take a second look —
although anyone hoping for stability might want to wait for others to take the
plunge, given the project’s relative immaturity.
That said, for existing applications where a sizeable performance hit is preferable to an
exploitable vulnerability, Fil-C is an excellent choice.
