jer-irl/threadprocs: Experimental thread-like processes, multiple executables in one address space · GitHub

This repository contains experimental code for thread-like processes, or multiple programs running in a shared address space.
Each threadproc behaves like a process with its own executable, globals, libc instance, etc., but pointers are valid across threadprocs.
This blends the POSIX process model with the POSIX multi-threading programming model, and enables things like zero-copy access to pointer-based data structures.

All Markdown files were written by hand.

See tproc-actors for one possible application framework building on top of threadprocs.

The code for the demoed programs is at example/sharedstr/allocstr.cpp and example/sharedstr/printstr.cpp, and neither contains any magic (/proc/[pid]/mem, etc), nor awareness of the server and launcher.

  • allocstr reads input, copies it into a new std::string, and prints the address of that string (&newstring) to the console.
  • printstr reads a pointer as hex text, and prints whatever std::string it finds at that address.
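
The pointer round trip the two programs perform can be sketched in a single process. This is a minimal illustration, not the code in example/sharedstr/; the helper names `address_to_hex` and `string_at` are hypothetical. It assumes both sides agree on the std::string layout and that the address is still mapped and holds a live object.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Format an object's address as hex text, roughly what allocstr prints.
// (Hypothetical helper, not taken from the repository.)
std::string address_to_hex(const std::string* s) {
    std::ostringstream out;
    out << std::hex << reinterpret_cast<std::uintptr_t>(s);
    return out.str();
}

// Parse hex text back into a pointer and return the string found there,
// roughly what printstr does. This is only meaningful when the address is
// mapped in the current address space and still holds a live std::string.
const std::string& string_at(const std::string& hex_text) {
    // Base 16 accepts an optional "0x" prefix.
    auto raw = static_cast<std::uintptr_t>(std::stoull(hex_text, nullptr, 16));
    assert(raw != 0);  // a null address can never hold a string
    return *reinterpret_cast<const std::string*>(raw);
}
```

In two ordinary processes the second step would read unrelated memory or fault. It only works across programs when both run in the same address space, which is what the server provides.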

demo.mp4


server memory diagram

The server utility “hosts” a virtual address space. Programs started through launcher coexist in that hosted address space.

Applications can share pointers in the virtual address space through some out-of-band mechanism (the demo uses copy/paste, dummy_server/client uses sockets, and libtproc provides server-global scratch space), and then directly dereference those pointers, as they’re valid in the shared address space.

libtproc provides basic detection of execution as a threadproc, and allows hosted threadprocs to access a “server-global” scratch space.
Applications can build tooling using this space to implement service discovery and bootstrap shared memory-backed IPC.

This is implemented by adding another entry to the threadproc auxv.
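
As a rough illustration of how a program could look for an extra auxv entry, the sketch below uses glibc’s `getauxval`. The key value `kHypotheticalTprocAuxKey` and both function names are assumptions made for this example; the real entry type and the libtproc API are not shown here and may differ.

```cpp
#include <sys/auxv.h>

#include <cassert>

// HYPOTHETICAL key: a placeholder, not the entry type threadprocs uses.
constexpr unsigned long kHypotheticalTprocAuxKey = 0x74707263UL;

// Look up an auxiliary vector entry. getauxval returns 0 when the key is
// absent, so a stored value of 0 is indistinguishable from "not found".
unsigned long aux_value(unsigned long key) {
    assert(key != AT_NULL);  // AT_NULL only terminates the vector
    return getauxval(key);
}

// Treat a non-zero value under the hypothetical key as "running hosted".
bool looks_like_threadproc() {
    return aux_value(kHypotheticalTprocAuxKey) != 0;
}
```

Run as a normal process, `looks_like_threadproc()` returns false because the kernel never sets this key. A hosted program would see whatever value the server placed in its auxv.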

tproc-actors uses this space to advertise per-threadproc actor registries.

Use Linux on aarch64 or x86_64; other architectures are not supported.
This was developed in a VM running Debian on a MacBook Air M1, and also tested in a Debian x86_64 GitHub Codespace using the .devcontainer/ configuration.

Dependencies:

apt install build-essential liburing-dev
# May need to install gcc 14+
git submodule update --init

Notably, there is no dependency on ELF libraries aside from Linux system headers, though those would probably make the code nicer.

Building:

Run auto integration tests:

Or run your own programs in a shared address space:

./buildout/server /tmp/mytest.sock &
./buildout/launcher /tmp/mytest.sock program1 arg1 arg2 &
./buildout/launcher /tmp/mytest.sock program2 arg3 arg4

Read the overview or implementation for information on the project, or read comparisons to existing work.
I’ve also collected some lessons learned in conclusions.

  • Each threadproc has its own runtime library instance (libc), and care must be taken not to call malloc() in one threadproc but try to free() that memory in another threadproc.
  • Target applications must be compiled as “position independent code,” as must any dynamically loaded objects.
    • This is standard for dynamically linked libraries, and default for executable binaries compiled in many modern distros in order to support flavors of ASLR.
    • Properly architected libraries can mitigate most drawbacks of this, and executable files also carry minimal overhead.
  • brk() (and sbrk()) cannot be used reliably, because they are “address space global” to the kernel, and processes typically assume they won’t be called from unexpected places.
    • The server sets the MALLOC_MMAP_THRESHOLD_=0 environment variable for children to avoid the default glibc behavior and avoid these calls.
  • mmap with MAP_FIXED can’t be used without first “reserving” a non-fixed mapping.
    • This is generally true of any program, and “unreserved” MAP_FIXED use is unsafe even in standard Linux programs.
    • See the manpage section Using MAP_FIXED safely
  • Debugging and ptrace() are not supported.
    • It may be possible to add partial support, but I suspect GDB makes some assumptions that would be difficult to satisfy
  • The threadproc’s PID is not the same as the launching process’s PID, so applications that rely on details of PID-targeted operations may run into issues.
  • Signals are forwarded from the launcher to the threadproc, but signals that cannot be caught (such as SIGKILL) are not.
    • There are likely other edge cases if a threadproc relies on details of POSIX signal behavior.
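
The “reserve first” rule for MAP_FIXED from the list above can be sketched as follows. This is a generic illustration of the pattern described in the mmap manpage, not code from this repository; the helper names are hypothetical.

```cpp
#include <sys/mman.h>
#include <unistd.h>

#include <cassert>
#include <cstddef>

// Query the system page size; MAP_FIXED targets must be page-aligned.
std::size_t page_size() {
    return static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
}

// Step 1: let the kernel pick a free range and hold it with an
// inaccessible mapping. Nothing else in the address space can land here.
void* reserve_region(std::size_t len) {
    void* p = mmap(nullptr, len, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    return p == MAP_FAILED ? nullptr : p;
}

// Step 2: use MAP_FIXED only inside the reservation, so the mapping it
// replaces is always one that this code owns.
void* commit_within(void* base, std::size_t offset, std::size_t len) {
    assert(offset % page_size() == 0);
    char* target = static_cast<char*>(base) + offset;
    void* p = mmap(target, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    return p == MAP_FAILED ? nullptr : p;
}
```

Calling `commit_within(base, page_size(), page_size())` on a four-page reservation yields writable memory exactly one page past `base`. A bare MAP_FIXED at a guessed address could silently replace another threadproc’s mapping.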

There are other less pertinent limitations around the edges.
For example, threadprocs have /proc/[pid]/comm values which reflect their launched binary, but cmdline isn’t settable.
exec() syscalls also “escape” the threadproc scheme, which is probably desired but may cause subtle issues.

My initial vision was for threadprocs to pass std::unique_ptrs to each other, and support IPC with nested data.
ABI aside, the major hiccup is that even if threadproc 1 releases a pointer, and threadproc 2 wraps it in a std::unique_ptr, when the destructor is called and it comes time to de-allocate the memory, threadproc 2 won’t be able to do so.

Having independent libc, libstdc++, and Rust libstd instances for each tproc greatly reduces the technical requirements placed on launched programs, but it also means that a threadproc cannot deallocate memory allocated by another threadproc.

One could architect their application around this limitation, and ensure memory is always handed back to the tproc which allocated it so it can be de-allocated correctly.
I’ve sketched out an application framework that automatically passes objects back to their allocating threadproc in a custom unique_ptr analogue, see tproc-actors.
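
One way to picture “handing memory back to its allocator” is a unique_ptr whose deleter never frees anything and instead pushes the object onto a list owned by the allocating side. The sketch below is a single-process illustration under that assumption. `ReturnList`, `ReturnToOwner`, and `OwnedPtr` are hypothetical names and do not reflect the actual tproc-actors API. It also assumes both sides agree on the node layout.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// HYPOTHETICAL: an intrusive return list owned by the allocating side.
template <typename T>
class ReturnList {
public:
    struct Node {
        T value;
        Node* next;        // intrusive link, so give_back never allocates
        ReturnList* home;  // where this node must be returned
    };

    // Owner only: allocate with the owner's allocator.
    Node* allocate(T value) {
        return new Node{std::move(value), nullptr, this};
    }

    // Any side: lock-free push that performs no allocation or free.
    void give_back(Node* n) noexcept {
        assert(n != nullptr);
        n->next = head_.load(std::memory_order_relaxed);
        while (!head_.compare_exchange_weak(n->next, n,
                                            std::memory_order_release,
                                            std::memory_order_relaxed)) {
        }
    }

    // Owner only: take the whole list and free it with the owner's allocator.
    std::size_t drain() {
        Node* n = head_.exchange(nullptr, std::memory_order_acquire);
        std::size_t freed = 0;
        while (n != nullptr) {
            Node* next = n->next;
            delete n;
            n = next;
            ++freed;
        }
        return freed;
    }

private:
    std::atomic<Node*> head_{nullptr};
};

// Deleter that returns the node to its owner instead of deleting it.
template <typename T>
struct ReturnToOwner {
    void operator()(typename ReturnList<T>::Node* n) const noexcept {
        n->home->give_back(n);
    }
};

template <typename T>
using OwnedPtr = std::unique_ptr<typename ReturnList<T>::Node, ReturnToOwner<T>>;
```

The receiving side can hold and move an `OwnedPtr<std::string>` like any unique_ptr. When it goes out of scope the node is queued, and the owner later calls `drain()` to free it. The return path is kept allocation-free so the borrower’s own malloc is never involved.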

This is an interesting direction, and raises questions that could be explored further, but I don’t think this is a practical model for any serious software.
Pthreads are a somewhat stagnant abstraction, but they have the benefit of decades of tooling and language development shaped around them.
I haven’t extracted the brainworm yet, though, and perhaps in the future I’ll explore ways to augment shared memory regions with custom allocators, fixed mappings, etc.


🕒 **Posted on**: 1774285434
