GitHub – tdortman/cuckoo-filter: GPU-Accelerated Cuckoo Filter

💥 Discover this awesome post from Hacker News 📖

📂 **Category**:

📌 **What You’ll Learn**:

Documentation

A high-performance CUDA implementation of the Cuckoo Filter data structure, developed as part of the thesis “Design and Evaluation of a GPU-Accelerated Cuckoo Filter”.

This library provides a GPU-accelerated Cuckoo Filter implementation optimized for high-throughput batch operations. Cuckoo Filters are space-efficient probabilistic data structures that support insertion, lookup, and deletion operations with a configurable false positive rate.

CUDA-accelerated batch insert, lookup, and delete operations
Configurable fingerprint size and bucket size
Multiple eviction policies (DFS, BFS)
Sorted insertion mode for improved memory coalescing
Multi-GPU support via gossip
IPC support for cross-process filter sharing
Header-only library design

Benchmarks at 80% load factor on an NVIDIA GH200 (H100 HBM3, 3.4 TB/s). The GPU Cuckoo Filter is compared against:

L2-Resident (4M items, ~8 MiB)

Comparison	Insert	Query	Delete
GPU vs CPU Cuckoo	360× faster	973× faster	N/A
Cuckoo vs TCF	6× faster	42× faster	100× faster
Cuckoo vs GQF	585× faster	6× faster	273× faster
Cuckoo vs Bloom	0.6× (slower)	1.4× faster	N/A

DRAM-Resident (268M items, ~512 MiB)

Comparison	Insert	Query	Delete
GPU vs CPU Cuckoo	583× faster	1504× faster	N/A
Cuckoo vs TCF	1.9× faster	11.3× faster	35.3× faster
Cuckoo vs GQF	9.6× faster	2.6× faster	3.8× faster
Cuckoo vs Bloom	0.7× (slower)	1.0× (equal)	N/A

Note

For a more comprehensive evaluation including additional systems and analysis, see the accompanying thesis.

CUDA Toolkit (>= 12.9)
C++20 compatible compiler
Meson build system (>= 1.3.0)

meson setup build
meson compile -C build

Benchmarks and tests are built by default. To disable them:

meson setup build -DBUILD_BENCHMARKS=false -DBUILD_TESTS=false

#include <CuckooFilter.cuh>

// Configure the filter: key type, fingerprint bits, max evictions, block size, bucket size
using Config = CuckooConfig<uint64_t, 16, 500, 256, 16>;

// Create a filter with the desired capacity
CuckooFilter filter(1 << 20);  // capacity for ~1M items

// Insert keys (d_keys is a device pointer)
filter.insertMany(d_keys, numKeys);

// Or use sorted insertion
filter.insertManySorted(d_keys, numKeys);

// Check membership
filter.containsMany(d_keys, d_results, numKeys);

// Delete keys
filter.deleteMany(d_keys, d_results, numKeys);

The CuckooConfig template accepts the following parameters:

Parameter	Description	Default
`T`	Key type	–
`bitsPerTag`	Fingerprint size in bits (8, 16, 32)	–
`maxEvictions`	Maximum eviction attempts before failure	500
`blockSize`	CUDA block size	256
`bucketSize`	Slots per bucket (must be power of 2)	16
`AltBucketPolicy`	Alternate bucket calculation policy	`XorAltBucketPolicy`
`evictionPolicy`	Eviction strategy (DFS or BFS)	`BFS`
`WordType`	Atomic type (uint32_t or uint64_t)	`uint64_t`

For workloads that exceed single GPU capacity:

#include <CuckooFilterMultiGPU.cuh>

CuckooFilterMultiGPU filter(numGPUs, totalCapacity);
filter.insertMany(h_keys, numKeys);
filter.containsMany(h_keys, h_results, numKeys);

include/           - Header files
  CuckooFilter.cuh           - Main filter implementation
  CuckooFilterMultiGPU.cuh   - Multi-GPU implementation
  CuckooFilterIPC.cuh        - IPC support
  bucket_policies.cuh        - Alternative bucket policies
  helpers.cuh                - Helper functions
src/               - Example applications
benchmark/         - benchmarks
tests/             - Unit tests
scripts/           - Scripts for running/plotting benchmarks

🔥 **What’s your take?**
Share your thoughts in the comments below!

#️⃣ **#GitHub #tdortmancuckoofilter #GPUAccelerated #Cuckoo #Filter**

🕒 **Posted on**: 1767740573

🌟 **Want more?** Click here for more info! 🌟

GitHub – tdortman/cuckoo-filter: GPU-Accelerated Cuckoo Filter

L2-Resident (4M items, ~8 MiB)

DRAM-Resident (268M items, ~512 MiB)

By

Leave a Reply Cancel reply