GitHub – tdortman/cuckoo-filter: GPU-Accelerated Cuckoo Filter

💥 Discover this awesome post from Hacker News 📖

📂 **Category**:

📌 **What You’ll Learn**:

Documentation

A high-performance CUDA implementation of the Cuckoo Filter data structure, developed as part of the thesis “Design and Evaluation of a GPU-Accelerated Cuckoo Filter”.

This library provides a GPU-accelerated Cuckoo Filter implementation optimized for high-throughput batch operations. Cuckoo Filters are space-efficient probabilistic data structures that support insertion, lookup, and deletion operations with a configurable false positive rate.

  • CUDA-accelerated batch insert, lookup, and delete operations
  • Configurable fingerprint size and bucket size
  • Multiple eviction policies (DFS, BFS)
  • Sorted insertion mode for improved memory coalescing
  • Multi-GPU support via gossip
  • IPC support for cross-process filter sharing
  • Header-only library design

Benchmarks at 80% load factor on an NVIDIA GH200 (H100 HBM3, 3.4 TB/s). The GPU Cuckoo Filter is compared against:

L2-Resident (4M items, ~8 MiB)

Comparison Insert Query Delete
GPU vs CPU Cuckoo 360× faster 973× faster N/A
Cuckoo vs TCF 6× faster 42× faster 100× faster
Cuckoo vs GQF 585× faster 6× faster 273× faster
Cuckoo vs Bloom 0.6× (slower) 1.4× faster N/A

DRAM-Resident (268M items, ~512 MiB)

Comparison Insert Query Delete
GPU vs CPU Cuckoo 583× faster 1504× faster N/A
Cuckoo vs TCF 1.9× faster 11.3× faster 35.3× faster
Cuckoo vs GQF 9.6× faster 2.6× faster 3.8× faster
Cuckoo vs Bloom 0.7× (slower) 1.0× (equal) N/A

Note

For a more comprehensive evaluation including additional systems and analysis, see the accompanying thesis.

  • CUDA Toolkit (>= 12.9)
  • C++20 compatible compiler
  • Meson build system (>= 1.3.0)
meson setup build
meson compile -C build

Benchmarks and tests are built by default. To disable them:

meson setup build -DBUILD_BENCHMARKS=false -DBUILD_TESTS=false
#include <CuckooFilter.cuh>

// Configure the filter: key type, fingerprint bits, max evictions, block size, bucket size
using Config = CuckooConfig<uint64_t, 16, 500, 256, 16>;

// Create a filter with the desired capacity
CuckooFilter filter(1 << 20);  // capacity for ~1M items

// Insert keys (d_keys is a device pointer)
filter.insertMany(d_keys, numKeys);

// Or use sorted insertion
filter.insertManySorted(d_keys, numKeys);

// Check membership
filter.containsMany(d_keys, d_results, numKeys);

// Delete keys
filter.deleteMany(d_keys, d_results, numKeys);

The CuckooConfig template accepts the following parameters:

Parameter Description Default
T Key type
bitsPerTag Fingerprint size in bits (8, 16, 32)
maxEvictions Maximum eviction attempts before failure 500
blockSize CUDA block size 256
bucketSize Slots per bucket (must be power of 2) 16
AltBucketPolicy Alternate bucket calculation policy XorAltBucketPolicy
evictionPolicy Eviction strategy (DFS or BFS) BFS
WordType Atomic type (uint32_t or uint64_t) uint64_t

For workloads that exceed single GPU capacity:

#include <CuckooFilterMultiGPU.cuh>

CuckooFilterMultiGPU filter(numGPUs, totalCapacity);
filter.insertMany(h_keys, numKeys);
filter.containsMany(h_keys, h_results, numKeys);
include/           - Header files
  CuckooFilter.cuh           - Main filter implementation
  CuckooFilterMultiGPU.cuh   - Multi-GPU implementation
  CuckooFilterIPC.cuh        - IPC support
  bucket_policies.cuh        - Alternative bucket policies
  helpers.cuh                - Helper functions
src/               - Example applications
benchmark/         - benchmarks
tests/             - Unit tests
scripts/           - Scripts for running/plotting benchmarks

🔥 **What’s your take?**
Share your thoughts in the comments below!

#️⃣ **#GitHub #tdortmancuckoofilter #GPUAccelerated #Cuckoo #Filter**

🕒 **Posted on**: 1767740573

🌟 **Want more?** Click here for more info! 🌟

By

Leave a Reply

Your email address will not be published. Required fields are marked *