SEE: Searchable JSON Compression (Semantic Entropy Encoding)
combined ≈ 19.5% • lookup p50 ≈ 0.18 ms • skip ≈ 99%
Why it matters
SEE reduces both the data tax (storage/egress) and the CPU tax (decompress/parse) by keeping JSON searchable while compressed.
SEE output is not always smaller than Zstd output, but searchability, low I/O, and random access together lead to better TCO/ROI for many workloads.
Download (Release) · OnePager (ROI) · Try in 10 minutes
Enterprise / NDA inquiry: Private contact form
Under NDA: full VDR pack available. Please provide a company email (no confidential data required).
- Schema-aware JSON compression: combines structure × delta × Zstd (+ Bloom / Skip) to stay searchable while compressed, with page-level random access.
- Design trade-off: favors low I/O & low latency (ms) and ~99% skip rate over minimal size.
- Combined size: ≈ 19.5% of raw
- Lookup present (ms): p50 ≈ 0.18 / p95 ≈ 0.28 / p99 ≈ 0.34
- Skip ratio: present ≈ 0.99 / absent ≈ 0.992, Bloom density ≈ 0.30
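To make "page-level skipping" concrete, here is a minimal sketch of the general technique, not SEE's implementation (which is proprietary). It assumes each page carries a small Bloom filter over its `key=value` terms, so a lookup can skip pages whose filter rules the term out. The names `build_page`, `lookup`, `BLOOM_BITS`, and `NUM_HASHES` are hypothetical, and `zlib` stands in for Zstd only to keep the example standard-library-only.

```python
import hashlib
import json
import zlib

BLOOM_BITS = 256  # bits per page filter (illustrative value, not SEE's)
NUM_HASHES = 2    # bit positions probed per term (illustrative value)


def _positions(term):
    """Derive NUM_HASHES bit positions for a term from one SHA-256 digest."""
    digest = hashlib.sha256(term.encode("utf-8")).digest()
    for i in range(NUM_HASHES):
        chunk = digest[4 * i:4 * i + 4]
        yield int.from_bytes(chunk, "big") % BLOOM_BITS


def build_page(records):
    """Compress one page of records and build a Bloom filter over its terms."""
    bloom = 0
    for rec in records:
        for key, value in rec.items():
            for pos in _positions(f"{key}={value}"):
                bloom |= 1 << pos
    text = "\n".join(json.dumps(r) for r in records)
    return {"bloom": bloom, "payload": zlib.compress(text.encode("utf-8"))}


def might_contain(page, key, value):
    """False means the term is definitely absent; True means 'maybe present'."""
    return all((page["bloom"] >> pos) & 1 for pos in _positions(f"{key}={value}"))


def lookup(pages, key, value):
    """Return matching records and the fraction of pages skipped undecompressed."""
    hits, skipped = [], 0
    for page in pages:
        if not might_contain(page, key, value):
            skipped += 1
            continue
        for line in zlib.decompress(page["payload"]).decode("utf-8").splitlines():
            rec = json.loads(line)
            if rec.get(key) == value:
                hits.append(rec)
    return hits, skipped / len(pages)


def bloom_density(page):
    """Fraction of filter bits that are set."""
    return bin(page["bloom"]).count("1") / BLOOM_BITS


records = [{"user": i, "status": "ok"} for i in range(400)]
pages = [build_page(records[i:i + 10]) for i in range(0, 400, 10)]
hits, skip_ratio = lookup(pages, "user", 123)
print(hits, f"skip ratio = {skip_ratio:.3f}")
```

A Bloom filter never produces false negatives, so the page holding the record is always read; the skip ratio depends on how many other pages produce false positives, which in turn depends on filter size and density.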

Savings/TB = (1 − 0.195) × Price_per_GB × 1000
Example: $0.05/GB → ≈ $40/TB, $0.25/GB → ≈ $200/TB
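The savings formula can be checked with a few lines of Python. This is a direct transcription of the formula above; the function name `savings_per_tb` and its parameter names are illustrative, and the 0.195 default is the demo's combined ratio, which may differ on your data.

```python
def savings_per_tb(price_per_gb, combined_ratio=0.195, gb_per_tb=1000):
    """Storage/egress savings per TB of raw JSON at a given price per GB."""
    return (1 - combined_ratio) * price_per_gb * gb_per_tb


for price in (0.05, 0.25):
    print(f"${price:.2f}/GB -> ${savings_per_tb(price):.2f}/TB")
```

The two prices work out to 40.25 and 201.25, which the example above rounds to about $40/TB and $200/TB.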
python samples/quick_demo.py
Prints compression ratio, skip rate, Bloom density, and lookup latency (p50/p95/p99).
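For readers who want to measure lookup latency on their own code path, the following is a rough sketch of how p50/p95/p99 figures are commonly derived. It is not taken from `quick_demo.py`; the helper name `latency_percentiles` and the run count are assumptions.

```python
import statistics
import time


def latency_percentiles(fn, runs=1000):
    """Time fn() repeatedly and return p50/p95/p99 in milliseconds."""
    samples_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    # quantiles(n=100) returns 99 cut points; index 49 is the median.
    cuts = statistics.quantiles(samples_ms, n=100)
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}


print(latency_percentiles(lambda: sum(range(1000))))
```

Results vary with hardware, warm-up, and cache state, so compare percentiles only between runs taken on the same machine.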
Demo package (Release v0.1.0):
- Includes Python wheel, .see files, demo scripts, metrics, and OnePager PDF.
- Reproducible on Windows / macOS / Linux.
- Verify integrity using:
pwsh tools/verify_checksums.ps1 # or manually check SHA256SUMS.txt
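If PowerShell is not available, a manual check against SHA256SUMS.txt can be sketched in Python. This assumes the manifest uses the common `sha256sum` layout (`<hex digest>  <relative path>` per line), which the release does not state explicitly; `verify_checksums` here is a hypothetical helper, not a port of the shipped script.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large artifacts do not need to fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as fh:
        for block in iter(lambda: fh.read(chunk_size), b""):
            h.update(block)
    return h.hexdigest()


def verify_checksums(manifest):
    """Return {relative_path: bool}; paths resolve relative to the manifest."""
    results = {}
    for line in manifest.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # sha256sum marks binary mode with '*'
        target = manifest.parent / name
        results[name] = target.is_file() and sha256_of(target) == expected.lower()
    return results


if __name__ == "__main__":
    for name, ok in verify_checksums(Path("SHA256SUMS.txt")).items():
        print("OK  " if ok else "FAIL", name)
```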
KPI (demo): combined ≈ 19.5%, lookup p50 ≈ 0.18 ms, skip ≈ 99%, bloom ≈ 0.30.
Tradeoff: not always smaller than Zstd, but stays searchable while compressed, cutting I/O and CPU costs.
- Zstd-only can be smaller, but not searchable; you still pay I/O + CPU to decompress and parse JSON.
- SEE trades a small size increase for millisecond lookups and page-level random access, reducing I/O and CPU and resulting in better TCO.
- Q. Will it ever be larger than Zstd?
  A. Sometimes yes; in return you get ms lookups and ~99% skipping. For I/O/CPU-bound workloads, TCO decreases.
- Q. Best-fit data?
  A. Repetitive JSON/NDJSON such as logs, events, telemetry, and metrics.
- Q. How long to reproduce?
  A. About 10 minutes using the included Demo ZIP.
- Q. Why not build a separate index?
  A. Separate indexes add extra I/O, space, and consistency risk. SEE keeps searchability inside the storage format, reducing random I/O and parsing overhead.
- Q. How to tune for different data?
  A. Adjust Bloom density (default ≈ 0.30, works best in 0.25–0.55). Demo prints all metrics for validation.
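As background for tuning, the textbook Bloom filter relations can be sketched as follows. This assumes "Bloom density" means the fraction of filter bits that are set, which the project does not define explicitly; SEE's actual parameters and hash count are not disclosed, so the numbers below are illustrative only. With n items, k hash functions, and m bits, the expected density is about 1 − e^(−kn/m), and the false-positive rate is about density^k.

```python
import math


def expected_density(n_items, m_bits, k_hashes):
    """Expected fraction of set bits after inserting n_items."""
    return 1.0 - math.exp(-k_hashes * n_items / m_bits)


def false_positive_rate(density, k_hashes):
    """Probability that all k probed bits are set for an absent item."""
    return density ** k_hashes


def bits_for_density(n_items, k_hashes, target_density):
    """Smallest filter size whose expected density stays at or below the target."""
    return math.ceil(-k_hashes * n_items / math.log(1.0 - target_density))


m = bits_for_density(n_items=1000, k_hashes=4, target_density=0.30)
d = expected_density(1000, m, 4)
print(m, round(d, 4), round(false_positive_rate(d, 4), 4))
```

Lower density means fewer false positives (more pages skipped) at the cost of a larger filter, which is the trade-off the density setting controls.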
What's included in the Release ZIP
- Python Wheel (.whl)
- Demo scripts: samples/quick_demo.py, samples/quick_bench.py (prints KPIs)
- OnePager (PDF) and metrics/summaries
- Integrity check script: tools/verify_checksums.ps1
- README_FIRST.md: concise reproduction guide
VDR (Virtual Data Room): Evaluation Package
What it is
The SEE VDR is a private, NDA-only evaluation bundle that lets third parties reproduce our key KPIs on their own machine:
- Compression: combined size ≈ 19.5% of raw
- Lookup latency: p50 ≈ 0.18 ms
- Skipping: ~99% page-level skip
What it contains (high level)
- Sample .see artifacts with minimal metadata (for reproducible tests)
- A prebuilt evaluation wheel (binary-only) for quick local runs
- KPI summaries (CSV/JSON) and a frozen results snapshot
- Simple verification scripts (checksums / quality-gate)
- A concise One-Pager and evaluator README
Implementation details (core algorithms, dictionaries, low-level parameters) remain proprietary and are not disclosed in this repository.
Access policy
- Distributed on request under NDA (no public download).
- To request access, please contact us via LinkedIn (see Official Links & Profiles) with the subject: "SEE VDR Access".
- Redistribution, reverse engineering, and public benchmarking of VDR binaries are prohibited.
- An Evaluation EULA applies in addition to the NDA.
How evaluators use it (under NDA)
- Verify package integrity (checksums script).
- Install the provided evaluation wheel into a clean virtual environment.
- Run the 10-minute demo to print ratio / skip / bloom / p50–p99.
- Compare local output with the included KPI snapshot (apples-to-apples).
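The final comparison step could be automated along these lines. This is only a sketch: the metric keys, the dictionary shape, and the 10% relative tolerance are assumptions, since the snapshot's actual CSV/JSON schema is not described here.

```python
import math


def compare_kpis(local, snapshot, rel_tol=0.10):
    """Flag each snapshot metric as ok, missing, or drifted beyond rel_tol."""
    report = {}
    for name, expected in snapshot.items():
        actual = local.get(name)
        if actual is None:
            report[name] = "missing"
        elif math.isclose(actual, expected, rel_tol=rel_tol):
            report[name] = "ok"
        else:
            report[name] = f"drift: local={actual} snapshot={expected}"
    return report


# Hypothetical metric names; values taken from the demo KPIs above.
snapshot = {"combined_ratio": 0.195, "lookup_p50_ms": 0.18, "skip_ratio": 0.99}
local = {"combined_ratio": 0.196, "lookup_p50_ms": 0.30}
print(compare_kpis(local, snapshot))
```

Size and skip ratios should match closely across machines, while latency figures depend on hardware, so a looser tolerance for the latency metrics may be appropriate.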
Why VDR?
- Ensures reproducible, verifiable numbers without exposing the core IP.
- Shortens technical diligence for FinOps / M&A / platform teams while keeping trade secrets protected.
If you only need the public demo, see the repository's samples and Release assets.
The VDR is reserved for formal evaluations (NDA) that require deeper verification.
Note: The GitHub Discussions "Enterprise (NDA)" category is public.
Do not post confidential information or emails there; use the private form above.
Official Links & Profiles
If you're interested in schema-aware compression, reproducible benchmarks, or potential collaboration, feel free to connect via LinkedIn.
From Bytes to Balance Sheets: SEE (Semantic Entropy Encoding)
Optional: For reproducibility or citation
If you reproduce benchmarks or use SEE in your research, please cite:
SEE (Semantic Entropy Encoding)
https://github.com/kodomonocch1/see_proto
Posted on 1760716606
