🔥 Check out this must-read post from Hacker News 📖
Tool that scans a project and generates a ContextCodeCache – a .ccc
directory holding a compact, machine-readable map of every source file: its
constants, functions (with return types and doc summaries), intra-file call
graph, and marker notes (TODO/FIXME/…). It is designed to give agents a
cheap, always-fresh index of a project.
Please ⭐ if you find this useful 💚
Requires Rust ≥ 1.77 for the tree-sitter 0.25 stack. Some transitive deps use
edition 2024, so a recent cargo is also needed.
cargo build --release # binary @ target/release/ccc
./target/release/ccc install # copy it onto your PATH (Linux)
ccc install copies the running binary into ~/.local/bin (the user-local bin
dir on Linux — no sudo needed) and marks it executable. Pass --dir to
choose a different directory, or --force to overwrite an existing ccc. If the
target directory isn’t on your $PATH, it prints the line to add to your shell
profile.
ccc scan [PATH] # regen PATH/.ccc (PATH defaults to ".")
ccc scan [PATH] --tokens # also pre-encode the cache into a token stream
ccc check [PATH] # exit non-zero if .ccc is stale - for CI
ccc check [PATH] --format json # same, but print changed cache files as JSON
ccc tokenize [PATH] # pre-encode an existing .ccc into tokens.bin + tokens.json
ccc install [--dir DIR] # install the ccc binary onto your PATH (Linux)
ccc check --format json prints a single line of JSON, in which files holds the
repo-relative paths of the out-of-date cache entries. It’s meant to be consumed
by other tooling; the bundled GitHub Action feeds that array to downstream jobs
via fromJSON(...).
scan rewrites every per-file entry plus the CCC.md index, so committed diffs
always come from re-running the generator. check regenerates in memory and
compares against the committed .ccc, ignoring generation timestamps, so a
freshness gate never fails purely because time passed.
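As a rough illustration of a timestamp-insensitive comparison, the sketch below strips the parenthesised timestamp from an entry's header line before comparing two entries. It is not the tool's actual code: normalize and is_stale are hypothetical names, and it assumes the timestamp appears only in the first header line, as in the per-file entry format.

```rust
/// Drop the parenthesised timestamp from an entry's header line, if present.
/// Assumption: only the first line carries a generation timestamp, in the
/// form `# name (yyyymmdd-hh-mm-ss) UTC`.
fn normalize(entry: &str) -> String {
    let mut lines = entry.lines();
    let first = lines.next().unwrap_or("");
    let mut out = match (first.find('('), first.rfind(')')) {
        (Some(open), Some(close)) if open < close => {
            format!("{}{}", &first[..open], &first[close + 1..])
        }
        _ => first.to_string(),
    };
    // Every remaining line is compared verbatim.
    for line in lines {
        out.push('\n');
        out.push_str(line);
    }
    out
}

/// Hypothetical helper: true when two entries differ in more than the timestamp.
fn is_stale(committed: &str, regenerated: &str) -> bool {
    normalize(committed) != normalize(regenerated)
}

fn main() {
    let old = "# math.rs.md (20240101-00-00-00) UTC\n# const\n- L4@PI:f64";
    let new = "# math.rs.md (20250101-12-30-00) UTC\n# const\n- L4@PI:f64";
    assert!(!is_stale(old, new)); // only the timestamp moved

    let edited = "# math.rs.md (20250101-12-30-00) UTC\n# const\n- L5@PI:f64";
    assert!(is_stale(old, edited)); // the content itself changed
}
```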
.ccc/
├── CCC.md # index: totals + one line per file
├── src-main.rs.md # <dir>-<name>.<ext>.md, one per source file
└── src-math.rs.md
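The naming pattern in the listing can be sketched as a small function. This is inferred only from the two names shown (src/main.rs becoming src-main.rs.md); cache_entry_name is a hypothetical helper, and how the real tool treats edge cases such as paths that already contain a dash is not covered here.

```rust
/// Map a repo-relative source path to its cache entry file name.
/// Inferred from the listing: path separators become '-' and ".md" is appended.
fn cache_entry_name(source_path: &str) -> String {
    format!("{}.md", source_path.replace('/', "-"))
}

fn main() {
    assert_eq!(cache_entry_name("src/main.rs"), "src-main.rs.md");
    assert_eq!(cache_entry_name("src/math.rs"), "src-math.rs.md");
    println!("{}", cache_entry_name("src/math.rs"));
}
```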
Each per-file entry follows this format:
# math.rs.md (yyyymmdd-hh-mm-ss) UTC
# source: src/math.rs [rust]
# const
- L4@PI:f64
# funcs
- L7:8@square:f64 // Square a number.
- L12:8@circle_area:f64 // Area of a circle with the given radius.
# refs
- circle_area@L14 calls L7:8@square:f64
# note
- @L13 NOTE: uses the truncated PI above, so results are approximate.
- const – file-level constants/statics, one per line with the line number, name, and type (for example, L4@PI:f64). Since not every language marks constants, this uses each language’s convention: Rust const/static and Go const/var specs; Python only SHOUTING_SNEK_CASE module bindings; JS/TS only const declarations (not let/var). Class/impl attributes in Python and JS/TS are treated as members, not file consts.
- funcs – definitions, each with its location, name, return type, and doc summary (for example, L7:8@square:f64 // Square a number.).
- refs – intra-file call graph, resolved by scope (not just by name). Each edge gives the caller and call-site line, followed by the callee’s funcs entry (for example, circle_area@L14 calls L7:8@square:f64). A bare foo() binds to a same-file free function foo; a receiver call (self.foo(), this.foo(), or a Go recv.Foo()) binds to a method foo on the enclosing type. Calls on any other receiver (other.foo()) need type information to resolve, so no edge is emitted rather than guessing one from the name.
- note – marker comments (TODO, FIXME, XXX, HACK, BUG, NOTE, SAFETY).
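The scope rule for refs can be sketched as follows. This is a simplified model and not the extractor's real data structures: Receiver, Func, resolve, and sample_funcs are all hypothetical names, and the function table is hand-written where the real tool would build it from the syntax tree.

```rust
/// How a call site is written. Hypothetical model of the three cases above.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Receiver {
    Bare,     // foo()
    SelfLike, // self.foo(), this.foo(), or a Go recv.Foo()
    Other,    // other.foo(): would need type information
}

/// A same-file definition; owner is the enclosing type for methods.
#[derive(Debug, PartialEq)]
struct Func {
    name: &'static str,
    owner: Option<&'static str>,
    line: u32,
}

/// Resolve a call to a same-file definition, or emit no edge (None).
fn resolve<'a>(
    funcs: &'a [Func],
    enclosing_type: Option<&str>,
    receiver: Receiver,
    callee: &str,
) -> Option<&'a Func> {
    match receiver {
        // A bare call binds only to a free function of that name.
        Receiver::Bare => funcs.iter().find(|f| f.owner.is_none() && f.name == callee),
        // A receiver call binds to a method on the enclosing type.
        Receiver::SelfLike => {
            let ty = enclosing_type?;
            funcs.iter().find(|f| f.owner == Some(ty) && f.name == callee)
        }
        // Any other receiver: do not guess from the name.
        Receiver::Other => None,
    }
}

/// Made-up definitions for illustration.
fn sample_funcs() -> Vec<Func> {
    vec![
        Func { name: "square", owner: None, line: 7 },
        Func { name: "area", owner: Some("Circle"), line: 20 },
    ]
}

fn main() {
    let funcs = sample_funcs();
    let hit = resolve(&funcs, None, Receiver::Bare, "square");
    assert_eq!(hit.map(|f| f.line), Some(7));
    assert!(resolve(&funcs, Some("Circle"), Receiver::Other, "area").is_none());
}
```

Returning None for the third case mirrors the choice described above: a missing edge is preferred over one that might be wrong.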
A worked example lives in example/ with its generated
example/.ccc/.
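For consumers that want to read entries back, a funcs line can be parsed with a few string splits. The sketch below is based only on the sample entry L7:8@square:f64 // Square a number. and is not a parser shipped with the tool. FuncEntry and parse_func_entry are hypothetical names, the meaning of the second number is not assumed (it is stored as second), and lines without a return type are simply rejected.

```rust
/// One parsed funcs entry. Field names are illustrative only.
#[derive(Debug, PartialEq)]
struct FuncEntry {
    line: u32,
    second: u32, // the number after the first ':'; its meaning is not assumed here
    name: String,
    return_type: String,
    doc: Option<String>,
}

/// Parse a line such as "- L7:8@square:f64 // Square a number.".
/// Returns None for anything that does not match that shape.
fn parse_func_entry(raw: &str) -> Option<FuncEntry> {
    let body = raw.trim().strip_prefix("- ")?.strip_prefix('L')?;
    let (location, rest) = body.split_once('@')?;
    let (line, second) = location.split_once(':')?;
    // Split off the optional doc summary before reading the signature.
    let (signature, doc) = match rest.split_once(" // ") {
        Some((sig, doc)) => (sig, Some(doc.trim().to_string())),
        None => (rest, None),
    };
    // Split at the first ':' so a type containing "::" stays intact.
    let (name, return_type) = signature.split_once(':')?;
    Some(FuncEntry {
        line: line.parse().ok()?,
        second: second.parse().ok()?,
        name: name.to_string(),
        return_type: return_type.trim().to_string(),
        doc,
    })
}

fn main() {
    let entry = parse_func_entry("- L7:8@square:f64 // Square a number.").unwrap();
    assert_eq!(entry.name, "square");
    assert_eq!(entry.return_type, "f64");
    println!("{:?}", entry);
}
```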
Token stream (pre-encoded cache)
The token stream is not compatible with Anthropic models. The IDs are approximate tiktoken
IDs (an OpenAI vocabulary), which can be used with DeepSeek V4-Pro etc.
Use it for a downstream model that shares the OpenAI vocab, or for rough size estimates.
If using Claude, use the .ccc markdown as context.
For exact Claude token counts, use Anthropic’s count_tokens endpoint.
tokens.json carries this caveat inline (approximate: true + a note).
ccc tokenize (or ccc scan --tokens) encodes the whole .ccc corpus with a
pretrained tiktoken vocabulary (o200k_base by default, --encoding cl100k_base
also supported) and writes:
.ccc/
├── tokens.bin # little-endian u32 token IDs for every cache file, concatenated
└── tokens.json # index: encoding, layout, and per-file ranges in tokens
Consumers load raw tokens with no re-tokenization – read tokens.bin as a
u32 slice and index into it via tokens.json. The TokenCache
loader does exactly this, and every tokenize run verifies that the persisted
stream decodes back to the byte-identical corpus:
let cache = codecache::TokenCache::load(project_root)?;
let ids: &[u32] = cache.file("src-main.rs.md").unwrap(); // raw tokens, ready to use
let text = cache.decode(ids)?; // optional: back to markdown
Token artifacts are derived, so a plain ccc scan clears them; re-run with
--tokens (or ccc tokenize) to refresh.
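For consumers that do not use the TokenCache loader, reading the raw stream by hand might look like the sketch below. Only the little-endian u32 layout of tokens.bin comes from the description above. The Span record with an offset and length is a hypothetical stand-in for whatever tokens.json actually stores per file, and the bytes are made up.

```rust
/// Decode a little-endian u32 stream, as described for tokens.bin.
/// Returns None if the byte length is not a multiple of four.
fn decode_u32_le(bytes: &[u8]) -> Option<Vec<u32>> {
    if bytes.len() % 4 != 0 {
        return None;
    }
    Some(
        bytes
            .chunks_exact(4)
            .map(|c| u32::from_le_bytes([c[0], c[1], c[2], c[3]]))
            .collect(),
    )
}

/// Hypothetical index record: where one cache file's tokens sit in the stream.
struct Span {
    offset: usize,
    len: usize,
}

/// Borrow one file's tokens, or None if the span falls outside the stream.
fn slice_for<'a>(ids: &'a [u32], span: &Span) -> Option<&'a [u32]> {
    let end = span.offset.checked_add(span.len)?;
    ids.get(span.offset..end)
}

fn main() {
    // Three made-up token IDs: 1, 256, and u32::MAX.
    let bytes = [1u8, 0, 0, 0, 0, 1, 0, 0, 0xff, 0xff, 0xff, 0xff];
    let ids = decode_u32_le(&bytes).unwrap();
    assert_eq!(ids, vec![1, 256, u32::MAX]);

    let second_file = Span { offset: 1, len: 2 };
    assert_eq!(slice_for(&ids, &second_file), Some(&ids[1..3]));
}
```

In a real consumer the bytes would come from std::fs::read on .ccc/tokens.bin, and the spans from parsing tokens.json.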
Supported languages: Rust, Python, JavaScript, TypeScript (+ TSX), and Go, via
tree-sitter. Unsupported files are skipped, as are hidden dirs and common
build/vendor dirs (target, node_modules, …); .gitignore rules are honored.
Adding a language is a matter of extending src/languages.rs (extension map,
grammar, and node-kind sets) – the extractor in src/extract.rs is
grammar-agnostic.
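To give a feel for what an extension map involves, here is a minimal sketch. It does not reproduce src/languages.rs: language_for is a hypothetical function, and apart from the rust tag seen in the per-file entry header, the language tags are assumptions. Returning None models a file that would be skipped as unsupported.

```rust
use std::path::Path;

/// Map a file path to a language tag by extension.
/// Only "rust" is a tag seen in the cache format; the others are assumed.
fn language_for(path: &str) -> Option<&'static str> {
    match Path::new(path).extension()?.to_str()? {
        "rs" => Some("rust"),
        "py" => Some("python"),
        "js" => Some("javascript"),
        "ts" => Some("typescript"),
        "tsx" => Some("tsx"),
        "go" => Some("go"),
        // Anything else is treated as unsupported and skipped.
        _ => None,
    }
}

fn main() {
    assert_eq!(language_for("src/math.rs"), Some("rust"));
    assert_eq!(language_for("web/app.tsx"), Some("tsx"));
    assert_eq!(language_for("README.md"), None);
}
```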
Because agents rely on the cache, regenerate it whenever tracked source changes.
A CI step of ccc check . fails the build if the cache is out of date.
The bundled workflow .github/workflows/ccc-update.yaml
automates this: on pushes to main (and weekly) it checks each root with
ccc check --format json, and if the cache drifted it regenerates and opens a
pull request authored by CCC-bot. The check step exposes stale,
changed_files (JSON array), and changed_count as job outputs for downstream
jobs. Edit the CCC_ROOTS env var to match your project’s cache directories.
🕒 **Posted on**: 1783116128
