sourcegraph/zoekt: Fast trigram based code search

💥 Explore this awesome post from Hacker News 📖

📂 Category:

💡 Key idea:

"Zoekt, en gij zult spinazie eten" - Jan Eertink

("seek, and ye shall eat spinach" - My primary school teacher)

Zoekt is a text search engine intended for use with source
code. (Pronunciation: roughly as you would pronounce “zooked” in English)

Note: This has been the maintained source for Zoekt since 2017, when it was forked from the
original repository github.com/google/zoekt.

Zoekt supports fast substring and regexp matching on source code, with a rich query language
that includes boolean operators (and, or, not). It can search individual repositories, and search
across many repositories in a large codebase. Zoekt ranks search results using a combination of code-related signals
like whether the match is on a symbol. Because of its general design based on trigram indexing and syntactic
parsing, it works well for a variety of programming languages.

The two main ways to use the project are

  • Through individual commands, to index repositories and perform searches through Zoekt’s query language
  • Or, through the indexserver and webserver, which support syncing repositories from a code host and searching them through a web UI or API

For more details on Zoekt’s design, see the docs directory.

go get github.com/sourcegraph/zoekt/

Note: It is also recommended to install Universal ctags, as symbol
information is a key signal in ranking search results. See ctags.md for more information.

Zoekt supports indexing and searching repositories on the command line. This is most helpful
for simple local usage, or for testing and development.

Indexing a local git repo

go install github.com/sourcegraph/zoekt/cmd/zoekt-git-index
$GOPATH/bin/zoekt-git-index -index ~/.zoekt /path/to/repo

Indexing a local directory (not git-specific)

go install github.com/sourcegraph/zoekt/cmd/zoekt-index
$GOPATH/bin/zoekt-index -index ~/.zoekt /path/to/repo
go install github.com/sourcegraph/zoekt/cmd/zoekt
$GOPATH/bin/zoekt 'hello'
$GOPATH/bin/zoekt 'hello file:README'

Zoekt also contains an index server and web server to support larger-scale indexing and searching
of remote repositories. The index server can be configured to periodically fetch and reindex repositories
from a code host. The webserver can be configured to serve search results through a web UI or API.

Indexing a GitHub organization

go install github.com/sourcegraph/zoekt/cmd/zoekt-indexserver

echo YOUR_GITHUB_TOKEN_HERE > token.txt
echo '[Tell us your thoughts in comments!]' > config.json

$GOPATH/bin/zoekt-indexserver -mirror_config config.json -data_dir ~/.zoekt/ 

This will fetch all repos under ‘github.com/apache’, then index the repositories. The indexserver takes care of
periodically fetching and indexing new data, and cleaning up logfiles. See config.go
for more details on this configuration.

go install github.com/sourcegraph/zoekt/cmd/zoekt-webserver
$GOPATH/bin/zoekt-webserver -index ~/.zoekt/

This will start a web server with a simple search UI at http://localhost:6070.
See the query syntax docs for more details on the query
language.

If you start the web server with -rpc, it exposes a simple JSON search
API at http://localhost:6070/api/search.

The JSON API supports advanced features including:

  • Streaming search results (using the FlushWallTime option)
  • Alternative BM25 scoring (using the UseBM25Scoring option)
  • Context lines around matches (using the NumContextLines option)

Finally, the web server exposes a gRPC API that supports structured query objects and advanced search options.

Thanks to Han-Wen Nienhuys for creating Zoekt. Thanks to Alexander Neubeck for
coming up with this idea, and helping Han-Wen Nienhuys flesh it out.

💬 What do you think?

#️⃣ #sourcegraphzoekt #Fast #trigram #based #code #search

🕒 Posted on 1764912083

By

Leave a Reply

Your email address will not be published. Required fields are marked *