The OSM dataset is huge, and keeps growing every day. Great news, of course, but sometimes the sheer volume can be overwhelming – there are just gobs and gobs of data!
Hence, we created GOB (“Geo-Object Bundle”), a new file format that makes tackling OSM data faster and easier. It’s a companion format to our now-familiar Geo-Object Library (GOL): essentially, a GOB is a tightly compressed GOL with its indexes stripped.
To support this new format, GOL Tool 2.1 has two new commands: save, which writes GOLs as GOBs, and load, which loads GOBs into a GOL. (Of course, like all of the GeoDesk Toolkit, the GOL Tool is free and open-source.)
- GOB files are on average half the size of a GOL, and 30% smaller than PBFs.
- Importing a GOB is 5 times faster than building a GOL from a PBF. A modern system loads a planet-size GOB into a GOL in 3 minutes. The speed advantage grows more pronounced on memory-constrained machines: gol build starts paging heavily with less than 32 GB of RAM, whereas gol load requires minimal resources (even a decade-old laptop loads the whole planet in under an hour).
- GOBs are organized into tiles, so it’s easy to extract regional subsets (basically at file-copy speed) and stitch them back together; that makes GOB a convenient format for archiving and distributing geodata.
The image above shows some of the tiling structure, which mimics that of tile renderers. On the left, the smallest squares are zoom 6; the right shows the most granular level (zoom 12). A typical planet GOB has about 60,000 tiles.
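To get a feel for these zoom levels, here is a minimal sketch that maps a coordinate to a tile. It assumes the standard Web Mercator ("slippy map") numbering used by most tile renderers; the post only says that GOB tiling mimics tile renderers, so the actual GOB tile numbering may differ. The function name `lonlat_to_tile` is hypothetical.

```python
import math

def lonlat_to_tile(lon, lat, zoom):
    """Return (column, row) of the Web Mercator tile containing lon/lat.

    Assumption: standard slippy-map numbering, not necessarily GOB's own.
    """
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    col = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    row = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so that lon = 180 or extreme latitudes stay inside the grid
    return min(max(col, 0), n - 1), min(max(row, 0), n - 1)

# A point near the Bodensee at the coarsest and finest zoom mentioned above
for zoom in (6, 12):
    print(zoom, lonlat_to_tile(9.4, 47.6, zoom))

# Number of tiles in a complete zoom-12 grid
print(4 ** 12)
```

A complete zoom-12 grid would have 4^12 (about 16.8 million) tiles, so a planet GOB with roughly 60,000 tiles evidently keeps most of the world at coarser zoom levels and only subdivides where needed.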
Below are some size statistics for the planet file and popular regional extracts (without metadata):
              PBF        GOL        (vs. PBF)   GOB        (vs. PBF)
Planet        65.4 GB    93.6 GB    +43.1%      46.0 GB    -29.7%
California    1.18 GB    1.59 GB    +35.0%      770 MB     -36.5%
France        4.54 GB    5.89 GB    +29.7%      2.84 GB    -36.3%
Germany       4.29 GB    5.92 GB    +38.0%      2.67 GB    -37.5%
Italy         1.96 GB    2.63 GB    +34.0%      1.34 GB    -31.6%
Japan         2.13 GB    2.91 GB    +36.1%      1.34 GB    -37.0%
Poland        1.84 GB    2.72 GB    +47.6%      1.29 GB    -29.7%
Switzerland   487 MB     634 MB     +30.1%      311 MB     -36.2%
Dense, well-mapped areas tend to compress best as GOB. Less complete regions are below average in terms of GOB’s size advantage (GOBs for Brazil and China are only 23% smaller).
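As a quick check on how the percentages relate to the sizes, the sketch below recomputes the planet figures, treating the PBF as the baseline. The helper name `relative_size` is hypothetical, and the regional rows may not reproduce exactly from the rounded sizes shown.

```python
def relative_size(size_gb, baseline_gb):
    """Size difference versus the baseline, in percent."""
    return (size_gb / baseline_gb - 1.0) * 100.0

# Planet figures from the statistics above, in GB
pbf, gol, gob = 65.4, 93.6, 46.0

print(f"GOL vs PBF: {relative_size(gol, pbf):+.1f}%")
print(f"GOB vs PBF: {relative_size(gob, pbf):+.1f}%")
print(f"GOB vs GOL: {relative_size(gob, gol):+.1f}%")
```

The last line compares GOB against GOL directly, which is where the "half the size of a GOL" figure comes from.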
Just like GOLs, GOBs don’t store:
- metadata (timestamp of last edit, changeset, username, etc.)
- history (each GOB is a snapshot of the OSM dataset)
The GOB format is therefore not intended for editing, but for archival and distribution.
You will need GOL Tool 2.1 or above.
To export a GOL as a GOB:
gol save <gol-file> [<gob-file>]
If <gob-file> is omitted, the GOB uses the same base name as the GOL. The .gol and .gob extensions are optional.
To limit the export to a specific area, use the --area (-a) option. You can specify a (multi)polygon as WKT, GeoJSON or simple coordinates (lon,lat pairs, rings are closed automatically), either directly or as a file. If no file extension is given, .wkt is assumed.
For example:
gol save world bodensee -a 9.55,47.4,8.78,47.66,9.01,47.88,9.85,47.58,9.82,47.46
exports the tiles covering the region around the Bodensee (Lake Constance).
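If you would rather keep an area in a file, the simple coordinate list can be turned into WKT first. This is a minimal sketch, not part of the GOL Tool: the function name `coords_to_wkt` is hypothetical, and it closes the ring explicitly because WKT polygons require the first and last points to match.

```python
def coords_to_wkt(coords):
    """Turn a flat 'lon,lat,lon,lat,...' string into a WKT polygon."""
    values = [float(v) for v in coords.split(",")]
    if len(values) % 2 or len(values) < 6:
        raise ValueError("need at least three lon,lat pairs")
    # Pair up alternating values as (lon, lat)
    ring = list(zip(values[0::2], values[1::2]))
    if ring[0] != ring[-1]:
        ring.append(ring[0])  # close the ring explicitly for WKT
    points = ", ".join(f"{lon} {lat}" for lon, lat in ring)
    return f"POLYGON (({points}))"

print(coords_to_wkt("9.55,47.4,8.78,47.66,9.01,47.88,9.85,47.58,9.82,47.46"))
```

Saved as bodensee.wkt, the result could then be passed by name (-a bodensee), since .wkt is assumed when no extension is given.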
To import tiles into a GOL:
gol load <gol-file> [<gob-file>]
As with save, if <gob-file> is omitted, the base name of the GOL is used. If the GOL does not exist, it is created. To load just a specific region, restrict it with the -a option.
gol load japan -a shikoku
loads tiles from japan.gob into japan.gol (creating it if it doesn’t yet exist), but only those intersecting the area defined in shikoku.wkt.
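For scripted workflows, the two commands can be assembled programmatically. The sketch below only builds the argument list, using nothing beyond the save and load commands and the -a option described in this post; the helper name `gol_command` is hypothetical, and placing -a after the file names simply follows the examples above.

```python
import shlex

def gol_command(command, gol, gob=None, area=None):
    """Build the argument list for a `gol save` or `gol load` call."""
    if command not in ("save", "load"):
        raise ValueError("only 'save' and 'load' are covered here")
    args = ["gol", command, gol]
    if gob is not None:
        args.append(gob)  # optional GOB name; defaults to the GOL's base name
    if area is not None:
        args += ["-a", area]  # polygon given inline or as a file name
    return args

cmd = gol_command("load", "japan", area="shikoku")
print(shlex.join(cmd))
# To actually run it (requires GOL Tool 2.1 or above on the PATH):
# import subprocess; subprocess.run(cmd, check=True)
```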
This is still a work in progress, so the format may change. I’m experimenting with different compression algos beyond zlib to make it even tighter and faster (zstd didn’t yield any significant gains). I’m also in the process of enabling gol load to download a GOB directly from a URL and build the GOL in the background, which would bring the wall-clock import time to zero.
As always, questions and feedback are welcome! Please stop by on GitHub or at @geodesk@en.osm.town.
🕒 Posted on 1761349333
