Smallest footprint: least disk per byte stored

When storage cost is the constraint, space amplification is the number that matters. LSM engines compress 1 KB values below their raw size; one engine here uses 22x.

This is the workload where the bill is the constraint: you have a lot of data, it lives somewhere you pay for by the gigabyte, and the question is which engine stores the same data in the least space.

The number is space amplification: bytes on disk divided by bytes of actual data. 1.0x is break-even. Below 1.0x means the engine compressed your data. Above 1.0x means overhead, old versions, or padding you are paying to keep.

The numbers

On-disk size after writing 100,000 keys of 1 KB each, as a multiple of the raw data:

Engine	Shape	Space amplification	Reads this as
goleveldb	LSM	0.15x	Compressed to a seventh of raw
pebble	LSM	0.26x	Compressed to a quarter of raw
buntdb	in-memory B-tree	1.03x	About break-even
pogreb	hash-log	2.04x	Twice the data
bbolt	B+tree	2.33x	Page overhead
sqlite	B-tree	4.50x	Page and index overhead
tamnd/kv	hash-log	4.97x	Log plus index overhead
badger	LSM	22.16x	Value log not yet reclaimed

The LSM engines that sort and compress in the background, goleveldb and pebble, store 1 KB values in a fraction of their raw size. That is the LSM payoff: the same background work that makes writes cheap also packs the result tightly.

badger is the cautionary tale. It is one of the fastest writers, but it keeps values in a separate log and reclaims dead space lazily, so right after a write-heavy run it can sit at 22x the data size. That space comes back as its garbage collector runs, but if you provision disk for the steady state, provision generously.

Watch the update workloads

The fillrandom numbers above are fresh writes. Updates change the picture, because some engines keep old copies until they compact:

Engine	Fresh writes	Under hot-key updates
pebble	0.26x	0.3x
goleveldb	0.15x	0.2x
buntdb	1.0x	1.0x
badger	22x	22x
tamnd/kv	5.0x	53x

tamnd/kv holds 5x on fresh writes but balloons to 53x under a hot-key update burst, because every update appends a new copy and the old ones wait for compaction. If your data churns, the mixed scenario covers this in full; the short version is that tamnd/kv is a poor fit for update-heavy storage on a disk budget.

What to pick

goleveldb or pebble when storage cost is the constraint. They store your data smaller than it arrives and stay compact under updates.
buntdb if you want a predictable 1.0x and the dataset fits in RAM.

What to avoid

badger when disk is tight, unless you account for its lazy reclamation.
tamnd/kv for update-heavy data on a disk budget, because of the 53x churn.