Skip to content
kvbench

Write ingest: a firehose of new keys

Event logs, metrics, and bulk loads that write new keys fast. LSM engines like pebble and badger absorb writes without rewriting a tree.

This is the ingest workload: you are writing a stream of new keys as fast as they arrive. Event logs, metrics pipelines, crawl output, bulk imports. Reads happen later or elsewhere; right now the only thing that matters is keeping up with the write rate.

Writing new keys favours engines that do not rewrite a tree on every insert. That is the LSM trade: a write is an in-memory buffer insert plus a small log append, and the sorting happens later in the background.

These numbers are with the disk flush off, so they measure the engine's structural write speed, not the disk. If every write must survive a crash, that is a different question with different winners, on the durable-writes page.

The numbers

Writing 100,000 fresh random keys, 1 KB values, 8 concurrent clients:

Engine Shape M4 EPYC 4-core EPYC 6-core EPYC 8-core
badger LSM 239,000 61,000 22,000 32,000
buntdb in-memory B-tree 230,000 85,000 44,000 63,000
pogreb hash-log 190,000 84,000 50,000 54,000
pebble LSM 97,000 120,000 92,000 72,000
goleveldb LSM 92,000 54,000 30,000 37,000
tamnd/kv hash-log 83,000 25,000 12,000 17,000
bbolt B+tree 38,000 20,000 11,000 10,000
sqlite B-tree 29,000 8,000 3,000 5,000

There is a twist here worth noticing. badger is fastest on the M4 laptop, but pebble is fastest on every Linux server and barely slows as the data grows, because its compaction scales across cores where badger's value-log GC does not. If your ingest runs on a server, pebble is the safer bet; on a laptop or a few cores, badger edges it.

bbolt and sqlite sit at the bottom because a B-tree insert can rewrite a page, the exact cost the LSM shape avoids.

What to pick

  • pebble for sustained ingest on a server. It is the most consistent writer across machines and compresses the result smallest on disk (see footprint).
  • badger for ingest on a laptop or a few cores, where its in-memory write path is fastest, as long as you can afford its disk footprint (it uses 22x the raw data until its background GC catches up).
  • buntdb if the dataset fits in RAM and you want fast writes and fast reads from one engine.

What to avoid

  • bbolt and sqlite for write-heavy ingest. The B-tree page rewrite caps them well below the LSM engines.
  • badger if disk space is tight. Its write speed comes with the highest space amplification measured here.
  • tamnd/kv if ingest is the main job. It is mid-pack on fresh writes and its strength is on the read side, not ingest.