
perf: 32MiB read/write buffer hurts performance #404

Closed · asubiotto opened this issue Nov 18, 2019 · 2 comments

@asubiotto

While testing a pebble-backed on-disk queue implementation (using sha 454f971) for vectorized external storage, I ran into some surprising performance characteristics when changing the capacity of a pebbleMapBatchWriter (i.e., how many k/vs are Set before calling Flush on a batch). The options for Pebble are defined in NewTempEngine: https://github.com/cockroachdb/cockroach/blob/6e1539ac1488f8596376cc3ac32c9b0b334600a5/pkg/storage/engine/temp_engine.go#L117
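For illustration, here is a minimal sketch of the pattern being benchmarked; this is not the actual pebbleMapBatchWriter code, the type and method names are illustrative, and it assumes the current github.com/cockroachdb/pebble import path and API:

```go
package queuebench

import "github.com/cockroachdb/pebble"

// batchWriter buffers Sets in a pebble.Batch and commits once the batch
// exceeds a configurable capacity in bytes. That capacity is the "buffer
// size" knob varied in the benchmark (e.g. 16<<20 vs. 32<<20).
type batchWriter struct {
	db       *pebble.DB
	batch    *pebble.Batch
	capacity int
}

func newBatchWriter(db *pebble.DB, capacity int) *batchWriter {
	return &batchWriter{db: db, batch: db.NewBatch(), capacity: capacity}
}

// Set buffers a k/v pair and flushes if the batch has grown past capacity.
func (w *batchWriter) Set(k, v []byte) error {
	if err := w.batch.Set(k, v, nil); err != nil {
		return err
	}
	if w.batch.Len() >= w.capacity {
		return w.Flush()
	}
	return nil
}

// Flush commits the buffered batch and starts a fresh one.
func (w *batchWriter) Flush() error {
	if err := w.batch.Commit(pebble.NoSync); err != nil {
		return err
	}
	if err := w.batch.Close(); err != nil {
		return err
	}
	w.batch = w.db.NewBatch()
	return nil
}
```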

The benchmark is a single-threaded writer that writes 512MiB of data and then reads all that data back. As I increase the buffer size, the graph looks a bit like this:

[Graph: benchmark runtime vs. batch buffer size, showing a spike at 32MiB]

Using a batch size of 32MiB makes the runtime suddenly spike to >10s, whereas it is ~3s on either side of that value.

The benchmark is BenchmarkQueues on my branch here: https://github.com/asubiotto/cockroach/commit/470481215325e20733a516bdeaa866f0d13ede56#diff-7599d388fb764b9163afbc7b01484ff1R124
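For reference, a rough sketch of the benchmark's shape (this is not the code from the linked branch; it reuses the batchWriter sketch above and assumes the current pebble API, where NewIter returns both an iterator and an error):

```go
// runQueueBenchmark writes dataSize bytes of blockSize-sized values through
// the buffered batch writer, then scans everything back with an iterator.
// Assumes: import ("fmt"; "github.com/cockroachdb/pebble").
func runQueueBenchmark(db *pebble.DB, bufSize, blockSize, dataSize int) error {
	w := newBatchWriter(db, bufSize)
	value := make([]byte, blockSize)
	for i := 0; i*blockSize < dataSize; i++ {
		key := []byte(fmt.Sprintf("k%010d", i))
		if err := w.Set(key, value); err != nil {
			return err
		}
	}
	if err := w.Flush(); err != nil {
		return err
	}

	// Read phase: iterate over everything that was written.
	iter, err := db.NewIter(nil)
	if err != nil {
		return err
	}
	defer iter.Close()
	for iter.First(); iter.Valid(); iter.Next() {
		_ = iter.Value()
	}
	return iter.Error()
}
```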

This issue can be reproduced by running:

```
make bench PKG=./pkg/col/colserde BENCHES=BenchmarkQueue TESTFLAGS="-v -cpuprofile cpu.out -memprofile mem.out -store=pebble -bufsize=32MiB -blocksize=512KiB -datasize=512MiB"
```

and

```
make bench PKG=./pkg/col/colserde BENCHES=BenchmarkQueue TESTFLAGS="-v -cpuprofile cpu16.out -memprofile mem16.out -store=pebble -bufsize=16MiB -blocksize=512KiB -datasize=512MiB"
```

The value size (-blocksize) is 512KiB in this benchmark.

The only difference I see in the CPU profile is a much larger percentage of time taken by memclrNoHeapPointers, possibly pointing to greater GC pressure in the 32MiB case. The allocation profiles look largely similar, although *Batch.grow stands out in both cases; it seems like we should be able to reuse memory there.

It's possible that this is a problem with the code I wrote, but given that the only variable that changes between benchmarks is the size of a buffered batch, it seems unlikely.

@petermattis (Collaborator)

I'm not sure why there is such a big discrepancy at 32MB, though that buffer size does correspond to the large-batch threshold (1/2 of the memtable size). If we set the buffer size to 31MB, performance is good. If we set the memtable size to 128MB, performance is good. The only difference I see is that when using a large batch we cycle through empty memtables, which puts slightly more pressure on the GC. Why this causes such a dramatic slowdown is unclear.
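A rough sketch of the two workarounds described above (the path and sizes are illustrative; this reuses the batchWriter sketch from the first comment):

```go
// Assumes: import ("log"; "github.com/cockroachdb/pebble").

// Workaround 1: raise the memtable size so that a 32MiB batch stays below
// the large-batch threshold (MemTableSize / 2).
db, err := pebble.Open("/tmp/benchdb", &pebble.Options{
	MemTableSize: 128 << 20, // 128MiB memtable -> 64MiB large-batch threshold
})
if err != nil {
	log.Fatal(err)
}
defer db.Close()

// Workaround 2: with the 64MiB memtable implied above, keep the buffered
// batch just under the 32MiB threshold, e.g. flush at 31MiB.
w := newBatchWriter(db, 31<<20)
_ = w
```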

@petermattis (Collaborator)

With the move to manual memory management for the Cache and memtable (#523, #527, #529), this is likely no longer a problem. I attempted to verify that, but the BenchmarkDiskQueue benchmark no longer has support for Pebble.
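For context, a brief usage sketch of the reference-counted block cache that the manual memory management work introduced (the size and path here are illustrative):

```go
// The block cache is allocated manually and released explicitly via Unref,
// rather than being reclaimed by the Go GC.
cache := pebble.NewCache(512 << 20) // 512MiB block cache
defer cache.Unref()

db, err := pebble.Open("/tmp/benchdb", &pebble.Options{Cache: cache})
if err != nil {
	log.Fatal(err)
}
defer db.Close()
```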
