
perf: 32MiB read/write buffer hurts performance #404

Closed · asubiotto opened this issue Nov 18, 2019 · 2 comments

@asubiotto

While testing a pebble-backed on-disk queue implementation (using sha 454f971) for vectorized external storage, I ran into some surprising performance characteristics when changing the capacity of a pebbleMapBatchWriter (i.e., how many k/vs are Set before calling Flush on a batch). The options for Pebble are defined in NewTempEngine: https://github.com/cockroachdb/cockroach/blob/6e1539ac1488f8596376cc3ac32c9b0b334600a5/pkg/storage/engine/temp_engine.go#L117
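For illustration, here is a minimal sketch of the pattern being benchmarked; this is not the actual pebbleMapBatchWriter code, the type and method names are illustrative, and it assumes the current github.com/cockroachdb/pebble import path and API:

```go
package queuebench

import "github.com/cockroachdb/pebble"

// batchWriter buffers Sets in a pebble.Batch and commits once the batch
// exceeds a configurable capacity in bytes. That capacity is the "buffer
// size" knob varied in the benchmark (e.g. 16<<20 vs. 32<<20).
type batchWriter struct {
	db       *pebble.DB
	batch    *pebble.Batch
	capacity int
}

func newBatchWriter(db *pebble.DB, capacity int) *batchWriter {
	return &batchWriter{db: db, batch: db.NewBatch(), capacity: capacity}
}

// Set buffers a k/v pair and flushes if the batch has grown past capacity.
func (w *batchWriter) Set(k, v []byte) error {
	if err := w.batch.Set(k, v, nil); err != nil {
		return err
	}
	if w.batch.Len() >= w.capacity {
		return w.Flush()
	}
	return nil
}

// Flush commits the buffered batch and starts a fresh one.
func (w *batchWriter) Flush() error {
	if err := w.batch.Commit(pebble.NoSync); err != nil {
		return err
	}
	if err := w.batch.Close(); err != nil {
		return err
	}
	w.batch = w.db.NewBatch()
	return nil
}
```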

The benchmark is a single-threaded writer that writes 512MiB of data and then reads all that data back. As I increase the buffer size, the graph looks a bit like this:

[Graph: benchmark runtime vs. batch buffer size, showing a spike at 32MiB]

Using a batch size of 32MiB makes the runtime suddenly spike to >10s, whereas it is ~3s on either side of that value.

The benchmark is BenchmarkQueues on my branch here: https://github.com/asubiotto/cockroach/commit/470481215325e20733a516bdeaa866f0d13ede56#diff-7599d388fb764b9163afbc7b01484ff1R124
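For reference, a rough sketch of the benchmark's shape (this is not the code from the linked branch; it reuses the batchWriter sketch above and assumes the current pebble API, where NewIter returns both an iterator and an error):

```go
// runQueueBenchmark writes dataSize bytes of blockSize-sized values through
// the buffered batch writer, then scans everything back with an iterator.
// Assumes: import ("fmt"; "github.com/cockroachdb/pebble").
func runQueueBenchmark(db *pebble.DB, bufSize, blockSize, dataSize int) error {
	w := newBatchWriter(db, bufSize)
	value := make([]byte, blockSize)
	for i := 0; i*blockSize < dataSize; i++ {
		key := []byte(fmt.Sprintf("k%010d", i))
		if err := w.Set(key, value); err != nil {
			return err
		}
	}
	if err := w.Flush(); err != nil {
		return err
	}

	// Read phase: iterate over everything that was written.
	iter, err := db.NewIter(nil)
	if err != nil {
		return err
	}
	defer iter.Close()
	for iter.First(); iter.Valid(); iter.Next() {
		_ = iter.Value()
	}
	return iter.Error()
}
```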

This issue can be reproduced by running:

```
make bench PKG=./pkg/col/colserde BENCHES=BenchmarkQueue TESTFLAGS="-v -cpuprofile cpu.out -memprofile mem.out -store=pebble -bufsize=32MiB -blocksize=512KiB -datasize=512MiB"
```

and

```
make bench PKG=./pkg/col/colserde BENCHES=BenchmarkQueue TESTFLAGS="-v -cpuprofile cpu16.out -memprofile mem16.out -store=pebble -bufsize=16MiB -blocksize=512KiB -datasize=512MiB"
```

The value size (-blocksize) is 512KiB in this benchmark.

The only difference I see in the CPU profile is a much larger percentage of time taken by memclrNoHeapPointers, possibly pointing to greater GC pressure in the 32MiB case. The allocation profiles look largely similar, although *Batch.grow stands out in both cases; it seems like we should be able to reuse memory there.

It's possible that this is a problem with the code I wrote, but given that the only variable that changes between benchmarks is the size of a buffered batch, it seems unlikely.

@petermattis (Collaborator)

I'm not sure why there is such a big discrepancy at 32MB, though that buffer size does correspond to the large-batch threshold (1/2 of the memtable size). If we set the buffer size to 31MB, performance is good. If we set the memtable size to 128MB, performance is good. The only difference I see is that when using a large batch we cycle through empty memtables, which puts slightly more pressure on the GC. Why this causes such a dramatic slowdown is unclear.
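A rough sketch of the two workarounds described above (the path and sizes are illustrative; this reuses the batchWriter sketch from the first comment):

```go
// Assumes: import ("log"; "github.com/cockroachdb/pebble").

// Workaround 1: raise the memtable size so that a 32MiB batch stays below
// the large-batch threshold (MemTableSize / 2).
db, err := pebble.Open("/tmp/benchdb", &pebble.Options{
	MemTableSize: 128 << 20, // 128MiB memtable -> 64MiB large-batch threshold
})
if err != nil {
	log.Fatal(err)
}
defer db.Close()

// Workaround 2: with the 64MiB memtable implied above, keep the buffered
// batch just under the 32MiB threshold, e.g. flush at 31MiB.
w := newBatchWriter(db, 31<<20)
_ = w
```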

@petermattis (Collaborator)

With the move to manual memory management for the Cache and memtable (#523, #527, #529), this is likely no longer a problem. I attempted to verify that, but the BenchmarkDiskQueue benchmark no longer has support for Pebble.
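For context, a brief usage sketch of the reference-counted block cache that the manual memory management work introduced (the size and path here are illustrative):

```go
// The block cache is allocated manually and released explicitly via Unref,
// rather than being reclaimed by the Go GC.
cache := pebble.NewCache(512 << 20) // 512MiB block cache
defer cache.Unref()

db, err := pebble.Open("/tmp/benchdb", &pebble.Options{Cache: cache})
if err != nil {
	log.Fatal(err)
}
defer db.Close()
```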
