perf: faster table/block cache #11
Comments
I believe the |
Pebble, like RocksDB, contains two primary caches: the block cache and the table cache. The table cache (implemented by For
Updating the tableCacheShard LRU lists requires mutually exclusive access to the list. Grabbing and releasing tableCacheShard.mu for each access shows up prominently on blocking profiles. There is a known technique for avoiding this overhead: record the access in a per-thread data structure and then batch apply the buffered accesses when the buffer is full.

For a highly concurrent scan workload, this results in a significant improvement in both throughput and stability of the throughput numbers. Here are numbers when running with 100 concurrent scanners before this change:

    ____optype__elapsed_____ops(total)___ops/sec(cum)
      scan_100   120.0s      109476226       912303.4

After:

    ____optype__elapsed_____ops(total)___ops/sec(cum)
      scan_100   120.0s      151736009      1264384.6

That's a 39% speedup. Even better, the instantaneous numbers during the run were stable.

See #11
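The buffering technique described in this commit message can be sketched roughly as follows. This is a minimal illustration, not Pebble's implementation: `lruShard`, `accessBuffer`, `record`, `flush`, and the buffer limit are all hypothetical names, and a plain slice stands in for the real LRU lists. The point is only that the shard mutex is taken once per batch rather than once per access.

```go
package main

import (
	"fmt"
	"sync"
)

// lruShard is a hypothetical stand-in for a cache shard: a recency list
// guarded by a mutex. It is not Pebble's tableCacheShard.
type lruShard struct {
	mu    sync.Mutex
	order []int // least recently used first
	locks int   // how many times mu was acquired (for illustration only)
}

// touchLocked moves id to the most-recently-used end. Caller must hold mu.
func (s *lruShard) touchLocked(id int) {
	for i, v := range s.order {
		if v == id {
			s.order = append(s.order[:i], s.order[i+1:]...)
			break
		}
	}
	s.order = append(s.order, id)
}

// accessBuffer collects accesses made by a single goroutine and applies
// them to the shard in one batch once the buffer is full.
type accessBuffer struct {
	shard *lruShard
	ids   []int
	limit int
}

// record notes an access without touching the shard mutex, unless the
// buffer has filled up.
func (b *accessBuffer) record(id int) {
	b.ids = append(b.ids, id)
	if len(b.ids) >= b.limit {
		b.flush()
	}
}

// flush applies all buffered accesses under a single lock acquisition.
func (b *accessBuffer) flush() {
	if len(b.ids) == 0 {
		return
	}
	b.shard.mu.Lock()
	b.shard.locks++
	for _, id := range b.ids {
		b.shard.touchLocked(id)
	}
	b.shard.mu.Unlock()
	b.ids = b.ids[:0]
}

func main() {
	shard := &lruShard{}
	buf := &accessBuffer{shard: shard, limit: 4}
	for _, id := range []int{1, 2, 3, 1, 2, 3, 1, 2} {
		buf.record(id)
	}
	buf.flush()
	fmt.Println("recency order:", shard.order, "lock acquisitions:", shard.locks)
}
```

In this sketch eight accesses cost two lock acquisitions instead of eight. The trade-off is that recency information reaches the shard with a delay, so eviction decisions may briefly act on slightly stale ordering.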
The Clock-PRO algorithm doesn't require exclusive access to internal state on hits. Switch to using a RWMutex which reduces a source of contention in cached workloads.

For a concurrent scan workload, performance before this commit was:

    ____optype__elapsed_____ops(total)___ops/sec(cum)
      scan_100   120.0s      151736009      1264384.6

After:

    ____optype__elapsed_____ops(total)___ops/sec(cum)
      scan_100   120.0s      170326751      1419354.7

That's a 12% improvement in throughput.

See #11
This is being done in #523.
Use C malloc/free for the bulk of cache allocations. This required elevating `cache.Value` to a public citizen of the cache package.

A distinction is made between manually managed memory and automatically managed memory. Weak handles can only be made from values stored in automatically managed memory. Note that weak handles are only used for the index, filter, and range-del blocks, so the number of weak handles is O(num-tables).

A finalizer is set on `*allocCache` and `*Cache` in order to ensure that any outstanding manually allocated memory is released when these objects are collected. When `invariants` are enabled, finalizers are also set on `*Value` and sstable iterators to ensure that we're not leaking manually managed memory.

Fixes #11
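The finalizer safety net described here can be illustrated with a pure-Go stand-in. The sketch below does not call C malloc/free: `manualAllocator` merely counts outstanding bytes to simulate manually managed memory, and `blockCache`, `set`, and `release` are hypothetical names rather than Pebble's API. The only real API relied on is `runtime.SetFinalizer`.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
)

// manualAllocator simulates manually managed memory by counting the bytes
// that have been handed out and not yet freed. A real implementation would
// call C malloc/free through cgo instead.
type manualAllocator struct {
	outstanding int64
}

func (a *manualAllocator) alloc(n int) []byte {
	atomic.AddInt64(&a.outstanding, int64(n))
	return make([]byte, n)
}

func (a *manualAllocator) free(b []byte) {
	atomic.AddInt64(&a.outstanding, -int64(len(b)))
}

func (a *manualAllocator) live() int64 {
	return atomic.LoadInt64(&a.outstanding)
}

// blockCache owns manually allocated buffers and must release them.
type blockCache struct {
	alloc  *manualAllocator
	blocks map[string][]byte
}

func newBlockCache(a *manualAllocator) *blockCache {
	c := &blockCache{alloc: a, blocks: make(map[string][]byte)}
	// Safety net: if the cache is garbage collected without an explicit
	// release, the finalizer frees whatever is still outstanding.
	runtime.SetFinalizer(c, (*blockCache).release)
	return c
}

// set stores a freshly allocated buffer, freeing any buffer it replaces.
func (c *blockCache) set(key string, size int) {
	if old, ok := c.blocks[key]; ok {
		c.alloc.free(old)
	}
	c.blocks[key] = c.alloc.alloc(size)
}

// release frees every buffer. It is idempotent, so it is safe for both an
// explicit call and a later finalizer run.
func (c *blockCache) release() {
	for k, b := range c.blocks {
		c.alloc.free(b)
		delete(c.blocks, k)
	}
}

func main() {
	a := &manualAllocator{}
	c := newBlockCache(a)
	c.set("index", 16)
	c.set("filter", 8)
	fmt.Println("live bytes before release:", a.live())
	c.release()
	fmt.Println("live bytes after release:", a.live())
}
```

Finalizers run at an unspecified time after an object becomes unreachable, and may not run at all before the program exits, so a pattern like this is a backstop for leaks rather than a substitute for explicit release.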
The existing block cache implementation uses the Clock-PRO replacement policy. The block cache is important for performance. The raw speed of accessing the block cache is an important component of read performance. Concurrency of the block cache directly affects concurrent read performance. There are three areas worth exploring here: