-
Notifications
You must be signed in to change notification settings - Fork 472
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
internal/cache: move the bulk of allocations off the Go heap #523
Conversation
6b02fcd
to
d20a26a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 18 files reviewed, 1 unresolved discussion (waiting on @itsbilal and @sumeerbhola)
sstable/reader.go, line 1243 at r1 (raw file):
i, err := newRawBlockIter(bytes.Compare, data) b.Release()
This PR has a couple of actual bug fixes, which the manually managed block cache memory uncovered. This is an example. We were releasing the cache handle and then proceeding to use the data via the raw block iter. After the cache handle is released, the memory could be reused!
6897693
to
9c572de
Compare
I added a second commit which moves the memTable arena memory into manually managed (C) memory. This was the next largest source of Pebble memory use. |
983eab3
to
3d74a9e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 20 files reviewed, 6 unresolved discussions (waiting on @itsbilal and @sumeerbhola)
internal/cache/alloc.go, line 78 at r2 (raw file):
} c.rnd.Seed(uint64(time.Now().UnixNano())) runtime.SetFinalizer(c, freeAllocCache)
This is the only required use of SetFinalizer
. I wonder if the allocCache
is necessary any more. Previously, it massively reduced GC pressure. But GC pressure is no longer a concern. I'm going to add a TODO.
internal/cache/clockpro.go, line 72 at r2 (raw file):
// Is buf automatically managed (i.e. it will be reclaimed by GC) or manually // managed? auto bool
TODO(peter): An alternative would be to set refs
to a large positive or negative value (e.g. 1<<30
) so it will never get released.
internal/cache/clockpro.go, line 74 at r2 (raw file):
auto bool traces []string
TODO(peter): Figure out a way to move this into under a build tag. No need to bloat the Value struct normally.
internal/cache/clockpro.go, line 130 at r2 (raw file):
} func (v *Value) trace(msg string) {
TODO(peter): move this under a build tag.
internal/cache/clockpro.go, line 736 at r2 (raw file):
} func checkValue(obj interface{}) {
TODO(peter): move this under a build tag.
b5bc2e9
to
e16cf94
Compare
A nice side-effect of the second commit is much lower peak Go memory usage in CRDB unit tests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is neat. I haven't read all the code yet, but sending some early comments.
Reviewed 4 of 18 files at r1, 6 of 8 files at r3.
Reviewable status: 10 of 25 files reviewed, 9 unresolved discussions (waiting on @itsbilal, @petermattis, and @sumeerbhola)
internal/cache/clockpro.go, line 135 at r4 (raw file):
if old := e.getValue(); old != nil { if old.release() { if !old.auto {
this seems a bit odd: IIUC, auto values are only used for weak handles, which means their refcount should never become 0, so why should this need to be checked? Is it because entry.setValue()
can be called before the WeakHandle
is constructed?
Also, IIUC, there are two sources of references to Value:
- Handle has a reference to Handle.value
- The cache has a reference to entry.Val. This is the one being released here. And the reference to the new value has already been acquired before calling setValue().
Is that correct? If so, a code comment would be helpful.
I think it would be cleaner to make Value.release()
call allocFree()
if the reference count has gone to zero. The contract is that a caller of Value.release()
stop using the Value after returning,
I think this module could use a high-level comment explaining how the memory management works, that includes how alloc.go produces only manually managed slices, how we track which ones are auto and which ones are manual, why we need auto etc.
internal/cache/clockpro.go, line 150 at r4 (raw file):
// Handle provides a strong reference to an entry in the cache. The reference // does not pin the entry in the cache, but it does prevent the underlying byte // slice from being reused.
Would the following comment be accurate:
When both are non-nil, value
is initialized to the *Value
corresponding to entry.val
. Over the lifetime of the Handle
, value
stays unchanged but entry.val
may point to some other Value
.
internal/cache/clockpro.go, line 170 at r4 (raw file):
if h.value != nil { if h.value.release() { if !h.value.auto {
same question here about needing to check auto
.
internal/cache/clockpro.go, line 173 at r4 (raw file):
allocFree(h.value.buf) } h.value.buf = nil
this setting to nil
is not needed for correctness, yes?
internal/cache/clockpro.go, line 189 at r4 (raw file):
} h.value.makeWeak() return (*WeakHandle)(h.entry)
Assuming I understood the invariant correctly in my earlier comment, when Weak()
is called it is possible that h.entry.val
is already not pointing to h.value
so incrementing the refcount for it isn't necessary (though doing so is harmless, and there is always the possibility of a race there). Is that correct?
If so, could you add a comment.
internal/cache/value.go, line 62 at r4 (raw file):
// Truncate the buffer to the specified length. The buffer length should not be // changed once the value has been added to the cache. Instead, a new Value
why this "buffer length should not be changed" comment? It seems that manual.Free()
does the right thing by using the capacity of the slice.
internal/cache/value.go, line 70 at r4 (raw file):
func (v *Value) makeWeak() { if !v.auto { panic("pebble: cannot make auto Value into a weak Value")
s/auto/manual/
e16cf94
to
82b8325
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 8 of 25 files reviewed, 9 unresolved discussions (waiting on @itsbilal and @sumeerbhola)
internal/cache/clockpro.go, line 135 at r4 (raw file):
Previously, sumeerbhola wrote…
this seems a bit odd: IIUC, auto values are only used for weak handles, which means their refcount should never become 0, so why should this need to be checked? Is it because
entry.setValue()
can be called before theWeakHandle
is constructed?Also, IIUC, there are two sources of references to Value:
- Handle has a reference to Handle.value
- The cache has a reference to entry.Val. This is the one being released here. And the reference to the new value has already been acquired before calling setValue().
Is that correct? If so, a code comment would be helpful.
I think it would be cleaner to make
Value.release()
callallocFree()
if the reference count has gone to zero. The contract is that a caller ofValue.release()
stop using the Value after returning,I think this module could use a high-level comment explaining how the memory management works, that includes how alloc.go produces only manually managed slices, how we track which ones are auto and which ones are manual, why we need auto etc.
Agreed on the need for a high-level comment explaining the manual memory management. See the new comment above Cache
.
You're right about looking at old.auto
being confusing. There was a time in this PR when auto
values could be returned to the cache, but that is no longer true. Note that Weak
may never have been called on an auto value inserted into the cache, so we can actually hit this code path in practice. This makes me suspect that the right thing to do is to remove the Value.auto
field and mark auto values by setting a very large negative reference count. I've gone ahead and done that.
internal/cache/clockpro.go, line 150 at r4 (raw file):
Previously, sumeerbhola wrote…
Would the following comment be accurate:
When both are non-nil,value
is initialized to the*Value
corresponding toentry.val
. Over the lifetime of theHandle
,value
stays unchanged butentry.val
may point to some otherValue
.
Yes. I added something along these lines with slightly different wording.
internal/cache/clockpro.go, line 170 at r4 (raw file):
Previously, sumeerbhola wrote…
same question here about needing to check
auto
.
Same answer. I'm going to see about getting rid of this, but it is needed for correctness right now.
internal/cache/clockpro.go, line 173 at r4 (raw file):
Previously, sumeerbhola wrote…
this setting to
nil
is not needed for correctness, yes?
It is needed for the leak checking facility to work. I've added a comment about this.
internal/cache/clockpro.go, line 189 at r4 (raw file):
Previously, sumeerbhola wrote…
Assuming I understood the invariant correctly in my earlier comment, when
Weak()
is called it is possible thath.entry.val
is already not pointing toh.value
so incrementing the refcount for it isn't necessary (though doing so is harmless, and there is always the possibility of a race there). Is that correct?
If so, could you add a comment.
Comment added.
internal/cache/value.go, line 62 at r4 (raw file):
Previously, sumeerbhola wrote…
why this "buffer length should not be changed" comment? It seems that
manual.Free()
does the right thing by using the capacity of the slice.
Once the value has been added to the cache, there may be concurrent readers of the Value. I've expanded the comment to mention this.
internal/cache/value.go, line 70 at r4 (raw file):
Previously, sumeerbhola wrote…
s/auto/manual/
Egads! Done.
7e534d8
to
d3f2287
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 6 of 26 files reviewed, 10 unresolved discussions (waiting on @itsbilal and @sumeerbhola)
docs/memory.md, line 8 at r7 (raw file):
Block Cache. MemTables buffer data that has been written to the WAL but not yet flushed to an SSTable. The Block Cache provides a cache of uncompressed SSTable data blocks.
@sumeerbhola I had already started writing this when you asked for the high-level comment on how the cache handles manual memory management. Could be the basis for a blog post in the future.
70f0a40
to
b7514d8
Compare
I've added a few more commits to this PR extending it to encompass more cache memory allocations. Specifically, we now manually manage the memory for the The motivation for this extra work was noticing a performance problem on The only remaining pointer-heavy structures in the cache is the |
a292d45
to
e1474d5
Compare
I've added a commit which implements the above. This PR has now spiraled out of control. Let me know if you want it to be broken up into more digestible pieces. |
1a6c510
to
f47d186
Compare
Rebased to fix a merge conflict with #524. I'm going to hold off on making any more changes to this PR until it is reviewed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've taken a look at all the cache related code. Just have some minor comments.
Reviewed 3 of 18 files at r1, 6 of 9 files at r2, 4 of 15 files at r8, 5 of 6 files at r10, 1 of 5 files at r11, 5 of 6 files at r17, 2 of 5 files at r18, 1 of 1 files at r19, 3 of 5 files at r21.
Reviewable status: 25 of 35 files reviewed, 16 unresolved discussions (waiting on @itsbilal, @petermattis, and @sumeerbhola)
internal/cache/alloc.go, line 36 at r14 (raw file):
// allocNew allocates a slice of size n. The use of sync.Pool provides a // per-cpu cache of allocCache structures to allocate from.
Can you add a file level comment that allocNew() allocates non garbage-collected memory, so every call must be paired with allocFree().
internal/cache/clockpro.go, line 178 at r21 (raw file):
// Acquire a reference for the cache. We'll be adjusting this below if the // value is not actually added to the cache.
aren't values created with a refcount of 1? isn't that the reference for the cache?
internal/cache/entry.go, line 48 at r21 (raw file):
// there are other pointers to the entry and shard which will keep them // alive. In particular, every entry in the cache is referenced by the // shard.blocks map. And every shard is referenced by the Cache.shards map.
is there a tracing mode that will tell us that the pointers stored in an entry
point to other entries that have been reclaimed by Go GC?
Say a global map[*entry]int
where the value is a refcount, and is incremented and decremented whenever an entry
starts/stops pointing to another entry and also tracks the other pointers to an entry.
internal/cache/entry_normal.go, line 68 at r21 (raw file):
func (c *entryAllocCache) free(e *entry) { c.entries = append(c.entries, e)
is there no limit to the number of entry
structs the cache will keep?
internal/cache/value_normal.go, line 18 at r14 (raw file):
// // This is the definition of Value that is used in normal builds. type Value struct {
isn't this the same as value_notracing.go
-- why this duplication?
sstable/reader.go, line 1086 at r21 (raw file):
// The value will be "freed" when the iterator is closed, so make a copy // which will outlast the lifetime of the iterator. newValue := make([]byte, len(value))
was this also a bug?
f47d186
to
a9d2453
Compare
a9d2453
to
a9bf630
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR!
Reviewable status: 12 of 29 files reviewed, 16 unresolved discussions (waiting on @itsbilal and @sumeerbhola)
internal/cache/alloc.go, line 36 at r14 (raw file):
Previously, sumeerbhola wrote…
Can you add a file level comment that allocNew() allocates non garbage-collected memory, so every call must be paired with allocFree().
I didn't see a good place to leave a file-level comment, but I extended the comment here to indicate allocFree
MUST be called.
internal/cache/clockpro.go, line 178 at r21 (raw file):
Previously, sumeerbhola wrote…
aren't values created with a refcount of 1? isn't that the reference for the cache?
Yes, values are created with a reference count of 1. I've been thinking of that reference count as for the handle. I've moved this comment below to where this reference count is being transferred to the handle, and moved the acquire
so it is only done if the value is being added to the cache.
internal/cache/entry.go, line 48 at r21 (raw file):
Previously, sumeerbhola wrote…
is there a tracing mode that will tell us that the pointers stored in an
entry
point to other entries that have been reclaimed by Go GC?
Say a globalmap[*entry]int
where the value is a refcount, and is incremented and decremented whenever anentry
starts/stops pointing to another entry and also tracks the other pointers to an entry.
I don't have something like you're describing, but there is shard.metaCheck
which looks through the shard
maps to try and find dangling references to an entry. Is that sufficient?
internal/cache/entry_normal.go, line 68 at r21 (raw file):
Previously, sumeerbhola wrote…
is there no limit to the number of
entry
structs the cache will keep?
Correct. My thinking was that the objects in the sync.Pool
will be gc'd eventually and the memory usage isn't significant.
internal/cache/value_normal.go, line 18 at r14 (raw file):
Previously, sumeerbhola wrote…
isn't this the same as
value_notracing.go
-- why this duplication?
Yes, it is identical. No reason for the duplication other than my not figuring out earlier how to get rid of it. Done.
sstable/reader.go, line 1086 at r21 (raw file):
Previously, sumeerbhola wrote…
was this also a bug?
Sort of. This was something I found when trying to use manual memory management for weak handles. With auto handles, we'd never try to reuse the cached block. Also note, this code path is only used for testing as pebble.DB.Get
uses iterators in order to handle merge records and range tombstones. Seems worth keeping, though.
a9bf630
to
20f8be4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 12 of 29 files reviewed, 15 unresolved discussions (waiting on @itsbilal and @sumeerbhola)
internal/cache/entry_normal.go, line 68 at r21 (raw file):
Previously, petermattis (Peter Mattis) wrote…
Correct. My thinking was that the objects in the
sync.Pool
will be gc'd eventually and the memory usage isn't significant.
I thought about this more last night and decided to add a limit (128). I think there could be scenarios where the sync.Pool
is not clearing the caches and large amounts of entries are piling up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 21 files at r15, 1 of 21 files at r22, 1 of 7 files at r23, 1 of 5 files at r24, 1 of 1 files at r26, 5 of 5 files at r27.
Reviewable status: 22 of 29 files reviewed, 18 unresolved discussions (waiting on @itsbilal, @petermattis, and @sumeerbhola)
docs/memory.md, line 44 at r27 (raw file):
Attempting to adjust `GOGC` to account for the significant amount of memory used by the Block Cache is fraught. What value should be used? `10%`? `20%`? Should the setting be tuned dynamically? Rather than
Do we know the memory limit when the node starts and do we have the block cache as a percentage of that memory limit?
If the latter were 0.8 and we knew that it would be consumed by the cache, it seems like (1/0.8 - 1) * 100 would work as the percentage. And we could possibly set it dynamically based on the current live bytes in the cache (say B) and the memory limit L. Something like:
f = B/L
min(1/f - 1, 2) * 100
Is it worth running one benchmark both with this PR and the simple change and looking at the difference in memory and cpu (especially if you are thinking of a blog post)?
internal/cache/robin_hood.go, line 243 at r27 (raw file):
} func (m *robinHoodMap) Delete(k key) {
given the context this is used, I am assuming there isn't a need to shrink the capacity if deletes are leaving too many empty slots?
internal/cache/robin_hood_test.go, line 13 at r27 (raw file):
"golang.org/x/exp/rand" )
do you have a benchmark comparing the two?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 22 of 29 files reviewed, 18 unresolved discussions (waiting on @itsbilal and @sumeerbhola)
docs/memory.md, line 44 at r27 (raw file):
Previously, sumeerbhola wrote…
Do we know the memory limit when the node starts and do we have the block cache as a percentage of that memory limit?
If the latter were 0.8 and we knew that it would be consumed by the cache, it seems like (1/0.8 - 1) * 100 would work as the percentage. And we could possibly set it dynamically based on the current live bytes in the cache (say B) and the memory limit L. Something like:
f = B/L
min(1/f - 1, 2) * 100Is it worth running one benchmark both with this PR and the simple change and looking at the difference in memory and cpu (especially if you are thinking of a blog post)?
We know the block cache size when the node starts, but there is no hard memory limit. We can see the memory limit of the machine / container, but it isn't necessarily ok to use all of that memory. And we also need to leave memory for SQL/other usage in the system. I could expand here on why I think tweaking the GOGC percent is fraught. Let's consider the sysbench/oltp_update_index
workload. The block cache is configured at 17GB and gets mostly filled. If we dynamically lowered GOGC we'd run into exactly the situation that the first commit in this PR experienced: more frequent GCs with more work to be done. This is going to be a pure drag on performance compared to manually managing memory. There is also the "ballast trick" (mentioned in golang/go#23044) which attempts a similar effect to dynamically changing GOGC by performing large "ballast" allocations that are otherwise unused.
internal/cache/robin_hood.go, line 243 at r27 (raw file):
Previously, sumeerbhola wrote…
given the context this is used, I am assuming there isn't a need to shrink the capacity if deletes are leaving too many empty slots?
That's my expectation as well. Having this map be mostly empty doesn't seem harmful. If the cache grew to a certain size at some point, it is likely to grow to that size again.
I wonder if there is fraction m.count / m.size
at which it makes sense to shrink. I'm not sure what it is for Robin Hood hashing. Maybe the internets know.
internal/cache/robin_hood_test.go, line 13 at r27 (raw file):
Previously, sumeerbhola wrote…
do you have a benchmark comparing the two?
https://github.com/petermattis/maptoy. The implementation here has diverged a little bit from the maptoy
implementation (a few bug fixes), but I expect the numbers are fairly similar. I just didn't port that over. Even if robinHoodMap
was slightly slower, the manually managed memory makes it beneficial.
Should I port the benchmark?
20f8be4
to
e2f720d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 20 of 29 files reviewed, 18 unresolved discussions (waiting on @itsbilal and @sumeerbhola)
internal/cache/robin_hood_test.go, line 13 at r27 (raw file):
Previously, petermattis (Peter Mattis) wrote…
https://github.com/petermattis/maptoy. The implementation here has diverged a little bit from the
maptoy
implementation (a few bug fixes), but I expect the numbers are fairly similar. I just didn't port that over. Even ifrobinHoodMap
was slightly slower, the manually managed memory makes it beneficial.Should I port the benchmark?
I ported the benchmark.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 20 of 29 files reviewed, 21 unresolved discussions (waiting on @itsbilal, @petermattis, and @sumeerbhola)
docs/memory.md, line 44 at r27 (raw file):
Previously, petermattis (Peter Mattis) wrote…
We know the block cache size when the node starts, but there is no hard memory limit. We can see the memory limit of the machine / container, but it isn't necessarily ok to use all of that memory. And we also need to leave memory for SQL/other usage in the system. I could expand here on why I think tweaking the GOGC percent is fraught. Let's consider the
sysbench/oltp_update_index
workload. The block cache is configured at 17GB and gets mostly filled. If we dynamically lowered GOGC we'd run into exactly the situation that the first commit in this PR experienced: more frequent GCs with more work to be done. This is going to be a pure drag on performance compared to manually managing memory. There is also the "ballast trick" (mentioned in golang/go#23044) which attempts a similar effect to dynamically changing GOGC by performing large "ballast" allocations that are otherwise unused.
I don't quite understand. If we know the limit on (other usage + block cache) that is the L in the earlier equation.
internal/cache/entry.go, line 48 at r28 (raw file):
// there are other pointers to the entry and shard which will keep them // alive. In particular, every entry in the cache is referenced by the // shard.blocks map. And every shard is referenced by the Cache.shards map.
this is the entries
map now.
I was thinking about this PR again, and I am wondering if this is getting too intricate and complicated.
Given that Go does not allow us to construct a reffed_ptr that would follow the RAII idiom, due to the lack of control over assignment operators and no destructors, we have to remember to accurately ref and unref in the code, which seems bug prone.
If we could adjust GOGC to make the large heap problem go away, it seems the remaining problem is the number of pointers in the internal data-structures of the cache. Can that be "fixed" by changing all the cache internal data-structures to use a uintptr
instead of pointers and rely on the entries
map to keep things alive? Making the cache remove all the uintptrs for a pointer that it was evicting is a more localized change -- it doesn't matter if someone outside the cache is keeping that memory alive. So all memory would be Go allocated.
internal/cache/robin_hood.go, line 23 at r28 (raw file):
var hashSeed = uint64(time.Now().UnixNano()) // Fibonacci hash: https://probablydance.com/2018/06/16/fibonacci-hashing-the-optimization-that-the-world-forgot-or-a-better-alternative-to-integer-modulo/
I didn't read the full article but I am assuming the following is a mapping of the following to our context?
size_t index_for_hash(size_t hash)
{
hash ^= hash >> shift_amount;
return (11400714819323198485llu * hash) >> shift_amount;
}
what is the motivation for the ... | 1)
-- I am guessing you want the output to always be odd but by doing so isn't the difference between the consecutive even, odd inputs being lost and causing a hash collision?
internal/cache/robin_hood.go, line 35 at r28 (raw file):
type robinHoodEntry struct { key key value *entry
can this also be a Go pointer? If yes, I think it is worth repeating here the comment from entry.go.
Use C malloc/free for the bulk of cache allocations. This required elevating `cache.Value` to a public citizen of the cache package. A distinction is made between manually managed memory and automatically managed memory. Weak handles can only be made from values stored in automatically managed memory. Note that weak handles are only used for the index, filter, and range-del blocks, so the number of weak handles is O(num-tables). A finalizer is set on `*allocCache` and `*Cache` in order to ensure that any outstanding manually allocated memory is released when these objects are collected. When `invariants` are enabled, finalizers are also set on `*Value` and sstable iterators to ensure that we're not leaking manually managed memory. Fixes #11
For manual Values, allocate the `Value` at the start of the backing buffer. For large block caches, the number of Values can reach 500k-1m or higher, and consume a significant fraction of the total allocated objects in the Go heap.
Each entry in the cache has an associated `entry` struct. For large block caches, the number of entries can reach 500k-1m or higher, and consume a significant fraction of the total allocated objects in the Go heap. For manually managed values, the lifetime of the associated `entry` is known and the `*entry` never escapes outside of the cache package. This allows us to manually manage the memory for these entries, which are the majority of entries in the cache.
The "invariants" build tag causes different code paths to be utilized. Add a no-invariants test configuration for testing the invariant-less code paths. Change the race builds to not specify the "invariants" tags as doing so causes them to frequently flake on the CI machines. This appears to be due to the size of the machines (RAM available) and not something more serious.
Add `robinHoodMap` which is a hash map implemented using Robin Hood hashing. `robinHoodMap` is specialized to use `key` and `*entry` as the key and value for the map respectively. The memory for the backing array is manually allocated. This lowers GC pressure by moving the allocated memory out of the Go heap, and by moving all of `*entry` pointers out of the Go heap. Use `robinHoodMap` for both the `shard.blocks` and `shard.files` maps. Old is Go-map, new is RobinHood-map. name old time/op new time/op delta MapInsert-16 144ns ± 5% 101ns ± 8% -29.88% (p=0.000 n=10+9) MapLookupHit-16 137ns ± 1% 47ns ± 1% -65.58% (p=0.000 n=8+10) MapLookupMiss-16 89.9ns ± 4% 48.6ns ± 3% -45.95% (p=0.000 n=10+9)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 20 of 29 files reviewed, 21 unresolved discussions (waiting on @itsbilal, @petermattis, and @sumeerbhola)
internal/cache/entry.go, line 48 at r28 (raw file):
this is the entries map now.
Fixed.
Given that Go does not allow us to construct a reffed_ptr that would follow the RAII idiom, due to the lack of control over assignment operators and no destructors, we have to remember to accurately ref and unref in the code, which seems bug prone.
It certainly is a bit awkward in Go to do this manual memory management, yet we have to do similar for file descriptors in Go. So it is possible to get this right. I'm also feeling fairly good about the correctness due to the checks that take place under the "invariants" build tag. Without those checks I'd be very scared of this PR. With them, I feel pretty good given that the number of places we need to track references is limited.
If we could adjust GOGC to make the large heap problem go away, it seems the remaining problem is the number of pointers in the internal data-structures of the cache.
I have no evidence this is possible. Only speculation from myself, and murmurings on the internets. My suspicion is that we could get some benefit from dynamically adjusting GOGC
, but we'll be fighting with those heuristics for a long time. In contrast, I think this PR is a bit of near term pain to get right, but then it will be solid and can generally be ignored.
I'll also reiterate another problem I think I alluded to earlier, it would be fairly obnoxious from a usability perspective for Pebble, a library, to muck with GOGC
, a process-wide control.
Other systems built on GC'd languages have placed cache's in "off heap" memory. Cassandra does this, for instance.
Can that be "fixed" by changing all the cache internal data-structures to use a uintptr instead of pointers and rely on the entries map to keep things alive? Making the cache remove all the uintptrs for a pointer that it was evicting is a more localized change -- it doesn't matter if someone outside the cache is keeping that memory alive. So all memory would be Go allocated.
If Go has allocated the *entry
pointer such that we need an entries
map to keep it referenced and alive, then the GC will have to traverse all of the entries in the map. Using uintptrs for all of the internal pointers would reduce the number of pointers in the heap by a constant factor, but you'd still have O(num-entries) pointers at minimum. Would that reduction be sufficient? Possibly, though it seems like it can't be better (from a performance perspective) than the approach in this PR.
internal/cache/robin_hood.go, line 23 at r28 (raw file):
Previously, sumeerbhola wrote…
I didn't read the full article but I am assuming the following is a mapping of the following to our context?
size_t index_for_hash(size_t hash) { hash ^= hash >> shift_amount; return (11400714819323198485llu * hash) >> shift_amount; }
what is the motivation for the
... | 1)
-- I am guessing you want the output to always be odd but by doing so isn't the difference between the consecutive even, odd inputs being lost and causing a hash collision?
The motivation was to avoid returning a zero hash value, but I don't think that is problematic anymore. I've removed the | 1
.
internal/cache/robin_hood.go, line 35 at r28 (raw file):
Previously, sumeerbhola wrote…
can this also be a Go pointer? If yes, I think it is worth repeating here the comment from entry.go.
Yes, this can be a Go allocated object.
e2f720d
to
0f2704f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 14 of 29 files reviewed, 21 unresolved discussions (waiting on @itsbilal and @sumeerbhola)
docs/memory.md, line 44 at r27 (raw file):
Previously, sumeerbhola wrote…
I don't quite understand. If we know the limit on (other usage + block cache) that is the L in the earlier equation.
I'm also not following the equation you gave above. Values of GOGC above 100 make the GC less aggressive and cause the heap to grow. Values below 100 make the GC run more frequently. I think a heuristic would need to keep GOGC in the range [20,100]
, depending on how full the block cache is.
I'm mildly interested to know whether tweaking GOGC dynamically is a beneficial technique, but as I mentioned elsewhere, this is really speculative. I've never done this. I've never seen any Go code which does this. And a search right now doesn't reveal anyone recommending this (the subject is silent, though I know I saw mention of this at some point). I did find golang/go#32105 which was a recently fixed deadlock bug with SetGCPercent
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 14 of 29 files reviewed, 21 unresolved discussions (waiting on @itsbilal, @petermattis, and @sumeerbhola)
docs/memory.md, line 44 at r27 (raw file):
I'm also not following the equation you gave above.
it should have said
min(1/f - 1, 1) * 100
But anyway that was a rough idea -- it would need some refinement.
internal/cache/entry.go, line 48 at r28 (raw file):
I'll also reiterate another problem I think I alluded to earlier, it would be fairly obnoxious from a usability perspective for Pebble, a library, to muck with GOGC, a process-wide control.
That is a fair point.
Ok, in we go! |
Use C malloc/free for the bulk of cache allocations. This required
elevating
cache.Value
to a public citizen of the cache package. Adistinction is made between manually managed memory and automatically
managed memory. Weak handles can only be made from values stored in
automatically managed memory. Note that weak handles are only used for
the index, filter, and range-del blocks, so the number of weak handles
is O(num-tables).
A finalizer is set on
*allocCache
and*Cache
in order to ensure thatany outstanding manually allocated memory is released when these objects
are collected.
When
invariants
are enabled, finalizers are also set on*Value
andsstable iterators to ensure that we're not leaking manually managed
memory.
Fixes #11