-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add default implementation for allocating zero-sized types (or ZSTs) #38
Comments
Thanks for summing this up! |
Looks great overal. Some small remarks/bikeshedding:
|
Indeed |
@JelteF thanks for pointing those out. I fixed 1, 2, and 5.
It seems redundant to pass unnecessary information. I'd go for the minimal API.
We could propose a change to |
After considering some more names of the methods I think I prefer
Yeah that makes sense indeed.
I think that's not worth it indeed. So in that case I think it makes sense to rename the method to |
I would prefer if the "default" Also for the naming, I feel that anything with Finally, I'm still a bit hesitant on this change. While this does help allocators that don't support ZST allocation by providing a default implementation of that methods, it doubles the work for allocators that do support ZST allocation: now they will need to implement both |
This is the case in this proposal "alloc" supports both. The only difference in what you describe is that alloc_nonempty is also safe.
Good catch, completely agree.
I think "doubling" effort is a bit of exaggeration when talking about forwarding a call. I understand what you mean though. However, I don't really see a way around it, which also avoids the problem of requiring a branch when allocating using an allocator that does not support it empty allocations. |
I have one other question. Do we really need the alloc_zero_sized_type method. I don't really see a reason why anyone would call that instead of the normal alloc. If so, we can simply inline it in the default alloc implementation. |
That's a good point. I am still thinking of other potential names, because empty types also exist in Rust..
I see what you are getting at. I'm not sure at present whether a better solution that would minimize workload in both cases is possible. As I mentioned, in Rust there is only really one requirement for allocating a ZST (i.e. it must be aligned). As such, it is unlikely that someone would want to provide an alternative implementation for
Yes, I think it could actually be incorporated into |
How would you implement |
I have another potential proposal:
This means that the minimum work for allocators that do inherently support zero sized allocation is just to implement For allocators that have an extra cost associated with checking for zero sized layouts, we can still provide a version that doesn't allow zero size to omit that check. It would just default to calling the version that allows zero sized layouts, but allow for the more optimizable path to be used. So to be somewhat concrete, my proposal would be roughly unsafe trait AllocRef {
type Error;
fn alloc(self, layout: Layout) -> Result<ptr::NonNull<u8>, Self::Error>;
fn alloc_nonzero(self, layout: NonZeroLayout) -> Result<ptr::NonNull<u8>, Self::Error> {
self.alloc(layout.into())
}
// all the other provided methods
} (Modulo exact naming ofc) Though I definitely am sympathetic to the need to make allocation as cheap/predictable as possible, I think we can do so while still providing a simpler edge-case-handling default API for when it's not needed. The pattern of providing specialized methods that default are impled as the generally applicable one, but then optimized implementations impl the generally applicable one by delegating to the specialized ones is already a decently common pattern that we can use here. I definitely need to take some time to catch up on wg-alloc though because I have developed some Opinions that apparently aren't represented yet. |
@TimDiekmann as for |
Making abstractions “zero-cost” is a nice goal, but it’s not the only goal. I think it should definitely not be taken as an absolute imperative. Is avoiding a single branch really worth significantly increasing the API surface? Is this supposed performance penalty measurable at all in real-world programs? Modern CPUs are often pretty good at branch prediction. (Also, using |
It's entirely acceptable to provide default implementations of some of the methods and then specify what the default does and then say (very clearly) "Also as part of implementing this trait you must override the defaults if the defaults would interact badly with the rest of your implementation." It's an |
Sorry for my delayed response; I was on vacation. I performed a benchmark to give us some real-world data about the performance penalty of having an additional branch in the code. I had two variants: Scenario 1: sized between 1-1024: RNG using uniform distribution (time taken over 10_000_000 allocations) I ran every scenario three times for both methods of allocating
|
As expected, a scratch/bump allocator stands a lot to gain from optimizations like this. Other allocators have less to gain (as most time is spent on the allocation's (system) call), but are still slightly faster - albeit not significant. |
Just for transparency: what was your benchmarking code and platform? I'd be happy to run it on my machine. The fact that scenario 2 for the bump allocator was actually slower for the direct call seems fishy to me. There shouldn't be any way that's possible, as the direct call should objectively have less work to do in the abstract machine. |
@CAD97 I cleaned up and pushed the benchmark code here: https://github.com/Wodann/alloc-wg-benchmark. When cleaning up I realised that I hadn't included the The |
OK, after looking at your code, let me try to interpret the data. First of all, let's assume that we don't really care about the performance of ZST allocation: the vast majority of allocation will be non-ZST. Let's ignore the "zero" results for now. Secondly, it makes no sense to branch for So we're basically left with one question: with a global allocator that doesn't support ZSTs (i.e. jemalloc), what is the overhead of including two additional zero-check (one in As such, I would like to reiterate my previous position: there should be just one method ( I am against adding methods considering the fact that |
That analysis makes sense to me. The bump allocator merely served as an edge case for allocators that have low overhead. It can indeed trivially implement allocation of ZSTs. I don't know that there are any other allocators with low overhead that would benefit from a direct approach. The only thing that could still be added - but in a different form - is a convenience function for creating |
If only we had PowerOf2Usize |
What about those:
All could be unstable under Can we fix, that we don't want to add dedicated functions for ZST, but we want to add ZST support in all methods? This would also clean up |
I like impl NonNull<u8> {
pub const fn aligned_dangling(align: NonZeroUsize) -> Self {
assert!(align.is_power_of_two());
unsafe {
let ptr = align as *mut u8;
NonNull::new_unchecked(ptr)
}
}
} |
The only thing I feel we can do at that point, is requiring that any |
So if there are no concerns, I will:
|
Bikeshed: |
Do we want an |
Personally, I like There's no reason a |
Just to clarify: you want |
Yes, that's what I'd probably expect; |
I really like the idea, as the requirements are implicitly met: A I don't think either, that |
Add `Layout::dangling()` to return a well-aligned `NonNull<u8>` Adds a convenient function to `Layout` to create a `NonNull<u8>` out of a layout to be returned on ZST allocations. This is the first item on the roadmap to support ZSTs in `AllocRef`: rust-lang/wg-allocators#38 (comment) r? @Amanieu
Allow ZSTs in `AllocRef` Allows ZSTs in all `AllocRef` methods. The implementation of `AllocRef` for `Global` and `System` were adjusted to reflect those changes. This is the second item on the roadmap to support ZSTs in `AllocRef`: rust-lang/wg-allocators#38 (comment) After this has landed, I will adapt `RawVec`, but since this will be a pretty big overhaul, it makes sense to do a different PR for it. ~~Requires rust-lang#69794 to land first~~ r? @Amanieu
Allow ZSTs in `AllocRef` Allows ZSTs in all `AllocRef` methods. The implementation of `AllocRef` for `Global` and `System` were adjusted to reflect those changes. This is the second item on the roadmap to support ZSTs in `AllocRef`: rust-lang/wg-allocators#38 (comment) After this has landed, I will adapt `RawVec`, but since this will be a pretty big overhaul, it makes sense to do a different PR for it. ~~Requires rust-lang#69794 to land first~~ r? @Amanieu
Overhaul of the `AllocRef` trait to match allocator-wg's latest consens; Take 2 GitHub won't let me reopen rust-lang#69889 so I make a new PR. In addition to rust-lang#69889 this fixes the unsoundness of `RawVec::into_box` when using allocators supporting overallocating. Also it uses `MemoryBlock` in `AllocRef` to unify `_in_place` methods by passing `&mut MemoryBlock`. Additionally, `RawVec` now checks for `size_of::<T>()` again and ignore every ZST. The internal capacity of `RawVec` isn't used by ZSTs anymore, as `into_box` now requires a length to be specified. r? @Amanieu fixes rust-lang/wg-allocators#38 fixes rust-lang/wg-allocators#41 fixes rust-lang/wg-allocators#44 fixes rust-lang/wg-allocators#51
After discussion in issue #16, I propose:
alloc
into two separate functions for allocating ZSTs and sized types.The former design decision is inspired by a well-defined approach to allocating ZSTs, that reduces the workload for trait implementors.
The latter design decision is grounded in Rust's philosophy of providing zero-cost abstractions. The
if layout.size() == 0
branch inalloc
would impose a performance penalty for runtime variable allocations. Hence, we offer the ability to make bare metal calls toalloc_sized
andalloc_zero_sized_type
for performant use cases.Design questions
If we want to make a data-driven decision, we could run benchmarks to determine the actual cost of the additional branch on runtime variable allocations.
The text was updated successfully, but these errors were encountered: