Do copy[_nonoverlapping]/swap[_nonoverlapping] do typed copies? #63159
The least gross and special-cased way I can think of achieving this would be extending A slightly more hacky, and more limited, fix for this specific case would be to trim the size copied by the amount of trailing padding (in this case 128 -> 1) if we can see statically that it's a typed copy of a single element ( |
Instead of extending memcpy, could the Rust front end generate single loads for each of the fields, and would LLVM be able to merge them as appropriate? That is, for the OP, I'd expect Rust to emit an 8-bit load. For something like |
I don't think that can work. Leaving aside the huge inefficiencies of emitting so many instructions and praying that LLVM merges them, if we just emit loads and stores for the non-padding bytes and don't say anything about the padding, then LLVM has no indication that it's even allowed to clobber those parts. |
LLVM already supports specifying padding through We currently don't use it because we generally aren't interested in TBAA. I also haven't checked whether the padding specified there is actually taken into account. |
I am not at all convinced that this is a legal optimization, and I think making it legal makes the operational semantics much more annoying than it should be.
To me this looks like the description of an untyped, If you had used |
In particular, the docs also say that
Given this spec, I see no way to justify not copying padding bytes. As in, I think we have to copy all bytes to comply with the spec, and the proposed optimization is illegal. |
@RalfJung Not for copy_nonoverlapping possibly, but we should be able to elide padding copies for implicit memcpy's, as in the |
Ah, yes, I agree |
https://rust.godbolt.org/z/Ka2VLG (A C call to |
I did use |
I never claimed it had to. Of course the compiler is allowed to inline
Agreed, I had missed that. But if I read your OP correctly, you are saying the optimization should happen for |
Yep, for TIL that I think I've never needed |
Likewise, I think we should add such a function (or feel out whether we can get away with redefining the existing functions that way). |
I think we would need to survey how users are using it in the wild. The API does say that all bytes are copied and it does not require the memory to be "valid" at I don't recall any use in libcore/liballoc/libstd where changing the semantics would break things. I can't imagine why it would make sense for some code to pick an arbitrary T with different padding, instead of the T that you actually want the code to copy. |
Just to be sure we are on the same side here, the semantics of that would basically be to do If it doesn't change the Abstract Machine, I am fine with whatever. ;) |
A typed copy would also imply that this operation is UB if the copied value(s) do not satisfy the validity invariant. |
Yes.
Yes, I think we can just define the operational semantics of such a function as equivalent to: fn copy<T>(src: *const T, dest: *mut T, len: usize) {
for i in 0..len { dest.add(i).write(src.add(i).read()) } // EDIT: see rkruppe below
} However, that sounds like you want to make fn copy_nonoverlapping<T>(src: *const T, dest: *mut T, len: usize) {
let src = src as *const MaybeUninit<u8>;
let dest = dest as *mut MaybeUninit<u8>;
for i in 0..(len * size_of::<T>()) { *dest.add(i) = *src.add(i); }
} and that should work independently of whether
Arguably, if the |
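For readers following along, the two operational semantics sketched in the comment above can be written out as compilable Rust. This is only an illustrative sketch, not the standard library's implementation: the names `typed_copy_nonoverlapping` and `untyped_copy_nonoverlapping` are made up here, the `T: Copy` bound is an added assumption to keep the typed version from duplicating owned values, and both helpers assume the source and destination ranges do not overlap.

```rust
use std::mem::{size_of, MaybeUninit};

/// Typed copy (hypothetical helper): every element is read as a `T` and
/// written as a `T`, so each source element must be a valid `T`.
///
/// Safety: `src` must be valid for reads and `dest` for writes of `len`
/// elements, and the two ranges must not overlap.
unsafe fn typed_copy_nonoverlapping<T: Copy>(src: *const T, dest: *mut T, len: usize) {
    for i in 0..len {
        unsafe { dest.add(i).write(src.add(i).read()) };
    }
}

/// Untyped copy (hypothetical helper): the memory is treated as
/// `len * size_of::<T>()` raw bytes, including any padding bytes.
///
/// Safety: same pointer requirements as above, but the bytes do not have
/// to form valid values of `T`.
unsafe fn untyped_copy_nonoverlapping<T>(src: *const T, dest: *mut T, len: usize) {
    let src = src as *const MaybeUninit<u8>;
    let dest = dest as *mut MaybeUninit<u8>;
    for i in 0..(len * size_of::<T>()) {
        unsafe { *dest.add(i) = *src.add(i) };
    }
}

fn main() {
    let src = [1u32, 2, 3, 4];
    let mut a = [0u32; 4];
    let mut b = [0u32; 4];
    unsafe {
        typed_copy_nonoverlapping(src.as_ptr(), a.as_mut_ptr(), src.len());
        untyped_copy_nonoverlapping(src.as_ptr(), b.as_mut_ptr(), src.len());
    }
    // For a type without padding and with only valid values, both agree.
    assert_eq!(a, src);
    assert_eq!(b, src);
}
```

The two only become distinguishable when `T` has padding or when the bytes are not a valid `T`, which is exactly the case this thread is arguing about.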
|
Turns out Or (for the spec) we just say we use the (Pseudo-)MIR-level
I don't want to do anything, I was just trying to figure out what it is that you want to do. ;) Personally I'd feel best about just keeping our current semantics.
Do you mean the current operational semantics? Because that sure looks like it copies the padding bytes.
Now I am confused when you are talking about the old Using |
Yes and yes (the current semantics do copy the padding bytes). |
I called the new copy_nonoverlapping just |
There already is a |
Does this imply that |
Maybe. How we spec those (copying bytes vs. typed copy) and how they are implemented is orthogonal in principle, though if we actually want to exploit that they are typed we likely need intrinsics. |
I think that, at least to fix this bug, the answer is yes. The current implementation does an untyped copy instead of a typed one. That's correct, but copies too much. One can implement an untyped copy on top of a typed one, but the opposite is not true. |
Well, we first need a lang team decision that this is indeed a bug -- i.e., that these operations should act like typed copies. |
To that end, could someone make a summary for why it should be considered a bug or better yet why a typed version needs to exist? (and also why not?) A pros & cons would be helpful. :) |
I think we should separate the three questions at hand here. First of all, this function from the OP: pub unsafe fn foo(x: &A) -> A {
*x
} unarguably performs typed copy. That it currently results in machine code that copies padding is an implementation detail of the current compiler not to be relied upon any more than e.g. in a The second question (raised by the second code example in the OP) is whether A third question, which came up in the last couple comments, is whether |
Agreed. So there should likely be some issue just tracking the lost optimization potential here. Maybe that should be this one, after removing the other example from the OP.
Actually, thinking about this again -- both of them pass the input/output by value. So they already do a typed copy for that. Doing two typed copies in a row is indistinguishable from doing a typed copy and a byte-wise copy, so actually we can do what we want (between these two options) for |
I think we all agree that changing the semantics of Whether we can make this breaking change or whether that's a change worth making are unresolved questions. It is my personal opinion that typed copy semantics are a better default and they might enable some optimizations, but I wouldn't risk breaking the world for them when we can just add two new APIs without issues. While we could test how much code this change would break using I also think that resolving this is fairly low priority, but I'd be ok with someone implementing these behind a feature gate, and with people experimenting with using @nikic's approach - we'll probably learn something useful from doing that. |
There is definitely some analysis to be done here (lay out the implications for unsafe code using them & for codegen), if someone wants to push it further. Though FWIW since it's a library API change/expansion it seems more like a libs team thing than a lang team thing (though of course it falls in the subject matter of UCG, too).
If by "@nikic's approach" you mean TBAA metadata on memcpys, one unpleasant thing you'll learn is that it will miscompile Rust programs -- I am reasonably certain there's no way to express padding with the TBAA metadata without making type punning UB. |
Yep; agreed.
(The behavior of intrinsics is mostly a lang team thing since the whole point of them is about their "intrinsicness".) |
What implications for unsafe code do you have in mind? One can already write a
I think that experimenting with optimizations for typed copies is worth doing, and that we should try to find ways to make them faster, but this is an orthogonal issue to whether we add the semantic APIs for allowing users to easily perform typed copies. We could implement typed copies as untyped ones forever and that would be fine (miri would, however, implement typed copies appropriately and detect misuses here).
Ouch. |
I don't think that's the case. TBAA metadata is generic and you don't need to model C++ semantics in particular with it. Making everything alias everything is also possible. (Of course, it's still a much bigger gun than what we actually need here.) |
@gnzlbg The implications for unsafe code are exactly the implication of choosing typed copies vs copying bytes (which may be invalid for the type in question!) including padding, and the wider impact of recommending a different default. You're also missing my point re: codegen but I won't be tricked into writing the summary I already decided I don't want to take the time to write ;) @nikic Oops, I think you're right, I forgot that there can be multiple distinct "roots". (Though I am worried about the explosion in metadata this may cause, and how the AA infrastructure will weigh the MayAlias from TBAA with a Must/NoAlias from other AA implementations.) |
True. Note however that right now, while Miri does check the validity invariant on typed copies, it does not "kill" padding. That will be non-trivial to implement, I think. That's clearly a deficiency in Miri, just one I thought you should be aware of. :) |
FWIW the same question arises for Surprisingly, however, that is currently not the case -- |
Here's a little test I made for checking that all these operations are untyped. Currently, it fails in Miri. |
I have no strong feeling about typed vs untyped copies in swap_nonoverlapping. So long as both sides are doing the same kind of copy, that's fine. (It was confusing when L->R was typed but R->L was untyped, so let's not do that again.) |
In #97712 I am proposing to document that these functions are all doing untyped copies. |
ptr::copy and ptr::swap are doing untyped copies The consensus in rust-lang#63159 seemed to be that these operations should be "untyped", i.e., they should treat the data as raw bytes, should work when these bytes violate the validity invariant of `T`, and should exactly preserve the initialization state of the bytes that are being copied. This is already somewhat implied by the description of "copying/swapping size*N bytes" (rather than "N instances of `T`"). The implementations mostly already work that way (well, for LLVM's intrinsics the documentation is not precise enough to say what exactly happens to poison, but if this ever gets clarified to something that would *not* perfectly preserve poison, then I strongly assume there will be some way to make a copy that *does* perfectly preserve poison). However, I had to adjust `swap_nonoverlapping`; after `@scottmcm's` [recent changes](rust-lang#94212), that one (sometimes) made a typed copy. (Note that `mem::swap`, which works on mutable references, is unchanged. It is documented as "swapping the values at two mutable locations", which to me strongly indicates that it is indeed typed. It is also safe and can rely on `&mut T` pointing to a valid `T` as part of its safety invariant.) On top of adding a test (that will be run by Miri), this PR then also adjusts the documentation to indeed stably promise the untyped semantics. I assume this means the PR has to go through t-libs (and maybe t-lang?) FCP. Fixes rust-lang#63159
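As a small illustration of what "untyped" means in the description above, the following sketch copies and swaps `bool`-typed slots whose bytes are not valid `bool` values. It is not the test added by the PR; the function names are hypothetical, and the sketch relies on the untyped semantics that the PR description says the documentation now promises for `ptr::copy_nonoverlapping` and `ptr::swap_nonoverlapping`.

```rust
use std::mem::MaybeUninit;
use std::ptr;

/// Copies one `bool`-sized slot holding the byte 2 (not a valid `bool`)
/// and returns the byte found at the destination. Hypothetical helper.
fn copy_invalid_bool_byte() -> u8 {
    let mut src = MaybeUninit::<bool>::uninit();
    let mut dst = MaybeUninit::<bool>::uninit();
    unsafe {
        // Write a raw byte through a `u8` pointer; no `bool` value is ever created.
        src.as_mut_ptr().cast::<u8>().write(2);
        // Untyped copy: the byte is moved as-is even though `T = bool`.
        ptr::copy_nonoverlapping(src.as_ptr(), dst.as_mut_ptr(), 1);
        dst.as_ptr().cast::<u8>().read()
    }
}

/// Swaps two `bool`-sized slots holding the bytes 2 and 3 and returns the
/// bytes found afterwards in the first and second slot. Hypothetical helper.
fn swap_invalid_bool_bytes() -> (u8, u8) {
    let mut a = MaybeUninit::<bool>::uninit();
    let mut b = MaybeUninit::<bool>::uninit();
    unsafe {
        a.as_mut_ptr().cast::<u8>().write(2);
        b.as_mut_ptr().cast::<u8>().write(3);
        ptr::swap_nonoverlapping(a.as_mut_ptr(), b.as_mut_ptr(), 1);
        (a.as_ptr().cast::<u8>().read(), b.as_ptr().cast::<u8>().read())
    }
}

fn main() {
    assert_eq!(copy_invalid_bool_byte(), 2);
    assert_eq!(swap_invalid_bool_bytes(), (3, 2));
}
```

Under typed semantics the same calls would be undefined behavior, because reading the byte 2 at type `bool` violates the validity invariant.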
The following example (godbolt):
produces the following LLVM-IR:
and machine code:
where 128 bytes are copied every time a value of type A is moved/copied/read/...
However, one actually only has to copy a single byte, since all other bytes are trailing padding. The expected machine code is (godbolt):
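The original snippets are not reproduced on this page. As a rough stand-in, a type with the layout described (one meaningful byte followed by 127 bytes of trailing padding) might look like the sketch below. The definition of `A` here is an assumption made for illustration, not the issue's actual code, and `foo` only mirrors the shape of the function quoted elsewhere in the thread.

```rust
use std::mem::{align_of, size_of};

// Hypothetical stand-in for the issue's `A`: a single `u8` field, with the
// alignment forced to 128 so that the size is rounded up to 128 bytes.
#[derive(Clone, Copy)]
#[repr(align(128))]
pub struct A(u8);

// A plain by-value read of `A`; semantically a typed copy.
pub fn foo(x: &A) -> A {
    *x
}

fn main() {
    // Only 1 of the 128 bytes carries data; the rest is trailing padding.
    assert_eq!(size_of::<A>(), 128);
    assert_eq!(align_of::<A>(), 128);
    let a = A(7);
    assert_eq!(foo(&a).0, 7);
}
```

Whether the generated machine code moves all 128 bytes or only the one data byte is up to the compiler; the assertions above check only the layout and the copied value.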
cc @nikic @rkruppe