-
Notifications
You must be signed in to change notification settings - Fork 60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Validity of references: Memory-related properties #77
Comments
Notice that there is an alternative that would make validity not depend on memory, while maintaining the With this approach, the properties of |
Concerning the second question, my personal thinking is that we should not require the pointed-to memory to be valid itself. One good argument for making things UB with a strict validity invariant is bug-checking tools, but in this case actually doing recursive checking of references all the time is really costly, and probably makes it near impossible to actually develop useful tools. On the other hand, it is very useful when writing unsafe code to be able to pass around a Some examples: |
So @arielb1, for example, has traditionally maintained that having I am also intrigued by this comment from @RalfJung :
That seems like a very good property to have. I am inclined to pursue this approach, personally. |
I (personally) think that not considering fn generic<T>(foo: &T) {
// body
} it is way too easy to end up with something that will most likely indefinitely stay UB for
|
Could you come up with such an example - that is UB for |
@arielb1 I do not think I’m able to come up with an example (is there one?) where it would not be I realized since I last wrote the comment that, in order to obtain |
We talked about this at the all-hands. @cramertj expressed interest in fn foo<T>(x: &!) -> T { match x { } } Even in unsafe code, the match can never cause issues on its own, the reference would already be invalid and hence you'd have UB earlier. I believe we should handle all types consistently, meaning that if One issue with this is that this makes validity very hard to check for in a UB checker like Miri, or in a valgrind tool. You'd have to do a recursive walk following all the pointers. Also, it is unclear how much optimizations benefit from this (beyond removing dead code for |
I don't understand what you are saying here. Making |
Also one thing @Centril brought up at the all-hands: we need more data. In particular, we should figure out if there are interesting patterns of unsafe code that rely on having references to invalid data, and that would be too disruptive to convert to raw pointers or too widely used to break. |
One issue with requiring references to be transitively valid: we have a whole bunch of existing reference-based APIs, such as for slices, that we could then not use. I expect this to cause a lot of trouble with existing code, but I am not sure. Another proposal for references that enables @cramertj's optimizations could be: if reference's validity depends on memory in complex ways, we will need a notion of "bitstring validity". (Avoiding that is one argument for shallow validity, IMO.) We could define validity of a reference to require that the pointee is bitstring valid. This makes checking validity feasible and enables some optimizations. However, it would mean that |
Another data point is rust-lang/rfcs#2645 (FCP-merge) which theoretically will allow transmuting between I'm in favor of the validity invariant of |
@CAD97 that RFC is not necessary for the transmute you mentioned -- it is only needed if we want to pass things by value. In memory, |
Data point for a use-case where a reference to invalid data is useful: https://users.rust-lang.org/t/am-i-triggering-undefined-behavior-here/28364/10 tl;dr:
where Counterpoint: if |
@petertodd can you walk me through why do you think that |
"A byte array with the same size as type We don't have a story currently for uninitialized unsized data -- and given some of the plans for custom DSTs, it will not be possible to support that in general (namely, for "thin" DSTs). |
I don't follow. I don't see anything wrong with |
@gnzlbg the issue is in this function: impl<T> Bytes<T> {
pub fn from_slice(buf: &[u8]) -> &Bytes<T> {
assert_eq!(buf.len(), mem::size_of::<T>());
unsafe {
&*Self::from_ptr(buf.as_ptr() as *const T)
}
}
} I can use that to turn a |
That function should be unsafe, and then the caller needs to assert the validity of the |
If we don't make validity of references recursive (and my opinion still is that we should not :D ), then that code is just fine. |
@gnzlbg So the full use-case where I came up with that beast is in-place "deserialization", e.g. for memmapped files (which as @RalfJung correctly noted elsewhere have issues with other processes changing them, but for sake of argument assume that has been solved). As we know, for many types not all bit-patterns are valid. Thus we can have an API along the lines of:
The This is why Unsized TypesSo how do unsized types fit into all this? See https://users.rust-lang.org/t/am-i-triggering-undefined-behavior-here/28364/9 As @RalfJung correctly notes, it's not clear that you can always/will always be able to determine the size of an unsized type value from pointer metadata. However it's certainly possible to do this for a subset of types, such as slices. So I simply made a But Alternative SolutionWrap a pointer instead:
Which is fine for me - not quite as nice an API for what I'm doing, as I'll need |
Another case of libstd manifesting invalid references. |
See rust-lang/rust-memory-model#2 for some earlier discussion of basically the same subject. |
According to the current Rust Reference [1], storing an uninitialized `u8` is undefined behavior. This may change in the future [2], but for now we should continue to assume it is undefined behavior. Every use of `core::mem::uninitialized` in `ufmt` is to create a local `[u8; _]`, and therefore is an example of this undefined behavior. I removed the undefined behavior in the simplest way possible, which is to replace the initializers with `[u8; _]`. [1] https://doc.rust-lang.org/reference/behavior-considered-undefined.html [2] rust-lang/unsafe-code-guidelines#77
306: Remove uses of `core::mem::uninitialized` from `ufmt`. r=jrvanwhy a=jrvanwhy According to the current Rust Reference [1], storing an uninitialized `u8` is undefined behavior. This may change in the future [2], but for now we should continue to assume it is undefined behavior. Every use of `core::mem::uninitialized` in `ufmt` is to create a local `[u8; _]`, and therefore is an example of this undefined behavior. I removed the undefined behavior in the simplest way possible, which is to replace the initializers with `[u8; _]`. [1] https://doc.rust-lang.org/reference/behavior-considered-undefined.html [2] rust-lang/unsafe-code-guidelines#77 Co-authored-by: Johnathan Van Why <[email protected]>
interpret/validity: reject references to uninhabited types According to https://doc.rust-lang.org/reference/behavior-considered-undefined.html, this is definitely UB. And we can check this without actually looking up anything in memory, we just need the reference value and its type, making this a great candidate for a validity invariant IMO and my favorite resolution of rust-lang/unsafe-code-guidelines#77. With this PR, Miri with `-Zmiri-check-number-validity` implements all my preferred options for what the validity invariants of our types could be. :) CTFE has been doing recursive checking anyway, so this is backwards compatible but might change the error output. I will submit a PR with the new Miri tests soon. r? `@oli-obk`
I'm going to reproduce a comment I made over in #346 here, since I believe it is a new point in favor of requiring some amount of validity on
|
Indeed, that is the canonical example for having fully precise That said, I would find it odd to have some validity but not full validity. So if we want fully precise |
(Relaying from #346 (comment)) I think a general "check validity behind references" is not possible. Consider: // This function can be called with `rc` and `unique` pointing to
// overlapping regions of memory.
fn evil_alias<T>(rc: &RefCell<T>, unique: &mut T) {}
fn proof<T>(x: T) {
let rc = RefCell::new(x);
let mut bmut = rc.borrow_mut();
evil_alias(&rc, &mut *bmut);
} Now when |
This issue has had way too much discussion to still be useful. It is replaced by
|
Discussing the memory-related properties of references: does
&T
have to point to allocated memory (with at leastsize_of::<T>()
bytes being allocated)? If yes, does the memory have to contain data that satisfies the validity invariant ofT
?If the answer to both of these questions is "yes", one consequence is that
&!
is uninhabited: There is no valid reference of type&!
.Currently, during LLVM lowering, we add a "dereferencable" attribute to references, indicating that the answer to the first question should be "yes". This is a rather unique case in that this is the only case where validity depends on the contents of memory. This opens some new, interesting questions:
I mentioned above that
size_of::<T>()
many bytes need to be dereferencable. How do we handle unsized types? We could determine the size according to the metadata and the type of the unsized tail. For slices, that's really easy, but for trait objects this involves the vtable, so it would introduce yet another kind of dependy of validity on the memory. However, vtables must not be modified, and they never deallocated (right?), so this is a fairly weak form of dependency where if a pointer was a valid vtable pointer once, then it always will be.With more exotic forms of unsized types, this becomes less easy.
extern type
we can mostly ignore, we cannot even dynamically know their size so we basically can just assume it is 0, and check dereferencability for that. But what about custom DST? I don't think we want to make validity depend on executing arbitrary user-defined code. We could just check validity for the sized prefix of this unsized type, but that would introduce an inconsistency between primitive DST and user-defined custom DST. Is that a problem?For unsized types, even the requirement that the pointer be well-aligned becomes subtle because determining alignment has similar issues than determining the size.
What about validity of
ManuallyDrop<&T>
?ManuallyDrop<T>
certainly shares all the bit-level properties ofT
, because we perform layout optimization on it. But doesManuallyDrop<&T>
have to be dereferencable?Note that this is not about aliasing or provenance; those should be discussed separately -- a bunch of open issues already exist for provenance in general and stacked borrows specifically.
CURRENT STATE: The thread is long and there were many positions without a good summary. My own latest position can be found here.
The text was updated successfully, but these errors were encountered: