What about: volatile, concurrency, and interaction with untrusted threads #152
I don't think we need to distinguish between internal and external threads of execution. A thread of execution is just that, and the concurrent semantics of volatile should be specified for them in general. Volatile loads and stores are not guaranteed to be atomic (or to synchronize in any particular way); that is, using them to concurrently read/write some memory could introduce a data race. Suppose you have an […]

Now, when people use volatile loads/stores to access IO registers, they exploit the platform-specific knowledge that the volatile load/store is atomic for that particular register or memory-access size on that particular architecture. For example, when reading a temperature from a sensor via a volatile load of a 32-bit register, the hardware often guarantees atomicity for memory loads, making it impossible to observe a partially modified result. Here you can get away with a volatile load because the toolchain and the hardware conspire together to avoid a data race.

We could document that whether volatile loads and stores are atomic and synchronize in any particular way for particular memory addresses, memory-access sizes, targets, target features, target CPUs, etc. is unspecified. That is, implementations are not required to document this behavior, and volatile loads and stores are not guaranteed to be atomic. If misuse introduces a data race, the behavior is undefined. AFAICT, what we cannot do is guarantee that what you are trying to accomplish always works (for all […])
I think one additional thing which @hsivonen needs is for data races to produce sane effects ("garbage data" and that's it) when the race occurs on a "bag of bits" data type which has neither padding nor invalid bit patterns, such as […]. It's a topic that comes up frequently in unsafe-code discussions, though, so perhaps there's already a UCG topic on this kind of matter that we can study for prior art.
A related issue, mostly of theoretical interest in multi-process scenarios since current implementations don't really have any option other than doing the right thing, is that of an adversarial thread writing invalid data into the shared memory space (e.g. […]).
I think […]

[What about miri?]
(Sorry for the double post.) But I also think we should have a better answer for shared memory than […]. In particular, as discussed in the internals thread, we may want to guarantee that the UB caused by races between atomic and non-atomic accesses, if the accesses are in different processes, only affects the process performing the non-atomic access. In other words, you can safely use atomic accesses on shared memory even if your communication partner might be malicious, at least with regard to that particular source of UB. That seems like a reasonable guarantee.

On the other hand, there are other, more plausible ways that an architecture could hypothetically break this sort of IPC. For example, it could give each byte of memory an "initializedness" status, such that if process A writes uninitialized data to memory and process B reads it, process B gets an uninitialized value and traps if it tries to use it. (Note that Itanium does not do this; it tracks initializedness for registers, but not for memory.)
In an ideal world, there would be a way for the untrusting thread to state "I know that I might be reading uninitialized or otherwise fishy memory; please let me do so and return arbitrary bytes on incorrect usage". Kind of like the […]. However, I have no idea how to make that actually interact correctly with emulations of hardware that can track use of uninitialized memory but provide no way for software to announce voluntary access to uninitialized memory, such as Valgrind.
I find using "emit exactly one load instruction" to denote potentially many actual hardware load instructions confusing.
Feel free to improve the wording :) I mean of course that there is one load instruction per […]. Though even that is slightly imprecise: the compiler can still inline or otherwise duplicate code, in which case a given binary could contain multiple locations corresponding to a single call to […]. Perhaps it's better to say: any given execution of […]
This:

```rust
#[no_mangle]
pub unsafe fn foo(x: *mut [u8; 4096]) -> [u8; 4096] {
    x.read_volatile()
}
```

generates ~16000 instructions on godbolt. I don't know of any architecture on which this could actually be lowered to exactly one load instruction.
I assume @comex would expect this to be written as a loop.
@petrochenkov How would this be implemented? No matter how I look at this, I see a lot of issues. If we guarantee that […], one issue is that we can only emit this error at monomorphization time; I'm not sure how we could fix that. Another issue is that this would be a breaking change, but I suppose we could either add new APIs and deprecate the old ones, or somehow emit this error only in newer editions (these operations are almost intrinsics). I also wonder what the implementation would look like. AFAIK only the LLVM target backend can know how many instructions the load lowers to, and requiring this kind of cooperation from all LLVM backends (and Cranelift) seems unrealistic. I suppose we could generate "tests" during compilation in which we count the instructions, but that seems brittle. Then I wonder how this could work on Wasm.
Good point – volatile accesses of types that don't correspond to machine register types are somewhat ill-defined, AFAIK. But, e.g.,

```rust
#[no_mangle]
pub unsafe fn foo(x: *mut u64) -> u64 {
    x.read_volatile()
}
```

should definitely be guaranteed to be a single load instruction on x86-64; the same goes for smaller integer types. Of course there's limited room to make decisions here since we're largely at the mercy of LLVM, but I'd say the rule is roughly "if the 'obvious' way to translate this load is with a single instruction, it has to be a single instruction". Personally, I'd prefer if […]

The rule gets less clear when SIMD gets involved. x86-64 can perform 128-bit loads via SIMD, and currently, a […]
On the other hand, a […]
I'd say this is defensible, because even if the architecture has some way to perform a 128-bit load, that's not the same as there being an 'obvious' way to load a 128-bit integer. On the other hand, with […]. But in any case, it doesn't really matter whether a SIMD load is guaranteed, since […]
I'd say that the 'architecture' here is Wasm, not whatever it's ultimately compiled into. Just as x86-64 has 128-bit load instructions that aren't guaranteed to be atomic, Wasm apparently doesn't guarantee that loads of any size are atomic unless you use the special atomic instructions. But that's fine; it's already established that the set of guarantees provided depends on the architecture. However, Wasm is arguably a motivating use case for exposing additional intrinsics for loads and stores marked both […]
That's exactly what I meant by "from the backend": a target-specific LLVM backend. Before that point, no one knows about instruction specifics.
I'm a little confused about what the argument is, but to be clear: for MMIO to work correctly, volatile loads and stores of basic integer types (specifically, ones that can fit into a register) must be done using a single load/store instruction of the correct width. So the behavior with those types needs to be a language-level guarantee. That should probably also apply to […]. The behavior of […]
(Speaking as one of the two "Rust on the GBA" devs.) Integer types: yes. Transparent types: absolutely must also be yes. For MMIO to be an approachable issue, you have to be able to newtype all the integers involved.
It seems to me that if you want to provide functionality that is specified as deferring to hardware semantics, and which is inherently architecture-dependent, then it is better to provide it through an explicitly architecture-dependent mechanism instead of calling it "implementation defined". For this, we have the […]
It's unclear to me whether you are using "atomic" and "data race" colloquially or as specific memory-model terms. For my use case, I don't need colloquial atomicity; that is, I don't need indivisibility. In particular, I want to do lockless unaligned SIMD loads/stores, and I don't care if they tear in the presence of an adversarial second thread. However, I need "atomic" in the memory-model sense: colloquially there is a data race, but it must not be a "data race" for the purpose of "data races are UB", just like relaxed atomics race in practice but that race is defined not to constitute a "data race" for the purpose of "data races are UB".
This is fine for my use case. My use case needs to read or write sensible values only in the single-threaded scenario. If there's another thread, the other thread is an error on the part of the unprivileged code and adversarial from the point of view of the privileged code, at which point I'm fine with the unprivileged code UBing itself and getting garbage results from the host service. I don't want it to UB the privileged host-service code: the privileged code may see garbage values that are even inconsistent between two reads from the same memory location, but there must not be optimizations that introduce security bugs that were not in the source code, assuming the source code would have had no security bugs if every load behaved like a call to a random number generator. (An example of a compiler-introduced security bug would be eliding bounds checks on the assumption that two loads from the same memory location yield consistent values.)
For clarity, I only intend to read types for which all bit patterns are valid values (and only on architectures that cannot track "uninitialized" bytes in RAM and that yield some bit pattern for memory locations that are uninitialized for the purpose of high-level language semantics), and the thing I'm asking for is being able to turn off optimizations that could be dangerous in the presence of an adversarial other thread. AFAICT, this means that 1) the compiler must not use the memory locations that get written as spill space (i.e. it must not invent reads that expect to read back previous writes intact), and 2) if the compiler generates two loads from the same memory location (either on its own initiative or because the source code contained two loads), it must not assume that the two loads yield mutually consistent values (i.e. the compiler must not optimize on the assumption that values read from the same memory location are mutually consistent).
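The hazard discussed above can be illustrated with a sketch (the helper names are ours, not from the thread): loading the attacker-writable length exactly once, atomically, means the bounds check and the indexing see the same value, so a racing writer can at worst produce a garbage index, never an out-of-bounds access.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical host-service helper: `shared_len` is a word in memory that
// an untrusted thread may scribble on concurrently.
fn read_checked(shared_len: &AtomicUsize, buf: &[u8]) -> Option<u8> {
    // Load once, atomically: the bounds check and the indexing below both
    // use this single observed value. If the compiler were allowed to
    // assume two separate plain loads agree, it could re-load `len` after
    // the check and index out of bounds.
    let len = shared_len.load(Ordering::Relaxed);
    if len < buf.len() {
        Some(buf[len])
    } else {
        None
    }
}

fn main() {
    let len = AtomicUsize::new(2);
    let buf = [10u8, 20, 30, 40];
    assert_eq!(read_checked(&len, &buf), Some(30));
    // An adversarial writer may store garbage; we just take the safe path.
    len.store(99, Ordering::Relaxed);
    assert_eq!(read_checked(&len, &buf), None);
}
```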
Volatile reads and writes are generic over […]. Also, you just showed that even though x86 has 128-bit registers with atomic instructions, volatile reads and writes to u128 are not atomic. So we can't say "if T is an integer and it fits in a register on the target, volatile reads/writes are atomic". On a 32-bit architecture, reads/writes to 64-bit integers might not be atomic; on a 16-bit architecture, reads/writes to 32-bit integers might not be atomic; etc. At best, because we only support platforms with CHAR_BITS == 8, we might be able to guarantee that 8-bit volatile reads/writes are atomic and relaxed everywhere.
The unsafe code guidelines specify what guarantees unsafe code is allowed to rely on. That is, if you write generic unsafe code, can it rely on volatile reads and writes of T being atomic and relaxed? What if T = u64? AFAIK the answer to both questions is "no, unsafe code cannot rely on that", so I don't know how we could guarantee something that users are not allowed to rely on. I still don't know how we can write anything better than (this):
This allows users who check what the backend does for a particular architecture to rely on that information, and if they mess up, the behavior is undefined. @comex mentions that we could guarantee that this works for "integers that fit in a register", but they showed above that this is not true, e.g., on x86, where u128 fits in many x86 registers (up to 512 bits wide), yet volatile loads and stores to u128 are not atomic relaxed. AFAICT, on a 32-bit arch 64-bit volatile loads/stores might not be atomic; on a 16-bit arch, 32-bit loads/stores might not be atomic either; etc. Maybe at best we can guarantee that 8-bit-wide volatile loads and stores are always atomic, by stating that we will only ever support platforms where this is the case, and if some platform does not satisfy this, we'll never support it. Please do suggest specific text about the guarantees that unsafe code is allowed to always rely on when working with volatile loads/stores. Talking about a concrete snippet of wording is IMO easier than talking in the abstract, because one can more easily show counter-examples that prove the wording incorrect (e.g. […]).
If we make the behavior unspecified, and you know that […]
Is there a concrete need to make them all the way "unspecified" as opposed to "may return unpredictable values if the memory locations are concurrently written to"?
I'm OK with receiving partially modified bytes. I'm just not OK with optimizations that would introduce security bugs in that case. As soon as there's a second thread writing to the memory that I'm reading, I no longer care about what values I read and only care about not having a security bug. |
Would the following be true given what LLVM provides (and has to keep providing for real-world C use cases)? My use case needs:
My use case doesn't need […], but I gather the original purpose of […]
Unspecified just means that we don't specify what happens. How are "unpredictable values" any more specific than that? AFAICT "unpredictable" allows any value. Trying to be more specific here would probably require introducing a new atomic memory ordering weaker than relaxed (e.g. with support for word tearing due to concurrent writes, and specifying which values each word is allowed to take).
AFAIK all of these are guaranteed by the current specification of read/write volatile in Rust.
AFAIK the first two are not guaranteed; I don't know about the third one. I don't know if […]
(I don't have time to read this exploding thread fully now; hopefully I'll get to it tonight. But please keep the discussion here focused on the interaction of volatile accesses and concurrency. Things like tearing and specifying the semantics of volatile while avoiding low-level concepts such as "load instructions" already have a topic at #33; let's not duplicate that discussion.)
If […]. It could perhaps be worded in terms of a single "memory access" rather than a single "instruction", but I don't see much difference; neither of those concepts can be defined without some reference to a machine model.
I believe the correct definition is inherently architecture-specific. Better than "size of a register" is "size of a general-purpose register": that works for most architectures, but not all; e.g., Wasm doesn't even have a concept of a register. But it should be possible to establish rules like: "On x86_64, calls to […]". This certainly shouldn't be unspecified, and I don't think it should even be implementation-defined per se, in the sense that some alternate backend could decide to behave differently. It should be required for any Rust implementation targeting x86_64. (At least, unless their interpretation of "x86_64" is something so weird and nonstandard that the rule somehow wouldn't make sense for that implementation.)
Does any backend guarantee this (LLVM, GCC, or Cranelift)?
Allowing the behavior to change across implementations (targets, other backends, other toolchains) while being required to document the behavior reads like the definition of implementation-defined.
Is this true for all x86_64 hardware? EDIT: I think so, but note that this is true independently of whether the load is volatile or not.
I'm trying to capture that the values may be weird but there is no UB. I'm not sure what the implications of "unspecified", which I believe to be a special term, are.
They seem to be guaranteed by this C++ proposal, which I believe tries to capture the behavior of the LLVM intrinsics that Rust builds upon here: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1382r0.pdf My only issue with that paper is that its item "i" says that certain volatile accesses don't constitute data races, which implies there are others that might. I'd be happier if volatile accesses categorically didn't constitute data races. (Other types of accesses to the same memory locations could still constitute data races resulting in asymmetric UB, but asymmetric UB is exactly what I'm looking for here.)
Yes, LLVM does at least.
And then there's another paragraph that makes it extra clear that they intend you to be allowed to execute native width loads and stores as a single volatile instruction
On top of @comex's point that this is a no-go for stability reasons, in this case I don't think it's too difficult to fix, because the reason for […]. There's certainly a lot of wiggle room around which optimizations you want to enable versus which actual side effects volatile operations will perform, but it hardly belongs in […]
@Lokathor the text itself does not guarantee that volatile load / stores lower to a single atomic instruction, and instead only says:
When the "Rationale" section says that "volatile loads/stores of primitive types with native hardware support lower to a single instruction", it is providing a guarantee that the text above does not provide. It still does not say that this instruction needs to be atomic (e.g. cannot tear), and I asked in #llvm@freenode today and was told that LLVM does not allow frontends to query whether a type is a primitive type with native hardware support for a particular target.
With @Centril's solution, we could add to […]:

These intrinsics could guarantee that only a single instruction is emitted, whether the loads/stores are atomic or can tear, etc. Unsafe code could then safely rely on these guarantees, and we could provide implementations of them that are independent of what LLVM lowers volatile loads/stores to (e.g. using inline assembly). The generic […]
Yes, by "it" I meant the […]. If process B truncates the file, there's nothing we can do about it, unless the OS gives some protection against shared-memory file truncation. But otherwise, it doesn't matter what values A reads from the […]
Is it ok for them to not be synchronized? For example, consider the initial shared memory content is (len0, addr0):
I'm also not really sure what happens if one process reads a consistent state, e.g., (len0, addr0), and creates a […]. When a process reads a (len, addr) pair, does each process validate addr and len? E.g., a non-atomic write to addr from a different process could be split into ~4 writes (one per struct field), so a process doing an atomic read of addr can end up reading "garbage".
At the moment, […]. It's very important for correctness that shared-memory files aren't closed or truncated while any process has an fd for them. All processes revalidate whenever they convert a shared address range into a reference.
I think that's sound, but you probably want to perform a 128-bit atomic write instead (e.g. using […]).
I think you can probably support […]. But yeah, as long as your OS supports creating the file in such a way that in-place truncation is not allowed (and you can test that this is the case, e.g., when you open a file created by another process), the API looks sound and super useful. Let me know if you need a Mach implementation; it should be possible for OS X as well.
If Rust gives me an […]
Yikes. Why is that still unstable? ...Apparently because it's only supported on a few targets and requires a target feature on x86. But other atomics were stabilized, despite not being supported on all platforms, because they're supported on most of them. Weird situation.
You don't need to use atomic types, but atomic memory accesses. On the platforms that support 128-bit-wide atomic accesses, Rust already provides them, e.g., in core::arch.
@comex Weird, I thought I saw a PR to stabilize that :/ |
Pre-Pre-RFC: […]
Here are some questions to help you refine this proposal: […]
@DemiMarie could you move that pre-pre-RFC to a new thread (as an issue here or on IRLO), please? This thread is already long enough without discussing a new RFC. We should find a way to close this issue, but I am not sure how. Someone would have to properly summarize 120 comments... but please don't add anything new to this issue! It is open just because we don't have the resources to write a proper summary.
I'm working on a project where I'm running into this same question. I'm hoping I can summarize this thread and in the process check my understanding; please let me know if any of the following is incomplete/incorrect!

A Rust program that wants to share memory with untrusted thread(s) external to the program (e.g. a kernel accessing userspace memory) needs both volatile semantics and atomic semantics. Volatile semantics are necessary to ensure that memory accesses are not optimized away (i.e. interacting with externally shared memory needs to be treated like any other IO). Atomic semantics are necessary to ensure that there are no data races. This is true even if we don't want/need to synchronize with the external thread(s).

However, neither volatile nor atomic semantics on their own are sufficient. Volatile reads and writes are not exempt from data races; using them in this context would cause undefined behavior (unless there is control of the thread scheduling to prohibit concurrency). Additionally, atomic operations could in theory be optimized away, because the compiler doesn't know we're working with externally shared memory. (At least, that seems to be why C++ decided to keep volatile atomics around when deprecating volatile.) There is some evidence that using […].

Since we don't currently have a mechanism in the language or any library that has the right combination of semantics, it seems that the most practical recommendation right now is to use inline assembly. Assuming we understand the hardware semantics, we can avoid data races and impose the right memory ordering by using the right instructions, and we're guaranteed that the memory accesses in the inline assembly won't be optimized away. In the future, we might adjust the semantics of […]
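The inline-assembly workaround mentioned above might look like the following sketch (x86-64 only; the helper name is ours). An `asm!` block is opaque to the optimizer, so the load cannot be elided, duplicated, or assumed to agree with any other load from the same location.

```rust
use core::arch::asm;

/// Hypothetical helper: load a u32 with a single x86-64 `mov`, which the
/// compiler must treat as an opaque operation it cannot reason about.
#[cfg(target_arch = "x86_64")]
unsafe fn untrusted_load_u32(p: *const u32) -> u32 {
    let v: u32;
    asm!(
        // One 32-bit load; Intel syntax is the Rust default.
        "mov {v:e}, dword ptr [{p}]",
        p = in(reg) p,
        v = out(reg) v,
        // `readonly`: the asm reads memory but writes none.
        options(nostack, readonly),
    );
    v
}

#[cfg(target_arch = "x86_64")]
fn main() {
    let word: u32 = 0xDEAD_BEEF;
    let got = unsafe { untrusted_load_u32(&word) };
    assert_eq!(got, 0xDEAD_BEEF);
}

#[cfg(not(target_arch = "x86_64"))]
fn main() {}
```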
The issue with atomics isn't about unknown threads of execution. I believe the consensus from #215 is that Rust cannot assume that external threads mutating our memory don't exist, so any optimizations that rely on that assumption are invalid. The issue is that the memory model says that if a data race occurs, the whole program is UB. It doesn't have a way to "scope" the UB to a specific thread when one thread is using atomic operations while another is using non-atomic operations. To interface with untrusted threads in shared memory, we would need an additional guarantee: if we only touch the shared memory with atomic operations, then we can't get any UB on our side regardless of what the untrusted thread does.
@akiekintveld yes, that sounds like a good summary! Also see #321 for a proposal for new volatile operations.
That is also a problem. But it is still true that neither volatile nor atomics alone are sufficient. Even if there are other threads in the Rust AM, that doesn't mean the compiler has to preserve atomic accesses in a way that is suitably observable outside the Rust AM. (Though I cannot quite imagine optimizations that might break this, I still think that communication with code outside the Rust AM is different from communication with other threads in the Rust AM.)
I suppose you could treat the untrusted thread as being outside the Rust AM (since it is compiled as a separate program). At that point you can treat it as a series of assembly instructions for which all memory accesses are effectively atomic.
The last update on this was two years ago. Has there been any improvement to the situation, or are we still relegated to unportable inline assembly? What would be needed to push forward a proper solution to this issue? I'm looking at the use case of shared-memory IPC between a privileged process and a sandboxed process (both of which can have multiple threads).
rust-lang/rfcs#3301 should be a sufficient solution to this: the untrusted code can be considered at the assembly level, where all reads and writes by individual instructions can be treated as relaxed atomic operations. You can then just use normal atomics for synchronization and […]
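Under that model, ordinary atomics suffice on the trusted side. A minimal sketch under those assumptions (the helper name is ours, and an ordinary local stands in for the real shared mapping so the snippet can run anywhere):

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Hypothetical helper: `p` points into a mapping shared with an untrusted
/// process. Safety: the caller must ensure `p` is valid and 4-byte aligned.
unsafe fn read_shared(p: *mut u32) -> u32 {
    // Reinterpret the shared word as an atomic (AtomicU32 has the same size
    // and alignment as u32; `AtomicU32::from_ptr` is the blessed form of
    // this cast since Rust 1.75). Every access on our side is atomic, so
    // whatever racing writes the peer's instructions perform, we observe
    // *some* u32 value, with no UB on our side.
    let a = &*(p as *const AtomicU32);
    a.load(Ordering::Relaxed)
}

fn main() {
    let mut word: u32 = 7; // stand-in for a word in a shared mapping
    assert_eq!(unsafe { read_shared(&mut word) }, 7);
}
```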
Designate outside-of-program changes to memory accessed by volatile as non-UB
Context: An internals thread.
Use case: Rust is used to write privileged code (host services provided by a runtime environment to a JITed language, an OS kernel providing syscall services to userland code, or a hypervisor providing emulated devices to a guest system) that needs to access memory that unprivileged code can also access. The unprivileged code can have multiple threads, so while unprivileged thread A has requested a service from the host (such that the host service is running logically on A's thread of execution), a separate unprivileged thread of execution B could, if it is behaving badly, concurrently access the same memory from another CPU core. The unprivileged thread of execution must not be allowed to cause the privileged code written in Rust to experience UB. (It's fine for the unprivileged code to cause itself to experience UB within the bounds of its sandbox.)
The memory model itself is a whole-program model, so it doesn't apply, since in order to provide the guarantees it pledges to our thread, we must pledge the absence of data races from other threads of execution, which we can't do in this case. Hence, we need a way to access memory that is outside the memory model in the sense that there could exist an adversarial additional thread of execution that doesn't adhere to the DRF requirement. We're not trying to communicate with that thread of execution. The issue is just not letting it cause security bugs on us.
The C++ paper P1152R0 "Deprecating `volatile`" gives this use case as the very first item on its list of legitimate uses of `volatile` in C and C++. This makes sense: `volatile` works when external changes are caused by memory-mapped IO (the use case documented for `std::ptr::read_volatile` and the original use case motivating the existence of `volatile` in C), and given the codegen for `volatile` and the codegen for `relaxed` atomics on architectures presently supported by Rust, it makes sense for it to also work when external changes are caused by a rogue thread of execution.

Yet the documentation says: "a race between a `read_volatile` and any write operation to the same location is undefined behavior". I believe it is unnecessary and harmful to designate this as UB, and it would be sufficient to merely say that the values returned by `read_volatile` are unpredictable in that case. This makes sense in the light of an IO-like view of volatile: you need to be prepared to receive any byte from an IO stream, so not knowing at compile time what you are going to get does not have to be program-destroying UB if you are prepared to receive a value not predicted at compile time.

I suggest a) that the documentation be changed not to designate concurrent external modification of memory locations that a Rust program only accesses as `volatile` to be UB, and b) that the Unsafe Code Guidelines state that it is legitimate to use volatile accesses to access memory that a thread of execution external to the Rust program might change concurrently. That is, while you may read garbage, the optimizer won't assume that two volatile reads from the same location yield the same value and won't invent reads from memory locations written to using volatile writes (i.e. the memory locations are considered shared and, therefore, ineligible to be used as spill space by the compiler).

Replies to the thread linked above indicate that this should already be the case despite the documentation suggesting otherwise.
Also see #152 (comment) which tries to summarize the discussion-until-then a bit.