Our concurrency memory model (inherited from C++20) is incompatible with x86 #548
Comments
Having read the issue, I'm not sure that it's incompatible. It's just that it's universally miscompiled. The issue is that compiling it properly will make it a stupid amount slower (and I'm not sure C++ users will appreciate the perf regression). Specifically, putting an mfence after each mov for a SeqCst load would be sufficient, as would using lock cmpxchg/cmpxchg8b on targets without SSE. Though this means that non-SSE targets can't compile any SeqCst loads in a manner that is compatible with accessing read-only memory. |
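For readers following the lowering argument, here is a minimal Rust sketch of the two operations in question. The static `FLAG` and both function names are hypothetical, and the instruction sequences named in the comments are the commonly used x86-64 mappings as I understand them, not something this thread specifies.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Hypothetical shared location, for illustration only.
static FLAG: AtomicU64 = AtomicU64::new(0);

// On x86-64 this is typically lowered to a plain `mov`.
// The alternative lowering discussed above would add an `mfence` after it.
pub fn seqcst_load() -> u64 {
    FLAG.load(Ordering::SeqCst)
}

// On x86-64 this is typically lowered to `xchg` (or `mov` followed by `mfence`).
pub fn seqcst_store(v: u64) {
    FLAG.store(v, Ordering::SeqCst);
}
```

The cost concern is that the load side is the hot path in most code, so fencing every SeqCst load would change the price of the cheap half of this mapping.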
This problem should not occur as LLVM can emit a |
Well, it is incompatible with the intended lowering. Changing the lowering is the wrong fix for this.
|
Well without changing our memory model or bugging WG-21, I'm not sure what the "right fix" would be. |
I like this comment from the report
I definitely agree that just defining the relevant orders and consistency axioms would be much better than trying to circumscribe them in standardese. Even if the definition is carried out in English, as long as its translation into formal syntax is clear, that seems much better than the current approach the C++ standard is taking.
It's very clear from the document I linked above that this is a mistake they made when translating the formal notation from the paper into plain English:
|
Do they know what the error is? |
Yes, please read the document linked in the OP. |
The OP doesn't seem to have a "Proposed resolution" |
No, but it describes the error.
The proposed resolution is to follow exactly what the paper suggests (minus "(po \cup rf) acyclic"). |
So "coherence-ordered" is supposed to be the same as the paper's "psc", but it's just written in standardese that has a completely different meaning? |
I don't know what the intended correspondence between paper and standard is, sorry. Maybe @orilahav can help with that. |
Reading the paper I completely understand why the person that translated the paper didn't like the "seqcst-order is inconsistent with happens before" concept. |
The consequences of memory model choices can be quite counter-intuitive. That's why we need formalizations of these models. Clearly we want to be able to compile atomics to x86 in the obvious way, and therefore we must accept this concept even if we don't like it. |
I'm actually not sure the "add barriers solution" would be that bad for Rust, which doesn't use SeqCst reads much. Tho "change the model" is probably better. |
I don't think we should make SC reads in Rust surprisingly more expensive than in C++. Also we have to consider compatibility with C++ whenever we share memory. |
If @orilahav is reading this, I can't figure out why your paper allows for removing dead (non-sc) reads. If you have this Figure-4-style example:
This should be compilable into the same as this, which is our target execution of WWmerge
But:
|
Thank you for pointing this out! I don't recall considering "dead load elimination" back in 2017 when we worked on the paper. I've thought about it now and arrived at some interesting conclusions, summarized in two parts below. I'm glad you looked into this! PART A: I think that "dead load elimination" is unsound in any of the alternatives described in the paper, including the original C11 model. My counterexample is this one:
This behavior is disallowed. Is this something compilers do? If so, we need to report a cool new problem! @RalfJung This could be another example to test in Miri, make sure the example above is indeed forbidden. PART B: @arielb1's example, however, shows a serious drawback. Since In fact, this reveals a mistake in our PLDI'17 paper! Although we did not study "dead load elimination", we claimed that the so-called "register promotion" (where a variable used solely in one thread is promoted to a "register") is sound. Upon revisiting the proof, the issue lies in "Lemma I.12" in the lengthy appendix. Unsurprisingly, I simply wrote that it "easily follows from our definitions". While the paper was never fully mechanized in a proof assistant, it has managed to survive more than seven years without anyone identifying a major mistake. Well done, Ariel! :D The implications of this mistake seem actually less significant. The only reason we introduced the With It seems to me that all this complexity arises from (super uncommon?) programs that mix SC and non-SC accesses to the same location. I believe things would be much more straightforward if such mixing weren't allowed: either a location is declared SC, meaning all its accesses are SC, or it is declared a standard atomic, allowing it to use |
I did a quick check on Godbolt, and it seems that current compilers don't remove dead relaxed-atomic loads in the most obvious case, but they do omit even seq-cst loads when they refer to "imaginary address" stack objects (which can't cause this sort of problem since imaginary address objects can't be shared between threads). Compilers however certainly remove non-atomic loads and that is very important to performance, so I agree that just saying that sc stores are not mergeable and relaxed loads are not deletable is probably the smartest fix. EDIT: the document at https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0062r1.html says that "For example, removing a (non-volatile) atomic load whose value is not used seems uncontroversial". I still haven't found a place where a compiler does it.
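To make "dead relaxed load" concrete, here is a small Rust sketch of the shape being discussed. Both function names are hypothetical. The second function is what the first would become if a compiler deleted the unused load; the sketch only shows the transformation and makes no claim about whether any compiler performs it.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

// Hypothetical function: the result of the first relaxed load is never used.
pub fn with_dead_load(x: &AtomicU32, y: &AtomicU32) -> u32 {
    let _unused = x.load(Ordering::Relaxed); // "dead" load: value is discarded
    y.load(Ordering::Relaxed)
}

// The same function after a (hypothetical) dead-load elimination pass.
pub fn without_dead_load(_x: &AtomicU32, y: &AtomicU32) -> u32 {
    y.load(Ordering::Relaxed)
}
```

In a single thread the two are indistinguishable. The question in this thread is whether the memory model still permits the rewrite once other threads access `x` concurrently.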
Makes sense, except this would be very annoying from an API surface perspective. https://bugs.llvm.org/show_bug.cgi?id=37716#c6 has an example for why removing raced-on dead relaxed loads is unsound even without seqcst. |
Indeed, I saw it before and forgot about it. Thanks for recalling that example!
disallowed, but allowed if we remove This makes me wonder about an old idea about fixing the |
Isn't the behavior forbidden today with |
I mean, if it works it could be nice to have a model that allows for removing dead relaxed loads. |
Is this meaningfully different from the examples we already have? I'm getting a bit worried about the execution time of our test suite, these consistency tests are quite slow since they run many times to ensure we never see that behavior. ;) |
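As a rough picture of what such a repeated consistency test looks like, here is a sketch using the classic store-buffering litmus test with all accesses `SeqCst`. This is not one of the examples from this thread and not Miri's actual test; the function name and iteration scheme are assumptions made for illustration.

```rust
use std::sync::atomic::{AtomicU32, Ordering::SeqCst};
use std::sync::Arc;
use std::thread;

// Runs the store-buffering test `iterations` times and counts how often
// both threads read 0, which is forbidden when every access is SeqCst.
pub fn count_forbidden_outcomes(iterations: usize) -> usize {
    let mut forbidden = 0;
    for _ in 0..iterations {
        let x = Arc::new(AtomicU32::new(0));
        let y = Arc::new(AtomicU32::new(0));

        let (x1, y1) = (Arc::clone(&x), Arc::clone(&y));
        let t1 = thread::spawn(move || {
            x1.store(1, SeqCst);
            y1.load(SeqCst)
        });

        let (x2, y2) = (Arc::clone(&x), Arc::clone(&y));
        let t2 = thread::spawn(move || {
            y2.store(1, SeqCst);
            x2.load(SeqCst)
        });

        let r1 = t1.join().unwrap();
        let r2 = t2.join().unwrap();
        if r1 == 0 && r2 == 0 {
            forbidden += 1;
        }
    }
    forbidden
}
```

A test would assert that `count_forbidden_outcomes(n)` is 0. Because a single run proves little, `n` has to be large, which is where the execution time goes.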
will investigate this further. It makes the definition of
I can imagine a language with three modes: non-atomics, strong atomics (with SC semantics), weak atomics (rel/acq/rlx). Non-experts use only non-atomics and strong atomics. Is this much worse than the current situation? |
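One way to picture the "no mixing" idea in Rust terms is a pair of wrapper types, so that the access mode is fixed by the type of the location. This is only a sketch: `StrongU32`, `WeakU32`, and all their methods are hypothetical names, not an existing or proposed API.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Hypothetical "strong" atomic: every access is SeqCst.
pub struct StrongU32(AtomicU32);

impl StrongU32 {
    pub const fn new(v: u32) -> Self { Self(AtomicU32::new(v)) }
    pub fn load(&self) -> u32 { self.0.load(Ordering::SeqCst) }
    pub fn store(&self, v: u32) { self.0.store(v, Ordering::SeqCst) }
}

/// Hypothetical "weak" atomic: only relaxed, acquire, and release accesses.
pub struct WeakU32(AtomicU32);

impl WeakU32 {
    pub const fn new(v: u32) -> Self { Self(AtomicU32::new(v)) }
    pub fn load_relaxed(&self) -> u32 { self.0.load(Ordering::Relaxed) }
    pub fn load_acquire(&self) -> u32 { self.0.load(Ordering::Acquire) }
    pub fn store_relaxed(&self, v: u32) { self.0.store(v, Ordering::Relaxed) }
    pub fn store_release(&self, v: u32) { self.0.store(v, Ordering::Release) }
}
```

Since neither wrapper exposes the inner `AtomicU32`, a given location can never see both SC and non-SC accesses, which is the property the comment above is asking for.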
The thing is that in Rust the default people use is weak atomics. Strong atomics (esp. SeqCst writes) are very rare. |
I generally recommend people should use rel/acq and not SC unless they have to. But my attempt at making it easier to program that way in Rust got rejected so 🤷 It's certainly not trivial to migrate Rust to a new set of atomic APIs, given that the current ones exist and will forever have to exist due to our backwards compatibility guarantees. Regarding the example, I implemented it in Miri and it does feel different. Also Miri behaves correctly. :) However, does it really need 3 variables? I didn't really understand what happens here. Having those examples without comments explaining why the bad execution is bad is unfortunately not very useful to me... |
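For context on the rel/acq recommendation, here is a minimal message-passing sketch that needs no SeqCst at all. The function and variable names are hypothetical; the pattern itself (release store paired with an acquire load) is the standard one.

```rust
use std::sync::atomic::{AtomicBool, AtomicU32, Ordering};
use std::sync::Arc;
use std::thread;

// Passes one value from a producer thread to the caller using only
// relaxed, release, and acquire accesses.
pub fn message_passing() -> u32 {
    let data = Arc::new(AtomicU32::new(0));
    let ready = Arc::new(AtomicBool::new(false));

    let (d, r) = (Arc::clone(&data), Arc::clone(&ready));
    let producer = thread::spawn(move || {
        d.store(42, Ordering::Relaxed);
        // The release store publishes the earlier write to `data`.
        r.store(true, Ordering::Release);
    });

    // The acquire load synchronizes with the release store once it reads `true`.
    while !ready.load(Ordering::Acquire) {
        std::hint::spin_loop();
    }
    let seen = data.load(Ordering::Relaxed);

    producer.join().unwrap();
    seen
}
```

Once the acquire load observes `true`, the relaxed read of `data` is guaranteed to see the producer's write, so the function returns 42.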
I agree. For that reason we stopped working on SC accesses in other projects (promising semantics, program logics...). But, if one does want to have them in the language, better find a good way to forbid mixing.
I think it does need three variables. I will try to add more text later on :) |
I'm pretty sure there's a lot of Rust code out there that uses SC. And obviously we can't remove them from std. So sadly work that doesn't cover SC accesses doesn't apply to Rust. :( |
And what often enough happens is that there's an atomic that is used with weak access in the common case, but in some rare case that is not worth optimizing people get lazy and put |
See https://cplusplus.github.io/LWG/lwg-active.html#3941 for details. Thanks @orilahav for pointing this out to me.