-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rust 1.54.0 optimized compilation overwrites stack variable, causing segfault #87947
Comments
Oh, I forgot to mention I can make the bug/crash go away by inserting a |
Using the The memory layout in nightly is the same as 1.54.0: |
Thanks for the pointer. I was able to do this. However, the diff is quite large and I'm not really sure what I'm looking at. I probably should have posted the disassembly from the crashing function (https://github.com/indygreg/PyOxidizer/blob/0ca3236bac944b63ea8506f273b064167e25f47b/pyembed/src/interpreter_config.rs#L585). Here's the working version in 1.53.0:
And the bad / 1.54.0 version:
There are clearly differences here. I'm not great at reading assembly, but the differences in the call to Here's the good version:
And the bad:
We clearly see the 2 |
cargo-bisect-rustc should be able to pin that down to a merge commit. https://github.com/rust-lang/cargo-bisect-rustc/blob/master/TUTORIAL.md |
I am able to reproduce the issue with Strangely,
If someone tells me how to get rustc to |
That PR introduces a MIR pass. Maybe try EDIT: It looks like that only MIR tests are modified, there are no new MIR passes. |
Oh, I realized that when building with cargo I need to use Also, thanks for that |
My earlier assertion about I'm able to reproduce the issue with I suspect this is an LLVM IR optimization bug because:
Give me time to bisect the exact pass, as there are a lot of them... If anyone has any tricks to make this easier, I'm all ears. |
I'm having difficulty getting reliable results bisecting the LLVM optimization passes. I suspect this is due to something non-deterministic in LLVM? Anyone have any pointers? Also, the bug goes away with |
Well, it looks like the rustc emits quite different LLVM IR between these 2 nightlies, even before it goes into LLVM optimization passes. nightly-2021-05-18:
nightly-2021-05-19:
Here's the command I used to build and capture LLVM IR:
|
I think that's expected given the |
I'm starting to make headway into reducing the test case. I'm able to reduce this to the following:
(This has been pushed to the What's interesting here is that if I call into a function defined in an This might indicate a bug in rustc LLVM IR emission. But I dunno. I'll try to reduce this further. |
I have reduced the reproduction case considerably and pushed it to a standalone repository: https://github.com/indygreg/rust-crash. As part of trying to reduce it further by eliminating the use of The Rust definition:
And the canonical C definition:
The omission of that Correcting the C type definition makes the crash go away. So this appears to be a bug in PyO3's FFI definitions, not a regression in the Rust compiler or LLVM. I'm a bit embarrassed this turned out to be improper Sorry for the noise. |
The field wasn't defined previously. And the enum wasn't defined as `[repr(C)]`. This missing field could result in memory corruption if a Rust-allocated `PyStatus` was passed to a Python API, which could perform an out-of-bounds write. In my code, the out-of-bounds write corrupted a variable on the stack, leading to a segfault due to illegal memory access. However, this crash only occurred on Rust 1.54! So I initially mis-attribted it as a compiler bug / regression. It appears that a low-level Rust change in 1.54.0 changed the LLVM IR in such a way to cause LLVM optimization passes to produce sufficiently different assembly code, tickling the crash. See rust-lang/rust#87947 if you want to see the wild goose chase I went on in Rust / LLVM land to potentially pin this on a compiler bug. Lessen learned: Rust crashes are almost certainly due to use of `unsafe`.
The field wasn't defined previously. And the enum wasn't defined as `[repr(C)]`. This missing field could result in memory corruption if a Rust-allocated `PyStatus` was passed to a Python API, which could perform an out-of-bounds write. In my code, the out-of-bounds write corrupted a variable on the stack, leading to a segfault due to illegal memory access. However, this crash only occurred on Rust 1.54! So I initially mis-attribted it as a compiler bug / regression. It appears that a low-level Rust change in 1.54.0 changed the LLVM IR in such a way to cause LLVM optimization passes to produce sufficiently different assembly code, tickling the crash. See rust-lang/rust#87947 if you want to see the wild goose chase I went on in Rust / LLVM land to potentially pin this on a compiler bug. Lessen learned: Rust crashes are almost certainly due to use of `unsafe`.
The field wasn't defined previously. And the enum wasn't defined as `[repr(C)]`. This missing field could result in memory corruption if a Rust-allocated `PyStatus` was passed to a Python API, which could perform an out-of-bounds write. In my code, the out-of-bounds write corrupted a variable on the stack, leading to a segfault due to illegal memory access. However, this crash only occurred on Rust 1.54! So I initially mis-attribted it as a compiler bug / regression. It appears that a low-level Rust change in 1.54.0 changed the LLVM IR in such a way to cause LLVM optimization passes to produce sufficiently different assembly code, tickling the crash. See rust-lang/rust#87947 if you want to see the wild goose chase I went on in Rust / LLVM land to potentially pin this on a compiler bug. Lessen learned: Rust crashes are almost certainly due to use of `unsafe`.
I am able to reliably reproduce a crash on x86_64-unknown-linux-gnu when building in
--release
mode with 1.54.0. The same code and build configuration works properly on 1.53.0.I'm not a great assembly-level debugger, but I believe the compiler is confused about register/variable aliasing and this effectively leads to corruption of a variable on the stack, which leads to a segfault.
Rust 1.54.0 Analysis
We're about to return from
_PyArgv_AsWstrList
(from the CPython 3.9 internals):And here's the last few frame pointers.
We eventually crash in frame 4. So let's look at it's state:
Here are the important details:
The instructions we're about to execute are:
self
in frame 4 is0x7fffffffcff0
, which appears to be inhabitingrbx
.The value of
r15
is0x7fffffffc600
. This memory address is close to frame 4's arglist and locals (0x7fffffffc648
). The value returned by this function is propagating through the intermediate frames to frame 4, which operates on it. So the compiler likely optimized things to a variable write directly into frame 4's locals space. Cool.Before we execute that
movups %xmm0,0x10(%r15)
, here is the memory we're operating on:And after:
The change there is:
So the 16 bytes between
0x7fffffffc610-0x7fffffffc61f
got cleared out. Makes sense: that's what the instructions told it to do.However,
0x7fffffffc618
contained the saved register value (0x7fffffffcff0
) forrbx
, holding the address ofself
. This value is now zeroed.We confirm this from GDB:
Note that
rbx
is0x0
andself=0x0
.Several instructions later, we return to frame 4 and execute assembly corresponding to the Rust code
if self.exe.is_none() {
. Sinceself
is NULL, we get a segfault.Rust 1.53.0 Analysis
Compiling the same source code with Rust 1.53.0, things look similar:
This time
r15
is0x7fffffffc500
.And frame 4's arglist is at
0x7fffffffc548
. The offsets are the same here. So the assembly changes the same relative bytes.But, the starting bytes that are nulled out start out as NULL, so the
movups
is effectively a no-op:However, this version does not crash because
self
is not overwritten because its address isn't stored inrbx
. The address ofself
(0x7fffffffcff0
) is stored inr12
andr15
. (I'm unsure how the memory address forself
is calculated here.)Hypothesis
This smells like a compiler optimization bug. Rust 1.54.0 seems to be emitting assembly that aliases 2 variables/registers to the same memory address.
Steps to Reproduce
In a debugger, try setting a breakpoint at
preconfig.c:109
. This should stop just beforeself
gets overwritten.The Rust function call from the crashing frame is https://github.com/indygreg/PyOxidizer/blob/0ca3236bac944b63ea8506f273b064167e25f47b/pyembed/src/interpreter_config.rs#L592. This calls into another Rust function which calls into CPython C APIs. The crash occurs at https://github.com/indygreg/PyOxidizer/blob/0ca3236bac944b63ea8506f273b064167e25f47b/pyembed/src/interpreter_config.rs#L595 when dereferencing a NULL
self
.The text was updated successfully, but these errors were encountered: