Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clang vs wasm32-{emscripten,wasi} rustc C ABI mismatch w.r.t. "singleton" unions #121408

Open
hanna-kruppe opened this issue Feb 21, 2024 · 10 comments
Labels
A-ABI Area: Concerning the application binary interface (ABI) C-bug Category: This is a bug. I-miscompile Issue: Correct Rust code lowers to incorrect machine code I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness O-wasm Target: WASM (WebAssembly), http://webassembly.org/ P-medium Medium priority T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@hanna-kruppe
Copy link
Contributor

On wasm32-unknown-emscripten and wasm32-wasi, rustc implements the C ABI for some unions incorrectly, i.e., different from Clang. Minimized example:

#[repr(C)]
pub union U {
    a: u32,
    b: u32,
}

#[no_mangle]
pub extern "C" fn unwrap_union(u: U) -> u32 {
    unsafe { u.a }
}

#[no_mangle]
pub extern "C" fn make_union() -> U {
    U { a: 0 }
}

I expected to see this happen: the resulting wasm code should pass and return the union indirectly, i.e. by pointers, as described in the C ABI document and implemented in Clang (compiler explorer).

Instead, this happened: the union is passed and returned as a single scalar (i32). See the previous compiler explorer link, and I also see it locally for wasm32-wasi (too lazy to install a whole emscripten toolchain):

$ rustc +nightly -O cabi-union.rs --crate-type=cdylib --target wasm32-wasi
$ wasm-tools print cabi_union.wasm
(module $cabi_union.wasm
  (type (;0;) (func (param i32) (result i32)))
  (type (;1;) (func (result i32)))
  (type (;2;) (func))
  (func $unwrap_union (;0;) (type 0) (param i32) (result i32)
    local.get 0
  )
  (func $make_union (;1;) (type 1) (result i32)
    i32.const 0
  )
  (func $dummy (;2;) (type 2))
  (func $__wasm_call_dtors (;3;) (type 2)
    call $dummy
    call $dummy
  )
  (func $unwrap_union.command_export (;4;) (type 0) (param i32) (result i32)
    local.get 0
    call $unwrap_union
    call $__wasm_call_dtors
  )
  (func $make_union.command_export (;5;) (type 1) (result i32)
    call $make_union
    call $__wasm_call_dtors
  )
  (table (;0;) 1 1 funcref)
  (memory (;0;) 16)
  (global $__stack_pointer (;0;) (mut i32) i32.const 1048576)
  (export "memory" (memory 0))
  (export "unwrap_union" (func $unwrap_union.command_export))
  (export "make_union" (func $make_union.command_export))
  (@producers
    (language "Rust" "")
    (processed-by "rustc" "1.78.0-nightly (2bf78d12d 2024-02-18)")
    (processed-by "clang" "16.0.4 (https://github.com/llvm/llvm-project ae42196bc493ffe877a7e3dff8be32035dea4d07)")
  )
)

The definition of "singleton" union in the C ABI document ("recursively contains just a single scalar value") may be considered ambiguous, but clearly Clang interprets it differently from rustc, so something will have to give. I have not tried to exhaustively explore in which cases they differ, the above example may not be the only one.

Compare and contrast #71871 - as discussed there, the emscripten and wasi targets have long since been fixed to match Clang's ABI, with only wasm32-unknown-unknown lagging behind. However, it seems that the fixed C ABI on emscripten and wasi targets is still incorrect in some cases around unions.

cc @curiousdannii, who encountered this in a real project (rust-lang/cc-rs#954)

Meta

rustc +nightly --version --verbose:

rustc 1.78.0-nightly (2bf78d12d 2024-02-18)
binary: rustc
commit-hash: 2bf78d12d33ae02d10010309a0d85dd04e7cff72
commit-date: 2024-02-18
host: x86_64-unknown-linux-gnu
release: 1.78.0-nightly
LLVM version: 18.1.0

(Also happens on 1.76 stable.)

@hanna-kruppe hanna-kruppe added the C-bug Category: This is a bug. label Feb 21, 2024
@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Feb 21, 2024
@jieyouxu jieyouxu added T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. A-ABI Area: Concerning the application binary interface (ABI) labels Feb 21, 2024
@Noratrieb Noratrieb added I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness O-wasm Target: WASM (WebAssembly), http://webassembly.org/ and removed needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Feb 21, 2024
@rustbot rustbot added the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Feb 21, 2024
@apiraino
Copy link
Contributor

WG-prioritization assigning priority (Zulip discussion).

@rustbot label -I-prioritize +P-medium

@rustbot rustbot added P-medium Medium priority and removed I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels Feb 22, 2024
@curiousdannii
Copy link

I've run into this problem again, but this time it's a C callback that's returning the union. As I don't control it I can't change its signature to manually match the ABI. I think I'll have to move more processing code into my C support library.

@hanna-kruppe
Copy link
Contributor Author

hanna-kruppe commented Mar 4, 2024

A possible workaround might be to use a technically incorrect function signature on the Rust side that is lowered to the correct ABI on wasm targets by current (and future) rustc - pretending the callbacks get an out-pointer instead of returning by value / receive arguments by reference instead of by value. Pretty ugly and wasm-specific, but maybe less ugly than additional C code that is pointless on non-wasm platforms?

@curiousdannii
Copy link

curiousdannii commented Mar 4, 2024

@hanna-kruppe Hmm, that might work, but it would mean I'd need to add platform conditional code in a number of places, vs a small C shim which works everywhere.

Edit: I'm having trouble even getting a shim working. It works fine when compiled to x86_64, but I can't get it to work in wasm - it's like the DispatchRock is having its value modified.

@hanna-kruppe On the chance that you have any time to take a look, I pushed my broken code to https://github.com/curiousdannii/emglken/tree/remglk_rs_broken_unions

This builds and runs glulxe ``` ./src/build.sh ./bin/emglken.js tests/glulxercise.ulx ``` Then enter anything. You should see something like this (60664 turns into 60684 and then into 54940)
gidispatch_call_array_register 0xef0e0 &+#!Cn 60664 0xed1c
retain_array 0xef0e0 0xed1c 977032
print_disprock 60684

unretain_array 0xef0e0 977032
gidispatch_call_array_unregister 0xef0e0 &+#!Cn 54940

Glulxe fatal error: Mismatched array reference in Glk call.

Where as in x86_64 it works (280594112 is retained through it all):

gidispatch_call_array_register 0x55f910b98d60 &+#!Cn 280594112 0x7ffeb52fd1d0
retain_array 0x55f910b98d60 0x7ffeb52fd1d0 94528215811776
print_disprock 280594112
unretain_array 0x55f910b98d60 94528215811776
gidispatch_call_array_unregister 0x55f910b98d60 &+#!Cn 280594112

@curiousdannii
Copy link

curiousdannii commented Mar 5, 2024

Hmm, just found a potential solution: add an dummy union variant on the rust side that is 64 bits. I think that means it won't be considered a singleton union, but C just ignores the extra word. I might be able to remove all my manual shimming this way?

Yep, that seems to work perfectly! And when this bug eventually gets fixed, all I'll need to do is remove the dummy variant. Much cleaner. :)

curiousdannii added a commit to curiousdannii/remglk-rs that referenced this issue Mar 5, 2024
…to the DispatchRock union so that it won't be considered a singleton union
@hanna-kruppe
Copy link
Contributor Author

That's pretty risky. There may be targets where it causes Rust and C to disagree on whether the union should be passed/returned in memory. More importantly, both sides will now disagree on how large the type is on targets with 32 bit pointers. Like other ABI mismatches, that may not cause breakage immediately will fail in very spectacular ways sooner or later. For example, when Rust returns the union to C via an out-pointer, either because the source code is written like that or because that's the ABI lowering for returning by value, Rust may write a full eight bytes while the C side only reserved four bytes. Adjacent data in the stack or elsewhere will then be clobbered. In simple cases, returning the union may be compiled down to storing four bytes because it's obvious to the optimizer that only a four-byte field is initialized (that's why I had to add -Zmir-opt-level=0 to the linked example). But you shouldn't rely on that because you'll constantly be one refactoring or compiler update away a very "fun" debugging session.

@hanna-kruppe
Copy link
Contributor Author

Here's another example that exhibits the problem even when compiled with optimizations, simplified from code in the linked commit. Because the union value is taken loaded from memory, not constructed in-place, it's always returned by copying eight bytes, so I think you already have the bug I predicted in my last comment.

@curiousdannii
Copy link

curiousdannii commented Mar 5, 2024

If I also added the dummy variant to the C union that would negate the risk, right? Or do you think the compiler is smart enough to optimise that away? It wouldn't optimise it away on both sides?

I could also make my library write to the dummy variant if that would help it prevent being optimised away.

@hanna-kruppe
Copy link
Contributor Author

hanna-kruppe commented Mar 5, 2024

Both C and Rust will follow the type layout and ABI rules for the type you've written down (modulo bugs such as this one and assuming repr(C) on the Rust definition). Problems happen because the type definitions, and hence the layout / ABI, differs. Compiler optimizations are not really relevant, they only affect how and when the problems manifest. If you also change the C side to have it work with the same union type as Rust, with the extra variant, that ABI mismatch indeed disappears. I guess in your particular case that's quite feasible, since you're in control of all the relevant C code including the header file in question.

@hanna-kruppe
Copy link
Contributor Author

Some notes from looking into this today (updating my atrophied knowledge of rustc's layout/ABI internals along the way):

  • Clang does not seem to implement a strictly lexical "there is only one member in the aggregate" rule: at least fields of type struct Empty {} and empty arrays are ignored for the purpose of this rule.
  • With that in mind, I guess rustc's check for "singleton aggregates" is probably correct for structs. It also seems to be correct (or at least not broken in any way I can demonstrate) for unions ultimately involving structs with more than one non-ZST field.
  • However, the way that check is written goes awry for unions: it will consider any aggregate "singleton" if all the variants are homogeneous aggregates of some "unit" that's the same size as the union as a whole. Because of how TyAndLayout::homogeneous_aggregate is defined, this applies if all non-ZST union fields ultimately boil down to the same Reg, e.g.:
    • All pointers and pointer-sized integers map to the same Reg, that's why Dannii ran into this with union { u32, *const _ }.
    • It recursively pierces through anything newtype-ish (ignoring 1ZST fields as usual)
    • Any niche optimizations can play into this as well, e.g. Option<&T> is "the same" as usize.
    • However, integers of different sizes are always distinguished, so e.g., union { u32, u8 } is not considered singleton even though a by-value u8 argument would be promoted to u32 on wasm.
  • Enums take a similar path through the code as unions, but it's more difficult to provoke an ABI mismatch here:
    • Enums without fields are just correctly treated like scalars.
    • Enums with any non-ZST fields will generally be larger than that field or the discriminant, and thus fail the above check, modulo layout optimizations that get disabled by repr(C), repr(int), and repr(C, int).
    • Even when ZST fields in variants are involved, it seems that everything works out for repr(C) enums, at least in simple cases, because Clang also ignores some ZST union members.
    • However, with ZST fields in repr(int) enums, as well as repr(C, int), there is an ABI mismatch vs. the C equivalent. The the layout of such enums is union { Variant1, Variant2, ... } where each variant is a struct with at a field for the discriminant. Clang doesn't consider that a singleton union, even if the structs are otherwise identical for ABI purposes, but rustc will happily boil down each variant to its discriminant if there's no other fields or the other fields are 1ZSTs.

All of this makes me think there's probably a chance for a quick and dirty fix by just special casing unions somehow. At least, if nobody cares to delve further into the details of how Clang handles empty structs and arrays in all cases. I'm not itching to write a patch, though, at least not until #119183 is settled.

@workingjubilee workingjubilee added I-miscompile Issue: Correct Rust code lowers to incorrect machine code I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness and removed I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness labels Oct 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-ABI Area: Concerning the application binary interface (ABI) C-bug Category: This is a bug. I-miscompile Issue: Correct Rust code lowers to incorrect machine code I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness O-wasm Target: WASM (WebAssembly), http://webassembly.org/ P-medium Medium priority T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

7 participants