Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wasm module files are not reproducibly buildable #865

Closed
tiziano88 opened this issue Apr 17, 2020 · 33 comments · Fixed by #1819
Closed

Wasm module files are not reproducibly buildable #865

tiziano88 opened this issue Apr 17, 2020 · 33 comments · Fixed by #1819
Assignees

Comments

@tiziano88
Copy link
Collaborator

It looks like some absolute file paths end up in the compiled wasm files:

./scripts/build_example -e aggregator
strings ./target/wasm32-unknown-unknown/release/aggregator.wasm | grep $USER

output:

Failed to parse request protobuf message/home/tzn/src/oak/sdk/rust/oak/src/grpc/mod.rs
aggregator::proto/home/tzn/src/oak/target/wasm32-unknown-unknown/release/build/aggregator-6c01ba62057834f7/out/oak.examples.aggregator.rs
buffer underflowStatusmessagecodedetails/home/tzn/.cargo/registry/src/github.conef.uk-1ecc6299db9ec823/log-0.4.8/src/lib.rs

blocks #861

@tiziano88
Copy link
Collaborator Author

The good news is that, at least on the same machine and under the same environment, the resulting modules are built reproducibly; see the following two invocations on GitHub Actions:

@tiziano88 tiziano88 added the P1 label May 12, 2020
@tiziano88
Copy link
Collaborator Author

This is now a blocker for #1066 and the aggregator example; for instance the aggregator Wasm module has a different hash on my machine than it does on cloud build, so it is not possible to use a single label to refer to it. Bumping up priority, @ipetr0v could you look into it?

cc @project-oak/core in case anyone has any suggestions

@tiziano88 tiziano88 added P0 and removed P1 labels Jun 3, 2020
@daviddrysdale
Copy link
Contributor

One nugget that may (or may not) be helpful – I was playing with Bloaty McBloatface the other day and it reports a bunch of .debug_* things for our release build Wasm files:

bloaty  --domain=file -n 50  ./target/wasm32-unknown-unknown/release/translator.wasm
    FILE SIZE   
 -------------- 
  32.0%   520Ki    .debug_str
  23.4%   380Ki    .debug_info
  17.1%   278Ki    .debug_line
  11.7%   189Ki    .debug_ranges
   8.4%   137Ki    .debug_pubnames
   4.2%  67.8Ki    Code
   1.3%  20.8Ki    name
   0.9%  14.4Ki    .debug_aranges
   0.7%  12.1Ki    Data
   0.2%  3.83Ki    .debug_abbrev
   0.0%     307    .debug_pubtypes
   0.0%     291    Function
   0.0%     184    Type
   0.0%     173    Element
   0.0%     124    Import
   0.0%      87    producers
   0.0%      66    Export
   0.0%      33    .debug_macinfo
   0.0%      27    Global
   0.0%       8    [WASM Header]
   0.0%       7    Table
   0.0%       5    Memory
 100.0%  1.59Mi    TOTAL

(WebAssembly allows custom sections to be included in the binary, and I think there are some particular custom sections for debug info which might be in use here.)

If (say) source file location and line number information is still being included in the binary output, then that might explain some of the differences. Maybe there's some WebAssembly equivalent of stripping debug info from .wasm binaries?

@tiziano88
Copy link
Collaborator Author

Indeed, I guess any log statement (at any level) ends up invoking the file! macro, which I think expands (at compile time) to the full path of the file (as an exercise, it is worth checking if just removing log statements fixes this issue).

@tiziano88
Copy link
Collaborator Author

I tried to rebuild the examples locally, and it turns out that if I build them from within our docker container, I do get bitwise identical artifacts (wasm files) to the ones on cloud build. So a minimal viable solution may be to just build them locally in Docker, although it is annoying. Realistically, I don't know if we have other options though.

@ipetr0v WDYT?

@tiziano88
Copy link
Collaborator Author

If file! only expanded to the last part of the file path, then perhaps it would be acceptable.

FWIW I'm not actually sure whether it expands to the full path or not.

For reference, I think this is the actual implementation of the built-in macro:

https://github.com/rust-lang/rust/blob/3a7dfda40a3e798bf086bd58cc7e5e09deb808b5/src/librustc_builtin_macros/source_util.rs#L52-L66

@ipetr0v
Copy link
Contributor

ipetr0v commented Jun 4, 2020

I'll look into it.

@ipetr0v
Copy link
Contributor

ipetr0v commented Jun 5, 2020

Looks like the Wasm file contains full paths (that include /opt/my-project directory from the Docker image):

$ strings ./examples/target/wasm32-unknown-unknown/release/aggregator.wasm|grep project
oak::grpc/opt/my-project/sdk/rust/oak/src/grpc/mod.rs
aggregator::proto/opt/my-project/examples/target/wasm32-unknown-unknown/release/build/aggregator-2d1afcb593626d14/out/oak.examples.aggregator.rsx
Failed to parse request protobuf message/opt/my-project/sdk/rust/oak/src/grpc/mod.rsP
oak_abi::grpc/opt/my-project/oak_abi/src/grpc/mod.rs
oak/opt/my-project/sdk/rust/oak/src/lib.rs
/opt/my-project/sdk/rust/oak/src/grpc/invocation.rs
Couldn't create a channel/opt/my-project/sdk/rust/oak/src/grpc/server.rsq
/opt/my-project/sdk/rust/oak/src/logger/mod.rs
/opt/my-project/sdk/rust/oak/src/lib.rs
failed to create channel/opt/my-project/sdk/rust/oak/src/grpc/client.rs

Which are not present, if I build it outside Docker.

@ipetr0v
Copy link
Contributor

ipetr0v commented Jun 5, 2020

Also they have completely different sizes:

1742915 Jun  5 13:13 aggregator_docker.wasm
1991830 Jun  5 13:12 aggregator_non_docker.wasm

@tiziano88
Copy link
Collaborator Author

It may just be that file paths are generally shorter in Docker, and that adds up quickly if there are a lot of such strings embedded.

We should probably disassemble the Wasm files and see whether the only difference are the strings.

If they are, I'm still not quite sure what to do about it though. Even if we avoid log! or file! macro invocations, any dependency still using them would still cause non determines across machines.

The obvious, but less than ideal, option would be to run examples from docker locally, but it would be unfortunate, especially given all the effort we put in making sure we can compile directly on the host.

Ideally we would either strip / trim those strings from the compiled wasm file (e.g. only keeping the last part of the file path), or find a rust compiler option to control how those macros get expanded.

@tiziano88
Copy link
Collaborator Author

@tiziano88
Copy link
Collaborator Author

BTW, is this also a problem for the oak_loader binary? I thought at some point I got the same hash locally inside and outside docker and also on cloud, but I might be wrong.

@tiziano88
Copy link
Collaborator Author

tiziano88 commented Jun 5, 2020

tiziano88 added a commit to tiziano88/oak that referenced this issue Jun 9, 2020
This is meant to reduce nondeterminism across machines or docker vs host, but it does not seem to fully
solve the problem yet (it does remove some paths from the resulting
binary, so I think it is necessary, but not sufficient).

Ref project-oak#865
tiziano88 added a commit to tiziano88/oak that referenced this issue Jun 9, 2020
This is meant to reduce nondeterminism across machines or docker vs host, but it does not seem to fully
solve the problem yet (it does remove some paths from the resulting
binary, so I think it is necessary, but not sufficient).

Ref project-oak#865
tiziano88 added a commit to tiziano88/oak that referenced this issue Jun 9, 2020
This is meant to reduce nondeterminism across machines or docker vs host, but it does not seem to fully
solve the problem yet (it does remove some paths from the resulting
binary, so I think it is necessary, but not sufficient).

Ref project-oak#865
@ipetr0v
Copy link
Contributor

ipetr0v commented Jun 9, 2020

Also looks like it depends on where Cargo was installed:

$ strings translator_docker.wasm
...
/usr/local/cargo/toolchains/nightly-2020-04-17-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libcore/macros/mod.rs
$ strings translator_non_docker.wasm
...
/rustc/7f3df5772439eee1c512ed2eb540beef1124d236/src/libcore/macros/mod.rs

@ipetr0v
Copy link
Contributor

ipetr0v commented Jun 9, 2020

It looks like RUSTUP_HOME and CARGO_HOME should also be remapped, but for some reason I don't have these variables outside Docker.
rust-lang/cargo#5505

@ipetr0v
Copy link
Contributor

ipetr0v commented Jun 9, 2020

Also, even using the #1118 - oak_loader still has different hashes:

$ sha256sum oak_loader_docker
50235e4b91f4d1eef9d1240843e57bb9a0834368037b28d5618dbec81c812436  oak_loader_docker
$ sha256sum oak_loader_non_docker
fb6fcc31a509fe0d3c67e5f251fa22eb7c2fd28056bfd8d61006a16523ebca9b  oak_loader_non_docker

@ipetr0v
Copy link
Contributor

ipetr0v commented Jun 9, 2020

Looking at oak_loader files through diffoscope. Looks like there are a lot of differences (offsets, data sections and even the code itself):

$ diffoscope oak_loader_docker oak_loader_non_docker
...
│ -  Start of section headers:          13969576 (bytes into file)
│ +  Start of section headers:          13969648 (bytes into file)
...
│ -  LOAD           0x56b000 0x000000000096b000 0x000000000096b000 0x25d964 0x25d964 R   0x1000
│ +  LOAD           0x56b000 0x000000000096b000 0x000000000096b000 0x25d844 0x25d844 R   0x1000
...
│ -  [ 5] .rodata           PROGBITS        000000000096b000 56b000 12b6cb 00   A  0   0 4096
│ -  [ 6] .eh_frame_hdr     PROGBITS        0000000000a966cc 6966cc 01fd34 00   A  0   0  4
│ -  [ 7] .eh_frame         X86_64_UNWIND   0000000000ab6400 6b6400 0bcda0 00   A  0   0  8
│ -  [ 8] .gcc_except_table PROGBITS        0000000000b731a0 7731a0 0557c4 00   A  0   0  4
│ +  [ 5] .rodata           PROGBITS        000000000096b000 56b000 12b5ab 00   A  0   0 4096
│ +  [ 6] .eh_frame_hdr     PROGBITS        0000000000a965ac 6965ac 01fd34 00   A  0   0  4
│ +  [ 7] .eh_frame         X86_64_UNWIND   0000000000ab62e0 6b62e0 0bcda0 00   A  0   0  8
│ +  [ 8] .gcc_except_table PROGBITS        0000000000b73080 773080 0557c4 00   A  0   0  4
...
│ -  908aa5:	48 8d 15 62 29 06 00 	lea    0x62962(%rip),%rdx        
│ +  908aa5:	48 8d 15 24 29 06 00 	lea    0x62924(%rip),%rdx
...
│ -  908b25:	48 8d 15 e2 28 06 00 	lea    0x628e2(%rip),%rdx        
│ +  908b25:	48 8d 15 a4 28 06 00 	lea    0x628a4(%rip),%rdx
...

Didn't include blobs of strings in the comment.

rustc version is the same:

$ rustc --version
rustc 1.44.0-nightly (7f3df5772 2020-04-16)

@ipetr0v
Copy link
Contributor

ipetr0v commented Jun 9, 2020

Wasm files also have quite a lot of differences:

$ diffoscope translator_docker.wasm translator_non_docker.wasm
...
│ -  (import "oak" "wait_on_channels" (func $_ZN7oak_abi16wait_on_channels17h44cbb68e62d56f63E (type 3)))
│ -  (import "oak" "channel_close" (func $_ZN7oak_abi13channel_close17h0d5c159e0895a88bE (type 5)))
│ -  (import "oak" "channel_read" (func $_ZN7oak_abi12channel_read17he8dbd198c190992cE (type 6)))
│ -  (import "oak" "channel_write" (func $_ZN7oak_abi13channel_write17hd5213fb43f5d6bedE (type 7)))
│ -  (import "oak" "channel_create" (func $_ZN7oak_abi14channel_create17h9ee4d98885b16a1fE (type 8)))
│ -  (import "oak" "node_create" (func $_ZN7oak_abi11node_create17h480de14fd8af379eE (type 9)))
│ -  (func $_ZN5alloc7raw_vec19RawVec$LT$T$C$A$GT$11allocate_in28_$u7b$$u7b$closure$u7d$$u7d$17h230a6a450136da49E.llvm.13606335029542800113 (type 2) (param i32 i32)
│ +  (import "oak" "wait_on_channels" (func $_ZN7oak_abi16wait_on_channels17h117ca4280590f60fE (type 3)))
│ +  (import "oak" "channel_close" (func $_ZN7oak_abi13channel_close17hb58a20fae360ffaeE (type 5)))
│ +  (import "oak" "channel_read" (func $_ZN7oak_abi12channel_read17h6981234c4ba5b819E (type 6)))
│ +  (import "oak" "channel_write" (func $_ZN7oak_abi13channel_write17h8a7a089b8d47ab9aE (type 7)))
│ +  (import "oak" "channel_create" (func $_ZN7oak_abi14channel_create17hcb1e4338624584c6E (type 8)))
│ +  (import "oak" "node_create" (func $_ZN7oak_abi11node_create17h747a785ace24ec5eE (type 9)))
│ +  (func $_ZN5alloc7raw_vec19RawVec$LT$T$C$A$GT$11allocate_in28_$u7b$$u7b$closure$u7d$$u7d$17h230a6a450136da49E.llvm.15154132222613204371 (type 2) (param i32 i32)
...
│ -        i32.load offset=1061092
│ +        i32.load offset=1061388
...
...
│ -        i32.const 1048712
│ +        i32.const 1048736
...
│ -            call $_ZN98_$LT$oak..error..OakError$u20$as$u20$core..convert..From$LT$oak_abi..proto..oak..OakStatus$GT$$GT$4from17haa4b790a612992aeE
│ +            call $_ZN98_$LT$oak..error..OakError$u20$as$u20$core..convert..From$LT$oak_abi..proto..oak..OakStatus$GT$$GT$4from17ha90f1534b6092945E
...
│ -    call $_ZN79_$LT$oak_abi..proto..google..rpc..Status$u20$as$u20$prost..message..Message$GT$10encode_raw17h9ae6be83c0777648E)
│ +    call $_ZN5prost8encoding13encode_varint17h4e86cc1ef1cd088fE.llvm.3247635912471923363
│ +    block  ;; label = @1
│ +      local.get 0
│ +      i32.eqz
│ +      br_if 0 (;@1;)
│ +      i64.const 10
│ +      local.get 2
│ +      call $_ZN5prost8encoding13encode_varint17h4e86cc1ef1cd088fE.llvm.15518371761998145577
│ +      local.get 0
│ +      i64.extend_i32_u
│ +      local.get 2
│ +      call $_ZN5prost8encoding13encode_varint17h4e86cc1ef1cd088fE.llvm.15518371761998145577
│ +      local.get 2
│ +      local.get 3
│ +      local.get 0
│ +      call $_ZN5alloc3vec12Vec$LT$T$GT$17extend_from_slice17h6534bec5364e4195E
│ +    end
...

@ipetr0v
Copy link
Contributor

ipetr0v commented Jun 9, 2020

So binaries have:

  • Different offsets (that are the reason for most of the differences)
  • Different strings that include CARGO_HOME paths

Wasm files have:

  • Different offsets
  • Different mangled function names
  • Different LLVM versions (may be the reason behind most of the differences)
  • Additional code
  • Different strings that include CARGO_HOME paths

Though a have the same LLVM version in both cases:

$ llvm-config --version
9.0.1

@tiziano88
Copy link
Collaborator Author

Let's focus on the Wasm diffs first, since we need those hashes to be consistent for IFC.

@ipetr0v
Copy link
Contributor

ipetr0v commented Jun 9, 2020

Apparently, when I add two similar flags

--remap-path-prefix=$CARGO_HOME=/ --remap-path-prefix=$PWD=/

Only the last one seems to have an effect on the binary.

@ipetr0v
Copy link
Contributor

ipetr0v commented Jun 10, 2020

Also, it looks like some of the panic! errors contain a non-existing (or temporary) directory /rustc/7f3df5772439eee1c512ed2eb540beef1124d236/src/libstd/macros.rs.
While Docker builds contain /usr/local/cargo/toolchains/nightly-2020-04-17-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/macros.rs in the same place in the binary.
So, the existence of such temporary directories may prohibit reproducible builds.

And the latest directory doesn't change, even if I add

--remap-path-prefix=/usr/local/cargo/toolchains/nightly-2020-04-17-x86_64-unknown-linux-gnu=rustc_sysroot/

@ipetr0v
Copy link
Contributor

ipetr0v commented Jun 17, 2020

Clearing rm -rf ~/.cargo/registry/* also doesn't help (the initial thought was that some libraries were compiled beforehand and didn't use new flags).

@ipetr0v
Copy link
Contributor

ipetr0v commented Jun 17, 2020

Similar problem: rust-lang/rust#66251

@tiziano88
Copy link
Collaborator Author

@ipetr0v what's the next step here? This is still blocking pretty much anything that relies on Wasm module labels (including the aggregator example). Should we just be building everything in Docker locally? Is that sufficient to get the same hashes that we get on GCB? That seems a step backwards, but if it's the only solution we may have to go with it (at least to compile the wasm modules).

@ipetr0v
Copy link
Contributor

ipetr0v commented Jun 19, 2020

It looks like for now it's better to stick with Docker. It's also good, because Docker image will have a fixed compiler and linker versions.

@tiziano88
Copy link
Collaborator Author

Note this means we cannot just cargo build from the host, when we are working on something. We would have to keep switching back and forth between Docker and host (or do everything in Docker).

If the linker and LLVM versions also affect the resulting Wasm, then I don't think there is much hope to reconcile them between Docker and host, but if it turns out it's just down to cargo and rustc, then perhaps there is some hope.

@daviddrysdale thoughts?

@tiziano88
Copy link
Collaborator Author

@ipetr0v I don't remember what the current status is, do we still get different hashes if running on Docker locally vs on Docker on GCB?

@ipetr0v
Copy link
Contributor

ipetr0v commented Nov 4, 2020

The rust-lang/rust#66251 is not fixed yet, so Wasm modules still have full paths in them.
We also haven't checked hashes locally vs in GCP in a while, because all Wasm modules are being downloaded from Google Storage.

@ipetr0v
Copy link
Contributor

ipetr0v commented Dec 7, 2020

So, the reason why Wasm modules were not reproducibly buildable is because we user local user as a user in Docker:

readonly DOCKER_USER="${USER:-root}"

It was necessary to avoid having root files in target directories.
And also because we we connecting existing cargo-cache that contained global paths:
"--volume=$PWD/cargo-cache:/usr/local/cargo/registry"

If I disable this behavior - I'm able to generate identical Wasm modules on different machines.

A side note: if I have 2 machines with the same $USER, generated files are almost identical. The only changes are:

--- (func $_ZN4core3ops8function6FnOnce9call_once17h5bd9657432c96844E.llvm.3465165121941308909 (type 1) (param i32 i32 i32)
+++ (func $_ZN4core3ops8function6FnOnce9call_once17h5bd9657432c96844E.llvm.6376263563138534193 (type 1) (param i32 i32 i32)
--- call $_ZN5prost8encoding13encode_varint17h16d700824a9373a0E.llvm.3465165121941308909
+++ call $_ZN5prost8encoding13encode_varint17h16d700824a9373a0E.llvm.6376263563138534193

Which is not a timestamp, because the same machine continuously generates the same Wasm modules.

@ipetr0v
Copy link
Contributor

ipetr0v commented Dec 7, 2020

So in order to have reproducible Wasm modules we might have to run Docker as root, or use a common $USER inside Docker.
In both cases we will have inaccessible files in target directories.

cc @tiziano88

@tiziano88
Copy link
Collaborator Author

Can we just rename the USER inside docker to something standard, but keep the UID of the user on the host?

I don't think the cargo cache actually gets shared between internal and external user anyways, so it should not matter that they have different names.

@daviddrysdale
Copy link
Contributor

BTW https://internals.rust-lang.org/t/crate-disambiguators-and-reproducibility/11659 might be relevant for the .llvm.<number> suffix thing.

ipetr0v added a commit that referenced this issue Mar 4, 2021
This change:
- Prevents Docker from depending on the username
- Makes Wasm modules reproducibly buildable inside Docker on different machines (by making rust library paths have the same user)
- Makes Google Cloud Builds fail, since they do not support Wasm reproducibility

Fixes #865
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants