Contract bytecode changes vastly when independent contracts added to the compiler #14250

kuzdogan · 2023-05-19T14:42:11Z

Summary

I came across a contract that could not be verified on Sourcify because the Sourcify's compilation output bytecode is different than the author's (Hardhat). Diving deeper I've found out the difference comes to the surface when some other contracts that are unrelated to the compilation target are added to compilation.

Specifically these two standard JSON inputs yield different bytecode for the same contract CompoundLens.sol:
CompoundLens-solc-input-Sourcify-bytecode.json.txt
CompoundLens-solc-input-Hardhat-bytecode.json.txt

The only input sources differences are the following which are unrelated to CompoundLens:

[
  '@openzeppelin/contracts/interfaces/draft-IERC1822.sol',
  '@openzeppelin/contracts/proxy/ERC1967/ERC1967Proxy.sol',
  '@openzeppelin/contracts/proxy/ERC1967/ERC1967Upgrade.sol',
  '@openzeppelin/contracts/proxy/Proxy.sol',
  '@openzeppelin/contracts/proxy/beacon/IBeacon.sol',
  '@openzeppelin/contracts/proxy/transparent/ProxyAdmin.sol',
  '@openzeppelin/contracts/proxy/transparent/TransparentUpgradeableProxy.sol',
  '@openzeppelin/contracts/utils/StorageSlot.sol'
]

You'll notice the bytecodes differ even when the metadata hashes are the same. This is unexpected as the different sources listed above are not related to CompoundLens. CompoundLens already compiles without those contracts are input in CompoundLens-solc-input-Sourcify-bytecode.json but when added in CompoundLens-solc-input-Hardhat-bytecode.json the contract's bytecode changes.

To reproduce

Using Solidity version v0.8.19

Output the bytecodes

Sourcify

cat CompoundLens-solc-input-Sourcify-bytecode.json.txt | solc --standard-json | jq '.contracts."contracts/Comptroller/CompoundLens.sol".CompoundLens.evm.bytecode.object' > CompoundLens-Sourcify-creation-bytecode.txt

Hardhat

cat CompoundLens-solc-input-Hardhat-bytecode.json.txt | solc --standard-json | jq '.contracts."contracts/Comptroller/CompoundLens.sol".CompoundLens.evm.bytecode.object' > CompoundLens-Hardhat-creation-bytecode.txt

Compare the bytecodes:

git diff --word-diff --word-diff-regex=. CompoundLens-Sourcify-creation-bytecode.txt CompoundLens-Hardhat-creation-bytecode.txt

Background

The contracts are on this Github repo (verify branch): https://github.com/meterio/sumer-project/tree/verify

To compile (and deploy) the original sources:

git clone https://github.com/meterio/sumer-project/
cd sumer-project
rm -rf dist
yarn
yarn compile
npx hardhat dcl --rpc http://rpctest.meter.io/ --pk $PRIV_KEY

The contract is also deployed at: https://goerli.etherscan.io/address/0x46df081108b2e2FDf1bF1E84Eeb2D7ec3AdA0061

The bytecode diff between the Sourcify output and Hardhat output was not in the metadata hash or in a certain recognizable pattern for me:

CompoundLens-hardhat-recompiled-creation.txt
CompoundLens-recompiled-creation.txt

To generate the diff: git diff --word-diff --word-diff-regex=. CompoundLens-hardhat-recompiled-creation.txt CompoundLens-recompiled-creation.txt

What I tried to do was to start from the standard JSON input of Sourcify and try to reach the Hardhat output bytecode in iterations:
initial Sourcify JSON input: CompoundLens-solc-input.json.txt
Hardhat JSON input: CompoundLens-hardhat-solc-input.json.txt

Tried using the full compilation settings from Hardhat.
Using the same settings in Hardhat in the initial Sourcify input didn't change the bytecode.
Tried using the whole sources from the Harhat input in the initial Sourcify input

Yes by using all of the sources from Hardhat, one generates the Hardhat's bytecode output.

Next, I iteratively copied sources from the Hardhat input to the Sourcify input to see adding which sources cause the change in the bytecode.

Sourcify's initial standard JSON input:

Hardhat's standard JSON input (clipped):

On each step I copied a contract that might be a potential cause of change, resolved the dependencies by also adding them, compiled the new iterated JSON input and compared the bytecodes.

Finally, I found a minimal standard JSON diff, that adding the specific sources would change the bytecode output. These are laid out in the above Summary section.

Environment

Compiler version: v0.8.19
Target EVM version (as per compiler settings): paris
Framework/IDE (e.g. Truffle or Remix): Hardhat and solc CLI
EVM execution environment / backend / blockchain client:
Operating system: MacOS

The text was updated successfully, but these errors were encountered:

ekpyron · 2023-06-07T15:43:12Z

For the record: we did found the cause and we'll fix this with the next release, but I'm not sure what to do for helping with verifying already deployed cases of this.

kuzdogan · 2023-06-07T17:09:44Z

If there is a way to recognize this, we can try to compile with everything provided (instead of just the sources in metadata) through the tooling, or warn the user about the issue on the UI. In fact this is what we do for ethereum/sourcify#618 and makes me realize this really is a similar case (and the Solidity bug #12281). I should've mentioned that but it didn't cross my mind 🤦

In that case, however, it is not possible to get a "perfect match" including the metadata, because the metadata will be changed.

ekpyron · 2023-06-12T12:35:37Z

I'm not sure there's a really easy way to recognize this. It's very similar to the older issues we had where additional source files affected compilation results, but a different place in the compiler that we missed back then.

The main cause of the difference is that the compiler chooses a different permutation of memory offsets for moving variables from stack to memory depending on some internal IDs that can change if you add more source files...

The problem, however, is, that the smallest offset we usually use for these variables is 0x80, for which a PUSH1 suffices, but we can easily end up with offsets that require two bytes, which results in a PUSH2, which will shift all offsets of all jumps...

So overall you will see constant pushes before jumps (i.e. offsets into bytecode) to change slightly - and you'll see constant offsets preceding memory loads and stores to change slightly.

So the pattern should be PUSH1/2 <someConstant> JUMP/JUMPI, PUSH1/2 <someConstant> MSTORE , PUSH1/2 <someConstant> MLOAD, in which you may get different constant values, while flipping between a PUSH1 and a PUSH2 can happen, which shifts all bytecode offsets around a bit (causing the first pattern involving jumps)... not sure if it's feasible to detect this easily...

ekpyron · 2023-07-11T10:49:40Z

So we can consider this fixed in newer compiler versions by #14311, right? Based on that I'm closing the issue, but feel free to reopen.

kuzdogan · 2023-07-11T16:04:56Z

Yep, we'll handle this ourselves. Thanks

cameel · 2023-12-22T13:31:12Z

@marcocastignoli

does the additional files content matter in the generated bytecode or it's just the presence of the files in the compilation? (so actually the additional files can also be empty but with the same identifier)

When a file is parsed, every AST node in it gets an ID. IDs are generated sequentially across all the input files as the compiler goes through them. As long as the files are processed in the same order and have the same content, you get the same IDs. So yeah, in case of bugs like this, where the compiler fails to make code generation completely independent of those IDs, changing the content in files unrelated to a contract may result in different bytecode for that contract.

To be more specific, here's what must be avoided if you want to get identical bytecode in presence of such bugs:

Adding or removing files:
- Directly, in the initial input (i.e. the names you give explicitly on the CLI or in sources in Standard JSON).
- Indirectly, via imports (i.e. files that are not in initial input, but imported by other files).
Reordering files:
- By renaming them (affects sorting order).
- By changing their paths (affects sorting order).
- By changing remappings, resulting in the compiler seeing different names and/or paths.
- By changing whether they are included or not included in the initial input (because imported files are processed as soon as they're discovered).
- By reordering imports inside a file or between files.
Adding or removing AST nodes:
- By changing file content, which may result in a different number of AST nodes being generated.

Note that:

Changing the order of files only in the initial input does not affect IDs. The compiler iterates over those files in alphabetical order.
Symlinks should not affect the order either. The compiler only takes into account the path and name of the symlink itself, as if it was the target file, ignoring the path of what it points to.
Even when a file is empty, at least one ID is dispensed - for the node that represents the root of its tree (i.e. SourceUnit).

kuzdogan added the bug 🐛 label May 19, 2023

ekpyron added this to the 0.8.21 milestone Jun 5, 2023

ekpyron mentioned this issue Jun 7, 2023

Deterministically choose memory slots for variables during stack-to-memory. #14311

Merged

kuzdogan mentioned this issue Jun 7, 2023

Revisit the match type in extra-file-input case ethereum/sourcify#1054

Open

This was referenced Jun 16, 2023

Different bytecode Hardhat vs Sourcify ethereum/sourcify#1065

Closed

Handle the Hardhat vs Sourcify different bytecode case ethereum/sourcify#1066

Closed

ekpyron closed this as completed Jul 11, 2023

kuzdogan mentioned this issue Aug 15, 2023

(v0.8.21) Contract code changes with additional files in compilation #14494

Closed

kuzdogan mentioned this issue Aug 12, 2024

[Milestone] Finish APIv2 design ethereum/sourcify#1470

Closed

kuzdogan mentioned this issue Dec 31, 2024

(v0.8.26) Contract code changes with additional files in compilation #15686

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Contract bytecode changes vastly when independent contracts added to the compiler #14250

Contract bytecode changes vastly when independent contracts added to the compiler #14250

kuzdogan commented May 19, 2023

ekpyron commented Jun 7, 2023 •

edited

Loading

kuzdogan commented Jun 7, 2023

ekpyron commented Jun 12, 2023 •

edited

Loading

ekpyron commented Jul 11, 2023

kuzdogan commented Jul 11, 2023

cameel commented Dec 22, 2023 •

edited

Loading

Contract bytecode changes vastly when independent contracts added to the compiler #14250

Contract bytecode changes vastly when independent contracts added to the compiler #14250

Comments

kuzdogan commented May 19, 2023

Summary

To reproduce

Background

Environment

ekpyron commented Jun 7, 2023 • edited Loading

kuzdogan commented Jun 7, 2023

ekpyron commented Jun 12, 2023 • edited Loading

ekpyron commented Jul 11, 2023

kuzdogan commented Jul 11, 2023

cameel commented Dec 22, 2023 • edited Loading

ekpyron commented Jun 7, 2023 •

edited

Loading

ekpyron commented Jun 12, 2023 •

edited

Loading

cameel commented Dec 22, 2023 •

edited

Loading