Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Build host architecture is written into metadata hash for compilation units, breaking reproducibility of output between host platforms #13922

Open
kanavin opened this issue May 16, 2024 · 8 comments
Labels
A-cross-compiling Area: using --target flag for other platforms A-rebuild-detection Area: rebuild detection and fingerprinting A-reproducibility Area: reproducible / deterministic builds C-bug Category: bug S-triage Status: This issue is waiting on initial triage.

Comments

@kanavin
Copy link

kanavin commented May 16, 2024

Problem

hash_rustc_version() writes rustc().verbose_version into the hash. That data has one problematic field:

host: x86_64-unknown-linux-gnu
or
host: aarch64-unknown-linux-gnu

Due to this, when one is using a mix of x86 and aarch build hosts with exactly same rust compiler and building for the same cross-target, the output becomes non-reproducible, and differs between the two; even the file names become different.

Steps

This requries two build hosts that have a different architecture, e.g. x86 and aarch, and running an identical cross-compile on them, using exact same host rust compiler. The output is going to be different (even in the filenames if those include hashes), even though it should be the same.

Possible Solution(s)

The situation occurred in the context of Yocto project builds, where we have a cluster of build machines (a mix of x86 and arm64), the rust version in use is tightly controlled, and we expect the output to be the same (and check for it). The quick and hacky solution was to patch out the problematic lines in cargo, as shown in the attached patch, but perhaps a better option would be to not write the host architecture into the hash when the build is a cross one.

From 065d7c263091118437465d714d8a29dbb6296921 Mon Sep 17 00:00:00 2001
From: Alexander Kanavin <alex@linutronix.de>
Date: Mon, 13 May 2024 14:57:54 +0200
Subject: [PATCH] cargo: do not write host information into compilation unit
 hashes

This breaks reproducibility in cross-builds where the cross-target
can be the same, but build hosts are different, as seen with
"rustc --version -v":
...
host: x86_64-unknown-linux-gnu

vs.

host: aarch64-unknown-linux-gnu

This can possibly be improved by only hashing host info if the build
is a native one (e.g. there's no --target option passed to cargo
invocation) but I'm not sure how.

Signed-off-by: Alexander Kanavin <alex@linutronix.de>
---
 .../src/cargo/core/compiler/context/compilation_files.rs      | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/tools/cargo/src/cargo/core/compiler/context/compilation_files.rs b/src/tools/cargo/src/cargo/core/compiler/context/compilation_files.rs
index d83dbf10c..b2ad8d9f3 100644
--- a/src/tools/cargo/src/cargo/core/compiler/context/compilation_files.rs
+++ b/src/tools/cargo/src/cargo/core/compiler/context/compilation_files.rs
@@ -652,7 +652,7 @@ fn hash_rustc_version(bcx: &BuildContext<'_, '_>, hasher: &mut StableHasher) {
     if vers.pre.is_empty() || bcx.config.cli_unstable().separate_nightlies {
         // For stable, keep the artifacts separate. This helps if someone is
         // testing multiple versions, to avoid recompiles.
-        bcx.rustc().verbose_version.hash(hasher);
+        //bcx.rustc().verbose_version.hash(hasher);
         return;
     }
     // On "nightly"/"beta"/"dev"/etc, keep each "channel" separate. Don't hash
@@ -665,7 +665,7 @@ fn hash_rustc_version(bcx: &BuildContext<'_, '_>, hasher: &mut StableHasher) {
     // Keep "host" since some people switch hosts to implicitly change
     // targets, (like gnu vs musl or gnu vs msvc). In the future, we may want
     // to consider hashing `unit.kind.short_name()` instead.
-    bcx.rustc().host.hash(hasher);
+    //bcx.rustc().host.hash(hasher);
     // None of the other lines are important. Currently they are:
     // binary: rustc  <-- or "rustdoc"
     // commit-hash: 38114ff16e7856f98b2b4be7ab4cd29b38bed59a
-- 
2.39.2

Notes

No response

Version

See this piece of code:

bcx.rustc().verbose_version.hash(hasher);

@kanavin kanavin added C-bug Category: bug S-triage Status: This issue is waiting on initial triage. labels May 16, 2024
@epage epage changed the title Build host architecture is written into metadata hash for compilation units, breaking reproducibility of output Build host architecture is written into metadata hash for compilation units, breaking reproducibility of output between host platforms May 16, 2024
@epage epage added A-rebuild-detection Area: rebuild detection and fingerprinting A-reproducibility Area: reproducible / deterministic builds labels May 16, 2024
@epage
Copy link
Contributor

epage commented May 16, 2024

Made a slight tweak to the title as this does not prevent any build reproducibility but more specifically reproducibility between host platforms.

As there is still reproducibility within a host platform and this has been this way for a while, this falls below other items in my priority list. If someone would want to drive moving this forward, it would be good to research why we use the verbose version in the hash.

@weihanglo weihanglo added the A-cross-compiling Area: using --target flag for other platforms label May 16, 2024
@kanavin
Copy link
Author

kanavin commented May 16, 2024

This is the original commit:
fee7c68

As far as I understand there was never any explicit decision to use the verbose version vs concise version, it was considered simply 'the version' at the time of the commit. Maybe it had gained all that additional metadata including the host later?

I still think that if --target is explicitly passed in, then the 'host:' in that output is safe to be ignored.

@ehuss
Copy link
Contributor

ehuss commented May 16, 2024

This seems like a duplicate of #8140? Is there something different here, or can it be closed?

@kanavin
Copy link
Author

kanavin commented May 16, 2024

I think the core issue is the same, although the title of #8140 makes it look like ' -C metadata=hash' is the non-reproducible bit, and discussion went on all sorts of tangents around that. The issue is specifically the host in verbose_version when cross-compiling, not the computation of the hash as a whole.

@kanavin
Copy link
Author

kanavin commented May 16, 2024

So I would not want to close this, as the issue of whether to pass ' -C metadata=hash' at all is actually different than the issue of putting the 'host:' into it.

@ehuss
Copy link
Contributor

ehuss commented May 16, 2024

Hm, I'm not quite following how it is different. We can reword the title if that helps. That issue is specifically about having host: in the hash, which affects reproducibility.

I don't think there was any proposal to change whether or not to pass -C metadata. There is a discussion about how to remove host: from that hash, and still keep everything working. That is the tricky part.

@kanavin
Copy link
Author

kanavin commented May 16, 2024

If you reword the title of #8140 so it's explicitly about host: in the hash, then I'm fine with closing this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-cross-compiling Area: using --target flag for other platforms A-rebuild-detection Area: rebuild detection and fingerprinting A-reproducibility Area: reproducible / deterministic builds C-bug Category: bug S-triage Status: This issue is waiting on initial triage.
Projects
None yet
Development

No branches or pull requests

4 participants