Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deduplicate libraries on hash instead of filename. #32317

Merged
merged 1 commit into from
Apr 15, 2016
Merged

Deduplicate libraries on hash instead of filename. #32317

merged 1 commit into from
Apr 15, 2016

Conversation

taralx
Copy link
Contributor

@taralx taralx commented Mar 17, 2016

Removes the need for canonicalization to prevent #12459.

(Now with passing tests!)

Canonicalization breaks certain environments where the libraries are symlinks to files that don't end in .rlib (e.g. /remote/cas/$HASH).

@nagisa
Copy link
Member

nagisa commented Mar 17, 2016

r? @alexcrichton

hfive ain’t workin

@alexcrichton
Copy link
Member

Thanks for the PR @taralx! Some of the test changes here are a little worrisome to me, does this break existing tests? If not, could this just add a new standalone test (or append to an existing one?)

I'd also want to think a little bit on this as well, I feel like we've intentionally not done this in the past but I can't really see any reason not to do this when I think about it...

@taralx
Copy link
Contributor Author

taralx commented Mar 21, 2016

Yes, this breaks existing tests that assume that two libraries with different paths but the same name and identical content will not be deduplicated.

@@ -21,7 +21,7 @@ all: $(TMPDIR)/libnative.a
mkdir -p $(TMPDIR)/e1
mkdir -p $(TMPDIR)/e2
$(RUSTC) e.rs -o $(TMPDIR)/e1/libe.rlib
$(RUSTC) e.rs -o $(TMPDIR)/e2/libe.rlib
$(RUSTC) e.rs -o $(TMPDIR)/e2/libe-alt.rlib
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed because two libraries with the same name and content but different paths will not conflict anymore.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm I'm a little confused here, don't both these libraries have the same SVH? I don't follow why changing the filename causes them to suddenly be "different"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same SVH, different "hash" i.e. the part of the filename between lib and .rlib.

@alexcrichton
Copy link
Member

@taralx could this be changed to not modify existing tests where possible? If we want to add tests for new functionality then new tests should probably be added rather than tweaking old ones

@taralx
Copy link
Contributor Author

taralx commented Mar 24, 2016

@alexcrichton I've made the absolutely minimal changes to existing tests, as noted above in line notes. The problem is that various tests were making assumptions about our behavior with multiple identical crates, which is not a good way to test things. Where possible, I've changed it so that the crates are either (a) not identical or (b) differently named.

@taralx
Copy link
Contributor Author

taralx commented Mar 24, 2016

The other alternative, if you prefer it, is to rip out the fragile tests and make new ones that subsume them and more clearly cover the space.

@alexcrichton
Copy link
Member

I'd rather keep the tests continue testing the same properties if possible, but can you also add some tests specifically for this PR itself?

@taralx
Copy link
Contributor Author

taralx commented Mar 25, 2016

So I'll grant that the current situation is kind of confusing, because we have two levels of duplicate checking:

  1. For libraries found by search, we first group them by basename, then extract one from each group (which complains if we see multiple hashes). If more than one group returns a library, we complain.
  2. For externs (and other direct searches?), we simply group all of the libraries together and extract one from the single group. So we'll only complain if we hash multiple hashes, regardless of basename.

What is the desired ruleset?

@alexcrichton
Copy link
Member

Hm there may be a disconnect about "hash" here that we've been talking about, and also perhaps some discrepancies in the implementation. The "hash" in the filename, e.g. libfoo-xxxx.rlib where the hash is "xxxx", isn't actually a hash at all. It's just some disambiguating information so multiple libraries of the same crate name can sit on the filesystem. Duplicate hashes there don't actually mean anything as they don't indicate anything about the contents.

Inside the libraries, however, there's a hash called the SVH (strict version hash) which is indeed a hash of the contents of the libraries. We should be deduplicating based on the SVH, not the filename. Does that make sense?

@taralx
Copy link
Contributor Author

taralx commented Mar 30, 2016

Okay, I think I know what's going on. I think that once upon a time, we used to filter on filename first, then only read metadata if we needed to disambiguate further. Now, we read metadata and ignore filename entirely.

I'm going to push this all the way to metadata only with SVH-based dedup, ignoring filenames completely (except for .rlib suffix).

@taralx
Copy link
Contributor Author

taralx commented Apr 1, 2016

So it turns out grouping by name avoids matching an rlib with a different dylib (e.g. liba.rlib with libarena.dylib when searching for "a").

Still trying to find a solution.

@taralx
Copy link
Contributor Author

taralx commented Apr 2, 2016

@alexcrichton Ok! Have another look, hopefully the test diffs are more understandable.

@taralx
Copy link
Contributor Author

taralx commented Apr 4, 2016

I have one more little change to remove the last calls to fs::canonicalize coming as soon as I check tests.

@@ -13,11 +13,11 @@ all:
mv $(call DYLIB,foo) $(TMPDIR)/other
$(RUSTC) foo.rs --crate-type=dylib
$(RUSTC) bar.rs -L $(TMPDIR)/other 2>&1 | \
grep "multiple dylib candidates"
grep "assertion failed" && exit 1 || exit 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these tests should pass, right? With your changes the SVH is the same so they should get deduplicated so the compilation should succeed?

I'm a little wary to have a test that the compiler doesn't print "assertion failed" as that's a very brittle test to easily break without actually detecting a regression, but if these compiles are successful maybe that part can be tested?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, the dylib one doesn't pass -- it prints other strange errors. The issue in question was "this causes the compiler to issue assertion failures". We could extend it to "these should work", but then we'd need to fix the other issue with dylibs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the "strange errors" that get printed here?

"The compiler doesn't assert" is such a broad test case it's essentially not testing anything unfortunately.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok to take care of those you can just compile all dylibs here with -C prefer-dynamic

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. :)

@alexcrichton
Copy link
Member

Looks good to me, thanks! Can you also squash the commits?

@taralx
Copy link
Contributor Author

taralx commented Apr 5, 2016

h5 has previously said not to do that preemptively.

Squash and rebase, or just squash?

@alexcrichton
Copy link
Member

Ah yeah that was mainly intended for just review, once it's ready to go we prefer to squash + rebase before landing.

@taralx
Copy link
Contributor Author

taralx commented Apr 5, 2016

Done. Assuming tests pass, this one's good to go.

@alexcrichton
Copy link
Member

@bors: r+ 393b86a8071980f3f89205a44653a35bd7a269b1

Thanks!

@bors
Copy link
Contributor

bors commented Apr 5, 2016

⌛ Testing commit 393b86a with merge e6286af...

@bors
Copy link
Contributor

bors commented Apr 5, 2016

💔 Test failed - auto-win-msvc-64-opt-rustbuild

@taralx
Copy link
Contributor Author

taralx commented Apr 5, 2016

That's a weird failure. Does this happen often?

@alexcrichton
Copy link
Member

Looks like an ICE:

thread 'rustc' panicked at 'we should have found local duplicate earlier', C:\bot\slave\auto-win-msvc-64-opt-rustbuild\build\src\libcore\option.rs:700

definitely seems related to this patch though?

@taralx
Copy link
Contributor Author

taralx commented Apr 5, 2016

Maybe I need to --enable-debug

@alexcrichton
Copy link
Member

Ah yeah without that I think assertions may be disabled by default?

@taralx
Copy link
Contributor Author

taralx commented Apr 5, 2016

No repro with --enable-debug.

@alexcrichton
Copy link
Member

I think the incantation may be --enable-debug-assertions?

@taralx
Copy link
Contributor Author

taralx commented Apr 6, 2016

--enable-debug --enable-debug-assertions no repro :(

@bors
Copy link
Contributor

bors commented Apr 6, 2016

☔ The latest upstream changes (presumably #32767) made this pull request unmergeable. Please resolve the merge conflicts.

@alexcrichton
Copy link
Member

Hm ok, retry with a rebase?

@taralx
Copy link
Contributor Author

taralx commented Apr 6, 2016

Okay, let's try it.

@alexcrichton
Copy link
Member

@bors: r+ 0a1067e031caa96a200cd10c2bcd8da901fbba18

@bors
Copy link
Contributor

bors commented Apr 7, 2016

⌛ Testing commit 0a1067e with merge 4316553...

@bors
Copy link
Contributor

bors commented Apr 7, 2016

⛄ The build was interrupted to prioritize another pull request.

@bors
Copy link
Contributor

bors commented Apr 7, 2016

⌛ Testing commit 0a1067e with merge d29905a...

@bors
Copy link
Contributor

bors commented Apr 7, 2016

💔 Test failed - auto-win-gnu-32-opt-rustbuild

@taralx
Copy link
Contributor Author

taralx commented Apr 7, 2016

Seems strangely restricted to windows platforms, which build differently in rustbuild. Any idea on how I can repro this for debugging, or what might be causing it?

@taralx
Copy link
Contributor Author

taralx commented Apr 7, 2016

Is it doing a parallel compile, or did the error show up at a different place this time?

@alexcrichton
Copy link
Member

Crates are indeed compiling in parallel, and looks like this is legit?

@taralx
Copy link
Contributor Author

taralx commented Apr 7, 2016

Yeah, definitely legit, but I can't imagine what's causing it. It may be necessary to add more diagnostics to the build.

@taralx
Copy link
Contributor Author

taralx commented Apr 7, 2016

Is there a way to run the windows build manually somehow? I don't have a box with the necessary compiler on it myself. Maybe a cloud instance?

@alexcrichton
Copy link
Member

Nah unfortunately we don't have a great infrastructure for doing so :(

@bors
Copy link
Contributor

bors commented Apr 13, 2016

☔ The latest upstream changes (presumably #32828) made this pull request unmergeable. Please resolve the merge conflicts.

@taralx
Copy link
Contributor Author

taralx commented Apr 14, 2016

So it turns out that creader was also calling fs::canonicalize during existing_match under the assumption that all paths are canonical. Instead of removing it, I just backed out that change, leaving everything using canonical paths.

This bit us on windows because of the \ vs / thing.

@alexcrichton
Copy link
Member

@bors: r+ 2218245

@bors
Copy link
Contributor

bors commented Apr 15, 2016

⌛ Testing commit 2218245 with merge 76c1a0d...

bors added a commit that referenced this pull request Apr 15, 2016
Deduplicate libraries on hash instead of filename.

Removes the need for canonicalization to prevent #12459.

(Now with passing tests!)

Canonicalization breaks certain environments where the libraries are symlinks to files that don't end in .rlib (e.g. /remote/cas/$HASH).
@bors bors merged commit 2218245 into rust-lang:master Apr 15, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants