
Adding global input hashes in lage server worker #861

Merged: 6 commits, merged Mar 8, 2025


@kenotron (Member) commented Mar 8, 2025

This PR reduces the cost of BuildXL's hashing. BuildXL knows nothing about the env glob or how it is reused across pips, because the env glob is a lage (JS task runner) concept. As a result, BuildXL naively hashes every matched file for every pip. We can recover this performance by taking control of hashing those entries ourselves and passing the result to BuildXL as a single input file. Because each target can have its own env glob pattern, we output one global_inputs_hash file per target, but the hash calculations are cached internally to keep this fast.
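A minimal sketch of per-target hashing with an internal cache, assuming the env glob has already been expanded to file paths by some glob library. The `expand` and `readFile` callbacks and both function names are hypothetical illustrations, not lage's API, and this sketch uses `node:crypto` directly rather than lage's hasher package. Targets that share the same glob pattern list hit the cache instead of re-reading files, and each target's hash file contains only the digest.

```typescript
import { createHash } from "node:crypto";
import { readFileSync, writeFileSync, mkdirSync } from "node:fs";
import { dirname } from "node:path";

// Hypothetical callbacks: `ExpandGlobs` turns env glob patterns into file
// paths (e.g. via a glob library); `ReadFile` returns a file's contents.
type ExpandGlobs = (patterns: string[]) => string[];
type ReadFile = (file: string) => string | Buffer;

// Cache keyed by the exact pattern list, so targets sharing an env glob
// reuse one computation instead of hashing every file again.
const cache = new Map<string, string>();

export function hashGlobalInputs(
  patterns: string[],
  expand: ExpandGlobs,
  readFile: ReadFile = (f) => readFileSync(f),
): string {
  const key = JSON.stringify(patterns);
  const hit = cache.get(key);
  if (hit !== undefined) return hit;

  const hash = createHash("sha256");
  // Sort the files so the digest does not depend on glob traversal order.
  for (const file of [...expand(patterns)].sort()) {
    hash.update(file).update("\0").update(readFile(file)).update("\0");
  }
  const digest = hash.digest("hex");
  cache.set(key, digest);
  return digest;
}

// Writes one small hash file per target; a build engine then only needs to
// hash this single file instead of every file matched by the env glob.
export function writeGlobalInputHashFile(outFile: string, digest: string): void {
  mkdirSync(dirname(outFile), { recursive: true });
  writeFileSync(outFile, digest);
}
```

Keying the cache on the raw pattern list (rather than a sorted copy) keeps order-sensitive globs, such as negations, from being conflated.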

We use the "info" command to generate these files because BuildXL is guaranteed to run it every time. On a BuildXL cache hit, the pip never calls lage exec at all, so we cannot prepopulate anything from the exec calls; "info" is therefore the best place for this prep step. We only do this in the --server case, since normal execs don't need these files (they exist only to optimize BuildXL).
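The gating described above can be sketched as a pure planning step, assuming simplified option and target shapes. `InfoOptions`, `TargetInfo`, `planGlobalInputHashes`, and the `hashFilePathFor` callback are hypothetical names; lage's real types and the `shouldRunWorkersAsService` helper mentioned below may look different.

```typescript
// Hypothetical shapes; lage's actual option and target types differ.
interface InfoOptions {
  server?: string | boolean;
}

interface TargetInfo {
  id: string;
  envGlob: string[];
}

interface GlobalInputPlan {
  targetId: string;
  envGlob: string[];
  hashFile: string;
}

// Returns the hash files "info" should prepare. Outside --server mode the
// list is empty, since only BuildXL pips consume these files.
export function planGlobalInputHashes(
  options: InfoOptions,
  targets: TargetInfo[],
  hashFilePathFor: (targetId: string) => string,
): GlobalInputPlan[] {
  const serverMode = options.server !== undefined && options.server !== false;
  if (!serverMode) return [];
  return targets.map((t) => ({
    targetId: t.id,
    envGlob: t.envGlob,
    hashFile: hashFilePathFor(t.id),
  }));
}
```

Keeping the plan separate from the file writes makes the server-only rule easy to check without touching the disk.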

This pull request introduces several optimizations and improvements to the lage project, focusing on making BuildXL runs more efficient and improving how global input hashes are handled. The most important changes add functionality for global input hashes, refactor existing methods, and update the protobuf definitions with new fields.

Enhancements to BuildXL Optimizations:

  • Added logic to generate and use global input hash files for targets in the infoAction function. This optimization helps speed up BuildXL runs by avoiding repeated file reads. (packages/cli/src/commands/info/action.ts, [[1]](https://github.com/microsoft/lage/pull/861/files#diff-5d45065c5d72c3b1acb9f00b687053c6c6f987fa9f3f390d01e1c7a243d3df9fR155-R206), [[2]](https://github.com/microsoft/lage/pull/861/files#diff-5d45065c5d72c3b1acb9f00b687053c6c6f987fa9f3f390d01e1c7a243d3df9fR248-R251), [[3]](https://github.com/microsoft/lage/pull/861/files#diff-5d45065c5d72c3b1acb9f00b687053c6c6f987fa9f3f390d01e1c7a243d3df9fL200-L202), [[4]](https://github.com/microsoft/lage/pull/861/files#diff-5d45065c5d72c3b1acb9f00b687053c6c6f987fa9f3f390d01e1c7a243d3df9fL218-R276))
  • Introduced the getGlobalInputHashFilePath function to determine the path for global input hash files. (packages/cli/src/commands/targetHashFilePath.ts, [packages/cli/src/commands/targetHashFilePath.tsR1-R9](https://github.com/microsoft/lage/pull/861/files#diff-ba547c11c29e2e99e83302c76c27d57b07cbbb5722b6eedd4c3caef01b3aaf51R1-R9))
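For illustration only, a per-target path helper might look like the following. The function name `hashFilePathForTarget`, its parameters, and the `node_modules/.lage/global_inputs_hash` layout are all assumptions; the real `getGlobalInputHashFilePath` signature and location are not described here.

```typescript
import { join } from "node:path";

// Hypothetical layout: one hash file per target under a cache folder in the
// package. Target ids like "@scope/pkg#build" contain "/" and "#", so the id
// is percent-encoded to get a single, collision-free file name.
export function hashFilePathForTarget(packageRoot: string, targetId: string): string {
  const fileName = encodeURIComponent(targetId);
  return join(packageRoot, "node_modules", ".lage", "global_inputs_hash", fileName);
}
```

Percent-encoding is used instead of replacing unsafe characters with `_`, because replacement could map two different target ids to the same file.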

Refactoring and Code Improvements:

  • Refactored the generateCommand function to use the new shouldRunWorkersAsService helper function, improving code readability and maintainability. (packages/cli/src/commands/info/action.ts, [[1]](https://github.com/microsoft/lage/pull/861/files#diff-5d45065c5d72c3b1acb9f00b687053c6c6f987fa9f3f390d01e1c7a243d3df9fL200-L202), [[2]](https://github.com/microsoft/lage/pull/861/files#diff-5d45065c5d72c3b1acb9f00b687053c6c6f987fa9f3f390d01e1c7a243d3df9fL218-R276))
  • Added the FileHasher and hashStrings exports to the hasher package to support hash generation for global inputs. (packages/hasher/src/index.ts, [packages/hasher/src/index.tsR4-R5](https://github.com/microsoft/lage/pull/861/files#diff-1ca6c8a7c411a9c12ff5236babab9eb2e3d60c4f4772b8562f5beae74e9ce6d5R4-R5))

Protobuf and RPC Updates:

  • Updated the RunTargetResponse message in the protobuf definition to include the cwd and global_input_hash_file fields, ensuring that the necessary data is transmitted during remote procedure calls. (packages/rpc/proto/lage/v1/lage.proto, [packages/rpc/proto/lage/v1/lage.protoL16-R23](https://github.com/microsoft/lage/pull/861/files#diff-1cf9ffd1076b73fb018e4d7df07389a990b983465f604d6d956b1a7f18b859c2L16-R23))
  • Adjusted the generated TypeScript classes to reflect the changes in the protobuf definitions, ensuring proper handling of the new fields. (packages/rpc/src/gen/lage/v1/lage_pb.ts, [[1]](https://github.com/microsoft/lage/pull/861/files#diff-1caefe9c23e7418845c96bfe307a41be1429c6dbfe8431be5a50bc27bf01ff8eL88-R125), [[2]](https://github.com/microsoft/lage/pull/861/files#diff-1caefe9c23e7418845c96bfe307a41be1429c6dbfe8431be5a50bc27bf01ff8eL132-R144))

Additional Changes:

  • Imported the path module in executeRemotely.ts to handle file paths for global input hashes. (packages/cli/src/commands/exec/executeRemotely.ts, [packages/cli/src/commands/exec/executeRemotely.tsR1](https://github.com/microsoft/lage/pull/861/files#diff-930be17d1ef5b16a8986a15af1a19fe24c197583c29c0ddfec89fef77db68d6bR1))
  • Updated the createLageService function to handle global input hash files and ensure the correct paths are used during task execution. (packages/cli/src/commands/server/lageService.ts, [[1]](https://github.com/microsoft/lage/pull/861/files#diff-637e91c79aff43e711921eecd75f940dcc04834d68306ff53e84933591e001c2L4-R4), [[2]](https://github.com/microsoft/lage/pull/861/files#diff-637e91c79aff43e711921eecd75f940dcc04834d68306ff53e84933591e001c2R19), [[3]](https://github.com/microsoft/lage/pull/861/files#diff-637e91c79aff43e711921eecd75f940dcc04834d68306ff53e84933591e001c2L165-L168), [[4]](https://github.com/microsoft/lage/pull/861/files#diff-637e91c79aff43e711921eecd75f940dcc04834d68306ff53e84933591e001c2L236-R243), [[5]](https://github.com/microsoft/lage/pull/861/files#diff-637e91c79aff43e711921eecd75f940dcc04834d68306ff53e84933591e001c2R257), [[6]](https://github.com/microsoft/lage/pull/861/files#diff-637e91c79aff43e711921eecd75f940dcc04834d68306ff53e84933591e001c2R271-R272), [[7]](https://github.com/microsoft/lage/pull/861/files#diff-637e91c79aff43e711921eecd75f940dcc04834d68306ff53e84933591e001c2R319-R326), [[8]](https://github.com/microsoft/lage/pull/861/files#diff-637e91c79aff43e711921eecd75f940dcc04834d68306ff53e84933591e001c2R339-R346), [[9]](https://github.com/microsoft/lage/pull/861/files#diff-637e91c79aff43e711921eecd75f940dcc04834d68306ff53e84933591e001c2R355-R360))
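A minimal sketch of how a client might use the two new response fields, assuming the hash file path may be relative to the target's `cwd`. `RunTargetResponseLike` is a hypothetical stand-in for the generated class (protobuf code generators for TypeScript typically expose `global_input_hash_file` as `globalInputHashFile`, but check `lage_pb.ts`), and `resolveGlobalInputHashFile` is an illustrative name.

```typescript
import * as path from "node:path";

// Hypothetical mirror of the new RunTargetResponse fields; the generated
// class in lage_pb.ts may expose them differently.
interface RunTargetResponseLike {
  cwd: string;
  globalInputHashFile: string;
}

// Resolves the hash file against the target's cwd so it can be reported as
// an absolute input path. An empty field (the proto3 default) means no hash
// file was produced for this target.
export function resolveGlobalInputHashFile(res: RunTargetResponseLike): string | undefined {
  if (!res.globalInputHashFile) return undefined;
  return path.resolve(res.cwd, res.globalInputHashFile);
}
```

`path.resolve` returns an already-absolute `globalInputHashFile` unchanged (apart from normalization), so the helper works whether the server sends relative or absolute paths.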

@kenotron enabled auto-merge (squash) March 8, 2025 02:24
@kenotron merged commit 10cda62 into master Mar 8, 2025
10 checks passed
@kenotron deleted the info-global-inputs branch March 8, 2025 02:30