Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(launch): overwrite partial job metadata instead of building a new job version #7651

Merged
merged 1 commit into from
May 29, 2024

Conversation

bcsherma
Copy link
Contributor

@bcsherma bcsherma commented May 15, 2024

Description

This PR modifies the core and legacy job builder to never create a new job version when a partial job is used.

Partial jobs are created by the wandb job create or wandb launch -u commands. They include source information but no input or output type info. When a run created from a partial job finishes, the job builder constructs a new, non-partial job version with the input and output info saved into the wandb-job.json file.

Jobs now support having i/o types stored in artifact version metadata, which can be mutated without producing a new job version. Partial jobs now write _partial key to the metadata that can be used to identify partial jobs. If a partial job is detected the job builder will overwrite the artifact metadata with the i/o types.

  • I updated CHANGELOG.md, or it's not applicable

Testing

How was this PR tested?

Copy link

codecov bot commented May 15, 2024

Codecov Report

Attention: Patch coverage is 64.70588% with 30 lines in your changes are missing coverage. Please review.

Project coverage is 75.78%. Comparing base (dd2628d) to head (0f5ac92).

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #7651      +/-   ##
==========================================
+ Coverage   74.26%   75.78%   +1.51%     
==========================================
  Files         500      500              
  Lines       55803    54133    -1670     
==========================================
- Hits        41444    41026     -418     
+ Misses      13946    12691    -1255     
- Partials      413      416       +3     
Flag Coverage Δ
func 41.34% <66.66%> (+0.06%) ⬆️
system 63.54% <61.17%> (+0.03%) ⬆️
unit 56.20% <44.70%> (+0.91%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
wandb/sdk/launch/create_job.py 75.23% <ø> (ø)
wandb/sdk/internal/sender.py 90.32% <50.00%> (ø)
wandb/sdk/internal/internal_api.py 81.92% <25.00%> (-0.21%) ⬇️
wandb/sdk/internal/job_builder.py 91.26% <81.81%> (+3.36%) ⬆️
core/pkg/server/sender.go 71.04% <10.00%> (+8.45%) ⬆️
core/pkg/launch/job_builder.go 70.51% <69.44%> (+7.86%) ⬆️

... and 104 files with indirect coverage changes

@bcsherma bcsherma force-pushed the 05-15-fix_launch_overwrite_partial_job_metadata branch from 1cd559c to f42e9fa Compare May 15, 2024 23:50
@bcsherma bcsherma force-pushed the 05-15-chore_artifacts_add_updateartifact_mutation_to_wandb-core_code_gen branch from e391f54 to 1c377af Compare May 16, 2024 17:34
@bcsherma bcsherma force-pushed the 05-15-fix_launch_overwrite_partial_job_metadata branch from f42e9fa to 4e34d5d Compare May 16, 2024 17:35
Base automatically changed from 05-15-chore_artifacts_add_updateartifact_mutation_to_wandb-core_code_gen to main May 16, 2024 18:00
@bcsherma bcsherma force-pushed the 05-15-fix_launch_overwrite_partial_job_metadata branch 2 times, most recently from ffff3c5 to 7ec97ff Compare May 16, 2024 19:55
@bcsherma bcsherma changed the title fix(launch): overwrite partial job metadata fix(launch): overwrite partial job metadata instead of building a new job version May 16, 2024
@bcsherma bcsherma force-pushed the 05-15-fix_launch_overwrite_partial_job_metadata branch from 7ec97ff to e10eea2 Compare May 16, 2024 20:22
@bcsherma bcsherma marked this pull request as ready for review May 16, 2024 20:55
@bcsherma bcsherma requested a review from a team as a code owner May 16, 2024 20:55
Copy link
Member

@dmitryduev dmitryduev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have a (system) test for the added logic?

tests/pytest_tests/system_tests/test_launch/test_job.py Outdated Show resolved Hide resolved
@bcsherma bcsherma force-pushed the 05-15-fix_launch_overwrite_partial_job_metadata branch 5 times, most recently from 56fca2a to 0a93a41 Compare May 17, 2024 18:41
Copy link
Contributor

@KyleGoyette KyleGoyette left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The overall logic seems correct to me. However I'd prefer to refactor some of the inherenet job builder logic into job builder methods to keep sender cleaner

core/pkg/launch/job_builder.go Outdated Show resolved Hide resolved
core/pkg/server/sender.go Outdated Show resolved Hide resolved
@bcsherma bcsherma force-pushed the 05-15-fix_launch_overwrite_partial_job_metadata branch 8 times, most recently from b4b22c4 to 5dd1db7 Compare May 17, 2024 23:24
@bcsherma bcsherma force-pushed the 05-15-fix_launch_overwrite_partial_job_metadata branch 2 times, most recently from b8be30a to 2e5f554 Compare May 21, 2024 17:46
@bcsherma bcsherma requested a review from KyleGoyette May 21, 2024 18:01
@bcsherma bcsherma force-pushed the 05-15-fix_launch_overwrite_partial_job_metadata branch from 2e5f554 to 1ba01d9 Compare May 21, 2024 20:13
Copy link
Contributor

@KyleGoyette KyleGoyette left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments, but mostly LGTM.

core/pkg/server/sender.go Outdated Show resolved Hide resolved
wandb/sdk/internal/internal_api.py Outdated Show resolved Hide resolved
@bcsherma bcsherma force-pushed the 05-15-fix_launch_overwrite_partial_job_metadata branch from 1ba01d9 to 6df8fc6 Compare May 21, 2024 20:19
@bcsherma bcsherma requested a review from KyleGoyette May 21, 2024 20:24
@bcsherma bcsherma force-pushed the 05-15-fix_launch_overwrite_partial_job_metadata branch from 6df8fc6 to 30e0cbb Compare May 21, 2024 21:26
Copy link
Contributor

@KyleGoyette KyleGoyette left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM, but codecov is saying there are no tests covering many of the new methods added. Should be pretty straight forward to get tests in tests/pytest_tests/unit_tests/test_job_builder.py

@bcsherma bcsherma requested a review from dmitryduev May 22, 2024 16:42
@bcsherma bcsherma force-pushed the 05-15-fix_launch_overwrite_partial_job_metadata branch from 30e0cbb to e333d42 Compare May 22, 2024 16:43
@bcsherma bcsherma force-pushed the 05-15-fix_launch_overwrite_partial_job_metadata branch 4 times, most recently from cfb33a6 to 9cd99e1 Compare May 24, 2024 16:29
Copy link
Member

@dmitryduev dmitryduev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just a couple nits

core/pkg/launch/job_builder.go Outdated Show resolved Hide resolved
@bcsherma bcsherma force-pushed the 05-15-fix_launch_overwrite_partial_job_metadata branch from 9cd99e1 to d99e51a Compare May 29, 2024 17:42
…#7676)

Co-authored-by: Dmitry Duev <dmitryduev@users.noreply.github.com>
@bcsherma bcsherma force-pushed the 05-15-fix_launch_overwrite_partial_job_metadata branch from d99e51a to 0f5ac92 Compare May 29, 2024 17:46
@bcsherma bcsherma merged commit b082e18 into main May 29, 2024
39 of 41 checks passed
@bcsherma bcsherma deleted the 05-15-fix_launch_overwrite_partial_job_metadata branch May 29, 2024 18:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants