Improve squash merge detection #1238
base: develop
Wow 😯 I'm gonna check how it performs on my large commercial repo (~800k commits)
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #1238      +/-   ##
===========================================
- Coverage    98.48%   98.32%   -0.17%
===========================================
  Files           14       14
  Lines         3706     3769      +63
  Branches       814      828      +14
===========================================
+ Hits          3650     3706      +56
- Misses          37       41       +4
- Partials        19       22       +3
My main worry is this loop. For very long-lived branches that are way behind their fork point it could be a problem. I was thinking that maybe I could set a limit on the number of commits (e.g., if we don't see the squash merge in the first 100 commits, then assume it's not merged).
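The commit limit floated here could look roughly like the sketch below. It assumes the patch ids of the upstream commits arrive lazily, newest first; `contains_patch_id` and `max_commits` are hypothetical names, and 100 is only the figure mentioned in the comment, not a value taken from the PR.

```python
from itertools import islice
from typing import Iterable, Optional


def contains_patch_id(
    target_patch_id: str,
    upstream_patch_ids: Iterable[str],
    max_commits: Optional[int] = 100,
) -> bool:
    """Return True if target_patch_id is among the first max_commits entries.

    upstream_patch_ids may be a lazy generator: islice stops consuming it
    once the limit is reached, so older commits are never inspected.
    Passing max_commits=None disables the limit.
    """
    window = islice(upstream_patch_ids, max_commits)
    return any(patch_id == target_patch_id for patch_id in window)
```

The trade-off is a possible false negative: a squash merge older than the window would be reported as not merged.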
Whoops invoking
Actually... stepping back a bit, how often did you encounter the case where a squash merge was not detected (but is now detected after your PR)? Could you provide a quick example of what that situation looked like? It's been a long time since I've touched that logic TBH 🤔
Quite often. In the main repo I use there are always commits between the fork point and the squash merge. I validated with the tests I added that this was indeed the case if you remove the new logic (e.g., forcing an early exit here changing
Up to this point I was manually sliding out merged branches because I thought it was a problem with how the merges were done in the repo, but then I noticed it only happened in busy repos, and that's how I found out the problem was those intermediate commits.
We might be able to do some shell magic to at least make git compute the patch id without having the diff for each commit in the Python code; let me check.
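One way the "shell magic" might work is to pipe `git log --patch` straight into `git patch-id`, which prints one `<patch-id> <commit-id>` line per commit, so the diffs never pass through Python. This is a sketch under that assumption, not the code from the PR; `parse_patch_id_output` and `get_patch_ids_between` are hypothetical names.

```python
import subprocess
from typing import Dict


def parse_patch_id_output(stdout: str) -> Dict[str, str]:
    """Map commit hash -> patch id from `git patch-id` output.

    Each non-empty output line has the form "<patch-id> <commit-id>".
    """
    patch_id_by_commit: Dict[str, str] = {}
    for line in stdout.splitlines():
        fields = line.split()
        if len(fields) == 2:
            patch_id, commit_hash = fields
            patch_id_by_commit[commit_hash] = patch_id
    return patch_id_by_commit


def get_patch_ids_between(earliest_exclusive: str, latest_inclusive: str) -> Dict[str, str]:
    """Roughly `git log --patch ^A B -- | git patch-id`, with git doing the diffing."""
    log = subprocess.Popen(
        ["git", "log", "--patch", "--no-merges", f"^{earliest_exclusive}", latest_inclusive, "--"],
        stdout=subprocess.PIPE,
    )
    try:
        # The second git process reads the first one's stdout directly.
        patch_id = subprocess.run(
            ["git", "patch-id"], stdin=log.stdout, capture_output=True, text=True, check=True
        )
    finally:
        if log.stdout is not None:
            log.stdout.close()
        log.wait()
    return parse_patch_id_output(patch_id.stdout)
```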
Huh okay... would you mind putting this logic behind a git config key? 🤔 flag to
Sounds good, I'll take a look, thx!
Added the option, let me know if it looks good. I wasn't sure if some of the docs are automatic or not (e.g. the traverse/status help pages) so I didn't update them. Also, not sure if the deprecation is how you expected. I'll take a look later regarding the use of
Now it's using
4c26a00 to 08b0a8c
I'm gonna introduce minor fixes directly onto your branch. Sorry for the lags, I'm busy as heck in my commercial job :/
Also, the comments on the PR are for myself from now on; you've already done more than I could ever ask 😅
return self.__is_equivalent_tree_reachable_cached[equivalent_to_commit_hash, reachable_from_commit_hash]
prev_mode, prev_result = self.__is_equivalent_tree_reachable_cached[equivalent_to_commit_hash, reachable_from_commit_hash]

# Only return cached result if we're using the same mode or if we already checked with the
I don't think this extra caching check makes sense. IIUC within the given git-machete invocation (i.e. within the given Python process lifetime), there's only ever a single set of CLI options, and hence a single squash detection mode. I'll probably revert that.
And within tests, a new `GitContext` is created in each test method anyway.
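With a single detection mode per process, the cache can be keyed on the commit-hash pair alone, as it was before. A minimal sketch of that shape follows; `ReachabilityCache` is a hypothetical stand-in, not the `GitContext` implementation.

```python
from typing import Callable, Dict, Tuple


class ReachabilityCache:
    """Memoizes a yes/no answer per (equivalent_to, reachable_from) hash pair."""

    def __init__(self, compute: Callable[[str, str], bool]) -> None:
        self._compute = compute
        self._cached: Dict[Tuple[str, str], bool] = {}

    def get(self, equivalent_to_hash: str, reachable_from_hash: str) -> bool:
        key = (equivalent_to_hash, reachable_from_hash)
        if key not in self._cached:
            # Only the first lookup for a given pair pays for the computation.
            self._cached[key] = self._compute(*key)
        return self._cached[key]
```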
out = utils.get_non_empty_lines(self._popen_git("patch-id", input=patch_contents).stdout)

if len(out) == 0:
    return None
Line uncovered by tests. Probably okay to leave it so; there doesn't seem to be any reasonable path that would cover this case. Also, it doesn't feel right to just `# pragma: no cover` it away.
@@ -786,6 +787,7 @@ def is_equivalent_tree_reachable(
    self,
    equivalent_to: AnyRevision,
    reachable_from: AnyRevision,
    opt_squash_merge_detection: SquashMergeDetection
This method should no longer be called `is_equivalent_tree_reachable`, since in `EXACT` mode we're checking for some other condition (the method result might be true even though there's no equivalent tree reachable).
Or even better, I'd keep `GitContext` oblivious to `SquashMergeDetection` mode. This is a machete-specific concept, after all, and we strive to keep `GitContext` as machete-agnostic as possible.
So, I'll extract a method like `is_equivalent_patch_reachable` (the exact name TBD), and call it somewhere from `MacheteClient` so that the overall logic is equivalent to the current state.
* ``none``: No squash merge/rebase detection.
* ``simple``: Compares the tree state of the merge commit with the tree state of the upstream branch. This detects squash merges/rebases as long as there was not any commit on the upstream branch since the last common commit.
* ``exact``: Compares the patch that would be applied by the merge commit with the commits that occurred on the upstream branch since the last common commit. This detects squash merges/rebases even if there were commits on the upstream branch since the last common commit. However, it might have a performance impact as it requires listing all the commits in the upstream.

**Environment variables:**
Docs for `status` and `traverse` need to be updated as well
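The three modes documented above can be sketched as a dispatch over precomputed data. The enum values come from the docs, but the member names other than `EXACT`, the function `is_squash_merged` and its parameters are assumptions. Whether `exact` also falls back on the tree comparison is not stated here, so the sketch treats the two checks independently.

```python
from enum import Enum
from typing import AbstractSet


class SquashMergeDetection(Enum):
    NONE = "none"
    SIMPLE = "simple"
    EXACT = "exact"


def is_squash_merged(
    mode: SquashMergeDetection,
    branch_tree: str,
    upstream_trees: AbstractSet[str],
    branch_patch_id: str,
    upstream_patch_ids: AbstractSet[str],
) -> bool:
    """Decide whether a branch looks squash-merged under the given mode."""
    if mode is SquashMergeDetection.NONE:
        return False
    if mode is SquashMergeDetection.SIMPLE:
        # Tree comparison: only matches if upstream holds a commit with an identical tree.
        return branch_tree in upstream_trees
    # EXACT: the branch's combined patch matches the patch of some upstream commit.
    return branch_patch_id in upstream_patch_ids
```

Intermediate upstream commits change the resulting tree but not the patch of the squash commit itself, which is why the patch-id comparison still matches in busy repositories.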
@@ -64,6 +66,10 @@ def validate(self) -> None:
        if self.opt_sync_github_prs and self.opt_sync_gitlab_mrs:
            raise MacheteException(
                "Option `-H/--sync-github-prs` cannot be specified together with `-L/--sync-gitlab-mrs`.")
        if not isinstance(self.opt_squash_merge_detection, SquashMergeDetection):
            valid_values = ', '.join(e.value for e in SquashMergeDetection)
To be covered by tests
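A test for this branch might exercise an invalid value and check that the message lists the valid ones. The sketch below is standalone and uses `ValueError` instead of `MacheteException`; `parse_squash_merge_detection` and the message wording are hypothetical.

```python
from enum import Enum


class SquashMergeDetection(Enum):  # member names assumed; values are those from the docs
    NONE = "none"
    SIMPLE = "simple"
    EXACT = "exact"


def parse_squash_merge_detection(raw: str) -> SquashMergeDetection:
    """Convert a raw option string into the enum, listing valid values on failure."""
    try:
        return SquashMergeDetection(raw)
    except ValueError:
        valid_values = ", ".join(e.value for e in SquashMergeDetection)
        raise ValueError(
            f"Invalid squash merge detection mode `{raw}`; valid values are: {valid_values}"
        ) from None
```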
@@ -911,6 +956,20 @@ def get_commits_between(self, earliest_exclusive: AnyRevision, latest_inclusive:
            utils.get_non_empty_lines(self._popen_git("log", "--format=%H:%h:%s", f"^{earliest_exclusive}", latest_inclusive, "--").stdout)
        ))))

    def get_patch_ids_for_commits_between(
Move next to get_patch_id
        return result

    def get_patch_id(self, patch_contents: str) -> Optional[str]:
Introduce a dedicated type (alias for `str`) for patch ids
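One possible shape for such a type is `typing.NewType`, which is a plain `str` at runtime but lets a type checker distinguish patch ids from other strings. The name `PatchId` and the helper `extract_patch_id` are hypothetical; a bare alias (`PatchId = str`) would also satisfy the comment.

```python
from typing import List, NewType, Optional

# Hypothetical name; the review only asks for "a dedicated type (alias for str)".
PatchId = NewType("PatchId", str)


def extract_patch_id(output_lines: List[str]) -> Optional[PatchId]:
    """Take the first field of the first non-empty `git patch-id` output line, if any."""
    if not output_lines:
        return None
    return PatchId(output_lines[0].split()[0])
```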
This PR improves detection of squash merges. The existing logic only works if there are no commits between the fork point and the merge, which might not be the case in busy repositories. I've added direct comparison of diffs using `git patch-id`, which should be reasonably stable.
The change adds some delay in `git machete status` as more commands are executed. For my repos this doesn't seem to be above "barely noticeable", but the impact could be worse when there are more commits to compare.
Also, in order to be able to run `patch_id` I've modified `_popen_git` and subsequent functions so they accept standard input; I hope that's OK.
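Feeding a diff to `git patch-id` over standard input could be sketched with `subprocess.run` and its `input` argument, as below. `run_git` is a hypothetical stand-in for `_popen_git`, whose real signature is not shown here, and this `get_patch_id` is a free function rather than the method from the PR.

```python
import subprocess
from typing import Optional


def run_git(*args: str, input: Optional[str] = None) -> str:
    """Run `git <args>`, optionally feeding `input` to its stdin, and return stdout."""
    completed = subprocess.run(
        ["git", *args], input=input, capture_output=True, text=True, check=True
    )
    return completed.stdout


def get_patch_id(patch_contents: str) -> Optional[str]:
    """Return the patch id for a diff, or None if git printed nothing."""
    stdout = run_git("patch-id", input=patch_contents)
    lines = [line for line in stdout.splitlines() if line.strip()]
    # Output lines look like "<patch-id> <commit-id>"; only the first field is needed.
    return lines[0].split()[0] if lines else None
```

Because the patch id is computed from the diff content with line numbers and whitespace ignored, the same change applied on top of unrelated upstream commits should still yield the same id.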