Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve squash merge detection #1238

Open
wants to merge 5 commits into
base: develop
Choose a base branch
from

Conversation

gjulianm
Copy link
Contributor

This PR improves detection of squash merges. The existing logic only works if there are no commits in between the fork point and the merge, which in busy repositories might not be the case. I've added direct comparison of diffs using git patch-id, which should be reasonably stable.

The change adds some delay in git machete status as there are more commands being executed. For my repos this doesn't seem to be above "barely noticeable", but the impact could be worse when there are more commits to compare.

Also, in order to be able to run patch_id I've modified _popen_git and subsequent functions so they accept standard input, I hope that's ok.

@PawelLipski PawelLipski added this to the v3.25.3 milestone Apr 26, 2024
@PawelLipski PawelLipski added the status Relates to how status is rendered label Apr 26, 2024
@PawelLipski PawelLipski self-requested a review April 26, 2024 08:20
@PawelLipski
Copy link
Collaborator

Wow 😯 I'm gonna check how it performs on my large commercial repo (~800k commits)

@codecov-commenter
Copy link

codecov-commenter commented Apr 26, 2024

Codecov Report

Attention: Patch coverage is 88.75000% with 9 lines in your changes are missing coverage. Please review.

Project coverage is 98.32%. Comparing base (d2ae146) to head (29068d0).

Files Patch % Lines
git_machete/git_operations.py 89.74% 2 Missing and 2 partials ⚠️
git_machete/options.py 40.00% 2 Missing and 1 partial ⚠️
git_machete/constants.py 85.71% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #1238      +/-   ##
===========================================
- Coverage    98.48%   98.32%   -0.17%     
===========================================
  Files           14       14              
  Lines         3706     3769      +63     
  Branches       814      828      +14     
===========================================
+ Hits          3650     3706      +56     
- Misses          37       41       +4     
- Partials        19       22       +3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@gjulianm
Copy link
Contributor Author

Wow 😯 I'm gonna check how it performs on my large commercial repo (~800k commits)

My main worry is this loop. For very long-lived branches that are way behind their fork point it could be a problem, I was thinking that maybe I could set a limit of commits (e.g., if we don't see the squash merge in the first 100 commits, then assume it's not merged).

@PawelLipski
Copy link
Collaborator

Wow 😯 I'm gonna check how it performs on my large commercial repo (~800k commits)

My main worry is this loop. For very long-lived branches that are way behind their fork point it could be a problem, I was thinking that maybe I could set a limit of commits (e.g., if we don't see the squash merge in the first 100 commits, then assume it's not merged).

Whoops invoking git diff in a loop will indeed be troublesome :/ currently the number of invoked git commands is O(N) with respect to branch layout size (even though if some individual commands like git log might be linear wrt. git log size) ☹️

@PawelLipski
Copy link
Collaborator

Actually... stepping back a bit, how often did you encounter the case when a squash merge has not been detected (but is now detected after your PR)? Could you provide a quick example how that situation looked like? it's been a long time since I've touched that logic TBH 🤔

@gjulianm
Copy link
Contributor Author

gjulianm commented Apr 26, 2024

Actually... stepping back a bit, how often did you encounter the case when a squash merge has not been detected (but is now detected after your PR)? Could you provide a quick example how that situation looked like? it's been a long time since I've touched that logic TBH 🤔

Quite often, in the main repo I use there are always commits between the fork point and the squash merge. I validated with the tests I added that indeed that was the case, if you remove the new logic (e.g., forcing an early exit here changing if result to if result or True) the tests that add commits in between fail.

Up to this point I was manually sliding out merged branches because I thought it was a problem on how the merges were done in the repo, but then I noticed it only happened in busy repos, and that's how I found out the problem were those intermediate commits.

We might be able to do some shell magic to at least make git compute the patch id without having the diff for each commit in the Python code, let me check.

@PawelLipski
Copy link
Collaborator

Huh okay... would you mind putting this logic behind a git config key? 🤔 flag to status can also be considered instead, as there's already --no-detect-squash-merges. Maybe sth like --squash-merge-detection={none,simple,exact}? And then --no-detect-squash-merges to be deprecated.

@gjulianm
Copy link
Contributor Author

Huh okay... would you mind putting this logic behind a git config key? 🤔 flag to status can also be considered instead, as there's already --no-detect-squash-merges. Maybe sth like --squash-merge-detection={none,simple,exact}? And then --no-detect-squash-merges to be deprecated.

Sounds good, I'll take a look, thx!

RELEASE_NOTES.md Outdated Show resolved Hide resolved
@gjulianm
Copy link
Contributor Author

Added the option, let me know if it looks good. I wasn't sure if some of the docs are automatic or not (e.g. the traverse/status help pages) so I didn't update them. Also, not sure if the deprecation is how you expected.

I'll take a look later regarding the use of git log -p to get all the patches in one git call.

@gjulianm
Copy link
Contributor Author

gjulianm commented May 6, 2024

Now it's using git log -p, we still have to call git patch-id for each changeset though, I didn't see any option to get the patch-id from the git log call directly.

@PawelLipski
Copy link
Collaborator

I'm gonna introduce minor fixes directly onto your branch. Sorry for the lags, I'm busy as heck in my commercial job :/

@PawelLipski
Copy link
Collaborator

Also, the comments on the PR are for myself from now on — you've already done more than I could ever ask 😅

RELEASE_NOTES.md Outdated Show resolved Hide resolved
return self.__is_equivalent_tree_reachable_cached[equivalent_to_commit_hash, reachable_from_commit_hash]
prev_mode, prev_result = self.__is_equivalent_tree_reachable_cached[equivalent_to_commit_hash, reachable_from_commit_hash]

# Only return cached result if we're using the same mode or if we already checked with the
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this extra caching check makes sense. IIUC within the given git-machete invocation (i.e. within the given Python process lifetime), there's only ever a single set of CLI options, and hence a single squash detection mode. I'll probably revert that.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And within tests, a new GitContext is created in each test method anyway.

out = utils.get_non_empty_lines(self._popen_git("patch-id", input=patch_contents).stdout)

if len(out) == 0:
return None
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line uncovered by tests. Probably okay to leave it so, there doesn't seem to be any reasonable path that would cover this case. Also, doesn't feel right to just # pragma: no cover it away.

git_machete/cli.py Outdated Show resolved Hide resolved
@@ -786,6 +787,7 @@ def is_equivalent_tree_reachable(
self,
equivalent_to: AnyRevision,
reachable_from: AnyRevision,
opt_squash_merge_detection: SquashMergeDetection
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method should no longer be called is_equivalent_tree_reachable, since in EXACT mode we're checking for some other condition (the method result might be true even though there's no equivalent tree reachable).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or even better, I'd keep GitContext oblivious to SquashMergeDetection mode. This is a machete-specific concept, after all, and we strive to keep GitContext as machete-agnostic as possible.

So, I'll extract a method like is_equivalent_patch_reachable (the exact name TBD), and call it somewhere from MacheteClient so that the overall logic is equivalent to the current state.

* ``none``: No squash merge/rebase detection.
* ``simple``: Compares the tree state of the merge commit with the tree state of the upstream branch. This detects squash merges/rebases as long as there was not any commit on the upstream branch since the last common commit.
* ``exact``: Compares the patch that would be applied by the merge commit with the commits that occurred on the upstream branch since the last common commit. This detects squash merges/rebases even if there were commits on the upstream branch since the last common commit. However, it might have a performance impact as it requires listing all the commits in the upstream.


**Environment variables:**

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Docs for status and traverse need to be updated as well

@@ -64,6 +66,10 @@ def validate(self) -> None:
if self.opt_sync_github_prs and self.opt_sync_gitlab_mrs:
raise MacheteException(
"Option `-H/--sync-github-prs` cannot be specified together with `-L/--sync-gitlab-mrs`.")
if not isinstance(self.opt_squash_merge_detection, SquashMergeDetection):
valid_values = ', '.join(e.value for e in SquashMergeDetection)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be covered by tests

@@ -911,6 +956,20 @@ def get_commits_between(self, earliest_exclusive: AnyRevision, latest_inclusive:
utils.get_non_empty_lines(self._popen_git("log", "--format=%H:%h:%s", f"^{earliest_exclusive}", latest_inclusive, "--").stdout)
))))

def get_patch_ids_for_commits_between(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move next to get_patch_id

tests/test_status.py Outdated Show resolved Hide resolved
@PawelLipski PawelLipski modified the milestones: v3.25.3, v3.26.0 May 22, 2024
return result

def get_patch_id(self, patch_contents: str) -> Optional[str]:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Introduce a dedicated type (alias for str) for patch ids

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status Relates to how status is rendered
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants