-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor: Refactor bots logic #445
Conversation
`services/bots.py` is used to select the token for a repo / owner that will authenticate requests to the git providers. With recent changes to multi-github apps the logic has become quite convoluted and a bit confusing. In particular with complex return structures of functions and an approach that doesn't really take into consideration the owner's service. These changes aim to separate the logic a bit better in different files grouped by "token type", improve typehinting and docstrings, and also simplify the structure of data returned by some functions.
Codecov ReportAttention: Patch coverage is ✅ All tests successful. No failed tests found. @@ Coverage Diff @@
## main #445 +/- ##
========================================
Coverage 97.34% 97.35%
========================================
Files 399 405 +6
Lines 33612 33775 +163
========================================
+ Hits 32720 32881 +161
- Misses 892 894 +2
Flags with carried forward coverage won't be shown. Click here to find out more.
|
Codecov ReportAttention: Patch coverage is
✅ All tests successful. No failed tests found. Additional details and impacted files@@ Coverage Diff @@
## main #445 +/- ##
========================================
Coverage 97.34% 97.35%
========================================
Files 399 405 +6
Lines 33612 33775 +163
========================================
+ Hits 32720 32881 +161
- Misses 892 894 +2
Flags with carried forward coverage won't be shown. Click here to find out more.
|
Codecov ReportAttention: Patch coverage is
✅ All tests successful. No failed tests found @@ Coverage Diff @@
## main #445 +/- ##
========================================
Coverage 97.34% 97.35%
========================================
Files 399 405 +6
Lines 33612 33775 +163
========================================
+ Hits 32720 32881 +161
- Misses 892 894 +2
Flags with carried forward coverage won't be shown. Click here to find out more.
|
Codecov ReportAttention: Patch coverage is
Changes have been made to critical files, which contain lines commonly executed in production. Learn more ✅ All tests successful. No failed tests found. Additional details and impacted files@@ Coverage Diff @@
## main #445 +/- ##
==========================================
+ Coverage 97.37% 97.40% +0.03%
==========================================
Files 430 437 +7
Lines 34302 34666 +364
==========================================
+ Hits 33401 33766 +365
+ Misses 901 900 -1
Flags with carried forward coverage won't be shown. Click here to find out more.
... and 1 file with indirect coverage changes
|
The `get_token_type_mapping` function is used to get different token types for public repos. These depend on install YAML configuration. It only affects public repos that don't have a token defined(*) So the function used to make 2 checks: * repo is private? * github app installations? With the refactoring from the previous commit fetching the github app installation info was pulled to `get_repo_provider_service` function. So we can decide there if we need the token type mapping for this case or not. This simplified the `get_token_type_mapping` function, which is preferable. I also removed the unused `commit` arg from `get_repo_provider_service`. (*) - except for comments, which prefer the configured commenter bot. I don't know why
Because (again) of the recent multi-github app changes part of the logic that goes into getting auth info for the torngit adapters was moved from `bots` service to the `repository` and `owner` services (in their respective functions to get the provider adapter). These changes unify that logic back into the `bots` service and put that as the interface between `bots` and the other services through the `AdapterAuthInformation` type. It should provide all the needed auth info for us to generate an appropriate torngit instance. For what it's worth I beleive that the changes that happenend to the `owner` service related to the token_refresh_callback were a bug that is fixed by these changes. That is because the `owner.bot` is a different Owner that owns the token being used, so the callback function needs to save the refreshed version back to `owner.bot`, not `owner`.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i like
Apps are selected randomly but assigned weights based on how recently they were created. | ||
This means that older apps are selected more frequently as the main app than newer ones. | ||
(up to 10 days, when the probability of being chosen is the same) | ||
The random selection is done so we can distribute request load more evenly among apps. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is an app that was updated 10 days ago better than an app that was updated 7 days ago? and in the case where somebody did misconfigure the app, wouldn't it be better to surface that ASAP so they can fix rather than try to delay using the broken app until the following week?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The feature request was to prevent normal operation from breaking if a new missconfigured app is introduced.
The idea is that a newly configured app could have been poorly configured. If selected it can derail normal operations. So by reducing the number of requests used by new installations we can maintain more regular operations (if it is missconfiguired we would see the errors appearing of course, but "most" requests would still be OK)
From that point on the ramp-up period is completely arbitrary.
selected_app_id = random.choices(keys, weights, k=1)[0] | ||
apps_to_consider.append(ghapp_installations_filter[selected_app_id]) | ||
# random.choices chooses with replacement | ||
# which we are trying to avoid here. So we remove the key selected and its weight from the population. | ||
key_idx = keys.index(selected_app_id) | ||
keys.pop(key_idx) | ||
weights.pop(key_idx) | ||
selections += 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would this work https://docs.python.org/3/library/random.html#random.sample ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't seem to have weights...
OR if I use the "weight" as the "count", is that statistically equivalent?...
I can round the weights so they are integers... yeah it could work, thanks for the suggestion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I commented before from the wrong browser :E apologies)
OK so looking at this again it would not work.
Because it's missing the weights, and using "weight" as counts would re-introduce the problem of replacement.
We want all available apps to be selected with the weights taken into consideration.
services/bots/github_apps.py
Outdated
# DEPRECATED FLOW - begin | ||
if owner.integration_id and ( | ||
(repository and repository.using_integration) or (repository is None) | ||
): | ||
log.info( | ||
"Selected deprecated owner.integration_id to communicate with github", | ||
extra=extra_info_to_log, | ||
) | ||
return [GithubInstallationInfo(installation_id=owner.integration_id)] | ||
# DEPRECATED FLOW - end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are these logs still firing or could we delete this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see these logs for the past 15 days. So I guess we can delete this?
services/bots/__init__.py
Outdated
|
||
|
||
def _get_owner_or_appropriate_bot(owner: Owner) -> Owner: | ||
def _get_owner_or_appropriate_bot(owner: Owner, repoid: int | None = None) -> Owner: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does anybody use this new extra argument? cmd-f doesn't show usages
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's used for logging.
I changed _get_repo_appropriate_bot
(you can check the original version in the deleted bots.py
file), and they had a slightly different message + the repoid in the logs.
I thought it would be important for debugging to keep the repoid
in the logs
This commit simply adds unit tests for the `get_adapter_auth_information` interface in various different scenarios.
0036ea4
to
0e1940a
Compare
* improve logs on repo bot selection * simplify `_can_use_this_app` conditions * improve type comment
0e1940a
to
e5ac2f0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally LGTM. Just a few questions.
@@ -10,3 +11,19 @@ | |||
type TokenWithOwner = Tuple[Token, Optional[Owner]] | |||
|
|||
type TokenTypeMapping = Dict[TokenType, Token] | |||
|
|||
|
|||
class AdapterAuthInformation(TypedDict): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we have a more generic docstring for this class that describes this object at a higher level? A few of the fields are GitHub specific. Is this object meant to be GitHub specific?
Instead of using the nifty `_` for private functions move the helper functions to other files. Instead of making a big `helper` file I decided to put things in semantically interesting files: * repo_bots - control the interaction with configuration on a repo level * owner_bots - control the interaction with configuration on a owner level * public_bots - control the interaction with configuration on an installation level (YAML configured) Usually the analysis follows that order of priority. One small issue was that token mapping would get the admin_bo (redoing the search for the appropriate bot), and that would cause a circular import, so I just deciede to re-use the search value and pass the admin bot token as an arg. Many changes in tests and import paths. In particular some tests became obsolete from the interface change. Because the token_type_mapping function doesn't do the search anymore, tests that depended on that broke down. So I removed them, but their "testing" ability is captured in new tests on `services/bots/tests/test_bots.py`
We have been more than 15 days without seeing a log indicating that the `owner.integration_id` was selected to communicate with github. At this point the syncing of integrations is apparently OK and we have run the github backfill task to create GithubAppInstallations to owners that didn't have those before.
9b711b6
to
f58a5a0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Suspect IssuesThis pull request was deployed and Sentry observed the following issues:
Did you find this useful? React with a 👍 or 👎 |
services/bots.py
is used to select the token for a repo / owner that willauthenticate requests to the git providers.
With recent changes to multi-github apps the logic has become quite convoluted and a bit
confusing. In particular with complex return structures of functions and an approach that
doesn't really take into consideration the owner's service.
These changes aim to separate the logic a bit better in different files grouped by "token type",
improve typehinting and docstrings, and also simplify the structure of data returned by some functions.