Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: Refactor bots logic #445

Merged
merged 8 commits into from
May 17, 2024
Merged

refactor: Refactor bots logic #445

merged 8 commits into from
May 17, 2024

Conversation

giovanni-guidini
Copy link
Contributor

@giovanni-guidini giovanni-guidini commented May 13, 2024

services/bots.py is used to select the token for a repo / owner that will
authenticate requests to the git providers.

With recent changes to multi-github apps the logic has become quite convoluted and a bit
confusing. In particular with complex return structures of functions and an approach that
doesn't really take into consideration the owner's service.

These changes aim to separate the logic a bit better in different files grouped by "token type",
improve typehinting and docstrings, and also simplify the structure of data returned by some functions.

💡 Read the commit messages

The commit messages include further context for the changes within.

`services/bots.py` is used to select the token for a repo / owner that will
authenticate requests to the git providers.

With recent changes to multi-github apps the logic has become quite convoluted and a bit
confusing. In particular with complex return structures of functions and an approach that
doesn't really take into consideration the owner's service.

These changes aim to separate the logic a bit better in different files grouped by "token type",
improve typehinting and docstrings, and also simplify the structure of data returned by some functions.
@codecov-notifications
Copy link

codecov-notifications bot commented May 13, 2024

Codecov Report

Attention: Patch coverage is 98.98219% with 4 lines in your changes are missing coverage. Please review.

✅ All tests successful. No failed tests found.

Impacted file tree graph

@@           Coverage Diff            @@
##             main     #445    +/-   ##
========================================
  Coverage   97.34%   97.35%            
========================================
  Files         399      405     +6     
  Lines       33612    33775   +163     
========================================
+ Hits        32720    32881   +161     
- Misses        892      894     +2     
Flag Coverage Δ
integration 97.35% <98.98%> (+<0.01%) ⬆️
latest-uploader-overall 97.35% <98.98%> (+<0.01%) ⬆️
unit 97.35% <98.98%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
NonTestCode 94.59% <98.10%> (+0.01%) ⬆️
OutsideTasks 97.51% <98.96%> (+<0.01%) ⬆️
Files Coverage Δ
services/bots/__init__.py 100.00% <100.00%> (ø)
services/bots/owner_bots.py 100.00% <100.00%> (ø)
services/bots/repo_bots.py 100.00% <100.00%> (ø)
services/bots/tests/test_bots.py 100.00% <100.00%> (ø)
services/bots/types.py 100.00% <100.00%> (ø)
services/comparison/overlays/critical_path.py 100.00% <100.00%> (ø)
services/github.py 85.71% <ø> (ø)
services/notification/notifiers/checks/base.py 98.20% <ø> (ø)
services/notification/notifiers/tests/conftest.py 100.00% <ø> (ø)
.../notification/notifiers/tests/unit/test_comment.py 99.22% <ø> (ø)
... and 21 more

... and 1 file with indirect coverage changes

@codecov-qa
Copy link

codecov-qa bot commented May 13, 2024

Codecov Report

Attention: Patch coverage is 98.98219% with 4 lines in your changes are missing coverage. Please review.

Project coverage is 97.35%. Comparing base (4d31035) to head (f58a5a0).
Report is 1 commits behind head on main.

✅ All tests successful. No failed tests found.

Additional details and impacted files

Impacted file tree graph

@@           Coverage Diff            @@
##             main     #445    +/-   ##
========================================
  Coverage   97.34%   97.35%            
========================================
  Files         399      405     +6     
  Lines       33612    33775   +163     
========================================
+ Hits        32720    32881   +161     
- Misses        892      894     +2     
Flag Coverage Δ
integration 97.35% <98.98%> (+<0.01%) ⬆️
latest-uploader-overall 97.35% <98.98%> (+<0.01%) ⬆️
unit 97.35% <98.98%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
NonTestCode 94.59% <98.10%> (+0.01%) ⬆️
OutsideTasks 97.51% <98.96%> (+<0.01%) ⬆️
Files Coverage Δ
services/bots/__init__.py 100.00% <100.00%> (ø)
services/bots/owner_bots.py 100.00% <100.00%> (ø)
services/bots/repo_bots.py 100.00% <100.00%> (ø)
services/bots/tests/test_bots.py 100.00% <100.00%> (ø)
services/bots/types.py 100.00% <100.00%> (ø)
services/comparison/overlays/critical_path.py 100.00% <100.00%> (ø)
services/github.py 85.71% <ø> (ø)
services/notification/notifiers/checks/base.py 98.20% <ø> (ø)
services/notification/notifiers/tests/conftest.py 100.00% <ø> (ø)
.../notification/notifiers/tests/unit/test_comment.py 99.22% <ø> (ø)
... and 21 more

... and 1 file with indirect coverage changes

Copy link

codecov-public-qa bot commented May 13, 2024

Codecov Report

Attention: Patch coverage is 98.98219% with 4 lines in your changes are missing coverage. Please review.

Project coverage is 97.35%. Comparing base (4d31035) to head (f58a5a0).
Report is 1 commits behind head on main.

✅ All tests successful. No failed tests found ☺️

Impacted file tree graph

@@           Coverage Diff            @@
##             main     #445    +/-   ##
========================================
  Coverage   97.34%   97.35%            
========================================
  Files         399      405     +6     
  Lines       33612    33775   +163     
========================================
+ Hits        32720    32881   +161     
- Misses        892      894     +2     
Flag Coverage Δ
integration 97.35% <98.98%> (+<0.01%) ⬆️
latest-uploader-overall 97.35% <98.98%> (+<0.01%) ⬆️
unit 97.35% <98.98%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
NonTestCode 94.59% <98.10%> (+0.01%) ⬆️
OutsideTasks 97.51% <98.96%> (+<0.01%) ⬆️
Files Coverage Δ
services/bots/__init__.py 100.00% <100.00%> (ø)
services/bots/owner_bots.py 100.00% <100.00%> (ø)
services/bots/repo_bots.py 100.00% <100.00%> (ø)
services/bots/tests/test_bots.py 100.00% <100.00%> (ø)
services/bots/types.py 100.00% <100.00%> (ø)
services/comparison/overlays/critical_path.py 100.00% <100.00%> (ø)
services/github.py 85.71% <ø> (ø)
services/notification/notifiers/checks/base.py 98.20% <ø> (ø)
services/notification/notifiers/tests/conftest.py 100.00% <ø> (ø)
.../notification/notifiers/tests/unit/test_comment.py 99.22% <ø> (ø)
... and 21 more

... and 1 file with indirect coverage changes

Copy link

codecov bot commented May 13, 2024

Codecov Report

Attention: Patch coverage is 98.98219% with 4 lines in your changes are missing coverage. Please review.

Project coverage is 97.40%. Comparing base (4d31035) to head (f58a5a0).
Report is 1 commits behind head on main.

Changes have been made to critical files, which contain lines commonly executed in production. Learn more

✅ All tests successful. No failed tests found.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #445      +/-   ##
==========================================
+ Coverage   97.37%   97.40%   +0.03%     
==========================================
  Files         430      437       +7     
  Lines       34302    34666     +364     
==========================================
+ Hits        33401    33766     +365     
+ Misses        901      900       -1     
Flag Coverage Δ
integration 97.35% <98.98%> (+<0.01%) ⬆️
latest-uploader-overall 97.35% <98.98%> (+<0.01%) ⬆️
unit 97.35% <98.98%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
NonTestCode 94.69% <98.10%> (+0.08%) ⬆️
OutsideTasks 97.51% <98.96%> (+<0.01%) ⬆️
Files Coverage Δ
services/bots/__init__.py 100.00% <100.00%> (ø)
services/bots/owner_bots.py 100.00% <100.00%> (ø)
services/bots/repo_bots.py 100.00% <100.00%> (ø)
services/bots/tests/test_bots.py 100.00% <100.00%> (ø)
services/bots/types.py 100.00% <100.00%> (ø)
services/comparison/overlays/critical_path.py 100.00% <100.00%> (ø)
services/github.py 100.00% <ø> (ø)
services/notification/notifiers/checks/base.py 98.20% <ø> (ø)
services/notification/notifiers/tests/conftest.py 100.00% <ø> (ø)
.../notification/notifiers/tests/unit/test_comment.py 99.30% <ø> (+0.03%) ⬆️
... and 21 more

... and 1 file with indirect coverage changes

Related Entrypoints
run/app.tasks.upload.Upload
run/app.tasks.status.SetError
run/app.tasks.notify.Notify
run/app.tasks.pulls.Sync
run/app.tasks.compute_comparison.ComputeComparison
run/app.tasks.upload.UploadFinisher
run/app.tasks.upload.UploadProcessor
run/app.tasks.commit_update.CommitUpdate
run/app.tasks.upload.PreProcessUpload
run/app.tasks.sync_repo_languages_gql.SyncLanguagesGQL
run/app.tasks.bundle_analysis.BundleAnalysisNotify
run/app.tasks.test_results.TestResultsFinisherTask
run/app.tasks.sync_repo_languages.SyncLanguages
run/app.tasks.sync_repos.SyncRepos
run/app.tasks.sync_teams.SyncTeams
run/app.tasks.label_analysis.process_request

The `get_token_type_mapping` function is used to get different token types for public repos.
These depend on install YAML configuration. It only affects public repos that don't have a token defined(*)

So the function used to make 2 checks:
* repo is private?
* github app installations?

With the refactoring from the previous commit fetching the github app installation info was pulled to `get_repo_provider_service` function.
So we can decide there if we need the token type mapping for this case or not.
This simplified the `get_token_type_mapping` function, which is preferable.

I also removed the unused `commit` arg from `get_repo_provider_service`.

(*) - except for comments, which prefer the configured commenter bot. I don't know why
Because (again) of the recent multi-github app changes part of the logic that goes into
getting auth info for the torngit adapters was moved from `bots` service to the `repository`
and `owner` services (in their respective functions to get the provider adapter).

These changes unify that logic back into the `bots` service and put that as the interface
between `bots` and the other services through the `AdapterAuthInformation` type.
It should provide all the needed auth info for us to generate an appropriate torngit instance.

For what it's worth I beleive that the changes that happenend to the `owner` service related to
the token_refresh_callback were a bug that is fixed by these changes.
That is because the `owner.bot` is a different Owner that owns the token being used, so the callback
function needs to save the refreshed version back to `owner.bot`, not `owner`.
@giovanni-guidini giovanni-guidini requested a review from a team May 13, 2024 13:13
Copy link
Contributor

@matt-codecov matt-codecov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i like

services/bots/__init__.py Outdated Show resolved Hide resolved
services/bots/github_apps.py Outdated Show resolved Hide resolved
Comment on lines +69 to +72
Apps are selected randomly but assigned weights based on how recently they were created.
This means that older apps are selected more frequently as the main app than newer ones.
(up to 10 days, when the probability of being chosen is the same)
The random selection is done so we can distribute request load more evenly among apps.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is an app that was updated 10 days ago better than an app that was updated 7 days ago? and in the case where somebody did misconfigure the app, wouldn't it be better to surface that ASAP so they can fix rather than try to delay using the broken app until the following week?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The feature request was to prevent normal operation from breaking if a new missconfigured app is introduced.

The idea is that a newly configured app could have been poorly configured. If selected it can derail normal operations. So by reducing the number of requests used by new installations we can maintain more regular operations (if it is missconfiguired we would see the errors appearing of course, but "most" requests would still be OK)

From that point on the ramp-up period is completely arbitrary.

Comment on lines +100 to +107
selected_app_id = random.choices(keys, weights, k=1)[0]
apps_to_consider.append(ghapp_installations_filter[selected_app_id])
# random.choices chooses with replacement
# which we are trying to avoid here. So we remove the key selected and its weight from the population.
key_idx = keys.index(selected_app_id)
keys.pop(key_idx)
weights.pop(key_idx)
selections += 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link

@Gguidini Gguidini May 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't seem to have weights...
OR if I use the "weight" as the "count", is that statistically equivalent?...

I can round the weights so they are integers... yeah it could work, thanks for the suggestion

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I commented before from the wrong browser :E apologies)

OK so looking at this again it would not work.
Because it's missing the weights, and using "weight" as counts would re-introduce the problem of replacement.
We want all available apps to be selected with the weights taken into consideration.

Comment on lines 232 to 241
# DEPRECATED FLOW - begin
if owner.integration_id and (
(repository and repository.using_integration) or (repository is None)
):
log.info(
"Selected deprecated owner.integration_id to communicate with github",
extra=extra_info_to_log,
)
return [GithubInstallationInfo(installation_id=owner.integration_id)]
# DEPRECATED FLOW - end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are these logs still firing or could we delete this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see these logs for the past 15 days. So I guess we can delete this?



def _get_owner_or_appropriate_bot(owner: Owner) -> Owner:
def _get_owner_or_appropriate_bot(owner: Owner, repoid: int | None = None) -> Owner:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does anybody use this new extra argument? cmd-f doesn't show usages

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's used for logging.

I changed _get_repo_appropriate_bot (you can check the original version in the deleted bots.py file), and they had a slightly different message + the repoid in the logs.
I thought it would be important for debugging to keep the repoid in the logs

services/bots/types.py Outdated Show resolved Hide resolved
This commit simply adds unit tests for the `get_adapter_auth_information` interface
in various different scenarios.
* improve logs on repo bot selection
* simplify `_can_use_this_app` conditions
* improve type comment
Copy link
Contributor

@michelletran-codecov michelletran-codecov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally LGTM. Just a few questions.

services/bots/__init__.py Show resolved Hide resolved
@@ -10,3 +11,19 @@
type TokenWithOwner = Tuple[Token, Optional[Owner]]

type TokenTypeMapping = Dict[TokenType, Token]


class AdapterAuthInformation(TypedDict):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have a more generic docstring for this class that describes this object at a higher level? A few of the fields are GitHub specific. Is this object meant to be GitHub specific?

services/bots/__init__.py Outdated Show resolved Hide resolved
Instead of using the nifty `_` for private functions move the helper functions to other files.
Instead of making a big `helper` file I decided to put things in semantically interesting files:
* repo_bots - control the interaction with configuration on a repo level
* owner_bots - control the interaction with configuration on a owner level
* public_bots - control the interaction with configuration on an installation level (YAML configured)

Usually the analysis follows that order of priority.
One small issue was that token mapping would get the admin_bo (redoing the search for the appropriate bot),
and that would cause a circular import, so I just deciede to re-use the search value and pass the admin bot token
as an arg.

Many changes in tests and import paths. In particular some tests became obsolete from the interface change.
Because the token_type_mapping function doesn't do the search anymore, tests that depended on that broke down.
So I removed them, but their "testing" ability is captured in new tests on `services/bots/tests/test_bots.py`
We have been more than 15 days without seeing a log indicating that the `owner.integration_id`
was selected to communicate with github.

At this point the syncing of integrations is apparently OK and we have run the github backfill task
to create GithubAppInstallations to owners that didn't have those before.
Copy link
Contributor

@michelletran-codecov michelletran-codecov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@giovanni-guidini giovanni-guidini removed the request for review from matt-codecov May 17, 2024 11:24
@giovanni-guidini giovanni-guidini added this pull request to the merge queue May 17, 2024
Merged via the queue into main with commit 5fd87a1 May 17, 2024
29 checks passed
@giovanni-guidini giovanni-guidini deleted the gio/refactor-bots branch May 17, 2024 11:31
Copy link

sentry-io bot commented May 24, 2024

Suspect Issues

This pull request was deployed and Sentry observed the following issues:

  • ‼️ OwnerWithoutValidBotError app.tasks.sync_repos.SyncRepos View Issue
  • ‼️ RepositoryWithoutValidBotError app.tasks.sync_repos.SyncRepos View Issue
  • ‼️ RepositoryWithoutValidBotError app.tasks.sync_repo_languages_gql.SyncLanguagesGQL View Issue

Did you find this useful? React with a 👍 or 👎

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants