
Tokenless V3 #533

Open
wants to merge 1 commit into
base: main

Conversation

joseph-sentry
Contributor

@joseph-sentry joseph-sentry commented May 1, 2024

Purpose/Motivation

We no longer want to make GitHub API calls when handling tokenless uploads.

Links to relevant tickets

codecov/engineering-team#1574

What does this PR do?

  • Remove GitHub API calls from TokenlessAuthentication

@joseph-sentry joseph-sentry requested a review from a team as a code owner May 1, 2024 15:15

codecov-qa bot commented May 1, 2024

Codecov Report

Attention: Patch coverage is 85.93750% with 9 lines in your changes are missing coverage. Please review.

Project coverage is 91.32%. Comparing base (62f64f3) to head (b791fc6).
Report is 1 commits behind head on main.

✅ All tests successful. No failed tests found.

Files                                      Patch %   Lines
codecov_auth/authentication/repo_auth.py   86.20%    8 Missing ⚠️
upload/views/commits.py                    83.33%    1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #533      +/-   ##
==========================================
- Coverage   91.35%   91.32%   -0.04%     
==========================================
  Files         599      601       +2     
  Lines       15972    16016      +44     
==========================================
+ Hits        14592    14626      +34     
- Misses       1380     1390      +10     
Flag                   Coverage Δ
unit                   91.32% <85.93%> (-0.04%) ⬇️
unit-latest-uploader   91.32% <85.93%> (-0.04%) ⬇️

Flags with carried forward coverage won't be shown.


codecov-public-qa bot commented May 1, 2024

Codecov Report

Attention: Patch coverage is 85.93750% with 9 lines in your changes are missing coverage. Please review.

Project coverage is 91.32%. Comparing base (62f64f3) to head (b791fc6).
Report is 1 commits behind head on main.

✅ All tests successful. No failed tests found ☺️

@@            Coverage Diff             @@
##             main     #533      +/-   ##
==========================================
- Coverage   91.35%   91.32%   -0.04%     
==========================================
  Files         599      601       +2     
  Lines       15972    16016      +44     
==========================================
+ Hits        14592    14626      +34     
- Misses       1380     1390      +10     
Flag                   Coverage Δ
unit                   91.32% <85.93%> (-0.04%) ⬇️
unit-latest-uploader   91.32% <85.93%> (-0.04%) ⬇️

Flags with carried forward coverage won't be shown.

Files                                      Coverage Δ
upload/views/reports.py                    100.00% <ø> (ø)
upload/views/commits.py                    97.14% <83.33%> (-2.86%) ⬇️
codecov_auth/authentication/repo_auth.py   94.57% <86.20%> (-3.93%) ⬇️

... and 4 files with indirect coverage changes


codecov bot commented May 1, 2024

Codecov Report

Attention: Patch coverage is 85.93750% with 9 lines in your changes are missing coverage. Please review.

Project coverage is 95.80%. Comparing base (62f64f3) to head (b791fc6).
Report is 1 commits behind head on main.

✅ All tests successful. No failed tests found.

Files                                      Patch %   Lines
codecov_auth/authentication/repo_auth.py   86.20%    8 Missing ⚠️
upload/views/commits.py                    83.33%    1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##            main    #533     +/-   ##
=======================================
- Coverage   95.84   95.80   -0.04     
=======================================
  Files        777     779      +2     
  Lines      17290   17338     +48     
=======================================
+ Hits       16571   16610     +39     
- Misses       719     728      +9     
Flag                   Coverage Δ
unit                   91.32% <85.93%> (-0.04%) ⬇️
unit-latest-uploader   91.32% <85.93%> (-0.04%) ⬇️

Flags with carried forward coverage won't be shown.

@giovanni-guidini
Contributor

We need to be careful merging these changes to avoid re-introducing the bug of having an upload from a fork branch accidentally overwrite coverage in the upstream branch.

Considering the upload endpoints used by the CLI, either we change the CLI to send the branch info every time (maybe through headers), or we use the info we already have to validate that the branch name is in the format fork:branch. If it is not, we reject the upload.

Currently I believe it should be possible to make this check:

  • For the commit creation the branch should be part of the body
  • For the subsequent requests (create report and send the upload) we can pull the commit from the database and check its branch information.

Personally I feel that we should be able to accept / reject a request without having to look at the request body. So I'd opt to make the CLI send specific headers with the information we need.
(Also by removing the current checks maybe X-Tokenless-PR header is useless?)
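
The fork:branch format check described in this comment could look roughly like the sketch below. It is not the project's code: `TokenlessRejected`, `is_fork_branch`, and `check_tokenless_branch` are hypothetical names, and requiring both sides of the colon to be non-empty is an assumption.

```python
class TokenlessRejected(Exception):
    """Hypothetical error raised when a tokenless upload must be refused."""


def is_fork_branch(branch):
    """Return True if branch looks like 'fork:branch' with both parts non-empty."""
    if not branch or ":" not in branch:
        return False
    # Split on the first colon only; the branch part may itself contain colons.
    fork, _, name = branch.partition(":")
    return bool(fork) and bool(name)


def check_tokenless_branch(branch):
    """Return the branch unchanged, or raise if it could collide with an upstream branch."""
    if not is_fork_branch(branch):
        raise TokenlessRejected(f"tokenless upload rejected for branch {branch!r}")
    return branch
```

With this shape, an upload for `main` is refused while `someone:main` passes, which is the property needed to keep fork uploads from overwriting upstream coverage.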

@joseph-sentry
Contributor Author

joseph-sentry commented May 2, 2024

So I'd opt to make the CLI send specific headers with the information we need.
(Also by removing the current checks maybe X-Tokenless-PR header is useless?)

We could just validate that the x-tokenless-pr header is in the format we expect and has the correct repo name (it matches the one in the url)?

@giovanni-guidini
Contributor

X-Tokenless-PR is a number [1] that we used to get the correct PR from the provider. That does little for us considering the changes in this PR. If you just change the value, the header name will be misleading.

X-Tokenless should not match the repo name in the URL if it comes from a fork [1], because it's supposed to be the fork slug, while the slug in the URL should be the upstream's.

[1] https://github.com/codecov/codecov-cli/blob/7b028499a521f8df3cf3bb6b642246a19e07bee9/codecov_cli/services/report/__init__.py#L57-L58

@joseph-sentry joseph-sentry changed the title from "fix: don't make api calls in tokenless auth" to "Tokenless V3" May 6, 2024
if ":" not in tokenless:
    raise exceptions.AuthenticationFailed(tokenless_auth_failed_message)

# make sure it's backwards compatible with the old way that
Contributor

It can't be compatible if validation fails (in the line above) because "the old way" doesn't include a : in the X-Tokenless header, ...right?

Contributor

@matt-codecov matt-codecov left a comment

this probably belongs in another PR, but in addition to getting rid of the get_pull_request_info() check like you do here, we also want to get rid of the "recent CI run" check. that happens in upload/helpers.py. with that gone, i think we can delete the whole upload/tokenless directory

also, the other day you made a great point about the report/upload endpoints being able to get the branch from the commit rather than needing it to be sent by the CLI. i think we can take advantage of that to simplify this logic across the API, CLI, and CI actions

on the CLI side, we already have to send the branch name in the request body for the create-commit step, and we already modify the branch name for forks there. the other two steps include which commit they are working with, so the API will be able to look up the commit and get the branch from there. so none of the commands need to set X-Tokenless, and the action doesn't have to set any extra env vars. (i think this also avoids an edge case where an attacker passes authentication by passing X-Tokenless: fork:hahaha to the do-upload endpoint but the commit SHA they passed is actually a commit on main)

then on the API side, i think the only difference between the commit endpoint and the others is where we get the branch name from? if so, i think we can merge back into one TokenAuthentication class, and if the commit sha is part of the URL we get the branch from the commit DB object, otherwise we look for the branch in the request body because we're creating a new commit

my brain is a little fried right now so this comment might not be very useful. i'll do another pass tomorrow, or maybe we could hop on a call and talk about it
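
The edge case mentioned in the comment above (a request that sends `X-Tokenless: fork:hahaha` while the commit SHA actually belongs to main) can be illustrated with a small sketch. Everything here is hypothetical: the in-memory `COMMITS` dict stands in for the commit table, and `branch_for_upload` and `tokenless_allowed` are invented helpers. The only point is that the branch being validated comes from the stored commit, not from a client-supplied header.

```python
# Stand-in for the commit table: commit SHA -> branch recorded at commit creation.
COMMITS = {
    "abc123": "main",             # commit on an upstream branch
    "def456": "someone:feature",  # commit created by a fork upload
}


def branch_for_upload(commit_sha, headers):
    """Return the stored branch for the commit, or None if the commit is unknown."""
    # `headers` is accepted only to show that it is deliberately not consulted.
    return COMMITS.get(commit_sha)


def tokenless_allowed(commit_sha, headers):
    """Allow tokenless access only when the stored branch is fork-prefixed."""
    branch = branch_for_upload(commit_sha, headers)
    return branch is not None and ":" in branch
```

A request for `abc123` is refused no matter what the header claims, because the stored branch is `main`.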

if a request is creating a commit then we require
the name of the branch it's creating a commit on to
contain a colon, because that means that it won't
affect the coverage of the branches that already belong
to that repository. This will only work for public
repositories.

Subsequent API calls to the reports and uploads endpoints
will check the branch and visibility of the repo those
requests are targeting.

Signed-off-by: joseph-sentry <joseph.sawaya@sentry.io>
@codecov-notifications

Codecov Report

Attention: Patch coverage is 85.93750% with 9 lines in your changes are missing coverage. Please review.

✅ All tests successful. No failed tests found.

Files                                      Patch %   Lines
codecov_auth/authentication/repo_auth.py   86.20%    8 Missing ⚠️
upload/views/commits.py                    83.33%    1 Missing ⚠️

Contributor

@matt-codecov matt-codecov left a comment

i think you can do this with one TokenAuthentication class. the regex with the commit sha already makes that part an optional group so the same regex should work for all three endpoints. the logic could be something like:

def _get_branch(self, request, commit_from_url):
    if commit_from_url:
        return commit_from_url.branch
    else:
        body = json.loads(str(request.body, "utf8"))
        return body.get("branch")

def authenticate(self, request):
    repo, commit_from_url = self._get_info_from_request_path(request)
    branch = self._get_branch(request, commit_from_url)

    if not branch:
        raise Whatever()
    ...

also, i left a couple comments in the test files themselves but as a general note, can you make sure there are test cases that ensure a tokenless upload for an unprefixed branch name fails? the test cases covering tokenless rejections are all due to the repo being private, but we also want to reject based on the branch name
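
The branch lookup from the suggestion above can be exercised on its own with stand-in objects. This is only a sketch under assumptions: `FakeRequest` and `FakeCommit` are hypothetical stubs rather than Django or project classes, and returning None for a malformed or non-object body is a choice made here, not something the review specifies.

```python
import json


class FakeCommit:
    """Hypothetical stand-in for a stored commit row."""

    def __init__(self, branch):
        self.branch = branch


class FakeRequest:
    """Hypothetical stand-in for an HTTP request carrying a raw bytes body."""

    def __init__(self, body):
        self.body = body


def get_branch(request, commit_from_url):
    """Prefer the stored commit's branch; fall back to the JSON request body."""
    if commit_from_url is not None:
        return commit_from_url.branch
    try:
        body = json.loads(request.body.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return None  # unreadable body: the caller should treat this as "no branch"
    if not isinstance(body, dict):
        return None
    return body.get("branch")
```

The caller would then reject the request whenever the returned branch is missing, as in the `if not branch` check above.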

Comment on lines 270 to +279
  # Validate provider
  service = match.group(1)
  try:
      service_enum = Service(service)
      # Currently only Github is supported
      # TODO [codecov/engineering-team#914]: Extend tokenless support to other providers
      if service_enum != Service.GITHUB:
-         raise exceptions.AuthenticationFailed(self.auth_failed_message)
+         raise exceptions.AuthenticationFailed(tokenless_auth_failed_message)
  except ValueError:
-     raise exceptions.AuthenticationFailed(self.auth_failed_message)
+     raise exceptions.AuthenticationFailed(tokenless_auth_failed_message)
Contributor

i think this stuff isn't necessary anymore. if we aren't making requests to the git provider it shouldn't matter who that provider is

we shouldn't close the ticket in the comment because the CLI still needs work to support other providers, but this logic here can go

Comment on lines -191 to +205
- @pytest.mark.parametrize("branch_sent", ["main", "someone/the_repo:main"])
+ @pytest.mark.parametrize("branch_sent", ["someone/the_repo:main", "someone:main"])
Contributor

we no longer have a test case with an unprefixed branch name. can you add one in and make sure that unprefixed branch names are rejected without a token?
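
One way to cover the requested case is a table of branch names that includes unprefixed ones alongside the prefixed values already in the parametrize list. The sketch below is plain Python so it runs without the project's fixtures; `tokenless_branch_ok` is a hypothetical stand-in for the API's check, and in the real suite these rows would presumably become additional `pytest.mark.parametrize` values.

```python
def tokenless_branch_ok(branch):
    """Hypothetical stand-in for the API's branch check: require a fork prefix."""
    return bool(branch) and ":" in branch


# (branch sent by the client, whether a tokenless request should be accepted)
CASES = [
    ("someone/the_repo:main", True),
    ("someone:main", True),
    ("main", False),           # unprefixed: must be rejected without a token
    ("feature/thing", False),  # unprefixed, with a slash but no colon
]


def run_cases():
    """Return the cases whose outcome differs from the expectation."""
    return [(branch, want) for branch, want in CASES if tokenless_branch_ok(branch) != want]
```

An empty result from `run_cases()` means every expectation held, including the rejections for unprefixed names.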

fake_provider_service.get_pull_request.assert_called_with("4")


def test_commit_tokenless_missing_branch(db, client, mocker):
Contributor

this still seems like a good test case?

@@ -100,6 +114,7 @@ def test_reports_post_no_auth(db, mocker):
    repository = RepositoryFactory(
        name="the_repo", author__username="codecov", author__service="github"
    )
    repository.private = False
    token = "BAD"
    commit = CommitFactory(repository=repository)
Contributor

the test never calls commit.save(). is it failing for the reason we expect, which is an invalid token? or is it failing because the commit isn't in the DB?
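
The concern above is that a test asserting only "the request failed" can pass for the wrong reason. A toy illustration, with everything hypothetical (the `post_report` endpoint, its two failure modes, and the order in which it checks them are all assumptions made for this sketch):

```python
class AuthenticationFailed(Exception):
    """Hypothetical stand-in for an authentication error."""


VALID_TOKENS = {"GOOD"}
SAVED_COMMITS = set()  # starts empty, mimicking a commit that never reached the DB


def post_report(token, commit_sha):
    """Toy endpoint with two distinct failure modes."""
    if commit_sha not in SAVED_COMMITS:
        return 404, "commit not found"
    if token not in VALID_TOKENS:
        raise AuthenticationFailed("invalid token")
    return 201, "created"


def failure_reason(token, commit_sha):
    """Describe why a request failed so a test can assert on the reason."""
    try:
        status, detail = post_report(token, commit_sha)
    except AuthenticationFailed as exc:
        return f"auth: {exc}"
    return f"{status}: {detail}"
```

Asserting on the reason, rather than on "not 201", distinguishes the two situations: with the commit missing the toy endpoint reports a missing commit, and only once the commit is stored does the bad token become the cause.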
