Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ruby/Python/JS/Swift: Add category of Private information to shared sensitive data heuristics #16446

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

joefarebrother
Copy link
Contributor

Adds a category of private information to the shared sensitive data heuristics file.
This may result in new results for the following queries:

  • rb/sensitive-get-query
  • py/clear-text-storage-sensitive-data
  • py/clear-text-logging-sensitive-data
  • py/weak-sensitive-data-hashing
  • js/clear-text-stotage-sensitive-data
  • js/clear-text-logging

No change note is needed for Swift, as the new heuristics are unused and thus should not affect any queries.
@joefarebrother joefarebrother marked this pull request as ready for review May 9, 2024 09:05
@joefarebrother joefarebrother requested review from a team as code owners May 9, 2024 09:05
@joefarebrother joefarebrother changed the title [Draft] Ruby/Python/JS/Swift: Add category of Private information to shared sensitive data heuristics Ruby/Python/JS/Swift: Add category of Private information to shared sensitive data heuristics May 9, 2024
Copy link
Contributor

@geoffw0 geoffw0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the author of the private information regexps for Swift (class SensitivePrivateInfo), I'm happy to see these expressions being used more widely. They've already had a reasonable amount of testing and iteration in Swift, and I expect they will work pretty well for other languages with minimal or no additional tuning.

@@ -49,6 +49,7 @@ class SensitiveCredential extends SensitiveDataType, TCredential {
exists(SensitiveDataClassification classification |
not classification = SensitiveDataClassification::password() and // covered by `SensitivePassword`
not classification = SensitiveDataClassification::id() and // not accurate enough
not classification = SensitiveDataClassification::private() and // covered by `SensitivePrivateInfo`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see you've excluded the new results from Swift for now. Since they're a near duplicate of Swift's existing class SensitivePrivateInfo I think we should aim (now or later) to define that in terms of the new result classification instead, similarly to how class SensitivePassword already works. This may add a few new results (for SSN and CCN) in the four or five Swift sensitive data queries.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... I don't mind doing this as follow-up actually. I already made half the changes I'll need to run a test of the SSN and CCN terms!

print(debit_card) # NOT OK
print(bank_number) # NOT OK
print(bank_account) # NOT OK, but NOT FOUND - "account" is treated as having the "id" classification and thus excluded.
print(accountNo) # NOT OK, but NOT FOUND - "account" is treated as having the "id" classification and thus excluded.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These NOT FOUND results are good observations. We can think about mechanisms for limiting the exclusions perhaps and finding more results in cases like these.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should that be considered for this PR or as potential follow-up?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As potential follow-up I'd say. There will always be more improvements we can make to heuristics like this.

@joefarebrother
Copy link
Contributor Author

@geoffw0 What are the next steps for this? Should I just be waiting for an approval from each language team?

@erik-krogh
Copy link
Contributor

@joefarebrother: Could you summarise what changed results your evaluations showed.
What new results appeared for which queries, do you think they look good, and there any results that look bad, and is there any noticeable performance impact.

@joefarebrother
Copy link
Contributor Author

@erik-krogh There are new results for rb/sensitive-get-query, py/clear-text-storage-sensitive-data, and py/clear-text-logging-sensitive-data. and py/weak-sensitive-data-hashing.

New results in ruby sensitive get looks like a few TPs from new heuristics. (The MRVA run that restricts to only new heuristics shows some apparent FPs, but they are also present in the run of the full version of the query prior to changes; and appear to be the result of some nodes being spuriously detected as calls to several different methods including those matching sensitive heuristics)

Python cleartext storage and weak sensitive hashing have a few new results for which the sources look like TPs from new heuristics.
Python cleartext logging has many new results (as the query has a lot of sinks); going through them they look like mostly TPs. I did notice just one FP from an attribute named SSN that was not a social security number.
There is no noticable performance impact.

Copy link
Contributor

@erik-krogh erik-krogh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JS 👍

But we should probably do a followup where this implementation is moved to the shared folder (together with CryptoAlgorithmNames?).

Copy link
Contributor

@geoffw0 geoffw0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@geoffw0 What are the next steps for this? Should I just be waiting for an approval from each language team?

Yes I think so.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants