Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ruby/Python/JS/Swift: Add category of Private information to shared sensitive data heuristics #16446

Merged
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
---
category: minorAnalysis
---
* Additional heuristics for a new sensitive data classification for private information (e.g. credit card numbers) have been added to the shared `SensitiveDataHeuristics.qll` library. This may result in additional results for queries that use sensitive data such as `js/clear-text-storage-sensitive-data` and `js/clear-text-logging`.
Original file line number Diff line number Diff line change
Expand Up @@ -14,13 +14,14 @@
* - id: a user name or other account information;
* - password: a password or authorization key;
* - certificate: a certificate.
* - private: private data such as credit card numbers
*
* While classifications are represented as strings, this should not be relied upon.
* Instead, use the predicates in `SensitiveDataClassification::` to work with
* classifications.
*/
class SensitiveDataClassification extends string {
SensitiveDataClassification() { this in ["secret", "id", "password", "certificate"] }
SensitiveDataClassification() { this in ["secret", "id", "password", "certificate", "private"] }
}

/**
Expand All @@ -38,6 +39,9 @@ module SensitiveDataClassification {

/** Gets the classification for certificates. */
SensitiveDataClassification certificate() { result = "certificate" }

/** Gets the classification for private data. */
SensitiveDataClassification private() { result = "private" }
}

/**
Expand Down Expand Up @@ -77,6 +81,40 @@ module HeuristicNames {
*/
string maybeCertificate() { result = "(?is).*(cert)(?!.*(format|name|ification)).*" }

/**
* Gets a regular expression that identifies strings that may indicate the presence of
* private data.
*/
string maybePrivate() {
result =
"(?is).*(" +
// Inspired by the list on https://cwe.mitre.org/data/definitions/359.html
// Government identifiers, such as Social Security Numbers
"social.?security|employer.?identification|national.?insurance|resident.?id|" +
"passport.?(num|no)|([_-]|\\b)ssn([_-]|\\b)|" +
// Contact information, such as home addresses
"post.?code|zip.?code|home.?addr|" +
// and telephone numbers
"(mob(ile)?|home).?(num|no|tel|phone)|(tel|fax|phone).?(num|no)|telephone|" +
"emergency.?contact|" +
// Geographic location - where the user is (or was)
"latitude|longitude|nationality|" +
// Financial data - such as credit card numbers, salary, bank accounts, and debts
"(credit|debit|bank|visa).?(card|num|no|acc(ou)?nt)|acc(ou)?nt.?(no|num|credit)|" +
"salary|billing|credit.?(rating|score)|([_-]|\\b)ccn([_-]|\\b)|" +
// Communications - e-mail addresses, private e-mail messages, SMS text messages, chat logs, etc.
// "e(mail|_mail)|" + // this seems too noisy
// Health - medical conditions, insurance status, prescription records
"birth.?da(te|y)|da(te|y).?(of.?)?birth|" +
"medical|(health|care).?plan|healthkit|appointment|prescription|" +
"blood.?(type|alcohol|glucose|pressure)|heart.?(rate|rhythm)|body.?(mass|fat)|" +
"menstrua|pregnan|insulin|inhaler|" +
// Relationships - work and family
"employ(er|ee)|spouse|maiden.?name" +
// ---
").*"
}

/**
* Gets a regular expression that identifies strings that may indicate the presence
* of sensitive data, with `classification` describing the kind of sensitive data involved.
Expand All @@ -90,6 +128,9 @@ module HeuristicNames {
or
result = maybeCertificate() and
classification = SensitiveDataClassification::certificate()
or
result = maybePrivate() and
classification = SensitiveDataClassification::private()
}

/**
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
---
category: minorAnalysis
---
* Additional heuristics for a new sensitive data classification for private information (e.g. credit card numbers) have been added to the shared `SensitiveDataHeuristics.qll` library. This may result in additional results for queries that use sensitive data such as `py/clear-text-storage-sensitive-data` and `py/clear-text-logging-sensitive-data`.
Original file line number Diff line number Diff line change
Expand Up @@ -14,13 +14,14 @@
* - id: a user name or other account information;
* - password: a password or authorization key;
* - certificate: a certificate.
* - private: private data such as credit card numbers
*
* While classifications are represented as strings, this should not be relied upon.
* Instead, use the predicates in `SensitiveDataClassification::` to work with
* classifications.
*/
class SensitiveDataClassification extends string {
SensitiveDataClassification() { this in ["secret", "id", "password", "certificate"] }
SensitiveDataClassification() { this in ["secret", "id", "password", "certificate", "private"] }
}

/**
Expand All @@ -38,6 +39,9 @@ module SensitiveDataClassification {

/** Gets the classification for certificates. */
SensitiveDataClassification certificate() { result = "certificate" }

/** Gets the classification for private data. */
SensitiveDataClassification private() { result = "private" }
}

/**
Expand Down Expand Up @@ -77,6 +81,40 @@ module HeuristicNames {
*/
string maybeCertificate() { result = "(?is).*(cert)(?!.*(format|name|ification)).*" }

/**
* Gets a regular expression that identifies strings that may indicate the presence of
* private data.
*/
string maybePrivate() {
result =
"(?is).*(" +
// Inspired by the list on https://cwe.mitre.org/data/definitions/359.html
// Government identifiers, such as Social Security Numbers
"social.?security|employer.?identification|national.?insurance|resident.?id|" +
"passport.?(num|no)|([_-]|\\b)ssn([_-]|\\b)|" +
// Contact information, such as home addresses
"post.?code|zip.?code|home.?addr|" +
// and telephone numbers
"(mob(ile)?|home).?(num|no|tel|phone)|(tel|fax|phone).?(num|no)|telephone|" +
"emergency.?contact|" +
// Geographic location - where the user is (or was)
"latitude|longitude|nationality|" +
// Financial data - such as credit card numbers, salary, bank accounts, and debts
"(credit|debit|bank|visa).?(card|num|no|acc(ou)?nt)|acc(ou)?nt.?(no|num|credit)|" +
"salary|billing|credit.?(rating|score)|([_-]|\\b)ccn([_-]|\\b)|" +
// Communications - e-mail addresses, private e-mail messages, SMS text messages, chat logs, etc.
// "e(mail|_mail)|" + // this seems too noisy
// Health - medical conditions, insurance status, prescription records
"birth.?da(te|y)|da(te|y).?(of.?)?birth|" +
"medical|(health|care).?plan|healthkit|appointment|prescription|" +
"blood.?(type|alcohol|glucose|pressure)|heart.?(rate|rhythm)|body.?(mass|fat)|" +
"menstrua|pregnan|insulin|inhaler|" +
// Relationships - work and family
"employ(er|ee)|spouse|maiden.?name" +
// ---
").*"
}

/**
* Gets a regular expression that identifies strings that may indicate the presence
* of sensitive data, with `classification` describing the kind of sensitive data involved.
Expand All @@ -90,6 +128,9 @@ module HeuristicNames {
or
result = maybeCertificate() and
classification = SensitiveDataClassification::certificate()
or
result = maybePrivate() and
classification = SensitiveDataClassification::private()
}

/**
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -7,8 +7,21 @@ edges
| test.py:19:16:19:29 | ControlFlowNode for get_password() | test.py:19:5:19:12 | ControlFlowNode for password | provenance | |
| test.py:44:5:44:5 | ControlFlowNode for x | test.py:45:11:45:11 | ControlFlowNode for x | provenance | |
| test.py:44:9:44:25 | ControlFlowNode for Attribute() | test.py:44:5:44:5 | ControlFlowNode for x | provenance | |
| test.py:70:5:70:10 | ControlFlowNode for config | test.py:74:11:74:31 | ControlFlowNode for Subscript | provenance | |
| test.py:72:21:72:37 | ControlFlowNode for Attribute | test.py:70:5:70:10 | ControlFlowNode for config | provenance | |
| test.py:48:14:48:35 | ControlFlowNode for social_security_number | test.py:49:15:49:36 | ControlFlowNode for social_security_number | provenance | |
| test.py:48:38:48:40 | ControlFlowNode for ssn | test.py:50:15:50:17 | ControlFlowNode for ssn | provenance | |
| test.py:48:54:48:63 | ControlFlowNode for passportNo | test.py:52:15:52:24 | ControlFlowNode for passportNo | provenance | |
| test.py:54:34:54:45 | ControlFlowNode for home_address | test.py:57:15:57:26 | ControlFlowNode for home_address | provenance | |
| test.py:59:14:59:26 | ControlFlowNode for user_latitude | test.py:60:15:60:27 | ControlFlowNode for user_latitude | provenance | |
| test.py:59:29:59:42 | ControlFlowNode for user_longitude | test.py:61:15:61:28 | ControlFlowNode for user_longitude | provenance | |
| test.py:63:14:63:26 | ControlFlowNode for mobile_number | test.py:64:15:64:27 | ControlFlowNode for mobile_number | provenance | |
| test.py:63:29:63:35 | ControlFlowNode for phoneNo | test.py:65:15:65:21 | ControlFlowNode for phoneNo | provenance | |
| test.py:67:14:67:23 | ControlFlowNode for creditcard | test.py:68:15:68:24 | ControlFlowNode for creditcard | provenance | |
| test.py:67:26:67:35 | ControlFlowNode for debit_card | test.py:69:15:69:24 | ControlFlowNode for debit_card | provenance | |
| test.py:67:38:67:48 | ControlFlowNode for bank_number | test.py:70:15:70:25 | ControlFlowNode for bank_number | provenance | |
| test.py:67:76:67:78 | ControlFlowNode for ccn | test.py:73:15:73:17 | ControlFlowNode for ccn | provenance | |
| test.py:67:81:67:88 | ControlFlowNode for user_ccn | test.py:74:15:74:22 | ControlFlowNode for user_ccn | provenance | |
| test.py:101:5:101:10 | ControlFlowNode for config | test.py:105:11:105:31 | ControlFlowNode for Subscript | provenance | |
| test.py:103:21:103:37 | ControlFlowNode for Attribute | test.py:101:5:101:10 | ControlFlowNode for config | provenance | |
nodes
| test.py:19:5:19:12 | ControlFlowNode for password | semmle.label | ControlFlowNode for password |
| test.py:19:16:19:29 | ControlFlowNode for get_password() | semmle.label | ControlFlowNode for get_password() |
Expand All @@ -24,9 +37,35 @@ nodes
| test.py:44:5:44:5 | ControlFlowNode for x | semmle.label | ControlFlowNode for x |
| test.py:44:9:44:25 | ControlFlowNode for Attribute() | semmle.label | ControlFlowNode for Attribute() |
| test.py:45:11:45:11 | ControlFlowNode for x | semmle.label | ControlFlowNode for x |
| test.py:70:5:70:10 | ControlFlowNode for config | semmle.label | ControlFlowNode for config |
| test.py:72:21:72:37 | ControlFlowNode for Attribute | semmle.label | ControlFlowNode for Attribute |
| test.py:74:11:74:31 | ControlFlowNode for Subscript | semmle.label | ControlFlowNode for Subscript |
| test.py:48:14:48:35 | ControlFlowNode for social_security_number | semmle.label | ControlFlowNode for social_security_number |
| test.py:48:38:48:40 | ControlFlowNode for ssn | semmle.label | ControlFlowNode for ssn |
| test.py:48:54:48:63 | ControlFlowNode for passportNo | semmle.label | ControlFlowNode for passportNo |
| test.py:49:15:49:36 | ControlFlowNode for social_security_number | semmle.label | ControlFlowNode for social_security_number |
| test.py:50:15:50:17 | ControlFlowNode for ssn | semmle.label | ControlFlowNode for ssn |
| test.py:52:15:52:24 | ControlFlowNode for passportNo | semmle.label | ControlFlowNode for passportNo |
| test.py:54:34:54:45 | ControlFlowNode for home_address | semmle.label | ControlFlowNode for home_address |
| test.py:57:15:57:26 | ControlFlowNode for home_address | semmle.label | ControlFlowNode for home_address |
| test.py:59:14:59:26 | ControlFlowNode for user_latitude | semmle.label | ControlFlowNode for user_latitude |
| test.py:59:29:59:42 | ControlFlowNode for user_longitude | semmle.label | ControlFlowNode for user_longitude |
| test.py:60:15:60:27 | ControlFlowNode for user_latitude | semmle.label | ControlFlowNode for user_latitude |
| test.py:61:15:61:28 | ControlFlowNode for user_longitude | semmle.label | ControlFlowNode for user_longitude |
| test.py:63:14:63:26 | ControlFlowNode for mobile_number | semmle.label | ControlFlowNode for mobile_number |
| test.py:63:29:63:35 | ControlFlowNode for phoneNo | semmle.label | ControlFlowNode for phoneNo |
| test.py:64:15:64:27 | ControlFlowNode for mobile_number | semmle.label | ControlFlowNode for mobile_number |
| test.py:65:15:65:21 | ControlFlowNode for phoneNo | semmle.label | ControlFlowNode for phoneNo |
| test.py:67:14:67:23 | ControlFlowNode for creditcard | semmle.label | ControlFlowNode for creditcard |
| test.py:67:26:67:35 | ControlFlowNode for debit_card | semmle.label | ControlFlowNode for debit_card |
| test.py:67:38:67:48 | ControlFlowNode for bank_number | semmle.label | ControlFlowNode for bank_number |
| test.py:67:76:67:78 | ControlFlowNode for ccn | semmle.label | ControlFlowNode for ccn |
| test.py:67:81:67:88 | ControlFlowNode for user_ccn | semmle.label | ControlFlowNode for user_ccn |
| test.py:68:15:68:24 | ControlFlowNode for creditcard | semmle.label | ControlFlowNode for creditcard |
| test.py:69:15:69:24 | ControlFlowNode for debit_card | semmle.label | ControlFlowNode for debit_card |
| test.py:70:15:70:25 | ControlFlowNode for bank_number | semmle.label | ControlFlowNode for bank_number |
| test.py:73:15:73:17 | ControlFlowNode for ccn | semmle.label | ControlFlowNode for ccn |
| test.py:74:15:74:22 | ControlFlowNode for user_ccn | semmle.label | ControlFlowNode for user_ccn |
| test.py:101:5:101:10 | ControlFlowNode for config | semmle.label | ControlFlowNode for config |
| test.py:103:21:103:37 | ControlFlowNode for Attribute | semmle.label | ControlFlowNode for Attribute |
| test.py:105:11:105:31 | ControlFlowNode for Subscript | semmle.label | ControlFlowNode for Subscript |
subpaths
#select
| test.py:20:48:20:55 | ControlFlowNode for password | test.py:19:16:19:29 | ControlFlowNode for get_password() | test.py:20:48:20:55 | ControlFlowNode for password | This expression logs $@ as clear text. | test.py:19:16:19:29 | ControlFlowNode for get_password() | sensitive data (password) |
Expand All @@ -39,4 +78,17 @@ subpaths
| test.py:39:22:39:35 | ControlFlowNode for get_password() | test.py:39:22:39:35 | ControlFlowNode for get_password() | test.py:39:22:39:35 | ControlFlowNode for get_password() | This expression logs $@ as clear text. | test.py:39:22:39:35 | ControlFlowNode for get_password() | sensitive data (password) |
| test.py:40:22:40:35 | ControlFlowNode for get_password() | test.py:40:22:40:35 | ControlFlowNode for get_password() | test.py:40:22:40:35 | ControlFlowNode for get_password() | This expression logs $@ as clear text. | test.py:40:22:40:35 | ControlFlowNode for get_password() | sensitive data (password) |
| test.py:45:11:45:11 | ControlFlowNode for x | test.py:44:9:44:25 | ControlFlowNode for Attribute() | test.py:45:11:45:11 | ControlFlowNode for x | This expression logs $@ as clear text. | test.py:44:9:44:25 | ControlFlowNode for Attribute() | sensitive data (password) |
| test.py:74:11:74:31 | ControlFlowNode for Subscript | test.py:72:21:72:37 | ControlFlowNode for Attribute | test.py:74:11:74:31 | ControlFlowNode for Subscript | This expression logs $@ as clear text. | test.py:72:21:72:37 | ControlFlowNode for Attribute | sensitive data (password) |
| test.py:49:15:49:36 | ControlFlowNode for social_security_number | test.py:48:14:48:35 | ControlFlowNode for social_security_number | test.py:49:15:49:36 | ControlFlowNode for social_security_number | This expression logs $@ as clear text. | test.py:48:14:48:35 | ControlFlowNode for social_security_number | sensitive data (private) |
| test.py:50:15:50:17 | ControlFlowNode for ssn | test.py:48:38:48:40 | ControlFlowNode for ssn | test.py:50:15:50:17 | ControlFlowNode for ssn | This expression logs $@ as clear text. | test.py:48:38:48:40 | ControlFlowNode for ssn | sensitive data (private) |
| test.py:52:15:52:24 | ControlFlowNode for passportNo | test.py:48:54:48:63 | ControlFlowNode for passportNo | test.py:52:15:52:24 | ControlFlowNode for passportNo | This expression logs $@ as clear text. | test.py:48:54:48:63 | ControlFlowNode for passportNo | sensitive data (private) |
| test.py:57:15:57:26 | ControlFlowNode for home_address | test.py:54:34:54:45 | ControlFlowNode for home_address | test.py:57:15:57:26 | ControlFlowNode for home_address | This expression logs $@ as clear text. | test.py:54:34:54:45 | ControlFlowNode for home_address | sensitive data (private) |
| test.py:60:15:60:27 | ControlFlowNode for user_latitude | test.py:59:14:59:26 | ControlFlowNode for user_latitude | test.py:60:15:60:27 | ControlFlowNode for user_latitude | This expression logs $@ as clear text. | test.py:59:14:59:26 | ControlFlowNode for user_latitude | sensitive data (private) |
| test.py:61:15:61:28 | ControlFlowNode for user_longitude | test.py:59:29:59:42 | ControlFlowNode for user_longitude | test.py:61:15:61:28 | ControlFlowNode for user_longitude | This expression logs $@ as clear text. | test.py:59:29:59:42 | ControlFlowNode for user_longitude | sensitive data (private) |
| test.py:64:15:64:27 | ControlFlowNode for mobile_number | test.py:63:14:63:26 | ControlFlowNode for mobile_number | test.py:64:15:64:27 | ControlFlowNode for mobile_number | This expression logs $@ as clear text. | test.py:63:14:63:26 | ControlFlowNode for mobile_number | sensitive data (private) |
| test.py:65:15:65:21 | ControlFlowNode for phoneNo | test.py:63:29:63:35 | ControlFlowNode for phoneNo | test.py:65:15:65:21 | ControlFlowNode for phoneNo | This expression logs $@ as clear text. | test.py:63:29:63:35 | ControlFlowNode for phoneNo | sensitive data (private) |
| test.py:68:15:68:24 | ControlFlowNode for creditcard | test.py:67:14:67:23 | ControlFlowNode for creditcard | test.py:68:15:68:24 | ControlFlowNode for creditcard | This expression logs $@ as clear text. | test.py:67:14:67:23 | ControlFlowNode for creditcard | sensitive data (private) |
| test.py:69:15:69:24 | ControlFlowNode for debit_card | test.py:67:26:67:35 | ControlFlowNode for debit_card | test.py:69:15:69:24 | ControlFlowNode for debit_card | This expression logs $@ as clear text. | test.py:67:26:67:35 | ControlFlowNode for debit_card | sensitive data (private) |
| test.py:70:15:70:25 | ControlFlowNode for bank_number | test.py:67:38:67:48 | ControlFlowNode for bank_number | test.py:70:15:70:25 | ControlFlowNode for bank_number | This expression logs $@ as clear text. | test.py:67:38:67:48 | ControlFlowNode for bank_number | sensitive data (private) |
| test.py:73:15:73:17 | ControlFlowNode for ccn | test.py:67:76:67:78 | ControlFlowNode for ccn | test.py:73:15:73:17 | ControlFlowNode for ccn | This expression logs $@ as clear text. | test.py:67:76:67:78 | ControlFlowNode for ccn | sensitive data (private) |
| test.py:74:15:74:22 | ControlFlowNode for user_ccn | test.py:67:81:67:88 | ControlFlowNode for user_ccn | test.py:74:15:74:22 | ControlFlowNode for user_ccn | This expression logs $@ as clear text. | test.py:67:81:67:88 | ControlFlowNode for user_ccn | sensitive data (private) |
| test.py:105:11:105:31 | ControlFlowNode for Subscript | test.py:103:21:103:37 | ControlFlowNode for Attribute | test.py:105:11:105:31 | ControlFlowNode for Subscript | This expression logs $@ as clear text. | test.py:103:21:103:37 | ControlFlowNode for Attribute | sensitive data (password) |