Python: ignore common words (certain/concert) as sensitive source

This commit is contained in:
Rasmus Wriedt Larsen
2022-06-22 11:02:00 +02:00
parent abdcfd55c3
commit 5dc2bb717a
2 changed files with 7 additions and 3 deletions

View File

@@ -96,10 +96,14 @@ module HeuristicNames {
* Gets a regular expression that identifies strings that may indicate the presence of data * Gets a regular expression that identifies strings that may indicate the presence of data
* that is hashed or encrypted, and hence rendered non-sensitive, or contains special characters * that is hashed or encrypted, and hence rendered non-sensitive, or contains special characters
* suggesting nouns within the string do not represent the meaning of the whole string (e.g. a URL or a SQL query). * suggesting nouns within the string do not represent the meaning of the whole string (e.g. a URL or a SQL query).
*
* We also filter out common words like `certain` and `concert`, since otherwise these could
* be matched by the certificate regular expressions. Same for `accountable` (account), or
* `secretarial` (secret).
*/ */
string notSensitiveRegexp() { string notSensitiveRegexp() {
result = result =
"(?is).*([^\\w$.-]|redact|censor|obfuscate|hash|md5|sha|random|((?<!un)(en))?(crypt|code)).*" "(?is).*([^\\w$.-]|redact|censor|obfuscate|hash|md5|sha|random|((?<!un)(en))?(crypt|code)|certain|concert|secretar|accountant|accountab).*"
} }
/** /**

View File

@@ -58,8 +58,8 @@ def my_func(password): # $ SensitiveDataSource=password
# FP where the `cert` in `uncertainty` makes us treat it like a certificate # FP where the `cert` in `uncertainty` makes us treat it like a certificate
# https://github.com/github/codeql/issues/9632 # https://github.com/github/codeql/issues/9632
def my_other_func(uncertainty): # $ SPURIOUS: SensitiveDataSource=certificate def my_other_func(uncertainty):
print(uncertainty) # $ SPURIOUS: SensitiveUse=certificate print(uncertainty)
password = some_function() # $ SensitiveDataSource=password password = some_function() # $ SensitiveDataSource=password
print(password) # $ SensitiveUse=password print(password) # $ SensitiveUse=password