mirror of
https://github.com/github/codeql.git
synced 2025-12-17 01:03:14 +01:00
Python: Fix a bug in glob conversion
If you have a filter like `**/foo/**` set in the `paths-ignore` bit of your config file, then currently the following happens:

- First, the CodeQL CLI observes that this string ends in `/**` and strips off the `**`, leaving `**/foo/`.
- Then the Python extractor strips off leading and trailing `/` characters and proceeds to convert `**/foo` into a regex that is matched against files to (potentially) extract.

The trouble with this is that it leaves us unable to distinguish between, say, a file `foo.py` and a file `foo/bar.py`. In other words, we have lost the ability to exclude only the _folder_ `foo` and not any files that happen to start with `foo`.

To fix this, we instead make a note of whether the glob ends in a forward slash or not, and adjust the regex correspondingly.
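To make the folder-versus-prefix distinction concrete, here is a minimal standalone sketch. It does not reproduce the regexes the extractor actually generates: `prefix_only`, `folder_only`, and `matching` are hypothetical names, and both patterns are written by hand purely for illustration.

```python
import re

# Hand-written illustrative patterns (assumptions, not extractor output).
# `prefix_only` treats "foo" as a name prefix; `folder_only` requires "foo"
# to be a complete path component, followed by "/" or the end of the path.
prefix_only = re.compile(r"(?:.*/)?foo.*")
folder_only = re.compile(r"(?:.*/)?foo(?:/.*|$)")

paths = ["foo.py", "foo/bar.py", "src/foo/bar.py", "src/foo.py"]


def matching(regex, candidates):
    """Return the candidates that the regex matches in full."""
    return [p for p in candidates if regex.fullmatch(p)]


print(matching(prefix_only, paths))  # every path, including foo.py
print(matching(folder_only, paths))  # only paths inside a folder named foo
```

Under these assumptions, only the second pattern leaves `foo.py` and `src/foo.py` alone while still covering everything below a `foo` folder, which is the behaviour the commit message asks for.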
@@ -41,6 +41,9 @@ def glob_part_to_regex(glob, add_sep):
 
 def glob_to_regex(glob, prefix=""):
     '''Convert entire glob to a compiled regex'''
+    # When the glob ends in `/`, we need to remember this so that we don't accidentally add an
+    # extra separator to the final regex.
+    end_sep = "" if glob.endswith("/") else SEP
     glob = glob.strip().strip("/")
     parts = glob.split("/")
     #Trailing '**' is redundant, so strip it off.
@@ -53,7 +56,7 @@ def glob_to_regex(glob, prefix=""):
     # something like `C:\\folder\\subfolder\\` and without escaping the
     # backslash-path-separators will get interpreted as regex escapes (which might be
     # invalid sequences, causing the extractor to crash)
-    full_pattern = escape(prefix) + ''.join(parts) + "(?:" + SEP + ".*|$)"
+    full_pattern = escape(prefix) + ''.join(parts) + "(?:" + end_sep + ".*|$)"
     return re.compile(full_pattern)
 
 def filter_from_pattern(pattern, prev_filter, prefix):
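The reason the trailing slash has to be recorded before the glob is normalised can be shown with a small sketch. `split_glob` is a hypothetical helper, not a function in the extractor; it only mirrors the order of operations visible in the diff (check `endswith("/")` first, then strip and split).

```python
def split_glob(glob):
    """Split a glob into parts, remembering whether it ended in a slash.

    Hypothetical helper for illustration only.
    """
    # Record this first: the strip below removes the trailing slash.
    ends_in_slash = glob.endswith("/")
    parts = glob.strip().strip("/").split("/")
    return parts, ends_in_slash


# After stripping, the two globs are indistinguishable ...
print("**/foo/".strip("/") == "**/foo".strip("/"))

# ... so the flag is the only thing that still tells them apart.
print(split_glob("**/foo/"))
print(split_glob("**/foo"))
```

Both calls return the same list of parts, so any later decision about whether `foo` names a folder has to rely on the recorded flag.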