The previous version was tested on a version of the code where we had
temporarily removed the `glob.strip("/")` bit, and so the bug didn't
trigger then.
We now correctly remember if the glob ends in `/`, and add an extra part
in that case. This way, if the path ends with multiple slashes, they
effectively get consolidated into a single one, which results in the
correct semantics.
If you have a filter like `**/foo/**` set in the `paths-ignore` bit of
your config file, then currently the following happens:
- First, the CodeQL CLI observes that this string ends in `/**` and
strips off the `**` leaving `**/foo/`
- Then the Python extractor strips off leading and trailing `/`
characters and proceeds to convert `**/foo` into a regex that is
matched against files to (potentially) extract.
The trouble with this is that it leaves us unable to distinguish
between, say, a file `foo.py` and a file `foo/bar.py`. In other words,
we have lost the ability to exclude only the _folder_ `foo` and not any
files that happen to start with `foo`.
To fix this, we instead make a note of whether the glob ends in a
forward slash or not, and adjust the regex correspondingly.
Changes the default behaviour of the Python extractor so files inside
hidden directories are extracted by default.
Also adds an extractor option, `skip_hidden_directories`, which can be
set to `true` in order to revert to the old behaviour.
Finally, I made the logic surrounding what is logged in various cases a
bit more obvious.
Technically this changes the behaviour of the extractor (in that hidden
excluded files will now be logged as `(excluded)`, but I think this
makes more sense anyway.
Adds a comment explaining why we no longer flag the indirect tuple
example.
Also adds a test case which _would_ be flagged if not for the type
annotation.
As we're no longer tracking tuples across function boundaries, we lose
the result that related to this setup (which, as the preceding commit
explains, lead to a lot of false positives).
Removes the dependence on points-to in favour of an approach based on
(local) data-flow.
I first tried a version that used type tracking, as this more accurately
mimics the behaviour of the old query. However, I soon discovered that
there were _many_ false positives in this setup. The main bad pattern I
saw was a helper function somewhere deep inside the code that both
receives and returns an argument that can be tuples with different sizes
and origins. In this case, global flow produces something akin to a
cartesian product of "n-tuples that flow into the function" and
"m-tuples that flow into the function" where m < n.
To combat this, I decided to instead focus on only flow _within_ a given
function (and so local data-flow was sufficient).
Additionally, another class of false positives I saw was cases where the
return type actually witnessed that the function in question could
return tuples of varying sizes. In this case it seems reasonable to not
flag these instances, since they are already (presumably) being checked
by a type checker.
More generally, if you've annotated the return type of the function with
anything (not just `Tuple[...]`), then there's probably little need to
flag it.