As we're no longer tracking tuples across function boundaries, we lose
the result that related to this setup (which, as the preceding commit
explains, lead to a lot of false positives).
Removes the dependence on points-to in favour of an approach based on
(local) data-flow.
I first tried a version that used type tracking, as this more accurately
mimics the behaviour of the old query. However, I soon discovered that
there were _many_ false positives in this setup. The main bad pattern I
saw was a helper function somewhere deep inside the code that both
receives and returns an argument that can be tuples with different sizes
and origins. In this case, global flow produces something akin to a
cartesian product of "n-tuples that flow into the function" and
"m-tuples that flow into the function" where m < n.
To combat this, I decided to instead focus on only flow _within_ a given
function (and so local data-flow was sufficient).
Additionally, another class of false positives I saw was cases where the
return type actually witnessed that the function in question could
return tuples of varying sizes. In this case it seems reasonable to not
flag these instances, since they are already (presumably) being checked
by a type checker.
More generally, if you've annotated the return type of the function with
anything (not just `Tuple[...]`), then there's probably little need to
flag it.
Technically we still depend on points-to in that we still mention
`PythonFunctionValue` and `ClassValue` in the query. However, we
immediately move to working with the corresponding `Function` and
`Class` AST nodes, and so we're not really using points-to. (The reason
for doing things this way is that otherwise the `.toString()` for all of
the alerts would change, which would make the diff hard to interpret.
This way, it should be fairly simple to see which changes are actually
relevant.)
We do lose some precision when moving away from points-to, and this is
reflected in the changes in the `.expected` file. In particular we no
longer do complicated tracking of values, but rather look at the
syntactic structure of the classes in question. This causes us to lose
out on some results where a special method is defined elsewhere, and
causes a single FP where a special method initially has the wrong
signature, but is subsequently overwritten with a function with the
correct signature.
We also lose out on results having to do with default values, as these
are now disabled.
Finally, it was necessary to add special handling of methods marked with
the `staticmethod` decorator, as these expect to receive fewer
arguments. This was motivated by a MRVA run, where e.g. sympy showed a
lot of examples along the lines of
```
@staticmethod
def __abs__():
return ...
```
Adds a new boolean parameter `is_unused_default` that indicates whether
the given result is one where a parameter to a special method has a
default value (which will never be used when invoked in the normal way).
These results are somewhat less useful (because the special method
_might_ be invoked directly, in which case the default value would still
be relevant), but it seemed like a shame to simply remove the code, so
instead I opted to disable it in this way.
Moves a bunch of `owner.declaredAttribute(name) = f` instances to the
top level, in the process greatly cleaning up the code. The behaviour
should be the unchanged.
Having done this, there's only one place where we depend on points-to,
and that's in the remaining `declaredAttribute` call. This should
greatly simplify the move away from points to.
Fixes the false positive reported in
https://github.com/github/codeql/issues/18910
Adds a new `Annotation` class (subclass of `Expr`) which encompasses all
possible kinds of annotations in Python.
Using this, we look for string literals which are part of an annotation,
and which have the same content as the name of a (potentially) unused
global variable, and in that case we do not produce an alert.
In future, we may want to support inspecting such string literals more
deeply (e.g. to support stuff like "list[unused_var]"), but I think for
now this level of support is sufficient.
Adds two test cases having to do with type annotations. The first one
demonstrates that type annotations (even if they are never executed by
the Python interpreter) count as uses for the purposes of the unused
variable query. The second one demonstrates that this is _not_ the case
if all such uses are inside strings (i.e. forward declarations), as we
do not currently inspect the content of these strings.
These seem generally useful outside of points-to, and so it might be
better to add them to the `Function` class instead.
I took the liberty of renaming these to say `Arguments` rather than
`Parameters`, as this is more in line with the nomenclature that we're
using elsewhere. (The internal points-to methods retain the old names.)
I'm somewhat ambivalent about the behaviour of `getMaxParameters` on
functions with `*varargs`. The hard-coded `INT_MAX` return value is
somewhat awkward, but the alternative (to only have the predicate
defined when a specific maximum exists) seems like it would potentially
cause a lot of headaches.