As it turns out, referring to the request object using `flask.request`
is not uncommon, and this meant restricting to `Name` nodes was too
strong. With the changes in this commit, we now include those
occurrences as well.
The `flask.request` global object is commonly used in request handlers
to access data in the active request. In our modelling, we handled this
by treating the initial (module-local) definition of `request` as a
source of remote flow. In practice this meant a lot of alerts would act
as if `from flask import request` was the ultimate "source" of remote
flow, and to find the actual request-handler-local instance of `request`
one would have to inspect the data-flow path between source and sink.
To improve this state of affairs, I have made the following changes to
the definition of `FlaskRequestSource`:
- We no longer consider `from flask import request` to be a source.
- Instead, we look at all places where that `request` value can flow,
and include only the ones that are `LocalSourceNode`s (so that inside a
request handler, the first occurrence of the `request` object is the
source).
In practice, this leads to alerts that are much easier to decipher.
Our current modelling only treated `psycopg2` insofar as it implemented
PEP 249 (which does not define any notion of connection pool), which
meant we were missing database connections that arose from such pools.
With these changes, we add support for the three classes relating to
database pools that are defined in `psycopg2`. (Note that
`getAnInstance` automatically looks at subclasses, which means this
should also handle cases where the user has defined a new subclass that
inherits from one of these three classes.)
Also fixes an issue with the return type annotations that caused these
to not work properly.
Currently, annotated assignments don't work properly, due to the fact
that our flow relation doesn't consider flow going to the "type" part of
an annotated assignment. This means that in `x : Foo`, we do correctly
note that `x` is annotated with `Foo`, but we have no idea what `Foo`
is, since it has no incoming flow.
To fix this we should probably just extend the flow relation, but this
may need to be done with some care, so I have left it as future work.
Adds support for tracking instances via type annotations. Also adds a
convenience method to the newly added `Annotation` class,
`getAnnotatedExpression`, that returns the expression that is annotated
with the given type. For return annotations this is any value returned
from the annotated function in question.
Co-authored-by: Napalys Klicius <napalys@github.com>
Fixes the false positive reported in
https://github.com/github/codeql/issues/18910
Adds a new `Annotation` class (subclass of `Expr`) which encompasses all
possible kinds of annotations in Python.
Using this, we look for string literals which are part of an annotation,
and which have the same content as the name of a (potentially) unused
global variable, and in that case we do not produce an alert.
In future, we may want to support inspecting such string literals more
deeply (e.g. to support stuff like "list[unused_var]"), but I think for
now this level of support is sufficient.
These seem generally useful outside of points-to, and so it might be
better to add them to the `Function` class instead.
I took the liberty of renaming these to say `Arguments` rather than
`Parameters`, as this is more in line with the nomenclature that we're
using elsewhere. (The internal points-to methods retain the old names.)
I'm somewhat ambivalent about the behaviour of `getMaxParameters` on
functions with `*varargs`. The hard-coded `INT_MAX` return value is
somewhat awkward, but the alternative (to only have the predicate
defined when a specific maximum exists) seems like it would potentially
cause a lot of headaches.