I am very tempted to leave out the constants, or at the very least
`False`, `True`, and `None`, as these have _many_ occurrences in the
average codebase, and are not terribly useful at the API-graph level.
If we really do want to capture "nodes that refer to such and such
constant", then I think a better solution would be to create classes
extending `DataFlow::Node` to facilitate this.
Which I had done locally. Problem is the same about not having PostUpdateNode
when points-to is not able to resolve the call, so I'm happy to just make CI
happy right now, and hopefully we'll get a fix to the underlying problem soon 😊
The problem with `tainted_filelike` not having taint, is that in the call
`ujson.dump(tainted_obj, tainted_filelike)`
there is no PostUpdateNote for `tainted_filelike` :( The reason is that
points-to is not able to resolve the call, so none of the clauses in
`argumentPreUpdateNode` matches
See 08731fc6cf/python/ql/src/semmle/python/dataflow/new/internal/DataFlowPrivate.qll (L101-L111)
Let's deal with that issue in an other PR though
I noticed that we don't handle PostUpdateNote very well in the concept tests,
for exmaple for `json.dump(...)` there _should_ have been an `encodeOutput` as
part of the inline expectations.
I'll work on fixing that up in a separate PR, to keep things clean.
I had comitted a bad .expected file it seems, and since the encoding for UTF-8
is named differently from Python 2 to Python 3, we're only going to run the test
for one version.
Limits the behaviour of github/codeql#5614 in two ways:
First, we only consider files that are contained in the source archive.
This prevents unnecessary computation involving files in e.g. the
standard library.
Secondly, we ignore any relative imports (e.g. `from .foo import ...`),
as these only work inside packages anyway.
This fixes an observed performance regression on projects that include
`google-cloud-sdk` as part of their source code.