The problem with `tainted_filelike` not having taint, is that in the call
`ujson.dump(tainted_obj, tainted_filelike)`
there is no PostUpdateNote for `tainted_filelike` :( The reason is that
points-to is not able to resolve the call, so none of the clauses in
`argumentPreUpdateNode` matches
See 08731fc6cf/python/ql/src/semmle/python/dataflow/new/internal/DataFlowPrivate.qll (L101-L111)
Let's deal with that issue in an other PR though
I noticed that we don't handle PostUpdateNote very well in the concept tests,
for exmaple for `json.dump(...)` there _should_ have been an `encodeOutput` as
part of the inline expectations.
I'll work on fixing that up in a separate PR, to keep things clean.
After further research, it was discovered that Flask-Mongoengine has multiple ways of allowing a developer to call the Document class. One way is by directly importing the Document class from the module. Another approach is to get the Document class via a mongoengine instance.
The update to this query checks for cases where the developer gets the Document class via the MongoEngine instance.
Other misc changes include setting the various predicates to private.
A couple of notes with these changes:
- Added TypeTracker pattern to handle subscript expressions. We've found that pymongo supports subscripts expressions when calling databases and collections. To resolve this, we implemented the TypeTracker pattern to catch those subscripts since CodeQL Python API modeling doesn't support subscript expressions.
- After some research, we've discovered that MongoEngine and Flask-MongoEngine utilize MongoClient under-the-hood. This requires us to rewrite the query so that instead of querying these libraries with specific queries, we are instead going to query for usages of MongoClient since all of the libraries we are targeting utilizes MongoClient under-the-hood.
Limits the behaviour of github/codeql#5614 in two ways:
First, we only consider files that are contained in the source archive.
This prevents unnecessary computation involving files in e.g. the
standard library.
Secondly, we ignore any relative imports (e.g. `from .foo import ...`),
as these only work inside packages anyway.
This fixes an observed performance regression on projects that include
`google-cloud-sdk` as part of their source code.