To ensure that this query works against numerous usages of libraries such as PyMongo, Flask PyMongo, Mongoengine, and Flask Mongoengine, I've added a variety of query tests to test against. These tests deal with scenarious such as:
- Subscript expressions
- Mongoengine instances and Document subclasses
- Mongoengine connection usage
- And more...
After further research, it was discovered that Flask-Mongoengine has multiple ways of allowing a developer to call the Document class. One way is by directly importing the Document class from the module. Another approach is to get the Document class via a mongoengine instance.
The update to this query checks for cases where the developer gets the Document class via the MongoEngine instance.
Other misc changes include setting the various predicates to private.
A couple of notes with these changes:
- Added TypeTracker pattern to handle subscript expressions. We've found that pymongo supports subscripts expressions when calling databases and collections. To resolve this, we implemented the TypeTracker pattern to catch those subscripts since CodeQL Python API modeling doesn't support subscript expressions.
- After some research, we've discovered that MongoEngine and Flask-MongoEngine utilize MongoClient under-the-hood. This requires us to rewrite the query so that instead of querying these libraries with specific queries, we are instead going to query for usages of MongoClient since all of the libraries we are targeting utilizes MongoClient under-the-hood.
I had comitted a bad .expected file it seems, and since the encoding for UTF-8
is named differently from Python 2 to Python 3, we're only going to run the test
for one version.
Limits the behaviour of github/codeql#5614 in two ways:
First, we only consider files that are contained in the source archive.
This prevents unnecessary computation involving files in e.g. the
standard library.
Secondly, we ignore any relative imports (e.g. `from .foo import ...`),
as these only work inside packages anyway.
This fixes an observed performance regression on projects that include
`google-cloud-sdk` as part of their source code.