A couple of notes with these changes:
- Added TypeTracker pattern to handle subscript expressions. We've found that pymongo supports subscripts expressions when calling databases and collections. To resolve this, we implemented the TypeTracker pattern to catch those subscripts since CodeQL Python API modeling doesn't support subscript expressions.
- After some research, we've discovered that MongoEngine and Flask-MongoEngine utilize MongoClient under-the-hood. This requires us to rewrite the query so that instead of querying these libraries with specific queries, we are instead going to query for usages of MongoClient since all of the libraries we are targeting utilizes MongoClient under-the-hood.
ObjectId is a sanitizer used to sanitize strings into valid MongoDB ids. During research we've found that this method is used.
ObjectId returns a string representing an id. If at any time ObjectId can't parse it's input (like when a tainted dict in passed in), then ObjectId will throw an error preventing the query from running.
After some research, we discovered that any keyword argument passed to the objects method will result in NoSQL injection. This includes scenarios where we have the following:
objects(name_of_model_attribute=unsanitized_user_input)
One of those cases where I _wish_ `pragma[inline]` also meant "don't
join on the stuff inside this predicate -- it's inlined for a reason".
Unsurprisingly, joining on the scope first works poorly.
Many instances of `lookup` are restricted by the presence of
`attributeRequired`, but this does not work well if we join on
`name`. A few instances of `only_bind_into` prevents this.
This has no effect on the current compilation (indeed,
`ssa_filter_definition_bool` is not currently inlined), but will
prevent this from ever occurring, should the heuristics for inlining
ever change...