This one is a bit more involved. Of note is the fact that it at present
only uses local flow when determining the origin of some value (whereas
the points-to version used global flow). It may be desirable to rewrite
this query to use global data-flow, but this should be done with some
care (as using "all unhashable objects" as the set of sources is
somewhat iffy with respect to performance). For that reason, I'm
sticking to mostly local flow (except for well behaved things like types
and built-ins).
For this one we actually lose a test result. However, this is kind of to
be expected since we no longer have the "precise" MRO that the points-to
analysis computes.
Honestly, I'm on the fence about even keeping this query at all. It
seems like it might be superfluous in a world with good Python type
checking.
Same comment as for the preceding commit. We lose one test result due to
the fact that we don't know what to do about `for ... in 1` (because `1`
is an instance of a built-in). I'm going to defer addressing this until
we get some modelling of built-in types.
Uses the new `DuckTyping` module to handle recognising whether a class
is a container or not. Only trivial test changes (one version uses
"class", the other "Class").
Note that the ported query has no understanding of built-in classes. At
some point we'll likely want to replace `hasUnresolvedBase` (which will
hold for any class that extends a built-in) with something that's aware
of the built-in classes.
Removes the use of points-to for accessing various built-ins from three
of the queries. In order for this to work I had to extend the lists of
known built-ins slightly.
The ones that no longer require points-to no longer import
`LegacyPointsTo`. The ones that do use the specific
`...MetricsWithPointsTo` classes that are applicable.
Fixes the test failures that arose from making `ExtractedArgumentNode`
local.
For the consistency checks, we now explicitly exclude the
`ExtractedArgumentNode`s (now much more plentiful due to the
overapproximation) that don't have a corresponding `getCallArg` tuple.
For various queries/tests using `instanceof ArgumentNode`, we instead us
`isArgumentNode`, which explicitly filters out the ones for which
`isArgumentOf` doesn't hold (which, again, is the case for most of the
nodes in the overapproximation).
This pull request introduces a new CodeQL query for detecting prompt injection vulnerabilities in Python code targeting AI prompting APIs such as agents and openai. The changes includes a new experimental query, new taint flow and type models, a customizable dataflow configuration, documentation, and comprehensive test coverage.
See https://docs.python.org/3/library/compression.zstd.html for
information about this library.
As far as I can tell, the `zstd` library is not vulnerable to things
like ZipSlip, but it _could_ be vulnerable to a decompression bomb
attack, so I extended those models accordingly.
In hindsight, having a `.getMetrics()` method that just returns `this`
is somewhat weird. It's possible that it predates the existence of the
inline cast, however.
Fixed 73 .ql query files where the @name metadata contained an ending period.
This ensures consistency with the CodeQL query metadata style guidelines.
For whatever reason, the CFG node for exceptions and exception groups
was placed with the points-to code. (Probably because a lot of the
predicates depended on points-to.)
However, as it turned out, two of the SSA modules only depended on
non-points-to properties of these nodes, and so it was fairly
straightforward to remove the imports of `LegacyPointsTo` for those
modules.
In the process, I moved the aforementioned CFG node types into
`Flow.qll`, and changed the classes in the `Exceptions` module to the
`...WithPointsTo` form that we introduced elsewhere.