This frees `Class.qll`, `Exprs.qll`, and `Function.qll` from the
clutches of points-to. For the somewhat complicated setup with
`getLiteralObject` (an abstract method), I opted for a slightly ugly but
workable solution of just defining a predicate on `ImmutableLiteral`
that inlines each predicate body, special-cased to the specific instance
to which it applies.
Moves the existing points-to predicates to the newly added class
`ControlFlowNodeWithPointsTo` which resides in the `LegacyPointsTo`
module.
(Existing code that uses these predicates should import this module, and
references to `ControlFlowNode` should be changed to
`ControlFlowNodeWithPointsTo`.)
Also updates all existing points-to based code to do just this.
Also fixes an issue with the return type annotations that caused these
to not work properly.
Currently, annotated assignments don't work properly, due to the fact
that our flow relation doesn't consider flow going to the "type" part of
an annotated assignment. This means that in `x : Foo`, we do correctly
note that `x` is annotated with `Foo`, but we have no idea what `Foo`
is, since it has no incoming flow.
To fix this we should probably just extend the flow relation, but this
may need to be done with some care, so I have left it as future work.
Adds support for tracking instances via type annotations. Also adds a
convenience method to the newly added `Annotation` class,
`getAnnotatedExpression`, that returns the expression that is annotated
with the given type. For return annotations this is any value returned
from the annotated function in question.
Co-authored-by: Napalys Klicius <napalys@github.com>
Fixes the false positive reported in
https://github.com/github/codeql/issues/18910
Adds a new `Annotation` class (subclass of `Expr`) which encompasses all
possible kinds of annotations in Python.
Using this, we look for string literals which are part of an annotation,
and which have the same content as the name of a (potentially) unused
global variable, and in that case we do not produce an alert.
In future, we may want to support inspecting such string literals more
deeply (e.g. to support stuff like "list[unused_var]"), but I think for
now this level of support is sufficient.
Does a few things:
- Renames `StrConst` to `StringLiteral`, and deprecates the former.
- Also deprecates `Str`.
- Adds an override of `StringLiteral::toString` making it output
`"StringLiteral"` rather than the inherited `"Str"`. This ensures that
the AST viewer shows these nodes as the former type, not the latter.
There are a large number of uses of `StrConst` in the codebase. These
will be fixed in a later commit.
Before:
```
Tuple counts for Exprs::Call::getAKeyword_dispred#ff#antijoin_rhs/3@7bc202ij after 9s:
1 ~0% {1} r1 = CONSTANT(unique int)[2]
4244385 ~2% {1} r2 = JOIN r1 WITH py_dict_items_10#join_rhs ON FIRST 1 OUTPUT Rhs.1 'arg0'
4244352 ~3% {3} r3 = JOIN r2 WITH AstGenerated::Call_::getNamedArg_dispred#ffb_201#join_rhs ON FIRST 1 OUTPUT Rhs.1 'arg1', Lhs.0 'arg0', Rhs.2 'arg2'
66618690 ~3% {5} r4 = JOIN r3 WITH AstGenerated::Call_::getNamedArg_dispred#ffb ON FIRST 1 OUTPUT Lhs.1 'arg0', Lhs.0 'arg1', Lhs.2 'arg2', Rhs.1, Rhs.2
31187133 ~0% {5} r5 = SELECT r4 ON In.3 < In.2 'arg2'
31187133 ~1% {5} r6 = SCAN r5 OUTPUT In.4, 0, In.0 'arg0', In.1 'arg1', In.2 'arg2'
0 ~0% {3} r7 = JOIN r6 WITH py_dict_items ON FIRST 2 OUTPUT Lhs.2 'arg0', Lhs.3 'arg1', Lhs.4 'arg2'
return r7
Tuple counts for Exprs::Call::getAKeyword_dispred#ff/2@1dc9468b after 421ms:
1 ~0% {1} r1 = CONSTANT(unique int)[2]
4244385 ~2% {1} r2 = JOIN r1 WITH py_dict_items_10#join_rhs ON FIRST 1 OUTPUT Rhs.1 'result'
4244352 ~0% {3} r3 = JOIN r2 WITH AstGenerated::Call_::getNamedArg_dispred#ffb_201#join_rhs ON FIRST 1 OUTPUT Lhs.0 'result', Rhs.1 'this', Rhs.2
4244352 ~0% {3} r4 = r3 AND NOT Exprs::Call::getAKeyword_dispred#ff#antijoin_rhs(Lhs.0 'result', Lhs.1 'this', Lhs.2)
4244352 ~6% {2} r5 = SCAN r4 OUTPUT In.1 'this', In.0 'result'
return r5
```
Oof. All that work to produce zero tuples. Luckily we can improve
matters somewhat.
Basically, there's no reason to test _all_ dictionary unpackings, since
we're only interested in a lower bound. Thus, we can use `min` instead
which is much more efficient. For convenience I factored this into its
own (private) helper predicate.
Now the tuple counts look as follows:
```
Tuple counts for Exprs::Call::getMinimumUnpackingIndex_dispred#ff#min_range/2@39b0e9sm after 1ms:
246 ~0% {2} r1 = JOIN Keywords::DictUnpackingOrKeyword#class#f#shared WITH AstGenerated::Call_::getNamedArg_dispred#ffb_201#join_rhs ON FIRST 1 OUTPUT Rhs.1 'arg0', Rhs.2 'arg1'
return r1
Registering Exprs::Call::getMinimumUnpackingIndex_dispred#ff#min_range/2@39b0e9sm + with content 9ea2f123k8necpu015v6tpsc2t1
>>> Created relation Exprs::Call::getMinimumUnpackingIndex_dispred#ff#min_range/2@39b0e9sm with 246 rows.
Starting to evaluate predicate Exprs::Call::getMinimumUnpackingIndex_dispred#ff#min_term/3@9f4ca5g8
Tuple counts for Exprs::Call::getMinimumUnpackingIndex_dispred#ff#min_term/3@9f4ca5g8 after 0ms:
246 ~2% {3} r1 = JOIN Keywords::DictUnpackingOrKeyword#class#f#shared WITH AstGenerated::Call_::getNamedArg_dispred#ffb_201#join_rhs ON FIRST 1 OUTPUT Rhs.1 'arg0', Rhs.2 'arg2', Rhs.2 'arg2'
return r1
Tuple counts for Exprs::Call::getAKeyword_dispred#ff/2@000a0alb after 906ms:
1 ~0% {1} r1 = CONSTANT(unique int)[2]
4244385 ~2% {1} r2 = JOIN r1 WITH py_dict_items_10#join_rhs ON FIRST 1 OUTPUT Rhs.1 'result'
4244352 ~0% {3} r3 = JOIN r2 WITH AstGenerated::Call_::getNamedArg_dispred#ffb_201#join_rhs ON FIRST 1 OUTPUT Lhs.0 'result', Rhs.1 'this', Rhs.2
4244280 ~0% {3} r4 = r3 AND NOT Exprs::Call::getMinimumUnpackingIndex_dispred#ff_0#antijoin_rhs(Lhs.1 'this')
4244280 ~6% {2} r5 = SCAN r4 OUTPUT In.1 'this', In.0 'result'
4244352 ~3% {3} r6 = JOIN r2 WITH AstGenerated::Call_::getNamedArg_dispred#ffb_201#join_rhs ON FIRST 1 OUTPUT Rhs.1 'this', Lhs.0 'result', Rhs.2
72 ~4% {4} r7 = JOIN r6 WITH Exprs::Call::getMinimumUnpackingIndex_dispred#ff ON FIRST 1 OUTPUT Lhs.1 'result', Lhs.0 'this', Lhs.2, Rhs.1
72 ~4% {4} r8 = SELECT r7 ON In.2 <= In.3
72 ~0% {2} r9 = SCAN r8 OUTPUT In.1 'this', In.0 'result'
4244352 ~6% {2} r10 = r5 UNION r9
return r10
```
This is not the perfect join order (note the similarity between `r3`
and `r6`) but overall it's a win.
- new syntactic category `Pattern` (in `Patterns.qll`)
- subpatterns available on statments
- new statements `MatchStmt` and `Case`
(`Match` would conflict with the shared ReDoS library)
- new expression `Guard`
- support for pattern lists