An example (provided by @redsun82) is the string `f"{x:=^20}"`. Parsing
this (with unnamed nodes shown) illustrates the problem:
```
module [0, 0] - [2, 0]
expression_statement [0, 0] - [0, 11]
string [0, 0] - [0, 11]
string_start [0, 0] - [0, 2]
interpolation [0, 2] - [0, 10]
"{" [0, 2] - [0, 3]
expression: named_expression [0, 3] - [0, 9]
name: identifier [0, 3] - [0, 4]
":=" [0, 4] - [0, 6]
ERROR [0, 6] - [0, 7]
"^" [0, 6] - [0, 7]
value: integer [0, 7] - [0, 9]
"}" [0, 9] - [0, 10]
string_end [0, 10] - [0, 11]
```
Observe that we've managed to combine the format specifier token `:` and
the fill character `=` in a single token (which doesn't match the `:` we
expect in the grammar rule), and hence we get a syntax error.
If we change the `=` to some other character (e.g. a `-`), we instead
get
```
module [0, 0] - [2, 0]
expression_statement [0, 0] - [0, 11]
string [0, 0] - [0, 11]
string_start [0, 0] - [0, 2]
interpolation [0, 2] - [0, 10]
"{" [0, 2] - [0, 3]
expression: identifier [0, 3] - [0, 4]
format_specifier: format_specifier [0, 4] - [0, 9]
":" [0, 4] - [0, 5]
"}" [0, 9] - [0, 10]
string_end [0, 10] - [0, 11]
```
and in particular no syntax error.
To fix this, we want to ensure that the `:` is lexed on its own, and the
`token(prec(1, ...))` construction can be used to do exactly this.
Finally, you may wonder why `=` is special here. I think what's going on
is that the lexer knows that `:=` is a token on its own (because it's
used in the walrus operator), and so it greedily consumes the following
`=` with this in mind.
This pull request introduces a new CodeQL query for detecting prompt injection vulnerabilities in Python code targeting AI prompting APIs such as agents and openai. The changes includes a new experimental query, new taint flow and type models, a customizable dataflow configuration, documentation, and comprehensive test coverage.
See https://peps.python.org/pep-0758/ for more details.
We implement this by extending the syntax for exceptions and exception
groups so that the `type` field can now contain either an expression
(which matches the old behaviour), or a comma-separated list of at least
two elements (representing the new behaviour).
We model the latter case using a new node type `exception_list`, which
in `tsg-python` is simply mapped to a tuple. This means it matches the
existing behaviour (when the tuple is surrounded by parentheses)
exactly, hence we don't need to change any other code.
As a consequence of this, however, we cannot directly parse the Python
2.7 syntax `except Foo, e: ...` as `except Foo as e: ...`, as this would
introduce an ambiguity in the grammar. Thus, we have removed support for
the (deprecated) 2.7-style syntax, and only allow `as` to indicate
binding of the exception. The syntax `except Foo, e: ...` continues to
be parsed (in particular, it's not suddenly a syntax error), but it will
be parsed as if it were `except (Foo, e): ...`, which may not give the
correct results.
In principle we could extend the QL libraries to account for this case
(specifically when analysing Python 2 code). In practice, however, I
expect this to have a minor impact on results, and not worth the
additional investment at this time.