An example (provided by @redsun82) is the string `f"{x:=^20}"`. Parsing
this (with unnamed nodes shown) illustrates the problem:
```
module [0, 0] - [2, 0]
  expression_statement [0, 0] - [0, 11]
    string [0, 0] - [0, 11]
      string_start [0, 0] - [0, 2]
      interpolation [0, 2] - [0, 10]
        "{" [0, 2] - [0, 3]
        expression: named_expression [0, 3] - [0, 9]
          name: identifier [0, 3] - [0, 4]
          ":=" [0, 4] - [0, 6]
          ERROR [0, 6] - [0, 7]
            "^" [0, 6] - [0, 7]
          value: integer [0, 7] - [0, 9]
        "}" [0, 9] - [0, 10]
      string_end [0, 10] - [0, 11]
```
Observe that the lexer has combined the format specifier token `:` and
the fill character `=` into a single `:=` token (which doesn't match the
`:` we expect in the grammar rule), and hence we get a syntax error.
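For comparison, here is a small sketch of how CPython itself reads the
same string, using only the standard `ast` module. The variable names
(`source`, `field`, `x`) are just for illustration; the point is that
the reference parser sees a plain name followed by the format spec
`=^20`, with no walrus involved.
```python
import ast

source = 'f"{x:=^20}"'
tree = ast.parse(source, mode="eval")

# An f-string parses to a JoinedStr; its only part here is the {...} field.
field = tree.body.values[0]

# The expression part is the plain name `x`, not a walrus (NamedExpr).
print(type(field.value).__name__)

# The format spec is itself a JoinedStr holding the literal text after the `:`.
print(field.format_spec.values[0].value)

# At runtime, `=` is the fill character, `^` centres, and 20 is the width.
x = "hi"
print(f"{x:=^20}")
```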
If we change the `=` to some other character (e.g. a `-`), we instead
get
```
module [0, 0] - [2, 0]
  expression_statement [0, 0] - [0, 11]
    string [0, 0] - [0, 11]
      string_start [0, 0] - [0, 2]
      interpolation [0, 2] - [0, 10]
        "{" [0, 2] - [0, 3]
        expression: identifier [0, 3] - [0, 4]
        format_specifier: format_specifier [0, 4] - [0, 9]
          ":" [0, 4] - [0, 5]
        "}" [0, 9] - [0, 10]
      string_end [0, 10] - [0, 11]
```
and in particular no syntax error.
To fix this, we want to ensure that the `:` is lexed on its own, and the
`token(prec(1, ...))` construction can be used to do exactly this.
Finally, you may wonder why `=` is special here. I think what's going on
is that the lexer knows that `:=` is a token on its own (because it's
used in the walrus operator), and so it greedily consumes the following
`=` with this in mind.
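To make that guess concrete, here is a toy longest-match lexer. It is
only a model of the behaviour I suspect, not tree-sitter's actual lexing
algorithm, and `longest_match` is a hypothetical helper written for this
message. Restricting the candidate list in the second call is a stand-in
for the effect we want from the fix; it does not show how
`token(prec(1, ...))` works internally.
```python
def longest_match(text, pos, valid_tokens):
    """Return the longest token in valid_tokens that text starts with at pos."""
    candidates = [tok for tok in valid_tokens if text.startswith(tok, pos)]
    return max(candidates, key=len) if candidates else None


spec = ":=^20"  # the text following `x` in f"{x:=^20}"

# Both the walrus token and the bare colon are on offer: the longer one wins.
greedy = longest_match(spec, 0, [":", ":=", "="])

# Assumed effect of the fix: only the bare colon is considered here.
fixed = longest_match(spec, 0, [":"])

# With `-` as the fill character there is no ":-" token to compete.
dashed = longest_match(":-^20", 0, [":", ":=", "="])

print(greedy, fixed, dashed)
```
In this model the `=` case goes wrong only because a longer token
happens to start with `:`, which matches what we saw with `-` above.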