See https://peps.python.org/pep-0758/ for more details.
We implement this by extending the syntax for exceptions and exception
groups so that the `type` field can now contain either an expression
(which matches the old behaviour), or a comma-separated list of at least
two elements (representing the new behaviour).
We model the latter case using a new node type `exception_list`, which
in `tsg-python` is simply mapped to a tuple. This means it matches the
existing behaviour (when the tuple is surrounded by parentheses)
exactly, hence we don't need to change any other code.
As a consequence of this, however, we cannot directly parse the Python
2.7 syntax `except Foo, e: ...` as `except Foo as e: ...`, as this would
introduce an ambiguity in the grammar. Thus, we have removed support for
the (deprecated) 2.7-style syntax, and only allow `as` to indicate
binding of the exception. The syntax `except Foo, e: ...` continues to
be parsed (in particular, it's not suddenly a syntax error), but it will
be parsed as if it were `except (Foo, e): ...`, which may not give the
correct results.
In principle we could extend the QL libraries to account for this case
(specifically when analysing Python 2 code). In practice, however, I
expect this to have a minor impact on results, and not worth the
additional investment at this time.
Adds three new AST nodes to the mix:
- `TemplateString` represents a t-string in Python 3.14
- `TemplateStringPart` represents one of the string constituents of a
t-string. (The interpolated expressions are represented as `Expr` nodes,
just like f-strings.)
- `JoinedTemplateString` represents an implicit concatenation of
template strings.
Importantly, we _completely avoid_ the complicated construction we
currently do for format strings (as well as the confusing nomenclature).
No extra injection of empty strings (so that a template string is a
strict alternation of strings and expressions). A `JoinedTemplateString`
simply has a list of template string children, and a `TemplateString`
has a list of "values" which may be either `Expr` or
`TemplateStringPart` nodes.
If we ever find that we actually want the more complicated interface for
these strings, then I would much rather we reconstruct this inside of QL
rather than in the parser.
- Extends the scanner with a new token kind representing the start of a
template string. This is used to distinguish template strings from
regular strings (because only a template string will start with a
`_template_string_start` external token).
- Cleans up the logic surrounding interpolations (and the method names)
so that format strings and template strings behave the same in this
case.
Finally, we add two new node types in the tree-sitter grammar:
- `template_string` behaves like format strings, but is a distinct type
(mainly so that an implicit concatenation between template strings and
regular strings becomes a syntax error).
- `concatenated_template_string` is the counterpart of
`concatenated_string`.
However, internally, the string parts of a template strings are just the
same `string_content` nodes that are used in regular format strings. We
will disambiguate these inside `tsg-python`.
This makes things a bit cleaner.
After this, the only non-private (and non-`LegacyPointsTo`) imports of
`semmle.python.{types,objects,pointsto}.*` are in
`semmle.python.objects.ObjectInternal`, which is reasonable, as that is
the entry point for the entire internal object API.
In hindsight, having a `.getMetrics()` method that just returns `this`
is somewhat weird. It's possible that it predates the existence of the
inline cast, however.
Gets rid of the `getMetrics` methods on the `Function`, `Class`, and
`Module` classes. To access the metrics, one must first import the
`LegacyPointsTo` module, and then either change the type to
`{Function,Class,Module}Metrics` or cast to the appropriate type.
Fixed 73 .ql query files where the @name metadata contained an ending period.
This ensures consistency with the CodeQL query metadata style guidelines.
The `Builtins` module is deeply entwined with points-to, so it would be
nice to not have this dependence. Happily, the only thing we used
`Builtin` for was to get the names of known builtins, and for this we
already maintain such a set of names in
`dataflow.new.internal.Builtins`.
For whatever reason, the CFG node for exceptions and exception groups
was placed with the points-to code. (Probably because a lot of the
predicates depended on points-to.)
However, as it turned out, two of the SSA modules only depended on
non-points-to properties of these nodes, and so it was fairly
straightforward to remove the imports of `LegacyPointsTo` for those
modules.
In the process, I moved the aforementioned CFG node types into
`Flow.qll`, and changed the classes in the `Exceptions` module to the
`...WithPointsTo` form that we introduced elsewhere.
Turns out the `ImportTime` module (despite living in
`semmle.python.types` does not actually depend on points-to, so some of
the `LegacyPointsTo` imports could be replaced or removed.