Turns out, `except*` is actually not a token on its own according to the
Python grammar. This means it's legal to write `except *foo: ...`, which
we previously would consider a syntax error.
To fix it, we simply break up the `except*` into two separate tokens.
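As a rough illustration of the two-token view (this is not the
extractor's own tokenizer), the standard-library `tokenize` module
already treats `except` and `*` as separate tokens, so both spellings
yield the same token stream. The helper name `significant_tokens` is
made up for this sketch.

```python
import io
import tokenize

def significant_tokens(source):
    """Return (token type name, text) pairs for names and operators only."""
    result = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in (tokenize.NAME, tokenize.OP):
            result.append((tokenize.tok_name[tok.type], tok.string))
    return result

# The two spellings differ only in where the whitespace sits.
joined = significant_tokens("except* ValueError: pass\n")
spaced = significant_tokens("except *ValueError: pass\n")
print(joined == spaced)
```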
That is, the `*T` in `def foo(*args : *T): ...`.
This is apparently a piece of syntax we did not support correctly until
now.
In terms of the grammar, we simply add `list_splat` as a possible
alternative for `type` (which could previously only be an `expression`).
We also update `python.tsg` to not specify `expression` in those places
(as the relevant stanzas would otherwise not match `list_splat`s).
This syntax is not supported by the old parser, hence we only add a new
parser test for it.
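For reference, a small sketch of how CPython's own `ast` module
represents this annotation. It assumes Python 3.11 or later for the
starred form, and the helper name `vararg_annotation_kind` is invented
for the example; it says nothing about how our parser models the node.

```python
import ast
import sys

def vararg_annotation_kind(source):
    """Return the AST class name of the annotation on the *args parameter."""
    func = ast.parse(source).body[0]
    return type(func.args.vararg.annotation).__name__

# A plain annotation parses on every supported Python version.
print(vararg_annotation_kind("def foo(*args: int): ..."))

# The starred annotation is only valid syntax from Python 3.11 onwards.
if sys.version_info >= (3, 11):
    print(vararg_annotation_kind("def foo(*args: *T): ..."))
```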
This is primarily useful for ensuring that errors where a node does not
have an appropriate context set in `python.tsg` actually have an effect
on the pass/fail status of the parser tests. Previously, these would
just be logged to stdout, but a test could still succeed when there were
errors present.
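One way to make logged errors affect a test result is sketched below.
This is only an assumption about the general shape of such a check:
`ErrorCounter` and `run_checked` are hypothetical names, and the real
test harness may detect errors differently.

```python
import logging

class ErrorCounter(logging.Handler):
    """Counts log records at ERROR level or above."""
    def __init__(self):
        super().__init__(level=logging.ERROR)
        self.count = 0

    def emit(self, record):
        self.count += 1

def run_checked(action, logger):
    """Run `action`; report success only if it logged no errors."""
    counter = ErrorCounter()
    logger.addHandler(counter)
    try:
        action()
    finally:
        logger.removeHandler(counter)
    return counter.count == 0

# Hypothetical logger name, used only for this demonstration.
demo_logger = logging.getLogger("parser_test_demo")
print(run_checked(lambda: demo_logger.error("node has no context"), demo_logger))
```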
Also fixes one of the logging lines in `tsg_parser.py` to be more
consistent with the others.
We need these to return the dominator instead of declaring it in the parameter list, so that we can use it directly to fulfill part of the signature for the SSA library.
We can't rewrite it with an inline predicate, since the SSA module applies the transitive closure operator `*` to it, and that operator does not permit inline predicates.
On Windows, make's path resolution algorithm is incorrect.
It picks up a bazel.exe in PATH that's _after_ a bazel binary.
In particular, on actions, the non-exe binary is a bazelisk
instance, whereas bazel.exe is a bazel (at the current time 7.3.2)
installation.
This means we pick up the wrong bazel version, and
if the differences between the bazel we want and the one we actually
get are too big, the build fails.
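To make the expected behaviour concrete, here is a minimal sketch of
in-order PATH resolution, where an earlier directory wins regardless of
file extension. The function `first_match_in_path` and the temporary
directories are illustrative assumptions; this is neither make's
algorithm nor the workaround used in this change.

```python
import os
import tempfile

def first_match_in_path(name, path_dirs, extensions=("", ".exe")):
    """Return the first file matching name (plus any extension) in PATH order."""
    for directory in path_dirs:
        for ext in extensions:
            candidate = os.path.join(directory, name + ext)
            if os.path.isfile(candidate):
                return candidate
    return None

with tempfile.TemporaryDirectory() as early, tempfile.TemporaryDirectory() as late:
    open(os.path.join(early, "bazel"), "w").close()     # stands in for bazelisk
    open(os.path.join(late, "bazel.exe"), "w").close()  # stands in for plain bazel
    chosen = first_match_in_path("bazel", [early, late])
    # The entry from the earlier directory should be chosen.
    print(os.path.dirname(chosen) == early)
```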
This caused a dataset check error on the `python/cpython` database, as
we had a `DictUnpacking` node whose parent was not a `dict_item_list`,
but rather an `expr_list`.
Investigating a bit further revealed that this was because in a
construction like
```python
class C[T](base, foo=bar, **kwargs): ...
```
we were mistakenly adding `**kwargs` to the same list as `base` (which
is just a list of expressions), rather than the same list as `foo=bar`
(which is a list of dictionary items).
The ultimate cause of this was the use of `! name` in `python.tsg` to
distinguish between bases and keyword arguments (only the latter of
which have the `name` field). Because `dictionary_splat` doesn't have a
`name` field either, these were mistakenly put in the wrong list,
leading to the error.
Also, because our previous test of `class` statements did not include a
`**kwargs` construction, we were not checking that the new parser
behaved correctly in this case. For the most part this was not a
problem, but on files that use syntax not supported by the old parser
(like type parameters on classes), this became an issue. This is also
why we did not see this error previously.
To fix this, we added `! value` (which is a field present on
`dictionary_splat` nodes) as a secondary filter, and added a third
stanza to handle `dictionary_splat` nodes.
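The effect of the two filters can be modelled in a few lines of Python.
This is a toy model, not `python.tsg` itself: nodes are plain dicts, the
function names are invented, and it assumes `keyword_argument` nodes
carry both a `name` and a `value` field.

```python
def classify_with_name_only(node):
    """Old filter: anything without a `name` field is treated as a base."""
    return "keywords" if "name" in node["fields"] else "bases"

def classify_with_name_and_value(node):
    """New filters: three cases, mirroring the three stanzas."""
    if "name" not in node["fields"] and "value" not in node["fields"]:
        return "bases"       # plain expression such as `base`
    if "name" in node["fields"]:
        return "keywords"    # keyword argument such as `foo=bar`
    if node["type"] == "dictionary_splat":
        return "keywords"    # `**kwargs`
    return None

# Arguments of `class C[T](base, foo=bar, **kwargs): ...`
arguments = [
    {"type": "identifier", "fields": set()},
    {"type": "keyword_argument", "fields": {"name", "value"}},
    {"type": "dictionary_splat", "fields": {"value"}},
]

print([classify_with_name_only(n) for n in arguments])
print([classify_with_name_and_value(n) for n in arguments])
```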
Here's an example of one of these errors:
```
INVALID_KEY predicate py_cobjectnames(@py_cobject obj, string name)
The key set {obj} does not functionally determine all fields. Here is a
pair of tuples that agree on the key set but differ at index 1: Tuple 1
in row 63874: (72088,"u'<X>'") Tuple 2 in row 63875: (72088,"u'<?>'")
```
(Here, the substring `X` should really be the Unicode character U+FFFD,
but for some reason I'm not allowed to put that in this commit message.)
Inside the extractor, we assign IDs based on the string type (bytestring
or Unicode) and a hash of the UTF-8 encoded content of the string. In
this case, however, certain _different_ strings were receiving the same
hash, due to replacement characters in the encoding process.
In particular, we were converting unencodable characters to question
marks in one place, and to U+FFFD in another place. This caused a
discrepancy that led to the dataset check error.
To fix this, we put in a custom error handler that always puts the
U+FFFD character in place of unencodable characters. With this, the
strings now agree, and hence there is no clash.
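A minimal sketch of such a handler, using the standard `codecs` error
handler registry. The handler name `fffd_replace` and the helper
`encode_lossy` are assumptions for illustration and need not match what
the extractor actually registers. Note that the handler returns bytes,
since the UTF-8 encoder rejects non-ASCII `str` replacements.

```python
import codecs

FFFD_UTF8 = "\ufffd".encode("utf-8")

def _replace_with_fffd(exc):
    """Substitute U+FFFD for every character that cannot be encoded."""
    if isinstance(exc, UnicodeEncodeError):
        return (FFFD_UTF8 * (exc.end - exc.start), exc.end)
    raise exc

codecs.register_error("fffd_replace", _replace_with_fffd)

def encode_lossy(text):
    """Encode to UTF-8, mapping unencodable characters to U+FFFD."""
    return text.encode("utf-8", "fffd_replace")

lone_surrogate = "<\ud800>"  # cannot be encoded as UTF-8
# The built-in "replace" handler substitutes a question mark when encoding,
# which is the source of the mismatch described above.
print(lone_surrogate.encode("utf-8", "replace"))
print(encode_lossy(lone_surrogate) == "<\ufffd>".encode("utf-8"))
```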