16 Commits

Author SHA1 Message Date
Taus
f5a06bef4a Merge pull request #19929 from github/tausbn/python-update-tree-sitter-dependency
Python: Update `tree-sitter` dependency
2025-09-17 13:40:13 +02:00
Arthur Baars
5d3ec35e29 Remove non-breaking spaces from code 2025-09-05 09:41:15 +02:00
Taus
235822d782 Python: Improve handling of syntax errors
Rather than relying on matching arbitrary nodes inside tree-sitter-graph
and then checking whether they are of type ERROR or MISSING (which seems
to have stopped working in later versions of tree-sitter), we now
explicitly go through the tree-sitter tree, locating all of the error
and missing nodes along the way. We then add these on to the graph
output in the same format as was previously produced by
tree-sitter-graph.

Note that it's very likely that some of the syntax errors will move
around a bit as a consequence of this change. In general, we don't
expect syntax errors to have stable locations, as small changes in the
grammar can cause an error to appear in a different position, even if
the underlying (erroneous) code has not changed.
2025-09-02 12:41:57 +00:00
Taus
76f15a890c Python: Update tree-sitter dependency
Updates the Python extractor to depend on version 0.24.7 of tree-sitter
(and 0.12.0 of tree-sitter-graph).

A few changes were needed in order to make the code build and run after
updating the dependencies:

- In `main.rs`, the `Language` parameter is now passed as a reference.
- In `python.tsg`, many queries had captures that were not actually used
in the body of the stanza. This is no longer allowed (unless the
captures start with an underscore), as it may indicate an error. To fix
this, I added underscores in the appropriate places (and verified that
none of these unused captures were in fact bugs).
2025-09-02 12:40:20 +00:00
Taus
c5be2a3e2d Python: Allow comments in subscripts
Once again, the interaction between anchors and extras (specifically
comments) was causing trouble.

The root of the problem was the fact that in `a[b]`, we put `b` in the
`index` field of the subscript node, whereas in `a[b,c]`, we
additionally synthesize a `Tuple` node for `b,c` (which matches the
Python AST).

To fix this, we refactored the grammar slightly so as to make that tuple
explicit, such that a subscript node either contains a single expression
or the newly added tuple node. This greatly simplifies the logic.
2025-02-06 14:04:57 +00:00
Taus
2892f0ff48 Merge pull request #17873 from github/tausbn/python-fix-generator-expression-locations
Python: Even more parser fixes
2024-11-01 12:47:19 +01:00
Taus
f75615b913 Merge pull request #17822 from github/tausbn/python-more-parser-fixes
Python: A few more parser fixes
2024-10-30 13:47:10 +01:00
Taus
5d6600e61f Python: Fix generator expression locations
Our logic for detecting the first and last item in a generator
expression was faulty, sometimes matching comments as well. Because
attributes (like `_location_start`) can only be written once, this
caused `tree-sitter-graph` to get unhappy.

To fix this, we now require the first item to be an `expression`, and
the last one to be either a `for_in_clause` or an `if_clause`.
Crucially, `comment` is neither of these, and this prevents the
unfortunate overlap.
2024-10-28 14:53:09 +00:00
Taus
ef60b730ea Python: Fix parenthesized tuple parser bug
We were writing the `parenthesised` attribute twice on tuples, once
because of the explicit parenthetisation, and once because all non-empty
tuples are parenthesised. This made `tree-sitter-graph` unhappy.

To fix this, we now explicitly check whether a tuple is already
parenthesised, and do nothing if that is the case.
2024-10-28 14:49:45 +00:00
Taus
b4ecc7937d Python: Fix some more async parsing problems
Turns out we were not setting the `is_async` field on anything except
`async for` statements. This commit makes it so that we also do this for
`async def` and `async with`, and adds a test that this produces the
same behaviour as the old parser.
2024-10-28 14:44:02 +00:00
Taus
5db601af3c Python: Allow comments in comprehensions
A somewhat complicated solution that necessitated adding a new custom
function to `tsg-python`. See the comments in `python.tsg` for why this
was necessary.
2024-10-23 14:24:47 +00:00
Taus
4f60494019 Python: Support assignments of the form [x,y,z] = w
Surprisingly, the new parser did not support these constructs (and the
relevant test was missing this case), so on files that required the new
parser we were unable to parse this construct.

To fix it, we add `list_pattern` (not to be confused with
`pattern_list`) as a `tree-sitter-python` node that results in a `List`
node in the AST.
2024-10-22 16:06:35 +00:00
Taus
8053e0ed44 Python: Allow list_splats as type annotations
That is, the `*T` in `def foo(*args : *T): ...`.

This is apparently a piece of syntax we did not support correctly until
now.

In terms of the grammar, we simply add `list_splat` as a possible
alternative for `type` (which could previously only be an `expression`).
We also update `python.tsg` to not specify `expression` those places (as
the relevant stanzas will then not work for `list_splat`s).

This syntax is not supported by the old parser, hence we only add a new
parser test for it.
2024-10-22 15:17:12 +00:00
Taus
1cd04c96c7 Python: Fix bug in handling of **kwargs in class bases
This caused a dataset check error on the `python/cpython` database, as
we had a `DictUnpacking` node whose parent was not a `dict_item_list`,
but rather an `expr_list`.

Investigating a bit further revealed that this was because in a
construction like

```python
class C[T](base, foo=bar, **kwargs): ...
```
we were mistakenly adding `**kwargs` to the same list as `base` (which
is just a list of expressions), rather than the same list as `foo=bar`
(which is a list of dictionary items)

The ultimate cause of this was the use of `! name` in `python.tsg` to
distinguish between bases and keyword arguments (only the latter of
which have the `name` field). Because `dictionary_splat` doesn't have a
`name` field either, these were mistakenly put in the wrong list,
leading to the error.

Also, because our previous test of `class` statements did not include a
`**kwargs` construction, we were not checking that the new parser
behaved correctly in this case. For the most part this was not a
problem, but on files that use syntax not supported by the old parser
(like type parameters on classes), this became an issue. This is also
why we did not see this error previously.

To fix this, we added `! value` (which is a field present on
`dictionary_splat` nodes) as a secondary filter, and added a third
stanza to handle `dictionary_splat` nodes.
2024-10-21 15:35:47 +00:00
Taus
55ee3eb36b Python: Add TSG support for type defaults 2024-10-15 11:22:31 +00:00
Taus
6dec323cfc Python: Copy Python extractor to codeql repo 2024-03-07 13:59:16 +00:00