codeql

mirror of https://github.com/github/codeql.git synced 2026-02-12 05:01:06 +01:00

Author	SHA1	Message	Date
Taus	bac356c9a1	Python: Regenerate parser files	2026-02-05 13:46:59 +00:00
Taus	68c1a3d389	Python: Fix syntax error when `=` is used as a format fill character An example (provided by @redsun82) is the string `f"{x:=^20}"`. Parsing this (with unnamed nodes shown) illustrates the problem: ``` module [0, 0] - [2, 0] expression_statement [0, 0] - [0, 11] string [0, 0] - [0, 11] string_start [0, 0] - [0, 2] interpolation [0, 2] - [0, 10] "{" [0, 2] - [0, 3] expression: named_expression [0, 3] - [0, 9] name: identifier [0, 3] - [0, 4] ":=" [0, 4] - [0, 6] ERROR [0, 6] - [0, 7] "^" [0, 6] - [0, 7] value: integer [0, 7] - [0, 9] "}" [0, 9] - [0, 10] string_end [0, 10] - [0, 11] ``` Observe that we've managed to combine the format specifier token `:` and the fill character `=` in a single token (which doesn't match the `:` we expect in the grammar rule), and hence we get a syntax error. If we change the `=` to some other character (e.g. a `-`), we instead get ``` module [0, 0] - [2, 0] expression_statement [0, 0] - [0, 11] string [0, 0] - [0, 11] string_start [0, 0] - [0, 2] interpolation [0, 2] - [0, 10] "{" [0, 2] - [0, 3] expression: identifier [0, 3] - [0, 4] format_specifier: format_specifier [0, 4] - [0, 9] ":" [0, 4] - [0, 5] "}" [0, 9] - [0, 10] string_end [0, 10] - [0, 11] ``` and in particular no syntax error. To fix this, we want to ensure that the `:` is lexed on its own, and the `token(prec(1, ...))` construction can be used to do exactly this. Finally, you may wonder why `=` is special here. I think what's going on is that the lexer knows that `:=` is a token on its own (because it's used in the walrus operator), and so it greedily consumes the following `=` with this in mind.	2026-02-05 13:45:54 +00:00
Taus	659ec3999b	Mark generated files as generated	2026-01-12 15:24:01 +00:00
Taus	4db60df9dd	Python: Regenerate parser files	2026-01-06 13:40:38 +00:00
Taus	2380bfd459	Python: Add support for PEP-758 exception syntax See https://peps.python.org/pep-0758/ for more details. We implement this by extending the syntax for exceptions and exception groups so that the `type` field can now contain either an expression (which matches the old behaviour), or a comma-separated list of at least two elements (representing the new behaviour). We model the latter case using a new node type `exception_list`, which in `tsg-python` is simply mapped to a tuple. This means it matches the existing behaviour (when the tuple is surrounded by parentheses) exactly, hence we don't need to change any other code. As a consequence of this, however, we cannot directly parse the Python 2.7 syntax `except Foo, e: ...` as `except Foo as e: ...`, as this would introduce an ambiguity in the grammar. Thus, we have removed support for the (deprecated) 2.7-style syntax, and only allow `as` to indicate binding of the exception. The syntax `except Foo, e: ...` continues to be parsed (in particular, it's not suddenly a syntax error), but it will be parsed as if it were `except (Foo, e): ...`, which may not give the correct results. In principle we could extend the QL libraries to account for this case (specifically when analysing Python 2 code). In practice, however, I expect this to have a minor impact on results, and not worth the additional investment at this time.	2026-01-06 13:40:37 +00:00
Taus	28e733e335	Python: Support template strings in rest of extractor Adds three new AST nodes to the mix: - `TemplateString` represents a t-string in Python 3.14 - `TemplateStringPart` represents one of the string constituents of a t-string. (The interpolated expressions are represented as `Expr` nodes, just like f-strings.) - `JoinedTemplateString` represents an implicit concatenation of template strings. Importantly, we _completely avoid_ the complicated construction we currently do for format strings (as well as the confusing nomenclature). No extra injection of empty strings (so that a template string is a strict alternation of strings and expressions). A `JoinedTemplateString` simply has a list of template string children, and a `TemplateString` has a list of "values" which may be either `Expr` or `TemplateStringPart` nodes. If we ever find that we actually want the more complicated interface for these strings, then I would much rather we reconstruct this inside of QL rather than in the parser.	2025-12-16 23:57:58 +01:00
Taus	cd7ae34380	Python: Regenerate parser files	2025-12-16 23:57:58 +01:00
Taus	7768ebe8b8	Python: Add parser support for template strings - Extends the scanner with a new token kind representing the start of a template string. This is used to distinguish template strings from regular strings (because only a template string will start with a `_template_string_start` external token). - Cleans up the logic surrounding interpolations (and the method names) so that format strings and template strings behave the same in this case. Finally, we add two new node types in the tree-sitter grammar: - `template_string` behaves like format strings, but is a distinct type (mainly so that an implicit concatenation between template strings and regular strings becomes a syntax error). - `concatenated_template_string` is the counterpart of `concatenated_string`. However, internally, the string parts of a template string are just the same `string_content` nodes that are used in regular format strings. We will disambiguate these inside `tsg-python`.	2025-12-16 23:57:58 +01:00
Taus	f5a06bef4a	Merge pull request #19929 from github/tausbn/python-update-tree-sitter-dependency Python: Update `tree-sitter` dependency	2025-09-17 13:40:13 +02:00
Arthur Baars	5d3ec35e29	Remove non-breaking spaces from code	2025-09-05 09:41:15 +02:00
Taus	13a93c7e32	Python: Add suggestions from Copilot	2025-09-03 11:55:49 +00:00
Taus	235822d782	Python: Improve handling of syntax errors Rather than relying on matching arbitrary nodes inside tree-sitter-graph and then checking whether they are of type ERROR or MISSING (which seems to have stopped working in later versions of tree-sitter), we now explicitly go through the tree-sitter tree, locating all of the error and missing nodes along the way. We then add these on to the graph output in the same format as was previously produced by tree-sitter-graph. Note that it's very likely that some of the syntax errors will move around a bit as a consequence of this change. In general, we don't expect syntax errors to have stable locations, as small changes in the grammar can cause an error to appear in a different position, even if the underlying (erroneous) code has not changed.	2025-09-02 12:41:57 +00:00
Taus	76f15a890c	Python: Update `tree-sitter` dependency Updates the Python extractor to depend on version 0.24.7 of tree-sitter (and 0.12.0 of tree-sitter-graph). A few changes were needed in order to make the code build and run after updating the dependencies: - In `main.rs`, the `Language` parameter is now passed as a reference. - In `python.tsg`, many queries had captures that were not actually used in the body of the stanza. This is no longer allowed (unless the captures start with an underscore), as it may indicate an error. To fix this, I added underscores in the appropriate places (and verified that none of these unused captures were in fact bugs).	2025-09-02 12:40:20 +00:00
Taus	ad53518644	Python: Regenerate parser files	2025-06-26 15:34:44 +00:00
Taus	e04821e9e3	Python: Allow use of `match` as an identifier This previously only worked in certain circumstances. In particular, assignments such as `match[1] = ...` or even just `match[1]` would fail to parse correctly. Fixing this turned out to be less trivial than anticipated. Consider the fact that ``` match [1]: case (...) ``` can either look like the start of a `match` statement, or it could be a type ascription, ascribing the value of `case(...)` (a call) to the item at index 1 of `match`. To fix this, then, we give `match` the identifier and `match` the statement the same precedence in the grammar, and additionally mark a conflict between `match_statement` and `primary_expression`. This causes the conflict to be resolved dynamically, and seems to do the right thing in all cases.	2025-06-26 15:33:00 +00:00
Paolo Tranquilli	1bcc6ddb32	Rust/Ruby/Python: apply clippy lints	2025-02-25 13:21:28 +01:00
Paolo Tranquilli	6089a75262	Rust/Ruby/Python: format code	2025-02-25 13:19:03 +01:00
Paolo Tranquilli	e8799e346d	Rust/Python: fix edition-related errors	2025-02-25 13:16:58 +01:00
Paolo Tranquilli	eff87d24fa	Rust/Ruby/Python: update rustc and edition	2025-02-25 13:15:19 +01:00
Paolo Tranquilli	38efd4a8a2	Python: downgrade `tree-sitter` back to `0.20.4`	2025-02-18 10:03:18 +01:00
Paolo Tranquilli	342bff6125	Python: undo tree-sitter update	2025-02-17 15:52:45 +01:00
Paolo Tranquilli	91b3d108bb	Python: upgrade cargo dependencies This required some code changes because of some breaking changes in `clap` and `tree-sitter`. Also needed to assign a new bazel repo name to the `crates_vendor` to avoid name conflicts in `MODULE.bazel`.	2025-02-17 10:56:36 +01:00
Paolo Tranquilli	cc939e64fd	Python: fix bazel rule	2025-02-07 14:42:26 +01:00
Taus	7124e80f28	Python: Regenerate parser files	2025-02-06 14:05:40 +00:00
Taus	c5be2a3e2d	Python: Allow comments in subscripts Once again, the interaction between anchors and extras (specifically comments) was causing trouble. The root of the problem was the fact that in `a[b]`, we put `b` in the `index` field of the subscript node, whereas in `a[b,c]`, we additionally synthesize a `Tuple` node for `b,c` (which matches the Python AST). To fix this, we refactored the grammar slightly so as to make that tuple explicit, such that a subscript node either contains a single expression or the newly added tuple node. This greatly simplifies the logic.	2025-02-06 14:04:57 +00:00
Cornelius Riemenschneider	a66f8209f9	Rust: Vendor 3rdparty dependencies. We've been observing some performance issues using crate_universe on CI. Therefore, we're moving to vendor the auto-generated BUILD files in our repository. This should provide a nice speed boost, while getting rid of the complexity of the "rust cache" job we've been using when we had a lot of git dependencies. This PR includes a vendor script, and I'll put up a CI job internally that runs that vendor script on Cargo.toml and Cargo.lock changes, to check that the vendored files are in sync.	2024-11-13 13:22:14 +01:00
Taus	2892f0ff48	Merge pull request #17873 from github/tausbn/python-fix-generator-expression-locations Python: Even more parser fixes	2024-11-01 12:47:19 +01:00
Taus	f75615b913	Merge pull request #17822 from github/tausbn/python-more-parser-fixes Python: A few more parser fixes	2024-10-30 13:47:10 +01:00
Taus	5d6600e61f	Python: Fix generator expression locations Our logic for detecting the first and last item in a generator expression was faulty, sometimes matching comments as well. Because attributes (like `_location_start`) can only be written once, this caused `tree-sitter-graph` to get unhappy. To fix this, we now require the first item to be an `expression`, and the last one to be either a `for_in_clause` or an `if_clause`. Crucially, `comment` is neither of these, and this prevents the unfortunate overlap.	2024-10-28 14:53:09 +00:00
Taus	ef60b730ea	Python: Fix parenthesized tuple parser bug We were writing the `parenthesised` attribute twice on tuples, once because of the explicit parenthesisation, and once because all non-empty tuples are parenthesised. This made `tree-sitter-graph` unhappy. To fix this, we now explicitly check whether a tuple is already parenthesised, and do nothing if that is the case.	2024-10-28 14:49:45 +00:00
Taus	b4ecc7937d	Python: Fix some more `async` parsing problems Turns out we were not setting the `is_async` field on anything except `async for` statements. This commit makes it so that we also do this for `async def` and `async with`, and adds a test that this produces the same behaviour as the old parser.	2024-10-28 14:44:02 +00:00
Taus	e710c0a6bf	Python: Regenerate parser files	2024-10-28 14:44:01 +00:00
Taus	ac87868097	Python: Fix parsing of `await` inside expressions Found when parsing `Lib/test/test_coroutines.py` using the new parser. For whatever reason, having `await` be an `expression` (with an argument of the same kind) resulted in a bad parse. Consulting the official grammar, we see that `await` should actually be a `primary_expression` instead. This is also more in line with the other unary operators, whose precedence is shared by the `await` syntax.	2024-10-28 14:44:01 +00:00
Taus	1e51703ce9	Python: Allow escaped quotes/backslashes in raw strings Quoting the Python documentation (last paragraph of https://docs.python.org/3/reference/lexical_analysis.html#escape-sequences): "Even in a raw literal, quotes can be escaped with a backslash, but the backslash remains in the result; for example, r"\"" is a valid string literal consisting of two characters: a backslash and a double quote; r"\" is not a valid string literal (even a raw string cannot end in an odd number of backslashes)." We did not handle this correctly in the scanner, as we only consumed the backslash but not the following single or double quote, resulting in that character getting interpreted as the end of the string. To fix this, we do a second lookahead after consuming the backslash, and if the next character is the end character for the string, we advance the lexer across it as well. Similarly, backslashes in raw strings can escape other backslashes. Thus, for a string like '\\' we must consume the second backslash, otherwise we'll interpret it as escaping the end quote.	2024-10-28 14:40:24 +00:00
Taus	5db601af3c	Python: Allow comments in comprehensions A somewhat complicated solution that necessitated adding a new custom function to `tsg-python`. See the comments in `python.tsg` for why this was necessary.	2024-10-23 14:24:47 +00:00
Taus	4f60494019	Python: Support assignments of the form `[x,y,z] = w` Surprisingly, the new parser did not support these constructs (and the relevant test was missing this case), so on files that required the new parser we were unable to parse this construct. To fix it, we add `list_pattern` (not to be confused with `pattern_list`) as a `tree-sitter-python` node that results in a `List` node in the AST.	2024-10-22 16:06:35 +00:00
Taus	89ea4b8200	Python: Regenerate parser files	2024-10-22 15:39:41 +00:00
Taus	9c913902c5	Python: Allow `except*` to be written as `except *` Turns out, `except*` is actually not a token on its own according to the Python grammar. This means it's legal to write `except *foo: ...`, which we previously would consider a syntax error. To fix it, we simply break up the `except*` into two separate tokens.	2024-10-22 15:39:29 +00:00
Taus	7ceefb509b	Python: Regenerate parser files	2024-10-22 15:17:34 +00:00
Taus	8053e0ed44	Python: Allow `list_splat`s as type annotations That is, the `*T` in `def foo(*args : *T): ...`. This is apparently a piece of syntax we did not support correctly until now. In terms of the grammar, we simply add `list_splat` as a possible alternative for `type` (which could previously only be an `expression`). We also update `python.tsg` to not specify `expression` in those places (as the relevant stanzas will then not work for `list_splat`s). This syntax is not supported by the old parser, hence we only add a new parser test for it.	2024-10-22 15:17:12 +00:00
Taus	1cd04c96c7	Python: Fix bug in handling of `**kwargs` in class bases This caused a dataset check error on the `python/cpython` database, as we had a `DictUnpacking` node whose parent was not a `dict_item_list`, but rather an `expr_list`. Investigating a bit further revealed that this was because in a construction like ```python class C[T](base, foo=bar, **kwargs): ... ``` we were mistakenly adding `**kwargs` to the same list as `base` (which is just a list of expressions), rather than the same list as `foo=bar` (which is a list of dictionary items). The ultimate cause of this was the use of `! name` in `python.tsg` to distinguish between bases and keyword arguments (only the latter of which have the `name` field). Because `dictionary_splat` doesn't have a `name` field either, these were mistakenly put in the wrong list, leading to the error. Also, because our previous test of `class` statements did not include a `**kwargs` construction, we were not checking that the new parser behaved correctly in this case. For the most part this was not a problem, but on files that use syntax not supported by the old parser (like type parameters on classes), this became an issue. This is also why we did not see this error previously. To fix this, we added `! value` (which is a field present on `dictionary_splat` nodes) as a secondary filter, and added a third stanza to handle `dictionary_splat` nodes.	2024-10-21 15:35:47 +00:00
Taus	55ee3eb36b	Python: Add TSG support for type defaults	2024-10-15 11:22:31 +00:00
Taus	6545bfffa7	Python: Regenerate parser files Two new files -- alloc.h and array.h -- suddenly appeared. Presumably they are used by the somewhat newer version of tree-sitter. To be safe, I included them in this commit.	2024-10-15 11:22:31 +00:00
Taus	882249ef82	Python: Add grammar support for type defaults Also fixes an oversight in the grammar: starred expressions should be allowed inside the subscript of an `Index` expression.	2024-10-15 11:22:30 +00:00
Cornelius Riemenschneider	092bc6445d	Rust/bazel: Port to bzlmod. This gets rid of our last workspace dependency. In particular, this change also gets rid of the checked-in extra lock files that took forever to generate.	2024-06-10 17:03:58 +02:00
Tom Hvitved	386bc1eb03	Bazel: repin	2024-05-24 13:53:55 +02:00
Tom Hvitved	7490472772	Update Python to use Rust 1.74	2024-05-24 13:05:39 +02:00
Tom Hvitved	158dafa7d0	Python: Dummy change to trigger CI	2024-05-21 11:25:21 +02:00
Paolo Tranquilli	9f5782b67b	Bazel: introduce buildifier formatting This introduces tooling and enforcement for formatting bazel files. The tooling is provided as a bazel run target from [keith/buildifier-prebuilt](https://github.com/keith/buildifier-prebuilt). This is used in a [`pre-commit`](https://pre-commit.com/) hook for those having that installed. In turn this is used in a CI check. Relying on a `pre-commit` action gives us easy checking that buildifying did not change anything in the files and printing the diff, without having to hand-roll the check ourselves. This enforcement will make usage of gazelle easier, as gazelle itself might reformat files, even outside of `go`. Having them properly formatted will allow gazelle to leave them unchanged, without needing to configure awkward exclude directives.	2024-04-24 15:49:48 +02:00
Taus	752d28c1b9	Python: Update repinning instructions This aligns us better with the corresponding instructions for the Ruby extractor.	2024-04-05 12:30:40 +02:00
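
Several of the parser fixes listed above concern Python syntax that can be checked directly in a stock interpreter. The following is a minimal sketch, not taken from the repository: it uses CPython's own `ast` module rather than the extractor's tree-sitter grammar, and every variable name in it is hypothetical. The short SHA in each comment points at the commit being illustrated.

```python
import ast

# 68c1a3d389: `=` used as the fill character, `^` for centring, width 20.
x = "hi"
centred = f"{x:=^20}"

# 1e51703ce9: in a raw literal a backslash can still escape a quote or
# another backslash for lexing purposes, but it stays in the resulting string.
raw_quote = r"\""
raw_backslashes = r'\\'

# 4f60494019: a bracketed list of targets on the left-hand side.
[a, b, c] = (1, 2, 3)
list_target = ast.parse("[x, y, z] = w").body[0].targets[0]

# e04821e9e3: `match` is a soft keyword, so it remains usable as a name.
match = ["p", "q"]
match[1] = "r"
annotated = ast.parse("match[1]: case(1)").body[0]

# 2380bfd459: the parenthesised form that the unparenthesised PEP 758
# syntax is mapped onto; CPython represents its handler type as a tuple.
handler = ast.parse(
    "try:\n    pass\nexcept (ValueError, TypeError):\n    pass\n"
).body[0].handlers[0]
```

Only syntax accepted by older Python 3 releases is used here, so the unparenthesised `except A, B:` form and t-strings (both tied to Python 3.14 in the commit messages) are deliberately left out.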

57 Commits