codeql

mirror of https://github.com/github/codeql.git synced 2026-05-14 19:29:28 +02:00

Author	SHA1	Message	Date
Paolo Tranquilli	ee13ea0f6b	Harden `_relative_path` for Windows and mixed-form inputs Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-13 11:35:02 +02:00
Paolo Tranquilli	d28792537b	Python extractor: use relative paths in diagnostic locations Diagnostic `Location.file` fields contained absolute filesystem paths, causing the GitHub UI to generate broken file links with runner paths like `/home/runner/work/...`. Now paths are relativized against the source root (`LGTM_SRC` or cwd), falling back to absolute if the file is outside the source root. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-13 10:32:05 +02:00
Taus	6c675fcede	Python: Consolidate duplicated code	2026-04-16 21:14:42 +00:00
Taus	fc5b3562c3	Python: Add parser test for comprehensions with unpacking	2026-04-14 13:27:31 +02:00
Taus	90b64616f7	Python: Also fix `(value, key)` bug in old parser	2026-04-14 13:27:31 +02:00
Taus	91d4cf6624	Python: Update `python.tsg` First, we extend the various location overriding hacks to also accept list and dict splats in various places. Having done this, we then have to tackle how to actually desugar these new comprehension forms (as this is what we currently do for the old forms). As a reminder, a list comprehension like `[x for x in y]` currently gets desugared into a small local function, something like ```python def listcomp(a): for x in a: yield x listcomp(y) ``` For `[*x for x in y]`, the behaviour we want is that we unpack `x` before yielding its elements in turn. This is essentially what we would get if we were to use `yield from x` instead of `yield x` in the above desugaring, so that's what we do. This also works for set comprehensions. For dict comprehensions, it's slightly more complicated. Here, the generator function instead yields a stream of `(key, value)` tuples. (And apparently the old parser got this wrong and emitted `(value, key)` pairs instead, which we faithfully recreated in the new parser as well. We fix that bug in both parsers while we're at it.) So a bare `yield from` is not enough; we also need a `.items()` call to get the double-starred expression to emit its items as a stream of tuples (that we then `yield from`). To make this (hopefully) less verbose in the implementation, we defer the decision of whether to use `yield` or `yield from` by introducing a `yield_kind` scoped variable that determines the type of the actual AST node. And of course for dict comprehensions with unpacking we need to synthesise the extra machinery mentioned above. On the plus side, this means we don't have to mess with control-flow, as the existing machinery should be able to handle the desugared syntax just fine.	2026-04-14 13:27:31 +02:00
Taus	97086c3cc9	Python: Regenerate parser files	2026-04-14 13:27:31 +02:00
Taus	4b5ff0b89e	Python: Support unpacking in comprehensions in `tree-sitter-python` This is the easy part -- we just allow `dictionary_splat` or `list_splat` to appear in the same place as the expression.	2026-04-14 13:27:31 +02:00
Taus	1ddfed6b6b	Python: Add QL support for lazy imports Adds a new `isLazy` predicate to the relevant classes, and adds the relevant dbscheme (and up/downgrade) changes. On upgrades we do nothing, and on downgrades we remove the `is_lazy` bits.	2026-04-10 14:25:08 +00:00
Taus	fe94828fe4	Python: Add overlay annotations to AST template Otherwise these will disappear every time we regenerate the AST.	2026-04-10 14:23:29 +00:00
Taus	2c79f9d828	Python: Regenerate parser files	2026-04-10 13:50:59 +00:00
Taus	ad4018f399	Python: Add parser support for lazy imports As defined in PEP-810. We implement this in much the same way as how we handle `async` annotations currently. The relevant nodes get an `is_lazy` field that defaults to being false.	2026-04-10 13:50:43 +00:00
Taus	8c27437628	Python: Bump extractor version and add change note	2026-02-05 13:50:54 +00:00
Taus	12ee93042b	Python: Add tests	2026-02-05 13:47:24 +00:00
Taus	bac356c9a1	Python: Regenerate parser files	2026-02-05 13:46:59 +00:00
Taus	68c1a3d389	Python: Fix syntax error when `=` is used as a format fill character An example (provided by @redsun82) is the string `f"{x:=^20}"`. Parsing this (with unnamed nodes shown) illustrates the problem: ``` module [0, 0] - [2, 0] expression_statement [0, 0] - [0, 11] string [0, 0] - [0, 11] string_start [0, 0] - [0, 2] interpolation [0, 2] - [0, 10] "{" [0, 2] - [0, 3] expression: named_expression [0, 3] - [0, 9] name: identifier [0, 3] - [0, 4] ":=" [0, 4] - [0, 6] ERROR [0, 6] - [0, 7] "^" [0, 6] - [0, 7] value: integer [0, 7] - [0, 9] "}" [0, 9] - [0, 10] string_end [0, 10] - [0, 11] ``` Observe that we've managed to combine the format specifier token `:` and the fill character `=` in a single token (which doesn't match the `:` we expect in the grammar rule), and hence we get a syntax error. If we change the `=` to some other character (e.g. a `-`), we instead get ``` module [0, 0] - [2, 0] expression_statement [0, 0] - [0, 11] string [0, 0] - [0, 11] string_start [0, 0] - [0, 2] interpolation [0, 2] - [0, 10] "{" [0, 2] - [0, 3] expression: identifier [0, 3] - [0, 4] format_specifier: format_specifier [0, 4] - [0, 9] ":" [0, 4] - [0, 5] "}" [0, 9] - [0, 10] string_end [0, 10] - [0, 11] ``` and in particular no syntax error. To fix this, we want to ensure that the `:` is lexed on its own, and the `token(prec(1, ...))` construction can be used to do exactly this. Finally, you may wonder why `=` is special here. I think what's going on is that the lexer knows that `:=` is a token on its own (because it's used in the walrus operator), and so it greedily consumes the following `=` with this in mind.	2026-02-05 13:45:54 +00:00
Ian Lynagh	d1175276ca	python: Use more standard shared dbscheme sections We now use the shared "Overlay support" and "Database metadata".	2026-01-20 11:56:13 +00:00
Ian Lynagh	d125e224ac	python: Add dbscheme regeneration instructions	2026-01-20 11:56:13 +00:00
Taus	659ec3999b	Mark generated files as generated	2026-01-12 15:24:01 +00:00
Taus	2c83b296a4	Python: Add parser test Note in particular that the `exceptions.py` test is unaffected.	2026-01-06 13:40:38 +00:00
Taus	4db60df9dd	Python: Regenerate parser files	2026-01-06 13:40:38 +00:00
Taus	2380bfd459	Python: Add support for PEP-758 exception syntax See https://peps.python.org/pep-0758/ for more details. We implement this by extending the syntax for exceptions and exception groups so that the `type` field can now contain either an expression (which matches the old behaviour), or a comma-separated list of at least two elements (representing the new behaviour). We model the latter case using a new node type `exception_list`, which in `tsg-python` is simply mapped to a tuple. This means it matches the existing behaviour (when the tuple is surrounded by parentheses) exactly, hence we don't need to change any other code. As a consequence of this, however, we cannot directly parse the Python 2.7 syntax `except Foo, e: ...` as `except Foo as e: ...`, as this would introduce an ambiguity in the grammar. Thus, we have removed support for the (deprecated) 2.7-style syntax, and only allow `as` to indicate binding of the exception. The syntax `except Foo, e: ...` continues to be parsed (in particular, it's not suddenly a syntax error), but it will be parsed as if it were `except (Foo, e): ...`, which may not give the correct results. In principle we could extend the QL libraries to account for this case (specifically when analysing Python 2 code). In practice, however, I expect this to have a minor impact on results, and not worth the additional investment at this time.	2026-01-06 13:40:37 +00:00
Taus	47c967a06c	Python: Bump extractor version	2025-12-16 23:57:58 +01:00
Taus	28e733e335	Python: Support template strings in rest of extractor Adds three new AST nodes to the mix: - `TemplateString` represents a t-string in Python 3.14 - `TemplateStringPart` represents one of the string constituents of a t-string. (The interpolated expressions are represented as `Expr` nodes, just like f-strings.) - `JoinedTemplateString` represents an implicit concatenation of template strings. Importantly, we _completely avoid_ the complicated construction we currently do for format strings (as well as the confusing nomenclature). No extra injection of empty strings (so that a template string is a strict alternation of strings and expressions). A `JoinedTemplateString` simply has a list of template string children, and a `TemplateString` has a list of "values" which may be either `Expr` or `TemplateStringPart` nodes. If we ever find that we actually want the more complicated interface for these strings, then I would much rather we reconstruct this inside of QL rather than in the parser.	2025-12-16 23:57:58 +01:00
Taus	cd7ae34380	Python: Regenerate parser files	2025-12-16 23:57:58 +01:00
Taus	7768ebe8b8	Python: Add parser support for template strings - Extends the scanner with a new token kind representing the start of a template string. This is used to distinguish template strings from regular strings (because only a template string will start with a `_template_string_start` external token). - Cleans up the logic surrounding interpolations (and the method names) so that format strings and template strings behave the same in this case. Finally, we add two new node types in the tree-sitter grammar: - `template_string` behaves like format strings, but is a distinct type (mainly so that an implicit concatenation between template strings and regular strings becomes a syntax error). - `concatenated_template_string` is the counterpart of `concatenated_string`. However, internally, the string parts of template strings are just the same `string_content` nodes that are used in regular format strings. We will disambiguate these inside `tsg-python`.	2025-12-16 23:57:58 +01:00
Taus	f55ff96674	Python: Bump extractor version and add change note	2025-11-27 13:52:37 +00:00
Alexander Köplinger	458f8570e8	Fix KeyError: 'name' in python/extractor/imp.py on Python 3.14 Follow-up to https://github.com/github/codeql/pull/20630 The fix didn't fully work since when we raise the ImportError in `find_module` we don't pass a named argument into the format string which causes a `KeyError`. We need to use a format string without named arguments, like Python 3.13 and earlier did.	2025-11-25 12:38:55 +01:00
Nora Dimitrijević	e120e5c3ba	Merge pull request #20337 from d10c/d10c/python-overlay-compilation-plus-extractor Python: enable overlay compilation + extractor overlay support	2025-10-16 14:49:01 +02:00
Taus	c4b27d5f28	Python: Fix `ImportError` in `imp.py` under Python 3.14 It seems `_ERR_MSG` was silently removed in Python 3.14, leading to an `ImportError` when running the extractor. To fix this, we explicitly set `_ERR_MSG` when the existing import fails (using `_ERR_MSG_PREFIX` which is available in Python 3.14+, along with the bits that make up the difference between this and `_ERR_MSG`).	2025-10-13 13:50:43 +00:00
Nora Dimitrijević	c749607db8	Bump python extractor version to 7.1.5	2025-10-07 11:22:16 +02:00
Nora Dimitrijević	1a9683f986	Add `@top` database type	2025-10-06 11:47:14 +02:00
Nora Dimitrijević	6f208e9dec	Write overlay metadata at end of extraction.	2025-10-06 11:47:12 +02:00
Nora Dimitrijević	49b18db044	Python extractor: in overlay mode, traverse only changed files - fall back to full extraction on overlay changes json read error - we filter both root modules and (transitive) imports against the overlay-changes json.	2025-10-06 11:47:09 +02:00
Nora Dimitrijević	e0cf719cb9	Path transformer: handle Windows-style paths And don't add slash to start of path patterns on Windows.	2025-10-06 11:37:04 +02:00
Nora Dimitrijević	29b1a7403b	Support CODEQL_PATH_TRANSFORMER env var in python path renamer The new name is required by overlay support.	2025-10-06 11:37:02 +02:00
Nora Dimitrijević	a88d3397cd	Add overlay builtins to python dbscheme	2025-10-06 11:36:56 +02:00
Taus	f5a06bef4a	Merge pull request #19929 from github/tausbn/python-update-tree-sitter-dependency Python: Update `tree-sitter` dependency	2025-09-17 13:40:13 +02:00
Arthur Baars	5d3ec35e29	Remove non-breaking spaces from code	2025-09-05 09:41:15 +02:00
Taus	f6732a927b	Python: Bump extractor version	2025-09-03 11:56:54 +00:00
Taus	13a93c7e32	Python: Add suggestions from Copilot	2025-09-03 11:55:49 +00:00
Taus	9802ad77dc	Python: Update `types_new.py` and test output	2025-09-02 12:41:57 +00:00
Taus	235822d782	Python: Improve handling of syntax errors Rather than relying on matching arbitrary nodes inside tree-sitter-graph and then checking whether they are of type ERROR or MISSING (which seems to have stopped working in later versions of tree-sitter), we now explicitly go through the tree-sitter tree, locating all of the error and missing nodes along the way. We then add these on to the graph output in the same format as was previously produced by tree-sitter-graph. Note that it's very likely that some of the syntax errors will move around a bit as a consequence of this change. In general, we don't expect syntax errors to have stable locations, as small changes in the grammar can cause an error to appear in a different position, even if the underlying (erroneous) code has not changed.	2025-09-02 12:41:57 +00:00
Taus	b108d47b26	Python: Update parser test output It seems that with a newer version of tree-sitter, we no longer parse the (not actually valid!) syntax `Spam[**P2]` as if the `**` is an exponentiation operation (with a missing left operand).	2025-09-02 12:41:55 +00:00
Taus	76f15a890c	Python: Update `tree-sitter` dependency Updates the Python extractor to depend on version 0.24.7 of tree-sitter (and 0.12.0 of tree-sitter-graph). A few changes were needed in order to make the code build and run after updating the dependencies: - In `main.rs`, the `Language` parameter is now passed as a reference. - In `python.tsg`, many queries had captures that were not actually used in the body of the stanza. This is no longer allowed (unless the captures start with an underscore), as it may indicate an error. To fix this, I added underscores in the appropriate places (and verified that none of these unused captures were in fact bugs).	2025-09-02 12:40:20 +00:00
Taus	ad53518644	Python: Regenerate parser files	2025-06-26 15:34:44 +00:00
Taus	e04821e9e3	Python: Allow use of `match` as an identifier This previously only worked in certain circumstances. In particular, assignments such as `match[1] = ...` or even just `match[1]` would fail to parse correctly. Fixing this turned out to be less trivial than anticipated. Consider that ``` match [1]: case (...) ``` can look like either the start of a `match` statement or a type ascription, ascribing the value of `case(...)` (a call) to the item at index 1 of `match`. To fix this, then, we give `match` the identifier and `match` the statement the same precedence in the grammar, and additionally also mark a conflict between `match_statement` and `primary_expression`. This causes the conflict to be resolved dynamically, and seems to do the right thing in all cases.	2025-06-26 15:33:00 +00:00
Taus	2158eaa34c	Python: Fix a bug in glob regex creation The previous version was tested on a version of the code where we had temporarily removed the `glob.strip("/")` bit, and so the bug didn't trigger then. We now correctly remember if the glob ends in `/`, and add an extra part in that case. This way, if the path ends with multiple slashes, they effectively get consolidated into a single one, which results in the correct semantics.	2025-05-15 15:34:11 +00:00
Taus	c8cca126a1	Python: Bump extractor version	2025-05-15 14:59:33 +00:00
Taus	96558b53b8	Python: Update test The second test case now sets the `paths-ignore` setting in the config file in order to skip files in hidden directories.	2025-05-15 14:53:15 +00:00
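Commits d28792537b and ee13ea0f6b describe relativizing diagnostic `Location.file` paths against the source root (`LGTM_SRC` or the current directory) and falling back to the absolute path for files outside it. The function below is a minimal hypothetical sketch of that rule built on `os.path`. It is not the extractor's `_relative_path`, and of the Windows hardening it only reproduces the different-drive case.

```python
import os


def relative_path(path, src_root=None):
    """Hypothetical sketch: relativize `path` against the source root.

    The root defaults to LGTM_SRC if set, else the current working directory.
    Paths outside the root (or on another Windows drive) stay absolute.
    """
    if src_root is None:
        src_root = os.environ.get("LGTM_SRC") or os.getcwd()
    abs_path = os.path.abspath(path)
    abs_root = os.path.abspath(src_root)
    try:
        common = os.path.commonpath([abs_path, abs_root])
    except ValueError:
        # commonpath raises ValueError for paths on different drives.
        return abs_path
    if os.path.normcase(common) != os.path.normcase(abs_root):
        return abs_path  # outside the source root: keep the absolute path
    # Forward slashes give a portable path suitable for repository links.
    return os.path.relpath(abs_path, abs_root).replace(os.sep, "/")
```

Comparing whole path components with `commonpath`, rather than using a string prefix check, keeps a sibling such as `/src2/a.py` from being treated as inside `/src`.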
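Commit 91d4cf6624 explains the unpacking comprehensions by their generator desugaring: `yield from x` for `[*x for x in y]`, and `yield from d.items()` for the dict case so that the stream stays `(key, value)` tuples. The hand-written functions below are an illustrative sketch of those shapes. The extractor builds equivalent AST nodes rather than these functions, and the names are hypothetical.

```python
def listcomp(a):
    # [x for x in y]: yield each element as-is
    for x in a:
        yield x


def listcomp_unpack(a):
    # [*x for x in y]: unpack x, yielding its elements in turn
    for x in a:
        yield from x


def dictcomp(a):
    # {k: v for k, v in y}: a stream of (key, value) tuples, key first
    for k, v in a:
        yield (k, v)


def dictcomp_unpack(a):
    # {**d for d in y}: .items() turns each mapping into (key, value) tuples
    for d in a:
        yield from d.items()


print(list(listcomp_unpack([[1, 2], [3]])))                  # [1, 2, 3]
print(dict(dictcomp_unpack([{"a": 1}, {"b": 2, "a": 3}])))  # {'a': 3, 'b': 2}
```

Set comprehensions reuse the list-style generator (`set(listcomp_unpack(...))`), which is why only the dict case needs extra machinery.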
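For reference on commit 68c1a3d389: CPython itself treats the top-level `:` in `f"{x:=^20}"` as the start of the format specifier, so `=` is the fill character, `^` centres, and `20` is the width. An assignment expression inside a replacement field has to be parenthesised. The snippet below shows the behaviour the tree-sitter grammar is expected to match.

```python
x = "ab"
# ':' opens the format spec; '=' is the fill, '^' centres, 20 is the width.
print(f"{x:=^20}")
print(f"{x:=^6}")  # ==ab==

# A walrus inside an f-string needs parentheses to be an expression.
print(f"{(y := 10)}")  # 10
```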
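Commit 2380bfd459 notes that `except Foo, e: ...` is now parsed as if it were `except (Foo, e): ...`. Python 3.14 reads the unparenthesised form the same way under PEP 758. The sketch below uses CPython's `ast` module on the parenthesised form, which older versions also accept, to show how the two trees differ: the tuple form has no bound name.

```python
import ast


def handler_of(src):
    """Return the first `except` handler of the `try` statement in `src`."""
    return ast.parse(src).body[0].handlers[0]


as_form = handler_of("try:\n    pass\nexcept Foo as e:\n    pass\n")
tuple_form = handler_of("try:\n    pass\nexcept (Foo, e):\n    pass\n")

print(type(as_form.type).__name__, as_form.name)        # Name e
print(type(tuple_form.type).__name__, tuple_form.name)  # Tuple None
```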
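Commit 28e733e335 describes the t-string AST shape: a `TemplateString` holds a flat list of values that are either `Expr` or `TemplateStringPart`, and a `JoinedTemplateString` holds a list of template strings, with no injected empty strings. The dataclasses below are a hypothetical model of that shape. The field names are assumptions, not the extractor's generated AST classes.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Expr:
    source: str  # stand-in for an interpolated expression node


@dataclass
class TemplateStringPart:
    text: str  # one literal string constituent of a t-string


@dataclass
class TemplateString:
    # Parts and expressions in source order; adjacent expressions are NOT
    # separated by synthetic empty strings.
    values: List[Union[Expr, TemplateStringPart]] = field(default_factory=list)


@dataclass
class JoinedTemplateString:
    # Implicit concatenation of several template strings.
    strings: List[TemplateString] = field(default_factory=list)


def interpolations(joined):
    """Collect every interpolated expression across the concatenation."""
    return [v for s in joined.strings for v in s.values if isinstance(v, Expr)]


# Hypothetical tree for: t"Hi {name}" t"{a}{b}!"
tree = JoinedTemplateString([
    TemplateString([TemplateStringPart("Hi "), Expr("name")]),
    TemplateString([Expr("a"), Expr("b"), TemplateStringPart("!")]),
])
print([e.source for e in interpolations(tree)])  # ['name', 'a', 'b']
```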
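Commits c4b27d5f28 and 458f8570e8 both concern the `No module named ...` message in `imp.py`. `_ERR_MSG` is missing on Python 3.14, and a template with a named field fails with `KeyError: 'name'` when formatted with a positional argument. The sketch below rebuilds a positional template from `_ERR_MSG_PREFIX` when `_ERR_MSG` is absent. A `SimpleNamespace` stands in for `importlib._bootstrap`, and the prefix value used here is an assumption for illustration.

```python
from types import SimpleNamespace

NAMED = "No module named {name!r}"  # the failing shape: a named field


def err_msg_template(bootstrap):
    """Return a positional 'No module named ...' template.

    Prefer `_ERR_MSG` when the module defines it; otherwise rebuild it from
    `_ERR_MSG_PREFIX` with a positional field, so `template.format(name)` works.
    """
    msg = getattr(bootstrap, "_ERR_MSG", None)
    if msg is not None:
        return msg
    return bootstrap._ERR_MSG_PREFIX + "{!r}"


# Hypothetical stand-in for a Python 3.14 importlib._bootstrap.
fake_314 = SimpleNamespace(_ERR_MSG_PREFIX="No module named ")
template = err_msg_template(fake_314)
print(template.format("spam"))  # No module named 'spam'

try:
    NAMED.format("spam")  # named field, positional argument
except KeyError as exc:
    print("KeyError:", exc)  # KeyError: 'name'
```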
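Commit 49b18db044 filters root modules and their transitive imports against the overlay-changes JSON and falls back to full extraction when that file cannot be read. The sketch below shows the control flow only. The JSON layout (an object with a `"changes"` list of paths) and both function names are hypothetical.

```python
import json


def load_overlay_changes(path):
    """Return the set of changed paths, or None to request full extraction.

    Assumes (hypothetically) a JSON object with a "changes" list of paths.
    """
    try:
        with open(path, encoding="utf-8") as f:
            data = json.load(f)
        return set(data["changes"])
    except (OSError, ValueError, KeyError, TypeError):
        # Missing, unreadable, or malformed file: extract everything instead.
        return None


def modules_to_extract(candidates, changed):
    """Keep only candidates (root modules or imports) that were changed."""
    if changed is None:
        return list(candidates)
    return [m for m in candidates if m in changed]
```

Returning `None` instead of an empty set keeps "nothing changed" distinct from "we could not tell", and only the latter triggers a full extraction.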
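Commit 235822d782 replaces pattern-matching on ERROR and MISSING nodes with an explicit walk of the tree-sitter tree. The sketch below illustrates that walk in Python over a hypothetical minimal `Node` stand-in. The extractor's own traversal is not reproduced here.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """Hypothetical minimal stand-in for a tree-sitter node."""
    kind: str
    start: Tuple[int, int]  # (row, column)
    is_error: bool = False
    is_missing: bool = False
    children: List["Node"] = field(default_factory=list)


def collect_syntax_errors(root):
    """Return all ERROR and MISSING nodes in document (pre-)order."""
    found = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node.is_error or node.is_missing:
            found.append(node)
        # Push children reversed so the leftmost child is visited first.
        stack.extend(reversed(node.children))
    return found


tree = Node("module", (0, 0), children=[
    Node("expression_statement", (0, 0), children=[
        Node("ERROR", (0, 6), is_error=True),
    ]),
    Node(")", (1, 4), is_missing=True),
])
print([(n.kind, n.start) for n in collect_syntax_errors(tree)])
# [('ERROR', (0, 6)), (')', (1, 4))]
```

An explicit stack avoids recursion limits on deeply nested trees.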


149 Commits