codeql

mirror of https://github.com/github/codeql.git synced 2025-12-16 16:53:25 +01:00

Author	SHA1	Message	Date
Taus	f709d02464	Python: Bump extractor version	2025-12-04 16:43:05 +00:00
Taus	48cd54165a	Python: Support template strings in rest of extractor Adds three new AST nodes to the mix: - `TemplateString` represents a t-string in Python 3.14 - `TemplateStringPart` represents one of the string constituents of a t-string. (The interpolated expressions are represented as `Expr` nodes, just like f-strings.) - `JoinedTemplateString` represents an implicit concatenation of template strings. Importantly, we _completely avoid_ the complicated construction we currently do for format strings (as well as the confusing nomenclature). No extra injection of empty strings (so that a template string is a strict alternation of strings and expressions). A `JoinedTemplateString` simply has a list of template string children, and a `TemplateString` has a list of "values" which may be either `Expr` or `TemplateStringPart` nodes. If we ever find that we actually want the more complicated interface for these strings, then I would much rather we reconstruct this inside of QL rather than in the parser.	2025-12-04 16:42:43 +00:00
Taus	5928d0ff12	Python: Regenerate parser files	2025-12-04 16:31:17 +00:00
Taus	287e18d02c	Python: Add parser support for template strings - Extends the scanner with a new token kind representing the start of a template string. This is used to distinguish template strings from regular strings (because only a template string will start with a `_template_string_start` external token). - Cleans up the logic surrounding interpolations (and the method names) so that format strings and template strings behave the same in this case. Finally, we add two new node types in the tree-sitter grammar: - `template_string` behaves like format strings, but is a distinct type (mainly so that an implicit concatenation between template strings and regular strings becomes a syntax error). - `concatenated_template_string` is the counterpart of `concatenated_string`. However, internally, the string parts of a template string are just the same `string_content` nodes that are used in regular format strings. We will disambiguate these inside `tsg-python`.	2025-12-04 16:31:16 +00:00
Taus	f55ff96674	Python: Bump extractor version and add change note	2025-11-27 13:52:37 +00:00
Alexander Köplinger	458f8570e8	Fix KeyError: 'name' in python/extractor/imp.py on Python 3.14 Follow-up to https://github.com/github/codeql/pull/20630 The fix didn't fully work since when we raise the ImportError in `find_module` we don't pass a named argument into the format string which causes a `KeyError`. We need to use a format string without named arguments, like Python 3.13 and earlier did.	2025-11-25 12:38:55 +01:00
Nora Dimitrijević	e120e5c3ba	Merge pull request #20337 from d10c/d10c/python-overlay-compilation-plus-extractor Python: enable overlay compilation + extractor overlay support	2025-10-16 14:49:01 +02:00
Taus	c4b27d5f28	Python: Fix `ImportError` in `imp.py` under Python 3.14 It seems `_ERR_MSG` was silently removed in Python 3.14, leading to an `ImportError` when running the extractor. To fix this, we explicitly set `_ERR_MSG` when the existing import fails (using `_ERR_MSG_PREFIX` which is available in Python 3.14+, along with the bits that make up the difference between this and `_ERR_MSG`).	2025-10-13 13:50:43 +00:00
Nora Dimitrijević	c749607db8	Bump python extractor version to 7.1.5	2025-10-07 11:22:16 +02:00
Nora Dimitrijević	1a9683f986	Add `@top` database type	2025-10-06 11:47:14 +02:00
Nora Dimitrijević	6f208e9dec	Write overlay metadata at end of extraction.	2025-10-06 11:47:12 +02:00
Nora Dimitrijević	49b18db044	Python extractor: in overlay mode, traverse only changed files - fall back to full extraction on overlay changes json read error - we filter both root modules and (transitive) imports against the overlay-changes json.	2025-10-06 11:47:09 +02:00
Nora Dimitrijević	e0cf719cb9	Path transformer: handle Windows-style paths And don't add slash to start of path patterns on Windows.	2025-10-06 11:37:04 +02:00
Nora Dimitrijević	29b1a7403b	Support CODEQL_PATH_TRANSFORMER env var in python path renamer The new name is required by overlay support.	2025-10-06 11:37:02 +02:00
Nora Dimitrijević	a88d3397cd	Add overlay builtins to python dbscheme	2025-10-06 11:36:56 +02:00
Taus	f5a06bef4a	Merge pull request #19929 from github/tausbn/python-update-tree-sitter-dependency Python: Update `tree-sitter` dependency	2025-09-17 13:40:13 +02:00
Arthur Baars	5d3ec35e29	Remove non-breaking spaces from code	2025-09-05 09:41:15 +02:00
Taus	f6732a927b	Python: Bump extractor version	2025-09-03 11:56:54 +00:00
Taus	13a93c7e32	Python: Add suggestions from Copilot	2025-09-03 11:55:49 +00:00
Taus	9802ad77dc	Python: Update `types_new.py` and test output	2025-09-02 12:41:57 +00:00
Taus	235822d782	Python: Improve handling of syntax errors Rather than relying on matching arbitrary nodes inside tree-sitter-graph and then checking whether they are of type ERROR or MISSING (which seems to have stopped working in later versions of tree-sitter), we now explicitly go through the tree-sitter tree, locating all of the error and missing nodes along the way. We then add these on to the graph output in the same format as was previously produced by tree-sitter-graph. Note that it's very likely that some of the syntax errors will move around a bit as a consequence of this change. In general, we don't expect syntax errors to have stable locations, as small changes in the grammar can cause an error to appear in a different position, even if the underlying (erroneous) code has not changed.	2025-09-02 12:41:57 +00:00
Taus	b108d47b26	Python: Update parser test output It seems that with a newer version of tree-sitter, we no longer parse the (not actually valid!) syntax `Spam[**P2]` as if the `**` is an exponentiation operation (with a missing left operand).	2025-09-02 12:41:55 +00:00
Taus	76f15a890c	Python: Update `tree-sitter` dependency Updates the Python extractor to depend on version 0.24.7 of tree-sitter (and 0.12.0 of tree-sitter-graph). A few changes were needed in order to make the code build and run after updating the dependencies: - In `main.rs`, the `Language` parameter is now passed as a reference. - In `python.tsg`, many queries had captures that were not actually used in the body of the stanza. This is no longer allowed (unless the captures start with an underscore), as it may indicate an error. To fix this, I added underscores in the appropriate places (and verified that none of these unused captures were in fact bugs).	2025-09-02 12:40:20 +00:00
Taus	ad53518644	Python: Regenerate parser files	2025-06-26 15:34:44 +00:00
Taus	e04821e9e3	Python: Allow use of `match` as an identifier This previously only worked in certain circumstances. In particular, assignments such as `match[1] = ...` or even just `match[1]` would fail to parse correctly. Fixing this turned out to be less trivial than anticipated. Consider the fact that ``` match [1]: case (...) ``` can either look like the start of a `match` statement, or it could be a type ascription, ascribing the value of `case(...)` (a call) to the item at index 1 of `match`. To fix this, then, we give `match` the identifier and `match` the statement the same precedence in the grammar, and additionally also mark a conflict between `match_statement` and `primary_expression`. This causes the conflict to be resolved dynamically, and seems to do the right thing in all cases.	2025-06-26 15:33:00 +00:00
Taus	2158eaa34c	Python: Fix a bug in glob regex creation The previous version was tested on a version of the code where we had temporarily removed the `glob.strip("/")` bit, and so the bug didn't trigger then. We now correctly remember if the glob ends in `/`, and add an extra part in that case. This way, if the path ends with multiple slashes, they effectively get consolidated into a single one, which results in the correct semantics.	2025-05-15 15:34:11 +00:00
Taus	c8cca126a1	Python: Bump extractor version	2025-05-15 14:59:33 +00:00
Taus	96558b53b8	Python: Update test The second test case now sets the `paths-ignore` setting in the config file in order to skip files in hidden directories.	2025-05-15 14:53:15 +00:00
Taus	98388be25c	Python: Remove special casing of hidden files If it is necessary to exclude hidden files, then adding ``` paths-ignore: ['*/./**'] ``` to the relevant config file is recommended instead.	2025-05-15 14:49:17 +00:00
Taus	61719cf448	Python: Fix a bug in glob conversion If you have a filter like `**/foo/**` set in the `paths-ignore` bit of your config file, then currently the following happens: - First, the CodeQL CLI observes that this string ends in `/**` and strips off the `**` leaving `**/foo/` - Then the Python extractor strips off leading and trailing `/` characters and proceeds to convert `**/foo` into a regex that is matched against files to (potentially) extract. The trouble with this is that it leaves us unable to distinguish between, say, a file `foo.py` and a file `foo/bar.py`. In other words, we have lost the ability to exclude only the _folder_ `foo` and not any files that happen to start with `foo`. To fix this, we instead make a note of whether the glob ends in a forward slash or not, and adjust the regex correspondingly.	2025-05-15 14:48:06 +00:00
Taus	605f2bff9c	Python: Add integration test	2025-05-02 14:27:46 +00:00
Taus	0c1b379ac1	Python: Extract files in hidden dirs by default Changes the default behaviour of the Python extractor so files inside hidden directories are extracted by default. Also adds an extractor option, `skip_hidden_directories`, which can be set to `true` in order to revert to the old behaviour. Finally, I made the logic surrounding what is logged in various cases a bit more obvious. Technically this changes the behaviour of the extractor (in that hidden excluded files will now be logged as `(excluded)`), but I think this makes more sense anyway.	2025-05-02 12:44:05 +00:00
Taus	6546bb1b1d	Merge branch 'main' into tausbn/python-fix-match-pruning-logic	2025-03-06 14:37:58 +01:00
Paolo Tranquilli	1bcc6ddb32	Rust/Ruby/Python: apply clippy lints	2025-02-25 13:21:28 +01:00
Paolo Tranquilli	6089a75262	Rust/Ruby/Python: format code	2025-02-25 13:19:03 +01:00
Paolo Tranquilli	e8799e346d	Rust/Python: fix edition-related errors	2025-02-25 13:16:58 +01:00
Paolo Tranquilli	eff87d24fa	Rust/Ruby/Python: update rustc and edition	2025-02-25 13:15:19 +01:00
Paolo Tranquilli	38efd4a8a2	Python: downgrade `tree-sitter` back to `0.20.4`	2025-02-18 10:03:18 +01:00
Paolo Tranquilli	342bff6125	Python: undo tree-sitter update	2025-02-17 15:52:45 +01:00
Paolo Tranquilli	91b3d108bb	Python: upgrade cargo dependencies This required some code changes because of some breaking changes in `clap` and `tree-sitter`. Also needed to assign a new bazel repo name to the `crates_vendor` to avoid name conflicts in `MODULE.bazel`.	2025-02-17 10:56:36 +01:00
Taus	918c05c538	Python: Don't prune any `MatchLiteralPattern`s Extends the mechanism introduced in https://github.com/github/codeql/pull/18030 to behave the same for _all_ `MatchLiteralPattern`s, not just the ones that happen to be the constant `True` or `False`. Co-authored-by: yoff <yoff@github.com>	2025-02-11 12:58:52 +00:00
Paolo Tranquilli	cc939e64fd	Python: fix bazel rule	2025-02-07 14:42:26 +01:00
yoff	37ddaa36ad	Merge pull request #18702 from github/tausbn/python-allow-comments-in-subscripts Python: Allow comments in subscripts	2025-02-06 23:31:29 +01:00
Taus	131ec8d22f	Python: Handle loop constructs outside of loops Observed on some test files in Nuitka/Nuitka, having `break` and `continue` outside of loops in Python is (to Python) a syntax error, but our parser happily accepted this broken syntax. This then caused issues further downstream in the control-flow construction, as it broke some invariants. To fix this we now skip the code that would previously fail when the invariants are broken. Co-authored-by: yoff <yoff@github.com>	2025-02-06 14:30:16 +00:00
Taus	7124e80f28	Python: Regenerate parser files	2025-02-06 14:05:40 +00:00
Taus	c5be2a3e2d	Python: Allow comments in subscripts Once again, the interaction between anchors and extras (specifically comments) was causing trouble. The root of the problem was the fact that in `a[b]`, we put `b` in the `index` field of the subscript node, whereas in `a[b,c]`, we additionally synthesize a `Tuple` node for `b,c` (which matches the Python AST). To fix this, we refactored the grammar slightly so as to make that tuple explicit, such that a subscript node either contains a single expression or the newly added tuple node. This greatly simplifies the logic.	2025-02-06 14:04:57 +00:00
Taus	60d97e0e16	Python: Print file path when logging context errors This makes it _much_ easier to find the offending bit of syntax.	2025-02-05 13:13:39 +00:00
Cornelius Riemenschneider	53ca5083a9	Upgrade bazel to 8.0.0. Previously, we were using 8.0.0rc1. In particular, this upgrade means we need to explicitly import more rules, as they've been moved out of the core bazel repo.	2024-12-10 12:05:37 +01:00
Taus	a9817a0281	Python: Add guide describing how to extend the parser	2024-11-28 12:32:00 +00:00
Taus	d779ae5c3e	Python: Add change note for CFG pruning fix ... And also bump the extractor version.	2024-11-26 15:39:15 +00:00
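
Note on 458f8570e8: the `KeyError: 'name'` described in that commit can be reproduced in isolation. The sketch below is only an illustration. The two template constants and the helper names are hypothetical stand-ins chosen for this example and are not copied from `python/extractor/imp.py`. It shows that a template with a named field fails when the module name is passed positionally, while a template with a positional field formats as expected.

```python
# Hypothetical stand-ins for the two message styles discussed in 458f8570e8.
# These names are assumptions for illustration, not the extractor's own code.
POSITIONAL_MSG = "No module named {!r}"      # positional placeholder
NAMED_MSG = "No module named {name!r}"       # named placeholder


def format_error(template: str, module_name: str) -> str:
    """Format an import error message, passing the name positionally."""
    return template.format(module_name)


def demo() -> tuple:
    """Return the successful message and a description of the failure."""
    ok = format_error(POSITIONAL_MSG, "spam")
    try:
        format_error(NAMED_MSG, "spam")
        failure = "no error"
    except KeyError as exc:
        # str.format looks up the field 'name' among keyword arguments,
        # finds none, and raises KeyError('name').
        failure = f"KeyError: {exc}"
    return ok, failure


if __name__ == "__main__":
    for line in demo():
        print(line)
```

Passing the name as a keyword argument (`template.format(name=module_name)`) would also avoid the error for the named template; the commit message instead describes switching to a template without named arguments.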


127 Commits