- Fall back to full extraction if the overlay-changes JSON cannot be read.
- Filter both root modules and (transitive) imports against the overlay-changes JSON.
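A minimal sketch of the fallback behaviour (the function name and JSON schema here are hypothetical, not the extractor's actual code):

```python
import json

def read_overlay_changes(path: str) -> set[str] | None:
    # Returning None means "no usable change information": the caller
    # should fall back to full extraction instead of filtering modules.
    try:
        with open(path, encoding="utf-8") as fh:
            data = json.load(fh)
        return set(data["changes"])  # hypothetical schema
    except (OSError, ValueError, KeyError):
        return None
```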
The previous version was tested against a revision of the code where we
had temporarily removed the `glob.strip("/")` bit, and so the bug didn't
trigger at the time.
We now correctly remember if the glob ends in `/`, and add an extra part
in that case. This way, if the path ends with multiple slashes, they
effectively get consolidated into a single one, which results in the
correct semantics.
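A minimal sketch of the intended splitting behaviour (hypothetical function name):

```python
def glob_parts(glob: str) -> list[str]:
    # Remember whether the glob named a directory *before* stripping.
    ends_with_slash = glob.endswith("/")
    parts = glob.strip("/").split("/")
    if ends_with_slash:
        # Exactly one extra part, no matter how many trailing slashes
        # there were, so "foo/", "foo//" and "foo///" all behave alike.
        parts.append("")
    return parts

assert glob_parts("foo/") == glob_parts("foo///") == ["foo", ""]
```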
If you have a filter like `**/foo/**` set in the `paths-ignore` bit of
your config file, then currently the following happens:
- First, the CodeQL CLI observes that this string ends in `/**` and
strips off the `**`, leaving `**/foo/`.
- Then the Python extractor strips off leading and trailing `/`
characters and proceeds to convert `**/foo` into a regex that is
matched against files to (potentially) extract.
The trouble with this is that it leaves us unable to distinguish
between, say, a file `foo.py` and a file `foo/bar.py`. In other words,
we have lost the ability to exclude only the _folder_ `foo` and not any
files that happen to start with `foo`.
To fix this, we instead make a note of whether the glob ends in a
forward slash or not, and adjust the regex correspondingly.
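As a rough illustration of the adjusted conversion (hypothetical code, not the extractor's actual implementation):

```python
import re

def glob_to_regex(glob: str) -> str:
    ends_with_slash = glob.endswith("/")
    chunks = []
    for part in glob.strip("/").split("/"):
        if part == "**":
            chunks.append("(?:[^/]+/)*[^/]+")  # a run of path components
        else:
            chunks.append(re.escape(part).replace(r"\*", "[^/]*"))
    pattern = "/".join(chunks)
    if ends_with_slash:
        # A trailing slash means "only the directory": require a real
        # path separator after the final component.
        pattern += "/"
    return pattern

pat = re.compile(glob_to_regex("**/foo/"))
assert pat.match("src/foo/bar.py")   # a file inside the folder foo
assert not pat.match("src/foo.py")   # a file merely starting with "foo"
```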
Changes the default behaviour of the Python extractor so that files
inside hidden directories are now extracted.
Also adds an extractor option, `skip_hidden_directories`, which can be
set to `true` in order to revert to the old behaviour.
Finally, I made the logic surrounding what is logged in various cases a
bit more obvious.
Technically this changes the behaviour of the extractor (in that hidden
excluded files will now be logged as `(excluded)`), but I think this
makes more sense anyway.
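A minimal sketch of the new default (the option plumbing and helper names are illustrative, not the extractor's actual code):

```python
import logging
import os

logger = logging.getLogger(__name__)

def is_excluded(path: str) -> bool:
    return False  # placeholder for the real path filters

def files_to_extract(root: str, options: dict):
    # Hidden directories are now traversed unless the extractor option
    # skip_hidden_directories is set to "true".
    skip_hidden = options.get("skip_hidden_directories") == "true"
    for dirpath, dirnames, filenames in os.walk(root):
        if skip_hidden:
            dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if is_excluded(path):
                logger.debug("%s (excluded)", path)  # hidden files too
                continue
            yield path
```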
Extends the mechanism introduced in
https://github.com/github/codeql/pull/18030
to behave the same for _all_ `MatchLiteralPattern`s, not just the ones
that happen to be the constant `True` or `False`.
Co-authored-by: yoff <yoff@github.com>
As observed on some test files in Nuitka/Nuitka, having `break` and
`continue` outside of loops is (to Python) a syntax error, but our
parser happily accepted this broken syntax.
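For reference, CPython rejects code like the following at compile time, whereas our parser used to accept it:

```python
def f():
    break  # SyntaxError: 'break' outside loop
```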
This then caused issues further downstream in the control-flow
construction, as it broke some invariants.
To fix this, we now skip the code that would previously fail when the
invariants are broken.
Co-authored-by: yoff <yoff@github.com>
Does a bunch of things, unfortunately all in the same place, so my
apologies in advance for a slightly complicated commit.
As for the changes themselves, this commit
- Adds timers for the old and new parsers. This means we get the overall
time spent on these parts of the extractor if the extractor is run with
`DEBUG` output shown.
- Adds logging information (at the `DEBUG` level) to show which
invocations of the parsers happen when, and whether they succeed or not.
- Adds support for using an environment variable named
`CODEQL_PYTHON_DISABLE_OLD_PARSER` to disable using the old parser
entirely. This makes it easier to test the new parser in isolation.
- Fixes a bug where we did not check whether a parse with the new parser
had already succeeded, and so would do a superfluous second parse. (A
rough sketch of the resulting control flow follows this list.)
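A rough sketch of the control flow described above (the parser helpers are stand-ins, not the extractor's real functions; the environment variable name is real):

```python
import logging
import os
import time

logger = logging.getLogger(__name__)

def try_new_parser(path):  # stand-in for the new parser
    ...

def try_old_parser(path):  # stand-in for the old parser
    ...

def parse(path: str):
    start = time.monotonic()
    ast = try_new_parser(path)
    logger.debug("new parser: %s on %s (%.3fs)",
                 "success" if ast else "failure", path,
                 time.monotonic() - start)
    # Only invoke the old parser if the new parse did not already
    # succeed, and only if it has not been disabled outright.
    if ast is None and not os.environ.get("CODEQL_PYTHON_DISABLE_OLD_PARSER"):
        start = time.monotonic()
        ast = try_old_parser(path)
        logger.debug("old parser: %s on %s (%.3fs)",
                     "success" if ast else "failure", path,
                     time.monotonic() - start)
    return ast
```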
This is primarily useful for ensuring that errors where a node does not
have an appropriate context set in `python.tsg` actually have an effect
on the pass/fail status of the parser tests. Previously, these would
just be logged to stdout, but the tests could still succeed when there
were errors present.
Also fixes one of the logging lines in `tsg_parser.py` to be more
consistent with the others.
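A minimal sketch of the idea, assuming a hypothetical error counter in the test driver:

```python
import sys

def check_tsg_contexts(path: str) -> int:
    return 0  # placeholder: count nodes lacking a context in python.tsg

def run_tests(paths) -> None:
    errors = sum(check_tsg_contexts(p) for p in paths)
    if errors:
        # Previously such errors were only printed; now they fail the run.
        print(f"{errors} context error(s) found", file=sys.stderr)
        sys.exit(1)
```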
Here's an example of one of these errors:
```
INVALID_KEY predicate py_cobjectnames(@py_cobject obj, string name)
The key set {obj} does not functionally determine all fields.
Here is a pair of tuples that agree on the key set but differ at index 1:
Tuple 1 in row 63874: (72088,"u'<X>'")
Tuple 2 in row 63875: (72088,"u'<?>'")
```
(Here, the substring `X` should really be the Unicode character U+FFFD,
but for some reason I'm not allowed to put that in this commit message.)
Inside the extractor, we assign IDs based on the string type (bytestring
or Unicode) and a hash of the UTF-8 encoded content of the string. In
this case, however, certain _different_ strings were receiving the same
hash, due to replacement characters in the encoding process.
In particular, we were converting unencodable characters to question
marks in one place, and to U+FFFD in another place. This caused a
discrepancy that led to the dataset check error.
To fix this, we install a custom error handler that always substitutes
the U+FFFD character for unencodable characters. With this, the strings
now agree, and hence there is no clash.
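A minimal sketch of the fix in Python terms (the handler name and hashing are illustrative, not the extractor's actual code):

```python
import codecs
import hashlib

def _always_ufffd(err: UnicodeError):
    # Substitute U+FFFD for every unencodable character, so the same
    # logical string always produces the same bytes everywhere.
    return ("\ufffd", err.end)

codecs.register_error("always-ufffd", _always_ufffd)

def string_hash(text: str) -> str:
    data = text.encode("utf-8", errors="always-ufffd")
    return hashlib.sha1(data).hexdigest()

# A lone surrogate cannot be encoded as UTF-8; with the handler it maps
# to U+FFFD instead of "?", so the two code paths now agree.
assert string_hash("\udcff") == string_hash("\ufffd")
```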