codeql

mirror of https://github.com/github/codeql.git synced 2026-02-13 13:41:08 +01:00

Author	SHA1	Message	Date
Taus	918c05c538	Python: Don't prune any `MatchLiteralPattern`s Extends the mechanism introduced in https://github.com/github/codeql/pull/18030 to behave the same for _all_ `MatchLiteralPattern`s, not just the ones that happen to be the constant `True` or `False`. Co-authored-by: yoff <yoff@github.com>	2025-02-11 12:58:52 +00:00
Taus	131ec8d22f	Python: Handle loop constructs outside of loops Observed on some test files in Nuitka/Nuitka, having `break` and `continue` outside of loops in Python is (to Python) a syntax error, but our parser happily accepted this broken syntax. This then caused issues further downstream in the control-flow construction, as it broke some invariants. To fix this we now skip the code that would previously fail when the invariants are broken. Co-authored-by: yoff <yoff@github.com>	2025-02-06 14:30:16 +00:00
Taus	a4ccda5fe3	Python: Fix pruning of literals in `match` pattern Co-authored-by: yoff <lerchedahl@gmail.com>	2024-11-19 13:48:13 +00:00
Taus	cc39ae57dc	Python: Fix dataset check error for string encoding Here's an example of one of these errors: ``` INVALID_KEY predicate py_cobjectnames(@py_cobject obj, string name) The key set {obj} does not functionally determine all fields. Here is a pair of tuples that agree on the key set but differ at index 1: Tuple 1 in row 63874: (72088,"u'<X>'") Tuple 2 in row 63875: (72088,"u'<?>'") ``` (Here, the substring `X` should really be the Unicode character U+FFFD, but for some reason I'm not allowed to put that in this commit message.) Inside the extractor, we assign IDs based on the string type (bytestring or Unicode) and a hash of the UTF-8 encoded content of the string. In this case, however, certain _different_ strings were receiving the same hash, due to replacement characters in the encoding process. In particular, we were converting unencodable characters to question marks in one place, and to U+FFFD in another place. This caused a discrepancy that led to the dataset check error. To fix this, we put in a custom error handler that always puts the U+FFFD character in place of unencodable characters. With this, the strings now agree, and hence there is no clash.	2024-10-21 15:31:16 +00:00
Taus	6dec323cfc	Python: Copy Python extractor to `codeql` repo	2024-03-07 13:59:16 +00:00

5 Commits
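
The fix described in commit cc39ae57dc (a custom error handler that always substitutes U+FFFD for unencodable characters) can be illustrated with a minimal sketch. This is not the extractor's actual code, which the log does not show: the handler name `replace_fffd`, the helper `content_hash`, and the use of SHA-1 are all assumptions made for illustration. The sketch relies only on Python's standard `codecs.register_error` mechanism.

```python
import codecs
import hashlib

# UTF-8 bytes for U+FFFD (REPLACEMENT CHARACTER).
REPLACEMENT = "\ufffd".encode("utf-8")


def replace_with_fffd(exc):
    """Hypothetical handler: emit U+FFFD for every unencodable character."""
    if isinstance(exc, UnicodeEncodeError):
        # Return bytes rather than str, so the codec inserts them as-is.
        # One replacement per offending character in exc.start..exc.end.
        return (REPLACEMENT * (exc.end - exc.start), exc.end)
    raise exc


# "replace_fffd" is an illustrative name, not one taken from the extractor.
codecs.register_error("replace_fffd", replace_with_fffd)


def content_hash(text, errors):
    """Hash the UTF-8 encoded content (SHA-1 is assumed for this sketch)."""
    return hashlib.sha1(text.encode("utf-8", errors)).hexdigest()


# A lone surrogate cannot be encoded as UTF-8.
lone_surrogate = "a\ud800b"

# The built-in "replace" handler substitutes a question mark when encoding...
with_question_mark = lone_surrogate.encode("utf-8", "replace")

# ...whereas the custom handler substitutes U+FFFD, matching code paths
# that already use U+FFFD for such characters.
with_fffd = lone_surrogate.encode("utf-8", "replace_fffd")
```

With the built-in `replace` handler, a string containing an unencodable character hashes the same as one containing a literal `?`. With the custom handler the two differ, and the encoded form agrees with any other code path that writes U+FFFD.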