codeql

mirror of https://github.com/github/codeql.git synced 2026-05-27 17:41:24 +02:00

Author	SHA1	Message	Date
yoff	57fa3ee2d4	Python: SSA: handle closure variables via per-scope entry defs The new SSA's implicit entry-def predicate previously placed entries in the variable's defining scope. For closure variables that's the outer function, so inner functions had no entry def for the captured variable — reads in the inner scope failed to resolve to any definition. Mirrors legacy ESSA's 'NonLocalVariable.getScopeEntryDefinition()': place an implicit entry def at every reading scope's entry block, independently of where the variable is defined. A closure variable accessed in two nested functions and the outer one gets three entry defs (one per reading scope). Also makes 'ScopeEntryDefinition' extend 'EssaNodeDefinition' (matching legacy ESSA), with 'getDefiningNode()' returning the scope's entry CFG node. This requires extending the private 'writeDefNode' helper to project i=-1 entries to bb.getNode(0). Updates the new-vs-legacy comparison snapshot: closure-variable reads ('x:32:5'), nested global reads ('GLOBAL:52:1') now resolve. New 'def-only-new' entries appear for unbound names ('sum', 'open', 'compute') — the new SSA uniformly creates scope-entry defs for all non-local reads, including those that legacy ESSA classifies as builtin and excludes. This is a more uniform semantic and arguably cleaner. Updates the SsaTest 'some_undefined' annotation: previously documented as a known limitation, now correctly resolves to a scope-entry def. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-26 16:32:43 +00:00
yoff	ac468c8f37	Python: extend new SSA with ESSA-shaped adapter + baseline comparison test Phase 0.5 - Adapter API on top of the shared SSA: Adds the legacy-ESSA-shaped class hierarchy that the dataflow library consumes, layered on the shared 'Ssa::Make' instantiation: * EssaDefinition / EssaNodeDefinition: the latter exposes 'getDefiningNode()' (the CFG node at the def's index in its BB) and 'getVariable()' / 'getScope()'. * AssignmentDefinition: matches Assign, AnnAssign with value, AssignExpr and AugAssign target Names. Exposes 'getValue()' pointing at the RHS' CFG node. * ParameterDefinition: matches when the defining Name is in parameter context. * WithDefinition: matches 'with ... as x:' bindings. * ScopeEntryDefinition: implicit entry defs at synthetic position '-1' of the scope's entry basic block (non-local / global / builtin / captured reads). * PhiFunction (alias for PhiNode). * EssaVariable adapter wrapping a 'Ssa::Definition' with 'getAUse()', 'getDefinition()', 'getAnUltimateDefinition()', and 'getName()'. * AdjacentUses module with 'firstUse' and 'adjacentUseUse' predicates bridging to 'Ssa::firstUse' / 'Ssa::adjacentUseUse'. This is the minimum API the new dataflow's internals call into. The richer legacy ESSA (refinement nodes, attribute refinements, edge refinements) stays in 'semmle.python.essa.Essa' for legacy code. Phase 0.6 - Comparison test: Adds 'dataflow-new-ssa-vs-legacy/CmpTest.ql' that snapshots the difference between definitions produced by new SSA vs legacy ESSA on the same Python source. Baseline output records the current 'def-only-old' mismatches, grouped by category: * function/class/global definitions with no in-scope read (intentional; SSA is liveness-pruned) * captured / closure variables (real gap in new SSA - no closure-capture handling yet) * module variables __name__ / __package__ / $ (legacy ESSA implicit bindings) * exception 'as' bindings (depend on raise modelling) Zero 'def-only-new' mismatches: the new SSA never produces a spurious definition compared to legacy ESSA on this corpus. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-26 16:32:43 +00:00
yoff	f5bf8ae8dd	Python: fix augstore for the new CFG and add store/load test In the legacy CFG the same Python 'Name' that is the target of an augmented assignment has two distinct CFG nodes — a load node (context 3) earlier in the basic block and a store node (context 5) later. 'augstore(load, store)' relates the pair via dominance. The new (shared) CFG canonicalises each AST expression to a single CFG node, so 'load' and 'store' collapse to one. The dominance-based 'augstore' from the legacy implementation no longer holds (it would require 'load.strictlyDominates(load)'), so 'isAugLoad' / 'isAugStore' never fired and 'isStore' missed the AugAssign target entirely. Redefines 'augstore' as reflexive on the AugAssign target's canonical CFG node. With this change: * isAugLoad / isAugStore both fire on the single canonical node. * isStore fires (via 'or augstore(_, this)') — matching the legacy classification that an augmented-assignment target is a store. * isLoad does not fire (excluded by 'not augstore(_, this)'). Adds 'python/ql/test/library-tests/ControlFlow/store-load/' covering plain load/store/delete, parameters, augmented assignment, tuple unpacking, attribute and subscript stores. The test asserts the classification directly on the new-CFG facade. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-26 16:32:43 +00:00
yoff	79db96c717	Python: introduce shared-SSA adapter on the new CFG Adds 'python/ql/lib/semmle/python/dataflow/new/internal/SsaImpl.qll', a minimal Python SSA implementation built on the shared SSA library ('codeql.ssa.Ssa::Make<Location, Cfg, Input>'). The structure mirrors Java's adapter at 'java/ql/lib/semmle/code/java/dataflow/internal/SsaImpl.qll'. Key design choices: * 'SourceVariable' wraps 'Py::Variable'. Only variables that are read or deleted somewhere are tracked - write-only variables don't benefit from SSA construction. * Variable references are positional ('BasicBlock', 'int') pairs looked up via 'Cfg::NameNode.defines'/'.uses'/'.deletes' (which themselves are one-line bridges to AST-level 'Name.defines' etc.). * Parameter writes are not synthesised: parameter Name nodes are already wired into the CFG (per the earlier C#-style parameter extension in 'AstNodeImpl.qll'), so the regular 'variableWrite' path handles them at their natural CFG index. * Non-local / captured / global / builtin variables read in a scope but not written in it receive a synthetic entry definition at index '-1' of the scope's entry basic block. This matches Java's 'hasEntryDef'. * 'del x' is modelled as a certain write at the deletion site. Includes an inline-expectations test under 'python/ql/test/library-tests/dataflow-new-ssa/' covering: plain parameter pass-through, simple assignment + read, reassignment with dead-write pruning, if/else with phi insertion at the join, and an undefined-name read (currently a known limitation - no SSA flow without an enclosing definition). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-26 16:32:43 +00:00
yoff	3c21bbfbf5	Python: test dead bindings under no-raise CFG abstraction Adds 'dead_under_no_raise.py' to the bindings test suite, capturing the three CPython patterns where bindings legitimately have no CFG node because the surrounding code is unreachable under the 'no expressions raise' abstraction: 1. Statements after a 'try: return X; except: pass' block. 2. The 'else:' clause of a try whose body always raises. 3. Cache-lookup pattern 'try: return cache[k]; except: pass' followed by computation and store. These bindings intentionally carry no 'cfgdefines=' annotations. If raise modelling is later added to the CFG, the BindingsTest will surface the new CFG nodes as unexpected results and this file will need to be revisited. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-26 16:32:43 +00:00
yoff	01c6b2b262	Python: wire PEP 695 type parameters into the shared CFG (green) Adds CFG coverage for the binding 'Name's introduced by PEP 695 type-parameter syntax on functions, classes, and 'type' aliases: def func[T](...): ... class Box[T]: ... def multi[T: int, *Ts, **P](...): ... type Alias[T] = ... For each parametrised AST node, the type-parameter names (and, for 'type' aliases, the alias name itself) are added as children of the enclosing CFG node so that 'Name.defines(v)' has a corresponding position. Bounds and defaults are intentionally not wired (they have no SSA-relevant semantics for our purposes). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-26 16:32:42 +00:00
Copilot	f12307278a	Python: wire match-pattern bindings into the shared CFG (green) Adds concrete `Pattern` subclasses in `AstNodeImpl.qll` for every `MatchPattern` AST kind, with `getChild` overrides that expose sub-patterns and bound Names. Specifically: - MatchCapturePattern (`case x:`) -> getVariable() - MatchAsPattern (`case … as v:`) -> getPattern(), getAlias() - MatchStarPattern (`case [*rest]:`) -> getTarget() - MatchSequencePattern (`case [a, b]:`) -> getPattern(i) - MatchClassPattern (`case Cls(p, q, k=v)`) -> getClass(), positional, keyword - MatchMappingPattern (`case {k: v}:`) -> getMapping(i) - MatchKeyValuePattern, MatchKeywordPattern, MatchDoubleStarPattern - MatchOrPattern, MatchLiteralPattern, MatchValuePattern Without these, every Name bound by a match pattern lacked a CFG node. Removes the corresponding MISSING: annotations from match_pattern.py (all 11 cases). Verified: all 24 ControlFlow/evaluation-order tests still pass. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-26 16:32:42 +00:00
Copilot	ba9dc9f5f1	Python: wire import-statement bindings into the shared CFG (green) Adds `ImportStmt` and `ImportStarStmt` wrappers in `AstNodeImpl.qll`. For each `Alias` in an import statement, both the value (module/member expression) and the bound `asname` Name become children of the CFG node for the import statement, in evaluation order. Without this, every `Name` introduced by `import` / `from .. import ..` lacked a CFG node, even though `Name.defines(v)` returns true for it on the AST side. This was the highest-volume gap: 20,332 missing import aliases across CPython. Removes the corresponding MISSING: annotations from imports.py. Verified: all 24 ControlFlow/evaluation-order tests still pass. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-26 16:32:42 +00:00
Copilot	768ebc1e2d	Python: wire parameters into the shared CFG (C# pattern) Implements `AstSig::Parameter` and `callableGetParameter(c, i)` in `AstNodeImpl.qll`, following the C# template (`csharp/.../ControlFlowGraph.qll:147-156`) rather than Java's `Parameter() { none() }`. Each Python parameter (positional, *args, keyword-only, **kwargs) now becomes a CFG node at a stable position in the enclosing callable's entry sequence. Defaults still evaluate at function-definition time via `FunctionDefExpr.getDefault` / `LambdaExpr.getDefault`, so `Parameter::getDefaultValue()` returns `none()` (the shared CFG library calls this to model the missing-argument fallback, which Python does not surface at the CFG level). The bindings test now exercises parameters (the `py_expr_contexts(_, 4, ...)` exclusion has been removed). A new `parameters.py` test case covers positional, defaulted, vararg, kwarg, keyword-only, kitchen-sink, method (self/cls), lambda, and PEP 570 positional-only parameters. Several other test files were updated to annotate parameters that the test had previously hidden (synthetic `.0` comprehension parameter, method `self`, decorator `f`, etc.). Verified: - All 24 ControlFlow/evaluation-order tests still pass. - CFG consistency query (`python/ql/consistency-queries/CfgConsistency.ql`) shows zero violations on CPython. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-26 16:32:42 +00:00
Copilot	5d60a0d7c1	Python: wire AnnAssign into the shared CFG (green) Adds an `AnnAssignStmt` wrapper in `AstNodeImpl.qll` so that PEP 526 annotated assignments (`x: int = 1`, `x: int`) participate in the control flow graph. Evaluation order follows CPython: annotation, optional value, target binding. Without this, `x: int = 1` had no CFG node for `x` even though `Name.defines(v)` returns true for it on the AST side. SSA built on the new CFG would therefore miss every annotated-assignment write. Removes the corresponding MISSING: annotations from the CFG-binding gap test: - annassign.py — all four cases now green. - match_pattern.py — class-body annotated fields (`x: int`, `y: int`). - type_params.py — `item: T` inside class. Verified: all 24 ControlFlow/evaluation-order tests still pass. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-26 16:32:42 +00:00
Copilot	336c7a44a8	Python: add CFG-binding gap tests (red) Adds inline-expectation tests for the new shared CFG implementation in python/ql/lib/semmle/python/controlflow/internal/AstNodeImpl.qll, covering every Python binding construct that introduces a variable. The test files use MISSING: annotations to record bindings whose defining Name AST node is not currently reachable from the new CFG. These are the 'red' half of red-green commit pairs: subsequent commits will extend AstNodeImpl to cover each construct and remove the corresponding MISSING: marker. Confirmed-broken categories: - Import aliases (from x import a) - Annotated assignment (x: int = 1) - Exception handler (except E as e) - Match patterns (case x, case [a,b], case ... as v) - PEP 695 type params (def f[T], class C[T]) Confirmed-working (no MISSING:): - Compound targets, with-as, comprehensions, decorated def/class, walrus, starred. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-26 16:32:42 +00:00
Copilot	577cf4a630	Shared CFG: support for-else and while-else loops Add two default predicates to AstSig: default AstNode getWhileElse(WhileStmt loop) { none() } default AstNode getForeachElse(ForeachStmt loop) { none() } When defined, the explicit-step rules for While/Do and Foreach route the loop's normal-completion exits through the else block before reaching the after-loop node: - WhileStmt: after-false condition -> before-else -> after-while (instead of directly after-while). - ForeachStmt: after-collection [empty] and the LoopHeader exit are both routed through before-else -> after-foreach. Python's Ast module overrides the predicates to return the synthetic BlockStmt for the orelse slot, replacing the previous customisations in Input::step. This eliminates parallel direct successors emitted by the previous Python-side step additions (verified: multipleSuccessors on a CPython database goes from 1340 to 0). Java and C# CFG tests are unaffected. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-26 16:32:39 +00:00
Taus	28567870ac	WIP2	2026-05-26 16:32:38 +00:00
Taus	f5629a5583	WIP	2026-05-26 16:32:38 +00:00
Taus	75a3168c09	Python: Ignore synthetic CFG nodes We can only annotate the ones that correspond directly to AST nodes anyway. Co-authored-by: yoff <yoff@github.com>	2026-05-26 16:32:37 +00:00
Taus	49c38dddb7	Python: Instantiate CFG tests with new CFG library Co-authored-by: yoff <yoff@github.com>	2026-05-26 16:32:36 +00:00
Taus	166b3226ac	Python: Make CFG tests parameterised Currently we only instantiate them with the old CFG library, but in the future we'll want to do this with the new library as well. Co-authored-by: yoff <yoff@github.com>	2026-05-26 16:32:36 +00:00
Taus	66bdd22a14	Python: Add ConsecutiveTimestamps test This one is potentially a bit iffy -- it checks for a very powerful property (that implies many of the other queries), but as the test results show, it can produce false positives when there is in fact no problem. We may want to get rid of it entirely, if it becomes too noisy.	2026-05-26 16:32:36 +00:00
Taus	e21b6b9b2e	Python: Add NeverReachable test This looks for nodes annotated with `t.never` in the test that are reachable in the CFG. This should not happen (it messes with various queries, e.g. the "mixed returns" query), but the test shows that in a few particular cases (involving the `match` statement where all cases contain `return`s), we _do_ have reachable nodes that shouldn't be.	2026-05-26 16:32:36 +00:00
Taus	500dec3f67	Python: Add BasicBlockOrdering test This one demonstrates a bug in the current CFG. In a dictionary comprehension `{k: v for k, v in d.items()}`, we evaluate the value before the key, which is incorrect. (A fix for this bug has been implemented in a separate PR.)	2026-05-26 16:32:36 +00:00
Taus	29ce07c204	Python: Add some CFG-validation queries These use the annotated, self-verifying test files to check various consistency requirements. Some of these may be expressing the same thing in different ways, but it's fairly cheap to keep them around, so I have not attempted to produce a minimal set of queries for this.	2026-05-26 16:32:36 +00:00
Taus	6e77a45fb3	Python: Add self-validating CFG tests These tests consist of various Python constructions (hopefully a somewhat comprehensive set) with specific timestamp annotations scattered throughout. When the tests are run using the Python 3 interpreter, these annotations are checked and compared to the "current timestamp" to see that they are in agreement. This is what makes the tests "self-validating". There are a few different kinds of annotations: the basic `t[4]` style (meaning this is executed at timestamp 4), the `t.dead[4]` variant (meaning this _would_ happen at timestamp 4, but it is in a dead branch), and `t.never` (meaning this is never executed at all). In addition to this, there is a query, MissingAnnotations, which checks whether we have applied these annotations maximally. Many expression nodes are not actually annotatable, so there is a sizeable list of excluded nodes for that query.	2026-05-26 16:32:35 +00:00
Taus	ac23e16786	Python: Move Python 3.15 data-flow tests to a separate file We won't be able to run these tests until Python 3.15 is actually out (and our CI is using it), so it seemed easiest to just put them in their own test directory.	2026-04-17 13:16:46 +00:00
Taus	dc36609743	Python: Add data-flow tests Alas, all these demonstrate is that we already don't fully support the desugared `yield from` form.	2026-04-17 12:15:04 +00:00
Taus	8b1ecf05c9	Python: Update test output This change reflects the `(value, key)` to `(key, value)` fix in an earlier commit.	2026-04-14 13:27:31 +02:00
Taus	fa61f6f3df	Python: Model `@typing.overload` in method resolution Adds `hasOverloadDecorator` as a predicate on functions. It looks for decorators called `overload` or `something.overload` (usually `typing.overload` or `t.overload`). These are then filtered out in the predicates that (approximate) resolving methods according to the MRO. As the test introduced in the previous commit shows, this removes the spurious resolutions we had before.	2026-03-05 22:20:03 +00:00
Taus	0561a63003	Python: Add test for overloaded `__init__` resolution Adds a test showing that `@typing.overload` stubs are spuriously resolved as call targets alongside the actual `__init__` implementation.	2026-03-05 22:20:03 +00:00
Owen Mansel-Chan	99a4fe4828	Update expected test output column numbers	2026-03-04 15:02:53 +00:00
Owen Mansel-Chan	aa28c94562	Remove double space after $ in inline expectations tests	2026-03-04 14:12:42 +00:00
Owen Mansel-Chan	91b6801db1	py: Inline expectation should have space before $	2026-03-04 13:11:38 +00:00
Owen Mansel-Chan	5a97348e78	python: Inline expectation should have space after $ This was a regex-find-replace from `# \$(?! )` (using a negative lookahead) to `# $ `.	2026-03-04 12:45:05 +00:00
yoff	600f585a31	Merge pull request #21296 from yoff/python/bool-comparison-guards Python: Handle guards being compared to boolean literals	2026-02-26 21:13:51 +01:00
Taus	6bfb1e1fae	Merge pull request #21344 from github/tausbn/python-remove-points-to-from-metrics-libraries Python: Remove points-to from metrics library	2026-02-24 15:55:16 +01:00
yoff	7351e82c92	python: handle guards compared to boolean literals	2026-02-24 10:00:22 +01:00
yoff	8488039fb9	python: add tests for guards compared to booleans	2026-02-24 10:00:21 +01:00
Taus	e8de8433f4	Python: Update all metrics-dependent queries The ones that no longer require points-to no longer import `LegacyPointsTo`. The ones that do use the specific `...MetricsWithPointsTo` classes that are applicable.	2026-02-19 12:32:27 +00:00
Taus	248932db7a	Python: Fix `frameworks/data/warnings.ql`	2026-02-16 13:48:32 +00:00
Taus	df0f2f8ce4	Python: Simple dataflow annotations None of these required any changes to the dataflow libraries, so it seemed easiest to put them in their own commit.	2026-02-16 13:48:32 +00:00
Taus	958c798c3f	Python: Accept dataflow test changes New nodes means new results. Luckily we rarely have a test that selects _all_ dataflow nodes.	2026-01-30 12:50:25 +00:00
Taus	ac5a74448f	Python: Fix tests With `ModuleVariableNode`s now appearing for _all_ global variables (not just the ones that actually seem to be used), some of the tests changed a bit. Mostly this was in the form of new flow (because of new nodes that popped into existence). For some inline expectation tests, I opted to instead exclude these results, as there was no suitable location to annotate. For the normal tests, I just accepted the output (after having vetted it carefully, of course).	2026-01-30 12:50:25 +00:00
Taus	34800d1519	Merge pull request #20945 from joefarebrother/python-websockets Python: Model remote flow sources for the `websockets` library	2026-01-29 15:47:46 +01:00
Tom Hvitved	b974a84bef	Merge pull request #21051 from hvitved/shared/flow-summary-provenance-filtering Shared: Provenance-based filtering of flow summaries	2026-01-26 17:24:34 +01:00
Tom Hvitved	0adece7cde	Python: Adapt to changes in `FlowSummaryImpl`	2026-01-26 12:40:19 +01:00
yoff	3dbfb9fa4b	python: add machinery for MaD barriers and reinstate previously removed barrier now as a MaD row	2026-01-22 17:30:24 +01:00
yoff	1ac3706e75	Python support `ListElement` in MaD	2026-01-09 13:08:06 +01:00
yoff	5c6d83ed65	Merge pull request #20877 from joefarebrother/python-tornado-websocket Python: Add models for websocket handlers for Tornado	2025-12-09 10:08:59 +01:00
Taus	1b519384d7	Merge pull request #20739 from github/tausbn/python-remove-top-level-points-to-imports Python: Hide points-to imports in `python.qll`	2025-12-05 14:24:41 +01:00
Joe Farebrother	ac55cf9544	Update test and qldoc	2025-12-01 20:41:59 +00:00
Joe Farebrother	7cf3964e44	Update expectations	2025-12-01 20:27:48 +00:00
Joe Farebrother	384e17a4ef	Implement websockets models	2025-12-01 16:24:59 +00:00
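
Commit 577cf4a630 above routes a loop's normal-completion exit through the `else` block before reaching the after-loop node. As a reminder of the Python semantics being modelled, here is a minimal sketch; the function names `first_negative` and `countdown` are hypothetical and do not come from the repository.

```python
def first_negative(values):
    """Return the first negative value, or None if the loop completes normally."""
    for v in values:
        if v < 0:
            found = v
            break  # leaves the loop without running the else block
    else:
        # Reached only on normal completion: the iterable was empty or
        # was exhausted without hitting `break`.
        found = None
    return found


def countdown(n):
    """Count down with a while loop; the else block runs once the condition is false."""
    steps = []
    while n > 0:
        steps.append(n)
        n -= 1
    else:
        # No `break` above, so this always runs after the condition fails.
        steps.append("done")
    return steps
```

In both functions the `else` block sits on the path from the loop's normal exit to the code after the loop, which is the extra edge the commit message describes.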

1335 Commits
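
Commit 500dec3f67 notes that in a dictionary comprehension the key should be evaluated before the value. The order can be observed at runtime with a small sketch, assuming Python 3.8 or later; the helpers `record` and `comprehension_order` are hypothetical names used only for illustration and are not part of the repository's test suite.

```python
def record(log, label, value):
    """Append a label to the log as a side effect, then return the value unchanged."""
    log.append(label)
    return value


def comprehension_order(pairs):
    """Build a dict via a comprehension and report the order of evaluation."""
    log = []
    result = {
        record(log, f"key:{k}", k): record(log, f"value:{v}", v)
        for k, v in pairs
    }
    return result, log
```

Calling `comprehension_order([("a", 1)])` logs the key label before the value label, which is the order a CFG for the comprehension needs to reproduce.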