Commit Graph

2538 Commits

Author SHA1 Message Date
Copilot
b2ff09f70a Python: add new shared-SSA-backed SSA adapter
Preparatory refactor for the shared-CFG dataflow migration. Adds the
new Python SSA adapter additively, without changing any production
behaviour.

Library additions:

- semmle.python.dataflow.new.internal.SsaImpl — Python SSA
  implementation built on the new (shared) CFG. Mirrors the Java SSA
  adapter (java/ql/lib/semmle/code/java/dataflow/internal/SsaImpl.qll):
  an InputSig is defined in terms of positional (BasicBlock, int)
  variable references, and the shared
  codeql.ssa.Ssa::Make<Location, Cfg, Input> module is then
  instantiated.

  SourceVariable is the AST-level Py::Variable. Variable references
  are looked up via the new CFG facade's NameNode.defines/uses/deletes
  predicates (added in the preceding PR), which themselves are
  one-line bridges to AST-level Name.defines/uses/deletes.

  Implicit-entry definitions are inserted for non-local/global/builtin
  reads, captured variables, and (when needed) parameters.

Test additions:

- library-tests/dataflow-new-ssa/ — exercises the new SSA over a
  representative test corpus and checks expected def/use chains.

- library-tests/dataflow-new-ssa-vs-legacy/ — runs both new SSA and
  legacy ESSA over the same corpus and diffs the results, so any
  semantic divergence shows up as a test failure.

Production impact:

None. The new SSA adapter has zero callers in lib/ and src/ — the
legacy ESSA SSA (semmle/python/essa/*) remains the default. The
dataflow library is not migrated yet; that lands in a follow-up PR.

Verified by:
- All 367 lib + src + consistency-queries compile clean.
- All 641 ControlFlow + PointsTo + dataflow + essa + consistency
  library-tests pass.
- Both new dataflow-new-ssa[/vs-legacy] test packs pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-05 08:03:48 +00:00
Copilot
4aee0b3c87 Python: add new shared-CFG-backed control flow graph
Preparatory refactor for the shared-CFG dataflow migration. Adds the
new Python CFG library additively, without changing any production
behaviour.

Library additions:

- semmle.python.controlflow.internal.AstNodeImpl — mediates between
  the Python AST and the shared codeql.controlflow.ControlFlowGraph
  signature. Wraps Python's Stmt/Expr/Scope/Pattern and adds two
  synthetic kinds of node (BlockStmt for body slots, intermediate
  nodes for multi-operand boolean expressions).

- semmle.python.controlflow.internal.Cfg — public facade
  re-exposing the same API surface as semmle/python/Flow.qll
  (ControlFlowNode, CallNode, BasicBlock, NameNode, DefinitionNode,
  CompareNode, ...), backed by the shared CFG.

- lib/printCfgNew.ql — debug/visualisation query for the new CFG.

- consistency-queries/CfgConsistency.ql — consistency query running
  the shared CFG's standard checks against Python.

Shared library:

- shared.controlflow.ControlFlowGraph — adds two defaulted
  getWhileElse / getForeachElse predicates to AstSig so Python can
  model while-else / for-else (no behavioural change for other
  languages).

Test additions:

- ControlFlow/bindings/* — annotation-driven SSA-binding tests for
  the new CFG (annassign, compound, comprehension, decorated,
  except_handler, imports, match_pattern, parameters, simple,
  type_params, walrus_starred, with_stmt, dead_under_no_raise).

- ControlFlow/store-load/* — basic store/load coverage.

- ControlFlow/evaluation-order/NewCfg*.ql — mirrors of the existing
  OldCfg evaluation-order self-validation suite, run against the
  new CFG via NewCfgImpl.qll.

- Minor extensions to existing test_if.py / test_boolean.py +
  cosmetic .expected churn on a handful of OldCfg tests.

No dataflow, SSA, or production query is migrated yet — that lands in
follow-up PRs. The new CFG library has zero callers in lib/ and src/.

Verified by:
- All lib + src + consistency-queries compile clean (367 queries).
- All 56 ControlFlow library-tests pass.
- All 474 dataflow + PointsTo library-tests + consistency tests pass.
- syntax_error/CONSISTENCY/CfgConsistency passes.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-05 08:03:29 +00:00
yoff
a2295e7216 Apply suggestions from code review
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
2026-06-05 07:57:43 +00:00
Copilot
0623acc7f5 Python: qualify Flow.qll's AST references with Py:: prefix
Preparatory refactor for the shared-CFG dataflow migration. Switches
'import python' to 'import python as Py' inside Flow.qll, and qualifies
every AST-class reference (Expr, Bytes, Dict, AssignExpr, Compare,
Module, Scope, Call, Attribute, SsaVariable, AugAssign, etc.) with the
Py:: prefix.

Flow.qll's own CFG types (ControlFlowNode, BasicBlock, CallNode,
NameNode, DefinitionNode, CompareNode, ...) keep their unqualified
names — they remain the public CFG API exported from this file.

This is a semantic noop: the qualification was applied mechanically by
script and no name resolution changes. Verified by:
- All 361 lib/ + src/ queries compile clean.
- All 186 ControlFlow + PointsTo + dataflow library-tests pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-05 07:57:42 +00:00
yoff
dc5aa8e0f5 Python: deprecate Function.getAReturnValueFlowNode() and rewrite internal callers
Follow-up to the getAFlowNode deprecation in the same PR: same AST→legacy-CFG
bridge pattern. The 11 internal call sites (across objects/, types/,
frameworks/, and TypeTrackingImpl) are rewritten to bind a `Return ret`
explicitly, then constrain via `ret.getScope() = f and n.getNode() = ret.getValue()`.

The predicate itself is preserved with a deprecation note so external
users do not experience churn.

Semantic noop.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-05 07:57:42 +00:00
Copilot
db1e5035b4 Python: deprecate AstNode.getAFlowNode() and rewrite internal callers
Preparatory refactor for the shared-CFG dataflow migration.

Deprecates the AstNode.getAFlowNode() cached predicate on the public
Python QL API and rewrites all ~140 internal callers across lib/, src/,
test/, and tools/ from `expr.getAFlowNode() = cfgNode` to
`cfgNode.getNode() = expr`, using ControlFlowNode.getNode() which
already exists in Flow.qll.

The predicate itself is preserved (with a deprecation note pointing at
the new pattern) so external users do not experience churn — they can
migrate at their own pace and the AST/CFG hierarchies still get the
intended untangling once the deprecation eventually elapses.

Semantic noop verified by:
- All 361 lib/ + src/ queries compile clean.
- All 122 ControlFlow + PointsTo library-tests pass.
- All 64 dataflow library-tests pass.
- All 113 Variables/Exceptions/Expressions/Statements/Functions/Imports/
  Security/CWE-798/ModificationOfParameterWithDefault query-tests pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-05 07:57:42 +00:00
yoff
7a3f546587 Python: inline init_module_submodule_defn into ImportResolution
The new-dataflow ImportResolution module only used
semmle.python.essa.SsaDefinitions for the 5-line helper predicate
SsaSource::init_module_submodule_defn. Inline it locally and drop the
dependency on legacy SsaDefinitions. This is the only remaining direct
import of semmle.python.essa.* in the new dataflow stack, so dropping
it makes the layering cleaner.

Semantic noop on the current SSA: SsaSourceVariable.getName() and
GlobalVariable.getId() both project the same DB column
(variable(_,_,result)), and the old call's 'init.getEntryNode() = f'
join was just constraining init = package via Scope.getEntryNode()'s
functional uniqueness. RA dump of accesses.ql confirms only the
expected predicate-rename shuffle; all 70 dataflow + ApiGraphs library
tests pass.

This factors out commit 8cab5a20f2 from the larger shared-CFG
migration #21925.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-05 07:57:42 +00:00
yoff
821325b7e5 Python: simplify decorator-detection predicates to pure AST match
The internal predicates that identify `@staticmethod`, `@classmethod` and
`@property` decorators previously required the decorator's `NameNode` to
satisfy `isGlobal()` (i.e. no SSA def reaches the decorator's name use).
That filter was correct but unnecessarily indirect: these three names
are builtins, and even when a class body redefines one, the class body
has not started executing at the decorator position, so Python uses the
builtin.

Match the decorator's AST `Name` directly instead, dropping the CFG/SSA
detour. The slight semantic change — `isGlobal()` would have rejected
module-level shadowing of these builtins — is negligible in practice
and explicitly documented in the change note.

`hasContextmanagerDecorator` and `hasOverloadDecorator` keep the
`NameNode.isGlobal()` check because their target names (`contextmanager`,
`overload`) are imported, not builtin, and local shadowing is a real
concern.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-05 07:57:42 +00:00
Josef Svenningsson
68be006a29 Merge pull request #21641 from github/josefs/promptInjectionImprovements
Improve prompt inject for Python
2026-04-29 11:23:52 +01:00
Josef Svenningsson
bb18bb084c Improve prompt inject for Python 2026-04-28 18:24:16 +01:00
Taus
c748fdf8ee Merge pull request #21694 from github/tausbn/python-add-support-for-pep-810
Python: Add support for PEP 810
2026-04-14 13:27:08 +02:00
Owen Mansel-Chan
7458674470 Merge pull request #21584 from owen-mc/shared/update-mad-comments
Shared: update code comments explaining models-as-data format to include barriers and barrier guards
2026-04-14 09:30:28 +01:00
Taus
1ddfed6b6b Python: Add QL support for lazy imports
Adds a new `isLazy` predicate to the relevant classes, and adds the
relevant dbscheme (and up/downgrade) changes. On upgrades we do nothing,
and on downgrades we remove the `is_lazy` bits.
2026-04-10 14:25:08 +00:00
Taus
16683aee0e Merge pull request #21590 from github/tausbn/python-improve-bind-all-interfaces-query
Python: Improve "bind all interfaces" query
2026-04-07 17:59:48 +02:00
Owen Mansel-Chan
37aac05964 Replace branch with acceptingValue 2026-03-27 22:39:10 +00:00
Owen Mansel-Chan
10fddc7b96 Add barriers and barrier guards to MaD format explanations 2026-03-27 09:47:24 +00:00
Taus
c439fc5d45 Python: Replace type tracking with global data-flow
This takes care of most of the false negatives from the preceding
commit.

Additionally, we add models for some known wrappers of `socket.socket`
from the `gevent` and `eventlet` packages.
2026-03-26 15:35:33 +00:00
Taus
ac48eca916 Python: Use cls.getMethod instead of getName 2026-03-23 15:26:00 +00:00
Taus
93e35661e6 Python: Make isNewType more precise
For module-level metaclass declarations, we now also check that the
right hand side in a `__metaclass__ = type` assignment is in fact the
built-in `type`.
2026-03-23 15:22:24 +00:00
Taus
a276f721f7 Python: Add ternary overridesMethod
This one also allows easy access to the method being overridden and the
class on which it resides. This let's us simplify DocStrings.ql
accordingly.
2026-03-23 15:21:27 +00:00
Taus
56c83e250e Python: Make comment more precise
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-03-23 15:09:27 +01:00
Taus
50b3b7ee1f Python: Add DuckTyping::hasUnreliableMro
Primarily used to filter out false positives in cases where our MRO
approximation may be wrong.
2026-03-20 13:30:29 +00:00
Taus
c04b615a07 Python: Extend DuckTyping module
Adds `overridesMethod` and `isPropertyAccessor`.
2026-03-20 13:28:45 +00:00
Taus
b57e92164c Python: Add declares/getAttribute API
These could arguably be moved to `Class` itself, but for now I'm
choosing to limit the changes to the `DuckTyping` module (until we
decide on a proper API).
2026-03-20 13:28:45 +00:00
Taus
cd92162920 Python: Add DuckTyping::isNewStyle
Approximates the behaviour of `Types::isNewStyle` but without depending
on points-to
2026-03-20 13:28:45 +00:00
Taus
33ed6034f6 Python: Introduce DuckTyping module
This module (which for convenience currently resides inside
`DataFlowDispatch`, but this may change later) contains convenience
predicates for bridging the gap between the data-flow layer and the old
points-to analysis.
2026-03-20 13:28:44 +00:00
Taus
f4841e1f39 Python: Use API graphs instead of points-to for simple built-ins
Also extends the list of known built-ins slightly, to add some that were
missing.
2026-03-20 13:28:44 +00:00
Taus
a99b3f2c3b Merge pull request #21459 from github/tausbn/python-fix-missing-relative-imports
Python: Fix resolution of relative imports from namespace packages
2026-03-16 14:59:44 +01:00
Taus
e16bb226c0 Python: Fix resolution of relative imports from namespace packages
The fix may look a bit obscure, so here's what's going on.

When we see `from . import helper`, we create an `ImportExpr` with level
equal to 1 (corresponding to the number of dots). To resolve such
imports, we compute the name of the enclosing package, as part of
`ImportExpr.qualifiedTopName()`. For this form of import expression, it
is equivalent to `this.getEnclosingModule().getPackageName()`. But
`qualifiedTopName` requires that `valid_module_name` holds for its
result, and this was _not_ the case for namespace packages.

To fix this, we extend `valid_module_name` to include the module names
of _any_ folder, not just regular package (which are the ones where
there's a `__init__.py` in the folder). Note that this doesn't simply
include all folders -- only the ones that result in valid module names
in Python.
2026-03-12 13:29:23 +00:00
Taus
f2bad1e6e1 Python: Improve docstring and make predicate private 2026-03-09 13:41:38 +00:00
Taus
fa61f6f3df Python: Model @typing.overload in method resolution
Adds `hasOverloadDecorator` as a predicate on functions. It looks for
decorators called `overload` or `something.overload` (usually
`typing.overload` or `t.overload`). These are then filtered out in the
predicates that (approximate) resolving methods according to the MRO.

As the test introduced in the previous commit shows, this removes the
spurious resolutions we had before.
2026-03-05 22:20:03 +00:00
yoff
600f585a31 Merge pull request #21296 from yoff/python/bool-comparison-guards
Python: Handle guards being compared to boolean literals
2026-02-26 21:13:51 +01:00
yoff
89e5a9bd72 Update python/ql/lib/semmle/python/dataflow/new/internal/DataFlowPublic.qll
Co-authored-by: Taus <tausbn@github.com>
2026-02-26 13:14:26 +01:00
yoff
cfbae50845 Python: convert barrier guard to MaD 2026-02-26 13:12:34 +01:00
yoff
9b9c9304c7 Python: simplify logic, suggested in review 2026-02-25 18:16:38 +01:00
yoff
c4f8748a42 Python: simplify barrier guard 2026-02-25 18:03:40 +01:00
Taus
6bfb1e1fae Merge pull request #21344 from github/tausbn/python-remove-points-to-from-metrics-libraries
Python: Remove points-to from metrics library
2026-02-24 15:55:16 +01:00
yoff
7351e82c92 python: handle guards compared to boolean literals 2026-02-24 10:00:22 +01:00
Taus
480ae619e6 Merge pull request #21116 from github/tausbn/python-add-dataflow-overlay-annotations
Add `overlay[local]` annotations
2026-02-21 13:44:09 +01:00
Taus
20fea3955e Python: Remove points-to from Metrics.qll
Moves the classes/predicates that _actually_ depend on points-to to the
`LegacyPointsTo` module, leaving behind a module that contains all of
the metrics-related stuff (line counts, nesting depth, etc.) that don't
need points-to to be evaluated.

Consequently, `Metrics` is now no longer a private import in
`python.qll`.
2026-02-19 12:32:27 +00:00
Taus
6b6d8862b0 Merge pull request #21288 from microsoft/azure_python_sanitizer_upstream2
Azure python sanitizer upstream2
2026-02-18 14:59:59 +01:00
Ben Rodes
ceb3b21e0f Update python/ql/lib/semmle/python/security/dataflow/ServerSideRequestForgeryCustomizations.qll
Co-authored-by: Taus <tausbn@github.com>
2026-02-17 10:28:43 -05:00
Taus
cd62cdadff Python: Fix bad join in returnStep 2026-02-16 16:48:08 +00:00
Taus
304cd12fff Python: Fix bad join in missing_imported_module
This caused a ~30x blowup in intermediate tuples, now back to baseline.
2026-02-16 13:48:33 +00:00
Taus
987b10ab3e Python: Fix bad join in OutgoingRequestCall
On `keras-team/keras`, this was producing ~200 million intermediate
tuples in order to produce a total of ... 2 tuples.

After the refactor, max intermediate tuple count is ~80k for the
charpred (and 4 for the new helper predicate).
2026-02-16 13:48:33 +00:00
Taus
72f5109ec2 Python: Add more overlay[caller] to Flow.qll
These were causing the repo `gufolabs/noc` to spend ~30 seconds
evaluating `ControlFlowNode.strictlyDominates`. Just in case, I added
`overlay[caller] to the other instances of `pragma[inline]` as well.
2026-02-16 13:48:33 +00:00
Taus
306d7d1b5d Python: DataFlowDispatch.qll annotations 2026-02-16 13:48:32 +00:00
Taus
7ea96c43ec Python: DataFlowPrivate.qll annotations 2026-02-16 13:48:32 +00:00
Taus
bd71db87be Python: DataFlowPublic.qll annotations 2026-02-16 13:48:32 +00:00
Taus
c46c662b72 Python: LocalSources.qll annotations 2026-02-16 13:48:32 +00:00