Mirror of https://github.com/github/codeql.git (synced 2026-06-24 22:27:03 +02:00)
Flips the Python dataflow trunk from the legacy CFG (semmle/python/Flow.qll) and legacy ESSA SSA (semmle/python/essa/*) to the new shared CFG facade (semmle.python.controlflow.internal.Cfg) and the new SSA adapter (semmle.python.dataflow.new.internal.SsaImpl), both introduced additively in the preceding PRs in this stack.

This is the trunk-flip equivalent of the original draft PR #21894 (kept around as documentation), rebased on top of the four preparatory PRs:

- P1: Remove AstNode.getAFlowNode() and rewrite callers (#21919).
- P2: Qualify Flow.qll's AST references with Py:: prefix (#21920).
- P3: Add new shared-CFG-backed control flow graph (#21921).
- P4: Add new shared-SSA-backed SSA adapter (#21923).

The Python dataflow library (semmle/python/dataflow/new/) now imports the new CFG facade and SSA adapter. All CFG-typed predicates (ControlFlowNode, CallNode, BasicBlock, NameNode, AttrNode, ...) are qualified with the Cfg:: prefix; SSA references switch from EssaVariable/EssaDefinition to SsaImpl::Definition/SourceVariable.

GuardNode is redesigned to use the new CFG's outcome-node model (isAfterTrue / isAfterFalse) instead of the legacy ConditionBlock + flipped indirection. Only BarrierGuard<...> is preserved as public API.

Framework files (Bottle, FastApi, Django, Tornado, Pyramid, Stdlib, ...) are updated to take CFG nodes from the new facade.

A handful of dataflow consistency tweaks for the new CFG:

- Augmented-assignment targets are treated as both load and store.
- 'from X import *' produces uncertain SSA writes for unknown names.
- CFG nodes are canonicalised so dataflow does not see equivalent pre/post-order pairs as distinct nodes.

Two AST tweaks for the new CFG:

- AstNodeImpl: omit PEP 695 type-parameter names from FunctionDefExpr / ClassDefExpr children.
- ImportResolution: drop the legacy essa import.

Test churn (~175 files): reblessed library- and query-test .expected files reflect slightly different CFG granularity, different toString output, and a handful of true alert deltas in security queries.

Verification: all 367 lib + src + consistency-queries compile clean.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
79 lines
3.0 KiB
Plaintext
/**
 * Provides classes modeling security-relevant aspects of the `PyYAML` PyPI package
 * (imported as `yaml`)
 *
 * See
 * - https://pyyaml.org/wiki/PyYAMLDocumentation
 * - https://pyyaml.docsforge.com/master/documentation/
 */

private import python
private import semmle.python.controlflow.internal.Cfg as Cfg
private import semmle.python.dataflow.new.DataFlow
private import semmle.python.Concepts
private import semmle.python.ApiGraphs

/**
 * Provides classes modeling security-relevant aspects of the `PyYAML` PyPI package
 * (imported as `yaml`)
 *
 * See
 * - https://pyyaml.org/wiki/PyYAMLDocumentation
 * - https://pyyaml.docsforge.com/master/documentation/
 */
private module Yaml {
  /**
   * A call to any of the loading functions in `yaml` (`load`, `load_all`, `full_load`,
   * `full_load_all`, `unsafe_load`, `unsafe_load_all`, `safe_load`, `safe_load_all`)
   *
   * See https://pyyaml.org/wiki/PyYAMLDocumentation (you will have to scroll down).
   */
  private class YamlLoadCall extends Decoding::Range, DataFlow::CallCfgNode {
    override Cfg::CallNode node;
    string func_name;

    YamlLoadCall() {
      func_name in [
          "load", "load_all", "full_load", "full_load_all", "unsafe_load", "unsafe_load_all",
          "safe_load", "safe_load_all"
        ] and
      this = API::moduleImport("yaml").getMember(func_name).getACall()
    }

    /**
     * This function was thought safe from the 5.1 release in 2019, when the default
     * loader was changed to `FullLoader` (see
     * https://github.com/yaml/pyyaml/wiki/PyYAML-yaml.load(input)-Deprecation).
     *
     * In 2020 new exploits were found, meaning it's not safe. With the 6.0 release (see
     * https://github.com/yaml/pyyaml/commit/8cdff2c80573b8be8e8ad28929264a913a63aa33),
     * when using `load` and `load_all` you are now required to specify a Loader. But
     * from what I (@RasmusWL) can gather, `FullLoader` is not to be considered safe,
     * although known exploits have been mitigated (that is at least my impression). Also see
     * https://github.com/yaml/pyyaml/issues/420#issuecomment-696752389 for more
     * details.
     */
    override predicate mayExecuteInput() {
      func_name in ["full_load", "full_load_all", "unsafe_load", "unsafe_load_all"]
      or
      func_name in ["load", "load_all"] and
      // Unless `Loader` is explicitly set to one of the safe loaders listed below, the call
      // is treated as unsafe. This covers both an unsafe loader and no loader at all.
      not exists(DataFlow::Node loader_arg |
        loader_arg in [this.getArg(1), this.getArgByName("Loader")]
      |
        loader_arg =
          API::moduleImport("yaml")
              .getMember(["SafeLoader", "BaseLoader", "CSafeLoader", "CBaseLoader"])
              .getAValueReachableFromSource()
      )
    }

    override DataFlow::Node getAnInput() { result in [this.getArg(0), this.getArgByName("stream")] }

    override DataFlow::Node getOutput() { result = this }

    override string getFormat() { result = "YAML" }
  }
}
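
The decision made by `mayExecuteInput` above can be restated as a small Python function. This is only an illustrative sketch: `may_execute_input`, its parameters, and the constant names are hypothetical and are not part of CodeQL or PyYAML. It also assumes the loader is known by name, whereas the QL predicate uses data flow to decide whether a safe loader reaches the argument.

```python
# Illustrative restatement of the `mayExecuteInput` logic.
# All names here are hypothetical; none come from CodeQL or PyYAML.
from typing import Optional

# Loader class names the model accepts as safe for `load` / `load_all`.
SAFE_LOADERS = {"SafeLoader", "BaseLoader", "CSafeLoader", "CBaseLoader"}

# Functions flagged regardless of their arguments.
ALWAYS_FLAGGED = {"full_load", "full_load_all", "unsafe_load", "unsafe_load_all"}

# Functions whose result depends on the loader argument.
LOADER_DEPENDENT = {"load", "load_all"}


def may_execute_input(func_name: str, loader: Optional[str] = None) -> bool:
    """Return True if a call to `yaml.<func_name>` would be flagged.

    `loader` is the name of the class passed as the second positional
    argument or as `Loader=...`; None means no loader was passed.
    """
    if func_name in ALWAYS_FLAGGED:
        return True
    if func_name in LOADER_DEPENDENT:
        # Flagged unless one of the known safe loaders is supplied.
        return loader not in SAFE_LOADERS
    # `safe_load` and `safe_load_all` fall through and are never flagged.
    return False
```

Under these assumptions, `may_execute_input("load")` and `may_execute_input("load", "FullLoader")` are both flagged, while `may_execute_input("load", "SafeLoader")` and `may_execute_input("safe_load")` are not.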