Files
codeql/python/ql/lib/semmle/python/frameworks/Pydantic.qll
yoff 1037f5242e Python: migrate dataflow library to new CFG + shared SSA
Switches the trunk dataflow library and all in-tree consumers
(frameworks, ApiGraphs, Concepts, regexp, security customisations,
test harness) from the legacy Flow.qll/ESSA stack to the new
shared-CFG facade (Cfg.qll) and the ESSA-shaped adapter on the
shared-SSA library (SsaImpl.qll).

Highlights:

  * DataFlowPublic/Private/Dispatch, Attributes, VariableCapture,
    IterableUnpacking, ImportResolution, ImportStar, LocalSources,
    TaintTrackingPrivate, MatchUnpacking, TypeTrackingImpl,
    SsaImpl, Builtins all now qualify CFG/SSA references with
    Cfg:: / SsaImpl:: and stop pulling in semmle.python.essa.*.

  * AstNodeImpl.qll/Cfg.qll: ImportMember exposes its inner
    ImportExpr, DefinitionNode.getValue covers Alias / AnnAssign /
    AugAssign / AssignExpr / For-target / Parameter-default,
    ForNode is treated as an expression node, AnnotatedExitNode is
    canonical, and BoolExprNode.getAnOperand drops the dominance
    constraint that did not hold for short-circuit BBs.

  * SsaImpl.qll: parameters always get a ParameterDefinition (so
    unused parameters still have SSA defs), scope-entry defs for
    module globals require an actual store somewhere, scope-exit
    has a synthetic use so reaching-defs survives to module
    boundary, and the legacy SsaSourceVariable / EssaVariable
    surface (getName, getScope, getAUse, getASourceUse,
    getAnImplicitUse) is reinstated for downstream queries.

  * DataFlowPublic.qll: GuardNode redesigned around the new
    structural outcome nodes (isAfterTrue / isAfterFalse).  The
    legacy ConditionBlock + flipped indirection is gone;
    controlsBlock walks UP through 'not' / '==True' / 'is False'
    etc. via outcomeOfGuard, accumulating polarity cleanly.  Only
    BarrierGuard<...> is preserved as public API.

  * ModuleVariableNode.getAWrite and LocalFlow::definitionFlowStep
    bypass SSA and consult Cfg::NameNode.defines /
    Cfg::DefinitionNode.getValue directly, so that write defs
    pruned by shared SSA (because the variable has no in-scope
    read) still produce dataflow steps.

  * Frameworks + downstream consumers: replace
    EssaVariable.hasDefiningNode, getAReturnValueFlowNode,
    Parameter.getDefault, Scope.getEntryNode / getANormalExit etc.
    with CFG-side bridges through Cfg::ControlFlowNode.

The legacy Flow.qll / Essa.qll stack is untouched and remains
available for queries that import it directly.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-28 21:09:47 +00:00

113 lines
4.3 KiB
Plaintext

/**
* Provides classes modeling security-relevant aspects of the `pydantic` PyPI package.
*
* See
* - https://pypi.org/project/pydantic/
* - https://pydantic-docs.helpmanual.io/
*/
private import python
private import semmle.python.controlflow.internal.Cfg as Cfg
private import semmle.python.dataflow.new.DataFlow
private import semmle.python.dataflow.new.TaintTracking
private import semmle.python.Concepts
private import semmle.python.ApiGraphs
private import semmle.python.frameworks.data.ModelsAsData
/**
* INTERNAL: Do not use.
*
* Provides models for `pydantic` PyPI package.
*
* See
* - https://pypi.org/project/pydantic/
* - https://pydantic-docs.helpmanual.io/
*/
module Pydantic {
/**
* Provides models for `pydantic.BaseModel` subclasses (a pydantic model).
*
* See https://pydantic-docs.helpmanual.io/usage/models/.
*/
module BaseModel {
/** Gets a reference to a `pydantic.BaseModel` subclass (a pydantic model). */
API::Node subclassRef() {
result = API::moduleImport("pydantic").getMember("BaseModel").getASubclass+()
or
result = ModelOutput::getATypeNode("pydantic.BaseModel~Subclass").getASubclass*()
}
/**
* A source of instances of `pydantic.BaseModel` subclasses, extend this class to model new instances.
*
* This can include instantiations of the class, return values from function
* calls, or a special parameter that will be set when functions are called by an external
* library.
*
* Use the predicate `BaseModel::instance()` to get references to instances of `pydantic.BaseModel`.
*/
abstract class InstanceSource extends DataFlow::LocalSourceNode { }
/** Gets a reference to an instance of a `pydantic.BaseModel` subclass. */
private DataFlow::TypeTrackingNode instance(DataFlow::TypeTracker t) {
t.start() and
result instanceof InstanceSource
or
t.start() and
instanceStepToPydanticModel(_, result)
or
exists(DataFlow::TypeTracker t2 | result = instance(t2).track(t2, t))
}
/** Gets a reference to an instance of a `pydantic.BaseModel` subclass. */
DataFlow::Node instance() { instance(DataFlow::TypeTracker::end()).flowsTo(result) }
/**
* A step from an instance of a `pydantic.BaseModel` subclass, that might result in
* an instance of a `pydantic.BaseModel` subclass.
*
* NOTE: We currently overapproximate, and treat all attributes as containing
* another pydantic model. For the code below, we _could_ limit this to `main_foo`
* and members of `other_foos`. IF THIS IS CHANGED, YOU MUST CHANGE THE ADDITIONAL
* TAINT STEPS BELOW, SUCH THAT SIMPLE ACCESS OF SOMETHING LIKE `str` IS STILL
* TAINTED.
*
*
* ```py
* class MyComplexModel(BaseModel):
* field: str
* main_foo: Foo
* other_foos: List[Foo]
* ```
*/
private predicate instanceStepToPydanticModel(DataFlow::Node nodeFrom, DataFlow::Node nodeTo) {
// attributes (such as `model.foo`)
nodeFrom = instance() and
nodeTo.(DataFlow::AttrRead).getObject() = nodeFrom
or
// subscripts on attributes (such as `model.foo[0]`). This needs to handle nested
// lists (such as `model.foo[0][0]`), and access being split into multiple
// statements (such as `xs = model.foo; xs[0]`).
//
// To handle this we overapproximate which things are a Pydantic model, by
// treating any subscript on anything that originates on a Pydantic model to also
// be a Pydantic model. So `model[0]` will be an overapproximation, but should not
// really cause problems (since we don't expect real code to contain such accesses)
nodeFrom = instance() and
nodeTo.asCfgNode().(Cfg::SubscriptNode).getObject() = nodeFrom.asCfgNode()
}
/**
* Extra taint propagation for `pydantic.BaseModel` subclasses. (note that these could also be `pydantic.BaseModel` subclasses)
*/
private class AdditionalTaintStep extends TaintTracking::AdditionalTaintStep {
override predicate step(DataFlow::Node nodeFrom, DataFlow::Node nodeTo) {
// NOTE: if `instanceStepToPydanticModel` is changed to be more precise, these
// taint steps should be expanded, such that a field that has type `str` is
// still tainted.
instanceStepToPydanticModel(nodeFrom, nodeTo)
}
}
}
}