Merge branch 'main' into jty/python/emailInjection

2025-12-22 11:46:32 +01:00 · 2022-05-26 16:27:57 -04:00
parent b5734ed6a2 57b9e6ee40
commit 76c27c685f
5291 changed files with 409020 additions and 32790 deletions
--- a/python/ql/lib/CHANGELOG.md
+++ b/python/ql/lib/CHANGELOG.md
@@ -1,3 +1,42 @@
+## 0.3.0
+
+### Breaking Changes
+
+* The imports made available from `import python` are no longer exposed under `DataFlow::` after doing `import semmle.python.dataflow.new.DataFlow`. For example, using `DataFlow::Add` will now cause a compile error.
+
+### Minor Analysis Improvements
+
+* The modeling of `request.files` in Flask has been fixed, so we now properly handle assignments to local variables (such as `files = request.files; files['key'].filename`).
+* Added taint propagation for `io.StringIO` and `io.BytesIO`. This addition was originally [submitted as part of an experimental query by @jorgectf](https://github.com/github/codeql/pull/6112).
+
+## 0.2.0
+
+### Breaking Changes
+
+* The signature of `allowImplicitRead` on `DataFlow::Configuration` and `TaintTracking::Configuration` has changed from `allowImplicitRead(DataFlow::Node node, DataFlow::Content c)` to `allowImplicitRead(DataFlow::Node node, DataFlow::ContentSet c)`.
+
+## 0.1.0
+
+### Breaking Changes
+
+* The recently added flow-state versions of `isBarrierIn`, `isBarrierOut`, `isSanitizerIn`, and `isSanitizerOut` in the data flow and taint tracking libraries have been removed.
+
+### Deprecated APIs
+
+* Queries importing a data-flow configuration from `semmle.python.security.dataflow`
+  should ensure that the imported file ends with `Query`, and only import its top-level
+  module. For example, a query that used `CommandInjection::Configuration` from
+  `semmle.python.security.dataflow.CommandInjection` should from now use `Configuration`
+  from `semmle.python.security.dataflow.CommandInjectionQuery` instead.
+
+### Major Analysis Improvements
+
+* Added data-flow for Django ORM models that are saved in a database (no `models.ForeignKey` support).
+
+### Minor Analysis Improvements
+
+* Improved modeling of Flask `Response` objects, so passing a response body with the keyword argument `response` is now recognized.
+
 ## 0.0.13

 ## 0.0.12
--- a/python/ql/lib/change-notes/2022-02-17-Django-orm-dataflow-steps.md
+++ b/python/ql/lib/change-notes/2022-02-17-Django-orm-dataflow-steps.md
@@ -1,4 +0,0 @@
---
-category: majorAnalysis
---
-* Added data-flow for Django ORM models that are saved in a database (no `models.ForeignKey` support).
--- a/python/ql/lib/change-notes/2022-03-21-query-suffix-convention.md
+++ b/python/ql/lib/change-notes/2022-03-21-query-suffix-convention.md
@@ -1,8 +0,0 @@
---
-category: deprecated
---
-* Queries importing a data-flow configuration from `semmle.python.security.dataflow`
-  should ensure that the imported file ends with `Query`, and only import its top-level
-  module. For example, a query that used `CommandInjection::Configuration` from
-  `semmle.python.security.dataflow.CommandInjection` should from now use `Configuration`
-  from `semmle.python.security.dataflow.CommandInjectionQuery` instead.
--- a/python/ql/lib/change-notes/2022-03-30-flask-recognize-body-param.md
+++ b/python/ql/lib/change-notes/2022-03-30-flask-recognize-body-param.md
@@ -1,4 +0,0 @@
---
-category: minorAnalysis
---
-* Improved modeling of Flask `Response` objects, so passing a response body with the keyword argument `response` is now recognized.
--- a/python/ql/lib/change-notes/2022-04-08-flow-state-barriers.md
+++ b/python/ql/lib/change-notes/2022-04-08-flow-state-barriers.md
@@ -1,4 +0,0 @@
---
-category: breaking
---
-The recently added flow-state versions of `isBarrierIn`, `isBarrierOut`, `isSanitizerIn`, and `isSanitizerOut` in the data flow and taint tracking libraries have been removed.
--- a/python/ql/lib/change-notes/2022-05-12-moduleimport-disallow-dots.md
+++ b/python/ql/lib/change-notes/2022-05-12-moduleimport-disallow-dots.md
@@ -0,0 +1,4 @@
+---
+category: breaking
+---
+`API::moduleImport` no longer has any results for dotted names, such as `API::moduleImport("foo.bar")`. Using `API::moduleImport("foo.bar").getMember("baz").getACall()` previously worked if the Python code was `from foo.bar import baz; baz()`, but not if the code was `import foo.bar; foo.bar.baz()` -- we are making this change to ensure the approach that can handle all cases is always used.
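The two Python spellings named in this change note reach the same function at runtime, which is why a model should match both of them. The following is a minimal runnable sketch, assuming a hypothetical in-memory package `foo` with a submodule `foo.bar`; both are built by hand with `types.ModuleType` purely for illustration and are not part of any real library.

```python
import sys
import types

# Hypothetical package `foo` with submodule `foo.bar`, registered in
# sys.modules so that the import statements below resolve without files on disk.
foo = types.ModuleType("foo")
foo.__path__ = []  # an (empty) __path__ marks the module as a package
bar = types.ModuleType("foo.bar")


def _baz():
    return "called baz"


bar.baz = _baz
foo.bar = bar
sys.modules["foo"] = foo
sys.modules["foo.bar"] = bar

# Spelling 1: `from foo.bar import baz; baz()`
from foo.bar import baz

result_from_import = baz()

# Spelling 2: `import foo.bar; foo.bar.baz()`
import foo.bar

result_dotted = foo.bar.baz()
```

Per the doc comment on `moduleImport` in `ApiGraphs.qll`, the module `foo.bar` is reached with `moduleImport("foo").getMember("bar")`, so the intent appears to be that a model for `baz` is written through `getMember` steps rather than a dotted module name.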
--- a/python/ql/lib/change-notes/released/0.1.0.md
+++ b/python/ql/lib/change-notes/released/0.1.0.md
@@ -0,0 +1,21 @@
+## 0.1.0
+
+### Breaking Changes
+
+* The recently added flow-state versions of `isBarrierIn`, `isBarrierOut`, `isSanitizerIn`, and `isSanitizerOut` in the data flow and taint tracking libraries have been removed.
+
+### Deprecated APIs
+
+* Queries importing a data-flow configuration from `semmle.python.security.dataflow`
+  should ensure that the imported file ends with `Query`, and only import its top-level
+  module. For example, a query that used `CommandInjection::Configuration` from
+  `semmle.python.security.dataflow.CommandInjection` should from now use `Configuration`
+  from `semmle.python.security.dataflow.CommandInjectionQuery` instead.
+
+### Major Analysis Improvements
+
+* Added data-flow for Django ORM models that are saved in a database (no `models.ForeignKey` support).
+
+### Minor Analysis Improvements
+
+* Improved modeling of Flask `Response` objects, so passing a response body with the keyword argument `response` is now recognized.
--- a/python/ql/lib/change-notes/released/0.2.0.md
+++ b/python/ql/lib/change-notes/released/0.2.0.md
@@ -0,0 +1,5 @@
+## 0.2.0
+
+### Breaking Changes
+
+* The signature of `allowImplicitRead` on `DataFlow::Configuration` and `TaintTracking::Configuration` has changed from `allowImplicitRead(DataFlow::Node node, DataFlow::Content c)` to `allowImplicitRead(DataFlow::Node node, DataFlow::ContentSet c)`.
--- a/python/ql/lib/change-notes/released/0.3.0.md
+++ b/python/ql/lib/change-notes/released/0.3.0.md
@@ -0,0 +1,10 @@
+## 0.3.0
+
+### Breaking Changes
+
+* The imports made available from `import python` are no longer exposed under `DataFlow::` after doing `import semmle.python.dataflow.new.DataFlow`. For example, using `DataFlow::Add` will now cause a compile error.
+
+### Minor Analysis Improvements
+
+* The modeling of `request.files` in Flask has been fixed, so we now properly handle assignments to local variables (such as `files = request.files; files['key'].filename`).
+* Added taint propagation for `io.StringIO` and `io.BytesIO`. This addition was originally [submitted as part of an experimental query by @jorgectf](https://github.com/github/codeql/pull/6112).
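To see why taint propagation through `io.StringIO` and `io.BytesIO` matters, the sketch below shows data passing unchanged through both buffer types. It is only an illustration of the runtime behaviour; the variable `untrusted` is a hypothetical stand-in for attacker-controlled input, and nothing here reflects how the CodeQL model itself is written.

```python
import io

# Hypothetical stand-in for attacker-controlled input.
untrusted = "user-controlled text"

# Wrapping the value in a text buffer does not change it: reading the
# buffer back yields the same data, so the result is just as untrusted.
text_buffer = io.StringIO(untrusted)
from_read = text_buffer.read()
from_getvalue = text_buffer.getvalue()

# The same holds for the bytes variant.
byte_buffer = io.BytesIO(untrusted.encode("utf-8"))
from_bytes = byte_buffer.read()
```

An analysis that stops tracking the value at the `io.StringIO(...)` call would miss any sink reached through `read()` or `getvalue()`.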
--- a/python/ql/lib/codeql-pack.release.yml
+++ b/python/ql/lib/codeql-pack.release.yml
@@ -1,2 +1,2 @@
 ---
-lastReleaseVersion: 0.0.13
+lastReleaseVersion: 0.3.0
--- a/python/ql/lib/qlpack.yml
+++ b/python/ql/lib/qlpack.yml
@@ -1,5 +1,5 @@
 name: codeql/python-all
-version: 0.1.0-dev
+version: 0.3.1-dev
 groups: python
 dbscheme: semmlecode.python.dbscheme
 extractor: python
--- a/python/ql/lib/semmle/python/ApiGraphs.qll
+++ b/python/ql/lib/semmle/python/ApiGraphs.qll
@@ -280,7 +280,13 @@ module API {
   * you should use `.getMember` on the parent module. For example, for nodes corresponding to the module `foo.bar`,
   * use `moduleImport("foo").getMember("bar")`.
   */
-  Node moduleImport(string m) { result = Impl::MkModuleImport(m) }
+  Node moduleImport(string m) {
+    result = Impl::MkModuleImport(m) and
+    // restrict `moduleImport` so it will never give results for a dotted name. Note
+    // that we cannot move this logic to the `MkModuleImport` construction, since we
+    // need the intermediate API graph nodes for the prefixes in `import foo.bar.baz`.
+    not m.matches("%.%")
+  }

  /** Gets a node corresponding to the built-in with the given name, if any. */
  Node builtin(string n) { result = moduleImport("builtins").getMember(n) }
@@ -779,7 +785,7 @@ module API {
        MkLabelAwait()

      /** A label for a module. */
-      class LabelModule extends ApiLabel {
+      class LabelModule extends ApiLabel, MkLabelModule {
        string mod;

        LabelModule() { this = MkLabelModule(mod) }
@@ -791,7 +797,7 @@ module API {
      }

      /** A label for the member named `prop`. */
-      class LabelMember extends ApiLabel {
+      class LabelMember extends ApiLabel, MkLabelMember {
        string member;

        LabelMember() { this = MkLabelMember(member) }
@@ -803,14 +809,12 @@ module API {
      }

      /** A label for a member with an unknown name. */
-      class LabelUnknownMember extends ApiLabel {
-        LabelUnknownMember() { this = MkLabelUnknownMember() }
-
+      class LabelUnknownMember extends ApiLabel, MkLabelUnknownMember {
        override string toString() { result = "getUnknownMember()" }
      }

      /** A label for parameter `i`. */
-      class LabelParameter extends ApiLabel {
+      class LabelParameter extends ApiLabel, MkLabelParameter {
        int i;

        LabelParameter() { this = MkLabelParameter(i) }
@@ -822,7 +826,7 @@ module API {
      }

      /** A label for a keyword parameter `name`. */
-      class LabelKeywordParameter extends ApiLabel {
+      class LabelKeywordParameter extends ApiLabel, MkLabelKeywordParameter {
        string name;

        LabelKeywordParameter() { this = MkLabelKeywordParameter(name) }
@@ -834,23 +838,17 @@ module API {
      }

      /** A label that gets the return value of a function. */
-      class LabelReturn extends ApiLabel {
-        LabelReturn() { this = MkLabelReturn() }
-
+      class LabelReturn extends ApiLabel, MkLabelReturn {
        override string toString() { result = "getReturn()" }
      }

      /** A label that gets the subclass of a class. */
-      class LabelSubclass extends ApiLabel {
-        LabelSubclass() { this = MkLabelSubclass() }
-
+      class LabelSubclass extends ApiLabel, MkLabelSubclass {
        override string toString() { result = "getASubclass()" }
      }

      /** A label for awaited values. */
-      class LabelAwait extends ApiLabel {
-        LabelAwait() { this = MkLabelAwait() }
-
+      class LabelAwait extends ApiLabel, MkLabelAwait {
        override string toString() { result = "getAwaited()" }
      }
    }
--- a/python/ql/lib/semmle/python/Concepts.qll
+++ b/python/ql/lib/semmle/python/Concepts.qll
@@ -498,6 +498,65 @@ module XML {
      abstract string getName();
    }
  }
+
+  /**
+   * A kind of XML vulnerability.
+   *
+   * See overview of kinds at https://pypi.org/project/defusedxml/#python-xml-libraries
+   *
+   * See PoC at `python/PoCs/XmlParsing/PoC.py` for some tests of vulnerable XML parsing.
+   */
+  class XmlParsingVulnerabilityKind extends string {
+    XmlParsingVulnerabilityKind() { this in ["XML bomb", "XXE", "DTD retrieval"] }
+
+    /**
+     * Holds for XML bomb vulnerability kind, such as 'Billion Laughs' and 'Quadratic
+     * Blowup'.
+     *
+     * While a parser could technically be vulnerable to one and not the other, from our
+     * point of view the interesting part is that it IS vulnerable to these types of
+     * attacks, and not so much which one specifically works. In practice I haven't seen
+     * a parser that is vulnerable to one and not the other.
+     */
+    predicate isXmlBomb() { this = "XML bomb" }
+
+    /** Holds for XXE vulnerability kind. */
+    predicate isXxe() { this = "XXE" }
+
+    /** Holds for DTD retrieval vulnerability kind. */
+    predicate isDtdRetrieval() { this = "DTD retrieval" }
+  }
+
+  /**
+   * A data-flow node that parses XML.
+   *
+   * Extend this class to refine existing API models. If you want to model new APIs,
+   * extend `XmlParsing::Range` instead.
+   */
+  class XmlParsing extends Decoding instanceof XmlParsing::Range {
+    /**
+     * Holds if this XML parsing is vulnerable to `kind`.
+     */
+    predicate vulnerableTo(XmlParsingVulnerabilityKind kind) { super.vulnerableTo(kind) }
+  }
+
+  /** Provides classes for modeling XML parsing APIs. */
+  module XmlParsing {
+    /**
+     * A data-flow node that parses XML.
+     *
+     * Extend this class to model new APIs. If you want to refine existing API models,
+     * extend `XmlParsing` instead.
+     */
+    abstract class Range extends Decoding::Range {
+      /**
+       * Holds if this XML parsing is vulnerable to `kind`.
+       */
+      abstract predicate vulnerableTo(XmlParsingVulnerabilityKind kind);
+
+      override string getFormat() { result = "XML" }
+    }
+  }
 }

 /** Provides classes for modeling LDAP-related APIs. */
@@ -910,6 +969,76 @@ module HTTP {
        abstract DataFlow::Node getValueArg();
      }
    }
+
+    /**
+     * A data-flow node that enables or disables Cross-site request forgery protection
+     * in a global manner.
+     *
+     * Extend this class to refine existing API models. If you want to model new APIs,
+     * extend `CsrfProtectionSetting::Range` instead.
+     */
+    class CsrfProtectionSetting extends DataFlow::Node instanceof CsrfProtectionSetting::Range {
+      /**
+       * Gets the boolean value indicating whether CSRF protection is enabled
+       * (`true`) or disabled (`false`) by this node.
+       */
+      boolean getVerificationSetting() { result = super.getVerificationSetting() }
+    }
+
+    /** Provides a class for modeling new CSRF protection setting APIs. */
+    module CsrfProtectionSetting {
+      /**
+       * A data-flow node that enables or disables Cross-site request forgery protection
+       * in a global manner.
+       *
+       * Extend this class to model new APIs. If you want to refine existing API models,
+       * extend `CsrfProtectionSetting` instead.
+       */
+      abstract class Range extends DataFlow::Node {
+        /**
+         * Gets the boolean value indicating whether CSRF protection is enabled
+         * (`true`) or disabled (`false`) by this node.
+         */
+        abstract boolean getVerificationSetting();
+      }
+    }
+
+    /**
+     * A data-flow node that enables or disables Cross-site request forgery protection
+     * for a specific part of an application.
+     *
+     * Extend this class to refine existing API models. If you want to model new APIs,
+     * extend `CsrfLocalProtectionSetting::Range` instead.
+     */
+    class CsrfLocalProtectionSetting extends DataFlow::Node instanceof CsrfLocalProtectionSetting::Range {
+      /**
+       * Gets a request handler whose CSRF protection is changed.
+       */
+      Function getRequestHandler() { result = super.getRequestHandler() }
+
+      /** Holds if CSRF protection is enabled by this setting. */
+      predicate csrfEnabled() { super.csrfEnabled() }
+    }
+
+    /** Provides a class for modeling new CSRF protection setting APIs. */
+    module CsrfLocalProtectionSetting {
+      /**
+       * A data-flow node that enables or disables Cross-site request forgery protection
+       * for a specific part of an application.
+       *
+       * Extend this class to model new APIs. If you want to refine existing API models,
+       * extend `CsrfLocalProtectionSetting` instead.
+       */
+      abstract class Range extends DataFlow::Node {
+        /**
+         * Gets a request handler whose CSRF protection is changed.
+         */
+        abstract Function getRequestHandler();
+
+        /** Holds if CSRF protection is enabled by this setting. */
+        abstract predicate csrfEnabled();
+      }
+    }
  }

  /** Provides classes for modeling HTTP clients. */
--- a/python/ql/lib/semmle/python/Exprs.qll
+++ b/python/ql/lib/semmle/python/Exprs.qll
@@ -189,7 +189,16 @@ class Call extends Call_ {
   */
  Keyword getKeyword(int index) {
    result = this.getNamedArg(index) and
-    not exists(DictUnpacking d, int lower | d = this.getNamedArg(lower) and lower < index)
+    (
+      not exists(this.getMinimumUnpackingIndex())
+      or
+      index <= this.getMinimumUnpackingIndex()
+    )
+  }
+
+  /** Gets the minimum index (if any) at which a dictionary unpacking (`**foo`) occurs in this call. */
+  private int getMinimumUnpackingIndex() {
+    result = min(int i | this.getNamedArg(i) instanceof DictUnpacking)
  }

  /**
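The rewritten `getKeyword` keeps only keyword arguments that occur no later than the first `**` unpacking. The logic can be modelled in plain Python as a rough sketch; the tuple encoding of named arguments and the function names below are assumptions made for illustration, not part of the QL library.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical encoding of a call's named arguments: (kind, name), where kind
# is "keyword" for `name=value` and "unpack" for `**name`.
NamedArg = Tuple[str, str]


def minimum_unpacking_index(named_args: List[NamedArg]) -> Optional[int]:
    """Return the smallest index holding a `**` unpacking, or None if absent."""
    indices = [i for i, (kind, _) in enumerate(named_args) if kind == "unpack"]
    return min(indices) if indices else None


def keywords(named_args: List[NamedArg]) -> Dict[int, str]:
    """Map index -> keyword name for keywords not after the first unpacking."""
    first = minimum_unpacking_index(named_args)
    return {
        i: name
        for i, (kind, name) in enumerate(named_args)
        # Mirrors the QL condition: no unpacking exists, or index <= minimum.
        # The entry at the minimum index is itself an unpacking, so it is
        # excluded by the kind check.
        if kind == "keyword" and (first is None or i <= first)
    }


# Models the call f(a=1, **d, b=2): only `a` precedes the unpacking.
with_unpacking = keywords([("keyword", "a"), ("unpack", "d"), ("keyword", "b")])

# Models the call f(a=1, b=2): no unpacking, so every keyword is kept.
without_unpacking = keywords([("keyword", "a"), ("keyword", "b")])
```

Computing the minimum once and comparing each index against it yields the same answers as the removed formulation, which checked every index for an earlier `DictUnpacking`.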
--- a/python/ql/lib/semmle/python/Flow.qll
+++ b/python/ql/lib/semmle/python/Flow.qll
@@ -363,7 +363,7 @@ class CallNode extends ControlFlowNode {
    )
  }

-  /** Gets the flow node corresponding to the nth argument of the call corresponding to this flow node */
+  /** Gets the flow node corresponding to the n'th positional argument of the call corresponding to this flow node */
  ControlFlowNode getArg(int n) {
    exists(Call c |
      this.getNode() = c and
--- a/python/ql/lib/semmle/python/Frameworks.qll
+++ b/python/ql/lib/semmle/python/Frameworks.qll
@@ -52,3 +52,4 @@ private import semmle.python.frameworks.Ujson
 private import semmle.python.frameworks.Urllib3
 private import semmle.python.frameworks.Yaml
 private import semmle.python.frameworks.Yarl
+private import semmle.python.frameworks.Xmltodict
--- a/python/ql/lib/semmle/python/Import.qll
+++ b/python/ql/lib/semmle/python/Import.qll
@@ -184,7 +184,7 @@ class Import extends Import_ {
   * For example, for the import statement `import bar` which
   * is a relative import in package "foo", this would return
   * "foo.bar".
-   * The import statment `from foo import bar` would return
+   * The import statement `from foo import bar` would return
   * `foo` and `foo.bar`
   */
  string getAnImportedModuleName() {
--- a/python/ql/lib/semmle/python/Module.qll
+++ b/python/ql/lib/semmle/python/Module.qll
@@ -1,5 +1,4 @@
 import python
-private import semmle.python.objects.ObjectAPI
 private import semmle.python.objects.Modules
 private import semmle.python.internal.CachedStages

--- a/python/ql/lib/semmle/python/SpecialMethods.qll
+++ b/python/ql/lib/semmle/python/SpecialMethods.qll
@@ -8,7 +8,7 @@
 * Extend `SpecialMethod::Potential` to capture more cases.
 */

-import python
+private import python

 /** A control flow node which might correspond to a special method call. */
 class PotentialSpecialMethodCallNode extends ControlFlowNode {
--- a/python/ql/lib/semmle/python/dataflow/new/internal/Attributes.qll
+++ b/python/ql/lib/semmle/python/dataflow/new/internal/Attributes.qll
@@ -92,6 +92,32 @@ private class AttributeAssignmentAsAttrWrite extends AttrWrite, CfgNode {
  override string getAttributeName() { result = node.getName() }
 }

+/**
+ * An attribute assignment where the object is a global variable: `global_var.attr = value`.
+ *
+ * Syntactically, this is identical to the situation that is covered by
+ * `AttributeAssignmentAsAttrWrite`, however in this case we want to behave as if the object that is
+ * being written is the underlying `ModuleVariableNode`.
+ */
+private class GlobalAttributeAssignmentAsAttrWrite extends AttrWrite, CfgNode {
+  override AttributeAssignmentNode node;
+
+  override Node getValue() { result.asCfgNode() = node.getValue() }
+
+  override Node getObject() {
+    result.(ModuleVariableNode).getVariable() = node.getObject().getNode().(Name).getVariable()
+  }
+
+  override ExprNode getAttributeNameExpr() {
+    // Attribute names don't exist as `Node`s in the control flow graph, as they can only ever be
+    // identifiers, and are therefore represented directly as strings.
+    // Use `getAttributeName` to access the name of the attribute.
+    none()
+  }
+
+  override string getAttributeName() { result = node.getName() }
+}
+
 /** Represents `CallNode`s that may refer to calls to built-in functions or classes. */
 private class BuiltInCallNode extends CallNode {
  string name;
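The Python pattern that `GlobalAttributeAssignmentAsAttrWrite` targets looks like the sketch below: one function writes an attribute on a module-level object and another function reads it back. The class and function names are hypothetical and chosen only for illustration.

```python
class Config:
    """Hypothetical container object used as a module-level variable."""


# Module-level (global) variable shared by the functions below.
settings = Config()


def store(value):
    # Attribute write of the form `global_var.attr = value`.
    settings.token = value


def load():
    # Attribute read on the same global variable, in a different scope.
    return settings.token


store("secret")
loaded = load()
```

Treating the written object as the underlying module variable, as the doc comment describes, is what lets the write in `store` be connected to the read in `load`.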
--- a/python/ql/lib/semmle/python/dataflow/new/internal/DataFlowDispatchPointsTo.qll
+++ b/python/ql/lib/semmle/python/dataflow/new/internal/DataFlowDispatchPointsTo.qll
@@ -172,7 +172,7 @@ module ArgumentPassing {
  /**
   * Gets the node representing the argument to `call` that is passed to the parameter at
   * (zero-based) index `paramN` in `callable`. If this is a positional argument, it must appear
-   * at an index, `argN`, in `call` wich satisfies `paramN = mapping.getParamN(argN)`.
+   * at an index, `argN`, in `call` which satisfies `paramN = mapping.getParamN(argN)`.
   *
   * `mapping` will be the identity for function calls, but not for method- or constructor calls,
   * where the first parameter is `self` and the first positional argument is passed to the second positional parameter.
--- a/python/ql/lib/semmle/python/dataflow/new/internal/DataFlowImpl.qll
+++ b/python/ql/lib/semmle/python/dataflow/new/internal/DataFlowImpl.qll
@@ -116,7 +116,7 @@ abstract class Configuration extends string {
   * Holds if an arbitrary number of implicit read steps of content `c` may be
   * taken at `node`.
   */
-  predicate allowImplicitRead(Node node, Content c) { none() }
+  predicate allowImplicitRead(Node node, ContentSet c) { none() }

  /**
   * Gets the virtual dispatch branching limit when calculating field flow.
@@ -170,6 +170,14 @@ abstract class Configuration extends string {
   */
  int explorationLimit() { none() }

+  /**
+   * Holds if hidden nodes should be included in the data flow graph.
+   *
+   * This feature should only be used for debugging or when the data flow graph
+   * is not visualized (for example in a `path-problem` query).
+   */
+  predicate includeHiddenNodes() { none() }
+
  /**
   * Holds if there is a partial data flow path from `source` to `node`. The
   * approximate distance between `node` and the closest source is `dist` and
@@ -485,8 +493,9 @@ private predicate additionalJumpStateStep(
  )
 }

-private predicate read(NodeEx node1, Content c, NodeEx node2, Configuration config) {
-  read(node1.asNode(), c, node2.asNode()) and
+pragma[nomagic]
+private predicate readSet(NodeEx node1, ContentSet c, NodeEx node2, Configuration config) {
+  readSet(node1.asNode(), c, node2.asNode()) and
  stepFilter(node1, node2, config)
  or
  exists(Node n |
@@ -496,6 +505,37 @@ private predicate read(NodeEx node1, Content c, NodeEx node2, Configuration conf
  )
 }

+// inline to reduce fan-out via `getAReadContent`
+bindingset[c]
+private predicate read(NodeEx node1, Content c, NodeEx node2, Configuration config) {
+  exists(ContentSet cs |
+    readSet(node1, cs, node2, config) and
+    pragma[only_bind_out](c) = pragma[only_bind_into](cs).getAReadContent()
+  )
+}
+
+// inline to reduce fan-out via `getAReadContent`
+bindingset[c]
+private predicate clearsContentEx(NodeEx n, Content c) {
+  exists(ContentSet cs |
+    clearsContentCached(n.asNode(), cs) and
+    pragma[only_bind_out](c) = pragma[only_bind_into](cs).getAReadContent()
+  )
+}
+
+// inline to reduce fan-out via `getAReadContent`
+bindingset[c]
+private predicate expectsContentEx(NodeEx n, Content c) {
+  exists(ContentSet cs |
+    expectsContentCached(n.asNode(), cs) and
+    pragma[only_bind_out](c) = pragma[only_bind_into](cs).getAReadContent()
+  )
+}
+
+pragma[nomagic]
+private predicate notExpectsContent(NodeEx n) { not expectsContentCached(n.asNode(), _) }
+
+pragma[nomagic]
 private predicate store(
  NodeEx node1, TypedContent tc, NodeEx node2, DataFlowType contentType, Configuration config
 ) {
@@ -573,9 +613,9 @@ private module Stage1 {
    )
    or
    // read
-    exists(Content c |
-      fwdFlowRead(c, node, cc, config) and
-      fwdFlowConsCand(c, config)
+    exists(ContentSet c |
+      fwdFlowReadSet(c, node, cc, config) and
+      fwdFlowConsCandSet(c, _, config)
    )
    or
    // flow into a callable
@@ -599,10 +639,10 @@ private module Stage1 {
  private predicate fwdFlow(NodeEx node, Configuration config) { fwdFlow(node, _, config) }

  pragma[nomagic]
-  private predicate fwdFlowRead(Content c, NodeEx node, Cc cc, Configuration config) {
+  private predicate fwdFlowReadSet(ContentSet c, NodeEx node, Cc cc, Configuration config) {
    exists(NodeEx mid |
      fwdFlow(mid, cc, config) and
-      read(mid, c, node, config)
+      readSet(mid, c, node, config)
    )
  }

@@ -620,6 +660,16 @@ private module Stage1 {
    )
  }

+  /**
+   * Holds if `cs` may be interpreted in a read as the target of some store
+   * into `c`, in the flow covered by `fwdFlow`.
+   */
+  pragma[nomagic]
+  private predicate fwdFlowConsCandSet(ContentSet cs, Content c, Configuration config) {
+    fwdFlowConsCand(c, config) and
+    c = cs.getAReadContent()
+  }
+
  pragma[nomagic]
  private predicate fwdFlowReturnPosition(ReturnPosition pos, Cc cc, Configuration config) {
    exists(RetNodeEx ret |
@@ -712,9 +762,9 @@ private module Stage1 {
    )
    or
    // read
-    exists(NodeEx mid, Content c |
-      read(node, c, mid, config) and
-      fwdFlowConsCand(c, pragma[only_bind_into](config)) and
+    exists(NodeEx mid, ContentSet c |
+      readSet(node, c, mid, config) and
+      fwdFlowConsCandSet(c, _, pragma[only_bind_into](config)) and
      revFlow(mid, toReturn, pragma[only_bind_into](config))
    )
    or
@@ -740,10 +790,10 @@ private module Stage1 {
   */
  pragma[nomagic]
  private predicate revFlowConsCand(Content c, Configuration config) {
-    exists(NodeEx mid, NodeEx node |
+    exists(NodeEx mid, NodeEx node, ContentSet cs |
      fwdFlow(node, pragma[only_bind_into](config)) and
-      read(node, c, mid, config) and
-      fwdFlowConsCand(c, pragma[only_bind_into](config)) and
+      readSet(node, cs, mid, config) and
+      fwdFlowConsCandSet(cs, c, pragma[only_bind_into](config)) and
      revFlow(pragma[only_bind_into](mid), _, pragma[only_bind_into](config))
    )
  }
@@ -762,7 +812,8 @@ private module Stage1 {
   * Holds if `c` is the target of both a read and a store in the flow covered
   * by `revFlow`.
   */
-  private predicate revFlowIsReadAndStored(Content c, Configuration conf) {
+  pragma[nomagic]
+  predicate revFlowIsReadAndStored(Content c, Configuration conf) {
    revFlowConsCand(c, conf) and
    revFlowStore(c, _, _, conf)
  }
@@ -861,8 +912,8 @@ private module Stage1 {
  pragma[nomagic]
  predicate readStepCand(NodeEx n1, Content c, NodeEx n2, Configuration config) {
    revFlowIsReadAndStored(c, pragma[only_bind_into](config)) and
-    revFlow(n2, pragma[only_bind_into](config)) and
-    read(n1, c, n2, pragma[only_bind_into](config))
+    read(n1, c, n2, pragma[only_bind_into](config)) and
+    revFlow(n2, pragma[only_bind_into](config))
  }

  pragma[nomagic]
@@ -872,7 +923,10 @@ private module Stage1 {
  predicate revFlow(
    NodeEx node, FlowState state, boolean toReturn, ApOption returnAp, Ap ap, Configuration config
  ) {
-    revFlow(node, toReturn, config) and exists(state) and exists(returnAp) and exists(ap)
+    revFlow(node, toReturn, pragma[only_bind_into](config)) and
+    exists(state) and
+    exists(returnAp) and
+    exists(ap)
  }

  private predicate throughFlowNodeCand(NodeEx node, Configuration config) {
@@ -1147,11 +1201,26 @@ private module Stage2 {

  private predicate flowIntoCall = flowIntoCallNodeCand1/5;

+  pragma[nomagic]
+  private predicate expectsContentCand(NodeEx node, Configuration config) {
+    exists(Content c |
+      PrevStage::revFlow(node, pragma[only_bind_into](config)) and
+      PrevStage::revFlowIsReadAndStored(c, pragma[only_bind_into](config)) and
+      expectsContentEx(node, c)
+    )
+  }
+
  bindingset[node, state, ap, config]
  private predicate filter(NodeEx node, FlowState state, Ap ap, Configuration config) {
-    PrevStage::revFlowState(state, config) and
+    PrevStage::revFlowState(state, pragma[only_bind_into](config)) and
    exists(ap) and
-    not stateBarrier(node, state, config)
+    not stateBarrier(node, state, config) and
+    (
+      notExpectsContent(node)
+      or
+      ap = true and
+      expectsContentCand(node, config)
+    )
  }

  bindingset[ap, contentType]
@@ -1574,7 +1643,7 @@ private module Stage2 {
    Configuration config
  ) {
    exists(Ap ap2, Content c |
-      store(node1, tc, node2, contentType, config) and
+      PrevStage::storeStepCand(node1, _, tc, node2, contentType, config) and
      revFlowStore(ap2, c, ap1, node1, _, tc, node2, _, _, config) and
      revFlowConsCand(ap2, c, ap1, config)
    )
@@ -1612,10 +1681,24 @@ private module Stage2 {
    storeStepFwd(_, ap, tc, _, _, config)
  }

-  predicate consCand(TypedContent tc, Ap ap, Configuration config) {
+  private predicate revConsCand(TypedContent tc, Ap ap, Configuration config) {
    storeStepCand(_, ap, tc, _, _, config)
  }

+  private predicate validAp(Ap ap, Configuration config) {
+    revFlow(_, _, _, _, ap, config) and ap instanceof ApNil
+    or
+    exists(TypedContent head, Ap tail |
+      consCand(head, tail, config) and
+      ap = apCons(head, tail)
+    )
+  }
+
+  predicate consCand(TypedContent tc, Ap ap, Configuration config) {
+    revConsCand(tc, ap, config) and
+    validAp(ap, config)
+  }
+
  pragma[noinline]
  private predicate parameterFlow(
    ParamNodeEx p, Ap ap, Ap ap0, DataFlowCallable c, Configuration config
@@ -1706,7 +1789,8 @@ private module LocalFlowBigStep {
  private class FlowCheckNode extends NodeEx {
    FlowCheckNode() {
      castNode(this.asNode()) or
-      clearsContentCached(this.asNode(), _)
+      clearsContentCached(this.asNode(), _) or
+      expectsContentCached(this.asNode(), _)
    }
  }

@@ -1729,9 +1813,9 @@ private module LocalFlowBigStep {
      or
      node.asNode() instanceof OutNodeExt
      or
-      store(_, _, node, _, config)
+      Stage2::storeStepCand(_, _, _, node, _, config)
      or
-      read(_, _, node, config)
+      Stage2::readStepCand(_, _, node, config)
      or
      node instanceof FlowCheckNode
      or
@@ -1752,8 +1836,8 @@ private module LocalFlowBigStep {
      additionalJumpStep(node, next, config) or
      flowIntoCallNodeCand1(_, node, next, config) or
      flowOutOfCallNodeCand1(_, node, next, config) or
-      store(node, _, next, _, config) or
-      read(node, _, next, config)
+      Stage2::storeStepCand(node, _, _, next, _, config) or
+      Stage2::readStepCand(node, _, next, config)
    )
    or
    exists(NodeEx next, FlowState s | Stage2::revFlow(next, s, config) |
@@ -1926,7 +2010,34 @@ private module Stage3 {
  private predicate flowIntoCall = flowIntoCallNodeCand2/5;

  pragma[nomagic]
-  private predicate clear(NodeEx node, Ap ap) { ap.isClearedAt(node.asNode()) }
+  private predicate clearSet(NodeEx node, ContentSet c, Configuration config) {
+    PrevStage::revFlow(node, config) and
+    clearsContentCached(node.asNode(), c)
+  }
+
+  pragma[nomagic]
+  private predicate clearContent(NodeEx node, Content c, Configuration config) {
+    exists(ContentSet cs |
+      PrevStage::readStepCand(_, pragma[only_bind_into](c), _, pragma[only_bind_into](config)) and
+      c = cs.getAReadContent() and
+      clearSet(node, cs, pragma[only_bind_into](config))
+    )
+  }
+
+  pragma[nomagic]
+  private predicate clear(NodeEx node, Ap ap, Configuration config) {
+    clearContent(node, ap.getHead().getContent(), config)
+  }
+
+  pragma[nomagic]
+  private predicate expectsContentCand(NodeEx node, Ap ap, Configuration config) {
+    exists(Content c |
+      PrevStage::revFlow(node, pragma[only_bind_into](config)) and
+      PrevStage::readStepCand(_, c, _, pragma[only_bind_into](config)) and
+      expectsContentEx(node, c) and
+      c = ap.getHead().getContent()
+    )
+  }

  pragma[nomagic]
  private predicate castingNodeEx(NodeEx node) { node.asNode() instanceof CastingNode }
@@ -1935,8 +2046,13 @@ private module Stage3 {
  private predicate filter(NodeEx node, FlowState state, Ap ap, Configuration config) {
    exists(state) and
    exists(config) and
-    not clear(node, ap) and
-    if castingNodeEx(node) then compatibleTypes(node.getDataFlowType(), ap.getType()) else any()
+    not clear(node, ap, config) and
+    (if castingNodeEx(node) then compatibleTypes(node.getDataFlowType(), ap.getType()) else any()) and
+    (
+      notExpectsContent(node)
+      or
+      expectsContentCand(node, ap, config)
+    )
  }

  bindingset[ap, contentType]
@@ -2363,7 +2479,7 @@ private module Stage3 {
    Configuration config
  ) {
    exists(Ap ap2, Content c |
-      store(node1, tc, node2, contentType, config) and
+      PrevStage::storeStepCand(node1, _, tc, node2, contentType, config) and
      revFlowStore(ap2, c, ap1, node1, _, tc, node2, _, _, config) and
      revFlowConsCand(ap2, c, ap1, config)
    )
@@ -2401,10 +2517,24 @@ private module Stage3 {
    storeStepFwd(_, ap, tc, _, _, config)
  }

-  predicate consCand(TypedContent tc, Ap ap, Configuration config) {
+  private predicate revConsCand(TypedContent tc, Ap ap, Configuration config) {
    storeStepCand(_, ap, tc, _, _, config)
  }

+  private predicate validAp(Ap ap, Configuration config) {
+    revFlow(_, _, _, _, ap, config) and ap instanceof ApNil
+    or
+    exists(TypedContent head, Ap tail |
+      consCand(head, tail, config) and
+      ap = apCons(head, tail)
+    )
+  }
+
+  predicate consCand(TypedContent tc, Ap ap, Configuration config) {
+    revConsCand(tc, ap, config) and
+    validAp(ap, config)
+  }
+
  pragma[noinline]
  private predicate parameterFlow(
    ParamNodeEx p, Ap ap, Ap ap0, DataFlowCallable c, Configuration config
@@ -3190,7 +3320,7 @@ private module Stage4 {
    Configuration config
  ) {
    exists(Ap ap2, Content c |
-      store(node1, tc, node2, contentType, config) and
+      PrevStage::storeStepCand(node1, _, tc, node2, contentType, config) and
      revFlowStore(ap2, c, ap1, node1, _, tc, node2, _, _, config) and
      revFlowConsCand(ap2, c, ap1, config)
    )
@@ -3228,10 +3358,24 @@ private module Stage4 {
    storeStepFwd(_, ap, tc, _, _, config)
  }

-  predicate consCand(TypedContent tc, Ap ap, Configuration config) {
+  private predicate revConsCand(TypedContent tc, Ap ap, Configuration config) {
    storeStepCand(_, ap, tc, _, _, config)
  }

+  private predicate validAp(Ap ap, Configuration config) {
+    revFlow(_, _, _, _, ap, config) and ap instanceof ApNil
+    or
+    exists(TypedContent head, Ap tail |
+      consCand(head, tail, config) and
+      ap = apCons(head, tail)
+    )
+  }
+
+  predicate consCand(TypedContent tc, Ap ap, Configuration config) {
+    revConsCand(tc, ap, config) and
+    validAp(ap, config)
+  }
+
  pragma[noinline]
  private predicate parameterFlow(
    ParamNodeEx p, Ap ap, Ap ap0, DataFlowCallable c, Configuration config
@@ -3300,17 +3444,28 @@ private Configuration unbindConf(Configuration conf) {
  exists(Configuration c | result = pragma[only_bind_into](c) and conf = pragma[only_bind_into](c))
 }

-private predicate nodeMayUseSummary(
-  NodeEx n, FlowState state, AccessPathApprox apa, Configuration config
+pragma[nomagic]
+private predicate nodeMayUseSummary0(
+  NodeEx n, DataFlowCallable c, FlowState state, AccessPathApprox apa, Configuration config
 ) {
-  exists(DataFlowCallable c, AccessPathApprox apa0 |
-    Stage4::parameterMayFlowThrough(_, c, apa, _) and
+  exists(AccessPathApprox apa0 |
+    Stage4::parameterMayFlowThrough(_, c, _, _) and
    Stage4::revFlow(n, state, true, _, apa0, config) and
    Stage4::fwdFlow(n, state, any(CallContextCall ccc), TAccessPathApproxSome(apa), apa0, config) and
    n.getEnclosingCallable() = c
  )
 }

+pragma[nomagic]
+private predicate nodeMayUseSummary(
+  NodeEx n, FlowState state, AccessPathApprox apa, Configuration config
+) {
+  exists(DataFlowCallable c |
+    Stage4::parameterMayFlowThrough(_, c, apa, config) and
+    nodeMayUseSummary0(n, c, state, apa, config)
+  )
+}
+
 private newtype TSummaryCtx =
  TSummaryCtxNone() or
  TSummaryCtxSome(ParamNodeEx p, FlowState state, AccessPath ap) {
@@ -3506,7 +3661,7 @@ private newtype TPathNode =
 * of dereference operations needed to get from the value in the node to the
 * tracked object. The final type indicates the type of the tracked object.
 */
-abstract private class AccessPath extends TAccessPath {
+private class AccessPath extends TAccessPath {
  /** Gets the head of this access path, if any. */
  abstract TypedContent getHead();

@@ -3721,11 +3876,14 @@ abstract private class PathNodeImpl extends PathNode {
  abstract NodeEx getNodeEx();

  predicate isHidden() {
-    hiddenNode(this.getNodeEx().asNode()) and
-    not this.isSource() and
-    not this instanceof PathNodeSink
-    or
-    this.getNodeEx() instanceof TNodeImplicitRead
+    not this.getConfiguration().includeHiddenNodes() and
+    (
+      hiddenNode(this.getNodeEx().asNode()) and
+      not this.isSource() and
+      not this instanceof PathNodeSink
+      or
+      this.getNodeEx() instanceof TNodeImplicitRead
+    )
  }

  private string ppAp() {
@@ -4202,10 +4360,16 @@ private module Subpaths {
    exists(NodeEx n1, NodeEx n2 | n1 = n.getNodeEx() and n2 = result.getNodeEx() |
      localFlowBigStep(n1, _, n2, _, _, _, _, _) or
      store(n1, _, n2, _, _) or
-      read(n1, _, n2, _)
+      readSet(n1, _, n2, _)
    )
  }

+  pragma[nomagic]
+  private predicate hasSuccessor(PathNode pred, PathNodeMid succ, NodeEx succNode) {
+    succ = pred.getASuccessor() and
+    succNode = succ.getNodeEx()
+  }
+
  /**
   * Holds if `(arg, par, ret, out)` forms a subpath-tuple, that is, flow through
   * a subpath between `par` and `ret` with the connecting edges `arg -> par` and
@@ -4213,15 +4377,13 @@ private module Subpaths {
   */
  predicate subpaths(PathNode arg, PathNodeImpl par, PathNodeImpl ret, PathNode out) {
    exists(ParamNodeEx p, NodeEx o, FlowState sout, AccessPath apout, PathNodeMid out0 |
-      pragma[only_bind_into](arg).getASuccessor() = par and
-      pragma[only_bind_into](arg).getASuccessor() = out0 and
-      subpaths03(arg, p, localStepToHidden*(ret), o, sout, apout) and
+      pragma[only_bind_into](arg).getASuccessor() = pragma[only_bind_into](out0) and
+      subpaths03(pragma[only_bind_into](arg), p, localStepToHidden*(ret), o, sout, apout) and
+      hasSuccessor(pragma[only_bind_into](arg), par, p) and
      not ret.isHidden() and
-      par.getNodeEx() = p and
-      out0.getNodeEx() = o and
-      out0.getState() = sout and
-      out0.getAp() = apout and
-      (out = out0 or out = out0.projectToSink())
+      pathNode(out0, o, sout, _, _, apout, _, _)
+    |
+      out = out0 or out = out0.projectToSink()
    )
  }

@@ -4557,7 +4719,11 @@ private module FlowExploration {
      or
      exists(PartialPathNodeRev mid |
        revPartialPathStep(mid, node, state, sc1, sc2, sc3, ap, config) and
-        not clearsContentCached(node.asNode(), ap.getHead()) and
+        not clearsContentEx(node, ap.getHead()) and
+        (
+          notExpectsContent(node) or
+          expectsContentEx(node, ap.getHead())
+        ) and
        not fullBarrier(node, config) and
        not stateBarrier(node, state, config) and
        distSink(node.getEnclosingCallable(), config) <= config.explorationLimit()
@@ -4573,7 +4739,11 @@ private module FlowExploration {
      partialPathStep(mid, node, state, cc, sc1, sc2, sc3, ap, config) and
      not fullBarrier(node, config) and
      not stateBarrier(node, state, config) and
-      not clearsContentCached(node.asNode(), ap.getHead().getContent()) and
+      not clearsContentEx(node, ap.getHead().getContent()) and
+      (
+        notExpectsContent(node) or
+        expectsContentEx(node, ap.getHead().getContent())
+      ) and
      if node.asNode() instanceof CastingNode
      then compatibleTypes(node.getDataFlowType(), ap.getType())
      else any()
--- a/python/ql/lib/semmle/python/dataflow/new/internal/DataFlowImpl2.qll
+++ b/python/ql/lib/semmle/python/dataflow/new/internal/DataFlowImpl2.qll
@@ -116,7 +116,7 @@ abstract class Configuration extends string {
   * Holds if an arbitrary number of implicit read steps of content `c` may be
   * taken at `node`.
   */
-  predicate allowImplicitRead(Node node, Content c) { none() }
+  predicate allowImplicitRead(Node node, ContentSet c) { none() }

  /**
   * Gets the virtual dispatch branching limit when calculating field flow.
@@ -170,6 +170,14 @@ abstract class Configuration extends string {
   */
  int explorationLimit() { none() }

+  /**
+   * Holds if hidden nodes should be included in the data flow graph.
+   *
+   * This feature should only be used for debugging or when the data flow graph
+   * is not visualized (as it is in a `path-problem` query, for example).
+   */
+  predicate includeHiddenNodes() { none() }
+
  /**
   * Holds if there is a partial data flow path from `source` to `node`. The
   * approximate distance between `node` and the closest source is `dist` and
@@ -485,8 +493,9 @@ private predicate additionalJumpStateStep(
  )
 }

-private predicate read(NodeEx node1, Content c, NodeEx node2, Configuration config) {
-  read(node1.asNode(), c, node2.asNode()) and
+pragma[nomagic]
+private predicate readSet(NodeEx node1, ContentSet c, NodeEx node2, Configuration config) {
+  readSet(node1.asNode(), c, node2.asNode()) and
  stepFilter(node1, node2, config)
  or
  exists(Node n |
@@ -496,6 +505,37 @@ private predicate read(NodeEx node1, Content c, NodeEx node2, Configuration conf
  )
 }

+// inline to reduce fan-out via `getAReadContent`
+bindingset[c]
+private predicate read(NodeEx node1, Content c, NodeEx node2, Configuration config) {
+  exists(ContentSet cs |
+    readSet(node1, cs, node2, config) and
+    pragma[only_bind_out](c) = pragma[only_bind_into](cs).getAReadContent()
+  )
+}
+
+// inline to reduce fan-out via `getAReadContent`
+bindingset[c]
+private predicate clearsContentEx(NodeEx n, Content c) {
+  exists(ContentSet cs |
+    clearsContentCached(n.asNode(), cs) and
+    pragma[only_bind_out](c) = pragma[only_bind_into](cs).getAReadContent()
+  )
+}
+
+// inline to reduce fan-out via `getAReadContent`
+bindingset[c]
+private predicate expectsContentEx(NodeEx n, Content c) {
+  exists(ContentSet cs |
+    expectsContentCached(n.asNode(), cs) and
+    pragma[only_bind_out](c) = pragma[only_bind_into](cs).getAReadContent()
+  )
+}
+
+pragma[nomagic]
+private predicate notExpectsContent(NodeEx n) { not expectsContentCached(n.asNode(), _) }
+
+pragma[nomagic]
 private predicate store(
  NodeEx node1, TypedContent tc, NodeEx node2, DataFlowType contentType, Configuration config
 ) {
@@ -573,9 +613,9 @@ private module Stage1 {
    )
    or
    // read
-    exists(Content c |
-      fwdFlowRead(c, node, cc, config) and
-      fwdFlowConsCand(c, config)
+    exists(ContentSet c |
+      fwdFlowReadSet(c, node, cc, config) and
+      fwdFlowConsCandSet(c, _, config)
    )
    or
    // flow into a callable
@@ -599,10 +639,10 @@ private module Stage1 {
  private predicate fwdFlow(NodeEx node, Configuration config) { fwdFlow(node, _, config) }

  pragma[nomagic]
-  private predicate fwdFlowRead(Content c, NodeEx node, Cc cc, Configuration config) {
+  private predicate fwdFlowReadSet(ContentSet c, NodeEx node, Cc cc, Configuration config) {
    exists(NodeEx mid |
      fwdFlow(mid, cc, config) and
-      read(mid, c, node, config)
+      readSet(mid, c, node, config)
    )
  }

@@ -620,6 +660,16 @@ private module Stage1 {
    )
  }

+  /**
+   * Holds if `cs` may be interpreted in a read as the target of some store
+   * into `c`, in the flow covered by `fwdFlow`.
+   */
+  pragma[nomagic]
+  private predicate fwdFlowConsCandSet(ContentSet cs, Content c, Configuration config) {
+    fwdFlowConsCand(c, config) and
+    c = cs.getAReadContent()
+  }
+
  pragma[nomagic]
  private predicate fwdFlowReturnPosition(ReturnPosition pos, Cc cc, Configuration config) {
    exists(RetNodeEx ret |
@@ -712,9 +762,9 @@ private module Stage1 {
    )
    or
    // read
-    exists(NodeEx mid, Content c |
-      read(node, c, mid, config) and
-      fwdFlowConsCand(c, pragma[only_bind_into](config)) and
+    exists(NodeEx mid, ContentSet c |
+      readSet(node, c, mid, config) and
+      fwdFlowConsCandSet(c, _, pragma[only_bind_into](config)) and
      revFlow(mid, toReturn, pragma[only_bind_into](config))
    )
    or
@@ -740,10 +790,10 @@ private module Stage1 {
   */
  pragma[nomagic]
  private predicate revFlowConsCand(Content c, Configuration config) {
-    exists(NodeEx mid, NodeEx node |
+    exists(NodeEx mid, NodeEx node, ContentSet cs |
      fwdFlow(node, pragma[only_bind_into](config)) and
-      read(node, c, mid, config) and
-      fwdFlowConsCand(c, pragma[only_bind_into](config)) and
+      readSet(node, cs, mid, config) and
+      fwdFlowConsCandSet(cs, c, pragma[only_bind_into](config)) and
      revFlow(pragma[only_bind_into](mid), _, pragma[only_bind_into](config))
    )
  }
@@ -762,7 +812,8 @@ private module Stage1 {
   * Holds if `c` is the target of both a read and a store in the flow covered
   * by `revFlow`.
   */
-  private predicate revFlowIsReadAndStored(Content c, Configuration conf) {
+  pragma[nomagic]
+  predicate revFlowIsReadAndStored(Content c, Configuration conf) {
    revFlowConsCand(c, conf) and
    revFlowStore(c, _, _, conf)
  }
@@ -861,8 +912,8 @@ private module Stage1 {
  pragma[nomagic]
  predicate readStepCand(NodeEx n1, Content c, NodeEx n2, Configuration config) {
    revFlowIsReadAndStored(c, pragma[only_bind_into](config)) and
-    revFlow(n2, pragma[only_bind_into](config)) and
-    read(n1, c, n2, pragma[only_bind_into](config))
+    read(n1, c, n2, pragma[only_bind_into](config)) and
+    revFlow(n2, pragma[only_bind_into](config))
  }

  pragma[nomagic]
@@ -872,7 +923,10 @@ private module Stage1 {
  predicate revFlow(
    NodeEx node, FlowState state, boolean toReturn, ApOption returnAp, Ap ap, Configuration config
  ) {
-    revFlow(node, toReturn, config) and exists(state) and exists(returnAp) and exists(ap)
+    revFlow(node, toReturn, pragma[only_bind_into](config)) and
+    exists(state) and
+    exists(returnAp) and
+    exists(ap)
  }

  private predicate throughFlowNodeCand(NodeEx node, Configuration config) {
@@ -1147,11 +1201,26 @@ private module Stage2 {

  private predicate flowIntoCall = flowIntoCallNodeCand1/5;

+  pragma[nomagic]
+  private predicate expectsContentCand(NodeEx node, Configuration config) {
+    exists(Content c |
+      PrevStage::revFlow(node, pragma[only_bind_into](config)) and
+      PrevStage::revFlowIsReadAndStored(c, pragma[only_bind_into](config)) and
+      expectsContentEx(node, c)
+    )
+  }
+
  bindingset[node, state, ap, config]
  private predicate filter(NodeEx node, FlowState state, Ap ap, Configuration config) {
-    PrevStage::revFlowState(state, config) and
+    PrevStage::revFlowState(state, pragma[only_bind_into](config)) and
    exists(ap) and
-    not stateBarrier(node, state, config)
+    not stateBarrier(node, state, config) and
+    (
+      notExpectsContent(node)
+      or
+      ap = true and
+      expectsContentCand(node, config)
+    )
  }

  bindingset[ap, contentType]
@@ -1574,7 +1643,7 @@ private module Stage2 {
    Configuration config
  ) {
    exists(Ap ap2, Content c |
-      store(node1, tc, node2, contentType, config) and
+      PrevStage::storeStepCand(node1, _, tc, node2, contentType, config) and
      revFlowStore(ap2, c, ap1, node1, _, tc, node2, _, _, config) and
      revFlowConsCand(ap2, c, ap1, config)
    )
@@ -1612,10 +1681,24 @@ private module Stage2 {
    storeStepFwd(_, ap, tc, _, _, config)
  }

-  predicate consCand(TypedContent tc, Ap ap, Configuration config) {
+  private predicate revConsCand(TypedContent tc, Ap ap, Configuration config) {
    storeStepCand(_, ap, tc, _, _, config)
  }

+  private predicate validAp(Ap ap, Configuration config) {
+    revFlow(_, _, _, _, ap, config) and ap instanceof ApNil
+    or
+    exists(TypedContent head, Ap tail |
+      consCand(head, tail, config) and
+      ap = apCons(head, tail)
+    )
+  }
+
+  predicate consCand(TypedContent tc, Ap ap, Configuration config) {
+    revConsCand(tc, ap, config) and
+    validAp(ap, config)
+  }
+
  pragma[noinline]
  private predicate parameterFlow(
    ParamNodeEx p, Ap ap, Ap ap0, DataFlowCallable c, Configuration config
@@ -1706,7 +1789,8 @@ private module LocalFlowBigStep {
  private class FlowCheckNode extends NodeEx {
    FlowCheckNode() {
      castNode(this.asNode()) or
-      clearsContentCached(this.asNode(), _)
+      clearsContentCached(this.asNode(), _) or
+      expectsContentCached(this.asNode(), _)
    }
  }

@@ -1729,9 +1813,9 @@ private module LocalFlowBigStep {
      or
      node.asNode() instanceof OutNodeExt
      or
-      store(_, _, node, _, config)
+      Stage2::storeStepCand(_, _, _, node, _, config)
      or
-      read(_, _, node, config)
+      Stage2::readStepCand(_, _, node, config)
      or
      node instanceof FlowCheckNode
      or
@@ -1752,8 +1836,8 @@ private module LocalFlowBigStep {
      additionalJumpStep(node, next, config) or
      flowIntoCallNodeCand1(_, node, next, config) or
      flowOutOfCallNodeCand1(_, node, next, config) or
-      store(node, _, next, _, config) or
-      read(node, _, next, config)
+      Stage2::storeStepCand(node, _, _, next, _, config) or
+      Stage2::readStepCand(node, _, next, config)
    )
    or
    exists(NodeEx next, FlowState s | Stage2::revFlow(next, s, config) |
@@ -1926,7 +2010,34 @@ private module Stage3 {
  private predicate flowIntoCall = flowIntoCallNodeCand2/5;

  pragma[nomagic]
-  private predicate clear(NodeEx node, Ap ap) { ap.isClearedAt(node.asNode()) }
+  private predicate clearSet(NodeEx node, ContentSet c, Configuration config) {
+    PrevStage::revFlow(node, config) and
+    clearsContentCached(node.asNode(), c)
+  }
+
+  pragma[nomagic]
+  private predicate clearContent(NodeEx node, Content c, Configuration config) {
+    exists(ContentSet cs |
+      PrevStage::readStepCand(_, pragma[only_bind_into](c), _, pragma[only_bind_into](config)) and
+      c = cs.getAReadContent() and
+      clearSet(node, cs, pragma[only_bind_into](config))
+    )
+  }
+
+  pragma[nomagic]
+  private predicate clear(NodeEx node, Ap ap, Configuration config) {
+    clearContent(node, ap.getHead().getContent(), config)
+  }
+
+  pragma[nomagic]
+  private predicate expectsContentCand(NodeEx node, Ap ap, Configuration config) {
+    exists(Content c |
+      PrevStage::revFlow(node, pragma[only_bind_into](config)) and
+      PrevStage::readStepCand(_, c, _, pragma[only_bind_into](config)) and
+      expectsContentEx(node, c) and
+      c = ap.getHead().getContent()
+    )
+  }

  pragma[nomagic]
  private predicate castingNodeEx(NodeEx node) { node.asNode() instanceof CastingNode }
@@ -1935,8 +2046,13 @@ private module Stage3 {
  private predicate filter(NodeEx node, FlowState state, Ap ap, Configuration config) {
    exists(state) and
    exists(config) and
-    not clear(node, ap) and
-    if castingNodeEx(node) then compatibleTypes(node.getDataFlowType(), ap.getType()) else any()
+    not clear(node, ap, config) and
+    (if castingNodeEx(node) then compatibleTypes(node.getDataFlowType(), ap.getType()) else any()) and
+    (
+      notExpectsContent(node)
+      or
+      expectsContentCand(node, ap, config)
+    )
  }

  bindingset[ap, contentType]
@@ -2363,7 +2479,7 @@ private module Stage3 {
    Configuration config
  ) {
    exists(Ap ap2, Content c |
-      store(node1, tc, node2, contentType, config) and
+      PrevStage::storeStepCand(node1, _, tc, node2, contentType, config) and
      revFlowStore(ap2, c, ap1, node1, _, tc, node2, _, _, config) and
      revFlowConsCand(ap2, c, ap1, config)
    )
@@ -2401,10 +2517,24 @@ private module Stage3 {
    storeStepFwd(_, ap, tc, _, _, config)
  }

-  predicate consCand(TypedContent tc, Ap ap, Configuration config) {
+  private predicate revConsCand(TypedContent tc, Ap ap, Configuration config) {
    storeStepCand(_, ap, tc, _, _, config)
  }

+  private predicate validAp(Ap ap, Configuration config) {
+    revFlow(_, _, _, _, ap, config) and ap instanceof ApNil
+    or
+    exists(TypedContent head, Ap tail |
+      consCand(head, tail, config) and
+      ap = apCons(head, tail)
+    )
+  }
+
+  predicate consCand(TypedContent tc, Ap ap, Configuration config) {
+    revConsCand(tc, ap, config) and
+    validAp(ap, config)
+  }
+
  pragma[noinline]
  private predicate parameterFlow(
    ParamNodeEx p, Ap ap, Ap ap0, DataFlowCallable c, Configuration config
@@ -3190,7 +3320,7 @@ private module Stage4 {
    Configuration config
  ) {
    exists(Ap ap2, Content c |
-      store(node1, tc, node2, contentType, config) and
+      PrevStage::storeStepCand(node1, _, tc, node2, contentType, config) and
      revFlowStore(ap2, c, ap1, node1, _, tc, node2, _, _, config) and
      revFlowConsCand(ap2, c, ap1, config)
    )
@@ -3228,10 +3358,24 @@ private module Stage4 {
    storeStepFwd(_, ap, tc, _, _, config)
  }

-  predicate consCand(TypedContent tc, Ap ap, Configuration config) {
+  private predicate revConsCand(TypedContent tc, Ap ap, Configuration config) {
    storeStepCand(_, ap, tc, _, _, config)
  }

+  private predicate validAp(Ap ap, Configuration config) {
+    revFlow(_, _, _, _, ap, config) and ap instanceof ApNil
+    or
+    exists(TypedContent head, Ap tail |
+      consCand(head, tail, config) and
+      ap = apCons(head, tail)
+    )
+  }
+
+  predicate consCand(TypedContent tc, Ap ap, Configuration config) {
+    revConsCand(tc, ap, config) and
+    validAp(ap, config)
+  }
+
  pragma[noinline]
  private predicate parameterFlow(
    ParamNodeEx p, Ap ap, Ap ap0, DataFlowCallable c, Configuration config
@@ -3300,17 +3444,28 @@ private Configuration unbindConf(Configuration conf) {
  exists(Configuration c | result = pragma[only_bind_into](c) and conf = pragma[only_bind_into](c))
 }

-private predicate nodeMayUseSummary(
-  NodeEx n, FlowState state, AccessPathApprox apa, Configuration config
+pragma[nomagic]
+private predicate nodeMayUseSummary0(
+  NodeEx n, DataFlowCallable c, FlowState state, AccessPathApprox apa, Configuration config
 ) {
-  exists(DataFlowCallable c, AccessPathApprox apa0 |
-    Stage4::parameterMayFlowThrough(_, c, apa, _) and
+  exists(AccessPathApprox apa0 |
+    Stage4::parameterMayFlowThrough(_, c, _, _) and
    Stage4::revFlow(n, state, true, _, apa0, config) and
    Stage4::fwdFlow(n, state, any(CallContextCall ccc), TAccessPathApproxSome(apa), apa0, config) and
    n.getEnclosingCallable() = c
  )
 }

+pragma[nomagic]
+private predicate nodeMayUseSummary(
+  NodeEx n, FlowState state, AccessPathApprox apa, Configuration config
+) {
+  exists(DataFlowCallable c |
+    Stage4::parameterMayFlowThrough(_, c, apa, config) and
+    nodeMayUseSummary0(n, c, state, apa, config)
+  )
+}
+
 private newtype TSummaryCtx =
  TSummaryCtxNone() or
  TSummaryCtxSome(ParamNodeEx p, FlowState state, AccessPath ap) {
@@ -3506,7 +3661,7 @@ private newtype TPathNode =
 * of dereference operations needed to get from the value in the node to the
 * tracked object. The final type indicates the type of the tracked object.
 */
-abstract private class AccessPath extends TAccessPath {
+private class AccessPath extends TAccessPath {
  /** Gets the head of this access path, if any. */
  abstract TypedContent getHead();

@@ -3721,11 +3876,14 @@ abstract private class PathNodeImpl extends PathNode {
  abstract NodeEx getNodeEx();

  predicate isHidden() {
-    hiddenNode(this.getNodeEx().asNode()) and
-    not this.isSource() and
-    not this instanceof PathNodeSink
-    or
-    this.getNodeEx() instanceof TNodeImplicitRead
+    not this.getConfiguration().includeHiddenNodes() and
+    (
+      hiddenNode(this.getNodeEx().asNode()) and
+      not this.isSource() and
+      not this instanceof PathNodeSink
+      or
+      this.getNodeEx() instanceof TNodeImplicitRead
+    )
  }

  private string ppAp() {
@@ -4202,10 +4360,16 @@ private module Subpaths {
    exists(NodeEx n1, NodeEx n2 | n1 = n.getNodeEx() and n2 = result.getNodeEx() |
      localFlowBigStep(n1, _, n2, _, _, _, _, _) or
      store(n1, _, n2, _, _) or
-      read(n1, _, n2, _)
+      readSet(n1, _, n2, _)
    )
  }

+  pragma[nomagic]
+  private predicate hasSuccessor(PathNode pred, PathNodeMid succ, NodeEx succNode) {
+    succ = pred.getASuccessor() and
+    succNode = succ.getNodeEx()
+  }
+
  /**
   * Holds if `(arg, par, ret, out)` forms a subpath-tuple, that is, flow through
   * a subpath between `par` and `ret` with the connecting edges `arg -> par` and
@@ -4213,15 +4377,13 @@ private module Subpaths {
   */
  predicate subpaths(PathNode arg, PathNodeImpl par, PathNodeImpl ret, PathNode out) {
    exists(ParamNodeEx p, NodeEx o, FlowState sout, AccessPath apout, PathNodeMid out0 |
-      pragma[only_bind_into](arg).getASuccessor() = par and
-      pragma[only_bind_into](arg).getASuccessor() = out0 and
-      subpaths03(arg, p, localStepToHidden*(ret), o, sout, apout) and
+      pragma[only_bind_into](arg).getASuccessor() = pragma[only_bind_into](out0) and
+      subpaths03(pragma[only_bind_into](arg), p, localStepToHidden*(ret), o, sout, apout) and
+      hasSuccessor(pragma[only_bind_into](arg), par, p) and
      not ret.isHidden() and
-      par.getNodeEx() = p and
-      out0.getNodeEx() = o and
-      out0.getState() = sout and
-      out0.getAp() = apout and
-      (out = out0 or out = out0.projectToSink())
+      pathNode(out0, o, sout, _, _, apout, _, _)
+    |
+      out = out0 or out = out0.projectToSink()
    )
  }

@@ -4557,7 +4719,11 @@ private module FlowExploration {
      or
      exists(PartialPathNodeRev mid |
        revPartialPathStep(mid, node, state, sc1, sc2, sc3, ap, config) and
-        not clearsContentCached(node.asNode(), ap.getHead()) and
+        not clearsContentEx(node, ap.getHead()) and
+        (
+          notExpectsContent(node) or
+          expectsContentEx(node, ap.getHead())
+        ) and
        not fullBarrier(node, config) and
        not stateBarrier(node, state, config) and
        distSink(node.getEnclosingCallable(), config) <= config.explorationLimit()
@@ -4573,7 +4739,11 @@ private module FlowExploration {
      partialPathStep(mid, node, state, cc, sc1, sc2, sc3, ap, config) and
      not fullBarrier(node, config) and
      not stateBarrier(node, state, config) and
-      not clearsContentCached(node.asNode(), ap.getHead().getContent()) and
+      not clearsContentEx(node, ap.getHead().getContent()) and
+      (
+        notExpectsContent(node) or
+        expectsContentEx(node, ap.getHead().getContent())
+      ) and
      if node.asNode() instanceof CastingNode
      then compatibleTypes(node.getDataFlowType(), ap.getType())
      else any()
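
Note for reviewers: the two `Configuration`-facing changes in these synced copies (the `ContentSet` parameter of `allowImplicitRead` and the new `includeHiddenNodes` predicate) could be exercised from a query roughly as sketched below. This is a minimal sketch, not part of the patch. The class name `HiddenNodeDebugConfig` and the source/sink choices are hypothetical and picked only to keep the example short, and it assumes the `Configuration` class reachable through `semmle.python.dataflow.new.DataFlow` carries the same changes as the copies shown here.

```ql
import python
import semmle.python.dataflow.new.DataFlow

/** Hypothetical debugging configuration; names and source/sink choices are illustrative only. */
class HiddenNodeDebugConfig extends DataFlow::Configuration {
  HiddenNodeDebugConfig() { this = "HiddenNodeDebugConfig" }

  // Illustrative source: any parameter.
  override predicate isSource(DataFlow::Node source) { source instanceof DataFlow::ParameterNode }

  // Illustrative sink: the first positional argument of any call.
  override predicate isSink(DataFlow::Node sink) {
    sink = any(DataFlow::CallCfgNode call).getArg(0)
  }

  // New signature: the second parameter is a `ContentSet`, no longer a `Content`.
  override predicate allowImplicitRead(DataFlow::Node node, DataFlow::ContentSet c) {
    this.isSink(node) and exists(c)
  }

  // Opt in to hidden nodes; per the doc comment this is meant for debugging.
  override predicate includeHiddenNodes() { any() }
}

from HiddenNodeDebugConfig config, DataFlow::Node source, DataFlow::Node sink
where config.hasFlow(source, sink)
select sink, "Flow from $@.", source, "parameter"
```

An existing override that still declares `DataFlow::Content c` would presumably stop compiling against the new signature, which is why the CHANGELOG lists the change as breaking.
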
--- a/python/ql/lib/semmle/python/dataflow/new/internal/DataFlowImpl3.qll
+++ b/python/ql/lib/semmle/python/dataflow/new/internal/DataFlowImpl3.qll
@@ -116,7 +116,7 @@ abstract class Configuration extends string {
   * Holds if an arbitrary number of implicit read steps of content `c` may be
   * taken at `node`.
   */
-  predicate allowImplicitRead(Node node, Content c) { none() }
+  predicate allowImplicitRead(Node node, ContentSet c) { none() }

  /**
   * Gets the virtual dispatch branching limit when calculating field flow.
@@ -170,6 +170,14 @@ abstract class Configuration extends string {
   */
  int explorationLimit() { none() }

+  /**
+   * Holds if hidden nodes should be included in the data flow graph.
+   *
+   * This feature should only be used for debugging or when the data flow graph
+   * is not visualized (as it is in a `path-problem` query, for example).
+   */
+  predicate includeHiddenNodes() { none() }
+
  /**
   * Holds if there is a partial data flow path from `source` to `node`. The
   * approximate distance between `node` and the closest source is `dist` and
@@ -485,8 +493,9 @@ private predicate additionalJumpStateStep(
  )
 }

-private predicate read(NodeEx node1, Content c, NodeEx node2, Configuration config) {
-  read(node1.asNode(), c, node2.asNode()) and
+pragma[nomagic]
+private predicate readSet(NodeEx node1, ContentSet c, NodeEx node2, Configuration config) {
+  readSet(node1.asNode(), c, node2.asNode()) and
  stepFilter(node1, node2, config)
  or
  exists(Node n |
@@ -496,6 +505,37 @@ private predicate read(NodeEx node1, Content c, NodeEx node2, Configuration conf
  )
 }

+// inline to reduce fan-out via `getAReadContent`
+bindingset[c]
+private predicate read(NodeEx node1, Content c, NodeEx node2, Configuration config) {
+  exists(ContentSet cs |
+    readSet(node1, cs, node2, config) and
+    pragma[only_bind_out](c) = pragma[only_bind_into](cs).getAReadContent()
+  )
+}
+
+// inline to reduce fan-out via `getAReadContent`
+bindingset[c]
+private predicate clearsContentEx(NodeEx n, Content c) {
+  exists(ContentSet cs |
+    clearsContentCached(n.asNode(), cs) and
+    pragma[only_bind_out](c) = pragma[only_bind_into](cs).getAReadContent()
+  )
+}
+
+// inline to reduce fan-out via `getAReadContent`
+bindingset[c]
+private predicate expectsContentEx(NodeEx n, Content c) {
+  exists(ContentSet cs |
+    expectsContentCached(n.asNode(), cs) and
+    pragma[only_bind_out](c) = pragma[only_bind_into](cs).getAReadContent()
+  )
+}
+
+pragma[nomagic]
+private predicate notExpectsContent(NodeEx n) { not expectsContentCached(n.asNode(), _) }
+
+pragma[nomagic]
 private predicate store(
  NodeEx node1, TypedContent tc, NodeEx node2, DataFlowType contentType, Configuration config
 ) {
@@ -573,9 +613,9 @@ private module Stage1 {
    )
    or
    // read
-    exists(Content c |
-      fwdFlowRead(c, node, cc, config) and
-      fwdFlowConsCand(c, config)
+    exists(ContentSet c |
+      fwdFlowReadSet(c, node, cc, config) and
+      fwdFlowConsCandSet(c, _, config)
    )
    or
    // flow into a callable
@@ -599,10 +639,10 @@ private module Stage1 {
  private predicate fwdFlow(NodeEx node, Configuration config) { fwdFlow(node, _, config) }

  pragma[nomagic]
-  private predicate fwdFlowRead(Content c, NodeEx node, Cc cc, Configuration config) {
+  private predicate fwdFlowReadSet(ContentSet c, NodeEx node, Cc cc, Configuration config) {
    exists(NodeEx mid |
      fwdFlow(mid, cc, config) and
-      read(mid, c, node, config)
+      readSet(mid, c, node, config)
    )
  }

@@ -620,6 +660,16 @@ private module Stage1 {
    )
  }

+  /**
+   * Holds if `cs` may be interpreted in a read as the target of some store
+   * into `c`, in the flow covered by `fwdFlow`.
+   */
+  pragma[nomagic]
+  private predicate fwdFlowConsCandSet(ContentSet cs, Content c, Configuration config) {
+    fwdFlowConsCand(c, config) and
+    c = cs.getAReadContent()
+  }
+
  pragma[nomagic]
  private predicate fwdFlowReturnPosition(ReturnPosition pos, Cc cc, Configuration config) {
    exists(RetNodeEx ret |
@@ -712,9 +762,9 @@ private module Stage1 {
    )
    or
    // read
-    exists(NodeEx mid, Content c |
-      read(node, c, mid, config) and
-      fwdFlowConsCand(c, pragma[only_bind_into](config)) and
+    exists(NodeEx mid, ContentSet c |
+      readSet(node, c, mid, config) and
+      fwdFlowConsCandSet(c, _, pragma[only_bind_into](config)) and
      revFlow(mid, toReturn, pragma[only_bind_into](config))
    )
    or
@@ -740,10 +790,10 @@ private module Stage1 {
   */
  pragma[nomagic]
  private predicate revFlowConsCand(Content c, Configuration config) {
-    exists(NodeEx mid, NodeEx node |
+    exists(NodeEx mid, NodeEx node, ContentSet cs |
      fwdFlow(node, pragma[only_bind_into](config)) and
-      read(node, c, mid, config) and
-      fwdFlowConsCand(c, pragma[only_bind_into](config)) and
+      readSet(node, cs, mid, config) and
+      fwdFlowConsCandSet(cs, c, pragma[only_bind_into](config)) and
      revFlow(pragma[only_bind_into](mid), _, pragma[only_bind_into](config))
    )
  }
@@ -762,7 +812,8 @@ private module Stage1 {
   * Holds if `c` is the target of both a read and a store in the flow covered
   * by `revFlow`.
   */
-  private predicate revFlowIsReadAndStored(Content c, Configuration conf) {
+  pragma[nomagic]
+  predicate revFlowIsReadAndStored(Content c, Configuration conf) {
    revFlowConsCand(c, conf) and
    revFlowStore(c, _, _, conf)
  }
@@ -861,8 +912,8 @@ private module Stage1 {
  pragma[nomagic]
  predicate readStepCand(NodeEx n1, Content c, NodeEx n2, Configuration config) {
    revFlowIsReadAndStored(c, pragma[only_bind_into](config)) and
-    revFlow(n2, pragma[only_bind_into](config)) and
-    read(n1, c, n2, pragma[only_bind_into](config))
+    read(n1, c, n2, pragma[only_bind_into](config)) and
+    revFlow(n2, pragma[only_bind_into](config))
  }

  pragma[nomagic]
@@ -872,7 +923,10 @@ private module Stage1 {
  predicate revFlow(
    NodeEx node, FlowState state, boolean toReturn, ApOption returnAp, Ap ap, Configuration config
  ) {
-    revFlow(node, toReturn, config) and exists(state) and exists(returnAp) and exists(ap)
+    revFlow(node, toReturn, pragma[only_bind_into](config)) and
+    exists(state) and
+    exists(returnAp) and
+    exists(ap)
  }

  private predicate throughFlowNodeCand(NodeEx node, Configuration config) {
@@ -1147,11 +1201,26 @@ private module Stage2 {

  private predicate flowIntoCall = flowIntoCallNodeCand1/5;

+  pragma[nomagic]
+  private predicate expectsContentCand(NodeEx node, Configuration config) {
+    exists(Content c |
+      PrevStage::revFlow(node, pragma[only_bind_into](config)) and
+      PrevStage::revFlowIsReadAndStored(c, pragma[only_bind_into](config)) and
+      expectsContentEx(node, c)
+    )
+  }
+
  bindingset[node, state, ap, config]
  private predicate filter(NodeEx node, FlowState state, Ap ap, Configuration config) {
-    PrevStage::revFlowState(state, config) and
+    PrevStage::revFlowState(state, pragma[only_bind_into](config)) and
    exists(ap) and
-    not stateBarrier(node, state, config)
+    not stateBarrier(node, state, config) and
+    (
+      notExpectsContent(node)
+      or
+      ap = true and
+      expectsContentCand(node, config)
+    )
  }

  bindingset[ap, contentType]
@@ -1574,7 +1643,7 @@ private module Stage2 {
    Configuration config
  ) {
    exists(Ap ap2, Content c |
-      store(node1, tc, node2, contentType, config) and
+      PrevStage::storeStepCand(node1, _, tc, node2, contentType, config) and
      revFlowStore(ap2, c, ap1, node1, _, tc, node2, _, _, config) and
      revFlowConsCand(ap2, c, ap1, config)
    )
@@ -1612,10 +1681,24 @@ private module Stage2 {
    storeStepFwd(_, ap, tc, _, _, config)
  }

-  predicate consCand(TypedContent tc, Ap ap, Configuration config) {
+  private predicate revConsCand(TypedContent tc, Ap ap, Configuration config) {
    storeStepCand(_, ap, tc, _, _, config)
  }

+  private predicate validAp(Ap ap, Configuration config) {
+    revFlow(_, _, _, _, ap, config) and ap instanceof ApNil
+    or
+    exists(TypedContent head, Ap tail |
+      consCand(head, tail, config) and
+      ap = apCons(head, tail)
+    )
+  }
+
+  predicate consCand(TypedContent tc, Ap ap, Configuration config) {
+    revConsCand(tc, ap, config) and
+    validAp(ap, config)
+  }
+
  pragma[noinline]
  private predicate parameterFlow(
    ParamNodeEx p, Ap ap, Ap ap0, DataFlowCallable c, Configuration config
@@ -1706,7 +1789,8 @@ private module LocalFlowBigStep {
  private class FlowCheckNode extends NodeEx {
    FlowCheckNode() {
      castNode(this.asNode()) or
-      clearsContentCached(this.asNode(), _)
+      clearsContentCached(this.asNode(), _) or
+      expectsContentCached(this.asNode(), _)
    }
  }

@@ -1729,9 +1813,9 @@ private module LocalFlowBigStep {
      or
      node.asNode() instanceof OutNodeExt
      or
-      store(_, _, node, _, config)
+      Stage2::storeStepCand(_, _, _, node, _, config)
      or
-      read(_, _, node, config)
+      Stage2::readStepCand(_, _, node, config)
      or
      node instanceof FlowCheckNode
      or
@@ -1752,8 +1836,8 @@ private module LocalFlowBigStep {
      additionalJumpStep(node, next, config) or
      flowIntoCallNodeCand1(_, node, next, config) or
      flowOutOfCallNodeCand1(_, node, next, config) or
-      store(node, _, next, _, config) or
-      read(node, _, next, config)
+      Stage2::storeStepCand(node, _, _, next, _, config) or
+      Stage2::readStepCand(node, _, next, config)
    )
    or
    exists(NodeEx next, FlowState s | Stage2::revFlow(next, s, config) |
@@ -1926,7 +2010,34 @@ private module Stage3 {
  private predicate flowIntoCall = flowIntoCallNodeCand2/5;

  pragma[nomagic]
-  private predicate clear(NodeEx node, Ap ap) { ap.isClearedAt(node.asNode()) }
+  private predicate clearSet(NodeEx node, ContentSet c, Configuration config) {
+    PrevStage::revFlow(node, config) and
+    clearsContentCached(node.asNode(), c)
+  }
+
+  pragma[nomagic]
+  private predicate clearContent(NodeEx node, Content c, Configuration config) {
+    exists(ContentSet cs |
+      PrevStage::readStepCand(_, pragma[only_bind_into](c), _, pragma[only_bind_into](config)) and
+      c = cs.getAReadContent() and
+      clearSet(node, cs, pragma[only_bind_into](config))
+    )
+  }
+
+  pragma[nomagic]
+  private predicate clear(NodeEx node, Ap ap, Configuration config) {
+    clearContent(node, ap.getHead().getContent(), config)
+  }
+
+  pragma[nomagic]
+  private predicate expectsContentCand(NodeEx node, Ap ap, Configuration config) {
+    exists(Content c |
+      PrevStage::revFlow(node, pragma[only_bind_into](config)) and
+      PrevStage::readStepCand(_, c, _, pragma[only_bind_into](config)) and
+      expectsContentEx(node, c) and
+      c = ap.getHead().getContent()
+    )
+  }

  pragma[nomagic]
  private predicate castingNodeEx(NodeEx node) { node.asNode() instanceof CastingNode }
@@ -1935,8 +2046,13 @@ private module Stage3 {
  private predicate filter(NodeEx node, FlowState state, Ap ap, Configuration config) {
    exists(state) and
    exists(config) and
-    not clear(node, ap) and
-    if castingNodeEx(node) then compatibleTypes(node.getDataFlowType(), ap.getType()) else any()
+    not clear(node, ap, config) and
+    (if castingNodeEx(node) then compatibleTypes(node.getDataFlowType(), ap.getType()) else any()) and
+    (
+      notExpectsContent(node)
+      or
+      expectsContentCand(node, ap, config)
+    )
  }

  bindingset[ap, contentType]
@@ -2363,7 +2479,7 @@ private module Stage3 {
    Configuration config
  ) {
    exists(Ap ap2, Content c |
-      store(node1, tc, node2, contentType, config) and
+      PrevStage::storeStepCand(node1, _, tc, node2, contentType, config) and
      revFlowStore(ap2, c, ap1, node1, _, tc, node2, _, _, config) and
      revFlowConsCand(ap2, c, ap1, config)
    )
@@ -2401,10 +2517,24 @@ private module Stage3 {
    storeStepFwd(_, ap, tc, _, _, config)
  }

-  predicate consCand(TypedContent tc, Ap ap, Configuration config) {
+  private predicate revConsCand(TypedContent tc, Ap ap, Configuration config) {
    storeStepCand(_, ap, tc, _, _, config)
  }

+  private predicate validAp(Ap ap, Configuration config) {
+    revFlow(_, _, _, _, ap, config) and ap instanceof ApNil
+    or
+    exists(TypedContent head, Ap tail |
+      consCand(head, tail, config) and
+      ap = apCons(head, tail)
+    )
+  }
+
+  predicate consCand(TypedContent tc, Ap ap, Configuration config) {
+    revConsCand(tc, ap, config) and
+    validAp(ap, config)
+  }
+
  pragma[noinline]
  private predicate parameterFlow(
    ParamNodeEx p, Ap ap, Ap ap0, DataFlowCallable c, Configuration config
@@ -3190,7 +3320,7 @@ private module Stage4 {
    Configuration config
  ) {
    exists(Ap ap2, Content c |
-      store(node1, tc, node2, contentType, config) and
+      PrevStage::storeStepCand(node1, _, tc, node2, contentType, config) and
      revFlowStore(ap2, c, ap1, node1, _, tc, node2, _, _, config) and
      revFlowConsCand(ap2, c, ap1, config)
    )
@@ -3228,10 +3358,24 @@ private module Stage4 {
    storeStepFwd(_, ap, tc, _, _, config)
  }

-  predicate consCand(TypedContent tc, Ap ap, Configuration config) {
+  private predicate revConsCand(TypedContent tc, Ap ap, Configuration config) {
    storeStepCand(_, ap, tc, _, _, config)
  }

+  private predicate validAp(Ap ap, Configuration config) {
+    revFlow(_, _, _, _, ap, config) and ap instanceof ApNil
+    or
+    exists(TypedContent head, Ap tail |
+      consCand(head, tail, config) and
+      ap = apCons(head, tail)
+    )
+  }
+
+  predicate consCand(TypedContent tc, Ap ap, Configuration config) {
+    revConsCand(tc, ap, config) and
+    validAp(ap, config)
+  }
+
  pragma[noinline]
  private predicate parameterFlow(
    ParamNodeEx p, Ap ap, Ap ap0, DataFlowCallable c, Configuration config
@@ -3300,17 +3444,28 @@ private Configuration unbindConf(Configuration conf) {
  exists(Configuration c | result = pragma[only_bind_into](c) and conf = pragma[only_bind_into](c))
 }

-private predicate nodeMayUseSummary(
-  NodeEx n, FlowState state, AccessPathApprox apa, Configuration config
+pragma[nomagic]
+private predicate nodeMayUseSummary0(
+  NodeEx n, DataFlowCallable c, FlowState state, AccessPathApprox apa, Configuration config
 ) {
-  exists(DataFlowCallable c, AccessPathApprox apa0 |
-    Stage4::parameterMayFlowThrough(_, c, apa, _) and
+  exists(AccessPathApprox apa0 |
+    Stage4::parameterMayFlowThrough(_, c, _, _) and
    Stage4::revFlow(n, state, true, _, apa0, config) and
    Stage4::fwdFlow(n, state, any(CallContextCall ccc), TAccessPathApproxSome(apa), apa0, config) and
    n.getEnclosingCallable() = c
  )
 }

+pragma[nomagic]
+private predicate nodeMayUseSummary(
+  NodeEx n, FlowState state, AccessPathApprox apa, Configuration config
+) {
+  exists(DataFlowCallable c |
+    Stage4::parameterMayFlowThrough(_, c, apa, config) and
+    nodeMayUseSummary0(n, c, state, apa, config)
+  )
+}
+
 private newtype TSummaryCtx =
  TSummaryCtxNone() or
  TSummaryCtxSome(ParamNodeEx p, FlowState state, AccessPath ap) {
@@ -3506,7 +3661,7 @@ private newtype TPathNode =
 * of dereference operations needed to get from the value in the node to the
 * tracked object. The final type indicates the type of the tracked object.
 */
-abstract private class AccessPath extends TAccessPath {
+private class AccessPath extends TAccessPath {
  /** Gets the head of this access path, if any. */
  abstract TypedContent getHead();

@@ -3721,11 +3876,14 @@ abstract private class PathNodeImpl extends PathNode {
  abstract NodeEx getNodeEx();

  predicate isHidden() {
-    hiddenNode(this.getNodeEx().asNode()) and
-    not this.isSource() and
-    not this instanceof PathNodeSink
-    or
-    this.getNodeEx() instanceof TNodeImplicitRead
+    not this.getConfiguration().includeHiddenNodes() and
+    (
+      hiddenNode(this.getNodeEx().asNode()) and
+      not this.isSource() and
+      not this instanceof PathNodeSink
+      or
+      this.getNodeEx() instanceof TNodeImplicitRead
+    )
  }

  private string ppAp() {
@@ -4202,10 +4360,16 @@ private module Subpaths {
    exists(NodeEx n1, NodeEx n2 | n1 = n.getNodeEx() and n2 = result.getNodeEx() |
      localFlowBigStep(n1, _, n2, _, _, _, _, _) or
      store(n1, _, n2, _, _) or
-      read(n1, _, n2, _)
+      readSet(n1, _, n2, _)
    )
  }

+  pragma[nomagic]
+  private predicate hasSuccessor(PathNode pred, PathNodeMid succ, NodeEx succNode) {
+    succ = pred.getASuccessor() and
+    succNode = succ.getNodeEx()
+  }
+
  /**
   * Holds if `(arg, par, ret, out)` forms a subpath-tuple, that is, flow through
   * a subpath between `par` and `ret` with the connecting edges `arg -> par` and
@@ -4213,15 +4377,13 @@ private module Subpaths {
   */
  predicate subpaths(PathNode arg, PathNodeImpl par, PathNodeImpl ret, PathNode out) {
    exists(ParamNodeEx p, NodeEx o, FlowState sout, AccessPath apout, PathNodeMid out0 |
-      pragma[only_bind_into](arg).getASuccessor() = par and
-      pragma[only_bind_into](arg).getASuccessor() = out0 and
-      subpaths03(arg, p, localStepToHidden*(ret), o, sout, apout) and
+      pragma[only_bind_into](arg).getASuccessor() = pragma[only_bind_into](out0) and
+      subpaths03(pragma[only_bind_into](arg), p, localStepToHidden*(ret), o, sout, apout) and
+      hasSuccessor(pragma[only_bind_into](arg), par, p) and
      not ret.isHidden() and
-      par.getNodeEx() = p and
-      out0.getNodeEx() = o and
-      out0.getState() = sout and
-      out0.getAp() = apout and
-      (out = out0 or out = out0.projectToSink())
+      pathNode(out0, o, sout, _, _, apout, _, _)
+    |
+      out = out0 or out = out0.projectToSink()
    )
  }

@@ -4557,7 +4719,11 @@ private module FlowExploration {
      or
      exists(PartialPathNodeRev mid |
        revPartialPathStep(mid, node, state, sc1, sc2, sc3, ap, config) and
-        not clearsContentCached(node.asNode(), ap.getHead()) and
+        not clearsContentEx(node, ap.getHead()) and
+        (
+          notExpectsContent(node) or
+          expectsContentEx(node, ap.getHead())
+        ) and
        not fullBarrier(node, config) and
        not stateBarrier(node, state, config) and
        distSink(node.getEnclosingCallable(), config) <= config.explorationLimit()
@@ -4573,7 +4739,11 @@ private module FlowExploration {
      partialPathStep(mid, node, state, cc, sc1, sc2, sc3, ap, config) and
      not fullBarrier(node, config) and
      not stateBarrier(node, state, config) and
-      not clearsContentCached(node.asNode(), ap.getHead().getContent()) and
+      not clearsContentEx(node, ap.getHead().getContent()) and
+      (
+        notExpectsContent(node) or
+        expectsContentEx(node, ap.getHead().getContent())
+      ) and
      if node.asNode() instanceof CastingNode
      then compatibleTypes(node.getDataFlowType(), ap.getType())
      else any()
--- a/python/ql/lib/semmle/python/dataflow/new/internal/DataFlowImpl4.qll
+++ b/python/ql/lib/semmle/python/dataflow/new/internal/DataFlowImpl4.qll
@@ -116,7 +116,7 @@ abstract class Configuration extends string {
   * Holds if an arbitrary number of implicit read steps of content `c` may be
   * taken at `node`.
   */
-  predicate allowImplicitRead(Node node, Content c) { none() }
+  predicate allowImplicitRead(Node node, ContentSet c) { none() }

  /**
   * Gets the virtual dispatch branching limit when calculating field flow.
@@ -170,6 +170,14 @@ abstract class Configuration extends string {
   */
  int explorationLimit() { none() }

+  /**
+   * Holds if hidden nodes should be included in the data flow graph.
+   *
+   * This feature should only be used for debugging or when the data flow graph
+   * is not visualized (as it is, for example, in a `path-problem` query).
+   */
+  predicate includeHiddenNodes() { none() }
+
  /**
   * Holds if there is a partial data flow path from `source` to `node`. The
   * approximate distance between `node` and the closest source is `dist` and
@@ -485,8 +493,9 @@ private predicate additionalJumpStateStep(
  )
 }

-private predicate read(NodeEx node1, Content c, NodeEx node2, Configuration config) {
-  read(node1.asNode(), c, node2.asNode()) and
+pragma[nomagic]
+private predicate readSet(NodeEx node1, ContentSet c, NodeEx node2, Configuration config) {
+  readSet(node1.asNode(), c, node2.asNode()) and
  stepFilter(node1, node2, config)
  or
  exists(Node n |
@@ -496,6 +505,37 @@ private predicate read(NodeEx node1, Content c, NodeEx node2, Configuration conf
  )
 }

+// inline to reduce fan-out via `getAReadContent`
+bindingset[c]
+private predicate read(NodeEx node1, Content c, NodeEx node2, Configuration config) {
+  exists(ContentSet cs |
+    readSet(node1, cs, node2, config) and
+    pragma[only_bind_out](c) = pragma[only_bind_into](cs).getAReadContent()
+  )
+}
+
+// inline to reduce fan-out via `getAReadContent`
+bindingset[c]
+private predicate clearsContentEx(NodeEx n, Content c) {
+  exists(ContentSet cs |
+    clearsContentCached(n.asNode(), cs) and
+    pragma[only_bind_out](c) = pragma[only_bind_into](cs).getAReadContent()
+  )
+}
+
+// inline to reduce fan-out via `getAReadContent`
+bindingset[c]
+private predicate expectsContentEx(NodeEx n, Content c) {
+  exists(ContentSet cs |
+    expectsContentCached(n.asNode(), cs) and
+    pragma[only_bind_out](c) = pragma[only_bind_into](cs).getAReadContent()
+  )
+}
+
+pragma[nomagic]
+private predicate notExpectsContent(NodeEx n) { not expectsContentCached(n.asNode(), _) }
+
+pragma[nomagic]
 private predicate store(
  NodeEx node1, TypedContent tc, NodeEx node2, DataFlowType contentType, Configuration config
 ) {
@@ -573,9 +613,9 @@ private module Stage1 {
    )
    or
    // read
-    exists(Content c |
-      fwdFlowRead(c, node, cc, config) and
-      fwdFlowConsCand(c, config)
+    exists(ContentSet c |
+      fwdFlowReadSet(c, node, cc, config) and
+      fwdFlowConsCandSet(c, _, config)
    )
    or
    // flow into a callable
@@ -599,10 +639,10 @@ private module Stage1 {
  private predicate fwdFlow(NodeEx node, Configuration config) { fwdFlow(node, _, config) }

  pragma[nomagic]
-  private predicate fwdFlowRead(Content c, NodeEx node, Cc cc, Configuration config) {
+  private predicate fwdFlowReadSet(ContentSet c, NodeEx node, Cc cc, Configuration config) {
    exists(NodeEx mid |
      fwdFlow(mid, cc, config) and
-      read(mid, c, node, config)
+      readSet(mid, c, node, config)
    )
  }

@@ -620,6 +660,16 @@ private module Stage1 {
    )
  }

+  /**
+   * Holds if `cs` may be interpreted in a read as the target of some store
+   * into `c`, in the flow covered by `fwdFlow`.
+   */
+  pragma[nomagic]
+  private predicate fwdFlowConsCandSet(ContentSet cs, Content c, Configuration config) {
+    fwdFlowConsCand(c, config) and
+    c = cs.getAReadContent()
+  }
+
  pragma[nomagic]
  private predicate fwdFlowReturnPosition(ReturnPosition pos, Cc cc, Configuration config) {
    exists(RetNodeEx ret |
@@ -712,9 +762,9 @@ private module Stage1 {
    )
    or
    // read
-    exists(NodeEx mid, Content c |
-      read(node, c, mid, config) and
-      fwdFlowConsCand(c, pragma[only_bind_into](config)) and
+    exists(NodeEx mid, ContentSet c |
+      readSet(node, c, mid, config) and
+      fwdFlowConsCandSet(c, _, pragma[only_bind_into](config)) and
      revFlow(mid, toReturn, pragma[only_bind_into](config))
    )
    or
@@ -740,10 +790,10 @@ private module Stage1 {
   */
  pragma[nomagic]
  private predicate revFlowConsCand(Content c, Configuration config) {
-    exists(NodeEx mid, NodeEx node |
+    exists(NodeEx mid, NodeEx node, ContentSet cs |
      fwdFlow(node, pragma[only_bind_into](config)) and
-      read(node, c, mid, config) and
-      fwdFlowConsCand(c, pragma[only_bind_into](config)) and
+      readSet(node, cs, mid, config) and
+      fwdFlowConsCandSet(cs, c, pragma[only_bind_into](config)) and
      revFlow(pragma[only_bind_into](mid), _, pragma[only_bind_into](config))
    )
  }
@@ -762,7 +812,8 @@ private module Stage1 {
   * Holds if `c` is the target of both a read and a store in the flow covered
   * by `revFlow`.
   */
-  private predicate revFlowIsReadAndStored(Content c, Configuration conf) {
+  pragma[nomagic]
+  predicate revFlowIsReadAndStored(Content c, Configuration conf) {
    revFlowConsCand(c, conf) and
    revFlowStore(c, _, _, conf)
  }
@@ -861,8 +912,8 @@ private module Stage1 {
  pragma[nomagic]
  predicate readStepCand(NodeEx n1, Content c, NodeEx n2, Configuration config) {
    revFlowIsReadAndStored(c, pragma[only_bind_into](config)) and
-    revFlow(n2, pragma[only_bind_into](config)) and
-    read(n1, c, n2, pragma[only_bind_into](config))
+    read(n1, c, n2, pragma[only_bind_into](config)) and
+    revFlow(n2, pragma[only_bind_into](config))
  }

  pragma[nomagic]
@@ -872,7 +923,10 @@ private module Stage1 {
  predicate revFlow(
    NodeEx node, FlowState state, boolean toReturn, ApOption returnAp, Ap ap, Configuration config
  ) {
-    revFlow(node, toReturn, config) and exists(state) and exists(returnAp) and exists(ap)
+    revFlow(node, toReturn, pragma[only_bind_into](config)) and
+    exists(state) and
+    exists(returnAp) and
+    exists(ap)
  }

  private predicate throughFlowNodeCand(NodeEx node, Configuration config) {
@@ -1147,11 +1201,26 @@ private module Stage2 {

  private predicate flowIntoCall = flowIntoCallNodeCand1/5;

+  pragma[nomagic]
+  private predicate expectsContentCand(NodeEx node, Configuration config) {
+    exists(Content c |
+      PrevStage::revFlow(node, pragma[only_bind_into](config)) and
+      PrevStage::revFlowIsReadAndStored(c, pragma[only_bind_into](config)) and
+      expectsContentEx(node, c)
+    )
+  }
+
  bindingset[node, state, ap, config]
  private predicate filter(NodeEx node, FlowState state, Ap ap, Configuration config) {
-    PrevStage::revFlowState(state, config) and
+    PrevStage::revFlowState(state, pragma[only_bind_into](config)) and
    exists(ap) and
-    not stateBarrier(node, state, config)
+    not stateBarrier(node, state, config) and
+    (
+      notExpectsContent(node)
+      or
+      ap = true and
+      expectsContentCand(node, config)
+    )
  }

  bindingset[ap, contentType]
@@ -1574,7 +1643,7 @@ private module Stage2 {
    Configuration config
  ) {
    exists(Ap ap2, Content c |
-      store(node1, tc, node2, contentType, config) and
+      PrevStage::storeStepCand(node1, _, tc, node2, contentType, config) and
      revFlowStore(ap2, c, ap1, node1, _, tc, node2, _, _, config) and
      revFlowConsCand(ap2, c, ap1, config)
    )
@@ -1612,10 +1681,24 @@ private module Stage2 {
    storeStepFwd(_, ap, tc, _, _, config)
  }

-  predicate consCand(TypedContent tc, Ap ap, Configuration config) {
+  private predicate revConsCand(TypedContent tc, Ap ap, Configuration config) {
    storeStepCand(_, ap, tc, _, _, config)
  }

+  private predicate validAp(Ap ap, Configuration config) {
+    revFlow(_, _, _, _, ap, config) and ap instanceof ApNil
+    or
+    exists(TypedContent head, Ap tail |
+      consCand(head, tail, config) and
+      ap = apCons(head, tail)
+    )
+  }
+
+  predicate consCand(TypedContent tc, Ap ap, Configuration config) {
+    revConsCand(tc, ap, config) and
+    validAp(ap, config)
+  }
+
  pragma[noinline]
  private predicate parameterFlow(
    ParamNodeEx p, Ap ap, Ap ap0, DataFlowCallable c, Configuration config
@@ -1706,7 +1789,8 @@ private module LocalFlowBigStep {
  private class FlowCheckNode extends NodeEx {
    FlowCheckNode() {
      castNode(this.asNode()) or
-      clearsContentCached(this.asNode(), _)
+      clearsContentCached(this.asNode(), _) or
+      expectsContentCached(this.asNode(), _)
    }
  }

@@ -1729,9 +1813,9 @@ private module LocalFlowBigStep {
      or
      node.asNode() instanceof OutNodeExt
      or
-      store(_, _, node, _, config)
+      Stage2::storeStepCand(_, _, _, node, _, config)
      or
-      read(_, _, node, config)
+      Stage2::readStepCand(_, _, node, config)
      or
      node instanceof FlowCheckNode
      or
@@ -1752,8 +1836,8 @@ private module LocalFlowBigStep {
      additionalJumpStep(node, next, config) or
      flowIntoCallNodeCand1(_, node, next, config) or
      flowOutOfCallNodeCand1(_, node, next, config) or
-      store(node, _, next, _, config) or
-      read(node, _, next, config)
+      Stage2::storeStepCand(node, _, _, next, _, config) or
+      Stage2::readStepCand(node, _, next, config)
    )
    or
    exists(NodeEx next, FlowState s | Stage2::revFlow(next, s, config) |
@@ -1926,7 +2010,34 @@ private module Stage3 {
  private predicate flowIntoCall = flowIntoCallNodeCand2/5;

  pragma[nomagic]
-  private predicate clear(NodeEx node, Ap ap) { ap.isClearedAt(node.asNode()) }
+  private predicate clearSet(NodeEx node, ContentSet c, Configuration config) {
+    PrevStage::revFlow(node, config) and
+    clearsContentCached(node.asNode(), c)
+  }
+
+  pragma[nomagic]
+  private predicate clearContent(NodeEx node, Content c, Configuration config) {
+    exists(ContentSet cs |
+      PrevStage::readStepCand(_, pragma[only_bind_into](c), _, pragma[only_bind_into](config)) and
+      c = cs.getAReadContent() and
+      clearSet(node, cs, pragma[only_bind_into](config))
+    )
+  }
+
+  pragma[nomagic]
+  private predicate clear(NodeEx node, Ap ap, Configuration config) {
+    clearContent(node, ap.getHead().getContent(), config)
+  }
+
+  pragma[nomagic]
+  private predicate expectsContentCand(NodeEx node, Ap ap, Configuration config) {
+    exists(Content c |
+      PrevStage::revFlow(node, pragma[only_bind_into](config)) and
+      PrevStage::readStepCand(_, c, _, pragma[only_bind_into](config)) and
+      expectsContentEx(node, c) and
+      c = ap.getHead().getContent()
+    )
+  }

  pragma[nomagic]
  private predicate castingNodeEx(NodeEx node) { node.asNode() instanceof CastingNode }
@@ -1935,8 +2046,13 @@ private module Stage3 {
  private predicate filter(NodeEx node, FlowState state, Ap ap, Configuration config) {
    exists(state) and
    exists(config) and
-    not clear(node, ap) and
-    if castingNodeEx(node) then compatibleTypes(node.getDataFlowType(), ap.getType()) else any()
+    not clear(node, ap, config) and
+    (if castingNodeEx(node) then compatibleTypes(node.getDataFlowType(), ap.getType()) else any()) and
+    (
+      notExpectsContent(node)
+      or
+      expectsContentCand(node, ap, config)
+    )
  }

  bindingset[ap, contentType]
@@ -2363,7 +2479,7 @@ private module Stage3 {
    Configuration config
  ) {
    exists(Ap ap2, Content c |
-      store(node1, tc, node2, contentType, config) and
+      PrevStage::storeStepCand(node1, _, tc, node2, contentType, config) and
      revFlowStore(ap2, c, ap1, node1, _, tc, node2, _, _, config) and
      revFlowConsCand(ap2, c, ap1, config)
    )
@@ -2401,10 +2517,24 @@ private module Stage3 {
    storeStepFwd(_, ap, tc, _, _, config)
  }

-  predicate consCand(TypedContent tc, Ap ap, Configuration config) {
+  private predicate revConsCand(TypedContent tc, Ap ap, Configuration config) {
    storeStepCand(_, ap, tc, _, _, config)
  }

+  private predicate validAp(Ap ap, Configuration config) {
+    revFlow(_, _, _, _, ap, config) and ap instanceof ApNil
+    or
+    exists(TypedContent head, Ap tail |
+      consCand(head, tail, config) and
+      ap = apCons(head, tail)
+    )
+  }
+
+  predicate consCand(TypedContent tc, Ap ap, Configuration config) {
+    revConsCand(tc, ap, config) and
+    validAp(ap, config)
+  }
+
  pragma[noinline]
  private predicate parameterFlow(
    ParamNodeEx p, Ap ap, Ap ap0, DataFlowCallable c, Configuration config
@@ -3190,7 +3320,7 @@ private module Stage4 {
    Configuration config
  ) {
    exists(Ap ap2, Content c |
-      store(node1, tc, node2, contentType, config) and
+      PrevStage::storeStepCand(node1, _, tc, node2, contentType, config) and
      revFlowStore(ap2, c, ap1, node1, _, tc, node2, _, _, config) and
      revFlowConsCand(ap2, c, ap1, config)
    )
@@ -3228,10 +3358,24 @@ private module Stage4 {
    storeStepFwd(_, ap, tc, _, _, config)
  }

-  predicate consCand(TypedContent tc, Ap ap, Configuration config) {
+  private predicate revConsCand(TypedContent tc, Ap ap, Configuration config) {
    storeStepCand(_, ap, tc, _, _, config)
  }

+  private predicate validAp(Ap ap, Configuration config) {
+    revFlow(_, _, _, _, ap, config) and ap instanceof ApNil
+    or
+    exists(TypedContent head, Ap tail |
+      consCand(head, tail, config) and
+      ap = apCons(head, tail)
+    )
+  }
+
+  predicate consCand(TypedContent tc, Ap ap, Configuration config) {
+    revConsCand(tc, ap, config) and
+    validAp(ap, config)
+  }
+
  pragma[noinline]
  private predicate parameterFlow(
    ParamNodeEx p, Ap ap, Ap ap0, DataFlowCallable c, Configuration config
@@ -3300,17 +3444,28 @@ private Configuration unbindConf(Configuration conf) {
  exists(Configuration c | result = pragma[only_bind_into](c) and conf = pragma[only_bind_into](c))
 }

-private predicate nodeMayUseSummary(
-  NodeEx n, FlowState state, AccessPathApprox apa, Configuration config
+pragma[nomagic]
+private predicate nodeMayUseSummary0(
+  NodeEx n, DataFlowCallable c, FlowState state, AccessPathApprox apa, Configuration config
 ) {
-  exists(DataFlowCallable c, AccessPathApprox apa0 |
-    Stage4::parameterMayFlowThrough(_, c, apa, _) and
+  exists(AccessPathApprox apa0 |
+    Stage4::parameterMayFlowThrough(_, c, _, _) and
    Stage4::revFlow(n, state, true, _, apa0, config) and
    Stage4::fwdFlow(n, state, any(CallContextCall ccc), TAccessPathApproxSome(apa), apa0, config) and
    n.getEnclosingCallable() = c
  )
 }

+pragma[nomagic]
+private predicate nodeMayUseSummary(
+  NodeEx n, FlowState state, AccessPathApprox apa, Configuration config
+) {
+  exists(DataFlowCallable c |
+    Stage4::parameterMayFlowThrough(_, c, apa, config) and
+    nodeMayUseSummary0(n, c, state, apa, config)
+  )
+}
+
 private newtype TSummaryCtx =
  TSummaryCtxNone() or
  TSummaryCtxSome(ParamNodeEx p, FlowState state, AccessPath ap) {
@@ -3506,7 +3661,7 @@ private newtype TPathNode =
 * of dereference operations needed to get from the value in the node to the
 * tracked object. The final type indicates the type of the tracked object.
 */
-abstract private class AccessPath extends TAccessPath {
+private class AccessPath extends TAccessPath {
  /** Gets the head of this access path, if any. */
  abstract TypedContent getHead();

@@ -3721,11 +3876,14 @@ abstract private class PathNodeImpl extends PathNode {
  abstract NodeEx getNodeEx();

  predicate isHidden() {
-    hiddenNode(this.getNodeEx().asNode()) and
-    not this.isSource() and
-    not this instanceof PathNodeSink
-    or
-    this.getNodeEx() instanceof TNodeImplicitRead
+    not this.getConfiguration().includeHiddenNodes() and
+    (
+      hiddenNode(this.getNodeEx().asNode()) and
+      not this.isSource() and
+      not this instanceof PathNodeSink
+      or
+      this.getNodeEx() instanceof TNodeImplicitRead
+    )
  }

  private string ppAp() {
@@ -4202,10 +4360,16 @@ private module Subpaths {
    exists(NodeEx n1, NodeEx n2 | n1 = n.getNodeEx() and n2 = result.getNodeEx() |
      localFlowBigStep(n1, _, n2, _, _, _, _, _) or
      store(n1, _, n2, _, _) or
-      read(n1, _, n2, _)
+      readSet(n1, _, n2, _)
    )
  }

+  pragma[nomagic]
+  private predicate hasSuccessor(PathNode pred, PathNodeMid succ, NodeEx succNode) {
+    succ = pred.getASuccessor() and
+    succNode = succ.getNodeEx()
+  }
+
  /**
   * Holds if `(arg, par, ret, out)` forms a subpath-tuple, that is, flow through
   * a subpath between `par` and `ret` with the connecting edges `arg -> par` and
@@ -4213,15 +4377,13 @@ private module Subpaths {
   */
  predicate subpaths(PathNode arg, PathNodeImpl par, PathNodeImpl ret, PathNode out) {
    exists(ParamNodeEx p, NodeEx o, FlowState sout, AccessPath apout, PathNodeMid out0 |
-      pragma[only_bind_into](arg).getASuccessor() = par and
-      pragma[only_bind_into](arg).getASuccessor() = out0 and
-      subpaths03(arg, p, localStepToHidden*(ret), o, sout, apout) and
+      pragma[only_bind_into](arg).getASuccessor() = pragma[only_bind_into](out0) and
+      subpaths03(pragma[only_bind_into](arg), p, localStepToHidden*(ret), o, sout, apout) and
+      hasSuccessor(pragma[only_bind_into](arg), par, p) and
      not ret.isHidden() and
-      par.getNodeEx() = p and
-      out0.getNodeEx() = o and
-      out0.getState() = sout and
-      out0.getAp() = apout and
-      (out = out0 or out = out0.projectToSink())
+      pathNode(out0, o, sout, _, _, apout, _, _)
+    |
+      out = out0 or out = out0.projectToSink()
    )
  }

@@ -4557,7 +4719,11 @@ private module FlowExploration {
      or
      exists(PartialPathNodeRev mid |
        revPartialPathStep(mid, node, state, sc1, sc2, sc3, ap, config) and
-        not clearsContentCached(node.asNode(), ap.getHead()) and
+        not clearsContentEx(node, ap.getHead()) and
+        (
+          notExpectsContent(node) or
+          expectsContentEx(node, ap.getHead())
+        ) and
        not fullBarrier(node, config) and
        not stateBarrier(node, state, config) and
        distSink(node.getEnclosingCallable(), config) <= config.explorationLimit()
@@ -4573,7 +4739,11 @@ private module FlowExploration {
      partialPathStep(mid, node, state, cc, sc1, sc2, sc3, ap, config) and
      not fullBarrier(node, config) and
      not stateBarrier(node, state, config) and
-      not clearsContentCached(node.asNode(), ap.getHead().getContent()) and
+      not clearsContentEx(node, ap.getHead().getContent()) and
+      (
+        notExpectsContent(node) or
+        expectsContentEx(node, ap.getHead().getContent())
+      ) and
      if node.asNode() instanceof CastingNode
      then compatibleTypes(node.getDataFlowType(), ap.getType())
      else any()
--- a/python/ql/lib/semmle/python/dataflow/new/internal/DataFlowImplCommon.qll
+++ b/python/ql/lib/semmle/python/dataflow/new/internal/DataFlowImplCommon.qll
@@ -216,10 +216,9 @@ private module LambdaFlow {
    or
    // jump step
    exists(Node mid, DataFlowType t0 |
-      revLambdaFlow(lambdaCall, kind, mid, t0, _, _, _) and
+      revLambdaFlow(lambdaCall, kind, mid, t0, _, _, lastCall) and
      toReturn = false and
-      toJump = true and
-      lastCall = TDataFlowCallNone()
+      toJump = true
    |
      jumpStepCached(node, mid) and
      t = t0
@@ -305,7 +304,7 @@ cached
 private module Cached {
  /**
   * If needed, call this predicate from `DataFlowImplSpecific.qll` in order to
-   * force a stage-dependency on the `DataFlowImplCommon.qll` stage and therby
+   * force a stage-dependency on the `DataFlowImplCommon.qll` stage and thereby
   * collapsing the two stages.
   */
  cached
@@ -326,7 +325,10 @@ private module Cached {
  predicate jumpStepCached(Node node1, Node node2) { jumpStep(node1, node2) }

  cached
-  predicate clearsContentCached(Node n, Content c) { clearsContent(n, c) }
+  predicate clearsContentCached(Node n, ContentSet c) { clearsContent(n, c) }
+
+  cached
+  predicate expectsContentCached(Node n, ContentSet c) { expectsContent(n, c) }

  cached
  predicate isUnreachableInCallCached(Node n, DataFlowCall call) { isUnreachableInCall(n, call) }
@@ -373,7 +375,7 @@ private module Cached {
    // For reads, `x.f`, we want to check that the tracked type after the read (which
    // is obtained by popping the head of the access path stack) is compatible with
    // the type of `x.f`.
-    read(_, _, n)
+    readSet(_, _, n)
  }

  cached
@@ -469,7 +471,7 @@ private module Cached {
        // read
        exists(Node mid |
          parameterValueFlowCand(p, mid, false) and
-          read(mid, _, node) and
+          readSet(mid, _, node) and
          read = true
        )
        or
@@ -657,8 +659,10 @@ private module Cached {
       * Holds if `arg` flows to `out` through a call using only
       * value-preserving steps and a single read step, not taking call
       * contexts into account, thus representing a getter-step.
+       *
+       * This predicate is exposed for testing only.
       */
-      predicate getterStep(ArgNode arg, Content c, Node out) {
+      predicate getterStep(ArgNode arg, ContentSet c, Node out) {
        argumentValueFlowsThrough(arg, TReadStepTypesSome(_, c, _), out)
      }

@@ -781,8 +785,12 @@ private module Cached {
    parameterValueFlow(p, n.getPreUpdateNode(), TReadStepTypesNone())
  }

-  private predicate store(
-    Node node1, Content c, Node node2, DataFlowType contentType, DataFlowType containerType
+  cached
+  predicate readSet(Node node1, ContentSet c, Node node2) { readStep(node1, c, node2) }
+
+  cached
+  predicate storeSet(
+    Node node1, ContentSet c, Node node2, DataFlowType contentType, DataFlowType containerType
  ) {
    storeStep(node1, c, node2) and
    contentType = getNodeDataFlowType(node1) and
@@ -794,14 +802,19 @@ private module Cached {
    |
      argumentValueFlowsThrough(n2, TReadStepTypesSome(containerType, c, contentType), n1)
      or
-      read(n2, c, n1) and
+      readSet(n2, c, n1) and
      contentType = getNodeDataFlowType(n1) and
      containerType = getNodeDataFlowType(n2)
    )
  }

-  cached
-  predicate read(Node node1, Content c, Node node2) { readStep(node1, c, node2) }
+  private predicate store(
+    Node node1, Content c, Node node2, DataFlowType contentType, DataFlowType containerType
+  ) {
+    exists(ContentSet cs |
+      c = cs.getAStoreContent() and storeSet(node1, cs, node2, contentType, containerType)
+    )
+  }

  /**
   * Holds if data can flow from `node1` to `node2` via a direct assignment to
@@ -932,16 +945,16 @@ class CastingNode extends Node {
 }

 private predicate readStepWithTypes(
-  Node n1, DataFlowType container, Content c, Node n2, DataFlowType content
+  Node n1, DataFlowType container, ContentSet c, Node n2, DataFlowType content
 ) {
-  read(n1, c, n2) and
+  readSet(n1, c, n2) and
  container = getNodeDataFlowType(n1) and
  content = getNodeDataFlowType(n2)
 }

 private newtype TReadStepTypesOption =
  TReadStepTypesNone() or
-  TReadStepTypesSome(DataFlowType container, Content c, DataFlowType content) {
+  TReadStepTypesSome(DataFlowType container, ContentSet c, DataFlowType content) {
    readStepWithTypes(_, container, c, _, content)
  }

@@ -950,7 +963,7 @@ private class ReadStepTypesOption extends TReadStepTypesOption {

  DataFlowType getContainerType() { this = TReadStepTypesSome(result, _, _) }

-  Content getContent() { this = TReadStepTypesSome(_, result, _) }
+  ContentSet getContent() { this = TReadStepTypesSome(_, result, _) }

  DataFlowType getContentType() { this = TReadStepTypesSome(_, _, result) }

@@ -1325,8 +1338,6 @@ abstract class AccessPathFront extends TAccessPathFront {
  abstract boolean toBoolNonEmpty();

  TypedContent getHead() { this = TFrontHead(result) }
-
-  predicate isClearedAt(Node n) { clearsContentCached(n, this.getHead().getContent()) }
 }

 class AccessPathFrontNil extends AccessPathFront, TFrontNil {
--- a/python/ql/lib/semmle/python/dataflow/new/internal/DataFlowImplSpecific.qll
+++ b/python/ql/lib/semmle/python/dataflow/new/internal/DataFlowImplSpecific.qll
@@ -1,9 +1,15 @@
 /**
 * Provides Python-specific definitions for use in the data flow library.
 */
+
+// we need to export `Unit` for the DataFlowImpl* files
+private import python as Python
+
 module Private {
  import DataFlowPrivate
+
  //   import DataFlowDispatch
+  class Unit = Python::Unit;
 }

 module Public {
--- a/python/ql/lib/semmle/python/dataflow/new/internal/DataFlowPrivate.qll
+++ b/python/ql/lib/semmle/python/dataflow/new/internal/DataFlowPrivate.qll
@@ -813,6 +813,12 @@ predicate clearsContent(Node n, Content c) {
  attributeClearStep(n, c)
 }

+/**
+ * Holds if the value that is being tracked is expected to be stored inside content `c`
+ * at node `n`.
+ */
+predicate expectsContent(Node n, ContentSet c) { none() }
+
 /**
 * Holds if values stored inside attribute `c` are cleared at node `n`.
 *
--- a/python/ql/lib/semmle/python/dataflow/new/internal/DataFlowPublic.qll
+++ b/python/ql/lib/semmle/python/dataflow/new/internal/DataFlowPublic.qll
@@ -211,7 +211,7 @@ class CallCfgNode extends CfgNode, LocalSourceNode {
   */
  Node getFunction() { result.asCfgNode() = node.getFunction() }

-  /** Gets the data-flow node corresponding to the i'th argument of the call corresponding to this data-flow node */
+  /** Gets the data-flow node corresponding to the i'th positional argument of the call corresponding to this data-flow node */
  Node getArg(int i) { result.asCfgNode() = node.getArg(i) }

  /** Gets the data-flow node corresponding to the named argument of the call corresponding to this data-flow node */
@@ -401,8 +401,15 @@ class ModuleVariableNode extends Node, TModuleVariableNode {
 private predicate isAccessedThroughImportStar(Module m) { m = ImportStar::getStarImported(_) }

 private ModuleVariableNode import_star_read(Node n) {
-  ImportStar::importStarResolvesTo(n.asCfgNode(), result.getModule()) and
-  n.asCfgNode().(NameNode).getId() = result.getVariable().getId()
+  resolved_import_star_module(result.getModule(), result.getVariable().getId(), n)
+}
+
+pragma[nomagic]
+private predicate resolved_import_star_module(Module m, string name, Node n) {
+  exists(NameNode nn | nn = n.asCfgNode() |
+    ImportStar::importStarResolvesTo(pragma[only_bind_into](nn), m) and
+    nn.getId() = name
+  )
 }

 /**
@@ -643,3 +650,20 @@ class AttributeContent extends TAttributeContent, Content {

  override string toString() { result = "Attribute " + attr }
 }
+
+/**
+ * An entity that represents a set of `Content`s.
+ *
+ * The set may be interpreted differently depending on whether it is
+ * stored into (`getAStoreContent`) or read from (`getAReadContent`).
+ */
+class ContentSet instanceof Content {
+  /** Gets a content that may be stored into when storing into this set. */
+  Content getAStoreContent() { result = this }
+
+  /** Gets a content that may be read from when reading from this set. */
+  Content getAReadContent() { result = this }
+
+  /** Gets a textual representation of this content set. */
+  string toString() { result = super.toString() }
+}
--- a/python/ql/lib/semmle/python/dataflow/new/internal/DataFlowUtil.qll
+++ b/python/ql/lib/semmle/python/dataflow/new/internal/DataFlowUtil.qll
@@ -2,6 +2,7 @@
 * Contains utility functions for writing data flow queries
 */

+private import python
 private import DataFlowPrivate
 import DataFlowPublic

--- a/python/ql/lib/semmle/python/dataflow/new/internal/LocalSources.qll
+++ b/python/ql/lib/semmle/python/dataflow/new/internal/LocalSources.qll
@@ -6,7 +6,7 @@
 * local tracking within a function.
 */

-import python
+private import python
 import DataFlowPublic
 private import DataFlowPrivate
 private import semmle.python.internal.CachedStages
--- a/python/ql/lib/semmle/python/dataflow/new/internal/tainttracking1/TaintTrackingImpl.qll
+++ b/python/ql/lib/semmle/python/dataflow/new/internal/tainttracking1/TaintTrackingImpl.qll
@@ -161,7 +161,7 @@ abstract class Configuration extends DataFlow::Configuration {
    this.isAdditionalTaintStep(node1, state1, node2, state2)
  }

-  override predicate allowImplicitRead(DataFlow::Node node, DataFlow::Content c) {
+  override predicate allowImplicitRead(DataFlow::Node node, DataFlow::ContentSet c) {
    (this.isSink(node) or this.isAdditionalTaintStep(node, _)) and
    defaultImplicitTaintRead(node, c)
  }
--- a/python/ql/lib/semmle/python/dataflow/new/internal/tainttracking2/TaintTrackingImpl.qll
+++ b/python/ql/lib/semmle/python/dataflow/new/internal/tainttracking2/TaintTrackingImpl.qll
@@ -161,7 +161,7 @@ abstract class Configuration extends DataFlow::Configuration {
    this.isAdditionalTaintStep(node1, state1, node2, state2)
  }

-  override predicate allowImplicitRead(DataFlow::Node node, DataFlow::Content c) {
+  override predicate allowImplicitRead(DataFlow::Node node, DataFlow::ContentSet c) {
    (this.isSink(node) or this.isAdditionalTaintStep(node, _)) and
    defaultImplicitTaintRead(node, c)
  }
--- a/python/ql/lib/semmle/python/dataflow/new/internal/tainttracking3/TaintTrackingImpl.qll
+++ b/python/ql/lib/semmle/python/dataflow/new/internal/tainttracking3/TaintTrackingImpl.qll
@@ -161,7 +161,7 @@ abstract class Configuration extends DataFlow::Configuration {
    this.isAdditionalTaintStep(node1, state1, node2, state2)
  }

-  override predicate allowImplicitRead(DataFlow::Node node, DataFlow::Content c) {
+  override predicate allowImplicitRead(DataFlow::Node node, DataFlow::ContentSet c) {
    (this.isSink(node) or this.isAdditionalTaintStep(node, _)) and
    defaultImplicitTaintRead(node, c)
  }
--- a/python/ql/lib/semmle/python/dataflow/new/internal/tainttracking4/TaintTrackingImpl.qll
+++ b/python/ql/lib/semmle/python/dataflow/new/internal/tainttracking4/TaintTrackingImpl.qll
@@ -161,7 +161,7 @@ abstract class Configuration extends DataFlow::Configuration {
    this.isAdditionalTaintStep(node1, state1, node2, state2)
  }

-  override predicate allowImplicitRead(DataFlow::Node node, DataFlow::Content c) {
+  override predicate allowImplicitRead(DataFlow::Node node, DataFlow::ContentSet c) {
    (this.isSink(node) or this.isAdditionalTaintStep(node, _)) and
    defaultImplicitTaintRead(node, c)
  }
--- a/python/ql/lib/semmle/python/dataflow/old/StateTracking.qll
+++ b/python/ql/lib/semmle/python/dataflow/old/StateTracking.qll
@@ -9,7 +9,6 @@
 */

 import python
-private import semmle.python.pointsto.Base
 private import semmle.python.pointsto.PointsTo
 private import semmle.python.pointsto.PointsToContext
 private import semmle.python.objects.ObjectInternal
@@ -46,14 +45,14 @@ abstract class TrackableState extends string {
  /**
   * Holds if state starts at `f`.
   * Either this predicate or `startsAt(ControlFlowNode f, Context ctx)`
-   * should be overriden by sub-classes.
+   * should be overridden by sub-classes.
   */
  predicate startsAt(ControlFlowNode f) { none() }

  /**
   * Holds if state starts at `f` given context `ctx`.
   * Either this predicate or `startsAt(ControlFlowNode f)`
-   * should be overriden by sub-classes.
+   * should be overridden by sub-classes.
   */
  pragma[noinline]
  predicate startsAt(ControlFlowNode f, Context ctx) { ctx.appliesTo(f) and this.startsAt(f) }
@@ -61,14 +60,14 @@ abstract class TrackableState extends string {
  /**
   * Holds if state ends at `f`.
   * Either this predicate or `endsAt(ControlFlowNode f, Context ctx)`
-   * may be overriden by sub-classes.
+   * may be overridden by sub-classes.
   */
  predicate endsAt(ControlFlowNode f) { none() }

  /**
   * Holds if state ends at `f` given context `ctx`.
   * Either this predicate or `endsAt(ControlFlowNode f)`
-   * may be overriden by sub-classes.
+   * may be overridden by sub-classes.
   */
  pragma[noinline]
  predicate endsAt(ControlFlowNode f, Context ctx) { ctx.appliesTo(f) and this.endsAt(f) }
--- a/python/ql/lib/semmle/python/essa/Essa.qll
+++ b/python/ql/lib/semmle/python/essa/Essa.qll
@@ -498,13 +498,13 @@ private EssaVariable potential_input(EssaNodeRefinement ref) {

 /** An assignment to a variable `v = val` */
 class AssignmentDefinition extends EssaNodeDefinition {
+  ControlFlowNode value;
+
  AssignmentDefinition() {
-    SsaSource::assignment_definition(this.getSourceVariable(), this.getDefiningNode(), _)
+    SsaSource::assignment_definition(this.getSourceVariable(), this.getDefiningNode(), value)
  }

-  ControlFlowNode getValue() {
-    SsaSource::assignment_definition(this.getSourceVariable(), this.getDefiningNode(), result)
-  }
+  ControlFlowNode getValue() { result = value }

  override string getRepresentation() { result = this.getValue().getNode().toString() }

@@ -764,7 +764,8 @@ class CallsiteRefinement extends EssaNodeRefinement {
 /** An implicit (possible) modification of the object referred at a method call */
 class MethodCallsiteRefinement extends EssaNodeRefinement {
  MethodCallsiteRefinement() {
-    SsaSource::method_call_refinement(this.getSourceVariable(), _, this.getDefiningNode()) and
+    SsaSource::method_call_refinement(pragma[only_bind_into](this.getSourceVariable()), _,
+      this.getDefiningNode()) and
    not this instanceof SingleSuccessorGuard
  }

--- a/python/ql/lib/semmle/python/essa/SsaCompute.qll
+++ b/python/ql/lib/semmle/python/essa/SsaCompute.qll
@@ -496,8 +496,8 @@ private module SsaComputeImpl {
    predicate firstUse(EssaDefinition def, ControlFlowNode use) {
      exists(SsaSourceVariable v, BasicBlock b1, int i1, BasicBlock b2, int i2 |
        adjacentVarRefs(v, b1, i1, b2, i2) and
-        definesAt(def, v, b1, i1) and
-        variableSourceUse(v, use, b2, i2)
+        definesAt(def, pragma[only_bind_into](v), b1, i1) and
+        variableSourceUse(pragma[only_bind_into](v), use, b2, i2)
      )
      or
      exists(
--- a/python/ql/lib/semmle/python/essa/SsaDefinitions.qll
+++ b/python/ql/lib/semmle/python/essa/SsaDefinitions.qll
@@ -4,7 +4,6 @@
 */

 import python
-private import semmle.python.pointsto.Base
 private import semmle.python.internal.CachedStages

 cached
--- a/python/ql/lib/semmle/python/frameworks/Aiomysql.qll
+++ b/python/ql/lib/semmle/python/frameworks/Aiomysql.qll
@@ -25,7 +25,7 @@ private module Aiomysql {
  /**
   * Gets a `Connection` that is created when
   * - the result of `aiomysql.connect()` is awaited.
-   * - the result of calling `aquire` on a `ConnectionPool` is awaited.
+   * - the result of calling `acquire` on a `ConnectionPool` is awaited.
   * See https://aiomysql.readthedocs.io/en/stable/connection.html#connection
   */
  API::Node connection() {
@@ -82,7 +82,7 @@ private module Aiomysql {
  }

  /**
-   * Gets an `SAConnection` that is created when the result of calling `aquire` on an `Engine` is awaited.
+   * Gets an `SAConnection` that is created when the result of calling `acquire` on an `Engine` is awaited.
   * See https://aiomysql.readthedocs.io/en/stable/sa.html#connection
   */
  API::Node saConnection() { result = engine().getMember("acquire").getReturn().getAwaited() }
--- a/python/ql/lib/semmle/python/frameworks/Aiopg.qll
+++ b/python/ql/lib/semmle/python/frameworks/Aiopg.qll
@@ -25,7 +25,7 @@ private module Aiopg {
  /**
   * Gets a `Connection` that is created when
   * - the result of `aiopg.connect()` is awaited.
-   * - the result of calling `aquire` on a `ConnectionPool` is awaited.
+   * - the result of calling `acquire` on a `ConnectionPool` is awaited.
   * See https://aiopg.readthedocs.io/en/stable/core.html#connection
   */
  API::Node connection() {
@@ -78,7 +78,7 @@ private module Aiopg {
  }

  /**
-   * Gets an `SAConnection` that is created when the result of calling `aquire` on an `Engine` is awaited.
+   * Gets an `SAConnection` that is created when the result of calling `acquire` on an `Engine` is awaited.
   * See https://aiopg.readthedocs.io/en/stable/sa.html#connection
   */
  API::Node saConnection() { result = engine().getMember("acquire").getReturn().getAwaited() }
--- a/python/ql/lib/semmle/python/frameworks/Asyncpg.qll
+++ b/python/ql/lib/semmle/python/frameworks/Asyncpg.qll
@@ -20,7 +20,7 @@ private module Asyncpg {
  /**
   * Gets a `Connection` that is created when
   * - the result of `asyncpg.connect()` is awaited.
-   * - the result of calling `aquire` on a `ConnectionPool` is awaited.
+   * - the result of calling `acquire` on a `ConnectionPool` is awaited.
   */
  API::Node connection() {
    result = API::moduleImport("asyncpg").getMember("connect").getReturn().getAwaited()
@@ -33,8 +33,8 @@ private module Asyncpg {
    string methodName;

    SqlExecutionOnConnection() {
-      methodName in ["copy_from_query", "execute", "fetch", "fetchrow", "fetchval", "executemany"] and
-      this.calls([connectionPool().getAUse(), connection().getAUse()], methodName)
+      this = [connectionPool(), connection()].getMember(methodName).getACall() and
+      methodName in ["copy_from_query", "execute", "fetch", "fetchrow", "fetchval", "executemany"]
    }

    override DataFlow::Node getSql() {
@@ -51,8 +51,8 @@ private module Asyncpg {
    string methodName;

    FileAccessOnConnection() {
-      methodName in ["copy_from_query", "copy_from_table", "copy_to_table"] and
-      this.calls([connectionPool().getAUse(), connection().getAUse()], methodName)
+      this = [connectionPool(), connection()].getMember(methodName).getACall() and
+      methodName in ["copy_from_query", "copy_from_table", "copy_to_table"]
    }

    // The path argument is keyword only.
@@ -69,7 +69,7 @@ private module Asyncpg {
   * Provides models of the `PreparedStatement` class in `asyncpg`.
   * `PreparedStatement`s are created when the result of calling `prepare(query)` on a connection is awaited.
   * The result of calling `prepare(query)` is a `PreparedStatementFactory` and the argument, `query` needs to
-   * be tracked to the place where a `PreparedStatement` is created and then futher to any executing methods.
+   * be tracked to the place where a `PreparedStatement` is created and then further to any executing methods.
   * Hence the two type trackers.
   */
  module PreparedStatement {
--- a/python/ql/lib/semmle/python/frameworks/Cryptography.qll
+++ b/python/ql/lib/semmle/python/frameworks/Cryptography.qll
@@ -22,7 +22,7 @@ private module CryptographyModel {
     * Gets a predefined curve class from
     * `cryptography.hazmat.primitives.asymmetric.ec` with a specific key size (in bits).
     */
-    private API::Node predefinedCurveClass(int keySize) {
+    API::Node predefinedCurveClass(int keySize) {
      exists(string curveName |
        result =
          API::moduleImport("cryptography")
@@ -73,41 +73,6 @@ private module CryptographyModel {
        curveName = "BrainpoolP512R1" and keySize = 512
      )
    }
-
-    /** Gets a reference to a predefined curve class with a specific key size (in bits), as well as the origin of the class. */
-    private DataFlow::TypeTrackingNode curveClassWithKeySize(
-      DataFlow::TypeTracker t, int keySize, DataFlow::Node origin
-    ) {
-      t.start() and
-      result = predefinedCurveClass(keySize).getAnImmediateUse() and
-      origin = result
-      or
-      exists(DataFlow::TypeTracker t2 |
-        result = curveClassWithKeySize(t2, keySize, origin).track(t2, t)
-      )
-    }
-
-    /** Gets a reference to a predefined curve class with a specific key size (in bits), as well as the origin of the class. */
-    DataFlow::Node curveClassWithKeySize(int keySize, DataFlow::Node origin) {
-      curveClassWithKeySize(DataFlow::TypeTracker::end(), keySize, origin).flowsTo(result)
-    }
-
-    /** Gets a reference to a predefined curve class instance with a specific key size (in bits), as well as the origin of the class. */
-    private DataFlow::TypeTrackingNode curveClassInstanceWithKeySize(
-      DataFlow::TypeTracker t, int keySize, DataFlow::Node origin
-    ) {
-      t.start() and
-      result.(DataFlow::CallCfgNode).getFunction() = curveClassWithKeySize(keySize, origin)
-      or
-      exists(DataFlow::TypeTracker t2 |
-        result = curveClassInstanceWithKeySize(t2, keySize, origin).track(t2, t)
-      )
-    }
-
-    /** Gets a reference to a predefined curve class instance with a specific key size (in bits), as well as the origin of the class. */
-    DataFlow::Node curveClassInstanceWithKeySize(int keySize, DataFlow::Node origin) {
-      curveClassInstanceWithKeySize(DataFlow::TypeTracker::end(), keySize, origin).flowsTo(result)
-    }
  }

  // ---------------------------------------------------------------------------
@@ -179,9 +144,13 @@ private module CryptographyModel {
    DataFlow::Node getCurveArg() { result in [this.getArg(0), this.getArgByName("curve")] }

    override int getKeySizeWithOrigin(DataFlow::Node origin) {
-      this.getCurveArg() = Ecc::curveClassInstanceWithKeySize(result, origin)
-      or
-      this.getCurveArg() = Ecc::curveClassWithKeySize(result, origin)
+      exists(API::Node n |
+        n = Ecc::predefinedCurveClass(result) and origin = n.getAnImmediateUse()
+      |
+        this.getCurveArg() = n.getAUse()
+        or
+        this.getCurveArg() = n.getReturn().getAUse()
+      )
    }

    // Note: There is not really a key-size argument, since it's always specified by the curve.
@@ -202,9 +171,8 @@ private module CryptographyModel {
    }

    /** Gets a reference to a Cipher instance using algorithm with `algorithmName`. */
-    DataFlow::TypeTrackingNode cipherInstance(DataFlow::TypeTracker t, string algorithmName) {
-      t.start() and
-      exists(DataFlow::CallCfgNode call | result = call |
+    API::Node cipherInstance(string algorithmName) {
+      exists(API::CallNode call | result = call.getReturn() |
        call =
          API::moduleImport("cryptography")
              .getMember("hazmat")
@@ -216,47 +184,6 @@ private module CryptographyModel {
            call.getArg(0), call.getArgByName("algorithm")
          ]
      )
-      or
-      exists(DataFlow::TypeTracker t2 | result = cipherInstance(t2, algorithmName).track(t2, t))
-    }
-
-    /** Gets a reference to a Cipher instance using algorithm with `algorithmName`. */
-    DataFlow::Node cipherInstance(string algorithmName) {
-      cipherInstance(DataFlow::TypeTracker::end(), algorithmName).flowsTo(result)
-    }
-
-    /** Gets a reference to the encryptor of a Cipher instance using algorithm with `algorithmName`. */
-    DataFlow::TypeTrackingNode cipherEncryptor(DataFlow::TypeTracker t, string algorithmName) {
-      t.start() and
-      result.(DataFlow::MethodCallNode).calls(cipherInstance(algorithmName), "encryptor")
-      or
-      exists(DataFlow::TypeTracker t2 | result = cipherEncryptor(t2, algorithmName).track(t2, t))
-    }
-
-    /**
-     * Gets a reference to the encryptor of a Cipher instance using algorithm with `algorithmName`.
-     *
-     * You obtain an encryptor by using the `encryptor()` method on a Cipher instance.
-     */
-    DataFlow::Node cipherEncryptor(string algorithmName) {
-      cipherEncryptor(DataFlow::TypeTracker::end(), algorithmName).flowsTo(result)
-    }
-
-    /** Gets a reference to the dncryptor of a Cipher instance using algorithm with `algorithmName`. */
-    DataFlow::TypeTrackingNode cipherDecryptor(DataFlow::TypeTracker t, string algorithmName) {
-      t.start() and
-      result.(DataFlow::MethodCallNode).calls(cipherInstance(algorithmName), "decryptor")
-      or
-      exists(DataFlow::TypeTracker t2 | result = cipherDecryptor(t2, algorithmName).track(t2, t))
-    }
-
-    /**
-     * Gets a reference to the decryptor of a Cipher instance using algorithm with `algorithmName`.
-     *
-     * You obtain an decryptor by using the `decryptor()` method on a Cipher instance.
-     */
-    DataFlow::Node cipherDecryptor(string algorithmName) {
-      cipherDecryptor(DataFlow::TypeTracker::end(), algorithmName).flowsTo(result)
    }

    /**
@@ -267,11 +194,12 @@ private module CryptographyModel {
      string algorithmName;

      CryptographyGenericCipherOperation() {
-        exists(DataFlow::Node object, string method |
-          object in [cipherEncryptor(algorithmName), cipherDecryptor(algorithmName)] and
-          method in ["update", "update_into"] and
-          this.calls(object, method)
-        )
+        this =
+          cipherInstance(algorithmName)
+              .getMember(["decryptor", "encryptor"])
+              .getReturn()
+              .getMember(["update", "update_into"])
+              .getACall()
      }

      override Cryptography::CryptographicAlgorithm getAlgorithm() {
@@ -298,9 +226,8 @@ private module CryptographyModel {
    }

    /** Gets a reference to a Hash instance using algorithm with `algorithmName`. */
-    private DataFlow::TypeTrackingNode hashInstance(DataFlow::TypeTracker t, string algorithmName) {
-      t.start() and
-      exists(DataFlow::CallCfgNode call | result = call |
+    private API::Node hashInstance(string algorithmName) {
+      exists(API::CallNode call | result = call.getReturn() |
        call =
          API::moduleImport("cryptography")
              .getMember("hazmat")
@@ -312,13 +239,6 @@ private module CryptographyModel {
            call.getArg(0), call.getArgByName("algorithm")
          ]
      )
-      or
-      exists(DataFlow::TypeTracker t2 | result = hashInstance(t2, algorithmName).track(t2, t))
-    }
-
-    /** Gets a reference to a Hash instance using algorithm with `algorithmName`. */
-    DataFlow::Node hashInstance(string algorithmName) {
-      hashInstance(DataFlow::TypeTracker::end(), algorithmName).flowsTo(result)
    }

    /**
@@ -328,7 +248,9 @@ private module CryptographyModel {
      DataFlow::MethodCallNode {
      string algorithmName;

-      CryptographyGenericHashOperation() { this.calls(hashInstance(algorithmName), "update") }
+      CryptographyGenericHashOperation() {
+        this = hashInstance(algorithmName).getMember("update").getACall()
+      }

      override Cryptography::CryptographicAlgorithm getAlgorithm() {
        result.matchesName(algorithmName)
--- a/python/ql/lib/semmle/python/frameworks/Django.qll
+++ b/python/ql/lib/semmle/python/frameworks/Django.qll
@@ -554,7 +554,7 @@ module PrivateDjango {

      /** A `django.db.connection` is a PEP249 compliant DB connection. */
      class DjangoDbConnection extends PEP249::Connection::InstanceSource {
-        DjangoDbConnection() { this = connection().getAUse() }
+        DjangoDbConnection() { this = connection().getAnImmediateUse() }
      }

      // -------------------------------------------------------------------------
@@ -737,6 +737,38 @@ module PrivateDjango {
          }
        }

+        /**
+         * Provides models for the `django.db.models.FileField` class and `ImageField` subclasses.
+         *
+         * See
+         * - https://docs.djangoproject.com/en/3.1/ref/models/fields/#django.db.models.FileField
+         * - https://docs.djangoproject.com/en/3.1/ref/models/fields/#django.db.models.ImageField
+         */
+        module FileField {
+          /** Gets a reference to the `django.db.models.FileField` or the `django.db.models.ImageField` class or any subclass. */
+          API::Node subclassRef() {
+            exists(string className | className in ["FileField", "ImageField"] |
+              // commonly used alias
+              result =
+                API::moduleImport("django")
+                    .getMember("db")
+                    .getMember("models")
+                    .getMember(className)
+                    .getASubclass*()
+              or
+              // actual class definition
+              result =
+                API::moduleImport("django")
+                    .getMember("db")
+                    .getMember("models")
+                    .getMember("fields")
+                    .getMember("files")
+                    .getMember(className)
+                    .getASubclass*()
+            )
+          }
+        }
+
        /**
         * Gets a reference to the Manager (django.db.models.Manager) for the django Model `modelClass`,
         * accessed by `<modelClass>.objects`.
@@ -2599,6 +2631,36 @@ module PrivateDjango {
    }
  }

+  /**
+   * A parameter that accepts the filename used to upload a file. This is the second
+   * parameter in functions used for the `upload_to` argument to a `FileField`.
+   *
+   * Note that the value this parameter accepts cannot contain a slash. Even when
+   * forcing the filename to contain a slash when sending the request, django does
+   * something like `input_filename.split("/")[-1]` (so other special characters are
+   * still allowed). This also means that although the return value from `upload_to` is
+   * used to construct a path, path injection is not possible.
+   *
+   * See
+   *  - https://docs.djangoproject.com/en/3.1/ref/models/fields/#django.db.models.FileField.upload_to
+   *  - https://docs.djangoproject.com/en/3.1/topics/http/file-uploads/#handling-uploaded-files-with-a-model
+   */
+  private class DjangoFileFieldUploadToFunctionFilenameParam extends RemoteFlowSource::Range,
+    DataFlow::ParameterNode {
+    DjangoFileFieldUploadToFunctionFilenameParam() {
+      exists(DataFlow::CallCfgNode call, DataFlow::Node uploadToArg, Function func |
+        this.getParameter() = func.getArg(1) and
+        call = DjangoImpl::DB::Models::FileField::subclassRef().getACall() and
+        uploadToArg in [call.getArg(2), call.getArgByName("upload_to")] and
+        uploadToArg = poorMansFunctionTracker(func)
+      )
+    }
+
+    override string getSourceType() {
+      result = "django filename parameter to function used in FileField.upload_to"
+    }
+  }
+
  // ---------------------------------------------------------------------------
  // django.shortcuts.redirect
  // ---------------------------------------------------------------------------
@@ -2676,4 +2738,67 @@ module PrivateDjango {
            .getAnImmediateUse()
    }
  }
+
+  // ---------------------------------------------------------------------------
+  // Settings
+  // ---------------------------------------------------------------------------
+  /**
+   * A custom middleware stack
+   */
+  private class DjangoSettingsMiddlewareStack extends HTTP::Server::CsrfProtectionSetting::Range {
+    List list;
+
+    DjangoSettingsMiddlewareStack() {
+      this.asExpr() = list and
+      // we look for an assignment to the `MIDDLEWARE` setting
+      exists(DataFlow::Node mw |
+        mw.asVar().getName() = "MIDDLEWARE" and
+        DataFlow::localFlow(this, mw)
+      |
+        // To only include results where CSRF protection matters, we only care about CSRF
+        // protection when the Django authentication middleware is enabled, since an
+        // active session cookie is exactly what would allow an attacker to perform a
+        // CSRF attack.
+        // Note that this does not rule out false positives, since the authentication
+        // middleware might be unused.
+        //
+        // This also strongly implies that `mw` is in fact a Django middleware setting and
+        // not just a variable named `MIDDLEWARE`.
+        list.getAnElt().(StrConst).getText() =
+          "django.contrib.auth.middleware.AuthenticationMiddleware"
+      )
+    }
+
+    override boolean getVerificationSetting() {
+      if
+        list.getAnElt().(StrConst).getText() in [
+            "django.middleware.csrf.CsrfViewMiddleware",
+            // see https://github.com/mozilla/django-session-csrf
+            "session_csrf.CsrfMiddleware"
+          ]
+      then result = true
+      else result = false
+    }
+  }
+
+  private class DjangoCsrfDecorator extends HTTP::Server::CsrfLocalProtectionSetting::Range {
+    string decoratorName;
+    Function function;
+
+    DjangoCsrfDecorator() {
+      decoratorName in ["csrf_protect", "csrf_exempt", "requires_csrf_token", "ensure_csrf_cookie"] and
+      this =
+        API::moduleImport("django")
+            .getMember("views")
+            .getMember("decorators")
+            .getMember("csrf")
+            .getMember(decoratorName)
+            .getAUse() and
+      this.asExpr() = function.getADecorator()
+    }
+
+    override Function getRequestHandler() { result = function }
+
+    override predicate csrfEnabled() { decoratorName in ["csrf_protect", "requires_csrf_token"] }
+  }
 }
--- a/python/ql/lib/semmle/python/frameworks/Flask.qll
+++ b/python/ql/lib/semmle/python/frameworks/Flask.qll
@@ -403,47 +403,37 @@ module Flask {
  }

  private class RequestAttrMultiDict extends Werkzeug::MultiDict::InstanceSource {
-    string attr_name;
-
    RequestAttrMultiDict() {
-      attr_name in ["args", "values", "form", "files"] and
-      this.(DataFlow::AttrRead).accesses(request().getAUse(), attr_name)
+      this = request().getMember(["args", "values", "form", "files"]).getAnImmediateUse()
    }
  }

  /** An `FileStorage` instance that originates from a flask request. */
  private class FlaskRequestFileStorageInstances extends Werkzeug::FileStorage::InstanceSource {
    FlaskRequestFileStorageInstances() {
-      // TODO: this currently only works in local-scope, since writing type-trackers for
-      // this is a little too much effort. Once API-graphs are available for more
-      // things, we can rewrite this.
-      //
      // TODO: This approach for identifying member-access is very adhoc, and we should
      // be able to do something more structured for providing modeling of the members
      // of a container-object.
-      exists(DataFlow::AttrRead files | files.accesses(request().getAUse(), "files") |
-        this.asCfgNode().(SubscriptNode).getObject() = files.asCfgNode()
+      exists(API::Node files | files = request().getMember("files") |
+        this.asCfgNode().(SubscriptNode).getObject() = files.getAUse().asCfgNode()
        or
-        this.(DataFlow::MethodCallNode).calls(files, "get")
+        this = files.getMember("get").getACall()
        or
-        exists(DataFlow::MethodCallNode getlistCall | getlistCall.calls(files, "getlist") |
-          this.asCfgNode().(SubscriptNode).getObject() = getlistCall.asCfgNode()
-        )
+        this.asCfgNode().(SubscriptNode).getObject() =
+          files.getMember("getlist").getReturn().getAUse().asCfgNode()
      )
    }
  }

  /** An `Headers` instance that originates from a flask request. */
  private class FlaskRequestHeadersInstances extends Werkzeug::Headers::InstanceSource {
-    FlaskRequestHeadersInstances() {
-      this.(DataFlow::AttrRead).accesses(request().getAUse(), "headers")
-    }
+    FlaskRequestHeadersInstances() { this = request().getMember("headers").getAnImmediateUse() }
  }

  /** An `Authorization` instance that originates from a flask request. */
  private class FlaskRequestAuthorizationInstances extends Werkzeug::Authorization::InstanceSource {
    FlaskRequestAuthorizationInstances() {
-      this.(DataFlow::AttrRead).accesses(request().getAUse(), "authorization")
+      this = request().getMember("authorization").getAnImmediateUse()
    }
  }

--- a/python/ql/lib/semmle/python/frameworks/FlaskSqlAlchemy.qll
+++ b/python/ql/lib/semmle/python/frameworks/FlaskSqlAlchemy.qll
@@ -35,7 +35,7 @@ private module FlaskSqlAlchemy {
  /** Access on a DB resulting in an Engine */
  private class DbEngine extends SqlAlchemy::Engine::InstanceSource {
    DbEngine() {
-      this = dbInstance().getMember("engine").getAUse()
+      this = dbInstance().getMember("engine").getAnImmediateUse()
      or
      this = dbInstance().getMember("get_engine").getACall()
    }
@@ -44,7 +44,7 @@ private module FlaskSqlAlchemy {
  /** Access on a DB resulting in a Session */
  private class DbSession extends SqlAlchemy::Session::InstanceSource {
    DbSession() {
-      this = dbInstance().getMember("session").getAUse()
+      this = dbInstance().getMember("session").getAnImmediateUse()
      or
      this = dbInstance().getMember("create_session").getReturn().getACall()
      or
--- a/python/ql/lib/semmle/python/frameworks/Lxml.qll
+++ b/python/ql/lib/semmle/python/frameworks/Lxml.qll
@@ -19,6 +19,9 @@ private import semmle.python.ApiGraphs
 * - https://lxml.de/tutorial.html
 */
 private module Lxml {
+  // ---------------------------------------------------------------------------
+  // XPath
+  // ---------------------------------------------------------------------------
  /**
   * A class constructor compiling an XPath expression.
   *
@@ -57,13 +60,25 @@ private module Lxml {
   */
  class XPathCall extends XML::XPathExecution::Range, DataFlow::CallCfgNode {
    XPathCall() {
-      this =
-        API::moduleImport("lxml")
-            .getMember("etree")
-            .getMember(["parse", "fromstring", "fromstringlist", "HTML", "XML"])
-            .getReturn()
-            .getMember("xpath")
-            .getACall()
+      exists(API::Node parseResult |
+        parseResult =
+          API::moduleImport("lxml")
+              .getMember("etree")
+              .getMember(["parse", "fromstring", "fromstringlist", "HTML", "XML"])
+              .getReturn()
+        or
+        // TODO: lxml.etree.parseid(<text>)[0] will contain the root element from parsing <text>
+        // but we don't really have a way to model that nicely.
+        parseResult =
+          API::moduleImport("lxml")
+              .getMember("etree")
+              .getMember("XMLParser")
+              .getReturn()
+              .getMember("close")
+              .getReturn()
+      |
+        this = parseResult.getMember("xpath").getACall()
+      )
    }

    override DataFlow::Node getXPath() { result in [this.getArg(0), this.getArgByName("_path")] }
@@ -85,4 +100,235 @@ private module Lxml {

    override string getName() { result = "lxml.etree" }
  }
+
+  // ---------------------------------------------------------------------------
+  // Parsing
+  // ---------------------------------------------------------------------------
+  /**
+   * Provides models for `lxml.etree` parsers.
+   *
+   * See https://lxml.de/apidoc/lxml.etree.html?highlight=xmlparser#lxml.etree.XMLParser
+   */
+  module XmlParser {
+    /**
+     * A source of instances of `lxml.etree` parsers, extend this class to model new instances.
+     *
+     * This can include instantiations of the class, return values from function
+     * calls, or a special parameter that will be set when functions are called by an external
+     * library.
+     *
+     * Use the predicate `XmlParser::instance()` to get references to instances of `lxml.etree` parsers.
+     */
+    abstract class InstanceSource extends DataFlow::LocalSourceNode {
+      /** Holds if this instance is vulnerable to `kind`. */
+      abstract predicate vulnerableTo(XML::XmlParsingVulnerabilityKind kind);
+    }
+
+    /**
+     * A call to `lxml.etree.XMLParser`.
+     *
+     * See https://lxml.de/apidoc/lxml.etree.html?highlight=xmlparser#lxml.etree.XMLParser
+     */
+    private class LxmlParser extends InstanceSource, API::CallNode {
+      LxmlParser() {
+        this = API::moduleImport("lxml").getMember("etree").getMember("XMLParser").getACall()
+      }
+
+      // NOTE: it's not possible to change settings of a parser after constructing it
+      override predicate vulnerableTo(XML::XmlParsingVulnerabilityKind kind) {
+        kind.isXxe() and
+        (
+          // resolve_entities has default True
+          not exists(this.getArgByName("resolve_entities"))
+          or
+          this.getKeywordParameter("resolve_entities").getAValueReachingRhs().asExpr() = any(True t)
+        )
+        or
+        kind.isXmlBomb() and
+        this.getKeywordParameter("huge_tree").getAValueReachingRhs().asExpr() = any(True t) and
+        not this.getKeywordParameter("resolve_entities").getAValueReachingRhs().asExpr() =
+          any(False t)
+        or
+        kind.isDtdRetrieval() and
+        this.getKeywordParameter("load_dtd").getAValueReachingRhs().asExpr() = any(True t) and
+        this.getKeywordParameter("no_network").getAValueReachingRhs().asExpr() = any(False t)
+      }
+    }
+
+    /**
+     * A call to `lxml.etree.get_default_parser`.
+     *
+     * See https://lxml.de/apidoc/lxml.etree.html?highlight=xmlparser#lxml.etree.get_default_parser
+     */
+    private class LxmlDefaultParser extends InstanceSource, DataFlow::CallCfgNode {
+      LxmlDefaultParser() {
+        this =
+          API::moduleImport("lxml").getMember("etree").getMember("get_default_parser").getACall()
+      }
+
+      override predicate vulnerableTo(XML::XmlParsingVulnerabilityKind kind) {
+        // as highlighted by
+        // https://lxml.de/apidoc/lxml.etree.html?highlight=xmlparser#lxml.etree.XMLParser
+        // XXE is allowed by default, so as long as the default parser has not been
+        // overridden, the result is also vulnerable to XXE.
+        kind.isXxe()
+        // TODO: take into account that you can override the default parser with `lxml.etree.set_default_parser`.
+      }
+    }
+
+    /** Gets a reference to an `lxml.etree` parser instance, with origin in `origin`. */
+    private DataFlow::TypeTrackingNode instance(DataFlow::TypeTracker t, InstanceSource origin) {
+      t.start() and
+      result = origin
+      or
+      exists(DataFlow::TypeTracker t2 | result = instance(t2, origin).track(t2, t))
+    }
+
+    /** Gets a reference to an `lxml.etree` parser instance, with origin in `origin`. */
+    DataFlow::Node instance(InstanceSource origin) {
+      instance(DataFlow::TypeTracker::end(), origin).flowsTo(result)
+    }
+
+    /** Gets a reference to an `lxml.etree` parser instance, that is vulnerable to `kind`. */
+    DataFlow::Node instanceVulnerableTo(XML::XmlParsingVulnerabilityKind kind) {
+      exists(InstanceSource origin | result = instance(origin) and origin.vulnerableTo(kind))
+    }
+
+    /**
+     * A call to the `feed` method of an `lxml` parser.
+     */
+    private class LxmlParserFeedCall extends DataFlow::MethodCallNode, XML::XmlParsing::Range {
+      LxmlParserFeedCall() { this.calls(instance(_), "feed") }
+
+      override DataFlow::Node getAnInput() { result in [this.getArg(0), this.getArgByName("data")] }
+
+      override predicate vulnerableTo(XML::XmlParsingVulnerabilityKind kind) {
+        this.calls(instanceVulnerableTo(kind), "feed")
+      }
+
+      override predicate mayExecuteInput() { none() }
+
+      override DataFlow::Node getOutput() {
+        exists(DataFlow::Node objRef |
+          DataFlow::localFlow(this.getObject(), objRef) and
+          result.(DataFlow::MethodCallNode).calls(objRef, "close")
+        )
+      }
+    }
+  }
+
+  /**
+   * A call to either of:
+   * - `lxml.etree.fromstring`
+   * - `lxml.etree.fromstringlist`
+   * - `lxml.etree.XML`
+   * - `lxml.etree.XMLID`
+   * - `lxml.etree.parse`
+   * - `lxml.etree.parseid`
+   *
+   * See
+   * - https://lxml.de/apidoc/lxml.etree.html?highlight=parseids#lxml.etree.fromstring
+   * - https://lxml.de/apidoc/lxml.etree.html?highlight=parseids#lxml.etree.fromstringlist
+   * - https://lxml.de/apidoc/lxml.etree.html?highlight=parseids#lxml.etree.XML
+   * - https://lxml.de/apidoc/lxml.etree.html?highlight=parseids#lxml.etree.XMLID
+   * - https://lxml.de/apidoc/lxml.etree.html?highlight=parseids#lxml.etree.parse
+   * - https://lxml.de/apidoc/lxml.etree.html?highlight=parseids#lxml.etree.parseid
+   */
+  private class LxmlParsing extends DataFlow::CallCfgNode, XML::XmlParsing::Range {
+    string functionName;
+
+    LxmlParsing() {
+      functionName in ["fromstring", "fromstringlist", "XML", "XMLID", "parse", "parseid"] and
+      this = API::moduleImport("lxml").getMember("etree").getMember(functionName).getACall()
+    }
+
+    override DataFlow::Node getAnInput() {
+      result in [
+          this.getArg(0),
+          // fromstring / XML / XMLID
+          this.getArgByName("text"),
+          // fromstringlist
+          this.getArgByName("strings"),
+          // parse / parseid
+          this.getArgByName("source"),
+        ]
+    }
+
+    DataFlow::Node getParserArg() { result in [this.getArg(1), this.getArgByName("parser")] }
+
+    override predicate vulnerableTo(XML::XmlParsingVulnerabilityKind kind) {
+      this.getParserArg() = XmlParser::instanceVulnerableTo(kind)
+      or
+      kind.isXxe() and
+      not exists(this.getParserArg())
+    }
+
+    override predicate mayExecuteInput() { none() }
+
+    override DataFlow::Node getOutput() {
+      // Note: for `parseid`/XMLID the result of the call is a tuple with `(root, dict)`, so
+      // maybe we should not just say that the entire tuple is the decoding output... my
+      // gut feeling is that THIS instance doesn't matter too much, but that it would be
+      // nice to be able to do this in general. (this is a problem for both `lxml.etree`
+      // and `xml.etree`)
+      result = this
+    }
+  }
+
+  /**
+   * A call to `lxml.etree.parse` or `lxml.etree.parseid`, which
+   * takes either a filename or a file-like object as argument. To capture the filename
+   * for path-injection, we have this subclass.
+   *
+   * See
+   * - https://lxml.de/apidoc/lxml.etree.html?highlight=parseids#lxml.etree.parse
+   * - https://lxml.de/apidoc/lxml.etree.html?highlight=parseids#lxml.etree.parseid
+   */
+  private class FileAccessFromLxmlParsing extends LxmlParsing, FileSystemAccess::Range {
+    FileAccessFromLxmlParsing() {
+      functionName in ["parse", "parseid"]
+      // I considered whether we should try to reduce FPs from people passing file-like
+      // objects, which will not be a file system access (and couldn't cause a
+      // path-injection).
+      //
+      // I suppose that once we have proper flow-summary support for file-like objects,
+      // we can make the XXE/XML-bomb sinks allow an access-path, while the
+      // path-injection sink wouldn't, and then we will not end up with such FPs.
+    }
+
+    override DataFlow::Node getAPathArgument() { result = this.getAnInput() }
+  }
+
+  /**
+   * A call to `lxml.etree.iterparse`
+   *
+   * See
+   * - https://lxml.de/apidoc/lxml.etree.html?highlight=parseids#lxml.etree.iterparse
+   */
+  private class LxmlIterparseCall extends API::CallNode, XML::XmlParsing::Range,
+    FileSystemAccess::Range {
+    LxmlIterparseCall() {
+      this = API::moduleImport("lxml").getMember("etree").getMember("iterparse").getACall()
+    }
+
+    override DataFlow::Node getAnInput() { result in [this.getArg(0), this.getArgByName("source")] }
+
+    override predicate vulnerableTo(XML::XmlParsingVulnerabilityKind kind) {
+      // note that there is no `resolve_entities` argument, so it's not possible to turn off XXE :O
+      kind.isXxe()
+      or
+      kind.isXmlBomb() and
+      this.getKeywordParameter("huge_tree").getAValueReachingRhs().asExpr() = any(True t)
+      or
+      kind.isDtdRetrieval() and
+      this.getKeywordParameter("load_dtd").getAValueReachingRhs().asExpr() = any(True t) and
+      this.getKeywordParameter("no_network").getAValueReachingRhs().asExpr() = any(False t)
+    }
+
+    override predicate mayExecuteInput() { none() }
+
+    override DataFlow::Node getOutput() { result = this }
+
+    override DataFlow::Node getAPathArgument() { result = this.getAnInput() }
+  }
 }
--- a/python/ql/lib/semmle/python/frameworks/Requests.qll
+++ b/python/ql/lib/semmle/python/frameworks/Requests.qll
@@ -9,7 +9,6 @@
 private import python
 private import semmle.python.Concepts
 private import semmle.python.ApiGraphs
-private import semmle.python.dataflow.new.DataFlow
 private import semmle.python.dataflow.new.TaintTracking
 private import semmle.python.frameworks.internal.InstanceTaintStepsHelper
 private import semmle.python.frameworks.Stdlib
--- a/python/ql/lib/semmle/python/frameworks/Stdlib.qll
+++ b/python/ql/lib/semmle/python/frameworks/Stdlib.qll
@@ -959,13 +959,18 @@ private module StdlibPrivate {
    }
  }

-  /** A call to `os.path.samefile` will raise an exception if an `os.stat()` call on either pathname fails. */
+  /**
+   * A call to `os.path.samefile` will raise an exception if an `os.stat()` call on either pathname fails.
+   *
+   * See https://docs.python.org/3.10/library/os.path.html#os.path.samefile
+   */
  private class OsPathSamefileCall extends FileSystemAccess::Range, DataFlow::CallCfgNode {
    OsPathSamefileCall() { this = OS::path().getMember("samefile").getACall() }

    override DataFlow::Node getAPathArgument() {
      result in [
-          this.getArg(0), this.getArgByName("path1"), this.getArg(1), this.getArgByName("path2")
+          // note that the f1/f2 names don't match the documentation, but they are what actually works (tested on 3.8.10)
+          this.getArg(0), this.getArgByName("f1"), this.getArg(1), this.getArgByName("f2")
        ]
    }
  }
@@ -2534,6 +2539,56 @@ private module StdlibPrivate {
    PathLibOpenCall() { attrbuteName = "open" }
  }

+  /**
+   * A call to the `link_to`, `hardlink_to`, or `symlink_to` method on a `pathlib.Path` instance.
+   *
+   * See
+   * - https://docs.python.org/3/library/pathlib.html#pathlib.Path.link_to
+   * - https://docs.python.org/3/library/pathlib.html#pathlib.Path.hardlink_to
+   * - https://docs.python.org/3/library/pathlib.html#pathlib.Path.symlink_to
+   */
+  private class PathLibLinkToCall extends PathlibFileAccess, API::CallNode {
+    PathLibLinkToCall() { attrbuteName in ["link_to", "hardlink_to", "symlink_to"] }
+
+    override DataFlow::Node getAPathArgument() {
+      result = super.getAPathArgument()
+      or
+      result = this.getParameter(0, "target").getARhs()
+    }
+  }
+
+  /**
+   * A call to the `replace` or `rename` method on a `pathlib.Path` instance.
+   *
+   * See
+   * - https://docs.python.org/3/library/pathlib.html#pathlib.Path.replace
+   * - https://docs.python.org/3/library/pathlib.html#pathlib.Path.rename
+   */
+  private class PathLibReplaceCall extends PathlibFileAccess, API::CallNode {
+    PathLibReplaceCall() { attrbuteName in ["replace", "rename"] }
+
+    override DataFlow::Node getAPathArgument() {
+      result = super.getAPathArgument()
+      or
+      result = this.getParameter(0, "target").getARhs()
+    }
+  }
+
+  /**
+   * A call to the `samefile` method on a `pathlib.Path` instance.
+   *
+   * See https://docs.python.org/3/library/pathlib.html#pathlib.Path.samefile
+   */
+  private class PathLibSameFileCall extends PathlibFileAccess, API::CallNode {
+    PathLibSameFileCall() { attrbuteName = "samefile" }
+
+    override DataFlow::Node getAPathArgument() {
+      result = super.getAPathArgument()
+      or
+      result = this.getParameter(0, "other_path").getARhs()
+    }
+  }
+
  /** An additional taint steps for objects of type `pathlib.Path` */
  private class PathlibPathTaintStep extends TaintTracking::AdditionalTaintStep {
    override predicate step(DataFlow::Node nodeFrom, DataFlow::Node nodeTo) {
@@ -2835,70 +2890,6 @@ private module StdlibPrivate {
    override string getKind() { result = Escaping::getRegexKind() }
  }

-  // ---------------------------------------------------------------------------
-  // xml.etree.ElementTree
-  // ---------------------------------------------------------------------------
-  /**
-   * An instance of `xml.etree.ElementTree.ElementTree`.
-   *
-   * See https://docs.python.org/3.10/library/xml.etree.elementtree.html#xml.etree.ElementTree.ElementTree
-   */
-  private API::Node elementTreeInstance() {
-    //parse to a tree
-    result =
-      API::moduleImport("xml")
-          .getMember("etree")
-          .getMember("ElementTree")
-          .getMember("parse")
-          .getReturn()
-    or
-    // construct a tree without parsing
-    result =
-      API::moduleImport("xml")
-          .getMember("etree")
-          .getMember("ElementTree")
-          .getMember("ElementTree")
-          .getReturn()
-  }
-
-  /**
-   * An instance of `xml.etree.ElementTree.Element`.
-   *
-   * See https://docs.python.org/3.10/library/xml.etree.elementtree.html#xml.etree.ElementTree.Element
-   */
-  private API::Node elementInstance() {
-    // parse or go to the root of a tree
-    result = elementTreeInstance().getMember(["parse", "getroot"]).getReturn()
-    or
-    // parse directly to an element
-    result =
-      API::moduleImport("xml")
-          .getMember("etree")
-          .getMember("ElementTree")
-          .getMember(["fromstring", "fromstringlist", "XML"])
-          .getReturn()
-  }
-
-  /**
-   * A call to a find method on a tree or an element will execute an XPath expression.
-   */
-  private class ElementTreeFindCall extends XML::XPathExecution::Range, DataFlow::CallCfgNode {
-    string methodName;
-
-    ElementTreeFindCall() {
-      methodName in ["find", "findall", "findtext"] and
-      (
-        this = elementTreeInstance().getMember(methodName).getACall()
-        or
-        this = elementInstance().getMember(methodName).getACall()
-      )
-    }
-
-    override DataFlow::Node getXPath() { result in [this.getArg(0), this.getArgByName("match")] }
-
-    override string getName() { result = "xml.etree" }
-  }
-
  // ---------------------------------------------------------------------------
  // urllib
  // ---------------------------------------------------------------------------
@@ -3116,6 +3107,547 @@ private module StdlibPrivate {
      result in [this.getArg(0), this.getArgByName("path")]
    }
  }
+
+  // ---------------------------------------------------------------------------
+  // io
+  // ---------------------------------------------------------------------------
+  /**
+   * Provides models for the `io.StringIO`/`io.BytesIO` classes.
+   *
+   * See https://docs.python.org/3.10/library/io.html#io.StringIO.
+   */
+  module StringIO {
+    /** Gets a reference to the `io.StringIO` or `io.BytesIO` class. */
+    private API::Node classRef() {
+      result = API::moduleImport("io").getMember(["StringIO", "BytesIO"])
+    }
+
+    /**
+     * A source of instances of `io.StringIO`/`io.BytesIO`, extend this class to model new instances.
+     *
+     * This can include instantiations of the class, return values from function
+     * calls, or a special parameter that will be set when functions are called by an external
+     * library.
+     *
+     * Use the predicate `StringIO::instance()` to get references to instances of `io.StringIO`.
+     */
+    abstract class InstanceSource extends Stdlib::FileLikeObject::InstanceSource { }
+
+    /** A direct instantiation of `io.StringIO`/`io.BytesIO`. */
+    private class ClassInstantiation extends InstanceSource, DataFlow::CallCfgNode {
+      ClassInstantiation() { this = classRef().getACall() }
+
+      DataFlow::Node getInitialValue() {
+        result = this.getArg(0)
+        or
+        // `initial_value` for StringIO, `initial_bytes` for BytesIO
+        result = this.getArgByName(["initial_value", "initial_bytes"])
+      }
+    }
+
+    /** Gets a reference to an instance of `io.StringIO`/`io.BytesIO`. */
+    private DataFlow::TypeTrackingNode instance(DataFlow::TypeTracker t) {
+      t.start() and
+      result instanceof InstanceSource
+      or
+      exists(DataFlow::TypeTracker t2 | result = instance(t2).track(t2, t))
+    }
+
+    /** Gets a reference to an instance of `io.StringIO`/`io.BytesIO`. */
+    DataFlow::Node instance() { instance(DataFlow::TypeTracker::end()).flowsTo(result) }
+
+    /**
+     * Extra taint propagation for `io.StringIO`/`io.BytesIO`.
+     */
+    private class AdditionalTaintStep extends TaintTracking::AdditionalTaintStep {
+      override predicate step(DataFlow::Node nodeFrom, DataFlow::Node nodeTo) {
+        nodeTo.(ClassInstantiation).getInitialValue() = nodeFrom
+      }
+    }
+  }
+
+  // ---------------------------------------------------------------------------
+  // xml.etree.ElementTree
+  // ---------------------------------------------------------------------------
+  /**
+   * An instance of `xml.etree.ElementTree.ElementTree`.
+   *
+   * See https://docs.python.org/3.10/library/xml.etree.elementtree.html#xml.etree.ElementTree.ElementTree
+   */
+  private API::Node elementTreeInstance() {
+    // parse to a tree
+    result =
+      API::moduleImport("xml")
+          .getMember("etree")
+          .getMember("ElementTree")
+          .getMember("parse")
+          .getReturn()
+    or
+    // construct a tree without parsing
+    result =
+      API::moduleImport("xml")
+          .getMember("etree")
+          .getMember("ElementTree")
+          .getMember("ElementTree")
+          .getReturn()
+  }
+
+  /**
+   * An instance of `xml.etree.ElementTree.Element`.
+   *
+   * See https://docs.python.org/3.10/library/xml.etree.elementtree.html#xml.etree.ElementTree.Element
+   */
+  private API::Node elementInstance() {
+    // parse or go to the root of a tree
+    result = elementTreeInstance().getMember(["parse", "getroot"]).getReturn()
+    or
+    // parse directly to an element
+    result =
+      API::moduleImport("xml")
+          .getMember("etree")
+          .getMember("ElementTree")
+          .getMember(["fromstring", "fromstringlist", "XML"])
+          .getReturn()
+    or
+    result =
+      API::moduleImport("xml")
+          .getMember("etree")
+          .getMember("ElementTree")
+          .getMember("XMLParser")
+          .getReturn()
+          .getMember("close")
+          .getReturn()
+  }
+
+  /**
+   * A call to a find method on a tree or an element will execute an XPath expression.
+   */
+  private class ElementTreeFindCall extends XML::XPathExecution::Range, DataFlow::CallCfgNode {
+    string methodName;
+
+    ElementTreeFindCall() {
+      methodName in ["find", "findall", "findtext"] and
+      (
+        this = elementTreeInstance().getMember(methodName).getACall()
+        or
+        this = elementInstance().getMember(methodName).getACall()
+      )
+    }
+
+    override DataFlow::Node getXPath() { result in [this.getArg(0), this.getArgByName("match")] }
+
+    override string getName() { result = "xml.etree" }
+  }
+
+  /**
+   * Provides models for `xml.etree` parsers.
+   *
+   * See
+   * - https://docs.python.org/3.10/library/xml.etree.elementtree.html#xml.etree.ElementTree.XMLParser
+   * - https://docs.python.org/3.10/library/xml.etree.elementtree.html#xml.etree.ElementTree.XMLPullParser
+   */
+  module XmlParser {
+    /**
+     * A source of instances of `xml.etree` parsers, extend this class to model new instances.
+     *
+     * This can include instantiations of the class, return values from function
+     * calls, or a special parameter that will be set when functions are called by an external
+     * library.
+     *
+     * Use the predicate `XmlParser::instance()` to get references to instances of `xml.etree` parsers.
+     */
+    abstract class InstanceSource extends DataFlow::LocalSourceNode { }
+
+    /** A direct instantiation of `xml.etree` parsers. */
+    private class ClassInstantiation extends InstanceSource, DataFlow::CallCfgNode {
+      ClassInstantiation() {
+        this =
+          API::moduleImport("xml")
+              .getMember("etree")
+              .getMember("ElementTree")
+              .getMember(["XMLParser", "XMLPullParser"])
+              .getACall()
+      }
+    }
+
+    /** Gets a reference to an `xml.etree` parser instance. */
+    private DataFlow::TypeTrackingNode instance(DataFlow::TypeTracker t) {
+      t.start() and
+      result instanceof InstanceSource
+      or
+      exists(DataFlow::TypeTracker t2 | result = instance(t2).track(t2, t))
+    }
+
+    /** Gets a reference to an `xml.etree` parser instance. */
+    DataFlow::Node instance() { instance(DataFlow::TypeTracker::end()).flowsTo(result) }
+
+    /**
+     * A call to the `feed` method of an `xml.etree` parser.
+     */
+    private class XmlEtreeParserFeedCall extends DataFlow::MethodCallNode, XML::XmlParsing::Range {
+      XmlEtreeParserFeedCall() { this.calls(instance(), "feed") }
+
+      override DataFlow::Node getAnInput() { result in [this.getArg(0), this.getArgByName("data")] }
+
+      override predicate vulnerableTo(XML::XmlParsingVulnerabilityKind kind) { kind.isXmlBomb() }
+
+      override predicate mayExecuteInput() { none() }
+
+      override DataFlow::Node getOutput() {
+        exists(DataFlow::Node objRef |
+          DataFlow::localFlow(this.getObject(), objRef) and
+          result.(DataFlow::MethodCallNode).calls(objRef, "close")
+        )
+      }
+    }
+  }
+
+  /**
+   * A call to either of:
+   * - `xml.etree.ElementTree.fromstring`
+   * - `xml.etree.ElementTree.fromstringlist`
+   * - `xml.etree.ElementTree.XML`
+   * - `xml.etree.ElementTree.XMLID`
+   * - `xml.etree.ElementTree.parse`
+   * - `xml.etree.ElementTree.iterparse`
+   * - `parse` method on an `xml.etree.ElementTree.ElementTree` instance
+   *
+   * See
+   * - https://docs.python.org/3/library/xml.etree.elementtree.html#xml.etree.ElementTree.fromstring
+   * - https://docs.python.org/3/library/xml.etree.elementtree.html#xml.etree.ElementTree.fromstringlist
+   * - https://docs.python.org/3/library/xml.etree.elementtree.html#xml.etree.ElementTree.XML
+   * - https://docs.python.org/3/library/xml.etree.elementtree.html#xml.etree.ElementTree.XMLID
+   * - https://docs.python.org/3/library/xml.etree.elementtree.html#xml.etree.ElementTree.parse
+   * - https://docs.python.org/3/library/xml.etree.elementtree.html#xml.etree.ElementTree.iterparse
+   */
+  private class XmlEtreeParsing extends DataFlow::CallCfgNode, XML::XmlParsing::Range {
+    XmlEtreeParsing() {
+      this =
+        API::moduleImport("xml")
+            .getMember("etree")
+            .getMember("ElementTree")
+            .getMember(["fromstring", "fromstringlist", "XML", "XMLID", "parse", "iterparse"])
+            .getACall()
+      or
+      this = elementTreeInstance().getMember("parse").getACall()
+    }
+
+    override DataFlow::Node getAnInput() {
+      result in [
+          this.getArg(0),
+          // fromstring / XML / XMLID
+          this.getArgByName("text"),
+          // fromstringlist
+          this.getArgByName("sequence"),
+          // parse / iterparse
+          this.getArgByName("source"),
+        ]
+    }
+
+    override predicate vulnerableTo(XML::XmlParsingVulnerabilityKind kind) {
+      // note: it does not matter what `xml.etree` parser you are using, you cannot
+      // change the security features anyway :|
+      kind.isXmlBomb()
+    }
+
+    override predicate mayExecuteInput() { none() }
+
+    override DataFlow::Node getOutput() {
+      // Note: for `XMLID` the result of the call is a tuple with `(root, dict)`, so
+      // maybe we should not just say that the entire tuple is the decoding output... my
+      // gut feeling is that THIS instance doesn't matter too much, but that it would be
+      // nice to be able to do this in general. (this is a problem for both `lxml.etree`
+      // and `xml.etree`)
+      result = this
+    }
+  }
+
+  /**
+   * A call to `xml.etree.ElementTree.parse` or `xml.etree.ElementTree.iterparse`, which
+   * takes either a filename or a file-like object as argument. To capture the filename
+   * for path-injection, we have this subclass.
+   *
+   * See
+   * - https://docs.python.org/3/library/xml.etree.elementtree.html#xml.etree.ElementTree.parse
+   * - https://docs.python.org/3/library/xml.etree.elementtree.html#xml.etree.ElementTree.iterparse
+   */
+  private class FileAccessFromXmlEtreeParsing extends XmlEtreeParsing, FileSystemAccess::Range {
+    FileAccessFromXmlEtreeParsing() {
+      this =
+        API::moduleImport("xml")
+            .getMember("etree")
+            .getMember("ElementTree")
+            .getMember(["parse", "iterparse"])
+            .getACall()
+      or
+      this = elementTreeInstance().getMember("parse").getACall()
+      // I considered whether we should try to reduce FPs from people passing file-like
+      // objects, which will not be a file system access (and couldn't cause a
+      // path-injection).
+      //
+      // I suppose that once we have proper flow-summary support for file-like objects,
+      // we can make the XXE/XML-bomb sinks allow an access-path, while the
+      // path-injection sink wouldn't, and then we will not end up with such FPs.
+    }
+
+    override DataFlow::Node getAPathArgument() { result = this.getAnInput() }
+  }
+
+  // ---------------------------------------------------------------------------
+  // xml.sax
+  // ---------------------------------------------------------------------------
+  /**
+   * A call to the `setFeature` method on an XML SAX parser.
+   *
+   * See https://docs.python.org/3.10/library/xml.sax.reader.html#xml.sax.xmlreader.XMLReader.setFeature
+   */
+  private class SaxParserSetFeatureCall extends API::CallNode, DataFlow::MethodCallNode {
+    SaxParserSetFeatureCall() {
+      this =
+        API::moduleImport("xml")
+            .getMember("sax")
+            .getMember("make_parser")
+            .getReturn()
+            .getMember("setFeature")
+            .getACall()
+    }
+
+    // The keyword argument names do not match the documentation. I checked (with Python
+    // 3.9.5) that the names used here actually work.
+    API::Node getFeatureArg() { result = this.getParameter(0, "name") }
+
+    API::Node getStateArg() { result = this.getParameter(1, "state") }
+  }
+
+  /**
+   * Gets a reference to an XML SAX parser that has `feature_external_ges` turned on.
+   *
+   * See https://docs.python.org/3/library/xml.sax.handler.html#xml.sax.handler.feature_external_ges
+   */
+  private DataFlow::Node saxParserWithFeatureExternalGesTurnedOn(DataFlow::TypeTracker t) {
+    t.start() and
+    exists(SaxParserSetFeatureCall call |
+      call.getFeatureArg().getARhs() =
+        API::moduleImport("xml")
+            .getMember("sax")
+            .getMember("handler")
+            .getMember("feature_external_ges")
+            .getAUse() and
+      call.getStateArg().getAValueReachingRhs().asExpr().(BooleanLiteral).booleanValue() = true and
+      result = call.getObject()
+    )
+    or
+    exists(DataFlow::TypeTracker t2 |
+      t = t2.smallstep(saxParserWithFeatureExternalGesTurnedOn(t2), result)
+    ) and
+    // take into account that the feature can be set to False, which makes the parser safe again
+    not exists(SaxParserSetFeatureCall call |
+      call.getObject() = result and
+      call.getFeatureArg().getARhs() =
+        API::moduleImport("xml")
+            .getMember("sax")
+            .getMember("handler")
+            .getMember("feature_external_ges")
+            .getAUse() and
+      call.getStateArg().getAValueReachingRhs().asExpr().(BooleanLiteral).booleanValue() = false
+    )
+  }
+
+  /**
+   * Gets a reference to an XML SAX parser that has `feature_external_ges` turned on.
+   *
+   * See https://docs.python.org/3/library/xml.sax.handler.html#xml.sax.handler.feature_external_ges
+   */
+  DataFlow::Node saxParserWithFeatureExternalGesTurnedOn() {
+    result = saxParserWithFeatureExternalGesTurnedOn(DataFlow::TypeTracker::end())
+  }
+
+  /**
+   * A call to the `parse` method on a SAX XML parser.
+   *
+   * See https://docs.python.org/3/library/xml.sax.reader.html#xml.sax.xmlreader.XMLReader.parse
+   */
+  private class XmlSaxInstanceParsing extends DataFlow::MethodCallNode, XML::XmlParsing::Range,
+    FileSystemAccess::Range {
+    XmlSaxInstanceParsing() {
+      this =
+        API::moduleImport("xml")
+            .getMember("sax")
+            .getMember("make_parser")
+            .getReturn()
+            .getMember("parse")
+            .getACall()
+    }
+
+    override DataFlow::Node getAnInput() { result in [this.getArg(0), this.getArgByName("source")] }
+
+    override predicate vulnerableTo(XML::XmlParsingVulnerabilityKind kind) {
+      // always vuln to these
+      kind.isXmlBomb()
+      or
+      // can be vuln to other things if features have been turned on
+      this.getObject() = saxParserWithFeatureExternalGesTurnedOn() and
+      (kind.isXxe() or kind.isDtdRetrieval())
+    }
+
+    override predicate mayExecuteInput() { none() }
+
+    override DataFlow::Node getOutput() {
+      // note: the output of parsing with SAX is that the content handler gets the
+      // data... but we don't currently model this (it's not trivial to do, and won't
+      // really give us any value, at least not as of right now).
+      none()
+    }
+
+    override DataFlow::Node getAPathArgument() {
+      // I considered whether we should try to reduce FPs from people passing file-like
+      // objects, which will not be a file system access (and couldn't cause a
+      // path-injection).
+      //
+      // I suppose that once we have proper flow-summary support for file-like objects,
+      // we can make the XXE/XML-bomb sinks allow an access-path, while the
+      // path-injection sink wouldn't, and then we will not end up with such FPs.
+      result = this.getAnInput()
+    }
+  }
+
+  /**
+   * A call to either `parse` or `parseString` from the `xml.sax` module.
+   *
+   * See:
+   * - https://docs.python.org/3.10/library/xml.sax.html#xml.sax.parse
+   * - https://docs.python.org/3.10/library/xml.sax.html#xml.sax.parseString
+   */
+  private class XmlSaxParsing extends DataFlow::CallCfgNode, XML::XmlParsing::Range {
+    XmlSaxParsing() {
+      this =
+        API::moduleImport("xml").getMember("sax").getMember(["parse", "parseString"]).getACall()
+    }
+
+    override DataFlow::Node getAnInput() {
+      result in [
+          this.getArg(0),
+          // parseString
+          this.getArgByName("string"),
+          // parse
+          this.getArgByName("source"),
+        ]
+    }
+
+    override predicate vulnerableTo(XML::XmlParsingVulnerabilityKind kind) {
+      // always vuln to these
+      kind.isXmlBomb()
+    }
+
+    override predicate mayExecuteInput() { none() }
+
+    override DataFlow::Node getOutput() {
+      // note: the output of parsing with SAX is that the content handler gets the
+      // data... but we don't currently model this (it's not trivial to do, and won't
+      // really give us any value, at least not as of right now).
+      none()
+    }
+  }
+
+  /**
+   * A call to `xml.sax.parse`, which takes either a filename or a file-like object as
+   * argument. To capture the filename for path-injection, we have this subclass.
+   *
+   * See
+   * - https://docs.python.org/3.10/library/xml.sax.html#xml.sax.parse
+   * - https://docs.python.org/3/library/xml.sax.reader.html#xml.sax.xmlreader.XMLReader.parse
+   */
+  private class FileAccessFromXmlSaxParsing extends XmlSaxParsing, FileSystemAccess::Range {
+    FileAccessFromXmlSaxParsing() {
+      this = API::moduleImport("xml").getMember("sax").getMember("parse").getACall()
+      // I considered whether we should try to reduce FPs from people passing file-like
+      // objects, which will not be a file system access (and couldn't cause a
+      // path-injection).
+      //
+      // I suppose that once we have proper flow-summary support for file-like objects,
+      // we can make the XXE/XML-bomb sinks allow an access-path, while the
+      // path-injection sink wouldn't, and then we will not end up with such FPs.
+    }
+
+    override DataFlow::Node getAPathArgument() { result = this.getAnInput() }
+  }
+
+  // ---------------------------------------------------------------------------
+  // xml.dom.*
+  // ---------------------------------------------------------------------------
+  /**
+   * A call to the `parse` or `parseString` methods from `xml.dom.minidom` or `xml.dom.pulldom`.
+   *
+   * Both of these modules are based on SAX parsers.
+   *
+   * See
+   * - https://docs.python.org/3/library/xml.dom.minidom.html#xml.dom.minidom.parse
+   * - https://docs.python.org/3/library/xml.dom.pulldom.html#xml.dom.pulldom.parse
+   */
+  private class XmlDomParsing extends DataFlow::CallCfgNode, XML::XmlParsing::Range {
+    XmlDomParsing() {
+      this =
+        API::moduleImport("xml")
+            .getMember("dom")
+            .getMember(["minidom", "pulldom"])
+            .getMember(["parse", "parseString"])
+            .getACall()
+    }
+
+    override DataFlow::Node getAnInput() {
+      result in [
+          this.getArg(0),
+          // parseString
+          this.getArgByName("string"),
+          // minidom.parse
+          this.getArgByName("file"),
+          // pulldom.parse
+          this.getArgByName("stream_or_string"),
+        ]
+    }
+
+    DataFlow::Node getParserArg() { result in [this.getArg(1), this.getArgByName("parser")] }
+
+    override predicate vulnerableTo(XML::XmlParsingVulnerabilityKind kind) {
+      this.getParserArg() = saxParserWithFeatureExternalGesTurnedOn() and
+      (kind.isXxe() or kind.isDtdRetrieval())
+      or
+      kind.isXmlBomb()
+    }
+
+    override predicate mayExecuteInput() { none() }
+
+    override DataFlow::Node getOutput() { result = this }
+  }
+
+  /**
+   * A call to the `parse` function from `xml.dom.minidom` or `xml.dom.pulldom`, which
+   * takes either a filename or a file-like object as argument. To capture the filename
+   * for path-injection, we have this subclass.
+   *
+   * See
+   * - https://docs.python.org/3/library/xml.dom.minidom.html#xml.dom.minidom.parse
+   * - https://docs.python.org/3/library/xml.dom.pulldom.html#xml.dom.pulldom.parse
+   */
+  private class FileAccessFromXmlDomParsing extends XmlDomParsing, FileSystemAccess::Range {
+    FileAccessFromXmlDomParsing() {
+      this =
+        API::moduleImport("xml")
+            .getMember("dom")
+            .getMember(["minidom", "pulldom"])
+            .getMember("parse")
+            .getACall()
+      // I considered whether we should try to reduce FPs from people passing file-like
+      // objects, which will not be a file system access (and couldn't cause a
+      // path-injection).
+      //
+      // I suppose that once we have proper flow-summary support for file-like objects,
+      // we can make the XXE/XML-bomb sinks allow an access-path, while the
+      // path-injection sink wouldn't, and then we will not end up with such FPs.
+    }
+
+    override DataFlow::Node getAPathArgument() { result = this.getAnInput() }
+  }
 }

 // ---------------------------------------------------------------------------
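The `xml.sax` modeling above treats a parser as vulnerable to XXE and DTD retrieval only after `setFeature(feature_external_ges, True)`, and as safe again once the feature is set back to `False`. A minimal Python sketch of the calls being modeled, using only the standard library (the variable names are illustrative, and the default state shown assumes Python 3.7.1 or later):

```python
import xml.sax
from xml.sax.handler import feature_external_ges

# Create the kind of parser object that the QL type tracker follows.
parser = xml.sax.make_parser()

# Assumption: on Python 3.7.1+ general external entities are off by default.
default_state = parser.getFeature(feature_external_ges)

# Turning the feature on is what makes later `parser.parse(...)` calls
# count as XXE / DTD-retrieval sinks in the model above.
parser.setFeature(feature_external_ges, True)
enabled_state = parser.getFeature(feature_external_ges)

# Setting it back to False is the "safe again" case that the model excludes.
parser.setFeature(feature_external_ges, False)
disabled_state = parser.getFeature(feature_external_ges)
```

Regardless of this feature, the model still reports the parser as vulnerable to XML bombs, since internal entity expansion is not controlled by `feature_external_ges`.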
--- a/python/ql/lib/semmle/python/frameworks/Xmltodict.qll
+++ b/python/ql/lib/semmle/python/frameworks/Xmltodict.qll
@@ -0,0 +1,39 @@
+/**
+ * Provides classes modeling security-relevant aspects of the `xmltodict` PyPI package.
+ *
+ * See
+ * - https://pypi.org/project/xmltodict/
+ */
+
+private import python
+private import semmle.python.dataflow.new.DataFlow
+private import semmle.python.Concepts
+private import semmle.python.ApiGraphs
+
+/**
+ * Provides classes modeling security-relevant aspects of the `xmltodict` PyPI package.
+ *
+ * See
+ * - https://pypi.org/project/xmltodict/
+ */
+private module Xmltodict {
+  /**
+   * A call to `xmltodict.parse`.
+   */
+  private class XMLtoDictParsing extends API::CallNode, XML::XmlParsing::Range {
+    XMLtoDictParsing() { this = API::moduleImport("xmltodict").getMember("parse").getACall() }
+
+    override DataFlow::Node getAnInput() {
+      result in [this.getArg(0), this.getArgByName("xml_input")]
+    }
+
+    override predicate vulnerableTo(XML::XmlParsingVulnerabilityKind kind) {
+      kind.isXmlBomb() and
+      this.getKeywordParameter("disable_entities").getAValueReachingRhs().asExpr() = any(False f)
+    }
+
+    override predicate mayExecuteInput() { none() }
+
+    override DataFlow::Node getOutput() { result = this }
+  }
+}
--- a/python/ql/lib/semmle/python/frameworks/internal/SubclassFinder.qll
+++ b/python/ql/lib/semmle/python/frameworks/internal/SubclassFinder.qll
@@ -204,7 +204,7 @@ private module NotExposed {
    FindSubclassesSpec spec, string newSubclassQualified, ClassExpr classExpr, Module mod,
    Location loc
  ) {
-    classExpr = newOrExistingModeling(spec).getASubclass*().getAUse().asExpr() and
+    classExpr = newOrExistingModeling(spec).getASubclass*().getAnImmediateUse().asExpr() and
    classExpr.getScope() = mod and
    newSubclassQualified = mod.getName() + "." + classExpr.getName() and
    loc = classExpr.getLocation() and
--- a/python/ql/lib/semmle/python/objects/Modules.qll
+++ b/python/ql/lib/semmle/python/objects/Modules.qll
@@ -136,6 +136,7 @@ class PackageObjectInternal extends ModuleObjectInternal, TPackageObject {
  /** Gets the init module of this package */
  PythonModuleObjectInternal getInitModule() { result = TPythonModule(this.getSourceModule()) }

+  /** Holds if the folder for this package has no init module. */
  predicate hasNoInitModule() {
    exists(Folder f |
      f = this.getFolder() and
--- a/python/ql/lib/semmle/python/objects/ObjectInternal.qll
+++ b/python/ql/lib/semmle/python/objects/ObjectInternal.qll
@@ -49,7 +49,7 @@ class ObjectInternal extends TObject {
  abstract ObjectInternal getClass();

  /**
-   * True if this "object" can be meaningfully analysed to determine the boolean value of
+   * True if this "object" can be meaningfully analyzed to determine the boolean value of
   * equality tests on it.
   * For example, `None` or `int` can be, but `int()` or an unknown string cannot.
   */
--- a/python/ql/lib/semmle/python/objects/Sequences.qll
+++ b/python/ql/lib/semmle/python/objects/Sequences.qll
@@ -70,7 +70,7 @@ abstract class TupleObjectInternal extends SequenceObjectInternal {
  override ObjectInternal getClass() { result = ObjectInternal::builtin("tuple") }

  /**
-   * True if this "object" can be meaningfully analysed for
+   * True if this "object" can be meaningfully analyzed for
   * truth or false in comparisons. For example, `None` or `int` can be, but `int()`
   * or an unknown string cannot.
   */
--- a/python/ql/lib/semmle/python/objects/TObject.qll
+++ b/python/ql/lib/semmle/python/objects/TObject.qll
@@ -243,7 +243,7 @@ predicate class_method(
 * Holds if the literal corresponding to the control flow node `n` has class `cls`.
 *
 * Helper predicate for `literal_instantiation`. Prevents a bad join with
- * `PointsToContext::appliesTo` from occuring.
+ * `PointsToContext::appliesTo` from occurring.
 */
 pragma[nomagic]
 private predicate literal_node_class(ControlFlowNode n, ClassObjectInternal cls) {
--- a/python/ql/lib/semmle/python/pointsto/PointsTo.qll
+++ b/python/ql/lib/semmle/python/pointsto/PointsTo.qll
@@ -2129,7 +2129,7 @@ module Conditionals {
 /** INTERNAL: Do not use. */
 predicate declaredAttributeVar(PythonClassObjectInternal cls, string name, EssaVariable var) {
  name = var.getName() and
-  var.getAUse() = cls.getScope().getANormalExit()
+  pragma[only_bind_into](pragma[only_bind_into](var).getAUse()) = cls.getScope().getANormalExit()
 }

 cached
--- a/python/ql/lib/semmle/python/regex.qll
+++ b/python/ql/lib/semmle/python/regex.qll
@@ -75,7 +75,7 @@ private string canonical_name(API::Node flag) {
 */
 private DataFlow::TypeTrackingNode re_flag_tracker(string flag_name, DataFlow::TypeTracker t) {
  t.start() and
-  exists(API::Node flag | flag_name = canonical_name(flag) and result = flag.getAUse())
+  exists(API::Node flag | flag_name = canonical_name(flag) and result = flag.getAnImmediateUse())
  or
  exists(BinaryExprNode binop, DataFlow::Node operand |
    operand.getALocalSource() = re_flag_tracker(flag_name, t.continue()) and
--- a/python/ql/lib/semmle/python/security/BadTagFilterQuery.qll
+++ b/python/ql/lib/semmle/python/security/BadTagFilterQuery.qll
@@ -28,14 +28,14 @@ private module RegexpMatching {
     * but if `ignorePrefix` is true, it will only match "foo".
     */
    predicate test(string str, boolean ignorePrefix) {
-      none() // maybe overriden in subclasses
+      none() // maybe overridden in subclasses
    }

    /**
     * Same as `test(..)`, but where the `fillsCaptureGroup` afterwards tells which capture groups were filled by the given string.
     */
    predicate testWithGroups(string str, boolean ignorePrefix) {
-      none() // maybe overriden in subclasses
+      none() // maybe overridden in subclasses
    }

    /**
--- a/python/ql/lib/semmle/python/security/dataflow/XmlBombCustomizations.qll
+++ b/python/ql/lib/semmle/python/security/dataflow/XmlBombCustomizations.qll
@@ -0,0 +1,49 @@
+/**
+ * Provides default sources, sinks and sanitizers for detecting
+ * "XML bomb"
+ * vulnerabilities, as well as extension points for adding your own.
+ */
+
+private import python
+private import semmle.python.dataflow.new.DataFlow
+private import semmle.python.Concepts
+private import semmle.python.dataflow.new.RemoteFlowSources
+
+/**
+ * Provides default sources, sinks and sanitizers for detecting "XML bomb"
+ * vulnerabilities, as well as extension points for adding your own.
+ */
+module XmlBomb {
+  /**
+   * A data flow source for XML-bomb vulnerabilities.
+   */
+  abstract class Source extends DataFlow::Node { }
+
+  /**
+   * A data flow sink for XML-bomb vulnerabilities.
+   */
+  abstract class Sink extends DataFlow::Node { }
+
+  /**
+   * A sanitizer for XML-bomb vulnerabilities.
+   */
+  abstract class Sanitizer extends DataFlow::Node { }
+
+  /** A source of remote user input, considered as a flow source for XML bomb vulnerabilities. */
+  class RemoteFlowSourceAsSource extends Source {
+    RemoteFlowSourceAsSource() { this instanceof RemoteFlowSource }
+  }
+
+  /**
+   * A call to an XML parser that is vulnerable to XML bombs.
+   */
+  class XmlParsingVulnerableToXmlBomb extends Sink {
+    XmlParsingVulnerableToXmlBomb() {
+      exists(XML::XmlParsing parsing, XML::XmlParsingVulnerabilityKind kind |
+        kind.isXmlBomb() and
+        parsing.vulnerableTo(kind) and
+        this = parsing.getAnInput()
+      )
+    }
+  }
+}
--- a/python/ql/lib/semmle/python/security/dataflow/XmlBombQuery.qll
+++ b/python/ql/lib/semmle/python/security/dataflow/XmlBombQuery.qll
@@ -0,0 +1,28 @@
+/**
+ * Provides a taint-tracking configuration for detecting "XML bomb" vulnerabilities.
+ *
+ * Note, for performance reasons: only import this file if
+ * `Configuration` is needed, otherwise
+ * `XmlBombCustomizations` should be imported instead.
+ */
+
+import python
+import semmle.python.dataflow.new.DataFlow
+import semmle.python.dataflow.new.TaintTracking
+import XmlBombCustomizations::XmlBomb
+
+/**
+ * A taint-tracking configuration for detecting "XML bomb" vulnerabilities.
+ */
+class Configuration extends TaintTracking::Configuration {
+  Configuration() { this = "XmlBomb" }
+
+  override predicate isSource(DataFlow::Node source) { source instanceof Source }
+
+  override predicate isSink(DataFlow::Node sink) { sink instanceof Sink }
+
+  override predicate isSanitizer(DataFlow::Node node) {
+    super.isSanitizer(node) or
+    node instanceof Sanitizer
+  }
+}
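As background for the "XML bomb" sinks, the sketch below shows the internal entity expansion that the query is concerned with, scaled down so that it is harmless. The helper `build_nested_entity_doc` and its parameters are hypothetical names used only for illustration. Each entity level repeats the previous one `fanout` times, so the expanded text grows exponentially with the number of levels.

```python
import xml.etree.ElementTree as ET


def build_nested_entity_doc(levels: int, fanout: int) -> str:
    """Build a small document whose entities reference each other in layers."""
    entities = ['<!ENTITY e0 "lol">']
    for i in range(1, levels + 1):
        refs = f"&e{i - 1};" * fanout
        entities.append(f'<!ENTITY e{i} "{refs}">')
    dtd = "\n".join(entities)
    return f"<!DOCTYPE bomb [\n{dtd}\n]>\n<bomb>&e{levels};</bomb>"


# Deliberately tiny: 3 levels with fanout 10 expand to 10**3 copies of "lol".
doc = build_nested_entity_doc(levels=3, fanout=10)
root = ET.fromstring(doc)
expanded_length = len(root.text)
```

The input document here is a few hundred characters long, while the parsed text is 3000 characters. With more levels the same pattern exhausts memory, which is why parsers that expand internal entities on untrusted input are treated as sinks.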
--- a/python/ql/lib/semmle/python/security/dataflow/XxeCustomizations.qll
+++ b/python/ql/lib/semmle/python/security/dataflow/XxeCustomizations.qll
@@ -0,0 +1,49 @@
+/**
+ * Provides default sources, sinks and sanitizers for detecting
+ * "XML External Entity (XXE)"
+ * vulnerabilities, as well as extension points for adding your own.
+ */
+
+private import python
+private import semmle.python.dataflow.new.DataFlow
+private import semmle.python.Concepts
+private import semmle.python.dataflow.new.RemoteFlowSources
+
+/**
+ * Provides default sources, sinks and sanitizers for detecting "XML External Entity (XXE)"
+ * vulnerabilities, as well as extension points for adding your own.
+ */
+module Xxe {
+  /**
+   * A data flow source for XXE vulnerabilities.
+   */
+  abstract class Source extends DataFlow::Node { }
+
+  /**
+   * A data flow sink for XXE vulnerabilities.
+   */
+  abstract class Sink extends DataFlow::Node { }
+
+  /**
+   * A sanitizer for XXE vulnerabilities.
+   */
+  abstract class Sanitizer extends DataFlow::Node { }
+
+  /** A source of remote user input, considered as a flow source for XXE vulnerabilities. */
+  class RemoteFlowSourceAsSource extends Source {
+    RemoteFlowSourceAsSource() { this instanceof RemoteFlowSource }
+  }
+
+  /**
+   * A call to an XML parser that is vulnerable to XXE.
+   */
+  class XmlParsingVulnerableToXxe extends Sink {
+    XmlParsingVulnerableToXxe() {
+      exists(XML::XmlParsing parsing, XML::XmlParsingVulnerabilityKind kind |
+        kind.isXxe() and
+        parsing.vulnerableTo(kind) and
+        this = parsing.getAnInput()
+      )
+    }
+  }
+}
--- a/python/ql/lib/semmle/python/security/dataflow/XxeQuery.qll
+++ b/python/ql/lib/semmle/python/security/dataflow/XxeQuery.qll
@@ -0,0 +1,28 @@
+/**
+ * Provides a taint-tracking configuration for detecting "XML External Entity (XXE)" vulnerabilities.
+ *
+ * Note, for performance reasons: only import this file if
+ * `Configuration` is needed, otherwise
+ * `XxeCustomizations` should be imported instead.
+ */
+
+import python
+import semmle.python.dataflow.new.DataFlow
+import semmle.python.dataflow.new.TaintTracking
+import XxeCustomizations::Xxe
+
+/**
+ * A taint-tracking configuration for detecting "XML External Entity (XXE)" vulnerabilities.
+ */
+class Configuration extends TaintTracking::Configuration {
+  Configuration() { this = "Xxe" }
+
+  override predicate isSource(DataFlow::Node source) { source instanceof Source }
+
+  override predicate isSink(DataFlow::Node sink) { sink instanceof Sink }
+
+  override predicate isSanitizer(DataFlow::Node node) {
+    super.isSanitizer(node) or
+    node instanceof Sanitizer
+  }
+}
--- a/python/ql/lib/semmle/python/security/performance/ExponentialBackTracking.qll
+++ b/python/ql/lib/semmle/python/security/performance/ExponentialBackTracking.qll
@@ -51,7 +51,7 @@
 *     either a single character, a set of characters represented by a
 *     character class, or the set of all characters.
 *   * The product automaton is constructed lazily, starting with pair states
- *     `(q, q)` where `q` is a fork, and proceding along an over-approximate
+ *     `(q, q)` where `q` is a fork, and proceeding along an over-approximate
 *     step relation.
 *   * The over-approximate step relation allows transitions along pairs of
 *     abstract input symbols where the symbols have overlap in the characters they accept.
--- a/python/ql/lib/semmle/python/security/performance/ReDoSUtil.qll
+++ b/python/ql/lib/semmle/python/security/performance/ReDoSUtil.qll
@@ -610,16 +610,23 @@ State after(RegExpTerm t) {
  or
  exists(RegExpGroup grp | t = grp.getAChild() | result = after(grp))
  or
-  exists(EffectivelyStar star | t = star.getAChild() | result = before(star))
+  exists(EffectivelyStar star | t = star.getAChild() |
+    not isPossessive(star) and
+    result = before(star)
+  )
  or
  exists(EffectivelyPlus plus | t = plus.getAChild() |
-    result = before(plus) or
+    not isPossessive(plus) and
+    result = before(plus)
+    or
    result = after(plus)
  )
  or
  exists(EffectivelyQuestion opt | t = opt.getAChild() | result = after(opt))
  or
-  exists(RegExpRoot root | t = root | result = AcceptAnySuffix(root))
+  exists(RegExpRoot root | t = root |
+    if matchesAnySuffix(root) then result = AcceptAnySuffix(root) else result = Accept(root)
+  )
 }

 /**
@@ -690,7 +697,7 @@ predicate delta(State q1, EdgeLabel lbl, State q2) {
    lbl = Epsilon() and q2 = Accept(root)
  )
  or
-  exists(RegExpRoot root | q1 = Match(root, 0) | lbl = Any() and q2 = q1)
+  exists(RegExpRoot root | q1 = Match(root, 0) | matchesAnyPrefix(root) and lbl = Any() and q2 = q1)
  or
  exists(RegExpDollar dollar | q1 = before(dollar) |
    lbl = Epsilon() and q2 = Accept(getRoot(dollar))
--- a/python/ql/lib/semmle/python/security/performance/ReDoSUtilSpecific.qll
+++ b/python/ql/lib/semmle/python/security/performance/ReDoSUtilSpecific.qll
@@ -13,6 +13,24 @@ predicate isEscapeClass(RegExpTerm term, string clazz) {
  exists(RegExpCharacterClassEscape escape | term = escape | escape.getValue() = clazz)
 }

+/**
+ * Holds if `term` is a possessive quantifier.
+ * As Python's regexes do not support possessive quantifiers, this never holds, but it is used by the shared library.
+ */
+predicate isPossessive(RegExpQuantifier term) { none() }
+
+/**
+ * Holds if the regex that `term` is part of is used in a way that ignores any leading prefix of the input it's matched against.
+ * Not yet implemented for Python.
+ */
+predicate matchesAnyPrefix(RegExpTerm term) { any() }
+
+/**
+ * Holds if the regex that `term` is part of is used in a way that ignores any trailing suffix of the input it's matched against.
+ * Not yet implemented for Python.
+ */
+predicate matchesAnySuffix(RegExpTerm term) { any() }
+
 /**
 * Holds if the regular expression should not be considered.
 *
--- a/python/ql/lib/semmle/python/security/strings/External.qll
+++ b/python/ql/lib/semmle/python/security/strings/External.qll
@@ -35,7 +35,7 @@ deprecated class ExternalStringSequenceKind extends SequenceKind {
 }

 /**
- * An hierachical dictionary or list where the entire structure is externally controlled
+ * An hierarchical dictionary or list where the entire structure is externally controlled
 * This is typically a parsed JSON object.
 */
 deprecated class ExternalJsonKind extends TaintKind {
--- a/python/ql/lib/semmle/python/types/FunctionObject.qll
+++ b/python/ql/lib/semmle/python/types/FunctionObject.qll
@@ -3,7 +3,6 @@ import semmle.python.types.Exceptions
 private import semmle.python.pointsto.PointsTo
 private import semmle.python.objects.Callables
 private import semmle.python.libraries.Zope
-private import semmle.python.pointsto.Base
 private import semmle.python.objects.ObjectInternal
 private import semmle.python.types.Builtins

--- a/python/ql/lib/semmle/python/types/Object.qll
+++ b/python/ql/lib/semmle/python/types/Object.qll
@@ -1,5 +1,4 @@
 import python
-private import semmle.python.objects.ObjectAPI
 private import semmle.python.objects.ObjectInternal
 private import semmle.python.types.Builtins
 private import semmle.python.internal.CachedStages
--- a/python/ql/lib/semmle/python/web/falcon/Request.qll
+++ b/python/ql/lib/semmle/python/web/falcon/Request.qll
@@ -2,7 +2,6 @@ import python
 import semmle.python.dataflow.TaintTracking
 import semmle.python.web.Http
 import semmle.python.web.falcon.General
-import semmle.python.security.strings.External

 /** https://falcon.readthedocs.io/en/stable/api/request_and_response.html */
 deprecated class FalconRequest extends TaintKind {
--- a/python/ql/lib/semmle/python/web/falcon/Response.qll
+++ b/python/ql/lib/semmle/python/web/falcon/Response.qll
@@ -2,7 +2,6 @@ import python
 import semmle.python.dataflow.TaintTracking
 import semmle.python.web.Http
 import semmle.python.web.falcon.General
-import semmle.python.security.strings.External

 /** https://falcon.readthedocs.io/en/stable/api/request_and_response.html */
 deprecated class FalconResponse extends TaintKind {
--- a/python/ql/lib/semmle/python/web/pyramid/Response.qll
+++ b/python/ql/lib/semmle/python/web/pyramid/Response.qll
@@ -3,7 +3,6 @@ import semmle.python.dataflow.TaintTracking
 import semmle.python.security.strings.Basic
 import semmle.python.web.Http
 private import semmle.python.web.pyramid.View
-private import semmle.python.web.Http

 /**
 * A pyramid response, which is vulnerable to any sort of
--- a/python/ql/src/CHANGELOG.md
+++ b/python/ql/src/CHANGELOG.md
@@ -1,3 +1,15 @@
+## 0.1.2
+
+### New Queries
+
+* "XML external entity expansion" (`py/xxe`). Results will appear by default. This query was based on [an experimental query by @jorgectf](https://github.com/github/codeql/pull/6112).
+* "XML internal entity expansion" (`py/xml-bomb`). Results will appear by default. This query was based on [an experimental query by @jorgectf](https://github.com/github/codeql/pull/6112).
+* The query "CSRF protection weakened or disabled" (`py/csrf-protection-disabled`) has been implemented. Its results will now appear by default.
+
+## 0.1.1
+
+## 0.1.0
+
 ## 0.0.13

 ## 0.0.12
--- a/python/ql/src/Classes/DefineEqualsWhenAddingAttributes.ql
+++ b/python/ql/src/Classes/DefineEqualsWhenAddingAttributes.ql
@@ -11,7 +11,6 @@
 */

 import python
-import semmle.python.SelfAttribute
 import Equality

 predicate class_stores_to_attribute(ClassValue cls, SelfAttributeStore store, string name) {
--- a/python/ql/src/Classes/MaybeUndefinedClassAttribute.ql
+++ b/python/ql/src/Classes/MaybeUndefinedClassAttribute.ql
@@ -11,7 +11,6 @@
 */

 import python
-import semmle.python.SelfAttribute
 import ClassAttributes

 predicate guarded_by_other_attribute(SelfAttributeRead a, CheckClass c) {
--- a/python/ql/src/Classes/UndefinedClassAttribute.ql
+++ b/python/ql/src/Classes/UndefinedClassAttribute.ql
@@ -11,7 +11,6 @@
 */

 import python
-import semmle.python.SelfAttribute
 import ClassAttributes

 predicate undefined_class_attribute(SelfAttributeRead a, CheckClass c, int line, string name) {
--- a/python/ql/src/Expressions/Formatting/UnusedArgumentIn3101Format.ql
+++ b/python/ql/src/Expressions/Formatting/UnusedArgumentIn3101Format.ql
@@ -10,7 +10,6 @@
 * @id py/str-format/surplus-argument
 */

-import python
 import python
 import AdvancedFormatting

--- a/python/ql/src/Lexical/FCommentedOutCode.ql
+++ b/python/ql/src/Lexical/FCommentedOutCode.ql
@@ -10,7 +10,6 @@

 import python
 import Lexical.CommentedOutCode
-import python

 from File f, int n
 where n = count(CommentedOutCodeLine c | not c.maybeExampleCode() and c.getLocation().getFile() = f)
--- a/python/ql/src/Numerics/Pythagorean.ql
+++ b/python/ql/src/Numerics/Pythagorean.ql
@@ -10,36 +10,30 @@
 */

 import python
+import semmle.python.dataflow.new.DataFlow
+import semmle.python.ApiGraphs

-predicate squareOp(BinaryExpr e) {
-  e.getOp() instanceof Pow and e.getRight().(IntegerLiteral).getN() = "2"
-}
-
-predicate squareMul(BinaryExpr e) {
-  e.getOp() instanceof Mult and e.getRight().(Name).getId() = e.getLeft().(Name).getId()
-}
-
-predicate squareRef(Name e) {
-  e.isUse() and
-  exists(SsaVariable v, Expr s | v.getVariable() = e.getVariable() |
-    s = v.getDefinition().getNode().getParentNode().(AssignStmt).getValue() and
-    square(s)
+DataFlow::ExprNode squareOp() {
+  exists(BinaryExpr e | e = result.asExpr() |
+    e.getOp() instanceof Pow and e.getRight().(IntegerLiteral).getN() = "2"
  )
 }

-predicate square(Expr e) {
-  squareOp(e)
-  or
-  squareMul(e)
-  or
-  squareRef(e)
+DataFlow::ExprNode squareMul() {
+  exists(BinaryExpr e | e = result.asExpr() |
+    e.getOp() instanceof Mult and e.getRight().(Name).getId() = e.getLeft().(Name).getId()
+  )
 }

-from Call c, BinaryExpr s
+DataFlow::ExprNode square() { result in [squareOp(), squareMul()] }
+
+from DataFlow::CallCfgNode c, BinaryExpr s, DataFlow::ExprNode left, DataFlow::ExprNode right
 where
-  c.getFunc().toString() = "sqrt" and
-  c.getArg(0) = s and
+  c = API::moduleImport("math").getMember("sqrt").getACall() and
+  c.getArg(0).asExpr() = s and
  s.getOp() instanceof Add and
-  square(s.getLeft()) and
-  square(s.getRight())
+  left.asExpr() = s.getLeft() and
+  right.asExpr() = s.getRight() and
+  left.getALocalSource() = square() and
+  right.getALocalSource() = square()
 select c, "Pythagorean calculation with sub-optimal numerics"
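
Note on what this query reports: the flagged pattern squares each operand before adding, so the intermediate values can overflow even when the final result is representable. The Python sketch below is not part of the commit. The helper name `naive_hypot` is hypothetical and the input values are chosen only to trigger the overflow.

```python
import math


def naive_hypot(a: float, b: float) -> float:
    """Hypothetical helper showing the pattern the query reports."""
    # a * a overflows to inf for large inputs (a ** 2 would raise OverflowError).
    return math.sqrt(a * a + b * b)


big = 1e200
naive = naive_hypot(big, big)   # intermediate overflow, result is inf
robust = math.hypot(big, big)   # stays finite for the same inputs
```

In this sketch `math.hypot` returns a finite value where the hand-written form does not, which is why it is the usual replacement.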
--- a/python/ql/src/Resources/FileOpen.qll
+++ b/python/ql/src/Resources/FileOpen.qll
@@ -1,7 +1,6 @@
 /** Contains predicates concerning when and where files are opened and closed. */

 import python
-import semmle.python.GuardedControlFlow
 import semmle.python.pointsto.Filters

 /** Holds if `open` is a call that returns a newly opened file */
--- a/python/ql/src/Security/CWE-022/TarSlip.ql
+++ b/python/ql/src/Security/CWE-022/TarSlip.ql
@@ -81,11 +81,11 @@ class ExcludeTarFilePy extends Sanitizer {

 /* Any call to an extractall method */
 class ExtractAllSink extends TaintSink {
-  CallNode call;
-
  ExtractAllSink() {
-    this = call.getFunction().(AttrNode).getObject("extractall") and
-    count(call.getAnArg()) = 0
+    exists(CallNode call |
+      this = call.getFunction().(AttrNode).getObject("extractall") and
+      not exists(call.getAnArg())
+    )
  }

  override predicate sinks(TaintKind kind) { kind instanceof OpenTarFile }
--- a/python/ql/src/Security/CWE-079/Jinja2WithoutEscaping.ql
+++ b/python/ql/src/Security/CWE-079/Jinja2WithoutEscaping.ql
@@ -12,6 +12,8 @@
 */

 import python
+import semmle.python.dataflow.new.DataFlow
+import semmle.python.ApiGraphs

 /*
 * Jinja 2 Docs:
@@ -25,25 +27,24 @@ import python
 * safe1_tmpl = Template('Hello {{ name }}!', autoescape=True)
 */

-ClassValue jinja2EnvironmentOrTemplate() {
-  result = Value::named("jinja2.Environment")
+private API::Node jinja2EnvironmentOrTemplate() {
+  result = API::moduleImport("jinja2").getMember("Environment")
  or
-  result = Value::named("jinja2.Template")
+  result = API::moduleImport("jinja2").getMember("Template")
 }

-ControlFlowNode getAutoEscapeParameter(CallNode call) { result = call.getArgByName("autoescape") }
-
-from CallNode call
+from API::CallNode call
 where
-  call.getFunction().pointsTo(jinja2EnvironmentOrTemplate()) and
-  not exists(call.getNode().getStarargs()) and
-  not exists(call.getNode().getKwargs()) and
+  call = jinja2EnvironmentOrTemplate().getACall() and
+  not exists(call.asCfgNode().(CallNode).getNode().getStarargs()) and
+  not exists(call.asCfgNode().(CallNode).getNode().getKwargs()) and
  (
-    not exists(getAutoEscapeParameter(call))
+    not exists(call.getArgByName("autoescape"))
    or
-    exists(Value isFalse |
-      getAutoEscapeParameter(call).pointsTo(isFalse) and
-      isFalse.getDefiniteBooleanValue() = false
-    )
+    call.getKeywordParameter("autoescape")
+        .getAValueReachingRhs()
+        .asExpr()
+        .(ImmutableLiteral)
+        .booleanValue() = false
  )
 select call, "Using jinja2 templates with autoescape=False can potentially allow XSS attacks."
--- a/python/ql/src/Security/CWE-215/FlaskDebug.ql
+++ b/python/ql/src/Security/CWE-215/FlaskDebug.ql
@@ -27,9 +27,9 @@ private DataFlow::TypeTrackingNode truthyLiteral(DataFlow::TypeTracker t) {
 /** Gets a reference to a truthy literal. */
 DataFlow::Node truthyLiteral() { truthyLiteral(DataFlow::TypeTracker::end()).flowsTo(result) }

-from DataFlow::CallCfgNode call, DataFlow::Node debugArg
+from API::CallNode call, DataFlow::Node debugArg
 where
-  call.getFunction() = Flask::FlaskApp::instance().getMember("run").getAUse() and
+  call = Flask::FlaskApp::instance().getMember("run").getACall() and
  debugArg in [call.getArg(2), call.getArgByName("debug")] and
  debugArg = truthyLiteral()
 select call,
--- a/python/ql/src/Security/CWE-285/PamAuthorization.qhelp
+++ b/python/ql/src/Security/CWE-285/PamAuthorization.qhelp
@@ -0,0 +1,53 @@
+<!DOCTYPE qhelp PUBLIC "-//Semmle//qhelp//EN" "qhelp.dtd">
+<qhelp>
+  <overview>
+    <p>
+      Using only a call to
+      <code>pam_authenticate</code>
+      to check the validity of a login can lead to authorization bypass vulnerabilities.
+    </p>
+    <p>
+      A call to
+      <code>pam_authenticate</code>
+      only verifies the credentials of a user. It does not check whether the user is
+      authorized to actually log in. This means a user with an expired
+      account or password can still access the system.
+    </p>
+
+  </overview>
+
+  <recommendation>
+    <p>
+      A call to
+      <code>pam_authenticate</code>
+      should be followed by a call to
+      <code>pam_acct_mgmt</code>
+      to check if a user is allowed to log in.
+    </p>
+  </recommendation>
+
+  <example>
+    <p>
+      In the following example, the code only checks the credentials of a user. Hence,
+      in this case, a user with expired credentials can still log in. This can be
+      verified by creating a new user account, expiring it with
+      <code>chage -E0 `username` </code>
+      and then trying to log in.
+    </p>
+    <sample src="PamAuthorizationBad.py" />
+
+    <p>
+      This can be avoided by calling
+      <code>pam_acct_mgmt</code>
+      to verify access, as is done in the snippet shown below.
+    </p>
+    <sample src="PamAuthorizationGood.py" />
+  </example>
+
+  <references>
+    <li>
+      Man-Page:
+      <a href="https://man7.org/linux/man-pages/man3/pam_acct_mgmt.3.html">pam_acct_mgmt</a>
+    </li>
+  </references>
+</qhelp>
--- a/python/ql/src/Security/CWE-285/PamAuthorization.ql
+++ b/python/ql/src/Security/CWE-285/PamAuthorization.ql
@@ -0,0 +1,38 @@
+/**
+ * @name PAM authorization bypass due to incorrect usage
+ * @description Not using `pam_acct_mgmt` after `pam_authenticate` to check the validity of a login can lead to authorization bypass.
+ * @kind problem
+ * @problem.severity warning
+ * @security-severity 8.1
+ * @precision high
+ * @id py/pam-auth-bypass
+ * @tags security
+ *       external/cwe/cwe-285
+ */
+
+import python
+import semmle.python.ApiGraphs
+import experimental.semmle.python.Concepts
+import semmle.python.dataflow.new.TaintTracking
+
+API::Node libPam() {
+  exists(API::CallNode findLibCall, API::CallNode cdllCall |
+    findLibCall = API::moduleImport("ctypes").getMember("util").getMember("find_library").getACall() and
+    findLibCall.getParameter(0).getAValueReachingRhs().asExpr().(StrConst).getText() = "pam" and
+    cdllCall = API::moduleImport("ctypes").getMember("CDLL").getACall() and
+    cdllCall.getParameter(0).getAValueReachingRhs() = findLibCall
+  |
+    result = cdllCall.getReturn()
+  )
+}
+
+from API::CallNode authenticateCall, DataFlow::Node handle
+where
+  authenticateCall = libPam().getMember("pam_authenticate").getACall() and
+  handle = authenticateCall.getArg(0) and
+  not exists(API::CallNode acctMgmtCall |
+    acctMgmtCall = libPam().getMember("pam_acct_mgmt").getACall() and
+    DataFlow::localFlow(handle, acctMgmtCall.getArg(0))
+  )
+select authenticateCall,
+  "This PAM authentication call may lead to an authorization bypass, since 'pam_acct_mgmt' is not called afterwards."
--- a/python/ql/src/Security/CWE-285/PamAuthorizationBad.py
+++ b/python/ql/src/Security/CWE-285/PamAuthorizationBad.py
@@ -0,0 +1,19 @@
+libpam                    = CDLL(find_library("pam"))
+
+pam_authenticate          = libpam.pam_authenticate
+pam_authenticate.restype  = c_int
+pam_authenticate.argtypes = [PamHandle, c_int]
+
+def authenticate(username, password, service='login'):
+    def my_conv(n_messages, messages, p_response, app_data):
+        """
+        Simple conversation function that responds to any prompt where the echo is off with the supplied password
+        """
+        ...
+
+    handle = PamHandle()
+    conv   = PamConv(my_conv, 0)
+    retval = pam_start(service, username, byref(conv), byref(handle))
+
+    retval = pam_authenticate(handle, 0)
+    return retval == 0
--- a/python/ql/src/Security/CWE-285/PamAuthorizationGood.py
+++ b/python/ql/src/Security/CWE-285/PamAuthorizationGood.py
@@ -0,0 +1,25 @@
+libpam                    = CDLL(find_library("pam"))
+
+pam_authenticate          = libpam.pam_authenticate
+pam_authenticate.restype  = c_int
+pam_authenticate.argtypes = [PamHandle, c_int]
+
+pam_acct_mgmt          = libpam.pam_acct_mgmt
+pam_acct_mgmt.restype  = c_int
+pam_acct_mgmt.argtypes = [PamHandle, c_int]
+
+def authenticate(username, password, service='login'):
+    def my_conv(n_messages, messages, p_response, app_data):
+        """
+        Simple conversation function that responds to any prompt where the echo is off with the supplied password
+        """
+        ...
+
+    handle = PamHandle()
+    conv   = PamConv(my_conv, 0)
+    retval = pam_start(service, username, byref(conv), byref(handle))
+
+    retval = pam_authenticate(handle, 0)
+    if retval == 0:
+        retval = pam_acct_mgmt(handle, 0)
+    return retval == 0
--- a/python/ql/src/Security/CWE-352/CSRFProtectionDisabled.qhelp
+++ b/python/ql/src/Security/CWE-352/CSRFProtectionDisabled.qhelp
@@ -0,0 +1,60 @@
+<!DOCTYPE qhelp PUBLIC
+ "-//Semmle//qhelp//EN"
+ "qhelp.dtd">
+<qhelp>
+
+  <overview>
+    <p>
+      Cross-site request forgery (CSRF) is a type of vulnerability in which an
+      attacker is able to force a user to carry out an action that the user did
+      not intend.
+    </p>
+
+    <p>
+      The attacker tricks an authenticated user into submitting a request to the
+      web application. Typically this request will result in a state change on
+      the server, such as changing the user's password. The request can be
+      initiated when the user visits a site controlled by the attacker. If the
+      web application relies only on cookies for authentication, or on other
+      credentials that are automatically included in the request, then this
+      request will appear as legitimate to the server.
+    </p>
+
+    <p>
+      A common countermeasure for CSRF is to generate a unique token to be
+      included in the HTML sent from the server to a user. This token can be
+      used as a hidden field to be sent back with requests to the server, where
+      the server can then check that the token is valid and associated with the
+      relevant user session.
+    </p>
+  </overview>
+
+  <recommendation>
+    <p>
+      In many web frameworks, CSRF protection is enabled by default. In these
+      cases, using the default configuration is sufficient to guard against most
+      CSRF attacks.
+    </p>
+  </recommendation>
+
+  <example>
+    <p>
+      The following example shows a case where CSRF protection is disabled by
+      overriding the default middleware stack and not including the one protecting against CSRF.
+    </p>
+
+    <sample src="examples/settings.py"/>
+
+    <p>
+      The protecting middleware was probably commented out during a testing phase, when server-side token generation was not set up.
+      Simply commenting it back in will enable CSRF protection.
+    </p>
+
+  </example>
+
+  <references>
+    <li>Wikipedia: <a href="https://en.wikipedia.org/wiki/Cross-site_request_forgery">Cross-site request forgery</a></li>
+    <li>OWASP: <a href="https://owasp.org/www-community/attacks/csrf">Cross-site request forgery</a></li>
+  </references>
+
+</qhelp>
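
The token countermeasure described in the overview above can be sketched in a few lines of standard-library Python. This is an illustration only and not part of the commit. The function names `new_csrf_token` and `is_valid_csrf_token` are hypothetical, and a real application would normally rely on its framework's CSRF middleware rather than hand-rolled checks.

```python
import hmac
import secrets


def new_csrf_token() -> str:
    """Generate an unpredictable token to store in the user's session."""
    return secrets.token_urlsafe(32)


def is_valid_csrf_token(session_token: str, submitted_token: str) -> bool:
    """Compare the token from the hidden form field with the session copy."""
    # compare_digest runs in constant time, which avoids leaking how many
    # leading characters matched. Encoding to bytes also accepts non-ASCII input.
    return hmac.compare_digest(session_token.encode(), submitted_token.encode())
```

The server would embed the result of `new_csrf_token()` in the rendered form and reject any state-changing request for which `is_valid_csrf_token` returns `False`.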
--- a/python/ql/src/Security/CWE-352/CSRFProtectionDisabled.ql
+++ b/python/ql/src/Security/CWE-352/CSRFProtectionDisabled.ql
@@ -0,0 +1,37 @@
+/**
+ * @name CSRF protection weakened or disabled
+ * @description Disabling or weakening CSRF protection may make the application
+ *              vulnerable to a Cross-Site Request Forgery (CSRF) attack.
+ * @kind problem
+ * @problem.severity warning
+ * @security-severity 8.8
+ * @precision high
+ * @id py/csrf-protection-disabled
+ * @tags security
+ *       external/cwe/cwe-352
+ */
+
+import python
+import semmle.python.Concepts
+
+predicate relevantSetting(HTTP::Server::CsrfProtectionSetting s) {
+  // rule out test code as this is a common place to turn off CSRF protection.
+  // We don't use normal `TestScope` to find test files, since we also want to match
+  // a settings file such as `.../integration-tests/settings.py`
+  not s.getLocation().getFile().getAbsolutePath().matches("%test%")
+}
+
+predicate vulnerableSetting(HTTP::Server::CsrfProtectionSetting s) {
+  s.getVerificationSetting() = false and
+  not exists(HTTP::Server::CsrfLocalProtectionSetting p | p.csrfEnabled()) and
+  relevantSetting(s)
+}
+
+from HTTP::Server::CsrfProtectionSetting setting
+where
+  vulnerableSetting(setting) and
+  // We have seen examples of dummy projects with vulnerable settings alongside a main
+  // project with a protecting settings file. We want to rule out this scenario, so we
+  // require all non-test settings to be vulnerable.
+  forall(HTTP::Server::CsrfProtectionSetting s | relevantSetting(s) | vulnerableSetting(s))
+select setting, "Potential CSRF vulnerability due to forgery protection being disabled or weakened."
--- a/python/ql/src/Security/CWE-352/examples/settings.py
+++ b/python/ql/src/Security/CWE-352/examples/settings.py
@@ -0,0 +1,9 @@
+MIDDLEWARE = [
+    'django.middleware.security.SecurityMiddleware',
+    'django.contrib.sessions.middleware.SessionMiddleware',
+    'django.middleware.common.CommonMiddleware',
+    # 'django.middleware.csrf.CsrfViewMiddleware',
+    'django.contrib.auth.middleware.AuthenticationMiddleware',
+    'django.contrib.messages.middleware.MessageMiddleware',
+    'django.middleware.clickjacking.XFrameOptionsMiddleware',
+]
--- a/python/ql/src/Security/CWE-611/Xxe.qhelp
+++ b/python/ql/src/Security/CWE-611/Xxe.qhelp
@@ -0,0 +1,74 @@
+<!DOCTYPE qhelp PUBLIC "-//Semmle//qhelp//EN" "qhelp.dtd">
+<qhelp>
+
+<overview>
+<p>
+Parsing untrusted XML files with a weakly configured XML parser may lead to an
+XML External Entity (XXE) attack. This type of attack uses external entity references
+to access arbitrary files on a system, carry out denial-of-service (DoS) attacks, or server-side
+request forgery. Even when the result of parsing is not returned to the user, DoS attacks are still possible
+and out-of-band data retrieval techniques may allow attackers to steal sensitive data.
+</p>
+</overview>
+
+<recommendation>
+<p>
+The easiest way to prevent XXE attacks is to disable external entity handling when
+parsing untrusted data. How this is done depends on the library being used. Note that some
+libraries, such as recent versions of the XML libraries in the standard library of Python 3,
+disable entity expansion by default,
+so unless you have explicitly enabled entity expansion, no further action needs to be taken.
+</p>
+
+<p>
+We recommend using the <a href="https://pypi.org/project/defusedxml/">defusedxml</a>
+PyPI package, which has been created to prevent XML attacks (both XXE and XML bombs).
+</p>
+</recommendation>
+
+<example>
+<p>
+The following example uses the <code>lxml</code> XML parser to parse a string
+<code>xml_src</code>. That string is from an untrusted source, so this code is
+vulnerable to an XXE attack, since the <a href="https://lxml.de/apidoc/lxml.etree.html#lxml.etree.XMLParser">
+default parser</a> from <code>lxml.etree</code> allows local external entities to be resolved.
+</p>
+<sample src="examples/XxeBad.py"/>
+
+<p>
+To guard against XXE attacks with the <code>lxml</code> library, you should create a
+parser with <code>resolve_entities</code> set to <code>False</code>. This means that no
+entity expansion is undertaken, although standard predefined entities such as
+<code>&amp;gt;</code>, for writing <code>&gt;</code> inside the text of an XML element,
+are still allowed.
+</p>
+<sample src="examples/XxeGood.py"/>
+</example>
+
+<references>
+<li>
+OWASP:
+<a href="https://www.owasp.org/index.php/XML_External_Entity_(XXE)_Processing">XML External Entity (XXE) Processing</a>.
+</li>
+<li>
+Timothy Morgan:
+<a href="https://research.nccgroup.com/2014/05/19/xml-schema-dtd-and-entity-attacks-a-compendium-of-known-techniques/">XML Schema, DTD, and Entity Attacks</a>.
+</li>
+<li>
+Timur Yunusov, Alexey Osipov:
+<a href="https://www.slideshare.net/qqlan/bh-ready-v4">XML Out-Of-Band Data Retrieval</a>.
+</li>
+<li>
+Python 3 standard library:
+<a href="https://docs.python.org/3/library/xml.html#xml-vulnerabilities">XML Vulnerabilities</a>.
+</li>
+<li>
+Python 2 standard library:
+<a href="https://docs.python.org/2/library/xml.html#xml-vulnerabilities">XML Vulnerabilities</a>.
+</li>
+<li>
+PortSwigger:
+<a href="https://portswigger.net/web-security/xxe">XML external entity (XXE) injection</a>.
+</li>
+</references>
+</qhelp>
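
One way to picture "disabling external entity handling" with only the standard library is to refuse any document that declares an entity at all. The sketch below is an assumption-laden illustration, not part of the commit and not the implementation of `lxml` or `defusedxml`. The names `EntityDeclarationForbidden` and `check_no_entity_declarations` are hypothetical.

```python
import xml.parsers.expat


class EntityDeclarationForbidden(ValueError):
    """Hypothetical exception for documents that declare entities."""


def check_no_entity_declarations(xml_src: bytes) -> None:
    """Raise if xml_src declares any entity; return None otherwise."""
    parser = xml.parsers.expat.ParserCreate()

    def reject_entity(name, is_parameter_entity, value, base,
                      system_id, public_id, notation_name):
        # Called by expat for every <!ENTITY ...> declaration in the DTD.
        raise EntityDeclarationForbidden(f"entity declaration not allowed: {name!r}")

    parser.EntityDeclHandler = reject_entity
    # Exceptions raised inside a handler propagate out of Parse().
    parser.Parse(xml_src, True)
```

A caller could run this check on untrusted input before handing it to the real parser. Malformed XML still raises `xml.parsers.expat.ExpatError` as usual.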
--- a/python/ql/src/Security/CWE-611/Xxe.ql
+++ b/python/ql/src/Security/CWE-611/Xxe.ql
@@ -0,0 +1,23 @@
+/**
+ * @name XML external entity expansion
+ * @description Parsing user input as an XML document with external
+ *              entity expansion is vulnerable to XXE attacks.
+ * @kind path-problem
+ * @problem.severity error
+ * @security-severity 9.1
+ * @precision high
+ * @id py/xxe
+ * @tags security
+ *       external/cwe/cwe-611
+ *       external/cwe/cwe-827
+ */
+
+import python
+import semmle.python.security.dataflow.XxeQuery
+import DataFlow::PathGraph
+
+from Configuration cfg, DataFlow::PathNode source, DataFlow::PathNode sink
+where cfg.hasFlowPath(source, sink)
+select sink.getNode(), source, sink,
+  "A $@ is parsed as XML without guarding against external entity expansion.", source.getNode(),
+  "user-provided value"
--- a/python/ql/src/Security/CWE-611/examples/XxeBad.py
+++ b/python/ql/src/Security/CWE-611/examples/XxeBad.py
@@ -0,0 +1,10 @@
+from flask import Flask, request
+import lxml.etree
+
+app = Flask(__name__)
+
+@app.post("/upload")
+def upload():
+    xml_src = request.get_data()
+    doc = lxml.etree.fromstring(xml_src)
+    return lxml.etree.tostring(doc)
--- a/python/ql/src/Security/CWE-611/examples/XxeGood.py
+++ b/python/ql/src/Security/CWE-611/examples/XxeGood.py
@@ -0,0 +1,11 @@
+from flask import Flask, request
+import lxml.etree
+
+app = Flask(__name__)
+
+@app.post("/upload")
+def upload():
+    xml_src = request.get_data()
+    parser = lxml.etree.XMLParser(resolve_entities=False)
+    doc = lxml.etree.fromstring(xml_src, parser=parser)
+    return lxml.etree.tostring(doc)
--- a/python/ql/src/Security/CWE-776/XmlBomb.qhelp
+++ b/python/ql/src/Security/CWE-776/XmlBomb.qhelp
@@ -0,0 +1,74 @@
+<!DOCTYPE qhelp PUBLIC "-//Semmle//qhelp//EN" "qhelp.dtd">
+<qhelp>
+
+<overview>
+<p>
+Parsing untrusted XML files with a weakly configured XML parser may be vulnerable to
+denial-of-service (DoS) attacks exploiting uncontrolled internal entity expansion.
+</p>
+<p>
+In XML, so-called <i>internal entities</i> are a mechanism for introducing an abbreviation
+for a piece of text or part of a document. When a parser that has been configured
+to expand entities encounters a reference to an internal entity, it replaces the entity
+by the data it represents. The replacement text may itself contain other entity references,
+which are expanded recursively. This means that entity expansion can increase document size
+dramatically.
+</p>
+<p>
+If untrusted XML is parsed with entity expansion enabled, a malicious attacker could
+submit a document that contains very deeply nested entity definitions, causing the parser
+to take a very long time or use large amounts of memory. This is sometimes called an
+<i>XML bomb</i> attack.
+</p>
+</overview>
+
+<recommendation>
+<p>
+The safest way to prevent XML bomb attacks is to disable entity expansion when parsing untrusted
+data. Whether this can be done depends on the library being used. Note that some libraries, such as
+<code>lxml</code>, have measures enabled by default to prevent such DoS XML attacks, so
+unless you have explicitly set <code>huge_tree</code> to <code>True</code>, no further action is needed.
+</p>
+
+<p>
+We recommend using the <a href="https://pypi.org/project/defusedxml/">defusedxml</a>
+PyPI package, which has been created to prevent XML attacks (both XXE and XML bombs).
+</p>
+</recommendation>
+
+<example>
+<p>
+The following example uses the <code>xml.etree</code> XML parser provided by the Python standard library to
+parse a string <code>xml_src</code>. That string is from an untrusted source, so this code is
+vulnerable to a DoS attack, since the <code>xml.etree</code> XML parser expands internal entities by default:
+</p>
+<sample src="examples/XmlBombBad.py"/>
+
+<p>
+It is not possible to guard against internal entity expansion with
+<code>xml.etree</code>, so to guard against these attacks, the following example uses
+the <a href="https://pypi.org/project/defusedxml/">defusedxml</a>
+PyPI package instead, which is not exposed to such internal entity expansion attacks.
+</p>
+<sample src="examples/XmlBombGood.py"/>
+</example>
+
+<references>
+<li>
+Wikipedia:
+<a href="https://en.wikipedia.org/wiki/Billion_laughs">Billion Laughs</a>.
+</li>
+<li>
+Bryan Sullivan:
+<a href="https://msdn.microsoft.com/en-us/magazine/ee335713.aspx">Security Briefs - XML Denial of Service Attacks and Defenses</a>.
+</li>
+<li>
+Python 3 standard library:
+<a href="https://docs.python.org/3/library/xml.html#xml-vulnerabilities">XML Vulnerabilities</a>.
+</li>
+<li>
+Python 2 standard library:
+<a href="https://docs.python.org/2/library/xml.html#xml-vulnerabilities">XML Vulnerabilities</a>.
+</li>
+</references>
+</qhelp>
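
The growth described in the overview above is exponential in the nesting depth: with `fanout` references per level and `levels` levels, the expanded text is `fanout ** levels` copies of the base string. The sketch below is not part of the commit. The helper names are hypothetical and the document is kept deliberately tiny. Recent versions of the underlying Expat library may reject very large expansions, so only the small case is actually parsed.

```python
import xml.etree.ElementTree as ET


def expanded_length(levels: int, fanout: int, base_len: int) -> int:
    """Length of the fully expanded text of the top-level entity."""
    return base_len * fanout ** levels


def build_nested_entities(levels: int, fanout: int, base: str = "lol") -> str:
    """Build a small document whose entity e<levels> nests `levels` deep."""
    decls = [f'<!ENTITY e0 "{base}">']
    for i in range(1, levels + 1):
        refs = f"&e{i - 1};" * fanout
        decls.append(f'<!ENTITY e{i} "{refs}">')
    dtd = "".join(decls)
    return f"<!DOCTYPE d [{dtd}]><d>&e{levels};</d>"


# Tiny instance: 3 levels, 4 references per level, 2-character base string.
doc = build_nested_entities(3, 4, "ab")
text = ET.fromstring(doc).text
```

With the classic parameters (9 levels, 10 references per level, a 3-character base string) `expanded_length` gives 3,000,000,000 characters from a document of under a kilobyte.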
--- a/python/ql/src/Security/CWE-776/XmlBomb.ql
+++ b/python/ql/src/Security/CWE-776/XmlBomb.ql
@@ -0,0 +1,23 @@
+/**
+ * @name XML internal entity expansion
+ * @description Parsing user input as an XML document with arbitrary internal
+ *              entity expansion is vulnerable to denial-of-service attacks.
+ * @kind path-problem
+ * @problem.severity warning
+ * @security-severity 7.5
+ * @precision high
+ * @id py/xml-bomb
+ * @tags security
+ *       external/cwe/cwe-776
+ *       external/cwe/cwe-400
+ */
+
+import python
+import semmle.python.security.dataflow.XmlBombQuery
+import DataFlow::PathGraph
+
+from Configuration cfg, DataFlow::PathNode source, DataFlow::PathNode sink
+where cfg.hasFlowPath(source, sink)
+select sink.getNode(), source, sink,
+  "A $@ is parsed as XML without guarding against uncontrolled entity expansion.", source.getNode(),
+  "user-provided value"
--- a/python/ql/src/Security/CWE-776/examples/XmlBombBad.py
+++ b/python/ql/src/Security/CWE-776/examples/XmlBombBad.py
@@ -0,0 +1,10 @@
+from flask import Flask, request
+import xml.etree.ElementTree as ET
+
+app = Flask(__name__)
+
+@app.post("/upload")
+def upload():
+    xml_src = request.get_data()
+    doc = ET.fromstring(xml_src)
+    return ET.tostring(doc)
--- a/python/ql/src/Security/CWE-776/examples/XmlBombGood.py
+++ b/python/ql/src/Security/CWE-776/examples/XmlBombGood.py
@@ -0,0 +1,10 @@
+from flask import Flask, request
+import defusedxml.ElementTree as ET
+
+app = Flask(__name__)
+
+@app.post("/upload")
+def upload():
+    xml_src = request.get_data()
+    doc = ET.fromstring(xml_src)
+    return ET.tostring(doc)
--- a/python/ql/src/analysis/Consistency.ql
+++ b/python/ql/src/analysis/Consistency.ql
@@ -194,7 +194,7 @@ predicate function_object_consistency(string clsname, string problem, string wha
  exists(FunctionObject func | clsname = func.getAQlClass() |
    what = func.getName() and
    (
-      count(func.descriptiveString()) = 0 and problem = "no descriptiveString()"
+      not exists(func.descriptiveString()) and problem = "no descriptiveString()"
      or
      exists(int c | c = strictcount(func.descriptiveString()) and c > 1 |
        problem = c + "descriptiveString()s"
--- a/python/ql/src/analysis/ImportFailure.qhelp
+++ b/python/ql/src/analysis/ImportFailure.qhelp
@@ -21,7 +21,7 @@ Ensure that all required modules and packages can be found when running the extr
 </recommendation>
 <references>

-<li>Semmle Tutorial: <a href="https://help.semmle.com/codeql/codeql-cli/procedures/create-codeql-database.html">Creating a CodeQL database</a>.</li>
+<li>CodeQL Tutorial: <a href="https://codeql.github.com/docs/codeql-cli/creating-codeql-databases">Creating CodeQL databases</a>.</li>


 </references>
--- a/python/ql/src/change-notes/2022-05-10-promote-pam-authentication-bypass.md
+++ b/python/ql/src/change-notes/2022-05-10-promote-pam-authentication-bypass.md
@@ -0,0 +1,4 @@
+---
+category: newQuery
+---
+* The query "PAM authorization bypass due to incorrect usage" (`py/pam-auth-bypass`) has been promoted from experimental to the main query pack. Its results will now appear by default. This query was originally [submitted as an experimental query by @porcupineyhairs](https://github.com/github/codeql/pull/8595).