Docs: Update data flow documentation to the new API.

2026-04-29 18:55:14 +02:00 · 2023-07-13 09:21:12 +02:00
parent a0e96594d8
commit 2947f176ef
18 changed files with 352 additions and 431 deletions
--- a/docs/codeql/writing-codeql-queries/creating-path-queries.rst
+++ b/docs/codeql/writing-codeql-queries/creating-path-queries.rst
@@ -58,18 +58,22 @@ You should use the following template:
    import <language>
    // For some languages (Java/C++/Python/Swift) you need to explicitly import the data flow library, such as
    // import semmle.code.java.dataflow.DataFlow or import codeql.swift.dataflow.DataFlow
-    import DataFlow::PathGraph
    ...

-    from MyConfiguration config, DataFlow::PathNode source, DataFlow::PathNode sink
-    where config.hasFlowPath(source, sink)
+    module Flow = DataFlow::Global<MyConfiguration>;
+    import Flow::PathGraph
+
+    from Flow::PathNode source, Flow::PathNode sink
+    where Flow::flowPath(source, sink)
    select sink.getNode(), source, sink, "<message>"

 Where:

- ``DataFlow::Pathgraph`` is the path graph module you need to import from the standard CodeQL libraries.
- ``source`` and ``sink`` are nodes on the `path graph <https://en.wikipedia.org/wiki/Path_graph>`__, and ``DataFlow::PathNode`` is their type.
- ``MyConfiguration`` is a class containing the predicates which define how data may flow between the ``source`` and the ``sink``.
+- ``MyConfiguration`` is a module containing the predicates that define how data may flow between the ``source`` and the ``sink``.
+- ``Flow`` is the result of the data flow computation based on ``MyConfiguration``.
+- ``Flow::Pathgraph`` is the resulting data flow graph module you need to import in order to include path explanations in the query.
+- ``source`` and ``sink`` are nodes in the graph as defined in the configuration, and ``Flow::PathNode`` is their type.
+- ``DataFlow::Global<..>`` is an invocation of data flow. ``TaintTracking::Global<..>`` can be used instead to include a default set of additional taint steps.


 The following sections describe the main requirements for a valid path query.
@@ -83,14 +87,14 @@ The other metadata requirements depend on how you intend to run the query. For m
 Generating path explanations
 ****************************

-In order to generate path explanations, your query needs to compute a `path graph <https://en.wikipedia.org/wiki/Path_graph>`__.
+In order to generate path explanations, your query needs to compute a graph.
 To do this you need to define a :ref:`query predicate <query-predicates>` called ``edges`` in your query.
 This predicate defines the edge relations of the graph you are computing, and it is used to compute the paths related to each result that your query generates.
 You can import a predefined ``edges`` predicate from a path graph module in one of the standard data flow libraries. In addition to the path graph module, the data flow libraries contain the other ``classes``, ``predicates``, and ``modules`` that are commonly used in data flow analysis.

 .. code-block:: ql

-    import DataFlow::PathGraph
+    import MyFlow::PathGraph

 This statement imports the ``PathGraph`` module from the data flow library (``DataFlow.qll``), in which ``edges`` is defined.

@@ -106,7 +110,7 @@ You can also define your own ``edges`` predicate in the body of your query. It s
 .. code-block:: ql

    query predicate edges(PathNode a, PathNode b) {
-    /** Logical conditions which hold if `(a,b)` is an edge in the data flow graph */
+      /* Logical conditions which hold if `(a,b)` is an edge in the data flow graph */
    }

 For more examples of how to define an ``edges`` predicate, visit the `standard CodeQL libraries <https://codeql.github.com/codeql-standard-libraries>`__ and search for ``edges``.
@@ -117,14 +121,23 @@ Declaring sources and sinks
 You must provide information about the ``source`` and ``sink`` in your path query. These are objects that correspond to the nodes of the paths that you are exploring.
 The name and the type of the ``source`` and the ``sink`` must be declared in the ``from`` statement of the query, and the types must be compatible with the nodes of the graph computed by the ``edges`` predicate.

-If you are querying C/C++, C#, Go, Java, JavaScript, Python, or Ruby code (and you have used ``import DataFlow::PathGraph`` in your query), the definitions of the ``source`` and ``sink`` are accessed via the ``Configuration`` class in the data flow library. You should declare all three of these objects in the ``from`` statement.
+If you are querying C/C++, C#, Go, Java, JavaScript, Python, or Ruby code (and you have used ``import MyFlow::PathGraph`` in your query), the definitions of the ``source`` and ``sink`` are accessed via the module resulting from the application of the ``Global<..>`` module in the data flow library. You should declare both of these objects in the ``from`` statement.
 For example:

 .. code-block:: ql

-    from Configuration config, DataFlow::PathNode source, DataFlow::PathNode sink
+    module MyFlow = DataFlow::Global<MyConfiguration>;

-The configuration class is accessed by importing the data flow library. This class contains the predicates which define how data flow is treated in the query:
+    from MyFlow::PathNode source, MyFlow::PathNode sink
+
+The configuration module must be defined to include definitions of sources and sinks. For example:
+
+.. code-block:: ql
+
+    module MyConfiguration implements DataFlow::ConfigSig {
+      predicate isSource(DataFlow::Node source) { ... }
+      predicate isSink(DataFlow::Node source) { ... }
+    }

 - ``isSource()`` defines where data may flow from.
 - ``isSink()`` defines where data may flow to.
@@ -141,11 +154,11 @@ This clause can use :ref:`aggregations <aggregations>`, :ref:`predicates <predic

 When writing a path queries, you would typically include a predicate that holds only if data flows from the ``source`` to the ``sink``.

-You can use the ``hasFlowPath`` predicate to specify flow from the ``source`` to the ``sink`` for a given ``Configuration``:
+You can use the ``flowPath`` predicate to specify flow from the ``source`` to the ``sink`` for a given ``Configuration``:

 .. code-block:: ql

-    where config.hasFlowPath(source, sink)
+    where MyFlow::flowPath(source, sink)


 Select clause
--- a/docs/codeql/writing-codeql-queries/debugging-data-flow-queries-using-partial-flow.rst
+++ b/docs/codeql/writing-codeql-queries/debugging-data-flow-queries-using-partial-flow.rst
@@ -11,24 +11,24 @@ A typical data-flow query looks like this:
 .. code-block:: ql


-    class MyConfig extends TaintTracking::Configuration {
-      MyConfig() { this = "MyConfig" }
+    module MyConfig implements DataFlow::ConfigSig {
+      predicate isSource(DataFlow::Node node) { node instanceof MySource }

-      override predicate isSource(DataFlow::Node node) { node instanceof MySource }
-
-      override predicate isSink(DataFlow::Node node) { node instanceof MySink }
+      predicate isSink(DataFlow::Node node) { node instanceof MySink }
    }

-    from MyConfig config, DataFlow::PathNode source, DataFlow::PathNode sink
-    where config.hasFlowPath(source, sink)
+    module MyFlow = TaintTracking::Global<MyConfig>;
+
+    from MyFlow::PathNode source, MyFlow::PathNode sink
+    where MyFlow::flowPath(source, sink)
    select sink.getNode(), source, sink, "Sink is reached from $@.", source.getNode(), "here"

 The same query can be slightly simplified by rewriting it without :ref:`path explanations <creating-path-queries>`:

 .. code-block:: ql

-    from MyConfig config, DataFlow::Node source, DataFlow::Node sink
-    where config.hasPath(source, sink)
+    from DataFlow::Node source, DataFlow::Node sink
+    where MyFlow::flow(source, sink)
    select sink, "Sink is reached from $@.", source.getNode(), "here"

 If a data-flow query that you have written doesn't produce the results you expect it to, there may be a problem with your query.
@@ -48,7 +48,7 @@ Data-flow configurations contain a parameter called ``fieldFlowBranchLimit``. If

 .. code-block:: ql

-    override int fieldFlowBranchLimit() { result = 5000 }
+    int fieldFlowBranchLimit() { result = 5000 }

 If there are still no results and performance is still useable, then it is best to leave this set to a high value while doing further debugging.

@@ -57,7 +57,7 @@ Partial flow

 A naive next step could be to change the sink definition to ``any()``. This would mean that we would get a lot of flow to all the places that are reachable from the sources. While this approach may work in some cases, you might find that it produces so many results that it's very hard to explore the findings. It can also dramatically affect query performance. More importantly, you might not even see all the partial flow paths. This is because the data-flow library tries very hard to prune impossible paths and, since field stores and reads must be evenly matched along a path, we will never see paths going through a store that fail to reach a corresponding read. This can make it hard to see where flow actually stops.

-To avoid these problems, a data-flow ``Configuration`` comes with a mechanism for exploring partial flow that tries to deal with these caveats. This is the ``Configuration.hasPartialFlow`` predicate:
+To avoid these problems, the data-flow library comes with a mechanism for exploring partial flow that tries to deal with these caveats. This is the ``MyFlow::FlowExploration<explorationLimit/0>::partialFlow`` predicate:

 .. code-block:: ql

@@ -71,25 +71,23 @@ To avoid these problems, a data-flow ``Configuration`` comes with a mechanism fo
       * perform poorly if the number of sources is too big and/or the exploration
       * limit is set too high without using barriers.
       *
-       * This predicate is disabled (has no results) by default. Override
-       * `explorationLimit()` with a suitable number to enable this predicate.
-       *
       * To use this in a `path-problem` query, import the module `PartialPathGraph`.
       */
-      final predicate hasPartialFlow(PartialPathNode source, PartialPathNode node, int dist) {
+      predicate partialFlow(PartialPathNode source, PartialPathNode node, int dist) {

-There is also a ``Configuration.hasPartialFlowRev`` for exploring flow backwards from a sink.
+There is also a ``partialFlowRev`` for exploring flow backwards from a sink.

-As noted in the documentation for ``hasPartialFlow`` (for example, in the 
-`CodeQL for Java documentation <https://codeql.github.com/codeql-standard-libraries/java/semmle/code/java/dataflow/internal/DataFlowImpl2.qll/predicate.DataFlowImpl2$Configuration$hasPartialFlow.3.html>`__) you must first enable this by adding an override of ``explorationLimit``. For example:
+To get access to these predicates you must instantiate the ``MyFlow::FlowExploration<>`` module with an exploration limit. For example:

 .. code-block:: ql

-    override int explorationLimit() { result = 5 }
+    int explorationLimit() { result = 5 }

-This defines the exploration radius within which ``hasPartialFlow`` returns results.
+    module MyPartialFlow = MyFlow::FlowExploration<explorationLimit/0>;

-To get good performance when using ``hasPartialFlow`` it is important to ensure the ``isSink`` predicate of the configuration has no results. Likewise, when using ``hasPartialFlowRev`` the ``isSource`` predicate of the configuration should have no results.
+This defines the exploration radius within which ``partialFlow`` returns results.
+
+To get good performance when using ``partialFlow`` it is important to ensure the ``isSink`` predicate of the configuration has no results. Likewise, when using ``partialFlowRev`` the ``isSource`` predicate of the configuration should have no results.

 It is also useful to focus on a single source at a time as the starting point for the flow exploration. This is most easily done by adding a temporary restriction in the ``isSource`` predicate.

@@ -97,9 +95,9 @@ To do quick evaluations of partial flow it is often easiest to add a predicate t

 .. code-block:: ql

-    predicate adhocPartialFlow(Callable c, PartialPathNode n, Node src, int dist) {
-      exists(MyConfig conf, PartialPathNode source |
-        conf.hasPartialFlow(source, n, dist) and
+    predicate adhocPartialFlow(Callable c, MyPartialFlow::PartialPathNode n, Node src, int dist) {
+      exists(MyPartialFlow::PartialPathNode source |
+        MyPartialFlow::partialFlow(source, n, dist) and
        src = source.getNode() and
        c = n.getNode().getEnclosingCallable()
      )
@@ -111,7 +109,7 @@ If you are focusing on a single source then the ``src`` column is superfluous. Y
 If you see a large number of partial flow results, you can focus them in a couple of ways: 

 - If flow travels a long distance following an expected path, that can result in a lot of uninteresting flow being included in the exploration radius. To reduce the amount of uninteresting flow, you can replace the source definition with a suitable ``node`` that appears along the path and restart the partial flow exploration from that point. 
- Creative use of barriers and sanitizers can be used to cut off flow paths that are uninteresting. This also reduces the number of partial flow results to explore while debugging.
+- Creative use of barriers can be used to cut off flow paths that are uninteresting. This also reduces the number of partial flow results to explore while debugging.

 Further reading
 ----------------