Docs: Update data flow documentation to the new API.

Anders Schack-Mulligen
2023-07-13 09:21:12 +02:00
parent a0e96594d8
commit 2947f176ef
18 changed files with 352 additions and 431 deletions

@@ -168,74 +168,61 @@ Global data flow tracks data flow throughout the entire program, and is therefor
Using global data flow
~~~~~~~~~~~~~~~~~~~~~~
The global data flow library is used by extending the class ``DataFlow::Configuration`` as follows:
The global data flow library is used by implementing the signature ``DataFlow::ConfigSig`` and applying the module ``DataFlow::Global<ConfigSig>`` as follows:
.. code-block:: ql
import semmle.code.cpp.dataflow.new.DataFlow
class MyDataFlowConfiguration extends DataFlow::Configuration {
MyDataFlowConfiguration() { this = "MyDataFlowConfiguration" }
override predicate isSource(DataFlow::Node source) {
module MyFlowConfiguration implements DataFlow::ConfigSig {
predicate isSource(DataFlow::Node source) {
...
}
override predicate isSink(DataFlow::Node sink) {
predicate isSink(DataFlow::Node sink) {
...
}
}
module MyFlow = DataFlow::Global<MyFlowConfiguration>;
The following predicates are defined in the configuration:
- ``isSource``—defines where data may flow from
- ``isSink``—defines where data may flow to
- ``isBarrier``—optional, restricts the data flow
- ``isBarrierGuard``—optional, restricts the data flow
- ``isAdditionalFlowStep``—optional, adds additional flow steps
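For orientation, the optional predicates can be sketched next to the required ones. The signatures below are copied from the ``isBarrier`` and ``isAdditionalFlowStep`` uses that appear later in this diff. The names ``SketchConfiguration`` and ``SketchFlow`` and the ``none()`` bodies are placeholders introduced for illustration only. ``isBarrierGuard`` is left out because this diff does not show a signature for it under ``DataFlow::ConfigSig``.

.. code-block:: ql

    import cpp
    import semmle.code.cpp.dataflow.new.DataFlow

    // Hypothetical configuration: every body is a placeholder that matches nothing.
    module SketchConfiguration implements DataFlow::ConfigSig {
      // Required: nodes where data may flow from.
      predicate isSource(DataFlow::Node source) { none() }

      // Required: nodes where data may flow to.
      predicate isSink(DataFlow::Node sink) { none() }

      // Optional: nodes that block flow.
      predicate isBarrier(DataFlow::Node node) { none() }

      // Optional: extra steps from `pred` to `succ`.
      predicate isAdditionalFlowStep(DataFlow::Node pred, DataFlow::Node succ) { none() }
    }

    // Applying the module makes the `flow` predicate available as `SketchFlow::flow`.
    module SketchFlow = DataFlow::Global<SketchConfiguration>;

With these placeholder bodies the analysis finds no flow; replacing ``none()`` in ``isSource`` and ``isSink`` is the minimum needed to get results.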
The characteristic predicate ``MyDataFlowConfiguration()`` defines the name of the configuration, so ``"MyDataFlowConfiguration"`` should be replaced by the name of your class.
The data flow analysis is performed using the predicate ``hasFlow(DataFlow::Node source, DataFlow::Node sink)``:
The data flow analysis is performed using the predicate ``flow(DataFlow::Node source, DataFlow::Node sink)``:
.. code-block:: ql
from MyDataFlowConfiguration dataflow, DataFlow::Node source, DataFlow::Node sink
where dataflow.hasFlow(source, sink)
from DataFlow::Node source, DataFlow::Node sink
where MyFlow::flow(source, sink)
select source, "Data flow to $@.", sink, sink.toString()
Using global taint tracking
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Global taint tracking is to global data flow as local taint tracking is to local data flow. That is, global taint tracking extends global data flow with additional non-value-preserving steps. The global taint tracking library is used by extending the class ``TaintTracking::Configuration`` as follows:
Global taint tracking is to global data flow as local taint tracking is to local data flow. That is, global taint tracking extends global data flow with additional non-value-preserving steps. The global taint tracking library is used by applying the module ``TaintTracking::Global<ConfigSig>`` to your configuration instead of ``DataFlow::Global<ConfigSig>`` as follows:
.. code-block:: ql
import semmle.code.cpp.dataflow.new.TaintTracking
class MyTaintTrackingConfiguration extends TaintTracking::Configuration {
MyTaintTrackingConfiguration() { this = "MyTaintTrackingConfiguration" }
override predicate isSource(DataFlow::Node source) {
module MyFlowConfiguration implements DataFlow::ConfigSig {
predicate isSource(DataFlow::Node source) {
...
}
override predicate isSink(DataFlow::Node sink) {
predicate isSink(DataFlow::Node sink) {
...
}
}
The following predicates are defined in the configuration:
module MyFlow = TaintTracking::Global<MyFlowConfiguration>;
- ``isSource``—defines where taint may flow from
- ``isSink``—defines where taint may flow to
- ``isSanitizer``—optional, restricts the taint flow
- ``isSanitizerGuard``—optional, restricts the taint flow
- ``isAdditionalTaintStep``—optional, adds additional taint steps
Similar to global data flow, the characteristic predicate ``MyTaintTrackingConfiguration()`` defines the unique name of the configuration, so ``"MyTaintTrackingConfiguration"`` should be replaced by the name of your class.
The taint tracking analysis is performed using the predicate ``hasFlow(DataFlow::Node source, DataFlow::Node sink)``.
The resulting module exposes the same predicates, such as ``flow``, as the one obtained from ``DataFlow::Global<ConfigSig>``.
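Because the configuration is an ordinary module, the same configuration can be passed to both ``DataFlow::Global`` and ``TaintTracking::Global``. The following is a minimal sketch of that idea, not text from the commit: the source and sink predicates are borrowed from the ``ntohl`` example in this diff, and the names ``NtohlToOffsetConfig``, ``NtohlValueFlow``, and ``NtohlTaintFlow`` are made up for illustration.

.. code-block:: ql

    import cpp
    import semmle.code.cpp.dataflow.new.DataFlow
    import semmle.code.cpp.dataflow.new.TaintTracking

    // Hypothetical configuration reusing the source and sink of the ntohl example.
    module NtohlToOffsetConfig implements DataFlow::ConfigSig {
      predicate isSource(DataFlow::Node node) {
        node.asExpr().(FunctionCall).getTarget().hasGlobalName("ntohl")
      }

      predicate isSink(DataFlow::Node node) {
        exists(ArrayExpr ae | node.asExpr() = ae.getArrayOffset())
      }
    }

    // Value-preserving flow only.
    module NtohlValueFlow = DataFlow::Global<NtohlToOffsetConfig>;

    // Same configuration, additionally following non-value-preserving steps.
    module NtohlTaintFlow = TaintTracking::Global<NtohlToOffsetConfig>;

    // Report pairs found by taint tracking but not by plain data flow.
    from DataFlow::Node source, DataFlow::Node sink
    where
      NtohlTaintFlow::flow(source, sink) and
      not NtohlValueFlow::flow(source, sink)
    select sink, "This array offset is reached by taint tracking but not by data flow from $@.",
      source, "a call to 'ntohl'"

Both modules are queried through ``flow`` in the same way, so switching between the two analyses only changes the line that applies the module.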
Examples
~~~~~~~~
@@ -247,17 +234,15 @@ The following data flow configuration tracks data flow from environment variable
import cpp
import semmle.code.cpp.dataflow.new.DataFlow
class EnvironmentToFileConfiguration extends DataFlow::Configuration {
EnvironmentToFileConfiguration() { this = "EnvironmentToFileConfiguration" }
override predicate isSource(DataFlow::Node source) {
module EnvironmentToFileConfiguration implements DataFlow::ConfigSig {
predicate isSource(DataFlow::Node source) {
exists(Function getenv |
source.asIndirectExpr(1).(FunctionCall).getTarget() = getenv and
getenv.hasGlobalName("getenv")
)
}
override predicate isSink(DataFlow::Node sink) {
predicate isSink(DataFlow::Node sink) {
exists(FunctionCall fc |
sink.asIndirectExpr(1) = fc.getArgument(0) and
fc.getTarget().hasGlobalName("fopen")
@@ -265,16 +250,17 @@ The following data flow configuration tracks data flow from environment variable
}
}
module EnvironmentToFileFlow = DataFlow::Global<EnvironmentToFileConfiguration>;
from
Expr getenv, Expr fopen, EnvironmentToFileConfiguration config, DataFlow::Node source,
DataFlow::Node sink
Expr getenv, Expr fopen, DataFlow::Node source, DataFlow::Node sink
where
source.asIndirectExpr(1) = getenv and
sink.asIndirectExpr(1) = fopen and
config.hasFlow(source, sink)
EnvironmentToFileFlow::flow(source, sink)
select fopen, "This 'fopen' uses data from $@.", getenv, "call to 'getenv'"
The following taint-tracking configuration tracks data from a call to ``ntohl`` to an array index operation. It uses the ``Guards`` library to recognize expressions that have been bounds-checked, and defines ``isSanitizer`` to prevent taint from propagating through them. It also uses ``isAdditionalTaintStep`` to add flow from loop bounds to loop indexes.
The following taint-tracking configuration tracks data from a call to ``ntohl`` to an array index operation. It uses the ``Guards`` library to recognize expressions that have been bounds-checked, and defines ``isBarrier`` to prevent taint from propagating through them. It also uses ``isAdditionalFlowStep`` to add flow from loop bounds to loop indexes.
.. code-block:: ql
@@ -282,18 +268,16 @@ The following taint-tracking configuration tracks data from a call to ``ntohl``
import semmle.code.cpp.controlflow.Guards
import semmle.code.cpp.dataflow.new.TaintTracking
class NetworkToBufferSizeConfiguration extends TaintTracking::Configuration {
NetworkToBufferSizeConfiguration() { this = "NetworkToBufferSizeConfiguration" }
override predicate isSource(DataFlow::Node node) {
module NetworkToBufferSizeConfiguration implements DataFlow::ConfigSig {
predicate isSource(DataFlow::Node node) {
node.asExpr().(FunctionCall).getTarget().hasGlobalName("ntohl")
}
override predicate isSink(DataFlow::Node node) {
predicate isSink(DataFlow::Node node) {
exists(ArrayExpr ae | node.asExpr() = ae.getArrayOffset())
}
override predicate isAdditionalTaintStep(DataFlow::Node pred, DataFlow::Node succ) {
predicate isAdditionalFlowStep(DataFlow::Node pred, DataFlow::Node succ) {
exists(Loop loop, LoopCounter lc |
loop = lc.getALoop() and
loop.getControllingExpr().(RelationalOperation).getGreaterOperand() = pred.asExpr()
@@ -302,7 +286,7 @@ The following taint-tracking configuration tracks data from a call to ``ntohl``
)
}
override predicate isSanitizer(DataFlow::Node node) {
predicate isBarrier(DataFlow::Node node) {
exists(GuardCondition gc, Variable v |
gc.getAChild*() = v.getAnAccess() and
node.asExpr() = v.getAnAccess() and
@@ -312,8 +296,10 @@ The following taint-tracking configuration tracks data from a call to ``ntohl``
}
}
from DataFlow::Node ntohl, DataFlow::Node offset, NetworkToBufferSizeConfiguration conf
where conf.hasFlow(ntohl, offset)
module NetworkToBufferSizeFlow = TaintTracking::Global<NetworkToBufferSizeConfiguration>;
from DataFlow::Node ntohl, DataFlow::Node offset
where NetworkToBufferSizeFlow::flow(ntohl, offset)
select offset, "This array offset may be influenced by $@.", ntohl,
"converted data from the network"
@@ -353,14 +339,12 @@ Exercise 2
import cpp
import semmle.code.cpp.dataflow.new.DataFlow
class LiteralToGethostbynameConfiguration extends DataFlow::Configuration {
LiteralToGethostbynameConfiguration() { this = "LiteralToGethostbynameConfiguration" }
override predicate isSource(DataFlow::Node source) {
module LiteralToGethostbynameConfiguration implements DataFlow::ConfigSig {
predicate isSource(DataFlow::Node source) {
source.asIndirectExpr(1) instanceof StringLiteral
}
override predicate isSink(DataFlow::Node sink) {
predicate isSink(DataFlow::Node sink) {
exists(FunctionCall fc |
sink.asIndirectExpr(1) = fc.getArgument(0) and
fc.getTarget().hasName("gethostbyname")
@@ -368,13 +352,14 @@ Exercise 2
}
}
module LiteralToGethostbynameFlow = DataFlow::Global<LiteralToGethostbynameConfiguration>;
from
StringLiteral sl, FunctionCall fc, LiteralToGethostbynameConfiguration cfg, DataFlow::Node source,
DataFlow::Node sink
StringLiteral sl, FunctionCall fc, DataFlow::Node source, DataFlow::Node sink
where
source.asIndirectExpr(1) = sl and
sink.asIndirectExpr(1) = fc.getArgument(0) and
cfg.hasFlow(source, sink)
LiteralToGethostbynameFlow::flow(source, sink)
select sl, fc
Exercise 3
@@ -401,12 +386,10 @@ Exercise 4
GetenvSource() { this.asIndirectExpr(1).(FunctionCall).getTarget().hasGlobalName("getenv") }
}
class GetenvToGethostbynameConfiguration extends DataFlow::Configuration {
GetenvToGethostbynameConfiguration() { this = "GetenvToGethostbynameConfiguration" }
module GetenvToGethostbynameConfiguration implements DataFlow::ConfigSig {
predicate isSource(DataFlow::Node source) { source instanceof GetenvSource }
override predicate isSource(DataFlow::Node source) { source instanceof GetenvSource }
override predicate isSink(DataFlow::Node sink) {
predicate isSink(DataFlow::Node sink) {
exists(FunctionCall fc |
sink.asIndirectExpr(1) = fc.getArgument(0) and
fc.getTarget().hasName("gethostbyname")
@@ -414,13 +397,14 @@ Exercise 4
}
}
module GetenvToGethostbynameFlow = DataFlow::Global<GetenvToGethostbynameConfiguration>;
from
Expr getenv, FunctionCall fc, GetenvToGethostbynameConfiguration cfg, DataFlow::Node source,
DataFlow::Node sink
Expr getenv, FunctionCall fc, DataFlow::Node source, DataFlow::Node sink
where
source.asIndirectExpr(1) = getenv and
sink.asIndirectExpr(1) = fc.getArgument(0) and
cfg.hasFlow(source, sink)
GetenvToGethostbynameFlow::flow(source, sink)
select getenv, fc
Further reading
@@ -430,4 +414,4 @@ Further reading
.. include:: ../reusables/cpp-further-reading.rst
.. include:: ../reusables/codeql-ref-tools-further-reading.rst
.. include:: ../reusables/codeql-ref-tools-further-reading.rst

@@ -152,74 +152,62 @@ Global data flow tracks data flow throughout the entire program, and is therefor
Using global data flow
~~~~~~~~~~~~~~~~~~~~~~
The global data flow library is used by extending the class ``DataFlow::Configuration`` as follows:
The global data flow library is used by implementing the signature ``DataFlow::ConfigSig`` and applying the module ``DataFlow::Global<ConfigSig>`` as follows:
.. code-block:: ql
import semmle.code.cpp.dataflow.DataFlow
class MyDataFlowConfiguration extends DataFlow::Configuration {
MyDataFlowConfiguration() { this = "MyDataFlowConfiguration" }
override predicate isSource(DataFlow::Node source) {
module MyFlowConfiguration implements DataFlow::ConfigSig {
predicate isSource(DataFlow::Node source) {
...
}
override predicate isSink(DataFlow::Node sink) {
predicate isSink(DataFlow::Node sink) {
...
}
}
module MyFlow = DataFlow::Global<MyFlowConfiguration>;
The following predicates are defined in the configuration:
- ``isSource``—defines where data may flow from
- ``isSink``—defines where data may flow to
- ``isBarrier``—optional, restricts the data flow
- ``isBarrierGuard``—optional, restricts the data flow
- ``isAdditionalFlowStep``—optional, adds additional flow steps
The characteristic predicate ``MyDataFlowConfiguration()`` defines the name of the configuration, so ``"MyDataFlowConfiguration"`` should be replaced by the name of your class.
The data flow analysis is performed using the predicate ``hasFlow(DataFlow::Node source, DataFlow::Node sink)``:
The data flow analysis is performed using the predicate ``flow(DataFlow::Node source, DataFlow::Node sink)``:
.. code-block:: ql
from MyDataFlowConfiguration dataflow, DataFlow::Node source, DataFlow::Node sink
where dataflow.hasFlow(source, sink)
from DataFlow::Node source, DataFlow::Node sink
where MyFlow::flow(source, sink)
select source, "Data flow to $@.", sink, sink.toString()
Using global taint tracking
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Global taint tracking is to global data flow as local taint tracking is to local data flow. That is, global taint tracking extends global data flow with additional non-value-preserving steps. The global taint tracking library is used by extending the class ``TaintTracking::Configuration`` as follows:
Global taint tracking is to global data flow as local taint tracking is to local data flow. That is, global taint tracking extends global data flow with additional non-value-preserving steps. The global taint tracking library is used by applying the module ``TaintTracking::Global<ConfigSig>`` to your configuration instead of ``DataFlow::Global<ConfigSig>`` as follows:
.. code-block:: ql
import semmle.code.cpp.dataflow.TaintTracking
class MyTaintTrackingConfiguration extends TaintTracking::Configuration {
MyTaintTrackingConfiguration() { this = "MyTaintTrackingConfiguration" }
override predicate isSource(DataFlow::Node source) {
module MyFlowConfiguration implements DataFlow::ConfigSig {
predicate isSource(DataFlow::Node source) {
...
}
override predicate isSink(DataFlow::Node sink) {
predicate isSink(DataFlow::Node sink) {
...
}
}
The following predicates are defined in the configuration:
module MyFlow = TaintTracking::Global<MyFlowConfiguration>;
- ``isSource``—defines where taint may flow from
- ``isSink``—defines where taint may flow to
- ``isSanitizer``—optional, restricts the taint flow
- ``isSanitizerGuard``—optional, restricts the taint flow
- ``isAdditionalTaintStep``—optional, adds additional taint steps
Similar to global data flow, the characteristic predicate ``MyTaintTrackingConfiguration()`` defines the unique name of the configuration, so ``"MyTaintTrackingConfiguration"`` should be replaced by the name of your class.
The taint tracking analysis is performed using the predicate ``hasFlow(DataFlow::Node source, DataFlow::Node sink)``.
The resulting module exposes the same predicates, such as ``flow``, as the one obtained from ``DataFlow::Global<ConfigSig>``.
Examples
~~~~~~~~
@@ -230,17 +218,15 @@ The following data flow configuration tracks data flow from environment variable
import semmle.code.cpp.dataflow.DataFlow
class EnvironmentToFileConfiguration extends DataFlow::Configuration {
EnvironmentToFileConfiguration() { this = "EnvironmentToFileConfiguration" }
override predicate isSource(DataFlow::Node source) {
module EnvironmentToFileConfiguration implements DataFlow::ConfigSig {
predicate isSource(DataFlow::Node source) {
exists (Function getenv |
source.asExpr().(FunctionCall).getTarget() = getenv and
getenv.hasGlobalName("getenv")
)
}
override predicate isSink(DataFlow::Node sink) {
predicate isSink(DataFlow::Node sink) {
exists (FunctionCall fc |
sink.asExpr() = fc.getArgument(0) and
fc.getTarget().hasGlobalName("fopen")
@@ -248,12 +234,14 @@ The following data flow configuration tracks data flow from environment variable
}
}
from Expr getenv, Expr fopen, EnvironmentToFileConfiguration config
where config.hasFlow(DataFlow::exprNode(getenv), DataFlow::exprNode(fopen))
module EnvironmentToFileFlow = DataFlow::Global<EnvironmentToFileConfiguration>;
from Expr getenv, Expr fopen
where EnvironmentToFileFlow::flow(DataFlow::exprNode(getenv), DataFlow::exprNode(fopen))
select fopen, "This 'fopen' uses data from $@.",
getenv, "call to 'getenv'"
The following taint-tracking configuration tracks data from a call to ``ntohl`` to an array index operation. It uses the ``Guards`` library to recognize expressions that have been bounds-checked, and defines ``isSanitizer`` to prevent taint from propagating through them. It also uses ``isAdditionalTaintStep`` to add flow from loop bounds to loop indexes.
The following taint-tracking configuration tracks data from a call to ``ntohl`` to an array index operation. It uses the ``Guards`` library to recognize expressions that have been bounds-checked, and defines ``isBarrier`` to prevent taint from propagating through them. It also uses ``isAdditionalFlowStep`` to add flow from loop bounds to loop indexes.
.. code-block:: ql
@@ -261,18 +249,16 @@ The following taint-tracking configuration tracks data from a call to ``ntohl``
import semmle.code.cpp.controlflow.Guards
import semmle.code.cpp.dataflow.TaintTracking
class NetworkToBufferSizeConfiguration extends TaintTracking::Configuration {
NetworkToBufferSizeConfiguration() { this = "NetworkToBufferSizeConfiguration" }
override predicate isSource(DataFlow::Node node) {
module NetworkToBufferSizeConfiguration implements DataFlow::ConfigSig {
predicate isSource(DataFlow::Node node) {
node.asExpr().(FunctionCall).getTarget().hasGlobalName("ntohl")
}
override predicate isSink(DataFlow::Node node) {
predicate isSink(DataFlow::Node node) {
exists(ArrayExpr ae | node.asExpr() = ae.getArrayOffset())
}
override predicate isAdditionalTaintStep(DataFlow::Node pred, DataFlow::Node succ) {
predicate isAdditionalFlowStep(DataFlow::Node pred, DataFlow::Node succ) {
exists(Loop loop, LoopCounter lc |
loop = lc.getALoop() and
loop.getControllingExpr().(RelationalOperation).getGreaterOperand() = pred.asExpr() |
@@ -280,7 +266,7 @@ The following taint-tracking configuration tracks data from a call to ``ntohl``
)
}
override predicate isSanitizer(DataFlow::Node node) {
predicate isBarrier(DataFlow::Node node) {
exists(GuardCondition gc, Variable v |
gc.getAChild*() = v.getAnAccess() and
node.asExpr() = v.getAnAccess() and
@@ -289,8 +275,10 @@ The following taint-tracking configuration tracks data from a call to ``ntohl``
}
}
from DataFlow::Node ntohl, DataFlow::Node offset, NetworkToBufferSizeConfiguration conf
where conf.hasFlow(ntohl, offset)
module NetworkToBufferSizeFlow = TaintTracking::Global<NetworkToBufferSizeConfiguration>;
from DataFlow::Node ntohl, DataFlow::Node offset
where NetworkToBufferSizeFlow::flow(ntohl, offset)
select offset, "This array offset may be influenced by $@.", ntohl,
"converted data from the network"
@@ -327,24 +315,22 @@ Exercise 2
import semmle.code.cpp.dataflow.DataFlow
class LiteralToGethostbynameConfiguration extends DataFlow::Configuration {
LiteralToGethostbynameConfiguration() {
this = "LiteralToGethostbynameConfiguration"
}
override predicate isSource(DataFlow::Node source) {
module LiteralToGethostbynameConfiguration implements DataFlow::ConfigSig {
predicate isSource(DataFlow::Node source) {
source.asExpr() instanceof StringLiteral
}
override predicate isSink(DataFlow::Node sink) {
predicate isSink(DataFlow::Node sink) {
exists (FunctionCall fc |
sink.asExpr() = fc.getArgument(0) and
fc.getTarget().hasName("gethostbyname"))
}
}
from StringLiteral sl, FunctionCall fc, LiteralToGethostbynameConfiguration cfg
where cfg.hasFlow(DataFlow::exprNode(sl), DataFlow::exprNode(fc.getArgument(0)))
module LiteralToGethostbynameFlow = DataFlow::Global<LiteralToGethostbynameConfiguration>;
from StringLiteral sl, FunctionCall fc
where LiteralToGethostbynameFlow::flow(DataFlow::exprNode(sl), DataFlow::exprNode(fc.getArgument(0)))
select sl, fc
Exercise 3
@@ -373,24 +359,22 @@ Exercise 4
}
}
class GetenvToGethostbynameConfiguration extends DataFlow::Configuration {
GetenvToGethostbynameConfiguration() {
this = "GetenvToGethostbynameConfiguration"
}
override predicate isSource(DataFlow::Node source) {
module GetenvToGethostbynameConfiguration implements DataFlow::ConfigSig {
predicate isSource(DataFlow::Node source) {
source instanceof GetenvSource
}
override predicate isSink(DataFlow::Node sink) {
predicate isSink(DataFlow::Node sink) {
exists (FunctionCall fc |
sink.asExpr() = fc.getArgument(0) and
fc.getTarget().hasName("gethostbyname"))
}
}
from DataFlow::Node getenv, FunctionCall fc, GetenvToGethostbynameConfiguration cfg
where cfg.hasFlow(getenv, DataFlow::exprNode(fc.getArgument(0)))
module GetenvToGethostbynameFlow = DataFlow::Global<GetenvToGethostbynameConfiguration>;
from DataFlow::Node getenv, FunctionCall fc
where GetenvToGethostbynameFlow::flow(getenv, DataFlow::exprNode(fc.getArgument(0)))
select getenv.asExpr(), fc
Further reading
@@ -400,4 +384,4 @@ Further reading
.. include:: ../reusables/cpp-further-reading.rst
.. include:: ../reusables/codeql-ref-tools-further-reading.rst
.. include:: ../reusables/codeql-ref-tools-further-reading.rst

@@ -146,24 +146,24 @@ Global data flow tracks data flow throughout the entire program, and is therefor
Using global data flow
~~~~~~~~~~~~~~~~~~~~~~
The global data flow library is used by extending the class ``DataFlow::Configuration``:
The global data flow library is used by implementing the signature ``DataFlow::ConfigSig`` and applying the module ``DataFlow::Global<ConfigSig>``:
.. code-block:: ql
import csharp
class MyDataFlowConfiguration extends DataFlow::Configuration {
MyDataFlowConfiguration() { this = "..." }
override predicate isSource(DataFlow::Node source) {
module MyFlowConfiguration implements DataFlow::ConfigSig {
predicate isSource(DataFlow::Node source) {
...
}
override predicate isSink(DataFlow::Node sink) {
predicate isSink(DataFlow::Node sink) {
...
}
}
module MyFlow = DataFlow::Global<MyFlowConfiguration>;
These predicates are defined in the configuration:
- ``isSource`` - defines where data may flow from.
@@ -171,45 +171,36 @@ These predicates are defined in the configuration:
- ``isBarrier`` - optionally, restricts the data flow.
- ``isAdditionalFlowStep`` - optionally, adds additional flow steps.
The characteristic predicate (``MyDataFlowConfiguration()``) defines the name of the configuration, so ``"..."`` must be replaced with a unique name.
The data flow analysis is performed using the predicate ``hasFlow(DataFlow::Node source, DataFlow::Node sink)``:
The data flow analysis is performed using the predicate ``flow(DataFlow::Node source, DataFlow::Node sink)``:
.. code-block:: ql
from MyDataFlowConfiguation dataflow, DataFlow::Node source, DataFlow::Node sink
where dataflow.hasFlow(source, sink)
from DataFlow::Node source, DataFlow::Node sink
where MyFlow::flow(source, sink)
select source, "Dataflow to $@.", sink, sink.toString()
Using global taint tracking
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Global taint tracking is to global data flow what local taint tracking is to local data flow. That is, global taint tracking extends global data flow with additional non-value-preserving steps. The global taint tracking library is used by extending the class ``TaintTracking::Configuration``:
Global taint tracking is to global data flow what local taint tracking is to local data flow. That is, global taint tracking extends global data flow with additional non-value-preserving steps. The global taint tracking library is used by applying the module ``TaintTracking::Global<ConfigSig>`` to your configuration instead of ``DataFlow::Global<ConfigSig>``:
.. code-block:: ql
import csharp
class MyTaintTrackingConfiguration extends TaintTracking::Configuration {
MyTaintTrackingConfiguration() { this = "..." }
override predicate isSource(DataFlow::Node source) {
module MyFlowConfiguration implements DataFlow::ConfigSig {
predicate isSource(DataFlow::Node source) {
...
}
override predicate isSink(DataFlow::Node sink) {
predicate isSink(DataFlow::Node sink) {
...
}
}
These predicates are defined in the configuration:
module MyFlow = TaintTracking::Global<MyFlowConfiguration>;
- ``isSource`` - defines where taint may flow from.
- ``isSink`` - defines where taint may flow to.
- ``isSanitizer`` - optionally, restricts the taint flow.
- ``isAdditionalTaintStep`` - optionally, adds additional taint steps.
Similar to global data flow, the characteristic predicate (``MyTaintTrackingConfiguration()``) defines the unique name of the configuration and the taint analysis is performed using the predicate ``hasFlow(DataFlow::Node source, DataFlow::Node sink)``.
The resulting module exposes the same predicates, such as ``flow``, as the one obtained from ``DataFlow::Global<ConfigSig>``.
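As a hedged illustration of a complete taint-tracking query in C#, the sketch below reuses the source and sink predicates of the environment-variable example elsewhere in this diff and applies ``TaintTracking::Global`` instead of ``DataFlow::Global``. The names ``EnvironmentToFileTaintConfig`` and ``EnvironmentToFileTaintFlow`` are invented for this sketch. Like the skeleton in this diff, it assumes that ``import csharp`` makes ``TaintTracking`` available.

.. code-block:: ql

    import csharp

    // Hypothetical configuration; the predicate bodies mirror the data flow example.
    module EnvironmentToFileTaintConfig implements DataFlow::ConfigSig {
      predicate isSource(DataFlow::Node source) {
        exists(Method m |
          m = source.asExpr().(MethodCall).getTarget() and
          m.hasQualifiedName("System.Environment.GetEnvironmentVariable")
        )
      }

      predicate isSink(DataFlow::Node sink) {
        exists(MethodCall mc |
          mc.getTarget().hasQualifiedName("System.IO.File.Open") and
          sink.asExpr() = mc.getArgument(0)
        )
      }
    }

    // Taint tracking also follows non-value-preserving steps.
    module EnvironmentToFileTaintFlow = TaintTracking::Global<EnvironmentToFileTaintConfig>;

    from DataFlow::Node source, DataFlow::Node sink
    where EnvironmentToFileTaintFlow::flow(source, sink)
    select sink, "This 'File.Open' path is tainted by $@.", source,
      "call to 'GetEnvironmentVariable'"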
Flow sources
~~~~~~~~~~~~
@@ -228,12 +219,8 @@ This query shows a data flow configuration that uses all public API parameters a
import csharp
import semmle.code.csharp.dataflow.flowsources.PublicCallableParameter
class MyDataFlowConfiguration extends DataFlow::Configuration {
MyDataFlowConfiguration() {
this = "..."
}
override predicate isSource(DataFlow::Node source) {
module MyFlowConfiguration implements DataFlow::ConfigSig {
predicate isSource(DataFlow::Node source) {
source instanceof PublicCallableParameterFlowSource
}
@@ -243,7 +230,6 @@ This query shows a data flow configuration that uses all public API parameters a
Class hierarchy
~~~~~~~~~~~~~~~
- ``DataFlow::Configuration`` - base class for custom global data flow analysis.
- ``DataFlow::Node`` - an element behaving as a data flow node.
- ``DataFlow::ExprNode`` - an expression behaving as a data flow node.
@@ -261,8 +247,6 @@ Class hierarchy
- ``WcfRemoteFlowSource`` - data flow from a WCF web service.
- ``AspNetServiceRemoteFlowSource`` - data flow from an ASP.NET web service.
- ``TaintTracking::Configuration`` - base class for custom global taint tracking analysis.
Examples
~~~~~~~~
@@ -272,17 +256,15 @@ This data flow configuration tracks data flow from environment variables to open
import csharp
class EnvironmentToFileConfiguration extends DataFlow::Configuration {
EnvironmentToFileConfiguration() { this = "Environment opening files" }
override predicate isSource(DataFlow::Node source) {
module EnvironmentToFileConfiguration implements DataFlow::ConfigSig {
predicate isSource(DataFlow::Node source) {
exists(Method m |
m = source.asExpr().(MethodCall).getTarget() and
m.hasQualifiedName("System.Environment.GetEnvironmentVariable")
)
}
override predicate isSink(DataFlow::Node sink) {
predicate isSink(DataFlow::Node sink) {
exists(MethodCall mc |
mc.getTarget().hasQualifiedName("System.IO.File.Open") and
sink.asExpr() = mc.getArgument(0)
@@ -290,8 +272,10 @@ This data flow configuration tracks data flow from environment variables to open
}
}
from Expr environment, Expr fileOpen, EnvironmentToFileConfiguration config
where config.hasFlow(DataFlow::exprNode(environment), DataFlow::exprNode(fileOpen))
module EnvironmentToFileFlow = DataFlow::Global<EnvironmentToFileConfiguration>;
from Expr environment, Expr fileOpen
where EnvironmentToFileFlow::flow(DataFlow::exprNode(environment), DataFlow::exprNode(fileOpen))
select fileOpen, "This 'File.Open' uses data from $@.",
environment, "call to 'GetEnvironmentVariable'"
@@ -435,21 +419,21 @@ Exercise 2
import csharp
class Configuration extends DataFlow::Configuration {
Configuration() { this="String to System.Uri" }
override predicate isSource(DataFlow::Node src) {
module StringToUriConfig implements DataFlow::ConfigSig {
predicate isSource(DataFlow::Node src) {
src.asExpr().hasValue()
}
override predicate isSink(DataFlow::Node sink) {
predicate isSink(DataFlow::Node sink) {
exists(Call c | c.getTarget().(Constructor).getDeclaringType().hasQualifiedName("System.Uri")
and sink.asExpr()=c.getArgument(0))
}
}
from DataFlow::Node src, DataFlow::Node sink, Configuration config
where config.hasFlow(src, sink)
module StringToUriFlow = DataFlow::Global<StringToUriConfig>;
from DataFlow::Node src, DataFlow::Node sink
where StringToUriFlow::flow(src, sink)
select src, "This string constructs a 'System.Uri' $@.", sink, "here"
Exercise 3
@@ -476,21 +460,21 @@ Exercise 4
}
}
class Configuration extends DataFlow::Configuration {
Configuration() { this="Environment to System.Uri" }
override predicate isSource(DataFlow::Node src) {
module EnvironmentToUriConfig implements DataFlow::ConfigSig {
predicate isSource(DataFlow::Node src) {
src instanceof EnvironmentVariableFlowSource
}
override predicate isSink(DataFlow::Node sink) {
predicate isSink(DataFlow::Node sink) {
exists(Call c | c.getTarget().(Constructor).getDeclaringType().hasQualifiedName("System.Uri")
and sink.asExpr()=c.getArgument(0))
}
}
from DataFlow::Node src, DataFlow::Node sink, Configuration config
where config.hasFlow(src, sink)
module EnvironmentToUriFlow = DataFlow::Global<EnvironmentToUriConfig>;
from DataFlow::Node src, DataFlow::Node sink
where EnvironmentToUriFlow::flow(src, sink)
select src, "This environment variable constructs a 'System.Uri' $@.", sink, "here"
Exercise 5

@@ -160,24 +160,24 @@ Global data flow tracks data flow throughout the entire program, and is therefor
Using global data flow
~~~~~~~~~~~~~~~~~~~~~~
You use the global data flow library by extending the class ``DataFlow::Configuration``:
You use the global data flow library by implementing the signature ``DataFlow::ConfigSig`` and applying the module ``DataFlow::Global<ConfigSig>``:
.. code-block:: ql
import semmle.code.java.dataflow.DataFlow
class MyDataFlowConfiguration extends DataFlow::Configuration {
MyDataFlowConfiguration() { this = "MyDataFlowConfiguration" }
override predicate isSource(DataFlow::Node source) {
module MyFlowConfiguration implements DataFlow::ConfigSig {
predicate isSource(DataFlow::Node source) {
...
}
override predicate isSink(DataFlow::Node sink) {
predicate isSink(DataFlow::Node sink) {
...
}
}
module MyFlow = DataFlow::Global<MyFlowConfiguration>;
These predicates are defined in the configuration:
- ``isSource``—defines where data may flow from
@@ -185,47 +185,36 @@ These predicates are defined in the configuration:
- ``isBarrier``—optional, restricts the data flow
- ``isAdditionalFlowStep``—optional, adds additional flow steps
The characteristic predicate ``MyDataFlowConfiguration()`` defines the name of the configuration, so ``"MyDataFlowConfiguration"`` should be a unique name, for example, the name of your class.
The data flow analysis is performed using the predicate ``hasFlow(DataFlow::Node source, DataFlow::Node sink)``:
The data flow analysis is performed using the predicate ``flow(DataFlow::Node source, DataFlow::Node sink)``:
.. code-block:: ql
from MyDataFlowConfiguration dataflow, DataFlow::Node source, DataFlow::Node sink
where dataflow.hasFlow(source, sink)
from DataFlow::Node source, DataFlow::Node sink
where MyFlow::flow(source, sink)
select source, "Data flow to $@.", sink, sink.toString()
Using global taint tracking
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Global taint tracking is to global data flow as local taint tracking is to local data flow. That is, global taint tracking extends global data flow with additional non-value-preserving steps. You use the global taint tracking library by extending the class ``TaintTracking::Configuration``:
Global taint tracking is to global data flow as local taint tracking is to local data flow. That is, global taint tracking extends global data flow with additional non-value-preserving steps. You use the global taint tracking library by applying the module ``TaintTracking::Global<ConfigSig>`` to your configuration instead of ``DataFlow::Global<ConfigSig>``:
.. code-block:: ql
import semmle.code.java.dataflow.TaintTracking
class MyTaintTrackingConfiguration extends TaintTracking::Configuration {
MyTaintTrackingConfiguration() { this = "MyTaintTrackingConfiguration" }
override predicate isSource(DataFlow::Node source) {
module MyFlowConfiguration implements DataFlow::ConfigSig {
predicate isSource(DataFlow::Node source) {
...
}
override predicate isSink(DataFlow::Node sink) {
predicate isSink(DataFlow::Node sink) {
...
}
}
These predicates are defined in the configuration:
module MyFlow = TaintTracking::Global<MyFlowConfiguration>;
- ``isSource``—defines where taint may flow from
- ``isSink``—defines where taint may flow to
- ``isSanitizer``—optional, restricts the taint flow
- ``isAdditionalTaintStep``—optional, adds additional taint steps
Similar to global data flow, the characteristic predicate ``MyTaintTrackingConfiguration()`` defines the unique name of the configuration.
The taint tracking analysis is performed using the predicate ``hasFlow(DataFlow::Node source, DataFlow::Node sink)``.
The resulting module has an identical signature to the one obtained from ``DataFlow::Global<ConfigSig>``.
Flow sources
~~~~~~~~~~~~
@@ -242,18 +231,16 @@ This query shows a taint-tracking configuration that uses remote user input as d
import java
import semmle.code.java.dataflow.FlowSources
class MyTaintTrackingConfiguration extends TaintTracking::Configuration {
MyTaintTrackingConfiguration() {
this = "..."
}
override predicate isSource(DataFlow::Node source) {
module MyFlowConfiguration implements DataFlow::ConfigSig {
predicate isSource(DataFlow::Node source) {
source instanceof RemoteFlowSource
}
...
}
module MyTaintFlow = TaintTracking::Global<MyFlowConfiguration>;
Exercises
~~~~~~~~~
@@ -287,16 +274,12 @@ Exercise 2
import semmle.code.java.dataflow.DataFlow
class Configuration extends DataFlow::Configuration {
Configuration() {
this = "LiteralToURL Configuration"
}
override predicate isSource(DataFlow::Node source) {
module LiteralToURLConfig implements DataFlow::ConfigSig {
predicate isSource(DataFlow::Node source) {
source.asExpr() instanceof StringLiteral
}
override predicate isSink(DataFlow::Node sink) {
predicate isSink(DataFlow::Node sink) {
exists(Call call |
sink.asExpr() = call.getArgument(0) and
call.getCallee().(Constructor).getDeclaringType().hasQualifiedName("java.net", "URL")
@@ -304,8 +287,10 @@ Exercise 2
}
}
from DataFlow::Node src, DataFlow::Node sink, Configuration config
where config.hasFlow(src, sink)
module LiteralToURLFlow = DataFlow::Global<LiteralToURLConfig>;
from DataFlow::Node src, DataFlow::Node sink
where LiteralToURLFlow::flow(src, sink)
select src, "This string constructs a URL $@.", sink, "here"
Exercise 3
@@ -340,16 +325,12 @@ Exercise 4
}
}
class GetenvToURLConfiguration extends DataFlow::Configuration {
GetenvToURLConfiguration() {
this = "GetenvToURLConfiguration"
}
override predicate isSource(DataFlow::Node source) {
module GetenvToURLConfig implements DataFlow::ConfigSig {
predicate isSource(DataFlow::Node source) {
source instanceof GetenvSource
}
override predicate isSink(DataFlow::Node sink) {
predicate isSink(DataFlow::Node sink) {
exists(Call call |
sink.asExpr() = call.getArgument(0) and
call.getCallee().(Constructor).getDeclaringType().hasQualifiedName("java.net", "URL")
@@ -357,8 +338,10 @@ Exercise 4
}
}
from DataFlow::Node src, DataFlow::Node sink, GetenvToURLConfiguration config
where config.hasFlow(src, sink)
module GetenvToURLFlow = DataFlow::Global<GetenvToURLConfig>;
from DataFlow::Node src, DataFlow::Node sink
where GetenvToURLFlow::flow(src, sink)
select src, "This environment variable constructs a URL $@.", sink, "here"
Further reading
@@ -368,4 +351,4 @@ Further reading
.. include:: ../reusables/java-further-reading.rst
.. include:: ../reusables/codeql-ref-tools-further-reading.rst
.. include:: ../reusables/codeql-ref-tools-further-reading.rst

View File

@@ -204,24 +204,24 @@ Global data flow tracks data flow throughout the entire program, and is therefor
Using global data flow
~~~~~~~~~~~~~~~~~~~~~~
The global data flow library is used by extending the class ``DataFlow::Configuration``:
The global data flow library is used by implementing the signature ``DataFlow::ConfigSig`` and applying the module ``DataFlow::Global<ConfigSig>``:
.. code-block:: ql
import python
class MyDataFlowConfiguration extends DataFlow::Configuration {
MyDataFlowConfiguration() { this = "..." }
override predicate isSource(DataFlow::Node source) {
module MyFlowConfiguration implements DataFlow::ConfigSig {
predicate isSource(DataFlow::Node source) {
...
}
override predicate isSink(DataFlow::Node sink) {
predicate isSink(DataFlow::Node sink) {
...
}
}
module MyFlow = DataFlow::Global<MyFlowConfiguration>;
These predicates are defined in the configuration:
- ``isSource`` - defines where data may flow from.
@@ -229,45 +229,36 @@ These predicates are defined in the configuration:
- ``isBarrier`` - optionally, restricts the data flow.
- ``isAdditionalFlowStep`` - optionally, adds additional flow steps.
The characteristic predicate (``MyDataFlowConfiguration()``) defines the name of the configuration, so ``"..."`` must be replaced with a unique name (for instance the class name).
The data flow analysis is performed using the predicate ``hasFlow(DataFlow::Node source, DataFlow::Node sink)``:
The data flow analysis is performed using the predicate ``flow(DataFlow::Node source, DataFlow::Node sink)``:
.. code-block:: ql
from MyDataFlowConfiguation dataflow, DataFlow::Node source, DataFlow::Node sink
where dataflow.hasFlow(source, sink)
from DataFlow::Node source, DataFlow::Node sink
where MyFlow::flow(source, sink)
select source, "Dataflow to $@.", sink, sink.toString()
Using global taint tracking
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Global taint tracking is to global data flow what local taint tracking is to local data flow. That is, global taint tracking extends global data flow with additional non-value-preserving steps. The global taint tracking library is used by extending the class ``TaintTracking::Configuration``:
Global taint tracking is to global data flow what local taint tracking is to local data flow. That is, global taint tracking extends global data flow with additional non-value-preserving steps. The global taint tracking library is used by applying the module ``TaintTracking::Global<ConfigSig>`` to your configuration instead of ``DataFlow::Global<ConfigSig>``:
.. code-block:: ql
import python
class MyTaintTrackingConfiguration extends TaintTracking::Configuration {
MyTaintTrackingConfiguration() { this = "..." }
override predicate isSource(DataFlow::Node source) {
module MyFlowConfiguration implements DataFlow::ConfigSig {
predicate isSource(DataFlow::Node source) {
...
}
override predicate isSink(DataFlow::Node sink) {
predicate isSink(DataFlow::Node sink) {
...
}
}
These predicates are defined in the configuration:
module MyFlow = TaintTracking::Global<MyFlowConfiguration>;
- ``isSource`` - defines where taint may flow from.
- ``isSink`` - defines where taint may flow to.
- ``isSanitizer`` - optionally, restricts the taint flow.
- ``isAdditionalTaintStep`` - optionally, adds additional taint steps.
Similar to global data flow, the characteristic predicate (``MyTaintTrackingConfiguration()``) defines the unique name of the configuration and the taint analysis is performed using the predicate ``hasFlow(DataFlow::Node source, DataFlow::Node sink)``.
The resulting module has an identical signature to the one obtained from ``DataFlow::Global<ConfigSig>``.
Predefined sources and sinks
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -283,7 +274,6 @@ For global flow, it is also useful to restrict sources to instances of ``LocalSo
Class hierarchy
~~~~~~~~~~~~~~~
- ``DataFlow::Configuration`` - base class for custom global data flow analysis.
- ``DataFlow::Node`` - an element behaving as a data flow node.
- ``DataFlow::CfgNode`` - a control-flow node behaving as a data flow node.
@@ -305,8 +295,6 @@ Class hierarchy
- ``Concepts::HTTP::Server::RouteSetup`` - a data-flow node that sets up a route on a server.
- ``Concepts::HTTP::Server::HttpResponse`` - a data-flow node that creates a HTTP response on a server.
- ``TaintTracking::Configuration`` - base class for custom global taint tracking analysis.
Examples
~~~~~~~~
@@ -320,20 +308,20 @@ This query shows a data flow configuration that uses all network input as data s
import semmle.python.dataflow.new.RemoteFlowSources
import semmle.python.Concepts
class RemoteToFileConfiguration extends TaintTracking::Configuration {
RemoteToFileConfiguration() { this = "RemoteToFileConfiguration" }
override predicate isSource(DataFlow::Node source) {
module RemoteToFileConfiguration implements DataFlow::ConfigSig {
predicate isSource(DataFlow::Node source) {
source instanceof RemoteFlowSource
}
override predicate isSink(DataFlow::Node sink) {
predicate isSink(DataFlow::Node sink) {
sink = any(FileSystemAccess fa).getAPathArgument()
}
}
from DataFlow::Node input, DataFlow::Node fileAccess, RemoteToFileConfiguration config
where config.hasFlow(input, fileAccess)
module RemoteToFileFlow = TaintTracking::Global<RemoteToFileConfiguration>;
from DataFlow::Node input, DataFlow::Node fileAccess
where RemoteToFileFlow::flow(input, fileAccess)
select fileAccess, "This file access uses data from $@.",
input, "user-controllable input."
@@ -345,14 +333,12 @@ This data flow configuration tracks data flow from environment variables to open
import semmle.python.dataflow.new.TaintTracking
import semmle.python.ApiGraphs
class EnvironmentToFileConfiguration extends DataFlow::Configuration {
EnvironmentToFileConfiguration() { this = "EnvironmentToFileConfiguration" }
override predicate isSource(DataFlow::Node source) {
module EnvironmentToFileConfiguration implements DataFlow::ConfigSig {
predicate isSource(DataFlow::Node source) {
source = API::moduleImport("os").getMember("getenv").getACall()
}
override predicate isSink(DataFlow::Node sink) {
predicate isSink(DataFlow::Node sink) {
exists(DataFlow::CallCfgNode call |
call = API::moduleImport("os").getMember("open").getACall() and
sink = call.getArg(0)
@@ -360,8 +346,10 @@ This data flow configuration tracks data flow from environment variables to open
}
}
from Expr environment, Expr fileOpen, EnvironmentToFileConfiguration config
where config.hasFlow(DataFlow::exprNode(environment), DataFlow::exprNode(fileOpen))
module EnvironmentToFileFlow = DataFlow::Global<EnvironmentToFileConfiguration>;
from Expr environment, Expr fileOpen
where EnvironmentToFileFlow::flow(DataFlow::exprNode(environment), DataFlow::exprNode(fileOpen))
select fileOpen, "This call to 'os.open' uses data from $@.",
environment, "call to 'os.getenv'"

View File

@@ -224,24 +224,24 @@ However, global data flow is less precise than local data flow, and the analysis
Using global data flow
~~~~~~~~~~~~~~~~~~~~~~
You can use the global data flow library by extending the class ``DataFlow::Configuration``:
You can use the global data flow library by implementing the signature ``DataFlow::ConfigSig`` and applying the module ``DataFlow::Global<ConfigSig>``:
.. code-block:: ql
import codeql.ruby.DataFlow
class MyDataFlowConfiguration extends DataFlow::Configuration {
MyDataFlowConfiguration() { this = "..." }
override predicate isSource(DataFlow::Node source) {
module MyFlowConfiguration implements DataFlow::ConfigSig {
predicate isSource(DataFlow::Node source) {
...
}
override predicate isSink(DataFlow::Node sink) {
predicate isSink(DataFlow::Node sink) {
...
}
}
module MyFlow = DataFlow::Global<MyFlowConfiguration>;
These predicates are defined in the configuration:
- ``isSource`` - defines where data may flow from.
@@ -249,14 +249,12 @@ These predicates are defined in the configuration:
- ``isBarrier`` - optionally, restricts the data flow.
- ``isAdditionalFlowStep`` - optionally, adds additional flow steps.
The characteristic predicate (``MyDataFlowConfiguration()``) defines the name of the configuration, so ``"..."`` must be replaced with a unique name (for instance the class name).
The data flow analysis is performed using the predicate ``hasFlow(DataFlow::Node source, DataFlow::Node sink)``:
The data flow analysis is performed using the predicate ``flow(DataFlow::Node source, DataFlow::Node sink)``:
.. code-block:: ql
from MyDataFlowConfiguation dataflow, DataFlow::Node source, DataFlow::Node sink
where dataflow.hasFlow(source, sink)
from DataFlow::Node source, DataFlow::Node sink
where MyFlow::flow(source, sink)
select source, "Dataflow to $@.", sink, sink.toString()
Using global taint tracking
@@ -264,33 +262,26 @@ Using global taint tracking
Global taint tracking is to global data flow what local taint tracking is to local data flow.
That is, global taint tracking extends global data flow with additional non-value-preserving steps.
The global taint tracking library is used by extending the class ``TaintTracking::Configuration``:
The global taint tracking library is used by applying the module ``TaintTracking::Global<ConfigSig>`` to your configuration instead of ``DataFlow::Global<ConfigSig>``:
.. code-block:: ql
import codeql.ruby.DataFlow
import codeql.ruby.TaintTracking
class MyTaintTrackingConfiguration extends TaintTracking::Configuration {
MyTaintTrackingConfiguration() { this = "..." }
override predicate isSource(DataFlow::Node source) {
module MyFlowConfiguration implements DataFlow::ConfigSig {
predicate isSource(DataFlow::Node source) {
...
}
override predicate isSink(DataFlow::Node sink) {
predicate isSink(DataFlow::Node sink) {
...
}
}
These predicates are defined in the configuration:
module MyFlow = TaintTracking::Global<MyFlowConfiguration>;
- ``isSource`` - defines where taint may flow from.
- ``isSink`` - defines where taint may flow to.
- ``isSanitizer`` - optionally, restricts the taint flow.
- ``isAdditionalTaintStep`` - optionally, adds additional taint steps.
Similar to global data flow, the characteristic predicate (``MyTaintTrackingConfiguration()``) defines the unique name of the configuration and the taint analysis is performed using the predicate ``hasFlow(DataFlow::Node source, DataFlow::Node sink)``.
The resulting module has an identical signature to the one obtained from ``DataFlow::Global<ConfigSig>``.
Predefined sources and sinks
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -306,7 +297,6 @@ The predefined sources generally do that.
Class hierarchy
~~~~~~~~~~~~~~~
- ``DataFlow::Configuration`` - base class for custom global data flow analysis.
- ``DataFlow::Node`` - an element behaving as a data-flow node.
- ``DataFlow::LocalSourceNode`` - a local origin of data, as a data-flow node.
- ``DataFlow::ExprNode`` - an expression behaving as a data-flow node.
@@ -321,13 +311,11 @@ Class hierarchy
- ``Concepts::HTTP::Server::RouteSetup`` - a data-flow node that sets up a route on a server.
- ``Concepts::HTTP::Server::HttpResponse`` - a data-flow node that creates an HTTP response on a server.
- ``TaintTracking::Configuration`` - base class for custom global taint tracking analysis.
Examples of global data flow
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The following global taint-tracking query finds path arguments in filesystem accesses that can be controlled by a remote user.
- Since this is a taint-tracking query, the configuration class extends ``TaintTracking::Configuration``.
- Since this is a taint-tracking query, the ``TaintTracking::Global<ConfigSig>`` module is used.
- The ``isSource`` predicate defines sources as any data-flow nodes that are instances of ``RemoteFlowSource``.
- The ``isSink`` predicate defines sinks as path arguments in any filesystem access, using ``FileSystemAccess`` from the ``Concepts`` library.
@@ -338,22 +326,22 @@ The following global taint-tracking query finds path arguments in filesystem acc
import codeql.ruby.Concepts
import codeql.ruby.dataflow.RemoteFlowSources
class RemoteToFileConfiguration extends TaintTracking::Configuration {
RemoteToFileConfiguration() { this = "RemoteToFileConfiguration" }
module RemoteToFileConfiguration implements DataFlow::ConfigSig {
predicate isSource(DataFlow::Node source) { source instanceof RemoteFlowSource }
override predicate isSource(DataFlow::Node source) { source instanceof RemoteFlowSource }
override predicate isSink(DataFlow::Node sink) {
predicate isSink(DataFlow::Node sink) {
sink = any(FileSystemAccess fa).getAPathArgument()
}
}
module RemoteToFileFlow = TaintTracking::Global<RemoteToFileConfiguration>;
from DataFlow::Node input, DataFlow::Node fileAccess, RemoteToFileConfiguration config
where config.hasFlow(input, fileAccess)
from DataFlow::Node input, DataFlow::Node fileAccess
where RemoteToFileFlow::flow(input, fileAccess)
select fileAccess, "This file access uses data from $@.", input, "user-controllable input."
The following global data-flow query finds calls to ``File.open`` where the filename argument comes from an environment variable.
- Since this is a data-flow query, the configuration class extends ``DataFlow::Configuration``.
- Since this is a data-flow query, the ``DataFlow::Global<ConfigSig>`` module is used.
- The ``isSource`` predicate defines sources as expression nodes representing lookups on the ``ENV`` hash.
- The ``isSink`` predicate defines sinks as the first argument in any call to ``File.open``.
@@ -363,23 +351,23 @@ The following global data-flow query finds calls to ``File.open`` where the file
import codeql.ruby.controlflow.CfgNodes
import codeql.ruby.ApiGraphs
class EnvironmentToFileConfiguration extends DataFlow::Configuration {
EnvironmentToFileConfiguration() { this = "EnvironmentToFileConfiguration" }
override predicate isSource(DataFlow::Node source) {
module EnvironmentToFileConfiguration implements DataFlow::ConfigSig {
predicate isSource(DataFlow::Node source) {
exists(ExprNodes::ConstantReadAccessCfgNode env |
env.getExpr().getName() = "ENV" and
env = source.asExpr().(ExprNodes::ElementReferenceCfgNode).getReceiver()
)
}
override predicate isSink(DataFlow::Node sink) {
predicate isSink(DataFlow::Node sink) {
sink = API::getTopLevelMember("File").getAMethodCall("open").getArgument(0)
}
}
module EnvironmentToFileFlow = DataFlow::Global<EnvironmentToFileConfiguration>;
from EnvironmentToFileConfiguration config, DataFlow::Node environment, DataFlow::Node fileOpen
where config.hasFlow(environment, fileOpen)
from DataFlow::Node environment, DataFlow::Node fileOpen
where EnvironmentToFileFlow::flow(environment, fileOpen)
select fileOpen, "This call to 'File.open' uses data from $@.", environment,
"an environment variable"

View File

@@ -494,14 +494,14 @@ which are sets of data-flow nodes. Given these three sets, CodeQL provides a gen
finding paths from a source to a sink, possibly going into and out of functions and fields, but
never flowing through a barrier.
To define a data-flow configuration, you can define a subclass of ``DataFlow::Configuration``,
overriding the member predicates ``isSource``, ``isSink``, and ``isBarrier`` to define the sets of
sources, sinks, and barriers.
To define a data-flow configuration, you can define a module implementing ``DataFlow::ConfigSig``,
including the predicates ``isSource``, ``isSink``, and ``isBarrier`` to define the sets of
sources, sinks, and barriers. Data flow is then computed by applying
``DataFlow::Global<..>`` to the configuration.
Going beyond pure data flow, many security analyses need to perform more general `taint tracking`,
which also considers flow through value-transforming operations such as string operations. To track
taint, you can define a subclass of ``TaintTracking::Configuration``, which works similar to
data-flow configurations.
taint, you apply ``TaintTracking::Global<..>`` to your configuration instead.
A detailed exposition of global data flow and taint tracking is out of scope for this brief
introduction. For a general overview of data flow and taint tracking, see ":ref:`About data flow analysis <about-data-flow-analysis>`."
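
As a rough illustration of the two paragraphs above, the following sketch shows a configuration module with all three predicates and the application of ``DataFlow::Global<..>``. It assumes the Go libraries (``import go``); for another language, swap the import. The names ``ExampleConfig`` and ``ExampleFlow`` are hypothetical, and the ``none()`` bodies are placeholders to be replaced with real source, sink, and barrier definitions.

.. code-block:: ql

   // Assumption: the Go libraries; use the import for your language instead.
   import go

   // Hypothetical configuration; each none() is a placeholder.
   module ExampleConfig implements DataFlow::ConfigSig {
     // The set of nodes where flow starts.
     predicate isSource(DataFlow::Node source) { none() }

     // The set of nodes where flow ends.
     predicate isSink(DataFlow::Node sink) { none() }

     // Optional: nodes through which flow is blocked.
     predicate isBarrier(DataFlow::Node node) { none() }
   }

   // Compute value-preserving flow for the configuration. To track taint,
   // apply TaintTracking::Global<ExampleConfig> instead.
   module ExampleFlow = DataFlow::Global<ExampleConfig>;

   from DataFlow::Node source, DataFlow::Node sink
   where ExampleFlow::flow(source, sink)
   select sink, "Flow from $@.", source, "this source"

With the placeholder bodies the query returns no results; it only shows how the pieces fit together.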

View File

@@ -161,20 +161,20 @@ is read flows into a call to ``File.write``.
import codeql.ruby.DataFlow
import codeql.ruby.ApiGraphs
class Configuration extends DataFlow::Configuration {
Configuration() { this = "File read/write Configuration" }
override predicate isSource(DataFlow::Node source) {
module Configuration implements DataFlow::ConfigSig {
predicate isSource(DataFlow::Node source) {
source = API::getTopLevelMember("File").getMethod("read").getReturn().asSource()
}
override predicate isSink(DataFlow::Node sink) {
predicate isSink(DataFlow::Node sink) {
sink = API::getTopLevelMember("File").getMethod("write").getParameter(1).asSink()
}
}
from DataFlow::Node src, DataFlow::Node sink, Configuration config
where config.hasFlow(src, sink)
module Flow = DataFlow::Global<Configuration>;
from DataFlow::Node src, DataFlow::Node sink
where Flow::flow(src, sink)
select src, "The data read here flows into a $@ call.", sink, "File.write"
Further reading

View File

@@ -62,8 +62,8 @@ The library class ``SecurityOptions`` provides a (configurable) model of what co
import semmle.code.cpp.security.Security
class TaintedFormatConfig extends TaintTracking::Configuration {
override predicate isSource(DataFlow::Node source) {
module TaintedFormatConfig implements DataFlow::ConfigSig {
predicate isSource(DataFlow::Node source) {
exists (SecurityOptions opts |
opts.isUserInput(source.asExpr(), _)
)
@@ -85,8 +85,8 @@ Use the ``FormattingFunction`` class to fill in the definition of ``isSink``.
import semmle.code.cpp.security.Security
class TaintedFormatConfig extends TaintTracking::Configuration {
override predicate isSink(DataFlow::Node sink) {
module TaintedFormatConfig implements DataFlow::ConfigSig {
predicate isSink(DataFlow::Node sink) {
/* Fill me in */
}
...
@@ -105,8 +105,8 @@ Use the ``FormattingFunction`` class, we can write the sink as:
import semmle.code.cpp.security.Security
class TaintedFormatConfig extends TaintTracking::Configuration {
override predicate isSink(DataFlow::Node sink) {
module TaintedFormatConfig implements DataFlow::ConfigSig {
predicate isSink(DataFlow::Node sink) {
exists (FormattingFunction ff, Call c |
c.getTarget() = ff and
c.getArgument(ff.getFormatParameterIndex()) = sink.asExpr()
@@ -132,9 +132,8 @@ Add an additional taint step that (heuristically) taints a local variable if it
.. code-block:: ql
class TaintedFormatConfig extends TaintTracking::Configuration {
override predicate isAdditionalTaintStep(DataFlow::Node pred,
DataFlow::Node succ) {
module TaintedFormatConfig implements DataFlow::ConfigSig {
predicate isAdditionalFlowStep(DataFlow::Node pred, DataFlow::Node succ) {
exists (Call c, Expr arg, LocalVariable lv |
arg = c.getAnArgument() and
arg = pred.asExpr() and
@@ -153,8 +152,8 @@ Add a sanitizer, stopping propagation at parameters of formatting functions, to
.. code-block:: ql
class TaintedFormatConfig extends TaintTracking::Configuration {
override predicate isSanitizer(DataFlow::Node nd) {
module TaintedFormatConfig implements DataFlow::ConfigSig {
predicate isBarrier(DataFlow::Node nd) {
exists (FormattingFunction ff, int idx |
idx = ff.getFormatParameterIndex() and
nd = DataFlow::parameterNode(ff.getParameter(idx))

View File

@@ -71,7 +71,7 @@ Finding the RCE yourself
**Hint**: Use ``Method.getDeclaringType()`` and ``Type.getASupertype()``
#. Implement a ``DataFlow::Configuration``, defining the source as the first parameter of a ``toObject`` method, and the sink as an instance of ``UnsafeDeserializationSink``.
#. Implement a ``DataFlow::ConfigSig``, defining the source as the first parameter of a ``toObject`` method, and the sink as an instance of ``UnsafeDeserializationSink``.
**Hint**: Use ``Node::asParameter()``
@@ -114,13 +114,13 @@ Model answer, step 3
* Configuration that tracks the flow of taint from the first parameter of
* `ContentTypeHandler.toObject` to an instance of unsafe deserialization.
*/
class StrutsUnsafeDeserializationConfig extends Configuration {
StrutsUnsafeDeserializationConfig() { this = "StrutsUnsafeDeserializationConfig" }
override predicate isSource(Node source) {
module StrutsUnsafeDeserializationConfig implements ConfigSig {
predicate isSource(Node source) {
source.asParameter() = any(ContentTypeHandlerDeserialization des).getParameter(0)
}
override predicate isSink(Node sink) { sink instanceof UnsafeDeserializationSink }
predicate isSink(Node sink) { sink instanceof UnsafeDeserializationSink }
}
module StrutsUnsafeDeserializationFlow = Global<StrutsUnsafeDeserializationConfig>;
Model answer, step 4
====================
@@ -129,9 +129,8 @@ Model answer, step 4
import PathGraph
...
from PathNode source, PathNode sink, StrutsUnsafeDeserializationConfig conf
where conf.hasFlowPath(source, sink)
and sink.getNode() instanceof UnsafeDeserializationSink
select sink.getNode().(UnsafeDeserializationSink).getMethodAccess(), source, sink, "Unsafe deserialization of $@.", source, "user input"
from PathNode source, PathNode sink
where StrutsUnsafeDeserializationFlow::flowPath(source, sink)
select sink.getNode().(UnsafeDeserializationSink).getMethodAccess(), source, sink, "Unsafe deserialization of $@.", source, "user input"
More full-featured version: https://github.com/github/securitylab/tree/main/CodeQL_Queries/java/Apache_Struts_CVE-2017-9805

View File

@@ -78,12 +78,12 @@ We want to look for method calls where the method name is ``getNamespace()``, an
import semmle.code.java.security.Security
class TaintedOGNLConfig extends TaintTracking::Configuration {
override predicate isSource(DataFlow::Node source) {
module TaintedOGNLConfig implements DataFlow::ConfigSig {
predicate isSource(DataFlow::Node source) {
exists(Method m |
m.getName() = "getNamespace" and
m.getDeclaringType().getName() = "ActionProxy" and
source.asExpr() = m.getAReference()
m.getName() = "getNamespace" and
m.getDeclaringType().getName() = "ActionProxy" and
source.asExpr() = m.getAReference()
)
}
...
@@ -105,8 +105,8 @@ Fill in the definition of ``isSink``.
import semmle.code.java.security.Security
class TaintedOGNLConfig extends TaintTracking::Configuration {
override predicate isSink(DataFlow::Node sink) {
module TaintedOGNLConfig implements DataFlow::ConfigSig {
predicate isSink(DataFlow::Node sink) {
/* Fill me in */
}
...
@@ -125,9 +125,9 @@ Find a method access to ``compileAndExecute``, and mark the first argument.
import semmle.code.java.security.Security
class TaintedOGNLConfig extends TaintTracking::Configuration {
override predicate isSink(DataFlow::Node sink) {
exists(MethodAccess ma |
module TaintedOGNLConfig implements DataFlow::ConfigSig {
predicate isSink(DataFlow::Node sink) {
exists(MethodAccess ma |
ma.getMethod().getName() = "compileAndExecute" and
ma.getArgument(0) = sink.asExpr()
)
@@ -148,8 +148,8 @@ A sanitizer allows us to *prevent* flow through a particular node in the graph.
.. code-block:: ql
class TaintedOGNLConfig extends TaintTracking::Configuration {
override predicate isSanitizer(DataFlow::Node nd) {
module TaintedOGNLConfig implements DataFlow::ConfigSig {
predicate isBarrier(DataFlow::Node nd) {
nd.getEnclosingCallable()
.getDeclaringType()
.getName() = "ValueStackShadowMap"
@@ -164,9 +164,8 @@ Add an additional taint step that (heuristically) taints a local variable if it
.. code-block:: ql
class TaintedOGNLConfig extends TaintTracking::Configuration {
override predicate isAdditionalTaintStep(DataFlow::Node node1,
DataFlow::Node node2) {
module TaintedOGNLConfig implements DataFlow::ConfigSig {
predicate isAdditionalFlowStep(DataFlow::Node node1, DataFlow::Node node2) {
exists(Field f, RefType t |
node1.asExpr() = f.getAnAssignedValue() and
node2.asExpr() = f.getAnAccess() and

View File

@@ -1,14 +1,14 @@
import cpp
import semmle.code.cpp.dataflow.TaintTracking
class TaintedFormatConfig extends TaintTracking::Configuration {
TaintedFormatConfig() { this = "TaintedFormatConfig" }
module TaintedFormatConfig implements DataFlow::ConfigSig {
predicate isSource(DataFlow::Node source) { /* TBD */ }
override predicate isSource(DataFlow::Node source) { /* TBD */ }
override predicate isSink(DataFlow::Node sink) { /* TBD */ }
predicate isSink(DataFlow::Node sink) { /* TBD */ }
}
from TaintedFormatConfig cfg, DataFlow::Node source, DataFlow::Node sink
where cfg.hasFlow(source, sink)
module TaintedFormatFlow = TaintTracking::Global<TaintedFormatConfig>;
from DataFlow::Node source, DataFlow::Node sink
where TaintedFormatFlow::flow(source, sink)
select sink, "This format string may be derived from a $@.", source, "user-controlled value"

View File

@@ -1,14 +1,14 @@
import java
import semmle.code.java.dataflow.TaintTracking
class TaintedOGNLConfig extends TaintTracking::Configuration {
TaintedOGNLConfig() { this = "TaintedOGNLConfig" }
module TaintedOGNLConfig implements DataFlow::ConfigSig {
predicate isSource(DataFlow::Node source) { /* TBD */ }
override predicate isSource(DataFlow::Node source) { /* TBD */ }
override predicate isSink(DataFlow::Node sink) { /* TBD */ }
predicate isSink(DataFlow::Node sink) { /* TBD */ }
}
from TaintedOGNLConfig cfg, DataFlow::Node source, DataFlow::Node sink
where cfg.hasFlow(source, sink)
module TaintedOGNLFlow = TaintTracking::Global<TaintedOGNLConfig>;
from DataFlow::Node source, DataFlow::Node sink
where TaintedOGNLFlow::flow(source, sink)
select source, "This untrusted input is evaluated as an OGNL expression $@.", sink, "here"

View File

@@ -34,18 +34,18 @@ Global taint tracking library
The ``semmle.code.<language>.dataflow.TaintTracking`` library provides a framework for implementing solvers for global taint tracking problems:
#. Subclass ``TaintTracking::Configuration`` following this template:
#. Implement ``DataFlow::ConfigSig`` and use ``TaintTracking::Global`` following this template:
.. code-block:: ql
class Config extends TaintTracking::Configuration {
Config() { this = "<some unique identifier>" }
override predicate isSource(DataFlow::Node nd) { ... }
override predicate isSink(DataFlow::Node nd) { ... }
module Config implements DataFlow::ConfigSig {
predicate isSource(DataFlow::Node nd) { ... }
predicate isSink(DataFlow::Node nd) { ... }
}
module Flow = TaintTracking::Global<Config>;
#. Use ``Config.hasFlow(source, sink)`` to find inter-procedural paths.
#. Use ``Flow::flow(source, sink)`` to find inter-procedural paths.
.. note::
In addition to the taint tracking configuration described here, there is also an equivalent *data flow* configuration in ``semmle.code.<language>.dataflow.DataFlow``, ``DataFlow::Configuration``. Data flow configurations are used to track whether the exact value produced by a source is used by a sink, whereas taint tracking configurations are used to determine whether the source may influence the value used at the sink. Whether you use taint tracking or data flow depends on the analysis problem you are trying to solve.
In addition to the taint tracking flow described here, there is also an equivalent *data flow* module in ``semmle.code.<language>.dataflow.DataFlow``: ``DataFlow::Global<DataFlow::ConfigSig>``. Data flow is used to track whether the exact value produced by a source is used by a sink, whereas taint tracking is used to determine whether the source may influence the value used at the sink. Whether you use taint tracking or data flow depends on the analysis problem you are trying to solve.
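
To make the difference concrete, here is a minimal sketch for Java that applies both modules to one configuration. The source and sink definitions are borrowed from the string-literal-to-``java.net.URL`` exercise in the Java data flow documentation; the module names ``ValueFlow`` and ``TaintFlow`` are hypothetical.

.. code-block:: ql

   import java
   import semmle.code.java.dataflow.DataFlow
   import semmle.code.java.dataflow.TaintTracking

   module Config implements DataFlow::ConfigSig {
     // Sources: string literals.
     predicate isSource(DataFlow::Node nd) { nd.asExpr() instanceof StringLiteral }

     // Sinks: the first argument of a java.net.URL constructor call.
     predicate isSink(DataFlow::Node nd) {
       exists(Call call |
         nd.asExpr() = call.getArgument(0) and
         call.getCallee().(Constructor).getDeclaringType().hasQualifiedName("java.net", "URL")
       )
     }
   }

   // The same configuration, applied twice.
   module ValueFlow = DataFlow::Global<Config>; // exact value reaches the sink
   module TaintFlow = TaintTracking::Global<Config>; // source may influence the sink

   // Sinks that are influenced by a literal without receiving its exact value.
   from DataFlow::Node source, DataFlow::Node sink
   where TaintFlow::flow(source, sink) and not ValueFlow::flow(source, sink)
   select sink, "This URL argument is derived from $@.", source, "this string literal"

Because the configuration is a plain module, reusing it for both analyses takes only one extra line.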

View File

@@ -13,12 +13,16 @@ Use this template:
*/
import semmle.code.<language>.dataflow.TaintTracking
import DataFlow::PathGraph
...
from Configuration cfg, DataFlow::PathNode source, DataFlow::PathNode sink
where cfg.hasFlowPath(source, sink)
module Flow = TaintTracking::Global<Configuration>;
import Flow::PathGraph
from Flow::PathNode source, Flow::PathNode sink
where Flow::flowPath(source, sink)
select sink, source, sink, "<message>"
.. note::
To see the paths between the source and the sinks, we can convert the query to a path problem query. There are a few minor changes that need to be made for this to work: we need an additional import, to specify ``PathNode`` rather than ``Node``, and to add the source/sink to the query output (so that we can automatically determine the paths).
To see the paths between the source and the sinks, we can convert the query to a path problem query. There are a few minor changes that need to be made for this to work: we need an additional import, to specify ``PathNode`` rather than ``Node``, and to add the source/sink to the query output (so that we can automatically determine the paths).

View File

@@ -58,18 +58,22 @@ You should use the following template:
import <language>
// For some languages (Java/C++/Python/Swift) you need to explicitly import the data flow library, such as
// import semmle.code.java.dataflow.DataFlow or import codeql.swift.dataflow.DataFlow
import DataFlow::PathGraph
...
from MyConfiguration config, DataFlow::PathNode source, DataFlow::PathNode sink
where config.hasFlowPath(source, sink)
module Flow = DataFlow::Global<MyConfiguration>;
import Flow::PathGraph
from Flow::PathNode source, Flow::PathNode sink
where Flow::flowPath(source, sink)
select sink.getNode(), source, sink, "<message>"
Where:
- ``DataFlow::Pathgraph`` is the path graph module you need to import from the standard CodeQL libraries.
- ``source`` and ``sink`` are nodes on the `path graph <https://en.wikipedia.org/wiki/Path_graph>`__, and ``DataFlow::PathNode`` is their type.
- ``MyConfiguration`` is a class containing the predicates which define how data may flow between the ``source`` and the ``sink``.
- ``MyConfiguration`` is a module containing the predicates that define how data may flow between the ``source`` and the ``sink``.
- ``Flow`` is the result of the data flow computation based on ``MyConfiguration``.
- ``Flow::PathGraph`` is the resulting data flow graph module you need to import in order to include path explanations in the query.
- ``source`` and ``sink`` are nodes in the graph as defined in the configuration, and ``Flow::PathNode`` is their type.
- ``DataFlow::Global<..>`` is an invocation of data flow. ``TaintTracking::Global<..>`` can be used instead to include a default set of additional taint steps.
The following sections describe the main requirements for a valid path query.
@@ -83,14 +87,14 @@ The other metadata requirements depend on how you intend to run the query. For m
Generating path explanations
****************************
In order to generate path explanations, your query needs to compute a `path graph <https://en.wikipedia.org/wiki/Path_graph>`__.
In order to generate path explanations, your query needs to compute a graph.
To do this you need to define a :ref:`query predicate <query-predicates>` called ``edges`` in your query.
This predicate defines the edge relations of the graph you are computing, and it is used to compute the paths related to each result that your query generates.
You can import a predefined ``edges`` predicate from a path graph module in one of the standard data flow libraries. In addition to the path graph module, the data flow libraries contain the other ``classes``, ``predicates``, and ``modules`` that are commonly used in data flow analysis.
.. code-block:: ql
import DataFlow::PathGraph
import MyFlow::PathGraph
This statement imports the ``PathGraph`` module from the data flow library (``DataFlow.qll``), in which ``edges`` is defined.
@@ -106,7 +110,7 @@ You can also define your own ``edges`` predicate in the body of your query. It s
.. code-block:: ql
query predicate edges(PathNode a, PathNode b) {
/** Logical conditions which hold if `(a,b)` is an edge in the data flow graph */
/* Logical conditions which hold if `(a,b)` is an edge in the data flow graph */
}
For more examples of how to define an ``edges`` predicate, visit the `standard CodeQL libraries <https://codeql.github.com/codeql-standard-libraries>`__ and search for ``edges``.
@@ -117,14 +121,23 @@ Declaring sources and sinks
You must provide information about the ``source`` and ``sink`` in your path query. These are objects that correspond to the nodes of the paths that you are exploring.
The name and the type of the ``source`` and the ``sink`` must be declared in the ``from`` statement of the query, and the types must be compatible with the nodes of the graph computed by the ``edges`` predicate.
If you are querying C/C++, C#, Go, Java, JavaScript, Python, or Ruby code (and you have used ``import DataFlow::PathGraph`` in your query), the definitions of the ``source`` and ``sink`` are accessed via the ``Configuration`` class in the data flow library. You should declare all three of these objects in the ``from`` statement.
If you are querying C/C++, C#, Go, Java, JavaScript, Python, or Ruby code (and you have used ``import MyFlow::PathGraph`` in your query), the definitions of the ``source`` and ``sink`` are accessed via the module resulting from the application of the ``Global<..>`` module in the data flow library. You should declare both of these objects in the ``from`` statement.
For example:
.. code-block:: ql
from Configuration config, DataFlow::PathNode source, DataFlow::PathNode sink
module MyFlow = DataFlow::Global<MyConfiguration>;
The configuration class is accessed by importing the data flow library. This class contains the predicates which define how data flow is treated in the query:
from MyFlow::PathNode source, MyFlow::PathNode sink
The configuration module must be defined to include definitions of sources and sinks. For example:
.. code-block:: ql
module MyConfiguration implements DataFlow::ConfigSig {
predicate isSource(DataFlow::Node source) { ... }
predicate isSink(DataFlow::Node sink) { ... }
}
- ``isSource()`` defines where data may flow from.
- ``isSink()`` defines where data may flow to.
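
The pieces described above can be combined into a complete path query. The following is a minimal sketch for Java, not a definitive implementation: the method names ``handle`` and ``exec`` are hypothetical placeholders chosen for illustration, and a real query would also need the remaining metadata properties (for example ``@id``) for the way you intend to run it.

.. code-block:: ql

   /**
    * @name Example path query (illustrative sketch)
    * @kind path-problem
    */

   import java
   import semmle.code.java.dataflow.DataFlow

   module MyConfiguration implements DataFlow::ConfigSig {
     // Assumption: parameters of any callable named "handle" are the sources.
     predicate isSource(DataFlow::Node source) {
       source.asParameter().getCallable().hasName("handle")
     }

     // Assumption: arguments of any call to a callable named "exec" are the sinks.
     predicate isSink(DataFlow::Node sink) {
       exists(Call c |
         c.getCallee().hasName("exec") and
         sink.asExpr() = c.getAnArgument()
       )
     }
   }

   // Apply the global data flow module to the configuration.
   module MyFlow = DataFlow::Global<MyConfiguration>;

   // Provides the `edges` predicate used to compute path explanations.
   import MyFlow::PathGraph

   from MyFlow::PathNode source, MyFlow::PathNode sink
   where MyFlow::flowPath(source, sink)
   select sink.getNode(), source, sink, "Value flows here from $@.", source.getNode(), "this parameter"

Replacing ``DataFlow::Global`` with ``TaintTracking::Global`` (and importing the taint tracking library) would track influence on the value rather than the exact value.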
@@ -141,11 +154,11 @@ This clause can use :ref:`aggregations <aggregations>`, :ref:`predicates <predic
When writing a path query, you would typically include a predicate that holds only if data flows from the ``source`` to the ``sink``.
You can use the ``hasFlowPath`` predicate to specify flow from the ``source`` to the ``sink`` for a given ``Configuration``:
You can use the ``flowPath`` predicate to specify flow from the ``source`` to the ``sink`` for a given ``Configuration``:
.. code-block:: ql
where config.hasFlowPath(source, sink)
where MyFlow::flowPath(source, sink)
Select clause

@@ -11,24 +11,24 @@ A typical data-flow query looks like this:
.. code-block:: ql
class MyConfig extends TaintTracking::Configuration {
MyConfig() { this = "MyConfig" }
module MyConfig implements DataFlow::ConfigSig {
predicate isSource(DataFlow::Node node) { node instanceof MySource }
override predicate isSource(DataFlow::Node node) { node instanceof MySource }
override predicate isSink(DataFlow::Node node) { node instanceof MySink }
predicate isSink(DataFlow::Node node) { node instanceof MySink }
}
from MyConfig config, DataFlow::PathNode source, DataFlow::PathNode sink
where config.hasFlowPath(source, sink)
module MyFlow = TaintTracking::Global<MyConfig>;
from MyFlow::PathNode source, MyFlow::PathNode sink
where MyFlow::flowPath(source, sink)
select sink.getNode(), source, sink, "Sink is reached from $@.", source.getNode(), "here"
The same query can be slightly simplified by rewriting it without :ref:`path explanations <creating-path-queries>`:
.. code-block:: ql
from MyConfig config, DataFlow::Node source, DataFlow::Node sink
where config.hasPath(source, sink)
from DataFlow::Node source, DataFlow::Node sink
where MyFlow::flow(source, sink)
select sink, "Sink is reached from $@.", source, "here"
If a data-flow query that you have written doesn't produce the results you expect it to, there may be a problem with your query.
@@ -48,7 +48,7 @@ Data-flow configurations contain a parameter called ``fieldFlowBranchLimit``. If
.. code-block:: ql
override int fieldFlowBranchLimit() { result = 5000 }
int fieldFlowBranchLimit() { result = 5000 }
If there are still no results and performance is still useable, then it is best to leave this set to a high value while doing further debugging.
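
As a sketch of where this predicate lives under the module-based API, the limit can be declared alongside ``isSource`` and ``isSink`` in the configuration module. ``MySource`` and ``MySink`` are the placeholder classes from the example at the top of this page, and the value ``5000`` is only a debugging setting, not a recommended default.

.. code-block:: ql

   module MyConfig implements DataFlow::ConfigSig {
     predicate isSource(DataFlow::Node node) { node instanceof MySource }

     predicate isSink(DataFlow::Node node) { node instanceof MySink }

     // Temporarily raise the limit while debugging missing results.
     int fieldFlowBranchLimit() { result = 5000 }
   }

   module MyFlow = TaintTracking::Global<MyConfig>;

Once the missing flow has been found, it is usually worth removing the predicate again so that the default limit applies.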
@@ -57,7 +57,7 @@ Partial flow
A naive next step could be to change the sink definition to ``any()``. This would mean that we would get a lot of flow to all the places that are reachable from the sources. While this approach may work in some cases, you might find that it produces so many results that it's very hard to explore the findings. It can also dramatically affect query performance. More importantly, you might not even see all the partial flow paths. This is because the data-flow library tries very hard to prune impossible paths and, since field stores and reads must be evenly matched along a path, we will never see paths going through a store that fail to reach a corresponding read. This can make it hard to see where flow actually stops.
To avoid these problems, a data-flow ``Configuration`` comes with a mechanism for exploring partial flow that tries to deal with these caveats. This is the ``Configuration.hasPartialFlow`` predicate:
To avoid these problems, the data-flow library comes with a mechanism for exploring partial flow that tries to deal with these caveats. This is the ``MyFlow::FlowExploration<explorationLimit/0>::partialFlow`` predicate:
.. code-block:: ql
@@ -71,25 +71,23 @@ To avoid these problems, a data-flow ``Configuration`` comes with a mechanism fo
* perform poorly if the number of sources is too big and/or the exploration
* limit is set too high without using barriers.
*
* This predicate is disabled (has no results) by default. Override
* `explorationLimit()` with a suitable number to enable this predicate.
*
* To use this in a `path-problem` query, import the module `PartialPathGraph`.
*/
final predicate hasPartialFlow(PartialPathNode source, PartialPathNode node, int dist) {
predicate partialFlow(PartialPathNode source, PartialPathNode node, int dist) {
There is also a ``Configuration.hasPartialFlowRev`` for exploring flow backwards from a sink.
There is also a ``partialFlowRev`` for exploring flow backwards from a sink.
As noted in the documentation for ``hasPartialFlow`` (for example, in the
`CodeQL for Java documentation <https://codeql.github.com/codeql-standard-libraries/java/semmle/code/java/dataflow/internal/DataFlowImpl2.qll/predicate.DataFlowImpl2$Configuration$hasPartialFlow.3.html>`__) you must first enable this by adding an override of ``explorationLimit``. For example:
To get access to these predicates you must instantiate the ``MyFlow::FlowExploration<>`` module with an exploration limit. For example:
.. code-block:: ql
override int explorationLimit() { result = 5 }
int explorationLimit() { result = 5 }
This defines the exploration radius within which ``hasPartialFlow`` returns results.
module MyPartialFlow = MyFlow::FlowExploration<explorationLimit/0>;
To get good performance when using ``hasPartialFlow`` it is important to ensure the ``isSink`` predicate of the configuration has no results. Likewise, when using ``hasPartialFlowRev`` the ``isSource`` predicate of the configuration should have no results.
This defines the exploration radius within which ``partialFlow`` returns results.
To get good performance when using ``partialFlow`` it is important to ensure the ``isSink`` predicate of the configuration has no results. Likewise, when using ``partialFlowRev`` the ``isSource`` predicate of the configuration should have no results.
It is also useful to focus on a single source at a time as the starting point for the flow exploration. This is most easily done by adding a temporary restriction in the ``isSource`` predicate.
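
Putting these recommendations together, a partial flow exploration could look roughly like the following sketch for Java. The source restriction (parameters of a callable named ``handle``) is a hypothetical placeholder, and the predicate names follow the API as described on this page.

.. code-block:: ql

   import java
   import semmle.code.java.dataflow.DataFlow

   module MyConfig implements DataFlow::ConfigSig {
     // Assumption: temporarily restrict the sources to a single starting point.
     predicate isSource(DataFlow::Node source) {
       source.asParameter().getCallable().hasName("handle")
     }

     // No sinks: forward exploration performs best when `isSink` has no results.
     predicate isSink(DataFlow::Node sink) { none() }
   }

   module MyFlow = DataFlow::Global<MyConfig>;

   // The exploration radius within which `partialFlow` returns results.
   int explorationLimit() { result = 5 }

   module MyPartialFlow = MyFlow::FlowExploration<explorationLimit/0>;

   from MyPartialFlow::PartialPathNode source, MyPartialFlow::PartialPathNode node, int dist
   where MyPartialFlow::partialFlow(source, node, dist)
   select node.getNode(), source.getNode(), dist

Sorting the results by ``dist`` gives a rough picture of how far flow gets from the chosen source before it stops.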
@@ -97,9 +95,9 @@ To do quick evaluations of partial flow it is often easiest to add a predicate t
.. code-block:: ql
predicate adhocPartialFlow(Callable c, PartialPathNode n, Node src, int dist) {
exists(MyConfig conf, PartialPathNode source |
conf.hasPartialFlow(source, n, dist) and
predicate adhocPartialFlow(Callable c, MyPartialFlow::PartialPathNode n, Node src, int dist) {
exists(MyPartialFlow::PartialPathNode source |
MyPartialFlow::partialFlow(source, n, dist) and
src = source.getNode() and
c = n.getNode().getEnclosingCallable()
)
@@ -111,7 +109,7 @@ If you are focusing on a single source then the ``src`` column is superfluous. Y
If you see a large number of partial flow results, you can focus them in a couple of ways:
- If flow travels a long distance following an expected path, that can result in a lot of uninteresting flow being included in the exploration radius. To reduce the amount of uninteresting flow, you can replace the source definition with a suitable ``node`` that appears along the path and restart the partial flow exploration from that point.
- Creative use of barriers and sanitizers can be used to cut off flow paths that are uninteresting. This also reduces the number of partial flow results to explore while debugging.
- Creative use of barriers can cut off flow paths that are uninteresting. This also reduces the number of partial flow results to explore while debugging.
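
As an illustration of the second point, a temporary barrier can be added to the configuration through the optional ``isBarrier`` predicate. This is a sketch for Java under the assumption that values of primitive or boxed type are uninteresting for the problem being debugged; ``MySource`` is the placeholder class used earlier on this page.

.. code-block:: ql

   module MyConfig implements DataFlow::ConfigSig {
     predicate isSource(DataFlow::Node node) { node instanceof MySource }

     // No sinks while exploring partial flow forwards.
     predicate isSink(DataFlow::Node node) { none() }

     // Assumption: flow through primitive and boxed values is not of interest,
     // so it is cut off to shrink the set of partial flow results.
     predicate isBarrier(DataFlow::Node node) {
       node.getType() instanceof PrimitiveType or
       node.getType() instanceof BoxedType
     }
   }

Barriers added purely for debugging should be removed again before the query is used for real results, since they also remove genuine flow.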
Further reading
----------------

@@ -72,14 +72,12 @@ Importing new files can modify the behaviour of the standard library, by introdu
Therefore, unless you have good reason not to, you should ensure that all subclasses are included when the base-class is (to the extent possible).
One example where this _does not_ apply: `DataFlow::Configuration` and its variants are meant to be subclassed, but we generally do not want to import all configurations into the same scope at once.
## Abstract classes as open or closed unions
A class declared as `abstract` in QL represents a union of its direct subtypes (restricted by the intersections of its supertypes and subject to its characteristic predicate). Depending on context, we may want this union to be considered "open" or "closed".
An open union is generally used for extensibility. For example, the abstract classes suggested by the `::Range` design pattern are explicitly intended as extension hooks. As another example, the `DataFlow::Configuration` design pattern provides an abstract class that is intended to be subclassed as a configuration mechanism.
An open union is generally used for extensibility. For example, the abstract classes suggested by the `::Range` design pattern are explicitly intended as extension hooks.
A closed union is a class for which we do not expect users of the library to add more values. Historically, we have occasionally modelled this as `abstract` classes in QL, but these days that would be considered an anti-pattern: Abstract classes that are intended to be closed behave in surprising ways when subclassed by library users, and importing libraries that include derived classes can invalidate compilation caches and subvert the meaning of the program.