mirror of
https://github.com/github/codeql.git
synced 2025-12-16 16:53:25 +01:00
Merge pull request #18467 from github/js/shared-dataflow-branch
JS: Migrate to shared data flow library (targeting main!) 🚀
@@ -204,58 +204,45 @@ data flow solver that can check whether there is (global) data flow from a sourc
|
||||
Optionally, configurations may specify extra data flow edges to be added to the data flow graph, and may also specify `barriers`. Barriers are data flow nodes or edges through
|
||||
which data should not be tracked for the purposes of this analysis.
|
||||
|
||||
To define a configuration, extend the class ``DataFlow::Configuration`` as follows:
|
||||
To define a configuration, add a module that implements the signature ``DataFlow::ConfigSig`` and pass it to ``DataFlow::Global`` as follows:
|
||||
|
||||
.. code-block:: ql
|
||||
|
||||
class MyDataFlowConfiguration extends DataFlow::Configuration {
|
||||
MyDataFlowConfiguration() { this = "MyDataFlowConfiguration" }
|
||||
module MyAnalysisConfig implements DataFlow::ConfigSig {
|
||||
predicate isSource(DataFlow::Node source) { /* ... */ }
|
||||
|
||||
override predicate isSource(DataFlow::Node source) { /* ... */ }
|
||||
predicate isSink(DataFlow::Node sink) { /* ... */ }
|
||||
|
||||
override predicate isSink(DataFlow::Node sink) { /* ... */ }
|
||||
|
||||
// optional overrides:
|
||||
override predicate isBarrier(DataFlow::Node nd) { /* ... */ }
|
||||
override predicate isBarrierEdge(DataFlow::Node pred, DataFlow::Node succ) { /* ... */ }
|
||||
override predicate isAdditionalFlowStep(DataFlow::Node pred, DataFlow::Node succ) { /* ... */ }
|
||||
// optional predicates:
|
||||
predicate isBarrier(DataFlow::Node nd) { /* ... */ }
|
||||
predicate isAdditionalFlowStep(DataFlow::Node pred, DataFlow::Node succ) { /* ... */ }
|
||||
}
|
||||
|
||||
The characteristic predicate ``MyDataFlowConfiguration()`` defines the name of the configuration, so ``"MyDataFlowConfiguration"`` should be replaced by a suitable
|
||||
name describing your particular analysis configuration.
|
||||
module MyAnalysisFlow = DataFlow::Global<MyAnalysisConfig>;
|
||||
|
||||
The data flow analysis is performed using the predicate ``hasFlow(source, sink)``:
|
||||
The data flow analysis is performed using the predicate ``MyAnalysisFlow::flow(source, sink)``:
|
||||
|
||||
.. code-block:: ql
|
||||
|
||||
from MyDataFlowConfiguration dataflow, DataFlow::Node source, DataFlow::Node sink
|
||||
where dataflow.hasFlow(source, sink)
|
||||
from DataFlow::Node source, DataFlow::Node sink
|
||||
where MyAnalysisFlow::flow(source, sink)
|
||||
select source, "Data flow from $@ to $@.", source, source.toString(), sink, sink.toString()
|
||||
|
||||
Using global taint tracking
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
Global taint tracking extends global data flow with additional non-value-preserving steps, such as flow through string-manipulating operations. To use it, simply extend
|
||||
``TaintTracking::Configuration`` instead of ``DataFlow::Configuration``:
|
||||
Global taint tracking extends global data flow with additional non-value-preserving steps, such as flow through string-manipulating operations. To use it, simply
|
||||
use ``TaintTracking::Global<...>`` instead of ``DataFlow::Global<...>``:
|
||||
|
||||
.. code-block:: ql
|
||||
|
||||
class MyTaintTrackingConfiguration extends TaintTracking::Configuration {
|
||||
MyTaintTrackingConfiguration() { this = "MyTaintTrackingConfiguration" }
|
||||
|
||||
override predicate isSource(DataFlow::Node source) { /* ... */ }
|
||||
|
||||
override predicate isSink(DataFlow::Node sink) { /* ... */ }
|
||||
module MyAnalysisConfig implements DataFlow::ConfigSig {
|
||||
/* ... */
|
||||
}
|
||||
|
||||
Analogous to ``isAdditionalFlowStep``, there is a predicate ``isAdditionalTaintStep`` that you can override to specify custom flow steps to consider in the analysis.
|
||||
Instead of the ``isBarrier`` and ``isBarrierEdge`` predicates, the taint tracking configuration includes ``isSanitizer`` and ``isSanitizerEdge`` predicates that specify
|
||||
data flow nodes or edges that act as taint sanitizers and hence stop flow from a source to a sink.
|
||||
module MyAnalysisFlow = TaintTracking::Global<MyAnalysisConfig>;
|
||||
|
||||
Similar to global data flow, the characteristic predicate ``MyTaintTrackingConfiguration()`` defines the unique name of the configuration, so ``"MyTaintTrackingConfiguration"``
|
||||
should be replaced by an appropriate descriptive name.
|
||||
|
||||
The taint tracking analysis is again performed using the predicate ``hasFlow(source, sink)``.
|
||||
The taint tracking analysis is again performed using the predicate ``MyAnalysisFlow::flow(source, sink)``.
|
||||
|
||||
Examples
|
||||
~~~~~~~~
|
||||
@@ -267,20 +254,20 @@ time using global taint tracking.
|
||||
|
||||
import javascript
|
||||
|
||||
class CommandLineFileNameConfiguration extends TaintTracking::Configuration {
|
||||
CommandLineFileNameConfiguration() { this = "CommandLineFileNameConfiguration" }
|
||||
|
||||
override predicate isSource(DataFlow::Node source) {
|
||||
module CommandLineFileNameConfig implements DataFlow::ConfigSig {
|
||||
predicate isSource(DataFlow::Node source) {
|
||||
DataFlow::globalVarRef("process").getAPropertyRead("argv").getAPropertyRead() = source
|
||||
}
|
||||
|
||||
override predicate isSink(DataFlow::Node sink) {
|
||||
predicate isSink(DataFlow::Node sink) {
|
||||
DataFlow::moduleMember("fs", "readFile").getACall().getArgument(0) = sink
|
||||
}
|
||||
}
|
||||
|
||||
from CommandLineFileNameConfiguration cfg, DataFlow::Node source, DataFlow::Node sink
|
||||
where cfg.hasFlow(source, sink)
|
||||
module CommandLineFileNameFlow = TaintTracking::Global<CommandLineFileNameConfig>;
|
||||
|
||||
from DataFlow::Node source, DataFlow::Node sink
|
||||
where CommandLineFileNameFlow::flow(source, sink)
|
||||
select source, sink
|
||||
|
||||
This query will now find flows that involve inter-procedural steps, like in the following example (where the individual steps have been marked with comments
|
||||
@@ -325,15 +312,15 @@ with an error if it does not. We could then use that function in ``readFileHelpe
|
||||
}
|
||||
|
||||
For the purposes of our above analysis, ``checkPath`` is a `sanitizer`: its output is always untainted, even if its input is tainted. To model this
|
||||
we can add an override of ``isSanitizer`` to our taint-tracking configuration like this:
|
||||
we can add an ``isBarrier`` predicate to our taint-tracking configuration like this:
|
||||
|
||||
.. code-block:: ql
|
||||
|
||||
class CommandLineFileNameConfiguration extends TaintTracking::Configuration {
|
||||
module CommandLineFileNameConfig implements DataFlow::ConfigSig {
|
||||
|
||||
// ...
|
||||
|
||||
override predicate isSanitizer(DataFlow::Node nd) {
|
||||
predicate isBarrier(DataFlow::Node nd) {
|
||||
nd.(DataFlow::CallNode).getCalleeName() = "checkPath"
|
||||
}
|
||||
}
|
||||
@@ -359,36 +346,36 @@ Note that ``checkPath`` is now no longer a sanitizer in the sense described abov
|
||||
through ``checkPath`` any more. The flow is, however, `guarded` by ``checkPath`` in the sense that the expression ``checkPath(p)`` has to evaluate
|
||||
to ``true`` (or, more precisely, to a truthy value) in order for the flow to happen.
|
||||
|
||||
Such sanitizer guards can be supported by defining a new subclass of ``TaintTracking::SanitizerGuardNode`` and overriding the predicate
|
||||
``isSanitizerGuard`` in the taint-tracking configuration class to add all instances of this class as sanitizer guards to the configuration.
|
||||
Such sanitizer guards can be supported by defining a class with a ``blocksExpr`` predicate and using the ``DataFlow::MakeBarrierGuard`` module
|
||||
to implement the ``isBarrier`` predicate.
|
||||
|
||||
For our above example, we would begin by defining a subclass of ``SanitizerGuardNode`` that identifies guards of the form ``checkPath(...)``:
|
||||
For our above example, we would begin by defining a subclass of ``DataFlow::CallNode`` that identifies guards of the form ``checkPath(...)``:
|
||||
|
||||
.. code-block:: ql
|
||||
|
||||
class CheckPathSanitizerGuard extends TaintTracking::SanitizerGuardNode, DataFlow::CallNode {
|
||||
class CheckPathSanitizerGuard extends DataFlow::CallNode {
|
||||
CheckPathSanitizerGuard() { this.getCalleeName() = "checkPath" }
|
||||
|
||||
override predicate sanitizes(boolean outcome, Expr e) {
|
||||
predicate blocksExpr(boolean outcome, Expr e) {
|
||||
outcome = true and
|
||||
e = getArgument(0).asExpr()
|
||||
e = this.getArgument(0).asExpr()
|
||||
}
|
||||
}
|
||||
|
||||
The characteristic predicate of this class checks that the sanitizer guard is a call to a function named ``checkPath``. The overriding definition
|
||||
of ``sanitizes`` says such a call sanitizes its first argument (that is, ``getArgument(0)``) if it evaluates to ``true`` (or rather, a truthy
|
||||
The characteristic predicate of this class checks that the sanitizer guard is a call to a function named ``checkPath``. The definition
|
||||
of ``blocksExpr`` says such a call sanitizes its first argument (that is, ``getArgument(0)``) if it evaluates to ``true`` (or rather, a truthy
|
||||
value).
|
||||
|
||||
Now we can override ``isSanitizerGuard`` to add these sanitizer guards to our configuration:
|
||||
Now we can implement ``isBarrier`` to add this sanitizer guard to our configuration:
|
||||
|
||||
.. code-block:: ql
|
||||
|
||||
class CommandLineFileNameConfiguration extends TaintTracking::Configuration {
|
||||
module CommandLineFileNameConfig implements DataFlow::ConfigSig {
|
||||
|
||||
// ...
|
||||
|
||||
override predicate isSanitizerGuard(TaintTracking::SanitizerGuardNode nd) {
|
||||
nd instanceof CheckPathSanitizerGuard
|
||||
predicate isBarrier(DataFlow::Node node) {
|
||||
node = DataFlow::MakeBarrierGuard<CheckPathSanitizerGuard>::getABarrierNode()
|
||||
}
|
||||
}
|
||||
|
||||
@@ -399,7 +386,7 @@ reach there if ``checkPath(p)`` evaluates to a truthy value. Consequently, there
|
||||
Additional taint steps
|
||||
~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
Sometimes the default data flow and taint steps provided by ``DataFlow::Configuration`` and ``TaintTracking::Configuration`` are not sufficient
|
||||
Sometimes the default data flow and taint steps provided by the data flow library are not sufficient
|
||||
and we need to add additional flow or taint steps to our configuration to make it find the expected flow. For example, this can happen because
|
||||
the analyzed program uses a function from an external library whose source code is not available to the analysis, or because it uses a function
|
||||
that is too difficult to analyze.
|
||||
@@ -420,20 +407,20 @@ to resolve any symlinks in the path ``p`` before passing it to ``readFile``:
|
||||
Resolving symlinks does not make an unsafe path any safer, so we would still like our query to flag this, but since the standard library does
|
||||
not have a model of ``resolve-symlinks`` it will no longer return any results.
|
||||
|
||||
We can fix this quite easily by adding an overriding definition of the ``isAdditionalTaintStep`` predicate to our configuration, introducing an
|
||||
We can fix this quite easily by adding a definition of the ``isAdditionalFlowStep`` predicate to our configuration, introducing an
|
||||
additional taint step from the first argument of ``resolveSymlinks`` to its result:
|
||||
|
||||
.. code-block:: ql
|
||||
|
||||
class CommandLineFileNameConfiguration extends TaintTracking::Configuration {
|
||||
module CommandLineFileNameConfig implements DataFlow::ConfigSig {
|
||||
|
||||
// ...
|
||||
|
||||
override predicate isAdditionalTaintStep(DataFlow::Node pred, DataFlow::Node succ) {
|
||||
predicate isAdditionalFlowStep(DataFlow::Node node1, DataFlow::Node node2) {
|
||||
exists(DataFlow::CallNode c |
|
||||
c = DataFlow::moduleImport("resolve-symlinks").getACall() and
|
||||
pred = c.getArgument(0) and
|
||||
succ = c
|
||||
node1 = c.getArgument(0) and
|
||||
node2 = c
|
||||
)
|
||||
}
|
||||
}
|
||||
@@ -444,11 +431,11 @@ to wrap it in a new subclass of ``TaintTracking::SharedTaintStep`` like this:
|
||||
.. code-block:: ql
|
||||
|
||||
class StepThroughResolveSymlinks extends TaintTracking::SharedTaintStep {
|
||||
override predicate step(DataFlow::Node pred, DataFlow::Node succ) {
|
||||
override predicate step(DataFlow::Node node1, DataFlow::Node node2) {
|
||||
exists(DataFlow::CallNode c |
|
||||
c = DataFlow::moduleImport("resolve-symlinks").getACall() and
|
||||
pred = c.getArgument(0) and
|
||||
succ = c
|
||||
node1 = c.getArgument(0) and
|
||||
node2 = c
|
||||
)
|
||||
}
|
||||
}
|
||||
@@ -494,18 +481,18 @@ Exercise 2
|
||||
|
||||
import javascript
|
||||
|
||||
class HardCodedTagNameConfiguration extends DataFlow::Configuration {
|
||||
HardCodedTagNameConfiguration() { this = "HardCodedTagNameConfiguration" }
|
||||
module HardCodedTagNameConfig implements DataFlow::ConfigSig {
|
||||
predicate isSource(DataFlow::Node source) { source.asExpr() instanceof ConstantString }
|
||||
|
||||
override predicate isSource(DataFlow::Node source) { source.asExpr() instanceof ConstantString }
|
||||
|
||||
override predicate isSink(DataFlow::Node sink) {
|
||||
predicate isSink(DataFlow::Node sink) {
|
||||
sink = DataFlow::globalVarRef("document").getAMethodCall("createElement").getArgument(0)
|
||||
}
|
||||
}
|
||||
|
||||
from HardCodedTagNameConfiguration cfg, DataFlow::Node source, DataFlow::Node sink
|
||||
where cfg.hasFlow(source, sink)
|
||||
module HardCodedTagNameFlow = DataFlow::Global<HardCodedTagNameConfig>;
|
||||
|
||||
from DataFlow::Node source, DataFlow::Node sink
|
||||
where HardCodedTagNameFlow::flow(source, sink)
|
||||
select source, sink
|
||||
|
||||
Exercise 3
|
||||
@@ -540,18 +527,18 @@ Exercise 4
|
||||
}
|
||||
}
|
||||
|
||||
class HardCodedTagNameConfiguration extends DataFlow::Configuration {
|
||||
HardCodedTagNameConfiguration() { this = "HardCodedTagNameConfiguration" }
|
||||
module HardCodedTagNameConfig implements DataFlow::ConfigSig {
|
||||
predicate isSource(DataFlow::Node source) { source instanceof ArrayEntryCallResult }
|
||||
|
||||
override predicate isSource(DataFlow::Node source) { source instanceof ArrayEntryCallResult }
|
||||
|
||||
override predicate isSink(DataFlow::Node sink) {
|
||||
predicate isSink(DataFlow::Node sink) {
|
||||
sink = DataFlow::globalVarRef("document").getAMethodCall("createElement").getArgument(0)
|
||||
}
|
||||
}
|
||||
|
||||
from HardCodedTagNameConfiguration cfg, DataFlow::Node source, DataFlow::Node sink
|
||||
where cfg.hasFlow(source, sink)
|
||||
module HardCodedTagNameFlow = DataFlow::Global<HardCodedTagNameConfig>;
|
||||
|
||||
from DataFlow::Node source, DataFlow::Node sink
|
||||
where HardCodedTagNameFlow::flow(source, sink)
|
||||
select source, sink
|
||||
|
||||
Further reading
|
||||
|
||||
@@ -18,6 +18,7 @@ Experiment and learn how to write effective and efficient queries for CodeQL dat
|
||||
abstract-syntax-tree-classes-for-working-with-javascript-and-typescript-programs
|
||||
data-flow-cheat-sheet-for-javascript
|
||||
customizing-library-models-for-javascript
|
||||
migrating-javascript-dataflow-queries
|
||||
|
||||
- :doc:`Basic query for JavaScript and TypeScript code <basic-query-for-javascript-code>`: Learn to write and run a simple CodeQL query.
|
||||
|
||||
@@ -37,4 +38,6 @@ Experiment and learn how to write effective and efficient queries for CodeQL dat
|
||||
|
||||
- :doc:`Data flow cheat sheet for JavaScript <data-flow-cheat-sheet-for-javascript>`: This article describes parts of the JavaScript libraries commonly used for variant analysis and in data flow queries.
|
||||
|
||||
- :doc:`Customizing library models for JavaScript <customizing-library-models-for-javascript>`: You can model frameworks and libraries that your codebase depends on using data extensions and publish them as CodeQL model packs.
|
||||
- :doc:`Customizing library models for JavaScript <customizing-library-models-for-javascript>`: You can model frameworks and libraries that your codebase depends on using data extensions and publish them as CodeQL model packs.
|
||||
|
||||
- :doc:`Migrating JavaScript dataflow queries <migrating-javascript-dataflow-queries>`: Guide on migrating data flow queries to the new data flow library.
|
||||
|
||||
@@ -700,19 +700,16 @@ The data flow graph-based analyses described so far are all intraprocedural: the
|
||||
|
||||
We distinguish here between data flow proper, and *taint tracking*: the latter not only considers value-preserving flow (such as from variable definitions to uses), but also cases where one value influences ("taints") another without determining it entirely. For example, in the assignment ``s2 = s1.substring(i)``, the value of ``s1`` influences the value of ``s2``, because ``s2`` is assigned a substring of ``s1``. In general, ``s2`` will not be assigned ``s1`` itself, so there is no data flow from ``s1`` to ``s2``, but ``s1`` still taints ``s2``.
|
||||
|
||||
It is a common pattern that we wish to specify data flow or taint analysis in terms of its *sources* (where flow starts), *sinks* (where it should be tracked), and *barriers* or *sanitizers* (where flow is interrupted). Sanitizers are very common in security analyses: for example, an analysis that tracks the flow of untrusted user input into, say, a SQL query has to keep track of code that validates the input, thereby making it safe to use. Such a validation step is an example of a sanitizer.
|
||||
It is a common pattern that we wish to specify data flow or taint analysis in terms of its *sources* (where flow starts), *sinks* (where it should be tracked), and *barriers* (also called *sanitizers*) where flow is interrupted. Sanitizers are very common in security analyses: for example, an analysis that tracks the flow of untrusted user input into, say, a SQL query has to keep track of code that validates the input, thereby making it safe to use. Such a validation step is an example of a sanitizer.
|
||||
|
||||
The classes ``DataFlow::Configuration`` and ``TaintTracking::Configuration`` allow specifying a data flow or taint analysis, respectively, by overriding the following predicates:
|
||||
A module implementing the signature ``DataFlow::ConfigSig`` may specify a data flow or taint analysis by implementing the following predicates:
|
||||
|
||||
- ``isSource(DataFlow::Node nd)`` selects all nodes ``nd`` from where flow tracking starts.
|
||||
- ``isSink(DataFlow::Node nd)`` selects all nodes ``nd`` to which the flow is tracked.
|
||||
- ``isBarrier(DataFlow::Node nd)`` selects all nodes ``nd`` that act as a barrier for data flow; ``isSanitizer`` is the corresponding predicate for taint tracking configurations.
|
||||
- ``isBarrierEdge(DataFlow::Node src, DataFlow::Node trg)`` is a variant of ``isBarrier(nd)`` that allows specifying barrier *edges* in addition to barrier nodes; again, ``isSanitizerEdge`` is the corresponding predicate for taint tracking;
|
||||
- ``isAdditionalFlowStep(DataFlow::Node src, DataFlow::Node trg)`` allows specifying custom additional flow steps for this analysis; ``isAdditionalTaintStep`` is the corresponding predicate for taint tracking configurations.
|
||||
- ``isBarrier(DataFlow::Node nd)`` selects all nodes ``nd`` that act as a barrier/sanitizer for data flow.
|
||||
- ``isAdditionalFlowStep(DataFlow::Node src, DataFlow::Node trg)`` allows specifying custom additional flow steps for this analysis.
|
||||
|
||||
Since for technical reasons both ``Configuration`` classes are subtypes of ``string``, you have to choose a unique name for each flow configuration and equate ``this`` with it in the characteristic predicate (as in the example below).
|
||||
|
||||
The predicate ``Configuration.hasFlow`` performs the actual flow tracking, starting at a source and looking for flow to a sink that does not pass through a barrier node or edge.
|
||||
Such a module can be passed to ``DataFlow::Global<...>``. This will produce a module with a ``flow`` predicate that performs the actual flow tracking, starting at a source and looking for flow to a sink that does not pass through a barrier node.
|
||||
|
||||
For example, suppose that we are developing an analysis to find hard-coded passwords. We might write a simple query that looks for string constants flowing into variables named ``"password"``.
|
||||
|
||||
@@ -720,35 +717,27 @@ For example, suppose that we are developing an analysis to find hard-coded passw
|
||||
|
||||
import javascript
|
||||
|
||||
class PasswordTracker extends DataFlow::Configuration {
|
||||
PasswordTracker() {
|
||||
// unique identifier for this configuration
|
||||
this = "PasswordTracker"
|
||||
}
|
||||
module PasswordConfig implements DataFlow::ConfigSig {
|
||||
predicate isSource(DataFlow::Node nd) { nd.asExpr() instanceof StringLiteral }
|
||||
|
||||
override predicate isSource(DataFlow::Node nd) {
|
||||
nd.asExpr() instanceof StringLiteral
|
||||
}
|
||||
|
||||
override predicate isSink(DataFlow::Node nd) {
|
||||
passwordVarAssign(_, nd)
|
||||
}
|
||||
|
||||
predicate passwordVarAssign(Variable v, DataFlow::Node nd) {
|
||||
v.getAnAssignedExpr() = nd.asExpr() and
|
||||
v.getName().toLowerCase() = "password"
|
||||
}
|
||||
predicate isSink(DataFlow::Node nd) { passwordVarAssign(_, nd) }
|
||||
}
|
||||
|
||||
Now we can rephrase our query to use ``Configuration.hasFlow``:
|
||||
predicate passwordVarAssign(Variable v, DataFlow::Node nd) {
|
||||
v.getAnAssignedExpr() = nd.asExpr() and
|
||||
v.getName().toLowerCase() = "password"
|
||||
}
|
||||
|
||||
module PasswordFlow = DataFlow::Global<PasswordConfig>;
|
||||
|
||||
Now we can rephrase our query to use ``PasswordFlow::flow``:
|
||||
|
||||
.. code-block:: ql
|
||||
|
||||
from PasswordTracker pt, DataFlow::Node source, DataFlow::Node sink, Variable v
|
||||
where pt.hasFlow(source, sink) and pt.passwordVarAssign(v, sink)
|
||||
from DataFlow::Node sink, Variable v
|
||||
where PasswordFlow::flow(_, sink) and passwordVarAssign(v, sink)
|
||||
select sink, "Password variable " + v + " is assigned a constant string."
|
||||
|
||||
|
||||
Syntax errors
|
||||
~~~~~~~~~~~~~
|
||||
|
||||
|
||||
@@ -16,18 +16,17 @@ Use the following template to create a taint tracking path query:
|
||||
* @kind path-problem
|
||||
*/
|
||||
import javascript
|
||||
import DataFlow
|
||||
import DataFlow::PathGraph
|
||||
|
||||
class MyConfig extends TaintTracking::Configuration {
|
||||
MyConfig() { this = "MyConfig" }
|
||||
override predicate isSource(Node node) { ... }
|
||||
override predicate isSink(Node node) { ... }
|
||||
override predicate isAdditionalTaintStep(Node pred, Node succ) { ... }
|
||||
module MyConfig implements DataFlow::ConfigSig {
|
||||
predicate isSource(DataFlow::Node node) { ... }
|
||||
predicate isSink(DataFlow::Node node) { ... }
|
||||
predicate isAdditionalFlowStep(DataFlow::Node pred, DataFlow::Node succ) { ... }
|
||||
}
|
||||
|
||||
from MyConfig cfg, PathNode source, PathNode sink
|
||||
where cfg.hasFlowPath(source, sink)
|
||||
module MyFlow = TaintTracking::Global<MyConfig>;

import MyFlow::PathGraph
|
||||
|
||||
from MyFlow::PathNode source, MyFlow::PathNode sink
|
||||
where MyFlow::flowPath(source, sink)
|
||||
select sink.getNode(), source, sink, "taint from $@.", source.getNode(), "here"
|
||||
|
||||
This query reports flow paths which:
|
||||
|
||||
@@ -0,0 +1,301 @@
|
||||
.. _migrating-javascript-dataflow-queries:
|
||||
|
||||
Migrating JavaScript Dataflow Queries
|
||||
=====================================
|
||||
|
||||
The JavaScript analysis used to have its own data flow library, which differed from the shared data flow
|
||||
library used by other languages. This library has now been deprecated in favor of the shared library.
|
||||
|
||||
This article explains how to migrate JavaScript data flow queries to use the shared data flow library,
|
||||
and some important differences to be aware of. Note that the article on :ref:`analyzing data flow in JavaScript and TypeScript <analyzing-data-flow-in-javascript-and-typescript>`
|
||||
provides a general guide to the new data flow library, whereas this article aims to help with migrating existing queries from the old data flow library.
|
||||
|
||||
Note that the ``DataFlow::Configuration`` class is still backed by the original data flow library, but has been marked as deprecated.
|
||||
This means data flow queries using this class will continue to work, albeit with deprecation warnings, until the 1-year deprecation period expires in early 2026.
|
||||
It is recommended that all custom queries are migrated before this time, to ensure they continue to work in the future.
|
||||
|
||||
Data flow queries should be migrated to use ``DataFlow::ConfigSig``-style modules instead of the ``DataFlow::Configuration`` class.
|
||||
This is identical to the interface found in other languages.
|
||||
When making this switch, the query will become backed by the shared data flow library instead. That is, data flow queries will only work
|
||||
with the shared data flow library when they have been migrated to ``ConfigSig``-style, as shown in the following table:
|
||||
|
||||
.. list-table:: Data flow libraries
|
||||
:widths: 20 80
|
||||
:header-rows: 1
|
||||
|
||||
* - API
|
||||
- Implementation
|
||||
* - ``DataFlow::Configuration``
|
||||
- Old library (deprecated, to be removed in early 2026)
|
||||
* - ``DataFlow::ConfigSig``
|
||||
- Shared library
|
||||
|
||||
A straightforward translation to ``DataFlow::ConfigSig``-style is usually possible, although there are some complications
|
||||
that may cause the query to behave differently.
|
||||
We'll first cover some straightforward migration examples, and then go over some of the complications that may arise.
|
||||
|
||||
Simple migration example
|
||||
------------------------
|
||||
|
||||
A simple example of a query using the old data flow library is shown below:
|
||||
|
||||
.. code-block:: ql
|
||||
|
||||
/** @kind path-problem */
|
||||
import javascript
|
||||
import DataFlow::PathGraph
|
||||
|
||||
class MyConfig extends DataFlow::Configuration {
|
||||
MyConfig() { this = "MyConfig" }
|
||||
|
||||
override predicate isSource(DataFlow::Node node) { ... }
|
||||
|
||||
override predicate isSink(DataFlow::Node node) { ... }
|
||||
}
|
||||
|
||||
from MyConfig cfg, DataFlow::PathNode source, DataFlow::PathNode sink
|
||||
where cfg.hasFlowPath(source, sink)
|
||||
select sink, source, sink, "Flow found"
|
||||
|
||||
In the new style, the same query looks like this:
|
||||
|
||||
.. code-block:: ql
|
||||
|
||||
/** @kind path-problem */
|
||||
import javascript
|
||||
|
||||
module MyConfig implements DataFlow::ConfigSig {
|
||||
predicate isSource(DataFlow::Node node) { ... }
|
||||
|
||||
predicate isSink(DataFlow::Node node) { ... }
|
||||
}
|
||||
|
||||
module MyFlow = DataFlow::Global<MyConfig>;
|
||||
|
||||
import MyFlow::PathGraph
|
||||
|
||||
from MyFlow::PathNode source, MyFlow::PathNode sink
|
||||
where MyFlow::flowPath(source, sink)
|
||||
select sink, source, sink, "Flow found"
|
||||
|
||||
The changes can be summarized as:
|
||||
|
||||
- The ``DataFlow::Configuration`` class was replaced with a module implementing ``DataFlow::ConfigSig``.
|
||||
- The characteristic predicate was removed (modules have no characteristic predicates).
|
||||
- Predicates such as ``isSource`` no longer have the ``override`` keyword (as they are defined in a module now).
|
||||
- The configuration module is passed to ``DataFlow::Global``, resulting in a new module, called ``MyFlow`` in this example.
|
||||
- The query imports ``MyFlow::PathGraph`` instead of ``DataFlow::PathGraph``.
|
||||
- The ``MyConfig cfg`` variable was removed from the ``from`` clause.
|
||||
- The ``hasFlowPath`` call was replaced with ``MyFlow::flowPath``.
|
||||
- The type ``DataFlow::PathNode`` was replaced with ``MyFlow::PathNode``.
|
||||
|
||||
With these changes, we have produced an equivalent query that is backed by the new data flow library.
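
As a rough illustration, the template above could be filled in as follows. This is only a sketch: the source (elements of ``process.argv``) and the sink (the first argument of ``fs.readFile``) are assumptions chosen for the example, and the names ``ArgvToReadFileConfig`` and ``ArgvToReadFileFlow`` are hypothetical.

.. code-block:: ql

  /** @kind path-problem */
  import javascript

  // Hypothetical configuration: command-line arguments flowing into fs.readFile.
  module ArgvToReadFileConfig implements DataFlow::ConfigSig {
    predicate isSource(DataFlow::Node node) {
      // Assumed source: any element read from process.argv
      node = DataFlow::globalVarRef("process").getAPropertyRead("argv").getAPropertyRead()
    }

    predicate isSink(DataFlow::Node node) {
      // Assumed sink: the path argument of fs.readFile
      node = DataFlow::moduleMember("fs", "readFile").getACall().getArgument(0)
    }
  }

  module ArgvToReadFileFlow = DataFlow::Global<ArgvToReadFileConfig>;

  import ArgvToReadFileFlow::PathGraph

  from ArgvToReadFileFlow::PathNode source, ArgvToReadFileFlow::PathNode sink
  where ArgvToReadFileFlow::flowPath(source, sink)
  select sink.getNode(), source, sink, "Flow found"

Because the path graph is imported from the instantiated flow module, each configuration gets its own ``PathNode`` type and its own ``flowPath`` predicate.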
|
||||
|
||||
Taint tracking
|
||||
--------------
|
||||
|
||||
For configuration classes extending ``TaintTracking::Configuration``, the migration is similar but with a few differences:
|
||||
|
||||
- The ``TaintTracking::Global`` module should be used instead of ``DataFlow::Global``.
|
||||
- Some predicates originating from ``TaintTracking::Configuration`` should be renamed to match the ``DataFlow::ConfigSig`` interface:
|
||||
- ``isSanitizer`` should be renamed to ``isBarrier``.
|
||||
- ``isAdditionalTaintStep`` should be renamed to ``isAdditionalFlowStep``.
|
||||
|
||||
Note that there is no such thing as ``TaintTracking::ConfigSig``. The ``DataFlow::ConfigSig`` interface is used for both data flow and taint tracking.
|
||||
|
||||
For example:
|
||||
|
||||
.. code-block:: ql
|
||||
|
||||
class MyConfig extends TaintTracking::Configuration {
|
||||
MyConfig() { this = "MyConfig" }
|
||||
|
||||
override predicate isSanitizer(DataFlow::Node node) { ... }
|
||||
override predicate isAdditionalTaintStep(DataFlow::Node node1, DataFlow::Node node2) { ... }
|
||||
...
|
||||
}
|
||||
|
||||
The above configuration can be migrated to the shared data flow library as follows:
|
||||
|
||||
.. code-block:: ql
|
||||
|
||||
module MyConfig implements DataFlow::ConfigSig {
|
||||
predicate isBarrier(DataFlow::Node node) { ... }
|
||||
predicate isAdditionalFlowStep(DataFlow::Node node1, DataFlow::Node node2) { ... }
|
||||
...
|
||||
}
|
||||
|
||||
module MyFlow = TaintTracking::Global<MyConfig>;
|
||||
|
||||
|
||||
Flow labels and flow states
|
||||
---------------------------
|
||||
|
||||
The ``DataFlow::FlowLabel`` class has been deprecated. Queries that relied on flow labels should use the new `flow state` concept instead.
|
||||
This is done by implementing ``DataFlow::StateConfigSig`` instead of ``DataFlow::ConfigSig``, and passing the module to ``DataFlow::GlobalWithState``
|
||||
or ``TaintTracking::GlobalWithState``. See :ref:`using flow state <using-flow-labels-for-precise-data-flow-analysis>` for more details about flow state.
|
||||
|
||||
Some changes to be aware of:
|
||||
|
||||
- The 4-argument version of ``isAdditionalFlowStep`` now takes parameters in a different order.
|
||||
It now takes ``node1, state1, node2, state2`` instead of ``node1, node2, state1, state2``.
|
||||
- Taint steps apply to all flow states, not just the ``taint`` flow label. See more details further down in this article.
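
A minimal sketch of a flow-state configuration is shown below, mainly to illustrate the ``node1, state1, node2, state2`` parameter order. The states ``TRaw`` and ``TParsed``, the class ``MyState``, the callee name ``parse``, and the chosen source and sink are all hypothetical and exist only for this example.

.. code-block:: ql

  import javascript

  // Hypothetical flow states for illustration.
  newtype TMyState =
    TRaw() or
    TParsed()

  class MyState extends TMyState {
    string toString() {
      this = TRaw() and result = "raw"
      or
      this = TParsed() and result = "parsed"
    }
  }

  module MyStateConfig implements DataFlow::StateConfigSig {
    class FlowState = MyState;

    predicate isSource(DataFlow::Node node, FlowState state) {
      // Assumed source: elements of process.argv start in the "raw" state
      node = DataFlow::globalVarRef("process").getAPropertyRead("argv").getAPropertyRead() and
      state = TRaw()
    }

    predicate isSink(DataFlow::Node node, FlowState state) {
      // Assumed sink: only "parsed" values reaching fs.readFile are reported
      node = DataFlow::moduleMember("fs", "readFile").getACall().getArgument(0) and
      state = TParsed()
    }

    // Note the parameter order: node1, state1, node2, state2.
    predicate isAdditionalFlowStep(
      DataFlow::Node node1, FlowState state1, DataFlow::Node node2, FlowState state2
    ) {
      exists(DataFlow::CallNode c |
        c.getCalleeName() = "parse" and
        node1 = c.getArgument(0) and
        state1 = TRaw() and
        node2 = c and
        state2 = TParsed()
      )
    }
  }

  module MyStateFlow = TaintTracking::GlobalWithState<MyStateConfig>;

  from DataFlow::Node source, DataFlow::Node sink
  where MyStateFlow::flow(source, sink)
  select source, sink

Since this sketch uses ``TaintTracking::GlobalWithState``, ordinary taint steps carry both states, in line with the second point above.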
|
||||
|
||||
Barrier guards
|
||||
--------------
|
||||
|
||||
The predicates ``isBarrierGuard`` and ``isSanitizerGuard`` have been removed.
|
||||
|
||||
Instead, the ``isBarrier`` predicate must be used to define all barriers. To do this, barrier guards can be reduced to a set of barrier nodes using the ``DataFlow::MakeBarrierGuard`` module.
|
||||
|
||||
For example, consider this data flow configuration using a barrier guard:
|
||||
|
||||
.. code-block:: ql
|
||||
|
||||
class MyConfig extends DataFlow::Configuration {
|
||||
override predicate isBarrierGuard(DataFlow::BarrierGuardNode node) {
|
||||
node instanceof MyBarrierGuard
|
||||
}
|
||||
...
|
||||
}
|
||||
|
||||
class MyBarrierGuard extends DataFlow::BarrierGuardNode {
|
||||
MyBarrierGuard() { ... }
|
||||
|
||||
override predicate blocks(boolean outcome, Expr e) { ... }
|
||||
}
|
||||
|
||||
This can be migrated to the shared data flow library as follows:
|
||||
|
||||
.. code-block:: ql
|
||||
|
||||
module MyConfig implements DataFlow::ConfigSig {
|
||||
predicate isBarrier(DataFlow::Node node) {
|
||||
node = DataFlow::MakeBarrierGuard<MyBarrierGuard>::getABarrierNode()
|
||||
}
|
||||
...
|
||||
}
|
||||
|
||||
class MyBarrierGuard extends DataFlow::Node {
|
||||
MyBarrierGuard() { ... }
|
||||
|
||||
predicate blocksExpr(boolean outcome, Expr e) { ... }
|
||||
}
|
||||
|
||||
The changes can be summarized as:
|
||||
- The contents of ``isBarrierGuard`` have been moved to ``isBarrier``.
|
||||
- The ``node instanceof MyBarrierGuard`` check was replaced with ``node = DataFlow::MakeBarrierGuard<MyBarrierGuard>::getABarrierNode()``.
|
||||
- The ``MyBarrierGuard`` class no longer has ``DataFlow::BarrierGuardNode`` as a base class. We simply use ``DataFlow::Node`` instead.
|
||||
- The ``blocks`` predicate has been renamed to ``blocksExpr`` and no longer has the ``override`` keyword.
|
||||
|
||||
See :ref:`using flow state <using-flow-labels-for-precise-data-flow-analysis>` for examples of how to use barrier guards with flow state.
|
||||
|
||||
Query-specific load and store steps
|
||||
-----------------------------------
|
||||
|
||||
The predicates ``isAdditionalLoadStep``, ``isAdditionalStoreStep``, and ``isAdditionalLoadStoreStep`` have been removed. There is no way to emulate the original behavior.
|
||||
|
||||
Library models can still contribute such steps, but they will be applicable to all queries. Also see the section on jump steps further down.
|
||||
|
||||
Changes in behavior
|
||||
--------------------
|
||||
|
||||
When the query has been migrated to the new interface, it may seem to behave differently due to some technical differences in the internals of
|
||||
the two data flow libraries. The most significant changes are described below.
|
||||
|
||||
Taint steps now propagate all flow states
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
There's an important change from the old data flow library when using flow state and taint-tracking together.
|
||||
|
||||
When using ``TaintTracking::GlobalWithState``, all flow states can propagate along taint steps.
|
||||
In the old data flow library, only the ``taint`` flow label could propagate along taint steps.
|
||||
A straightforward translation of such a query may therefore result in new flow paths being found, which might be unexpected.
|
||||
|
||||
To emulate the old behavior, use ``DataFlow::GlobalWithState`` instead of ``TaintTracking::GlobalWithState``,
|
||||
and manually add taint steps using ``isAdditionalFlowStep``. The predicate ``TaintTracking::defaultTaintStep`` can be used to access the set of taint steps.
|
||||
|
||||
For example:
|
||||
|
||||
.. code-block:: ql
|
||||
|
||||
module MyConfig implements DataFlow::StateConfigSig {
|
||||
class FlowState extends string {
|
||||
FlowState() { this = ["taint", "foo"] }
|
||||
}
|
||||
|
||||
predicate isAdditionalFlowStep(DataFlow::Node node1, FlowState state1, DataFlow::Node node2, FlowState state2) {
|
||||
// Allow taint steps to propagate the "taint" flow state
|
||||
TaintTracking::defaultTaintStep(node1, node2) and
|
||||
state1 = "taint" and
|
||||
state2 = "taint"
|
||||
}
|
||||
|
||||
...
|
||||
}
|
||||
|
||||
module MyFlow = DataFlow::GlobalWithState<MyConfig>;
|
||||
|
||||
|
||||
Jump steps across function boundaries
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
When a flow step crosses a function boundary, that is, it starts and ends in two different functions, it will now be classified as a "jump" step.
|
||||
|
||||
Jump steps can be problematic in some cases. Roughly speaking, the data flow library will "forget" which call site it came from when following a jump step.
|
||||
This can lead to spurious flow paths that go into a function through one call site, and back out of a different call site.
|
||||
|
||||
If the step was generated by a library model, that is, the step is applicable to all queries, this is best mitigated by converting the step to a flow summary.
|
||||
For example, the following library model adds a taint step from ``x`` to ``y`` in ``foo.bar(x, y => {})``:
|
||||
|
||||
.. code-block:: ql
|
||||
|
||||
class MyStep extends TaintTracking::SharedTaintStep {
|
||||
override predicate step(DataFlow::Node node1, DataFlow::Node node2) {
|
||||
exists(DataFlow::CallNode call |
|
||||
call = DataFlow::moduleMember("foo", "bar").getACall() and
|
||||
node1 = call.getArgument(0) and
|
||||
node2 = call.getCallback(1).getParameter(0)
|
||||
)
|
||||
}
|
||||
}
|
||||
|
||||
Because this step crosses a function boundary, it becomes a jump step. This can be avoided by converting it to a flow summary as follows:
|
||||
|
||||
.. code-block:: ql
|
||||
|
||||
class MySummary extends DataFlow::SummarizedCallable {
|
||||
MySummary() { this = "MySummary" }
|
||||
|
||||
override DataFlow::CallNode getACall() { result = DataFlow::moduleMember("foo", "bar").getACall() }
|
||||
|
||||
override predicate propagatesFlow(string input, string output, boolean preservesValue) {
|
||||
input = "Argument[0]" and
|
||||
output = "Argument[1].Parameter[0]" and
|
||||
preservesValue = false // taint step
|
||||
}
|
||||
}
|
||||
|
||||
See :ref:`customizing library models for JavaScript <customizing-library-models-for-javascript>` for details about the format of the ``input`` and ``output`` strings.
|
||||
The aforementioned article also provides guidance on how to store the flow summary in a data extension.
|
||||
|
||||
For query-specific steps that cross function boundaries, that is, steps added with ``isAdditionalFlowStep``, there is currently no way to emulate the original behavior.
|
||||
A possible workaround is to convert the query-specific step to a flow summary. In this case it should be stored in a data extension to avoid performance issues, although this also means
|
||||
that all other queries will be able to use the flow summary.
|
||||
|
||||
Barriers block all flows
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
In the shared data flow library, a barrier blocks all flows, even if the tracked value is inside a content.
|
||||
|
||||
In the old data flow library, only barriers specific to the ``data`` flow label blocked flows when the tracked value was inside a content.
|
||||
|
||||
This rarely has a significant impact, but some users may observe changes in their results.
|
||||
|
||||
There is currently no way to emulate the original behavior.
|
||||
|
||||
Further reading
|
||||
---------------
|
||||
|
||||
- :ref:`Analyzing data flow in JavaScript and TypeScript <analyzing-data-flow-in-javascript-and-typescript>` provides a general guide to the new data flow library.
|
||||
- :ref:`Using flow state for precise data flow analysis <using-flow-labels-for-precise-data-flow-analysis>` provides a general guide on using flow state.
|
||||
@@ -1,9 +1,9 @@
|
||||
.. _using-flow-labels-for-precise-data-flow-analysis:
|
||||
|
||||
Using flow labels for precise data flow analysis
|
||||
Using flow state for precise data flow analysis
|
||||
================================================
|
||||
|
||||
You can associate flow labels with each value tracked by the flow analysis to determine whether the flow contains potential vulnerabilities.
|
||||
You can associate a flow state with each value tracked by the flow analysis to determine whether the flow contains potential vulnerabilities.
|
||||
|
||||
Overview
|
||||
--------
|
||||
@@ -16,9 +16,9 @@ program, and associates a flag with every data value telling us whether it might
|
||||
source node.
|
||||
|
||||
In some cases, you may want to track more detailed information about data values. This can be done
|
||||
by associating flow labels with data values, as shown in this tutorial. We will first discuss the
|
||||
general idea behind flow labels and then show how to use them in practice. Finally, we will give an
|
||||
overview of the API involved and provide some pointers to standard queries that use flow labels.
|
||||
by associating flow states with data values, as shown in this tutorial. We will first discuss the
|
||||
general idea behind flow states and then show how to use them in practice. Finally, we will give an
|
||||
overview of the API involved and provide some pointers to standard queries that use flow states.
|
||||
|
||||
Limitations of basic data-flow analysis
|
||||
---------------------------------------
|
||||
@@ -47,22 +47,21 @@ contain ``..`` components. Untrusted user input has both bits set initially, ind
|
||||
off individual bits, and if a value that has at least one bit set is interpreted as a path, a
|
||||
potential vulnerability is flagged.
|
||||
|
||||
Using flow labels
|
||||
Using flow states
|
||||
-----------------
|
||||
|
||||
You can handle these cases and others like them by associating a set of `flow labels` (sometimes
|
||||
also referred to as `taint kinds`) with each value being tracked by the analysis. Value-preserving
|
||||
You can handle these cases and others like them by associating a set of `flow states` (sometimes
|
||||
also referred to as `flow labels` or `taint kinds`) with each value being tracked by the analysis. Value-preserving
|
||||
data-flow steps (such as flow steps from writes to a variable to its reads) preserve the set of flow
|
||||
labels, but other steps may add or remove flow labels. Sanitizers, in particular, are simply flow
|
||||
steps that remove some or all flow labels. The initial set of flow labels for a value is determined
|
||||
states, but other steps may add or remove flow states. The initial set of flow states for a value is determined
|
||||
by the source node that gives rise to it. Similarly, sink nodes can specify that an incoming value
|
||||
needs to have a certain flow label (or one of a set of flow labels) in order for the flow to be
|
||||
needs to have a certain flow state (or one of a set of flow states) in order for the flow to be
|
||||
flagged as a potential vulnerability.
|
||||
|
||||
Example
|
||||
-------
|
||||
|
||||
As an example of using flow labels, we will show how to write a query that flags property accesses
|
||||
As an example of using flow state, we will show how to write a query that flags property accesses
|
||||
on JSON values that come from user-controlled input where we have not checked whether the value is
|
||||
``null``, so that the property access may cause a runtime exception.
|
||||
|
||||
@@ -88,8 +87,8 @@ This code, on the other hand, should not be flagged:
|
||||
}
|
||||
}
|
||||
|
||||
We will first try to write a query to find this kind of problem without flow labels, and use the
|
||||
difficulties we encounter as a motivation for bringing flow labels into play, which will make the
|
||||
We will first try to write a query to find this kind of problem without flow state, and use the
|
||||
difficulties we encounter as a motivation for bringing flow state into play, which will make the
|
||||
query much easier to implement.
|
||||
|
||||
To get started, let's write a query that simply flags any flow from ``JSON.parse`` into the base of
|
||||
@@ -99,24 +98,24 @@ a property access:
|
||||
|
||||
import javascript
|
||||
|
||||
class JsonTrackingConfig extends DataFlow::Configuration {
|
||||
JsonTrackingConfig() { this = "JsonTrackingConfig" }
|
||||
|
||||
override predicate isSource(DataFlow::Node nd) {
|
||||
module JsonTrackingConfig implements DataFlow::ConfigSig {
|
||||
predicate isSource(DataFlow::Node nd) {
|
||||
exists(JsonParserCall jpc |
|
||||
nd = jpc.getOutput()
|
||||
)
|
||||
}
|
||||
|
||||
override predicate isSink(DataFlow::Node nd) {
|
||||
predicate isSink(DataFlow::Node nd) {
|
||||
exists(DataFlow::PropRef pr |
|
||||
nd = pr.getBase()
|
||||
)
|
||||
}
|
||||
}
|
||||
|
||||
from JsonTrackingConfig cfg, DataFlow::Node source, DataFlow::Node sink
|
||||
where cfg.hasFlow(source, sink)
|
||||
module JsonTrackingFlow = DataFlow::Global<JsonTrackingConfig>;
|
||||
|
||||
from DataFlow::Node source, DataFlow::Node sink
|
||||
where JsonTrackingFlow::flow(source, sink)
|
||||
select sink, "Property access on JSON value originating $@.", source, "here"
|
||||
|
||||
Note that we use the ``JsonParserCall`` class from the standard library to model various JSON
|
||||
@@ -127,8 +126,7 @@ introduced any sanitizers yet.
|
||||
|
||||
There are many ways of checking for nullness directly or indirectly. Since this is not the main
|
||||
focus of this tutorial, we will only show how to model one specific case: if some variable ``v`` is
|
||||
known to be truthy, it cannot be ``null``. This kind of condition is easily expressed using a
|
||||
``BarrierGuardNode`` (or its counterpart ``SanitizerGuardNode`` for taint-tracking configurations).
|
||||
known to be truthy, it cannot be ``null``. This kind of condition is expressed using a "barrier guard".
|
||||
A barrier guard node is a data-flow node ``b`` that blocks flow through some other node ``nd``,
|
||||
provided that some condition checked at ``b`` is known to hold, that is, evaluate to a truthy value.
|
||||
|
||||
@@ -139,29 +137,29 @@ is a barrier guard blocking flow through the use of ``data`` on the right-hand s
|
||||
At this point we know that ``data`` has evaluated to a truthy value, so it cannot be ``null``
|
||||
anymore.
|
||||
|
||||
Implementing this additional condition is easy. We implement a subclass of ``DataFlow::BarrierGuardNode``:
|
||||
Implementing this additional condition is easy. We implement a class with a predicate called ``blocksExpr``:
|
||||
|
||||
.. code-block:: ql
|
||||
|
||||
class TruthinessCheck extends DataFlow::BarrierGuardNode, DataFlow::ValueNode {
|
||||
class TruthinessCheck extends DataFlow::ValueNode {
|
||||
SsaVariable v;
|
||||
|
||||
TruthinessCheck() {
|
||||
astNode = v.getAUse()
|
||||
}
|
||||
|
||||
override predicate blocks(boolean outcome, Expr e) {
|
||||
predicate blocksExpr(boolean outcome, Expr e) {
|
||||
outcome = true and
|
||||
e = astNode
|
||||
}
|
||||
}
|
||||
|
||||
and then use it to override predicate ``isBarrierGuard`` in our configuration class:
|
||||
and then use it to implement the predicate ``isBarrier`` in our configuration module:
|
||||
|
||||
.. code-block:: ql
|
||||
|
||||
override predicate isBarrierGuard(DataFlow::BarrierGuardNode guard) {
|
||||
guard instanceof TruthinessCheck
|
||||
predicate isBarrier(DataFlow::Node node) {
|
||||
node = DataFlow::MakeBarrierGuard<TruthinessCheck>::getABarrierNode()
|
||||
}
|
||||
|
||||
With this change, we now flag the problematic case and don't flag the unproblematic case above.
|
||||
@@ -182,11 +180,11 @@ checked for null-guardedness:
|
||||
}
|
||||
}
|
||||
|
||||
We could try to remedy the situation by overriding ``isAdditionalFlowStep`` in our configuration class to track values through property reads:
|
||||
We could try to remedy the situation by adding ``isAdditionalFlowStep`` in our configuration module to track values through property reads:
|
||||
|
||||
.. code-block:: ql
|
||||
|
||||
override predicate isAdditionalFlowStep(DataFlow::Node pred, DataFlow::Node succ) {
|
||||
predicate isAdditionalFlowStep(DataFlow::Node pred, DataFlow::Node succ) {
|
||||
succ.(DataFlow::PropRead).getBase() = pred
|
||||
}
|
||||
|
||||
@@ -199,79 +197,86 @@ altogether, it should simply record the fact that ``root`` itself is known to be
|
||||
Any property read from ``root``, on the other hand, may well be null and needs to be checked
|
||||
separately.
|
||||
|
||||
We can achieve this by introducing two different flow labels, ``json`` and ``maybe-null``. The former
|
||||
We can achieve this by introducing two different flow states, ``json`` and ``maybe-null``. The former
|
||||
means that the value we are dealing with comes from a JSON object, the latter that it may be
|
||||
``null``. The result of any call to ``JSON.parse`` has both labels. A property read from a value
|
||||
with label ``json`` also has both labels. Checking truthiness removes the ``maybe-null`` label.
|
||||
Accessing a property on a value that has the ``maybe-null`` label should be flagged.
|
||||
``null``. The result of any call to ``JSON.parse`` has both states. A property read from a value
|
||||
with state ``json`` also results in a value with both states. Checking truthiness removes the ``maybe-null`` state.
|
||||
Accessing a property on a value that has the ``maybe-null`` state should be flagged.
|
||||
|
||||
To implement this, we start by defining two new subclasses of the class ``DataFlow::FlowLabel``:
|
||||
To implement this, we first change the signature of our configuration module to ``DataFlow::StateConfigSig``, and
|
||||
replace ``DataFlow::Global<...>`` with ``DataFlow::GlobalWithState<...>``:
|
||||
|
||||
.. code-block:: ql
|
||||
|
||||
class JsonLabel extends DataFlow::FlowLabel {
|
||||
JsonLabel() {
|
||||
this = "json"
|
||||
}
|
||||
module JsonTrackingConfig implements DataFlow::StateConfigSig {
|
||||
/* ... */
|
||||
}
|
||||
|
||||
class MaybeNullLabel extends DataFlow::FlowLabel {
|
||||
MaybeNullLabel() {
|
||||
this = "maybe-null"
|
||||
}
|
||||
}
|
||||
module JsonTrackingFlow = DataFlow::GlobalWithState<JsonTrackingConfig>;
|
||||
|
||||
Then we extend our ``isSource`` predicate from above to track flow labels by overriding the two-argument version instead of the one-argument version:
|
||||
We then add a class called ``FlowState`` which has one value for each flow state:
|
||||
|
||||
.. code-block:: ql
|
||||
|
||||
override predicate isSource(DataFlow::Node nd, DataFlow::FlowLabel lbl) {
|
||||
module JsonTrackingConfig implements DataFlow::StateConfigSig {
|
||||
class FlowState extends string {
|
||||
FlowState() {
|
||||
this = ["json", "maybe-null"]
|
||||
}
|
||||
}
|
||||
|
||||
/* ... */
|
||||
}
|
||||
|
||||
Then we extend our ``isSource`` predicate with an additional parameter to specify the flow state:
|
||||
|
||||
.. code-block:: ql
|
||||
|
||||
predicate isSource(DataFlow::Node nd, FlowState state) {
|
||||
exists(JsonParserCall jpc |
|
||||
nd = jpc.getOutput() and
|
||||
(lbl instanceof JsonLabel or lbl instanceof MaybeNullLabel)
|
||||
state = ["json", "maybe-null"] // start in either state
|
||||
)
|
||||
}
|
||||
|
||||
Similarly, we make ``isSink`` flow-label aware and require the base of the property read to have the ``maybe-null`` label:
|
||||
Similarly, we update ``isSink`` and require the base of the property read to have the ``maybe-null`` state:
|
||||
|
||||
.. code-block:: ql
|
||||
|
||||
override predicate isSink(DataFlow::Node nd, DataFlow::FlowLabel lbl) {
|
||||
predicate isSink(DataFlow::Node nd, FlowState state) {
|
||||
exists(DataFlow::PropRef pr |
|
||||
nd = pr.getBase() and
|
||||
lbl instanceof MaybeNullLabel
|
||||
state = "maybe-null"
|
||||
)
|
||||
}
|
||||
|
||||
Our overriding definition of ``isAdditionalFlowStep`` now needs to specify two flow labels, a
|
||||
predecessor label ``predlbl`` and a successor label ``succlbl``. In addition to specifying flow from
|
||||
the predecessor node ``pred`` to the successor node ``succ``, it requires that ``pred`` has label
|
||||
``predlbl``, and adds label ``succlbl`` to ``succ``. In our case, we use this to add both the
|
||||
``json`` label and the ``maybe-null`` label to any property read from a value labeled with ``json``
|
||||
(no matter whether it has the ``maybe-null`` label):
|
||||
Our definition of ``isAdditionalFlowStep`` now needs to specify two flow states, a
|
||||
predecessor state ``predState`` and a successor state ``succState``. In addition to specifying flow from
|
||||
the predecessor node ``pred`` to the successor node ``succ``, it requires that ``pred`` has state
|
||||
``predState``, and adds state ``succState`` to ``succ``. In our case, we use this to add both the
|
||||
``json`` state and the ``maybe-null`` state to any property read from a value in the ``json`` state
|
||||
(no matter whether it has the ``maybe-null`` state):
|
||||
|
||||
.. code-block:: ql
|
||||
|
||||
override predicate isAdditionalFlowStep(DataFlow::Node pred, DataFlow::Node succ,
|
||||
DataFlow::FlowLabel predlbl, DataFlow::FlowLabel succlbl) {
|
||||
predicate isAdditionalFlowStep(DataFlow::Node pred, FlowState predState,
|
||||
DataFlow::Node succ, FlowState succState) {
|
||||
succ.(DataFlow::PropRead).getBase() = pred and
|
||||
predlbl instanceof JsonLabel and
|
||||
(succlbl instanceof JsonLabel or succlbl instanceof MaybeNullLabel)
|
||||
predState = "json" and
|
||||
succState = ["json", "maybe-null"]
|
||||
}
|
||||
|
||||
Finally, we turn ``TruthinessCheck`` from a ``BarrierGuardNode`` into a ``LabeledBarrierGuardNode``,
|
||||
specifying that it only removes the ``maybe-null`` label (but not the ``json`` label) from the
|
||||
sanitized value:
|
||||
Finally, we add an additional parameter to the ``isBarrier`` predicate to specify the flow state
|
||||
to block at the ``TruthinessCheck`` barrier.
|
||||
|
||||
.. code-block:: ql
|
||||
|
||||
class TruthinessCheck extends DataFlow::LabeledBarrierGuardNode, DataFlow::ValueNode {
|
||||
...
|
||||
module JsonTrackingConfig implements DataFlow::StateConfigSig {
|
||||
/* ... */
|
||||
|
||||
override predicate blocks(boolean outcome, Expr e, DataFlow::FlowLabel lbl) {
|
||||
outcome = true and
|
||||
e = astNode and
|
||||
lbl instanceof MaybeNullLabel
|
||||
predicate isBarrier(DataFlow::Node node, FlowState state) {
|
||||
node = DataFlow::MakeBarrierGuard<TruthinessCheck>::getABarrierNode() and
|
||||
state = "maybe-null"
|
||||
}
|
||||
}
|
||||
|
||||
@@ -283,66 +288,60 @@ step by step in the UI:
|
||||
/** @kind path-problem */
|
||||
|
||||
import javascript
|
||||
import DataFlow::PathGraph
|
||||
|
||||
class JsonLabel extends DataFlow::FlowLabel {
|
||||
JsonLabel() {
|
||||
this = "json"
|
||||
}
|
||||
}
|
||||
|
||||
class MaybeNullLabel extends DataFlow::FlowLabel {
|
||||
MaybeNullLabel() {
|
||||
this = "maybe-null"
|
||||
}
|
||||
}
|
||||
|
||||
class TruthinessCheck extends DataFlow::LabeledBarrierGuardNode, DataFlow::ValueNode {
|
||||
class TruthinessCheck extends DataFlow::ValueNode {
|
||||
SsaVariable v;
|
||||
|
||||
TruthinessCheck() {
|
||||
astNode = v.getAUse()
|
||||
}
|
||||
|
||||
override predicate blocks(boolean outcome, Expr e, DataFlow::FlowLabel lbl) {
|
||||
predicate blocksExpr(boolean outcome, Expr e, JsonTrackingConfig::FlowState state) {
|
||||
outcome = true and
|
||||
e = astNode and
|
||||
lbl instanceof MaybeNullLabel
|
||||
state = "maybe-null"
|
||||
}
|
||||
}
|
||||
|
||||
class JsonTrackingConfig extends DataFlow::Configuration {
|
||||
JsonTrackingConfig() { this = "JsonTrackingConfig" }
|
||||
module JsonTrackingConfig implements DataFlow::StateConfigSig {
|
||||
class FlowState extends string {
|
||||
FlowState() {
|
||||
this = ["json", "maybe-null"]
|
||||
}
|
||||
}
|
||||
|
||||
override predicate isSource(DataFlow::Node nd, DataFlow::FlowLabel lbl) {
|
||||
predicate isSource(DataFlow::Node nd, FlowState state) {
|
||||
exists(JsonParserCall jpc |
|
||||
nd = jpc.getOutput() and
|
||||
(lbl instanceof JsonLabel or lbl instanceof MaybeNullLabel)
|
||||
state = ["json", "maybe-null"] // start in either state
|
||||
)
|
||||
}
|
||||
|
||||
override predicate isSink(DataFlow::Node nd, DataFlow::FlowLabel lbl) {
|
||||
predicate isSink(DataFlow::Node nd, FlowState state) {
|
||||
exists(DataFlow::PropRef pr |
|
||||
nd = pr.getBase() and
|
||||
lbl instanceof MaybeNullLabel
|
||||
state = "maybe-null"
|
||||
)
|
||||
}
|
||||
|
||||
override predicate isAdditionalFlowStep(DataFlow::Node pred, DataFlow::Node succ,
|
||||
DataFlow::FlowLabel predlbl, DataFlow::FlowLabel succlbl) {
|
||||
predicate isAdditionalFlowStep(DataFlow::Node pred, FlowState predState,
|
||||
DataFlow::Node succ, FlowState succState) {
|
||||
succ.(DataFlow::PropRead).getBase() = pred and
|
||||
predlbl instanceof JsonLabel and
|
||||
(succlbl instanceof JsonLabel or succlbl instanceof MaybeNullLabel)
|
||||
predState = "json" and
|
||||
succState = ["json", "maybe-null"]
|
||||
}
|
||||
|
||||
override predicate isBarrierGuard(DataFlow::BarrierGuardNode guard) {
|
||||
guard instanceof TruthinessCheck
|
||||
predicate isBarrier(DataFlow::Node node, FlowState state) {
|
||||
node = DataFlow::MakeBarrierGuard<TruthinessCheck>::getABarrierNode() and
|
||||
state = "maybe-null"
|
||||
}
|
||||
}
|
||||
|
||||
from JsonTrackingConfig cfg, DataFlow::PathNode source, DataFlow::PathNode sink
|
||||
where cfg.hasFlowPath(source, sink)
|
||||
select sink, source, sink, "Property access on JSON value originating $@.", source, "here"
|
||||
module JsonTrackingFlow = DataFlow::GlobalWithState<JsonTrackingConfig>;
|
||||
|
||||
from DataFlow::Node source, DataFlow::Node sink
|
||||
where JsonTrackingFlow::flow(source, sink)
|
||||
select sink, "Property access on JSON value originating $@.", source, "here"
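
The query above selects plain data flow nodes. If path explanations are wanted in the UI, a common pattern with the shared library is to import the path graph generated for the flow module and select path nodes instead. The following is a sketch, assuming ``JsonTrackingFlow`` is defined as above; it would replace the ``import DataFlow::PathGraph`` line and the final ``from``/``where``/``select`` clause:

.. code-block:: ql

    // Use the path graph generated for this specific flow module.
    import JsonTrackingFlow::PathGraph

    from JsonTrackingFlow::PathNode source, JsonTrackingFlow::PathNode sink
    where JsonTrackingFlow::flowPath(source, sink)
    select sink.getNode(), source, sink, "Property access on JSON value originating $@.",
      source.getNode(), "here"
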
|
||||
|
||||
We ran this query on the https://github.com/finos/plexus-interop repository. Many of the
|
||||
results were false positives since the query does not currently model many ways in which we can check
|
||||
@@ -354,52 +353,30 @@ this tutorial.
|
||||
API
|
||||
---
|
||||
|
||||
Plain data-flow configurations implicitly use a single flow label "data", which indicates that a
|
||||
data value originated from a source. You can use the predicate ``DataFlow::FlowLabel::data()``,
|
||||
which returns this flow label, as a symbolic name for it.
|
||||
Flow state can be used in modules implementing the ``DataFlow::StateConfigSig`` signature. Compared to a ``DataFlow::ConfigSig`` the main differences are:
|
||||
|
||||
Taint-tracking configurations add a second flow label "taint" (``DataFlow::FlowLabel::taint()``),
|
||||
which is similar to "data", but includes values that have passed through non-value preserving steps
|
||||
such as string operations.
|
||||
- The module must be passed to ``DataFlow::GlobalWithState<...>`` or ``TaintTracking::GlobalWithState<...>``
|
||||
instead of ``DataFlow::Global<...>`` or ``TaintTracking::Global<...>``.
|
||||
- The module must contain a type named ``FlowState``.
|
||||
- ``isSource`` expects an additional parameter specifying the flow state.
|
||||
- ``isSink`` can optionally take an additional parameter specifying the flow state.
|
||||
If omitted, the sinks are in effect for all flow states.
|
||||
- ``isAdditionalFlowStep`` can optionally take two additional parameters specifying the predecessor and successor flow states.
|
||||
If omitted, the generated steps apply for any flow state and preserve the current flow state.
|
||||
- ``isBarrier`` can optionally take an additional parameter specifying the flow state to block.
|
||||
If omitted, the barriers block all flow states.
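
The differences listed above can be summarized in a skeleton configuration. This is a minimal sketch, not a complete query: the module names ``ExampleConfig`` and ``ExampleFlow`` and the states ``"a"`` and ``"b"`` are hypothetical, and every ``none()`` body is a placeholder to be replaced with real logic:

.. code-block:: ql

    module ExampleConfig implements DataFlow::StateConfigSig {
      // The module must contain a type named FlowState.
      class FlowState extends string {
        FlowState() { this = ["a", "b"] }
      }

      // Sources name the flow state in which tracking starts.
      predicate isSource(DataFlow::Node node, FlowState state) { none() }

      // Sinks may restrict the flow state they accept.
      predicate isSink(DataFlow::Node node, FlowState state) { none() }

      // Without a state parameter, the barrier blocks all flow states.
      predicate isBarrier(DataFlow::Node node) { none() }

      // With a state parameter, only the given flow state is blocked.
      predicate isBarrier(DataFlow::Node node, FlowState state) { none() }

      // Without state parameters, the step applies in any state and preserves it.
      predicate isAdditionalFlowStep(DataFlow::Node node1, DataFlow::Node node2) { none() }
    }

    module ExampleFlow = DataFlow::GlobalWithState<ExampleConfig>;
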
|
||||
|
||||
Each of the three member predicates ``isSource``, ``isSink`` and
|
||||
``isAdditionalFlowStep``/``isAdditionalTaintStep`` has one version that uses the default flow
|
||||
labels, and one version that allows specifying custom flow labels through additional arguments.
|
||||
|
||||
For ``isSource``, there is one additional argument specifying which flow label(s) should be
|
||||
associated with values originating from this source. If multiple flow labels are specified, each
|
||||
value is associated with `all` of them.
|
||||
|
||||
For ``isSink``, the additional argument specifies which flow label(s) a value that flows into this
|
||||
source may be associated with. If multiple flow labels are specified, then any value that is
|
||||
associated with `at least one` of them will be considered by the configuration.
|
||||
|
||||
For ``isAdditionalFlowStep`` there are two additional arguments ``predlbl`` and ``succlbl``, which
|
||||
allow flow steps to act as flow label transformers. If a value associated with ``predlbl`` arrives
|
||||
at the start node of the additional step, it is propagated to the end node and associated with
|
||||
``succlbl``. Of course, ``predlbl`` and ``succlbl`` may be the same, indicating that the flow step
|
||||
preserves this label. There can also be multiple values of ``succlbl`` for a single ``predlbl`` or
|
||||
vice versa.
|
||||
|
||||
Note that if you do not restrict ``succlbl`` then it will be allowed to range over all flow labels.
|
||||
This may cause labels that were previously blocked on a path to reappear, which is not usually what
|
||||
you want.
|
||||
|
||||
The flow label-aware version of ``isBarrier`` is called ``isLabeledBarrier``: unlike ``isBarrier``,
|
||||
which prevents any flow past the given node, it only blocks flow of values associated with one of
|
||||
the specified flow labels.
|
||||
|
||||
Standard queries using flow labels
|
||||
Standard queries using flow state
|
||||
----------------------------------
|
||||
|
||||
Some of our standard security queries use flow labels. You can look at their implementation
|
||||
to get a feeling for how to use flow labels in practice.
|
||||
Some of our standard security queries use flow state. You can look at their implementation
|
||||
to get a feeling for how to use flow state in practice.
|
||||
|
||||
In particular, both of the examples mentioned in the section on limitations of basic data flow above
|
||||
are from standard security queries that use flow labels. The `Prototype-polluting merge call
|
||||
<https://codeql.github.com/codeql-query-help/javascript/js-prototype-pollution/>`_ query uses two flow labels to distinguish completely
|
||||
are from standard security queries that use flow state. The `Prototype-polluting merge call
|
||||
<https://codeql.github.com/codeql-query-help/javascript/js-prototype-pollution/>`_ query uses two flow states to distinguish completely
|
||||
tainted objects from partially tainted objects. The `Uncontrolled data used in path expression
|
||||
<https://codeql.github.com/codeql-query-help/javascript/js-path-injection/>`_ query uses four flow labels to track whether a user-controlled
|
||||
<https://codeql.github.com/codeql-query-help/javascript/js-path-injection/>`_ query uses four flow states to track whether a user-controlled
|
||||
string may be an absolute path and whether it may contain ``..`` components.
|
||||
|
||||
Further reading
|
||||
|
||||