From b1da636be0cab23ce4fca168b77d730c022af52b Mon Sep 17 00:00:00 2001 From: Nick Rolfe Date: Fri, 21 Oct 2022 15:11:43 +0100 Subject: [PATCH 1/6] Ruby: first draft of data flow docs --- .../analyzing-data-flow-in-ruby.rst | 390 ++++++++++++++++++ .../codeql-for-ruby.rst | 2 + 2 files changed, 392 insertions(+) create mode 100644 docs/codeql/codeql-language-guides/analyzing-data-flow-in-ruby.rst diff --git a/docs/codeql/codeql-language-guides/analyzing-data-flow-in-ruby.rst b/docs/codeql/codeql-language-guides/analyzing-data-flow-in-ruby.rst new file mode 100644 index 00000000000..feaa6415486 --- /dev/null +++ b/docs/codeql/codeql-language-guides/analyzing-data-flow-in-ruby.rst @@ -0,0 +1,390 @@ +.. _analyzing-data-flow-in-ruby: + +Analyzing data flow in Ruby +============================= + +You can use CodeQL to track the flow of data through a Ruby program to places where the data is used. + +About this article +------------------ + +This article describes how data flow analysis is implemented in the CodeQL libraries for Ruby and includes examples to help you write your own data flow queries. +The following sections describe how to use the libraries for local data flow, global data flow, and taint tracking. +For a more general introduction to modeling data flow, see ":ref:`About data flow analysis `." + +Local data flow +--------------- + +Local data flow is data flow within a single method or callable. Local data flow is easier, faster, and more precise than global data flow, and is sufficient for many queries. + +Using local data flow +~~~~~~~~~~~~~~~~~~~~~ + +The local data flow library is in the module ``DataFlow`` and it defines the class ``Node``, representing any element through which data can flow. +``Node``\ s are divided into expression nodes (``ExprNode``) and parameter nodes (``ParameterNode``). +You can map between a data flow ``ParameterNode`` and its corresponding ``Parameter`` AST node using the ``asParameter`` member predicate. 
+Meanwhile, the ``asExpr`` member predicate maps between a data flow ``ExprNode`` and its corresponding ``ExprCfgNode`` in the control-flow library. + +.. code-block:: ql + + class Node { + /** Gets the expression corresponding to this node, if any. */ + CfgNodes::ExprCfgNode asExpr() { ... } + + /** Gets the parameter corresponding to this node, if any. */ + Parameter asParameter() { ... } + + ... + } + +You can also use the predicates ``exprNode`` and ``parameterNode``: + +.. code-block:: ql + + /** + * Gets a node corresponding to expression `e`. + */ + ExprNode exprNode(CfgNodes::ExprCfgNode e) { ... } + + /** + * Gets the node corresponding to the value of parameter `p` at function entry. + */ + ParameterNode parameterNode(Parameter p) { ... } + +Note that since ``asExpr`` and ``exprNode`` map between data-flow and control-flow nodes, you then need to call the ``getExpr`` member predicate on the control-flow node to map to the corresponding AST node, +e.g. by writing ``node.asExpr().getExpr()``. +Due to the control-flow graph being split, there can be multiple data-flow and control-flow nodes associated with a single expression AST node. + +The predicate ``localFlowStep(Node nodeFrom, Node nodeTo)`` holds if there is an immediate data flow edge from the node ``nodeFrom`` to the node ``nodeTo``. +You can apply the predicate recursively, by using the ``+`` and ``*`` operators, or you can use the predefined recursive predicate ``localFlow``. + +For example, you can find flow from an expression ``source`` to an expression ``sink`` in zero or more local steps: + +.. code-block:: ql + + DataFlow::localFlow(source, sink) + +Using local taint tracking +~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Local taint tracking extends local data flow by including non-value-preserving flow steps. +For example: + +.. code-block:: ruby + + temp = x + y = temp + ", " + temp + +If ``x`` is a tainted string then ``y`` is also tainted. 
+ +The local taint tracking library is in the module ``TaintTracking``. +Like local data flow, a predicate ``localTaintStep(DataFlow::Node nodeFrom, DataFlow::Node nodeTo)`` holds if there is an immediate taint propagation edge from the node ``nodeFrom`` to the node ``nodeTo``. +You can apply the predicate recursively, by using the ``+`` and ``*`` operators, or you can use the predefined recursive predicate ``localTaint``. + +For example, you can find taint propagation from an expression ``source`` to an expression ``sink`` in zero or more local steps: + +.. code-block:: ql + + TaintTracking::localTaint(source, sink) + + +Using local sources +~~~~~~~~~~~~~~~~~~~ + +When asking for local data flow or taint propagation between two expressions as above, you would normally constrain the expressions to be relevant to a certain investigation. +The next section will give some concrete examples, but there is a more abstract concept that we should call out explicitly, namely that of a local source. + +A local source is a data-flow node with no local data flow into it. +As such, it is a local origin of data flow, a place where a new value is created. +This includes parameters (which only receive global data flow) and most expressions (because they are not value-preserving). +Restricting attention to such local sources gives a much lighter and more performant data-flow graph and in most cases also a more suitable abstraction for the investigation of interest. +The class ``LocalSourceNode`` represents data-flow nodes that are also local sources. +It comes with a useful member predicate ``flowsTo(DataFlow::Node node)``, which holds if there is local data flow from the local source to ``node``. + +Examples +~~~~~~~~ + +This query finds the filename argument passed in each call to ``File.open``: + +.. 
code-block:: ql + + import codeql.ruby.DataFlow + import codeql.ruby.ApiGraphs + + from DataFlow::CallNode call + where call = API::getTopLevelMember("File").getAMethodCall("open") + select call.getArgument(0) + +Notice the use of the ``API`` module for referring to library methods. +For more information, see ":doc:`Using API graphs in Ruby `." + +Unfortunately this will only give the expression in the argument, not the values which could be passed to it. +So we use local data flow to find all expressions that flow into the argument: + +.. code-block:: ql + + import codeql.ruby.DataFlow + import codeql.ruby.ApiGraphs + + from DataFlow::CallNode call, DataFlow::ExprNode expr + where + call = API::getTopLevelMember("File").getAMethodCall("open") and + DataFlow::localFlow(expr, call.getArgument(0)) + select call, expr + +Many expressions flow to the same call. +If you run this query, you may notice that you get several data-flow nodes for an expression as it flows towards a call (notice repeated locations in the ``call`` column). +We are mostly interested in the "first" of these, what might be called the local source for the file name. +To restrict attention to such local sources, and to simultaneously make the analysis more performant, we have the QL class ``LocalSourceNode``. +We could demand that ``expr`` is such a node: + +.. code-block:: ql + + import codeql.ruby.DataFlow + import codeql.ruby.ApiGraphs + + from DataFlow::CallNode call, DataFlow::ExprNode expr + where + call = API::getTopLevelMember("File").getAMethodCall("open") and + DataFlow::localFlow(expr, call.getArgument(0)) and + expr instanceof DataFlow::LocalSourceNode + select call, expr + +However, we could also enforce this by casting. +That would allow us to use the member predicate ``flowsTo`` on ``LocalSourceNode`` like so: + +.. 
code-block:: ql + + import codeql.ruby.DataFlow + import codeql.ruby.ApiGraphs + + from DataFlow::CallNode call, DataFlow::ExprNode expr + where + call = API::getTopLevelMember("File").getAMethodCall("open") and + expr.(DataFlow::LocalSourceNode).flowsTo(call.getArgument(0)) + select call, expr + +As an alternative, we can ask more directly that ``expr`` is a local source of the first argument, via the predicate ``getALocalSource``: + +.. code-block:: ql + + import codeql.ruby.DataFlow + import codeql.ruby.ApiGraphs + + from DataFlow::CallNode call, DataFlow::ExprNode expr + where + call = API::getTopLevelMember("File").getAMethodCall("open") and + expr = call.getArgument(0).getALocalSource() + select call, expr + +All these three queries give identical results. +We now mostly have one expression per call. + +We may still have cases of more than one expression flowing to a call, but then they flow through different code paths (possibly due to control-flow splitting). + +We might want to make the source more specific, for example a parameter to a method or block. +This query finds instances where a parameter is used as the name when opening a file: + +.. code-block:: ql + + import codeql.ruby.DataFlow + import codeql.ruby.ApiGraphs + + from DataFlow::CallNode call, DataFlow::ParameterNode p + where + call = API::getTopLevelMember("File").getAMethodCall("open") and + DataFlow::localFlow(p, call.getArgument(0)) + select call, p + +Using the exact name supplied via the parameter may be too strict. +If we want to know if the parameter influences the file name, we can use taint tracking instead of data flow. +This query finds calls to ``File.open`` where the filename is derived from a parameter: + +.. 
code-block:: ql + + import codeql.ruby.DataFlow + import codeql.ruby.TaintTracking + import codeql.ruby.ApiGraphs + + from DataFlow::CallNode call, DataFlow::ParameterNode p + where + call = API::getTopLevelMember("File").getAMethodCall("open") and + TaintTracking::localTaint(p, call.getArgument(0)) + select call, p + +Global data flow +---------------- + +Global data flow tracks data flow throughout the entire program, and is therefore more powerful than local data flow. +However, global data flow is less precise than local data flow, and the analysis typically requires significantly more time and memory to perform. + +.. pull-quote:: Note + + .. include:: ../reusables/path-problem.rst + +Using global data flow +~~~~~~~~~~~~~~~~~~~~~~ + +The global data flow library is used by extending the class ``DataFlow::Configuration``: + +.. code-block:: ql + + import codeql.ruby.DataFlow + + class MyDataFlowConfiguration extends DataFlow::Configuration { + MyDataFlowConfiguration() { this = "..." } + + override predicate isSource(DataFlow::Node source) { + ... + } + + override predicate isSink(DataFlow::Node sink) { + ... + } + } + +These predicates are defined in the configuration: + +- ``isSource`` - defines where data may flow from. +- ``isSink`` - defines where data may flow to. +- ``isBarrier`` - optionally, restricts the data flow. +- ``isAdditionalFlowStep`` - optionally, adds additional flow steps. + +The characteristic predicate (``MyDataFlowConfiguration()``) defines the name of the configuration, so ``"..."`` must be replaced with a unique name (for instance the class name). + +The data flow analysis is performed using the predicate ``hasFlow(DataFlow::Node source, DataFlow::Node sink)``: + +.. 
code-block:: ql + + from MyDataFlowConfiguration dataflow, DataFlow::Node source, DataFlow::Node sink + where dataflow.hasFlow(source, sink) + select source, "Dataflow to $@.", sink, sink.toString() + +Using global taint tracking +~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Global taint tracking is to global data flow what local taint tracking is to local data flow. +That is, global taint tracking extends global data flow with additional non-value-preserving steps. +The global taint tracking library is used by extending the class ``TaintTracking::Configuration``: + +.. code-block:: ql + + import codeql.ruby.DataFlow + import codeql.ruby.TaintTracking + + class MyTaintTrackingConfiguration extends TaintTracking::Configuration { + MyTaintTrackingConfiguration() { this = "..." } + + override predicate isSource(DataFlow::Node source) { + ... + } + + override predicate isSink(DataFlow::Node sink) { + ... + } + } + +These predicates are defined in the configuration: + +- ``isSource`` - defines where taint may flow from. +- ``isSink`` - defines where taint may flow to. +- ``isSanitizer`` - optionally, restricts the taint flow. +- ``isAdditionalTaintStep`` - optionally, adds additional taint steps. + +Similar to global data flow, the characteristic predicate (``MyTaintTrackingConfiguration()``) defines the unique name of the configuration and the taint analysis is performed using the predicate ``hasFlow(DataFlow::Node source, DataFlow::Node sink)``. + +Predefined sources and sinks +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The data flow library contains a number of predefined sources and sinks, providing a good starting point for defining data flow based security queries. + +- The class ``RemoteFlowSource`` (defined in module ``codeql.ruby.dataflow.RemoteFlowSources``) represents data flow from remote network inputs. This is useful for finding security problems in networked services. 
+- The library ``Concepts`` (defined in module ``codeql.ruby.Concepts``) contains several subclasses of ``DataFlow::Node`` that are security relevant, such as ``FileSystemAccess`` and ``SqlExecution``. + +For global flow, it is also useful to restrict sources to instances of ``LocalSourceNode``. +The predefined sources generally do that. + +Class hierarchy +~~~~~~~~~~~~~~~ + +- ``DataFlow::Configuration`` - base class for custom global data flow analysis. +- ``DataFlow::Node`` - an element behaving as a data-flow node. + + - ``DataFlow::CfgNode`` - a control-flow node behaving as a data-flow node. + + - ``DataFlow::ExprNode`` - an expression behaving as a data-flow node. + - ``DataFlow::ParameterNode`` - a parameter data-flow node representing the value of a parameter at method/block entry. + + - ``RemoteFlowSource`` - data flow from network/remote input. + - ``Concepts::SystemCommandExecution`` - a data-flow node that executes an operating system command, for instance by spawning a new process. + - ``Concepts::FileSystemAccess`` - a data-flow node that performs a file system access, including reading and writing data, creating and deleting files and folders, checking and updating permissions, and so on. + - ``Concepts::Path::PathNormalization`` - a data-flow node that performs path normalization. This is often needed in order to safely access paths. + - ``Concepts::CodeExecution`` - a data-flow node that dynamically executes Python code. + - ``Concepts::SqlExecution`` - a data-flow node that executes SQL statements. + - ``Concepts::HTTP::Server::RouteSetup`` - a data-flow node that sets up a route on a server. + - ``Concepts::HTTP::Server::HttpResponse`` - a data-flow node that creates an HTTP response on a server. + +- ``TaintTracking::Configuration`` - base class for custom global taint tracking analysis. + +Examples +~~~~~~~~ + +This query shows a data flow configuration that uses all network input as data sources: + +.. 
code-block:: ql + + import codeql.ruby.DataFlow + import codeql.ruby.TaintTracking + import codeql.ruby.Concepts + import codeql.ruby.dataflow.RemoteFlowSources + + class RemoteToFileConfiguration extends TaintTracking::Configuration { + RemoteToFileConfiguration() { this = "RemoteToFileConfiguration" } + + override predicate isSource(DataFlow::Node source) { source instanceof RemoteFlowSource } + + override predicate isSink(DataFlow::Node sink) { + sink = any(FileSystemAccess fa).getAPathArgument() + } + } + + from DataFlow::Node input, DataFlow::Node fileAccess, RemoteToFileConfiguration config + where config.hasFlow(input, fileAccess) + select fileAccess, "This file access uses data from $@.", input, "user-controllable input." + +This data flow configuration tracks data flow from environment variables to opening files: + +.. code-block:: ql + + import codeql.ruby.DataFlow + import codeql.ruby.controlflow.CfgNodes + import codeql.ruby.ApiGraphs + + class EnvironmentToFileConfiguration extends DataFlow::Configuration { + EnvironmentToFileConfiguration() { this = "EnvironmentToFileConfiguration" } + + override predicate isSource(DataFlow::Node source) { + exists(ExprNodes::ConstantReadAccessCfgNode env | + env.getExpr().getName() = "ENV" and + env = source.asExpr().(ExprNodes::ElementReferenceCfgNode).getReceiver() + ) + } + + override predicate isSink(DataFlow::Node sink) { + sink = API::getTopLevelMember("File").getAMethodCall("open").getArgument(0) + } + } + + from EnvironmentToFileConfiguration config, DataFlow::Node environment, DataFlow::Node fileOpen + where config.hasFlow(environment, fileOpen) + select fileOpen, "This call to 'File.open' uses data from $@.", environment, + "an environment variable" + +Further reading +--------------- + +- ":ref:`Exploring data flow with path queries `" + + +.. include:: ../reusables/ruby-further-reading.rst +.. 
include:: ../reusables/codeql-ref-tools-further-reading.rst diff --git a/docs/codeql/codeql-language-guides/codeql-for-ruby.rst b/docs/codeql/codeql-language-guides/codeql-for-ruby.rst index bfb29a012ef..8e2dfe267e3 100644 --- a/docs/codeql/codeql-language-guides/codeql-for-ruby.rst +++ b/docs/codeql/codeql-language-guides/codeql-for-ruby.rst @@ -15,4 +15,6 @@ Experiment and learn how to write effective and efficient queries for CodeQL dat - :doc:`CodeQL library for Ruby `: When you're analyzing a Ruby program, you can make use of the large collection of classes in the CodeQL library for Ruby. +- :doc:`Analyzing data flow in Ruby `: You can use CodeQL to track the flow of data through a Ruby program to places where the data is used. + .. include:: ../reusables/ruby-beta-note.rst From 5369ba1d832e39013ef1d82581c2582239856290 Mon Sep 17 00:00:00 2001 From: Nick Rolfe Date: Mon, 31 Oct 2022 11:24:30 +0000 Subject: [PATCH 2/6] ruby docs: remove distracting sentence --- .../codeql-language-guides/analyzing-data-flow-in-ruby.rst | 1 - 1 file changed, 1 deletion(-) diff --git a/docs/codeql/codeql-language-guides/analyzing-data-flow-in-ruby.rst b/docs/codeql/codeql-language-guides/analyzing-data-flow-in-ruby.rst index feaa6415486..49a633ba2a7 100644 --- a/docs/codeql/codeql-language-guides/analyzing-data-flow-in-ruby.rst +++ b/docs/codeql/codeql-language-guides/analyzing-data-flow-in-ruby.rst @@ -97,7 +97,6 @@ The next section will give some concrete examples, but there is a more abstract A local source is a data-flow node with no local data flow into it. As such, it is a local origin of data flow, a place where a new value is created. This includes parameters (which only receive global data flow) and most expressions (because they are not value-preserving). -Restricting attention to such local sources gives a much lighter and more performant data-flow graph and in most cases also a more suitable abstraction for the investigation of interest. 
The class ``LocalSourceNode`` represents data-flow nodes that are also local sources. It comes with a useful member predicate ``flowsTo(DataFlow::Node node)``, which holds if there is local data flow from the local source to ``node``. From 23db9c573f2e7793160814074412660c42beb728 Mon Sep 17 00:00:00 2001 From: Nick Rolfe Date: Mon, 31 Oct 2022 16:25:34 +0000 Subject: [PATCH 3/6] Ruby docs: add LocalSourceNode and remove CfgNode from class list --- .../analyzing-data-flow-in-ruby.rst | 8 +++----- 1 file changed, 3 insertions(+), 5 deletions(-) diff --git a/docs/codeql/codeql-language-guides/analyzing-data-flow-in-ruby.rst b/docs/codeql/codeql-language-guides/analyzing-data-flow-in-ruby.rst index 49a633ba2a7..5d6b8c90ac4 100644 --- a/docs/codeql/codeql-language-guides/analyzing-data-flow-in-ruby.rst +++ b/docs/codeql/codeql-language-guides/analyzing-data-flow-in-ruby.rst @@ -308,11 +308,9 @@ Class hierarchy - ``DataFlow::Configuration`` - base class for custom global data flow analysis. - ``DataFlow::Node`` - an element behaving as a data-flow node. - - - ``DataFlow::CfgNode`` - a control-flow node behaving as a data-flow node. - - - ``DataFlow::ExprNode`` - an expression behaving as a data-flow node. - - ``DataFlow::ParameterNode`` - a parameter data-flow node representing the value of a parameter at method/block entry. + - ``DataFlow::LocalSourceNode`` - a local origin of data, as a data-flow node. + - ``DataFlow::ExprNode`` - an expression behaving as a data-flow node. + - ``DataFlow::ParameterNode`` - a parameter data-flow node representing the value of a parameter at method/block entry. - ``RemoteFlowSource`` - data flow from network/remote input. - ``Concepts::SystemCommandExecution`` - a data-flow node that executes an operating system command, for instance by spawning a new process. 
From 1a702bfd5015e8e4e6de3c09fa0cd5f853585b89 Mon Sep 17 00:00:00 2001 From: Felicity Chapman Date: Tue, 1 Nov 2022 17:26:36 +0000 Subject: [PATCH 4/6] Add new article to `toctree` to fix test --- docs/codeql/codeql-language-guides/codeql-for-ruby.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/docs/codeql/codeql-language-guides/codeql-for-ruby.rst b/docs/codeql/codeql-language-guides/codeql-for-ruby.rst index 8e2dfe267e3..17bb8749120 100644 --- a/docs/codeql/codeql-language-guides/codeql-for-ruby.rst +++ b/docs/codeql/codeql-language-guides/codeql-for-ruby.rst @@ -10,6 +10,7 @@ Experiment and learn how to write effective and efficient queries for CodeQL dat basic-query-for-ruby-code codeql-library-for-ruby + analyzing-data-flow-in-ruby - :doc:`Basic query for Ruby code `: Learn to write and run a simple CodeQL query using LGTM. From 9998752147f23be8ffcbdd6f0ce23907a3cdaa5d Mon Sep 17 00:00:00 2001 From: Nick Rolfe Date: Wed, 2 Nov 2022 10:53:21 +0000 Subject: [PATCH 5/6] Accept suggested wording improvements Co-authored-by: Felicity Chapman --- .../analyzing-data-flow-in-ruby.rst | 44 +++++++++---------- 1 file changed, 22 insertions(+), 22 deletions(-) diff --git a/docs/codeql/codeql-language-guides/analyzing-data-flow-in-ruby.rst b/docs/codeql/codeql-language-guides/analyzing-data-flow-in-ruby.rst index 5d6b8c90ac4..bec5bc79ee6 100644 --- a/docs/codeql/codeql-language-guides/analyzing-data-flow-in-ruby.rst +++ b/docs/codeql/codeql-language-guides/analyzing-data-flow-in-ruby.rst @@ -15,15 +15,15 @@ For a more general introduction to modeling data flow, see ":ref:`About data flo Local data flow --------------- -Local data flow is data flow within a single method or callable. Local data flow is easier, faster, and more precise than global data flow, and is sufficient for many queries. +Local data flow tracks the flow of data within a single method or callable. Local data flow is easier, faster, and more precise than global data flow. 
Before looking at more complex tracking, you should always consider local tracking because it is sufficient for many queries. Using local data flow ~~~~~~~~~~~~~~~~~~~~~ -The local data flow library is in the module ``DataFlow`` and it defines the class ``Node``, representing any element through which data can flow. +You can use the local data flow library by importing the ``DataFlow`` module. The library uses the class ``Node`` to represent any element through which data can flow. ``Node``\ s are divided into expression nodes (``ExprNode``) and parameter nodes (``ParameterNode``). -You can map between a data flow ``ParameterNode`` and its corresponding ``Parameter`` AST node using the ``asParameter`` member predicate. -Meanwhile, the ``asExpr`` member predicate maps between a data flow ``ExprNode`` and its corresponding ``ExprCfgNode`` in the control-flow library. +You can map a data flow ``ParameterNode`` to its corresponding ``Parameter`` AST node using the ``asParameter`` member predicate. +Similarly, you can use the ``asExpr`` member predicate to map a data flow ``ExprNode`` to its corresponding ``ExprCfgNode`` in the control-flow library. .. code-block:: ql @@ -37,7 +37,7 @@ Meanwhile, the ``asExpr`` member predicate maps between a data flow ``ExprNode`` ... } -You can also use the predicates ``exprNode`` and ``parameterNode``: +You can use the predicates ``exprNode`` and ``parameterNode`` to map from expressions and parameters to their data-flow node: .. code-block:: ql @@ -52,8 +52,8 @@ You can also use the predicates ``exprNode`` and ``parameterNode``: ParameterNode parameterNode(Parameter p) { ... } Note that since ``asExpr`` and ``exprNode`` map between data-flow and control-flow nodes, you then need to call the ``getExpr`` member predicate on the control-flow node to map to the corresponding AST node, -e.g. by writing ``node.asExpr().getExpr()``. 
-Due to the control-flow graph being split, there can be multiple data-flow and control-flow nodes associated with a single expression AST node. +for example, by writing ``node.asExpr().getExpr()``. +A control-flow graph considers every way control can flow through code; consequently, there can be multiple data-flow and control-flow nodes associated with a single expression node in the AST. The predicate ``localFlowStep(Node nodeFrom, Node nodeTo)`` holds if there is an immediate data flow edge from the node ``nodeFrom`` to the node ``nodeTo``. You can apply the predicate recursively, by using the ``+`` and ``*`` operators, or you can use the predefined recursive predicate ``localFlow``. @@ -67,7 +67,7 @@ For example, you can find flow from an expression ``source`` to an expression `` Using local taint tracking ~~~~~~~~~~~~~~~~~~~~~~~~~~ -Local taint tracking extends local data flow by including non-value-preserving flow steps. +Local taint tracking extends local data flow to include flow steps where values are not preserved, such as string manipulation. For example: .. code-block:: ruby @@ -91,17 +91,17 @@ For example, you can find taint propagation from an expression ``source`` to an Using local sources ~~~~~~~~~~~~~~~~~~~ -When asking for local data flow or taint propagation between two expressions as above, you would normally constrain the expressions to be relevant to a certain investigation. -The next section will give some concrete examples, but there is a more abstract concept that we should call out explicitly, namely that of a local source. +When exploring local data flow or taint propagation between two expressions as above, you would normally constrain the expressions to be relevant to your investigation. +The next section gives some concrete examples, but first it's helpful to introduce the concept of a local source. A local source is a data-flow node with no local data flow into it. 
As such, it is a local origin of data flow, a place where a new value is created. -This includes parameters (which only receive global data flow) and most expressions (because they are not value-preserving). +This includes parameters (which only receive values from global data flow) and most expressions (because they are not value-preserving). The class ``LocalSourceNode`` represents data-flow nodes that are also local sources. It comes with a useful member predicate ``flowsTo(DataFlow::Node node)``, which holds if there is local data flow from the local source to ``node``. -Examples -~~~~~~~~ +Examples of local data flow +~~~~~~~~~~~~~~~~~~~~~~~~~~~ This query finds the filename argument passed in each call to ``File.open``: @@ -134,8 +134,8 @@ So we use local data flow to find all expressions that flow into the argument: Many expressions flow to the same call. If you run this query, you may notice that you get several data-flow nodes for an expression as it flows towards a call (notice repeated locations in the ``call`` column). We are mostly interested in the "first" of these, what might be called the local source for the file name. -To restrict attention to such local sources, and to simultaneously make the analysis more performant, we have the QL class ``LocalSourceNode``. -We could demand that ``expr`` is such a node: +To restrict the results to local sources for the file name, and to simultaneously make the analysis more efficient, we can use the CodeQL class ``LocalSourceNode``. +We can update the query to specify that ``expr`` is an instance of a ``LocalSourceNode``. .. code-block:: ql @@ -149,7 +149,7 @@ We could demand that ``expr`` is such a node: expr instanceof DataFlow::LocalSourceNode select call, expr -However, we could also enforce this by casting. +An alternative approach to limit the results to local sources for the file name is to enforce this by casting. 
That would allow us to use the member predicate ``flowsTo`` on ``LocalSourceNode`` like so: .. code-block:: ql @@ -181,7 +181,7 @@ We now mostly have one expression per call. We may still have cases of more than one expression flowing to a call, but then they flow through different code paths (possibly due to control-flow splitting). -We might want to make the source more specific, for example a parameter to a method or block. +We might want to make the source more specific, for example, a parameter to a method or block. This query finds instances where a parameter is used as the name when opening a file: .. code-block:: ql @@ -197,7 +197,7 @@ This query finds instances where a parameter is used as the name when opening a Using the exact name supplied via the parameter may be too strict. If we want to know if the parameter influences the file name, we can use taint tracking instead of data flow. -This query finds calls to ``File.open`` where the filename is derived from a parameter: +This query finds calls to ``File.open`` where the file name is derived from a parameter: .. code-block:: ql @@ -224,7 +224,7 @@ However, global data flow is less precise than local data flow, and the analysis Using global data flow ~~~~~~~~~~~~~~~~~~~~~~ -The global data flow library is used by extending the class ``DataFlow::Configuration``: +You can use the global data flow library by extending the class ``DataFlow::Configuration``: .. code-block:: ql @@ -316,15 +316,15 @@ Class hierarchy - ``Concepts::SystemCommandExecution`` - a data-flow node that executes an operating system command, for instance by spawning a new process. - ``Concepts::FileSystemAccess`` - a data-flow node that performs a file system access, including reading and writing data, creating and deleting files and folders, checking and updating permissions, and so on. - ``Concepts::Path::PathNormalization`` - a data-flow node that performs path normalization. This is often needed in order to safely access paths. 
- - ``Concepts::CodeExecution`` - a data-flow node that dynamically executes Python code. + - ``Concepts::CodeExecution`` - a data-flow node that dynamically executes Ruby code. - ``Concepts::SqlExecution`` - a data-flow node that executes SQL statements. - ``Concepts::HTTP::Server::RouteSetup`` - a data-flow node that sets up a route on a server. - ``Concepts::HTTP::Server::HttpResponse`` - a data-flow node that creates an HTTP response on a server. - ``TaintTracking::Configuration`` - base class for custom global taint tracking analysis. -Examples -~~~~~~~~ +Examples of global data flow +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ This query shows a data flow configuration that uses all network input as data sources: From 8786c700c2b804cbbd12ae1bb6c0d5ddea3b6948 Mon Sep 17 00:00:00 2001 From: Nick Rolfe Date: Wed, 2 Nov 2022 11:30:37 +0000 Subject: [PATCH 6/6] Expand explanations of example global data-flow queries --- .../analyzing-data-flow-in-ruby.rst | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/docs/codeql/codeql-language-guides/analyzing-data-flow-in-ruby.rst b/docs/codeql/codeql-language-guides/analyzing-data-flow-in-ruby.rst index bec5bc79ee6..b326bfa59aa 100644 --- a/docs/codeql/codeql-language-guides/analyzing-data-flow-in-ruby.rst +++ b/docs/codeql/codeql-language-guides/analyzing-data-flow-in-ruby.rst @@ -326,7 +326,10 @@ Class hierarchy Examples of global data flow ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -This query shows a data flow configuration that uses all network input as data sources: +The following global taint-tracking query finds path arguments in filesystem accesses that can be controlled by a remote user. + - Since this is a taint-tracking query, the configuration class extends ``TaintTracking::Configuration``. + - The ``isSource`` predicate defines sources as any data-flow nodes that are instances of ``RemoteFlowSource``. 
+ - The ``isSink`` predicate defines sinks as path arguments in any filesystem access, using ``FileSystemAccess`` from the ``Concepts`` library. .. code-block:: ql @@ -349,7 +352,10 @@ This query shows a data flow configuration that uses all network input as data s where config.hasFlow(input, fileAccess) select fileAccess, "This file access uses data from $@.", input, "user-controllable input." -This data flow configuration tracks data flow from environment variables to opening files: +The following global data-flow query finds calls to ``File.open`` where the filename argument comes from an environment variable. + - Since this is a data-flow query, the configuration class extends ``DataFlow::Configuration``. + - The ``isSource`` predicate defines sources as expression nodes representing lookups on the ``ENV`` hash. + - The ``isSink`` predicate defines sinks as the first argument in any call to ``File.open``. .. code-block:: ql