Delete old docs for data flow in Go

Owen Mansel-Chan
2025-01-16 12:03:14 +00:00
parent cab9c64fbc
commit 4f2d7ade5b
2 changed files with 0 additions and 246 deletions

@@ -1,124 +0,0 @@
.. _modeling-data-flow-in-go-libraries:
Modeling data flow in Go libraries
==================================
When analyzing a Go program, CodeQL does not examine the source code for
external packages. To track the flow of untrusted data through a library, you
can create a model of the library.
You can find existing models in the ``go/ql/lib/semmle/go/frameworks/`` folder of the
`CodeQL repository <https://github.com/github/codeql/tree/main/go/ql/lib/semmle/go/frameworks>`__.
To add a new model, you should make a new file in that folder, named after the library.
Sources
-------
To mark a source of data that is controlled by an untrusted user,
create a class extending ``RemoteFlowSource::Range``. Use inheritance and
the characteristic predicate of the class to specify exactly which
data-flow node introduces the data. Here is a short
example from ``Mux.qll``.

.. code-block:: ql

   class RequestVars extends DataFlow::RemoteFlowSource::Range, DataFlow::CallNode {
     RequestVars() { this.getTarget().hasQualifiedName("github.com/gorilla/mux", "Vars") }
   }

This has the effect that all calls to `the function Vars from the
package mux <https://github.com/gorilla/mux>`__ are
treated as sources of untrusted data.
Flow propagation
----------------
By default, CodeQL assumes that library functions do not propagate
data flow. To indicate that a particular function does,
create a class extending ``TaintTracking::FunctionModel`` (or
``DataFlow::FunctionModel`` if the untrusted user data is passed on
without being modified).
Inheritance and the characteristic predicate of the class should specify
the function. The class should also have a member predicate with the signature
``override predicate hasTaintFlow(FunctionInput inp, FunctionOutput outp)``
(or
``override predicate hasDataFlow(FunctionInput inp, FunctionOutput outp)``
if extending ``DataFlow::FunctionModel``). The body should constrain
``inp`` and ``outp``.
``FunctionInput`` is an abstract representation of the inputs to a
function. The options are:
* the receiver (``inp.isReceiver()``)
* one of the parameters (``inp.isParameter(i)``)
* one of the results (``inp.isResult(i)``, or ``inp.isResult()`` if there is only one result)
It may seem strange that the result of a function can be
treated as a function input, but some cases need it. For
instance, the function ``bufio.NewWriter`` returns a writer ``bw`` that
buffers write operations to an underlying writer ``w``. If tainted data
is written to ``bw``, then it makes sense to propagate that taint back
to the underlying writer ``w``. You can model this by saying that
``bufio.NewWriter`` propagates taint from its result to its first
argument.
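The Go behavior behind this model can be sketched with the standard library alone. The helper name ``writeThroughBuffer`` and the sample string are illustrative assumptions, not part of any CodeQL model; the sketch only shows that data written to ``bw`` ends up in ``w``.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
)

// writeThroughBuffer (hypothetical helper) writes data to a buffered writer
// bw that wraps an underlying writer w, then returns what w received.
func writeThroughBuffer(data string) (string, error) {
	var w bytes.Buffer        // the underlying writer
	bw := bufio.NewWriter(&w) // bw is the result; &w is the first argument

	// The data is held in bw's internal buffer at this point.
	if _, err := bw.WriteString(data); err != nil {
		return "", err
	}
	// Flush pushes the buffered data into the underlying writer w.
	if err := bw.Flush(); err != nil {
		return "", err
	}
	return w.String(), nil
}

func main() {
	got, err := writeThroughBuffer("user-controlled")
	if err != nil {
		fmt.Println("write failed:", err)
		return
	}
	fmt.Println(got)
}
```

Because the string reaches ``w`` only through ``bw``, a model that stopped at the result of ``bufio.NewWriter`` would lose track of it; treating the result as an input and the first argument as an output keeps the two writers connected.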
Similarly, ``FunctionOutput`` is an abstract representation of the
outputs of a function. The options are:
* the receiver (``outp.isReceiver()``)
* one of the parameters (``outp.isParameter(i)``)
* one of the results (``outp.isResult(i)``, or ``outp.isResult()`` if there is only one result)
Here is an example from ``Gin.qll``, which has been slightly simplified.

.. code-block:: ql

   private class ParamsGet extends TaintTracking::FunctionModel, Method {
     ParamsGet() { this.hasQualifiedName("github.com/gin-gonic/gin", "Params", "Get") }

     override predicate hasTaintFlow(FunctionInput inp, FunctionOutput outp) {
       inp.isReceiver() and outp.isResult(0)
     }
   }

This has the effect that calls to the ``Get`` method with receiver type
``Params`` from the ``gin-gonic/gin`` package allow taint to flow from
the receiver to the first result. In other words, if ``p`` has type
``Params`` and taint can flow to it, then after the line
``x, _ := p.Get("foo")`` taint can also flow to ``x``.
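To see the receiver-to-result flow on the Go side, here is a minimal sketch. ``Param`` and ``Params`` below are local stand-ins written for this example, on the assumption that gin's ``Params`` is a slice of key/value pairs whose ``Get`` returns the value and a found flag; they are not imported from ``gin-gonic/gin``.

```go
package main

import "fmt"

// Param and Params are local stand-ins (assumptions for this sketch),
// not the types from the gin-gonic/gin package.
type Param struct {
	Key   string
	Value string
}

type Params []Param

// Get returns the value of the first entry whose key matches name,
// and reports whether such an entry was found.
func (ps Params) Get(name string) (string, bool) {
	for _, entry := range ps {
		if entry.Key == name {
			return entry.Value, true
		}
	}
	return "", false
}

func main() {
	// p holds data that came from the request, so p is tainted.
	p := Params{{Key: "foo", Value: "value from the request"}}

	// x is result 0 of Get and is derived from the receiver p,
	// which is the step that the model describes.
	x, ok := p.Get("foo")
	fmt.Println(x, ok)
}
```

The second result only reports whether the key was present, which is consistent with the model mapping the receiver to ``outp.isResult(0)`` alone.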
Sanitizers
----------
It is not necessary to indicate that library functions are sanitizers.
Their bodies are not analyzed, so it is assumed that data does not
flow through them.
Sinks
-----
Data-flow sinks are specified by queries rather than by library models.
However, you can use library models to indicate when functions belong to
special categories. Queries can then use these categories when specifying
sinks. Classes representing these special categories are contained in
``go/ql/lib/semmle/go/Concepts.qll`` in the `CodeQL repository
<https://github.com/github/codeql/blob/main/go/ql/lib/semmle/go/Concepts.qll>`__.
``Concepts.qll`` includes classes for logger mechanisms,
HTTP response writers, HTTP redirects, and marshaling and unmarshaling
functions.
Here is a short example from ``Stdlib.qll``, which has been slightly simplified.

.. code-block:: ql

   private class PrintfCall extends LoggerCall::Range, DataFlow::CallNode {
     PrintfCall() { this.getTarget().hasQualifiedName("fmt", ["Print", "Printf", "Println"]) }

     override DataFlow::Node getAMessageComponent() { result = this.getAnArgument() }
   }

This has the effect that any call to ``Print``, ``Printf``, or
``Println`` in the package ``fmt`` is recognized as a logger call.
Any query that uses logger calls as a sink will then identify when tainted data
has been passed as an argument to ``Print``, ``Printf``, or ``Println``.
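As an illustration of the Go code that such a model matches, the sketch below passes a value from outside the program to ``fmt.Printf``. The function name ``greet`` and the use of ``os.Args`` are assumptions made for this example; whether a given query reports the call depends on which sources that query considers untrusted.

```go
package main

import (
	"fmt"
	"os"
)

// greet (hypothetical helper) passes name to fmt.Printf. Under the model
// above, every argument of this call is a message component of a logger call.
func greet(name string) (int, error) {
	return fmt.Printf("hello, %s\n", name)
}

func main() {
	// The name comes from outside the program when an argument is given.
	name := "anonymous"
	if len(os.Args) > 1 {
		name = os.Args[1]
	}
	if _, err := greet(name); err != nil {
		fmt.Fprintln(os.Stderr, "print failed:", err)
	}
}
```

Both the format string and ``name`` are arguments of the ``Printf`` call, so ``getAnArgument()`` covers each of them.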

@@ -1,122 +0,0 @@
Modeling data flow in Go libraries
==================================
When analyzing a Go program, CodeQL does not examine the source code for
external packages. To track the flow of untrusted data through a library, you
can create a model of the library.
You can find existing models in the ``go/ql/lib/semmle/go/frameworks/`` folder of the
`CodeQL repository <https://github.com/github/codeql/tree/main/go/ql/lib/semmle/go/frameworks>`__.
To add a new model, you should make a new file in that folder, named after the library.
Sources
-------
To mark a source of data that is controlled by an untrusted user,
create a class extending ``RemoteFlowSource::Range``. Use inheritance and
the characteristic predicate of the class to specify exactly which
data-flow node introduces the data. Here is a short
example from ``Mux.qll``.

.. code-block:: ql

   class RequestVars extends DataFlow::RemoteFlowSource::Range, DataFlow::CallNode {
     RequestVars() { this.getTarget().hasQualifiedName("github.com/gorilla/mux", "Vars") }
   }

This has the effect that all calls to `the function Vars from the
package mux <http://www.gorillatoolkit.org/pkg/mux#Vars>`__ are
treated as sources of untrusted data.
Flow propagation
----------------
By default, CodeQL assumes that library functions do not propagate
data flow. To indicate that a particular function does,
create a class extending ``TaintTracking::FunctionModel`` (or
``DataFlow::FunctionModel`` if the untrusted user data is passed on
without being modified).
Inheritance and the characteristic predicate of the class should specify
the function. The class should also have a member predicate with the signature
``override predicate hasTaintFlow(FunctionInput inp, FunctionOutput outp)``
(or
``override predicate hasDataFlow(FunctionInput inp, FunctionOutput outp)``
if extending ``DataFlow::FunctionModel``). The body should constrain
``inp`` and ``outp``.
``FunctionInput`` is an abstract representation of the inputs to a
function. The options are:
* the receiver (``inp.isReceiver()``)
* one of the parameters (``inp.isParameter(i)``)
* one of the results (``inp.isResult(i)``, or ``inp.isResult()`` if there is only one result)
Note that it may seem strange that the result of a function could be
considered as a function input, but it is needed in some cases. For
instance, the function ``bufio.NewWriter`` returns a writer ``bw`` that
buffers write operations to an underlying writer ``w``. If tainted data
is written to ``bw``, then it makes sense to propagate that taint back
to the underlying writer ``w``, which can be modeled by saying that
``bufio.NewWriter`` propagates taint from its result to its first
argument.
Similarly, ``FunctionOutput`` is an abstract representation of the
outputs of a function. The options are:
* the receiver (``outp.isReceiver()``)
* one of the parameters (``outp.isParameter(i)``)
* one of the results (``outp.isResult(i)``, or ``outp.isResult()`` if there is only one result)
Here is an example from ``Gin.qll``, which has been slightly simplified.

.. code-block:: ql

   private class ParamsGet extends TaintTracking::FunctionModel, Method {
     ParamsGet() { this.hasQualifiedName("github.com/gin-gonic/gin", "Params", "Get") }

     override predicate hasTaintFlow(FunctionInput inp, FunctionOutput outp) {
       inp.isReceiver() and outp.isResult(0)
     }
   }

This has the effect that calls to the ``Get`` method with receiver type
``Params`` from the ``gin-gonic/gin`` package allow taint to flow from
the receiver to the first result. In other words, if ``p`` has type
``Params`` and taint can flow to it, then after the line
``x, _ := p.Get("foo")`` taint can also flow to ``x``.
Sanitizers
----------
It is not necessary to indicate that library functions are sanitizers.
Their bodies are not analyzed, so it is assumed that data does not
flow through them.
Sinks
-----
Data-flow sinks are specified by queries rather than by library models.
However, you can use library models to indicate when functions belong to
special categories. Queries can then use these categories when specifying
sinks. Classes representing these special categories are contained in
``go/ql/lib/semmle/go/Concepts.qll`` in the `CodeQL for Go repository
<https://github.com/github/codeql/blob/main/go/ql/lib/semmle/go/Concepts.qll>`__.
``Concepts.qll`` includes classes for logger mechanisms,
HTTP response writers, HTTP redirects, and marshaling and unmarshaling
functions.
Here is a short example from ``Stdlib.qll``, which has been slightly simplified.

.. code-block:: ql

   private class PrintfCall extends LoggerCall::Range, DataFlow::CallNode {
     PrintfCall() { this.getTarget().hasQualifiedName("fmt", ["Print", "Printf", "Println"]) }

     override DataFlow::Node getAMessageComponent() { result = this.getAnArgument() }
   }

This has the effect that any call to ``Print``, ``Printf``, or
``Println`` in the package ``fmt`` is recognized as a logger call.
Any query that uses logger calls as a sink will then identify when tainted data
has been passed as an argument to ``Print``, ``Printf``, or ``Println``.