From 63b2afb4ce71eec38575a1cc7cb56f6857cdd4fb Mon Sep 17 00:00:00 2001 From: Owen Mansel-Chan Date: Mon, 29 Jun 2020 11:46:09 +0100 Subject: [PATCH 1/5] Create guide for modeling go libraries --- .../learn-ql/go/library-modeling-go.md | 61 +++++++++++++++++++ 1 file changed, 61 insertions(+) create mode 100644 docs/language/learn-ql/go/library-modeling-go.md diff --git a/docs/language/learn-ql/go/library-modeling-go.md b/docs/language/learn-ql/go/library-modeling-go.md new file mode 100644 index 00000000000..70c3b2d8b43 --- /dev/null +++ b/docs/language/learn-ql/go/library-modeling-go.md @@ -0,0 +1,61 @@ +# Modeling Go libraries for CodeQL + +CodeQL does not examine the source code for external packages. To track the flow of untrusted data through them you need to create a model of the library. Existing models can be found in `ql/src/semmle/go/frameworks/`, and a good source of examples. You should make a new file in that folder, named after the library. + +## Sources + +To mark a source of data that is controlled by an untrusted user, we create a class extending `UntrustedFlowSource::Range`. Inheritance and the characteristic predicate of the class should be used to specify exactly the dataflow node that introduces the data. Here is a short example from `Mux.qll`. + +```go + class RequestVars extends DataFlow::UntrustedFlowSource::Range, DataFlow::CallNode { + RequestVars() { this.getTarget().hasQualifiedName("github.com/gorilla/mux", "Vars") } + } +``` + +## Flow propagation + +By default, it is assumed that all functions in libraries do not have any data flow. To indicate that a particular function does, you need to create a class extending `TaintTracking::FunctionModel`. Inheritance and the characteristic predicate of the class should specify the function and a method with the signature `override predicate hasTaintFlow(FunctionInput inp, FunctionOutput outp)` is need. The body should specify either `inp.isReceiver()` or `inp.isParameter(i)` for some `i`. 
It should also specify either `outp.isResult()` (only valid if there is only one return value), `outp.isResult(i)` or `outp.isParameter(i)` for some `i`. Here is an example from `Gin.qll`, slightly modified for brevity. + +```go +private class ParamsGet extends TaintTracking::FunctionModel, Method { + ParamsGet() { this.hasQualifiedName("github.com/gin-gonic/gin", "Params", "Get") } + + override predicate hasTaintFlow(FunctionInput inp, FunctionOutput outp) { + inp.isReceiver() and outp.isResult(0) + } +} +``` + +## Sanitizers + +It is not necessary to indicate that library methods are sanitizers - because the body is not analysed it is assumed that data does not flow through them. + +## Sinks + +Data-flow sinks are specified by queries rather than by library models. What library models can do is to indicate when functions belong to special categories of function, which queries can use when specifying sinks. Categories representing these special categories are contained in `ql/src/semmle/go/Concepts.qll`. For example, a call to a logging mechanism should be indicated by making a class that extends `LoggerCall::Range`, as in the following example from `Glog.qll`. + +```go +private class GlogCall extends LoggerCall::Range, DataFlow::CallNode { + GlogCall() { + exists(string fn | + fn.regexpMatch("Error(|f|ln)") + or + fn.regexpMatch("Exit(|f|ln)") + or + fn.regexpMatch("Fatal(|f|ln)") + or + fn.regexpMatch("Info(|f|ln)") + or + fn.regexpMatch("Warning(|f|ln)") + | + this.getTarget().hasQualifiedName("github.com/golang/glog", fn) + or + this.getTarget().(Method).hasQualifiedName("github.com/golang/glog", "Verbose", fn) + ) + } + + override DataFlow::Node getAMessageComponent() { result = this.getAnArgument() } +} +``` + +Other useful classes in `ql/src/semmle/go/Concepts.qll` include ones for HTTP response writers, HTTP redirects and marshaling and unmarshaling functions. 
From 88e2ae1b2e4db61d2e54a0c35cb51395530c4937 Mon Sep 17 00:00:00 2001 From: Owen Mansel-Chan Date: Tue, 30 Jun 2020 17:00:05 +0100 Subject: [PATCH 2/5] Address review comments --- .../learn-ql/go/library-modeling-go.md | 42 ++++++++----------- 1 file changed, 18 insertions(+), 24 deletions(-) diff --git a/docs/language/learn-ql/go/library-modeling-go.md b/docs/language/learn-ql/go/library-modeling-go.md index 70c3b2d8b43..d94681babcf 100644 --- a/docs/language/learn-ql/go/library-modeling-go.md +++ b/docs/language/learn-ql/go/library-modeling-go.md @@ -1,6 +1,6 @@ # Modeling Go libraries for CodeQL -CodeQL does not examine the source code for external packages. To track the flow of untrusted data through them you need to create a model of the library. Existing models can be found in `ql/src/semmle/go/frameworks/`, and a good source of examples. You should make a new file in that folder, named after the library. +When analyzing a Go program, CodeQL does not examine the source code for external packages. To track the flow of untrusted data through them you need to create a model of the library. Existing models can be found in `ql/src/semmle/go/frameworks/`, and are a good source of examples. You should make a new file in that folder, named after the library. ## Sources @@ -12,9 +12,15 @@ To mark a source of data that is controlled by an untrusted user, we create a cl } ``` +This has the effect that all calls to [the function `Vars` from the package `mux`](http://www.gorillatoolkit.org/pkg/mux#Vars) are treated as sources of untrusted data. + ## Flow propagation -By default, it is assumed that all functions in libraries do not have any data flow. To indicate that a particular function does, you need to create a class extending `TaintTracking::FunctionModel`. Inheritance and the characteristic predicate of the class should specify the function and a method with the signature `override predicate hasTaintFlow(FunctionInput inp, FunctionOutput outp)` is need. 
The body should specify either `inp.isReceiver()` or `inp.isParameter(i)` for some `i`. It should also specify either `outp.isResult()` (only valid if there is only one return value), `outp.isResult(i)` or `outp.isParameter(i)` for some `i`. Here is an example from `Gin.qll`, slightly modified for brevity. +By default, it is assumed that all functions in libraries do not have any data flow. To indicate that a particular function does, you need to create a class extending `TaintTracking::FunctionModel` (or `DataFlow::FunctionModel` if the untrusted user data is passed on without being modified). + +Inheritance and the characteristic predicate of the class should specify the function and a member predicate is needed with the signature `override predicate hasTaintFlow(FunctionInput inp, FunctionOutput outp)` (or `override predicate hasDataFlow(FunctionInput inp, FunctionOutput outp)` if extending `DataFlow::FunctionModel`). The body should constrain `inp` and `outp`. `FunctionOutput` is an abstract representation of the inputs to a function. The options are the receiver (`outp.isReceiver()`), one of the parameters (`outp.isParameter(i)`) or one of the results (`outp.isResult(i)`, or `outp.isResult` if there is only one result). Similarly, `FunctionInput` is an abstract representation of the inputs to a function. The options are the receiver (`inp.isReceiver()`), one of the parameters (`inp.isParameter(i)`) or one of the results (`inp.isResult(i)`, or `inp.isResult` if there is only one result). Note that it may seem strange that the result of a function could be considered as a function input, but it is needed in some cases. For instance, the function `bufio.NewWriter` returns a writer `bw` that buffers write operations to an underlying writer `w`. 
If tainted data is written to `bw`, then it makes sense to propagate that taint back to the underlying writer `w`, which can be modeled by saying that `bufio.NewWriter` propagates taint from its result to its first argument. + +Here is an example from `Gin.qll`, slightly modified for brevity. ```go private class ParamsGet extends TaintTracking::FunctionModel, Method { @@ -26,36 +32,24 @@ private class ParamsGet extends TaintTracking::FunctionModel, Method { } ``` +This has the effect that calls to the `Get` method with receiver type `Params` from the `gin-gonic/gin` package allow taint to flow from the receiver to the first result. In other words, if `p` has type `Params` and taint can flow to it then after the line `x := p.Get("foo")` taint can also flow to `x`. + ## Sanitizers -It is not necessary to indicate that library methods are sanitizers - because the body is not analysed it is assumed that data does not flow through them. +It is not necessary to indicate that library functions are sanitizers - because their bodies are not analyzed it is assumed that data does not flow through them. ## Sinks -Data-flow sinks are specified by queries rather than by library models. What library models can do is to indicate when functions belong to special categories of function, which queries can use when specifying sinks. Categories representing these special categories are contained in `ql/src/semmle/go/Concepts.qll`. For example, a call to a logging mechanism should be indicated by making a class that extends `LoggerCall::Range`, as in the following example from `Glog.qll`. +Data-flow sinks are specified by queries rather than by library models. What library models can do is to indicate when functions belong to special categories of function, which queries can use when specifying sinks. 
Classes representing these special categories are contained in `ql/src/semmle/go/Concepts.qll`, including ones for logger mechanisms, HTTP response writers, HTTP redirects and marshaling and unmarshaling functions. + +Here is a short example from `Stdlib.qll`, slightyly modified for brevity. ```go -private class GlogCall extends LoggerCall::Range, DataFlow::CallNode { - GlogCall() { - exists(string fn | - fn.regexpMatch("Error(|f|ln)") - or - fn.regexpMatch("Exit(|f|ln)") - or - fn.regexpMatch("Fatal(|f|ln)") - or - fn.regexpMatch("Info(|f|ln)") - or - fn.regexpMatch("Warning(|f|ln)") - | - this.getTarget().hasQualifiedName("github.com/golang/glog", fn) - or - this.getTarget().(Method).hasQualifiedName("github.com/golang/glog", "Verbose", fn) - ) - } +private class PrintfCall extends LoggerCall::Range, DataFlow::CallNode { + PrintfCall() { this.getTarget().hasQualifiedName("fmt", ["Print", "Printf", "Println"]) } - override DataFlow::Node getAMessageComponent() { result = this.getAnArgument() } + override DataFlow::Node getAMessageComponent() { result = this.getAnArgument } } ``` -Other useful classes in `ql/src/semmle/go/Concepts.qll` include ones for HTTP response writers, HTTP redirects and marshaling and unmarshaling functions. +This has the effect that any call to `Print`, `Printf` or `Println` in the package `fmt` is recognised as a logger call, and in any query which uses logger calls as a sink then passing tainted data as an argument to `Print`, `Printf` or `Println` will create a result for the query. 
From 126d214a2d02024654d126d7aa0fb36ed6b73317 Mon Sep 17 00:00:00 2001 From: Owen Mansel-Chan Date: Wed, 1 Jul 2020 10:04:55 +0100 Subject: [PATCH 3/5] Address review comments --- .../learn-ql/go/library-modeling-go.md | 26 ++++++++++++++----- 1 file changed, 19 insertions(+), 7 deletions(-) diff --git a/docs/language/learn-ql/go/library-modeling-go.md b/docs/language/learn-ql/go/library-modeling-go.md index d94681babcf..bfce02f0c97 100644 --- a/docs/language/learn-ql/go/library-modeling-go.md +++ b/docs/language/learn-ql/go/library-modeling-go.md @@ -6,7 +6,7 @@ When analyzing a Go program, CodeQL does not examine the source code for externa To mark a source of data that is controlled by an untrusted user, we create a class extending `UntrustedFlowSource::Range`. Inheritance and the characteristic predicate of the class should be used to specify exactly the dataflow node that introduces the data. Here is a short example from `Mux.qll`. -```go +```ql class RequestVars extends DataFlow::UntrustedFlowSource::Range, DataFlow::CallNode { RequestVars() { this.getTarget().hasQualifiedName("github.com/gorilla/mux", "Vars") } } @@ -18,11 +18,23 @@ This has the effect that all calls to [the function `Vars` from the package `mux By default, it is assumed that all functions in libraries do not have any data flow. To indicate that a particular function does, you need to create a class extending `TaintTracking::FunctionModel` (or `DataFlow::FunctionModel` if the untrusted user data is passed on without being modified). -Inheritance and the characteristic predicate of the class should specify the function and a member predicate is needed with the signature `override predicate hasTaintFlow(FunctionInput inp, FunctionOutput outp)` (or `override predicate hasDataFlow(FunctionInput inp, FunctionOutput outp)` if extending `DataFlow::FunctionModel`). The body should constrain `inp` and `outp`. `FunctionOutput` is an abstract representation of the inputs to a function. 
The options are the receiver (`outp.isReceiver()`), one of the parameters (`outp.isParameter(i)`) or one of the results (`outp.isResult(i)`, or `outp.isResult` if there is only one result). Similarly, `FunctionInput` is an abstract representation of the inputs to a function. The options are the receiver (`inp.isReceiver()`), one of the parameters (`inp.isParameter(i)`) or one of the results (`inp.isResult(i)`, or `inp.isResult` if there is only one result). Note that it may seem strange that the result of a function could be considered as a function input, but it is needed in some cases. For instance, the function `bufio.NewWriter` returns a writer `bw` that buffers write operations to an underlying writer `w`. If tainted data is written to `bw`, then it makes sense to propagate that taint back to the underlying writer `w`, which can be modeled by saying that `bufio.NewWriter` propagates taint from its result to its first argument. +Inheritance and the characteristic predicate of the class should specify the function and a member predicate is needed with the signature `override predicate hasTaintFlow(FunctionInput inp, FunctionOutput outp)` (or `override predicate hasDataFlow(FunctionInput inp, FunctionOutput outp)` if extending `DataFlow::FunctionModel`). The body should constrain `inp` and `outp`. + +`FunctionInput` is an abstract representation of the inputs to a function. The options are: +* the receiver (`inp.isReceiver()`) +* one of the parameters (`inp.isParameter(i)`) +* one of the results (`inp.isResult(i)`, or `inp.isResult` if there is only one result) + +Note that it may seem strange that the result of a function could be considered as a function input, but it is needed in some cases. For instance, the function `bufio.NewWriter` returns a writer `bw` that buffers write operations to an underlying writer `w`. 
If tainted data is written to `bw`, then it makes sense to propagate that taint back to the underlying writer `w`, which can be modeled by saying that `bufio.NewWriter` propagates taint from its result to its first argument. + +Similarly, `FunctionOutput` is an abstract representation of the inputs to a function. The options are: +* the receiver (`outp.isReceiver()`) +* one of the parameters (`outp.isParameter(i)`) +* one of the results (`outp.isResult(i)`, or `outp.isResult` if there is only one result) Here is an example from `Gin.qll`, slightly modified for brevity. -```go +```ql private class ParamsGet extends TaintTracking::FunctionModel, Method { ParamsGet() { this.hasQualifiedName("github.com/gin-gonic/gin", "Params", "Get") } @@ -42,14 +54,14 @@ It is not necessary to indicate that library functions are sanitizers - because Data-flow sinks are specified by queries rather than by library models. What library models can do is to indicate when functions belong to special categories of function, which queries can use when specifying sinks. Classes representing these special categories are contained in `ql/src/semmle/go/Concepts.qll`, including ones for logger mechanisms, HTTP response writers, HTTP redirects and marshaling and unmarshaling functions. -Here is a short example from `Stdlib.qll`, slightyly modified for brevity. +Here is a short example from `Stdlib.qll`, slightly modified for brevity. 
-```go +```ql private class PrintfCall extends LoggerCall::Range, DataFlow::CallNode { PrintfCall() { this.getTarget().hasQualifiedName("fmt", ["Print", "Printf", "Println"]) } - override DataFlow::Node getAMessageComponent() { result = this.getAnArgument } + override DataFlow::Node getAMessageComponent() { result = this.getAnArgument() } } ``` -This has the effect that any call to `Print`, `Printf` or `Println` in the package `fmt` is recognised as a logger call, and in any query which uses logger calls as a sink then passing tainted data as an argument to `Print`, `Printf` or `Println` will create a result for the query. +This has the effect that any call to `Print`, `Printf`, or `Println` in the package `fmt` is recognised as a logger call, and in any query which uses logger calls as a sink then passing tainted data as an argument to `Print`, `Printf`, or `Println` will create a result for the query. From 3a2a33b956ad6d58a4f3d1098cc5adaa56c42268 Mon Sep 17 00:00:00 2001 From: Owen Mansel-Chan Date: Wed, 1 Jul 2020 10:43:08 +0100 Subject: [PATCH 4/5] Convert to reStructuredText Annoyingly rst won't easily let you make some text monospace inside the text for a link. The only other things I've changed from pandoc's output are changing "code::" to "code-block::" and adding whitespace to get the lists to format correctly. --- .../learn-ql/go/library-modeling-go.rst | 120 ++++++++++++++++++ 1 file changed, 120 insertions(+) create mode 100644 docs/language/learn-ql/go/library-modeling-go.rst diff --git a/docs/language/learn-ql/go/library-modeling-go.rst b/docs/language/learn-ql/go/library-modeling-go.rst new file mode 100644 index 00000000000..44f9c4538b6 --- /dev/null +++ b/docs/language/learn-ql/go/library-modeling-go.rst @@ -0,0 +1,120 @@ +Modeling Go libraries for CodeQL +================================ + +When analyzing a Go program, CodeQL does not examine the source code for +external packages. 
To track the flow of untrusted data through them you +need to create a model of the library. Existing models can be found in +``ql/src/semmle/go/frameworks/``, and are a good source of examples. You +should make a new file in that folder, named after the library. + +Sources +------- + +To mark a source of data that is controlled by an untrusted user, we +create a class extending ``UntrustedFlowSource::Range``. Inheritance and +the characteristic predicate of the class should be used to specify +exactly the dataflow node that introduces the data. Here is a short +example from ``Mux.qll``. + +.. code-block:: ql + + class RequestVars extends DataFlow::UntrustedFlowSource::Range, DataFlow::CallNode { + RequestVars() { this.getTarget().hasQualifiedName("github.com/gorilla/mux", "Vars") } + } + +This has the effect that all calls to `the function Vars from the +package mux <http://www.gorillatoolkit.org/pkg/mux#Vars>`__ are +treated as sources of untrusted data. + +Flow propagation +---------------- + +By default, it is assumed that all functions in libraries do not have +any data flow. To indicate that a particular function does, you need to +create a class extending ``TaintTracking::FunctionModel`` (or +``DataFlow::FunctionModel`` if the untrusted user data is passed on +without being modified). + +Inheritance and the characteristic predicate of the class should specify +the function and a member predicate is needed with the signature +``override predicate hasTaintFlow(FunctionInput inp, FunctionOutput outp)`` +(or +``override predicate hasDataFlow(FunctionInput inp, FunctionOutput outp)`` +if extending ``DataFlow::FunctionModel``). The body should constrain +``inp`` and ``outp``. + +``FunctionInput`` is an abstract representation of the inputs to a +function.
The options are: + +* the receiver (``inp.isReceiver()``) +* one of the parameters (``inp.isParameter(i)``) +* one of the results (``inp.isResult(i)``, or ``inp.isResult`` if there is only one result) + +Note that it may seem strange that the result of a function could be +considered as a function input, but it is needed in some cases. For +instance, the function ``bufio.NewWriter`` returns a writer ``bw`` that +buffers write operations to an underlying writer ``w``. If tainted data +is written to ``bw``, then it makes sense to propagate that taint back +to the underlying writer ``w``, which can be modeled by saying that +``bufio.NewWriter`` propagates taint from its result to its first +argument. + +Similarly, ``FunctionOutput`` is an abstract representation of the +inputs to a function. The options are: + +* the receiver (``outp.isReceiver()``) +* one of the parameters (``outp.isParameter(i)``) +* one of the results (``outp.isResult(i)``, or ``outp.isResult`` if there is only one result) + +Here is an example from ``Gin.qll``, slightly modified for brevity. + +.. code-block:: ql + + private class ParamsGet extends TaintTracking::FunctionModel, Method { + ParamsGet() { this.hasQualifiedName("github.com/gin-gonic/gin", "Params", "Get") } + + override predicate hasTaintFlow(FunctionInput inp, FunctionOutput outp) { + inp.isReceiver() and outp.isResult(0) + } + } + +This has the effect that calls to the ``Get`` method with receiver type +``Params`` from the ``gin-gonic/gin`` package allow taint to flow from +the receiver to the first result. In other words, if ``p`` has type +``Params`` and taint can flow to it then after the line +``x := p.Get("foo")`` taint can also flow to ``x``. + +Sanitizers +---------- + +It is not necessary to indicate that library functions are sanitizers - +because their bodies are not analyzed it is assumed that data does not +flow through them. + +Sinks +----- + +Data-flow sinks are specified by queries rather than by library models. 
+What library models can do is to indicate when functions belong to +special categories of function, which queries can use when specifying +sinks. Classes representing these special categories are contained in +``ql/src/semmle/go/Concepts.qll``, including ones for logger mechanisms, +HTTP response writers, HTTP redirects and marshaling and unmarshaling +functions. + +Here is a short example from ``Stdlib.qll``, slightly modified for +brevity. + +.. code-block:: ql + + private class PrintfCall extends LoggerCall::Range, DataFlow::CallNode { + PrintfCall() { this.getTarget().hasQualifiedName("fmt", ["Print", "Printf", "Println"]) } + + override DataFlow::Node getAMessageComponent() { result = this.getAnArgument() } + } + +This has the effect that any call to ``Print``, ``Printf``, or +``Println`` in the package ``fmt`` is recognised as a logger call, and +in any query which uses logger calls as a sink then passing tainted data +as an argument to ``Print``, ``Printf``, or ``Println`` will create a +result for the query. From 4a002c304452294091b1345db749a5c8826fd109 Mon Sep 17 00:00:00 2001 From: Owen Mansel-Chan Date: Wed, 1 Jul 2020 15:08:00 +0100 Subject: [PATCH 5/5] Address review comments and delete md file --- .../learn-ql/go/library-modeling-go.md | 67 ------------------- .../learn-ql/go/library-modeling-go.rst | 41 ++++++------ 2 files changed, 22 insertions(+), 86 deletions(-) delete mode 100644 docs/language/learn-ql/go/library-modeling-go.md diff --git a/docs/language/learn-ql/go/library-modeling-go.md b/docs/language/learn-ql/go/library-modeling-go.md deleted file mode 100644 index bfce02f0c97..00000000000 --- a/docs/language/learn-ql/go/library-modeling-go.md +++ /dev/null @@ -1,67 +0,0 @@ -# Modeling Go libraries for CodeQL - -When analyzing a Go program, CodeQL does not examine the source code for external packages. To track the flow of untrusted data through them you need to create a model of the library. 
Existing models can be found in `ql/src/semmle/go/frameworks/`, and are a good source of examples. You should make a new file in that folder, named after the library. - -## Sources - -To mark a source of data that is controlled by an untrusted user, we create a class extending `UntrustedFlowSource::Range`. Inheritance and the characteristic predicate of the class should be used to specify exactly the dataflow node that introduces the data. Here is a short example from `Mux.qll`. - -```ql - class RequestVars extends DataFlow::UntrustedFlowSource::Range, DataFlow::CallNode { - RequestVars() { this.getTarget().hasQualifiedName("github.com/gorilla/mux", "Vars") } - } -``` - -This has the effect that all calls to [the function `Vars` from the package `mux`](http://www.gorillatoolkit.org/pkg/mux#Vars) are treated as sources of untrusted data. - -## Flow propagation - -By default, it is assumed that all functions in libraries do not have any data flow. To indicate that a particular function does, you need to create a class extending `TaintTracking::FunctionModel` (or `DataFlow::FunctionModel` if the untrusted user data is passed on without being modified). - -Inheritance and the characteristic predicate of the class should specify the function and a member predicate is needed with the signature `override predicate hasTaintFlow(FunctionInput inp, FunctionOutput outp)` (or `override predicate hasDataFlow(FunctionInput inp, FunctionOutput outp)` if extending `DataFlow::FunctionModel`). The body should constrain `inp` and `outp`. - -`FunctionInput` is an abstract representation of the inputs to a function. The options are: -* the receiver (`inp.isReceiver()`) -* one of the parameters (`inp.isParameter(i)`) -* one of the results (`inp.isResult(i)`, or `inp.isResult` if there is only one result) - -Note that it may seem strange that the result of a function could be considered as a function input, but it is needed in some cases. 
For instance, the function `bufio.NewWriter` returns a writer `bw` that buffers write operations to an underlying writer `w`. If tainted data is written to `bw`, then it makes sense to propagate that taint back to the underlying writer `w`, which can be modeled by saying that `bufio.NewWriter` propagates taint from its result to its first argument. - -Similarly, `FunctionOutput` is an abstract representation of the inputs to a function. The options are: -* the receiver (`outp.isReceiver()`) -* one of the parameters (`outp.isParameter(i)`) -* one of the results (`outp.isResult(i)`, or `outp.isResult` if there is only one result) - -Here is an example from `Gin.qll`, slightly modified for brevity. - -```ql -private class ParamsGet extends TaintTracking::FunctionModel, Method { - ParamsGet() { this.hasQualifiedName("github.com/gin-gonic/gin", "Params", "Get") } - - override predicate hasTaintFlow(FunctionInput inp, FunctionOutput outp) { - inp.isReceiver() and outp.isResult(0) - } -} -``` - -This has the effect that calls to the `Get` method with receiver type `Params` from the `gin-gonic/gin` package allow taint to flow from the receiver to the first result. In other words, if `p` has type `Params` and taint can flow to it then after the line `x := p.Get("foo")` taint can also flow to `x`. - -## Sanitizers - -It is not necessary to indicate that library functions are sanitizers - because their bodies are not analyzed it is assumed that data does not flow through them. - -## Sinks - -Data-flow sinks are specified by queries rather than by library models. What library models can do is to indicate when functions belong to special categories of function, which queries can use when specifying sinks. Classes representing these special categories are contained in `ql/src/semmle/go/Concepts.qll`, including ones for logger mechanisms, HTTP response writers, HTTP redirects and marshaling and unmarshaling functions. 
- -Here is a short example from `Stdlib.qll`, slightly modified for brevity. - -```ql -private class PrintfCall extends LoggerCall::Range, DataFlow::CallNode { - PrintfCall() { this.getTarget().hasQualifiedName("fmt", ["Print", "Printf", "Println"]) } - - override DataFlow::Node getAMessageComponent() { result = this.getAnArgument() } -} -``` - -This has the effect that any call to `Print`, `Printf`, or `Println` in the package `fmt` is recognised as a logger call, and in any query which uses logger calls as a sink then passing tainted data as an argument to `Print`, `Printf`, or `Println` will create a result for the query. diff --git a/docs/language/learn-ql/go/library-modeling-go.rst b/docs/language/learn-ql/go/library-modeling-go.rst index 44f9c4538b6..26ab5de341c 100644 --- a/docs/language/learn-ql/go/library-modeling-go.rst +++ b/docs/language/learn-ql/go/library-modeling-go.rst @@ -1,11 +1,13 @@ -Modeling Go libraries for CodeQL -================================ +Modeling data flow in Go libraries +================================== When analyzing a Go program, CodeQL does not examine the source code for -external packages. To track the flow of untrusted data through them you -need to create a model of the library. Existing models can be found in -``ql/src/semmle/go/frameworks/``, and are a good source of examples. You -should make a new file in that folder, named after the library. +external packages. To track the flow of untrusted data through a library you +can create a model of the library. + +You can find existing models in the ``ql/src/semmle/go/frameworks/`` folder of the +`CodeQL for Go repository `__. +To add a new model, you should make a new file in that folder, named after the library. Sources ------- @@ -29,14 +31,14 @@ treated as sources of untrusted data. Flow propagation ---------------- -By default, it is assumed that all functions in libraries do not have -any data flow. 
To indicate that a particular function does, you need to +By default, we assume that all functions in libraries do not have +any data flow. To indicate that a particular function does have data flow, create a class extending ``TaintTracking::FunctionModel`` (or ``DataFlow::FunctionModel`` if the untrusted user data is passed on without being modified). Inheritance and the characteristic predicate of the class should specify -the function and a member predicate is needed with the signature +the function. The class should also have a member predicate with the signature ``override predicate hasTaintFlow(FunctionInput inp, FunctionOutput outp)`` (or ``override predicate hasDataFlow(FunctionInput inp, FunctionOutput outp)`` @@ -60,7 +62,7 @@ to the underlying writer ``w``, which can be modeled by saying that argument. Similarly, ``FunctionOutput`` is an abstract representation of the -inputs to a function. The options are: +outputs to a function. The options are: * the receiver (``outp.isReceiver()``) * one of the parameters (``outp.isParameter(i)``) @@ -81,7 +83,7 @@ Here is an example from ``Gin.qll``, slightly modified for brevity. This has the effect that calls to the ``Get`` method with receiver type ``Params`` from the ``gin-gonic/gin`` package allow taint to flow from the receiver to the first result. In other words, if ``p`` has type -``Params`` and taint can flow to it then after the line +``Params`` and taint can flow to it, then after the line ``x := p.Get("foo")`` taint can also flow to ``x``. Sanitizers @@ -95,11 +97,13 @@ Sinks ----- Data-flow sinks are specified by queries rather than by library models. -What library models can do is to indicate when functions belong to -special categories of function, which queries can use when specifying +However, you can use library models to indicate when functions belong to +special categories. Queries can then use these categories when specifying sinks. 
Classes representing these special categories are contained in -``ql/src/semmle/go/Concepts.qll``, including ones for logger mechanisms, -HTTP response writers, HTTP redirects and marshaling and unmarshaling +``ql/src/semmle/go/Concepts.qll`` in the `CodeQL for Go repository +`__, +including classes for logger mechanisms, +HTTP response writers, HTTP redirects, and marshaling and unmarshaling functions. Here is a short example from ``Stdlib.qll``, slightly modified for @@ -114,7 +118,6 @@ brevity. } This has the effect that any call to ``Print``, ``Printf``, or -``Println`` in the package ``fmt`` is recognised as a logger call, and -in any query which uses logger calls as a sink then passing tainted data -as an argument to ``Print``, ``Printf``, or ``Println`` will create a -result for the query. +``Println`` in the package ``fmt`` is recognized as a logger call. +Any query that uses logger calls as a sink will then identify when tainted data +has been passed as an argument to ``Print``, ``Printf``, or ``Println``.
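
The "Flow propagation" section describes the ``bufio.NewWriter`` case in
prose only. As a minimal sketch of how that description might be written
down, assuming the ``TaintTracking::FunctionModel`` API exactly as quoted in
the guide, the model below propagates taint from the function's result back
to its first parameter. The class name ``BufioNewWriterModel`` is
hypothetical and chosen for illustration; it is not taken from the patch,
and the model that ships with the library may be organized differently.

.. code-block:: ql

    import go

    /**
     * Hypothetical model of `bufio.NewWriter`, for illustration only.
     * Assumes the signature `func NewWriter(w io.Writer) *Writer`.
     */
    private class BufioNewWriterModel extends TaintTracking::FunctionModel {
      // The characteristic predicate selects the function being modeled.
      BufioNewWriterModel() { this.hasQualifiedName("bufio", "NewWriter") }

      override predicate hasTaintFlow(FunctionInput inp, FunctionOutput outp) {
        // The result acts as the input here: data written to the returned
        // buffered writer eventually reaches the underlying writer, which is
        // the first (index 0) parameter.
        inp.isResult() and outp.isParameter(0)
      }
    }

With a model like this in place, if tainted data is written to ``bw`` after
``bw := bufio.NewWriter(w)``, taint can also flow to ``w``. This is the same
effect the guide describes for the result-as-input case.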