docs: add several cpp training slides

This commit is contained in:
james
2019-08-05 10:46:57 +01:00
parent b4856e928b
commit 819f308010
19 changed files with 1603 additions and 2 deletions


@@ -0,0 +1,228 @@
Example: Bad overflow guard
===========================
.. container:: semmle-logo
Semmle :sup:`TM`
Getting started and setting up
==============================
To try the examples in this presentation you should download:
- `QL for Eclipse <https://help.semmle.com/ql-for-eclipse/Content/WebHelp/install-plugin-free.html>`__
- Snapshot: `ChakraCore <https://downloads.lgtm.com/snapshots/cpp/microsoft/chakracore/ChakraCore-revision-2017-April-12--18-13-26.zip>`__
More resources:
- To learn more about the main features of QL, try looking at the `QL language handbook <https://help.semmle.com/QL/ql-handbook/>`__.
- For further information about writing queries in QL, see `Writing QL queries <https://help.semmle.com/QL/learn-ql/ql/writing-queries/writing-queries.html>`__.
.. note::
To run the queries featured in this training presentation, we recommend you download the free-to-use `QL for Eclipse plugin <https://help.semmle.com/ql-for-eclipse/Content/WebHelp/getting-started.html>`__.
This plugin allows you to locally access the latest features of QL, including the standard QL libraries and queries. It also provides standard IDE features such as syntax highlighting, jump-to-definition, and tab completion.
A good project to start analyzing is `ChakraCore <https://github.com/microsoft/ChakraCore>`__; a suitable snapshot to query is available by visiting the link on the slide.
Alternatively, you can query any project (including ChakraCore) in the `query console on LGTM.com <https://lgtm.com/query/project:2034240708/lang:cpp/>`__.
Note that results generated in the query console are likely to differ from those generated in the QL plugin, because LGTM.com analyzes the most recent revisions of each project that has been added, whereas the snapshot available to download above is based on a historical version of the code base.
Checking for overflow in C
==========================
- Arithmetic operations overflow if the result is too large for their type.
- Developers sometimes exploit this to write overflow checks:
.. code-block:: cpp

   if (v + b < v) {
       handle_error("overflow");
   } else {
       result = v + b;
   }

Where might this go wrong?
.. note::
- In C/C++ we often need to check whether an operation `overflows <https://en.wikipedia.org/wiki/Integer_overflow>`__.
- An overflow is when an arithmetic operation, such as an addition, results in a number which is too large to be stored in the type.
- When an unsigned operation overflows, the value “wraps” around; signed overflow is undefined behavior in C/C++.
- A typical way to check for overflow of an addition, therefore, is to test whether the result is less than one of the arguments, i.e. whether the result has “wrapped”.
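As a minimal sketch of this idiom, assuming a platform where ``int`` is 32 bits wide (so ``uint32_t`` operands are not promoted to a wider type), the wrapped-result check behaves as intended. The function name ``add_would_wrap`` is made up for this illustration.

```cpp
#include <cstdint>

// Hypothetical helper: reports whether v + b wraps around in 32-bit unsigned
// arithmetic. Assumes int is 32 bits wide, so the operands are not promoted.
bool add_would_wrap(std::uint32_t v, std::uint32_t b) {
    // Unsigned arithmetic is modular, so a wrapped sum is smaller than v.
    return v + b < v;
}
```

Calling ``add_would_wrap(4294967295u, 1u)`` reports a wrap, while small operands such as ``1u`` and ``2u`` do not.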
Integer promotion
=================
From `https://en.cppreference.com/w/c/language/conversion <https://en.cppreference.com/w/c/language/conversion>`__:
*Integer promotion is the implicit conversion of a value of any integer type with rank less or equal to rank of* ``int`` *... to the value of type* ``int`` *or* ``unsigned int``.
The arguments of the following arithmetic operators undergo implicit conversions:
- binary arithmetic (* / % + - )
- relational operators (< > <= >= == !=)
- binary bitwise operators (& ^ \|)
- the conditional operator (?:)
.. note::
- Most of the time integer conversion works fine. However, the rules governing addition in C/C++ are complex, and it is easy to get bitten.
- CPUs generally prefer to perform arithmetic operations on 32-bit or larger integers, as the architectures are optimised to perform those efficiently.
- The compiler therefore performs “integer promotion” for arguments to arithmetic operations that are smaller than ``int``, which is 32 bits wide on most current platforms.
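The promotion can be observed at compile time. The following sketch assumes a platform with a 32-bit ``int``; the constant names are illustrative only. It asks the compiler for the type of a sum of two 16-bit operands and of two 32-bit operands.

```cpp
#include <cstdint>
#include <type_traits>

// True when adding two uint16_t values yields an int, which holds wherever
// int is wider than 16 bits.
constexpr bool sum_of_u16_is_int =
    std::is_same<decltype(std::uint16_t{} + std::uint16_t{}), int>::value;

// True when adding two uint32_t values stays an unsigned 32-bit value
// (assumes a 32-bit int, so no promotion applies).
constexpr bool sum_of_u32_is_u32 =
    std::is_same<decltype(std::uint32_t{} + std::uint32_t{}), std::uint32_t>::value;
```

Under those assumptions both constants are ``true``: the 16-bit sum is computed as a signed ``int``, whereas the 32-bit sum keeps its unsigned type.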
Checking for overflow in C revisited
====================================
Which branch gets executed in this example? What is the value of ``result``?
.. code-block:: cpp

   uint16_t v = 65535;
   uint16_t b = 1;
   uint16_t result;
   if (v + b < v) {
       handle_error("overflow");
   } else {
       result = v + b;
   }

.. note::
In this example the second branch is executed, even though there is a 16-bit overflow, and ``result`` is set to zero.
Checking for overflow in C revisited
====================================
Here is the example again, with the conversions made explicit:
.. code-block:: cpp

   uint16_t v = 65535;
   uint16_t b = 1;
   uint16_t result;
   if ((int)v + (int)b < (int)v) {
       handle_error("overflow");
   } else {
       result = (uint16_t)((int)v + (int)b);
   }

.. note::
In this example the second branch is executed, even though there is a 16-bit overflow, and ``result`` is set to zero.
Explanation:
- The two integer arguments to the addition, ``v`` and ``b``, are promoted to 32-bit integers.
- The comparison (``<``) is also an arithmetic operation, therefore it will also be completed on 32-bit integers.
- This means that ``v + b < v`` will never be true, because ``v`` and ``b`` can each hold at most 2\ :sup:`16` - 1, so their sum always fits in 32 bits.
- Therefore, the second branch is executed, and the result of the addition is stored into the ``result`` variable. Overflow still occurs, because ``result`` is a 16-bit integer.
This happens even though the overflow check passed!
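To check this behaviour without an error handler, the example can be wrapped in a function that records which branch ran. This is only a sketch: ``GuardedSum`` and ``add_with_bad_guard`` are names invented here, and a 32-bit ``int`` is assumed.

```cpp
#include <cstdint>

// Illustrative result type: records the branch taken and the stored value.
struct GuardedSum {
    bool overflow_reported;  // true if the guard took the error branch
    std::uint16_t result;    // value stored when the guard did not fire
};

// Mirrors the slide's example with the same (ineffective) guard.
GuardedSum add_with_bad_guard(std::uint16_t v, std::uint16_t b) {
    GuardedSum out{false, 0};
    if (v + b < v) {  // compared as int, so 65536 < 65535 is false
        out.overflow_reported = true;
    } else {
        // The int sum is truncated back to 16 bits on assignment.
        out.result = static_cast<std::uint16_t>(v + b);
    }
    return out;
}
```

For ``v = 65535`` and ``b = 1`` the guard does not fire and the stored result is zero.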
.. rst-class:: background2
Developing a QL query
=====================
Finding bad overflow guards
QL query: bad overflow guards
=============================
Let's look for overflow guards of the form ``v + b < v``, using the classes
``AddExpr``, ``Variable`` and ``RelationalOperation`` from the ``cpp`` library.
.. rst-class:: build
.. literalinclude:: ../query-examples/cpp/bad-overflow-guard-1.ql
:language: ql
.. note::
- When performing `variant analysis <https://semmle.com/variant-analysis>`__, it is usually helpful to write a simple query that finds the simple syntactic pattern, before trying to go on to describe the cases where it goes wrong.
- In this case, we start by looking for all the *overflow* checks, before trying to refine the query to find all *bad overflow* checks.
- The select clause defines what this query is looking for:
- an ``AddExpr``: the expression that is being checked for overflow.
- a ``RelationalOperation``: the overflow comparison check.
- a ``Variable``: used as an argument to both the addition and comparison.
- The where part of the query ties these three QL variables together using `predicates <https://help.semmle.com/QL/ql-handbook/predicates.html>`__ defined in the `standard QL for C/C++ library <https://help.semmle.com/qldoc/cpp/>`__.
QL query: bad overflow guards
=============================
We want to ensure the operands being added have size less than 4 bytes.
We may want to reuse this logic, so let us create a separate predicate.
Looking at autocomplete suggestions, we see that we can get the type of an expression using the ``getType()`` method.
We can get the size (in bytes) of a type using the ``getSize()`` method.
.. rst-class:: build
.. code-block:: ql

   predicate isSmall(Expr e) {
     e.getType().getSize() < 4
   }

.. note::
- An important part of the query is to determine whether a given expression has a “small” type that is going to trigger integer promotion.
- We therefore write a helper predicate for small expressions.
- This predicate effectively represents the set of all expressions in the database where the size of the type of the expression is less than 4 bytes, i.e. less than 32 bits.
QL query: bad overflow guards
=============================
We can ensure the operands being added have size less than 4 bytes, using our new predicate.
QL has logical quantifiers like ``exists`` and ``forall``, allowing us to declare variables and enforce a certain condition on them.
Now our query becomes:
.. rst-class:: build
.. literalinclude:: ../query-examples/cpp/bad-overflow-guard-2.ql
:language: ql
.. note::
- Recall from earlier that what makes an overflow check a “bad” check is that all the arguments to the addition are integers smaller than 32 bits.
- We could write this by using our helper predicate ``isSmall`` to specify that each individual operand to the addition ``isSmall`` (i.e. under 32 bits):
.. code-block:: ql
isSmall(a.getLeftOperand()) and
isSmall(a.getRightOperand())
- However, this is a little bit repetitive. What we really want to say is that: all the operands of the addition are small. Fortunately, QL provides a ``forall`` formula that we can use in these circumstances.
- A ``forall`` has three parts:
- A declaration part, where we can introduce variables.
- A “range” part, which allows us to restrict those variables.
- A “condition” part. The ``forall`` as a whole holds if the condition holds for each of the values in the range.
- In our case:
- The declaration introduces a variable for Expressions, called ``op``. At this stage, this variable represents all the expressions in the program.
- The “range” part, ``op = a.getAnOperand()``, restricts ``op`` to being one of the two operands to the addition.
- The “condition” part, ``isSmall(op)``, says that the ``forall`` holds only if the condition (that ``op`` is small) holds for everything in the range, i.e. both the arguments to the addition.
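For readers coming from C++, a ``forall`` is loosely comparable to ``std::all_of`` applied to the range. The sketch below is an analogy only, not how QL evaluates the formula; ``Operand``, ``is_small`` and ``all_operands_small`` are hypothetical names.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for an expression: only the byte size of its type.
struct Operand {
    int type_size_bytes;
};

// Counterpart of the isSmall predicate from the slides.
bool is_small(const Operand& op) { return op.type_size_bytes < 4; }

// Rough analogue of: forall(Expr op | op = a.getAnOperand() | isSmall(op))
// The vector plays the role of the range; is_small is the condition.
bool all_operands_small(const std::vector<Operand>& operands) {
    return std::all_of(operands.begin(), operands.end(), is_small);
}
```

As with ``std::all_of`` on an empty sequence, a ``forall`` whose range is empty holds vacuously.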
QL query: bad overflow guards
=============================
In some cases the result of the addition is cast to a small type of size less than 4 bytes, preventing automatic widening. We don't want our query to flag these instances.
We can use predicate ``Expr.getExplicitlyConverted()`` to reason about casts that are applied to an expression, adding this restriction to our query:
.. code-block:: ql
not isSmall(a.getExplicitlyConverted())
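The kind of guard this restriction excludes looks like the following sketch, in which the sum is explicitly cast back to the small type before the comparison. The function name is illustrative, and a 32-bit ``int`` is assumed.

```cpp
#include <cstdint>

// Illustrative guard: the cast truncates the promoted sum back to 16 bits,
// so the comparison sees the wrapped value.
bool add_would_wrap_u16(std::uint16_t v, std::uint16_t b) {
    return static_cast<std::uint16_t>(v + b) < v;
}
```

With the cast in place, ``add_would_wrap_u16(65535, 1)`` reports the wrap, so a guard of this shape is not a bug.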
The final query
===============
.. literalinclude:: ../query-examples/cpp/bad-overflow-guard-3.ql
:language: ql
This query finds a single result, which is `a genuine bug in ChakraCore <https://github.com/Microsoft/ChakraCore/commit/2500e1cdc12cb35af73d5c8c9b85656aba6bab4d>`__.


@@ -0,0 +1,343 @@
Analyzing control flow for C/C++
================================
Getting started and setting up
==============================
To try the examples in this presentation you should download:
- `QL for Eclipse <https://help.semmle.com/ql-for-eclipse/Content/WebHelp/install-plugin-free.html>`__
- Snapshot: `ChakraCore <https://downloads.lgtm.com/snapshots/cpp/microsoft/chakracore/ChakraCore-revision-2017-April-12--18-13-26.zip>`__
More resources:
- To learn more about the main features of QL, try looking at the `QL language handbook <https://help.semmle.com/QL/ql-handbook/>`__.
- For further information about writing queries in QL, see `Writing QL queries <https://help.semmle.com/QL/learn-ql/ql/writing-queries/writing-queries.html>`__.
.. note::
To run the queries featured in this training presentation, we recommend you download the free-to-use `QL for Eclipse plugin <https://help.semmle.com/ql-for-eclipse/Content/WebHelp/getting-started.html>`__.
This plugin allows you to locally access the latest features of QL, including the standard QL libraries and queries. It also provides standard IDE features such as syntax highlighting, jump-to-definition, and tab completion.
A good project to start analyzing is `ChakraCore <https://github.com/microsoft/ChakraCore>`__; a suitable snapshot to query is available by visiting the link on the slide.
Alternatively, you can query any project (including ChakraCore) in the `query console on LGTM.com <https://lgtm.com/query/project:2034240708/lang:cpp/>`__.
Note that results generated in the query console are likely to differ from those generated in the QL plugin, because LGTM.com analyzes the most recent revisions of each project that has been added, whereas the snapshot available to download above is based on a historical version of the code base.
Agenda
======
- Control flow graphs
- Exercise: use after free
- Recursion over the control flow graph
- Basic blocks
- Guard conditions
Control flow graphs
===================
.. container:: column-left
We frequently want to ask questions about the possible *order of execution* for a program.
Example:
.. code-block:: cpp
if (x) {
return 1;
} else {
return 2;
}
.. container:: column-right
Possible execution order is usually represented by a *control flow graph*:
.. graphviz::
digraph {
graph [ dpi = 1000 ]
node [shape=polygon,sides=4,color=blue4,style="filled,rounded",fontname=consolas,fontcolor=white]
a [label=<if<BR /><FONT POINT-SIZE="10">IfStmt</FONT>>]
b [label=<x<BR /><FONT POINT-SIZE="10">VariableAccess</FONT>>]
c [label=<1<BR /><FONT POINT-SIZE="10">Literal</FONT>>]
d [label=<2<BR /><FONT POINT-SIZE="10">Literal</FONT>>]
e [label=<return<BR /><FONT POINT-SIZE="10">ReturnStmt</FONT>>]
f [label=<return<BR /><FONT POINT-SIZE="10">ReturnStmt</FONT>>]
a -> b
b -> {c, d}
c -> e
d -> f
}
.. note::
The control flow graph is a static over-approximation of possible control flow at runtime. Its nodes are program elements such as expressions and statements. If there is an edge from one node to another, then it means that the semantic operation corresponding to the first node may be immediately followed by the operation corresponding to the second node. Some nodes (such as conditions of “if” statements or loop conditions) have more than one successor, representing conditional control flow at runtime.
Modeling control flow
=====================
The control flow is modelled with a QL class, ``ControlFlowNode``. Examples of control flow nodes include statements and expressions.
``ControlFlowNode`` provides an API for traversing the control flow graph:
- ``ControlFlowNode ControlFlowNode.getASuccessor()``
- ``ControlFlowNode ControlFlowNode.getAPredecessor()``
- ``ControlFlowNode ControlFlowNode.getATrueSuccessor()``
- ``ControlFlowNode ControlFlowNode.getAFalseSuccessor()``
The control-flow graph is *intra-procedural*; in other words, it only models paths within a function. To find the associated function, use:
- ``Function ControlFlowNode.getControlFlowScope()``
.. note::
The control flow graph is similar in concept to data flow graphs. In contrast to data flow, however, the AST nodes themselves serve as the control flow graph nodes.
The predecessor/successor predicates are prime examples of member predicates with results that are used in functional syntax, but that are not actually functions, since a control flow node may have any number of predecessors and successors (including zero or more than one).
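To make the “any number of successors” point concrete, here is a toy model of the graph for the ``if``/``return`` example, written in C++ rather than QL. The node labels and function names are illustrative and do not correspond to library identifiers.

```cpp
#include <map>
#include <string>
#include <vector>

// Toy control flow graph for: if (x) { return 1; } else { return 2; }
using Cfg = std::multimap<std::string, std::string>;

Cfg make_example_cfg() {
    return Cfg{
        {"if", "x"},
        {"x", "1"}, {"x", "2"},  // the condition has two successors
        {"1", "return 1"},
        {"2", "return 2"},
    };
}

// Analogue of getASuccessor(): yields every successor, possibly none.
std::vector<std::string> successors(const Cfg& cfg, const std::string& node) {
    std::vector<std::string> out;
    auto range = cfg.equal_range(node);
    for (auto it = range.first; it != range.second; ++it) {
        out.push_back(it->second);
    }
    return out;
}
```

In this model the condition ``x`` has two successors, each ``return`` has none, and every other node has exactly one.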
Example: malloc/free pairs
==========================
Find calls to free that are reachable from an allocation on the same variable:
.. literalinclude:: ../query-examples/cpp/control-flow-cpp-1.ql
:language: ql
.. note::
Predicates ``allocationCall`` and ``freeCall`` are defined in the standard library and model a number of standard alloc/free-like functions.
Exercise: use after free
========================
Based on this query, write a query that finds accesses to the variable that occur after the free.
.. rst-class:: build
- What do you find? What problems occur with this approach to detecting use-after-free vulnerabilities?
.. rst-class:: build
.. literalinclude:: ../query-examples/cpp/control-flow-cpp-2.ql
:language: ql
Utilizing recursion
===================
The main problem we observed in the previous exercise was that the successors relation is unaware of changes to the variable that would invalidate our results.
We can fix this by writing our own successor predicate that stops traversing the CFG if the variable is re-defined.
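A small C++ function shows the kind of false positive in question. It is a hypothetical example, not code from the snapshot: the pointer is freed, re-defined, and only then used.

```cpp
#include <cstdlib>

// Frees a buffer, reassigns the pointer, then uses it. The final access is
// safe, yet it is a control flow successor of the first call to free.
int reuse_after_reassignment() {
    int* p = static_cast<int*>(std::malloc(sizeof(int)));
    if (p == nullptr) return -1;
    *p = 1;
    std::free(p);
    p = static_cast<int*>(std::malloc(sizeof(int)));  // p is re-defined here
    if (p == nullptr) return -1;
    *p = 42;  // reachable from free(p), but not a use after free
    int value = *p;
    std::free(p);
    return value;
}
```

A query based only on successors would report the write ``*p = 42``, because nothing in that relation knows that ``p`` now points to a fresh allocation.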
Utilizing recursion
===================
.. code-block:: ql

   ControlFlowNode reachesWithoutReassignment(FunctionCall free, LocalScopeVariable v)
   {
     freeCall(free, v.getAnAccess()) and
     (
       // base case
       result = free
       or
       // recursive case
       exists(ControlFlowNode mid |
         mid = reachesWithoutReassignment(free, v) and
         result = mid.getASuccessor() and
         // stop tracking when the value may change
         not result = v.getAnAssignedValue() and
         not result.(AddressOfExpr).getOperand() = v.getAnAccess()
       )
     )
   }

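The same recursion can be pictured as a graph traversal that refuses to step onto re-definition nodes. The C++ sketch below uses integer node identifiers and a plain worklist; it is an illustration under those assumptions and not how the QL engine evaluates the predicate.

```cpp
#include <map>
#include <set>
#include <vector>

using Node = int;
using Graph = std::map<Node, std::vector<Node>>;

// Collects the nodes reachable from `start` along successor edges without
// entering any node in `redefinitions` (standing in for assignments to v
// and for expressions that take the address of v).
std::set<Node> reaches_without_reassignment(const Graph& successors, Node start,
                                            const std::set<Node>& redefinitions) {
    std::set<Node> reached{start};  // base case: the free call itself
    std::vector<Node> worklist{start};
    while (!worklist.empty()) {
        Node mid = worklist.back();
        worklist.pop_back();
        auto it = successors.find(mid);
        if (it == successors.end()) continue;
        for (Node next : it->second) {
            if (redefinitions.count(next)) continue;  // stop tracking here
            if (reached.insert(next).second) worklist.push_back(next);
        }
    }
    return reached;
}
```

Nodes that can only be reached through a re-definition never enter the result, which is what removes the false positives seen in the exercise.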
Exercise
========
Find local variables that are written to, and then never accessed again.
**Hint**: Use ``LocalVariable.getAnAssignment()``.
.. rst-class:: build
.. literalinclude:: ../query-examples/cpp/control-flow-cpp-3.ql
:language: ql
.. rst-class:: background2
More control flow
=================
Basic blocks
============
``BasicBlock`` represents basic blocks, that is, straight-line sequences of control flow nodes without branching.
- ``ControlFlowNode BasicBlock.getNode(int)``
- ``BasicBlock BasicBlock.getASuccessor()``
- ``BasicBlock BasicBlock.getAPredecessor()``
- ``BasicBlock BasicBlock.getATrueSuccessor()``
- ``BasicBlock BasicBlock.getAFalseSuccessor()``
Often, queries can be made more efficient by treating basic blocks as a unit instead of reasoning about individual control flow nodes.
Exercise: unreachable blocks
============================
Write a query to find unreachable basic blocks.
**Hint**: First define a recursive predicate to identify reachable blocks. Class ``EntryBasicBlock`` may be useful.
.. rst-class:: build
.. literalinclude:: ../query-examples/cpp/control-flow-cpp-4.ql
:language: ql
.. note::
This query has a good number of false positives on Chakra, many of them to do with templating and macros.
.. rst-class:: end-slide
Extra slides
============
.. rst-class:: background2
Appendix: Library customizations
================================
Call graph customizations
=========================
The default implementation of call target resolution does not handle function pointers, because they are difficult to deal with in general.
We can, however, add support for particular patterns of use by contributing a new override of ``Call.getTarget``.
Exercise: unresolvable calls
============================
Write a query that finds all calls for which no call target can be determined, and run it on libjpeg-turbo.
Examine the results. What do you notice?
.. rst-class:: build
.. code-block:: ql

   import cpp

   from Call c
   where not exists(c.getTarget())
   select c

.. rst-class:: build
- Many results are calls through struct fields emulating virtual dispatch.
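The pattern looks roughly like the following sketch. The struct, field and function names are hypothetical and only imitate the style of C code that stores handlers in struct fields.

```cpp
// Hypothetical handler table in the style of a C "vtable": the handler is a
// function pointer stored in a struct field.
struct ErrorHandlerTable {
    int (*report)(int code);
};

int report_quietly(int code) { return -code; }

int fail(const ErrorHandlerTable* handlers, int code) {
    // The callee is whatever was stored in the field at run time, so the
    // call expression itself names no function.
    return (*handlers->report)(code);
}
```

The call in ``fail`` reaches ``report_quietly`` only because a caller stored that function in the field beforehand.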
Exercise: resolving calls through variables
===========================================
Write a query that resolves the call at `cjpeg.c:640 <https://github.com/libjpeg-turbo/libjpeg-turbo/blob/9bc8eb6449a32f452ab3fc9f94af672a0af13f81/cjpeg.c#L640>`__.
**Hint**: Use classes ``ExprCall``, ``PointerDereferenceExpr``, and ``Access``.
.. rst-class:: build
.. literalinclude:: ../query-examples/cpp/control-flow-cpp-5.ql
:language: ql
Exercise: customizing the call graph
====================================
Create a subclass of ``ExprCall`` that uses your query to implement ``getTarget``.
.. rst-class:: build
.. code-block:: ql

   class CallThroughVariable extends ExprCall {
     Variable v;

     CallThroughVariable() {
       exists(PointerDereferenceExpr callee | callee = getExpr() |
         callee.getOperand() = v.getAnAccess()
       )
     }

     override Function getTarget() {
       result = super.getTarget() or
       exists(Access init | init = v.getAnAssignedValue() |
         result = init.getTarget()
       )
     }
   }

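For orientation, this is the shape of C/C++ code the class is meant to match, shown as a hypothetical example. The mapping to QL classes in the comments is our reading of the class above, not output from the analysis.

```cpp
int twice(int x) { return 2 * x; }

int call_through_variable(int x) {
    // The value assigned to `handler` is an access to the function `twice`.
    int (*handler)(int) = twice;
    // The callee expression dereferences an access to `handler`.
    return (*handler)(x);
}
```

Here the variable ``handler`` plays the role of ``v``, and the overridden ``getTarget`` would be expected to yield ``twice``.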
Control flow graph customizations
=================================
The default control-flow graph implementation recognizes a few common patterns for non-returning functions, but sometimes it fails to spot them, which can cause imprecision.
We can add support for new non-returning functions by overriding ``ControlFlowNode.getASuccessor()``.
Exercise: calls to ``error_exit``
=================================
Write a query that finds all calls to a field called ``error_exit``.
**Hint**: Reuse (parts of) the ``CallThroughVariable`` class from before.
.. rst-class:: build
.. code-block:: ql

   class CallThroughVariable extends ExprCall { … }

   class ErrorExitCall extends CallThroughVariable {
     override Field v;

     ErrorExitCall() { v.getName() = "error_exit" }
   }

   from ErrorExitCall eec
   select eec

Exercise: customizing the control-flow graph
============================================
Override ``ControlFlowNode`` to mark calls to ``error_exit`` as non-returning.
**Hint**: ``ExprCall`` is an indirect subclass of ``ControlFlowNode``.
.. rst-class:: build
.. code-block:: ql

   class CallThroughVariable extends ExprCall { … }

   class ErrorExitCall extends CallThroughVariable {
     override Field v;

     ErrorExitCall() { v.getName() = "error_exit" }

     override ControlFlowNode getASuccessor() { none() }
   }

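The effect of a non-returning handler can be reproduced in a few lines of C++. In this sketch a throwing function stands in for whatever non-returning mechanism a real handler uses; all names other than ``error_exit`` are hypothetical.

```cpp
#include <stdexcept>

// Hypothetical error manager; error_exit is expected never to return normally.
struct ErrorManager {
    void (*error_exit)(const char* message);
};

void throwing_exit(const char* message) { throw std::runtime_error(message); }

int checked_read(const ErrorManager& err, const int* value) {
    if (value == nullptr) {
        (*err.error_exit)("null input");
        // An analysis that assumes the call returns treats the dereference
        // below as reachable from here.
    }
    return *value;
}

// Returns true if checked_read left via the handler rather than returning.
bool exits_on_null(const ErrorManager& err) {
    try {
        checked_read(err, nullptr);
    } catch (const std::runtime_error&) {
        return true;
    }
    return false;
}
```

Marking the call as having no successors tells the analysis what the programmer already relies on: the dereference is never reached with a null pointer.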
``CustomOptions`` class
=======================
The Options library defines a ``CustomOptions`` class with various member predicates that can be overridden to customize aspects of the analysis.
In particular, it has an ``exprExits`` predicate that can be overridden to more easily perform the customization on the previous slide:
.. code-block:: ql

   import Options

   class MyOptions extends CustomOptions {
     override predicate exprExits(Expr e) {
       super.exprExits(e) or ...
     }
   }



@@ -0,0 +1,329 @@
Introduction to data flow
=========================
Finding string formatting vulnerabilities in C/C++
Getting started and setting up
==============================
To try the examples in this presentation you should download:
- `QL for Eclipse <https://help.semmle.com/ql-for-eclipse/Content/WebHelp/install-plugin-free.html>`__
- Snapshot: `dotnet/coreclr <http://downloads.lgtm.com/snapshots/cpp/dotnet/coreclr/dotnet_coreclr_fbe0c77.zip>`__
More resources:
- To learn more about the main features of QL, try looking at the `QL language handbook <https://help.semmle.com/QL/ql-handbook/>`__.
- For further information about writing queries in QL, see `Writing QL queries <https://help.semmle.com/QL/learn-ql/ql/writing-queries/writing-queries.html>`__.
.. note::
To run the queries featured in this training presentation, we recommend you download the free-to-use `QL for Eclipse plugin <https://help.semmle.com/ql-for-eclipse/Content/WebHelp/getting-started.html>`__.
This plugin allows you to locally access the latest features of QL, including the standard QL libraries and queries. It also provides standard IDE features such as syntax highlighting, jump-to-definition, and tab completion.
A good project to start analyzing is `dotnet/coreclr <https://github.com/dotnet/coreclr>`__; a suitable snapshot to query is available by visiting the link on the slide.
Alternatively, you can query any project (including dotnet/coreclr) in the `query console on LGTM.com <https://lgtm.com/query/projects:1505958977333/lang:cpp/>`__.
Note that results generated in the query console are likely to differ from those generated in the QL plugin, because LGTM.com analyzes the most recent revisions of each project that has been added, whereas the snapshot available to download above is based on a historical version of the code base.
Agenda
======
- Non-constant format string
- Data flow
- Modules and libraries
- Local data flow
- Local taint tracking
Motivation
==========
Let's write a query to identify instances of `CWE-134 <https://cwe.mitre.org/data/definitions/134.html>`__ “Use of externally controlled format string”.
.. code-block:: cpp
printf(userControlledString, arg1);
**Goal**: Find uses of ``printf`` (or similar) where the format string can be controlled by an attacker.
.. note::
Formatting functions allow the programmer to construct a string output using a *format string* and an optional set of arguments. The *format string* is specified using a simple template language, where the output string is constructed by processing the format string to find *format specifiers*, and inserting values provided as arguments. For example:
.. code-block:: cpp
printf("Name: %s, Age: %d", "Freddie", 2);
would produce the output “Name: Freddie, Age: 2”. So far, so good. However, problems arise if there is a mismatch between the number of formatting specifiers, and the number of arguments. For example:
.. code-block:: cpp
printf("Name: %s, Age: %d", "Freddie");
In this case, we have one more format specifier than we have arguments. In a managed language such as Java or C#, this simply leads to a runtime exception. However, in C/C++, the formatting functions are typically implemented by reading values from the stack without any validation of the number of arguments. This means a mismatch in the number of format specifiers and format arguments can lead to information disclosure.
Of course, in practice this happens rarely with *constant* formatting strings. Instead, it's most problematic when the formatting string can be specified by the user, allowing an attacker to provide a formatting string with the wrong number of format specifiers. Furthermore, if an attacker can control the format string, they may be able to provide the ``%n`` format specifier, which causes ``printf`` to write the number of characters in the generated output string to a specified location.
See https://en.wikipedia.org/wiki/Uncontrolled_format_string for more background.
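The usual remedy is to pass untrusted text as an argument to a constant format string. The sketch below illustrates this with ``snprintf``; the function name and the 64-byte buffer are arbitrary choices for the example.

```cpp
#include <cstdio>
#include <string>

// Formats untrusted text through a constant "%s" format, so specifiers
// inside `untrusted` are copied verbatim instead of being interpreted.
std::string format_safely(const std::string& untrusted) {
    char buffer[64];
    std::snprintf(buffer, sizeof buffer, "%s", untrusted.c_str());
    return std::string(buffer);
}

// The vulnerable shape, shown only as a comment because running it with
// attacker-controlled input has undefined behavior:
//     std::printf(untrusted.c_str());
```

With the constant format, an input such as ``"%d %n %s"`` comes out unchanged, because it is treated as data and not as a template.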
Exercise: Non-constant format string
====================================
Write a query that flags ``printf`` calls where the format argument is not a ``StringLiteral``.
**Hint**: Import ``semmle.code.cpp.commons.Printf`` and use class ``FormattingFunction`` and ``getFormatParameterIndex()``.
.. rst-class:: build
.. literalinclude:: ../query-examples/cpp/data-flow-cpp-1.ql
:language: ql
.. note::
This first query is about finding places where the format specifier is not a constant string. In QL for C/C++, constant strings are modeled as ``StringLiteral`` nodes, so we are looking for calls to format functions where the format specifier argument is not a string literal.
The `C/C++ standard libraries <https://help.semmle.com/qldoc/cpp/>`__ include many different formatting functions that may be vulnerable to this particular attack, including ``printf``, ``snprintf``, and others. Furthermore, each of these different formatting functions may include the format string in a different position in the argument list. Instead of laboriously listing all these different variants, we can make use of the QL for C/C++ standard library class ``FormattingFunction``, which provides an interface that models common formatting functions in C/C++.
Meh...
======
Results are unsatisfactory:
- Query flags cases where the format string is a symbolic constant.
- Query flags cases where the format string is itself a format argument.
- Query doesn't recognize wrapper functions around ``printf``-like functions.
We need something better.
.. note::
For example, consider the results which appear in ``/src/ToolBox/SOS/Strike/util.h``, between lines 965 and 970:
.. code-block:: cpp

   const char *format = align == AlignLeft ? "%-*.*s" : "%*.*s";
   if (IsDMLEnabled())
       DMLOut(format, width, precision, mValue);
   else
       ExtOut(format, width, precision, mValue);

Here, ``DMLOut`` and ``ExtOut`` are macros that expand to formatting calls. The format specifier is not constant, in the sense that the format argument is not a string literal. However, it is clearly one of two possible constants, both with the same number of format specifiers.
What we need is a way to determine whether the format argument is ever set to something that is not constant.
Data flow analysis
==================
- Models flow of data through the program.
- Implemented in the module ``semmle.code.cpp.dataflow.DataFlow``.
- Class ``DataFlow::Node`` represents program elements that have a value, such as expressions and function parameters.

  - Nodes of the data flow graph.

- Various predicates represent flow between these nodes.

  - Edges of the data flow graph.

.. note::
The solution here is to use *data flow*. Data flow is, as the name suggests, about tracking the flow of data through the program. It helps answer questions like “does this expression ever hold a value that originates from a particular other place in the program”.
We can visualize the data flow problem as one of finding paths through a directed graph, where the nodes of the graph are elements in the program, and the edges represent the flow of data between those elements. If a path exists, then the data flows between those two nodes.
Data flow graphs
================
.. container:: column-left
Example:
.. code-block:: cpp

   int func(int tainted) {
     int x = tainted;
     if (someCondition) {
       int y = x;
       callFoo(y);
     } else {
       return x;
     }
     return -1;
   }

.. container:: column-right
Data flow graph:
.. container:: image-box
.. graphviz::
digraph {
graph [ dpi = 1000 ]
node [shape=polygon,sides=4,color=blue4,style="filled,rounded", fontname=consolas,fontcolor=white]
a [label=<tainted<BR /><FONT POINT-SIZE="10">ParameterNode</FONT>>]
b [label=<tainted<BR /><FONT POINT-SIZE="10">ExprNode</FONT>>]
c [label=<x<BR /><FONT POINT-SIZE="10">ExprNode</FONT>>]
d [label=<x<BR /><FONT POINT-SIZE="10">ExprNode</FONT>>]
e [label=<y<BR /><FONT POINT-SIZE="10">ExprNode</FONT>>]
a -> b
b -> {c, d}
c -> e
}
Local vs global data flow
=========================
- Local (“intra-procedural”) data flow models flow within one function; feasible to compute for all functions in a snapshot
- Global (“inter-procedural”) data flow models flow across function calls; not feasible to compute for all functions in a snapshot
- Different APIs, so discussed separately
This slide deck focuses on the former.
.. note::
For further information, see:
- `Introduction to data flow analysis in QL <https://help.semmle.com/QL/learn-ql/ql/intro-to-data-flow.html>`__
- `Analyzing data flow in C/C++ <https://help.semmle.com/QL/learn-ql/ql/cpp/dataflow.html>`__
.. rst-class:: background2
Local data flow
===============
Importing data flow
===================
To use the data flow library, add the following import:
.. code-block:: ql
import semmle.code.cpp.dataflow.DataFlow
**Note**: this library contains an explicit “module” declaration:
.. code-block:: ql

   module DataFlow {
     class Node extends … { … }
     predicate localFlow(Node source, Node sink) {
       localFlowStep*(source, sink)
     }
   }

So all references will need to be qualified (for example, ``DataFlow::Node``).
.. note::
A **query library** is a file with the extension ``.qll``. Query libraries do not contain a query clause, but may contain modules, classes, and predicates. For example, the `C/C++ data flow library <https://help.semmle.com/qldoc/cpp/semmle/code/cpp/dataflow/DataFlow.qll/module.DataFlow.html>`__ is contained in the ``semmle/code/cpp/dataflow/DataFlow.qll`` QLL file, and can be imported as shown above.
A **module** is a way of organizing QL code by grouping together related predicates, classes and (sub-)modules; either explicitly declared or implicit. A query library implicitly declares a module with the same name as the QLL file.
For further information on libraries and modules in QL, see the chapter on `Modules <https://help.semmle.com/QL/ql-handbook/modules.html>`__ in the QL language handbook.
For further information on importing QL libraries and modules, see the chapter on `Name resolution <https://help.semmle.com/QL/ql-handbook/name-resolution.html>`__ in the QL language handbook.
Data flow graph
===============
- Class ``DataFlow::Node`` represents data flow graph nodes
- Predicate ``DataFlow::localFlowStep`` represents local data flow graph edges, ``DataFlow::localFlow`` is its transitive closure
- Data flow graph nodes are *not* AST nodes, but they correspond to AST nodes, and there are predicates for mapping between them:
- ``Expr Node.asExpr()``
- ``Parameter Node.asParameter()``
- ``DataFlow::Node DataFlow::exprNode(Expr e)``
- ``DataFlow::Node DataFlow::parameterNode(Parameter p)``
- ``etc.``
.. note::
The ``DataFlow::Node`` class is shared between both the local and global data flow graphs; the primary difference is the edges, which in the “global” case can link different functions.
``localFlowStep`` is the “single step” flow relation; that is, it describes single edges in the local data flow graph. ``localFlow`` represents the `transitive <https://help.semmle.com/QL/ql-handbook/recursion.html#transitive-closures>`__ closure of this relation; in other words, it contains every pair of nodes where the second node is reachable from the first in the data flow graph.
The data flow graph is completely separate from the `AST <https://en.wikipedia.org/wiki/Abstract_syntax_tree>`__, to allow for flexibility in how data flow is modeled. There are a small number of data flow node types: expression nodes, parameter nodes, uninitialized variable nodes, and definition by reference nodes. Each node provides mapping functions to and from the relevant AST (for example ``Expr``, ``Parameter`` etc.) or symbol table (e.g. ``Variable``) classes.
Taint-tracking
==============
- Usually, we want to generalise slightly by not only considering plain data flow, but also “taint” propagation, that is, whether a value is influenced by or derived from another.
- Examples:
.. code-block:: cpp
sink = source; // source -> sink: data and taint
strcat(sink, source); // source -> sink: taint, not data
- Library ``semmle.code.cpp.dataflow.TaintTracking`` provides predicates for tracking taint; ``TaintTracking::localTaintStep`` represents one (local) taint step, ``TaintTracking::localTaint`` is its transitive closure.
.. note::
Taint tracking can be thought of as another type of data flow graph. It usually extends the standard data flow graph for a problem by adding edges between nodes where one node influences or *taints* another.
The `API <https://help.semmle.com/qldoc/cpp/semmle/code/cpp/dataflow/TaintTracking.qll/module.TaintTracking.html>`__ is almost identical to that of the local data flow; all we need to do to switch to taint tracking is ``import semmle.code.cpp.dataflow.TaintTracking`` instead of ``semmle.code.cpp.dataflow.DataFlow``, and instead of using ``localFlow``, we use ``localTaint``.
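To make the distinction between data and taint concrete, here is a small C++ sketch. The function names ``copyValue`` and ``appendValue`` are hypothetical; they mirror the assignment and ``strcat`` examples on the slide using ``std::string``.

```cpp
#include <string>

// Plain data flow: the sink receives exactly the value held by the source.
std::string copyValue(const std::string& source) {
    std::string sink = source;
    return sink;
}

// Taint only: the sink's value is derived from the source but is not the
// same value, mirroring strcat(sink, source) on the slide.
std::string appendValue(std::string sink, const std::string& source) {
    sink += source;
    return sink;
}
```

A data flow analysis would connect ``source`` to the result of ``copyValue`` only, whereas a taint analysis would also be expected to connect it to the result of ``appendValue``.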
Exercise: Source Nodes
======================
Define a subclass of ``DataFlow::Node`` representing “source” nodes, that is, nodes without a (local) data flow predecessor.
**Hint**: use ``not exists()``.
.. rst-class:: build
.. code-block:: ql
class SourceNode extends DataFlow::Node {
SourceNode() {
not DataFlow::localFlowStep(_, this)
}
}
.. note::
Note the scoping of the `don't-care variable <https://help.semmle.com/QL/ql-handbook/expressions.html#don-t-care-expressions>`__ “_” in this example: the body of the characteristic predicate is equivalent to:
.. code-block:: ql
not exists(DataFlow::Node pred | DataFlow::localFlowStep(pred, this))
which is not the same as:
.. code-block:: ql
exists(DataFlow::Node pred | not DataFlow::localFlowStep(pred, this))
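The difference between the two formulas can also be seen in ordinary code. The following C++ sketch is an analogy only: ``Edge``, ``hasNoPredecessor`` and ``someNodeIsNotPredecessor`` are hypothetical names, and the step relation is modelled as a plain list of pairs.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

using Edge = std::pair<int, int>;  // (pred, succ) in a hypothetical step relation

// Analogue of: not exists(pred | step(pred, node))
// True only if NO node at all is a predecessor of `node`.
bool hasNoPredecessor(const std::vector<int>& nodes,
                      const std::vector<Edge>& steps, int node) {
    return std::none_of(nodes.begin(), nodes.end(), [&](int pred) {
        return std::find(steps.begin(), steps.end(), Edge{pred, node}) != steps.end();
    });
}

// Analogue of: exists(pred | not step(pred, node))
// True as soon as SOME node is not a predecessor of `node`.
bool someNodeIsNotPredecessor(const std::vector<int>& nodes,
                              const std::vector<Edge>& steps, int node) {
    return std::any_of(nodes.begin(), nodes.end(), [&](int pred) {
        return std::find(steps.begin(), steps.end(), Edge{pred, node}) == steps.end();
    });
}
```

For nodes 1, 2, 3 with steps 1 → 2 and 2 → 3, node 3 has a predecessor, so the first function returns false for it. The second returns true for node 3, because node 1 is not one of its predecessors, which is why the second formula holds for almost every node in a realistic graph.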
Revisiting non-constant format strings
======================================
Refine the query to find calls to ``printf``-like functions where the format argument derives from a local source that is not a constant string.
.. rst-class:: build
.. literalinclude:: ../query-examples/cpp/data-flow-cpp-2.ql
:language: ql
Refinements (take home exercise)
================================
Audit the results and apply any refinements you deem necessary.
Suggestions:
- Replace ``DataFlow::localFlowStep`` with a custom predicate that includes steps through global variable definitions.
**Hint**: Use class ``GlobalVariable`` and its member predicates ``getAnAssignedValue()`` and ``getAnAccess()``.
- Exclude calls in wrapper functions that just forward their format argument to another ``printf``-like function; instead, flag calls to those functions.
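For the second suggestion, the kind of wrapper function in question might look like the following C++ sketch. The names ``formatMessage`` and ``greet`` are hypothetical and only illustrate the pattern.

```cpp
#include <cstdarg>
#include <cstdio>
#include <string>

// Hypothetical wrapper: it only forwards `fmt` to a printf-like function.
// The format argument is necessarily non-constant inside the wrapper, so
// reporting the vsnprintf call below would not be a useful result.
std::string formatMessage(const char* fmt, ...) {
    char buf[128];
    va_list args;
    va_start(args, fmt);
    std::vsnprintf(buf, sizeof buf, fmt, args);  // fmt is just a parameter here
    va_end(args);
    return buf;
}

// The interesting place to check is the caller of the wrapper, where a
// constant (or non-constant) format string is actually chosen.
std::string greet(const char* name) {
    return formatMessage("Hello %s!", name);  // constant format string: fine
}
```

A refined query would treat ``formatMessage`` itself as a ``printf``-like function and inspect the format argument at each of its call sites.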
Beyond local data flow
======================
- Results are still underwhelming.
- Dealing with parameter passing becomes cumbersome.
- Instead, let's turn the problem around and find user-controlled data that flows into a ``printf`` format argument, potentially through calls.
- This needs global data flow.

@@ -0,0 +1,328 @@
Introduction to global data flow
================================
Getting started and setting up
==============================
To try the examples in this presentation you should download:
- `QL for Eclipse <https://help.semmle.com/ql-for-eclipse/Content/WebHelp/install-plugin-free.html>`__
- Snapshot: `dotnet/coreclr <http://downloads.lgtm.com/snapshots/cpp/dotnet/coreclr/dotnet_coreclr_fbe0c77.zip>`__
More resources:
- To learn more about the main features of QL, try looking at the `QL language handbook <https://help.semmle.com/QL/ql-handbook/>`__.
- For further information about writing queries in QL, see `Writing QL queries <https://help.semmle.com/QL/learn-ql/ql/writing-queries/writing-queries.html>`__.
.. note::
To run the queries featured in this training presentation, we recommend you download the free-to-use `QL for Eclipse plugin <https://help.semmle.com/ql-for-eclipse/Content/WebHelp/getting-started.html>`__.
This plugin allows you to locally access the latest features of QL, including the standard QL libraries and queries. It also provides standard IDE features such as syntax highlighting, jump-to-definition, and tab completion.
A good project to start analyzing is `dotnet/coreclr <https://github.com/dotnet/coreclr>`__; a suitable snapshot to query is available by visiting the link on the slide.
Alternatively, you can query any project (including coreclr) in the `query console on LGTM.com <https://lgtm.com/query/projects:1505958977333/lang:cpp/>`__.
Note that results generated in the query console are likely to differ from those generated in the QL plugin, because LGTM.com analyzes the most recent revisions of each project that has been added, whereas the snapshot available to download above is based on an historical version of the code base.
Agenda
======
- Global taint tracking
- Sanitizers
- Path queries
- Data flow models
Information flow
================
- Many security problems can be phrased as an information flow problem:
“Given a (problem-specific) set of sources and sinks, is there a path in the data flow graph from some source to some sink?”
- Some examples:
- SQL injection: sources are user-input, sinks are SQL queries
- Reflected XSS: sources are HTTP requests, sinks are HTTP responses
- We can solve such problems using the data flow and taint tracking libraries.
Global data flow and taint tracking
===================================
- Recap:
- Local (“intra-procedural”) data flow models flow within one function; feasible to compute for all functions in a snapshot
- Global (“inter-procedural”) data flow models flow across function calls; not feasible to compute for all functions in a snapshot
- For global data flow (and taint tracking), we must therefore provide restrictions to ensure the problem is tractable.
- Typically, this involves specifying the “source” and “sink”.
.. note::
As we mentioned in the previous slide deck, while local dataflow is feasible to compute for all functions in a snapshot, global dataflow is not. This is because the number of paths becomes exponentially larger for global dataflow.
The global data flow (and taint tracking) library avoids this problem by requiring that the query author specifies which ``sources`` and ``sinks`` are applicable. This allows the implementation to compute paths between the restricted set of nodes, rather than the full graph.
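The effect of restricting the problem to given sources and sinks can be sketched in C++. This is an analogy under simplifying assumptions, not how the QL library is implemented: ``Graph`` and ``findFlows`` are hypothetical names, and nodes are plain integers.

```cpp
#include <map>
#include <set>
#include <utility>
#include <vector>

using Graph = std::map<int, std::vector<int>>;  // hypothetical flow edges

// Explore forward from the given sources only, and report each
// (source, sink) pair that is connected. Parts of the graph that no
// source can reach are never visited.
std::set<std::pair<int, int>> findFlows(const Graph& step,
                                        const std::set<int>& sources,
                                        const std::set<int>& sinks) {
    std::set<std::pair<int, int>> result;
    for (int src : sources) {
        std::set<int> seen{src};
        std::vector<int> work{src};
        while (!work.empty()) {
            int n = work.back();
            work.pop_back();
            if (sinks.count(n)) result.insert({src, n});
            auto it = step.find(n);
            if (it == step.end()) continue;
            for (int succ : it->second)
                if (seen.insert(succ).second) work.push_back(succ);
        }
    }
    return result;
}
```

With edges 1 → 2, 2 → 3 and 5 → 6, source 1 and sinks 3 and 6, only the pair (1, 3) is reported, and nodes 5 and 6 are never explored.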
Global taint tracking library
=============================
The ``semmle.code.cpp.dataflow.TaintTracking`` library provides a framework for implementing solvers for global taint tracking problems:
#. Subclass ``TaintTracking::Configuration`` following this template:
.. code-block:: ql
class Config extends TaintTracking::Configuration {
Config() { this = "<some unique identifier>" }
override predicate isSource(DataFlow::Node nd) { … }
override predicate isSink(DataFlow::Node nd) { … }
}
#. Use ``Config.hasFlow(source, sink)`` to find inter-procedural paths.
.. note::
In addition to the taint tracking configuration described here, there is also an equivalent *data flow* configuration in ``semmle.code.cpp.dataflow.DataFlow``, ``DataFlow::Configuration``. Data flow configurations are used to track whether the exact value produced by a source is used by a sink, whereas taint tracking configurations are used to determine whether the source may influence the value used at the sink. Whether you use taint tracking or data flow depends on the analysis problem you are trying to solve.
Finding tainted format strings (outline)
========================================
.. literalinclude:: ../query-examples/cpp/global-data-flow-cpp-1.ql
:language: ql
.. note::
Here's the outline for an inter-procedural (i.e. “global”) version of the tainted formatting strings query we saw in the previous slide deck. The same template will be applicable for most taint tracking problems.
Defining sources
================
The library class ``SecurityOptions`` provides a (configurable) model of what counts as user-controlled data:
.. code-block:: ql
import semmle.code.cpp.security.Security
class TaintedFormatConfig extends TaintTracking::Configuration {
override predicate isSource(DataFlow::Node source) {
exists (SecurityOptions opts |
opts.isUserInput(source.asExpr(), _)
)
}
}
.. note::
We first define what it means to be a ``source`` of tainted data for this particular problem. In this case, what we care about is whether the format string can be provided by an external user to our application or service. As there are many such ways external data could be introduced into the system, the standard QL libraries for C/C++ include an extensible API for modelling user input. In this case, we will simply use the pre-defined set of “user inputs”, which includes arguments provided to command line applications.
Defining sinks (exercise)
=========================
Use the ``FormattingFunction`` class to fill in the definition of ``isSink``:
.. code-block:: ql
import semmle.code.cpp.security.Security
class TaintedFormatConfig extends TaintTracking::Configuration {
override predicate isSink(DataFlow::Node sink) {
/* Fill me in */
}
}
.. note::
The second part is to define what it means to be a sink for this particular problem. The queries from the previous slide deck will be useful for this exercise.
Defining sinks (answer)
=======================
Use the ``FormattingFunction`` class to fill in the definition of ``isSink``:
.. code-block:: ql
import semmle.code.cpp.security.Security
class TaintedFormatConfig extends TaintTracking::Configuration {
override predicate isSink(DataFlow::Node sink) {
exists (FormattingFunction ff, Call c |
c.getTarget() = ff and
c.getArgument(ff.getFormatParameterIndex()) = sink.asExpr()
)
}
}
.. note::
When we run this query, we should find a single result. However, it is tricky to determine whether this result is a true positive - a “real” result - because our query only reports the source and the sink, and not the path through the graph between the two.
Path queries
============
Provide information about the identified paths from sources to sinks; can be examined in Path Explorer view.
Use this template:
.. code-block:: ql
/**
* …
* @kind path-problem
*/
import semmle.code.cpp.dataflow.TaintTracking
import DataFlow::PathGraph
from Configuration cfg, DataFlow::PathNode source, DataFlow::PathNode sink
where cfg.hasFlowPath(source, sink)
select sink, source, sink, "<message>"
.. note::
In order to see the paths between the source and the sinks, we can convert the query to a path problem query. There are a few minor changes that need to be made for this to work - we need an additional import, to specify ``PathNode`` rather than ``Node``, and to add the source/sink to the query output (so that we can automatically determine the paths).
Defining additional taint steps
===============================
Add an additional taint step that (heuristically) taints a local variable if it is a pointer, and it is passed to a function in a parameter position that taints it.
.. code-block:: ql
class TaintedFormatConfig extends TaintTracking::Configuration {
override predicate isAdditionalTaintStep(DataFlow::Node pred,
DataFlow::Node succ) {
exists (Call c, Expr arg, LocalVariable lv |
arg = c.getAnArgument() and
arg = pred.asExpr() and
arg.getFullyConverted().getUnderlyingType() instanceof PointerType and
arg = lv.getAnAccess() and
succ.asUninitialized() = lv
)
}
}
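The kind of C/C++ code this heuristic is aimed at might look like the following sketch. The functions ``readRequest`` and ``handle`` are hypothetical; the point is that the data arrives through a pointer argument rather than a return value.

```cpp
#include <cstring>
#include <string>

// Hypothetical input routine: it writes externally supplied data through
// its pointer parameter instead of returning it.
void readRequest(char* out, std::size_t size, const char* wire) {
    std::strncpy(out, wire, size - 1);
    out[size - 1] = '\0';
}

std::string handle(const char* wire) {
    char fmt[32];                        // local variable, not yet initialized
    readRequest(fmt, sizeof fmt, wire);  // pointer argument: the call fills fmt
    return fmt;                          // later uses of fmt see the external data
}
```

Without an additional step, there is no assignment to ``fmt`` for the analysis to follow, so flow from the argument of ``readRequest`` to later uses of ``fmt`` could be missed.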
Defining sanitizers
===================
Add a sanitizer, stopping propagation at parameters of formatting functions, to avoid double-reporting:
.. code-block:: ql
class TaintedFormatConfig extends TaintTracking::Configuration {
override predicate isSanitizer(DataFlow::Node nd) {
exists (FormattingFunction ff, int idx |
idx = ff.getFormatParameterIndex() and
nd = DataFlow::parameterNode(ff.getParameter(idx))
)
}
}
Data flow models
================
- To provide models of data/taint flow through library functions, you can implement subclasses of ``DataFlowFunction`` (from ``semmle.code.cpp.models.interfaces.DataFlow``) and ``TaintFunction`` (from ``semmle.code.cpp.models.interfaces.Taint``), respectively
- Example: model of taint flow from third to first parameter of ``memcpy``
.. code-block:: ql
class MemcpyFunction extends TaintFunction {
MemcpyFunction() { this.hasName("memcpy") }
override predicate hasTaintFlow(FunctionInput i, FunctionOutput o) {
i.isInParameter(2) and o.isOutParameterPointer(0)
}
}
.. rst-class:: end-slide
Extra slides
============
Exercise: How not to do global data flow
========================================
Implement a ``flowStep`` predicate extending ``localFlowStep`` with steps through function calls and returns. Why might we not want to use this?
.. code-block:: ql
predicate stepIn(Call c, DataFlow::Node arg, DataFlow::ParameterNode parm) {
exists(int i | arg.asExpr() = c.getArgument(i) |
parm.asParameter() = c.getTarget().getParameter(i))
}
predicate stepOut(Call c, DataFlow::Node ret, DataFlow::Node res) {
exists(ReturnStmt retStmt | retStmt.getEnclosingFunction() = c.getTarget() |
ret.asExpr() = retStmt.getExpr() and res.asExpr() = c)
}
predicate flowStep(DataFlow::Node pred, DataFlow::Node succ) {
DataFlow::localFlowStep(pred, succ) or
stepIn(_, pred, succ) or
stepOut(_, pred, succ)
}
Mismatched calls and returns
============================
.. container:: column-left
.. code-block:: cpp
char *logFormat(char *fmt) {
log("Observed format string %s.", fmt);
return fmt;
}
...
char *dangerousFmt = unvalidatedUserData();
printf(logFormat(dangerousFmt), args);
...
char *safeFmt = "Hello %s!";
printf(logFormat(safeFmt), name);
.. container:: column-right
Infeasible path due to mismatched call/return pair!
Balancing calls and returns
===========================
- If we simply take ``flowStep*``, we might mismatch calls and returns, causing imprecision, which in turn may cause false positives.
- Instead, make sure that matching ``stepIn``/``stepOut`` pairs talk about the same call site:
.. code-block:: ql
predicate balancedPath(DataFlow::Node src, DataFlow::Node snk) {
src = snk or DataFlow::localFlowStep(src, snk) or
exists(DataFlow::Node m | balancedPath(src, m) | balancedPath(m, snk)) or
exists(Call c, DataFlow::Node parm, DataFlow::Node ret |
stepIn(c, src, parm) and
balancedPath(parm, ret) and
stepOut(c, ret, snk)
)
}
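The idea of matching each return with its call can be tried out on a toy graph. The following C++ sketch is an analogy, not the QL implementation: ``Kind``, ``Step`` and ``reaches`` are hypothetical names, the graph is assumed to be acyclic (there is no handling of recursion), and call sites are tracked with an explicit stack.

```cpp
#include <vector>

enum class Kind { Local, In, Out };

// One edge of a hypothetical flow graph. callSite is only meaningful for
// In (argument to parameter) and Out (return value to call result) steps.
struct Step { int from; int to; Kind kind; int callSite; };

// Is `sink` reachable from `node`? With balanced == false every edge is
// followed, like a plain closure of flowStep. With balanced == true an Out
// step may only be taken if it matches the most recent unmatched In step,
// and a path only counts once every call has been matched.
bool reaches(const std::vector<Step>& steps, int node, int sink,
             bool balanced, std::vector<int> stack = {}) {
    if (node == sink && stack.empty()) return true;
    for (const Step& s : steps) {
        if (s.from != node) continue;
        std::vector<int> next = stack;
        if (balanced && s.kind == Kind::In) {
            next.push_back(s.callSite);
        } else if (balanced && s.kind == Kind::Out) {
            if (next.empty() || next.back() != s.callSite) continue;  // mismatched return
            next.pop_back();
        }
        if (reaches(steps, s.to, sink, balanced, next)) return true;
    }
    return false;
}
```

Modelling the ``logFormat`` example with nodes 0 (``dangerousFmt``), 1 (``safeFmt``), 2 (parameter ``fmt``), 3 (returned ``fmt``), 4 (result of the first call) and 5 (result of the second call), the unbalanced search connects node 0 to node 5, whereas the balanced search connects node 0 only to node 4.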
Summary-based global data flow
==============================
- To avoid traversing the same paths many times, we compute function summaries that record if a function parameter flows into a return value:
.. code-block:: ql
predicate returnsParameter(Function f, int i) {
exists (Parameter p, ReturnStmt retStmt, Expr ret |
p = f.getParameter(i) and
retStmt.getEnclosingFunction() = f and
ret = retStmt.getExpr() and
balancedPath(DataFlow::parameterNode(p), DataFlow::exprNode(ret))
)
}
- Use this predicate in ``balancedPath`` instead of ``stepIn``/``stepOut`` pairs.

@@ -0,0 +1,154 @@
QL for C/C++
============
Program representation
Agenda
======
- Abstract syntax trees
- Database representation
- Symbol tables
- Variables
- Functions
Abstract syntax trees
=====================
The basic representation of an analyzed program is an *abstract syntax tree (AST)*.
.. code-block:: cpp
try {
...
} catch (AnException e) {
}
.. note::
When writing queries in QL it is important to have in mind the underlying representation of the program which is stored in the database. Typically queries make use of the “AST” representation of the program - a tree structure where program elements are nested within other program elements.
The “Introducing the C/C++ libraries” help topic contains a more complete overview of important AST classes and the rest of the C++ QL libraries: https://help.semmle.com/QL/learn-ql/ql/cpp/introduce-libraries-cpp.html
Database representations of ASTs
================================
AST nodes and other program elements are encoded in the database as *entity values*. Entities are implemented as integers, but in QL they are opaque; all one can do with them is to check their equality.
Each entity belongs to an entity type. Entity types have names starting with “@” and are defined in the database schema (not in QL).
Properties of AST nodes and their relationships to each other are encoded by database relations, which are predicates defined in the database (not in QL).
Entity types are rarely used directly; the usual pattern is to define a QL class that extends the type and exposes properties of its entities through member predicates.
.. note::
ASTs are a typical example of the kind of data representation one finds in object-oriented programming, with data-carrying nodes that reference each other. At first glance, QL, which can only work with atomic values, does not seem to be well suited for working with this kind of data. However, ultimately all that we require of the nodes in an AST is that they have an identity. The relationships among nodes, usually implemented by reference-valued object fields in other languages, can just as well (and arguably more naturally) be represented as relations over nodes. Attaching data (such as strings or numbers) to nodes can also be represented with relations over nodes and primitive values. All we need is a way for relations to reference nodes. This is achieved in QL (as in other database languages) by means of entity values (or “entities”, for short), which are opaque atomic values, implemented as integers under the hood.
It is the job of the extractor to create entity values for all AST nodes and populate database relations that encode the relationship between AST nodes and any values associated with them. These relations are extensional, that is, explicitly stored in the database, unlike the relations described by QL predicates, which we also refer to as intensional relations. Entity values belong to entity types, whose name starts with “@” to set them apart from primitive types and classes.
The interface between entity types and extensional relations on the one hand and QL predicates and classes on the other hand is provided by the database schema, which defines the available entity types and the schema of each extensional relation, that is, how many columns the relation has, and which entity type or primitive type the values in each column come from. QL programs can refer to entity types and extensional relations just as they would refer to QL classes and predicates, with the restriction that entity types cannot be directly selected in a “select” clause, since they do not have a well-defined string representation.
For example, the database schema for C++ snapshot databases is here: https://github.com/Semmle/ql/blob/master/cpp/ql/src/semmlecode.cpp.dbscheme
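The idea of encoding a tree as opaque entities plus relations can be sketched in C++. Everything here is hypothetical and chosen for illustration: the ``Database`` struct, its two tables and the ``childrenOf`` function do not correspond to the actual schema.

```cpp
#include <map>
#include <set>
#include <string>
#include <utility>

using Entity = int;  // opaque identifier; only equality is meaningful

// Hypothetical extensional relations, stored explicitly as tables.
struct Database {
    std::map<Entity, std::string> kindOf;          // (node, kind)
    std::set<std::pair<Entity, Entity>> parentOf;  // (child, parent)
};

// Analogue of a member predicate on a class wrapping the entity type:
// the answer is computed from the stored relations, not stored itself.
std::set<Entity> childrenOf(const Database& db, Entity parent) {
    std::set<Entity> children;
    for (const auto& row : db.parentOf)
        if (row.second == parent) children.insert(row.first);
    return children;
}
```

An assignment node 1 with operands 2 and 3 is then represented by the rows (2, 1) and (3, 1) in ``parentOf``, and ``childrenOf(db, 1)`` recovers both operands without any node holding a reference to another.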
AST QL classes
==============
Important AST classes include:
- ``Expr``: expressions such as assignments, variable references, function calls, …
- ``Stmt``: statements such as conditionals, loops, try statements, …
- ``DeclarationEntry``: places where functions, variables or types are declared and/or defined
These three (and all other AST classes) are subclasses of Element.
.. note::
The “Introducing the C/C++ libraries” help topic contains a more complete overview of important AST classes and the rest of the C++ QL libraries: https://help.semmle.com/QL/learn-ql/ql/cpp/introduce-libraries-cpp.html
Symbol table
============
The database also includes information about the symbol table associated with a program:
- ``Variable``: all variables, including local variables, global variables, static variables and member variables
- ``Function``: all functions, including member functions
- ``Type``: built-in and user-defined types
.. note::
The “Introducing the C/C++ libraries” help topic contains a more complete overview of important symbol table classes and the rest of the C++ QL libraries: https://help.semmle.com/QL/learn-ql/ql/cpp/introduce-libraries-cpp.html
Working with variables
======================
``Variable`` represents program variables, including locally scoped variables (``LocalScopeVariable``), global variables (``GlobalVariable``), and others:
- ``string Variable.getName()``
- ``Type Variable.getType()``
``Access`` represents references to declared entities such as functions (``FunctionAccess``) and variables (``VariableAccess``), including fields (``FieldAccess``).
- ``Declaration Access.getTarget()``
``VariableDeclarationEntry`` represents declarations or definitions of a variable.
- ``Variable VariableDeclarationEntry.getVariable()``
Working with functions
======================
Functions are represented by the ``Function`` QL class. Each declaration or definition of a function is represented by a ``FunctionDeclarationEntry``.
Calls to functions are modelled by the QL class ``Call`` and its subclasses:
- ``Call.getTarget()`` gets the declared target of the call; undefined for calls through function pointers
- ``Function.getACallToThisFunction()`` gets a call to this function
Typically, functions are identified by name:
- ``string Function.getName()``
- ``string Function.getQualifiedName()``
Working with preprocessor logic
===============================
Macros and other preprocessor directives can easily cause confusion when analyzing programs:
- AST structure reflects the program *after* preprocessing.
- Locations refer to the original source text *before* preprocessing.
For example, in:
.. code-block:: cpp
#define square(x) x*x
y = square(y0), z = square(z0)
there are no AST nodes corresponding to ``square(y0)`` or ``square(z0)``, but there are AST nodes corresponding to ``y0*y0`` and ``z0*z0``.
.. note::
The C preprocessor poses a dilemma: un-preprocessed code cannot, in general, be parsed and analyzed meaningfully, but showing results in preprocessed code is not useful to developers. Our solution is to base the AST representation on preprocessed source (in the same way as compilers do), but associate AST nodes with locations in the original source text.
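Because the analysis sees the program after preprocessing, it sees exactly what the compiler sees. The following C++ sketch uses the same ``square`` macro; the functions ``expandSimple`` and ``expandCompound`` are hypothetical names added to show the textual expansion.

```cpp
#define square(x) x*x  // same macro as in the example

int expandSimple() {
    int y0 = 3;
    return square(y0);  // after preprocessing: y0*y0
}

int expandCompound() {
    // After preprocessing: 1 + 2*1 + 2, which evaluates to 5, not 9.
    return square(1 + 2);
}
```

In the second function the tree that results from preprocessing is an addition at the top level, so there is no single multiplication node that corresponds to the macro invocation as a whole.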
Working with Macros
===================
.. code-block:: cpp
#define square(x) x*x
y = square(y0), z = square(z0)
is represented in the snapshot database as:
- A Macro entity representing the text of the *head* and *body* of the macro
- Assignment nodes, representing the two assignments after preprocessing
- Left-hand sides are ``VariableAccess`` nodes of y and z
- Right-hand sides are ``MulExpr`` nodes representing ``y0*y0`` and ``z0*z0``
- A ``MacroAccess`` entity, which associates the Macro with the ``MulExprs``
Useful predicates on ``Element``: ``isInMacroExpansion()``, ``isAffectedByMacro()``

@@ -0,0 +1,101 @@
Exercise: ``snprintf`` overflow
===============================
Getting started and setting up
==============================
To try the examples in this presentation you should download:
- `QL for Eclipse <https://help.semmle.com/ql-for-eclipse/Content/WebHelp/install-plugin-free.html>`__
- Snapshot: `rsyslog <https://downloads.lgtm.com/snapshots/cpp/rsyslog/rsyslog/rsyslog-all-revision-2018-April-27--14-12-31.zip>`__
More resources:
- To learn more about the main features of QL, try looking at the `QL language handbook <https://help.semmle.com/QL/ql-handbook/>`__.
- For further information about writing queries in QL, see `Writing QL queries <https://help.semmle.com/QL/learn-ql/ql/writing-queries/writing-queries.html>`__.
.. note::
To run the queries featured in this training presentation, we recommend you download the free-to-use `QL for Eclipse plugin <https://help.semmle.com/ql-for-eclipse/Content/WebHelp/getting-started.html>`__.
This plugin allows you to locally access the latest features of QL, including the standard QL libraries and queries. It also provides standard IDE features such as syntax highlighting, jump-to-definition, and tab completion.
A good project to start analyzing is `rsyslog <https://github.com/rsyslog/rsyslog>`__; a suitable snapshot to query is available by visiting the link on the slide.
Alternatively, you can query any project (including rsyslog) in the `query console on LGTM.com <https://lgtm.com/query/project:1506087977050/lang:cpp/>`__.
Note that results generated in the query console are likely to differ from those generated in the QL plugin, because LGTM.com analyzes the most recent revisions of each project that has been added, whereas the snapshot available to download above is based on an historical version of the code base.
``snprintf``
============
.. rst-class:: build
- ``printf``: Returns number of characters printed.
.. code-block:: cpp
printf("Hello %s!", name)
- ``sprintf``: Returns number of characters written to ``buf``.
.. code-block:: cpp
sprintf(buf, "Hello %s!", name)
- ``snprintf``: Returns number of characters it **would have written** to ``buf`` had ``n`` been sufficiently large, **not** the number of characters actually written.
.. code-block:: cpp
snprintf(buf, n, "Hello %s!", name)
- In pre-C99 versions of glibc ``snprintf`` would return -1 if ``n`` was too small!
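The return value of ``snprintf`` is easy to check in a few lines. In this C++ sketch the ``Result`` struct and the ``formatInto8`` function are hypothetical helpers; the buffer is deliberately too small for the formatted text.

```cpp
#include <cstdio>
#include <cstring>

struct Result {
    int returned;        // value returned by snprintf
    std::size_t stored;  // characters actually stored in the buffer
};

// Format "Hello <name>!" into an 8-byte buffer and report both numbers.
Result formatInto8(const char* name) {
    char buf[8];
    int returned = std::snprintf(buf, sizeof buf, "Hello %s!", name);
    return {returned, std::strlen(buf)};
}
```

On a C99-conforming implementation, ``formatInto8("world")`` reports a return value of 12, the length of ``Hello world!``, even though only 7 characters and the terminating null byte fit into the buffer.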
RCE in rsyslog
==============
- Vulnerable code looked similar to this (`original <https://github.com/rsyslog/librelp/blob/532aa362f0f7a8d037505b0a27a1df452f9bac9e/src/tcp.c#L1195-L1211>`__):
.. code-block:: cpp
char buf[1024];
int pos = 0;
for (int i = 0; i < n; i++) {
pos += snprintf(buf + pos, sizeof(buf) - pos, "%s", strs[i]);
}
- Disclosed as `CVE-2018-1000140 <https://nvd.nist.gov/vuln/detail/CVE-2018-1000140>`__.
- Blog post: `https://blog.semmle.com/librelp-buffer-overflow-cve-2018-1000140/ <https://blog.semmle.com/librelp-buffer-overflow-cve-2018-1000140/>`__.
Finding the RCE yourself
========================
#. Write a query to find calls to ``snprintf``
**Hint**: Use class ``FunctionCall``
#. Restrict to calls whose result is used
**Hint**: Use class ``ExprInVoidContext``
#. Restrict to calls where the format string contains “%s”
**Hint**: Use predicates ``Expr.getValue`` and ``string.regexpMatch``
#. Restrict to calls where the result flows back to the size argument
**Hint**: Import library ``semmle.code.cpp.dataflow.TaintTracking`` and use predicate ``TaintTracking::localTaint``
Model answer
============
.. literalinclude:: ../query-examples/cpp/snprintf-1.ql
:language: ql
.. rst-class:: build
- More full-featured version: `https://lgtm.com/rules/1505913226124 <https://lgtm.com/rules/1505913226124>`__.
.. note::
The regular expression for matching the format string uses the “(?s)” directive to ensure that “.” also matches any newline characters embedded in the string.

@@ -2,6 +2,5 @@
:glob:
:hidden:
./intro-to-ql/*
./**/*
./*

@@ -0,0 +1,8 @@
import cpp
from AddExpr a, Variable v, RelationalOperation cmp
where
a.getAnOperand() = v.getAnAccess() and
cmp.getAnOperand() = a and
cmp.getAnOperand() = v.getAnAccess()
select cmp, "Overflow check."

@@ -0,0 +1,9 @@
import cpp
from AddExpr a, Variable v, RelationalOperation cmp
where
a.getAnOperand() = v.getAnAccess() and
cmp.getAnOperand() = a and
cmp.getAnOperand() = v.getAnAccess() and
forall(Expr op | op = a.getAnOperand() | isSmall(op))
select cmp, "Bad overflow check."

@@ -0,0 +1,12 @@
import cpp
predicate isSmall(Expr e) { e.getType().getSize() < 4 }
from AddExpr a, Variable v, RelationalOperation cmp
where
a.getAnOperand() = v.getAnAccess() and
cmp.getAnOperand() = a and
cmp.getAnOperand() = v.getAnAccess() and
forall(Expr op | op = a.getAnOperand() | isSmall(op)) and
not isSmall(a.getExplicitlyConverted())
select cmp, "Bad overflow check"

@@ -0,0 +1,8 @@
import cpp
from FunctionCall alloc, FunctionCall free, LocalScopeVariable v
where allocationCall(alloc)
and alloc = v.getAnAssignedValue()
and freeCall(free, v.getAnAccess())
and alloc.getASuccessor+() = free
select alloc, free

@@ -0,0 +1,8 @@
import cpp
from FunctionCall free, LocalScopeVariable v, VariableAccess u
where freeCall(free, v.getAnAccess())
and u = v.getAnAccess()
and u.isRValue()
and free.getASuccessor+() = u
select free, u

@@ -0,0 +1,10 @@
import cpp
from LocalVariable lv, ControlFlowNode def
where
def = lv.getAnAssignment() and
not exists(VariableAccess use |
use = lv.getAnAccess() and
use = def.getASuccessor+()
)
select lv, def

@@ -0,0 +1,10 @@
import cpp
predicate isReachable(BasicBlock bb) {
bb instanceof EntryBasicBlock or
isReachable(bb.getAPredecessor())
}
from BasicBlock bb
where not isReachable(bb)
select bb

@@ -0,0 +1,10 @@
import cpp
from ExprCall c, PointerDereferenceExpr deref, VariableAccess va,
Access fnacc
where c.getLocation().getFile().getBaseName() = "cjpeg.c" and
c.getLocation().getStartLine() = 640 and
deref = c.getExpr() and
va = deref.getOperand() and
fnacc = va.getTarget().getAnAssignedValue()
select c, fnacc.getTarget()

@@ -0,0 +1,8 @@
import cpp
import semmle.code.cpp.commons.Printf
from Call c, FormattingFunction ff, Expr format
where c.getTarget() = ff and
format = c.getArgument(ff.getFormatParameterIndex()) and
not format instanceof StringLiteral
select format, "Non-constant format string."

@@ -0,0 +1,12 @@
import cpp
import semmle.code.cpp.dataflow.DataFlow
import semmle.code.cpp.commons.Printf
class SourceNode extends DataFlow::Node { … }
from FormattingFunction f, Call c, SourceNode src, DataFlow::Node arg
where c.getTarget() = f and
arg.asExpr() = c.getArgument(f.getFormatParameterIndex()) and
DataFlow::localFlow(src, arg) and
not src.asExpr() instanceof StringLiteral
select arg, "Non-constant format string."

@@ -0,0 +1,13 @@
import cpp
import semmle.code.cpp.dataflow.TaintTracking
class TaintedFormatConfig extends TaintTracking::Configuration {
TaintedFormatConfig() { this = "TaintedFormatConfig" }
override predicate isSource(DataFlow::Node source) { /* TBD */ }
override predicate isSink(DataFlow::Node sink) { /* TBD */ }
}
from TaintedFormatConfig cfg, DataFlow::Node source, DataFlow::Node sink
where cfg.hasFlow(source, sink)
select sink, "This format string may be derived from a $@.",
source, "user-controlled value"

@@ -0,0 +1,11 @@
import cpp
import semmle.code.cpp.dataflow.TaintTracking
from FunctionCall call, DataFlow::Node source, DataFlow::Node sink
where
call.getTarget().getName() = "snprintf" and
call.getArgument(2).getValue().regexpMatch("(?s).*%s.*") and
TaintTracking::localTaint(source, sink) and
source.asExpr() = call and
sink.asExpr() = call.getArgument(1)
select call