docs: rename ql-training-rst > ql-training

(cherry picked from commit 65573492e7)
This commit is contained in:
james
2019-08-30 16:53:33 +01:00
parent d1d19bf744
commit 40be3bc8cf
113 changed files with 0 additions and 118 deletions

View File

@@ -0,0 +1,236 @@
===========================
Example: Bad overflow guard
===========================
QL for C/C++
.. container:: semmle-logo
Semmle :sup:`TM`
.. rst-class:: setup
Setup
=====
For this example you should download:
- `QL for Eclipse <https://help.semmle.com/ql-for-eclipse/Content/WebHelp/install-plugin-free.html>`__
- `ChakraCore snapshot <https://downloads.lgtm.com/snapshots/cpp/microsoft/chakracore/ChakraCore-revision-2017-April-12--18-13-26.zip>`__
.. note::
For the examples in this presentation, we will be analyzing `ChakraCore <https://github.com/microsoft/ChakraCore>`__.
You can query the project in `the query console <https://lgtm.com/query/project:2034240708/lang:cpp/>`__ on LGTM.com.
.. insert snapshot-note.rst to explain differences between snapshot available to download and the version available in the query console.
.. include:: ../slide-snippets/snapshot-note.rst
.. resume slides
Checking for overflow in C
==========================
- Arithmetic operations overflow if the result is too large for their type.
- Developers sometimes exploit this to write overflow checks:
.. code-block:: cpp
if (v + b < v) {
handle_error("overflow");
} else {
result = v + b;
}
Where might this go wrong?
.. note::
- In C/C++ we often need to check for whether an operation `overflows <https://en.wikipedia.org/wiki/Integer_overflow>`__.
- An overflow is when an arithmetic operation, such as an addition, results in a number which is too large to be stored in the type.
- When an operation overflows, the value “wraps” around.
- A typical way to check for overflow of an addition, therefore, is whether the result is less than one of the argumentsthat is the result has **wrapped**.
Integer promotion
=================
From `https://en.cppreference.com/w/c/language/conversion <https://en.cppreference.com/w/c/language/conversion>`__:
.. rst-class:: quote
Integer promotion is the implicit conversion of a value of any integer type with rank less or equal to rank of ``int`` ... to the value of type ``int`` or ``unsigned int``.
- The arguments of the following arithmetic operators undergo implicit conversions:
- binary arithmetic ``* / % + -``
- relational operators ``< > <= >= == !=``
- binary bitwise operators ``& ^ |``
- the conditional operator ``?:``
.. note::
- Most of the time integer conversion works fine. However, the rules governing addition in C/C++ are complex, and it easy to get bitten.
- CPUs generally prefer to perform arithmetic operations on 32-bit or larger integers, as the architectures are optimized to perform those efficiently.
- The compiler therefore performs “integer promotion” for arguments to arithmetic operations that are smaller than 32-bit.
Checking for overflow in C revisited
====================================
Which branch gets executed in this example? What is the value of ``result``?
.. code-block:: cpp
uint16_t v = 65535;
uint16_t b = 1;
uint16_t result;
if (v + b < v) {
handle_error("overflow");
} else {
result = v + b;
}
.. note::
In this example the second branch is executed, even though there is a 16-bit overflow, and ``result`` is set to zero.
Checking for overflow in C revisited
====================================
Here is the example again, with the conversions made explicit:
.. code-block:: cpp
:emphasize-lines: 4,7
uint16_t v = 65535;
uint16_t b = 1;
uint16_t result;
if ((int)v + (int)b < (int)v) {
handle_error("overflow");
} else {
result = (uint16_t)((int)v + (int)b);
}
.. note::
In this example the second branch is executed, even though there is a 16-bit overflow, and result is set to zero.
Explanation:
- The two integer arguments to the addition, ``v`` and ``b``, are promoted to 32-bit integers.
- The comparison (``<``) is also an arithmetic operation, therefore it will also be completed on 32-bit integers.
- This means that ``v + b < v`` will never be true, because v and b can hold at most 2 :sup:`16`.
- Therefore, the second branch is executed, but the result of the addition is stored into the result variable. Overflow will still occur as result is a 16-bit integer.
This happens even though the overflow check passed!
.. rst-class:: background2
Developing a QL query
=====================
Finding bad overflow guards
QL query: bad overflow guards
=============================
Lets look for overflow guards of the form ``v + b < v``, using the classes
``AddExpr``, ``Variable`` and ``RelationalOperation`` from the ``cpp`` library.
.. rst-class:: build
.. literalinclude:: ../query-examples/cpp/bad-overflow-guard-1.ql
:language: ql
.. note::
- When performing `variant analysis <https://semmle.com/variant-analysis>`__, it is usually helpful to write a simple query that finds the simple syntactic pattern, before trying to go on to describe the cases where it goes wrong.
- In this case, we start by looking for all the *overflow* checks, before trying to refine the query to find all *bad overflow* checks.
- The ``select`` clause defines what this query is looking for:
- an ``AddExpr``: the expression that is being checked for overflow.
- a ``RelationalOperation``: the overflow comparison check.
- a ``Variable``: used as an argument to both the addition and comparison.
- The ``where`` part of the query ties these three QL variables together using `predicates <https://help.semmle.com/QL/ql-handbook/predicates.html>`__ defined in the `standard QL for C/C++ library <https://help.semmle.com/qldoc/cpp/>`__.
QL query: bad overflow guards
=============================
We want to ensure the operands being added have size less than 4 bytes.
We may want to reuse this logic, so let us create a separate predicate.
Looking at autocomplete suggestions, we see that we can get the type of an expression using the ``getType()`` method.
We can get the size (in bytes) of a type using the ``getSize()`` method.
.. rst-class:: build
.. code-block:: ql
predicate isSmall(Expr e) {
e.getType().getSize() < 4
}
.. note::
- An important part of the query is to determine whether a given expression has a “small” type that is going to trigger integer promotion.
- We therefore write a helper predicate for small expressions.
- This predicate effectively represents the set of all expressions in the database where the size of the type of the expression is less than 4 bytes, that is, less than 32-bits.
QL query: bad overflow guards
=============================
We can ensure the operands being added have size less than 4 bytes, using our new predicate.
QL has logical quantifiers like ``exists`` and ``forall``, allowing us to declare variables and enforce a certain condition on them.
Now our query becomes:
.. rst-class:: build
.. literalinclude:: ../query-examples/cpp/bad-overflow-guard-2.ql
:language: ql
.. note::
- Recall from earlier that what makes an overflow check a “bad” check is that all the arguments to the addition are integers smaller than 32-bits.
- We could write this by using our helper predicate ``isSmall`` to specify that each individual operand to the addition ``isSmall`` (that is under 32-bits):
.. code-block:: ql
isSmall(a.getLeftOperand()) and
isSmall(a.getRightOperand())
- However, this is a little bit repetitive. What we really want to say is that: all the operands of the addition are small. Fortunately, QL provides a ``forall`` formula that we can use in these circumstances.
- A ``forall`` has three parts:
- A “declaration” part, where we can introduce variables.
- A “range” part, which allows us to restrict those variables.
- A “condition” part. The ``forall`` as a whole holds if the condition holds for each of the values in the range.
- In our case:
- The declaration introduces a variable for expressions, called ``op``. At this stage, this variable represents all the expressions in the program.
- The “range” part, ``op = a.getAnOperand()``, restricts ``op`` to being one of the two operands to the addition.
- The “condition” part, ``isSmall(op)``, says that the ``forall`` holds only if the condition (that the ``op`` is small) holds for everything in the rangethat is, both the arguments to the addition.
QL query: bad overflow guards
=============================
Sometimes the result of the addition is cast to a small type of size less than 4 bytes, preventing automatic widening. We dont want our query to flag these instances.
We can use predicate ``Expr.getExplicitlyConverted()`` to reason about casts that are applied to an expression, adding this restriction to our query:
.. code-block:: ql
not isSmall(a.getExplicitlyConverted())
The final query
===============
.. literalinclude:: ../query-examples/cpp/bad-overflow-guard-3.ql
:language: ql
This query finds a single result in our historic snapshot, which was `a genuine bug in ChakraCore <https://github.com/Microsoft/ChakraCore/commit/2500e1cdc12cb35af73d5c8c9b85656aba6bab4d>`__.

View File

@@ -0,0 +1,371 @@
======================
Analyzing control flow
======================
QL for C/C++
.. container:: semmle-logo
Semmle :sup:`TM`
.. Include information slides here
.. rst-class:: setup
Setup
=====
For this example you should download:
- `QL for Eclipse <https://help.semmle.com/ql-for-eclipse/Content/WebHelp/install-plugin-free.html>`__
- `ChakraCore snapshot <https://downloads.lgtm.com/snapshots/cpp/microsoft/chakracore/ChakraCore-revision-2017-April-12--18-13-26.zip>`__
.. note::
For the examples in this presentation, we will be analyzing `ChakraCore <https://github.com/microsoft/ChakraCore>`__.
You can query the project in `the query console <https://lgtm.com/query/project:2034240708/lang:cpp/>`__ on LGTM.com.
.. insert snapshot-note.rst to explain differences between snapshot available to download and the version available in the query console.
.. include:: ../slide-snippets/snapshot-note.rst
.. resume slides
.. rst-class:: agenda
Agenda
======
- Control flow graphs
- Exercise: use after free
- Recursion over the control flow graph
- Basic blocks
- Guard conditions
Control flow graphs
===================
.. container:: column-left
We frequently want to ask questions about the possible *order of execution* for a program.
Example:
.. code-block:: cpp
if (x) {
return 1;
} else {
return 2;
}
.. container:: column-right
Possible execution order is usually represented by a *control flow graph*:
.. graphviz::
digraph {
graph [ dpi = 1000 ]
node [shape=polygon,sides=4,color=blue4,style="filled,rounded",fontname=consolas,fontcolor=white]
a [label=<if<BR /><FONT POINT-SIZE="10">IfStmt</FONT>>]
b [label=<x<BR /><FONT POINT-SIZE="10">VariableAccess</FONT>>]
c [label=<1<BR /><FONT POINT-SIZE="10">Literal</FONT>>]
d [label=<2<BR /><FONT POINT-SIZE="10">Literal</FONT>>]
e [label=<return<BR /><FONT POINT-SIZE="10">ReturnStmt</FONT>>]
f [label=<return<BR /><FONT POINT-SIZE="10">ReturnStmt</FONT>>]
a -> b
b -> {c, d}
c -> e
d -> f
}
.. note::
The control flow graph is a static over-approximation of possible control flow at runtime.
Its nodes are program elements such as expressions and statements.
If there is an edge from one node to another, then it means that the semantic operation corresponding to the first node may be immediately followed by the operation corresponding to the second node.
Some nodes (such as conditions of “if” statements or loop conditions) have more than one successor, representing conditional control flow at runtime.
Modeling control flow
=====================
The control flow is modeled with a QL class, ``ControlFlowNode``. Examples of control flow nodes include statements and expressions.
- ``ControlFlowNode`` provides API for traversing the control flow graph:
- ``ControlFlowNode ControlFlowNode.getASuccessor()``
- ``ControlFlowNode ControlFlowNode.getAPredecessor()``
- ``ControlFlowNode ControlFlowNode.getATrueSuccessor()``
- ``ControlFlowNode ControlFlowNode.getAFalseSuccessor()``
The control-flow graph is *intra-procedural*in other words, only models paths within a function. To find the associated function, use
- ``Function ControlFlowNode.getControlFlowScope()``
.. note::
The control flow graph is similar in concept to data flow graphs. In contrast to data flow, however, the AST nodes are directly control flow graph nodes.
The predecessor/successor predicates are prime examples of member predicates with results that are used in functional syntax, but that are not actually functions. This is because a control flow node may have any number of predecessors and successors (including zero or more than one).
Example: malloc/free pairs
==========================
Find calls to ``free`` that are reachable from an allocation on the same variable:
.. literalinclude:: ../query-examples/cpp/control-flow-cpp-1.ql
:language: ql
.. note::
Predicates ``allocationCall`` and ``freeCall`` are defined in the standard library and model a number of standard alloc/free-like functions.
Exercise: use after free
========================
Based on this query, write a query that finds accesses to the variable that occur after the free.
.. rst-class:: build
- What do you find? What problems occur with this approach to detecting use-after-free vulnerabilities?
.. rst-class:: build
.. literalinclude:: ../query-examples/cpp/control-flow-cpp-2.ql
:language: ql
Utilizing recursion
===================
The main problem we observed in the previous exercise was that the successor's relation is unaware of changes to the variable that would invalidate our results.
We can fix this by writing our own successor predicate that stops traversing the CFG if the variable is re-defined.
Utilizing recursion
===================
.. code-block:: ql
ControlFlowNode reachesWithoutReassignment(FunctionCall free, LocalScopeVariable v)
{
freeCall(free, v.getAnAccess()) and
(
// base case
result = free
or
// recursive case
exists(ControlFlowNode mid |
mid = reachesWithoutReassignment(free, v) and
result = mid.getASuccessor() and
// stop tracking when the value may change
not result = v.getAnAssignedValue() and
not result.(AddressOfExpr).getOperand() = v.getAnAccess()
)
)
}
Exercise
========
Find local variables that are written to, and then never accessed again.
**Hint**: Use ``LocalVariable.getAnAssignment()``.
.. rst-class:: build
.. literalinclude:: ../query-examples/cpp/control-flow-cpp-3.ql
:language: ql
.. rst-class:: background2
More control flow
=================
Basic blocks
============
``BasicBlock`` represents basic blocks, that is, straight-line sequences of control flow nodes without branching.
- ``ControlFlowNode BasicBlock.getNode(int)``
- ``BasicBlock BasicBlock.getASuccessor()``
- ``BasicBlock BasicBlock.getAPredecessor()``
- ``BasicBlock BasicBlock.getATrueSuccessor()``
- ``BasicBlock BasicBlock.getAFalseSuccessor()``
Often, queries can be made more efficient by treating basic blocks as a unit instead of reasoning about individual control flow nodes.
Exercise: unreachable blocks
============================
Write a query to find unreachable basic blocks.
**Hint**: First define a recursive predicate to identify reachable blocks. Class ``EntryBasicBlock`` may be useful.
.. rst-class:: build
.. literalinclude:: ../query-examples/cpp/control-flow-cpp-4.ql
:language: ql
.. note::
This query has a good number of false positives on Chakra, many of them to do with templating and macros.
Guard conditions
================
A ``GuardCondition`` is a ``Boolean`` condition that controls one or more basic blocks in the sense that it is known to be true/false at the entry of those blocks.
- ``GuardCondition.controls(BasicBlock bb, boolean outcome):`` the entry of bb can only be reached if the guard evaluates to outcome
- ``GuardCondition.comparesLt, GuardCondition.ensuresLt, GuardCondition.comparesEq:`` auxiliary predicates to identify conditions that guarantee that one expression is less than/equal to another
Further materials
=================
- QL for C/C++: https://help.semmle.com/QL/learn-ql/ql/cpp/ql-for-cpp.html
- API reference: https://help.semmle.com/qldoc/cpp
.. rst-class:: end-slide
Extra slides
============
.. rst-class:: background2
Appendix: Library customizations
================================
Call graph customizations
=========================
The default implementation of call target resolution does not handle function pointers, because they are difficult to deal with in general.
We can, however, add support for particular patterns of use by contributing a new override of ``Call.getTarget``.
Exercise: unresolvable calls
============================
Write a query that finds all calls for which no call target can be determined, and run it on libjpeg-turbo.
Examine the results. What do you notice?
.. rst-class:: build
.. code-block:: ql
import cpp
from Call c
where not exists(c.getTarget())
select c
.. rst-class:: build
- Many results are calls through struct fields emulating virtual dispatch.
Exercise: resolving calls through variables
===========================================
Write a query that resolves the call at `cjpeg.c:640 <https://github.com/libjpeg-turbo/libjpeg-turbo/blob/9bc8eb6449a32f452ab3fc9f94af672a0af13f81/cjpeg.c#L640>`__.
**Hint**: Use classes ``ExprCall``, ``PointerDereferenceExpr``, and ``Access``.
.. rst-class:: build
.. literalinclude:: ../query-examples/cpp/control-flow-cpp-5.ql
:language: ql
Exercise: customizing the call graph
====================================
Create a subclass of ``ExprCall`` that uses your query to implement ``getTarget``.
.. rst-class:: build
.. code-block:: ql
class CallThroughVariable extends ExprCall {
Variable v;
CallThroughVariable() {
exists(PointerDereferenceExpr callee | callee = getExpr() |
callee.getOperand() = v.getAnAccess()
)
}
override Function getTarget() {
result = super.getTarget() or
exists(Access init | init = v.getAnAssignedValue() |
result = init.getTarget()
)
}
}
Control-flow graph customizations
=================================
The default control-flow graph implementation recognizes a few common patterns for non-returning functions, but sometimes it fails to spot them, which can cause imprecision.
We can add support for new non-returning functions by overriding ``ControlFlowNode.getASuccessor()``.
Exercise: calls to ``error_exit``
=================================
Write a query that finds all calls to a field called ``error_exit``.
**Hint**: Reuse (parts of) the ``CallThroughVariable`` class from before.
.. rst-class:: build
.. code-block:: ql
class CallThroughVariable extends ExprCall { ... }
class ErrorExitCall extends CallThroughVariable {
override Field v;
ErrorExitCall() { v.getName() = "error_exit" }
}
from ErrorExitCall eec
select eec
Exercise: customizing the control-flow graph
============================================
Override ``ControlFlowNode`` to mark calls to ``error_exit`` as non-returning.
**Hint**: ``ExprCall`` is an indirect subclass of ``ControlFlowNode``.
.. rst-class:: build
.. code-block:: ql
class CallThroughVariable extends ExprCall { ... }
class ErrorExitCall extends CallThroughVariable {
override Field v;
ErrorExitCall() { v.getName() = "error_exit" }
override ControlFlowNode getASuccessor() { none() }
}
``CustomOptions`` class
=======================
The Options library defines a ``CustomOptions`` class with various member predicates that can be overridden to customize aspects of the analysis.
In particular, it has an ``exprExits`` predicate that can be overridden to more easily perform the customization on the previous slide:
.. code-block:: ql
import Options
class MyOptions extends CustomOptions {
override predicate exprExits(Expr e) {
super.exprExits(e) or ...
}
}

View File

@@ -0,0 +1,199 @@
=========================
Introduction to data flow
=========================
Finding string formatting vulnerabilities in C/C++
.. container:: semmle-logo
Semmle :sup:`TM`
.. rst-class:: setup
Setup
=====
For this example you should download:
- `QL for Eclipse <https://help.semmle.com/ql-for-eclipse/Content/WebHelp/install-plugin-free.html>`__
- `dotnet/coreclr snapshot <http://downloads.lgtm.com/snapshots/cpp/dotnet/coreclr/dotnet_coreclr_fbe0c77.zip>`__
.. note::
For the examples in this presentation, we will be analyzing `dotnet/coreclr <https://github.com/dotnet/coreclr>`__.
You can query the project in `the query console <https://lgtm.com/query/projects:1505958977333/lang:cpp/>`__ on LGTM.com.
.. insert snapshot-note.rst to explain differences between snapshot available to download and the version available in the query console.
.. include:: ../slide-snippets/snapshot-note.rst
.. resume slides
.. rst-class:: agenda
Agenda
======
- Non-constant format string
- Data flow
- Modules and libraries
- Local data flow
- Local taint tracking
Motivation
==========
Lets write a query to identify instances of `CWE-134 <https://cwe.mitre.org/data/definitions/134.html>`__ **Use of externally controlled format string**.
.. code-block:: cpp
printf(userControlledString, arg1);
**Goal**: Find uses of ``printf`` (or similar) where the format string can be controlled by an attacker.
.. note::
Formatting functions allow the programmer to construct a string output using a *format string* and an optional set of arguments. The *format string* is specified using a simple template language, where the output string is constructed by processing the format string to find *format specifiers*, and inserting values provided as arguments. For example:
.. code-block:: cpp
printf("Name: %s, Age: %d", "Freddie", 2);
would produce the output ``"Name: Freddie, Age: 2”``. So far, so good. However, problems arise if there is a mismatch between the number of formatting specifiers, and the number of arguments. For example:
.. code-block:: cpp
printf("Name: %s, Age: %d", "Freddie");
In this case, we have one more format specifier than we have arguments. In a managed language such as Java or C#, this simply leads to a runtime exception. However, in C/C++, the formatting functions are typically implemented by reading values from the stack without any validation of the number of arguments. This means a mismatch in the number of format specifiers and format arguments can lead to information disclosure.
Of course, in practice this happens rarely with *constant* formatting strings. Instead, its most problematic when the formatting string can be specified by the user, allowing an attacker to provide a formatting string with the wrong number of format specifiers. Furthermore, if an attacker can control the format string, they may be able to provide the ``%n`` format specifier, which causes ``printf`` to write the number characters in the generated output string to a specified location.
See https://en.wikipedia.org/wiki/Uncontrolled_format_string for more background.
Exercise: Non-constant format string
====================================
Write a query that flags ``printf`` calls where the format argument is not a ``StringLiteral``.
**Hint**: Import ``semmle.code.cpp.commons.Printf`` and use class ``FormattingFunction`` and ``getFormatParameterIndex()``.
.. rst-class:: build
.. literalinclude:: ../query-examples/cpp/data-flow-cpp-1.ql
:language: ql
.. note::
This first query is about finding places where the format specifier is not a constant string. In QL for C/C++, constant strings are modeled as ``StringLiteral`` nodes, so we are looking for calls to format functions where the format specifier argument is not a string literal.
The `C/C++ standard libraries <https://help.semmle.com/qldoc/cpp/>`__ include many different formatting functions that may be vulnerable to this particular attackincluding ``printf``, ``snprintf``, and others. Furthermore, each of these different formatting functions may include the format string in a different position in the argument list. Instead of laboriously listing all these different variants, we can make use of the QL for C/C++ standard library class ``FormattingFunction``, which provides an interface that models common formatting functions in C/C++.
Meh...
======
Results are unsatisfactory:
- Query flags cases where the format string is a symbolic constant.
- Query flags cases where the format string is itself a format argument.
- Query doesn't recognize wrapper functions around ``printf``-like functions.
We need something better.
.. note::
For example, consider the results which appear in ``/src/ToolBox/SOS/Strike/util.h``, between lines 965 and 970:
.. code-block:: cpp
const char *format = align == AlignLeft ? "%-*.*s" : "%*.*s";
if (IsDMLEnabled())
DMLOut(format, width, precision, mValue);
else
ExtOut(format, width, precision, mValue);
Here, ``DMLOut`` and ``ExtOut`` are macros that expand to formatting calls. The format specifier is not constant, in the sense that the format argument is not a string literal. However, it is clearly one of two possible constants, both with the same number of format specifiers.
What we need is a way to determine whether the format argument is ever set to something that is not constant.
.. include general data flow slides
.. include:: ../slide-snippets/local-data-flow.rst
.. resume language-specific slides
Exercise: source nodes
======================
Define a subclass of ``DataFlow::Node`` representing “source” nodes, that is, nodes without a (local) data flow predecessor.
**Hint**: use ``not exists()``.
.. rst-class:: build
.. code-block:: ql
class SourceNode extends DataFlow::Node {
SourceNode() {
not DataFlow::localFlowStep(_, this)
}
}
.. note::
Note the scoping of the `dont-care variable <https://help.semmle.com/QL/ql-handbook/expressions.html#don-t-care-expressions>`__ “_” in this example: the body of the characteristic predicate is equivalent to:
.. code-block:: ql
not exists(DataFlow::Node pred | DataFlow::localFlowStep(pred, this))
which is not the same as:
.. code-block:: ql
exists(DataFlow::Node pred | not DataFlow::localFlowStep(pred, this)).
Revisiting non-constant format strings
======================================
Refine the query to find calls to ``printf``-like functions where the format argument derives from a local source that is, not a constant string.
.. rst-class:: build
.. code-block:: ql
import cpp
import semmle.code.cpp.dataflow.DataFlow
import semmle.code.cpp.commons.Printf
class SourceNode extends DataFlow::Node { ... }
from FormattingFunction f, Call c, SourceNode src, DataFlow::Node arg
where c.getTarget() = f and
arg.asExpr() = c.getArgument(f.getFormatParameterIndex()) and
DataFlow::localFlow(src, arg) and
not src.asExpr() instanceof StringLiteral
select arg, "Non-constant format string."
Refinements (take home exercise)
================================
Audit the results and apply any refinements you deem necessary.
Suggestions:
- Replace ``DataFlow::localFlowStep`` with a custom predicate that includes steps through global variable definitions.
**Hint**: Use class ``GlobalVariable`` and its member predicates ``getAnAssignedValue()`` and ``getAnAccess()``.
- Exclude calls in wrapper functions that just forward their format argument to another ``printf``-like function; instead, flag calls to those functions.
Beyond local data flow
======================
- Results are still underwhelming.
- Dealing with parameter passing becomes cumbersome.
- Instead, lets turn the problem around and find user-controlled data that flows into a ``printf`` format argument, potentially through calls.
- This needs :doc:`global data flow <global-data-flow-cpp>`.

View File

@@ -0,0 +1,195 @@
================================
Introduction to global data flow
================================
QL for C/C++
.. container:: semmle-logo
Semmle :sup:`TM`
.. rst-class:: setup
Setup
=====
For this example you should download:
- `QL for Eclipse <https://help.semmle.com/ql-for-eclipse/Content/WebHelp/install-plugin-free.html>`__
- `dotnet/coreclr snapshot <http://downloads.lgtm.com/snapshots/cpp/dotnet/coreclr/dotnet_coreclr_fbe0c77.zip>`__
.. note::
For the examples in this presentation, we will be analyzing `dotnet/coreclr <https://github.com/dotnet/coreclr>`__.
You can query the project in `the query console <https://lgtm.com/query/projects:1505958977333/lang:cpp/>`__ on LGTM.com.
.. insert snapshot-note.rst to explain differences between snapshot available to download and the version available in the query console.
.. include:: ../slide-snippets/snapshot-note.rst
.. resume slides
.. rst-class:: agenda
Agenda
======
- Global taint tracking
- Sanitizers
- Path queries
- Data flow models
.. insert common global data flow slides
.. include:: ../slide-snippets/global-data-flow.rst
.. resume language-specific global data flow slides
Finding tainted format strings (outline)
========================================
.. literalinclude:: ../query-examples/cpp/global-data-flow-cpp-1.ql
:language: ql
.. note::
Heres the outline for a inter-procedural (that is, “global”) version of the tainted formatting strings query we saw in the previous slide deck. The same template will be applicable for most taint tracking problems.
Defining sources
================
The library class ``SecurityOptions`` provides a (configurable) model of what counts as user-controlled data:
.. code-block:: ql
import semmle.code.cpp.security.Security
class TaintedFormatConfig extends TaintTracking::Configuration {
override predicate isSource(DataFlow::Node source) {
exists (SecurityOptions opts |
opts.isUserInput(source.asExpr(), _)
)
}
...
}
.. note::
We first define what it means to be a *source* of tainted data for this particular problem. In this case, what we care about is whether the format string can be provided by an external user to our application or service. As there are many such ways external data could be introduced into the system, the standard QL libraries for C/C++ include an extensible API for modeling user input. In this case, we will simply use the predefined set of *user inputs*, which includes arguments provided to command line applications.
Defining sinks (exercise)
=========================
Use the ``FormattingFunction`` class to fill in the definition of ``isSink``.
.. code-block:: ql
import semmle.code.cpp.security.Security
class TaintedFormatConfig extends TaintTracking::Configuration {
override predicate isSink(DataFlow::Node sink) {
/* Fill me in */
}
...
}
.. note::
The second part is to define what it means to be a sink for this particular problem. The queries from the previous slide deck will be useful for this exercise.
Defining sinks (answer)
=======================
Use the ``FormattingFunction`` class, we can write the sink as:
.. code-block:: ql
import semmle.code.cpp.security.Security
class TaintedFormatConfig extends TaintTracking::Configuration {
override predicate isSink(DataFlow::Node sink) {
exists (FormattingFunction ff, Call c |
c.getTarget() = ff and
c.getArgument(ff.getFormatParameterIndex()) = sink.asExpr()
)
}
...
}
.. note::
When we run this query, we should find a single result. However, it is tricky to determine whether this result is a true positive (a “real” result) because our query only reports the source and the sink, and not the path through the graph between the two.
.. insert path queries slides
.. include:: ../slide-snippets/path-queries.rst
.. resume language-specific global data flow slides
Defining additional taint steps
===============================
Add an additional taint step that (heuristically) taints a local variable if it is a pointer, and it is passed to a function in a parameter position that taints it.
.. code-block:: ql
class TaintedFormatConfig extends TaintTracking::Configuration {
override predicate isAdditionalTaintStep(DataFlow::Node pred,
DataFlow::Node succ) {
exists (Call c, Expr arg, LocalVariable lv |
arg = c.getAnArgument() and
arg = pred.asExpr() and
arg.getFullyConverted().getUnderlyingType() instanceof PointerType and
arg = lv.getAnAccess() and
succ.asUninitialized() = lv
)
}
...
}
Defining sanitizers
===================
Add a sanitizer, stopping propagation at parameters of formatting functions, to avoid double-reporting:
.. code-block:: ql
class TaintedFormatConfig extends TaintTracking::Configuration {
override predicate isSanitizer(DataFlow::Node nd) {
exists (FormattingFunction ff, int idx |
idx = ff.getFormatParameterIndex() and
nd = DataFlow::parameterNode(ff.getParameter(idx))
)
}
...
}
Data flow models
================
- To provide models of data/taint flow through library functions, you can implement subclasses of ``DataFlowFunction`` (from ``semmle.code.cpp.models.interfaces.DataFlow``) and ``TaintFunction`` (from ``semmle.code.cpp.models.interfaces.Taint``), respectively
- Example: model of taint flow from third to first parameter of ``memcpy``
.. code-block:: ql
class MemcpyFunction extends TaintFunction {
MemcpyFunction() { this.hasName("memcpy") }
override predicate hasTaintFlow(FunctionInput i, FunctionOutput o)
i.isInParameter(2) and o.isOutParameterPointer(0)
}
}
.. note::
See the API documentation for more details about ``FunctionInput`` and ``FunctionOutput``.
.. rst-class:: end-slide
Extra slides
============
.. include:: ../slide-snippets/global-data-flow-extra-slides.rst

View File

@@ -0,0 +1,212 @@
================================
Introduction to variant analysis
================================
QL for C/C++
.. container:: semmle-logo
Semmle :sup:`TM`
.. rst-class:: setup
Setup
=====
For this example you should download:
- `QL for Eclipse <https://help.semmle.com/ql-for-eclipse/Content/WebHelp/install-plugin-free.html>`__
- `exiv2 snapshot <http://downloads.lgtm.com/snapshots/cpp/exiv2/Exiv2_exiv2_b090f4d.zip>`__
.. note::
For this example, we will be analyzing `exiv2 <https://github.com/Exiv2/exiv2>`__.
You can also query the project in `the query console <https://lgtm.com/query/project:1506532406873/lang:cpp/>`__ on LGTM.com.
.. insert snapshot-note.rst to explain differences between snapshot available to download and the version available in the query console.
.. include:: ../slide-snippets/snapshot-note.rst
.. resume slides
.. Include language-agnostic section here
.. include:: ../slide-snippets/intro-ql-general.rst
Oops
====
.. code-block:: cpp
:emphasize-lines: 3
int write(int buf[], int size, int loc, int val) {
if (loc >= size) {
// return -1;
}
buf[loc] = val;
return 0;
}
- The return statement has been commented out (during debugging?)
- The if statement is now dead code
- No bounds checking!
.. note::
Heres a simple (artificial) bug, which well develop a QL query to catch.
This function writes a value to a given location in an array, first trying to do a bounds check to validate that the location is within bounds. However, the return statement has been commented out, leaving a redundant if statement and no bounds checking.
This case can act as our “patient zero” in the variant analysis game.
A simple QL query
=================
.. literalinclude:: ../query-examples/cpp/empty-if-cpp.ql
:language: ql
.. note::
We are going to write a simple query which finds “if statements” with empty “then” blocks, so we can highlight the results like those on the previous slide. The query can be run in the `query console on LGTM <https://lgtm.com/query>`__, or in your `IDE <https://lgtm.com/help/lgtm/running-queries-ide>`__.
A `QL query <https://help.semmle.com/QL/ql-handbook/queries.html>`__ consists of a “select” clause that indicates what results should be returned. Typically it will also provide a “from” clause to declare some variables, and a “where” clause to state conditions over those variables. For more information on the structure of query files (including links to useful topics in the `QL language handbook <https://help.semmle.com/QL/ql-handbook/index.html>`__), see `Introduction to query files <https://help.semmle.com/QL/learn-ql/ql/writing-queries/introduction-to-queries.html>`__.
In our example here, the first line of the query imports the `C/C++ standard QL library <https://help.semmle.com/qldoc/cpp/>`__, which defines concepts like ``IfStmt`` and ``Block``.
The query proper starts by declaring two variablesifStmt and block. These variables represent sets of values in the database, according to the type of each of the variables. For example, ifStmt has the type IfStmt, which means it represents the set of all if statements in the program.
If we simply selected these two variables::
from IfStmt ifStmt, Block block
select ifStmt, block
We would get a result row for every combination of blocks and if statements in the program. This is known as a cross-product, because there is no logical condition linking the two variables. We can use the where clause to specify the condition that we are only interested in rows where the “block” is the “then” part of the if statement. We do this by specifying::
block = ifStmt.getThen()
This states that the block is equal to (not assigned!) the “then” part of the ``ifStmt``. ``getThen()`` is an operation which is available on any IfStmt. One way to interpret this is as a filtering operation starting with every pair of block and if statements, check each one to whether the logical condition holds, and only keep the row if that is the case.
We can add a second condition that specifies the block must be empty::
and block.isEmpty()
The ``isEmpty()`` operation is available on any Block, and is only true if the “block” has no children.
Finally, we select a location, at which to report the problem, and a message, to explain what the problem is.
Structure of a QL query
=======================
A **query file** has the extension ``.ql`` and contains a **query clause**, and optionally **predicates**, **classes**, and **modules**.
A **query library** has the extension ``.qll`` and does not contain a query clause, but may contain modules, classes, and predicates.
Each query library also implicitly defines a module.
**Import** statements allow the classes and predicates defined in one module to be used in another.
.. note::
QL queries are always contained in query files with the file extension ``.ql``. `Quick queries <https://help.semmle.com/ql-for-eclipse/Content/WebHelp/quick-query.html>`__, run in `QL for Eclipse <https://help.semmle.com/ql-for-eclipse/Content/WebHelp/home-page.html>`__, are no exception: the quick query window maintains a temporary QL file in the background.
Parts of queries can be lifted into `QL library files <https://help.semmle.com/QL/ql-handbook/modules.html#library-modules>`__ with the extension ``qll``. Definitions within such libraries can be brought into scope using ``import`` statements, and similarly QLL files can import each others definitions using “import” statements.
Logic can be encapsulated as user-defined `predicates <https://help.semmle.com/QL/ql-handbook/predicates.html>`__ and `classes <https://help.semmle.com/QL/ql-handbook/types.html#classes>`__, and organized into `modules <https://help.semmle.com/QL/ql-handbook/modules.html>`__. Each QLL file implicitly defines a module, but QL and QLL files can also contain explicit module definitions, as we will see later.
Predicates in QL
================
A predicate allows you to pull out and name parts of a query.
.. container:: column-left
.. literalinclude:: ../query-examples/cpp/empty-if-cpp.ql
:language: ql
:emphasize-lines: 6
.. container:: column-right
.. literalinclude:: ../query-examples/cpp/empty-if-cpp-predicate.ql
:language: ql
:emphasize-lines: 3-5
.. note::
A QL predicate takes zero or more parameters, and its body is a condition on those parameters. The predicate may (or may not) hold. Predicates may also be recursive, simply by referring to themselves (directly or indirectly).
You can imagine a predicate to be a self-contained from-where-select statement, that produces an intermediate relation, or table. In this case, the ``isEmpty`` predicate will be the set of all blocks which are empty.
Classes in QL
=============
A QL class allows you to name a set of values and define (member) predicates on them.
A class has at least one supertype and optionally a **characteristic predicate**; it contains the values that belong to *all* supertypes *and* satisfy the characteristic predicate, if provided.
Member predicates are inherited and can be overridden.
.. code-block:: ql
class EmptyBlock extends Block {
EmptyBlock() {
this.getNumStmt() = 0
}
}
.. note::
`Classes <https://help.semmle.com/QL/ql-handbook/types.html#classes>`__ model sets of values from the database. A class has one or more supertypes, and inherits `member predicates <https://help.semmle.com/QL/ql-handbook/types.html#member-predicates>`__ (methods) from each of them. Each value in a class must be in every supertype, but additional conditions can be stated in a so-called **characteristic predicate**, which looks a bit like a zero-argument constructor.
In the example, declaring a variable “EmptyBlock e” will allow it to range over only those blocks that have zero statements.
Classes in QL continued
=======================
.. container:: column-left
.. literalinclude:: ../query-examples/cpp/empty-if-cpp-predicate.ql
:language: ql
:emphasize-lines: 3-5
.. container:: column-right
.. literalinclude:: ../query-examples/cpp/empty-if-cpp-class.ql
:language: ql
:emphasize-lines: 3-7
.. note::
As shown in this example, classes behave much like unary predicates, but with ``instanceof`` instead of predicate calls to check membership. Later on, we will see how to define member predicates on classes.
Iterative query refinement
==========================
- **Common workflow**: Start with a simple query, inspect a few results, refine, repeat.
- For example, empty then branches are not a problem if there is an else.
- **Exercise**: How can we refine the query to take this into account?
**Hints**:
- Use member predicate ``IfStmt.getElse()``
- Use ``not exists(...)``
.. note::
QL makes it very easy to experiment with analysis ideas. A common workflow is to start with a simple query (like our “redundant if-statement” example), examine a few results, refine the query based on any patterns that emerge and repeat.
As an exercise, refine the redundant-if query based on the observation that if the if-statement has an “else” clause, then even if the body of the “then” clause is empty, its not actually redundant.
Model answer: redundant if-statement
=====================================
.. literalinclude:: ../query-examples/cpp/empty-if-cpp-model.ql
.. note::
You can explore the results generated when this query is run on exiv2 in LGTM `here <https://lgtm.com/query/4641433299746527262/>`__.

View File

@@ -0,0 +1,124 @@
======================
Program representation
======================
QL for C/C++
.. container:: semmle-logo
Semmle :sup:`TM`
.. rst-class:: agenda
Agenda
======
- Abstract syntax trees
- Database representation
- Symbol tables
- Variables
- Functions
.. insert abstract-syntax-tree.rst
.. include:: ../slide-snippets/abstract-syntax-tree.rst
.. resume slides
AST QL classes
==============
Important AST classes include:
- ``Expr``: expressions such as assignments, variable references, function calls, ...
- ``Stmt``: statements such as conditionals, loops, try statements, ...
- ``DeclarationEntry``: places where functions, variables or types are declared and/or defined
These three (and all other AST classes) are subclasses of ``Element``.
Symbol table
============
The database also includes information about the symbol table associated with a program:
- ``Variable``: all variables, including local variables, global variables, static variables and member variables
- ``Function``: all functions, including member function
- ``Type``: built-in and user-defined types
Working with variables
======================
``Variable`` represents program variables, including locally scoped variables (``LocalScopeVariable``), global variables (``GlobalVariable``), and others:
- ``string Variable.getName()``
- ``Type Variable.getType()``
``Access`` represents references to declared entities such as functions (``FunctionAccess``) and variables (``VariableAccess``), including fields (``FieldAccess``).
- ``Declaration Access.getTarget()``
``VariableDeclarationEntry`` represents declarations or definitions of a variable.
- ``Variable VariableDeclarationEntry.getVariable()``
Working with functions
======================
Functions are represented by the Function QL class. Each declaration or definition of a function is represented by a ``FunctionDeclarationEntry``.
Calls to functions are modeled by QL class Call and its subclasses:
- ``Call.getTarget()`` gets the declared target of the call; undefined for calls through function pointers
- ``Function.getACallToThisFunction()`` gets a call to this function
Typically, functions are identified by name:
- ``string Function.getName()``
- ``string Function.getQualifiedName()``
Working with preprocessor logic
===============================
Macros and other preprocessor directives can easily cause confusion when analyzing programs:
- AST structure reflects the program *after* preprocessing.
- Locations refer to the original source text *before* preprocessing.
For example, in:
.. code-block:: cpp
#define square(x) x*x
y = square(y0), z = square(z0)
there are no AST nodes corresponding to ``square(y0)`` or ``square(z0)``, but there are AST nodes corresponding to ``y0*y0`` and ``z0*z0``.
.. note::
The C preprocessor poses a dilemma: un-preprocessed code cannot, in general, be parsed and analyzed meaningfully, but showing results in preprocessed code is not useful to developers. Our solution is to base the AST representation on preprocessed source (in the same way as compilers do), but associate AST nodes with locations in the original source text.
Working with macros
===================
.. code-block:: cpp
#define square(x) x*x
y = square(y0), z = square(z0)
is represented in the snapshot database as:
- A Macro entity representing the text of the *head* and *body* of the macro
- Assignment nodes, representing the two assignments after preprocessing
- Left-hand sides are ``VariableAccess`` nodes of y and z
- Right-hand sides are ``MulExpr`` nodes representing ``y0*y0`` and ``z0*z0``
- A ``MacroAccess`` entity, which associates the Macro with the ``MulExprs``
Useful predicates on ``Element``: ``isInMacroExpansion()``, ``isAffectedByMacro()``
.. note::
The snapshot also contains information about macro definitions, which are represented by class ``Macro``. These macro definitions are related to the AST nodes resulting from their uses by the class ``MacroAccess``.

View File

@@ -0,0 +1,105 @@
===============================
Exercise: ``snprintf`` overflow
===============================
QL for C/C++
.. container:: semmle-logo
Semmle :sup:`TM`
.. rst-class:: setup
Setup
=====
For this example you should download:
- `QL for Eclipse <https://help.semmle.com/ql-for-eclipse/Content/WebHelp/install-plugin-free.html>`__
- `rsyslog snapshot <https://downloads.lgtm.com/snapshots/cpp/rsyslog/rsyslog/rsyslog-all-revision-2018-April-27--14-12-31.zip>`__
.. note::
For this example, we will be analyzing `rsyslog <https://github.com/rsyslog/rsyslog>`__.
You can also query the project in `the query console <https://lgtm.com/query/project:1506087977050/lang:cpp/>`__ on LGTM.com.
.. insert snapshot-note.rst to explain differences between snapshot available to download and the version available in the query console.
.. include:: ../slide-snippets/snapshot-note.rst
.. resume slides
``snprintf``
============
.. rst-class:: build
- ``printf``: Returns number of characters printed.
.. code-block:: cpp
printf("Hello %s!", name)
- ``sprintf``: Returns number of characters written to ``buf``.
.. code-block:: cpp
sprintf(buf, "Hello %s!", name)
- ``snprintf``: Returns number of characters it **would have written** to ``buf`` had ``n`` been sufficiently large, **not** the number of characters actually written.
.. code-block:: cpp
snprintf(buf, n, "Hello %s!", name)
- In pre-C99 versions of glibc ``snprintf`` would return -1 if ``n`` was too small!
RCE in rsyslog
==============
- Vulnerable code looked similar to this (`original <https://github.com/rsyslog/librelp/blob/532aa362f0f7a8d037505b0a27a1df452f9bac9e/src/tcp.c#L1195-L1211>`__):
.. code-block:: cpp
char buf[1024];
int pos = 0;
for (int i = 0; i < n; i++) {
pos += snprintf(buf + pos, sizeof(buf) - pos, "%s", strs[i]);
}
- Disclosed as `CVE-2018-1000140 <https://nvd.nist.gov/vuln/detail/CVE-2018-1000140>`__.
- Blog post: `https://blog.semmle.com/librelp-buffer-overflow-cve-2018-1000140/ <https://blog.semmle.com/librelp-buffer-overflow-cve-2018-1000140/>`__.
Finding the RCE yourself
========================
#. Write a query to find calls to ``snprintf``
**Hint**: Use class ``FunctionCall``
#. Restrict to calls whose result is used
**Hint**: Use class ``ExprInVoidContext``
#. Restrict to calls where the format string contains “%s”
**Hint**: Use predicates ``Expr.getValue`` and ``string.regexpMatch``
#. Restrict to calls where the result flows back to the size argument
**Hint**: Import library ``semmle.code.cpp.dataflow.TaintTracking`` and use predicate ``TaintTracking::localTaint``
Model answer
============
.. literalinclude:: ../query-examples/cpp/snprintf-1.ql
:language: ql
.. rst-class:: build
- More full-featured version: `https://lgtm.com/rules/1505913226124 <https://lgtm.com/rules/1505913226124>`__.
.. note::
The regular expression for matching the format string uses the “(?s)” directive to ensure that “.” also matches any newline characters embedded in the string.