docs: 2nd round of suggestions

This commit is contained in:
james
2019-08-16 09:49:19 +01:00
parent feb4d26de8
commit a35241e4cd
9 changed files with 87 additions and 47 deletions

View File

@@ -43,7 +43,7 @@ Where might this go wrong?
- In C/C++ we often need to check for whether an operation `overflows <https://en.wikipedia.org/wiki/Integer_overflow>`__.
- An overflow is when an arithmetic operation, such as an addition, results in a number which is too large to be stored in the type.
- When an operation overflows, the value “wraps” around.
- A typical way to check for overflow of an addition, therefore, is whether the result is less than one of the arguments - that is the result has **wrapped**.
- A typical way to check for overflow of an addition, therefore, is whether the result is less than one of the argumentsthat is the result has **wrapped**.
Integer promotion
=================
@@ -139,13 +139,13 @@ Lets look for overflow guards of the form ``v + b < v``, using the classes
- When performing `variant analysis <https://semmle.com/variant-analysis>`__, it is usually helpful to write a simple query that finds the simple syntactic pattern, before trying to go on to describe the cases where it goes wrong.
- In this case, we start by looking for all the *overflow* checks, before trying to refine the query to find all *bad overflow* checks.
- The select clause defines what this query is looking for:
- The ``select`` clause defines what this query is looking for:
- an ``AddExpr``: the expression that is being checked for overflow.
- a ``RelationalOperation``: the overflow comparison check.
- a ``Variable``: used as an argument to both the addition and comparison.
- The where part of the query ties these three QL variables together using `predicates <https://help.semmle.com/QL/ql-handbook/predicates.html>`__ defined in the `standard QL for C/C++ library <https://help.semmle.com/qldoc/cpp/>`__.
- The ``where`` part of the query ties these three QL variables together using `predicates <https://help.semmle.com/QL/ql-handbook/predicates.html>`__ defined in the `standard QL for C/C++ library <https://help.semmle.com/qldoc/cpp/>`__.
QL query: bad overflow guards
=============================
@@ -197,14 +197,14 @@ Now our query becomes:
- However, this is a little bit repetitive. What we really want to say is that: all the operands of the addition are small. Fortunately, QL provides a ``forall`` formula that we can use in these circumstances.
- A ``forall`` has three parts:
- A declaration part, where we can introduce variables.
- A declaration part, where we can introduce variables.
- A “range” part, which allows us to restrict those variables.
- A “condition” part. The ``forall`` as a whole holds if the condition holds for each of the values in the range.
- In our case:
- The declaration introduces a variable for expressions, called ``op``. At this stage, this variable represents all the expressions in the program.
- The “range” part, ``op = a.getAnOperand()``, restricts ``op`` to being one of the two operands to the addition.
- The “condition” part, ``isSmall(op)``, says that the ``forall`` holds only if the condition - that the ``op`` is small - holds for everything in the range - that is both the arguments to the addition.
- The “condition” part, ``isSmall(op)``, says that the ``forall`` holds only if the condition (that the ``op`` is small) holds for everything in the rangethat is, both the arguments to the addition.
QL query: bad overflow guards
=============================

View File

@@ -79,7 +79,7 @@ Control flow graphs
Modeling control flow
=====================
The control flow is modelled with a QL class, ``ControlFlowNode``. Examples of control flow nodes include statements and expressions.
The control flow is modeled with a QL class, ``ControlFlowNode``. Examples of control flow nodes include statements and expressions.
``ControlFlowNode`` provides API for traversing the control flow graph:
@@ -88,7 +88,7 @@ The control flow is modelled with a QL class, ``ControlFlowNode``. Examples of c
- ``ControlFlowNode ControlFlowNode.getATrueSuccessor()``
- ``ControlFlowNode ControlFlowNode.getAFalseSuccessor()``
The control-flow graph is *intra-procedural* - in other words, only models paths within a function. To find the associated function, use
The control-flow graph is *intra-procedural*in other words, only models paths within a function. To find the associated function, use
- ``Function ControlFlowNode.getControlFlowScope()``
@@ -101,7 +101,7 @@ The control-flow graph is *intra-procedural* - in other words, only models paths
Example: malloc/free pairs
==========================
Find calls to free that are reachable from an allocation on the same variable:
Find calls to ``free`` that are reachable from an allocation on the same variable:
.. literalinclude:: ../query-examples/cpp/control-flow-cpp-1.ql
:language: ql
@@ -127,7 +127,7 @@ Based on this query, write a query that finds accesses to the variable that occu
Utilizing recursion
===================
The main problem we observed in the previous exercise was that the successors relation is unaware of changes to the variable that would invalidate our results.
The main problem we observed in the previous exercise was that the successor's relation is unaware of changes to the variable that would invalidate our results.
We can fix this by writing our own successor predicate that stops traversing the CFG if the variable is re-defined.
@@ -200,6 +200,21 @@ Write a query to find unreachable basic blocks.
This query has a good number of false positives on Chakra, many of them to do with templating and macros.
Guard conditions
================
A ``GuardCondition`` is a ``Boolean`` condition that controls one or more basic blocks in the sense that it is known to be true/false at the entry of those blocks.
- ``GuardCondition.controls(BasicBlock bb, boolean outcome):`` the entry of bb can only be reached if the guard evaluates to outcome
- ``GuardCondition.comparesLt, GuardCondition.ensuresLt, GuardCondition.comparesEq:`` auxiliary predicates to identify conditions that guarantee that one expression is less than/equal to another
Further materials
=================
- QL for C/C++: https://help.semmle.com/QL/learn-ql/ql/cpp/ql-for-cpp.html
- API reference: https://help.semmle.com/qldoc/cpp
.. rst-class:: end-slide
Extra slides
@@ -274,7 +289,7 @@ Create a subclass of ``ExprCall`` that uses your query to implement ``getTarget`
}
}
Control flow graph customizations
Control-flow graph customizations
=================================
The default control-flow graph implementation recognizes a few common patterns for non-returning functions, but sometimes it fails to spot them, which can cause imprecision.

View File

@@ -24,8 +24,8 @@ Alternatively, you can query the project in `the query console <https://lgtm.com
Note that results generated in the query console are likely to differ to those generated in the QL plugin as LGTM.com analyzes the most recent revisions of each project that has been addedthe snapshot available to download above is based on an historical version of the code base.
Agenda
======
Overview
========
- Non-constant format string
- Data flow
@@ -108,7 +108,7 @@ We need something better.
Here, ``DMLOut`` and ``ExtOut`` are macros that expand to formatting calls. The format specifier is not constant, in the sense that the format argument is not a string literal. However, it is clearly one of two possible constants, both with the same number of format specifiers.
What we need is a way to determine whether the format argument is ever set to something that is, not constant.
What we need is a way to determine whether the format argument is ever set to something that is not constant.
Data flow analysis
==================
@@ -127,7 +127,7 @@ Data flow analysis
The solution here is to use *data flow*. Data flow is, as the name suggests, about tracking the flow of data through the program. It helps answers questions like: *does this expression ever hold a value that originates from a particular other place in the program*?
We can visualize the data flow problem as one of finding paths through a directed graph, where the nodes of the graph are elements in program, and the edges represent the flow of data between those elements. If a path exists, then the data flows between those two edges.
We can visualize the data flow problem as one of finding paths through a directed graph, where the nodes of the graph are elements in the program, and the edges represent the flow of data between those elements. If a path exists, then the data flows between those two edges.
Data flow graphs
================
@@ -246,19 +246,22 @@ Data flow graph
The data flow graph is separate from the `AST <https://en.wikipedia.org/wiki/Abstract_syntax_tree>`__, to allow for flexibility in how data flow is modeled. There are a small number of data flow node typesexpression nodes, parameter nodes, uninitialized variable nodes, and definition by reference nodes. Each node provides mapping functions to and from the relevant AST (for example ``Expr``, ``Parameter`` etc.) or symbol table (for example ``Variable``) classes.
Taint-tracking
Taint tracking
==============
- Usually, we want to generalise slightly by not only considering plain data flow, but also “taint” propagation, that is, whether a value is influenced by or derived from another.
- Examples:
.. code-block:: cpp
.. code-block:: cpp
sink = source; // source -> sink: data and taint
strcat(sink, source); // source -> sink: taint, not data
sink = source; // source -> sink: data and taint
strcat(sink, source); // source -> sink: taint, not data
- Library ``semmle.code.cpp.dataflow.TaintTracking`` provides predicates for tracking taint:
- Library ``semmle.code.cpp.dataflow.TaintTracking`` provides predicates for tracking taint; ``TaintTracking::localTaintStep`` represents one (local) taint step, ``TaintTracking::localTaint`` is its transitive closure.
- ``TaintTracking::localTaintStep`` represents one (local) taint step
- ``TaintTracking::localTaint`` is its transitive closure.
.. note::
@@ -304,8 +307,20 @@ Refine the query to find calls to ``printf``-like functions where the format arg
.. rst-class:: build
.. literalinclude:: ../query-examples/cpp/data-flow-cpp-2.ql
:language: ql
.. code-block:: ql
import cpp
import semmle.code.cpp.dataflow.DataFlow
import semmle.code.cpp.commons.Printf
class SourceNode extends DataFlow::Node { ... }
from FormattingFunction f, Call c, SourceNode src, DataFlow::Node arg
where c.getTarget() = f and
arg.asExpr() = c.getArgument(f.getFormatParameterIndex()) and
DataFlow::localFlow(src, arg) and
not src.asExpr() instanceof StringLiteral
select arg, "Non-constant format string."
Refinements (take home exercise)
================================
@@ -325,4 +340,4 @@ Beyond local data flow
- Results are still underwhelming.
- Dealing with parameter passing becomes cumbersome.
- Instead, lets turn the problem around and find user-controlled data that flows into a ``printf`` format argument, potentially through calls.
- This needs **global data flow**.
- This needs :doc:`global data flow <global-data-flow-cpp>`.

View File

@@ -53,7 +53,7 @@ Global data flow and taint tracking
- Local (“intra-procedural”) data flow models flow within one function; feasible to compute for all functions in a snapshot
- Global (“inter-procedural”) data flow models flow across function calls; not feasible to compute for all functions in a snapshot
- For global data flow (and taint tracking), we must therefore provided restrictions to ensure the problem is tractable.
- For global data flow (and taint tracking), we must therefore provide restrictions to ensure the problem is tractable.
- Typically, this involves specifying the *source* and *sink*.
.. note::
@@ -113,7 +113,7 @@ The library class ``SecurityOptions`` provides a (configurable) model of what co
.. note::
We first define what it means to be a *source* of tainted data for this particular problem. In this case, what we care about is whether the format string can be provided by an external user to our application or service. As there are many such ways external data could be introduced into the system, the standard QL libraries for C/C++ include an extensible API for modelling user input. In this case, we will simply use the predefined set of *user inputs*, which includes arguments provided to command line applications.
We first define what it means to be a *source* of tainted data for this particular problem. In this case, what we care about is whether the format string can be provided by an external user to our application or service. As there are many such ways external data could be introduced into the system, the standard QL libraries for C/C++ include an extensible API for modeling user input. In this case, we will simply use the predefined set of *user inputs*, which includes arguments provided to command line applications.
Defining sinks (exercise)
@@ -157,7 +157,7 @@ Use the ``FormattingFunction`` class to fill in the definition of “isSink”
.. note::
When we run this query, we should find a single result. However, it is tricky to determine whether this result is a true positive - a “real” result - because our query only reports the source and the sink, and not the path through the graph between the two.
When we run this query, we should find a single result. However, it is tricky to determine whether this result is a true positive (a “real” result) because our query only reports the source and the sink, and not the path through the graph between the two.
Path queries
============
@@ -182,7 +182,7 @@ Use this template:
.. note::
To see the paths between the source and the sinks, we can convert the query to a path problem query. There are a few minor changes that need to be made for this to work - we need an additional import, to specify ``PathNode`` rather than ``Node``, and to add the source/sink to the query output (so that we can automatically determine the paths).
To see the paths between the source and the sinks, we can convert the query to a path problem query. There are a few minor changes that need to be made for this to workwe need an additional import, to specify ``PathNode`` rather than ``Node``, and to add the source/sink to the query output (so that we can automatically determine the paths).
Defining additional taint steps
===============================
@@ -250,7 +250,7 @@ Extra slides
Exercise: How not to do global data flow
========================================
Implement a flowStep predicate extending localFlowStep with steps through function calls and returns. Why might we not want to use this?
Implement a ``flowStep`` predicate extending ``localFlowStep`` with steps through function calls and returns. Why might we not want to use this?
.. code-block:: ql

View File

@@ -60,14 +60,14 @@ The basic representation of an analyzed program is an *abstract syntax tree (AST
.. note::
When writing queries in QL it is important to have in mind the underlying representation of the program which is stored in the database. Typically queries make use of the “AST” representation of the program - a tree structure where program elements are nested within other program elements.
When writing queries in QL it is important to have in mind the underlying representation of the program which is stored in the database. Typically queries make use of the “AST” representation of the programa tree structure where program elements are nested within other program elements.
The “Introducing the C/C++ libraries” help topic contains a more complete overview of important AST classes and the rest of the C++ QL libraries: https://help.semmle.com/QL/learn-ql/ql/cpp/introduce-libraries-cpp.html
Database representations of ASTs
================================
AST nodes and other program elements are encoded in the database as *entity values*. Entities are implemented as integers, but in QL they are opaque - all one can do with them is to check their equality.
AST nodes and other program elements are encoded in the database as *entity values*. Entities are implemented as integers, but in QL they are opaqueall one can do with them is to check their equality.
Each entity belongs to an entity type. Entity types have names starting with “@” and are defined in the database schema (not in QL).
@@ -136,7 +136,7 @@ Working with functions
Functions are represented by the Function QL class. Each declaration or definition of a function is represented by a ``FunctionDeclarationEntry``.
Calls to functions are modelled by QL class Call and its subclasses:
Calls to functions are modeled by QL class Call and its subclasses:
- ``Call.getTarget()`` gets the declared target of the call; undefined for calls through function pointers
- ``Function.getACallToThisFunction()`` gets a call to this function

View File

@@ -52,13 +52,13 @@ RCE in rsyslog
- Vulnerable code looked similar to this (`original <https://github.com/rsyslog/librelp/blob/532aa362f0f7a8d037505b0a27a1df452f9bac9e/src/tcp.c#L1195-L1211>`__):
.. code-block:: cpp
char buf[1024];
int pos = 0;
for (int i = 0; i < n; i++) {
pos += snprintf(buf + pos, sizeof(buf) - pos, "%s", strs[i]);
}
.. code-block:: cpp
char buf[1024];
int pos = 0;
for (int i = 0; i < n; i++) {
pos += snprintf(buf + pos, sizeof(buf) - pos, "%s", strs[i]);
}
- Disclosed as `CVE-2018-1000140 <https://nvd.nist.gov/vuln/detail/CVE-2018-1000140>`__.
- Blog post: `https://blog.semmle.com/librelp-buffer-overflow-cve-2018-1000140/ <https://blog.semmle.com/librelp-buffer-overflow-cve-2018-1000140/>`__.

View File

@@ -4,4 +4,4 @@ from IfStmt ifstmt, Block block
where
block = ifstmt.getThen() and
block.getNumStmt() = 0
select ifstmt
select ifstmt, "This if-statement is redundant."

View File

@@ -15,12 +15,19 @@ The QL queries included in the latest Semmle release are open source. View them
**Extra information**
- Pressing ``p`` toggles extra notes (if they're on the current slide)
- Pressing ``f`` toggles full screen viewing
- Pressing ``o`` toggles overview mode
.. |arrow-l| unicode:: U+2190
.. |arrow-r| unicode:: U+2192
- Press |arrow-l| and |arrow-r| to navigate between slides
- Pressing **p** toggles between the slide and any extra notes (where they're available)
- Pressing **f** toggles full screen viewing on/off
.. note::
To run the queries featured in this training presentation, we recommend you download the free-to-use `QL for Eclipse plugin <https://help.semmle.com/ql-for-eclipse/Content/WebHelp/getting-started.html>`__.
This plugin allows you to locally access the latest features of QL, including the standard QL libraries and queries. It also provides standard IDE features such as syntax highlighting, jump-to-definition, and tab completion.
When you have setup QL for Eclipse we recommend increasing the “Memory for running queries” from the default setting of 4096MB to 8192MB, to ensure that all the queries complete quickly.

View File

@@ -103,15 +103,18 @@ Find all instances!
Analysis overview
=================
.. rst-class:: build
- The database schema is (source) language specific, as are queries and libraries.
- Multi-language code bases are analyzed one language at a time.
.. container:: image-box
.. image:: ../_static-training/analysis-overview.png
.. rst-class:: build
- The database schema is (source) language specific, as are queries and libraries.
- Multi-language code bases are analyzed one language at a time.
.. note::