mirror of
https://github.com/github/codeql.git
synced 2026-04-30 19:26:02 +02:00
docs: fix some errors picked up by vale linter
This commit is contained in:
@@ -29,7 +29,7 @@ More resources:
|
||||
|
||||
Alternatively, you can query any project (including ChakraCore) in the `query console on LGTM.com <https://lgtm.com/query/project:2034240708/lang:cpp/>`__.
|
||||
|
||||
Note that results generated in the query console are likely to differ to those generated in the QL plugin as LGTM.com analyzes the most recent revisions of each project that has been added–the snapshot available to download above is based on an historical version of the code base.
|
||||
Note that results generated in the query console are likely to differ to those generated in the QL plugin. LGTM.com analyzes the most recent revisions of each project that has been added–the snapshot available to download above is based on an historical version of the code base.
|
||||
|
||||
|
||||
Checking for overflow in C
|
||||
@@ -53,7 +53,7 @@ Where might this go wrong?
|
||||
- In C/C++ we often need to check for whether an operation `overflows <https://en.wikipedia.org/wiki/Integer_overflow>`__.
|
||||
- An overflow is when an arithmetic operation, such as an addition, results in a number which is too large to be stored in the type.
|
||||
- When an operation overflows, the value “wraps” around.
|
||||
- A typical way to check for overflow of an addition, therefore, is whether the result is less than one of the arguments - i.e. the result has “wrapped”.
|
||||
- A typical way to check for overflow of an addition, therefore, is whether the result is less than one of the arguments - that is the result has **wrapped**.
|
||||
|
||||
Integer promotion
|
||||
=================
|
||||
@@ -174,7 +174,7 @@ We can get the size (in bytes) of a type using the ``getSize()`` method.
|
||||
|
||||
- An important part of the query is to determine whether a given expression has a “small” type that is going to trigger integer promotion.
|
||||
- We therefore write a helper predicate for small expressions.
|
||||
- This predicate effectively represents the set of all expressions in the database where the size of the type of the expression is less than 4 bytes, i.e. less than 32 bits.
|
||||
- This predicate effectively represents the set of all expressions in the database where the size of the type of the expression is less than 4 bytes, that is less than 32 bits.
|
||||
|
||||
QL query: bad overflow guards
|
||||
=============================
|
||||
@@ -191,7 +191,7 @@ Now our query becomes:
|
||||
.. note::
|
||||
|
||||
- Recall from earlier that what makes an overflow check a “bad” check is that all the arguments to the addition are integers smaller than 32 bits.
|
||||
- We could write this by using our helper predicate ``isSmall`` to specify that each individual operand to the addition ``isSmall`` (i.e. under 32 bits):
|
||||
- We could write this by using our helper predicate ``isSmall`` to specify that each individual operand to the addition ``isSmall`` (that is under 32 bits):
|
||||
|
||||
.. code-block:: ql
|
||||
|
||||
@@ -206,12 +206,12 @@ Now our query becomes:
|
||||
- In our case:
|
||||
- The declaration introduces a variable for Expressions, called ``op``. At this stage, this variable represents all the expressions in the program.
|
||||
- The “range” part, ``op = a.getAnOperand()``, restricts ``op`` to being one of the two operands to the addition.
|
||||
- The “condition” part, ``isSmall(op)``, says that the ``forall`` holds only if the condition - that the ``op`` is small - holds for everything in the range - i.e. both the arguments to the addition
|
||||
- The “condition” part, ``isSmall(op)``, says that the ``forall`` holds only if the condition - that the ``op`` is small - holds for everything in the range - that is both the arguments to the addition
|
||||
|
||||
QL query: bad overflow guards
|
||||
=============================
|
||||
|
||||
In some cases the result of the addition is cast to a small type of size less than 4 bytes, preventing automatic widening. We don’t want our query to flag these instances.
|
||||
Sometimes the result of the addition is cast to a small type of size less than 4 bytes, preventing automatic widening. We don’t want our query to flag these instances.
|
||||
|
||||
We can use predicate ``Expr.getExplicitlyConverted()`` to reason about casts that are applied to an expression, adding this restriction to our query:
|
||||
|
||||
|
||||
@@ -28,7 +28,7 @@ More resources:
|
||||
|
||||
Alternatively, you can query any project (including ChakraCore) in the `query console on LGTM.com <https://lgtm.com/query/project:2034240708/lang:cpp/>`__.
|
||||
|
||||
Note that results generated in the query console are likely to differ to those generated in the QL plugin as LGTM.com analyzes the most recent revisions of each project that has been added–the snapshot available to download above is based on an historical version of the code base.
|
||||
Note that results generated in the query console are likely to differ to those generated in the QL plugin. LGTM.com analyzes the most recent revisions of each project that has been added–the snapshot available to download above is based on an historical version of the code base.
|
||||
|
||||
Agenda
|
||||
======
|
||||
@@ -79,7 +79,10 @@ Control flow graphs
|
||||
|
||||
.. note::
|
||||
|
||||
The control flow graph is a static over-approximation of possible control flow at runtime. Its nodes are program elements such as expressions and statements. If there is an edge from one node to another, then it means that the semantic operation corresponding to the first node may be immediately followed by the operation corresponding to the second node. Some nodes (such as conditions of “if” statements or loop conditions) have more than one successor, representing conditional control flow at runtime.
|
||||
The control flow graph is a static over-approximation of possible control flow at runtime.
|
||||
Its nodes are program elements such as expressions and statements.
|
||||
If there is an edge from one node to another, then it means that the semantic operation corresponding to the first node may be immediately followed by the operation corresponding to the second node.
|
||||
Some nodes (such as conditions of “if” statements or loop conditions) have more than one successor, representing conditional control flow at runtime.
|
||||
|
||||
Modeling control flow
|
||||
=====================
|
||||
@@ -101,7 +104,7 @@ The control-flow graph is *intra-procedural* - in other words, only models paths
|
||||
|
||||
The control flow graph is similar in concept to data flow graphs. In contrast to data flow, however, the AST nodes are directly control flow graph nodes.
|
||||
|
||||
The predecessor/successor predicates are prime examples of member predicates with results that are used in functional syntax, but that are not actually functions, since a control flow node may have any number of predecessors and successors (including zero or more than one).
|
||||
The predecessor/successor predicates are prime examples of member predicates with results that are used in functional syntax, but that are not actually functions. This is because a control flow node may have any number of predecessors and successors (including zero or more than one).
|
||||
|
||||
Example: malloc/free pairs
|
||||
==========================
|
||||
|
||||
@@ -44,7 +44,7 @@ Agenda
|
||||
Motivation
|
||||
==========
|
||||
|
||||
Let’s write a query to identify instances of `CWE-134 <https://cwe.mitre.org/data/definitions/134.html>`__ “Use of externally controlled format string”.
|
||||
Let’s write a query to identify instances of `CWE-134 <https://cwe.mitre.org/data/definitions/134.html>`__ **Use of externally controlled format string**.
|
||||
|
||||
.. code-block:: cpp
|
||||
|
||||
@@ -60,7 +60,7 @@ Let’s write a query to identify instances of `CWE-134 <https://cwe.mitre.org/d
|
||||
|
||||
printf("Name: %s, Age: %d", "Freddie", 2);
|
||||
|
||||
would produce the output “Name: Freddie, Age: 2”. So far, so good. However, problems arise if there is a mismatch between the number of formatting specifiers, and the number of arguments. For example:
|
||||
would produce the output ``"Name: Freddie, Age: 2”``. So far, so good. However, problems arise if there is a mismatch between the number of formatting specifiers, and the number of arguments. For example:
|
||||
|
||||
.. code-block:: cpp
|
||||
|
||||
@@ -123,14 +123,14 @@ Data flow analysis
|
||||
|
||||
- Models flow of data through the program.
|
||||
- Implemented in the module ``semmle.code.cpp.dataflow.DataFlow``.
|
||||
- Class ``DataFlow::Node`` represents program elements that have a value, such as expressions and fucntion parameters.
|
||||
- Class ``DataFlow::Node`` represents program elements that have a value, such as expressions and function parameters.
|
||||
- Nodes of the data flow graph.
|
||||
- Various predicated represent flow between these nodes.
|
||||
Edges of the data flow graph.
|
||||
|
||||
.. note::
|
||||
|
||||
The solution here is to use *data flow*. Data flow is, as the name suggests, about tracking the flow of data through the program. It helps answers questions like “does this expression ever hold a value that originates from a particular other place in the program”.
|
||||
The solution here is to use *data flow*. Data flow is, as the name suggests, about tracking the flow of data through the program. It helps answers questions like: *does this expression ever hold a value that originates from a particular other place in the program*?
|
||||
|
||||
We can visualize the data flow problem as one of finding paths through a directed graph, where the nodes of the graph are elements in program, and the edges represent the flow of data between those elements. If a path exists, then the data flows between those two edges.
|
||||
|
||||
@@ -225,7 +225,7 @@ So all references will need to be qualified (that is ``DataFlow::Node``)
|
||||
|
||||
A **query library** is file with the extension ``.qll``. Query libraries do not contain a query clause, but may contain modules, classes, and predicates. For example, the `C/C++ data flow library <https://help.semmle.com/qldoc/cpp/semmle/code/cpp/dataflow/DataFlow.qll/module.DataFlow.html>`__ is contained in the ``semmle/code/cpp/dataflow/DataFlow.qll`` QLL file, and can be imported as shown above.
|
||||
|
||||
A **module** is a way of organizing QL code by grouping together related predicates, classes and (sub-)modules; either explicitly declared or implicit. A query library implicitly declares a module with the same name as the QLL file.
|
||||
A **module** is a way of organizing QL code by grouping together related predicates, classes, and (sub-)modules. They can be either explicitly declared or implicit. A query library implicitly declares a module with the same name as the QLL file.
|
||||
|
||||
For further information on libraries and modules in QL, see the chapter on `Modules <https://help.semmle.com/QL/ql-handbook/modules.html>`__ in the QL language handbook.
|
||||
|
||||
@@ -250,7 +250,7 @@ Data flow graph
|
||||
|
||||
``localFlowStep`` is the “single step” flow relation–that is it describes single edges in the local data flow graph. ``localFlow`` represents the `transitive <https://help.semmle.com/QL/ql-handbook/recursion.html#transitive-closures>`__ closure of this relation–in other words, it contains every pair of nodes where the second node is reachable from the first in the data flow graph.
|
||||
|
||||
The data flow graph is completely separate from the `AST <https://en.wikipedia.org/wiki/Abstract_syntax_tree>`__, to allow for flexibility in how data flow is modeled. There are a small number of data flow node types–expression nodes, parameter nodes, uninitialized variable nodes, and definition by reference nodes. Each node provides mapping functions to and from the relevant AST (for example ``Expr``, ``Parameter`` etc.) or symbol table (e.g. ``Variable``) classes.
|
||||
The data flow graph is separate from the `AST <https://en.wikipedia.org/wiki/Abstract_syntax_tree>`__, to allow for flexibility in how data flow is modeled. There are a small number of data flow node types–expression nodes, parameter nodes, uninitialized variable nodes, and definition by reference nodes. Each node provides mapping functions to and from the relevant AST (for example ``Expr``, ``Parameter`` etc.) or symbol table (for example ``Variable``) classes.
|
||||
|
||||
Taint-tracking
|
||||
==============
|
||||
@@ -270,9 +270,9 @@ Taint-tracking
|
||||
|
||||
Taint tracking can be thought of as another type of data flow graph. It usually extends the standard data flow graph for a problem by adding edges between nodes where one one node influences or *taints* another.
|
||||
|
||||
The `API <https://help.semmle.com/qldoc/cpp/semmle/code/cpp/dataflow/TaintTracking.qll/module.TaintTracking.html>`__ is almost identical to that of the local data flow; all we need to do to switch to taint tracking is ``import semmle.code.cpp.dataflow.TaintTracking`` instead of ``semmle.code.cpp.dataflow.DataFlow``, and instead of using ``localFlow``, we use ``localTaint``.
|
||||
The `API <https://help.semmle.com/qldoc/cpp/semmle/code/cpp/dataflow/TaintTracking.qll/module.TaintTracking.html>`__ is almost identical to that of the local data flow. All we need to do to switch to taint tracking is ``import semmle.code.cpp.dataflow.TaintTracking`` instead of ``semmle.code.cpp.dataflow.DataFlow``, and instead of using ``localFlow``, we use ``localTaint``.
|
||||
|
||||
Exercise: Source Nodes
|
||||
Exercise: source nodes
|
||||
======================
|
||||
|
||||
Define a subclass of ``DataFlow::Node`` representing “source” nodes, that is, nodes without a (local) data flow predecessor.
|
||||
@@ -329,5 +329,5 @@ Beyond local data flow
|
||||
|
||||
- Results are still underwhelming.
|
||||
- Dealing with parameter passing becomes cumbersome.
|
||||
- Instead, let’s turn the problem around and find user-controlled data that flows into a printf format argument, potentially through calls.
|
||||
- This needs global data flow.
|
||||
- Instead, let’s turn the problem around and find user-controlled data that flows into a ``printf`` format argument, potentially through calls.
|
||||
- This needs **global data flow**.
|
||||
|
||||
@@ -59,20 +59,20 @@ Global data flow and taint tracking
|
||||
- Global (“inter-procedural”) data flow models flow across function calls; not feasible to compute for all functions in a snapshot
|
||||
|
||||
- For global data flow (and taint tracking), we must therefore provided restrictions to ensure the problem is tractable.
|
||||
- Typically, this involves specifying the “source” and “sink”.
|
||||
- Typically, this involves specifying the *source* and *sink*.
|
||||
|
||||
.. note::
|
||||
|
||||
As we mentioned in the previous slide deck, while local dataflow is feasible to compute for all functions in a snapshot, global dataflow is not. This is because the number of paths becomes exponentially larger for global dataflow.
|
||||
As we mentioned in the previous slide deck, while local data flow is feasible to compute for all functions in a snapshot, global data flow is not. This is because the number of paths becomes exponentially larger for global data flow.
|
||||
|
||||
The global dataflow (and taint tracking) avoids this problem by requiring that the query author specifies which ``sources`` and ``sinks`` are applicable. This allows the implementation to compute paths between the restricted set of nodes, rather than the full graph.
|
||||
The global data flow (and taint tracking) avoids this problem by requiring that the query author specifies which ``sources`` and ``sinks`` are applicable. This allows the implementation to compute paths between the restricted set of nodes, rather than the full graph.
|
||||
|
||||
Global taint tracking library
|
||||
=============================
|
||||
|
||||
The semmle.code.cpp.dataflow.TaintTracking library provides a framework for implementing solvers for global taint tracking problems:
|
||||
The ``semmle.code.cpp.dataflow.TaintTracking`` library provides a framework for implementing solvers for global taint tracking problems:
|
||||
|
||||
#. Subclass TaintTracking::Configuration following this template:
|
||||
#. Subclass ``TaintTracking::Configuration`` following this template:
|
||||
|
||||
.. code-block:: ql
|
||||
|
||||
@@ -82,7 +82,7 @@ The semmle.code.cpp.dataflow.TaintTracking library provides a framework for impl
|
||||
override predicate isSink(DataFlow::Node nd) { … }
|
||||
}
|
||||
|
||||
#. Use Config.hasFlow(source, sink) to find inter-procedural paths.
|
||||
#. Use ``Config.hasFlow(source, sink)`` to find inter-procedural paths.
|
||||
|
||||
.. note::
|
||||
|
||||
@@ -96,7 +96,7 @@ Finding tainted format strings (outline)
|
||||
|
||||
.. note::
|
||||
|
||||
Here’s the outline for a inter-procedural (i.e. “global”) version of the tainted formatting strings query we saw in the previous slide deck. The same template will be applicable for most taint tracking problems.
|
||||
Here’s the outline for a inter-procedural (that is “global”) version of the tainted formatting strings query we saw in the previous slide deck. The same template will be applicable for most taint tracking problems.
|
||||
|
||||
Defining sources
|
||||
================
|
||||
@@ -118,7 +118,7 @@ The library class ``SecurityOptions`` provides a (configurable) model of what co
|
||||
|
||||
.. note::
|
||||
|
||||
We first define what it means to be a ``source`` of tainted data for this particular problem. In this case, what we care about is whether the format string can be provided by an external user to our application or service. As there are many such ways external data could be introduced into the system, the standard QL libraries for C/C++ include an extensible API for modelling user input. In this case, we will simply use the pre-defined set of “user inputs”, which includes arguments provided to command line applications.
|
||||
We first define what it means to be a *source* of tainted data for this particular problem. In this case, what we care about is whether the format string can be provided by an external user to our application or service. As there are many such ways external data could be introduced into the system, the standard QL libraries for C/C++ include an extensible API for modelling user input. In this case, we will simply use the predefined set of *user inputs*, which includes arguments provided to command line applications.
|
||||
|
||||
|
||||
Defining sinks (exercise)
|
||||
@@ -167,7 +167,8 @@ Use the ``FormattingFunction`` class to fill in the definition of “isSink”
|
||||
Path queries
|
||||
============
|
||||
|
||||
Provide information about the identified paths from sources to sinks; can be examined in Path Explorer view.
|
||||
Path queries provide information about the identified paths from sources to sinks. Paths can be examined in Path Explorer view.
|
||||
|
||||
Use this template:
|
||||
|
||||
.. code-block:: ql
|
||||
@@ -186,7 +187,7 @@ Use this template:
|
||||
|
||||
.. note::
|
||||
|
||||
In order to see the paths between the source and the sinks, we can convert the query to a path problem query. There are a few minor changes that need to be made for this to work - we need an additional import, to specify ``PathNode`` rather than ``Node``, and to add the source/sink to the query output (so that we can automatically determine the paths).
|
||||
To see the paths between the source and the sinks, we can convert the query to a path problem query. There are a few minor changes that need to be made for this to work - we need an additional import, to specify ``PathNode`` rather than ``Node``, and to add the source/sink to the query output (so that we can automatically determine the paths).
|
||||
|
||||
Defining additional taint steps
|
||||
===============================
|
||||
|
||||
@@ -34,10 +34,10 @@ The basic representation of an analyzed program is an *abstract syntax tree (AST
|
||||
|
||||
The “Introducing the C/C++ libraries” help topic contains a more complete overview of important AST classes and the rest of the C++ QL libraries: https://help.semmle.com/QL/learn-ql/ql/cpp/introduce-libraries-cpp.html
|
||||
|
||||
Database representaions of ASTs
|
||||
===============================
|
||||
Database representations of ASTs
|
||||
================================
|
||||
|
||||
AST nodes and other program elements are encoded in the database as *entity values*. Entities are implemented as integers, but in QL they are opaque; all one can do with them is to check their equality.
|
||||
AST nodes and other program elements are encoded in the database as *entity values*. Entities are implemented as integers, but in QL they are opaque - all one can do with them is to check their equality.
|
||||
|
||||
Each entity belongs to an entity type. Entity types have names starting with “@” and are defined in the database schema (not in QL).
|
||||
|
||||
@@ -47,7 +47,7 @@ Entity types are rarely used directly, the usual pattern is to define a QL class
|
||||
|
||||
.. note::
|
||||
|
||||
ASTs are a typical example of the kind of data representation one finds in object-oriented programming, with data-carrying nodes that reference each other. At first glance, QL, which can only work with atomic values, does not seem to be well suited for working with this kind of data. However, ultimately all that we require of the nodes in an AST is that they have an identity. The relationships among nodes, usually implemented by reference-valued object fields in other languages, can just as well (and arguably more naturally) be represented as relations over nodes. Attaching data (such as strings or numbers) to nodes can also be represented with relations over nodes and primitive values. All we need is a way for relations to reference nodes. This is achieved in QL (as in other database languages) by means of entity values (or “entities”, for short), which are opaque atomic values, implemented as integers under the hood.
|
||||
ASTs are a typical example of the kind of data representation one finds in object-oriented programming, with data-carrying nodes that reference each other. At first glance, QL, which can only work with atomic values, does not seem to be well suited for working with this kind of data. However, ultimately all that we require of the nodes in an AST is that they have an identity. The relationships among nodes, usually implemented by reference-valued object fields in other languages, can just as well (and arguably more naturally) be represented as relations over nodes. Attaching data (such as strings or numbers) to nodes can also be represented with relations over nodes and primitive values. All we need is a way for relations to reference nodes. This is achieved in QL (as in other database languages) by means of entity values (or entities, for short), which are opaque atomic values, implemented as integers under the hood.
|
||||
|
||||
It is the job of the extractor to create entity values for all AST nodes and populate database relations that encode the relationship between AST nodes and any values associated with them. These relations are extensional, that is, explicitly stored in the database, unlike the relations described by QL predicates, which we also refer to as intensional relations. Entity values belong to entity types, whose name starts with “@” to set them apart from primitive types and classes.
|
||||
|
||||
@@ -137,7 +137,7 @@ there are no AST nodes corresponding to ``square(y0)`` or ``square(z0)``, but th
|
||||
|
||||
The C preprocessor poses a dilemma: un-preprocessed code cannot, in general, be parsed and analyzed meaningfully, but showing results in preprocessed code is not useful to developers. Our solution is to base the AST representation on preprocessed source (in the same way as compilers do), but associate AST nodes with locations in the original source text.
|
||||
|
||||
Working with Macros
|
||||
Working with macros
|
||||
===================
|
||||
|
||||
.. code-block:: cpp
|
||||
|
||||
Reference in New Issue
Block a user