Merge branch 'docs-preparation' into merge-docs-preparation-master

This commit is contained in:
james
2020-04-08 08:24:17 +01:00
97 changed files with 2099 additions and 1452 deletions

View File

@@ -1,15 +1,14 @@
Tutorial: Conversions and classes
=================================
Conversions and classes in C and C++
====================================
Overview
--------
This topic contains worked examples of how to write queries using the CodeQL library classes for C/C++ conversions and classes.
You can use the standard CodeQL libraries for C and C++ to detect when the type of an expression is changed.
Conversions
-----------
Let us take a look at the ``Conversion`` class in the standard library:
In C and C++, conversions change the type of an expression. They may be implicit conversions generated by the compiler, or explicit conversions requested by the user.
Let's take a look at the `Conversion <https://help.semmle.com/qldoc/cpp/semmle/code/cpp/exprs/Cast.qll/type.Cast$Conversion.html>`__ class in the standard library:
- ``Expr``
@@ -25,8 +24,6 @@ Let us take a look at the ``Conversion`` class in the standard library:
- ``ArrayToPointerConversion``
- ``VirtualMemberToFunctionPointerConversion``
All conversions change the type of an expression. They may be implicit conversions (generated by the compiler) or explicit conversions (requested by the user).
Exploring the subexpressions of an assignment
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -166,7 +163,7 @@ Our starting point for the query is pairs of a base class and a derived class, c
where derived.getABaseClass+() = base
select base, derived, "The second class is derived from the first."
`See this in the query console <https://lgtm.com/query/1505902347211/>`__
`See this in the query console on LGTM.com <https://lgtm.com/query/1505902347211/>`__
Note that the transitive closure symbol ``+`` indicates that ``Class.getABaseClass()`` may be followed one or more times, rather than only accepting a direct base class.
@@ -178,7 +175,7 @@ A lot of the results are uninteresting template parameters. You can remove those
and not exists(base.getATemplateArgument())
and not exists(derived.getATemplateArgument())
`See this in the query console <https://lgtm.com/query/1505907047251/>`__
`See this in the query console on LGTM.com <https://lgtm.com/query/1505907047251/>`__
Finding derived classes with destructors
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -197,7 +194,7 @@ Now we can extend the query to find derived classes with destructors, using the
and d2 = derived.getDestructor()
select base, derived, "The second class is derived from the first, and both have a destructor."
`See this in the query console <https://lgtm.com/query/1505901767389/>`__
`See this in the query console on LGTM.com <https://lgtm.com/query/1505901767389/>`__
Notice that getting the destructor implicitly asserts that one exists. As a result, this version of the query returns fewer results than before.
@@ -217,17 +214,17 @@ Our last change is to use ``Function.isVirtual()`` to find cases where the base
and not d1.isVirtual()
select d1, "This destructor should probably be virtual."
`See this in the query console <https://lgtm.com/query/1505908156827/>`__
`See this in the query console on LGTM.com <https://lgtm.com/query/1505908156827/>`__
That completes the query.
There is a similar built-in LGTM `query <https://lgtm.com/rules/2158670642/>`__ that finds classes in a C/C++ project with virtual functions but no virtual destructor. You can take a look at the code for this query by clicking **Open in query console** at the top of that page.
There is a similar built-in `query <https://lgtm.com/rules/2158670642/>`__ on LGTM.com that finds classes in a C/C++ project with virtual functions but no virtual destructor. You can take a look at the code for this query by clicking **Open in query console** at the top of that page.
What next?
----------
Further reading
---------------
- Explore other ways of querying classes using examples from the `C/C++ cookbook <https://help.semmle.com/wiki/label/CBCPP/class>`__.
- Take a look at the :doc:`Analyzing data flow in C/C++ <dataflow>` tutorial.
- Try the worked examples in the following topics: :doc:`Example: Checking that constructors initialize all private fields <private-field-initialization>`, and :doc:`Example: Checking for allocations equal to 'strlen(string)' without space for a null terminator <zero-space-terminator>`.
- Find out more about QL in the `QL language handbook <https://help.semmle.com/QL/ql-handbook/index.html>`__ and `QL language specification <https://help.semmle.com/QL/ql-spec/language.html>`__.
- Learn more about the query console in `Using the query console <https://lgtm.com/help/lgtm/using-query-console>`__.
- Take a look at the :doc:`Analyzing data flow in C and C++ <dataflow>` tutorial.
- Try the worked examples in the following topics: :doc:`Refining a query to account for edge cases <private-field-initialization>`, and :doc:`Detecting a potential buffer overflow <zero-space-terminator>`.
- Find out more about QL in the `QL language reference <https://help.semmle.com/QL/ql-handbook/index.html>`__.
- Learn more about the query console in `Using the query console <https://lgtm.com/help/lgtm/using-query-console>`__ on LGTM.com.

View File

@@ -1,13 +1,12 @@
Analyzing data flow in C/C++
============================
Analyzing data flow in C and C++
================================
Overview
--------
You can use data flow analysis to track the flow of potentially malicious or insecure data that can cause vulnerabilities in your codebase.
This topic describes how data flow analysis is implemented in the CodeQL libraries for C/C++ and includes examples to help you write your own data flow queries.
The following sections describe how to utilize the libraries for local data flow, global data flow, and taint tracking.
About data flow
---------------
For a more general introduction to modeling data flow, see :doc:`Introduction to data flow analysis with CodeQL <../intro-to-data-flow>`.
Data flow analysis computes the possible values that a variable can hold at various points in a program, determining how those values propagate through the program, and where they are used. In CodeQL, you can model both local data flow and global data flow. For a more general introduction to modeling data flow, see :doc:`About data flow analysis <../intro-to-data-flow>`.
Local data flow
---------------
@@ -296,12 +295,12 @@ Exercise 3: Write a class that represents flow sources from ``getenv``. (`Answer
Exercise 4: Using the answers from 2 and 3, write a query which finds all global data flows from ``getenv`` to ``gethostbyname``. (`Answer <#exercise-4>`__)
What next?
----------
Further reading
---------------
- Try the worked examples in the following topics: :doc:`Example: Checking that constructors initialize all private fields <private-field-initialization>` and :doc:`Example: Checking for allocations equal to 'strlen(string)' without space for a null terminator <zero-space-terminator>`.
- Find out more about QL in the `QL language handbook <https://help.semmle.com/QL/ql-handbook/index.html>`__ and `QL language specification <https://help.semmle.com/QL/ql-spec/language.html>`__.
- Learn more about the query console in `Using the query console <https://lgtm.com/help/lgtm/using-query-console>`__.
- Try the worked examples in the following topics: :doc:`Refining a query to account for edge cases <private-field-initialization>` and :doc:`Detecting a potential buffer overflow <zero-space-terminator>`.
- Find out more about QL in the `QL language reference <https://help.semmle.com/QL/ql-handbook/index.html>`__.
- Learn more about the query console in `Using the query console <https://lgtm.com/help/lgtm/using-query-console>`__ on LGTM.com.
Answers
-------

View File

@@ -1,13 +1,10 @@
Tutorial: Expressions, types and statements
===========================================
Expressions, types, and statements in C and C++
===============================================
Overview
--------
You can use CodeQL to explore expressions, types, and statements in C and C++ code to find, for example, incorrect assignments.
This topic contains worked examples of how to write queries using the standard CodeQL library classes for C/C++ expressions, types, and statements.
Expressions and types
---------------------
Expressions and types in CodeQL
-------------------------------
Each part of an expression in C becomes an instance of the ``Expr`` class. For example, the C code ``x = x + 1`` becomes an ``AssignExpr``, an ``AddExpr``, two instances of ``VariableAccess`` and a ``Literal``. All of these CodeQL classes extend ``Expr``.
@@ -24,7 +21,7 @@ In the following example we find instances of ``AssignExpr`` which assign the co
where e.getRValue().getValue().toInt() = 0
select e, "Assigning the value 0 to something."
`See this in the query console <https://lgtm.com/query/1505908086530/>`__
`See this in the query console on LGTM.com <https://lgtm.com/query/1505908086530/>`__
The ``where`` clause in this example gets the expression on the right side of the assignment, ``getRValue()``, and compares it with zero. Notice that there are no checks to make sure that the right side of the assignment is an integer or that it has a value (that is, it is compile-time constant, rather than a variable). For expressions where either of these assumptions is wrong, the associated predicate simply does not return anything and the ``where`` clause will not produce a result. You could think of it as if there is an implicit ``exists(e.getRValue().getValue().toInt())`` at the beginning of this line.
@@ -34,7 +31,7 @@ It is also worth noting that the query above would find this C code:
yPtr = NULL;
This is because the database contains a representation of the code base after the preprocessor transforms have run (for more information, see `Database generation <https://lgtm.com/help/lgtm/generate-database>`__). This means that any macro invocations, such as the ``NULL`` define used here, are expanded during the creation of the database. If you want to write queries about macros then there are some special library classes that have been designed specifically for this purpose (for example, the ``Macro``, ``MacroInvocation`` classes and predicates like ``Element.isInMacroExpansion()``). In this case, it is good that macros are expanded, but we do not want to find assignments to pointers.
This is because the database contains a representation of the code base after the preprocessor transforms have run. This means that any macro invocations, such as the ``NULL`` define used here, are expanded during the creation of the database. If you want to write queries about macros then there are some special library classes that have been designed specifically for this purpose (for example, the ``Macro``, ``MacroInvocation`` classes and predicates like ``Element.isInMacroExpansion()``). In this case, it is good that macros are expanded, but we do not want to find assignments to pointers. For more information, see `Database generation <https://lgtm.com/help/lgtm/generate-database>`__ on LGTM.com.
Finding assignments of 0 to an integer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -50,7 +47,7 @@ We can make the query more specific by defining a condition for the left side of
and e.getLValue().getType().getUnspecifiedType() instanceof IntegralType
select e, "Assigning the value 0 to an integer."
`See this in the query console <https://lgtm.com/query/1505906986578/>`__
`See this in the query console on LGTM.com <https://lgtm.com/query/1505906986578/>`__
This checks that the left side of the assignment has a type that is some kind of integer. Note the call to ``Type.getUnspecifiedType()``. This resolves ``typedef`` types to their underlying types so that the query finds assignments like this one:
@@ -61,8 +58,8 @@ This checks that the left side of the assignment has a type that is some kind of
i = 0;
Statements
----------
Statements in CodeQL
--------------------
We can refine the query further using statements. In this case we use the class ``ForStmt``:
@@ -110,7 +107,7 @@ Unfortunately this would not quite work, because the loop initialization is actu
and e.getLValue().getType().getUnspecifiedType() instanceof IntegralType
select e, "Assigning the value 0 to an integer, inside a for loop initialization."
`See this in the query console <https://lgtm.com/query/1505909016965/>`__
`See this in the query console on LGTM.com <https://lgtm.com/query/1505909016965/>`__
Finding assignments of 0 within the loop body
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -128,14 +125,14 @@ We can find assignments inside the loop body using similar code with the predica
and e.getLValue().getType().getUnderlyingType() instanceof IntegralType
select e, "Assigning the value 0 to an integer, inside a for loop body."
`See this in the query console <https://lgtm.com/query/1505901437190/>`__
`See this in the query console on LGTM.com <https://lgtm.com/query/1505901437190/>`__
Note that we replaced ``e.getEnclosingStmt()`` with ``e.getEnclosingStmt().getParentStmt*()``, to find an assignment expression that is deeply nested inside the loop body. The transitive closure modifier ``*`` here indicates that ``Stmt.getParentStmt()`` may be followed zero or more times, rather than just once, giving us the statement, its parent statement, its parent's parent statement etc.
What next?
----------
Further reading
---------------
- Explore other ways of finding types and statements using examples from the C/C++ cookbook for `types <https://help.semmle.com/wiki/label/CBCPP/type>`__ and `statements <https://help.semmle.com/wiki/label/CBCPP/statement>`__.
- Take a look at the :doc:`Conversions and classes <conversions-classes>` and :doc:`Analyzing data flow in C/C++ <dataflow>` tutorials.
- Find out more about QL in the `QL language handbook <https://help.semmle.com/QL/ql-handbook/index.html>`__ and `QL language specification <https://help.semmle.com/QL/ql-spec/language.html>`__.
- Learn more about the query console in `Using the query console <https://lgtm.com/help/lgtm/using-query-console>`__.
- Take a look at the :doc:`Conversions and classes in C and C++ <conversions-classes>` and :doc:`Analyzing data flow in C and C++ <dataflow>` tutorials.
- Find out more about QL in the `QL language reference <https://help.semmle.com/QL/ql-handbook/index.html>`__.
- Learn more about the query console in `Using the query console <https://lgtm.com/help/lgtm/using-query-console>`__ on LGTM.com.

View File

@@ -1,10 +1,12 @@
Tutorial: Function classes
==========================
Functions in C and C++
=======================
You can use CodeQL to explore functions in C and C++ code.
Overview
--------
The standard CodeQL library for C and C++ represents functions using the ``Function`` class (see :doc:`Introducing the C/C++ libraries <introduce-libraries-cpp>`).
The standard CodeQL library for C and C++ represents functions using the ``Function`` class (see :doc:`CodeQL libraries for C and C++ <introduce-libraries-cpp>`).
The example queries in this topic explore some of the most useful library predicates for querying functions.
@@ -26,7 +28,7 @@ This query is very general, so there are probably too many results to be interes
Finding functions that are not called
-------------------------------------
It might be more interesting to find functions that are not called, using the standard CodeQL ``FunctionCall`` class from the **abstract syntax tree** category (see :doc:`Introducing the C/C++ libraries <introduce-libraries-cpp>`). The ``FunctionCall`` class can be used to identify places where a function is actually used, and it is related to ``Function`` through the ``FunctionCall.getTarget()`` predicate.
It might be more interesting to find functions that are not called, using the standard CodeQL ``FunctionCall`` class from the **abstract syntax tree** category (see :doc:`CodeQL libraries for C and C++ <introduce-libraries-cpp>`). The ``FunctionCall`` class can be used to identify places where a function is actually used, and it is related to ``Function`` through the ``FunctionCall.getTarget()`` predicate.
.. code-block:: ql
@@ -36,7 +38,7 @@ It might be more interesting to find functions that are not called, using the st
where not exists(FunctionCall fc | fc.getTarget() = f)
select f, "This function is never called."
`See this in the query console <https://lgtm.com/query/1505891246456/>`__
`See this in the query console on LGTM.com <https://lgtm.com/query/1505891246456/>`__
The new query finds functions that are not the target of any ``FunctionCall``—in other words, functions that are never called. You may be surprised by how many results the query finds. However, if you examine the results, you can see that many of the functions it finds are used indirectly. To create a query that finds only unused functions, we need to refine the query and exclude other ways of using a function.
@@ -54,7 +56,7 @@ You can modify the query to remove functions where a function pointer is used to
and not exists(FunctionAccess fa | fa.getTarget() = f)
select f, "This function is never called, or referenced with a function pointer."
`See this in the query console <https://lgtm.com/query/1505890446605/>`__
`See this in the query console on LGTM.com <https://lgtm.com/query/1505890446605/>`__
This query returns fewer results. However, if you examine the results then you can probably still find potential refinements.
@@ -76,7 +78,7 @@ This query uses ``Function`` and ``FunctionCall`` to find calls to the function
and not fc.getArgument(1) instanceof StringLiteral
select fc, "sprintf called with variable format string."
`See this in the query console <https://lgtm.com/query/1505889506751/>`__
`See this in the query console on LGTM.com <https://lgtm.com/query/1505889506751/>`__
This uses:
@@ -87,10 +89,10 @@ Note that we could have used ``Declaration.getName()``, but ``Declaration.getQua
The LGTM version of this query is considerably more complicated, but if you look carefully you will find that its structure is the same. See `Non-constant format string <https://lgtm.com/rules/2152810612/>`__ and click **Open in query console** at the top of the page.
What next?
----------
Further reading
---------------
- Explore other ways of finding functions using examples from the `C/C++ cookbook <https://help.semmle.com/wiki/label/CBCPP/function>`__.
- Take a look at some of the other tutorials: :doc:`Expressions, types and statements <expressions-types>`, :doc:`Conversions and classes <conversions-classes>`, and :doc:`Analyzing data flow in C/C++ <dataflow>`.
- Find out more about QL in the `QL language handbook <https://help.semmle.com/QL/ql-handbook/index.html>`__ and `QL language specification <https://help.semmle.com/QL/ql-spec/language.html>`__.
- Learn more about the query console in `Using the query console <https://lgtm.com/help/lgtm/using-query-console>`__.
- Take a look at some other tutorials: :doc:`Expressions, types and statements in C and C++ <introduce-libraries-cpp>`, :doc:`Conversions and classes in C and C++ <conversions-classes>`, and :doc:`Analyzing data flow in C and C++ <dataflow>`.
- Find out more about QL in the `QL language reference <https://help.semmle.com/QL/ql-handbook/index.html>`__.
- Learn more about the query console in `Using the query console <https://lgtm.com/help/lgtm/using-query-console>`__ on LGTM.com.

View File

@@ -1,8 +1,10 @@
Using the guards library in C and C++
=====================================
Overview
--------
You can use the CodeQL guards library to identify conditional expressions that control the execution of other parts of a program in C and C++ codebases.
About the guards library
------------------------
The guards library (defined in ``semmle.code.cpp.controlflow.Guards``) provides a class `GuardCondition <https://help.semmle.com/qldoc/cpp/semmle/code/cpp/controlflow/Guards.qll/type.Guards$GuardCondition.html>`__ representing Boolean values that are used to make control flow decisions.
A ``GuardCondition`` is considered to guard a basic block if the block can only be reached if the ``GuardCondition`` is evaluated a certain way. For instance, in the following code, ``x < 10`` is a ``GuardCondition``, and it guards all the code before the return statement.
@@ -20,7 +22,7 @@ A ``GuardCondition`` is considered to guard a basic block if the block can only
The ``controls`` predicate
------------------------------------------------
--------------------------
The ``controls`` predicate helps determine which blocks are only run when the ``GuardCondition`` evaluates a certain way. ``guard.controls(block, testIsTrue)`` holds if ``block`` is only entered if the value of this condition is ``testIsTrue``.

View File

@@ -1,10 +1,13 @@
Introducing the CodeQL libraries for C/C++
==========================================
CodeQL library for C and C++
============================
Overview
--------
When analyzing C or C++ code, you can use the large collection of classes in the CodeQL library for C and C++.
There is an extensive library for analyzing CodeQL databases extracted from C/C++ projects. The classes in this library present the data from a database in an object-oriented form and provide abstractions and predicates to help you with common analysis tasks. The library is implemented as a set of QL modules, that is, files with the extension ``.qll``. The module ``cpp.qll`` imports all the core C/C++ library modules, so you can include the complete library by beginning your query with:
About the CodeQL library for C and C++
--------------------------------------
There is an extensive library for analyzing CodeQL databases extracted from C/C++ projects. The classes in this library present the data from a database in an object-oriented form and provide abstractions and predicates to help you with common analysis tasks.
The library is implemented as a set of QL modules, that is, files with the extension ``.qll``. The module ``cpp.qll`` imports all the core C/C++ library modules, so you can include the complete library by beginning your query with:
.. code-block:: ql
@@ -12,9 +15,7 @@ There is an extensive library for analyzing CodeQL databases extracted from C/C+
The rest of this topic summarizes the available CodeQL classes and corresponding C/C++ constructs.
NOTE: You can find related classes and features using the query console's auto-complete feature. You can also press *F3* to jump to the definition of any element; library files are opened in new tabs in the console.
Summary of the library classes
Commonly-used library classes
------------------------------
The most commonly used standard library classes are listed below. The listing is broken down by functionality. Each library class is annotated with a C/C++ construct it corresponds to.
@@ -521,9 +522,9 @@ This table lists `Preprocessor <https://help.semmle.com/qldoc/cpp/semmle/code/cp
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
What next?
----------
Further reading
---------------
- Experiment with the worked examples in the CodeQL for C/C++ topics: :doc:`Function classes <function-classes>`, :doc:`Expressions, types and statements <expressions-types>`, :doc:`Conversions and classes <conversions-classes>`, and :doc:`Analyzing data flow in C/C++ <dataflow>`.
- Find out more about QL in the `QL language handbook <https://help.semmle.com/QL/ql-handbook/index.html>`__ and `QL language specification <https://help.semmle.com/QL/ql-spec/language.html>`__.
- Learn more about the query console in `Using the query console <https://lgtm.com/help/lgtm/using-query-console>`__.
- Experiment with the worked examples in the CodeQL for C/C++ topics: :doc:`Functions in C and C++ <function-classes>`, :doc:`Expressions, types, and statements in C and C++ <expressions-types>`, :doc:`Conversions and classes in C and C++ <conversions-classes>`, and :doc:`Analyzing data flow in C and C++ <dataflow>`.
- Find out more about QL in the `QL language reference <https://help.semmle.com/QL/ql-handbook/index.html>`__.
- Learn more about the query console in `Using the query console <https://lgtm.com/help/lgtm/using-query-console>`__ on LGTM.com.

View File

@@ -1,13 +1,15 @@
Example: Checking that constructors initialize all private fields
=================================================================
Refining a query to account for edge cases
==========================================
You can improve the results generated by a CodeQL query by adding conditions to remove false positive results caused by common edge cases.
Overview
--------
This topic describes how a C++ query was developed. The example introduces recursive predicates and demonstrates the typical workflow used to refine a query. For a full overview of the topics available for learning to write queries for C/C++ code, see :doc:`CodeQL for C/C++ <ql-for-cpp>`.
This topic describes how a C++ query was developed. The example introduces recursive predicates and demonstrates the typical workflow used to refine a query. For a full overview of the topics available for learning to write queries for C/C++ code, see :doc:`CodeQL for C and C++ <ql-for-cpp>`.
Problem—finding every private field and checking for initialization
-------------------------------------------------------------------
Finding every private field and checking for initialization
-----------------------------------------------------------
Writing a query to check if a constructor initializes all private fields seems like a simple problem, but there are several edge cases to account for.
@@ -100,7 +102,7 @@ You may also wish to consider methods called by constructors that assign to the
int m_value;
};
This case can be excluded by creating a recursive predicate. The recursive predicate is given a function and a field, then checks whether the function assigns to the field. The predicate runs itself on all the functions called by the function that it has been given. By passing the constructor to this predicate, we can check for assignments of a field in all functions called by the constructor, and then do the same for all functions called by those functions all the way down the tree of function calls (see `Recursion <https://help.semmle.com/QL/ql-handbook/recursion.html>`__ for more information).
This case can be excluded by creating a recursive predicate. The recursive predicate is given a function and a field, then checks whether the function assigns to the field. The predicate runs itself on all the functions called by the function that it has been given. By passing the constructor to this predicate, we can check for assignments of a field in all functions called by the constructor, and then do the same for all functions called by those functions all the way down the tree of function calls. For more information, see `Recursion <https://help.semmle.com/QL/ql-handbook/recursion.html>`__ in the QL language reference.
.. code-block:: ql
@@ -124,7 +126,7 @@ This case can be excluded by creating a recursive predicate. The recursive predi
Refinement 4—simplifying the query
----------------------------------
Finally we can simplify the query by using the `transitive closure operator <https://help.semmle.com/QL/ql-handbook/recursion.html#transitive-closures>`__. In this final version of the query, ``c.calls*(fun)`` resolves to the set of all functions that are ``c`` itself, are called by ``c``, are called by a function that is called by ``c``, and so on. This eliminates the need to make a new predicate all together.
Finally we can simplify the query by using the transitive closure operator. In this final version of the query, ``c.calls*(fun)`` resolves to the set of all functions that are ``c`` itself, are called by ``c``, are called by a function that is called by ``c``, and so on. This eliminates the need to make a new predicate all together. For more information, see `Transitive closures <https://help.semmle.com/QL/ql-handbook/recursion.html#transitive-closures>`__ in the QL language reference.
.. code-block:: ql
@@ -142,11 +144,11 @@ Finally we can simplify the query by using the `transitive closure operator <htt
and exists(c.getBlock())
select c, "Constructor does not initialize fields $@.", f, f.getName()
`See this in the query console <https://lgtm.com/query/1505896968215/>`__
`See this in the query console on LGTM.com <https://lgtm.com/query/1505896968215/>`__
What next?
----------
Further reading
---------------
- Take a look at another example: :doc:`Checking for allocations equal to 'strlen(string)' without space for a null terminator <zero-space-terminator>`.
- Find out more about QL in the `QL language handbook <https://help.semmle.com/QL/ql-handbook/index.html>`__ and `QL language specification <https://help.semmle.com/QL/ql-spec/language.html>`__.
- Learn more about the query console in `Using the query console <https://lgtm.com/help/lgtm/using-query-console>`__.
- Take a look at another example: :doc:`Detecting a potential buffer overflow <zero-space-terminator>`.
- Find out more about QL in the `QL language reference <https://help.semmle.com/QL/ql-handbook/index.html>`__.
- Learn more about the query console in `Using the query console <https://lgtm.com/help/lgtm/using-query-console>`__ on LGTM.com.

View File

@@ -1,8 +1,9 @@
CodeQL for C/C++
================
CodeQL for C and C++
====================
Experiment and learn how to write effective and efficient queries for CodeQL databases generated from C and C++ codebases.
.. toctree::
:glob:
:hidden:
introduce-libraries-cpp
@@ -12,47 +13,30 @@ CodeQL for C/C++
dataflow
private-field-initialization
zero-space-terminator
These topics provide an overview of the CodeQL libraries for C/C++ and show examples of how to write queries that use them.
- `Basic C/C++ query <https://lgtm.com/help/lgtm/console/ql-cpp-basic-example>`__ describes how to write and run queries using LGTM.
- :doc:`Introducing the CodeQL libraries for C/C++ <introduce-libraries-cpp>` introduces the standard libraries used to write queries for C and C++ code.
- :doc:`Tutorial: Function classes <function-classes>` demonstrates how to write queries using the standard CodeQL library classes for C/C++ functions.
- :doc:`Tutorial: Expressions, types and statements <expressions-types>` demonstrates how to write queries using the standard CodeQL library classes for C/C++ expressions, types and statements.
- :doc:`Tutorial: Conversions and classes <conversions-classes>` demonstrates how to write queries using the standard CodeQL library classes for C/C++ conversions and classes.
- :doc:`Tutorial: Analyzing data flow in C/C++ <dataflow>` demonstrates how to write queries using the standard data flow and taint tracking libraries for C/C++.
- :doc:`Example: Checking that constructors initialize all private fields <private-field-initialization>` works through the development of a query. It introduces recursive predicates and shows the typical workflow used to refine a query.
- :doc:`Example: Checking for allocations equal to strlen(string) without space for a null terminator <zero-space-terminator>` shows how a query to detect this particular buffer issue was developed.
Advanced libraries
----------------------------------
.. toctree::
:hidden:
guards
range-analysis
value-numbering-hash-cons
- :doc:`Using the guards library in C and C++ <guards>` demonstrates how to identify conditional expressions that control the execution of other code and what guarantees they provide.
- :doc:`Using range analysis for C and C++ <range-analysis>` demonstrates how to determine constant upper and lower bounds and possible overflow or underflow of expressions.
- `Basic C/C++ query <https://lgtm.com/help/lgtm/console/ql-cpp-basic-example>`__: Learn to write and run a simple CodeQL query using LGTM.
- :doc:`Using hash consing and value numbering for C and C++ <value-numbering-hash-cons>` demonstrates how to recognize expressions that are syntactically identical or compute the same value at runtime.
- :doc:`CodeQL library for C and C++ <introduce-libraries-cpp>`: When analyzing C or C++ code, you can use the large collection of classes in the CodeQL library for C and C++.
- :doc:`Functions in C and C++ <function-classes>`: You can use CodeQL to explore functions in C and C++ code.
Other resources
- :doc:`Expressions, types, and statements in C and C++ <expressions-types>`: You can use CodeQL to explore expressions, types, and statements in C and C++ code to find, for example, incorrect assignments.
- :doc:`Conversions and classes in C and C++ <conversions-classes>`: You can use the standard CodeQL libraries for C and C++ to detect when the type of an expression is changed.
- :doc:`Analyzing data flow in C and C++ <dataflow>`: You can use data flow analysis to track the flow of potentially malicious or insecure data that can cause vulnerabilities in your codebase.
- :doc:`Refining a query to account for edge cases <private-field-initialization>`: You can improve the results generated by a CodeQL query by adding conditions to remove false positive results caused by common edge cases.
- :doc:`Detecting a potential buffer overflow <zero-space-terminator>`: You can use CodeQL to detect potential buffer overflows by checking for allocations equal to ``strlen`` in C and C++.
Further reading
---------------
.. TODO: Rename the cookbooks: C/C++ cookbook, or C/C++ CodeQL cookbook, or CodeQL cookbook for C/C++, or...?
- For examples of how to query common C/C++ elements, see the `C/C++ cookbook <https://help.semmle.com/wiki/display/CBCPP>`__.
- For the queries used in LGTM, display a `C/C++ query <https://lgtm.com/search?q=language%3Acpp&t=rules>`__ and click **Open in query console** to see the code used to find alerts.
- For more information about the library for C/C++ see the `CodeQL library for C/C++ <https://help.semmle.com/qldoc/cpp>`__.

View File

@@ -1,10 +1,10 @@
Using range analysis for C and C++
==================================
Overview
--------
You can use range analysis to determine the upper or lower bounds on an expression, or whether an expression could potentially over or underflow.
Range analysis determines upper and lower bounds for an expression.
About the range analysis library
--------------------------------
The range analysis library (defined in ``semmle.code.cpp.rangeanalysis.SimpleRangeAnalysis``) provides a set of predicates for determining constant upper and lower bounds on expressions, as well as recognizing integer overflows. For performance, the library performs automatic widening and therefore may not provide the tightest possible bounds.

View File

@@ -1,15 +1,14 @@
Hash consing and value numbering
=================================================
================================
Overview
--------
You can use specialized CodeQL libraries to recognize expressions that are syntactically identical or compute the same value at runtime in C and C++ codebases.
About the hash consing and value numbering libraries
----------------------------------------------------
In C and C++ databases, each node in the abstract syntax tree is represented by a separate object. This allows both analysis and results display to refer to specific appearances of a piece of syntax. However, it is frequently useful to determine whether two expressions are equivalent, either syntactically or semantically.
The `hash consing <https://en.wikipedia.org/wiki/Hash_consing>`__ library (defined in ``semmle.code.cpp.valuenumbering.HashCons``) provides a mechanism for identifying expressions that have the same syntactic structure. The `global value numbering <https://en.wikipedia.org/wiki/Value_numbering>`__ library (defined in ``semmle.code.cpp.valuenumbering.GlobalValueNumbering``) provides a mechanism for identifying expressions that compute the same value at runtime.
Both libraries partition the expressions in each function into equivalence classes represented by objects. Each ``HashCons`` object represents a set of expressions with identical parse trees, while ``GVN`` objects represent sets of expressions that will always compute the same value.
The hash consing library (defined in ``semmle.code.cpp.valuenumbering.HashCons``) provides a mechanism for identifying expressions that have the same syntactic structure. The global value numbering library (defined in ``semmle.code.cpp.valuenumbering.GlobalValueNumbering``) provides a mechanism for identifying expressions that compute the same value at runtime. Both libraries partition the expressions in each function into equivalence classes represented by objects. Each ``HashCons`` object represents a set of expressions with identical parse trees, while ``GVN`` objects represent sets of expressions that will always compute the same value. For more information, see `Hash consing <https://en.wikipedia.org/wiki/Hash_consing>`__ and `Value numbering <https://en.wikipedia.org/wiki/Value_numbering>`__ on Wikipedia.
Example C code
--------------
@@ -111,4 +110,3 @@ Example query
hashCons(outer.getCondition()) = hashCons(inner.getCondition())
select inner.getCondition(), "The condition of this if statement duplicates the condition of $@",
outer.getCondition(), "an enclosing if statement"

View File

@@ -1,10 +1,7 @@
Example: Checking for allocations equal to ``strlen(string)`` without space for a null terminator
=================================================================================================
Detecting a potential buffer overflow
=====================================
Overview
--------
This topic describes how a C/C++ query for detecting a potential buffer overflow was developed. For a full overview of the topics available for learning to write queries for C/C++ code, see :doc:`CodeQL for C/C++ <ql-for-cpp>`.
You can use CodeQL to detect potential buffer overflows by checking for allocations equal to ``strlen`` in C and C++. This topic describes how a C/C++ query for detecting a potential buffer overflow was developed.
Problem—detecting memory allocation that omits space for a null termination character
-------------------------------------------------------------------------------------
@@ -98,7 +95,7 @@ When you have defined the basic query then you can refine the query to include f
Improving the query using the 'SSA' library
-------------------------------------------
The ``SSA`` library represents variables in `static single assignment <http://en.wikipedia.org/wiki/Static_single_assignment_form>`__ (SSA) form. In this form, each variable is assigned exactly once and every variable is defined before it is used. The use of SSA variables simplifies queries considerably as much of the local data flow analysis has been done for us.
The ``SSA`` library represents variables in static single assignment (SSA) form. In this form, each variable is assigned exactly once and every variable is defined before it is used. The use of SSA variables simplifies queries considerably as much of the local data flow analysis has been done for us. For more information, see `Static single assignment <http://en.wikipedia.org/wiki/Static_single_assignment_form>`__ on Wikipedia.
Including examples where the string size is stored before use
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -224,8 +221,8 @@ The completed query will now identify cases where the result of ``strlen`` is st
where malloc.getAllocatedSize() instanceof StrlenCall
select malloc, "This allocation does not include space to null-terminate the string."
What next?
----------
Further reading
---------------
- Find out more about QL in the `QL language handbook <https://help.semmle.com/QL/ql-handbook/index.html>`__ and `QL language specification <https://help.semmle.com/QL/ql-spec/language.html>`__.
- Learn more about the query console in `Using the query console <https://lgtm.com/help/lgtm/using-query-console>`__.
- Find out more about QL in the `QL language reference <https://help.semmle.com/QL/ql-handbook/index.html>`__.
- Learn more about the query console in `Using the query console <https://lgtm.com/help/lgtm/using-query-console>`__ on LGTM.com.