Merge pull request #2869 from jf205/codeql-migration-2163

CodeQL docs: update titles, add intros, and a few content updates
James Fletcher
2020-03-11 14:29:56 +00:00
committed by GitHub
11 changed files with 75 additions and 105 deletions

@@ -1,15 +1,14 @@
Tutorial: Conversions and classes
=================================
Conversions and classes in C and C++
====================================
Overview
--------
This topic contains worked examples of how to write queries using the CodeQL library classes for C/C++ conversions and classes.
You can use the standard CodeQL libraries for C and C++ to detect when the type of an expression is changed.
Conversions
-----------
Let us take a look at the ``Conversion`` class in the standard library:
In C and C++, conversions change the type of an expression. They may be implicit conversions generated by the compiler, or explicit conversions requested by the user.
Let's take a look at the `Conversion <https://help.semmle.com/qldoc/cpp/semmle/code/cpp/exprs/Cast.qll/type.Cast$Conversion.html>`__ class in the standard library:
- ``Expr``
@@ -25,8 +24,6 @@ Let us take a look at the ``Conversion`` class in the standard library:
- ``ArrayToPointerConversion``
- ``VirtualMemberToFunctionPointerConversion``
All conversions change the type of an expression. They may be implicit conversions (generated by the compiler) or explicit conversions (requested by the user).
Exploring the subexpressions of an assignment
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -221,13 +218,13 @@ Our last change is to use ``Function.isVirtual()`` to find cases where the base
That completes the query.
There is a similar built-in LGTM `query <https://lgtm.com/rules/2158670642/>`__ that finds classes in a C/C++ project with virtual functions but no virtual destructor. You can take a look at the code for this query by clicking **Open in query console** at the top of that page.
There is a similar built-in `query <https://lgtm.com/rules/2158670642/>`__ on LGTM.com that finds classes in a C/C++ project with virtual functions but no virtual destructor. You can take a look at the code for this query by clicking **Open in query console** at the top of that page.
What next?
----------
- Explore other ways of querying classes using examples from the `C/C++ cookbook <https://help.semmle.com/wiki/label/CBCPP/class>`__.
- Take a look at the :doc:`Analyzing data flow in C/C++ <dataflow>` tutorial.
- Try the worked examples in the following topics: :doc:`Example: Checking that constructors initialize all private fields <private-field-initialization>`, and :doc:`Example: Checking for allocations equal to 'strlen(string)' without space for a null terminator <zero-space-terminator>`.
- Take a look at the :doc:`Analyzing data flow in C and C++ <dataflow>` tutorial.
- Try the worked examples in the following topics: :doc:`Refining a query to account for edge cases <private-field-initialization>`, and :doc:`Detecting a potential buffer overflow <zero-space-terminator>`.
- Find out more about QL in the `QL language handbook <https://help.semmle.com/QL/ql-handbook/index.html>`__ and `QL language specification <https://help.semmle.com/QL/ql-spec/language.html>`__.
- Learn more about the query console in `Using the query console <https://lgtm.com/help/lgtm/using-query-console>`__.

@@ -1,13 +1,12 @@
Analyzing data flow in C/C++
============================
Analyzing data flow in C and C++
================================
Overview
--------
You can use data flow analysis to track the flow of potentially malicious or insecure data that can cause vulnerabilities in your codebase.
This topic describes how data flow analysis is implemented in the CodeQL libraries for C/C++ and includes examples to help you write your own data flow queries.
The following sections describe how to utilize the libraries for local data flow, global data flow, and taint tracking.
About data flow
---------------
For a more general introduction to modeling data flow, see :doc:`Introduction to data flow analysis with CodeQL <../intro-to-data-flow>`.
Data flow analysis computes the possible values that a variable can hold at various points in a program, determining how those values propagate through the program, and where they are used. In CodeQL, you can model both local data flow and global data flow. For more background information, see :doc:`Introduction to data flow analysis with CodeQL <../intro-to-data-flow>`.
Local data flow
---------------
@@ -299,7 +298,7 @@ Exercise 4: Using the answers from 2 and 3, write a query which finds all global
What next?
----------
- Try the worked examples in the following topics: :doc:`Example: Checking that constructors initialize all private fields <private-field-initialization>` and :doc:`Example: Checking for allocations equal to 'strlen(string)' without space for a null terminator <zero-space-terminator>`.
- Try the worked examples in the following topics: :doc:`Refining a query to account for edge cases <private-field-initialization>` and :doc:`Detecting a potential buffer overflow <zero-space-terminator>`.
- Find out more about QL in the `QL language handbook <https://help.semmle.com/QL/ql-handbook/index.html>`__ and `QL language specification <https://help.semmle.com/QL/ql-spec/language.html>`__.
- Learn more about the query console in `Using the query console <https://lgtm.com/help/lgtm/using-query-console>`__.

@@ -1,13 +1,10 @@
Tutorial: Expressions, types and statements
===========================================
Expressions, types, and statements in C and C++
===============================================
Overview
--------
You can use CodeQL to explore expressions, types, and statements in C and C++ code to find, for example, incorrect assignments.
This topic contains worked examples of how to write queries using the standard CodeQL library classes for C/C++ expressions, types, and statements.
Expressions and types
---------------------
Expressions and types in CodeQL
-------------------------------
Each part of an expression in C becomes an instance of the ``Expr`` class. For example, the C code ``x = x + 1`` becomes an ``AssignExpr``, an ``AddExpr``, two instances of ``VariableAccess`` and a ``Literal``. All of these CodeQL classes extend ``Expr``.
@@ -34,7 +31,7 @@ It is also worth noting that the query above would find this C code:
yPtr = NULL;
This is because the database contains a representation of the code base after the preprocessor transforms have run (for more information, see `Database generation <https://lgtm.com/help/lgtm/generate-database>`__). This means that any macro invocations, such as the ``NULL`` define used here, are expanded during the creation of the database. If you want to write queries about macros then there are some special library classes that have been designed specifically for this purpose (for example, the ``Macro``, ``MacroInvocation`` classes and predicates like ``Element.isInMacroExpansion()``). In this case, it is good that macros are expanded, but we do not want to find assignments to pointers.
This is because the database contains a representation of the code base after the preprocessor transforms have run. This means that any macro invocations, such as the ``NULL`` define used here, are expanded during the creation of the database. If you want to write queries about macros then there are some special library classes that have been designed specifically for this purpose (for example, the ``Macro``, ``MacroInvocation`` classes and predicates like ``Element.isInMacroExpansion()``). In this case, it is good that macros are expanded, but we do not want to find assignments to pointers. For more information, see `Database generation <https://lgtm.com/help/lgtm/generate-database>`__ on LGTM.com.
Finding assignments of 0 to an integer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -61,8 +58,8 @@ This checks that the left side of the assignment has a type that is some kind of
i = 0;
Statements
----------
Statements in CodeQL
--------------------
We can refine the query further using statements. In this case we use the class ``ForStmt``:
@@ -136,6 +133,6 @@ What next?
----------
- Explore other ways of finding types and statements using examples from the C/C++ cookbook for `types <https://help.semmle.com/wiki/label/CBCPP/type>`__ and `statements <https://help.semmle.com/wiki/label/CBCPP/statement>`__.
- Take a look at the :doc:`Conversions and classes <conversions-classes>` and :doc:`Analyzing data flow in C/C++ <dataflow>` tutorials.
- Take a look at the :doc:`Conversions and classes in C and C++ <conversions-classes>` and :doc:`Analyzing data flow in C and C++ <dataflow>` tutorials.
- Find out more about QL in the `QL language handbook <https://help.semmle.com/QL/ql-handbook/index.html>`__ and `QL language specification <https://help.semmle.com/QL/ql-spec/language.html>`__.
- Learn more about the query console in `Using the query console <https://lgtm.com/help/lgtm/using-query-console>`__.

@@ -1,10 +1,12 @@
Tutorial: Function classes
==========================
Functions in C and C++
=======================
You can use CodeQL to explore functions in C and C++ code.
Overview
--------
The standard CodeQL library for C and C++ represents functions using the ``Function`` class (see :doc:`Introducing the C/C++ libraries <introduce-libraries-cpp>`).
The standard CodeQL library for C and C++ represents functions using the ``Function`` class (see :doc:`CodeQL libraries for C and C++ <introduce-libraries-cpp>`).
The example queries in this topic explore some of the most useful library predicates for querying functions.
@@ -26,7 +28,7 @@ This query is very general, so there are probably too many results to be interes
Finding functions that are not called
-------------------------------------
It might be more interesting to find functions that are not called, using the standard CodeQL ``FunctionCall`` class from the **abstract syntax tree** category (see :doc:`Introducing the C/C++ libraries <introduce-libraries-cpp>`). The ``FunctionCall`` class can be used to identify places where a function is actually used, and it is related to ``Function`` through the ``FunctionCall.getTarget()`` predicate.
It might be more interesting to find functions that are not called, using the standard CodeQL ``FunctionCall`` class from the **abstract syntax tree** category (see :doc:`CodeQL libraries for C and C++ <introduce-libraries-cpp>`). The ``FunctionCall`` class can be used to identify places where a function is actually used, and it is related to ``Function`` through the ``FunctionCall.getTarget()`` predicate.
.. code-block:: ql
@@ -91,6 +93,6 @@ What next?
----------
- Explore other ways of finding functions using examples from the `C/C++ cookbook <https://help.semmle.com/wiki/label/CBCPP/function>`__.
- Take a look at some of the other tutorials: :doc:`Expressions, types and statements <expressions-types>`, :doc:`Conversions and classes <conversions-classes>`, and :doc:`Analyzing data flow in C/C++ <dataflow>`.
- Take a look at some other tutorials: :doc:`Expressions, types, and statements in C and C++ <expressions-types>`, :doc:`Conversions and classes in C and C++ <conversions-classes>`, and :doc:`Analyzing data flow in C and C++ <dataflow>`.
- Find out more about QL in the `QL language handbook <https://help.semmle.com/QL/ql-handbook/index.html>`__ and `QL language specification <https://help.semmle.com/QL/ql-spec/language.html>`__.
- Learn more about the query console in `Using the query console <https://lgtm.com/help/lgtm/using-query-console>`__.

@@ -1,8 +1,10 @@
Using the guards library in C and C++
=====================================
Overview
--------
You can use the CodeQL guards library to identify conditional expressions that control the execution of other parts of a program in C and C++ codebases.
About the guards library
------------------------
The guards library (defined in ``semmle.code.cpp.controlflow.Guards``) provides a class `GuardCondition <https://help.semmle.com/qldoc/cpp/semmle/code/cpp/controlflow/Guards.qll/type.Guards$GuardCondition.html>`__ representing Boolean values that are used to make control flow decisions.
A ``GuardCondition`` is considered to guard a basic block if the block can only be reached if the ``GuardCondition`` is evaluated a certain way. For instance, in the following code, ``x < 10`` is a ``GuardCondition``, and it guards all the code before the return statement.
@@ -20,7 +22,7 @@ A ``GuardCondition`` is considered to guard a basic block if the block can only
The ``controls`` predicate
------------------------------------------------
--------------------------
The ``controls`` predicate helps determine which blocks are only run when the ``GuardCondition`` evaluates a certain way. ``guard.controls(block, testIsTrue)`` holds if ``block`` is only entered if the value of this condition is ``testIsTrue``.
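As a minimal sketch of the kind of code this describes (the function ``describe`` and its strings are hypothetical, not taken from the library documentation), consider a condition with two branches. One would expect ``x < 10`` to be the guard condition here, controlling the first block with ``testIsTrue`` set to ``true`` and the second block with ``testIsTrue`` set to ``false``. The final ``return`` is reached on both paths, so neither outcome of the condition guards it.

```cpp
#include <string>

// Hypothetical example function: `x < 10` plays the role of the guard condition.
std::string describe(int x) {
    std::string result;
    if (x < 10) {
        // Only entered when `x < 10` evaluates to true.
        result = "small";
    } else {
        // Only entered when `x < 10` evaluates to false.
        result = "large";
    }
    // Reached on both paths, so the condition does not guard this statement.
    return result;
}
```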

@@ -1,10 +1,13 @@
Introducing the CodeQL libraries for C/C++
==========================================
CodeQL library for C and C++
============================
Overview
--------
When analyzing C or C++ code, you can use the large collection of classes in the CodeQL library for C and C++.
There is an extensive library for analyzing CodeQL databases extracted from C/C++ projects. The classes in this library present the data from a database in an object-oriented form and provide abstractions and predicates to help you with common analysis tasks. The library is implemented as a set of QL modules, that is, files with the extension ``.qll``. The module ``cpp.qll`` imports all the core C/C++ library modules, so you can include the complete library by beginning your query with:
About the CodeQL library for C and C++
--------------------------------------
There is an extensive library for analyzing CodeQL databases extracted from C/C++ projects. The classes in this library present the data from a database in an object-oriented form and provide abstractions and predicates to help you with common analysis tasks.
The library is implemented as a set of QL modules, that is, files with the extension ``.qll``. The module ``cpp.qll`` imports all the core C/C++ library modules, so you can include the complete library by beginning your query with:
.. code-block:: ql
@@ -12,9 +15,7 @@ There is an extensive library for analyzing CodeQL databases extracted from C/C+
The rest of this topic summarizes the available CodeQL classes and corresponding C/C++ constructs.
NOTE: You can find related classes and features using the query console's auto-complete feature. You can also press *F3* to jump to the definition of any element; library files are opened in new tabs in the console.
Summary of the library classes
Commonly-used library classes
------------------------------
The most commonly used standard library classes are listed below. The listing is broken down by functionality. Each library class is annotated with a C/C++ construct it corresponds to.
@@ -522,6 +523,6 @@ This table lists `Preprocessor <https://help.semmle.com/qldoc/cpp/semmle/code/cp
What next?
----------
- Experiment with the worked examples in the CodeQL for C/C++ topics: :doc:`Function classes <function-classes>`, :doc:`Expressions, types and statements <expressions-types>`, :doc:`Conversions and classes <conversions-classes>`, and :doc:`Analyzing data flow in C/C++ <dataflow>`.
- Experiment with the worked examples in the CodeQL for C/C++ topics: :doc:`Functions in C and C++ <function-classes>`, :doc:`Expressions, types, and statements in C and C++ <expressions-types>`, :doc:`Conversions and classes in C and C++ <conversions-classes>`, and :doc:`Analyzing data flow in C and C++ <dataflow>`.
- Find out more about QL in the `QL language handbook <https://help.semmle.com/QL/ql-handbook/index.html>`__ and `QL language specification <https://help.semmle.com/QL/ql-spec/language.html>`__.
- Learn more about the query console in `Using the query console <https://lgtm.com/help/lgtm/using-query-console>`__.

@@ -1,13 +1,15 @@
Example: Checking that constructors initialize all private fields
=================================================================
Refining a query to account for edge cases
==========================================
You can improve the results generated by a CodeQL query by adding conditions to remove false positive results caused by common edge cases.
Overview
--------
This topic describes how a C++ query was developed. The example introduces recursive predicates and demonstrates the typical workflow used to refine a query. For a full overview of the topics available for learning to write queries for C/C++ code, see :doc:`CodeQL for C/C++ <ql-for-cpp>`.
This topic describes how a C++ query was developed. The example introduces recursive predicates and demonstrates the typical workflow used to refine a query. For a full overview of the topics available for learning to write queries for C/C++ code, see :doc:`CodeQL for C and C++ <ql-for-cpp>`.
Problem—finding every private field and checking for initialization
-------------------------------------------------------------------
Finding every private field and checking for initialization
-----------------------------------------------------------
Writing a query to check if a constructor initializes all private fields seems like a simple problem, but there are several edge cases to account for.
@@ -100,7 +102,7 @@ You may also wish to consider methods called by constructors that assign to the
int m_value;
};
This case can be excluded by creating a recursive predicate. The recursive predicate is given a function and a field, then checks whether the function assigns to the field. The predicate runs itself on all the functions called by the function that it has been given. By passing the constructor to this predicate, we can check for assignments of a field in all functions called by the constructor, and then do the same for all functions called by those functions all the way down the tree of function calls (see `Recursion <https://help.semmle.com/QL/ql-handbook/recursion.html>`__ for more information).
This case can be excluded by creating a recursive predicate. The recursive predicate is given a function and a field, then checks whether the function assigns to the field. The predicate runs itself on all the functions called by the function that it has been given. By passing the constructor to this predicate, we can check for assignments of a field in all functions called by the constructor, and then do the same for all functions called by those functions all the way down the tree of function calls. For more information, see `Recursion <https://help.semmle.com/QL/ql-handbook/recursion.html>`__ in the QL language handbook.
.. code-block:: ql
@@ -124,7 +126,7 @@ This case can be excluded by creating a recursive predicate. The recursive predi
Refinement 4—simplifying the query
----------------------------------
Finally we can simplify the query by using the `transitive closure operator <https://help.semmle.com/QL/ql-handbook/recursion.html#transitive-closures>`__. In this final version of the query, ``c.calls*(fun)`` resolves to the set of all functions that are ``c`` itself, are called by ``c``, are called by a function that is called by ``c``, and so on. This eliminates the need to make a new predicate all together.
Finally, we can simplify the query by using the transitive closure operator. In this final version of the query, ``c.calls*(fun)`` resolves to the set of all functions that are ``c`` itself, are called by ``c``, are called by a function that is called by ``c``, and so on. This eliminates the need to define a new predicate altogether. For more information, see `Transitive closures <https://help.semmle.com/QL/ql-handbook/recursion.html#transitive-closures>`__ in the QL language handbook.
.. code-block:: ql
@@ -147,6 +149,6 @@ Finally we can simplify the query by using the `transitive closure operator <htt
What next?
----------
- Take a look at another example: :doc:`Checking for allocations equal to 'strlen(string)' without space for a null terminator <zero-space-terminator>`.
- Take a look at another example: :doc:`Detecting a potential buffer overflow <zero-space-terminator>`.
- Find out more about QL in the `QL language handbook <https://help.semmle.com/QL/ql-handbook/index.html>`__ and `QL language specification <https://help.semmle.com/QL/ql-spec/language.html>`__.
- Learn more about the query console in `Using the query console <https://lgtm.com/help/lgtm/using-query-console>`__.

@@ -1,9 +1,11 @@
CodeQL for C/C++
================
Learn how to write queries using the standard CodeQL libraries for C and C++.
.. toctree::
:glob:
:hidden:
:maxdepth: 1
introduce-libraries-cpp
function-classes
@@ -12,42 +14,10 @@ CodeQL for C/C++
dataflow
private-field-initialization
zero-space-terminator
These topics provide an overview of the CodeQL libraries for C/C++ and show examples of how to write queries that use them.
- `Basic C/C++ query <https://lgtm.com/help/lgtm/console/ql-cpp-basic-example>`__ describes how to write and run queries using LGTM.
- :doc:`Introducing the CodeQL libraries for C/C++ <introduce-libraries-cpp>` introduces the standard libraries used to write queries for C and C++ code.
- :doc:`Tutorial: Function classes <function-classes>` demonstrates how to write queries using the standard CodeQL library classes for C/C++ functions.
- :doc:`Tutorial: Expressions, types and statements <expressions-types>` demonstrates how to write queries using the standard CodeQL library classes for C/C++ expressions, types and statements.
- :doc:`Tutorial: Conversions and classes <conversions-classes>` demonstrates how to write queries using the standard CodeQL library classes for C/C++ conversions and classes.
- :doc:`Tutorial: Analyzing data flow in C/C++ <dataflow>` demonstrates how to write queries using the standard data flow and taint tracking libraries for C/C++.
- :doc:`Example: Checking that constructors initialize all private fields <private-field-initialization>` works through the development of a query. It introduces recursive predicates and shows the typical workflow used to refine a query.
- :doc:`Example: Checking for allocations equal to strlen(string) without space for a null terminator <zero-space-terminator>` shows how a query to detect this particular buffer issue was developed.
Advanced libraries
----------------------------------
.. toctree::
:hidden:
guards
range-analysis
value-numbering-hash-cons
- :doc:`Using the guards library in C and C++ <guards>` demonstrates how to identify conditional expressions that control the execution of other code and what guarantees they provide.
- :doc:`Using range analysis for C and C++ <range-analysis>` demonstrates how to determine constant upper and lower bounds and possible overflow or underflow of expressions.
- :doc:`Using hash consing and value numbering for C and C++ <value-numbering-hash-cons>` demonstrates how to recognize expressions that are syntactically identical or compute the same value at runtime.
Other resources
---------------

@@ -1,10 +1,10 @@
Using range analysis for C and C++
==================================
Overview
--------
You can use range analysis to determine the upper or lower bounds on an expression, or whether an expression could potentially overflow or underflow.
Range analysis determines upper and lower bounds for an expression.
About the range analysis library
--------------------------------
The range analysis library (defined in ``semmle.code.cpp.rangeanalysis.SimpleRangeAnalysis``) provides a set of predicates for determining constant upper and lower bounds on expressions, as well as recognizing integer overflows. For performance, the library performs automatic widening and therefore may not provide the tightest possible bounds.
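To make the idea of constant bounds concrete, here is a small hypothetical C++ function with bounds worked out by hand in the comments. The function name ``scaledLevel`` is an assumption made for illustration, and the bounds are not output from the library. Because of widening, the library may report wider bounds than the ones shown.

```cpp
// Hypothetical example: the comments give bounds that can be derived by hand.
int scaledLevel(unsigned char raw) {
    int level = raw;   // level is in [0, 255], assuming an 8-bit unsigned char
    if (level > 100) {
        level = 100;   // level is exactly 100 on this path
    }
    // After the branch, level is in [0, 100], so level * 2 is in [0, 200]
    // and the multiplication cannot overflow an int.
    return level * 2;
}
```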

@@ -1,15 +1,14 @@
Hash consing and value numbering
=================================================
================================
Overview
--------
You can use specialized CodeQL libraries to recognize expressions that are syntactically identical or compute the same value at runtime in C and C++ codebases.
About the hash consing and value numbering libraries
----------------------------------------------------
In C and C++ databases, each node in the abstract syntax tree is represented by a separate object. This allows both analysis and results display to refer to specific appearances of a piece of syntax. However, it is frequently useful to determine whether two expressions are equivalent, either syntactically or semantically.
The `hash consing <https://en.wikipedia.org/wiki/Hash_consing>`__ library (defined in ``semmle.code.cpp.valuenumbering.HashCons``) provides a mechanism for identifying expressions that have the same syntactic structure. The `global value numbering <https://en.wikipedia.org/wiki/Value_numbering>`__ library (defined in ``semmle.code.cpp.valuenumbering.GlobalValueNumbering``) provides a mechanism for identifying expressions that compute the same value at runtime.
Both libraries partition the expressions in each function into equivalence classes represented by objects. Each ``HashCons`` object represents a set of expressions with identical parse trees, while ``GVN`` objects represent sets of expressions that will always compute the same value.
The hash consing library (defined in ``semmle.code.cpp.valuenumbering.HashCons``) provides a mechanism for identifying expressions that have the same syntactic structure. The global value numbering library (defined in ``semmle.code.cpp.valuenumbering.GlobalValueNumbering``) provides a mechanism for identifying expressions that compute the same value at runtime. Both libraries partition the expressions in each function into equivalence classes represented by objects. Each ``HashCons`` object represents a set of expressions with identical parse trees, while ``GVN`` objects represent sets of expressions that will always compute the same value. For more information, see `Hash consing <https://en.wikipedia.org/wiki/Hash_consing>`__ and `Value numbering <https://en.wikipedia.org/wiki/Value_numbering>`__ on Wikipedia.
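The difference between the two notions of equivalence can be sketched with a short hypothetical C++ function (``compare`` and its variable names are illustrative assumptions). One would expect hash consing to group expressions (1) and (3), which have the same syntactic structure, and global value numbering to group (1) and (2), which compute the same value.

```cpp
// Hypothetical example contrasting syntactic and semantic equivalence.
int compare(int a, int b) {
    int first = a + b;      // (1)
    int copy = a;
    int second = copy + b;  // (2) different syntax from (1), same value
    a = a + 1;
    int third = a + b;      // (3) same syntax as (1), different value
    // Returns 1 when (2) matches (1) and (3) is one greater than (1).
    return (first == second) && (third == first + 1);
}
```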
Example C code
--------------
@@ -111,4 +110,3 @@ Example query
hashCons(outer.getCondition()) = hashCons(inner.getCondition())
select inner.getCondition(), "The condition of this if statement duplicates the condition of $@",
outer.getCondition(), "an enclosing if statement"

@@ -1,5 +1,7 @@
Example: Checking for allocations equal to ``strlen(string)`` without space for a null terminator
=================================================================================================
Detecting a potential buffer overflow
=====================================
You can use CodeQL to detect potential buffer overflows in C and C++ by checking for allocations whose size equals ``strlen(string)``, which leaves no space for the null terminator.
Overview
--------
@@ -98,7 +100,7 @@ When you have defined the basic query then you can refine the query to include f
Improving the query using the 'SSA' library
-------------------------------------------
The ``SSA`` library represents variables in `static single assignment <http://en.wikipedia.org/wiki/Static_single_assignment_form>`__ (SSA) form. In this form, each variable is assigned exactly once and every variable is defined before it is used. The use of SSA variables simplifies queries considerably as much of the local data flow analysis has been done for us.
The ``SSA`` library represents variables in static single assignment (SSA) form. In this form, each variable is assigned exactly once and every variable is defined before it is used. The use of SSA variables simplifies queries considerably as much of the local data flow analysis has been done for us. For more information, see `Static single assignment <http://en.wikipedia.org/wiki/Static_single_assignment_form>`__ on Wikipedia.
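As a rough illustration of SSA form, the hypothetical function below assigns to ``len`` twice. The names ``len_1`` and ``len_2`` in the comments are illustrative notation for the two SSA definitions, not identifiers produced by the library. The allocation uses the second definition, which already includes space for the null terminator.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical example: the string length is stored in a variable before use.
char* duplicate(const char* s) {
    std::size_t len = std::strlen(s);  // SSA view: len_1 = strlen(s)
    len = len + 1;                     // SSA view: len_2 = len_1 + 1 (room for '\0')
    char* buf = static_cast<char*>(std::malloc(len));  // uses len_2, not len_1
    if (buf != nullptr) {
        std::memcpy(buf, s, len);      // copies the characters and the terminator
    }
    return buf;                        // the caller releases it with std::free
}
```

Had the allocation used the first definition instead, the buffer would be one byte too small, which is the pattern this topic's query looks for.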
Including examples where the string size is stored before use
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~