Merge pull request #2257 from jf205/codeql-training-material

docs: update terminology in training material
This commit is contained in:
shati-patel
2019-11-06 14:43:55 +00:00
committed by GitHub
25 changed files with 178 additions and 428 deletions

View File

@@ -1,19 +1,19 @@
QL training and variant analysis examples
#########################################
CodeQL training and variant analysis examples
=============================================
QL and variant analysis
=======================
CodeQL and variant analysis
---------------------------
`Variant analysis <https://semmle.com/variant-analysis>`__ is the process of using a known vulnerability as a seed to find similar problems in your code. Security engineers typically perform variant analysis to identify possible vulnerabilities and to ensure that these threats are properly fixed across multiple code bases.
`QL <https://semmle.com/ql>`__ is Semmle's variant analysis engine, and it is also the technology that underpins LGTM, Semmle's community driven security analysis platform. Together, QL and LGTM provide continuous monitoring and scalable variant analysis for your projects, even if you don't have your own team of dedicated security engineers. You can read more about using QL and LGTM in variant analysis in the `Semmle blog <https://blog.semmle.com/tags/variant-analysis>`__.
`CodeQL <https://semmle.com/ql>`__ is the code analysis engine that underpins LGTM, Semmle's community driven security analysis platform. Together, CodeQL and LGTM provide continuous monitoring and scalable variant analysis for your projects, even if you don't have your own team of dedicated security engineers. You can read more about using CodeQL and LGTM in variant analysis in the `Semmle blog <https://blog.semmle.com/tags/variant-analysis>`__.
The QL language is easy to learn, and exploring code using QL is the most efficient way to perform variant analysis.
CodeQL is easy to learn, and exploring code using CodeQL is the most efficient way to perform variant analysis.
Learning QL for variant analysis
================================
Learning CodeQL for variant analysis
------------------------------------
Start learning how to use QL in variant analysis for a specific language by looking at the topics below. Each topic links to a short presentation on the QL language, QL libraries, or an example variant discovered using QL.
Start learning how to use CodeQL in variant analysis for a specific language by looking at the topics below. Each topic links to a short presentation on CodeQL, its libraries, or an example variant discovered using CodeQL.
.. |arrow-l| unicode:: U+2190
@@ -24,7 +24,7 @@ Start learning how to use QL in variant analysis for a specific language by look
When you have selected a presentation, use |arrow-r| and |arrow-l| to navigate between slides.
Press **p** to view the additional notes on slides that have an information icon |info| in the top right corner, and press **f** to enter full-screen mode.
The presentations contain a number of QL query examples.
The presentations contain a number of query examples.
We recommend that you download `QL for Eclipse <https://help.semmle.com/ql-for-eclipse/Content/WebHelp/home-page.html>`__ and import the example snapshot for each presentation so that you can find the bugs mentioned in the slides.
@@ -32,35 +32,35 @@ We recommend that you download `QL for Eclipse <https://help.semmle.com/ql-for-e
Information
The presentations listed below are used in QL language and variant analysis training sessions run by Semmle engineers.
The presentations listed below are used in CodeQL and variant analysis training sessions run by Semmle engineers.
Therefore, be aware that the slides are designed to be presented by an instructor.
If you are using the slides without an instructor, please use the additional notes to help guide you through the examples.
QL and variant analysis for C/C++
---------------------------------
CodeQL and variant analysis for C/C++
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- `Introduction to variant analysis: QL for C/C++ <../ql-training/cpp/intro-ql-cpp.html>`__ - an introduction to variant analysis and QL for C/C++ programmers.
- `Introduction to variant analysis: CodeQL for C/C++ <../ql-training/cpp/intro-ql-cpp.html>`__ - an introduction to variant analysis and CodeQL for C/C++ programmers.
- `Example: Bad overflow guard <../ql-training/cpp/bad-overflow-guard.html>`__ - an example of iterative query development to find bad overflow guards in a C++ project.
- `Program representation: QL for C/C++ <../ql-training/cpp/program-representation-cpp.html>`__ - information on how QL analysis represents C/C++ programs.
- `Introduction to local data flow <../ql-training/cpp/data-flow-cpp.html>`__ - an introduction to analyzing local data flow in C/C++ using QL, including an example demonstrating how to develop a query to find a real CVE.
- `Program representation: CodeQL for C/C++ <../ql-training/cpp/program-representation-cpp.html>`__ - information on how CodeQL analysis represents C/C++ programs.
- `Introduction to local data flow <../ql-training/cpp/data-flow-cpp.html>`__ - an introduction to analyzing local data flow in C/C++ using CodeQL, including an example demonstrating how to develop a query to find a real CVE.
- `Exercise: snprintf overflow <../ql-training/cpp/snprintf.html>`__ - an example demonstrating how to develop a data flow query.
- `Introduction to global data flow <../ql-training/cpp/global-data-flow-cpp.html>`__ - an introduction to analyzing global data flow in C/C++ using QL.
- `Analyzing control flow: QL for C/C++ <../ql-training/cpp/control-flow-cpp.html>`__ - an introduction to analyzing control flow in C/C++ using QL.
- `Introduction to global data flow <../ql-training/cpp/global-data-flow-cpp.html>`__ - an introduction to analyzing global data flow in C/C++ using CodeQL.
- `Analyzing control flow: CodeQL for C/C++ <../ql-training/cpp/control-flow-cpp.html>`__ - an introduction to analyzing control flow in C/C++ using CodeQL.
QL and variant analysis for Java
--------------------------------
CodeQL and variant analysis for Java
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- `Introduction to variant analysis: QL for Java <../ql-training/java/intro-ql-java.html>`__ - an introduction to variant analysis and QL for Java programmers.
- `Introduction to variant analysis: CodeQL for Java <../ql-training/java/intro-ql-java.html>`__ - an introduction to variant analysis and CodeQL for Java programmers.
- `Example: Query injection <../ql-training/java/query-injection-java.html>`__ - an example of iterative query development to find unsanitized SPARQL injections in a Java project.
- `Program representation: QL for Java <../ql-training/java/program-representation-java.html>`__ - information on how QL analysis represents Java programs.
- `Introduction to local data flow <../ql-training/java/data-flow-java.html>`__ - an introduction to analyzing local data flow in Java using QL, including an example demonstrating how to develop a query to find a real CVE.
- `Program representation: CodeQL for Java <../ql-training/java/program-representation-java.html>`__ - information on how CodeQL analysis represents Java programs.
- `Introduction to local data flow <../ql-training/java/data-flow-java.html>`__ - an introduction to analyzing local data flow in Java using CodeQL, including an example demonstrating how to develop a query to find a real CVE.
- `Exercise: Apache Struts <../ql-training/java/apache-struts-java.html>`__ - an example demonstrating how to develop a data flow query.
- `Introduction to global data flow <../ql-training/java/global-data-flow-java.html>`__ - an introduction to analyzing global data flow in Java using QL.
- `Introduction to global data flow <../ql-training/java/global-data-flow-java.html>`__ - an introduction to analyzing global data flow in Java using CodeQL.
More resources
--------------
~~~~~~~~~~~~~~
- If you are completely new to QL, look at our introductory topics in :ref:`Getting started <getting-started>`.
- To find more detailed information about how to write QL queries for specific languages, visit the links in :ref:`Writing QL queries <writing-ql-queries>`.
- To read more about how QL queries have been used in Semmle's security research, and to read about new QL developments, visit the `Semmle blog <https://blog.semmle.com>`__.
- If you are completely new to CodeQL, look at our introductory topics in :doc:`Learning CodeQL <index>`.
- To find more detailed information about how to write queries for specific languages, visit the links in :ref:`Writing CodeQL queries <writing-ql-queries>`.
- To read more about how CodeQL queries have been used in Semmle's security research, and to read about new CodeQL developments, visit the `Semmle blog <https://blog.semmle.com>`__.
- Find more examples of queries written by Semmle's own security researchers in the `Semmle Demos repository <https://github.com/semmle/demos>`__ on GitHub.

View File

@@ -485,6 +485,7 @@ ul {
margin-left: 2.2em;
margin-bottom: 1em;
position: relative;
width: 90%;
}
/* line 300, ../scss/default.scss */
ul li {
@@ -1300,13 +1301,13 @@ aside.gdbar img {
.title-slide hgroup h1 {
font-size: 2em;
line-height: 1.4;
/*letter-spacing: -3px;*/
color: white;
margin: auto;
display: block;
position: absolute;
top: 0;
bottom: 10%;
left: 1.25em;
height: 0;
}
/* line 898, ../scss/default.scss */
@@ -1430,31 +1431,19 @@ hgroup .pre {
color: #5c31ff;
}
/* title slide (deck title, subtitle, semmle logo)*/
/* title slide (deck title, subtitle)*/
.title-slide {
background-image: url("../../title-slide.svg");
background-size: cover;
}
.semmle-logo sup {
vertical-align: super;
font-size: 0.3em;
font-weight: 100;
}
.title-slide .semmle-logo {
color: white;
font-size: 1.2em;
position: absolute;
top: 10%;
}
.title-slide p {
color: white;
font-size: 1em;
position: absolute;
bottom: 30%;
left: 2.6em;
}
.title-slide hgroup .pre {
@@ -1464,6 +1453,7 @@ hgroup .pre {
.subheading {
position: absolute;
top: 62.5%;
left: 0;
}
.subheading p {
@@ -1569,7 +1559,7 @@ p.first.admonition-title {
text-align: left;
font-size: 0.8em;
width: 100%;
overflow: scroll;
overflow: auto;
border: 1px solid black;
}
@@ -1608,7 +1598,7 @@ p.first.admonition-title {
display: block;
position: fixed;
top: 0;
right: -1%;
right: 0;
font-size: 1.2em;
}

File diff suppressed because one or more lines are too long

View File

@@ -1,6 +1,6 @@
# -*- coding: utf-8 -*-
#
# QL training slides build configuration file
# CodeQL training slides build configuration file
#
# This file is execfile()d with the current directory set to its
# containing dir.
@@ -59,7 +59,7 @@ highlight_language = 'ql'
master_doc = 'index'
# General information about the project.
project = u'QL training and variant analysis examples'
project = u'CodeQL training and variant analysis examples'
# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
@@ -77,18 +77,18 @@ slide_theme_path = ["_static-training/"]
# The name for this set of Sphinx documents. If None, it defaults to
# "<project> v<release> documentation".
html_title = 'QL training and variant analysis examples'
html_title = 'CodeQL training and variant analysis examples'
# Output file base name for HTML help builder.
htmlhelp_basename = 'QL training'
htmlhelp_basename = 'CodeQL training'
# The Semmle version info for the current release you're documenting, acts as replacement for
# |version| and |release|, also used in various other places throughout the
# built documents.
#
# The short X.Y version.
version = u'1.21'
version = u'1.22'
# The full version, including alpha/beta/rc tags.
release = u'1.21'
release = u'1.22'
copyright = u'2019 Semmle Ltd'
author = u'Semmle Ltd'

View File

@@ -2,11 +2,7 @@
Example: Bad overflow guard
===========================
QL for C/C++
.. container:: semmle-logo
Semmle :sup:`TM`
CodeQL for C/C++
.. rst-class:: setup
@@ -16,7 +12,7 @@ Setup
For this example you should download:
- `QL for Eclipse <https://help.semmle.com/ql-for-eclipse/Content/WebHelp/install-plugin-free.html>`__
- `ChakraCore snapshot <https://downloads.lgtm.com/snapshots/cpp/microsoft/chakracore/ChakraCore-revision-2017-April-12--18-13-26.zip>`__
- `ChakraCore database <https://downloads.lgtm.com/snapshots/cpp/microsoft/chakracore/ChakraCore-revision-2017-April-12--18-13-26.zip>`__
.. note::
@@ -24,9 +20,9 @@ For this example you should download:
You can query the project in `the query console <https://lgtm.com/query/project:2034240708/lang:cpp/>`__ on LGTM.com.
.. insert snapshot-note.rst to explain differences between snapshot available to download and the version available in the query console.
.. insert database-note.rst to explain differences between database available to download and the version available in the query console.
.. include:: ../slide-snippets/snapshot-note.rst
.. include:: ../slide-snippets/database-note.rst
.. resume slides
@@ -127,13 +123,13 @@ This happens even though the overflow check passed!
.. rst-class:: background2
Developing a QL query
=====================
Developing a CodeQL query
=========================
Finding bad overflow guards
QL query: bad overflow guards
=============================
CodeQL query: bad overflow guards
==================================
Let's look for overflow guards of the form ``v + b < v``, using the classes
``AddExpr``, ``Variable`` and ``RelationalOperation`` from the ``cpp`` library.
@@ -153,10 +149,10 @@ Lets look for overflow guards of the form ``v + b < v``, using the classes
- a ``RelationalOperation``: the overflow comparison check.
- a ``Variable``: used as an argument to both the addition and comparison.
- The ``where`` part of the query ties these three QL variables together using `predicates <https://help.semmle.com/QL/ql-handbook/predicates.html>`__ defined in the `standard QL for C/C++ library <https://help.semmle.com/qldoc/cpp/>`__.
- The ``where`` part of the query ties these three variables together using `predicates <https://help.semmle.com/QL/ql-handbook/predicates.html>`__ defined in the `standard CodeQL for C/C++ library <https://help.semmle.com/qldoc/cpp/>`__.
QL query: bad overflow guards
=============================
CodeQL query: bad overflow guards
=================================
We want to ensure the operands being added have size less than 4 bytes.
@@ -180,8 +176,8 @@ We can get the size (in bytes) of a type using the ``getSize()`` method.
- We therefore write a helper predicate for small expressions.
- This predicate effectively represents the set of all expressions in the database where the size of the type of the expression is less than 4 bytes, that is, less than 32-bits.
QL query: bad overflow guards
=============================
CodeQL query: bad overflow guards
==================================
We can ensure the operands being added have size less than 4 bytes, using our new predicate.
@@ -216,8 +212,8 @@ Now our query becomes:
- The “range” part, ``op = a.getAnOperand()``, restricts ``op`` to being one of the two operands to the addition.
- The “condition” part, ``isSmall(op)``, says that the ``forall`` holds only if the condition (that the ``op`` is small) holds for everything in the range, that is, both the arguments to the addition.
QL query: bad overflow guards
=============================
CodeQL query: bad overflow guards
=================================
Sometimes the result of the addition is cast to a small type of size less than 4 bytes, preventing automatic widening. We don't want our query to flag these instances.
@@ -233,4 +229,4 @@ The final query
.. literalinclude:: ../query-examples/cpp/bad-overflow-guard-3.ql
:language: ql
This query finds a single result in our historic snapshot, which was `a genuine bug in ChakraCore <https://github.com/Microsoft/ChakraCore/commit/2500e1cdc12cb35af73d5c8c9b85656aba6bab4d>`__.
This query finds a single result in our historic database, which was `a genuine bug in ChakraCore <https://github.com/Microsoft/ChakraCore/commit/2500e1cdc12cb35af73d5c8c9b85656aba6bab4d>`__.

View File

@@ -2,11 +2,7 @@
Analyzing control flow
======================
QL for C/C++
.. container:: semmle-logo
Semmle :sup:`TM`
CodeQL for C/C++
.. Include information slides here
@@ -18,7 +14,7 @@ Setup
For this example you should download:
- `QL for Eclipse <https://help.semmle.com/ql-for-eclipse/Content/WebHelp/install-plugin-free.html>`__
- `ChakraCore snapshot <https://downloads.lgtm.com/snapshots/cpp/microsoft/chakracore/ChakraCore-revision-2017-April-12--18-13-26.zip>`__
- `ChakraCore database <https://downloads.lgtm.com/snapshots/cpp/microsoft/chakracore/ChakraCore-revision-2017-April-12--18-13-26.zip>`__
.. note::
@@ -26,9 +22,9 @@ For this example you should download:
You can query the project in `the query console <https://lgtm.com/query/project:2034240708/lang:cpp/>`__ on LGTM.com.
.. insert snapshot-note.rst to explain differences between snapshot available to download and the version available in the query console.
.. insert database-note.rst to explain differences between database available to download and the version available in the query console.
.. include:: ../slide-snippets/snapshot-note.rst
.. include:: ../slide-snippets/database-note.rst
.. resume slides
@@ -93,7 +89,7 @@ Control flow graphs
Modeling control flow
=====================
The control flow is modeled with a QL class, ``ControlFlowNode``. Examples of control flow nodes include statements and expressions.
The control flow is modeled with a CodeQL class, ``ControlFlowNode``. Examples of control flow nodes include statements and expressions.
- ``ControlFlowNode`` provides API for traversing the control flow graph:
@@ -226,7 +222,7 @@ A ``GuardCondition`` is a ``Boolean`` condition that controls one or more basic
Further materials
=================
- QL for C/C++: https://help.semmle.com/QL/learn-ql/ql/cpp/ql-for-cpp.html
- CodeQL for C/C++: https://help.semmle.com/QL/learn-ql/ql/cpp/ql-for-cpp.html
- API reference: https://help.semmle.com/qldoc/cpp
.. rst-class:: end-slide

View File

@@ -4,10 +4,6 @@ Introduction to data flow
Finding string formatting vulnerabilities in C/C++
.. container:: semmle-logo
Semmle :sup:`TM`
.. rst-class:: setup
Setup
@@ -16,7 +12,7 @@ Setup
For this example you should download:
- `QL for Eclipse <https://help.semmle.com/ql-for-eclipse/Content/WebHelp/install-plugin-free.html>`__
- `dotnet/coreclr snapshot <http://downloads.lgtm.com/snapshots/cpp/dotnet/coreclr/dotnet_coreclr_fbe0c77.zip>`__
- `dotnet/coreclr database <http://downloads.lgtm.com/snapshots/cpp/dotnet/coreclr/dotnet_coreclr_fbe0c77.zip>`__
.. note::
@@ -24,9 +20,9 @@ For this example you should download:
You can query the project in `the query console <https://lgtm.com/query/projects:1505958977333/lang:cpp/>`__ on LGTM.com.
.. insert snapshot-note.rst to explain differences between snapshot available to download and the version available in the query console.
.. insert database-note.rst to explain differences between database available to download and the version available in the query console.
.. include:: ../slide-snippets/snapshot-note.rst
.. include:: ../slide-snippets/database-note.rst
.. resume slides
@@ -86,9 +82,9 @@ Write a query that flags ``printf`` calls where the format argument is not a ``S
.. note::
This first query is about finding places where the format specifier is not a constant string. In QL for C/C++, constant strings are modeled as ``StringLiteral`` nodes, so we are looking for calls to format functions where the format specifier argument is not a string literal.
This first query is about finding places where the format specifier is not a constant string. In the CodeQL libraries for C/C++, constant strings are modeled as ``StringLiteral`` nodes, so we are looking for calls to format functions where the format specifier argument is not a string literal.
The `C/C++ standard libraries <https://help.semmle.com/qldoc/cpp/>`__ include many different formatting functions that may be vulnerable to this particular attack, including ``printf``, ``snprintf``, and others. Furthermore, each of these different formatting functions may include the format string in a different position in the argument list. Instead of laboriously listing all these different variants, we can make use of the QL for C/C++ standard library class ``FormattingFunction``, which provides an interface that models common formatting functions in C/C++.
The `C/C++ standard libraries <https://help.semmle.com/qldoc/cpp/>`__ include many different formatting functions that may be vulnerable to this particular attack, including ``printf``, ``snprintf``, and others. Furthermore, each of these different formatting functions may include the format string in a different position in the argument list. Instead of laboriously listing all these different variants, we can make use of the standard CodeQL class ``FormattingFunction``, which provides an interface that models common formatting functions in C/C++.
Meh...
======

View File

@@ -2,11 +2,7 @@
Introduction to global data flow
================================
QL for C/C++
.. container:: semmle-logo
Semmle :sup:`TM`
CodeQL for C/C++
.. rst-class:: setup
@@ -16,7 +12,7 @@ Setup
For this example you should download:
- `QL for Eclipse <https://help.semmle.com/ql-for-eclipse/Content/WebHelp/install-plugin-free.html>`__
- `dotnet/coreclr snapshot <http://downloads.lgtm.com/snapshots/cpp/dotnet/coreclr/dotnet_coreclr_fbe0c77.zip>`__
- `dotnet/coreclr database <http://downloads.lgtm.com/snapshots/cpp/dotnet/coreclr/dotnet_coreclr_fbe0c77.zip>`__
.. note::
@@ -24,9 +20,9 @@ For this example you should download:
You can query the project in `the query console <https://lgtm.com/query/projects:1505958977333/lang:cpp/>`__ on LGTM.com.
.. insert snapshot-note.rst to explain differences between snapshot available to download and the version available in the query console.
.. insert database-note.rst to explain differences between database available to download and the version available in the query console.
.. include:: ../slide-snippets/snapshot-note.rst
.. include:: ../slide-snippets/database-note.rst
.. resume slides
@@ -77,7 +73,7 @@ The library class ``SecurityOptions`` provides a (configurable) model of what co
.. note::
We first define what it means to be a *source* of tainted data for this particular problem. In this case, what we care about is whether the format string can be provided by an external user to our application or service. As there are many such ways external data could be introduced into the system, the standard QL libraries for C/C++ include an extensible API for modeling user input. In this case, we will simply use the predefined set of *user inputs*, which includes arguments provided to command line applications.
We first define what it means to be a *source* of tainted data for this particular problem. In this case, what we care about is whether the format string can be provided by an external user to our application or service. As there are many such ways external data could be introduced into the system, the standard CodeQL libraries for C/C++ include an extensible API for modeling user input. In this case, we will simply use the predefined set of *user inputs*, which includes arguments provided to command line applications.
Defining sinks (exercise)

View File

@@ -2,11 +2,7 @@
Introduction to variant analysis
================================
QL for C/C++
.. container:: semmle-logo
Semmle :sup:`TM`
CodeQL for C/C++
.. rst-class:: setup
@@ -16,7 +12,7 @@ Setup
For this example you should download:
- `QL for Eclipse <https://help.semmle.com/ql-for-eclipse/Content/WebHelp/install-plugin-free.html>`__
- `exiv2 snapshot <http://downloads.lgtm.com/snapshots/cpp/exiv2/Exiv2_exiv2_b090f4d.zip>`__
- `exiv2 database <http://downloads.lgtm.com/snapshots/cpp/exiv2/Exiv2_exiv2_b090f4d.zip>`__
.. note::
@@ -24,9 +20,9 @@ For this example you should download:
You can also query the project in `the query console <https://lgtm.com/query/project:1506532406873/lang:cpp/>`__ on LGTM.com.
.. insert snapshot-note.rst to explain differences between snapshot available to download and the version available in the query console.
.. insert database-note.rst to explain differences between database available to download and the version available in the query console.
.. include:: ../slide-snippets/snapshot-note.rst
.. include:: ../slide-snippets/database-note.rst
.. resume slides
@@ -56,14 +52,14 @@ Oops
.. note::
Here's a simple (artificial) bug, which we'll develop a QL query to catch.
Here's a simple (artificial) bug, which we'll develop a query to catch.
This function writes a value to a given location in an array, first trying to do a bounds check to validate that the location is within bounds. However, the return statement has been commented out, leaving a redundant if statement and no bounds checking.
This case can act as our “patient zero” in the variant analysis game.
A simple QL query
=================
A simple CodeQL query
=====================
.. literalinclude:: ../query-examples/cpp/empty-if-cpp.ql
:language: ql
@@ -72,9 +68,9 @@ A simple QL query
We are going to write a simple query which finds “if statements” with empty “then” blocks, so we can highlight the results like those on the previous slide. The query can be run in the `query console on LGTM <https://lgtm.com/query>`__, or in your `IDE <https://lgtm.com/help/lgtm/running-queries-ide>`__.
A `QL query <https://help.semmle.com/QL/ql-handbook/queries.html>`__ consists of a “select” clause that indicates what results should be returned. Typically it will also provide a “from” clause to declare some variables, and a “where” clause to state conditions over those variables. For more information on the structure of query files (including links to useful topics in the `QL language handbook <https://help.semmle.com/QL/ql-handbook/index.html>`__), see `Introduction to query files <https://help.semmle.com/QL/learn-ql/ql/writing-queries/introduction-to-queries.html>`__.
A `query <https://help.semmle.com/QL/ql-handbook/queries.html>`__ consists of a “select” clause that indicates what results should be returned. Typically it will also provide a “from” clause to declare some variables, and a “where” clause to state conditions over those variables. For more information on the structure of query files (including links to useful topics in the `QL language handbook <https://help.semmle.com/QL/ql-handbook/index.html>`__), see `Introduction to query files <https://help.semmle.com/QL/learn-ql/ql/writing-queries/introduction-to-queries.html>`__.
In our example here, the first line of the query imports the `C/C++ standard QL library <https://help.semmle.com/qldoc/cpp/>`__, which defines concepts like ``IfStmt`` and ``Block``.
In our example here, the first line of the query imports the `CodeQL library for C/C++ <https://help.semmle.com/qldoc/cpp/>`__, which defines concepts like ``IfStmt`` and ``Block``.
The query proper starts by declaring two variables: ifStmt and block. These variables represent sets of values in the database, according to the type of each of the variables. For example, ifStmt has the type IfStmt, which means it represents the set of all if statements in the program.
If we simply selected these two variables::
@@ -97,8 +93,8 @@ A simple QL query
Structure of a QL query
=======================
Structure of a query
====================
A **query file** has the extension ``.ql`` and contains a **query clause**, and optionally **predicates**, **classes**, and **modules**.
@@ -110,14 +106,14 @@ Each query library also implicitly defines a module.
.. note::
QL queries are always contained in query files with the file extension ``.ql``. `Quick queries <https://help.semmle.com/ql-for-eclipse/Content/WebHelp/quick-query.html>`__, run in `QL for Eclipse <https://help.semmle.com/ql-for-eclipse/Content/WebHelp/home-page.html>`__, are no exception: the quick query window maintains a temporary QL file in the background.
Queries are always contained in query files with the file extension ``.ql``. `Quick queries <https://help.semmle.com/ql-for-eclipse/Content/WebHelp/quick-query.html>`__, run in `QL for Eclipse <https://help.semmle.com/ql-for-eclipse/Content/WebHelp/home-page.html>`__, are no exception: the quick query window maintains a temporary query file in the background.
Parts of queries can be lifted into `QL library files <https://help.semmle.com/QL/ql-handbook/modules.html#library-modules>`__ with the extension ``qll``. Definitions within such libraries can be brought into scope using ``import`` statements, and similarly QLL files can import each other's definitions using “import” statements.
Parts of queries can be lifted into `library files <https://help.semmle.com/QL/ql-handbook/modules.html#library-modules>`__ with the extension ``.qll``. Definitions within such libraries can be brought into scope using ``import`` statements, and similarly QLL files can import each other's definitions using “import” statements.
Logic can be encapsulated as user-defined `predicates <https://help.semmle.com/QL/ql-handbook/predicates.html>`__ and `classes <https://help.semmle.com/QL/ql-handbook/types.html#classes>`__, and organized into `modules <https://help.semmle.com/QL/ql-handbook/modules.html>`__. Each QLL file implicitly defines a module, but QL and QLL files can also contain explicit module definitions, as we will see later.
Predicates in QL
================
Predicates
==========
A predicate allows you to pull out and name parts of a query.
@@ -135,7 +131,7 @@ A predicate allows you to pull out and name parts of a query.
.. note::
A QL predicate takes zero or more parameters, and its body is a condition on those parameters. The predicate may (or may not) hold. Predicates may also be recursive, simply by referring to themselves (directly or indirectly).
A `predicate <https://help.semmle.com/QL/ql-handbook/predicates.html>`__ takes zero or more parameters, and its body is a condition on those parameters. The predicate may (or may not) hold. Predicates may also be `recursive <https://help.semmle.com/QL/ql-handbook/predicates.html#recursive-predicates>`__, simply by referring to themselves (directly or indirectly).
You can imagine a predicate to be a self-contained from-where-select statement, that produces an intermediate relation, or table. In this case, the ``isEmpty`` predicate will be the set of all blocks which are empty.
@@ -198,7 +194,7 @@ Iterative query refinement
.. note::
QL makes it very easy to experiment with analysis ideas. A common workflow is to start with a simple query (like our “redundant if-statement” example), examine a few results, refine the query based on any patterns that emerge and repeat.
CodeQL makes it very easy to experiment with analysis ideas. A common workflow is to start with a simple query (like our “redundant if-statement” example), examine a few results, refine the query based on any patterns that emerge and repeat.
As an exercise, refine the redundant-if query based on the observation that if the if-statement has an “else” clause, then even if the body of the “then” clause is empty, it's not actually redundant.

View File

@@ -2,11 +2,7 @@
Program representation
======================
QL for C/C++
.. container:: semmle-logo
Semmle :sup:`TM`
CodeQL for C/C++
.. rst-class:: agenda
@@ -25,16 +21,16 @@ Agenda
.. resume slides
AST QL classes
==============
AST CodeQL classes
==================
Important AST classes include:
Important AST CodeQL classes include:
- ``Expr``: expressions such as assignments, variable references, function calls, ...
- ``Stmt``: statements such as conditionals, loops, try statements, ...
- ``DeclarationEntry``: places where functions, variables or types are declared and/or defined
These three (and all other AST classes) are subclasses of ``Element``.
These three (and all other AST CodeQL classes) are subclasses of ``Element``.
Symbol table
============
@@ -66,9 +62,9 @@ Working with variables
Working with functions
======================
Functions are represented by the ``Function`` QL class. Each declaration or definition of a function is represented by a ``FunctionDeclarationEntry``.
Functions are represented by the ``Function`` class. Each declaration or definition of a function is represented by a ``FunctionDeclarationEntry``.
Calls to functions are modeled by QL class Call and its subclasses:
Calls to functions are modeled by the CodeQL class ``Call`` and its subclasses:
- ``Call.getTarget()`` gets the declared target of the call; undefined for calls through function pointers
- ``Function.getACallToThisFunction()`` gets a call to this function
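
A minimal sketch of how ``Call.getTarget()`` can be used, assuming we are interested in calls to a function named ``snprintf`` (the function name and the message are only illustrative):

.. code-block:: ql

   import cpp

   from FunctionCall call, Function target
   where
     // Calls through function pointers have no target, so they are not matched.
     call.getTarget() = target and
     target.hasName("snprintf")
   select call, "Call to " + target.getName() + "."

The same relationship can be followed in the other direction with ``target.getACallToThisFunction()``.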
@@ -107,7 +103,7 @@ Working with macros
#define square(x) x*x
y = square(y0), z = square(z0)
is represented in the snapshot database as:
is represented in the CodeQL database as:
- A Macro entity representing the text of the *head* and *body* of the macro
- Assignment nodes, representing the two assignments after preprocessing
@@ -121,4 +117,4 @@ Useful predicates on ``Element``: ``isInMacroExpansion()``, ``isAffectedByMacro(
.. note::
The snapshot also contains information about macro definitions, which are represented by class ``Macro``. These macro definitions are related to the AST nodes resulting from their uses by the class ``MacroAccess``.
The CodeQL database also contains information about macro definitions, which are represented by class ``Macro``. These macro definitions are related to the AST nodes resulting from their uses by the class ``MacroAccess``.
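
As a sketch, the query below lists the expressions produced by invocations of the ``square`` macro from the example above. It uses ``MacroInvocation``, a subclass of ``MacroAccess`` in the C/C++ library; the message text is illustrative:

.. code-block:: ql

   import cpp

   from MacroInvocation mi, Expr e
   where
     mi.getMacroName() = "square" and
     // Elements that exist in the AST only because the macro was expanded.
     e = mi.getAnExpandedElement()
   select e, "Expression generated by an invocation of 'square'."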

View File

@@ -2,11 +2,7 @@
Exercise: ``snprintf`` overflow
===============================
QL for C/C++
.. container:: semmle-logo
Semmle :sup:`TM`
CodeQL for C/C++
.. rst-class:: setup
@@ -16,7 +12,7 @@ Setup
For this example you should download:
- `QL for Eclipse <https://help.semmle.com/ql-for-eclipse/Content/WebHelp/install-plugin-free.html>`__
- `rsyslog snapshot <https://downloads.lgtm.com/snapshots/cpp/rsyslog/rsyslog/rsyslog-all-revision-2018-April-27--14-12-31.zip>`__
- `rsyslog database <https://downloads.lgtm.com/snapshots/cpp/rsyslog/rsyslog/rsyslog-all-revision-2018-April-27--14-12-31.zip>`__
.. note::
@@ -24,9 +20,9 @@ For this example you should download:
You can also query the project in `the query console <https://lgtm.com/query/project:1506087977050/lang:cpp/>`__ on LGTM.com.
.. insert snapshot-note.rst to explain differences between snapshot available to download and the version available in the query console.
.. insert database-note.rst to explain differences between database available to download and the version available in the query console.
.. include:: ../slide-snippets/snapshot-note.rst
.. include:: ../slide-snippets/database-note.rst
.. resume slides

View File

@@ -1,9 +1,5 @@
QL training and variant analysis examples
=========================================
.. container:: semmle-logo
Semmle :sup:`TM`
CodeQL training and variant analysis examples
=============================================
.. toctree::
:glob:

View File

@@ -8,10 +8,6 @@ Exercise: Apache Struts
CVE-2017-9805
.. container:: semmle-logo
Semmle :sup:`TM`
.. rst-class:: setup
Setup
@@ -20,7 +16,7 @@ Setup
For this example you should download:
- `QL for Eclipse <https://help.semmle.com/ql-for-eclipse/Content/WebHelp/install-plugin-free.html>`__
- `Apache Struts snapshot <https://downloads.lgtm.com/snapshots/java/apache/struts/apache-struts-7fd1622-CVE-2018-11776.zip>`__
- `Apache Struts database <https://downloads.lgtm.com/snapshots/java/apache/struts/apache-struts-7fd1622-CVE-2018-11776.zip>`__
.. note::
@@ -28,9 +24,9 @@ For this example you should download:
You can also query the project in `the query console <https://lgtm.com/query/project:1878521151/lang:java/>`__ on LGTM.com.
.. insert snapshot-note.rst to explain differences between snapshot available to download and the version available in the query console.
.. insert database-note.rst to explain differences between database available to download and the version available in the query console.
.. include:: ../slide-snippets/snapshot-note.rst
.. include:: ../slide-snippets/database-note.rst
.. resume slides
@@ -67,7 +63,7 @@ RCE in Apache Struts
Finding the RCE yourself
========================
#. Create a QL class to find the interface ``org.apache.struts2.rest.handler.ContentTypeHandler``
#. Create a class to find the interface ``org.apache.struts2.rest.handler.ContentTypeHandler``
**Hint**: Use predicate ``hasQualifiedName(...)``
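
One way to follow the hint, as a sketch: the class name ``ContentTypeHandler`` is our own choice, and ``hasQualifiedName`` takes the package and the type name as separate arguments.

.. code-block:: ql

   import java

   /** The interface `org.apache.struts2.rest.handler.ContentTypeHandler`. */
   class ContentTypeHandler extends RefType {
     ContentTypeHandler() {
       this.hasQualifiedName("org.apache.struts2.rest.handler", "ContentTypeHandler")
     }
   }

   from ContentTypeHandler handler
   select handler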

View File

@@ -2,10 +2,6 @@
Introduction to data flow
=========================
.. container:: semmle-logo
Semmle :sup:`TM`
Finding SPARQL injection vulnerabilities in Java
.. rst-class:: setup
@@ -16,7 +12,7 @@ Setup
For this example you should download:
- `QL for Eclipse <https://help.semmle.com/ql-for-eclipse/Content/WebHelp/install-plugin-free.html>`__
- `VIVO Vitro snapshot <http://downloads.lgtm.com/snapshots/java/vivo-project/Vitro/vivo-project_Vitro_java-srcVersion_47ae42c01954432c3c3b92d5d163551ce367f510-dist_odasa-lgtm-2019-04-23-7ceff95-linux64.zip>`__
- `VIVO Vitro database <http://downloads.lgtm.com/snapshots/java/vivo-project/Vitro/vivo-project_Vitro_java-srcVersion_47ae42c01954432c3c3b92d5d163551ce367f510-dist_odasa-lgtm-2019-04-23-7ceff95-linux64.zip>`__
.. note::
@@ -24,9 +20,9 @@ For this example you should download:
You can also query the project in `the query console <https://lgtm.com/query/project:14040005/lang:java/>`__ on LGTM.com.
.. insert snapshot-note.rst to explain differences between snapshot available to download and the version available in the query console.
.. insert database-note.rst to explain differences between database available to download and the version available in the query console.
.. include:: ../slide-snippets/snapshot-note.rst
.. include:: ../slide-snippets/database-note.rst
.. resume slides

View File

@@ -2,11 +2,7 @@
Introduction to global data flow
================================
QL for Java
.. container:: semmle-logo
Semmle :sup:`TM`
CodeQL for Java
.. rst-class:: setup
@@ -16,7 +12,7 @@ Setup
For this example you should download:
- `QL for Eclipse <https://help.semmle.com/ql-for-eclipse/Content/WebHelp/install-plugin-free.html>`__
- `Apache Struts snapshot <https://downloads.lgtm.com/snapshots/java/apache/struts/apache-struts-7fd1622-CVE-2018-11776.zip>`__
- `Apache Struts database <https://downloads.lgtm.com/snapshots/java/apache/struts/apache-struts-7fd1622-CVE-2018-11776.zip>`__
.. note::
@@ -24,9 +20,9 @@ For this example you should download:
You can also query the project in `the query console <https://lgtm.com/query/project:1878521151/lang:java/>`__ on LGTM.com.
.. insert snapshot-note.rst to explain differences between snapshot available to download and the version available in the query console.
.. insert database-note.rst to explain differences between database available to download and the version available in the query console.
.. include:: ../slide-snippets/snapshot-note.rst
.. include:: ../slide-snippets/database-note.rst
.. resume slides

View File

@@ -2,11 +2,7 @@
Introduction to variant analysis
================================
QL for Java
.. container:: semmle-logo
Semmle :sup:`TM`
CodeQL for Java
.. rst-class:: setup
@@ -16,7 +12,7 @@ Setup
For this example you should download:
- `QL for Eclipse <https://help.semmle.com/ql-for-eclipse/Content/WebHelp/install-plugin-free.html>`__
- `Apache Struts snapshot <https://downloads.lgtm.com/snapshots/java/apache/struts/apache-struts-7fd1622-CVE-2018-11776.zip>`__
- `Apache Struts database <https://downloads.lgtm.com/snapshots/java/apache/struts/apache-struts-7fd1622-CVE-2018-11776.zip>`__
.. note::
@@ -24,9 +20,9 @@ For this example you should download:
You can also query the project in `the query console <https://lgtm.com/query/project:1878521151/lang:java/>`__ on LGTM.com.
.. insert snapshot-note.rst to explain differences between snapshot available to download and the version available in the query console.
.. insert database-note.rst to explain differences between database available to download and the version available in the query console.
.. include:: ../slide-snippets/snapshot-note.rst
.. include:: ../slide-snippets/database-note.rst
.. resume slides
@@ -56,14 +52,14 @@ Oops
.. note::
Here's a simple (artificial) bug, which we'll develop a QL query to catch.
Here's a simple (artificial) bug, which we'll develop a query to catch.
This function writes a value to a given location in an array, first trying to do a bounds check to validate that the location is within bounds. However, the return statement has been commented out, leaving a redundant if statement and no bounds checking.
This case can act as our “patient zero” in the variant analysis game.
A simple QL query
=================
A simple CodeQL query
=====================
.. literalinclude:: ../query-examples/java/empty-if-java.ql
:language: ql
@@ -72,9 +68,9 @@ A simple QL query
We are going to write a simple query which finds “if statements” with empty “then” blocks, so we can highlight the results like those on the previous slide. The query can be run in the `query console on LGTM <https://lgtm.com/query>`__, or in your `IDE <https://lgtm.com/help/lgtm/running-queries-ide>`__.
A `QL query <https://help.semmle.com/QL/ql-handbook/queries.html>`__ consists of a “select” clause that indicates what results should be returned. Typically it will also provide a “from” clause to declare some variables, and a “where” clause to state conditions over those variables. For more information on the structure of query files (including links to useful topics in the `QL language handbook <https://help.semmle.com/QL/ql-handbook/index.html>`__), see `Introduction to query files <https://help.semmle.com/QL/learn-ql/ql/writing-queries/introduction-to-queries.html>`__.
A `query <https://help.semmle.com/QL/ql-handbook/queries.html>`__ consists of a “select” clause that indicates what results should be returned. Typically it will also provide a “from” clause to declare some variables, and a “where” clause to state conditions over those variables. For more information on the structure of query files (including links to useful topics in the `QL language handbook <https://help.semmle.com/QL/ql-handbook/index.html>`__), see `Introduction to query files <https://help.semmle.com/QL/learn-ql/ql/writing-queries/introduction-to-queries.html>`__.
In our example here, the first line of the query imports the `Java standard QL library <https://help.semmle.com/qldoc/java/>`__, which defines concepts like ``IfStmt`` and ``Block``.
In our example here, the first line of the query imports the `CodeQL library for Java <https://help.semmle.com/qldoc/java/>`__, which defines concepts like ``IfStmt`` and ``Block``.
The query proper starts by declaring two variables: ``ifStmt`` and ``block``. These variables represent sets of values in the database, according to the type of each of the variables. For example, ``ifStmt`` has the type ``IfStmt``, which means it represents the set of all if statements in the program.
If we simply selected these two variables::
@@ -96,8 +92,8 @@ A simple QL query
Finally, we select a location, at which to report the problem, and a message, to explain what the problem is.
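
The contents of ``empty-if-java.ql`` are not reproduced in these notes. A query matching the description above would look roughly like this (a sketch; the message text is an assumption):

.. code-block:: ql

   import java

   from IfStmt ifStmt, Block block
   where
     // Relate the two variables: `block` is the "then" branch of `ifStmt` ...
     ifStmt.getThen() = block and
     // ... and that branch contains no statements.
     block.getNumStmt() = 0
   select ifStmt, "This if-statement is redundant."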
Structure of a QL query
=======================
Structure of a query
=====================
A **query file** has the extension ``.ql`` and contains a **query clause**, and optionally **predicates**, **classes**, and **modules**.
@@ -109,14 +105,14 @@ Each query library also implicitly defines a module.
.. note::
QL queries are always contained in query files with the file extension ``.ql``. `Quick queries <https://help.semmle.com/ql-for-eclipse/Content/WebHelp/quick-query.html>`__, run in `QL for Eclipse <https://help.semmle.com/ql-for-eclipse/Content/WebHelp/home-page.html>`__, are no exception: the quick query window maintains a temporary QL file in the background.
Queries are always contained in query files with the file extension ``.ql``. `Quick queries <https://help.semmle.com/ql-for-eclipse/Content/WebHelp/quick-query.html>`__, run in `QL for Eclipse <https://help.semmle.com/ql-for-eclipse/Content/WebHelp/home-page.html>`__, are no exception: the quick query window maintains a temporary QL file in the background.
Parts of queries can be lifted into `QL library files <https://help.semmle.com/QL/ql-handbook/modules.html#library-modules>`__ with the extension ``.qll``. Definitions within such libraries can be brought into scope using “import” statements, and similarly QLL files can import each other's definitions using “import” statements.
Parts of queries can be lifted into `library files <https://help.semmle.com/QL/ql-handbook/modules.html#library-modules>`__ with the extension ``.qll``. Definitions within such libraries can be brought into scope using “import” statements, and similarly QLL files can import each other's definitions using “import” statements.
Logic can be encapsulated as user-defined `predicates <https://help.semmle.com/QL/ql-handbook/predicates.html>`__ and `classes <https://help.semmle.com/QL/ql-handbook/types.html#classes>`__, and organized into `modules <https://help.semmle.com/QL/ql-handbook/modules.html>`__. Each QLL file implicitly defines a module, but QL and QLL files can also contain explicit module definitions, as we will see later.
Predicates in QL
================
Predicates
==========
A predicate allows you to pull out and name parts of a query.
@@ -134,7 +130,7 @@ A predicate allows you to pull out and name parts of a query.
.. note::
A `QL predicate <https://help.semmle.com/QL/ql-handbook/predicates.html>`__ takes zero or more parameters, and its body is a condition on those parameters. The predicate may (or may not) hold. Predicates may also be `recursive <https://help.semmle.com/QL/ql-handbook/predicates.html#recursive-predicates>`__, simply by referring to themselves (directly or indirectly).
A `predicate <https://help.semmle.com/QL/ql-handbook/predicates.html>`__ takes zero or more parameters, and its body is a condition on those parameters. The predicate may (or may not) hold. Predicates may also be `recursive <https://help.semmle.com/QL/ql-handbook/predicates.html#recursive-predicates>`__, simply by referring to themselves (directly or indirectly).
You can imagine a predicate to be a self-contained from-where-select statement, that produces an intermediate relation, or table. In this case, the ``isEmpty`` predicate will be the set of all blocks which are empty.
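
A sketch of how the ``isEmpty`` predicate might be defined and used, assuming the ``Block`` class from the Java library as named in these slides:

.. code-block:: ql

   import java

   /** Holds if `block` contains no statements. */
   predicate isEmpty(Block block) { block.getNumStmt() = 0 }

   from IfStmt ifStmt
   // `isEmpty` acts like a table of empty blocks; we keep the if-statements
   // whose "then" branch appears in that table.
   where isEmpty(ifStmt.getThen())
   select ifStmt, "This if-statement is redundant."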
@@ -197,7 +193,7 @@ Iterative query refinement
.. note::
QL makes it very easy to experiment with analysis ideas. A common workflow is to start with a simple query (like our “redundant if-statement” example), examine a few results, refine the query based on any patterns that emerge and repeat.
CodeQL makes it very easy to experiment with analysis ideas. A common workflow is to start with a simple query (like our “redundant if-statement” example), examine a few results, refine the query based on any patterns that emerge and repeat.
As an exercise, refine the redundant-if query based on the observation that if the if-statement has an “else” clause, then even if the body of the “then” clause is empty, it's not actually redundant.

View File

@@ -2,11 +2,7 @@
Program representation
======================
QL for Java
.. container:: semmle-logo
Semmle :sup:`TM`
CodeQL for Java
.. rst-class:: agenda
@@ -16,7 +12,7 @@ Agenda
- Abstract syntax trees
- Database representation
- Program elements
- AST classes
- AST CodeQL classes
.. insert abstract-syntax-tree.rst
@@ -27,7 +23,7 @@ Agenda
Program elements
================
- The QL class ``Element`` represents program elements with a name.
- The CodeQL class ``Element`` represents program elements with a name.
- This includes: packages (``Package``), compilation units (``CompilationUnit``), types (``Type``), methods (``Method``), constructors (``Constructor``), and variables (``Variable``).
- It is often convenient to refer to an element that might either be a method or a constructor; the class ``Callable``, which is a common superclass of ``Method`` and ``Constructor``, can be used for this purpose.
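
For instance, a query over ``Callable`` covers methods and constructors in one go. The sketch below lists every callable declared in ``java.lang.String`` together with its parameter count; the choice of type is arbitrary:

.. code-block:: ql

   import java

   from Callable c
   where c.getDeclaringType().hasQualifiedName("java.lang", "String")
   // Results include both methods and constructors of the type.
   select c, c.getNumberOfParameters()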
@@ -35,7 +31,7 @@ Program elements
AST
===
There are two primary AST classes, used within ``Callables``:
There are two primary AST CodeQL classes, used within ``Callables``:
- ``Expr``: expressions such as assignments, variable references, function calls, ...
- ``Stmt``: statements such as conditionals, loops, try statements, ...
@@ -51,7 +47,7 @@ Types
The database also includes information about the types used in a program:
- ``PrimitiveType`` represents a `primitive type <http://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html>`__, that is, one of ``boolean``, ``byte``, ``char``, ``double``, ``float``, ``int``, ``long``, ``short``. QL also classifies ``void`` and ``<nulltype>`` (the type of the ``null`` literal) as primitive types.
- ``PrimitiveType`` represents a `primitive type <http://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html>`__, that is, one of ``boolean``, ``byte``, ``char``, ``double``, ``float``, ``int``, ``long``, ``short``. CodeQL also classifies ``void`` and ``<nulltype>`` (the type of the ``null`` literal) as primitive types.
- ``RefType`` represents a reference type; it has several subclasses:
- ``Class`` represents a Java class.
@@ -78,9 +74,9 @@ Working with variables
Working with callables
======================
Callables are represented by the ``Callable`` QL class.
Callables are represented by the ``Callable`` CodeQL class.
Calls to callables are modeled by the QL class ``Call`` and its subclasses:
Calls to callables are modeled by the CodeQL class ``Call`` and its subclasses:
- ``Call.getCallee()`` gets the declared target of the call
- ``Callable.getAReference()`` gets a call to this callable
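
A small sketch of ``Call.getCallee()`` in use, assuming we want every call whose target is named ``toString`` (the name is only an example):

.. code-block:: ql

   import java

   from Call call, Callable callee
   where
     call.getCallee() = callee and
     callee.hasName("toString")
   select call, callee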

View File

@@ -2,11 +2,7 @@
Example: Query injection
========================
QL for Java
.. container:: semmle-logo
Semmle :sup:`TM`
CodeQL for Java
.. rst-class:: setup
@@ -16,7 +12,7 @@ Setup
For this example you should download:
- `QL for Eclipse <https://help.semmle.com/ql-for-eclipse/Content/WebHelp/install-plugin-free.html>`__
- `VIVO Vitro snapshot <http://downloads.lgtm.com/snapshots/java/vivo-project/Vitro/vivo-project_Vitro_java-srcVersion_47ae42c01954432c3c3b92d5d163551ce367f510-dist_odasa-lgtm-2019-04-23-7ceff95-linux64.zip>`__
- `VIVO Vitro database <http://downloads.lgtm.com/snapshots/java/vivo-project/Vitro/vivo-project_Vitro_java-srcVersion_47ae42c01954432c3c3b92d5d163551ce367f510-dist_odasa-lgtm-2019-04-23-7ceff95-linux64.zip>`__
.. note::
@@ -24,9 +20,9 @@ For this example you should download:
You can also query the project in `the query console <https://lgtm.com/query/project:14040005/lang:java/>`__ on LGTM.com.
.. insert snapshot-note.rst to explain differences between snapshot available to download and the version available in the query console.
.. insert database-note.rst to explain differences between database available to download and the version available in the query console.
.. include:: ../slide-snippets/snapshot-note.rst
.. include:: ../slide-snippets/database-note.rst
.. resume slides
@@ -65,13 +61,13 @@ SPARQL injection
.. rst-class:: background2
Developing a QL query
======================
Developing a query
===================
Finding a query concatenation
QL query: find SPARQL methods
=============================
CodeQL query: find SPARQL methods
=================================
Let's start by looking for calls to methods with names of the form ``sparql*Query``, using the classes ``Method`` and ``MethodAccess`` from the Java library.
@@ -81,17 +77,17 @@ Lets start by looking for calls to methods with names of the form ``sparql*Qu
.. note::
- When performing `variant analysis <https://semmle.com/ variant-analysis>`__, it is usually helpful to write a simple query that finds the simple syntactic pattern, before trying to go on to describe the cases where it goes wrong.
- In this case, we start by looking for all the method calls which appear to run, before trying to refine the query to find cases which are vulnerable to query injection.
- When performing `variant analysis <https://semmle.com/variant-analysis>`__, it is usually helpful to write a simple query that finds the simple syntactic pattern, before trying to go on to describe the cases where it goes wrong.
- In this case, we start by looking for all the method calls that appear to run, before trying to refine the query to find cases which are vulnerable to query injection.
- The ``select`` clause defines what this query is looking for:
- a ``MethodAccess``: the call to a SPARQL query method
- a ``Method``: the SPARQL query method.
- The ``where`` part of the query ties these three QL variables together using `predicates <https://help.semmle.com/QL/ ql-handbook/predicates.html>`__ defined in the `standard QL for Java library <https://help.semmle.com/qldoc/java/>`__.
- The ``where`` part of the query ties these variables together using `predicates <https://help.semmle.com/QL/ql-handbook/predicates.html>`__ defined in the `standard CodeQL library for Java <https://help.semmle.com/qldoc/java/>`__.
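
The query described by these notes can be sketched as follows. In ``matches``, ``%`` is the wildcard, so ``sparql%Query`` corresponds to the pattern ``sparql*Query``:

.. code-block:: ql

   import java

   from MethodAccess ma, Method m
   where
     // `ma` is a call to the method `m` ...
     ma.getMethod() = m and
     // ... and `m` has a name such as `sparqlSelectQuery`.
     m.getName().matches("sparql%Query")
   select ma, m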
QL query: find string concatenation
===================================
CodeQL query: find string concatenation
=======================================
- We now need to define what would make these API calls unsafe.
- A simple heuristic would be to look for string concatenation used in the query argument.
@@ -113,8 +109,8 @@ Looking at autocomplete suggestions, we see that we can get the type of an expre
- We therefore write a helper predicate for finding string concatenation.
- This predicate effectively represents the set of all ``add`` expressions in the database where the type of the expression is ``TypeString`` - that is, the addition produces a ``String`` value.
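
A sketch of such a helper predicate, using the name ``isStringConcat`` that the following slide refers to:

.. code-block:: ql

   import java

   /** Holds if `ae` is an addition that produces a `String` value. */
   predicate isStringConcat(AddExpr ae) { ae.getType() instanceof TypeString }

   from AddExpr ae
   where isStringConcat(ae)
   select ae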
QL query: SPARQL injection
==========================
CodeQL query: SPARQL injection
==============================
We can now combine our predicate with the existing query.
Note that we do not need to specify that the argument of the method access is an ``AddExpr`` - this is implied by the ``isStringConcat`` requirement.
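
Put together, the combined query might look like this. It is a sketch that assumes the query string is the first argument of the call, and the message is illustrative:

.. code-block:: ql

   import java

   /** Holds if `ae` is an addition that produces a `String` value. */
   predicate isStringConcat(AddExpr ae) { ae.getType() instanceof TypeString }

   from MethodAccess ma, Method m
   where
     ma.getMethod() = m and
     m.getName().matches("sparql%Query") and
     // The parameter type `AddExpr` restricts the argument to additions.
     isStringConcat(ma.getArgument(0))
   select ma, "SPARQL query built by string concatenation."

This heuristic is purely syntactic: it misses queries concatenated in a separate statement and reports concatenations of constants.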

View File

@@ -35,9 +35,9 @@ The basic representation of an analyzed program is an *abstract syntax tree (AST
.. note::
When writing queries in QL it is important to have in mind the underlying representation of the program which is stored in the database. Typically queries make use of the “AST” representation of the program - a tree structure where program elements are nested within other program elements.
When writing queries it is important to have in mind the underlying representation of the program which is stored in the database. Typically queries make use of the “AST” representation of the program - a tree structure where program elements are nested within other program elements.
The following topics contain overviews of the important AST classes and QL libraries for C/C++, C#, and Java:
The following topics contain overviews of the important AST classes and CodeQL libraries for C/C++, C#, and Java:
- `Introducing the C/C++ libraries <https://help.semmle.com/QL/learn-ql/cpp/introduce-libraries-cpp.html>`__
- `Introducing the C# libraries <https://help.semmle.com/QL/learn-ql/csharp/introduce-libraries-csharp.html>`__
@@ -47,23 +47,23 @@ The basic representation of an analyzed program is an *abstract syntax tree (AST
Database representations of ASTs
================================
AST nodes and other program elements are encoded in the database as *entity values*. Entities are implemented as integers, but in QL they are opaque - all one can do with them is to check their equality.
AST nodes and other program elements are encoded in the database as *entity values*. Entities are implemented as integers, but in QL they are opaque---all one can do with them is to check their equality.
Each entity belongs to an entity type. Entity types have names starting with “@” and are defined in the database schema (not in QL).
Properties of AST nodes and their relationships to each other are encoded by database relations, which are predicates defined in the database (not in QL).
Entity types are rarely used directly; the usual pattern is to define a QL class that extends the type and exposes properties of its entities through member predicates.
Entity types are rarely used directly; the usual pattern is to define a class that extends the type and exposes properties of its entities through member predicates.
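
A minimal sketch of this pattern, assuming the entity type ``@ifstmt`` from the Java database schema. The class name ``RawIfStmt`` is hypothetical; the standard library already wraps this type as ``IfStmt``:

.. code-block:: ql

   import java

   /** A thin wrapper around the entity type `@ifstmt`. */
   class RawIfStmt extends @ifstmt {
     /** Gives the otherwise opaque entity a string representation. */
     string toString() { result = "if statement" }
   }

   from RawIfStmt s
   select s

The library classes follow the same idea but add member predicates that read the database relations, so queries rarely need to mention entity types at all.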
.. note::
ASTs are a typical example of the kind of data representation one finds in object-oriented programming, with data-carrying nodes that reference each other. At first glance, QL, which can only work with atomic values, does not seem to be well suited for working with this kind of data. However, ultimately all that we require of the nodes in an AST is that they have an identity. The relationships among nodes, usually implemented by reference-valued object fields in other languages, can just as well (and arguably more naturally) be represented as relations over nodes. Attaching data (such as strings or numbers) to nodes can also be represented with relations over nodes and primitive values. All we need is a way for relations to reference nodes. This is achieved in QL (as in other database languages) by means of *entity values* (or entities, for short), which are opaque atomic values, implemented as integers under the hood.
It is the job of the extractor to create entity values for all AST nodes and populate database relations that encode the relationship between AST nodes and any values associated with them. These relations are *extensional*, that is, explicitly stored in the database, unlike the relations described by QL predicates, which we also refer to as *intensional* relations. Entity values belong to *entity types*, whose name starts with “@” to set them apart from primitive types and classes.
It is the job of the extractor to create entity values for all AST nodes and populate database relations that encode the relationship between AST nodes and any values associated with them. These relations are *extensional*, that is, explicitly stored in the database, unlike the relations described by predicates, which we also refer to as *intensional* relations. Entity values belong to *entity types*, whose name starts with “@” to set them apart from primitive types and classes.
The interface between entity types and extensional relations on the one hand and QL predicates and classes on the other hand is provided by the *database schema*, which defines the available entity types and the schema of each extensional relation, that is, how many columns the relation has, and which entity type or primitive type the values in each column come from. QL programs can refer to entity types and extensional relations just as they would refer to QL classes and predicates, with the restriction that entity types cannot be directly selected in a ``select`` clause, since they do not have a well-defined string representation.
For example, the database schemas for C/C++, C#, and Java snapshot databases are here:
For example, the database schemas for C/C++, C#, and Java CodeQL databases are here:
- https://github.com/Semmle/ql/blob/master/cpp/ql/src/semmlecode.cpp.dbscheme
- https://github.com/Semmle/ql/blob/master/csharp/ql/src/semmlecode.csharp.dbscheme

View File

@@ -1 +1 @@
Note that results generated in the query console are likely to differ from those generated in the QL plugin, as LGTM.com analyzes the most recent revisions of each project that has been added, whereas the snapshot available to download above is based on a historical version of the codebase.
Note that results generated in the query console are likely to differ from those generated in the QL plugin, as LGTM.com analyzes the most recent revisions of each project that has been added, whereas the CodeQL database available to download above is based on a historical version of the codebase.

View File

@@ -17,15 +17,15 @@ Global data flow and taint tracking
- Recap:
- Local (“intra-procedural”) data flow models flow within one function; feasible to compute for all functions in a snapshot
- Global (“inter-procedural”) data flow models flow across function calls; not feasible to compute for all functions in a snapshot
- Local (“intra-procedural”) data flow models flow within one function; feasible to compute for all functions in a CodeQL database
- Global (“inter-procedural”) data flow models flow across function calls; not feasible to compute for all functions in a CodeQL database
- For global data flow (and taint tracking), we must therefore provide restrictions to ensure the problem is tractable.
- Typically, this involves specifying the *source* and *sink*.
.. note::
As we mentioned in the previous slide deck, while local data flow is feasible to compute for all functions in a snapshot, global data flow is not. This is because the number of paths becomes exponentially larger for global data flow.
As we mentioned in the previous slide deck, while local data flow is feasible to compute for all functions in a CodeQL database, global data flow is not. This is because the number of paths becomes exponentially larger for global data flow.
The global data flow (and taint tracking) avoids this problem by requiring that the query author specifies which ``sources`` and ``sinks`` are applicable. This allows the implementation to compute paths between the restricted set of nodes, rather than the full graph.
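
As a sketch of what specifying sources and sinks can look like for Java, the configuration below uses the class-based ``TaintTracking::Configuration`` API as it existed when these slides were written. The configuration name, the choice of ``RemoteFlowSource`` as the source, and the ``sparql%Query`` sink are illustrative assumptions:

.. code-block:: ql

   import java
   import semmle.code.java.dataflow.FlowSources
   import semmle.code.java.dataflow.TaintTracking

   class SparqlInjectionConfig extends TaintTracking::Configuration {
     SparqlInjectionConfig() { this = "SparqlInjectionConfig" }

     // Sources: data that arrives from a remote user.
     override predicate isSource(DataFlow::Node source) { source instanceof RemoteFlowSource }

     // Sinks: the first argument of a `sparql*Query` method call.
     override predicate isSink(DataFlow::Node sink) {
       exists(MethodAccess ma |
         ma.getMethod().getName().matches("sparql%Query") and
         sink.asExpr() = ma.getArgument(0)
       )
     }
   }

   from SparqlInjectionConfig config, DataFlow::Node source, DataFlow::Node sink
   where config.hasFlow(source, sink)
   select sink, "Query built from user-controlled data."

Only paths that start at a source and end at a sink are explored, which is what keeps the global analysis tractable.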

View File

@@ -1,33 +0,0 @@
Information
===========
To try the examples in this presentation we recommend you download `QL for Eclipse <https://help.semmle.com/ql-for-eclipse/Content/WebHelp/install-plugin-free.html>`__.
**QL language resources**
- If you are new to QL, try the QL language tutorials at `Learning CodeQL <https://help.semmle.com/QL/learn-ql/>`__.
- To learn more about the main features of QL, try looking at the `QL language handbook <https://help.semmle.com/QL/ql-handbook/>`__.
- For further information about writing queries in QL, see `Writing QL queries <https://help.semmle.com/QL/learn-ql/writing-queries/writing-queries.html>`__.
**QL queries**
The QL queries included in the latest Semmle release are open source. View them in the `semmle/ql repository <https://github.com/semmle/ql>`__.
**Extra information**
.. |arrow-l| unicode:: U+2190
.. |arrow-r| unicode:: U+2192
- Press |arrow-l| and |arrow-r| to navigate between slides
- Pressing **p** toggles between the slide and any extra notes (where they're available)
- Pressing **f** toggles full screen viewing on/off
.. note::
To run the queries featured in this training presentation, we recommend you download the free-to-use `QL for Eclipse plugin <https://help.semmle.com/ql-for-eclipse/Content/WebHelp/getting-started.html>`__.
This plugin allows you to locally access the latest features of QL, including the standard QL libraries and queries. It also provides standard IDE features such as syntax highlighting, jump-to-definition, and tab completion.
When you have set up QL for Eclipse, we recommend increasing the “Memory for running queries” from the default setting of 4096MB to 8192MB, to ensure that all the queries complete quickly.

View File

@@ -67,9 +67,9 @@ Complete text of the analysis (nothing left out!):
.. note::
Once the mission critical bug was discovered on Curiosity, JPL contacted Semmle for help discovering whether variants of the problem might exist elsewhere in the Curiosity control software. In 20 minutes, research engineers from Semmle produced a QL query and shared it with the JPL team. It finds all functions that are passed an array as an argument whose size is smaller than expected.
Once the mission critical bug was discovered on Curiosity, JPL contacted Semmle for help discovering whether variants of the problem might exist elsewhere in the Curiosity control software. In 20 minutes, research engineers from Semmle produced a CodeQL query and shared it with the JPL team. It finds all functions that are passed an array as an argument whose size is smaller than expected.
(The goal here is not to fully understand the QL, but to illustrate the power of the language and its standard libraries.)
(The goal here is not to fully understand the query, but to illustrate the power of the language and its standard libraries.)
Find all instances!
@@ -105,33 +105,33 @@ Analysis overview
Semmle's analysis works by extracting a queryable database from your project. For compiled languages, Semmle's tools observe an ordinary build of the source code. Each time a compiler is invoked to process a source file, a copy of that file is made, and all relevant information about the source code (syntactic data about the abstract syntax tree, semantic data like name binding and type information, data on the operation of the C preprocessor, etc.) is collected. For interpreted languages, the extractor gathers similar information by running directly on the source code. Multi-language code bases are analyzed one language at a time.
Once the extraction finishes, all this information is collected into a single `snapshot database <https://help.semmle.com/QL/learn-ql/ql/snapshot.html>`__, which is then ready to query, possibly on a different machine. A copy of the source files, made at the time the database was created, is also included in the snapshot so analysis results can be displayed at the correct location in the code. The database schema is (source) language specific.
Once the extraction finishes, all this information is collected into a single `CodeQL database <https://help.semmle.com/QL/learn-ql/database.html>`__, which is then ready to query, possibly on a different machine. A copy of the source files, made at the time the database was created, is also included in the CodeQL database so analysis results can be displayed at the correct location in the code. The database schema is (source) language specific.
Queries are written in `QL <https://semmle.com/ql>`__ and usually depend on one or more of the `standard QL libraries <https://github.com/semmle/ql>`__ (and of course you can write your own custom libraries). They are compiled into an efficiently executable format by the QL compiler and then run on a snapshot database by the QL evaluator, either on a remote worker machine or locally on a developer's machine.
Queries are written in `QL <https://semmle.com/ql>`__ and usually depend on one or more of the `standard CodeQL libraries <https://github.com/semmle/ql>`__ (and of course you can write your own custom libraries). They are compiled into an efficiently executable format by the QL compiler and then run on a CodeQL database by the QL evaluator, either on a remote worker machine or locally on a developer's machine.
Query results can be interpreted and presented in a variety of ways, including displaying them in an `IDE plugin <https://lgtm.com/help/lgtm/running-queries-ide>`__ such as QL for Eclipse, or in a web dashboard as on `LGTM <https://lgtm.com/help/lgtm/about-lgtm>`__.
Introducing QL
==============
QL is the query language running all Semmle analysis.
QL is the query language running all CodeQL analysis.
QL is:
- a **logic** language based on first-order logic
- a **declarative** language without side effects
- an **object-oriented** language
- a **query** language working on a read-only snapshot database
- a **query** language working on a read-only CodeQL database
- equipped with rich standard libraries **for program analysis**
.. note::
QL is the high-level, object-oriented logic language that underpins all of Semmle's libraries and analyses. You can learn lots more about QL by visiting `Introduction to the QL language <https://help.semmle.com/QL/learn-ql/ql/introduction-to-ql.html>`__ and `About QL <https://help.semmle.com/QL/learn-ql/ql/about-ql.html>`__.
QL is the high-level, object-oriented logic language that underpins all CodeQL libraries and analyses. You can learn lots more about QL by visiting `Introduction to the QL language <https://help.semmle.com/QL/learn-ql/ql/introduction-to-ql.html>`__ and `About QL <https://help.semmle.com/QL/learn-ql/ql/about-ql.html>`__.
The key features of QL are:
- All common logic connectives are available, including quantifiers like ``exists``, which can also introduce new variables.
- The language is declarative: the user focuses on stating what they would like to find, and leaves the details of how to evaluate the query to the engine.
- The object-oriented layer allows Semmle to distribute rich standard libraries for program analysis. These model the common AST node types, control flow and name lookup, and define further layers on top, for example control flow or data flow analysis. The `standard QL libraries and queries <https://github.com/semmle/ql>`__ ship as source and can be inspected by the user, and new abstractions are readily defined.
- The object-oriented layer allows Semmle to distribute rich standard libraries for program analysis. These model the common AST node types, control flow and name lookup, and define further layers on top, for example control flow or data flow analysis. The `standard CodeQL libraries and queries <https://github.com/semmle/ql>`__ ship as source and can be inspected by the user, and new abstractions are readily defined.
- The database generated by Semmle's tools is treated as read-only; queries cannot insert new data into it, though they can inspect its contents in various ways.
You can start writing QL and running QL queries on open source projects in the `query console <https://lgtm.com/query>`__ on LGTM.com. You can also download snapshots from LGTM.com to query locally, by `running queries in your IDE <https://lgtm.com/help/lgtm/running-queries-ide>`__ equipped with a QL plugin or extension.
You can start writing and running queries on open source projects in the `query console <https://lgtm.com/query>`__ on LGTM.com. You can also download CodeQL databases from LGTM.com to query locally, by `running queries in your IDE <https://lgtm.com/help/lgtm/running-queries-ide>`__.
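As a minimal sketch of the features listed above, the following self-contained query uses no libraries at all. The class name ``SmallInt`` and the predicate ``isSquare`` are invented for illustration and are not part of any standard library. The query combines a class (the object-oriented layer), an ``exists`` quantifier that introduces a new variable, and a declarative ``where`` clause that states what to find rather than how to find it:

.. code-block:: ql

   /** The integers from 1 to 10 (hypothetical class, for illustration only). */
   class SmallInt extends int {
     SmallInt() { this in [1 .. 10] }

     /** Holds if this value is the square of some other `SmallInt`. */
     predicate isSquare() { exists(SmallInt y | this = y * y) }
   }

   from SmallInt x
   where x.isSquare()
   select x

Under these definitions the query selects 1, 4, and 9. No loop or evaluation order is specified; the engine decides how to compute the result.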

View File

@@ -61,8 +61,8 @@ Data flow graphs
Local vs global data flow
=========================
- Local (“intra-procedural”) data flow models flow within one function; feasible to compute for all functions in a snapshot
- Global (“inter-procedural”) data flow models flow across function calls; not feasible to compute for all functions in a snapshot
- Local (“intra-procedural”) data flow models flow within one function; feasible to compute for all functions in a CodeQL database
- Global (“inter-procedural”) data flow models flow across function calls; not feasible to compute for all functions in a CodeQL database
- Different APIs, so discussed separately
- This slide deck focuses on the former
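A local data flow query might look like the sketch below. It assumes the C/C++ libraries and the ``DataFlow`` module from the standard libraries; the choice of what to look for (a parameter flowing to an argument of a call in the same function) is only an example and does not come from the slides:

.. code-block:: ql

   import cpp
   import semmle.code.cpp.dataflow.DataFlow

   // Find calls whose argument is reached, within a single function,
   // by the value of one of that function's parameters.
   from Function f, Parameter p, FunctionCall call
   where
     p = f.getAParameter() and
     call.getEnclosingFunction() = f and
     DataFlow::localFlow(DataFlow::parameterNode(p),
       DataFlow::exprNode(call.getAnArgument()))
   select call, "Argument flows from parameter " + p.getName() + " of the enclosing function."

Because ``localFlow`` only follows flow inside one function, a query of this shape stays cheap enough to run over every function in a database.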
@@ -70,7 +70,7 @@ Local vs global data flow
For further information, see:
- `Introduction to data flow analysis in QL <https://help.semmle.com/QL/learn-ql/ql/intro-to-data-flow.html>`__
- `Introduction to data flow analysis with CodeQL <https://help.semmle.com/QL/learn-ql/ql/intro-to-data-flow.html>`__
.. rst-class:: background2

View File

@@ -27,10 +27,6 @@ Template slide deck
Second subheading
.. container:: semmle-logo
Semmle :sup:`TM`
.. Set up slide. Include link to QL4E snapshots required for examples
.. rst-class:: setup