From 4e4388d6880a7d21f195df7c032dcf4b68a64676 Mon Sep 17 00:00:00 2001 From: james Date: Tue, 3 Sep 2019 11:16:45 +0100 Subject: [PATCH] docs: address review comments (cherry picked from commit 8c88cbba3ab53571a32fe8fb5c9284dac45aff51) --- .../static/theme/css/default.css | 6 ++ .../cpp/bad-overflow-guard.rst | 6 +- .../ql-training-rst/cpp/control-flow-cpp.rst | 6 +- .../ql-training-rst/cpp/data-flow-cpp.rst | 8 +- .../cpp/global-data-flow-cpp.rst | 6 +- .../ql-training-rst/cpp/intro-ql-cpp.rst | 6 +- .../cpp/program-representation-cpp.rst | 71 +----------------- .../language/ql-training-rst/cpp/snprintf.rst | 6 +- .../java/apache-struts-java.rst | 11 ++- .../ql-training-rst/java/data-flow-java.rst | 12 ++- .../java/global-data-flow-java.rst | 10 ++- .../ql-training-rst/java/intro-ql-java.rst | 6 +- .../java/program-representation-java.rst | 75 +------------------ .../java/query-injection-java.rst | 18 +++-- .../slide-snippets/abstract-syntax-tree.rst | 70 +++++++++++++++++ .../global-data-flow-extra-slides.rst | 2 +- .../slide-snippets/local-data-flow.rst | 2 +- .../slide-snippets/path-queries.rst | 2 +- .../slide-snippets/snapshot-note.rst | 1 + docs/language/ql-training-rst/template.rst | 5 +- 20 files changed, 160 insertions(+), 169 deletions(-) create mode 100644 docs/language/ql-training-rst/slide-snippets/abstract-syntax-tree.rst create mode 100644 docs/language/ql-training-rst/slide-snippets/snapshot-note.rst diff --git a/docs/language/ql-training-rst/_static-training/slides-semmle-2/static/theme/css/default.css b/docs/language/ql-training-rst/_static-training/slides-semmle-2/static/theme/css/default.css index 6203915d417..9d34a78ebb9 100644 --- a/docs/language/ql-training-rst/_static-training/slides-semmle-2/static/theme/css/default.css +++ b/docs/language/ql-training-rst/_static-training/slides-semmle-2/static/theme/css/default.css @@ -1009,6 +1009,12 @@ article.smaller q:before, article.smaller q:after { background-image: -webkit-radial-gradient(50% 50%, 
#b1dfff 0%, #4387fd 600px); background-image: radial-gradient(50% 50%, #b1dfff 0%, #4387fd 600px); } + +/* the popup class is used to display the speaker notes when 'presenter' view + is enabled. This view is not currently optimal, so certain selectors have been commented-out, + with a view to improving the styles at a later date */ + + /* line 684, ../scss/default.scss */ /*.with-notes.popup slide.next { -moz-transform: translate3d(570px, 80px, 0) scale(0.35); diff --git a/docs/language/ql-training-rst/cpp/bad-overflow-guard.rst b/docs/language/ql-training-rst/cpp/bad-overflow-guard.rst index 498b8945ece..32b0ed9816b 100644 --- a/docs/language/ql-training-rst/cpp/bad-overflow-guard.rst +++ b/docs/language/ql-training-rst/cpp/bad-overflow-guard.rst @@ -24,7 +24,11 @@ For this example you should download: You can query the project in `the query console `__ on LGTM.com. - Note that results generated in the query console are likely to differ to those generated in the QL plugin as LGTM.com analyzes the most recent revisions of each project that has been added–the snapshot available to download above is based on an historical version of the code base. + .. insert snapshot-note.rst to explain differences between snapshot available to download and the version available in the query console. + + .. include:: ../slide-snippets/snapshot-note.rst + + .. resume slides Checking for overflow in C ========================== diff --git a/docs/language/ql-training-rst/cpp/control-flow-cpp.rst b/docs/language/ql-training-rst/cpp/control-flow-cpp.rst index 00a2c78458b..ea8e2c1b158 100644 --- a/docs/language/ql-training-rst/cpp/control-flow-cpp.rst +++ b/docs/language/ql-training-rst/cpp/control-flow-cpp.rst @@ -26,7 +26,11 @@ For this example you should download: You can query the project in `the query console `__ on LGTM.com. 
- Note that results generated in the query console are likely to differ to those generated in the QL plugin as LGTM.com analyzes the most recent revisions of each project that has been added–the snapshot available to download above is based on an historical version of the code base. + .. insert snapshot-note.rst to explain differences between snapshot available to download and the version available in the query console. + + .. include:: ../slide-snippets/snapshot-note.rst + + .. resume slides .. rst-class:: agenda diff --git a/docs/language/ql-training-rst/cpp/data-flow-cpp.rst b/docs/language/ql-training-rst/cpp/data-flow-cpp.rst index 969f791bcb5..36b5eb6b525 100644 --- a/docs/language/ql-training-rst/cpp/data-flow-cpp.rst +++ b/docs/language/ql-training-rst/cpp/data-flow-cpp.rst @@ -24,7 +24,11 @@ For this example you should download: You can query the project in `the query console `__ on LGTM.com. - Note that results generated in the query console are likely to differ to those generated in the QL plugin as LGTM.com analyzes the most recent revisions of each project that has been added–the snapshot available to download above is based on an historical version of the code base. + .. insert snapshot-note.rst to explain differences between snapshot available to download and the version available in the query console. + + .. include:: ../slide-snippets/snapshot-note.rst + + .. resume slides .. rst-class:: agenda @@ -112,7 +116,7 @@ We need something better. Here, ``DMLOut`` and ``ExtOut`` are macros that expand to formatting calls. The format specifier is not constant, in the sense that the format argument is not a string literal. However, it is clearly one of two possible constants, both with the same number of format specifiers. - What we need is a way to determine whether the format argument is ever set to something that is, not constant. + What we need is a way to determine whether the format argument is ever set to something that is not constant. .. 
include general data flow slides diff --git a/docs/language/ql-training-rst/cpp/global-data-flow-cpp.rst b/docs/language/ql-training-rst/cpp/global-data-flow-cpp.rst index ac01498e339..6033581ffc3 100644 --- a/docs/language/ql-training-rst/cpp/global-data-flow-cpp.rst +++ b/docs/language/ql-training-rst/cpp/global-data-flow-cpp.rst @@ -24,7 +24,11 @@ For this example you should download: You can query the project in `the query console `__ on LGTM.com. - Note that results generated in the query console are likely to differ to those generated in the QL plugin as LGTM.com analyzes the most recent revisions of each project that has been added–the snapshot available to download above is based on an historical version of the code base. + .. insert snapshot-note.rst to explain differences between snapshot available to download and the version available in the query console. + + .. include:: ../slide-snippets/snapshot-note.rst + + .. resume slides .. rst-class:: agenda diff --git a/docs/language/ql-training-rst/cpp/intro-ql-cpp.rst b/docs/language/ql-training-rst/cpp/intro-ql-cpp.rst index 4062625eefd..82eb62a3ba8 100644 --- a/docs/language/ql-training-rst/cpp/intro-ql-cpp.rst +++ b/docs/language/ql-training-rst/cpp/intro-ql-cpp.rst @@ -24,7 +24,11 @@ For this example you should download: You can also query the project in `the query console `__ on LGTM.com. - Note that results generated in the query console are likely to differ to those generated in the QL plugin as LGTM.com analyzes the most recent revisions of each project that has been added–the snapshot available to download above is based on an historical version of the code base. + .. insert snapshot-note.rst to explain differences between snapshot available to download and the version available in the query console. + + .. include:: ../slide-snippets/snapshot-note.rst + + .. resume slides .. 
Include language-agnostic section here diff --git a/docs/language/ql-training-rst/cpp/program-representation-cpp.rst b/docs/language/ql-training-rst/cpp/program-representation-cpp.rst index 95ea3bbc4df..1850e3e5671 100644 --- a/docs/language/ql-training-rst/cpp/program-representation-cpp.rst +++ b/docs/language/ql-training-rst/cpp/program-representation-cpp.rst @@ -19,68 +19,11 @@ Agenda - Variables - Functions -Abstract syntax trees -===================== +.. insert abstract-syntax-tree.rst -The basic representation of an analyzed program is an *abstract syntax tree (AST)*. +.. include:: ../slide-snippets/abstract-syntax-tree.rst -.. container:: column-left - - .. code-block:: cpp - - try { - ... - } catch (AnException e) { - } - -.. container:: ast-graph - - .. graphviz:: - - digraph { - graph [ dpi = 1000 ] - node [shape=polygon,sides=4,color=blue4,style="filled,rounded", fontname=consolas,fontcolor=white] - a [label=] - b [label=] - c [label=<...>,color=white,fontcolor=black] - d [label=] - e [label=<...>,color=white,fontcolor=black] - f [label=<...>,color=white,fontcolor=black] - g [label=<...>,color=white,fontcolor=black] - - a -> {b, c} - b -> {d, e} - d -> {f, g} - } - - - -.. note:: - - When writing queries in QL it is important to have in mind the underlying representation of the program which is stored in the database. Typically queries make use of the “AST” representation of the program–a tree structure where program elements are nested within other program elements. - - The “Introducing the C/C++ libraries” help topic contains a more complete overview of important AST classes and the rest of the C++ QL libraries: https://help.semmle.com/QL/learn-ql/ql/cpp/introduce-libraries-cpp.html - -Database representations of ASTs -================================ - -AST nodes and other program elements are encoded in the database as *entity values*. 
Entities are implemented as integers, but in QL they are opaque–all one can do with them is to check their equality. - -Each entity belongs to an entity type. Entity types have names starting with “@” and are defined in the database schema (not in QL). - -Properties of AST nodes and their relationships to each other are encoded by database relations, which are predicates defined in the database (not in QL). - -Entity types are rarely used directly, the usual pattern is to define a QL class that extends the type and exposes properties of its entities through member predicates. - -.. note:: - - ASTs are a typical example of the kind of data representation one finds in object-oriented programming, with data-carrying nodes that reference each other. At first glance, QL, which can only work with atomic values, does not seem to be well suited for working with this kind of data. However, ultimately all that we require of the nodes in an AST is that they have an identity. The relationships among nodes, usually implemented by reference-valued object fields in other languages, can just as well (and arguably more naturally) be represented as relations over nodes. Attaching data (such as strings or numbers) to nodes can also be represented with relations over nodes and primitive values. All we need is a way for relations to reference nodes. This is achieved in QL (as in other database languages) by means of *entity values* (or entities, for short), which are opaque atomic values, implemented as integers under the hood. - - It is the job of the extractor to create entity values for all AST nodes and populate database relations that encode the relationship between AST nodes and any values associated with them. These relations are *extensional*, that is, explicitly stored in the database, unlike the relations described by QL predicates, which we also refer to as *intensional* relations. 
Entity values belong to *entity types*, whose name starts with “@” to set them apart from primitive types and classes. - - The interface between entity types and extensional relations on the one hand and QL predicates and classes on the other hand is provided by the *database schema*, which defines the available entity types and the schema of each extensional relation, that is, how many columns the relation has, and which entity type or primitive type the values in each column come from. QL programs can refer to entity types and extensional relations just as they would refer to QL classes and predicates, with the restriction that entity types cannot be directly selected in a “select” clause, since they do not have a well-defined string representation. - - For example, the database schema for C++ snapshot databases is here: https://github.com/Semmle/ql/blob/master/cpp/ql/src/semmlecode.cpp.dbscheme +.. resume slides AST QL classes ============== @@ -93,10 +36,6 @@ Important AST classes include: These three (and all other AST classes) are subclasses of ``Element``. -.. note:: - - The “Introducing the C/C++ libraries” help topic contains a more complete overview of important AST classes and the rest of the C++ QL libraries: https://help.semmle.com/QL/learn-ql/ql/cpp/introduce-libraries-cpp.html - Symbol table ============ @@ -108,10 +47,6 @@ The database also includes information about the symbol table associated with a - ``Type``: built-in and user-defined types -.. 
note:: - - The “Introducing the C/C++ libraries” help topic contains a more complete overview of important symbol table classes and the rest of the C++ QL libraries: https://help.semmle.com/QL/learn-ql/ql/cpp/introduce-libraries-cpp.html - Working with variables ====================== diff --git a/docs/language/ql-training-rst/cpp/snprintf.rst b/docs/language/ql-training-rst/cpp/snprintf.rst index c518423f4c9..77e46933fcb 100644 --- a/docs/language/ql-training-rst/cpp/snprintf.rst +++ b/docs/language/ql-training-rst/cpp/snprintf.rst @@ -24,7 +24,11 @@ For this example you should download: You can also query the project in `the query console `__ on LGTM.com. - Note that results generated in the query console are likely to differ to those generated in the QL plugin as LGTM.com analyzes the most recent revisions of each project that has been added–the snapshot available to download above is based on an historical version of the code base. + .. insert snapshot-note.rst to explain differences between snapshot available to download and the version available in the query console. + + .. include:: ../slide-snippets/snapshot-note.rst + + .. resume slides ``snprintf`` ============ diff --git a/docs/language/ql-training-rst/java/apache-struts-java.rst b/docs/language/ql-training-rst/java/apache-struts-java.rst index 90d6050d738..e873f37c12f 100644 --- a/docs/language/ql-training-rst/java/apache-struts-java.rst +++ b/docs/language/ql-training-rst/java/apache-struts-java.rst @@ -28,19 +28,23 @@ For this example you should download: You can also query the project in `the query console `__ on LGTM.com. - Note that results generated in the query console are likely to differ to those generated in the QL plugin as LGTM.com analyzes the most recent revisions of each project that has been added–the snapshot available to download above is based on an historical version of the code base. + .. 
insert snapshot-note.rst to explain differences between snapshot available to download and the version available in the query console. + + .. include:: ../slide-snippets/snapshot-note.rst + + .. resume slides Unsafe deserialization in Struts ================================ -Apache Struts provides a ContentTypeHandler interface, which can be implemented for specific content types. It defines the following interface method: +Apache Struts provides a ``ContentTypeHandler`` interface, which can be implemented for specific content types. It defines the following interface method: .. code-block:: java void toObject(Reader in, Object target); -which is intended to populate the “target” object with data from the reader, usually through deserialization. However, the in parameter should be considered untrusted, and should not be deserialized without sanitization. +which is intended to populate the ``target`` object with data from the reader, usually through deserialization. However, the ``in`` parameter should be considered untrusted, and should not be deserialized without sanitization. RCE in Apache Struts ==================== @@ -85,6 +89,7 @@ Model answer, step 1 import java /** The interface `org.apache.struts2.rest.handler.ContentTypeHandler`. */ + class ContentTypeHandler extends RefType { ContentTypeHandler() { this.hasQualifiedName("org.apache.struts2.rest.handler", "ContentTypeHandler") diff --git a/docs/language/ql-training-rst/java/data-flow-java.rst b/docs/language/ql-training-rst/java/data-flow-java.rst index 17077d88b59..be9ba98456e 100644 --- a/docs/language/ql-training-rst/java/data-flow-java.rst +++ b/docs/language/ql-training-rst/java/data-flow-java.rst @@ -24,7 +24,11 @@ For this example you should download: You can also query the project in `the query console `__ on LGTM.com. 
- Note that results generated in the query console are likely to differ to those generated in the QL plugin as LGTM.com analyzes the most recent revisions of each project that has been added–the snapshot available to download above is based on an historical version of the code base. + .. insert snapshot-note.rst to explain differences between snapshot available to download and the version available in the query console. + + .. include:: ../slide-snippets/snapshot-note.rst + + .. resume slides .. rst-class:: agenda @@ -54,7 +58,7 @@ Motivation If you have completed the “Example: Query injection” slide deck which was part of the previous course, this example will look familiar to you. - To understand the scope of this vulnerability, consider what would happen if a malicious user could provide the following as the content of the individualURI variable: + To understand the scope of this vulnerability, consider what would happen if a malicious user could provide the following as the content of the ``individualURI`` variable: ``“http://vivoweb.org/ontology/core#FacultyMember> ?p ?o . FILTER regex("aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa!", "(.*a){50}") } #`` @@ -62,7 +66,7 @@ Motivation Example: SPARQL injection ========================= -We can write a simple query that finds string concatenations that occur in calls SPARQL query APIs. +We can write a simple query that finds string concatenations that occur in calls to SPARQL query APIs. .. rst-class:: build @@ -80,7 +84,7 @@ Query finds a CVE reported by Semmle (CVE-2019-6986), plus one other result, but - String concatenation occurs on a different line in the same method. - String concatenation occurs in a different method. - - String concatenation occurs through StringBuilders or similar. + - String concatenation occurs through ``StringBuilders`` or similar. - Entirety of user input is provided as the query. 
We want to improve our query to catch more of these cases. diff --git a/docs/language/ql-training-rst/java/global-data-flow-java.rst b/docs/language/ql-training-rst/java/global-data-flow-java.rst index 59bf7324d68..b105c082305 100644 --- a/docs/language/ql-training-rst/java/global-data-flow-java.rst +++ b/docs/language/ql-training-rst/java/global-data-flow-java.rst @@ -24,7 +24,11 @@ For this example you should download: You can also query the project in `the query console `__ on LGTM.com. - Note that results generated in the query console are likely to differ to those generated in the QL plugin as LGTM.com analyzes the most recent revisions of each project that has been added–the snapshot available to download above is based on an historical version of the code base. + .. insert snapshot-note.rst to explain differences between snapshot available to download and the version available in the query console. + + .. include:: ../slide-snippets/snapshot-note.rst + + .. resume slides .. rst-class:: agenda @@ -99,7 +103,7 @@ Exercise: Defining sinks Fill in the definition of ``isSink``. -**Hint**: We want to find the first argument of calls to the method compileAndExecute. +**Hint**: We want to find the first argument of calls to the method ``compileAndExecute``. .. code-block:: ql @@ -114,7 +118,7 @@ Fill in the definition of ``isSink``. .. note:: - The second part is to define what it means to be a sink for this particular problem. The queries from the previous slide deck will be useful for this exercise. + The second part is to define what it means to be a sink for this particular problem. The queries from an :doc:`Introduction to data flow ` will be useful for this exercise. 
Solution: Defining sinks ======================== diff --git a/docs/language/ql-training-rst/java/intro-ql-java.rst b/docs/language/ql-training-rst/java/intro-ql-java.rst index 68a403641d9..392c18309cb 100644 --- a/docs/language/ql-training-rst/java/intro-ql-java.rst +++ b/docs/language/ql-training-rst/java/intro-ql-java.rst @@ -24,7 +24,11 @@ For this example you should download: You can also query the project in `the query console `__ on LGTM.com. - Note that results generated in the query console are likely to differ to those generated in the QL plugin as LGTM.com analyzes the most recent revisions of each project that has been added–the snapshot available to download above is based on an historical version of the code base. + .. insert snapshot-note.rst to explain differences between snapshot available to download and the version available in the query console. + + .. include:: ../slide-snippets/snapshot-note.rst + + .. resume slides .. Include language-agnostic section here diff --git a/docs/language/ql-training-rst/java/program-representation-java.rst b/docs/language/ql-training-rst/java/program-representation-java.rst index 3c90b33bc88..d090c30aebe 100644 --- a/docs/language/ql-training-rst/java/program-representation-java.rst +++ b/docs/language/ql-training-rst/java/program-representation-java.rst @@ -18,69 +18,11 @@ Agenda - Program elements - AST classes -Abstract syntax trees -===================== +.. insert abstract-syntax-tree.rst -The basic representation of an analyzed program is an *abstract syntax tree (AST)*. - -.. container:: column-left - - .. code-block:: java - - try { - ... - } catch (AnException e) { - } - -.. container:: ast-graph - - .. 
graphviz:: - - digraph { - graph [ dpi = 1000 ] - node [shape=polygon,sides=4,color=blue4,style="filled,rounded", fontname=consolas,fontcolor=white] - a [label=] - b [label=] - c [label=<...>,color=white,fontcolor=black] - d [label=DeclExpr>] - e [label=<...>,color=white,fontcolor=black] - f [label=<...>,color=white,fontcolor=black] - g [label=<...>,color=white,fontcolor=black] - - a -> {b, c} - b -> {d, e} - d -> {f, g} - } - - - -.. note:: - - When writing queries in QL it is important to have in mind the underlying representation of the program which is stored in the database. Typically queries make use of the “AST” representation of the program - a tree structure where program elements are nested within other program elements. - - The “Introducing the Java libraries” help topic contains a more complete overview of important AST classes and the rest of the Java QL libraries: https://help.semmle.com/QL/learn-ql/ql/java/introduce-libraries-java.html - -Database representations of ASTs -================================ - -AST nodes and other program elements are encoded in the database as *entity values*. Entities are implemented as integers, but in QL they are opaque - all one can do with them is to check their equality. - -Each entity belongs to an entity type. Entity types have names starting with “@” and are defined in the database schema (not in QL). - -Properties of AST nodes and their relationships to each other are encoded by database relations, which are predicates defined in the database (not in QL). - -Entity types are rarely used directly, the usual pattern is to define a QL class that extends the type and exposes properties of its entities through member predicates. - -.. note:: - - ASTs are a typical example of the kind of data representation one finds in object-oriented programming, with data-carrying nodes that reference each other. 
At first glance, QL, which can only work with atomic values, does not seem to be well suited for working with this kind of data. However, ultimately all that we require of the nodes in an AST is that they have an identity. The relationships among nodes, usually implemented by reference-valued object fields in other languages, can just as well (and arguably more naturally) be represented as relations over nodes. Attaching data (such as strings or numbers) to nodes can also be represented with relations over nodes and primitive values. All we need is a way for relations to reference nodes. This is achieved in QL (as in other database languages) by means of *entity values* (or entities, for short), which are opaque atomic values, implemented as integers under the hood. - - It is the job of the extractor to create entity values for all AST nodes and populate database relations that encode the relationship between AST nodes and any values associated with them. These relations are *extensional*, that is, explicitly stored in the database, unlike the relations described by QL predicates, which we also refer to as *intensional* relations. Entity values belong to *entity types*, whose name starts with “@” to set them apart from primitive types and classes. - - The interface between entity types and extensional relations on the one hand and QL predicates and classes on the other hand is provided by the *database schema*, which defines the available entity types and the schema of each extensional relation, that is, how many columns the relation has, and which entity type or primitive type the values in each column come from. QL programs can refer to entity types and extensional relations just as they would refer to QL classes and predicates, with the restriction that entity types cannot be directly selected in a “select” clause, since they do not have a well-defined string representation. 
- - For example, the database schema for Java snapshot databases is here: https://github.com/Semmle/ql/blob/master/java/ql/src/config/semmlecode.dbscheme +.. include:: ../slide-snippets/abstract-syntax-tree.rst +.. resume slides Program elements ================ @@ -89,9 +31,6 @@ Program elements - This includes: packages (``Package``), compilation units (``CompilationUnit``), types (``Type``), methods (``Method``), constructors (``Constructor``), and variables (``Variable``). - It is often convenient to refer to an element that might either be a method or a constructor; the class ``Callable``, which is a common superclass of ``Method`` and ``Constructor``, can be used for this purpose. -.. note:: - - The “Introducing the Java libraries” help topic contains a more complete overview of important AST classes and the rest of the Java QL libraries: https://help.semmle.com/QL/learn-ql/ql/java/introduce-libraries-java.html AST === @@ -107,10 +46,6 @@ Operations are provided for exploring the AST: - ``Stmt.getAChild`` returns a statement or expression that is nested directly inside a given statement. - ``Expr.getParent`` and ``Stmt.getParent`` return the parent node of an AST node. -.. note:: - - The “Introducing the Java libraries” help topic contains a more complete overview of important AST classes and the rest of the Java QL libraries: https://help.semmle.com/QL/learn-ql/ql/java/introduce-libraries-java.html - Types ===== @@ -124,10 +59,6 @@ The database also includes information about the types used in a program: - ``EnumType`` represents a Java enum type. - ``Array`` represents a Java array type. -.. 
note:: - - The “Introducing the Java libraries” help topic contains a more complete overview of important AST classes and the rest of the Java QL libraries: https://help.semmle.com/QL/learn-ql/ql/java/introduce-libraries-java.html - Working with variables ====================== diff --git a/docs/language/ql-training-rst/java/query-injection-java.rst b/docs/language/ql-training-rst/java/query-injection-java.rst index ea536b64467..67f7fe21a76 100644 --- a/docs/language/ql-training-rst/java/query-injection-java.rst +++ b/docs/language/ql-training-rst/java/query-injection-java.rst @@ -23,8 +23,12 @@ For this example you should download: For this example, we will be analyzing `VIVO Vitro `__. You can also query the project in `the query console `__ on LGTM.com. + + .. insert snapshot-note.rst to explain differences between snapshot available to download and the version available in the query console. - Note that results generated in the query console are likely to differ to those generated in the QL plugin as LGTM.com analyzes the most recent revisions of each project that has been added–the snapshot available to download above is based on an historical version of the code base. + .. include:: ../slide-snippets/snapshot-note.rst + + .. resume slides SQL injection ============= @@ -89,11 +93,11 @@ Let’s start by looking for calls to methods with names of the form ``sparql*Qu QL query: find string concatenation =================================== -We now need to define what would make these API calls unsafe. +- We now need to define what would make these API calls unsafe. +- A simple heuristic would be to look for string concatenation used in the query argument. +- We may want to reuse this logic, so let us create a separate predicate. -A simple heuristic would be to look for string concatenation used in the query argument. We may want to reuse this logic, so let us create a separate predicate. 
- -Looking at autocomplete suggestions, we see that we can get the type of an expression using the getType() method. +Looking at autocomplete suggestions, we see that we can get the type of an expression using the ``getType()`` method. .. rst-class:: build @@ -106,8 +110,8 @@ Looking at autocomplete suggestions, we see that we can get the type of an expre .. note:: - An important part of the query is to determine whether a given expression is string concatenation. - - We therefore write a helper predicate for finding string concatenation. - - This predicate effectively represents the set of all add expressions in the database where the type of the expression is ``TypeString`` - that is, the addition produces a ``String`` value. + - We therefore write a helper predicate for finding string concatenation. + - This predicate effectively represents the set of all ``add`` expressions in the database where the type of the expression is ``TypeString`` - that is, the addition produces a ``String`` value. QL query: SPARQL injection ========================== diff --git a/docs/language/ql-training-rst/slide-snippets/abstract-syntax-tree.rst b/docs/language/ql-training-rst/slide-snippets/abstract-syntax-tree.rst new file mode 100644 index 00000000000..c640aa7a16a --- /dev/null +++ b/docs/language/ql-training-rst/slide-snippets/abstract-syntax-tree.rst @@ -0,0 +1,70 @@ +Abstract syntax trees +===================== + +The basic representation of an analyzed program is an *abstract syntax tree (AST)*. + +.. container:: column-left + + .. code-block:: java + + try { + ... + } catch (AnException e) { + } + +.. container:: ast-graph + + .. 
graphviz:: + + digraph { + graph [ dpi = 1000 ] + node [shape=polygon,sides=4,color=blue4,style="filled,rounded", fontname=consolas,fontcolor=white] + a [label=] + b [label=] + c [label=<...>,color=white,fontcolor=black] + d [label=DeclExpr>] + e [label=<...>,color=white,fontcolor=black] + f [label=<...>,color=white,fontcolor=black] + g [label=<...>,color=white,fontcolor=black] + + a -> {b, c} + b -> {d, e} + d -> {f, g} + } + + +.. note:: + + When writing queries in QL it is important to have in mind the underlying representation of the program which is stored in the database. Typically queries make use of the “AST” representation of the program - a tree structure where program elements are nested within other program elements. + + The following topics contain overviews of the important AST classes and QL libraries for C/C++, C#, and Java: + + - `Introducing the C/C++ libraries `__ + - `Introducing the C# libraries `__ + - `Introducing the Java libraries `__ + + +Database representations of ASTs +================================ + +AST nodes and other program elements are encoded in the database as *entity values*. Entities are implemented as integers, but in QL they are opaque - all one can do with them is to check their equality. + +Each entity belongs to an entity type. Entity types have names starting with “@” and are defined in the database schema (not in QL). + +Properties of AST nodes and their relationships to each other are encoded by database relations, which are predicates defined in the database (not in QL). + +Entity types are rarely used directly, the usual pattern is to define a QL class that extends the type and exposes properties of its entities through member predicates. + +.. note:: + + ASTs are a typical example of the kind of data representation one finds in object-oriented programming, with data-carrying nodes that reference each other. 
At first glance, QL, which can only work with atomic values, does not seem to be well suited for working with this kind of data. However, ultimately all that we require of the nodes in an AST is that they have an identity. The relationships among nodes, usually implemented by reference-valued object fields in other languages, can just as well (and arguably more naturally) be represented as relations over nodes. Attaching data (such as strings or numbers) to nodes can also be represented with relations over nodes and primitive values. All we need is a way for relations to reference nodes. This is achieved in QL (as in other database languages) by means of *entity values* (or entities, for short), which are opaque atomic values, implemented as integers under the hood. + + It is the job of the extractor to create entity values for all AST nodes and populate database relations that encode the relationship between AST nodes and any values associated with them. These relations are *extensional*, that is, explicitly stored in the database, unlike the relations described by QL predicates, which we also refer to as *intensional* relations. Entity values belong to *entity types*, whose name starts with “@” to set them apart from primitive types and classes. + + The interface between entity types and extensional relations on the one hand and QL predicates and classes on the other hand is provided by the *database schema*, which defines the available entity types and the schema of each extensional relation, that is, how many columns the relation has, and which entity type or primitive type the values in each column come from. QL programs can refer to entity types and extensional relations just as they would refer to QL classes and predicates, with the restriction that entity types cannot be directly selected in a ``select`` clause, since they do not have a well-defined string representation. 
+ + For example, the database schemas for C/C++, C#, and Java snapshot databases are here: + + - https://github.com/Semmle/ql/blob/master/cpp/ql/src/semmlecode.cpp.dbscheme + - https://github.com/Semmle/ql/blob/master/csharp/ql/src/semmlecode.csharp.dbscheme + - https://github.com/Semmle/ql/blob/master/java/ql/src/config/semmlecode.dbscheme \ No newline at end of file diff --git a/docs/language/ql-training-rst/slide-snippets/global-data-flow-extra-slides.rst b/docs/language/ql-training-rst/slide-snippets/global-data-flow-extra-slides.rst index 93b64f02dba..49cf6b1d7f5 100644 --- a/docs/language/ql-training-rst/slide-snippets/global-data-flow-extra-slides.rst +++ b/docs/language/ql-training-rst/slide-snippets/global-data-flow-extra-slides.rst @@ -1,7 +1,7 @@ Exercise: How not to do global data flow ======================================== -Implement a flowStep predicate extending localFlowStep with steps through function calls and returns. Why might we not want to use this? +Implement a ``flowStep`` predicate extending ``localFlowStep`` with steps through function calls and returns. Why might we not want to use this? .. code-block:: ql diff --git a/docs/language/ql-training-rst/slide-snippets/local-data-flow.rst b/docs/language/ql-training-rst/slide-snippets/local-data-flow.rst index 110e3040aab..eb20ce0282b 100644 --- a/docs/language/ql-training-rst/slide-snippets/local-data-flow.rst +++ b/docs/language/ql-training-rst/slide-snippets/local-data-flow.rst @@ -64,7 +64,7 @@ Local vs global data flow - Local (“intra-procedural”) data flow models flow within one function; feasible to compute for all functions in a snapshot - Global (“inter-procedural”) data flow models flow across function calls; not feasible to compute for all functions in a snapshot - Different APIs, so discussed separately -- This slide deck focuses on the former. +- This slide deck focuses on the former .. 
note:: diff --git a/docs/language/ql-training-rst/slide-snippets/path-queries.rst b/docs/language/ql-training-rst/slide-snippets/path-queries.rst index c7d9212e644..e93f7fda966 100644 --- a/docs/language/ql-training-rst/slide-snippets/path-queries.rst +++ b/docs/language/ql-training-rst/slide-snippets/path-queries.rst @@ -1,7 +1,7 @@ Path queries ============ -Path queries provide information about the identified paths from sources to sinks. Paths can be examined in Path Explorer view. +Path queries provide information about the identified paths from sources to sinks. Paths can be examined in the Path Explorer view. Use this template: diff --git a/docs/language/ql-training-rst/slide-snippets/snapshot-note.rst b/docs/language/ql-training-rst/slide-snippets/snapshot-note.rst new file mode 100644 index 00000000000..4a32243211b --- /dev/null +++ b/docs/language/ql-training-rst/slide-snippets/snapshot-note.rst @@ -0,0 +1 @@ +Note that results generated in the query console are likely to differ from those generated in the QL plugin, because LGTM.com analyzes the most recent revision of each project that has been added, whereas the snapshot available to download above is based on a historical version of the codebase. \ No newline at end of file diff --git a/docs/language/ql-training-rst/template.rst b/docs/language/ql-training-rst/template.rst index ba4c111d9d0..0cce4a11435 100644 --- a/docs/language/ql-training-rst/template.rst +++ b/docs/language/ql-training-rst/template.rst @@ -6,7 +6,10 @@ - The default slide style is a plain white-ish background with minimal company branding - Different slide designs have been preconfigured. To choose a different layout use the appropriate .. rst-class:: directive. For examples of the different designs, - see the template below. + see the template below. This directive can also be used to create custom classes for individual + images and slide backgrounds if necessary. Additional CSS styles may also be required when using custom + class directives. 
Search for 'deck-specific styles for individual images' in default.css for examples + of how to implement custom class styles. - Additional notes can be added to a slide using a .. note:: directive - Press P to access the additional notes on the rendered slides. - Press F is go into full screen mode when viewing the rendered slides.