mirror of
https://github.com/github/codeql.git
synced 2026-05-05 13:45:19 +02:00
Merge branch 'main' into turbo/experimental/combined
@@ -185,7 +185,7 @@ For more information about the class ``Call``, see ":doc:`Navigating the call gr
 Improvements
 ~~~~~~~~~~~~

-The Java standard library provides another annotation type ``java.lang.SuppressWarnings`` that can be used to suppress certain categories of warnings. In particular, it can be used to turn off warnings about calls to deprecated methods. Therefore, it makes sense to improve our query to ignore calls to deprecated methods from inside methods that are marked with ``@SuppressWarnings("deprecated")``.
+The Java standard library provides another annotation type ``java.lang.SuppressWarnings`` that can be used to suppress certain categories of warnings. In particular, it can be used to turn off warnings about calls to deprecated methods. Therefore, it makes sense to improve our query to ignore calls to deprecated methods from inside methods that are marked with ``@SuppressWarnings("deprecation")``.

 For instance, consider this slightly updated example:
@@ -198,7 +198,7 @@ For instance, consider this slightly updated example:
            m();
        }

-       @SuppressWarnings("deprecated")
+       @SuppressWarnings("deprecation")
        void r() {
            m();
        }
@@ -206,7 +206,7 @@ For instance, consider this slightly updated example:

 Here, the programmer has explicitly suppressed warnings about deprecated calls in ``A.r``, so our query should not flag the call to ``A.m`` any more.

-To do so, we first introduce a class for representing all ``@SuppressWarnings`` annotations where the string ``deprecated`` occurs among the list of warnings to suppress:
+To do so, we first introduce a class for representing all ``@SuppressWarnings`` annotations where the string ``deprecation`` occurs among the list of warnings to suppress:

 .. code-block:: ql
@@ -639,7 +639,7 @@ Various kinds of syntax can have *annotations* applied to them. Annotations are
                     |   "override"
                     |   "query"

-   argsAnnotation ::= "pragma" "[" ("inline" | "noinline" | "nomagic" | "noopt" | "assume_small_delta") "]"
+   argsAnnotation ::= "pragma" "[" ("inline" | "inline_late" | "noinline" | "nomagic" | "noopt" | "assume_small_delta") "]"
                   |   "language" "[" "monotonicAggregates" "]"
                   |   "bindingset" "[" (variable ( "," variable)*)? "]"
@@ -692,6 +692,8 @@ The parameterized annotation ``pragma`` supplies compiler pragmas, and may be ap
 +===========================+=========+============+===================+=======================+=========+========+=========+=========+
 | ``inline`` | | yes | yes | yes | | | | |
 +---------------------------+---------+------------+-------------------+-----------------------+---------+--------+---------+---------+
+| ``inline_late`` | | | | yes | | | | |
++---------------------------+---------+------------+-------------------+-----------------------+---------+--------+---------+---------+
 | ``noinline`` | | yes | yes | yes | | | | |
 +---------------------------+---------+------------+-------------------+-----------------------+---------+--------+---------+---------+
 | ``nomagic`` | | yes | yes | yes | | | | |
@@ -2069,7 +2071,7 @@ The complete grammar for QL is as follows:
                     |   "override"
                     |   "query"

-   argsAnnotation ::= "pragma" "[" ("inline" | "noinline" | "nomagic" | "noopt" | "assume_small_delta") "]"
+   argsAnnotation ::= "pragma" "[" ("inline" | "inline_late" | "noinline" | "nomagic" | "noopt" | "assume_small_delta") "]"
                   |   "language" "[" "monotonicAggregates" "]"
                   |   "bindingset" "[" (variable ( "," variable)*)? "]"
8
docs/codeql/reusables/codespaces-template-note.rst
Normal file
@@ -0,0 +1,8 @@
+.. pull-quote::
+
+   Note
+
+   You can use the CodeQL template (beta) in `GitHub Codespaces <https://github.com/codespaces/new?template_repository=github/codespaces-codeql>`__ to try out the QL concepts and programming-language-agnostic examples in these tutorials. The template includes a guided introduction to working with QL, and makes it easy to get started.
+
+   When you're ready to run CodeQL queries on actual codebases, you will need to install the CodeQL extension in Visual Studio Code. For instructions, see ":ref:`Setting up CodeQL in Visual Studio Code <setting-up-codeql-in-visual-studio-code>`."
@@ -50,7 +50,7 @@ You start asking some creative questions and making notes of the answers so you

 There is too much information to search through by hand, so you decide to use your newly acquired QL skills to help you with your investigation...

-.. include:: ../reusables/setup-to-run-tutorials.rst
+.. include:: ../reusables/codespaces-template-note.rst

 QL libraries
 ------------
@@ -14,17 +14,17 @@ QL is a logic programming language, so it is built up of logical formulas. QL us

 QL also supports recursion and aggregates. This allows you to write complex recursive queries using simple QL syntax and directly use aggregates such as ``count``, ``sum``, and ``average``.

+.. include:: ../reusables/codespaces-template-note.rst
+
 Running a query
 ---------------

-You can try out the following examples and exercises using :ref:`CodeQL for VS Code <codeql-for-visual-studio-code>`, or you can run them in the `query console on LGTM.com <https://lgtm.com/query>`__. Before you can run a query on LGTM.com, you need to select a language and project to query (for these logic examples, any language and project will do).
+You can try out the following examples and exercises using :ref:`CodeQL for VS Code <codeql-for-visual-studio-code>` or the `CodeQL template <https://github.com/codespaces/new?template_repository=github/codespaces-codeql>`__ on GitHub Codespaces.

-Once you have selected a language, the query console is populated with the query:
+Here is an example of a basic query:

 .. code-block:: ql

-   import <language>
-
    select "hello world"

 This query returns the string ``"hello world"``.
@@ -52,39 +52,33 @@ Simple exercises

 You can write simple queries using some of the basic functions that are available for the ``int``, ``date``, ``float``, ``boolean`` and ``string`` types. To apply a function, append it to the argument. For example, ``1.toString()`` converts the value ``1`` to a string. Notice that as you start typing a function, a pop-up is displayed making it easy to select the function that you want. Also note that you can apply multiple functions in succession. For example, ``100.log().sqrt()`` first takes the natural logarithm of 100 and then computes the square root of the result.

-Exercise 1
-~~~~~~~~~~
+Exercise 1 - Strings
+~~~~~~~~~~~~~~~~~~~~

 Write a query which returns the length of the string ``"lgtm"``. (Hint: `here <https://codeql.github.com/docs/ql-language-reference/ql-language-specification/#built-ins-for-string>`__ is the list of the functions that can be applied to strings.)

-➤ `See answer in the query console on LGTM.com <https://lgtm.com/query/2103060623/>`__
+➤ `Check your answer <#exercise-1>`__

-There is often more than one way to define a query. For example, we can also write the above query in the shorter form:
-
-.. code-block:: ql
-
-   select "lgtm".length()
-
-Exercise 2
-~~~~~~~~~~
+Exercise 2 - Numbers
+~~~~~~~~~~~~~~~~~~~~

 Write a query which returns the sine of the minimum of ``3^5`` (``3`` raised to the power ``5``) and ``245.6``.

-➤ `See answer in the query console on LGTM.com <https://lgtm.com/query/2093780343/>`__
+➤ `Check your answer <#exercise-2>`__

-Exercise 3
-~~~~~~~~~~
+Exercise 3 - Booleans
+~~~~~~~~~~~~~~~~~~~~~

 Write a query which returns the opposite of the boolean ``false``.

-➤ `See answer in the query console on LGTM.com <https://lgtm.com/query/2093780344/>`__
+➤ `Check your answer <#exercise-3>`__

-Exercise 4
-~~~~~~~~~~
+Exercise 4 - Dates
+~~~~~~~~~~~~~~~~~~

 Write a query which computes the number of days between June 10 and September 28, 2017.

-➤ `See answer in the query console on LGTM.com <https://lgtm.com/query/2100260596/>`__
+➤ `Check your answer <#exercise-4>`__

 Example query with multiple results
 -----------------------------------
@@ -98,8 +92,6 @@ The exercises above all show queries with exactly one result, but in fact many q
          x*x + y*y = z*z
    select x, y, z

-➤ `See this in the query console on LGTM.com <https://lgtm.com/query/2100790036/>`__
-
 To simplify the query, we can introduce a class ``SmallInt`` representing the integers between 1 and 10. We can also define a predicate ``square()`` on integers in that class. Defining classes and predicates in this way makes it easy to reuse code without having to repeat it every time.

 .. code-block:: ql
@@ -113,17 +105,17 @@ To simplify the query, we can introduce a class ``SmallInt`` representing the in
    where x.square() + y.square() = z.square()
    select x, y, z

-➤ `See this in the query console on LGTM.com <https://lgtm.com/query/2101340747/>`__
-
 Example CodeQL queries
 ----------------------

 The previous examples used the primitive types built in to QL. Although we chose a project to query, we didn't use the information in that project's database.
-The following example queries *do* use these databases and give you an idea of how to use CodeQL to analyze projects.
+The following example queries *do* use these databases and give you an idea of how to use CodeQL to analyze projects.

 Queries using the CodeQL libraries can find errors and uncover variants of important security vulnerabilities in codebases.
 Visit `GitHub Security Lab <https://securitylab.github.com/>`__ to read about examples of vulnerabilities that we have recently found in open source projects.

+Before you can run the following examples, you will need to install the CodeQL extension for Visual Studio Code. For more information, see :ref:`Setting up CodeQL in Visual Studio Code <setting-up-codeql-in-visual-studio-code>`. You will also need to import and select a database in the corresponding programming language. For more information about obtaining CodeQL databases, see `Analyzing your projects <https://codeql.github.com/docs/codeql-for-visual-studio-code/analyzing-your-projects/#choosing-a-database>`__ in the CodeQL for VS Code documentation.
+
 To import the CodeQL library for a specific programming language, type ``import <language>`` at the start of the query.

 .. code-block:: ql
@@ -134,7 +126,7 @@ To import the CodeQL library for a specific programming language, type ``import
    where count(f.getAnArg()) > 7
    select f

-➤ `See this in the query console on LGTM.com <https://lgtm.com/query/2096810474/>`__. The ``from`` clause defines a variable ``f`` representing a Python function. The ``where`` part limits the functions ``f`` to those with more than 7 arguments. Finally, the ``select`` clause lists these functions.
+The ``from`` clause defines a variable ``f`` representing a Python function. The ``where`` part limits the functions ``f`` to those with more than 7 arguments. Finally, the ``select`` clause lists these functions.

 .. code-block:: ql
@@ -144,7 +136,7 @@ To import the CodeQL library for a specific programming language, type ``import
    where c.getText().regexpMatch("(?si).*\\bTODO\\b.*")
    select c

-➤ `See this in the query console on LGTM.com <https://lgtm.com/query/2101530483/>`__. The ``from`` clause defines a variable ``c`` representing a JavaScript comment. The ``where`` part limits the comments ``c`` to those containing the word ``"TODO"``. The ``select`` clause lists these comments.
+The ``from`` clause defines a variable ``c`` representing a JavaScript comment. The ``where`` part limits the comments ``c`` to those containing the word ``"TODO"``. The ``select`` clause lists these comments.

 .. code-block:: ql
@@ -154,9 +146,56 @@ To import the CodeQL library for a specific programming language, type ``import
    where not exists(p.getAnAccess())
    select p

-➤ `See this in the query console on LGTM.com <https://lgtm.com/query/2098670762/>`__. The ``from`` clause defines a variable ``p`` representing a Java parameter. The ``where`` clause finds unused parameters by limiting the parameters ``p`` to those which are not accessed. Finally, the ``select`` clause lists these parameters.
+The ``from`` clause defines a variable ``p`` representing a Java parameter. The ``where`` clause finds unused parameters by limiting the parameters ``p`` to those which are not accessed. Finally, the ``select`` clause lists these parameters.

 Further reading
 ---------------

-- For a more technical description of the underlying language, see the ":ref:`QL language reference <ql-language-reference>`."
+- For a more technical description of the underlying language, see the ":ref:`QL language reference <ql-language-reference>`."
+
+--------------
+
+Answers
+-------
+
+Exercise 1
+~~~~~~~~~~
+
+.. code-block:: ql
+
+   from string s
+   where s = "lgtm"
+   select s.length()
+
+There is often more than one way to define a query. For example, we can also write the above query in the shorter form:
+
+.. code-block:: ql
+
+   select "lgtm".length()
+
+Exercise 2
+~~~~~~~~~~
+
+.. code-block:: ql
+
+   from float x, float y
+   where x = 3.pow(5) and y = 245.6
+   select x.minimum(y).sin()
+
+Exercise 3
+~~~~~~~~~~
+
+.. code-block:: ql
+
+   from boolean b
+   where b = false
+   select b.booleanNot()
+
+Exercise 4
+~~~~~~~~~~
+
+.. code-block:: ql
+
+   from date start, date end
+   where start = "10/06/2017".toDate() and end = "28/09/2017".toDate()
+   select start.daysTo(end)
@@ -1,101 +0,0 @@
-# Query classification and display
-
-## Attributable Queries
-
-The results of some queries are unsuitable for attribution to individual
-developers. Most of them have a threshold value on which they trigger,
-for example all metric violations and statistics based queries. The
-results of such queries would all be attributed to the person pushing
-the value over (or under) the threshold. Some queries only trigger when
-another one doesn't. An example of this is the MaybeNull query which
-only triggers if the AlwaysNull query doesn't. A small change in the
-data flow could make an alert switch from AlwaysNull to MaybeNull (or
-vice versa). As a result we attribute both a fix and an introduction to
-the developer that changed the data flow. For this particular example
-the funny attribution results are more a nuisance than a real problem;
-the overall alert count remains unchanged. However, for the duplicate
-and similar code queries the effects can be much more severe, as they
-come in versions for "duplicate file" and "duplicate function" among
-many others, where "duplicate function" only triggers if "duplicate
-file" didn't. As a result adding some code to a duplicate file might
-result in a "fix" of a "duplicate file" alert and an introduction of
-many "duplicate function" alerts. This would be highly unfair.
-Currently, only the duplicate and similar code queries exhibit this
-"exchanging one for many" alerts when trying to attribute their results.
-Therefore we currently exclude all duplicate code related alerts from
-attribution.
-
-The following queries are excluded from attribution:
-
-- Metric violations, i.e. the ones with metadata properties like
-  `@(error|warning|recommendation)-(to|from)`
-- Queries with tag `non-attributable`
-
-This check is applied when the results of a single attribution are
-loaded into the datastore. This means that any change to this behaviour
-will only take effect on newly attributed revisions but the historical
-data remains unchanged.
-
-## Query severity and precision
-
-We currently classify queries on two axes, with some additional tags.
-Those axes are severity and precision, and are defined using the
-query-metadata properties `@problem.severity` and `@precision`.
-
-For severity, we have the following categories:
-
-- Error
-- Warning
-- Recommendation
-
-These categories may change in the future.
-
-For precision, we have the following categories:
-
-- very-high
-- high
-- medium
-- low
-
-As [usual](https://en.wikipedia.org/wiki/Precision_and_recall),
-precision is defined as the percentage of query results that are true
-positives, i.e., precision = number of true positives / (number of true
-positives + number of false positives). There is no hard-and-fast rule
-for which precision ranges correspond to which categories.
-
-We expect these categories to remain unchanged for the foreseeable
-future.
-
-### A note on precision
-
-Intuitively, precision measures how well the query performs at finding the
-results it is supposed to find, i.e., how well it implements its
-(informal, unwritten) rule. So how precise a query is depends very much
-on what we consider that rule to be. We generally try to sharpen our
-rules to focus on results that a developer might actually be interested
-in.
-
-## Which queries to run and display on LGTM
-
-The following queries are run:
-
-Precision:     | very-high | high    | medium  | low
----------------|-----------|---------|---------|----
-Error          | **Yes**   | **Yes** | **Yes** | No
-Warning        | **Yes**   | **Yes** | **Yes** | No
-Recommendation | **Yes**   | **Yes** | No      | No
-
-The following queries have their results displayed by default:
-
-Precision:     | very-high | high    | medium | low
----------------|-----------|---------|--------|----
-Error          | **Yes**   | **Yes** | No     | No
-Warning        | **Yes**   | **Yes** | No     | No
-Recommendation | **Yes**   | No      | No     | No
-
-Results for queries that are run but not displayed by default can be
-made visible by editing the project configuration.
-
-Queries from custom query packs (in-repo or site-wide) are always run
-and displayed by default. They can be hidden by editing the project
-config, and "disabled" by removing them from the query pack.
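For reference, the precision definition and the two run/display tables in the removed document above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `precision`, `RUN`, `DISPLAYED`, `is_run`, and `is_displayed_by_default` are hypothetical and are not part of LGTM or CodeQL, and the two dictionaries simply transcribe the tables as written.

```python
# Illustrative sketch only: every name here is hypothetical, not an LGTM or CodeQL API.

def precision(true_positives: int, false_positives: int) -> float:
    """Return the fraction of reported results that are true positives."""
    reported = true_positives + false_positives
    if reported == 0:
        raise ValueError("precision is undefined when a query reports no results")
    return true_positives / reported


# Precision levels at which queries of each severity were run (first table).
RUN = {
    "error": {"very-high", "high", "medium"},
    "warning": {"very-high", "high", "medium"},
    "recommendation": {"very-high", "high"},
}

# Precision levels at which results were displayed by default (second table).
DISPLAYED = {
    "error": {"very-high", "high"},
    "warning": {"very-high", "high"},
    "recommendation": {"very-high"},
}


def is_run(severity: str, precision_level: str) -> bool:
    """Check the 'queries are run' table for a severity/precision pair."""
    return precision_level in RUN[severity]


def is_displayed_by_default(severity: str, precision_level: str) -> bool:
    """Check the 'displayed by default' table for a severity/precision pair."""
    return precision_level in DISPLAYED[severity]
```

For example, a `warning` query with `medium` precision would have been run but hidden by default, which matches the two bullets removed from the metadata guidance in the next hunk.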
@@ -34,12 +34,8 @@ The process must begin with the first step and must conclude with the final step

 Test the query on a number of large real-world projects to make sure it doesn't give too many false positive results. Adjust the `@precision` and `@problem.severity` attributes in accordance with the real-world results you observe. See the advice on query metadata below.

-You can use the LGTM.com [query console](https://lgtm.com/query) to get an overview of true and false positive results on a large number of projects. The simplest way to do this is to:
-
-1. [Create a list of prominent projects](https://lgtm.com/help/lgtm/managing-project-lists) on LGTM.
-2. In the query console, [run your query against your custom project list](https://lgtm.com/help/lgtm/using-query-console).
-3. Save links to your query console results and include them in discussions on issues and pull requests.
+GitHub is running a private beta test of a new feature for testing CodeQL queries at scale from VS Code. To request access to the beta program, please respond to this [GitHub Discussion](https://github.com/orgs/community/discussions/40453).

 5. **Test and improve performance**

 There must be a balance between the execution time of a query and the value of its results: queries that are highly valuable and broadly applicable can be allowed to take longer to run. In all cases, you need to address any easy-to-fix performance issues before the query is put into production.
@@ -62,8 +58,6 @@ The process must begin with the first step and must conclude with the final step

 - The severity is one of `error`, `warning`, or `recommendation`.
 - The precision is one of `very-high`, `high`, `medium` or `low`. It may take a few iterations to get this right.
-- Currently, LGTM runs all `error` or `warning` queries with a `very-high`, `high`, or `medium` precision. In addition, `recommendation` queries with `very-high` or `high` precision are run.
-- However, results from `error` and `warning` queries with `medium` precision, as well as `recommendation` queries with `high` precision, are not shown by default.

 c. All queries need an `@id`.