mirror of
https://github.com/github/codeql.git
synced 2025-12-20 18:56:32 +01:00
88 lines
4.8 KiB
ReStructuredText
88 lines
4.8 KiB
ReStructuredText
Query writing: common performance issues
|
||
========================================
|
||
|
||
This topic offers some simple tips on how to avoid common problems that can affect the performance of your queries.
|
||
Before reading the tips below, it is worth reiterating a few important points about CodeQL and the QL language:
|
||
|
||
- CodeQL `predicates <https://help.semmle.com/QL/ql-handbook/predicates.html>`__ and `classes <https://help.semmle.com/QL/ql-handbook/types.html#classes>`__ are evaluated to database `tables <https://en.wikipedia.org/wiki/Table_(database)>`__. Large predicates generate large tables with many rows, and are therefore expensive to compute.
|
||
- The QL language is implemented using standard database operations and `relational algebra <https://en.wikipedia.org/wiki/Relational_algebra>`__ (such as join, projection, and union). For further information about query languages and databases, see :doc:`About QL <../about-ql>`.
|
||
- Queries are evaluated *bottom-up*, which means that a predicate is not evaluated until *all* of the predicates that it depends on are evaluated. For more information on query evaluation, see `Evaluation of QL programs <https://help.semmle.com/QL/ql-handbook/evaluation.html>`__ in the QL handbook.
|
||
|
||
Performance tips
|
||
----------------
|
||
|
||
Follow the guidelines below to ensure that you don't get tripped up by the most common CodeQL performance pitfalls.
|
||
|
||
Eliminate cartesian products
|
||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||
|
||
The performance of a predicate can often be judged by considering roughly how many results it has.
|
||
One way of creating badly performing predicates is by using two variables without relating them in any way, or only relating them using a negation.
|
||
This leads to computing the `Cartesian product <https://en.wikipedia.org/wiki/Cartesian_product>`__ between the sets of possible values for each variable, potentially generating a huge table of results.
|
||
|
||
This can occur if you don't specify restrictions on your variables.
|
||
|
||
For instance, consider the following predicate that checks whether a Java method ``m`` may access a field ``f``::
|
||
|
||
predicate mayAccess(Method m, Field f) {
|
||
f.getAnAccess().getEnclosingCallable() = m
|
||
or
|
||
not exists(m.getBody())
|
||
}
|
||
|
||
The predicate holds if ``m`` contains an access to ``f``, but also conservatively assumes that methods without bodies (for example, native methods) may access *any* field.
|
||
|
||
However, if ``m`` is a native method, the table computed by ``mayAccess`` will contain a row ``m, f`` for *all* fields ``f`` in the codebase, making it potentially very large.
|
||
|
||
This example shows a similar mistake in a member predicate::
|
||
|
||
class Foo extends Class {
|
||
...
|
||
// BAD! Does not use ‘this’
|
||
Method getToString() {
|
||
result.getName() = "ToString"
|
||
}
|
||
...
|
||
}
|
||
|
||
Note that while ``getToString()`` does not declare any parameters, it has two implicit parameters, ``result`` and ``this``, which it fails to relate. Therefore, the table computed by ``getToString()`` contains a row for every combination of ``result`` and ``this``. That is, a row for every combination of a method named ``"ToString"`` and an instance of ``Foo``.
|
||
To avoid making this mistake, ``this`` should be restricted in the member predicate ``getToString()`` on the class ``Foo``.
|
||
|
||
Use specific types
|
||
~~~~~~~~~~~~~~~~~~
|
||
|
||
`Types <https://help.semmle.com/QL/ql-handbook/types.html>`__ provide an upper bound on the size of a relation.
|
||
This helps the query optimizer be more effective, so it's generally good to use the most specific types possible. For example::
|
||
|
||
predicate foo(LoggingCall e)
|
||
|
||
is preferred over::
|
||
|
||
predicate foo(Expr e)
|
||
|
||
From the type context, the query optimizer deduces that some parts of the program are redundant and removes them, or *specializes* them.
|
||
|
||
Avoid complex recursion
|
||
~~~~~~~~~~~~~~~~~~~~~~~
|
||
|
||
`Recursion <https://help.semmle.com/QL/ql-handbook/recursion.html>`__ is about self-referencing definitions.
|
||
It can be extremely powerful as long as it is used appropriately.
|
||
On the whole, you should try to make recursive predicates as simple as possible.
|
||
That is, you should define a *base case* that allows the predicate to *bottom out*, along with a single *recursive call*::
|
||
|
||
int depth(Stmt s) {
|
||
exists(Callable c | c.getBody() = s | result = 0) // base case
|
||
or
|
||
result = depth(s.getParent()) + 1 // recursive call
|
||
}
|
||
|
||
.. pull-quote:: Note
|
||
|
||
The query optimizer has special data structures for dealing with `transitive closures <https://help.semmle.com/QL/ql-handbook/recursion.html#transitive-closures>`__.
|
||
If possible, use a transitive closure over a simple recursive predicate, as it is likely to be computed faster.
|
||
|
||
Further information
|
||
-------------------
|
||
|
||
- Find out more about QL in the `QL language handbook <https://help.semmle.com/QL/ql-handbook/index.html>`__ and `QL language specification <https://help.semmle.com/QL/ql-spec/language.html>`__.
|