Docs: refactor guidelines for new queries

This commit is contained in:
Jonas Jensen
2020-03-13 14:23:37 +01:00
parent 7dc80664e6
commit 9899d46999
3 changed files with 118 additions and 70 deletions

View File

@@ -1,62 +1,65 @@
# Contributing to CodeQL # Contributing to CodeQL
We welcome contributions to our standard library and standard checks. Got an idea for a new check, or how to improve an existing query? Then please go ahead and open a pull request! We welcome contributions to our CodeQL libraries and queries. Got an idea for a new check, or how to improve an existing query? Then please go ahead and open a pull request!
Before we accept your pull request, we require that you have agreed to our Contributor License Agreement, this is not something that you need to do before you submit your pull request, but until you've done so, we will be unable to accept your contribution. There is lots of useful documentation to help you write queries, ranging from information about query file structure to tutorials for specific target languages. For more information on the documentation available, see [Writing CodeQL queries](https://help.semmle.com/QL/learn-ql/writing-queries/writing-queries.html) on [help.semmle.com](https://help.semmle.com).
## Adding a new query
If you have an idea for a query that you would like to share with other CodeQL users, please open a pull request to add it to this repository. ## Submitting a new experimental query
Follow the steps below to help other users understand what your query does, and to ensure that your query is consistent with the other CodeQL queries.
1. **Consult the documentation for query writers** If you have an idea for a query that you would like to share with other CodeQL users, please open a pull request to add it to this repository. New queries start out in a `<language>/ql/src/experimental` directory, to which they can be merged when they meet the following requirements.
There is lots of useful documentation to help you write queries, ranging from information about query file structure to tutorials for specific target languages. For more information on the documentation available, see [Writing CodeQL queries](https://help.semmle.com/QL/learn-ql/writing-queries/writing-queries.html) on [help.semmle.com](https://help.semmle.com). 1. **Directory structure**
2. **Format your code correctly** There are five language-specific query directories in this repository:
All CodeQL standard queries and libraries are uniformly formatted for clarity and consistency, so we strongly recommend that all contributions follow the same formatting guidelines. If you use CodeQL for VS Code, you can autoformat your query in the [Editor](https://help.semmle.com/codeql/codeql-for-vscode/reference/editor.html#autoformatting). For more information, see the [CodeQL style guide](https://github.com/Semmle/ql/blob/master/docs/ql-style-guide.md). * C/C++: `cpp/ql/src`
* C#: `csharp/ql/src`
* Java: `java/ql/src`
* JavaScript: `javascript/ql/src`
* Python: `python/ql/src`
3. **Make sure your query has the correct metadata** Each language-specific directory contains further subdirectories that group queries based on their `@tags` or purpose.
- Experimental queries and libraries are stored in the `experimental` subdirectory within each language-specific directory in the [CodeQL repository](https://github.com/Semmle/ql). For example, experimental Java queries and libraries are stored in `java/ql/src/experimental` and any corresponding tests in `java/ql/test/experimental`.
- The structure of an `experimental` subdirectory mirrors the structure of its parent directory.
- Select or create an appropriate directory in `experimental` based on the existing directory structure of `experimental` or its parent directory.
Query metadata is used to identify your query and make sure the query results are displayed properly. 2. **Query metadata**
The most important metadata to include are the `@name`, `@description`, and the `@kind`.
Other metadata properties (`@precision`, `@severity`, and `@tags`) are usually added after the query has been reviewed by GitHub staff.
For more information on writing query metadata, see the [Query metadata style guide](https://github.com/Semmle/ql/blob/master/docs/query-metadata-style-guide.md).
4. **Make sure the `select` statement is compatible with the query type** - The query `@id` must conform to all the requirements in the [guide on query metadata](docs/query-metadata-style-guide.md#query-id-id). In particular, it must not clash with any other queries in the repository, and it must start with the appropriate language-specific prefix.
- The query must have a `@name` and `@description` to explain its purpose.
- The query must have a `@kind` and `@problem.severity` as required by CodeQL tools.
The `select` statement of your query must be compatible with the query type (determined by the `@kind` metadata property) for alert or path results to be displayed correctly in LGTM and CodeQL for VS Code. For details, see the [guide on query metadata](docs/query-metadata-style-guide.md).
For more information on `select` statement format, see [Introduction to query files](https://help.semmle.com/QL/learn-ql/writing-queries/introduction-to-queries.html#select-clause) on help.semmle.com.
5. **Save your query in a `.ql` file in the correct language directory in this repository** Make sure the `select` statement is compatible with the query `@kind`. See [Introduction to query files](https://help.semmle.com/QL/learn-ql/writing-queries/introduction-to-queries.html#select-clause) on help.semmle.com.
There are five language-specific directories in this repository: 3. **Formatting**
* C/C++: `ql/cpp/ql/src`
* C#: `ql/csharp/ql/src`
* Java: `ql/java/ql/src`
* JavaScript: `ql/javascript/ql/src`
* Python: `ql/python/ql/src`
Each language-specific directory contains further subdirectories that group queries based on their `@tags` properties or purpose. Select the appropriate subdirectory for your new query, or create a new one if necessary. - The queries and libraries must be [autoformatted](https://help.semmle.com/codeql/codeql-for-vscode/reference/editor.html#autoformatting).
6. **Write a query help file** 4. **Compilation**
Query help files explain the purpose of your query to other users. Write your query help in a `.qhelp` file and save it in the same directory as your new query. - Compilation of the query and any associated libraries and tests must be resilient to future development of the [supported](docs/supported-queries.md) libraries. This means that the functionality cannot use internal libraries, cannot depend on the output of `getAQlClass`, and cannot make use of regexp matching on `toString`.
For more information on writing query help, see the [Query help style guide](https://github.com/Semmle/ql/blob/master/docs/query-help-style-guide.md). - The query and any associated libraries and tests must not cause any compiler warnings to be emitted (such as use of deprecated functionality or missing `override` annotations).
7. **Maintain backwards compatibility** 5. **Results**
The standard CodeQL libraries must evolve in a backwards compatible manner. If any backwards incompatible changes need to be made, the existing API must first be marked as deprecated. This is done by adding a `deprecated` annotation along with a QLDoc reference to the replacement API. Only after at least one full release cycle has elapsed may the old API be removed. - The query must have at least one true positive result on some revision of a real project.
In addition to contributions to our standard queries and libraries, we also welcome contributions of a more experimental nature, which do not need to fulfill all the requirements listed above. See the guidelines for [experimental queries and libraries](docs/experimental.md) for details. 6. **Contributor License Agreement**
- The contributor can satisfy the [CLA](#contributor-license-agreement).
Experimental queries and libraries may not be actively maintained as the [supported](docs/supported-queries.md) libraries evolve. They may also be changed in backwards-incompatible ways or may be removed entirely in the future without deprecation warnings.
After the experimental query is merged, we welcome pull requests to improve it. Before a query can be moved out of the `experimental` subdirectory, it must satisfy [the requirements for being a supported query](docs/supported-queries.md).
## Using your personal data ## Using your personal data
If you contribute to this project, we will record your name and email If you contribute to this project, we will record your name and email
address (as provided by you with your contributions) as part of the code address (as provided by you with your contributions) as part of the code
repositories, which might be made public. We might also use this information repositories, which are public. We might also use this information
to contact you in relation to your contributions, as well as in the to contact you in relation to your contributions, as well as in the
normal course of software development. We also store records of your normal course of software development. We also store records of your
CLA agreements. Under GDPR legislation, we do this CLA agreements. Under GDPR legislation, we do this

View File

@@ -1,41 +1,7 @@
# Experimental CodeQL queries and libraries # Experimental CodeQL queries and libraries
In addition to our standard CodeQL queries and libraries, this repository may also contain queries and libraries of a more experimental nature. Experimental queries and libraries can be improved incrementally and may eventually reach a sufficient maturity to be included in our standard libraries and queries. In addition to [our supported queries and libraries](supported-queries.md), this repository also contains queries and libraries of a more experimental nature. Experimental queries and libraries can be improved incrementally and may eventually reach a sufficient maturity to be included in our supported queries and libraries.
Experimental queries and libraries may not be actively maintained as the standard libraries evolve. They may also be changed in backwards-incompatible ways or may be removed entirely in the future without deprecation warnings. Experimental queries and libraries may not be actively maintained as the [supported](supported-queries.md) libraries evolve. They may also be changed in backwards-incompatible ways or may be removed entirely in the future without deprecation warnings.
## Requirements See [CONTRIBUTING.md](../CONTRIBUTING.md) for guidelines on submitting a new experimental query.
1. **Directory structure**
- Experimental queries and libraries are stored in the `experimental` subdirectory within each language-specific directory in the [CodeQL repository](https://github.com/Semmle/ql). For example, experimental Java queries and libraries are stored in `ql/java/ql/src/experimental` and any corresponding tests in `ql/java/ql/test/experimental`.
- The structure of an `experimental` subdirectory mirrors the structure of standard queries and libraries (or tests) in the parent directory.
2. **Query metadata**
- The query `@id` must not clash with any other queries in the repository.
- The query must have a `@name` and `@description` to explain its purpose.
- The query must have a `@kind` and `@problem.severity` as required by CodeQL tools.
For details, see the [guide on query metadata](https://github.com/Semmle/ql/blob/master/docs/query-metadata-style-guide.md).
3. **Formatting**
- The queries and libraries must be [autoformatted](https://help.semmle.com/codeql/codeql-for-vscode/reference/editor.html#autoformatting).
4. **Compilation**
- Compilation of the query and any associated libraries and tests must be resilient to future development of the standard libraries. This means that the functionality cannot use internal APIs, cannot depend on the output of `getAQlClass`, and cannot make use of regexp matching on `toString`.
- The query and any associated libraries and tests must not cause any compiler warnings to be emitted (such as use of deprecated functionality or missing `override` annotations).
5. **Results**
- The query must have at least one true positive result on some revision of a real project.
6. **Contributor License Agreement**
- The contributor can satisfy the [CLA](CONTRIBUTING.md#contributor-license-agreement).
## Non-requirements
Other criteria typically required for our standard queries and libraries are not required for experimental queries and libraries. In particular, fully disciplined query [metadata](docs/query-metadata-style-guide.md), query [help](docs/query-help-style-guide.md), tests, a low false positive rate and performance tuning are not required (but nonetheless recommended).

79
docs/supported-queries.md Normal file
View File

@@ -0,0 +1,79 @@
# Supported CodeQL queries and libraries
Queries and libraries outside [the `experimental` directories](experimental.md) are _supported_ by GitHub, allowing our users to rely on their continued existence and functionality in the future:
1. Once a query or library has appeared in a stable release, a one-year deprecation period is required before we can remove it. There can be exceptions to this when it's not technically possible to mark it as deprecated.
2. Major changes to supported queries and libraries are always announced in the [change notes for stable releases](../change-notes/).
3. We will do our best to address user reports of false positives or false negatives.
Because of these commitments, we set a high bar for accepting new supported queries. The requirements are detailed in the rest of this document.
## Steps for introducing a new supported query
The process must begin with the first step and must conclude with the final step. The remaining steps can be performed in any order.
1. **Have the query merged into the appropriate `experimental` subdirectory**
See [CONTRIBUTING.md](../CONTRIBUTING.md).
2. **Write a query help file**
Query help files explain the purpose of your query to other users. Write your query help in a `.qhelp` file and save it in the same directory as your query. For more information on writing query help, see the [Query help style guide](query-help-style-guide.md).
- Note, in particular, that almost all queries need to have a pair of "before" and "after" examples demonstrating the kind of problem the query identifies and how to fix it. Make sure that the examples are actually consistent with what the query does, for example by including them in your unit tests.
- At the time of writing, there is no way of previewing help locally. Once you've opened a PR, a preview will be created as part of the CI checks. A GitHub employee will review this and let you know of any problems.
3. **Write unit tests**
Add one or more unit tests for the query (and for any library changes you make) to the `ql/<language>/ql/test/experimental` directory. Tests for library changes go into the `library-tests` subdirectory, and tests for queries go into `query-tests` with their relative path mirroring the query's location under `ql/<language>/ql/src/experimental`.
See the section on [Testing custom queries](https://help.semmle.com/codeql/codeql-cli/procedures/test-queries.html) in the [CodeQL documentation](https://help.semmle.com/codeql/) for more information.
4. **Test for correctness on real-world code**
Test the query on a number of large real-world projects to make sure it doesn't give too many false positive results. Adjust the `@precision` and `@problem.severity` attributes in accordance with the real-world results you observe. See the advice on query metadata below.
You can use the LGTM.com [query console](https://lgtm.com/query) to get an overview of true and false positive results on a large number of projects. The simplest way to do this is to:
1. [Create a list of prominent projects](https://lgtm.com/help/lgtm/managing-project-lists) on LGTM.
2. In the query console, [run your query against your custom project list](https://lgtm.com/help/lgtm/using-query-console).
3. Save links to your query console results and include them in discussions on issues and pull requests.
5. **Test and improve performance**
There must be a balance between the execution time of a query and the value of its results: queries that are highly valuable and broadly applicable can be allowed to take longer to run. In all cases, you need to address any easy-to-fix performance issues before the query is put into production.
QL performance profiling and tuning is an advanced topic, and some tasks will require assistance from GitHub employees. With that said, there are several things you can do.
- Understand [the evaluation model of QL](https://help.semmle.com/QL/ql-handbook/evaluation.html). It's more similar to SQL than to any mainstream programming language.
- Most performance tuning in QL boils down to computing as few tuples (rows of data) as possible. As a mental model, think of predicate evaluation as enumerating all combinations of parameters that satisfy the predicate body. This includes the implicit parameters `this` and `result`.
- The major libraries in CodeQL are _cached_ and will only be computed once for the entire suite of queries. The first query that needs a cached _stage_ will trigger its evaluation. This means that query authors should usually only look at the run time of the last stage of evaluation.
- In [the settings for the VSCode extension](https://help.semmle.com/codeql/codeql-for-vscode/reference/settings.html), check the box "Running Queries: Debug" (`codeQL.runningQueries.debug`). Then find "CodeQL Query Server" in the VSCode Output panel (View -> Output) and capture the output when running the query. That output contains timing and tuple counts for all computed predicates.
- To clear the entire cache, invoke "CodeQL: Clear Cache" from the VSCode command palette.
6. **Make sure your query has the correct metadata**
For the full reference on writing query metadata, see the [Query metadata style guide](query-metadata-style-guide.md). The following constitutes a checklist.
a. Each query needs a `@name`, a `@description`, and a `@kind`.
b. Alert queries also need a `@problem.severity` and a `@precision`.
- The severity is one of `error`, `warning`, or `recommendation`.
- The precision is one of `very-high`, `high`, `medium` or `low`. It may take a few iterations to get this right.
- Currently, LGTM runs all `error` or `warning` queries with a `very-high`, `high`, or `medium` precision. In addition, `recommendation` queries with `very-high` or `high` precision are run.
- However, results from `error` and `warning` queries with `medium` precision, as well as `recommendation` queries with `high` precision, are not shown by default.
c. All queries need an `@id`.
- The ID should be consistent with the ids of similar queries for other languages; for example, there is a C/C++ query looking for comments containing the word "TODO" which has id `cpp/todo-comment`, and its C# counterpart has id `cs/todo-comment`.
d. Provide one or more `@tags` describing the query.
- Tags are free-form, but we have some conventions, especially for tagging security queries with corresponding CWE numbers.
7. **Move your query out of `experimental`**
- The structure of an `experimental` subdirectory mirrors the structure of its parent directory, so this step may just be a matter of removing the `experimental/` prefix of the query and test paths. Be sure to also edit any references to the query path in tests.
- Add the query to one of the legacy suite files in `ql/<language>/config/suites/<language>/` if it exists. Note that there are separate suite directories for C and C++, `c` and `cpp` respectively, and the query should be added to one or both as appropriate.
- Add a release note to `change-notes/<next-version>/analysis-<language>.md`.