Docs: Query classification and display

Converted from Semmle wiki
2026-04-25 08:45:14 +02:00 · 2020-07-27 10:06:16 +01:00
parent bb5b161d72
commit 4094fa9db3
1 changed files with 125 additions and 0 deletions
--- a/docs/query-classification-and-display.md
+++ b/docs/query-classification-and-display.md
@@ -0,0 +1,125 @@
+# Query classification and display
+
+## Attributable queries
+
+The results of some queries are unsuitable for attribution to individual
+developers. Most of them trigger on a threshold value, for example all
+metric violations and statistics-based queries. The results of such
+queries would all be attributed to the person who pushed the value over
+(or under) the threshold. Other queries only trigger if another one
+doesn't. An example of this is the MaybeNull query, which only triggers
+if the AlwaysNull query doesn't. A small change in the data flow could
+make an alert switch from AlwaysNull to MaybeNull (or vice versa). As a
+result we attribute both a fix and an introduction to the developer who
+changed the data flow. For this particular example the odd attribution
+results are more a nuisance than a real problem; the overall alert count
+remains unchanged. However, for the duplicate and similar code queries
+the effects can be much more severe, as they come in versions for
+"duplicate file" and "duplicate function" among many others, where
+"duplicate function" only triggers if "duplicate file" doesn't. As a
+result, adding some code to a duplicate file might result in a "fix" of
+one "duplicate file" alert and the introduction of many "duplicate
+function" alerts, all attributed to the same developer. This would be
+highly unfair. Currently, only the duplicate and similar code queries
+exhibit this "exchanging one for many" behaviour when their results are
+attributed. Therefore we currently exclude all duplicate code related
+alerts from attribution.
+
+The following queries are excluded from attribution:
+
+  - Metric violations, i.e. the ones with metadata properties like
+    `@(error|warning|recommendation)-(to|from)`
+  - Queries with the tag `non-attributable`
+
+
+
+<div>
+
+<div>
+
+This check is applied when the results of a single attribution are
+loaded into the datastore. This means that any change to this behaviour
+will only take effect on newly attributed revisions and the historical
+data is unchanged.
+
+</div>
+
+</div>
+
+ 
+
+## Query severity and precision
+
+We currently classify queries on two axes, with some additional tags.
+Those axes are severity and precision, and are defined using the
+query-metadata properties `@problem.severity` and `@precision`.
+
+For severity, we have the following categories:
+
+  - Error
+  - Warning
+  - Recommendation
+
+These categories may change in the future.
+
+For precision, we have the following categories:
+
+  - Very-high
+  - High
+  - Medium
+  - Low
+
+As [usual](https://en.wikipedia.org/wiki/Precision_and_recall),
+precision is defined as the percentage of query results that are true
+positives, i.e., precision = number of true positives / (number of true
+positives + number of false positives). There is no hard-and-fast rule
+for which precision ranges correspond to which categories.
+
+We expect these categories to remain unchanged for the foreseeable
+future.
+
+### A note on precision
+
+Intuitively, precision measures how well the query does at finding the
+results it is supposed to find, i.e., how well it implements its
+(informal, unwritten) rule. So how precise a query is depends very much
+on what we consider that rule to be. We generally try to sharpen our
+rules to focus on results that a developer might actually be interested
+in.
+
+  
+
+## Which queries to run and display on LGTM
+
+The following queries are run:
+
+<div class="table-wrap">
+
+| Severity / Precision | Very high | High    | Medium  | Low |
+| -------------------- | --------- | ------- | ------- | --- |
+| Error                | **Yes**   | **Yes** | **Yes** | No  |
+| Warning              | **Yes**   | **Yes** | **Yes** | No  |
+| Recommendation       | **Yes**   | **Yes** | No      | No  |
+
+</div>
+
+The following queries have their results displayed by default:
+
+<div class="table-wrap">
+
+| Severity / Precision | Very high | High    | Medium | Low |
+| -------------------- | --------- | ------- | ------ | --- |
+| Error                | **Yes**   | **Yes** | No     | No  |
+| Warning              | **Yes**   | **Yes** | No     | No  |
+| Recommendation       | **Yes**   | No      | No     | No  |
+
+</div>
+
+  
+Results for queries that are run but not displayed by default can be
+made visible by editing the project configuration.
+
+
+Queries from custom query packs (in-repo or site-wide) are always run
+and displayed by default. They can be hidden by editing the project
+config, and "disabled" by removing them from the query pack.
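
Note on the two tables in the added document: as a reading aid, the run and display rules can be sketched as a small lookup in Python. This is only an illustration of the tables and of the custom-query-pack rule. The names `RUN`, `DISPLAYED`, `is_run` and `is_displayed_by_default` are hypothetical and are not taken from LGTM, and per-project configuration overrides are deliberately left out.

```python
# Hypothetical sketch of the run/display tables; not LGTM's actual code.

# Precision levels that are run, per severity (first table).
RUN = {
    "error": {"very-high", "high", "medium"},
    "warning": {"very-high", "high", "medium"},
    "recommendation": {"very-high", "high"},
}

# Precision levels whose results are displayed by default (second table).
DISPLAYED = {
    "error": {"very-high", "high"},
    "warning": {"very-high", "high"},
    "recommendation": {"very-high"},
}

PRECISIONS = {"very-high", "high", "medium", "low"}


def _check(severity: str, precision: str) -> None:
    """Reject values that are not one of the documented categories."""
    if severity not in RUN:
        raise ValueError(f"unknown severity: {severity!r}")
    if precision not in PRECISIONS:
        raise ValueError(f"unknown precision: {precision!r}")


def is_run(severity: str, precision: str, custom_pack: bool = False) -> bool:
    """Return True if a query with this classification is run."""
    _check(severity, precision)
    # Queries from custom query packs are always run.
    return custom_pack or precision in RUN[severity]


def is_displayed_by_default(
    severity: str, precision: str, custom_pack: bool = False
) -> bool:
    """Return True if results are shown without editing the project config."""
    _check(severity, precision)
    # Queries from custom query packs are always displayed by default.
    return custom_pack or precision in DISPLAYED[severity]
```

For example, `is_run("error", "medium")` is true while `is_displayed_by_default("error", "medium")` is false, which matches the case the document describes as "run but not displayed by default".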