Before this change, any function that has a parameter that was called
password/credentials would be treated as returning sensitive data of that
kind. `py/clear-text-logging-sensitive-data` would alert if one of these are
logged, which has a LOT of false-positives.
This makes it more obvious to the evaluator that it is a good predicate to pick as a sentinel, and in practice we mostly just have one configuration in scope anyway.
This was brought up on the LGTM.com forums here:
https://discuss.lgtm.com/t/warn-when-always-failing-assert-is-reachable-rather-than-unreachable/2436
Essentially, in a complex chain of `elif` statements, like
```python
if x < 0:
...
elif x >= 0:
...
else:
...
```
the `else` clause is redundant, since the preceding conditions completely
exhaust the possible values for `x` (assuming `x` is an integer). Rather than
promoting the final `elif` clause to an `else` clause, it is common to instead
raise an explicit exception in the `else` clause. During execution, this
exception will never actually be raised, but its presence indicates that the
preceding conditions are intended to cover all possible cases.
I think it's a fair point. This is a clear instance where the alert, even if it
is technically correct, is not useful for the end user.
Also, I decided to make the exclusion fairly restrictive: it only applies if
the unreachable statement is an `assert False, ...` or `raise ...`, and only
if said statement is the first in the `else` block. Any other statements will
still be reported.
The `AliasedUse` instruction is supposed to represent future uses of aliased memory after the function returns. Since local variables from that function are no longer allocated after the function returns, the `AliasedUse` instruction should access only the set of aliased locations that does not include locals from the current stack frame.
Hopefully it is more clear that you can get multiple results from getABaseType
because of multiple inheritance, and not because we are following the chain of
inheritance
It's not used anywhere outside `VoidContext.qll`, where it was defined.
The use in `VoidContext.qll` is 10 years old and was a workaround for an
extractor bug that no longer exists.
This new instruction is the dual of the existing `AliasedDefinition` instruction. Whereas that instruction defines the contents of aliased memory before the function was called, `AliasedUse` represents the potential use of all aliased memory after the function returns. This ensures that writes to aliased memory do not appear "dead", even if there are no further reads from aliased memory within the function itself.
The one interesting piece that needed to be fixed up was the type of an `Indirect[Read|Write]SideEffect` operand/result. If the parameter type is a pointer or reference to an incomplete type, we need to set the type of the side effect memory access to `Unknown`, because we don't model incomplete types in the IR type system.
I also added minimal support for `__assume` (generated as a `NoOp`), because lack of `__assume` support got in the way of debugging the other issue above.
This change gives a slight performance improvement and makes the QL code
shorter. It introduces some magic numbers in the code, but those are
confined to the `Pos` and `Spec` classes.
We get a speed-up because the evaluator has built-in support for integer
literals in the `OUTPUT` of `JOIN` operations, whereas `newtype`s have
to be explicitly joined on. As a result, a predicate like
`CFG::straightLineSparse#ffff` drops from 262 pipeline nodes to 242.
I measured performance on https://github.com/jluttine/suitesparse, which
is one of the projects that had the biggest slowdown when enabling the
QL CFG on lgtm.com. I took two measurements before this change and two
after. The `CFG.qll` stage took 117s and 112s before, and it took 106s
and 107s after.
Eventually these files will subsume the current `queries.xml` files
at the top of query-containing and library directories. For now they're
just here to support internal testing of the tooling support for them
we're writing on.
Format and contents is a work in progress. If you're not in Semmle,
don't depend on anything here making sense (or staying stable) until
you see the version tags increase to something nonzero.
Added some examples and reworded portions of guards.rst. There's still
more to do - examples for ensures and compares predicates, and possibly
rewording the description of the compares predicates
We now detect patterns like
f(bool cond){
if(cond)
then A
else B
and prune branches for calls like f(true) or f(false).
This pruning is done both in the local (bigstep) flow graph
as well as in the inter-procedural dataflow graph.
This test used `getAQlClass`, which caused it to break when new classes
were added anywhere in the libraries. That's now avoided by switching to
`getCanonicalQLClass`. It turns out that `getCanonicalQLClass` didn't
support arithmetic expressions on complex numbers, so that support had
to be added.
This fixes the test `cpp/ql/test/library-tests/constexpr_if/cfg.ql`,
which broke when the QL CFG was enabled.
The new cases are just copy-pastes of the `IfStmt` cases (they don't
share a useful common superclass) with added checks for whether their
constant value equals 0.
Hopefully it does not make a difference in practice whether
uninstantiated template functions are considered to have control flow
through initializers of their static variables.
This change is needed when enabling the QL CFG on certain snapshots such
as notaz/picodrive. It removes the `bbNotInLoop` predicate, which was
always a liability because it's inherently quadratic. The real slowdown
came in `skipLoop`, where all true-upon-entry loops were crossed with
all definitions of variables that should take their definition from the
loop body.
The `Operation` class is abstract, and extending it caused cached stages
to be recomputed all the way down to the AST. This meant that the leap
year queries evaluated their own copy of SSA and data flow.
The translation of static fields now uses `VariableAddress` instead of `FieldAddress`. This fixes the logic as well as the "field address without qualifier address" sanity check.
Fixed a problem when the translating the compiler generated constructors that caused some sanity errors (since they have no body, when translating the constructor block fragmentation happened). Fixed this by skipping the translation of the body, if it does not exist (when translating a function).
This commit implements the language-neutral IR type system for C#. It mostly follows the same pattern as C++, modified to fit the C# type system. All object references, pointers, and lvalues are represented as `IRAddress` types. All structs and generic parameters are implemented as `IRBlobType`. Function addresses get a single `IRFunctionAddressType`.
I had to fix a couple places in the original IR type system where I didn't realize I was still depending on language-specific types. As part of this, `CSharpType` and `CppType` now have a `hasUnspecifiedType()` predicate, which is equivalent to `hasType()`, except that it holds only for the unspecified version of the type. This predicate can go away once we remove the IR's references to the underlying `Type` objects.
All C# IR tests pass without modification, but only because this commit continues to print the name of `IRUnknownType` as `null`, and `IRFunctionAddressType` as `glval<null>`. These will be fixed separately in a subsequent commit in this PR.
A constructed type, `C<T>`, where `T` is the type parameter of `C`, is represented
in the database as the corresponding unbound generict type `C<>`. Consequently, the
type conversion library, which only considers `ConstructedType`s, does not handle
all implicit conversions. For example, in
```
interface I<in T1, T2> where T1 : C
```
there should be an implicit conversion from `I<C, T2>` to `I<T1, T2>` (=`I<>`).
We could find no reason for using `Builtin::builtin` instead of
`Builtin::special`. Since all the other base types use `special`, and the old
Object API is using `special`, let's also do that :)
The C++ IR currently has a very clunky way of specifying the type of an IR entity (`Instruction`, `Operand`, `IRVariable`, etc.). There are three separate predicates: `getType()`, `isGLValue()`, and `getSize()`. All three are necessary, rather than just having a `getType()` predicate, because some IR entities have types that are not represented via an existing `Type` object in the AST. Examples include the type for an lvalue returned from a `VariableAddress` instruction, the type for an array slice being zero-initialized in a variable initializer, and several others. It is very easy for QL code to just check the `getType()` predicate, while forgetting to use `isGLValue()` to determine if that type is the actual type of the entity (the prvalue case) or the type referred to by a glvalue entity. Furthermore, the C++ type system creates potentially many different `Type` objects for the same underlying type (e.g. typedefs, using declarations, `const`/`volatile` qualifiers, etc.), making it more difficult to tell when two entities have semantically equivalent types.
In addition, other languages for which we want to enable the IR have somewhat different type systems. The various language type systems differ in their structure, although they tend to share the basic building blocks necessary for the IR.
To address all of the above problems, I've introduced a new class hierarchy, rooted at the class `IRType`, that represents a bare-bones type system that is independent of source language (at least across C/C++/C#/Java). A type's identity is based on its kind (signed integer, unsigned integer, floating-point, Boolean, blob, etc.), size and in the case of blob types, a "tag" to differentiate between different classes and structs. No distinction is made between, say `signed int` and plain `int`, or between different language integer types that have the same signedness and size (e.g. `unsigned int` vs. `wchar_t` on Linux). `IRType` is intended for use by language-agnostic IR-based analyses, including range analysis, dataflow, SSA construction, and alias analysis. The set of available `IRType`s is determined by predicate provided by the language library implementation (e.g. `hasSignedIntegerType(int byteSize)`.
In addition to `IRType`, each language now defines a type alias named `LanguageType`, representing the type of an IR entity in more language-specific terms. The only predicate requried on `LanguageType` is `getIRType()`, which returns the single `IRType` object for the language-neutral representation of that `LanguageType`. All other predicates on and subclasses of `LanguageType` are language-specific. There may be many instances of `LanguageType` that map to a given `IRType`, to allow for typedefs, etc.
Most of the changes are mechanical changes in the IR construction code, to return the correct type for each IR entity. SSA construction has also been updated to avoid dependencies on language-specific types.
I have not yet removed the original `getType()` predicates that just return `Type`. These can be removed once we move the remaining existing libraries to use `IRType`.
Test results are, by design, pretty much unchanged. Once case changed for inline asm, because the previously IR generation for it played a little fast and loose with the input/output expressions. The test case now includes both input and output variables. The generated IR for `Conditional_LValue` is now more correct, because we now have a way to represent an lvalue of an lvalue. `syntax-zoo` is still a hot mess. Most of the changed outputs are due to wobble from having multiple functions with the same name, but with a slightly different order of evaluation due to the type changes. Others are wobble from already-invalid IR. A couple non-wobbly places have improved slightly, though.
The C# part of this change is waiting for #2005 to be merged, since that has some of the necessary C# implementation.
The translation of `IsExpr` created a sanity check to fail since it generated
a Phi node that had only one source: if a variable was declared as part of the `IsExpr`, a conditional branch was generated, and the variable was defined only in the true successor; this has been changes so that the declaration happens before the conditional branch, and the variable is uninitialized (this removed the need for the `isInitializedByElement` predicate from `TranslatedDeclarationBase`, so that has been removed) and only the assignment happens in the true successor block (so now the two inputs of the Phi node are the result of the `Uninitialized` instruction and the `Store` instruction from the true successor block).
Generation of IDs for namespace members has been fixed to generate
unique IDs for variables of the same name but in different namespaces.
Update the same_name test to validate this.
More accurate type sizes using language specific predicates from `IRCSharpLanguage.qll`.
Added immediate operands for some `PointerX` (add, sub) instructions.
Some other minor consistency fixes.
This fixes false positives that arise when a call such as `f.apply` can either be interpreted as a reflective invocation of `f`, or a normal call to method `apply` of `f`.
The key insight here is that `HC_FieldCons` and `HC_Array` are
functionally determined by the things that arise in another
recursive call. Lifting them to their own predicate, therefore,
reduces nonlinearity and constrains the join order in a way that
cannot be asymptotically bad -- and, indeed, makes quite a big
difference in practice.
Switching `security.TaintTracking` to use `DefaultTaintTracking` causes
us to lose a result from `UnboundedWrite.ql`, while this commit restores
it:
diff --git a/semmlecode-cpp-tests/DO_NOT_DISTRIBUTE/security-tests/CWE-120/CERT/STR35-C/UnboundedWrite.expected b/semmlecode-cpp-tests/DO_NOT_DISTRIBUTE/security-tests/CWE-120/CERT/STR35-C/UnboundedWrite.expected
index 1eba0e52f0e..d947b33b9d9 100644
--- a/semmlecode-cpp-tests/DO_NOT_DISTRIBUTE/security-tests/CWE-120/CERT/STR35-C/UnboundedWrite.expected
+++ b/semmlecode-cpp-tests/DO_NOT_DISTRIBUTE/security-tests/CWE-120/CERT/STR35-C/UnboundedWrite.expected
@@ -1,2 +1,3 @@
+| main.c:54:7:54:12 | call to strcat | This 'call to strcat' with input from $@ may overflow the destination. | main.c:93:15:93:18 | argv | argv |
| main.c:99:9:99:12 | call to gets | This 'call to gets' with input from $@ may overflow the destination. | main.c:99:9:99:12 | call to gets | call to gets |
| main.c:213:17:213:19 | buf | This 'scanf string argument' with input from $@ may overflow the destination. | main.c:213:17:213:19 | buf | buf |
This is one step toward implementing the taint-tracking wrapper in terms
of `Instruction` rather than `Expr`.
This leads to a few duplicate results in `TaintedAllocationSize.ql`
because the library now considers `sizeof(int)` to be just as
predictable as `4`, whereas the `security.TaintTracking` library does
not consider `sizeof` to be predictable. I think it's simpler to accept
the duplicate results since they are ultimately a quirk of the query,
not the library.
The following is the diff between (a) replacing `TaintTracking.qll` with
a link to `DefaultTaintTracking.qll` and (b) additionally applying this
commit.
diff --git a b
--- a/cpp/ql/test/query-tests/Security/CWE/CWE-190/semmle/TaintedAllocationSize/TaintedAllocationSize.expected
+++ b/cpp/ql/test/query-tests/Security/CWE/CWE-190/semmle/TaintedAllocationSize/TaintedAllocationSize.expected
@@ -1,5 +1,8 @@
| test.cpp:42:31:42:36 | call to malloc | This allocation size is derived from $@ and might overflow | test.cpp:39:21:39:24 | argv | user input (argv) |
+| test.cpp:43:31:43:36 | call to malloc | This allocation size is derived from $@ and might overflow | test.cpp:39:21:39:24 | argv | user input (argv) |
| test.cpp:43:38:43:63 | ... * ... | This allocation size is derived from $@ and might overflow | test.cpp:39:21:39:24 | argv | user input (argv) |
+| test.cpp:45:31:45:36 | call to malloc | This allocation size is derived from $@ and might overflow | test.cpp:39:21:39:24 | argv | user input (argv) |
| test.cpp:48:25:48:30 | call to malloc | This allocation size is derived from $@ and might overflow | test.cpp:39:21:39:24 | argv | user input (argv) |
| test.cpp:49:17:49:30 | new[] | This allocation size is derived from $@ and might overflow | test.cpp:39:21:39:24 | argv | user input (argv) |
+| test.cpp:52:21:52:27 | call to realloc | This allocation size is derived from $@ and might overflow | test.cpp:39:21:39:24 | argv | user input (argv) |
| test.cpp:52:35:52:60 | ... * ... | This allocation size is derived from $@ and might overflow | test.cpp:39:21:39:24 | argv | user input (argv) |
--- a/semmlecode-cpp-tests/DO_NOT_DISTRIBUTE/security-tests/CWE-190/CERT/INT04-C/int04.expected
+++ b/semmlecode-cpp-tests/DO_NOT_DISTRIBUTE/security-tests/CWE-190/CERT/INT04-C/int04.expected
@@ -1 +1,2 @@
| int04c.c:21:29:21:51 | ... * ... | This allocation size is derived from $@ and might overflow | int04c.c:14:30:14:35 | call to getenv | user input (getenv) |
+| int04c.c:22:33:22:38 | call to malloc | This allocation size is derived from $@ and might overflow | int04c.c:14:30:14:35 | call to getenv | user input (getenv) |
Added support for pointers and pointer operations and made sure all loads are correct.
Added support for the unsafe stmt.
Added basic support for the fixed stmt (for now we ignore the pinning).
Fixed a bug where assignments of the form `Object obj1 = obj2` would not generate a load instruction for `obj2` (see `raw_ir.expected`).
Added an extra `Load` for object creations that involve structs. This is because the variable that represents the struct should hold the actual struct, not a reference to it.
Refactored the piece of code that decided if a particular expr needs a load instruction and improved the code sharing between `TranslatedExpr.qll` and `TranslatedElement.qll` by creating 2 predicates that tell if a certain expr does or does not need a load.
Added support for `in`, `out` and `ref` parameters.
The user knows that an expression functionally determines its
hashCons value, and that an expression functionally determines
its number of children, but this is not provable from the
definitions, and so not usable by the optimiser. By storing
the result of those known-functional calls in a variable,
rather than repeating the call, we enable better join orders.
Added basic support for the using stmt, checked stmt, unchecked stmt
Note that the translations do not use the compiler generated element framework and hence they are just rough approximations. For accuracy, in the future their translation should use it.
These new results seem better than the previous ones, but the previous
ones are still there. Perhaps the `Buffer.qll` library could use some
adjustment, but this seems like an improvement in isolation.
The data flow library conflates pointers and their objects in some
places but not others. For example, a member function call `x.f()` will
cause flow from `x` of type `T` to `this` of type `T*` inside `f`. It
might be ideal to avoid that conflation, but that's not realistic
without using the IR.
We've had good experience in the taint tracking library with conflating
pointers and objects, and it improves results for field flow, so perhaps
it's time to try it out for all data flow.
This is for symmetry with `exprNode` etc., and it should be handy for
the same reasons. I found one caller of `asInstruction` that got simpler
by using the new predicate instead.
Added support for continue stmt.
Minimal refactoring of the `TranslatedSpecificJump` classes.
Added a new test file, `jumps.cs` and updated the expected output.
This currently relies on the fact that qltest includes `ql/csharp/ql/src/Metrics` in addition to `ql/csharp/ql/src` on its search path when run internally, which is inconsistent with the other languages. Since this is the only test that relies on it, I'd like to update it and get rid of the extra search root eventually.
This commit adds a `semmle.code.cpp.ir.dataflow.DefaultTaintTracking`
library that's API-compatible with the
`semmle.code.cpp.security.TaintTracking` library. The new library is
implemented on top of the IR data flow library.
The idea is to evolve this library until it can replace
`semmle.code.cpp.security.TaintTracking` without decreasing our SAMATE
score. Then we'll have the IR in production use, and we will have one
less taint-tracking library in production.
The way object creation was translated has been changed: now creations are treated as expressions.
The main motivation for this was the inability to have creation expressions as arguments to
function calls (a test case has been added to showcase this).
All code that dealt with creation expressions has been moved from `TranslatedInitialization.qll` to `TranslatedExpr.qll`.
Some light refactoring has also been done, mainly removing code that was useless after the changes mentioned above.
These partial defs don't do any harm, but they could hurt performance.
In typical C++ snapshots, between 5% and 20% of all calls are to `const`
functions.
Comments like these will make the autoformatter produce bad indentation.
For the record (not for explainability), these issues were found with
git grep -P -A1 '^( */\*| +\*( |$))(.(?!\*/))*$' cpp/ql/src/'**/*.ql*' |grep -B10 'qll\?- [^*]*$'
It superficially looks like `@param` is supported in QLDoc, but this is
mostly an accident of how its parser works. Attributes starting with `@`
are only intended to be used in the top-level QLDoc of a query, and
there can only be one of each attribute. If there are multiple `@param`
entries, the QLDoc parser will only keep the first one.
Even though `parseConvSpec` in `Scanf.qll` documented multiple
parameters, only the first one would be shown in an IDE. The
corresponding predicate in `Print.qll` documented only its first
parameter, perhaps because of an autoformatting accident earlier in
time. I've attempted to reconstruct documentation for its other
parameters based on its sibling in `Scanf.qll`.
These functions were overly complicated, and the comments explaining the
complications did not auto-format well. A reference type cannot have
specifiers on it, so it's fine to call `getUnspecifiedType` before
checking if it's a reference type.
The autoformatter is opinionated about comment styles and assumes that
"short" comments attach to the following item while "long" comments are
items themselves. I found top-level short comments with the following
two commands and then searched the output for empty lines that came
after the comment.
git grep -A1 '^/\* .*\*/' cpp/ql/src
git grep -A1 '^//' 'cpp/ql/src/**/*.ql*'
The foreach was erroneously labelling the `True` and `False` edges as backedges.
Added a case for the compiler generated while in the predicate `getInstructionBackEdgeSuccessor/2`
from the file `IRConstruction.qll` so that only the edges from inside the body are labeled as back edges.
Added a framework for the translation of compiler generated elements, so that the process of adding a new desugaring process is almost mechanical.
The files in `internal` serve as the superclasses for all the compiler generated elements.
The file `Common.qll` captures common patterns for the compiler generated code to improve code sharing (by pattern I mean an element that appears in multiple desugarings). For example the `try...finally` pattern appears in the desugaring process of both the `lock` and the `foreach` stmts, so a class the provides a blueprint for this pattern is exposed. Several other patterns are present.
The expected output has also been updated (after a rebase) and it should be ignored.
The `raw/internal` folder has been restructured to better enhance code sharing between compiler generated elements and AST generated elements.
The translated calls classes have been refactored to better fit the C# library.
A new folder has been added, `common` that provides blueprints for the classes that deal with translations of calls, declarations, exprs and conditions.
Several `TranslatedX.qll` files have been modified so that they use those blueprint classes.
We haven't come to a conclusion on whether these two types will remain
identical forever. To make sure we're able to change it in the future,
this change makes it impossible to cast between the two types. Callers
must use the `asInstruction` member predicate to convert.
g++ doesn't support this code:
sorry, unimplemented: non-trivial designated initializers not supported
twoIntFields sSwapped = { .m2 = source(), .m1 = 0 };
so we need to build it in clang mode.
Data flow probably never worked when a variable declared in a
`ConditionDeclExpr` was modeled with `BlockVar`. That combination did
not come up in testing before the last commit.
The data flow library conflates pointers and objects enough for the
`definitionByReference` predicate to be too strict in some cases. It was
too permissive in other cases that are now (or will be) handled better
by field flow.
See also the change note entry.
Partial data flow had a semantic merge conflict with this branch. The
problem is that partial data flow doesn't (and shouldn't) cause the
initial pruning steps to run, but the length-2 access paths depend on
the `consCand` information that comes from that initial pruning. The
solution is to restore the old `AccessPath` class, now called
`PartialAccessPath` for use only by partial data flow.
With this change, partial data flow will in some cases allow more field
flow than non-partial data flow.
This commit removes fields from the responsibilities of `FlowVar.qll`.
The treatment of fields in that file was slow and imprecise.
It then adds another copy of the shared global data flow library, used
only to find local field flow, and it exposes that local field flow
through `localFlow` and `localFlowStep`.
This has a performance cost. It adds two cached stages to any query that
uses `localFlow`: the stage from `DataFlowImplCommon`, which is shared
with all queries that use global data flow, and a new stage just for
`localFlowStep`.
Fixed a bug in `TranslatedArrayExpr` that would prevent the element to produce the correct instruction result, hence creating problems with loads and stores.
`ElementsAddress` opcode now inherits from the `UnaryOpcode`, as it should.
Began working o inheritance, polymorphism and constructor init. Correct code is produced for them (though some more work is needed to accurately treat conversions between classes).
Removed commented code.
Added classes to properly deal with constructor init and modified and refactored TranslatedFunction to accomodate for the changes.
Properties and property access produce correct code.
Fixed a function qualifier bug in `TranslatedCall.qll`.
Added a new class to translate `ExprStmt`s whose expr is an `AssignExpr` whose lvalue is an accessor call: we translate only the accessor call in for the translated AST.
Broke down the class `TranslatedJump` to have more control on the IR control flow.
Now GotoLabelStmt, GotoCaseStmt, GotoDefaultStmt and BreakStmt are translated separately.
This also fixes an issue when having a switch as the last statement of a void function would create an incorrect CFG.
Correct code is now produced for increment and decrement expressions
Modified producesExprResult() and TTranslatedLoad() so that no loads are done from outside the crement exprs and that the VariableAddress generated from the access of the operator variable is recognized as an expr that produces result.
Throw statements now give correct code, apart from the case of rethrows: need to make explicit the fact that a finally block is executed even if stack unwinding happens.
Added 2 new classes to TranslatedStmt.qll, one for throws that have an exception, one for rethrows.
Fixed a bug in TranslatedDeclarationEntry.qll where some local declaration would be missed.
Changed toString into getQualifiedName for more clarity when generating the instructions in Instruction.qll.
Some general refactoring in TranslatedExpr.qll and TranslatedStmt.qll.
Added tests to showcase the instructions generated for object creation and object initialization
Updated raw_ir.expected
PrintIR now uses the qualified name (with types) when printing the IR for more clarity
Correct code is now generated from ObjectCreation exprs and ObjectInitializer exprs.
Removed TranslatedFieldInitialization and its subclasses and further refactored TranslatedInitialization
Introduced 2 new tags to support multidimensional arrays
Multidimensional arrays produce correct code
All types of initializations for arrays work correctly
Correct code is now generated for array initialization and element access.
Created a new binary Opcode, `IndexedElementAddress`, used to get the address of an array element, similar to how CIL does it.
Fixed simple variable initialization.
Now variables addressing correctly gets translated
Added a new test case to showcase this
Changed VoidType to ObjectType for the type of the 2 instructions
generated by as the prelude of a translated function
(UnmodeledDefinition and AliasedDefinition)
Ported basic functionalities from the C++ IR
Added a simple test that passes the IR sanity check and produces
sensible IR (together with the .expected files) to the C# test folder
about: Tell us about an alert that shouldn't be reported
title: LGTM.com - false positive
labels: false-positive
assignees: ''
---
**Description of the false positive**
<!-- Please explain briefly why you think it shouldn't be included. -->
**URL to the alert on the project page on LGTM.com**
<!--
1. Open the project on LGTM.com.
For example, https://lgtm.com/projects/g/pallets/click/.
2. Switch to the `Alerts` tab. For example, https://lgtm.com/projects/g/pallets/click/alerts/.
3. Scroll to the alert that you would like to report.
4. Click on the right most icon `View this alert within the complete file`.
5. A new browser tab opens. Copy and paste the page URL here.
For example, https://lgtm.com/projects/g/pallets/click/snapshot/719fb7d8322b0767cdd1e5903ba3eb3233ba8dd5/files/click/_winconsole.py#xa08d213ab3289f87:1.
| Shift out of range (`js/shift-out-of-range`| Fewer false positive results | This rule now correctly handles BigInt shift operands. |
| Superfluous trailing arguments (`js/superfluous-trailing-arguments`) | Fewer false-positive results. | This rule no longer flags calls to placeholder functions that trivially throw an exception. |
| Undocumented parameter (`js/jsdoc/missing-parameter`) | No changes to results | This rule is now run on LGTM, although its results are still not shown by default. |
| Missing space in string concatenation (`js/missing-space-in-concatenation`) | Fewer false positive results | The rule now requires a word-like part exists in the string concatenation. |
| Hard-coded Japanese era start date (`cpp/japanese-era/exact-era-date`) | reliability, japanese-era | This query is a combination of two old queries that were identical in purpose but separate as an implementation detail. This new query replaces Hard-coded Japanese era start date in call (`cpp/japanese-era/constructor-or-method-with-exact-era-date`) and Hard-coded Japanese era start date in struct (`cpp/japanese-era/struct-with-exact-era-date`). |
| Hard-coded Japanese era start date in call (`cpp/japanese-era/constructor-or-method-with-exact-era-date`) | Deprecated | This query has been deprecated. Use the new combined query Hard-coded Japanese era start date (`cpp/japanese-era/exact-era-date`) instead. |
| Hard-coded Japanese era start date in struct (`cpp/japanese-era/struct-with-exact-era-date`) | Deprecated | This query has been deprecated. Use the new combined query Hard-coded Japanese era start date (`cpp/japanese-era/exact-era-date`) instead. |
| Hard-coded Japanese era start date (`cpp/japanese-era/exact-era-date`) | More correct results | This query now checks for the beginning date of the Reiwa era (1st May 2019). |
| Sign check of bitwise operation (`cpp/bitwise-sign-check`) | Fewer false positive results | Results involving `>=` or `<=` are no longer reported. |
| Too few arguments to formatting function (`cpp/wrong-number-format-arguments`) | Fewer false positive results | Fixed false positives resulting from mistmatching declarations of a formatting function. |
| Too many arguments to formatting function (`cpp/too-many-format-arguments`) | Fewer false positive results | Fixed false positives resulting from mistmatching declarations of a formatting function. |
| Unclear comparison precedence (`cpp/comparison-precedence`) | Fewer false positive results | False positives involving template classes and functions have been fixed. |
| Comparison of narrow type with wide type in loop condition (`cpp/comparison-with-wider-type`) | Higher precision | The precision of this query has been increased to "high" as the alerts from this query have proved to be valuable on real-world projects. With this precision, results are now displayed by default in LGTM. |
## Changes to QL libraries
* The data-flow library has been extended with a new feature to aid debugging.
Instead of specifying `isSink(Node n) { any() }` on a configuration to
explore the possible flow from a source, it is recommended to use the new
`Configuration::hasPartialFlow` predicate, as this gives a more complete
picture of the partial flow paths from a given source. The feature is
disabled by default and can be enabled for individual configurations by
overriding `int explorationLimit()`.
* The data-flow library now supports flow out of C++ reference parameters.
* The data-flow library now allows flow through the address-of operator (`&`).
* The `DataFlow::DefinitionByReferenceNode` class now considers `f(x)` to be a
definition of `x` when `x` is a variable of pointer type. It no longer
considers deep paths such as `f(&x.myField)` to be definitions of `x`. These
changes are in line with the user expectations we've observed.
* The data-flow library now makes it easier to specify barriers/sanitizers
arising from guards by overriding the predicate
`isBarrierGuard`/`isSanitizerGuard` on data-flow and taint-tracking
configurations respectively.
* There is now a `DataFlow::localExprFlow` predicate and a
`TaintTracking::localExprTaint` predicate to make it easy to use the most
common case of local data flow and taint: from one `Expr` to another.
* The member predicates of the `FunctionInput` and `FunctionOutput` classes have been renamed for
clarity (e.g. `isOutReturnPointer()` to `isReturnValueDeref()`). The existing member predicates
have been deprecated, and will be removed in a future release. Code that uses the old member
predicates should be updated to use the corresponding new member predicate.
* The control-flow graph is now computed in QL, not in the extractor. This can
lead to regressions (or improvements) in how queries are optimized because
optimization in QL relies on static size estimates, and the control-flow edge
relations will now have different size estimates than before.
| Unsafe year argument for 'DateTime' constructor (`cs/unsafe-year-construction`) | reliability, date-time | Finds incorrect manipulation of `DateTime` values, which could lead to invalid dates. |
| Mishandling the Japanese era start date (`cs/mishandling-japanese-era`) | reliability, date-time | Finds hard-coded Japanese era start dates that could be invalid. |
| Dereferenced variable may be null (`cs/dereferenced-value-may-be-null`) | Fewer false positive results | More `null` checks are now taken into account, including `null` checks for `dynamic` expressions and `null` checks such as `object alwaysNull = null; if (x != alwaysNull) ...`. |
| Missing Dispose call on local IDisposable (`cs/local-not-disposed`) | Fewer false positive results | The query has been rewritten in order to identify more dispose patterns. For example, a local `IDisposable` that is disposed of by passing through a fluent API is no longer reported. |
## Removal of old queries
## Changes to code extraction
*`nameof` expressions are now extracted correctly when the name is a namespace.
## Changes to QL libraries
* The new class `NamespaceAccess` models accesses to namespaces, for example in `nameof` expressions.
* The data-flow library now makes it easier to specify barriers/sanitizers
arising from guards by overriding the predicate
`isBarrierGuard`/`isSanitizerGuard` on data-flow and taint-tracking
configurations respectively.
* The data-flow library has been extended with a new feature to aid debugging.
Instead of specifying `isSink(Node n) { any() }` on a configuration to
explore the possible flow from a source, it is recommended to use the new
`Configuration::hasPartialFlow` predicate, as this gives a more complete
picture of the partial flow paths from a given source. The feature is
disabled by default and can be enabled for individual configurations by
overriding `int explorationLimit()`.
*`foreach` statements where the body is guaranteed to be executed at least once, such as `foreach (var x in new string[]{ "a", "b", "c" }) { ... }`, are now recognized by all analyses based on the control flow graph (such as SSA, data flow and taint tracking).
* Fixed the control flow graph for `switch` statements where the `default` case was not the last case. This had caused the remaining cases to be unreachable. `SwitchStmt.getCase(int i)` now puts the `default` case last.
* There is now a `DataFlow::localExprFlow` predicate and a
`TaintTracking::localExprTaint` predicate to make it easy to use the most
common case of local data flow and taint: from one `Expr` to another.
* Data is now tracked through null-coalescing expressions (`??`).
| Continue statement that does not continue (`java/continue-in-false-loop`) | correctness | Finds `continue` statements in `do { ... } while (false)` loops. |
| Dereferenced variable may be null (`java/dereferenced-value-may-be-null`) | Fewer false positives | Certain indirect null guards involving two auxiliary variables known to be equal can now be detected. |
| Non-synchronized override of synchronized method (`java/non-sync-override`) | Fewer false positives | Results are now only reported if the immediately overridden method is synchronized. |
| Query built from user-controlled sources (`java/sql-injection`) | More results | The query now identifies arguments to `Statement.executeLargeUpdate` and `Connection.prepareCall` as SQL expressions sinks. |
| Query built from local-user-controlled sources (`java/sql-injection-local`) | More results | The query now identifies arguments to `Statement.executeLargeUpdate` and `Connection.prepareCall` as SQL expressions sinks. |
| Query built without neutralizing special characters (`java/concatenated-sql-query`) | More results | The query now identifies arguments to `Statement.executeLargeUpdate` and `Connection.prepareCall` as SQL expressions sinks. |
| Useless comparison test (`java/constant-comparison`) | Fewer false positives | Additional overflow check patterns are now recognized and no longer reported. |
## Changes to QL libraries
* The data-flow library has been extended with a new feature to aid debugging.
Instead of specifying `isSink(Node n) { any() }` on a configuration to
explore the possible flow from a source, it is recommended to use the new
`Configuration::hasPartialFlow` predicate, as this gives a more complete
picture of the partial flow paths from a given source. The feature is
disabled by default and can be enabled for individual configurations by
| Unused index variable (`js/unused-index-variable`) | correctness | Highlights loops that iterate over an array, but do not use the index variable to access array elements, indicating a possible typo or logic error. Results are shown on LGTM by default. |
| Loop bound injection (`js/loop-bound-injection`) | security, external/cwe/cwe-834 | Highlights loops where a user-controlled object with an arbitrary .length value can trick the server to loop indefinitely. Results are shown on LGTM by default. |
| Suspicious method name (`js/suspicious-method-name-declaration`) | correctness, typescript, methods | Highlights suspiciously named methods where the developer likely meant to write a constructor or function. Results are shown on LGTM by default. |
| Shell command built from environment values (`js/shell-command-injection-from-environment`) | correctness, security, external/cwe/cwe-078, external/cwe/cwe-088 | Highlights shell commands that may change behavior inadvertently depending on the execution environment, indicating a possible violation of [CWE-78](https://cwe.mitre.org/data/definitions/78.html). Results are shown on LGTM by default.|
| Use of returnless function (`js/use-of-returnless-function`) | maintainability, correctness | Highlights calls where the return value is used, but the callee never returns a value. Results are shown on LGTM by default. |
| Useless regular expression character escape (`js/useless-regexp-character-escape`) | correctness, security, external/cwe/cwe-20 | Highlights regular expression strings with useless character escapes, indicating a possible violation of [CWE-20](https://cwe.mitre.org/data/definitions/20.html). Results are shown on LGTM by default. |
| Unreachable method overloads (`js/unreachable-method-overloads`) | correctness, typescript | Highlights method overloads that are impossible to use from client code. Results are shown on LGTM by default. |
| Incomplete string escaping or encoding (`js/incomplete-sanitization`) | Fewer false-positive results | This rule now recognizes additional ways delimiters can be stripped away. |
| Client-side cross-site scripting (`js/xss`) | More results, fewer false-positive results | More potential vulnerabilities involving functions that manipulate DOM attributes are now recognized, and more sanitizers are detected. |
| Code injection (`js/code-injection`) | More results | More potential vulnerabilities involving functions that manipulate DOM event handler attributes are now recognized. |
| Hard-coded credentials (`js/hardcoded-credentials`) | Fewer false-positive results | This rule now flags fewer password examples. |
| Illegal invocation (`js/illegal-invocation`) | Fewer false-positive results | This rule now correctly handles methods named `call` and `apply`. |
| Incorrect suffix check (`js/incorrect-suffix-check`) | Fewer false-positive results | The query recognizes valid checks in more cases. |
| Network data written to file (`js/http-to-file-access`) | Fewer false-positive results | This query has been renamed to better match its intended purpose, and now only considers network data untrusted. |
| Password in configuration file (`js/password-in-configuration-file`) | Fewer false-positive results | This rule now flags fewer password examples. |
| Prototype pollution (`js/prototype-pollution`) | More results | The query now highlights vulnerable uses of jQuery and Angular, and the results are shown on LGTM by default. |
| Reflected cross-site scripting (`js/reflected-xss`) | Fewer false-positive results | The query now recognizes more sanitizers. |
| Stored cross-site scripting (`js/stored-xss`) | Fewer false-positive results | The query now recognizes more sanitizers. |
| Uncontrolled command line (`js/command-line-injection`) | More results | This query now treats responses from servers as untrusted. |
| Uncontrolled data used in path expression (`js/path-injection`) | Fewer false-positive results | This query now recognizes calls to Express `sendFile` as safe in some cases. |
| Unknown directive (`js/unknown-directive`) | Fewer false positive results | This query no longer flags uses of ":", which is sometimes used like a directive. |
## Changes to QL libraries
*`Expr.getDocumentation()` now handles chain assignments.
## Removal of deprecated queries
The following queries (deprecated since 1.17) are no longer available in the distribution:
| Clear-text logging of sensitive information (`py/clear-text-logging-sensitive-data`) | security, external/cwe/cwe-312 | Finds instances where sensitive information is logged without encryption or hashing. Results are shown on LGTM by default. |
| Clear-text storage of sensitive information (`py/clear-text-storage-sensitive-data`) | security, external/cwe/cwe-312 | Finds instances where sensitive information is stored without encryption or hashing. Results are shown on LGTM by default. |
| Binding a socket to all network interfaces (`py/bind-socket-all-network-interfaces`) | security | Finds instances where a socket is bound to all network interfaces. Results are shown on LGTM by default. |
* Asynchronous generator methods are now parsed correctly and no longer cause a spurious syntax error.
* Files in `node_modules` and `bower_components` folders are no longer extracted by default. If you still want to extract files from these folders, you can add the following filters to your `lgtm.yml` file (or add them to existing filters):
```yaml
extraction:
javascript:
index:
filters:
- include:"**/node_modules"
- include:"**/bower_components"
```
* Recognition of CommonJS modules has improved. As a result, some files that were previously extracted as
global scripts are now extracted as modules.
* Top-level `await` is now supported.
* A bug was fixed in how the TypeScript extractor handles default-exported anonymous classes.
* A bug was fixed in how the TypeScript extractor handles computed instance field names.
When eras change, date and time conversions that rely on a hard-coded era start date need to be reviewed. Conversions relying on Japanese dates in the current era can produce an ambiguous date.
The values for the current Japanese era dates should be read from a source that will be updated, such as the Windows registry.
</p>
</overview>
<references>
<li>
<a href="https://blogs.msdn.microsoft.com/shawnste/2018/04/12/the-japanese-calendars-y2k-moment/">The Japanese Calendar's Y2K Moment</a>.
where // e is at position i, and has an explicit value in the source - but
// not just a reference to another enum constant
hasNonReferenceInitializer(e.getEnumConstant(i)) and
// but e is not the first or the last constant of the enum
i != 0 and
exists(e.getEnumConstant(i+1)) and
// and there exists another constant whose value is implicit, but it's
// not the last one: the last value is okay to use to get the highest
// enum value automatically. It can be followed by aliases though.
enumThatHasConstantWithImplicitValue(e)
select e, "In an enumerator list, the = construct should not be used to explicitly initialize members other than the first, unless all items are explicitly initialized."
where
// e is at position i, and has an explicit value in the source - but
// not just a reference to another enum constant
hasNonReferenceInitializer(e.getEnumConstant(i)) and
// but e is not the first or the last constant of the enum
i != 0 and
exists(e.getEnumConstant(i + 1)) and
// and there exists another constant whose value is implicit, but it's
// not the last one: the last value is okay to use to get the highest
// enum value automatically. It can be followed by aliases though.
enumThatHasConstantWithImplicitValue(e)
select e,
"In an enumerator list, the = construct should not be used to explicitly initialize members other than the first, unless all items are explicitly initialized."
where not bf.getUnspecifiedType().(IntegralType).isExplicitlySigned()
and not bf.getUnspecifiedType().(IntegralType).isExplicitlyUnsigned()
and not bf.getUnspecifiedType() instanceof Enum
and not bf.getUnspecifiedType() instanceof BoolType
where
not bf.getUnspecifiedType().(IntegralType).isExplicitlySigned() and
not bf.getUnspecifiedType().(IntegralType).isExplicitlyUnsigned() and
not bf.getUnspecifiedType() instanceof Enum and
not bf.getUnspecifiedType() instanceof BoolType and
// At least for C programs on Windows, BOOL is a common typedef for a type
// representing BoolType.
and not bf.getType().hasName("BOOL")
not bf.getType().hasName("BOOL") and
// If this is true, then there cannot be unsigned sign extension or overflow.
and not bf.getDeclaredNumBits() = bf.getType().getSize() * 8
and not bf.isAnonymous()
select bf, "Bit field " + bf.getName() + " of type " + bf.getUnderlyingType().getName() + " should have explicitly unsigned integral, explicitly signed integral, or enumeration type."
not bf.getDeclaredNumBits() = bf.getType().getSize() * 8 and
not bf.isAnonymous() and
not bf.isFromUninstantiatedTemplate(_)
select bf,
"Bit field " + bf.getName() + " of type " + bf.getUnderlyingType().getName() +
" should have explicitly unsigned integral, explicitly signed integral, or enumeration type."
// require the multiply to have two non-constant operands
// (the intuition here is that multiplying two unknowns is
// much more likely to produce a result that needs significantly
// more bits than the operands did, and thus requires a larger
// type).
getEffectiveMulOperands(me) >= 2 and
// exclude varargs promotions
not exists(FunctionCall fc, int vararg |
fc.getArgument(vararg) = me and
vararg >= fc.getTarget().getNumberOfParameters()
) and
// exclude cases where the type was made bigger by a literal
// (compared to other cases such as assignment, this is more
// likely to be a trivial accident rather than suggesting a
// larger type is needed for the result).
not exists(Expr other, Expr e |
other = me.getParent().(BinaryOperation).getAnOperand() and
not other = me and
(
e = other or
e = other.(BinaryOperation).getAnOperand*()
) and
e.(Literal).getType().getSize() = t2.getSize()
)
select me, "Multiplication result may overflow '" + me.getType().toString() + "' before it is converted to '" + me.getFullyConverted().getType().toString() + "'."
where
t1 = me.getType().getUnderlyingType() and
t2 = me.getConversion().getType().getUnderlyingType() and
t1.getSize() < t2.getSize() and
(
t1.getUnspecifiedType() instanceof IntegralType and
t2.getUnspecifiedType() instanceof IntegralType
or
t1.getUnspecifiedType() instanceof FloatingPointType and
predicate functionStats(Function f, string operation, int used, int total, int percentage) {
exists(PointerType pt | pt.getATypeNameUse() = f.getADeclarationEntry()) and
@@ -67,7 +66,9 @@ where
relevantFunctionCall(unchecked, f) and
not checkedFunctionCall(unchecked, operation) and
functionStats(f, operation, _, _, percent) and
percent >= 70
and unchecked.getFile().getAbsolutePath().matches("%fbcode%")
and not unchecked.getFile().getAbsolutePath().matches("%\\_build%")
select unchecked, "After " + percent.toString() + "% of calls to " + f.getName() + " there is a call to " + operation + " on the return value. The call may be missing in this case."
percent >= 70 and
unchecked.getFile().getAbsolutePath().matches("%fbcode%") and
not unchecked.getFile().getAbsolutePath().matches("%\\_build%")
select unchecked,
"After " + percent.toString() + "% of calls to " + f.getName() + " there is a call to " +
operation + " on the return value. The call may be missing in this case."
predicate functionStats(Function f, int percentage) {
exists(int used, int total |
@@ -190,4 +238,6 @@ where
uncheckedFunctionCall(unchecked) and
functionStats(f, percent) and
percent >= 70
select unchecked, "The result of this call to " + f.getName() + " is not checked for null, but " + percent + "% of calls to " + f.getName() + " check for null."
select unchecked,
"The result of this call to " + f.getName() + " is not checked for null, but " + percent +
"% of calls to " + f.getName() + " check for null."
* @name Year field changed using an arithmetic operation is used on an unchecked time conversion function
* @description A year field changed using an arithmetic operation is used on a time conversion function, but the return value of the function is not checked for success or failure.
* @name Arithmetic operation assumes 365 days per year
* @description When an arithmetic operation modifies a date by a constant
* value of 365, it may be a sign that leap years are not taken
* into account.
* @kind problem
* @problem.severity warning
* @id cpp/leap-year/adding-365-days-per-year
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.