On wireshark/wireshark, `isInCycle` ran into a low-memory loop on the
`aliased_ssa` stage. It shouldn't be necessary to detect cycles after
the `raw` stage, so this commit moves cycle detection into the
`Construction` modules and makes it a no-op in `SSAConstruction.qll`.
- `false positives` becomes `false positive results`
- Items are listed alphabetically.
- Query IDs are listed.
Also, some of the queries had the wrong name (query message rather than the
actual query name). These have been fixed.
The `loopVariant` predicate in `ComparisonWithWiderType.ql` is intended
to identify loop counters, but it was too much of a stretch to apply it
to any subexpression of the small side of the comparison.
This change fixes two false positives on arvidn/libtorrent and many
others seen in the wild (on Linux, CoreCLR, ffmpeg, ...).
This fixes a failing qltest and makes the exclusion similar to what's in
`PointerOverflow.ql`. It's possible we should exclude based on both `+`
and `<`, but we can revisit that if false positives show up.
Before this change, evaluating `cpp/constant-comparison` followed by
`cpp/signed-overflow-check` would result in re-evaluation of almost all
the cached stages they share: CFG, basic blocks, SSA, and range
analysis. The same effect could be seen on `cpp/bad-strncpy-size`, which
also uses the GVN library.
We were privately importing `semmle.code.<lang>.ir.internal.Overlap`, but `PrintSSA.qll` was depending on it being public. This is made a little more complicated by the presence of cross-langage pyrameterized modules.
The `SignedOverflowCheck.ql` query was very slow on certain snapshots
(jluttine/suitesparse and Chromium) due to bad magic in
`MacroInvocation::getAnAffectedElement_dispred#fb`. This commit doesn't
fix the bad magic but changes the exclusion mechanism to use a predicate
where we can better control the magic and optimization.
The query should also give more good results due to this new exclusion
mechanism, which is the same one used in its sibling,
`PointerOverflow.ql`.
1. The two last examples were misleading at best. The first of those two
recommended casting to non-negative `int`s to `unsigned int` and then
checking if their addition would overflow, but overflow was
impossible because their sum (on 32-bit two's complement) could be at
most 2^32 - 2. The second example could lead to the wrong condition
(unsigned overflow) being checked if taken literally. Instead of
keeping that example, I reworeded the first paragraph of the
Recommendation section.
2. The assumptions about `delta` being positive was relaxed to
non-negative.
3. There was no need to assume that an unsigned short was non-negative.
4. Some of the suggestions were missing `i >`.
We no longer exclude macros based on their name, which means we can now
find results inside arguments to the `likely` macro in Linux (except
that Linux is compiled with `-fno-strict-overflow`).
This trick for excluding elements from macro bodies but not macro
arguments looks promising and should probably be used much more. With
this commit, it's now easy to use from any query.
Performance is still good because the new predicate gets appropriately
magiced.
C#: Fix a qltest whereby a tuple type having multiple underlying types was causing an issue with the IR sanity checks.
C#: Revert more changes.
C#: Fix tests and remove dead code.
Fixes https://github.com/Semmle/ql/issues/1572
Adjust mock so it's more aligned with what the flask code actually does. Tests
were passing before, even though we didn't handle the case in real code :\
It's a bit crude to suppress all results in instantiations, but we're
already using this kind of suppression in `PointlessComparison.ql`
(without the `Self`) because there is no convenient alternative. It
means we lose some good results but also suppress a new false positive
in Boost that surfaced after we added support for non-type template
parameters.
If the legacy configuration is only enabled if there are no other
configurations, defining a configuration in an imported library can lead to
unwanted results. For example, code that uses `any(MyTaintKind t).taints(node)`
would *stop* working, if it did not define its own configuration. (this actually
happened to us)
We performed a dist-compare to ensure there is not a performance deg ration by
doing this. Results at https://git.semmle.com/gist/rasmuswl/a1eca07f3a92f5f65ee78d733e5d260e
Tests that were affected by this:
- RockPaperScissors + Simple: new edges because no configuration was defined for
SqlInjectionTaint or CommandInjectionTaint
- CleartextLogging + CleartextStorage: new edges because no configuration was
defined before, AND duplicate deges.
- TestNode: new edges because no configuration was defined before
- PathInjection: Duplicate edges
- TarSlip: Duplicate edges
- CommandInjection: Duplicate edges
- ReflectedXss: Duplicate edges
- SqlInjection: Duplicate edges
- CodeInjection: Duplicate edges
- StackTraceExposure: Duplicate edges
- UnsafeDeserialization: Duplicate edges
- UrlRedirect: Duplicate edges
We don't use a locked-down version of six, so some internal things probably
changed from the version used last time, and the versoin I have installed.
Long term fix would be to use a specific version of six for tests!
Due to internal PR#35123 we now actually run the tests under
`python/ql/test/2/...`
These seems like a regression, since the tests state that N is ok, but A and J
should not be allowed.
For now we can accept them, so we don't block all other Python PRs
Due to internal PR#35123 we now actually run the tests under
`python/ql/test/2/...`
Since we haven't done this in a while, test output has changed a bit. These
changes look perfectly fine.
We now output literals for accesses to members of template parameters:
So for `foo` in the following example:
```
template<typename T> void bar(T& t) {
T.foo(1)
}
```
This predicate was taking 39s on a snapshot of Facebook Fizz because it
had disjuncts like this:
43685 ~0% {1} r34 = JOIN Type::FunctionPointerIshType#f AS L WITH Type::Type::getUnspecifiedType_dispred#ff_10#join_rhs AS R ON FIRST 1 OUTPUT R.<1>
43685 ~1% {2} r35 = JOIN r34 WITH CppType::getTypeSize#ff AS R ON FIRST 1 OUTPUT R.<1>, r34.<0>
170371500 ~2% {2} r36 = JOIN r35 WITH IRType::IRSizedType#ff_10#join_rhs AS R ON FIRST 1 OUTPUT R.<1>, r35.<1>
43685 ~6% {2} r37 = JOIN r36 WITH IRType::IRFunctionAddressType#class#ff AS R ON FIRST 1 OUTPUT r36.<1>, r36.<0>
Instead of fixing the joins in `getIRTypeForPRValue` itself, I've
changed the `IRType::getByteSize` predicate such that the optimiser
knows how to join with it efficiently.
The disjunct shown above now looks like this instead:
43685 ~0% {1} r26 = JOIN Type::FunctionPointerIshType#f AS L WITH Type::Type::getUnspecifiedType_dispred#ff_10#join_rhs AS R ON FIRST 1 OUTPUT R.<1>
43685 ~1% {2} r27 = JOIN r26 WITH CppType::getTypeSize#ff AS R ON FIRST 1 OUTPUT R.<1>, r26.<0>
43685 ~6% {2} r28 = JOIN r27 WITH IRType::IRFunctionAddressType::getByteSize#ff_10#join_rhs AS R ON FIRST 1 OUTPUT r27.<1>, R.<1>
Previously, we didn't track string literals as known memory locations at all, so they all just got marked as `UnknownMemoryLocation`, just like an aribtrary read from a random pointer. This led to some confusing def-use chains, where it would look like the contents of a string literal were being written to by the side effect of an earlier function call, which of course is impossible.
To fix this, I've made two changes. First, each string literal is now given a corresponding `IRVariable` (specifically `IRStringLiteral`), since a string literal behaves more or less as a read-only global variable. Second, the `IRVariable` for each string literal is now marked `isReadOnly()`, which the alias analysis uses to determine that an arbitrary write to aliased memory will not overwrite the contents of a string literal.
I originally planned to treat all string literals with the same value as being the same memory location, since this is the usual behavior of modern compilers. However, this made implementing `IRVariable.getAST()` tricky for string literals, so I left them unpooled.
The `multiple_invocation_paths` predicate had a bad join order where
we (essentially) joined `i1` with `i2` and only then joined `i1` and `i2`
separately to reduce the number of tuples. The join coming from `i1 != i2` had
little impact, but `i1.getFunction() = multi` made a big difference (and
similarly for `i2`). I factored out the code so that these joins would be done
more eagerly. Thus, we went from
```
[2019-11-06 16:53:05] (38s) Starting to evaluate predicate MethodCallOrder::multiple_invocation_paths#ffff/4@2ce75a
[2019-11-06 16:53:35] (68s) Tuple counts for MethodCallOrder::multiple_invocation_paths#ffff:
134547 ~9% {2} r1 = SCAN CallGraph::TInvocation#fff AS I OUTPUT I.<0>, I.<2>
235284431 ~3% {4} r2 = JOIN r1 WITH CallGraph::TInvocation#fff AS R ON FIRST 1 OUTPUT r1.<0>, r1.<1>, R.<1>, R.<2>
235149884 ~3% {4} r3 = SELECT r2 ON r2.<3> != r2.<1>
235149884 ~4% {3} r4 = SCAN r3 OUTPUT r3.<1>, r3.<0>, r3.<3>
166753634 ~5% {4} r5 = JOIN r4 WITH #CallGraph::FunctionInvocation::getACallee_dispred#ffPlus#swapped AS R ON FIRST 1 OUTPUT R.<1>, r4.<2>, r4.<1>, r4.<0>
129778 ~0% {4} r6 = JOIN r5 WITH #CallGraph::FunctionInvocation::getACallee_dispred#ffPlus AS R ON FIRST 2 OUTPUT r5.<0>, r5.<3>, r5.<1>, r5.<2>
return r6
[2019-11-06 16:53:35] (68s) Registering MethodCallOrder::multiple_invocation_paths#ffff + [] with content 1705dcbc08kd9aa40rp2g2e9civhv
[2019-11-06 16:53:35] (68s) >>> Wrote relation MethodCallOrder::multiple_invocation_paths#ffff with 129778 rows and 4 columns.
```
to
```
[2019-11-06 17:22:22] (25s) Starting to evaluate predicate MethodCallOrder::multiple_invocation_paths_helper#ffff/4@586aec
[2019-11-06 17:22:22] (25s) Tuple counts for MethodCallOrder::multiple_invocation_paths_helper#ffff:
134547 ~0% {2} r1 = SCAN CallGraph::TInvocation#fff AS I OUTPUT I.<2>, I.<0>
88111 ~4% {3} r2 = JOIN r1 WITH #CallGraph::FunctionInvocation::getACallee_dispred#ffPlus#swapped AS R ON FIRST 1 OUTPUT R.<1>, r1.<1>, r1.<0>
761305 ~0% {4} r3 = JOIN r2 WITH #CallGraph::FunctionInvocation::getACallee_dispred#ffPlus AS R ON FIRST 1 OUTPUT r2.<1>, r2.<2>, r2.<0>, R.<1>
673194 ~0% {4} r4 = SELECT r3 ON r3.<3> != r3.<1>
673194 ~0% {4} r5 = SCAN r4 OUTPUT r4.<2>, r4.<1>, r4.<3>, r4.<0>
return r5
[2019-11-06 17:22:22] (25s) Registering MethodCallOrder::multiple_invocation_paths_helper#ffff + [] with content 20edaaecf25nldgp24d9c4et8m3kv
[2019-11-06 17:22:22] (25s) >>> Wrote relation MethodCallOrder::multiple_invocation_paths_helper#ffff with 673194 rows and 4 columns.
[2019-11-06 17:22:22] (25s) Starting to evaluate predicate MethodCallOrder::multiple_invocation_paths_helper#ffff_2301#join_rhs/4@9e5441
[2019-11-06 17:22:22] (25s) Tuple counts for MethodCallOrder::multiple_invocation_paths_helper#ffff_2301#join_rhs:
673194 ~0% {4} r1 = SCAN MethodCallOrder::multiple_invocation_paths_helper#ffff AS I OUTPUT I.<2>, I.<3>, I.<0>, I.<1>
return r1
[2019-11-06 17:22:22] (25s) Registering MethodCallOrder::multiple_invocation_paths_helper#ffff_2301#join_rhs + [] with content 2069301e655fi9mcovngg9hetfqas
[2019-11-06 17:22:22] (25s) >>> Wrote relation MethodCallOrder::multiple_invocation_paths_helper#ffff_2301#join_rhs with 673194 rows and 4 columns.
[2019-11-06 17:22:22] (25s) Starting to evaluate predicate MethodCallOrder::multiple_invocation_paths#ffff/4@2f7c34
[2019-11-06 17:22:22] (25s) Tuple counts for MethodCallOrder::multiple_invocation_paths#ffff:
134547 ~0% {2} r1 = SCAN CallGraph::TInvocation#fff AS I OUTPUT I.<2>, I.<0>
129778 ~0% {4} r2 = JOIN r1 WITH MethodCallOrder::multiple_invocation_paths_helper#ffff_2301#join_rhs AS R ON FIRST 2 OUTPUT R.<2>, R.<3>, r1.<0>, r1.<1>
return r2
[2019-11-06 17:22:22] (25s) Registering MethodCallOrder::multiple_invocation_paths#ffff + [] with content 1705dcbc08kd9aa40rp2g2e9civhv
[2019-11-06 17:22:22] (25s) >>> Wrote relation MethodCallOrder::multiple_invocation_paths#ffff with 129778 rows and 4 columns.
[2019-11-06 17:22:22] (25s) Starting to evaluate predicate MethodCallOrder::multiple_invocation_paths#ffff_0312#join_rhs/4@9f9146
[2019-11-06 17:22:22] (25s) Tuple counts for MethodCallOrder::multiple_invocation_paths#ffff_0312#join_rhs:
129778 ~0% {4} r1 = SCAN MethodCallOrder::multiple_invocation_paths#ffff AS I OUTPUT I.<0>, I.<3>, I.<1>, I.<2>
return r1
[2019-11-06 17:22:22] (25s) Registering MethodCallOrder::multiple_invocation_paths#ffff_0312#join_rhs + [] with content 17c3fe1fcbf6ghhdr7hiukqp41rst
[2019-11-06 17:22:22] (25s) >>> Wrote relation MethodCallOrder::multiple_invocation_paths#ffff_0312#join_rhs with 129778 rows and 4 columns.
```
Execution time on `salt` went from 29.5s to somewhere below 299ms (the predicate
was not listed in the timing report).
As it turns out, there was a further bad join-order in the `global_name_used`
predicate. In this case, there was a common subexpression in the RA that was
being factored out and evaluated separately, producing a large number of tuples.
This is just good enough to cause no performance regressions and pass
the virtual-dispatch tests we have for `security.TaintTracking`. In
particular, it fixes the tests for `UncontrolledProcessOperation.ql`
when enabling `DefaultTaintTracking.qll`.
Note that Declaration::getTemplateArgumentType() and
Declaration::getTemplateArgumentValue() need to be public so that they
can be overriden in derived classes.
Before this change, any function that has a parameter that was called
password/credentials would be treated as returning sensitive data of that
kind. `py/clear-text-logging-sensitive-data` would alert if one of these are
logged, which has a LOT of false-positives.
This was the minimal amount of predicates I could easily cache without
introducing extra cached stages. The predicates that are not cached
here, like `CppType::getTypeSize` and `getCanonicalLanguageType`, appear
to be cheap.
I've tested that this avoids recomputation of the IR type system by
running
grep -c 'Starting to evaluate predicate CppType::CppType::getIRType_dispred'
on the evaluator log for `IRSanity.ql`. It drops from 4 to 1. The
pretty-printed DIL drops from 79,175 lines to 76,326 lines.
This makes it more obvious to the evaluator that it is a good predicate to pick as a sentinel, and in practice we mostly just have one configuration in scope anyway.
This was brought up on the LGTM.com forums here:
https://discuss.lgtm.com/t/warn-when-always-failing-assert-is-reachable-rather-than-unreachable/2436
Essentially, in a complex chain of `elif` statements, like
```python
if x < 0:
...
elif x >= 0:
...
else:
...
```
the `else` clause is redundant, since the preceding conditions completely
exhaust the possible values for `x` (assuming `x` is an integer). Rather than
promoting the final `elif` clause to an `else` clause, it is common to instead
raise an explicit exception in the `else` clause. During execution, this
exception will never actually be raised, but its presence indicates that the
preceding conditions are intended to cover all possible cases.
I think it's a fair point. This is a clear instance where the alert, even if it
is technically correct, is not useful for the end user.
Also, I decided to make the exclusion fairly restrictive: it only applies if
the unreachable statement is an `assert False, ...` or `raise ...`, and only
if said statement is the first in the `else` block. Any other statements will
still be reported.
This would find instances of `thing = MyThing.objects.get(field=userinput)`, and
what seems to be a query that wants to match on `thing = MyThing();
thing.field=userinput`. Both are not vulnerable to user-input, due to the
build-in escaping by django.
The DjangoModelFieldWrite actually matches on `MyThing.field=userinput` and not
`thing.field=userinput`. I suspect this to be a mistake.
Matching on `thing.field=userinput`, would require this CodeQL:
attr.getObject(_).pointsTo().getClass() = model
This is vulnerable to SQL injection because of the quotes around %s -- added
some code that highlights this in test.py
Since our examples did this in the safe query, I ended up rewriting them
completely, causing a lot of trouble for myself :D
The `AliasedUse` instruction is supposed to represent future uses of aliased memory after the function returns. Since local variables from that function are no longer allocated after the function returns, the `AliasedUse` instruction should access only the set of aliased locations that does not include locals from the current stack frame.
Hopefully it is more clear that you can get multiple results from getABaseType
because of multiple inheritance, and not because we are following the chain of
inheritance
It's not used anywhere outside `VoidContext.qll`, where it was defined.
The use in `VoidContext.qll` is 10 years old and was a workaround for an
extractor bug that no longer exists.
This new instruction is the dual of the existing `AliasedDefinition` instruction. Whereas that instruction defines the contents of aliased memory before the function was called, `AliasedUse` represents the potential use of all aliased memory after the function returns. This ensures that writes to aliased memory do not appear "dead", even if there are no further reads from aliased memory within the function itself.
Like the one existing in ControlFlowNode.
This is useful for checking class of value being poitned to, as
expr.pointsTo().getClass() = someClass
Without this you need to do
exists(Value v | v.getClass() = someClass | expr.pointsTo(v))
Expressions like the `e` in `e;` or `e, e2`, whose result is immediately
discarded, should not get a synthetic `CopyValue`. This removes a lot of
redundancy from the IR.
To prevent these expressions from being confused with the expressions
from which they get their result, the predicate
`getInstructionConvertedResultExpression` now suppresses results for
expressions that don't produce their own result. This should fix the
mapping between expressions and IR data-flow nodes.
The one interesting piece that needed to be fixed up was the type of an `Indirect[Read|Write]SideEffect` operand/result. If the parameter type is a pointer or reference to an incomplete type, we need to set the type of the side effect memory access to `Unknown`, because we don't model incomplete types in the IR type system.
I also added minimal support for `__assume` (generated as a `NoOp`), because lack of `__assume` support got in the way of debugging the other issue above.
This change gives a slight performance improvement and makes the QL code
shorter. It introduces some magic numbers in the code, but those are
confined to the `Pos` and `Spec` classes.
We get a speed-up because the evaluator has built-in support for integer
literals in the `OUTPUT` of `JOIN` operations, whereas `newtype`s have
to be explicitly joined on. As a result, a predicate like
`CFG::straightLineSparse#ffff` drops from 262 pipeline nodes to 242.
I measured performance on https://github.com/jluttine/suitesparse, which
is one of the projects that had the biggest slowdown when enabling the
QL CFG on lgtm.com. I took two measurements before this change and two
after. The `CFG.qll` stage took 117s and 112s before, and it took 106s
and 107s after.
Eventually these files will subsume the current `queries.xml` files
at the top of query-containing and library directories. For now they're
just here to support internal testing of the tooling support for them
we're writing on.
Format and contents is a work in progress. If you're not in Semmle,
don't depend on anything here making sense (or staying stable) until
you see the version tags increase to something nonzero.
Added some examples and reworded portions of guards.rst. There's still
more to do - examples for ensures and compares predicates, and possibly
rewording the description of the compares predicates
We now detect patterns like
f(bool cond){
if(cond)
then A
else B
and prune branches for calls like f(true) or f(false).
This pruning is done both in the local (bigstep) flow graph
as well as in the inter-procedural dataflow graph.
This test used `getAQlClass`, which caused it to break when new classes
were added anywhere in the libraries. That's now avoided by switching to
`getCanonicalQLClass`. It turns out that `getCanonicalQLClass` didn't
support arithmetic expressions on complex numbers, so that support had
to be added.
This fixes the test `cpp/ql/test/library-tests/constexpr_if/cfg.ql`,
which broke when the QL CFG was enabled.
The new cases are just copy-pastes of the `IfStmt` cases (they don't
share a useful common superclass) with added checks for whether their
constant value equals 0.
Hopefully it does not make a difference in practice whether
uninstantiated template functions are considered to have control flow
through initializers of their static variables.
This adds IndirectMustWriteSideEffects for constructor qualifiers. The
introduced sanity failures result from constructor calls without qualifier
operands in the IR
This change is needed when enabling the QL CFG on certain snapshots such
as notaz/picodrive. It removes the `bbNotInLoop` predicate, which was
always a liability because it's inherently quadratic. The real slowdown
came in `skipLoop`, where all true-upon-entry loops were crossed with
all definitions of variables that should take their definition from the
loop body.
The `Operation` class is abstract, and extending it caused cached stages
to be recomputed all the way down to the AST. This meant that the leap
year queries evaluated their own copy of SSA and data flow.
The translation of static fields now uses `VariableAddress` instead of `FieldAddress`. This fixes the logic as well as the "field address without qualifier address" sanity check.
Fixed a problem when the translating the compiler generated constructors that caused some sanity errors (since they have no body, when translating the constructor block fragmentation happened). Fixed this by skipping the translation of the body, if it does not exist (when translating a function).
This commit implements the language-neutral IR type system for C#. It mostly follows the same pattern as C++, modified to fit the C# type system. All object references, pointers, and lvalues are represented as `IRAddress` types. All structs and generic parameters are implemented as `IRBlobType`. Function addresses get a single `IRFunctionAddressType`.
I had to fix a couple places in the original IR type system where I didn't realize I was still depending on language-specific types. As part of this, `CSharpType` and `CppType` now have a `hasUnspecifiedType()` predicate, which is equivalent to `hasType()`, except that it holds only for the unspecified version of the type. This predicate can go away once we remove the IR's references to the underlying `Type` objects.
All C# IR tests pass without modification, but only because this commit continues to print the name of `IRUnknownType` as `null`, and `IRFunctionAddressType` as `glval<null>`. These will be fixed separately in a subsequent commit in this PR.
A constructed type, `C<T>`, where `T` is the type parameter of `C`, is represented
in the database as the corresponding unbound generict type `C<>`. Consequently, the
type conversion library, which only considers `ConstructedType`s, does not handle
all implicit conversions. For example, in
```
interface I<in T1, T2> where T1 : C
```
there should be an implicit conversion from `I<C, T2>` to `I<T1, T2>` (=`I<>`).
We could find no reason for using `Builtin::builtin` instead of
`Builtin::special`. Since all the other base types use `special`, and the old
Object API is using `special`, let's also do that :)
The C++ IR currently has a very clunky way of specifying the type of an IR entity (`Instruction`, `Operand`, `IRVariable`, etc.). There are three separate predicates: `getType()`, `isGLValue()`, and `getSize()`. All three are necessary, rather than just having a `getType()` predicate, because some IR entities have types that are not represented via an existing `Type` object in the AST. Examples include the type for an lvalue returned from a `VariableAddress` instruction, the type for an array slice being zero-initialized in a variable initializer, and several others. It is very easy for QL code to just check the `getType()` predicate, while forgetting to use `isGLValue()` to determine if that type is the actual type of the entity (the prvalue case) or the type referred to by a glvalue entity. Furthermore, the C++ type system creates potentially many different `Type` objects for the same underlying type (e.g. typedefs, using declarations, `const`/`volatile` qualifiers, etc.), making it more difficult to tell when two entities have semantically equivalent types.
In addition, other languages for which we want to enable the IR have somewhat different type systems. The various language type systems differ in their structure, although they tend to share the basic building blocks necessary for the IR.
To address all of the above problems, I've introduced a new class hierarchy, rooted at the class `IRType`, that represents a bare-bones type system that is independent of source language (at least across C/C++/C#/Java). A type's identity is based on its kind (signed integer, unsigned integer, floating-point, Boolean, blob, etc.), size and in the case of blob types, a "tag" to differentiate between different classes and structs. No distinction is made between, say `signed int` and plain `int`, or between different language integer types that have the same signedness and size (e.g. `unsigned int` vs. `wchar_t` on Linux). `IRType` is intended for use by language-agnostic IR-based analyses, including range analysis, dataflow, SSA construction, and alias analysis. The set of available `IRType`s is determined by predicate provided by the language library implementation (e.g. `hasSignedIntegerType(int byteSize)`.
In addition to `IRType`, each language now defines a type alias named `LanguageType`, representing the type of an IR entity in more language-specific terms. The only predicate requried on `LanguageType` is `getIRType()`, which returns the single `IRType` object for the language-neutral representation of that `LanguageType`. All other predicates on and subclasses of `LanguageType` are language-specific. There may be many instances of `LanguageType` that map to a given `IRType`, to allow for typedefs, etc.
Most of the changes are mechanical changes in the IR construction code, to return the correct type for each IR entity. SSA construction has also been updated to avoid dependencies on language-specific types.
I have not yet removed the original `getType()` predicates that just return `Type`. These can be removed once we move the remaining existing libraries to use `IRType`.
Test results are, by design, pretty much unchanged. Once case changed for inline asm, because the previously IR generation for it played a little fast and loose with the input/output expressions. The test case now includes both input and output variables. The generated IR for `Conditional_LValue` is now more correct, because we now have a way to represent an lvalue of an lvalue. `syntax-zoo` is still a hot mess. Most of the changed outputs are due to wobble from having multiple functions with the same name, but with a slightly different order of evaluation due to the type changes. Others are wobble from already-invalid IR. A couple non-wobbly places have improved slightly, though.
The C# part of this change is waiting for #2005 to be merged, since that has some of the necessary C# implementation.
The translation of `IsExpr` created a sanity check to fail since it generated
a Phi node that had only one source: if a variable was declared as part of the `IsExpr`, a conditional branch was generated, and the variable was defined only in the true successor; this has been changes so that the declaration happens before the conditional branch, and the variable is uninitialized (this removed the need for the `isInitializedByElement` predicate from `TranslatedDeclarationBase`, so that has been removed) and only the assignment happens in the true successor block (so now the two inputs of the Phi node are the result of the `Uninitialized` instruction and the `Store` instruction from the true successor block).
Generation of IDs for namespace members has been fixed to generate
unique IDs for variables of the same name but in different namespaces.
Update the same_name test to validate this.
More accurate type sizes using language specific predicates from `IRCSharpLanguage.qll`.
Added immediate operands for some `PointerX` (add, sub) instructions.
Some other minor consistency fixes.
This change ensures that all `Expr`s (except parentheses) have a
`TranslatedExpr` with a `getResult` that's one of its own instructions,
not an instruction from one of its operands. This means that when we
translate back and forth between `Expr` and `Instruction`, like in
`DataFlow::exprNode`, we will not conflate `e` with `&e` or `... = e`.
This fixes false positives that arise when a call such as `f.apply` can either be interpreted as a reflective invocation of `f`, or a normal call to method `apply` of `f`.
The key insight here is that `HC_FieldCons` and `HC_Array` are
functionally determined by the things that arise in another
recursive call. Lifting them to their own predicate, therefore,
reduces nonlinearity and constrains the join order in a way that
cannot be asymptotically bad -- and, indeed, makes quite a big
difference in practice.
Switching `security.TaintTracking` to use `DefaultTaintTracking` causes
us to lose a result from `UnboundedWrite.ql`, while this commit restores
it:
diff --git a/semmlecode-cpp-tests/DO_NOT_DISTRIBUTE/security-tests/CWE-120/CERT/STR35-C/UnboundedWrite.expected b/semmlecode-cpp-tests/DO_NOT_DISTRIBUTE/security-tests/CWE-120/CERT/STR35-C/UnboundedWrite.expected
index 1eba0e52f0e..d947b33b9d9 100644
--- a/semmlecode-cpp-tests/DO_NOT_DISTRIBUTE/security-tests/CWE-120/CERT/STR35-C/UnboundedWrite.expected
+++ b/semmlecode-cpp-tests/DO_NOT_DISTRIBUTE/security-tests/CWE-120/CERT/STR35-C/UnboundedWrite.expected
@@ -1,2 +1,3 @@
+| main.c:54:7:54:12 | call to strcat | This 'call to strcat' with input from $@ may overflow the destination. | main.c:93:15:93:18 | argv | argv |
| main.c:99:9:99:12 | call to gets | This 'call to gets' with input from $@ may overflow the destination. | main.c:99:9:99:12 | call to gets | call to gets |
| main.c:213:17:213:19 | buf | This 'scanf string argument' with input from $@ may overflow the destination. | main.c:213:17:213:19 | buf | buf |
This is one step toward implementing the taint-tracking wrapper in terms
of `Instruction` rather than `Expr`.
This leads to a few duplicate results in `TaintedAllocationSize.ql`
because the library now considers `sizeof(int)` to be just as
predictable as `4`, whereas the `security.TaintTracking` library does
not consider `sizeof` to be predictable. I think it's simpler to accept
the duplicate results since they are ultimately a quirk of the query,
not the library.
The following is the diff between (a) replacing `TaintTracking.qll` with
a link to `DefaultTaintTracking.qll` and (b) additionally applying this
commit.
diff --git a b
--- a/cpp/ql/test/query-tests/Security/CWE/CWE-190/semmle/TaintedAllocationSize/TaintedAllocationSize.expected
+++ b/cpp/ql/test/query-tests/Security/CWE/CWE-190/semmle/TaintedAllocationSize/TaintedAllocationSize.expected
@@ -1,5 +1,8 @@
| test.cpp:42:31:42:36 | call to malloc | This allocation size is derived from $@ and might overflow | test.cpp:39:21:39:24 | argv | user input (argv) |
+| test.cpp:43:31:43:36 | call to malloc | This allocation size is derived from $@ and might overflow | test.cpp:39:21:39:24 | argv | user input (argv) |
| test.cpp:43:38:43:63 | ... * ... | This allocation size is derived from $@ and might overflow | test.cpp:39:21:39:24 | argv | user input (argv) |
+| test.cpp:45:31:45:36 | call to malloc | This allocation size is derived from $@ and might overflow | test.cpp:39:21:39:24 | argv | user input (argv) |
| test.cpp:48:25:48:30 | call to malloc | This allocation size is derived from $@ and might overflow | test.cpp:39:21:39:24 | argv | user input (argv) |
| test.cpp:49:17:49:30 | new[] | This allocation size is derived from $@ and might overflow | test.cpp:39:21:39:24 | argv | user input (argv) |
+| test.cpp:52:21:52:27 | call to realloc | This allocation size is derived from $@ and might overflow | test.cpp:39:21:39:24 | argv | user input (argv) |
| test.cpp:52:35:52:60 | ... * ... | This allocation size is derived from $@ and might overflow | test.cpp:39:21:39:24 | argv | user input (argv) |
--- a/semmlecode-cpp-tests/DO_NOT_DISTRIBUTE/security-tests/CWE-190/CERT/INT04-C/int04.expected
+++ b/semmlecode-cpp-tests/DO_NOT_DISTRIBUTE/security-tests/CWE-190/CERT/INT04-C/int04.expected
@@ -1 +1,2 @@
| int04c.c:21:29:21:51 | ... * ... | This allocation size is derived from $@ and might overflow | int04c.c:14:30:14:35 | call to getenv | user input (getenv) |
+| int04c.c:22:33:22:38 | call to malloc | This allocation size is derived from $@ and might overflow | int04c.c:14:30:14:35 | call to getenv | user input (getenv) |
Added support for pointers and pointer operations and made sure all loads are correct.
Added support for the unsafe stmt.
Added basic support for the fixed stmt (for now we ignore the pinning).
Fixed a bug where assignments of the form `Object obj1 = obj2` would not generate a load instruction for `obj2` (see `raw_ir.expected`).
Added an extra `Load` for object creations that involve structs. This is because the variable that represents the struct should hold the actual struct, not a reference to it.
Refactored the piece of code that decided if a particular expr needs a load instruction and improved the code sharing between `TranslatedExpr.qll` and `TranslatedElement.qll` by creating 2 predicates that tell if a certain expr does or does not need a load.
Added support for `in`, `out` and `ref` parameters.
The user knows that an expression functionally determines its
hashCons value, and that an expression functionally determines
its number of children, but this is not provable from the
definitions, and so not usable by the optimiser. By storing
the result of those known-functional calls in a variable,
rather than repeating the call, we enable better join orders.
In theory this bug could associated CaptureOutNodes with the wrong transitively called
callable. However, in practice I could not create a test case that revealed incorrect
behaviour. I've included one such test case in the commit.
I believe that the cause of this is that OutNode::getACall() is not actually used in the
data flow libraries. Instead, DataFlowDispatch::Cached::getAnOutNode is the predicate
which is used to associated OutNode's with DataFlowCall's in practice, and that is always
used in a context that correctly binds the runtime target of the call.
The most common reason for the C# autobuilder to fail is because it
cannot determine a single unique .sln or .proj file to build, instead
reporting multiple sln or proj files at the same shortest depth. This
commit changes this to build all such files, rather than reporting an
error.
Added basic support for the using stmt, checked stmt, unchecked stmt
Note that the translations do not use the compiler generated element framework and hence they are just rough approximations. For accuracy, in the future their translation should use it.
These new results seem better than the previous ones, but the previous
ones are still there. Perhaps the `Buffer.qll` library could use some
adjustment, but this seems like an improvement in isolation.
The data flow library conflates pointers and their objects in some
places but not others. For example, a member function call `x.f()` will
cause flow from `x` of type `T` to `this` of type `T*` inside `f`. It
might be ideal to avoid that conflation, but that's not realistic
without using the IR.
We've had good experience in the taint tracking library with conflating
pointers and objects, and it improves results for field flow, so perhaps
it's time to try it out for all data flow.
TTransitiveCaptureCall represents a control flow node that may
transitively call many different callables which capture a variable from
the current scope. Captured variables are represented as synthetic
parameters to the callable, at negative indices. However, each of the
different targets may capture a different subset of variables from the
enclosing scope, so we must include the target along side the CFN in
order to prevent incorrect capture flow.
This is for symmetry with `exprNode` etc., and it should be handy for
the same reasons. I found one caller of `asInstruction` that got simpler
by using the new predicate instead.
Added support for continue stmt.
Minimal refactoring of the `TranslatedSpecificJump` classes.
Added a new test file, `jumps.cs` and updated the expected output.
This currently relies on the fact that qltest includes `ql/csharp/ql/src/Metrics` in addition to `ql/csharp/ql/src` on its search path when run internally, which is inconsistent with the other languages. Since this is the only test that relies on it, I'd like to update it and get rid of the extra search root eventually.
This commit adds a `semmle.code.cpp.ir.dataflow.DefaultTaintTracking`
library that's API-compatible with the
`semmle.code.cpp.security.TaintTracking` library. The new library is
implemented on top of the IR data flow library.
The idea is to evolve this library until it can replace
`semmle.code.cpp.security.TaintTracking` without decreasing our SAMATE
score. Then we'll have the IR in production use, and we will have one
less taint-tracking library in production.
The way object creation was translated has been changed: now creations are treated as expressions.
The main motivation for this was the inability to have creation expressions as arguments to
function calls (a test case has been added to showcase this).
All code that dealt with creation expressions has been moved from `TranslatedInitialization.qll` to `TranslatedExpr.qll`.
Some light refactoring has also been done, mainly removing code that was useless after the changes mentioned above.
These partial defs don't do any harm, but they could hurt performance.
In typical C++ snapshots, between 5% and 20% of all calls are to `const`
functions.
Comments like these will make the autoformatter produce bad indentation.
For the record (not for explainability), these issues were found with
git grep -P -A1 '^( */\*| +\*( |$))(.(?!\*/))*$' cpp/ql/src/'**/*.ql*' |grep -B10 'qll\?- [^*]*$'
It superficially looks like `@param` is supported in QLDoc, but this is
mostly an accident of how its parser works. Attributes starting with `@`
are only intended to be used in the top-level QLDoc of a query, and
there can only be one of each attribute. If there are multiple `@param`
entries, the QLDoc parser will only keep the first one.
Even though `parseConvSpec` in `Scanf.qll` documented multiple
parameters, only the first one would be shown in an IDE. The
corresponding predicate in `Print.qll` documented only its first
parameter, perhaps because of an autoformatting accident earlier in
time. I've attempted to reconstruct documentation for its other
parameters based on its sibling in `Scanf.qll`.
These functions were overly complicated, and the comments explaining the
complications did not auto-format well. A reference type cannot have
specifiers on it, so it's fine to call `getUnspecifiedType` before
checking if it's a reference type.
The autoformatter is opinionated about comment styles and assumes that
"short" comments attach to the following item while "long" comments are
items themselves. I found top-level short comments with the following
two commands and then searched the output for empty lines that came
after the comment.
git grep -A1 '^/\* .*\*/' cpp/ql/src
git grep -A1 '^//' 'cpp/ql/src/**/*.ql*'
The foreach was erroneously labelling the `True` and `False` edges as backedges.
Added a case for the compiler generated while in the predicate `getInstructionBackEdgeSuccessor/2`
from the file `IRConstruction.qll` so that only the edges from inside the body are labeled as back edges.
Added a framework for the translation of compiler generated elements, so that the process of adding a new desugaring process is almost mechanical.
The files in `internal` serve as the superclasses for all the compiler generated elements.
The file `Common.qll` captures common patterns for the compiler generated code to improve code sharing (by pattern I mean an element that appears in multiple desugarings). For example the `try...finally` pattern appears in the desugaring process of both the `lock` and the `foreach` stmts, so a class the provides a blueprint for this pattern is exposed. Several other patterns are present.
The expected output has also been updated (after a rebase) and it should be ignored.
The `raw/internal` folder has been restructured to better enhance code sharing between compiler generated elements and AST generated elements.
The translated calls classes have been refactored to better fit the C# library.
A new folder has been added, `common` that provides blueprints for the classes that deal with translations of calls, declarations, exprs and conditions.
Several `TranslatedX.qll` files have been modified so that they use those blueprint classes.
We haven't come to a conclusion on whether these two types will remain
identical forever. To make sure we're able to change it in the future,
this change makes it impossible to cast between the two types. Callers
must use the `asInstruction` member predicate to convert.
g++ doesn't support this code:
sorry, unimplemented: non-trivial designated initializers not supported
twoIntFields sSwapped = { .m2 = source(), .m1 = 0 };
so we need to build it in clang mode.
Data flow probably never worked when a variable declared in a
`ConditionDeclExpr` was modeled with `BlockVar`. That combination did
not come up in testing before the last commit.
The data flow library conflates pointers and objects enough for the
`definitionByReference` predicate to be too strict in some cases. It was
too permissive in other cases that are now (or will be) handled better
by field flow.
See also the change note entry.
Partial data flow had a semantic merge conflict with this branch. The
problem is that partial data flow doesn't (and shouldn't) cause the
initial pruning steps to run, but the length-2 access paths depend on
the `consCand` information that comes from that initial pruning. The
solution is to restore the old `AccessPath` class, now called
`PartialAccessPath` for use only by partial data flow.
With this change, partial data flow will in some cases allow more field
flow than non-partial data flow.
This commit removes fields from the responsibilities of `FlowVar.qll`.
The treatment of fields in that file was slow and imprecise.
It then adds another copy of the shared global data flow library, used
only to find local field flow, and it exposes that local field flow
through `localFlow` and `localFlowStep`.
This has a performance cost. It adds two cached stages to any query that
uses `localFlow`: the stage from `DataFlowImplCommon`, which is shared
with all queries that use global data flow, and a new stage just for
`localFlowStep`.
Fixed a bug in `TranslatedArrayExpr` that would prevent the element to produce the correct instruction result, hence creating problems with loads and stores.
`ElementsAddress` opcode now inherits from the `UnaryOpcode`, as it should.
Began working o inheritance, polymorphism and constructor init. Correct code is produced for them (though some more work is needed to accurately treat conversions between classes).
Removed commented code.
Added classes to properly deal with constructor init and modified and refactored TranslatedFunction to accomodate for the changes.
Properties and property access produce correct code.
Fixed a function qualifier bug in `TranslatedCall.qll`.
Added a new class to translate `ExprStmt`s whose expr is an `AssignExpr` whose lvalue is an accessor call: we translate only the accessor call in for the translated AST.
Broke down the class `TranslatedJump` to have more control on the IR control flow.
Now GotoLabelStmt, GotoCaseStmt, GotoDefaultStmt and BreakStmt are translated separately.
This also fixes an issue when having a switch as the last statement of a void function would create an incorrect CFG.
Correct code is now produced for increment and decrement expressions
Modified producesExprResult() and TTranslatedLoad() so that no loads are done from outside the crement exprs and that the VariableAddress generated from the access of the operator variable is recognized as an expr that produces result.
Throw statements now give correct code, apart from the case of rethrows: need to make explicit the fact that a finally block is executed even if stack unwinding happens.
Added 2 new classes to TranslatedStmt.qll, one for throws that have an exception, one for rethrows.
Fixed a bug in TranslatedDeclarationEntry.qll where some local declaration would be missed.
Changed toString into getQualifiedName for more clarity when generating the instructions in Instruction.qll.
Some general refactoring in TranslatedExpr.qll and TranslatedStmt.qll.
Added tests to showcase the instructions generated for object creation and object initialization
Updated raw_ir.expected
PrintIR now uses the qualified name (with types) when printing the IR for more clarity
Correct code is now generated from ObjectCreation exprs and ObjectInitializer exprs.
Removed TranslatedFieldInitialization and its subclasses and further refactored TranslatedInitialization
Introduced 2 new tags to support multidimensional arrays
Multidimensional arrays produce correct code
All types of initializations for arrays work correctly
Correct code is now generated for array initialization and element access.
Created a new binary Opcode, `IndexedElementAddress`, used to get the address of an array element, similar to how CIL does it.
Fixed simple variable initialization.
Now variables addressing correctly gets translated
Added a new test case to showcase this
Changed VoidType to ObjectType for the type of the 2 instructions
generated by as the prelude of a translated function
(UnmodeledDefinition and AliasedDefinition)
Ported basic functionalities from the C++ IR
Added a simple test that passes the IR sanity check and produces
sensible IR (together with the .expected files) to the C# test folder
This groups the change notes that concern the `DataFlow` library and
clarifies the change notes that concern the two different
`TaintTracking` libraries.
When this query was run as an upgrade script, the optimizer picked a bad
join order, making the upgrade very slow on large databases. It picked a
bad join order because upgrade scripts are run with no stats.
Lack of support for the GCC vector extensions was causing a bunch of sanity failures in the syntax zoo. This PR adds minimal IR generation support for these types.
Added `VectorAggregateLiteral`, and factored most of `ArrayAggregateLiteral` out into the common base class `ArrayOrVectorAggregateLiteral`. I'd be happy to merge these all into `ArrayAggregateLiteral` if we don't care about the distinction.
Made a few tweaks to `TranslatedArrayExpr` to compute the element type by looking at the result type of the `ArrayExpr`, not the type of the base operand. Note that this means that for `T a[10]; a[i] = foo;`, the result of the `PointerAdd` for `a[i]` will now be `glvalue<T>`, not `T*`. This is actually more faithful to the source language, and has no semantic difference on the IR.
Added some missing `getInstructionElementSize()` overrides.
Added the new `BuiltIn` opcode, renamed the existing `BuiltInInstruction` to `BuiltInOperationInstruction`, and made any `BuiltInOperation` that we don't specifically handle translate to `BuiltIn`. `BuiltInOperationInstruction` now has a way to get the specific `BuiltInOperation`.
Added `getCanonicalQLClass()` overrides for `GNUVectorType` and `BuiltInOperation`.
Added a simple IR test for vector types.
docs: better responsive behaviour
docs: improve c/c++ slides
docs: titles and fonts
docs: tidy up layout and css
docs: update layout to scale font-size by slide height
docs: tidy up templates and fix font headings
This commit adds field initializers to the CFG for non-static constructors. For
example, in
```
class C
{
int Field1 = 0;
int Field2 = Field1 + 1;
int Field3;
public C()
{
Field3 = 2;
}
public C(int i)
{
Field3 = 3;
}
}
```
the initializer expressions `Field1 = 0` and `Field2 = Field1 + 1` are added
to the two constructors, mimicking
```
public C()
{
Field1 = 0;
Field2 = Field1 + 1;
Field3 = 2;
}
```
and
```
public C()
{
Field1 = 0;
Field2 = Field1 + 1;
Field3 = 3;
}
```
respectively. This means that we no longer have to synthesize calls, callables,
parameters, and arguments in the data flow library, so much of the work from
d1755500e4 can be simplified.
This file is now identical in all languages. Unifying this file led to
the following changes:
- The documentation spelling fixes and example from the C++ version
were copied to the other versions and updated.
- The steps through `NonLocalJumpNode` from C# were abstracted into a
`globalAdditionalTaintStep` predicate that's empty for C++ and Java.
- The `defaultTaintBarrier` predicate from Java is now present but empty
on C++ and C#.
- The C++ `isAdditionalFlowStep` predicate on
`TaintTracking::Configuration` no longer includes `localFlowStep`.
That should avoid some unnecessary tuple copying.
This class was copy-pasted in all `DataFlowN.qll` files without using
the identical-files system to keep the copies in sync. The class is now
moved to the `DataFlowImplN.qll` files.
This also has the effect of preventing recursion through first data flow
library copy for C/C++. Such recursion has been deprecated for over a
year, and some forms of recursions are already ruled out by the library
implementation.
To compensate for the lack of field flow, the taint tracking library has
previously considered taint to flow from fields to their containing
structs and back again from the structs to any of their fields. This
leads to false flow between unrelated fields and is not needed now that
we have proper flow through fields.
There's now a `localFlowStep` predicate for use directly in queries and
other libraries and a `simpleLocalFlowStep` for use only by the global
data flow library. The former predicate is intended to include field
flow, but the latter may not.
This will let Java and C# (and possibly C++ IR) avoid getting two kinds
of field flow at the same time, both from SSA and from the global data
flow library. It should let C++ AST add some form of field flow to
`localFlowStep` without making it an input to the global data flow
library.
To keep the code changes minimal, and to keep the implementation similar
to C++ and Java, the `TaintTracking{Public,Private}` files are now
imported together through `TaintTrackingUtil`. This has the side effect
of exposing `localAdditionalTaintStep`. The corresponding predicate for
Java was already exposed.
This is more consistent with the other JSDoc queries. Results are still not shown on LGTM by default, but the query can now be enabled selectively for projects that care about JSDoc.
Initial implementation of data flow through fields, using the algorithm of the
shared data flow implementation. Fields (and field-like properties) are covered,
and stores can be either
- ordinary assignments, `Foo = x`,
- object initializers, `new C() { Foo = x }`, or
- field initializers, `int Foo = x`.
For field initializers, we need to synthesize calls (`SynthesizedCall`),
callables (`SynthesizedCallable`), parameters (`InstanceParameterNode`), and
arguments (`SynthesizedThisArgumentNode`), as the C# extractor does not (yet)
extract such entities. For example, in
```
class C
{
int Field1 = 1;
int Field2 = 2;
C() { }
}
```
there is a synthesized call from the constructor `C`, with a synthesized `this`
argument, and the targets of that call are two synthesized callables with bodies
`this.Field1 = 1` and `this.Field2 = 2`, respectively.
A consequence of this is that `DataFlowCallable` is no longer an alias for
`DotNet::Callable`, but instead an IPA type.
Because `ConstructorFieldInit` (member initializer lists) are not part
of the control flow graph, there was no data flow from the initial value
of parameters to their uses in member initializers. This commit adds the
necessary flow under the assumption that parameters are not overwritten
in member initializers.
This allows a member initializer list to be seen as a sequence of field
assignments. For example, the constructor
C() : a(taint()) { }
now has data flow similar to
C() { this.a = taint(); }
This brings the annotation style in sync with how we annotate new tests
these days. I also changed a few annotations to have different expected
outcome based on my understanding of the code.
This commit changes C++ `ConstructorCall` to behave like
`new`-expressions in Java: they are both `ExprNode`s and
`PostUpdateNodes`, and there's a "pre-update node" (here called
`PreConstructorCallNode`) to play the role of the qualifier argument
when calling a constructor.
On a Postgres snapshot, where the `getAReachedBlockVarSBB` predicate
performs badly because of a Yacc-generated 20,000-line parser loop, that
predicate is reduced from 4m22s to 1m32s plus 5.2s for the live
variables analysis.
This change removes 17,142 rows from `BlockVar.getAnAccess` on Postgres.
I sampled some of them, and they were all of the following form:
while (...) {
T x;
f1(&x); // access
f2(&x); // definition
}
Such accesses are ruled out now because we deliberately lose track of
variables when they go out of scope.
- Speedup the `varBlockReaches()` predicate, by restricting to basic blocks
in which a given SSA definition may still be live, in constrast to just
being able to reach *any* access (read or write) to the underlying source
variable.
- Account for some missing cases in the `lastRead()` predicate.
This removes a lot of flow steps, but it all seems to be flow that was
present twice: both exiting a `PartialDefNode` and a
`DefinitionByReferenceNode`. All `DefinitionByReferenceNode`s are now
`PartialDefNode`s.
There were two problems here.
1. The inline predicates `isInitialized` and `isValueInitialized` on
`ArrayAggregateLiteral` caused their callers to materialize every
`int` that was a valid index into the array. This was slow on huge
value-initialized arrays.
2. The `isInitialized` predicate was used in the `TInstructionTag` IPA
type, creating a numbered tuple for each integer in it. This seemed
to be entirely unnecessary since the `TranslatedElement`s using those
tags were already indexed appropriately.
Mostly just adding backticks in QLDoc comments. I'm trying out the edit-in-github workflow @jbj showed me, which seems like it will be a quicker way to do minor changes like these.
The predicate `maxSplits()` was previously applied dynamically to ensure that
any control flow node would keep track of at most `maxSplits()` number of splits.
However, there was no guarantee that two different copies of the same AST element
wouldn't contain different splits, so in general the number of copies for a given
AST element `e` could be on the order `$\binom{n}{k}c^k$`, where `n` is the total
number of splits that apply to `e`, `k = maxSplits()`, and `c` is a constant.
With this change, the relevant splits for `e` are instead computed statically,
meaning that the order is instead `$c^k$`.
With flow labels it often makes more sense to use a `DataFlow::Configuration` rather than a `TaintTracking::Configuration`, so flow summaries should support both.
My original fix in https://github.com/Semmle/ql/pull/1661 fixed my minimal test case, but did not fix the original failure in a Linux snapshot. The real fix is to simply not create a `TranslatedDeclarationEntry` for an extern declaration, and have `TranslatedDeclStmt` skip any such declarations. I've added a regression test for that case (multiple extern declarations with same location in a macro expansion, with control flow between them). I did verify that it generates correct IR, and that it fixes all of the "use not dominated by definition" failures in Linux.
The underlying extractor bug, that caused the above issue also caused PrintAST to print garbage. I've worked around the bug in PrintAST.qll.
I've also fixed a bug in the control flow for `try`/`catch`, where there was missing flow from the `CatchByType` of the last handler of a `try` to the enclosing handler (or `Unwind`). Hat tip to @AndreiDiaconu1 for spotting this bug.
Previously, where we had a function-scoped `DeclarationEntry` for an extern variable or function, we would generate a `NoOp` instruction for it. There's nothing wrong with this by itself, although it was unnecessary. However, I've hit an extractor issue (Jira ticket already opened) that commonly causes multiple `DeclStmt`s to share a single `DeclarationEntry` child on extern declarations, so removing the `NoOp` instructions is an easy way to work around the extractor issue.
We informally deprecated them in 1.21, this commit deprecates them properly and removes support from the implementation. The predicates themselves will be removed in a future release.
Rather than have IR.qll (which depends on the flavor) import EdgeKind.qll (which does not) with an non-relative import, I've moved these imports into internal.IRImports relative to IR.qll. These imports files can be shared across flavors within one language, but are different between C# and C++ due to the difference in paths.
This is still quadratic in the number of MemoryLocations for a vvar, but
only for a single pipeline step, which is not materialized. It seems to be
fast enough in practice for the IR.
Now that `PhiInstruction.getAnInput` only has results for congruent
operands, a previous optimization I made to `getConstantValue` is no
longer sound. We have to check that all phi inputs give the same value,
not just the congruent ones. After this change, if there are any
non-congruent operands on a phi instruction, the whole aggregate will
have no result.
In the `SignAnalysis` abstract interpretation, "unknown sign"
corresponds to the set of _all_ `Sign`, but using `getDef` leads to the
operand having _no_ `Sign`. To fix that, we assign all signs to inexact
operands.
The issue is now fixed in the extractor, and I've confirmed that the
workaround is no longer needed for g/an-tao/drogon.
This reverts commit 48a3385809.
This query seems to have been de-optimized by recent optimizer or stats
changes. On libretro/libretro-uae, the query took 1 second on a warm
cache with dist 89ad5f1 but took 9979 seconds with dist a3b9b6eb9.
The slowness was due to a Cartesian product in
`illDefined{Decr,Incr}ForStmt` between all the definitions and all the
uses of `Variable v`. This would be no problem with the right join
order, but that has apparently been lost. This commit factors out a pair
of `pragma[noinline]` helper predicates to make sure the definitions
(`v.getAnAssignedValue()`) and the uses (`v.getAnAccess()`) are queried
and filtered in separate predicates.
The performance problem can be seen in the tuple counts of this pipeline
I interrupted during evaluation of
`inconsistentLoopDirection::illDefinedDecrForStmt#ffff#shared`:
89716 ~3% {2} r1 = SCAN Variable::Variable::getAnAssignedValue_dispred#ff OUTPUT FIELDS {Variable::Variable::getAnAssignedValue_dispred#ff.<1>,Variable::Variable::getAnAssignedValue_dispred#ff.<0>}
89716 ~0% {3} r2 = JOIN r1 WITH DataFlowUtil::TExprNode#ff@staged_ext ON r1.<0>=DataFlowUtil::TExprNode#ff@staged_ext.<0> OUTPUT FIELDS {r1.<1>,DataFlowUtil::TExprNode#ff@staged_ext.<0>,DataFlowUtil::TExprNode#ff@staged_ext.<1>}
502539405 ~0% {4} r3 = JOIN r2 WITH Variable::Variable::getAnAccess_dispred#fb ON r2.<0>=Variable::Variable::getAnAccess_dispred#fb.<0> OUTPUT FIELDS {Variable::Variable::getAnAccess_dispred#fb.<1>,r2.<1>,r2.<2>,r2.<0>}
return r3
The last commit introduced calls to two predicates that did not exist. I
created `Instruction.getResultAddress` so it now exists and changed the
other call back to use the predicate that does exist.
This query used `getASuccessor()` on the CFG, which worked in many cases
but became quadratic on certain projects including PostgreSQL and
MySQL. The problem was that there was just enough context for magic to
apply to the transitive closure, but the use of magic meant that the
fast transitive closure algorithm wasn't used. In projects where the
magic had little effect, that led to the
`#ControlFlowGraph::ControlFlowNode::getASuccessor_dispred#bfPlus`
predicate taking quadratic time and space.
This commit changes the query to use basic blocks to find successors,
which is much faster because (1) there are many more `ControlFlowNode`s
than `BasicBlocks`, and (2) the optimizer does not apply magic but uses
fast transitive closure instead.
Behavior changes slightly in the `isUsedInCorrectLeapYearCheck` case: we
now accept a `yfacheck` that comes _before_ `yfa` if they are in the
same basic block. I don't think that matters in practice.
Without this `pragma[noopt]`, `isInCycle` gets compiled into RA that
unpacks every tuple of the fast TC:
0 ~0% {2} r1 = SELECT #Operand::getNonPhiOperandDef#3#ffPlus ON FIELDS #Operand::getNonPhiOperandDef#3#ffPlus.<0>=#Operand::getNonPhiOperandDef#3#ffPlus.<1>
0 ~0% {1} r2 = SCAN r1 OUTPUT FIELDS {r1.<0>}
return r2
With this change, it just becomes one lookup in the fast TC data
structure per instruction.
Just like `TInstruction` is cached to prevent re-numbering its tuples in
every IR query, I think `TOperand` should be cached too. I tested it on
the small comdb2 snapshot, where it only saves one second of work when
running a second IR query, but the savings should grow when snapshots
are larger and when there are more IR queries in a suite. Tuple
numbering is mildly quadratic, so it should be good to avoid repeating
it.
Adding these annotations adds three cached stages to the existing four
cached stages of the IR. The new cached stages are small and do not
appear to repeat any work from the other stages, so I see no advantage
to merging them with the existing stages.
The previous version of the test used `0 = 1;` to test an lvalue-typed
`ErrorExpr`, but the extractor replaced the whole assignment expression
with `ErrorExpr` instead of just the LHS. This variation of the test
only leads to an `ErrorExpr` for the part of the syntax that's supposed
to be an lvalue-typed expression, so that's an improvement.
Unfortunately it still doesn't demonstrate that we can `Store` into an
address computed by an `ErrorExpr`.
This doesn't fix the underlying problem that for some reason there are
cycles in the operand graph on our snapshots of the Linux kernel, but it
ensures that the cycles don't lead to non-termination of
`ConstantAnalysis` and `ValueNumbering`.
The `ValueNumbering` library is supposed to propagate value numberings
through a `CopyInstruction` only when it's _congruent_, meaning it must
have exact overlap with its source. A `CopyInstruction` can be a
`LoadInstruction`, a `StoreInstruction`, or a `CopyValueInstruction`.
The latter is also a `UnaryInstruction`, and the value numbering rule
for `UnaryInstruction` applied to it as well.
This meant that value numbering would propagate even through a
non-congruent `CopyValueInstruction`. That's semantically wrong but
probably only an issue in very rare circumstances, and it should get
corrected when we change the definition of `getUnary` to require
congruence.
What's worse is the performance implications. It meant that the value
numbering IPA witness could take two different paths through every
`CopyValueInstruction`. If multiple `CopyValueInstruction`s were
chained, this would lead to an exponential number of variable numbers
for the same `Instruction`, and we would run out of time and space
while performing value numbering.
This fixes the performance of `ValueNumbering.qll` on
https://github.com/asterisk/asterisk, although this project might also
require a separate change for fixing an infinite loop in the IR constant
analysis.
Before this change, `delete` and `delete[]` expressions had no control
flow after them, which caused the reachability analysis to remove all
code after a delete expression. This commit adds placeholder support for
delete expression by translating them to `NoOp` instructions so their
presence doesn't cause large chunks of the program to be removed.
We were previously missing a data-flow edge from reflected calls to the corresponding reflective call, that is, for `f.call(...)` we didn't have a flow edge from the implicit call to `f` to the result of `f.call(...)`.
This changes all the getters on `Instruction` to use `getDef` instead of
`getAnyDef`, with the result that these getters now only have a result
if the definition is exact.
This is a backwards-INCOMPATIBLE change.
These are the hand-written changes that complete the automatic changes
from the previous commit.
- Add deprecated compatibility wrappers for the renamed predicates.
- Add a new `Operand.getDef` predicate.
- Clarify the QLDoc for all these predicates.
This renames `getDefinitionInstruction` to `getAnyDef`, reflecting that
it includes definitions without exact overlap. It renames
`getUseInstruction` to `getUse` for consistency.
perl -p -i -e 's/\bgetUseInstruction\b/getUse/g; s/\bgetDefinitionInstruction\b/getAnyDef/g' \
cpp/ql/src/semmle/code/cpp/ir/**/*.ql* \
cpp/ql/test/**/*.ql* \
cpp/ql/src/semmle/code/cpp/rangeanalysis/**/*.ql*
The predicates `getter` and `setter` in `StructLikeClass.qll` were very
slow on some snapshots. On https://github.com/dotnet/coreclr they had
this performance:
StructLikeClass::getter#fff#antijoin_rhs ........... 3m55s
Variable::Variable::getAnAssignedValue_dispred#bb .. 3m36s
StructLikeClass::setter#fff#antijoin_rhs ........... 20.5s
The `getAnAssignedValue_dispred` predicate in the middle was slow due to
magic propagated from `setter`.
With this commit, performance is instead:
StructLikeClass::getter#fff#antijoin_rhs ........... 497ms
Variable::Variable::getAnAssignedValue_dispred#ff .. 617ms
StructLikeClass::setter#fff#antijoin_rhs ........... 158ms
Instead of hand-optimizing the QL for performance, I simplified `setter`
and `getter` to require slightly stronger conditions. Previously, a
function was only considered a setter if it had no writes to other
fields on the same class. That requirement is now relaxed by dropping
the "on the same class" part. I made the corresponding change for what
defines a getter. I think that still captures the spirit of what getters
and setters are.
I also changed the double-negation with `exists` into a `forall`.
The `sameBaseType` predicate was fundamentally quadratic, and this blew
up on large C++ code bases. Replacing it with calls to `Type.stripType`
fixes performance and does not affect the qltests. It looks like
`sameBaseType` was used purely an ad hoc heuristic, so I'm not worried
about the slight semantic difference between `sameBaseType` and
`stripType`.
In particular, `await`, `yield` and dynamic `import` expressions are now source nodes, as well as a few other experimental and legacy language features involving non-local flow.
The constraint `exists(callback.getParameter(i))` was getting pushed into `higherOrderCall`, which isn't a bad thing to do. However, this then led to a join on `i`, which is a very bad thing to do.
When completions are inherited by elements inside `finally` blocks, we previously
threw away the underlying completion. For example, in
```
try
{
if (b)
throw new Exception();
}
finally
{
if (b)
...
}
```
the completions for `b` inside the `finally` block are `true` and `throw(Exception)`,
where the latter is inherited from the `try` block, with an underlying `false`
completion. Throwing away the `false` completion meant that we were unable to prune
the `false` edge (Boolean CFG splitting).
- Make `InstructionViolation` abstract to avoid computing `getInstructionsUpTo()`
for all instructions in the database.
- Enable `consistency.ql`, which reports all consistency violations, and remove
all other specialized tests.
This is a workaround for an extractor issue where expressions in a
defaulted function are not always marked as generated. I haven't yet been
able to reproduce the issue in a test case.
To speed up the taint analysis in `NonConstantFormat.ql` and to remove
FPs that were due to taint spreading from `i` to `a[i]`, this commit
stops the taint tracking in `NonConstantFormat.ql` at every node that
could not possibly contain a string.
I tested performance on Wireshark, and it's fine. Pulling out the
`isSanitizerNode` prevented `isSanitizer` from turning into four
half-slow RA predicates due to both CPE and `#antijoin_rhs`
transformations happening.
- Split up the `last` predicate into a non-recursive part `lastNonRec` and a recursive
part `last`.
- Almost all syntactic constructs have a very simple `last` definition; a set of
disjuncts with exactly one recursive call -- those are defined in `lastNonRec`.
- `try` statements and (last) `catch` clauses require multiple recursive calls in
the same disjunct, and are therefore handled in the `last` predicate (as before).
- The benefit is that we only need to take care of the join order in the recursive
call (for non-`try`/`catch` statements) in one place (the predicate `lastRec`),
so we can get rid of many `nomagic`'ed `last`-specialisations.
- Add `Caching.qll` for controlling caching across multiple files.
- Move `isUncertainRefCall()` out of cached module in `Assignable.qll` to avoid
collapsing with CFG stage.
- Remove dependency on `AlwaysNullExpr` in `NullValue::getAnExpr()` to avoid
collapsing with CFG stage.
- Avoid caching pre-SSA library as it should only be used during the CFG construction
stage.
It's sometimes faster but sometimes up to 2x slower to use plain
recursion here. On the other hand, plain recursion won't run out of Java
heap space, and it won't make unrelated computation slower by forcing
all RAM data out to disk.
The use of transitive closure for BB index calculation has been the
cause of an out-of-memory error. This commit switches the calculation to
use the `shortestDistances` HOP, which still has the problem that the
result needs to fit in RAM, but at least the RAM requirements are sure
to be linear in the size of the result. The `shortestDistances` HOP is
already used for BB index calculation for the C++ IR and for C#.
We could guard even better against OOM by switching the calculation to
use manual recursion, but that would undo the much-needed performance
improvements we got from #123.
This change improves performance on Wireshark, which is notorious for
having long basic blocks. When I benchmarked `shortestDistances`
for #123, it was slower than TC. With the current evaluator, it looks
like `shortestDistances` is faster. Performance before was:
PrimitiveBasicBlocks::Cached::getMemberIndex#ff ................... 9.7s (executed 8027 times)
#PrimitiveBasicBlocks::Cached::member_step#ffPlus ................. 6.6s
PrimitiveBasicBlocks::Cached::primitive_basic_block_entry_node#f .. 3.5s
PrimitiveBasicBlocks::Cached::primitive_basic_block_member#fff .... 2.3s
Performance with this commit is:
PrimitiveBasicBlocks::Cached::primitive_basic_block_entry_node#f ................................................................... 3.5s
shortestDistances@PrimitiveBasicBlocks::Cached::primitive_basic_block_entry_node#1@PrimitiveBasicBlocks::Cached::member_step#2#fff . 3s
PrimitiveBasicBlocks::Cached::primitive_basic_block_member#fff ..................................................................... 963ms
This query is not producing good enough results to justify `@precision
high`. It's fundamentally looking for a pattern that should correlate
with memory management errors, but it doesn't look for the errors
themselves.
This query is supposed to look for constructors that unintentionally qualify as copy constructors due to default arguments. There are quite a few real-world projects that define such constructors intentionally. I've reduced the severity to "warning" and the precision to "low" due to the high false positive rate.
This works around BDD node exhaustion we get due to the complex type
hierarchy caused by importing many configurations at once. I've also
renamed the library accordingly.
Querying for overlap type wasn't possible when this library was first
written. This change fixes FPs in `RedundantNullCheckSimple.ql` on
Wireshark and other real-world projects.
IR construction was missing support for C++ 11 range-based `for` loops. The extractor generates ASTs for the compiler-generated implementation already, so I had enough information to generate IR. I've expanded on some of the predicates in `RangeBasedForStmt` to access the desugared information.
One complication was that the `DeclStmt`s for the compiler-generated variables seem to have results for `getDeclaration()` but not for `getDeclarationEntry()`. This required handling these slightly differently than we do for other `DeclStmt`s.
The flow for range-based `for` is actually easier than for a regular `for`, because all three components (init, condition, and update) are always present.
It turns out we didn't have to move the `getName` implementation into
the mirror classes in `QualifiedName`. Doing so only made it harder for
the optimiser to specialize calls to `getName` on various kinds of
`Declaration`.
In particular, the autobuilder will no longer succeed for projects that
contain HTML or YAML files but no JS/TS code. Further down the line,
this prevents LGTM.com from classifying such projects as "JavaScript"
projects.
For methods compiled without optimization (and possibly also with optimization),
it is possible for a variable update to have multiple possible assigned values.
For example, the non-optimized CIL for
```
return cond ? null : "not null"
```
is
```
0: nop
1: ldarg.0
2: ldfld cond
3: brtrue.s 6:
4: ldstr "not null"
5: br.s 7:
6: ldnull
7: stloc.0 L0 // stores either `null` or "not null"
8: br.s 9:
9: ldloc.0
10: ret
```
Consequently, an existential in `CallableReturns.qll` must be a `forex`.
- Extract names of properties in a propery match, using the `exprorstmt_name` relation.
- Simplify extraction of properties by not distinguishing between top-level patterns
and nested patterns.
- Introduce `PatternExpr` to capture patterns in `is` expressions, `case` statements,
and `switch` expression arms.
- Generalize `IsTypeExpr`, `IsPatternExpr`, `IsRecursivePatternExpr`, and `IsConstantExpr`
to just `IsExpr` with a member predicate `PatternExpr getPattern()`.
- Generalize `TypeCase`, `RecursivePatternCase`, and `ConstCase` to just `CaseStmt` with
a member predicate `PatternExpr getPattern()`.
- Introduce classes `Switch` and `Case` as base classes of switch statements/expressions
and case statements/switch expression arms, respectively.
- Simplify CFG logic using the generalized classes.
- Generalize guards library to cover `switch` expressions tests.
- Generalize data flow library to cover `switch` expression assignments.
This gains some performance by not computing locations for all
expressions since we are only interested in calls and variable accesses.
The `Top::hasLocationInfo` predicate goes from 2m28s to 1m32s on
Chromium.
This predicate was slow, mostly because it's just very large. A manual
join order cuts the run time on Chromium from
definitions::Top::hasLocationInfo_dispred#ffffff ..................... 3m23s
definitions::MacroAccessWithHasLocationInfo::hasLocationInfo#ffffff .. 1m56s
to
definitions::Top::hasLocationInfo#ffffff .... 2m28s
The main slowdown was the two uses of `SCAN` to reorder columns in the
RA.
In its present form, `getAnUndefinedReturn` does not handle `finally`
blocks correctly. For example, in this snippet
```
try {
return 42;
} finally {
cleanup();
}
```
the call to `cleanup` is erroneously considered an undefined return.
We currently don't use the predicate anywhere, so it seems best to back
it out for the time being.
The `reachable` predicate is large and slow to compute. It's part of a
mutual recursion that's non-linear, meaning it has a recursive call on
both sides of an `and`.
This change removes a part of the base case that has no effect on
recursive cases. The removed part is added back after the recursion has
finished.
Before, on Wireshark:
ControlFlowGraph::Cached::reachable#f .......... 20.8s (executed 9800 times)
ConstantExprs::successors_adapted#ff ........... 4.2s (executed 615 times)
ConstantExprs::potentiallyReturningFunction#f .. 3.9s (executed 9799 times)
ConstantExprs::possiblePredecessor#f ........... 2.9s (executed 788 times)
After, on Wireshark:
ConstantExprs::reachableRecursive#f ............ 13.2s (executed 9800 times)
ConstantExprs::successors_adapted#ff ........... 4.2s (executed 615 times)
ConstantExprs::potentiallyReturningFunction#f .. 4.3s (executed 9799 times)
ConstantExprs::possiblePredecessor#f ........... 2.6s (executed 788 times)
I've verified that this change doesn't change what's computed by
checking that the output of the following query is unchanged:
import cpp
import semmle.code.cpp.controlflow.internal.ConstantExprs
select
strictcount(ControlFlowNode n | reachable(n)) as reachable,
strictcount(ControlFlowNode n1, ControlFlowNode n2 | n2 = n1.getASuccessor()) as edges,
strictcount(FunctionCall c | aborting(c)) as abortingCall,
strictcount(Function f | abortingFunction(f)) as abortingFunction
This change only moves code around -- there are no changes to predicate
bodies or signatures.
The predicates that go in `ConstantExprs.Cached` after this change were
already cached in the same stage or, in the case of the `aborting*`
predicates, did not need to be cached. This is a fortunate consequence
of how the mutual recursion between the predicates happens to work, and
it's not going to be the case after the next commit.
The `addressConstantVariable` predicate was the slowest single predicate
when running the full LGTM suite on Chromium. Fortunately it's only
executed once, but it could be easily made faster by using the new
`Variable.isConstexpr` predicate instead of the slow workaround that was
in its place.
- General refactoring to fit with the shared data flow implementation.
- Move CFG splitting logic into `ControlFlowReachability.qll`.
- Replace `isAdditionalFlowStepIntoCall()` with `TaintedParameterNode`.
- Redefine `ReturnNode` to be the actual values that are returned, which should
yield better path information.
- No longer consider overrides in CIL calls.
Since the types in `QualifiedName.qll` are raw db types, callers need to
use `underlyingElement` and `unresolveElement` as appropriate. This has
no effect right now but will be needed when we switch the AST type
hierarchy to `newtype`s.
This removes all uses of `Declaration.getQualifiedName` that I think can
be removed without changing any behaviour. The following uses in the
LGTM default suite remain:
* `cpp/ql/src/Security/CWE/CWE-121/UnterminatedVarargsCall.ql` (in `select`).
* `cpp/ql/src/semmle/code/cpp/dataflow/internal/DataFlowDispatch.qll` (needs template args).
* `cpp/ql/src/semmle/code/cpp/security/FunctionWithWrappers.qll` (used for alert messages).
This imports `QualifiedName.qll` from
2f74a456290b9e0850b7308582e07f5d68de3a36 and makes minimal changes so it
compiles.
Original author: Ian Lynagh <ian@semmle.com>
I kept forgetting which operand on a Chi instruction was which, so I added dump labels. I added labels for the function target of a `Call`, for positional arguments, and for address operands as well.
Using `variableAccessedAsValue` fixes a FP because we can now
distinguish modifications to the parameter from modifications to data
_reachable from_ the parameter.
Now that the query has both tests and qhelp, we can use it on LGTM. This
commit also adds a change note.
I renamed the query to reduce confusion from the lower-case unquoted
word "alloca".
The `IRBlock::backEdgeSuccessor` predicate, in its three copies, had
become slow:
6:IRBlock::Cached::backEdgeSuccessor#fff ...... 1m1s
7:IRBlock::Cached::backEdgeSuccessor#2#fff .... 52.3s
8:IRBlock::Cached::backEdgeSuccessor#3#fff .... 26.4s
The slow part was finding all the nodes involved in cycles in the
`forwardEdgeRaw` graph. This was done with `forwardEdgeRaw+(pred, pred)`,
but that got compiled into a materialization of `forwardEdgeRaw+`, which
is a huge relation with 1,816,752,107 rows on Wireshark:
(1474s) Starting to evaluate predicate IRBlock::Cached::backEdgeSuccessor#3#fff
(1501s) Tuple counts:
0 ~0% {2} r1 = SELECT #IRBlock::Cached::forwardEdgeRaw#3#ffPlus ON FIELDS #IRBlock::Cached::forwardEdgeRaw#3#ffPlus.<0>=#IRBlock::Cached::forwardEdgeRaw#3#ffPlus.<1>
0 ~0% {1} r2 = SCAN r1 OUTPUT FIELDS {r1.<0>}
0 ~0% {3} r3 = JOIN r2 WITH IRBlock::Cached::blockSuccessor#6#fff ON r2.<0>=IRBlock::Cached::blockSuccessor#6#fff.<0> OUTPUT FIELDS {r2.<0>,IRBlock::Cached::blockSuccessor#6#fff.<1>,IRBlock::Cached::blockSuccessor#6#fff.<2>}
12411 ~7% {3} r4 = IRBlock::Cached::backEdgeSuccessorRaw#3#fff \/ r3
return r4
(1501s) >>> Relation IRBlock::Cached::backEdgeSuccessor#3#fff: 12411 rows using 0 MB
The problem is the `SELECT`. It's fast to join on a fastTC result once
we know what we're looking for, so this fix materializes the identity
relation on `IRBlock` and joins with that so the fastTC ends up on the
RHS of a join, where it's fast. I had to introduce a helper predicate
because even with `noopt` I couldn't get `pred = pred2` to come _before_
`forwardEdgeRaw+(pred, pred2)`. The predicate now takes less than a
second to evaluate:
(539s) Starting to evaluate predicate IRBlock::Cached::backEdgeSuccessor#fff
(539s) >>> Relation IRBlock::Cached::blockImmediatelyDominates#ff: 574677 rows using 0 MB
(539s) ... created with 574677 rows and 2 columns.
(539s) Tuple counts:
702445 ~1% {2} r1 = SELECT IRBlock::Cached::blockIdentity#ff ON FIELDS IRBlock::Cached::blockIdentity#ff.<0>=IRBlock::Cached::blockIdentity#ff.<1>
702445 ~1% {2} r2 = SCAN r1 OUTPUT FIELDS {r1.<0>,r1.<0>}
0 ~0% {1} r3 = JOIN r2 WITH #IRBlock::Cached::forwardEdgeRaw#ffPlus ON r2.<0>=#IRBlock::Cached::forwardEdgeRaw#ffPlus.<0> AND r2.<1>=#IRBlock::Cached::forwardEdgeRaw#ffPlus.<1> OUTPUT FIELDS {r2.<0>}
0 ~0% {3} r4 = JOIN r3 WITH IRBlock::Cached::blockSuccessor#2#fff ON r3.<0>=IRBlock::Cached::blockSuccessor#2#fff.<0> OUTPUT FIELDS {r3.<0>,IRBlock::Cached::blockSuccessor#2#fff.<1>,IRBlock::Cached::blockSuccessor#2#fff.<2>}
20487 ~0% {3} r5 = IRBlock::Cached::backEdgeSuccessorRaw#fff \/ r4
return r5
(539s) >>> Relation IRBlock::Cached::backEdgeSuccessor#fff: 20487 rows using 0 MB
Use the iterated dominance frontier algorithm to speed up dominance
frontier calculations. The implementation is copied from d310338c9b.
Before this change, the SSA calculations for unaliased and aliased SSA
used 169.9 seconds in total on these predicates:
7:Dominance::getDominanceFrontier#2#ff .. 49s
7:Dominance::blockDominates#2#ff ........ 47.5s
8:Dominance::getDominanceFrontier#ff .... 44.4s
8:Dominance::blockDominates#ff .......... 29s
After this change, the above predicates are replaced by two copies of
`getDominanceFrontier`, each of which takes less than a second.
This fixes `PointlessComparison.ql` on https://github.com/an-tao/drogon.
The QL is a bit obfuscated because it looks for a pattern that's
impossible according to the dbscheme. There is no accompanying test
because we haven't been able to boil this problem down to a simple test
case. If we could, we'd fix it directly in the extractor instead.
In addition to treating comparisons with literals as sanitisers, we now
also treat comparisons with variables that have a single assignment as
sanitisers.
Proving that such a variable is actually a constant is not easy, but for
this use case a simple approximation works fine.
This test case exposes two bugs in our data flow library (fixed by the
two previous commits):
- the charpreds of `SourcePathNode` and `SinkPathNode` only ensured
that they were on a path from a source to a sink, not that they
actually were the source/sink themselves;
- function summarization would allow for non-level paths; in the
test case, this meant that one of the summaries for `source`
represented the path returning from `source` on line 13 and then
flowing back into the call on line 15, in the process transforming
the parity of the flow label and hence causing a spurious flow.
Their charpreds previously only ensured that they were on a path from a
source to a sink, not that they actually were the source and sink,
respectively. See two commits further for a test case.
It now also has `step` and `smallstep` predicates. In the usual case,
however, I think I prefer the `SourceNode::track` API, so I left the
recommended style in the qldoc alone (and adjusted the one for
`TypeBackTracker` to match).
We avoid putting a variable into SSA if its address is ever taken in a way that could allow mutation of the variable via indirection. We currently just look to see if the address is either "pointer to non-const" or "reference to non-const". However, if the address was cast to an integral type (e.g. `uintptr_t n = (uintptr_t)&x;`), we were treating it as unescaped. This change makes the conservative assumption that casting a pointer to an integer may result in the pointed-to value being modified later.
This fixes a customer-reported false positive (#2 from https://discuss.lgtm.com/t/2-false-positives-in-c-for-comparison-is-always-same/1943)
Requiring strict inclusion between types turned out to cause false
positives in `SnprintfOverflow`, which relied indirectly on
`RangeAnalysisUtils::linearAccessImpl` to identify acceptable bounds
checks. This query was particularly affected because `snprintf` returns
`int` (signed) but takes `size_t` (unsigned), so conversions are bound
to happen.
compilers do. Various integral and floating-point types
are treated as mutually implicitly convertible. Remaining
warnings deal with misuse of pointer and array types.
This makes sure we can conclude from `(int)myShort == 0` that `myShort`
is 0 even though we can no longer conclude from `(short)myInt == 0` that
`myInt` is 0. Without this, we lost a good result in the test for
`InfiniteLoopWithUnsatisfiableExitCondition.ql`.
Reported by an LGTM customer here: https://discuss.lgtm.com/t/2-false-positives-in-c-for-comparison-is-always-same/1943.
Even though the comparison is pointless in the preprocessor configuration in effect during extraction, it is not pointless in other preprocessor configurations. Similar to ExprHasNoEffect, we'll now exclude results in functions that contain preprocessor-excluded code. I factored the similar code already used in ExprHasNoEffect in a non-recursive version into Preprocessor.qll, leaving the recursive version in ExprHasNoEffect.ql. I believe the recursive version is too aggressive for PointerlessComparison, which does no interprocedural analysis.
C promotion rules. The following issues are now flagged:
(1) passing a larger type than the receiver can accept
(e.g., long long -> int)
(2) passing a type of different signedness than the
parameter specified.
The IR construction code wasn't handling lambda expressions, so I added `TranslatedLambdaExpression`. It's pretty straightforward: it creates a temporary variable, initializes it with an `Uninitialized` instruction, then initializes the individual captured fields with the initializer list supplied in the AST.
When testing the case of a lambda with no captures, I noticed that we weren't handling initialization of empty structs with an initializer list correctly, so I fixed that along the way.
I was getting confused by the bad indentation for wrapped lines in
TranslatedInitialization.qll, so I fixed that up in a separate commit.
- Cache predicates in the same stage using a cached module.
- Introduce `DefUse::defUseVariableUpdate()` and use in `CallableReturns.qll`.
The updated file `csharp/ql/test/library-tests/cil/dataflow/Nullness.expected`
demonstrates why this is needed.
- Utilize CIL analysis in `Guards::nonNullValue()`.
- Analyze SSA definitions in `AlwaysNullExpr`, similar to `NonNullExpr`.
The `Expr.getType` predicate returns a pointer type since that's the
type of the `new`-expression as a whole. To find the class type, we use
`NewExpr.getAllocatedType`.
This commit reduces the number of alerts in a Qt snapshot from 229 to
51, and it removes the two false positives in
https://github.com/Subsurface-divelog/subsurface.
The main source of slowness in `BrokenCryptoAlgorithm.ql` was that the
regexp on function (macro) names was evaluated once per call
(invocation) instead of once per name. Factoring out separate predicates
for the problematic functions (macros) fixes this.
On https://github.com/ericniebler/range-v3, this change reduces the run
time of the two slowest predicates from
BrokenCryptoAlgorithm::InsecureMacroSpec#class#f .... 35.1s
BrokenCryptoAlgorithm::InsecureFunctionCall#class#f . 12.8s
to
BrokenCryptoAlgorithm::getAnInsecureFunction#f . 1.2s
BrokenCryptoAlgorithm::getAnInsecureMacro#f .... 12ms
Instead of `algorithmBlacklistRegex` having 2 * 5 results, it now has
only one result, which is a single regex that represents the union of
the previous 2 * 5 regexes. This means that `BrokenCryptoAlgorithm.ql`
has much less regex matching to do.
On https://github.com/ericniebler/range-v3, this change reduces the run
time of the two slowest predicates from
BrokenCryptoAlgorithm::InsecureMacroSpec#class#f .... 2m21s
BrokenCryptoAlgorithm::InsecureFunctionCall#class#f . 54.5s
to
BrokenCryptoAlgorithm::InsecureMacroSpec#class#f .... 35.1s
BrokenCryptoAlgorithm::InsecureFunctionCall#class#f . 12.8s
- Restrict `OverridableCallable::getAnOverrider(ValueOrRefType t)` to types `t`
that are sub types of the callable's declaring type.
- Use explicit recursion in `OverridableCallable::getInherited()`.
Please let em know if this pattern works, and I can add other mechanisms to start new threads with a shared object.
Please also let me know what other mechanisms would you like me to add, I would like to focus on the most commonly used ones first. Thanks
Performance has been improved via suitable predicate folding, and correctness
has been improved as the line
```
result = getType().(RefType).getAnIndexer().getAnAccessor().getACall()
```
was missing a `getABaseType*()` (now using the simpler `hasMember()` predicate
instead).
The `FlowVar::toString` predicate is purely a debugging aid, but
unfortunately it has to be `cached` because it's in a `cached` class.
Before this commit, it caused `Expr::toString` to be evaluated in full.
This implementation is borrowed from Java's QL library and offers a
great performance improvement. For example, on Wireshark the performance
goes from
Dominance::bbDominates#ff ....... 40.3s
SSAUtils::dominanceFrontier#ff .. 30s
to
SSAUtils::dominanceFrontier#ff .. 418ms (executed 67 times)
The big performance problem before was the need to materialize
`bbDominates`, which is the reflexive-transitive "basic block dominates"
relation. It had 79 million rows on Wireshark.
Some of these stubs were quite slow to evaluate. It's possible they
could be optimised, but it seems pointless as long as we don't have
call-context-sensitive virtual dispatch in the C++ library.
These queries are only tenuously security queries, and marking them as
security queries can cause them to have greater prominence than is
merited by the results that they report.
2018-10-03 11:48:27 +01:00
5839 changed files with 425361 additions and 137414 deletions
about: Tell us about an alert that shouldn't be reported
title: LGTM.com - false positive
labels: false-positive
assignees: ''
---
**Description of the false positive**
<!-- Please explain briefly why you think it shouldn't be included. -->
**URL to the alert on the project page on LGTM.com**
<!--
1. Open the project on LGTM.com.
For example, https://lgtm.com/projects/g/pallets/click/.
2. Switch to the `Alerts` tab. For example, https://lgtm.com/projects/g/pallets/click/alerts/.
3. Scroll to the alert that you would like to report.
4. Click on the right most icon `View this alert within the complete file`.
5. A new browser tab opens. Copy and paste the page URL here.
For example, https://lgtm.com/projects/g/pallets/click/snapshot/719fb7d8322b0767cdd1e5903ba3eb3233ba8dd5/files/click/_winconsole.py#xa08d213ab3289f87:1.
* Posting, or threatening to post, people’s personally identifying information (“doxing”).
* Insults, especially those using discriminatory terms or slurs.
* Behavior that could be perceived as sexual attention.
* Advocating for or encouraging any of the above behaviors.
* Advocating for or encouraging any of the above behaviors.
* Understand disagreements: Disagreements, both social and technical, are useful learning opportunities. Seek to understand others’ viewpoints and resolve differences constructively.
This code is not exhaustive or complete. It serves to capture our common understanding of a productive, collaborative environment. We expect the code to be followed in spirit as much as in the letter.
We welcome contributions to our standard library and standard checks, got an idea for a new check, or how to improve an existing query? Then please go ahead an open a Pull Request!
We welcome contributions to our standard library and standard checks. Got an idea for a new check, or how to improve an existing query? Then please go ahead and open a pull request!
Before we accept your pull request, we will require that you have agreed to our Contributor License Agreement, this is not something that you need to do before you submit your pull request, but until you've done so, we will be unable to accept your contribution.
Before we accept your pull request, we require that you have agreed to our Contributor License Agreement, this is not something that you need to do before you submit your pull request, but until you've done so, we will be unable to accept your contribution.
Please read our [QL Style Guide](docs/ql-style-guide.md) for information on how to format QL code in this repository.
## Adding a new query
If you have an idea for a query that you would like to share with other Semmle users, please open a pull request to add it to this repository.
Follow the steps below to help other users understand what your query does, and to ensure that your query is consistent with the other Semmle queries.
1.**Consult the documentation for query writers**
There is lots of useful documentation to help you write queries, ranging from information about query file structure to tutorials for specific target languages. For more information on the documentation available, see [Writing CodeQL queries](https://help.semmle.com/QL/learn-ql/writing-queries/writing-queries.html) on [help.semmle.com](https://help.semmle.com).
2.**Format your code correctly**
All of Semmle's standard queries and libraries are uniformly formatted for clarity and consistency, so we strongly recommend that all contributions follow the same formatting guidelines. If you use QL for Eclipse, you can auto-format your query in the [QL editor](https://help.semmle.com/ql-for-eclipse/Content/WebHelp/ql-editor.html). For more information, see the [CodeQL style guide](https://github.com/Semmle/ql/blob/master/docs/ql-style-guide.md).
3.**Make sure your query has the correct metadata**
Query metadata is used by Semmle's analysis to identify your query and make sure the query results are displayed properly.
The most important metadata to include are the `@name`, `@description`, and the `@kind`.
Other metadata properties (`@precision`, `@severity`, and `@tags`) are usually added after the query has been reviewed by Semmle staff.
For more information on writing query metadata, see the [Query metadata style guide](https://github.com/Semmle/ql/blob/master/docs/query-metadata-style-guide.md).
4.**Make sure the `select` statement is compatible with the query type**
The `select` statement of your query must be compatible with the query type (determined by the `@kind` metadata property) for alert or path results to be displayed correctly in LGTM and QL for Eclipse.
For more information on `select` statement format, see [Introduction to query files](https://help.semmle.com/QL/learn-ql/writing-queries/introduction-to-queries.html#select-clause) on help.semmle.com.
5.**Save your query in a `.ql` file in the correct language directory in this repository**
There are five language-specific directories in this repository:
* C/C++: `ql/cpp/ql/src`
* C#:`ql/csharp/ql/src`
* Java: `ql/java/ql/src`
* JavaScript: `ql/javascript/ql/src`
* Python: `ql/python/ql/src`
Each language-specific directory contains further subdirectories that group queries based on their `@tags` properties or purpose. Select the appropriate subdirectory for your new query, or create a new one if necessary.
6.**Write a query help file**
Query help files explain the purpose of your query to other users. Write your query help in a `.qhelp` file and save it in the same directory as your new query.
For more information on writing query help, see the [Query help style guide](https://github.com/Semmle/ql/blob/master/docs/query-help-style-guide.md).
## Using your personal data
@@ -14,7 +54,7 @@ repositories, which might be made public. We might also use this information
to contact you in relation to your contributions, as well as in the
normal course of software development. We also store records of your
CLA agreements. Under GDPR legislation, we do this
on the basis of our legitimate interest in creating the QL product.
on the basis of our legitimate interest in creating the CodeQL product.
Please do get in touch (privacy@semmle.com) if you have any questions about
This open source repository contains the standard QL libraries and queries that power [LGTM](https://lgtm.com), and the other products that [Semmle](https://semmle.com) makes available to its customers worldwide.
This open source repository contains the standard CodeQL libraries and queries that power [LGTM](https://lgtm.com), and the other products that [Semmle](https://semmle.com) makes available to its customers worldwide.
## How do I learn QL and run queries?
## How do I learn CodeQL and run queries?
There is [extensive documentation](https://help.semmle.com/QL/learn-ql/) on getting started with writing QL.
You can use the [interactive query console](https://lgtm.com/help/lgtm/using-query-console) on LGTM.com or the [QL for Eclipse](https://lgtm.com/help/lgtm/running-queries-ide) plugin to try out your queries on any open-source project that's currently being analyzed.
There is [extensive documentation](https://help.semmle.com/QL/learn-ql/) on getting started with writing CodeQL.
You can use the [interactive query console](https://lgtm.com/help/lgtm/using-query-console) on LGTM.com or the [CodeQL for Visual Studio Code](https://help.semmle.com/codeql/codeql-for-vscode.html) extension to try out your queries on any opensource project that's currently being analyzed.
## Contributing
We welcome contributions to our standard library and standard checks. Do you have an idea for a new check, or how to improve an existing query? Then please go ahead and open a pull request! Before you do, though, please take the time to read our [contributing guidelines](CONTRIBUTING.md) and [QL style guide](docs/ql-style-guide.md).
We welcome contributions to our standard library and standard checks. Do you have an idea for a new check, or how to improve an existing query? Then please go ahead and open a pull request! Before you do, though, please take the time to read our [contributing guidelines](CONTRIBUTING.md). You can also consult our [style guides](https://github.com/Semmle/ql/tree/master/docs) to learn how to format your code for consistency and clarity, how to write query metadata, and how to write query help documentation for your query.
## License
The QL queries in this repository are licensed under [Apache License 2.0](LICENSE) by [Semmle](https://semmle.com).
The code in this repository is licensed under [Apache License 2.0](LICENSE) by [Semmle](https://semmle.com).
| Call to alloca in a loop (`cpp/alloca-in-loop`) | reliability, correctness, external/cwe/cwe-770 | Finds calls to `alloca` in loops, which can lead to stack overflow if the number of iterations is large. Newly displayed [on LGTM](https://lgtm.com/rules/1508831665988/). |
| Call to function with fewer arguments than declared parameters (`cpp/too-few-arguments`) | correctness, maintainability, security | Finds all cases where the number of arguments is fewer than the number of parameters of the function, provided the function is also properly declared/defined elsewhere. Results are displayed by default [on LGTM](https://lgtm.com/rules/1508860726279/). |
| Call to a function with one or more incompatible arguments (`cpp/mistyped-function-arguments`) | correctness, maintainability | Finds all cases where the types of arguments do not match the types of parameters of the function, provided the function is also properly declared/defined elsewhere. Results are not displayed by default [on LGTM](https://lgtm.com/rules/1508849286093/). |
| Use of dangerous function (`cpp/dangerous-function-overflow`) | reliability, security, external/cwe/cwe-242 | Finds calls to `gets`, which does not guard against buffer overflow. These results were previously detected by the `cpp/potentially-dangerous-function` query. Results for both queries are displayed by default on LGTM. |
| Buffer not sufficient for string (`cpp/overflow-calculated`) | Fewer results | This query no longer reports results that would be found by the 'No space for zero terminator' (`cpp/no-space-for-terminator`) query. |
| Call to function with extraneous arguments (`cpp/futile-params`) | Improved coverage | Query has been generalized to find all cases where the number of arguments exceeds the number of parameters of the function, provided the function is also properly declared/defined elsewhere. |
| Commented-out code (`cpp/commented-out-code`) | More correct results | Commented-out preprocessor code is now detected by this query. |
| Comparison result is always the same | Fewer false positive results | The range analysis library is now more conservative about floating point values being possibly `NaN`. |
| Constructor with default arguments will be used as a copy constructor (`cpp/constructor-used-as-copy-constructor`) | Lowered severity and precision | The severity and precision of this query have been reduced to "warning" and "low", respectively. This coding pattern is used intentionally and safely in a number of real-world projects. Results are no longer displayed on LGTM unless you choose to display them. |
| Dead code due to goto or break statement (`cpp/dead-code-goto`) | Fewer false positive results | Functions containing preprocessor logic are now excluded from this analysis. |
| Memory is never freed (`cpp/memory-never-freed`) | More correct results | Support added for more Microsoft-specific allocation functions, including `LocalAlloc`, `GlobalAlloc`, `HeapAlloc` and `CoTaskMemAlloc`. |
| Memory may not be freed (`cpp/memory-may-not-be-freed`) | More correct results | Support added for more Microsoft-specific allocation functions, including `LocalAlloc`, `GlobalAlloc`, `HeapAlloc` and `CoTaskMemAlloc`. |
| Mismatching new/free or malloc/delete (`cpp/new-free-mismatch`) | Fewer false positive results | Fixed an issue where functions were being identified as allocation functions inappropriately. This correction also affects `cpp/new-array-delete-mismatch` and `cpp/new-delete-array-mismatch`. |
| No space for zero terminator (`cpp/no-space-for-terminator`) | More correct results | This query now detects calls to `std::malloc`. |
| Overflow in uncontrolled allocation size (`cpp/uncontrolled-allocation-size`) | More correct results | This query has been reworked so that it can find a wider variety of results. |
| Resource not released in destructor (`cpp/resource-not-released-in-destructor`) | Fewer false positive results | Resource allocation and deallocation functions are now determined more accurately. |
| Use of potentially dangerous function | More correct results | Calls to `localtime`, `ctime` and `asctime` are now detected by this query. |
| Wrong type of arguments to formatting function (`cpp/wrong-type-format-argument`) | More correct results and fewer false positive results | This query now understands non-standard uses of `%L`. In addition, it more accurately identifies wide and non-wide string/character format arguments on different platforms. |
| Use of potentially dangerous function (`cpp/potentially-dangerous-function`) | Fewer results | Results relating to the standard library `gets` function have been moved into a new query (`cpp/dangerous-function-overflow`). |
## Changes to QL libraries
- The predicate `Declaration.hasGlobalName` now only holds for declarations that are not nested in a class. For example, it no longer holds for a member function `MyClass::myFunction` or a constructor `MyClass::MyClass`, whereas previously it would classify those two declarations as global names.
- In class `Declaration`, predicates `getQualifiedName/0` and `hasQualifiedName/1` are no longer recommended for finding functions by name. Instead, use `hasGlobalName/1` and the new `hasQualifiedName/2` and `hasQualifiedName/3` predicates. This improves performance and identifies names involving templates and inline namespaces more reliably.
- Additional support for definition by reference has been added to the `semmle.code.cpp.dataflow.TaintTracking` library, including:
- Taint-specific edges for functions modeled in `semmle.code.cpp.models.interfaces.DataFlow`.
- Flow through library functions that are modeled in `semmle.code.cpp.models.interfaces.Taint`. Queries can add subclasses of `TaintFunction` to specify additional flow.
- There is a new `FoldExpr` class, representing C++17 fold expressions.
- The member predicates `DeclarationEntry.getUnspecifiedType`, `Expr.getUnspecifiedType`, and `Variable.getUnspecifiedType` have been added. These should be preferred over the existing `getUnderlyingType` predicates.
C# analysis now supports the extraction and analysis of many C# 8 features. For details see [Changes to code extraction](#changes-to-code-extraction) and [Changes to QL libraries](#changes-to-ql-libraries) below.
| Thread-unsafe capturing of an ICryptoTransform object (`cs/thread-unsafe-icryptotransform-captured-in-lambda`) | concurrency, security, external/cwe/cwe-362 | Highlights instances of classes where a field of type `System.Security.Cryptography.ICryptoTransform` is captured by a lambda, and appears to be used in a thread initialization method. Results are not shown on [LGTM](https://lgtm.com/rules/1508141845995/) by default. |
| Constant condition (`cs/constant-condition`) | Fewer false positive results | The query now ignores code where the `null` value is in a conditional expression on the left hand side of a null-coalescing expression. For example, in `(a ? b : null) ?? c`, `null` is not considered to be a constant condition. |
| Thread-unsafe use of a static ICryptoTransform field (`cs/thread-unsafe-icryptotransform-field-in-class`) | Fewer false positive results | The criteria for a result has changed to include nested properties, nested fields, and collections. The format of the alert message has changed to highlight the static field. The query name has been updated. |
| Useless upcast (`cs/useless-upcast`) | Fewer false positive results | The query now ignores code where the upcast is used to disambiguate the target of a constructor call. |
## Changes to code extraction
* The following C# 8 features are now extracted:
- Range expressions
- Recursive patterns
- Using declaration statements
-`static` modifiers on local functions
- Null-coalescing assignment expressions
* The `unmanaged` type parameter constraint is also now extracted.
## Changes to QL libraries
* The class `Attribute` has two new predicates: `getConstructorArgument()` and `getNamedArgument()`. The first predicate returns arguments to the underlying constructor call and the second returns named arguments for initializing fields and properties.
* The class `TypeParameterConstraints` has a new predicate `hasUnmanagedTypeConstraint()`. This shows whether the type parameter has the `unmanaged` constraint.
* The following QL classes have been added to model C# 8 features:
- Class `AssignCoalesceExpr` models null-coalescing assignment, for example `x ??= y`
- Class `IndexExpr` models from-end index expressions, for example `^1`
- Class `PatternExpr` is an `Expr` that appears in a pattern. It has the new subclasses `DiscardPatternExpr`, `LabeledPatternExpr`, `RecursivePatternExpr`, `TypeAccessPatternExpr`, `TypePatternExpr`, and `VariablePatternExpr`.
- Class `PatternMatch` models a pattern being matched. It has the subclasses `Case` and `IsExpr`.
- Class `PositionalPatternExpr` models position patterns, for example `(int x, int y)`
- Class `PropertyPatternExpr` models property patterns, for example `Length: int len`
- Class `RangeExpr` models range expressions, for example `1..^1`
- Class `SwitchCaseExpr` models the arm of a switch expression, for example `(false, false) => true`
- Class `SwitchExpr` models `switch` expressions, for example `(a, b) switch { ... }`
- Classes `IsConstantExpr`, `IsTypeExpr` and `IsPatternExpr` are deprecated in favour of `IsExpr`
- Class `Switch` models both `SwitchExpr` and `SwitchStmt`
- Class `Case` models both `CaseStmt` and `SwitchCaseExpr`
- Class `UsingStmt` models both `UsingBlockStmt` and `UsingDeclStmt`
| Implicit conversion from array to string (`java/print-array`) | Fewer false positive results | Results in slf4j logging calls are no longer reported as slf4j supports array printing. |
| Result of multiplication cast to wider type (`java/integer-multiplication-cast-to-long`) | Fewer false positive results | Range analysis is now used to exclude results involving multiplication of small values that cannot overflow. |
## Changes to QL libraries
* The `Guards` library has been extended to account for method calls that check
conditions by conditionally throwing an exception. This includes the
`checkArgument` and `checkState` methods in
`com.google.common.base.Preconditions`, the `isTrue` and `validState` methods
in `org.apache.commons.lang3.Validate`, as well as any similar custom
methods. This means that more guards are recognized which improves the precision of a number of queries including `java/index-out-of-bounds`,
`java/dereferenced-value-may-be-null`, and `java/useless-null-check`.
* The default sanitizer in taint tracking has been made more precise. The
sanitizer works by looking for guards that inspect tainted strings. It
previously worked at the level of individual variables. Now it
uses the `Guards` library, such that only guarded variable accesses are
sanitized. This may give additional results for security queries.
* Spring framework support now takes into account additional
annotations that indicate remote user input. This affects all security
* The security queries now track data flow through Base64 decoders such as the Node.js `Buffer` class, the DOM function `atob`, and a number of npm packages including [`abab`](https://www.npmjs.com/package/abab), [`atob`](https://www.npmjs.com/package/atob), [`btoa`](https://www.npmjs.com/package/btoa), [`base-64`](https://www.npmjs.com/package/base-64), [`js-base64`](https://www.npmjs.com/package/js-base64), [`Base64.js`](https://www.npmjs.com/package/Base64) and [`base64-js`](https://www.npmjs.com/package/base64-js).
* The security queries now track data flow through exceptions.
* The security queries now treat comparisons with symbolic constants as sanitizers, resulting in fewer false positive results.
* TypeScript 3.5 is now supported.
* On LGTM, TypeScript projects now have static type information extracted by default, resulting in more security results.
Users of the command-line tools must still pass `--typescript-full` to the extractor to enable this.
| Missing regular expression anchor (`js/regex/missing-regexp-anchor`) | correctness, security, external/cwe/cwe-20 | Highlights regular expression patterns that may be missing an anchor, indicating a possible violation of [CWE-20](https://cwe.mitre.org/data/definitions/20.html). Results are not shown on LGTM by default. |
| Prototype pollution (`js/prototype-pollution`) | security, external/cwe-250, external/cwe-400 | Highlights code that allows an attacker to modify a built-in prototype object through an unsanitized recursive merge function. Results are not shown on [LGTM](https://lgtm.com/rules/1508857356317/) by default. |
| Arbitrary file write during zip extraction ("Zip Slip") | More results | This rule now considers more libraries, including tar as well as zip. |
| Client-side URL redirect | More results and fewer false-positive results | This rule now recognizes additional uses of the document URL. It also treats URLs as safe in more cases where the hostname cannot be tampered with. |
| Double escaping or unescaping | More results | This rule now considers the flow of regular expressions literals. |
| Expression has no effect | Fewer false-positive results | This rule now treats uses of `Object.defineProperty` more conservatively. |
| Incomplete regular expression for hostnames | More results | This rule now tracks regular expressions for host names further. |
| Incomplete string escaping or encoding | More results | This rule now considers the flow of regular expressions literals, and it no longer flags the removal of trailing newlines. |
| Incorrect suffix check | Fewer false-positive results | This rule now recognizes valid checks in more cases. |
| Password in configuration file | Fewer false positive results | This query now excludes passwords that are inserted into the configuration file using a templating mechanism or read from environment variables. Results are no longer shown on LGTM by default. |
| Replacement of a substring with itself | More results | This rule now considers the flow of regular expressions literals. |
| Server-side URL redirect | Fewer false-positive results | This rule now treats URLs as safe in more cases where the hostname cannot be tampered with. |
| Tainted path | More results and fewer false-positive results | This rule now analyzes path manipulation code more precisely. |
| Type confusion through parameter tampering | Fewer false-positive results | This rule now recognizes additional emptiness checks. |
| Useless assignment to property | Fewer false-positive results | This rule now ignores reads of additional getters. |
| Unreachable statement | Unreachable throws no longer give an alert | This ignores unreachable throws, as they could be intentional (for example, to placate the TS compiler). |
## Changes to QL libraries
*`RegExpLiteral` is now a `DataFlow::SourceNode`.
*`JSDocTypeExpr` now has source locations and is a subclass of `Locatable` and `TypeAnnotation`.
* The two-parameter versions of predicate `isBarrier` in `DataFlow::Configuration` and of predicate `isSanitizer` in `TaintTracking::Configuration` have been renamed to `isBarrierEdge` and `isSanitizerEdge`, respectively. The old names are maintained for backwards-compatibility in this version, but will be deprecated in the next version and subsequently removed.
* Various predicates named `getTypeAnnotation()` now return `TypeAnnotation` instead of `TypeExpr`.
In rare cases, this may cause compilation errors in existing code. Cast the result to `TypeExpr` if this happens.
* The `getALabel` predicate in `LabeledBarrierGuardNode` and `LabeledSanitizerGuardNode`
has been deprecated and overriding it no longer has any effect.
Instead use the 3-parameter version of `blocks` or `sanitizes`.
The old API is now deprecated, but will be continued to be supported for at least another year.
### Impact on existing queries.
As points-to analysis underpins many queries, and provides the call-graph and reachability analysis required for taint-tracking, the results of many queries may change.
The improved reachability analysis and non-local tracking of bound methods may identify new results.
The increased precision in tracking of values through `*` arguments may remove false positive results.
Overall the number of true positive results should increase and the number false negative results should decline.
We welcome feedback on the new implementation, particularly any surprising changes in results.
## New queries
| **Query** | **Tags** | **Purpose** |
|-----------|----------|-------------|
| Accepting unknown SSH host keys when using Paramiko (`py/paramiko-missing-host-key-validation`) | security, external/cwe/cwe-295 | Finds instances where Paramiko is configured to accept unknown host keys. Results are shown [on LGTM](https://lgtm.com/rules/1508297729270/) by default. |
| Pythagorean calculation with sub-optimal numerics (`py/pythagorean`) | accuracy | Finds instances of hypotenuse calculation using `math.sqrt` instead of `math.hypot`. Results are not shown on LGTM by default. |
| Use of 'return' or 'yield' outside a function (`py/return-or-yield-outside-function`) | reliability, correctness | Finds instances where `return`, `yield`, and `yield from` are used outside a function. Results are not shown on LGTM by default. |
## Changes to code extraction
* String literals as expressions within literal string interpolation (f-strings) are now handled correctly.
* The Python extractor now handles invalid input more robustly. In particular, it exits gracefully when:
* A non-existent file or directory is specified using the `--path` option, or as a file name.
* An invalid number is specified for the `--max-procs` option.
* Custom file types can now be specified using the `filetypes` property in the `extraction/javascript/index` section of `lgtm.yml`. The property should be a map from file extensions (including the dot) to file types. Valid file types are `html`, `js`, `json`, `typescript`, `xml` and `yaml`.
* ECMAScript 2019 support is now enabled by default.
* On LGTM, JavaScript extraction for projects that do not contain any JavaScript or TypeScript code will now fail, even if the project contains other file types (such as HTML or YAML) recognized by the JavaScript extractor.
* XML files can now be extracted on LGTM. To enable XML extraction, set the `xml_mode` property in the `extraction/javascript/index` section of your `lgtm.yml` file to `all`. The default value of this property is `disabled`, meaning that XML files will not be extracted. (Note, that the `xml_mode` property does not apply to files that you map to the `xml` file type using the `filetypes` property. LGTM will always extract these files.)
* YAML files are now extracted by default on LGTM. If required, you can specify exclusion filters in your `lgtm.yml` file to override this behavior.
For detailed information about customizing LGTM extraction, see [JavaScript extraction](https://help.semmle.com/lgtm-enterprise/user/help/javascript-extraction.html).
| Call to alloca in a loop (`cpp/alloca-in-loop`) | Fewer false positive results | The query no longer highlights code where the stack allocation could not be reached multiple times in the loop, typically due to a `break` or `return` statement. |
| Continue statement that does not continue (`cpp/continue-in-false-loop`) | Fewer false positive results | Analysis is now restricted to `do`-`while` loops. This query is now run and displayed by default on LGTM. |
| Expression has no effect (`cpp/useless-expression`) | Fewer false positive results | Calls to functions with the `weak` attribute are no longer considered to be side-effect free, because they could be overridden with a different implementation at link time. |
| No space for zero terminator (`cpp/no-space-for-terminator`) | Fewer false positive results | False positive results for strings that are not null-terminated have been excluded. |
| Non-constant format string (`cpp/non-constant-format`) | Fewer false positive results | The query was rewritten using the taint-tracking library. |
| Sign check of bitwise operation (`cpp/bitwise-sign-check`) | Fewer false positive and more true positive results | The query now understands the direction of each comparison, making it more accurate. |
| Suspicious pointer scaling (`cpp/suspicious-pointer-scaling`) | Lower precision | The precision of this query has been reduced to "medium". This coding pattern is used intentionally and safely in a number of real-world projects. Results are no longer displayed on LGTM unless you choose to display them. |
| Variable used in its own initializer (`cpp/use-in-own-initializer`) | Fewer false positive results | False positive results for constant variables with the same name in different namespaces have been removed. |
## Changes to QL libraries
- The data flow library (`semmle.code.cpp.dataflow.DataFlow`) has had the
following improvements, all of which benefit the taint tracking library
(`semmle.code.cpp.dataflow.TaintTracking`) as well.
- This release includes preliminary support for interprocedural flow through
fields (non-static data members). In some cases, data stored in a field in
one function can now flow to a read of the same field in a different
function.
- The possibility of specifying barrier edges using
`isBarrierEdge`/`isSanitizerEdge` in data-flow and taint-tracking
configurations has been replaced with the option of specifying in- and
out-barriers on nodes by overriding `isBarrierIn`/`isSanitizerIn` and
`isBarrierOut`/`isSanitizerOut`. This should be simpler to use effectively,
as it does not require knowledge about the actual edges used internally by
the library.
- The library now models data flow through `std::swap`.
- Recursion through the `DataFlow` library is now always a compile error. Such recursion has been deprecated since release 1.16 in March 2018. If one `DataFlow::Configuration` needs to depend on the results of another, switch one of them to use one of the `DataFlow2` through `DataFlow4` libraries.
- In the `semmle.code.cpp.dataflow.TaintTracking` library, the second copy of `Configuration` has been renamed from `TaintTracking::Configuration2` to `TaintTracking2::Configuration`, and the old name is now deprecated. Import `semmle.code.cpp.dataflow.TaintTracking2` to access the new name.
- The `semmle.code.cpp.security.TaintTracking` library now considers a pointer difference calculation as blocking taint flow.
- The predicate `Variable.getAnAssignedValue()` now reports assignments to fields resulting from aggregate initialization (` = {...}`).
- The predicate `TypeMention.toString()` has been simplified to always return the string "`type mention`". This may improve performance when using `Element.toString()` or its descendants.
- Fixed the `LocalScopeVariableReachability.qll` library's handling of loops where the entry condition is always true on first entry, and where there is more than one control flow path through the loop condition. This change increases the accuracy of the `LocalScopeVariableReachability.qll` library and queries that depend on it.
- There is a new `Variable.isThreadLocal()` predicate. It can be used to tell whether a variable is `thread_local`.
- C/C++ code examples have been added to QLDoc comments on many more classes in the QL libraries.
| Constant condition (`cs/constant-condition`) | Fewer false positive results | Results have been removed for default cases (`_`) in switch expressions. |
| Dispose may not be called if an exception is thrown during execution (`cs/dispose-not-called-on-throw`) | Fewer false positive results | Results have been removed where an object is disposed both by a `using` statement and a `Dispose` call. |
| Unchecked return value (`cs/unchecked-return-value`) | Fewer false positive results | Method calls that are expression bodies of `void` callables (for example, the call to `Foo` in `void Bar() => Foo()`) are no longer considered to use the return value. |
## Removal of old queries
The following historic queries are no longer available in the distribution:
* Added lines (`cs/vcs/added-lines-per-file`)
* Churned lines (`cs/vcs/churn-per-file`)
* Defect filter
* Defect from SVN
* Deleted lines (`cs/vcs/deleted-lines-per-file`)
* Files edited in pairs
* Filter: only files recently edited
* Large files currently edited
* Metric from SVN
* Number of authors (version control) (`cs/vcs/authors-per-file`)
* Number of file-level changes (`cs/vcs/commits-per-file`)
* Number of co-committed files (`cs/vcs/co-commits-per-file`)
* Number of file re-commits (`cs/vcs/recommits-per-file`)
* Number of recent file changes (`cs/vcs/recent-commits-per-file`)
* Number of authors
* Number of commits
* Poorly documented files with many authors
* Recent activity
## Changes to code extraction
* The following C# 8 features are now extracted:
- Suppress-nullable-warning expressions, for example `x!`
- Nullable reference types, for example `string?`
## Changes to QL libraries
* The new class `AnnotatedType` models types with type annotations, including nullability information, return kinds (`ref` and `readonly ref`), and parameter kinds (`in`, `out`, and `ref`).
- The new predicate `Assignable.getAnnotatedType()` gets the annotated type of an assignable (such as a variable or a property).
- The new predicates `Callable.getAnnotatedReturnType()` and `DelegateType.getAnnotatedReturnType()` gets the annotated type of the return value.
- The new predicate `ArrayType.getAnnotatedElementType()` gets the annotated type of the array element.
- The new predicate `ConstructedGeneric.getAnnotatedTypeArgument()` gets the annotated type of a type argument.
- The new predicate `TypeParameterConstraints.getAnAnnotatedTypeConstraint()` gets a type constraint with type annotations.
* The new class `SuppressNullableWarningExpr` models suppress-nullable-warning expressions such as `x!`.
* The data-flow and taint-tracking libraries now support flow through fields. All existing configurations will have field-flow enabled by default, but it can be disabled by adding `override int fieldFlowBranchLimit() { result = 0 }` to the configuration class. Field assignments, `this.Foo = x`, object initializers, `new C() { Foo = x }`, and field initializers `int Foo = 0` are supported.
* The possibility of specifying barrier edges using
`isBarrierEdge`/`isSanitizerEdge` in data-flow and taint-tracking
configurations has been replaced with the option of specifying in- and
out-barriers on nodes by overriding `isBarrierIn`/`isSanitizerIn` and
`isBarrierOut`/`isSanitizerOut`. This should be simpler to use effectively,
as it does not require knowledge about the actual edges used internally by
| Equals method does not inspect argument type (`java/unchecked-cast-in-equals`) | Fewer false positive and more true positive results | Precision has been improved by doing a bit of inter-procedural analysis and relying less on ad-hoc method names. |
| Uncontrolled data in arithmetic expression (`java/uncontrolled-arithmetic`) | Fewer false positive results | Precision has been improved in several ways, in particular, by better detection of guards along the data-flow path. |
| Uncontrolled data used in path expression (`java/path-injection`) | Fewer false positive results | The query no longer reports results guarded by `!var.contains("..")`. |
| User-controlled data in arithmetic expression (`java/tainted-arithmetic`) | Fewer false positive results | Precision has been improved in several ways, in particular, by better detection of guards along the data-flow path. |
## Changes to QL libraries
* The virtual dispatch library has been updated to give more precise dispatch
targets for `Object.toString()` calls. This affects all security queries and
removes false positive results that arose from paths through impossible `toString()`
calls.
* The library `VCS.qll` and all queries that imported it have been removed.
* The second copy of the interprocedural `TaintTracking` library has been
renamed from `TaintTracking::Configuration2` to
`TaintTracking2::Configuration`, and the old name is now deprecated. Import
`semmle.code.java.dataflow.TaintTracking2` to access the new name.
* The data-flow library now makes it easier to specify barriers/sanitizers
arising from guards by overriding the predicate
`isBarrierGuard`/`isSanitizerGuard` on data-flow and taint-tracking
configurations respectively.
* The possibility of specifying barrier edges using
`isBarrierEdge`/`isSanitizerEdge` in data-flow and taint-tracking
configurations has been replaced with the option of specifying in- and
out-barriers on nodes by overriding `isBarrierIn`/`isSanitizerIn` and
`isBarrierOut`/`isSanitizerOut`. This should be simpler to use effectively,
as it does not require knowledge about the actual edges used internally by
* Automatic classification of test files has been improved, in particular `__tests__` and `__mocks__` folders (as used by [Jest](https://jestjs.io)) are now recognized.
* Support for the following frameworks and libraries has been improved:
* Support for tracking data flow and taint through getter functions (that is, functions that return a property of one of their arguments) and through the receiver object of method calls has been improved. This may produce more security alerts.
* Taint tracking through object property names has been made more precise, resulting in fewer false positive results.
* Method calls are now resolved in more cases, due to improved class hierarchy analysis. This may produce more security alerts.
* Jump-to-definition now resolves calls to their definition in more cases, and supports jumping from a JSDoc type annotation to its definition.
| Indirect uncontrolled command line (`js/indirect-command-line-injection`) | correctness, security, external/cwe/cwe-078, external/cwe/cwe-088 | Highlights command-line invocations that may indirectly introduce a command-line injection vulnerability elsewhere, indicating a possible violation of [CWE-78](https://cwe.mitre.org/data/definitions/78.html). Results are not shown on LGTM by default. |
| Conflicting HTML element attributes (`js/conflicting-html-attribute`) | No changes to results | Results are no longer shown on LGTM by default. |
| Shift out of range (`js/shift-out-of-range`| Fewer false positive results | This rule now correctly handles BigInt shift operands. |
| Superfluous trailing arguments (`js/superfluous-trailing-arguments`) | Fewer false-positive results. | This rule no longer flags calls to placeholder functions that trivially throw an exception. |
| Undocumented parameter (`js/jsdoc/missing-parameter`) | No changes to results | This rule is now run on LGTM, although its results are still not shown by default. |
| Missing space in string concatenation (`js/missing-space-in-concatenation`) | Fewer false positive results | The rule now requires a word-like part exists in the string concatenation. |
## Changes to QL libraries
- The `getName()` predicate on functions and classes now gets a name that is
inferred from the context if the function or class was not declared with a name.
- The two-argument and three-argument variants of `DataFlow::Configuration::isBarrier` and
`TaintTracking::Configuration::isSanitizer` have been deprecated. Overriding them no
longer has any effect. Use `isBarrierEdge` and `isSanitizerEdge` instead.
- The QLDoc for most AST classes have been expanded with concrete syntax examples.
- Tutorials on how to use [flow labels](https://help.semmle.com/QL/learn-ql/javascript/flow-labels.html)
and [type tracking](https://help.semmle.com/QL/learn-ql/javascript/type-tracking.html) have been published,
as well as a [data flow cheat sheet](https://help.semmle.com/QL/learn-ql/javascript/dataflow-cheat-sheet.html) for quick reference.
Tracking of "unknown" values from modules that are absent from the database has been improved. Particularly when an "unknown" value is used as a decorator, the decorated function is tracked.
### Loop unrolling
The extractor now unrolls a single iteration of loops that are known to run at least once. This improves analysis in cases like the following
```python
ifseq:
forxinseq:
y=x
y# y is defined here
```
### Better API for function parameter annotations
Instances of the `Parameter` and `ParameterDefinition` class now have a `getAnnotation` method that returns the corresponding parameter annotation, if one exists.
### Improvements to the Value API
- The Value API has been extended with classes representing functions, classes, tuples, and other types.
-`Value::forInt(int x)` and `Value::forString(string s)` have been added to make it easier to refer to the `Value` entities for common constants.
### Other improvements
- Short flags for regexes (for example, `re.M` for multiline regexes) are now handled correctly.
- Modules with multiple import roots no longer get multiple names.
- A new `NegativeIntegerLiteral` class has been added as a subtype of `ImmutableLiteral`, so that `-1` is treated as an `ImmutableLiteral`. This means that queries looking for the use of constant integers will automatically handle negative numbers.
## New queries
| **Query** | **Tags** | **Purpose** |
|-----------|----------|-------------|
| Arbitrary file write during tarfile extraction (`py/tarslip`) | security, external/cwe/cwe-022 | Finds instances where extracting from a tar archive can result in arbitrary file writes. Results are not shown on LGTM by default. |
| Hard-coded Japanese era start date (`cpp/japanese-era/exact-era-date`) | reliability, japanese-era | This query is a combination of two old queries that were identical in purpose but separate as an implementation detail. This new query replaces Hard-coded Japanese era start date in call (`cpp/japanese-era/constructor-or-method-with-exact-era-date`) and Hard-coded Japanese era start date in struct (`cpp/japanese-era/struct-with-exact-era-date`). Results are not shown on LGTM by default. |
| Pointer overflow check (`cpp/pointer-overflow-check`) | correctness, security | Finds overflow checks that rely on pointer addition to overflow, which has undefined behavior. Example: `ptr + a < ptr`. Results are shown on LGTM by default. |
| Signed overflow check (`cpp/signed-overflow-check`) | correctness, security | Finds overflow checks that rely on signed integer addition to overflow, which has undefined behavior. Example: `a + b < a`. Results are shown on LGTM by default. |
| Comparison of narrow type with wide type in loop condition (`cpp/comparison-with-wider-type`) | Higher precision | The precision of this query has been increased to "high" as the alerts from this query have proved to be valuable on real-world projects. With this precision, results are now displayed by default in LGTM. |
| Hard-coded Japanese era start date in call (`cpp/japanese-era/constructor-or-method-with-exact-era-date`) | Deprecated | This query has been deprecated. Use the new combined query Hard-coded Japanese era start date (`cpp/japanese-era/exact-era-date`) instead. |
| Hard-coded Japanese era start date in struct (`cpp/japanese-era/struct-with-exact-era-date`) | Deprecated | This query has been deprecated. Use the new combined query Hard-coded Japanese era start date (`cpp/japanese-era/exact-era-date`) instead. |
| Hard-coded Japanese era start date (`cpp/japanese-era/exact-era-date`) | More correct results | This query now checks for the beginning date of the Reiwa era (1st May 2019). |
| Non-constant format string (`cpp/non-constant-format`) | Fewer false positive results | Fixed false positive results triggrered by mismatching declarations of a formatting function. |
| Sign check of bitwise operation (`cpp/bitwise-sign-check`) | Fewer false positive results | Results involving `>=` or `<=` are no longer reported. |
| Too few arguments to formatting function (`cpp/wrong-number-format-arguments`) | Fewer false positive results | Fixed false positive results triggered by mismatching declarations of a formatting function. |
| Too many arguments to formatting function (`cpp/too-many-format-arguments`) | Fewer false positive results | Fixed false positive results triggered by mismatching declarations of a formatting function. |
| Unclear comparison precedence (`cpp/comparison-precedence`) | Fewer false positive results | False positive results involving template classes and functions have been fixed. |
| Wrong type of arguments to formatting function (`cpp/wrong-type-format-argument`) | More correct results and fewer false positive results | This query now understands explicitly-specified argument numbers in format strings, such as the `1$` in `%1$s`. |
## Changes to libraries
* The data-flow library in `semmle.code.cpp.dataflow.DataFlow` and
`semmle.code.cpp.dataflow.TaintTracking` have had extensive changes:
* Data flow through fields is now more complete and reliable.
* The data-flow library has been extended with a new feature to aid debugging.
Previously, to explore the possible flow from all sources you could specify `isSink(Node n) { any() }` on a configuration.
Now you can use the new `Configuration::hasPartialFlow` predicate,
which gives a more complete picture of the partial flow paths from a given source, including flow that doesn't reach any sink.
The feature is disabled by default and can be enabled for individual configurations by overriding `int explorationLimit()`.
* There is now flow out of C++ reference parameters.
* There is now flow through the address-of operator (`&`).
* The `DataFlow::DefinitionByReferenceNode` class now considers `f(x)` to be a
definition of `x` when `x` is a variable of pointer type. It no longer
considers deep paths such as `f(&x.myField)` to be definitions of `x`. These
changes are in line with the user expectations we've observed.
* It's now easier to specify barriers/sanitizers
arising from guards by overriding the predicate
`isBarrierGuard`/`isSanitizerGuard` on data-flow and taint-tracking
configurations respectively.
* There is now a `DataFlow::localExprFlow` predicate and a
`TaintTracking::localExprTaint` predicate to make it easy to use the most
common case of local data flow and taint: from one `Expr` to another.
* The member predicates of the `FunctionInput` and `FunctionOutput` classes have been renamed for
clarity (for example, `isOutReturnPointer()` to `isReturnValueDeref()`). The existing member predicates
have been deprecated, and will be removed in a future release. Code that uses the old member
predicates should be updated to use the corresponding new member predicate.
* The predicate `Declaration.hasGlobalOrStdName` has been added, making it
easier to recognize C library functions called from C++.
* The control-flow graph is now computed in QL, not in the extractor. This can
lead to changes in how queries are optimized because
optimization in QL relies on static size estimates, and the control-flow edge
relations will now have different size estimates than before.
* Support has been added for non-type template arguments. This means that the
return type of `Declaration::getTemplateArgument()` and
`Declaration::getATemplateArgument` have changed to `Locatable`. For details, see the
CodeQL library documentation for `Declaration::getTemplateArgument()` and
| Deserialized delegate (`cs/deserialized-delegate`) | security, external/cwe/cwe-502 | Finds unsafe deserialization of delegate types. Results are shown on LGTM by default. |
| Deserialization of untrusted data (`cs/unsafe-deserialization-untrusted-input`) | security, external/cwe/cwe-502 | Finds flow of untrusted input to calls to unsafe deserializers. Results are shown on LGTM by default. |
| Mishandling the Japanese era start date (`cs/mishandling-japanese-era`) | reliability, date-time | Finds hard-coded Japanese era start dates that could be invalid. Results are not shown on LGTM by default. |
| Unsafe year argument for 'DateTime' constructor (`cs/unsafe-year-construction`) | reliability, date-time | Finds incorrect manipulation of `DateTime` values, which could lead to invalid dates. Results are not shown on LGTM by default. |
| Unsafe deserializer (`cs/unsafe-deserialization`) | security, external/cwe/cwe-502 | Finds calls to unsafe deserializers. By default, the query is not run on LGTM. |
| Dereferenced variable may be null (`cs/dereferenced-value-may-be-null`) | Fewer false positive results | More `null` checks are now taken into account, including `null` checks for `dynamic` expressions and `null` checks such as `object alwaysNull = null; if (x != alwaysNull) ...`. |
| Missing Dispose call on local IDisposable (`cs/local-not-disposed`) | Fewer false positive results | The query has been rewritten in order to identify more dispose patterns. For example, a local `IDisposable` that is disposed of by passing through a fluent API is no longer reported as missing a dispose call. |
## Changes to code extraction
*`nameof` expressions are now extracted correctly when the name is a namespace.
## Changes to libraries
* The new class `NamespaceAccess` models accesses to namespaces, for example in `nameof` expressions.
* The data-flow library now makes it easier to specify barriers/sanitizers
arising from guards. You can override the predicate
`isBarrierGuard`/`isSanitizerGuard` on data-flow and taint-tracking
configurations respectively.
* The data-flow library has been extended with a new feature to aid debugging.
Previously, to explore the possible flow from all sources you could specify `isSink(Node n) { any() }` on a configuration.
Now you can use the new `Configuration::hasPartialFlow` predicate,
which gives a more complete picture of the partial flow paths from a given source, including flow that doesn't reach any sink.
The feature is disabled by default and can be enabled for individual configurations by
overriding `int explorationLimit()`.
*`foreach` statements where the body is guaranteed to be executed at least once, such as `foreach (var x in new string[]{ "a", "b", "c" }) { ... }`, are now recognized by all analyses based on the control-flow graph (such as SSA, data flow and taint tracking).
* Fixed the control-flow graph for `switch` statements where the `default` case was not the last case. This had caused the remaining cases to be unreachable. `SwitchStmt.getCase(int i)` now puts the `default` case last.
* There is now a `DataFlow::localExprFlow` predicate and a
`TaintTracking::localExprTaint` predicate to make it easy to use the most
common case of local data flow and taint: from one `Expr` to another.
* Data is now tracked through null-coalescing expressions (`??`).
* A new library `semmle.code.csharp.Unification` has been added. This library exposes two predicates `unifiable` and `subsumes` for calculating type unification and type subsumption, respectively.
| Continue statement that does not continue (`java/continue-in-false-loop`) | correctness | Finds `continue` statements in `do { ... } while (false)` loops. Results are shown on LGTM by default. |
| Disabled Netty HTTP header validation (`java/netty-http-response-splitting`) | security, external/cwe/cwe-113 | Finds response-splitting vulnerabilities due to Netty HTTP header validation being disabled. Results are shown on LGTM by default. |
| Dereferenced variable may be null (`java/dereferenced-value-may-be-null`) | Fewer false positive results | Additional indirect null guards are detected, where two auxiliary variables are known to be equal. |
| Non-synchronized override of synchronized method (`java/non-sync-override`) | Fewer false positive results | Results are now only reported if the immediately overridden method is synchronized. |
| Query built from local-user-controlled sources (`java/sql-injection-local`) | More results | The query now identifies arguments to `Statement.executeLargeUpdate` and `Connection.prepareCall` as sinks for SQL expressions. |
| Query built from user-controlled sources (`java/sql-injection`) | More results | The query now identifies arguments to `Statement.executeLargeUpdate` and `Connection.prepareCall` as sinks for SQL expressions. |
| Query built without neutralizing special characters (`java/concatenated-sql-query`) | More results | The query now identifies arguments to `Statement.executeLargeUpdate` and `Connection.prepareCall` as sinks for SQL expressions. |
| Useless comparison test (`java/constant-comparison`) | Fewer false positive results | Additional overflow check patterns are now recognized and no longer reported. Also, a few bug fixes in the range analysis for floating-point variables gives a further reduction in false positive results. |
## Changes to libraries
The data-flow library has been extended with a new feature to aid debugging.
Previously, to explore the possible flow from all sources you could specify `isSink(Node n) { any() }` on a configuration.
Now you can use the new `Configuration::hasPartialFlow` predicate,
which gives a more complete picture of the partial flow paths from a given source, including flow that doesn't reach any sink.
The feature is disabled by default and can be enabled for individual configurations by overriding `int explorationLimit()`.
| Ignoring result from pure array method (`js/ignore-array-result`) | maintainability, correctness | Highlights calls to array methods without side effects where the return value is ignored. Results are shown on LGTM by default. |
| Incomplete URL scheme check (`js/incomplete-url-scheme-check`) | security, correctness, external/cwe/cwe-020 | Highlights checks for `javascript:` URLs that do not take `data:` or `vbscript:` URLs into account. Results are shown on LGTM by default. |
| Loop bound injection (`js/loop-bound-injection`) | security, external/cwe/cwe-834 | Highlights loops where a user-controlled object with an arbitrary `.length` value can trick the server into looping indefinitely. Results are shown on LGTM by default. |
| Shell command built from environment values (`js/shell-command-injection-from-environment`) | correctness, security, external/cwe/cwe-078, external/cwe/cwe-088 | Highlights shell commands that may change behavior inadvertently depending on the execution environment, indicating a possible violation of [CWE-78](https://cwe.mitre.org/data/definitions/78.html). Results are shown on LGTM by default.|
| Suspicious method name (`js/suspicious-method-name-declaration`) | correctness, typescript, methods | Highlights suspiciously named methods where the developer likely meant to write a constructor or function. Results are shown on LGTM by default. |
| Unreachable method overloads (`js/unreachable-method-overloads`) | correctness, typescript | Highlights method overloads that are impossible to use from client code. Results are shown on LGTM by default. |
| Unused index variable (`js/unused-index-variable`) | correctness | Highlights loops that iterate over an array, but do not use the index variable to access array elements, indicating a possible typo or logic error. Results are shown on LGTM by default. |
| Use of returnless function (`js/use-of-returnless-function`) | maintainability, correctness | Highlights calls where the return value is used, but the callee never returns a value. Results are shown on LGTM by default. |
| Useless regular expression character escape (`js/useless-regexp-character-escape`) | correctness, security, external/cwe/cwe-20 | Highlights regular expression strings with useless character escapes, indicating a possible violation of [CWE-20](https://cwe.mitre.org/data/definitions/20.html). Results are shown on LGTM by default. |
| Client-side cross-site scripting (`js/xss`) | More results, fewer false positive results | More potential vulnerabilities involving functions that manipulate DOM attributes are now recognized, and more sanitizers are detected. |
| Code injection (`js/code-injection`) | More results | More potential vulnerabilities involving functions that manipulate DOM event handler attributes are now recognized. |
| Hard-coded credentials (`js/hardcoded-credentials`) | Fewer false positive results | This rule now flags fewer password examples. |
| Illegal invocation (`js/illegal-invocation`) | Fewer false positive results | This rule now correctly handles methods named `call` and `apply`. |
| Incomplete string escaping or encoding (`js/incomplete-sanitization`) | Fewer false positive results | This rule now recognizes additional ways delimiters can be stripped away. |
| Incorrect suffix check (`js/incorrect-suffix-check`) | Fewer false positive results | The query recognizes valid checks in more cases. |
| Network data written to file (`js/http-to-file-access`) | Fewer false positive results | This query has been renamed to better match its intended purpose, and now only considers network data untrusted. |
| Password in configuration file (`js/password-in-configuration-file`) | Fewer false positive results | This rule now flags fewer password examples. |
| Prototype pollution (`js/prototype-pollution`) | More results | The query now highlights vulnerable uses of jQuery and Angular, and the results are shown on LGTM by default. |
| Reflected cross-site scripting (`js/reflected-xss`) | Fewer false positive results | The query now recognizes more sanitizers. |
| Stored cross-site scripting (`js/stored-xss`) | Fewer false positive results | The query now recognizes more sanitizers. |
| Uncontrolled command line (`js/command-line-injection`) | More results | This query now treats responses from servers as untrusted. |
| Uncontrolled data used in path expression (`js/path-injection`) | Fewer false positive results | This query now recognizes calls to Express `sendFile` as safe in some cases. |
| Unknown directive (`js/unknown-directive`) | Fewer false positive results | This query no longer flags uses of ":", which is sometimes used like a directive. |
## Changes to libraries
*`Expr.getDocumentation()` now handles chain assignments.
* String literals are now parsed as regular expressions.
Consequently, a `RegExpTerm` may occur as part of a string literal or
as a regular expression literal. Queries that search for regular expressions may need to
use `RegExpTerm.isPartOfRegExpLiteral` or `RegExpTerm.isUsedAsRegExp` to restrict the search.
A regular expression AST can be obtained from a string literal using `StringLiteral.asRegExp`.
## Removal of deprecated queries
The following queries (deprecated since 1.17) are no longer available in the distribution:
* Bad parity check (js/incomplete-parity-check)
* Builtin redefined (js/builtin-redefinition)
* Call to parseInt without radix (js/parseint-without-radix)
Python 3.8 syntax is now supported. In particular, the following constructs are parsed correctly:
- Assignment expressions using the "walrus" operator, such as `while chunk := file.read(1024): ...`.
- The positional argument separator `/`, such as in `def foo(a, /, b, *, c): ...`.
- Self-documenting expressions in f-strings, such as `f"{var=}"`.
### General query improvements
Following the replacement of the `Object` API (for example, `ClassObject`) in favor of the
`Value` API (for example, `ClassValue`) in the 1.21 release, many of the standard queries have been updated
to use the `Value` API. This should result in more precise results.
## New queries
| **Query** | **Tags** | **Purpose** |
|-----------|----------|-------------|
| Clear-text logging of sensitive information (`py/clear-text-logging-sensitive-data`) | security, external/cwe/cwe-312 | Finds instances where sensitive information is logged without encryption or hashing. Results are shown on LGTM by default. |
| Clear-text storage of sensitive information (`py/clear-text-storage-sensitive-data`) | security, external/cwe/cwe-312 | Finds instances where sensitive information is stored without encryption or hashing. Results are shown on LGTM by default. |
| Binding a socket to all network interfaces (`py/bind-socket-all-network-interfaces`) | security | Finds instances where a socket is bound to all network interfaces. Results are shown on LGTM by default. |
| Explicit export is undefined (`py/undefined-export`) | Fewer false positive results | Instances where an exported value may be defined in a module that lacks points-to information are no longer flagged. |
| Module-level cyclic import (`py/unsafe-cyclic-import`) | Fewer false positive results | Instances where one of the links in an import cycle is never actually executed are no longer flagged. |
| Non-iterable used in for loop (`py/non-iterable-in-for-loop`) | Fewer false positive results | `__aiter__` is now recognized as an iterator method. |
| Unreachable code (`py/unreachable-statement`) | Fewer false positive results | Analysis now accounts for uses of `contextlib.suppress` to suppress exceptions. |
| Unreachable code (`py/unreachable-statement`) | Fewer false positive results | Unreachable `else` branches that do nothing but `assert` their non-reachability are no longer flagged. |
| Unused import (`py/unused-import`) | Fewer false positive results | Instances where a module is used in a forward-referenced type annotation, or only during type checking are no longer flagged. |
| `__iter__` method returns a non-iterator (`py/iter-returns-non-iterator`) | Better alert message | Alert now highlights which class is expected to be an iterator. |
| `__init__` method returns a value (`py/explicit-return-in-init`) | Fewer false positive results | Instances where the `__init__` method returns the value of a call to a procedure are no longer flagged. |
## Changes to QL libraries
* Django library now recognizes positional arguments from a `django.conf.urls.url` regex (Django version 1.x)
* Instances of the `Value` class now support the `isAbsent` method, indicating
whether that `Value` lacks points-to information, but inference
suggests that it exists. For instance, if a file contains `import
django`, but `django` was not extracted properly, there will be a
`ModuleValue` corresponding to this "unknown" module, and the `isAbsent`
method will hold for this `ModuleValue`.
* The `Expr` class now has a nullary method `pointsTo` that returns the possible
instances of `Value` that this expression may have.
* Asynchronous generator methods are now parsed correctly and no longer cause a spurious syntax error.
* Files in `node_modules` and `bower_components` folders are no longer extracted by default. If you still want to extract files from these folders, you can add the following filters to your `lgtm.yml` file (or add them to existing filters):
```yaml
extraction:
javascript:
index:
filters:
- include: "**/node_modules"
- include: "**/bower_components"
```
* Additional [Flow](https://flow.org/) syntax is now supported.
* Recognition of CommonJS modules has improved. As a result, some files that were previously extracted as
global scripts are now extracted as modules.
* Top-level `await` is now supported.
* Bugs were fixed in how the TypeScript extractor handles default-exported anonymous classes and computed-instance field names.
exists(Block body | body = this.(Loop ).getStmt() or
body = this.(SwitchStmt).getStmt()
| strictcount(body.getAStmt+()) > 6)
and not exists (this.getGeneratingMacro())
exists(Block body |
body = this.(Loop).getStmt() or
body = this.(SwitchStmt).getStmt()
|
strictcount(body.getAStmt+()) > 6
) and
not exists(this.getGeneratingMacro())
}
}
from Block b, int n, ComplexStmt complexStmt
where n = strictcount(ComplexStmt s | s = b.getAStmt()) and n > 3
and complexStmt = b.getAStmt()
select b, "Block with too many statements (" + n.toString() + " complex statements in the block). Complex statements at: $@", complexStmt, complexStmt.toString()
where
n = strictcount(ComplexStmt s | s = b.getAStmt()) and
n > 3 and
complexStmt = b.getAStmt()
select b,
"Block with too many statements (" + n.toString() +
" complex statements in the block). Complex statements at: $@", complexStmt,
When eras change, date and time conversions that rely on a hard-coded era start date need to be reviewed. Conversions relying on Japanese dates in the current era can produce an ambiguous date.
The values for the current Japanese era dates should be read from a source that will be updated, such as the Windows registry.
</p>
</overview>
<references>
<li>
<a href="https://blogs.msdn.microsoft.com/shawnste/2018/04/12/the-japanese-calendars-y2k-moment/">The Japanese Calendar's Y2K Moment</a>.
s="0" or s="1" or s="2" or s="3" or s="4" or s="5" or s="6" or s="7" or s="8" or
s="9" or s="10" or s="11" or s="12" or s="13" or s="14" or s="15" or s="16" or s="17" or
s="18" or s="19" or s="20"
or
s="16" or s="24" or s="32" or s="64" or s="128" or s="256" or s="512" or s="1024" or
s="2048" or s="4096" or s="16384" or s="32768" or s="65536" or
s="1048576" or s="2147483648" or s="4294967296"
or
s="15" or s="31" or s="63" or s="127" or s="255" or s="511" or s="1023" or
s="2047" or s="4095" or s="16383" or s="32767" or s="65535" or
s="1048577" or s="2147483647" or s="4294967295"
or
s = "0x00000001" or s = "0x00000002" or s = "0x00000004" or s = "0x00000008" or s = "0x00000010" or s = "0x00000020" or s = "0x00000040" or s = "0x00000080" or s = "0x00000100" or s = "0x00000200" or s = "0x00000400" or s = "0x00000800" or s = "0x00001000" or s = "0x00002000" or s = "0x00004000" or s = "0x00008000" or s = "0x00010000" or s = "0x00020000" or s = "0x00040000" or s = "0x00080000" or s = "0x00100000" or s = "0x00200000" or s = "0x00400000" or s = "0x00800000" or s = "0x01000000" or s = "0x02000000" or s = "0x04000000" or s = "0x08000000" or s = "0x10000000" or s = "0x20000000" or s = "0x40000000" or s = "0x80000000" or
s = "0x00000001" or s = "0x00000003" or s = "0x00000007" or s = "0x0000000f" or s = "0x0000001f" or s = "0x0000003f" or s = "0x0000007f" or s = "0x000000ff" or s = "0x000001ff" or s = "0x000003ff" or s = "0x000007ff" or s = "0x00000fff" or s = "0x00001fff" or s = "0x00003fff" or s = "0x00007fff" or s = "0x0000ffff" or s = "0x0001ffff" or s = "0x0003ffff" or s = "0x0007ffff" or s = "0x000fffff" or s = "0x001fffff" or s = "0x003fffff" or s = "0x007fffff" or s = "0x00ffffff" or s = "0x01ffffff" or s = "0x03ffffff" or s = "0x07ffffff" or s = "0x0fffffff" or s = "0x1fffffff" or s = "0x3fffffff" or s = "0x7fffffff" or s = "0xffffffff" or
s = "0x0001" or s = "0x0002" or s = "0x0004" or s = "0x0008" or s = "0x0010" or s = "0x0020" or s = "0x0040" or s = "0x0080" or s = "0x0100" or s = "0x0200" or s = "0x0400" or s = "0x0800" or s = "0x1000" or s = "0x2000" or s = "0x4000" or s = "0x8000" or
s = "0x0001" or s = "0x0003" or s = "0x0007" or s = "0x000f" or s = "0x001f" or s = "0x003f" or s = "0x007f" or s = "0x00ff" or s = "0x01ff" or s = "0x03ff" or s = "0x07ff" or s = "0x0fff" or s = "0x1fff" or s = "0x3fff" or s = "0x7fff" or s = "0xffff" or
s = "0x01" or s = "0x02" or s = "0x04" or s = "0x08" or s = "0x10" or s = "0x20" or s = "0x40" or s = "0x80" or
s = "0x01" or s = "0x03" or s = "0x07" or s = "0x0f" or s = "0x1f" or s = "0x3f" or s = "0x7f" or s = "0xff" or
s = "0x00"
or
s = "10" or s = "100" or s = "1000" or s = "10000" or s = "100000" or s = "1000000" or s = "10000000" or s = "100000000" or s = "1000000000"
// Small numbers
s = [0 .. 20].toString() or
// Popular powers of two (decimal)
s = "16" or
s = "24" or
s = "32" or
s = "64" or
s = "128" or
s = "256" or
s = "512" or
s = "1024" or
s = "2048" or
s = "4096" or
s = "16384" or
s = "32768" or
s = "65536" or
s = "1048576" or
s = "2147483648" or
s = "4294967296" or
// Popular powers of two, minus one (decimal)
s = "15" or
s = "31" or
s = "63" or
s = "127" or
s = "255" or
s = "511" or
s = "1023" or
s = "2047" or
s = "4095" or
s = "16383" or
s = "32767" or
s = "65535" or
s = "1048577" or
s = "2147483647" or
s = "4294967295" or
// Popular powers of two (32-bit hex)
s = "0x00000001" or
s = "0x00000002" or
s = "0x00000004" or
s = "0x00000008" or
s = "0x00000010" or
s = "0x00000020" or
s = "0x00000040" or
s = "0x00000080" or
s = "0x00000100" or
s = "0x00000200" or
s = "0x00000400" or
s = "0x00000800" or
s = "0x00001000" or
s = "0x00002000" or
s = "0x00004000" or
s = "0x00008000" or
s = "0x00010000" or
s = "0x00020000" or
s = "0x00040000" or
s = "0x00080000" or
s = "0x00100000" or
s = "0x00200000" or
s = "0x00400000" or
s = "0x00800000" or
s = "0x01000000" or
s = "0x02000000" or
s = "0x04000000" or
s = "0x08000000" or
s = "0x10000000" or
s = "0x20000000" or
s = "0x40000000" or
s = "0x80000000" or
// Popular powers of two, minus one (32-bit hex)
s = "0x00000001" or
s = "0x00000003" or
s = "0x00000007" or
s = "0x0000000f" or
s = "0x0000001f" or
s = "0x0000003f" or
s = "0x0000007f" or
s = "0x000000ff" or
s = "0x000001ff" or
s = "0x000003ff" or
s = "0x000007ff" or
s = "0x00000fff" or
s = "0x00001fff" or
s = "0x00003fff" or
s = "0x00007fff" or
s = "0x0000ffff" or
s = "0x0001ffff" or
s = "0x0003ffff" or
s = "0x0007ffff" or
s = "0x000fffff" or
s = "0x001fffff" or
s = "0x003fffff" or
s = "0x007fffff" or
s = "0x00ffffff" or
s = "0x01ffffff" or
s = "0x03ffffff" or
s = "0x07ffffff" or
s = "0x0fffffff" or
s = "0x1fffffff" or
s = "0x3fffffff" or
s = "0x7fffffff" or
s = "0xffffffff" or
// Popular powers of two (16-bit hex)
s = "0x0001" or
s = "0x0002" or
s = "0x0004" or
s = "0x0008" or
s = "0x0010" or
s = "0x0020" or
s = "0x0040" or
s = "0x0080" or
s = "0x0100" or
s = "0x0200" or
s = "0x0400" or
s = "0x0800" or
s = "0x1000" or
s = "0x2000" or
s = "0x4000" or
s = "0x8000" or
// Popular powers of two, minus one (16-bit hex)
s = "0x0001" or
s = "0x0003" or
s = "0x0007" or
s = "0x000f" or
s = "0x001f" or
s = "0x003f" or
s = "0x007f" or
s = "0x00ff" or
s = "0x01ff" or
s = "0x03ff" or
s = "0x07ff" or
s = "0x0fff" or
s = "0x1fff" or
s = "0x3fff" or
s = "0x7fff" or
s = "0xffff" or
// Popular powers of two (8-bit hex)
s = "0x01" or
s = "0x02" or
s = "0x04" or
s = "0x08" or
s = "0x10" or
s = "0x20" or
s = "0x40" or
s = "0x80" or
// Popular powers of two, minus one (8-bit hex)
s = "0x01" or
s = "0x03" or
s = "0x07" or
s = "0x0f" or
s = "0x1f" or
s = "0x3f" or
s = "0x7f" or
s = "0xff" or
s = "0x00" or
// Powers of ten
s = "10" or
s = "100" or
s = "1000" or
s = "10000" or
s = "100000" or
s = "1000000" or
s = "10000000" or
s = "100000000" or
s = "1000000000"
}
predicate trivialIntValue(string s) {
trivialPositiveIntValue(s) or
trivialPositiveIntValue(s)
or
exists(string pos | trivialPositiveIntValue(pos) and s = "-" + pos)
}
predicate trivialLongValue(string s) {
exists(string v | trivialIntValue(v) and s = v + "L")
}
predicate trivialLongValue(string s) { exists(string v | trivialIntValue(v) and s = v + "L") }
predicate intTrivial(Literal lit) {
exists(string v | trivialIntValue(v) and v = lit.getValue())
}
predicate intTrivial(Literal lit) { exists(string v | trivialIntValue(v) and v = lit.getValue()) }
predicate longTrivial(Literal lit) {
exists(string v | trivialLongValue(v) and v = lit.getValue())
}
predicate longTrivial(Literal lit) { exists(string v | trivialLongValue(v) and v = lit.getValue()) }
predicate powerOfTen(float f) {
f = 10 or f = 100 or f = 1000 or f = 10000 or f = 100000 or f = 1000000 or f = 10000000 or f = 100000000 or f = 1000000000
hubIndex = fclass.getMetrics().getAfferentCoupling() * fclass.getMetrics().getEfferentCoupling() and
hubIndex > 100
where
f.hasSpecifier("public") and
f.hasSpecifier("virtual") and
f.getFile().fromSource() and
not f instanceof Destructor and
fclass = f.getDeclaringType() and
hubIndex = fclass.getMetrics().getAfferentCoupling() * fclass.getMetrics().getEfferentCoupling() and
hubIndex > 100
select f, "Avoid having public virtual methods (NVI idiom)"
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.