`SEMMLE_TYPESCRIPT_NODE_RUNTIME` can be used to provide the path to the Node.js runtime executable.
If this is omitted, the extractor defaults to the current behaviour of looking for `node` on the PATH.
`SEMMLE_TYPESCRIPT_NODE_RUNTIME_EXTRA_ARGS` can be used to provide additional arguments to the
Node.js runtime. These are passed first, before the arguments supplied by the extractor.
These changes are designed to allow TypeScript extraction in controlled customer environments where
we cannot control the PATH, or must use custom Node.js executables with certain arguments set.
This query uses data flow for nullness analysis, which is always going
to be a large overapproximation. The overapproximation became too big
for one of the test cases after the recent change to make data flow go
across assignment by reference.
To make this query more conservative, it will now only report that the
`pDacl` argument can be null if there isn't also evidence that it can be
non-null.
This commit changes how data flow works in the following code.
MyType x = source();
defineByReference(&x);
sink(x);
The question here is whether there should be flow from `source` to
`sink`. Such flow is desirable if `defineByReference` doesn't write to
all of `x`, but it's undesirable if `defineByReference` is a typical
init function in `C` that writes to every field or if
`defineByReference` is `memcpy` or `memset` on the full range.
Before 1.20.0, there would be flow from `source` to `sink` in case `x`
happened to be modeled with `BlockVar` but not in case `x` happened to
be modelled with SSA. The choice of modelling depends on an analysis of
how `x` is used elsewhere in the function, and it's supposed to be an
internal implementation detail that there are two ways to model
variables. In 1.20.0, I changed the `BlockVar` behavior so it worked the
same as SSA, never allowing that flow. It turns out that this change
broke a customer's query.
This commit reverts `BlockVar` to its old behavior of letting flow
propagate past the `defineByReference` call and then regains consistency
by changing all variables that are ever defined by reference to be
modelled with `BlockVar` instead of SSA. This means we now get too much
flow in certain cases, but that appears to be better overall than
getting too little flow. See also the discussion in CPP-336.
Previously, `AllConfigurations.qll` would pull in (almost) all taint
tracking configurations, which has started causing OOMEs during
compilation.
I've pruned it down to only the most interesting configurations. Since
flow summaries are experimental at this point and require a bit of manual
configuration anyway, this shouldn't be much of an issue in practice.
My recent changes to suppress FPs in `ReturnStackAllocatedMemory.ql`
caused us to lose all results where there was a `Conversion` at the
initial address escape. We cannot handle conversions in general, but
this commit restores the good results for the trivial types of
conversion that we can handle.
Since we cannot track data flow from a fully-converted expression but
only the unconverted expression, we should check whether the address
initially escapes into the unconverted expression, not the
fully-converted one.
This fixes most of the false positives observed on lgtm.com.
`yield import(...)` previously caused a syntax error, now it is parsed
correctly.
`parseYield` is the only place where the value of `startsExpr` matters,
so this change should not affect anything else.
Also added `PhiInstruction::getAnInputOperand()`, and renamed `PhiInstruction::getAnOperandDefinitionInstruction()` to `getAnInput()` for consistency with other `Instruction` classes.
The `automaticVariableAddressEscapes` predicate got join-ordered badly
in its `unaliased_ssa` version. These are the tuple counts on Wireshark,
where one pipeline step is seen to have 716 million tuples:
```
[2019-03-02 11:29:41] (42s) Starting to evaluate predicate AliasAnalysis::automaticVariableAddressEscapes#2#f
[2019-03-02 11:30:06] (67s) Tuple counts:
353419 ~0% {1} r1 = JOIN project#Instruction::VariableAddressInstruction#class#2#ff WITH AliasAnalysis::resultEscapesNonReturn#2#f ON project#Instruction::VariableAddressInstruction#class#2#ff.<0>=AliasAnalysis::resultEscapesNonReturn#2#f.<0> OUTPUT FIELDS {AliasAnalysis::resultEscapesNonReturn#2#f.<0>}
353419 ~0% {2} r2 = JOIN r1 WITH IRConstruction::Cached::getInstructionEnclosingFunctionIR#ff@staged_ext ON r1.<0>=IRConstruction::Cached::getInstructionEnclosingFunctionIR#ff@staged_ext.<0> OUTPUT FIELDS {IRConstruction::Cached::getInstructionEnclosingFunctionIR#ff@staged_ext.<1>,r1.<0>}
353419 ~0% {2} r3 = JOIN r2 WITH FunctionIR::FunctionIR::getFunction_dispred#3#ff ON r2.<0>=FunctionIR::FunctionIR::getFunction_dispred#3#ff.<0> OUTPUT FIELDS {FunctionIR::FunctionIR::getFunction_dispred#3#ff.<1>,r2.<1>}
716040298 ~0% {2} r4 = JOIN r3 WITH IRVariable::IRVariable#class#3#ff_10#join_rhs ON r3.<0>=IRVariable::IRVariable#class#3#ff_10#join_rhs.<0> OUTPUT FIELDS {IRVariable::IRVariable#class#3#ff_10#join_rhs.<1>,r3.<1>}
4480139 ~0% {2} r5 = JOIN r4 WITH IRVariable::IRAutomaticVariable#class#3#ff ON r4.<0>=IRVariable::IRAutomaticVariable#class#3#ff.<0> OUTPUT FIELDS {r4.<1>,r4.<0>}
66760 ~91% {1} r6 = JOIN r5 WITH Instruction::VariableInstruction::getVariable_dispred#2#ff ON r5.<0>=Instruction::VariableInstruction::getVariable_dispred#2#ff.<0> AND r5.<1>=Instruction::VariableInstruction::getVariable_dispred#2#ff.<1> OUTPUT FIELDS {r5.<1>}
return r6
[2019-03-02 11:30:06] (67s) >>> Relation AliasAnalysis::automaticVariableAddressEscapes#2#f: 35531 rows using 0 MB
```
The predicate contained a cyclic join, which is always hard to optimize.
I couldn't see a reason to join the `FunctionIR`, so I removed that
part. The predicate is now fast, and there are no changes in the tests.
Before this change,
```
flowOutOfCallableStep(CallNode call, ReturnNode ret, OutNode out, CallContext cc)
```
would compute all combinations of call sites `call` and returned expressions `ret`
up front.
Now, we instead introduce explicit return nodes, so each callable has exactly
one return node (as well as one for each `out`/`ref` parameter). There is then
local flow from a returned expression to the relevant return node, and
`flowOutOfCallableStep()` computes combinations of call sites and return nodes.
Not only does this result in better performance, it also makes `flowOutOfCallableStep()`
symmetric to `flowIntoCallableStep()`, where each argument is mapped to a parameter,
and not to all reads of that parameter.
Javadoc claims not to be able to resolve this link, while Eclipse manages to do so without any problems, failing an internal PR check.
It's only in a test, though, so I just removed it.
When determining which core library a "tried control flow element" is compiled against,
first look at exceptions caught by the surrounding `try` block, then look at assembly
attributes, and finally choose (randomly) the core library with the highest lexicographic
order.
This was previously flagged if `exports` wasn't used any further. While it's true that the assignment to `exports` is redundant in this case, the assignment is also flagged by DeadStorOfLocal, so there is no point in InvalidExport flagging it as well.
This query is only appropriate for setuid programs. Since such programs
are at most 0.1% of all code we analyse, I would say this query has a
precision of at most 0.1%.
Instead of having many small independent tests, we now just have a single test that pulls in all the individual tests and runs them together.
Concretely, each `.ql` file has been turned into a `.qll` file with a query predicate corresponding to the original `select` clause and named after the original `.ql` file, plus a prefix `test_`.
The newly added `tests.ql` imports all these `.qll`s.
The individual `.expected` files have been concatenated together into `tests.expected`, each prefixed with the name of the corresponding query predicate. (This is the format that qltest produces for tests with multiple query predicates.)
These two elements weren't cached, which meant that local data flow was
recalculated in every query that used data flow. They are also cached in
the Java version of `DataFlowUtil.qll`.
This commits also adds a test that uses `getParameter`. The new tests
demonstrate that support for array-to-pointer decay works, but we get
data flow to the array rather than its contents.
This accessor may not be forward-compatible with an IR-based version,
and it's unclear whether it has any use. The `VariableAccess` remains in
the `TDefinitionByReferenceNode` constructor since it's used to
implement `getType`.
Every once in a while we encounter projects using some custom file extension for files that we could in principle extract, but since the extractor doesn't know about the extension the files are skipped.
To handle this, the legacy extractor has a `--file-type` option that one can use to specify a file type to use for all files in that particular extraction. So far, `AutoBuild` has nothing of the sort.
This PR proposes to introduce an environment variable `LGTM_INDEX_FILETYPES` to allow a similar customisation. In the fullness of time, this variable would be set through `lgtm.yml` in the usual way, but for now it is undocumented and for internal use only.
Specifically, `LGTM_INDEX_FILETYPES` is a newline-separated list of ".extension:filetype" pairs, specifying that files with the given `.extension` should be extracted as type `filetype`, where
`filetype` is one of `js`, `html`, `json`, `typescript` or `yaml`.
For example, `.jsm:js` causes all `.jsm` files to be extracted as JavaScript.
This can also be used to override default file types: for example, by specifying `.js:typescript` all JavaScript files will be extracted as TypeScript.
The configuration in `DefaultOptions.qll` assumed that a call to any
top-level function named `error` would exit the program. This is not
true.
The assumption was probably about `error(3)`, which is a GNU extension.
It only exits if its first argument it not 0. Furthermore, projects such
as openssh may define their own function named `error` with different
behaviour. Because the GNU `error` function is non-standard, it's
perfectly fine to shadow it with a project-specific definition.
This change removes two FPs from `PointlessComparison.qll` on
https://github.com/openssh/openssh-portable.
In the precense of multiple core libraries, `getAThrownException()` would return
multiple copies of the same exception, say `System.OverflowException`, one for each
core library. With this change we try to identify which core library a given control
flow element was compiled against, and only return the corresponding version.
A PR check was failing because this query was enabled on LGTM but had no
qhelp. I'm removing the `@precision` for now to take it off LGTM, and
then we can add it back when it has qhelp, tests, and change note.
I've eliminated the clumsily worded "client-side code" and "server-side code" distinction, not least because Electron fits neither of those categories.
While this file is part of the project used in the tutorial, it isn't necessary for the queries to work. It also specifies a dependency on a vulnerable version of Express, causing it to be (spuriously) flagged by security scanners.
Thanks to Sam Lanning (@samlanning) and Robert Marsh for taking the time to help
to make it possible. In fact, it was Robert Marsh who effectively
wrote the query and figured out that __builtin_alloca should be
used to also take functions like strdupa into account. I just
filled out the metadata :-)
Before this change, all the cached predicates in `IRGuards.qll` were in
separate cached stages, resulting in recomputation of most of the
library for each stage. This change groups the cached predicates in two
cached classes. A better grouping may be possible, but this grouping was
easy to do and seems to solve the problem.
Before this change, the `IRGuards` library accounted for five cached
stages when using the `RangeAnalysis` library. After this change, it
only accounts for one.
This suite isn't referenced from anywhere yet, but it'll be included in
a standard ODASA dist because the dist includes all files in the `c` and
`cpp` directories. We can modify the nightly test jobs to include the
experimental suite.
This new query is not written because it's the most interesting query we
could write but because it's an IR-based query whose results are easy to
verify.
After a `queries.xml` was added to the test directory,
`Container.getRelativePath` now considers source files to be relative to
the `cpp/test` directory rather than the directory of the `*.ql*` file.
This caused some benign test output changes, and it also caused an
unwanted alert for `test3.c:14` to appear in
`cpp/ql/test/query-tests/Security/CWE/CWE-190/semmle/tainted/IntegerOverflowTainted.expected`.
This alert came about because `inSystemMacroExpansion` holds for files
that don't have a relative path, but the pretend system header in
`../system_header` now does have a relative path because it's below the
`cpp/test` directory. The fix is to add another `queries.xml` just for
the directory with the affected test.
Write accesses in assignments, such as the access to `x` in `x = 0` are not
evaluated, so they should not have entries in the control flow graph. However,
qualifiers (and indexer arguments) should still be evaluated, for example in
```
x.Foo.Bar = 0;
```
the CFG should be `x --> x.Foo --> 0 --> x.Foo.Bar = 0` (as opposed to
`x --> x.Foo --> x.Foo.Bar --> 0 --> x.Foo.Bar = 0`, prior to this change).
A special case is assignments via acessors (properties, indexers, and event
adders), where we do want to include the access in the control flow graph,
as it represents the accessor call:
```
x.Prop = 0;
```
But instead of `x --> x.set_Prop --> 0 --> x.Prop = 0` the CFG should be
`x --> 0 --> x.set_Prop --> x.Prop = 0`, as the setter is called *after* the
assigned value has been evaluated.
An even more special case is tuple assignments via accessors:
```
(x.Prop1, y.Prop2) = (0, 1);
```
Here the CFG should be
`x --> y --> 0 --> 1 --> x.set_Prop1 --> y.set_Prop2 --> (x.Prop1, y.Prop2) = (0, 1)`.
This change fixes a few key problems with the existing SSA implementations:
For unaliased SSA, we were incorrectly choosing to model a local variable that had accesses that did not cover the entire variable. This has been changed to ensure that all accesses to the variable are at offset zero and have the same type as the variable itself. This was only possible to fix now that every `MemoryOperand` has its own type.
For aliased SSA, we now correctly track the offset and size of each memory access using an interval of bit offsets covered by the access. The offset interval makes the overlap computation more straightforward. Again, this is only possible now that operands have types.
The `getXXXMemoryAccess` predicates are now driven by the `MemoryAccessKind` on the operands and results, instead of by specific opcodes.
This change does fix an existing false negative in the IR dataflow tests.
I added a few simple test cases to the SSA IR tests, covering the various kinds of overlap (MustExcactly, MustTotally, and MayPartially).
I added "PrintSSA.qll", which can dump the SSA memory accesses as part of an IR dump.
For function parameters that are subject to "pointer decay", the database contains the type as originally declared (e.g. `T[]` instead of `T*`). The IR needs the actual type. Similarly, for variable declared as an array of unknown size, the actual size needs to be inferred from the initializer (e.g. `char a[] = "blah";` needs to have the type `char[5]`).
I've opened a ticket to have the extractor emit the actual type alongside the declared type, but for now, this workaround is enough to unblock progress for typical code.
Previously, we only excluded attributes where the value of the attribute itself suggests templating happening. Now we exclude all attributes in documents where _any_ attribute value suggests templating.
This change does some shuffling to make the distinction between memory operands and register operands more clear in the IR API. First, any given type that extends `Operand` is now either always a `MemoryOperand` or always a `RegisterOperand`. This required getting rid of `CopySourceOperand`, which was used for both the `CopyValue` instruction (as a `RegisterOperand`) and for the `Load` instruction (as a `MemoryOperand`). `CopyValue` is now just a `UnaryInstruction`, `Store` has a `StoreValueOperand` (`RegisterOperand`), and all of the instructions that read a value from memory indirectly (`Load`, `ReturnValue`, and `ThrowValue`) all now have a `LoadOperand` (`MemoryOperand`).
There are no diffs in the IR output for this commit, but this change is required for a subsequent commit that will make each `MemoryOperand` have a `Type`, which in turn is needed to fix a critical bug in aliased SSA construction.
The IR tests were getting kind of unwieldy. We were using "ir.cpp" to contain test cases that covered both IR construction (every language construct imaginable) and SSA construction. We would then build and dump all three flavors of IR. For IR construction tests, examining the SSA dumps when you add a new test case is tedious.
To make this easier to manage, I've split the SSA-specific test cases out into a separate directory. "ir.cpp" should now contain only IR construction test cases, and "ssa.cpp" should contain only SSA construction test cases. We dump just the raw IR for "ir.cpp", and just the two SSA flavors for "ssa.cpp". We still run all three flavors of the IR sanity tests for "ir.cpp", though.
I also removed the "ssa_block_count.ql" test, which wasn't really adding any coverage, because any change to the block count would be reflected in the dump as well.
As its first application, this library makes it possible for `StoredXss` to reuse the `Source` classes of `DomBasedXss` and `ReflectedXss` without having to pull in their libraries (which contain their `Configuration` classes, causing `StoredXss` to recompute all flow information for the other two queries).
We now highlight the `replace` call (instead of the regular expression), and the alert message for the case of missing backslash escapes clarifies that it is talking about failure to escape backslashes in the input, not in the replacement text.
We now classify sensitive expressions into four categories (secret, id, password, certificate). This allows queries more fine-grained control over what kinds of sensitive data they want to deal with: for clear-text storage, for instance, user ids aren't so much of a problem.
This PR adds new predicates to `Declaration` and `Type` to get a fully-qualified canonical name for the element, suitable for debugging and dumps. It includes template parameters, cv qualifiers, function parameter and return types, and fully-qualified names for all symbols. These strings are too large to compute in productions queries, so they should be used only for dumps and debugging. Feel free to suggest better names for these predicates.
I've updated PrintAST and PrintIR to use these instead of `Function.getFullSignature()`. The biggest advantage of the new predicates is that they handle lambdas and local classes, which `getQualifiedName` and `getFullSignature` do not. This makes IR and AST dumps much more usable for real-world snapshots.
Along the way, I cleaned up some of our handling of `IntegralType` to use a single table for tracking the signed, unsigned, and canonical versions of each type. The canonical part is new, and was necessary for `getTypeIdentityString` so that `signed int` and `int` both appear as `int`.
Methods that are mentioned in a member reference expression should count
as rootdefs for the unused parameter query. Such methods have to match
the functional interface of the reference expression, so it is to be
expected that they will sometimes have to declare parameters that they
don't actually use.
Using the class `ExceptionClass` in combination with a deliberate Cartesian
product can lead to bad join orders, for example
```
EVALUATE NONRECURSIVE RELATION:
Completion::TriedControlFlowElement::getAThrownException_dispred#ff(int this, int result) :-
{1} r1 = JOIN Expr::Expr::getType_dispred#ff_10#join_rhs WITH @integral_type#f ON Expr::Expr::getType_dispred#ff_10#join_rhs.<0>=@integral_type#f.<0> OUTPUT FIELDS {Expr::Expr::getType_dispred#ff_10#join_rhs.<1>}
{1} r2 = JOIN r1 WITH @un_op#f ON r1.<0>=@un_op#f.<0> OUTPUT FIELDS {r1.<0>}
{1} r3 = JOIN r2 WITH Stmt::TryStmt::getATriedElement#ff_1#join_rhs ON r2.<0>=Stmt::TryStmt::getATriedElement#ff_1#join_rhs.<0> OUTPUT FIELDS {r2.<0>}
{2} r4 = JOIN r3 WITH Stmt::ExceptionClass#f CARTESIAN PRODUCT OUTPUT FIELDS {Stmt::ExceptionClass#f.<0>,r3.<0>}
{2} r5 = JOIN r4 WITH System::SystemOverflowExceptionClass#class#f ON r4.<0>=System::SystemOverflowExceptionClass#class#f.<0> OUTPUT FIELDS {r4.<1>,r4.<0>}
```
where the CP is made with `ExceptionClass` rather than `SystemOverflowExceptionClass`
directly.
These predicates currently take a pair of `IRBlock`s - as it stands, at
most one edge can exist from one `IRBlock` to a given other `IRBlock`.
We may need to revisit that assumption and create an `IREdge` IPA type
at some future date
There are a few IR APIs that we've found to be confusingly named. This PR renames them to be more consistent within the IR and with the AST API:
`Instruction.getFunction` -> `Instruction.getEnclosingFunction`: This was especially confusing when you'd call `FunctionAddressInstruction.getFunction` to get the function whose address was taken, and wound up with the enclosing function instead.
`Instruction.getXXXOperand` -> `Instruction.getXXX`. Now that `Operand` is an exposed type, we want a way to get a specific `Operand` of an `Instruction`, but more often we want to get the definition instruction of that operand. Now, the pattern is that `getXXXOperand` returns the `Operand`, and `getXXX` is equivalent to `getXXXOperand().getDefinitionInstruction()`.
`Operand.getInstruction` -> `Operand.getUseInstruction`: More consistent with the existing `Operand.getDefinitionInstruction` predicate.
This adds `IntegerPartial.qll`, which is similar to
`IntegerConstant.qll` except that it contains partial functions on
integers instead of total functions on optional integers. This speeds up
the constant analysis so it takes 6.5s instead of 10.3s on comdb2.
This does to `SSAConstruction` what the previous commit did to
`IRConstruction`. An instruction in `SSAConstruction` is now defined in
terms of how it was created rather than what it can be queried for.
Effectively, this defines `TInstruction` as `TInstructionTag` was
defined before and then removes `TInstructionTag` from
`SSAConstruction`. This also has the benefit of removing the concept of
an instruction tag from the public predicates on `Instruction`.
This definition was denormalized to the extent that an instruction was
defined in terms of the six main attributes it could be queried for.
This made it possible to do multi-column joins on those six attributes,
but it doesn't appear that this feature was useful in practice. The main
multi-column join that was in use was on the pair of
(`TranslatedElement, InstructionTag`), but the `TranslatedElement` was
not part of the `TInstruction`.
This commit changes `TInstruction` to be defined in terms of what it's
_built from_ (`TranslatedElement, InstructionTag`) instead. This makes
it possible to do multi-column joins on those two components, and then
there are separate predicates (usually with two columns) to query
instruction attributes, replacing the many uncached projections from
`MkInstruction` that were generated before.
An immediate advantage is that an `Expr` with multiple types will no
longer give rise to multiple `Instruction`s, fixing most of the errors
from the sanity query `ambiguousSuccessors`. The code inside
`IRConstruction.qll` becomes simpler and hopefully faster as there is no
longer a translation from `TranslatedElement` to `Locatable` and back
again.
The previous commit had the side effect that `IRVariable`s were created
for all `Functions`, including those that did not have IR. This commit
restricts all `TIRVariable` constructors to functions that have IR.
The recent change to `AccessorCall` on dd99525566 resulted
in some bad join-orders, so I have (partly) reverted them. This means that the issues
orignally addressed by that change are now reintroduced, and I plan to instead apply a
fix to the CFG, which--unlike the original fix--should be able to handle multi-property-tuple
assignments.
This doesn't make it much faster, but it reduces the debug output
volume. It also simplifies the code.
I've found this change necessary when I compute the full IR on a
Wireshark snapshot in QL4E. Without it, Eclipse runs out of memory
because the console log is too large.
The new predicate `isOrphan` gets inlined into
`ignoreExprAndDescendants`, whose performance improves from
TranslatedElement::ignoreExprAndDescendants#f .. 23.4s (executed 9 times)
to
TranslatedElement::ignoreExprAndDescendants#f ... 4.3s (executed 9 times)
This dramatic improvement is not only due to eliminating a type check in
the recursive case. Removing the type check from the other base cases
also enabled them to get better join orders.
The previous reccomentation changed the behaviour of the code.
A user following the advice might have broken her/his code:
With call-by-value, the original parameter is not changed.
With a call-by-reference, however, it may be changed. To be sure,
nothing breaks by blindly following the advice, suggest to pass a
const reference.
The `SSAConstruction.getNewIRVariable` was very slow on Wireshark. This
was probably because it couldn't join on multiple columns
simultaneously. Instead of improving the join, I observed that the
`TIRVariable` type was the same between all three IR stages except for a
few occurrences of `FunctionIR` that could easily be changed to
`Function`. By sharing `TIRVariable` between all the stages, we avoid
recomputing it and translating it between every stage, turning the slow
`getNewIRVariable` predicate into a no-op.
This change means that later stages of the IR can't introduce new
variables, but that was already the case because
`config/identical-files.json` forced all three `IRVariable.qll` files to
be identical.
This predicate computed a local CP between all defs and uses of the same
virtual variable in a basic block. This wasn't a problem in
`unaliased_ssa`, but it became a huge problem in `aliased_ssa`, probably
because many variables can be modelled with a single virtual variable
there.
Before this commit, evaluation of `aliased_ssa`'s
`variableLiveOnEntryToBlock#ff#antijoin_rhs` on Wireshark took 80
_minutes_. After this commit, that predicate and its immediate
dependencies take around 5 _seconds_.
This relation was almost 40x the size it needed to be on Wireshark
because it lacked a restriction on the `tag` parameter. To implement
that restriction efficiently, I had to split the relation in two to
dictate the join order.
With the fix, `getInstruction` now computes the same as
`getInstructionTranslatedElementAndTag`, so the latter could be
simplified.
I made a corresponding change to `TranslatedElement.getTempVariable` for
the sake of consistency.
A part of `SSAConstruction.getInstructionOperandDefinition` was more
expensive than it had to be. On a ChakraCore snapshot, this changes the
tuple counts from
3020569 ~2% {3} r40 = JOIN OperandTag::TUnmodeledUseOperand#f WITH Instruction::Instruction::getFunction_dispred#ff CARTESIAN PRODUCT OUTPUT FIELDS {Instruction::Instruction::getFunction_dispred#ff.<0>,OperandTag::TUnmodeledUseOperand#f.<0>,Instruction::Instruction::getFunction_dispred#ff.<1>}
62405 ~0% {3} r41 = JOIN r40 WITH Instruction::UnmodeledUseInstruction#class#fffffff ON r40.<0>=Instruction::UnmodeledUseInstruction#class#fffffff.<0> OUTPUT FIELDS {r40.<2>,r40.<1>,r40.<0>}
2868421 ~1% {3} r42 = JOIN r41 WITH Instruction::Instruction::getFunction_dispred#ff_10#join_rhs ON r41.<0>=Instruction::Instruction::getFunction_dispred#ff_10#join_rhs.<0> OUTPUT FIELDS {Instruction::Instruction::getFunction_dispred#ff_10#join_rhs.<1>,r41.<1>,r41.<2>}
62405 ~0% {3} r43 = JOIN r42 WITH Instruction::UnmodeledDefinitionInstruction#class#fffffff ON r42.<0>=Instruction::UnmodeledDefinitionInstruction#class#fffffff.<0> OUTPUT FIELDS {r42.<2>,r42.<1>,r42.<0>}
to
(0s) Starting to evaluate predicate SSAConstruction::Cached::getUnmodeledUseInstruction#ff
(0s) Tuple counts:
62405 ~0% {2} r1 = JOIN Instruction::UnmodeledUseInstruction#class#fffffff WITH Instruction::Instruction::getFunction_dispred#ff ON Instruction::UnmodeledUseInstruction#class#fffffff.<0>=Instruction::Instruction::getFunction_dispred#ff.<0> OUTPUT FIELDS {Instruction::Instruction::getFunction_dispred#ff.<1>,Instruction::Instruction::getFunction_dispred#ff.<0>}
return r1
...
75716 ~0% {3} r40 = JOIN OperandTag::TUnmodeledUseOperand#f WITH FunctionIR::FunctionIR::getUnmodeledDefinitionInstruction#ff CARTESIAN PRODUCT OUTPUT FIELDS {FunctionIR::FunctionIR::getUnmodeledDefinitionInstruction#ff.<0>,OperandTag::TUnmodeledUseOperand#f.<0>,FunctionIR::FunctionIR::getUnmodeledDefinitionInstruction#ff.<1>}
62405 ~0% {3} r41 = JOIN r40 WITH FunctionIR::FunctionIR::getUnmodeledUseInstruction#ff ON r40.<0>=FunctionIR::FunctionIR::getUnmodeledUseInstruction#ff.<0> OUTPUT FIELDS {FunctionIR::FunctionIR::getUnmodeledUseInstruction#ff.<1>,r40.<1>,r40.<2>}
Now that we have `Expr.getParentWithConversions`, we can implement
`TranslatedElement.getRealParent` simpler. This implementation also
avoids recursion.
In theory this query will produce no results on C++ code; in practice, I
suspect the "cpp" suite is often run on code compiled as C, so it is
likely to be worth running anyways.
Rewrite the predicate `succSplits()` and the construction of the IPA type `TSplits`.
The two are now mutually dependent, see more in the comment for the module
`SuccSplits`.
These queries have no results on our test cases in the repo, but
`ambiguousSuccessors` has results on any large C++ code base, and
`unexplainedLoop` has results on Windows builds of ChakraCore.
The syntactic node assiociated with accessor calls was previously always the
underlying member access. For example, in
```
x.Prop = y.Prop;
```
the implicit call to `x.set_Prop()` was at the syntactic node `x.Prop`, while the
implicit call to `y.get_Prop()` was at the syntactic node `y.Prop`.
However, this breaks the invariant that arguments to calls dominate the call itself,
as the argument `y.Prop` for the implicit `value` parameter in `x.set_Prop()` will
be evaluated after the call (the left-hand side in an assignment is evaluated before
the right-hand side).
The solution is to redefine the access call to `x.set_Prop()` to point to the whole
assignment `x.Prop = y.Prop`, instead of the access `x.Prop`. For reads, we still want
to associate the accessor call with the member access.
A corner case arises when multiple setters are called in a tuple assignment:
```
(x.Prop1, x.Prop2) = (0, 1)
```
In this case, we cannot associate the assignment with both `x.set_Prop1()` and
`x.set_Prop2()`, so we instead revert to using the underlying member accesses as
before.
This predicate was unbearably slow on a ChakraCore snapshot (and
probably everywhere else):
ReachableBlock::getAFeasiblePredecessorBlock#2#ff#antijoin_rhs .. 1m6s
ReachableBlock::getAFeasiblePredecessorBlock#ff#antijoin_rhs .... 31.8s
With this change, the predicate is so fast that it doesn't even show up
in the clause timing report.
It's possible that we only tested this for performance in 1.18, and then
it has regressed in 1.19. Otherwise I can't explain how we've missed
this. I'm using QL for Eclipse 1.20.0.201901070127.
I looked through a few hundred results from this query on lgtm.com and
found that most of the FPs had to do with comment lines ending in `}`.
This change should fix most of them, at the cost of very few false
negatives.
On Wireshark, this query goes from 7,425 results to 6,686 results before
filtering for generated code. Almost all the lost results were FP,
except a handful of results involving initializer lists.
The existing `Node.asExpr` predicate changes semantics so it becomes the
one that most users should use when they don't want to think about
`Conversion`s. A new `Node.asConvertedExpr` predicate is added and has
the same semantics as the old `Node.asExpr` predicate. It's for advanced
users that know about `Conversion`s and want to account for them.
With this change, the `IRDataflowTestCommon.qll` and
`DataflowTestCommon.qll` files use the same definitions of sources and
sinks. Since the IR data flow library is meant to be compatible with the
AST data flow library, this is what we ought to be testing.
Two alerts change but not necessarily for the right reasons.
Data flow nodes for expressions do not take CFG splitting into account. Example:
```
if (b)
x = tainted;
x = x.ToLower();
if (!b)
Use(x);
```
Flow is incorrectly reported from `tainted` to `x` in `Use(x)`, because the step
from `tainted` to `x.ToLower()` throws away the information that `b = true`.
The solution is to remember the splitting in data flow expression nodes, that is,
to represent the exact control flow node instead of just the expression. With that
we get flow from `tainted` to `[b = true] x.ToLower()`, but not from `tainted` to
`[b = false] x.ToLower()`.
The data flow API remains unchanged, but in order for analyses to fully benefit from
CFG splitting, sanitizers in particular should be CFG-based instead of expression-based:
```
if (b)
x = tainted;
if (IsInvalid(x))
return;
Use(x);
```
If the call to `IsInvalid()` is a sanitizer, then defining an expression node to be
a sanitizer using `GuardedExpr` will be too conservative (`x` in `Use(x)` is in fact
not guarded). However, `[b = true] x` in `[b = true] Use(x)` is guarded, and to help
defining guard-based sanitizers, the class `GuardedDataFlowNode` has been introduced.
As we prepare to clarify how conversions are treated, we don't want a
`sink(...)` declaration where it's non-obvious which conversions are
applied to arguments.
This code was clearly using `isConstant` as an indirect way of checking
whether `getValue` would have a result. That's no longer valid, so I
changed it to check `getValue` directly.
These files had come out of sync due to 89148a9ec7 and 8c9c316e1b. I
synced the files by replaying the changes that those commits made in
`aliased_ssa/` to the two other copies.
The CFG construction code previously contained half of an approximation
of which address expressions are constant. Now this this property is
properly modelled by `Expr.isConstant`, we can remove this code.
This fixes most discrepancies between the QL-based CFG and the
extractor-based CFG on Wireshark.
Before this change, `Expr.isConstant` only was only true for those
constant expressions that could be represented as QL values: numbers,
Booleans, and string literals. It was not true for string literals
converted from arrays to pointers, and it was not true for addresses of
variables with static lifetime.
The concept of a "constant expression" varies between C and C++ and
between versions of the standard, but they all include addresses of data
with static lifetime. These are modelled by the new library
`AddressConstantExpression.qll`, which is based on the code in
`EscapesTree.qll` and modified for its new purpose.
I've tested the change for performance on Wireshark and for correctness
with the included tests. I've also checked on Wireshark that all static
initializers in C files are considered constant, which was not the case
before.
This change suppresses results from "Declaration hides parameter" where
the ParameterDeclarationEntry does not link up to the right
FunctionDeclarationEntry.
Bad magic ended up in `LocalVariable.getFunction` and effectively
created a Cartesian product. Before this change, the timing looked like
this:
Variable::LocalVariable::getFunction_dispred#bb ... 50.1s
#select#cpe#123#fff ............................... 20.6s
After this change, those predicates become much faster:
Variable::LocalVariable::getFunction_dispred#ff ... 121ms
DeclarationHidesParameter::localVariableNames#fff . 77ms
#select#cpe#123#fff ............................... 28ms
Introducing the predicate `localVariableNames` ensures that we can do
the main join on two columns simultaneously, so that's a change we
should keep even if we remove the `pragma[nomagic]` later.
This test was intended to catch regressions in the CFG, but it looks
like it's just catching insignificant extractor changes. The test has
started failing after some recent extractor changes, but I have no way
to pinpoint the failure and understand whether it's a problem or not, so
I think it's better to delete this test.
The remaining tests check whether the QL-based CFG generates the same
graph as the extractor-based CFG. Furthermore, the `successor-tests`
check that the extractor-based CFG works as intended.
We agreed in the review of the original PR that `getName` is more
appropriate here than `getQualifiedName`. Using `getName` ensures that
we also match the `std::`-prefixed versions of these functions as well
as user-defined versions.
This commit doesn't change any behavior but just uses the preferred
high-level predicates. The `getChild` predicate inspects the raw
database more or less directly, and the database layout could change in
the future.
This should make the documentation more in line with the documentation
for our other queries. The @name of the query is changed to "Use of
string copy function in a condition".
https://github.com/Semmle/ql/pull/698 removed `document.cookie` as a remote flow source, which some of the tests relied on. We now use `location.search` instead.
This recursive predicate is made faster by working around a known
optimizer problem (QL-796) that causes the optimizer to insert extra
type checks in recursive case even when they are only needed in the
base case.
If a summary does not specify a configuration, it is taken to apply to all configurations without custom sanitisers/barriers.
If a source summary does not specify a flow label, `data` is assumed.
If a sink summary does not specify a flow label, both `data` and `taint` are assumed.
Flow step summaries cannot omit flow labels.
Note that the standard extraction queries always provide explicit configurations and flow labels, and hence do not exercise this functionality.
This reduces the number of bounds computed, and will simplify use of the
library. The resulting locations in the tests may be slightly strange,
because the example `Instruction` for a `ValueNumber` is the first
appearing in the IR, regardless of source order, and may not be the most
closely related `Instruction` to the bounded value. I think that's worth
doing for the performance and usability benefits.
It now uses a facade pattern similar to `InvokeNode`: the range of the class is defined by an abstract class `DataFlow::SourceNode::Range`, while the actual behaviour is defined by the (no longer abstract) `SourceNode` class itself.
Clients that want to add new source nodes need to extend `DataFlow::SourceNode::Range`, those that want to refine the behaviour of existing source nodes should extend `DataFlow::SourceNode` itself.
While this is technically a breaking API change, I think separating the two aspects in this way is cleaner and makes it easier to use, and improves performance as well.
I changed to detect any logical operation usage (i.e. !, ==), but I kept usage in a conditional directly as a separate detection condition. I found no false positives on the projects you shared with me previously.
This implements calculation of the control-flow graph in QL. The new
code is not enabled yet as we'll need more extractor changes first.
The `SyntheticDestructorCalls.qll` file is a temporary solution that can
be removed when the extractor produces this information directly.
Add more tests for duplicated entities, and fix some duplicated entities.
Update the TupleTypes output - some extraneous results gone so it's probably better.
Instead of only considering a fixed set of paths for `dotnet` and `mono`,
first attempt to lookup the paths based on the `PATH` environment variable.
This change also fixes a potential `System.IO.DirectoryNotFoundException` exception,
which could be thrown when the `shared/Microsoft.NETCore.App` folder was not
present.
It turns out that we still need `getURL()` to account for cases where there is no
`getLocation()`. Not having `getURL()` for entities without a `getLocation()` results
in a `file://0:0:0:0` URL, which is not rendered in QL4E, unlike a `""` URL.
The existing unreachable IR removal code only retargeted an infeasible edge to an `Unreached` instruction if the successor of the edge was an unreachable block. This is too conservative, because it doesn't remove an infeasible edge that targets a block that is still reachable via other paths. The trivial example of this is `do { } while (false);`, where the back edge is infeasible, but the body block is still reachable from the loop entry.
This change retargets all infeasible edges to `Unreached` instructions, regardless of the reachability of the successor block.
With these changes we can run `odasa prepareQueries --check-only
--fail-on-warnings` on the C++ query directory. Two changes were needed:
1. The `Metrics/queries.xml` file had to be deleted. It existed because
the built distribution has a different file layout, where `Metrics`
is moved to the top-level query dir `odasa-cpp-metrics`. Since
internal PR 28230 this file is created as needed as part of the dist
build process, so it doesn't need to be checked in with the sources.
2. All uses of the `deprecated` and stubbed-out Objective C classes were
removed.
The AST-based data flow libraries and the IR-based ones both define
modules `DataFlow`, `DataFlow2`, etc. This caused
`ImportAdditionalLibraries.ql` to fail in compilation.
Computing strings and locations for CIL instructions can be quite time consuming.
The CIL `toString()`s are not very helpful in path explanations, and their locations
are only useful when a PDB source file exists. Therefore, produce a simple constant
`toString()`, and restrict locations to those in PDB files.
This change removes any IR instructions that can be statically proven unreachable. To detect unreachable IR, we first run a simple constant value analysis on the IR. Then, any `ConditionalBranch` with a constant condition has the appropriate edge marked as "infeasible". We define a class `ReachableBlock` as any `IRBlock` with a path from the entry block of the function. SSA construction has been modified to operate only on `ReachableBlock` and `ReachableInstruction`, which ensures that only reachable IR gets translated into SSA form. For any infeasible edge where its predecessor block is reachable, we replace the original target of the branch with an `Unreached` instruction, which lets us preserve the invariant that all `ConditionalBranch` instructions have both a true and a false edge, and allows guard inference to still work.
The changes to `SSAConstruction.qll` are not as scary as they look. They are almost entirely a mechanical replacement of `OldIR::IRBlock` with `OldBlock`, which is just an alias for `ReachableBlock`.
Note that the `constant_func.ql` test can determine that the two new test functions always return 0.
Removing unreachable code helps get rid of some common FPs in IR-based dataflow analysis, especially for constructs like `while(true)`.
This change moves the simple constant analysis that was used by the const_func test into a pyrameterized module for use on any stage of the IR. This will be used to detect unreachable code.
A method such as
```
void M()
{
throw new Exception();
}
```
was incorrectly not categorized as a `ThrowingCallable`, that is, a callable
that always throws an exception upon invocation.
This sort of fixes one FP and causes a new FN, but for the wrong reasons. The IR dataflow is tracking the reference itself, rather than the referred-to object. Once we can better model indirections, we can make this work correctly.
This change is still the right thing to do, because it ensures that the dataflow is looking at actual expression being computed by the instruction.
I've separated the model interface for memory side effects from the model for escaped addresses. It will be fairly common for a given model to extend both interfaces, but they are used for two different purposes.
I've also put each model interface and the non-member predicates that query it into a named module, which seemed cleaner than having predicates named `functionModelReadsMemory()` and `getFunctionModelParameterAliasBehavior()`.
This commit fixes a few issues that were identified during the last dist upgrade,
and which were introduced/revealed on 836daaf07b.
- Expand environment variables that are passed from `lgtm.yml` to the autobuilder,
for example `solution: $LGTM_SRC/mysolution.sln`.
- Distinguish between when a build rule is applied automatically and when it is applied
manually via `lgtm.yml`.
- Catch `FileNotFoundException`s when parsing project files and solution files.
These type checks were overlapping with `assignOperatorWithWrongType` is
are no longer needed now that `assignOperatorWithWrongType` is improved.
They were causing FPs and misleading error messages on uninstantiated
templates.
Adding this call to `getUnspecifiedType` makes the error message better
in the presence of typedefs and qualifiers on an assignment operator
return type. It's also needed to avoid losing valid results in the
commit that comes after this.
The predicate `AlwaysTrueUponEntryLoop.getARelevantVariable` was very
sensitive to join ordering, and with the 1.19 QL engine it got an
unfortunate join order that made it explode on certain snapshots. With
this change, it goes from taking minutes to taking less than a second on
a libretro-uae snapshot.
Made `Node::getType()`, `Node::asParameter()`, and `Node::asUninitialized()` operate directly on the IR. This actually fixed several diffs compared to the AST dataflow, because `getType()` wasn't holding for nodes that weren't `Exprs`.
Made `Uninitialized` a `VariableInstruction`. This makes it consistent with `InitializeParameter`.
This commit adds a new model interface that describes the known side effects (or lack thereof) of a library function. Does it read memory, does it write memory, and do any of its parameters escape? Initially, we have models for just two Standard Library functions: `std::move` and `std::forward`, which neither read nor write memory, and do not escape their parameter.
IR construction has been updated to insert the correct side effect instruction (or no side effect instruction) based on the model.
This fixes a subtle bug in the construction of aliased SSA. `getResultMemoryAccess` was failing to return a `MemoryAccess` for a store to a variable whose address escaped. This is because no `VirtualIRVariable` was being created for such variables. The code was assuming that any access to such a variable would be via `UnknownMemoryAccess`. The result is that accesses to such variables were not being modeled in SSA at all.
Instead, the way to handle this is to have a `VariableMemoryAccess` even when the variable being accessed has escaped, and to have `VariableMemoryAccess::getVirtualVariable()` return the `UnknownVirtualVariable` for escaped variables. In the future, this will also let us be less conservative about inserting `Chi` nodes, because we'll be able to determine that there's an exact overlap between two accesses to the same escaped variable in some cases.
The AST dataflow library essentially ignores conversions, which is probably the right behavior. Converting an `int` to a `long` preserves the value, even if the bit pattern might be different. It's arguable whether narrowing conversions should be treated as dataflow, but we'll do so for now. We can revisit that if we see it cause problems.
For example, in
```
void M(object x)
{
var y = x == null ? 1 : 2;
if (y == 2)
x.ToString();
}
```
the guard `y == 2` implies that the guard `x == null` must be false,
as the assignment of `2` to `y` is unique.
For example, in
```
void M(object x)
{
var y = x != null ? "" : null;
if (y != null)
x.ToString();
}
```
the guard `y != null` implies that the guard `x != null` must be true.
All the filtering is now done in `getALikelyCallee`, to which I have also added an additional parameter that improves the join in the `select` clause.
I've also simplified the alert message to no longer use `toString`, which isn't meant for alert messages anyway. (This is an old query.)
I noticed that queries using the data flow library spent significant
time in `#Dominance::bbIDominates#fbPlus`, which is the body of the
`bbStrictlyDominates` predicate. That predicate took 28 seconds to
compute on Wireshark.
The `b` in the predicate name means that magic was applied, and the
application of magic meant that it could not be evaluated with the
built-in `fastTC` HOP but became an explicit recursion instead. Applying
`pragma[nomagic]` to this predicate means that we will always get it
evaluated with `fastTC`, and that takes less than a second in my test
case.
The internal pre-SSA library was extended on 3e78c2671f
to include fields/properties that are local-scope-like. The CFG splitting logic
uses ranking of SSA definitions to define an (arbitrary) order of splits, but for
fields/properties the implicit entry definition all have the same line and column.
In effect, such SSA definitions incorrectly get the same rank. Adding the name
of the field/property to the lexicographic ordering resolves the issue.
By pulling this out of the condition we can avoid computing its negation for the `else` branch, which could previously lead to quite an enormous pipeline.
This predicate was fast with the queries and engine from 1.18. With the
queries from `master` it got a bad join order in the
`UninitializedLocal.ql` query, which made it take 2m34s on Wireshark.
This commit decomposes `bbEntryReachesLocally` into two predicates that
together take only 4s.
The `nullValue` predicate performs a slow custom data-flow analysis to
find possible null values. It's so slow that it timed out after 1200s on
Wireshark.
In `UnsafeCreateProcessCall.ql`, the values found with `nullValue` were
used as sources in another data-flow analysis. By using the `NullValue`
class as sink instead of `nullValue`, we avoid the slow-down of doing
data flow twice. The `NullValue` class is essentially the base case of
`nullValue`. Confusing names, yes.
Just like syntax elements can be split in the control flow graph, so can SSA
definitions. To make this clear, and to make debugging easier, this commit
adds the splits as a prefix in the textual representation of SSA definitions.
On 03e69e9945, I updated the guards library to account
for control flow graph splitting. However, the logic that relates SSA qualifiers for
the guard and the guarded expression was not updated accordingly.
This generalises our previous handling of sanitisers operating on property accesses to support dynamic property accesses where the property name is an SSA variable by representing them as access paths.
This commit adds Chi nodes to the successor relation and accounts for
them in the CFG, but does not add them to the SSA data graph. Chi nodes
are inserted for partial writes to any VirtualVariable, regardless of
whether the partial write reaches any uses.
This enables the addition of new instructions in later phases of IR
construction; in particular, aliasing write instructions and inference
instructions.
When determining the target of `msbuild` or `dotnet build`, first look for `.proj`
files, then `.sln` files, and finally `.csproj`/`.vcxproj` files. In all three cases,
choose the project/solution file closest to the root.
This test was failing due to a semantic merge conflict between #509,
which added `UninitializedInstruction`, and #517, which added new test
code that would get `UninitializedInstruction`s in it after merging with #509.
This adds a `NewOrNewArrayExpr.getPlacementPointer` predicate and uses
it in `Alloc.qll` to detect when a `new`-expression is not an
allocation.
User-defined replacements for `operator new` may not be allocations
either, but the code continues to assume that they are. It's possible
that we want to change this assumption in the future or leave it up to
individual queries to decide on which side to err. It's hard to
statically tell whether `operator new` has been overloaded in a
particular file because it can be overloaded by a definition that is not
in scope but is only linked together with that file.
An slightly invalid AST can cause IR construction to generate extremely bad IR. This change provides a single place to detect invalid ASTs, and to skip IR construction for the affected functions.
This change provides a mechanism by which a query can tell the IR package to only create IR for certain functions. This is mostly useful for "PrintIR.qll", which uses this feature to avoid the expense of creating IR for functions that aren't going to be printed.
This adds default exclusion filters for `**/*.min.js` and `**/*-min.js` to the JavaScript auto-builder, meaning that files matching these patterns will no longer be extracted,
unless they are re-included in the `.lgtm.yml` file.
Alerts in minified code aren't shown by default anyway, so we can save ourselves some work by not analyzing them in the first place.
While including minified files in the snapshot can in theory improve analysis results in non-minified files, this is likely to be rare in practice.
This commit inserts an `Uninitialized` instruction to "initialize" a local variable when that variable is initialized with an initializer list. This ensures that there is always a definition of the whole variable before any read or write to part of that variable.
This change appears in a different form in @rdmarsh2's Chi node PR, but I needed to refactor the initialization code anyway to handle ConditionDeclExpr.
This test exhibits two issues with Boolean CFG splitting: incorrect handling of
negated variables and incorrect splitting for variables defined inside a loop.
This rule, named "No virtual destructor", was supposed to be superseded
by `cpp/virtual-destructor` in 0c796de83, but that commit didn't
actually disable this rule, so both rules are now active in the LGTM
suite.
This commit disables the rule by removing `@precision`. We're still
discussing the best way to disable rules that are precise and valid but
not universally applicable. For now, removing `@precision` is consistent
with how we're keeping most other JSF queries from appearing on LGTM.
For all of these queries, the results we tend to see in practice are certainly worth investigating, but aren't crashing bugs, so making them warnings seems more appropriate.
This is not so much because extractor output has changed (it hasn't, except for corner cases) but to disable trap caching so as to help us to flush out bugs.
We now use module auto-detection and no TypeScript mode.
This only affects extern extraction in `AutoBuild`, everything else sets these options explicitly.
We currently do not have any ES2015 modules or TypeScript code in our externs, so in practice this is behaviour-preserving.
Marked as Low precision as Linux kernel code mix the usage of logical operators and bit-wise opeartors.
warning C6317: incorrect operator: logical-not (!) is not interchangeable with ones-complement (~)
The Java guards library includes a set of "implies" predicates to handle
short-circuiting conditionals. C++ handles those in IR generation, so
dominance on the IR produces correct results for controlling blocks.
Without the last commit, this addition to the test gives the following
results:
```
+| AV Rule 82.cpp:176:14:176:22 | operator= | Assignment operator in class Forgivable does not return a reference to *this. |
+| AV Rule 82.cpp:181:14:181:22 | operator= | Assignment operator in class Forgivable does not return a reference to *this. |
```
I removed this condition in #362, thinking it was covered by the new
conditions on return statements, but it turns out it wasn't in at least
the following cases.
1. Assignment operators that are deleted or marked private in order to
make them inaccessible.
2. Templates whose body was not extracted.
While some of these results are technically valid, they are not nearly
as interesting as the results that this query was designed to produce.
AV Rule 78 has proved too noisy for use on lgtm.com. However, if we make the rule less noisy by, say, allowing a protected destructor to be non-virtual, we're no longer actually enforcing AV Rule 78. Instead, I've copied AV Rule 78 into NonVirtualDestructorInBaseClass.ql, given the new query the `@id` that AV Rule 78 had, and given AV Rule 78 a new JSF-specific `@id`. The new rule allows non-public non-virtual destructors, which is the problem originally reported by an lgtm.com user.
The previous formulation of this predicate caused a CP in snapshots
where a variable had a large number of definitions and also reached a
large number of sub-basic-blocks.
This should fix performance of https://github.com/FrodeSolheim/fs-uae
and https://github.com/libretro/libretro-uae.
The `FlowVar.getAnAccess` predicate is still at risk of CP'ing when a
large group of defs has a large group of uses, but that has not been
observed to happen in practice yet. We would need to make
`localFlowStep` expose phi definitions in order to avoid that risk.
We previously removed our entry because the notifications got too noisy,
but we agreed recently in the C++ analysis team to try adding an entry
with just the analysis team and only in the public repository.
@rdmarsh2 has been working on various queries and libraries on top of the IR, and has pointed out that having to always refer to an operand of an instruction by the pair of (instruction, operandTag) makes using the IR a bit clunky. This PR adds a new `Operand` IPA type that represents an operand of an instruction. `OperandTag` still exists, but is now an internal type used only in the IR implementation.
Superset of C6293, it looks for a mismatch between the initialization statement && condition and the direction of the iteration expression in a for loop.
It still runs on uninstantiated templates because its underlying
libraries do. It's not clear whether that leads to other false
positives, but that's independent of the change I'm making here.
Certain Microsoft projects, such as CoreCLR and ChakraCore, use a
library called the PAL, which enables two-byte strings in the printf
family of functions, even when built on a platform with four-byte
strings. This adds support for determining the size of a wide character
from the definitions of such functions, rather than assuming that they
match the compiler's wchar_t.
When constructing a path through a property write/read pair, we want to make sure that we only use value-preserving steps to track the base object. However, the value flowing in from the right-hand side of the assignment may have a different flow label (such as `taint()`), so we cannot use the normal `append` predicate to construct the composite path.
It still runs on uninstantiated templates because its underlying
libraries do. It's not clear whether that leads to other false
positives, but that's independent of the change I'm making here.
Previously, `Instruction.toString()` returned the same string that is used in IR dumps, which requires numbering all instructions and generating a unique string for each instruction. This is too expensive on large snapshots. I've moved the original code into the new `Instruction.getDumpString()`, and made `Instruction.toString()` just return the opcode plus `getAST().toString()`.
This is motivated by test performance; IR compilation happens separately
for each test and takes a bit over a minute, so combining these 8 tests
saves about 10 minutes of test running.
The IR for the conversion to bool results in a comparison where the left
hand side is not the result of any expression in the AST, so they can't
be usefully converted back to the AST
We have external users editing queries with Visual Studio, and it seems
to automatically add very specific files to `.gitignore`. These changes
cause conflicts between unrelated PRs.
This commit adds all of `/.vs` to `.gitignore`, which should hopfully
make Visual Studio stop adding more entries.
For ease of reviewing, I've checked in the .expected files from the
AST-based guards library. The next commit accepts output for these tests
and adds tests that use getAST rather than the translation layer.
It turns out that when building aliased SSA IR, we were still keeping around the Phi instructions from unaliased SSA IR. These leftover instructions didn't show up in dumps because they were not assigned to a block. However, when dumping additional instruction properties, they would show up as a top-level node in the dump, without a label.
The `InstructionSanity::duplicateOperand` predicate used `count` instead
of `strictcount`. The 0-case of this `count` was as large as the
Cartesian product of `Instruction` and `OperandTag`, which made
`duplicateOperand` take forever to compute on large snapshots.
This makes two changes to how example exprs are selected. Example exprs
are now ordered separately by each piece of the location, rather than by
stringifying their location. Second, UnknownLocations are now ordered
after locations with absolute paths, by using "~" in the lexicographic
comparison of absolute paths. I think this works on both POSIX and
Windows systems, but it's possible I'm missing a way to start an
absolute path with a unicode character.
This commit was reverted on `master` but should remain on `next`, so I'm
reverting the revert before merging `master` into `next`.
This reverts commit adda4c91cf.
This commit to update test changes got merged to Semmle/ql master but
doesn't belong there because it's not compatible with how the 1.18
extractor works. The corresponding extractor change got merged to the
internal-repo master right after the internal branch for 1.18 was taken.
This reverts commit d4f9b5eb52.
Globals can still have declarations in declaration statements.
We already rule out local variables etc via the isTopLevel check,
so we don't need to consider DeclStmt.
ZipSlip can be avoided by checking that the combined and resolved
path `StartsWith` the appropriate destination directory. Refine the
`StartsWith` sanitizer to:
* Consider expressions guarded by an appropriate StartsWith check to be
sanitized.
* Consider a StartsWith check to be inappropriate if it is checking the
result of `Path.Combine`, as that has not been appropriately resolved.
Tests have been updated to reflect this refinement.
Use the GetFileName call as a sanitizer, rather than an argument to that
call. It is the _result_ of the GetFileName call which should be
considered sanitized. By using the argument, we can spuriously suppress
use-use flow. Consider:
```
var path = Path.Combine(destDir, entry.GetFullName());
var fileName = Path.GetFileName(path);
log("Extracting " + fileName);
entry.ExtractToFile(path);
```
Previously, the `ExtractToFile(path)` call would not have been flagged,
because the `path` argument to `GetFileName` was considered sanitized,
and that argument formed a use-use pair with the `path` argument to
`ExtractToFile`. Now, this result would be flagged because only the
result of the `GetFileName` call is considered sanitized.
@@ -4,8 +4,8 @@ This open source repository contains the standard QL libraries and queries that
## How do I learn QL and run queries?
LGTM has [extensive documentation](https://lgtm.com/help/ql/introduction-to-ql) on getting started with writing QL.
You can use the [interactive query console](https://lgtm.com/help/lgtm/using-query-console) or the [QL for Eclipse](https://lgtm.com/help/lgtm/running-queries-ide) plugin to try out your queries on any open-source project that's currently being analyzed.
There is [extensive documentation](https://help.semmle.com/QL/learn-ql/) on getting started with writing QL.
You can use the [interactive query console](https://lgtm.com/help/lgtm/using-query-console) on LGTM.com or the [QL for Eclipse](https://lgtm.com/help/lgtm/running-queries-ide) plugin to try out your queries on any open-source project that's currently being analyzed.
## Contributing
@@ -13,4 +13,4 @@ We welcome contributions to our standard library and standard checks. Do you hav
## License
The LGTM queries are licensed under [Apache License 2.0](LICENSE) by [Semmle](https://semmle.com).
The QL queries in this repository are licensed under [Apache License 2.0](LICENSE) by [Semmle](https://semmle.com).
| Cast between `HRESULT` and a Boolean type (`cpp/hresult-boolean-conversion`) | security, external/cwe/cwe-253 | Finds logic errors caused by mistakenly treating the Windows `HRESULT` type as a Boolean instead of testing it with the appropriate macros. Results are shown on LGTM by default. |
| Cast from `char*` to `wchar_t*` (`cpp/incorrect-string-type-conversion`) | security, external/cwe/cwe-704 | Detects potentially dangerous casts from `char*` to `wchar_t*`. Results are shown on LGTM by default. |
| Dead code due to `goto` or `break` statement (`cpp/dead-code-goto`) | maintainability, external/cwe/cwe-561 | Detects dead code following a `goto` or `break` statement. Results are shown on LGTM by default. |
| Inconsistent direction of for loop (`cpp/inconsistent-loop-direction`) | correctness, external/cwe/cwe-835 | Detects `for` loops where the increment and guard condition don't appear to correspond. Results are shown on LGTM by default. |
| Incorrect 'not' operator usage (`cpp/incorrect-not-operator-usage`) | security, external/cwe/cwe-480 | Finds uses of the logical not (`!`) operator that look like they should be bit-wise not (`~`). Results are hidden on LGTM by default. |
| Non-virtual destructor in base class (`cpp/virtual-destructor`) | reliability, readability, language-features | This query, `NonVirtualDestructorInBaseClass.ql`, is a replacement in LGTM for the query: No virtual destructor (`AV Rule 78.ql`). The new query ignores base classes with non-public destructors since we consider those to be adequately protected. The new version retains the query identifier, `cpp/virtual-destructor`, and results are displayed by default on LGTM. The old query is no longer run on LGTM. |
| `NULL` application name with an unquoted path in call to `CreateProcess` (`cpp/unsafe-create-process-call`) | security, external/cwe/cwe-428 | Finds unsafe uses of the `CreateProcess` function. Results are hidden on LGTM by default. |
| Setting a DACL to `NULL` in a `SECURITY_DESCRIPTOR` (`cpp/unsafe-dacl-security-descriptor`) | security, external/cwe/cwe-732 | Finds code that creates world-writable objects on Windows by setting their DACL to `NULL`. Results are shown on LGTM by default. |
| Comparison result is always the same (`cpp/constant-comparison`) | Fewer false positive results | Comparisons in template instantiations are now excluded from results. |
| Empty branch of conditional (`cpp/empty-block`) | Fewer false positive results | Now recognizes commented blocks more reliably. |
| Expression has no effect (`cpp/useless-expression`) | Fewer false positive results | Expressions in template instantiations are now excluded from results. |
| Missing return statement (`cpp/missing-return`) | Fewer false positive results, visible by default | Improved results when a function returns a template-dependent type, or makes a non-returning call to another function. Precision increased from 'medium' to 'high' so that alerts are shown by default in LGTM. |
| Multiplication result converted to larger type (`cpp/integer-multiplication-cast-to-long`) | Fewer false positive results | Char-typed numbers are no longer considered to be potentially large. |
| No virtual destructor (`cpp/jsf/av-rule-78`) | No results in LGTM | This query is part of the [Joint Strike Fighter](http://www.stroustrup.com/JSF-AV-rules.pdf) suite which defines strict coding rules for air vehicles. Its query identifier has been revised to reflect this. On LGTM this query has been replaced by the similar query "Non-virtual destructor in base class", see New queries above. The new query highlights only code that is likely to be a problem in the majority of projects. |
| Overloaded assignment does not return 'this' (`cpp/assignment-does-not-return-this`) | Fewer false positive results | Any return statements that are unreachable are now ignored. |
| Resource not released in destructor (`cpp/resource-not-released-in-destructor`) | Fewer false positive results | No longer highlights uses of C++ _placement new_ and results are no longer reported for resources where the destructor body is not in the snapshot database. |
| Self comparison (`cpp/comparison-of-identical-expressions`) | Fewer false positive results | Code inside macro invocations is now excluded from the query. |
| Static array access may cause overflow (`cpp/static-buffer-overflow`) | More correct results | Data flow to the `size` argument of a buffer operation is now checked in this query. |
| Suspicious add with sizeof (`cpp/suspicious-add-sizeof`) | Fewer false positive results | Arithmetic with void pointers (where allowed) is now excluded from results. |
| Unsigned comparison to zero (`cpp/unsigned-comparison-zero`) | Fewer false positive results | Comparisons in template instantiations are now excluded from results. |
| Wrong type of arguments to formatting function (`cpp/wrong-type-format-argument`) | Fewer false positive results | False positive results involving `typedef`s have been removed. Expected argument types are determined more accurately, especially for wide string and pointer types. Custom (non-standard) formatting functions are also identified more accurately. |
| Array offset used before range check (`cpp/offset-use-before-range-check`) | More results and fewer false positive results | Now recognizes array accesses in different positions within the expression. Code where the range is checked before and after the array access is no longer highlighted. |
| AV Rule 164 (`cpp/jsf/av-rule-164`) | Fewer false positive results | Now accounts for explicit casts. |
| Call to memory access function may overflow buffer (`cpp/overflow-buffer`) | More correct results | Array indexing with a negative index is now detected by this query. |
| Global could be static (`cpp/jpl-c/limited-scope-file` and `cpp/power-of-10/global-could-be-static`)| Fewer false positive results | Variables with declarations in header files are now excluded from results. |
| Memory is never freed (`cpp/memory-never-freed`)| Fewer false positive results | No longer highlights uses of C++ _placement new_, which returns a pointer that does not need to be freed. |
| Negation of unsigned value (`cpp/jsf/av-rule-165`) | Fewer false positive results | Now accounts for explicit casts. |
| Suspicious call to memset (`cpp/suspicious-call-to-memset`) | Fewer false positive results | Types involving `decltype` are now correctly compared. |
| Variable scope too large (`cpp/jpl-c/limited-scope-function` and `cpp/power-of-10/variable-scope-too-large`) | Fewer false positive results | Variables with declarations in header files, or that are used at file scope, are now excluded from results. |
## Changes to QL libraries
* New hash consing library (`semmle.code.cpp.valuenumbering.HashCons`) for structural comparison of expressions. Unlike the existing library for global value numbering, this library implements a pure syntactic comparison of expressions and will equate expressions even if they may not compute the same value.
* The `Buffer.qll` library has more conservative treatment of arrays embedded in structs. This reduces false positive results in a number of security queries, especially `cpp/overflow-buffer`.
* Pre-C99 encodings of _flexible array members_ are recognized more reliably.
* Arrays of zero size are now treated as a special case.
* The library `semmle.code.cpp.dataflow.RecursionPrevention` is now deprecated. It was an aid for transitioning data-flow queries from 1.16 to 1.17, and it no longer has any function. Imports of this library should simply be deleted.
During code extraction, when determining the target of `msbuild` or `dotnet build`, the autobuilder now looks for:
*`.proj` files,
* then `.sln` files,
* and finally `.csproj`/`.vcxproj` files.
In all three cases, when multiple files of the same type are found, the project/solution file closest to the root is used to build the project.
### Control flow graph improvements
* The control flow graph construction now takes simple Boolean conditions on local scope variables into account. For example, in `if (b) x = 0; if (b) x = 1;`, the control flow graph will reflect the fact that taking the `true` (resp. `false`) branch in the first condition implies taking the same branch in the second condition. In effect, the first assignment to `x` will now be identified as being dead.
* Code that is only reachable from a constant failing assertion, such as `Debug.Assert(false)`, is considered to be unreachable.
| Uncontrolled format string (`cs/uncontrolled-format-string`) | security, external/cwe/cwe-134 | Finds data flow from remote inputs to the format string in `String.Format`. Results are shown on LGTM by default. |
| Using a package with a known vulnerability (`cs/use-of-vulnerable-package`) | security, external/cwe/cwe-937 | Finds project build files that import packages with known vulnerabilities. Results are shown on LGTM by default. |
| Cross-site scripting (`cs/web/xss`) | More results | Finds cross-site scripting vulnerabilities in ASP.NET Core applications. |
| Inconsistent lock sequence (`cs/inconsistent-lock-sequence`) | More results | Finds inconsistent lock sequences globally across calls. |
| Local scope variable shadows member (`cs/local-shadows-member`) | Fewer results | Results have been removed where a constructor parameter shadows a member, because the parameter is probably used to initialize the member. |
## Changes to code extraction
* Arguments passed using `in` are now extracted.
* Fixed a bug where the `dynamic` type name was not extracted correctly in certain circumstances.
* Fixed a bug where method type signatures were extracted incorrectly in some circumstances.
## Changes to QL libraries
*`getArgument()` on `AccessorCall` has been improved so it now takes tuple assignments into account. For example, the argument for the implicit `value` parameter in the setter of property `P` is `0` in `(P, x) = (0, 1)`. Additionally, the argument for the `value` parameter in compound assignments is now only the expanded value, for example, in `P += 7` the argument is `P + 7` and not `7`.
* The predicate `isInArgument()` has been added to the `AssignableAccess` class. This holds for expressions that are passed as arguments using `in`
| Arbitrary file write during archive extraction ("Zip Slip") (`java/zipslip`) | security, external/cwe/cwe-022 | Identifies extraction routines that allow arbitrary file overwrite vulnerabilities. Results are shown on LGTM by default. |
| Missing catch of NumberFormatException (`java/uncaught-number-format-exception`) | reliability, external/cwe/cwe-248 | Finds calls to `Integer.parseInt` and similar string-to-number conversions that might raise a `NumberFormatException` without a corresponding `catch`-clause. Results are hidden on LGTM by default. |
| Array index out of bounds (`java/index-out-of-bounds`) | Fewer false positive results | Results for arrays with a length evenly divisible by 3, or some greater number, and an index being increased with a similar stride length are no longer reported. |
| Confusing overloading of methods (`java/confusing-method-signature`) | Fewer false positive results | A correction to the inheritance relation ensures that spurious results on certain generic classes no longer occur. |
| Query built from user-controlled sources (`java/sql-injection`) | More results | SQL injection sinks from the Spring JDBC, MyBatis, and Hibernate frameworks are now reported. |
| Query built without neutralizing special characters (`java/concatenated-sql-query`) | More results | SQL injection sinks from the Spring JDBC, MyBatis, and Hibernate frameworks are now reported. |
| Unreachable catch clause (`java/unreachable-catch-clause`) | Fewer false positive results | Now accounts for calls to generic methods that throw generic exceptions. |
| Useless comparison test (`java/constant-comparison`) | Fewer false positive results | Constant comparisons guarding `java.util.ConcurrentModificationException` are no longer reported, as they are intended to always be false in the absence of API misuse. |
## Changes to QL libraries
* The class `ControlFlowNode` (and by extension `BasicBlock`) has until now
been directly equatable to `Expr` and `Stmt`. Exploiting these equalities,
for example by using casts, is now deprecated, and the conversions
`Expr.getControlFlowNode()` and `Stmt.getControlFlowNode()` should be used
instead.
* The default set of taint sources in the `FlowSources` library is extended to
cover parameters annotated with Spring framework annotations indicating
remote user input from servlets. This affects all security queries, which
will yield additional results on projects that use the Spring Web framework.
* The `ParityAnalysis` library is replaced with the more general `ModulusAnalysis` library, which improves the range analysis.
* Modeling of taint flow through array and buffer operations has been improved. This may give additional results for the security queries.
* Support for AMD modules has been improved. This may give additional results for the security queries, as well as any queries that use type inference on code bases that use such modules.
* Support for popular libraries has been improved. Consequently, queries may produce more results on code bases that use the following features:
- File system access, for example, through [fs-extra](https://github.com/jprichardson/node-fs-extra) or [globby](https://www.npmjs.com/package/globby)
- Outbound network access, for example, through the [fetch API](https://developer.mozilla.org/en-US/docs/Web/API/Fetch_API)
- The [lodash](https://lodash.com), [underscore](https://underscorejs.org/), [async](https://www.npmjs.com/package/async) and [async-es](https://www.npmjs.com/package/async-es) libraries
* The taint tracking library now recognizes additional sanitization patterns. This may give fewer false-positive results for the security queries.
* Type inference for function calls has been improved. This may give additional results for queries that rely on type inference.
* Path explanations have been added to the relevant security queries.
Use [QL for Eclipse](https://help.semmle.com/ql-for-eclipse/Content/WebHelp/getting-started.html)
to run queries and explore the data flow in results.
| Hard-coded data interpreted as code (`js/hardcoded-data-interpreted-as-code`) | security, external/cwe/cwe-506 | Highlights locations where hard-coded data is transformed and then executed as code or interpreted as an import path, which may indicate embedded malicious code ([CWE-506](https://cwe.mitre.org/data/definitions/506.html)). Results are hidden on LGTM by default. |
| Host header poisoning in email generation (`js/host-header-forgery-in-email-generation`)| security, external/cwe/cwe-640 | Highlights code that generates emails with links that can be hijacked by HTTP host header poisoning, indicating a potential violation of [CWE-640](https://cwe.mitre.org/data/definitions/640.html). Results shown on LGTM by default. |
| Replacement of a substring with itself (`js/identity-replacement`) | correctness, security, external/cwe/cwe-116 | Highlights string replacements that replace a string with itself, which usually indicates a mistake. Results shown on LGTM by default. |
| Stored cross-site scripting (`js/stored-xss`) | security, external/cwe/cwe-079, external/cwe/cwe-116 | Highlights uncontrolled stored values flowing into HTML content, indicating a potential violation of [CWE-079](https://cwe.mitre.org/data/definitions/79.html). Results shown on LGTM by default. |
| Unclear precedence of nested operators (`js/unclear-operator-precedence`) | maintainability, correctness, external/cwe/cwe-783 | Highlights nested binary operators whose relative precedence is easy to misunderstand. Results shown on LGTM by default. |
| Unneeded defensive code (`js/unneeded-defensive-code`) | correctness, external/cwe/cwe-570, external/cwe/cwe-571 | Highlights locations where defensive code is not needed. Results are shown on LGTM by default. |
| Unsafe dynamic method access (`js/unsafe-dynamic-method-access` ) | security, external/cwe/cwe-094 | Highlights code that invokes a user-controlled method on an object with unsafe methods. Results are shown on LGTM by default. |
| Unvalidated dynamic method access (`js/unvalidated-dynamic-method-call` ) | security, external/cwe/cwe-754 | Highlights code that invokes a user-controlled method without guarding against exceptional circumstances. Results are shown on LGTM by default. |
| Useless assignment to property (`js/useless-assignment-to-property`) | maintainability | Highlights property assignments whose value is always overwritten. Results are shown on LGTM by default. |
| Enabling Node.js integration for Electron web content renderers (`js/enabling-electron-renderer-node-integration`) | security, frameworks/electron, external/cwe/cwe-094 | Highlights Electron web content renderer preferences with Node.js integration enabled, indicating a potential violation of [CWE-94](https://cwe.mitre.org/data/definitions/94.html). |
| File data in outbound network request (`js/file-access-to-http`) | security, external/cwe/cwe-200 | Highlights locations where file data is sent in a network request, indicating a potential violation of [CWE-200](https://cwe.mitre.org/data/definitions/200.html). |
| User-controlled data written to file (`js/http-to-file-access`) | security, external/cwe/cwe-434, external/cwe/cwe-912 | Highlights locations where user-controlled data is written to a file, indicating a potential violation of [CWE-912](https://cwe.mitre.org/data/definitions/912.html). |
| Duplicate switch case (`js/duplicate-switch-case`) | Lower severity | Severity revised to "warning". |
| Inconsistent use of 'new' (`js/inconsistent-use-of-new`) | Simpler result presentation | Results show one call with `new` and one without. |
| Information exposure through a stack trace (`js/stack-trace-exposure`) | More results | Cases where the entire exception object (including the stack trace) may be exposed are highlighted. |
| Missing 'this' qualifier (`js/missing-this-qualifier`) | Fewer false positive results | Additional intentional calls to global functions are recognized. |
| Missing CSRF middleware (`js/missing-token-validation`) | Fewer false positive results | Additional types of CSRF protection middleware are recognized. |
| Regular expression injection (`js/regex-injection`) | Fewer false positive results | Calls to `String.prototype.search` are identified with more precision. |
| Remote property injection (`js/remote-property-injection`) | Fewer results | No longer highlights dynamic method calls, which are now handled by two new queries: `js/unsafe-dynamic-method-access` and `js/unvalidated-dynamic-method-call`. The precision of this rule has been revised to "medium", reflecting the precision of the remaining results. Results are now hidden on LGTM by default. |
| Self assignment (`js/redundant-assignment`) | Fewer false positive results | Self-assignments preceded by a JSDoc comment with a `@type` tag are no longer highlighted. |
| Server-side URL redirect (`js/server-side-unvalidated-url-redirection`) | More results and fewer false positive results | More redirection calls are identified. More safe redirections are recognized and ignored. |
| Unbound event handler receiver (`js/unbound-event-handler-receiver`) | Fewer false positive results | Additional ways that class methods can be bound are recognized. |
| Uncontrolled data used in network request (`js/request-forgery`) | More results | Additional kinds of requests are identified. |
| Unknown directive (`js/unknown-directive`) | Fewer false positives results | YUI compressor directives are now recognized. |
| Unused variable, import, function or class (`js/unused-local-variable`) | Fewer false positive results and fewer results | Imports used by the `transform-react-jsx` Babel plugin and fewer variables that may be used by `eval` calls are highlighted. Only one result is reported for an import statement with multiple unused imports. |
| Useless assignment to local variable (`js/useless-assignment-to-local`) | Fewer false positive results | Additional ways default values can be set are recognized. |
| Useless conditional (`js/trivial-conditional`) | More results, fewer false positive results | More types of conditional are recognized. Additional defensive coding patterns are now ignored. |
| Whitespace contradicts operator precedence (`js/whitespace-contradicts-precedence`) | Fewer false positive results | Operators with asymmetric whitespace are no longer highlighted. |
| Wrong use of 'this' for static method (`js/mixed-static-instance-this-access`) | More results, fewer false-positive results | Inherited methods are now identified. |
## Changes to QL libraries
* A `DataFlow::ParameterNode` instance now exists for all function parameters. Previously, unused parameters did not have a corresponding data-flow node.
*`ReactComponent::getAThisAccess` has been renamed to `getAThisNode`. The old name is still usable but is deprecated. It no longer gets individual `this` expressions, but the `ThisNode` mentioned below.
* The `DataFlow::ThisNode` class now corresponds to the implicit receiver parameter of a function, as opposed to an individual `this` expression. This means that `getALocalSource` now maps all `this` expressions within a given function to the same source. The data-flow node associated with a `ThisExpr` can no longer be cast to `DataFlow::SourceNode` or `DataFlow::ThisNode`. Using `getALocalSource` before casting, or instead of casting, is recommended.
* The flow configuration framework now supports distinguishing and tracking different kinds of taint, specified by an extensible class `FlowLabel` (which can also be referred to by its alias `TaintKind`).
The representation of the control flow graph (CFG) has been modified to better reflect the semantics of Python. As part of these changes, a new predicate `Stmt.getAnEntryNode()` has been added to make it easier to write reachability queries involving statements.
#### CFG nodes removed
The following statement types no longer have a CFG node for the statement itself, as their sub-expressions already contain all the
semantically significant information:
*`ExprStmt`
*`If`
*`Assign`
*`Import`
For example, the CFG for `if cond: foo else bar` now starts with the CFG node for `cond`.
#### CFG nodes reordered
For the following statement types, the CFG node for the statement now follows the CFG nodes of its sub-expressions to follow Python semantics:
*`Print`
*`TemplateWrite`
*`ImportStar`
For example the CFG for `print foo` (in Python 2) has changed from `print -> foo` to `foo -> print`, to reflect the runtime behavior.
The CFG for the `with` statement has been re-ordered to more closely reflect the semantics.
| Assert statement tests the truth value of a literal constant (`py/assert-literal-constant`) | reliability, correctness | Checks whether an assert statement is testing the truth of a literal constant value. Results are hidden on LGTM by default. |
| Flask app is run in debug mode (`py/flask-debug`) | security, external/cwe/cwe-215, external/cwe/cwe-489 | Finds instances where a Flask application is run in debug mode. Results are shown on LGTM by default. |
| Information exposure through an exception (`py/stack-trace-exposure`) | security, external/cwe/cwe-209, external/cwe/cwe-497 | Finds instances where information about an exception may be leaked to an external user. Results are shown on LGTM by default. |
| Jinja2 templating with autoescape=False (`py/jinja2/autoescape-false`) | security, external/cwe/cwe-079 | Finds instantiations of `jinja2.Environment` with `autoescape=False` which may allow XSS attacks. Results are hidden on LGTM by default. |
| Request without certificate validation (`py/request-without-cert-validation`) | security, external/cwe/cwe-295 | Finds requests where certificate verification has been explicitly turned off, possibly allowing man-in-the-middle attacks. Results are hidden on LGTM by default. |
| Use of weak cryptographic key (`py/weak-crypto-key`) | security, external/cwe/cwe-326 | Finds creation of weak cryptographic keys. Results are shown on LGTM by default. |
## Changes to existing queries
All taint-tracking queries now support visualization of paths in QL for Eclipse.
Most security alerts are now visible on LGTM by default. This means that you may see results that were previously hidden for the following queries:
| Command injection (`py/command-line-injection`) | More results | Additional sinks in the `os`, and `popen` modules may find more results in some projects. |
| Encoding error (`py/encoding-error`) | Better alert location | Alerts are now shown at the start of the encoding error, rather than at the top of the file. |
| Missing call to \_\_init\_\_ during object initialization (`py/missing-call-to-init`) | Fewer false positive results | Results where it is likely that the full call chain has not been analyzed are no longer reported. |
| URL redirection from remote source (`py/url-redirection`) | Fewer false positive results | Taint is no longer tracked from the right-hand side of binary expressions. In other words `SAFE + TAINTED` is now treated as safe. |
## Changes to code extraction
## Improved reporting of encoding errors
The extractor now outputs the location of the first character that triggers an `EncodingError`.
Any queries that report encoding errors will now show results at the location of the character that caused the error.
### Improved scalability
Scaling is near linear to at least 20 CPU cores.
### Improved logging
* Five levels of logging are available: `CRITICAL`, `ERROR`, `WARN`, `INFO` and `DEBUG`. `WARN` is the default.
* LGTM uses `INFO` level logging. QL tools use `WARN` level logging by default.
* The `--verbose` flag can be specified specified multiple times to increase the logging level once per flag added.
* The `--quiet` flag can be specified multiple times to reduce the logging level once per flag added.
* Log lines are now in the `[SEVERITY] message` style and never overlap.
## Changes to QL libraries
* Taint-tracking analysis now understands HTTP requests in the `twisted` library.
* The analysis now handles `isinstance` and `issubclass` tests involving the basic abstract base classes better. For example, the test `issubclass(list, collections.Sequence)` is now understood to be `True`
* Taint tracking automatically tracks tainted mappings and collections, without you having to add additional taint kinds. This means that custom taints are tracked from `x` to `y` in the following flow: `l = [x]; y =l[0]`.
* On LGTM, files whose name ends in `.min.js` or `-min.js` are no longer extracted by default. These files usually contain minified code and any alerts in these files would be hidden by default. If you still want to extract code from these files, you can add the following filters to your `lgtm.yml` file (or add them to existing filters):
```yaml
extraction:
javascript:
index:
filters:
- include:"**/*.min.js"
- include:"**/*-min.js"
```
* The TypeScript compiler is now included in the LGTM Enterprise and QL command-line tools installations, and you no longer need to install it manually.
If you need to override the compiler version, set the `SEMMLE_TYPESCRIPT_HOME` environment variable to
point to an installation of the `typescript` NPM package.
| Array argument size mismatch (`cpp/array-arg-size-mismatch`) | reliability | Finds function calls where the size of an array being passed is smaller than the array size of the declared parameter. Newly displayed on LGTM. |
| Lossy function result cast (`cpp/lossy-function-result-cast`) | correctness | Finds function calls whose result type is a floating point type, which are implicitly cast to an integral type. Newly available on LGTM but results not displayed by default. |
| Returning stack-allocated memory (`cpp/return-stack-allocated-memory`) | reliability, external/cwe/cwe-825 | Finds functions that may return a pointer or reference to stack-allocated memory. This query existed already but has been rewritten from scratch to make the error rate low enough for use on LGTM. Results displayed by default. |
| Use of string copy function in a condition (`cpp/string-copy-return-value-as-boolean`) | correctness | This query identifies calls to string copy functions used in conditions, where it's likely that a different function was intended to be called. Results are displayed by default on LGTM. |
| Array argument size mismatch (`cpp/array-arg-size-mismatch`) | Fewer false positive results | An exception has been added to this query for variable sized arrays. |
| Call to memory access function may overflow buffer (`cpp/overflow-buffer`) | More correct results | This query now recognizes calls to `RtlCopyMemoryNonTemporal` and `RtlSecureZeroMemory`. |
| Call to memory access function may overflow buffer (`cpp/overflow-buffer`) | More correct results | Calls to `fread` are now examined by this query. |
| Lossy function result cast (`cpp/lossy-function-result-cast`) | Fewer false positive results | The whitelist of rounding functions built into this query has been expanded. |
| Memory is never freed (`cpp/memory-never-freed`) | More correct results | Support for more Microsoft-specific memory allocation/de-allocation functions has been added. |
| Memory may not be freed (`cpp/memory-may-not-be-freed`) | More correct results | Support for more Microsoft-specific memory allocation/de-allocation functions has been added. |
| Mismatching new/free or malloc/delete (`cpp/new-free-mismatch`) | More correct results | Data flow through global variables for this query has been improved. |
| 'new[]' array freed with 'delete' (`cpp/new-array-delete-mismatch`) | More correct results | Data flow through global variables for this query has been improved. |
| 'new' object freed with 'delete[]' (`cpp/new-delete-array-mismatch`) | More correct results | Data flow through global variables for this query has been improved. |
| Potential buffer overflow (`cpp/potential-buffer-overflow`) | Deprecated | This query has been deprecated. Use Potentially overrunning write (`cpp/overrunning-write`) and Potentially overrunning write with float to string conversion (`cpp/overrunning-write-with-float`) instead. |
| Resource not released in destructor (`cpp/resource-not-released-in-destructor`) | Fewer false positive results | The query no longer highlights code that releases a resource via a virtual method call, function pointer, or lambda. |
| Returning stack-allocated memory (`cpp/return-stack-allocated-memory`) | More correct results | Many more stack allocated expressions are now recognized. |
| Suspicious add with sizeof (`cpp/suspicious-add-sizeof`) | Fewer false positive results | Pointer arithmetic on `char * const` expressions (and other variations of `char *`) are now correctly excluded from the results. |
| Suspicious pointer scaling (`cpp/suspicious-pointer-scaling`) | Fewer false positive results | False positive results involving types that are not uniquely named in the snapshot have been fixed. |
| Unused static variable (`cpp/unused-static-variable`) | Fewer false positive results | Variables with the attribute `unused` are now excluded from the query. |
| Use of inherently dangerous function (`cpp/potential-buffer-overflow`) | Cleaned up | This query no longer catches uses of `gets`, and has been renamed 'Potential buffer overflow'. |
| Use of potentially dangerous function (`cpp/potentially-dangerous-function`) | More correct results | This query now catches uses of `gets`. |
## Changes to QL libraries
* The `semmle.code.cpp.dataflow.DataFlow` library now supports _definition by reference_ via output parameters of known functions.
* Data flows through `memcpy` and `memmove` by default.
* Custom flow into or out of arguments assigned by reference can be modeled with the new class `DataFlow::DefinitionByReferenceNode`.
* The data flow library adds flow through library functions that are modeled in `semmle.code.cpp.models.interfaces.DataFlow`. Queries can add subclasses of `DataFlowFunction` to specify additional flow.
* There is a new `Namespace.isInline()` predicate, which holds if the namespace was declared as `inline namespace`.
* The `Expr.isConstant()` predicate now also holds for _address constant expressions_, which are addresses that will be constant after the program has been linked. These address constants do not have a result for `Expr.getValue()`.
* There are new `Function.isDeclaredConstexpr()` and `Function.isConstexpr()` predicates. They can be used to tell whether a function was declared as `constexpr`, and whether it actually is `constexpr`.
* There is a new `Variable.isConstexpr()` predicate. It can be used to tell whether a variable is `constexpr`.
| Clear text storage of sensitive information (`cs/cleartext-storage-of-sensitive-information`) | More results | Now includes data sources for user controls in `System.Windows.Forms`. |
| Dereferenced variable is always null (`cs/dereferenced-value-is-always-null`) | Improved results | The query has been rewritten from scratch, and the analysis is now based on static single assignment (SSA) forms. Results are now shown by default in LGTM. |
| Dereferenced variable may be null (`cs/dereferenced-value-may-be-null`) | Improved results | The query has been rewritten from scratch, and the analysis is now based on static single assignment (SSA) forms. Results are now shown by default in LGTM. |
| Double-checked lock is not thread-safe (`cs/unsafe-double-checked-lock`) | Fewer false positive and more true positive results | No longer highlights code where the underlying field was not updated in the `lock` statement, or where the field is a `struct`. Results have been added where there are other statements inside the `lock` statement. |
| Exposure of private information (`cs/exposure-of-sensitive-information`) | More results | Now includes data sources for user controls in `System.Windows.Forms`. |
| Improper control of generation of code (`cs/code-injection`) | More results | Now includes data sources for user controls in `System.Windows.Forms`. |
| Off-by-one comparison against container length (`cs/index-out-of-bounds`) | Fewer false positive results | No longer reports results when there are additional guards on the index. |
| SQL query built from user-controlled sources (`cs/sql-injection`) | More results | Now includes data sources for user controls in `System.Windows.Forms`. |
| Uncontrolled format string (`cs/uncontrolled-format-string`) | More results | Now includes data sources for user controls in `System.Windows.Forms`. |
| Unused format argument (`cs/format-argument-unused`) | Fewer false positive results | No longer reports results where the format string is empty. This is often used as a default value and is not an interesting result. |
| Use of default ToString() (`cs/call-to-object-tostring`) | Fewer false positive results | No longer reports results for `char` arrays passed to `StringBuilder.Append()`, which were incorrectly marked as using `ToString`. |
| Use of default ToString() (`cs/call-to-object-tostring`) | Fewer results | No longer reports results when the object is an interface or an abstract class. |
| Using a package with a known vulnerability (`cs/use-of-vulnerable-package`) | More results | This query detects packages vulnerable to CVE-2019-0657. |
## Changes to code extraction
* Fix extraction of `for` statements where the condition declares new variables using `is`.
* Initializers of `stackalloc` arrays are now extracted.
## Changes to QL libraries
* The class `TrivialProperty` now includes library properties determined to be trivial using CIL analysis. This may increase the number of results for all queries that use data flow.
* Taint-tracking steps have been added for the `Json.NET` package. This will improve results for queries that use taint tracking.
* Support has been added for EntityFrameworkCore, including
- Stored data flow sources
- Sinks for SQL expressions
- Data flow through fields that are mapped to the database
* Support has been added for NHibernate-Core, including
- Stored data flow sources
- Sinks for SQL expressions
- Data flow through fields that are mapped to the database
| Double-checked locking is not thread-safe (`java/unsafe-double-checked-locking`) | reliability, correctness, concurrency, external/cwe/cwe-609 | Identifies wrong implementations of double-checked locking that does not use the `volatile` keyword. |
| Race condition in double-checked locking object initialization (`java/unsafe-double-checked-locking-init-order`) | reliability, correctness, concurrency, external/cwe/cwe-609 | Identifies wrong implementations of double-checked locking that performs additional initialization after exposing the constructed object. |
| Arbitrary file write during archive extraction ("Zip Slip") (`java/zipslip`) | Fewer false positive results | Results involving a sanitization step that converts a destination `Path` to a `File` are no longer reported. |
| Result of multiplication cast to wider type (`java/integer-multiplication-cast-to-long`) | Fewer results | Results involving conversions to `float` or `double` are no longer reported, as they were almost exclusively false positives. |
## Changes to QL libraries
* The deprecated library `semmle.code.java.security.DataFlow` has been removed.
Improved data flow libraries have been available in
`semmle.code.java.dataflow.DataFlow`,
`semmle.code.java.dataflow.TaintTracking`, and
`semmle.code.java.dataflow.FlowSources` since 1.16.
* Taint tracking now includes additional default data-flow steps through
collections, maps, and iterators. This affects all security queries, which
can report more results based on such paths.
* The `FlowSources` and `TaintTracking` libraries are extended to cover additional remote user
input and taint steps from the following frameworks: Guice, Protobuf, Thrift and Struts.
This affects all security queries, which may yield additional results on projects
* File classification now recognizes additional generated files, for example, files from [HTML Tidy](html-tidy.org).
* The taint tracking library now recognizes flow through persistent storage, class fields, and callbacks in certain cases. Handling of regular expressions has also been improved. This may give more results for the security queries.
* Type inference for function calls has been improved. This may give additional results for queries that rely on type inference.
* The [Closure-Library](https://github.com/google/closure-library/wiki/goog.module:-an-ES6-module-like-alternative-to-goog.provide) module system is now supported.
| Arbitrary file write during archive extraction ("Zip Slip") (`js/zipslip`) | security, external/cwe/cwe-022 | Identifies extraction routines that allow arbitrary file overwrite vulnerabilities, indicating a possible violation of [CWE-022](https://cwe.mitre.org/data/definitions/22.html). Results are shown on LGTM by default. |
| Arrow method on Vue instance (`js/vue/arrow-method-on-vue-instance`) | reliability, frameworks/vue | Highlights arrow functions that are used as methods on Vue instances. Results are shown on LGTM by default.|
| Cross-window communication with unrestricted target origin (`js/cross-window-information-leak`) | security, external/cwe/201, external/cwe/359 | Highlights code that sends potentially sensitive information to another window without restricting the receiver window's origin, indicating a possible violation of [CWE-201](https://cwe.mitre.org/data/definitions/201.html). Results are shown on LGTM by default. |
| Double escaping or unescaping (`js/double-escaping`) | correctness, security, external/cwe/cwe-116 | Highlights potential double escaping or unescaping of special characters, indicating a possible violation of [CWE-116](https://cwe.mitre.org/data/definitions/116.html). Results are shown on LGTM by default. |
| Incomplete regular expression for hostnames (`js/incomplete-hostname-regexp`) | correctness, security, external/cwe/cwe-020 | Highlights hostname sanitizers that are likely to be incomplete, indicating a violation of [CWE-020](https://cwe.mitre.org/data/definitions/20.html). Results are shown on LGTM by default.|
| Incomplete URL substring sanitization | correctness, security, external/cwe/cwe-020 | Highlights URL sanitizers that are likely to be incomplete, indicating a violation of [CWE-020](https://cwe.mitre.org/data/definitions/20.html). Results shown on LGTM by default. |
| Incorrect suffix check (`js/incorrect-suffix-check`) | correctness, security, external/cwe/cwe-020 | Highlights error-prone suffix checks based on `indexOf`, indicating a potential violation of [CWE-20](https://cwe.mitre.org/data/definitions/20.html). Results are shown on LGTM by default. |
| Loop iteration skipped due to shifting (`js/loop-iteration-skipped-due-to-shifting`) | correctness | Highlights code that removes an element from an array while iterating over it, causing the loop to skip over some elements. Results are shown on LGTM by default. |
| Unused property (`js/unused-property`) | maintainability | Highlights properties that are unused. Results are shown on LGTM by default. |
| Useless comparison test (`js/useless-comparison-test`) | correctness | Highlights code that is unreachable due to a numeric comparison that is always true or always false. Results are shown on LGTM by default. |
| Ambiguous HTML id attribute | Fewer false positive results | This query now treats templates more conservatively. Its precision has been revised to 'high'. |
| Assignment to exports variable | Fewer results | This query no longer flags code that is also flagged by the query "Useless assignment to local variable". |
| Client-side cross-site scripting | More true positive and fewer false positive results. | This query now recognizes WinJS functions that are vulnerable to HTML injection. It no longer flags certain safe uses of jQuery, and recognizes custom sanitizers. |
| Hard-coded credentials | Fewer false positive results | This query no longer flags the empty string as a hardcoded username. |
| Insecure randomness | More results | This query now flags insecure uses of `crypto.pseudoRandomBytes`. |
| Reflected cross-site scripting | Fewer false positive results. | This query now recognizes custom sanitizers. |
| Stored cross-site scripting | Fewer false positive results. | This query now recognizes custom sanitizers. |
| Unbound event handler receiver (`js/unbound-event-handler-receiver`) | Fewer false positive results | Additional ways that class methods can be bound are now recognized. |
| Uncontrolled data used in network request | More results | This query now recognizes host values that are vulnerable to injection. |
| Uncontrolled data used in path expression | Fewer false positive results | This query now recognizes the Express `root` option, which prevents path traversal. |
| Unneeded defensive code | More true positive and fewer false positive results | This query now recognizes additional defensive code patterns. |
| Unsafe dynamic method access | Fewer false positive results | This query no longer flags concatenated strings as unsafe method names. |
| Unused parameter | Fewer false positive results | This query no longer flags parameters with leading underscore. |
| Unused variable, import, function or class | Fewer false positive results | This query now flags fewer variables that are implictly used by JSX elements. It no longer flags variables with a leading underscore and variables in dead code. |
| Unvalidated dynamic method call | More true positive results | This query now flags concatenated strings as unvalidated method names in more cases. |
| Useless assignment to property. | Fewer false positive results | This query now treats assignments with complex right-hand sides correctly. |
| Useless conditional | Fewer results | Additional defensive coding patterns are now ignored. |
| Useless conditional | More true positive results | This query now flags additional uses of function call values. |
## Changes to QL libraries
*`DataFlow::SourceNode` is no longer an abstract class; to add new source nodes, extend `DataFlow::SourceNode::Range` instead.
* Subclasses of `DataFlow::PropRead` are no longer automatically made source nodes; you now need to additionally define a corresponding subclass of `DataFlow::SourceNode::Range` to achieve this.
* The deprecated libraries `semmle.javascript.DataFlow` and `semmle.javascript.dataflow.CallGraph` have been removed; they are both superseded by `semmle.javascript.dataflow.DataFlow`.
* Overriding `DataFlow::InvokeNode.getACallee()` no longer affects the call graph seen by the interprocedural data flow libraries. To do this, the 1-argument version `getACallee(int imprecision)` can be overridden instead.
* The predicate `DataFlow::returnedPropWrite` was intended for internal use only and is no longer available.
The extractor now parses all Python code from a single unified grammar. This means that almost all Python code will be successfully parsed, even if mutually incompatible Python code is present in the same project. This also means that Python code for any version can be correctly parsed on a worker running any other supported version of Python. For example, Python 3.7 code is parsed correctly, even if the installed version of Python is only 3.5. This will reduce the number of syntax errors found in many projects.
### Regular expression analysis improvements
The Python `re` (regular expressions) module library has a couple of constants called `MULTILINE` and `VERBOSE` which determine the parsing of regular expressions. Python 3.6 changed the implementation of these constants, which resulted in false positive results for some queries. The relevant QL libraries have been updated to support both implementations which will remove false positive results from projects that use Python 3.6 and later versions.
### API improvements
The API has been improved to declutter the global namespace and improve discoverability and readability.
* New predicates `ModuleObject::named(name)` and `ModuleObject.attr(name)` have been added, allowing more readable access to common objects. For example, `(any ModuleObject m | m.getName() = "sys").getAttribute("exit")` can be replaced with `ModuleObject::named("sys").attr("exit")`
* The API for accessing builtin functions has been improved. Predicates of the form `theXXXFunction()`, such as `theLenFunction()`, have been deprecated in favor of `Object::builtin(name)`.
* A configuration based API has been added for writing data flow and taint tracking queries. This is provided as a convenience for query authors who have written data flow or taint tracking queries for other languages, so they can use a similar format of query across multiple languages.
| Default version of SSL/TLS may be insecure (`py/insecure-default-protocol`) | security, external/cwe/cwe-327 | Finds instances where an insecure default protocol may be used. Results are shown on LGTM by default. |
| Incomplete regular expression for hostnames (`py/incomplete-hostname-regexp`) | security, external/cwe/cwe-020 | Finds instances where a hostname is incompletely sanitized because a regular expression contains an unescaped character. Results are shown on LGTM by default. |
| Incomplete URL substring sanitization (`py/incomplete-url-substring-sanitization`) | security, external/cwe/cwe-020 | Finds instances where a URL is incompletely sanitized due to insufficient checks. Results are shown on LGTM by default. |
| Insecure temporary file (`py/insecure-temporary-file`) | security, external/cwe/cwe-377 | Finds uses of the insecure and deprecated `tempfile.mktemp`, `os.tempnam`, and `os.tmpnam` functions. Results are shown on LGTM by default. |
| Overly permissive file permissions (`py/overly-permissive-file`) | security, external/cwe/cwe-732 | Finds instances where a file is created with overly permissive permissions. Results are not shown on LGTM by default. |
| Use of insecure SSL/TLS version (`py/insecure-protocol`) | security, external/cwe/cwe-327 | Finds instances where a known insecure protocol has been specified. Results are shown on LGTM by default. |
| Comparison using is when operands support \_\_eq\_\_ (`py/comparison-using-is`) | Fewer false positive results | Results where one of the objects being compared is an enum member are no longer reported. |
| Modification of parameter with default (`py/modification-of-default-value`) | More true positive results | Instances where the mutable default value is mutated inside other functions are now also reported. |
| Mutation of descriptor in \_\_get\_\_ or \_\_set\_\_ method (`py/mutable-descriptor`) | Fewer false positive results | Results where the mutation does not occur when calling one of the `__get__`, `__set__` or `__delete__` methods are no longer reported. |
| Redundant comparison (`py/redundant-comparison`) | Fewer false positive results | Results in chained comparisons are no longer reported. |
| Unused import (`py/unused-import`) | Fewer false positive results | Results where the imported module is used in a `doctest` string are no longer reported. |
| Unused import (`py/unused-import`) | Fewer false positive results | Results where the imported module is used in a type-hint comment are no longer reported. |
## Changes to QL libraries
* Added support for the `dill` pickle library.
* Added support for the `bottle` web framework.
* Added support for the `CherryPy` web framework.
* Added support for the `falcon` web API framework.
* Added support for the `turbogears` web framework.
* Parallel extraction of JavaScript files (but not TypeScript files) on LGTM is now supported. If LGTM is configured to evaluate queries using multiple threads, then JavaScript files are also extracted using multiple threads.
* Experimental support for [E4X](https://developer.mozilla.org/en-US/docs/Archive/Web/E4X), a legacy language extension developed by Mozilla, is available.
* Additional [Flow](https://flow.org/) syntax is now supported.
* [Nullish Coalescing](https://github.com/tc39/proposal-nullish-coalescing) expressions are now supported.
* [TypeScript 3.2](https://www.typescriptlang.org/docs/handbook/release-notes/typescript-3-2.html) is now supported.
* The TypeScript extractor now handles the control flow of logical operators and destructuring assignments more accurately.
<p>This query shows the distribution of inheritance depth across all types, i.e. classes. Library types are ignored.</p>
<p>This query shows the distribution of inheritance depth across all types, that is, classes. Library types are ignored.</p>
<p>The result of this query is a line graph showing, for each number <i>n</i>, how many types have an inheritance depth of <i>n</i>, where
the inheritance depth of a type is the length of a longest path in the inheritance hierarchy from top class to the type.</p>
<p>When hovering the mouse pointer over a specific depth value, the number of types having this inheritance depth is displayed.</p>
</overview>
<section title="How to Address the Query Results">
<recommendation>
<p>The depth of a type is an indication of how deeply nested a type is in a given design.
Very deep types can be an indication of over-engineering, whereas a system with predominantly shallow types
may not be exploiting object-orientation to the full.</p>
</recommendation>
</section>
<references>
<li>
Shyam R. Chidamber and Chris F. Kemerer.
<a href="http://www.pitt.edu/~ckemerer/CK%20research%20papers/MetricForOOD_ChidamberKemerer94.pdf">A Metrics Suite for Object Oriented Design
</a>.
Shyam R. Chidamber and Chris F. Kemerer,
<i><a href="http://www.pitt.edu/~ckemerer/CK%20research%20papers/MetricForOOD_ChidamberKemerer94.pdf">A Metrics Suite for Object Oriented Design
</a></i>.
IEEE Transactions on Software Engineering,
20(6), pages 476-493, June 1994.
20(6), pages 476-493, June 1994.</li>
<a href="http://www.dmst.aueb.gr/dds/index.en.html">Diomides D. Spinnelis</a>.
<a href="http://www.spinellis.gr/codequality/">Code Quality: The Open Source Perspective</a>.
Addison-Wesley 2007.
<a href="http://www.dmst.aueb.gr/dds/index.en.html">Diomides D. Spinnelis</a>.
<a href="http://www.spinellis.gr/sw/ckjm/">ckjm - Chidamber and Kemerer Java Metrics</a>.
(implementation of CK metrics), 2006.
</li></references></qhelp>
<li>
Lutz Prechelt, Barbara Unger, Michael Philippsen, and Walter Tich, <i><a href="http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.159.2229&rep=rep1&type=pdf">A Controlled Experiment on Inheritance Depth as a Cost Factor for Code Maintenance
</a></i>.
Journal of Systems and Software, 65 (2):115-126, 2003.
<p>This query finds classes that define a destructor, a copy constructor, or a copy assignment operator, but not all three of them. The compiler generates default implementations for these functions, and since they deal with similar concerns it is likely that if the default implementation of one of them is not satisfactory, then neither are those of the others.</p>
<p>The query flags any such class with a warning, and also display the list of generated warnings in the result view.</p>
</overview>
<section title="How to Address the Query Results">
<p>This rule finds branching statements with conditions that always evaluate to the same value.
More likely than not these conditions indicate a defect in the branching condition or are an artifact left behind after debugging.</p>
<p>This query finds branching statements with conditions that always evaluate to the same value.
It is likely that these conditions indicate an error in the branching condition.
Alternatively, the conditions may have been left behind after debugging.</p>
<include src="aliasAnalysisWarning.qhelp" />
</overview>
<recommendation>
<p>Check the branch condition for defects, and verify that it isn't a remnant from debugging.</p>
<p>Check the branch condition for logic errors. Check whether it is still required.</p>
</recommendation>
<example><sample src="DeadCodeCondition.cpp" />
<example>
<p>This example shows two branch conditions that always evaluate to the same value.
The two conditions and their associated branches should be deleted.
This will simplify the code and make it easier to maintain.</p>
<sample src="DeadCodeCondition.cpp" />
</example>
<references>
<li>SEI CERT C++ Coding Standard <a href="https://wiki.sei.cmu.edu/confluence/display/c/MSC12-C.+Detect+and+remove+code+that+has+no+effect+or+is+never+executed">MSC12-C. Detect and remove code that has no effect or is never executed</a>.</li>
<p>This rule finds functions that are non-public, non-virtual and are never called. Dead functions are often deprecated pieces of code, and should be removed
as they may increase object code size, decrease code comprehensibility, and create the possibility of misuse.</p>
<p>This query highlights functions that are non-public, non-virtual, and are never called.
Dead functions are often deprecated pieces of code, and should be removed.
If left in the code base they increase object code size, decrease code comprehensibility, and create the possibility of misuse.</p>
<p>
<code>public</code> and <code>protected</code> functions are not considered by the check, as they could be part of the program's
API and could be used by external programs.
<code>public</code> and <code>protected</code> functions are ignored by this query.
This type of function may be part of the program's API and could be used by external programs.
</p>
<include src="callGraphWarning.qhelp" />
</overview>
<recommendation>
<p>Consider removing the function.</p>
<p>Verify that the function is genuinely unused and consider removing it.</p>
</recommendation>
<example><sample src="DeadCodeFunction.cpp" />
<example>
<p>The example below includes a function <code>f</code> that is no longer used and should be deleted.</p>
<sample src="DeadCodeFunction.cpp" />
</example>
<references>
<li>SEI CERT C++ Coding Standard: <a href="https://wiki.sei.cmu.edu/confluence/display/c/MSC12-C.+Detect+and+remove+code+that+has+no+effect+or+is+never+executed">MSC12-C. Detect and remove code that has no effect or is never executed</a>.</li>
* @description A function is never called, and should be considered for removal. Unused functions may increase object size, decrease readability and create the possibility of misuse.
* @description Unused functions may increase object size, decrease readability, and create the possibility of misuse.
* @kind problem
* @id cpp/dead-code-function
* @problem.severity warning
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.