`SEMMLE_TYPESCRIPT_NODE_RUNTIME` can be used to provide the path to the Node.js runtime executable.
If this is omitted, the extractor defaults to the current behaviour of looking for `node` on the PATH.
`SEMMLE_TYPESCRIPT_NODE_RUNTIME_EXTRA_ARGS` can be used to provide additional arguments to the
Node.js runtime. These are passed first, before the arguments supplied by the extractor.
These changes are designed to allow TypeScript extraction in controlled customer environments where
we cannot control the PATH, or must use custom Node.js executables with certain arguments set.
This query uses data flow for nullness analysis, which is always going
to be a large overapproximation. The overapproximation became too big
for one of the test cases after the recent change to make data flow go
across assignment by reference.
To make this query more conservative, it will now only report that the
`pDacl` argument can be null if there isn't also evidence that it can be
non-null.
This commit changes how data flow works in the following code.
MyType x = source();
defineByReference(&x);
sink(x);
The question here is whether there should be flow from `source` to
`sink`. Such flow is desirable if `defineByReference` doesn't write to
all of `x`, but it's undesirable if `defineByReference` is a typical
init function in `C` that writes to every field or if
`defineByReference` is `memcpy` or `memset` on the full range.
Before 1.20.0, there would be flow from `source` to `sink` in case `x`
happened to be modeled with `BlockVar` but not in case `x` happened to
be modelled with SSA. The choice of modelling depends on an analysis of
how `x` is used elsewhere in the function, and it's supposed to be an
internal implementation detail that there are two ways to model
variables. In 1.20.0, I changed the `BlockVar` behavior so it worked the
same as SSA, never allowing that flow. It turns out that this change
broke a customer's query.
This commit reverts `BlockVar` to its old behavior of letting flow
propagate past the `defineByReference` call and then regains consistency
by changing all variables that are ever defined by reference to be
modelled with `BlockVar` instead of SSA. This means we now get too much
flow in certain cases, but that appears to be better overall than
getting too little flow. See also the discussion in CPP-336.
Previously, `AllConfigurations.qll` would pull in (almost) all taint
tracking configurations, which has started causing OOMEs during
compilation.
I've pruned it down to only the most interesting configurations. Since
flow summaries are experimental at this point and require a bit of manual
configuration anyway, this shouldn't be much of an issue in practice.
My recent changes to suppress FPs in `ReturnStackAllocatedMemory.ql`
caused us to lose all results where there was a `Conversion` at the
initial address escape. We cannot handle conversions in general, but
this commit restores the good results for the trivial types of
conversion that we can handle.
Since we cannot track data flow from a fully-converted expression but
only the unconverted expression, we should check whether the address
initially escapes into the unconverted expression, not the
fully-converted one.
This fixes most of the false positives observed on lgtm.com.
`yield import(...)` previously caused a syntax error, now it is parsed
correctly.
`parseYield` is the only place where the value of `startsExpr` matters,
so this change should not affect anything else.
Also added `PhiInstruction::getAnInputOperand()`, and renamed `PhiInstruction::getAnOperandDefinitionInstruction()` to `getAnInput()` for consistency with other `Instruction` classes.
The `automaticVariableAddressEscapes` predicate got join-ordered badly
in its `unaliased_ssa` version. These are the tuple counts on Wireshark,
where one pipeline step is seen to have 716 million tuples:
```
[2019-03-02 11:29:41] (42s) Starting to evaluate predicate AliasAnalysis::automaticVariableAddressEscapes#2#f
[2019-03-02 11:30:06] (67s) Tuple counts:
353419 ~0% {1} r1 = JOIN project#Instruction::VariableAddressInstruction#class#2#ff WITH AliasAnalysis::resultEscapesNonReturn#2#f ON project#Instruction::VariableAddressInstruction#class#2#ff.<0>=AliasAnalysis::resultEscapesNonReturn#2#f.<0> OUTPUT FIELDS {AliasAnalysis::resultEscapesNonReturn#2#f.<0>}
353419 ~0% {2} r2 = JOIN r1 WITH IRConstruction::Cached::getInstructionEnclosingFunctionIR#ff@staged_ext ON r1.<0>=IRConstruction::Cached::getInstructionEnclosingFunctionIR#ff@staged_ext.<0> OUTPUT FIELDS {IRConstruction::Cached::getInstructionEnclosingFunctionIR#ff@staged_ext.<1>,r1.<0>}
353419 ~0% {2} r3 = JOIN r2 WITH FunctionIR::FunctionIR::getFunction_dispred#3#ff ON r2.<0>=FunctionIR::FunctionIR::getFunction_dispred#3#ff.<0> OUTPUT FIELDS {FunctionIR::FunctionIR::getFunction_dispred#3#ff.<1>,r2.<1>}
716040298 ~0% {2} r4 = JOIN r3 WITH IRVariable::IRVariable#class#3#ff_10#join_rhs ON r3.<0>=IRVariable::IRVariable#class#3#ff_10#join_rhs.<0> OUTPUT FIELDS {IRVariable::IRVariable#class#3#ff_10#join_rhs.<1>,r3.<1>}
4480139 ~0% {2} r5 = JOIN r4 WITH IRVariable::IRAutomaticVariable#class#3#ff ON r4.<0>=IRVariable::IRAutomaticVariable#class#3#ff.<0> OUTPUT FIELDS {r4.<1>,r4.<0>}
66760 ~91% {1} r6 = JOIN r5 WITH Instruction::VariableInstruction::getVariable_dispred#2#ff ON r5.<0>=Instruction::VariableInstruction::getVariable_dispred#2#ff.<0> AND r5.<1>=Instruction::VariableInstruction::getVariable_dispred#2#ff.<1> OUTPUT FIELDS {r5.<1>}
return r6
[2019-03-02 11:30:06] (67s) >>> Relation AliasAnalysis::automaticVariableAddressEscapes#2#f: 35531 rows using 0 MB
```
The predicate contained a cyclic join, which is always hard to optimize.
I couldn't see a reason to join the `FunctionIR`, so I removed that
part. The predicate is now fast, and there are no changes in the tests.
Before this change,
```
flowOutOfCallableStep(CallNode call, ReturnNode ret, OutNode out, CallContext cc)
```
would compute all combinations of call sites `call` and returned expressions `ret`
up front.
Now, we instead introduce explicit return nodes, so each callable has exactly
one return node (as well as one for each `out`/`ref` parameter). There is then
local flow from a returned expression to the relevant return node, and
`flowOutOfCallableStep()` computes combinations of call sites and return nodes.
Not only does this result in better performance, it also makes `flowOutOfCallableStep()`
symmetric to `flowIntoCallableStep()`, where each argument is mapped to a parameter,
and not to all reads of that parameter.
Javadoc claims not to be able to resolve this link, while Eclipse manages to do so without any problems, failing an internal PR check.
It's only in a test, though, so I just removed it.
When determining which core library a "tried control flow element" is compiled against,
first look at exceptions caught by the surrounding `try` block, then look at assembly
attributes, and finally choose (randomly) the core library with the highest lexicographic
order.
This was previously flagged if `exports` wasn't used any further. While it's true that the assignment to `exports` is redundant in this case, the assignment is also flagged by DeadStorOfLocal, so there is no point in InvalidExport flagging it as well.
This query is only appropriate for setuid programs. Since such programs
are at most 0.1% of all code we analyse, I would say this query has a
precision of at most 0.1%.
Instead of having many small independent tests, we now just have a single test that pulls in all the individual tests and runs them together.
Concretely, each `.ql` file has been turned into a `.qll` file with a query predicate corresponding to the original `select` clause and named after the original `.ql` file, plus a prefix `test_`.
The newly added `tests.ql` imports all these `.qll`s.
The individual `.expected` files have been concatenated together into `tests.expected`, each prefixed with the name of the corresponding query predicate. (This is the format that qltest produces for tests with multiple query predicates.)
These two elements weren't cached, which meant that local data flow was
recalculated in every query that used data flow. They are also cached in
the Java version of `DataFlowUtil.qll`.
This commits also adds a test that uses `getParameter`. The new tests
demonstrate that support for array-to-pointer decay works, but we get
data flow to the array rather than its contents.
This accessor may not be forward-compatible with an IR-based version,
and it's unclear whether it has any use. The `VariableAccess` remains in
the `TDefinitionByReferenceNode` constructor since it's used to
implement `getType`.
Every once in a while we encounter projects using some custom file extension for files that we could in principle extract, but since the extractor doesn't know about the extension the files are skipped.
To handle this, the legacy extractor has a `--file-type` option that one can use to specify a file type to use for all files in that particular extraction. So far, `AutoBuild` has nothing of the sort.
This PR proposes to introduce an environment variable `LGTM_INDEX_FILETYPES` to allow a similar customisation. In the fullness of time, this variable would be set through `lgtm.yml` in the usual way, but for now it is undocumented and for internal use only.
Specifically, `LGTM_INDEX_FILETYPES` is a newline-separated list of ".extension:filetype" pairs, specifying that files with the given `.extension` should be extracted as type `filetype`, where
`filetype` is one of `js`, `html`, `json`, `typescript` or `yaml`.
For example, `.jsm:js` causes all `.jsm` files to be extracted as JavaScript.
This can also be used to override default file types: for example, by specifying `.js:typescript` all JavaScript files will be extracted as TypeScript.
The configuration in `DefaultOptions.qll` assumed that a call to any
top-level function named `error` would exit the program. This is not
true.
The assumption was probably about `error(3)`, which is a GNU extension.
It only exits if its first argument it not 0. Furthermore, projects such
as openssh may define their own function named `error` with different
behaviour. Because the GNU `error` function is non-standard, it's
perfectly fine to shadow it with a project-specific definition.
This change removes two FPs from `PointlessComparison.qll` on
https://github.com/openssh/openssh-portable.
In the precense of multiple core libraries, `getAThrownException()` would return
multiple copies of the same exception, say `System.OverflowException`, one for each
core library. With this change we try to identify which core library a given control
flow element was compiled against, and only return the corresponding version.
A PR check was failing because this query was enabled on LGTM but had no
qhelp. I'm removing the `@precision` for now to take it off LGTM, and
then we can add it back when it has qhelp, tests, and change note.
I've eliminated the clumsily worded "client-side code" and "server-side code" distinction, not least because Electron fits neither of those categories.
While this file is part of the project used in the tutorial, it isn't necessary for the queries to work. It also specifies a dependency on a vulnerable version of Express, causing it to be (spuriously) flagged by security scanners.
Thanks to Sam Lanning (@samlanning) and Robert Marsh for taking the time to help
to make it possible. In fact, it was Robert Marsh who effectively
wrote the query and figured out that __builtin_alloca should be
used to also take functions like strdupa into account. I just
filled out the metadata :-)
Before this change, all the cached predicates in `IRGuards.qll` were in
separate cached stages, resulting in recomputation of most of the
library for each stage. This change groups the cached predicates in two
cached classes. A better grouping may be possible, but this grouping was
easy to do and seems to solve the problem.
Before this change, the `IRGuards` library accounted for five cached
stages when using the `RangeAnalysis` library. After this change, it
only accounts for one.
This suite isn't referenced from anywhere yet, but it'll be included in
a standard ODASA dist because the dist includes all files in the `c` and
`cpp` directories. We can modify the nightly test jobs to include the
experimental suite.
This new query is not written because it's the most interesting query we
could write but because it's an IR-based query whose results are easy to
verify.
After a `queries.xml` was added to the test directory,
`Container.getRelativePath` now considers source files to be relative to
the `cpp/test` directory rather than the directory of the `*.ql*` file.
This caused some benign test output changes, and it also caused an
unwanted alert for `test3.c:14` to appear in
`cpp/ql/test/query-tests/Security/CWE/CWE-190/semmle/tainted/IntegerOverflowTainted.expected`.
This alert came about because `inSystemMacroExpansion` holds for files
that don't have a relative path, but the pretend system header in
`../system_header` now does have a relative path because it's below the
`cpp/test` directory. The fix is to add another `queries.xml` just for
the directory with the affected test.
Write accesses in assignments, such as the access to `x` in `x = 0` are not
evaluated, so they should not have entries in the control flow graph. However,
qualifiers (and indexer arguments) should still be evaluated, for example in
```
x.Foo.Bar = 0;
```
the CFG should be `x --> x.Foo --> 0 --> x.Foo.Bar = 0` (as opposed to
`x --> x.Foo --> x.Foo.Bar --> 0 --> x.Foo.Bar = 0`, prior to this change).
A special case is assignments via acessors (properties, indexers, and event
adders), where we do want to include the access in the control flow graph,
as it represents the accessor call:
```
x.Prop = 0;
```
But instead of `x --> x.set_Prop --> 0 --> x.Prop = 0` the CFG should be
`x --> 0 --> x.set_Prop --> x.Prop = 0`, as the setter is called *after* the
assigned value has been evaluated.
An even more special case is tuple assignments via accessors:
```
(x.Prop1, y.Prop2) = (0, 1);
```
Here the CFG should be
`x --> y --> 0 --> 1 --> x.set_Prop1 --> y.set_Prop2 --> (x.Prop1, y.Prop2) = (0, 1)`.
This change fixes a few key problems with the existing SSA implementations:
For unaliased SSA, we were incorrectly choosing to model a local variable that had accesses that did not cover the entire variable. This has been changed to ensure that all accesses to the variable are at offset zero and have the same type as the variable itself. This was only possible to fix now that every `MemoryOperand` has its own type.
For aliased SSA, we now correctly track the offset and size of each memory access using an interval of bit offsets covered by the access. The offset interval makes the overlap computation more straightforward. Again, this is only possible now that operands have types.
The `getXXXMemoryAccess` predicates are now driven by the `MemoryAccessKind` on the operands and results, instead of by specific opcodes.
This change does fix an existing false negative in the IR dataflow tests.
I added a few simple test cases to the SSA IR tests, covering the various kinds of overlap (MustExcactly, MustTotally, and MayPartially).
I added "PrintSSA.qll", which can dump the SSA memory accesses as part of an IR dump.
For function parameters that are subject to "pointer decay", the database contains the type as originally declared (e.g. `T[]` instead of `T*`). The IR needs the actual type. Similarly, for variable declared as an array of unknown size, the actual size needs to be inferred from the initializer (e.g. `char a[] = "blah";` needs to have the type `char[5]`).
I've opened a ticket to have the extractor emit the actual type alongside the declared type, but for now, this workaround is enough to unblock progress for typical code.
Previously, we only excluded attributes where the value of the attribute itself suggests templating happening. Now we exclude all attributes in documents where _any_ attribute value suggests templating.
This change does some shuffling to make the distinction between memory operands and register operands more clear in the IR API. First, any given type that extends `Operand` is now either always a `MemoryOperand` or always a `RegisterOperand`. This required getting rid of `CopySourceOperand`, which was used for both the `CopyValue` instruction (as a `RegisterOperand`) and for the `Load` instruction (as a `MemoryOperand`). `CopyValue` is now just a `UnaryInstruction`, `Store` has a `StoreValueOperand` (`RegisterOperand`), and all of the instructions that read a value from memory indirectly (`Load`, `ReturnValue`, and `ThrowValue`) all now have a `LoadOperand` (`MemoryOperand`).
There are no diffs in the IR output for this commit, but this change is required for a subsequent commit that will make each `MemoryOperand` have a `Type`, which in turn is needed to fix a critical bug in aliased SSA construction.
The IR tests were getting kind of unwieldy. We were using "ir.cpp" to contain test cases that covered both IR construction (every language construct imaginable) and SSA construction. We would then build and dump all three flavors of IR. For IR construction tests, examining the SSA dumps when you add a new test case is tedious.
To make this easier to manage, I've split the SSA-specific test cases out into a separate directory. "ir.cpp" should now contain only IR construction test cases, and "ssa.cpp" should contain only SSA construction test cases. We dump just the raw IR for "ir.cpp", and just the two SSA flavors for "ssa.cpp". We still run all three flavors of the IR sanity tests for "ir.cpp", though.
I also removed the "ssa_block_count.ql" test, which wasn't really adding any coverage, because any change to the block count would be reflected in the dump as well.
As its first application, this library makes it possible for `StoredXss` to reuse the `Source` classes of `DomBasedXss` and `ReflectedXss` without having to pull in their libraries (which contain their `Configuration` classes, causing `StoredXss` to recompute all flow information for the other two queries).
We now highlight the `replace` call (instead of the regular expression), and the alert message for the case of missing backslash escapes clarifies that it is talking about failure to escape backslashes in the input, not in the replacement text.
We now classify sensitive expressions into four categories (secret, id, password, certificate). This allows queries more fine-grained control over what kinds of sensitive data they want to deal with: for clear-text storage, for instance, user ids aren't so much of a problem.
This PR adds new predicates to `Declaration` and `Type` to get a fully-qualified canonical name for the element, suitable for debugging and dumps. It includes template parameters, cv qualifiers, function parameter and return types, and fully-qualified names for all symbols. These strings are too large to compute in productions queries, so they should be used only for dumps and debugging. Feel free to suggest better names for these predicates.
I've updated PrintAST and PrintIR to use these instead of `Function.getFullSignature()`. The biggest advantage of the new predicates is that they handle lambdas and local classes, which `getQualifiedName` and `getFullSignature` do not. This makes IR and AST dumps much more usable for real-world snapshots.
Along the way, I cleaned up some of our handling of `IntegralType` to use a single table for tracking the signed, unsigned, and canonical versions of each type. The canonical part is new, and was necessary for `getTypeIdentityString` so that `signed int` and `int` both appear as `int`.
Methods that are mentioned in a member reference expression should count
as rootdefs for the unused parameter query. Such methods have to match
the functional interface of the reference expression, so it is to be
expected that they will sometimes have to declare parameters that they
don't actually use.
Using the class `ExceptionClass` in combination with a deliberate Cartesian
product can lead to bad join orders, for example
```
EVALUATE NONRECURSIVE RELATION:
Completion::TriedControlFlowElement::getAThrownException_dispred#ff(int this, int result) :-
{1} r1 = JOIN Expr::Expr::getType_dispred#ff_10#join_rhs WITH @integral_type#f ON Expr::Expr::getType_dispred#ff_10#join_rhs.<0>=@integral_type#f.<0> OUTPUT FIELDS {Expr::Expr::getType_dispred#ff_10#join_rhs.<1>}
{1} r2 = JOIN r1 WITH @un_op#f ON r1.<0>=@un_op#f.<0> OUTPUT FIELDS {r1.<0>}
{1} r3 = JOIN r2 WITH Stmt::TryStmt::getATriedElement#ff_1#join_rhs ON r2.<0>=Stmt::TryStmt::getATriedElement#ff_1#join_rhs.<0> OUTPUT FIELDS {r2.<0>}
{2} r4 = JOIN r3 WITH Stmt::ExceptionClass#f CARTESIAN PRODUCT OUTPUT FIELDS {Stmt::ExceptionClass#f.<0>,r3.<0>}
{2} r5 = JOIN r4 WITH System::SystemOverflowExceptionClass#class#f ON r4.<0>=System::SystemOverflowExceptionClass#class#f.<0> OUTPUT FIELDS {r4.<1>,r4.<0>}
```
where the CP is made with `ExceptionClass` rather than `SystemOverflowExceptionClass`
directly.
These predicates currently take a pair of `IRBlock`s - as it stands, at
most one edge can exist from one `IRBlock` to a given other `IRBlock`.
We may need to revisit that assumption and create an `IREdge` IPA type
at some future date
There are a few IR APIs that we've found to be confusingly named. This PR renames them to be more consistent within the IR and with the AST API:
`Instruction.getFunction` -> `Instruction.getEnclosingFunction`: This was especially confusing when you'd call `FunctionAddressInstruction.getFunction` to get the function whose address was taken, and wound up with the enclosing function instead.
`Instruction.getXXXOperand` -> `Instruction.getXXX`. Now that `Operand` is an exposed type, we want a way to get a specific `Operand` of an `Instruction`, but more often we want to get the definition instruction of that operand. Now, the pattern is that `getXXXOperand` returns the `Operand`, and `getXXX` is equivalent to `getXXXOperand().getDefinitionInstruction()`.
`Operand.getInstruction` -> `Operand.getUseInstruction`: More consistent with the existing `Operand.getDefinitionInstruction` predicate.
This adds `IntegerPartial.qll`, which is similar to
`IntegerConstant.qll` except that it contains partial functions on
integers instead of total functions on optional integers. This speeds up
the constant analysis so it takes 6.5s instead of 10.3s on comdb2.
This does to `SSAConstruction` what the previous commit did to
`IRConstruction`. An instruction in `SSAConstruction` is now defined in
terms of how it was created rather than what it can be queried for.
Effectively, this defines `TInstruction` as `TInstructionTag` was
defined before and then removes `TInstructionTag` from
`SSAConstruction`. This also has the benefit of removing the concept of
an instruction tag from the public predicates on `Instruction`.
This definition was denormalized to the extent that an instruction was
defined in terms of the six main attributes it could be queried for.
This made it possible to do multi-column joins on those six attributes,
but it doesn't appear that this feature was useful in practice. The main
multi-column join that was in use was on the pair of
(`TranslatedElement, InstructionTag`), but the `TranslatedElement` was
not part of the `TInstruction`.
This commit changes `TInstruction` to be defined in terms of what it's
_built from_ (`TranslatedElement, InstructionTag`) instead. This makes
it possible to do multi-column joins on those two components, and then
there are separate predicates (usually with two columns) to query
instruction attributes, replacing the many uncached projections from
`MkInstruction` that were generated before.
An immediate advantage is that an `Expr` with multiple types will no
longer give rise to multiple `Instruction`s, fixing most of the errors
from the sanity query `ambiguousSuccessors`. The code inside
`IRConstruction.qll` becomes simpler and hopefully faster as there is no
longer a translation from `TranslatedElement` to `Locatable` and back
again.
The previous commit had the side effect that `IRVariable`s were created
for all `Functions`, including those that did not have IR. This commit
restricts all `TIRVariable` constructors to functions that have IR.
The recent change to `AccessorCall` on dd99525566 resulted
in some bad join-orders, so I have (partly) reverted them. This means that the issues
orignally addressed by that change are now reintroduced, and I plan to instead apply a
fix to the CFG, which--unlike the original fix--should be able to handle multi-property-tuple
assignments.
This doesn't make it much faster, but it reduces the debug output
volume. It also simplifies the code.
I've found this change necessary when I compute the full IR on a
Wireshark snapshot in QL4E. Without it, Eclipse runs out of memory
because the console log is too large.
The new predicate `isOrphan` gets inlined into
`ignoreExprAndDescendants`, whose performance improves from
TranslatedElement::ignoreExprAndDescendants#f .. 23.4s (executed 9 times)
to
TranslatedElement::ignoreExprAndDescendants#f ... 4.3s (executed 9 times)
This dramatic improvement is not only due to eliminating a type check in
the recursive case. Removing the type check from the other base cases
also enabled them to get better join orders.
The previous reccomentation changed the behaviour of the code.
A user following the advice might have broken her/his code:
With call-by-value, the original parameter is not changed.
With a call-by-reference, however, it may be changed. To be sure,
nothing breaks by blindly following the advice, suggest to pass a
const reference.
The `SSAConstruction.getNewIRVariable` was very slow on Wireshark. This
was probably because it couldn't join on multiple columns
simultaneously. Instead of improving the join, I observed that the
`TIRVariable` type was the same between all three IR stages except for a
few occurrences of `FunctionIR` that could easily be changed to
`Function`. By sharing `TIRVariable` between all the stages, we avoid
recomputing it and translating it between every stage, turning the slow
`getNewIRVariable` predicate into a no-op.
This change means that later stages of the IR can't introduce new
variables, but that was already the case because
`config/identical-files.json` forced all three `IRVariable.qll` files to
be identical.
This predicate computed a local CP between all defs and uses of the same
virtual variable in a basic block. This wasn't a problem in
`unaliased_ssa`, but it became a huge problem in `aliased_ssa`, probably
because many variables can be modelled with a single virtual variable
there.
Before this commit, evaluation of `aliased_ssa`'s
`variableLiveOnEntryToBlock#ff#antijoin_rhs` on Wireshark took 80
_minutes_. After this commit, that predicate and its immediate
dependencies take around 5 _seconds_.
This relation was almost 40x the size it needed to be on Wireshark
because it lacked a restriction on the `tag` parameter. To implement
that restriction efficiently, I had to split the relation in two to
dictate the join order.
With the fix, `getInstruction` now computes the same as
`getInstructionTranslatedElementAndTag`, so the latter could be
simplified.
I made a corresponding change to `TranslatedElement.getTempVariable` for
the sake of consistency.
A part of `SSAConstruction.getInstructionOperandDefinition` was more
expensive than it had to be. On a ChakraCore snapshot, this changes the
tuple counts from
3020569 ~2% {3} r40 = JOIN OperandTag::TUnmodeledUseOperand#f WITH Instruction::Instruction::getFunction_dispred#ff CARTESIAN PRODUCT OUTPUT FIELDS {Instruction::Instruction::getFunction_dispred#ff.<0>,OperandTag::TUnmodeledUseOperand#f.<0>,Instruction::Instruction::getFunction_dispred#ff.<1>}
62405 ~0% {3} r41 = JOIN r40 WITH Instruction::UnmodeledUseInstruction#class#fffffff ON r40.<0>=Instruction::UnmodeledUseInstruction#class#fffffff.<0> OUTPUT FIELDS {r40.<2>,r40.<1>,r40.<0>}
2868421 ~1% {3} r42 = JOIN r41 WITH Instruction::Instruction::getFunction_dispred#ff_10#join_rhs ON r41.<0>=Instruction::Instruction::getFunction_dispred#ff_10#join_rhs.<0> OUTPUT FIELDS {Instruction::Instruction::getFunction_dispred#ff_10#join_rhs.<1>,r41.<1>,r41.<2>}
62405 ~0% {3} r43 = JOIN r42 WITH Instruction::UnmodeledDefinitionInstruction#class#fffffff ON r42.<0>=Instruction::UnmodeledDefinitionInstruction#class#fffffff.<0> OUTPUT FIELDS {r42.<2>,r42.<1>,r42.<0>}
to
(0s) Starting to evaluate predicate SSAConstruction::Cached::getUnmodeledUseInstruction#ff
(0s) Tuple counts:
62405 ~0% {2} r1 = JOIN Instruction::UnmodeledUseInstruction#class#fffffff WITH Instruction::Instruction::getFunction_dispred#ff ON Instruction::UnmodeledUseInstruction#class#fffffff.<0>=Instruction::Instruction::getFunction_dispred#ff.<0> OUTPUT FIELDS {Instruction::Instruction::getFunction_dispred#ff.<1>,Instruction::Instruction::getFunction_dispred#ff.<0>}
return r1
...
75716 ~0% {3} r40 = JOIN OperandTag::TUnmodeledUseOperand#f WITH FunctionIR::FunctionIR::getUnmodeledDefinitionInstruction#ff CARTESIAN PRODUCT OUTPUT FIELDS {FunctionIR::FunctionIR::getUnmodeledDefinitionInstruction#ff.<0>,OperandTag::TUnmodeledUseOperand#f.<0>,FunctionIR::FunctionIR::getUnmodeledDefinitionInstruction#ff.<1>}
62405 ~0% {3} r41 = JOIN r40 WITH FunctionIR::FunctionIR::getUnmodeledUseInstruction#ff ON r40.<0>=FunctionIR::FunctionIR::getUnmodeledUseInstruction#ff.<0> OUTPUT FIELDS {FunctionIR::FunctionIR::getUnmodeledUseInstruction#ff.<1>,r40.<1>,r40.<2>}
Now that we have `Expr.getParentWithConversions`, we can implement
`TranslatedElement.getRealParent` simpler. This implementation also
avoids recursion.
In theory this query will produce no results on C++ code; in practice, I
suspect the "cpp" suite is often run on code compiled as C, so it is
likely to be worth running anyways.
Rewrite the predicate `succSplits()` and the construction of the IPA type `TSplits`.
The two are now mutually dependent, see more in the comment for the module
`SuccSplits`.
These queries have no results on our test cases in the repo, but
`ambiguousSuccessors` has results on any large C++ code base, and
`unexplainedLoop` has results on Windows builds of ChakraCore.
The syntactic node assiociated with accessor calls was previously always the
underlying member access. For example, in
```
x.Prop = y.Prop;
```
the implicit call to `x.set_Prop()` was at the syntactic node `x.Prop`, while the
implicit call to `y.get_Prop()` was at the syntactic node `y.Prop`.
However, this breaks the invariant that arguments to calls dominate the call itself,
as the argument `y.Prop` for the implicit `value` parameter in `x.set_Prop()` will
be evaluated after the call (the left-hand side in an assignment is evaluated before
the right-hand side).
The solution is to redefine the access call to `x.set_Prop()` to point to the whole
assignment `x.Prop = y.Prop`, instead of the access `x.Prop`. For reads, we still want
to associate the accessor call with the member access.
A corner case arises when multiple setters are called in a tuple assignment:
```
(x.Prop1, x.Prop2) = (0, 1)
```
In this case, we cannot associate the assignment with both `x.set_Prop1()` and
`x.set_Prop2()`, so we instead revert to using the underlying member accesses as
before.
This predicate was unbearably slow on a ChakraCore snapshot (and
probably everywhere else):
ReachableBlock::getAFeasiblePredecessorBlock#2#ff#antijoin_rhs .. 1m6s
ReachableBlock::getAFeasiblePredecessorBlock#ff#antijoin_rhs .... 31.8s
With this change, the predicate is so fast that it doesn't even show up
in the clause timing report.
It's possible that we only tested this for performance in 1.18, and then
it has regressed in 1.19. Otherwise I can't explain how we've missed
this. I'm using QL for Eclipse 1.20.0.201901070127.
I looked through a few hundred results from this query on lgtm.com and
found that most of the FPs had to do with comment lines ending in `}`.
This change should fix most of them, at the cost of very few false
negatives.
On Wireshark, this query goes from 7,425 results to 6,686 results before
filtering for generated code. Almost all the lost results were FP,
except a handful of results involving initializer lists.
The existing `Node.asExpr` predicate changes semantics so it becomes the
one that most users should use when they don't want to think about
`Conversion`s. A new `Node.asConvertedExpr` predicate is added and has
the same semantics as the old `Node.asExpr` predicate. It's for advanced
users that know about `Conversion`s and want to account for them.
With this change, the `IRDataflowTestCommon.qll` and
`DataflowTestCommon.qll` files use the same definitions of sources and
sinks. Since the IR data flow library is meant to be compatible with the
AST data flow library, this is what we ought to be testing.
Two alerts change but not necessarily for the right reasons.
Data flow nodes for expressions do not take CFG splitting into account. Example:
```
if (b)
x = tainted;
x = x.ToLower();
if (!b)
Use(x);
```
Flow is incorrectly reported from `tainted` to `x` in `Use(x)`, because the step
from `tainted` to `x.ToLower()` throws away the information that `b = true`.
The solution is to remember the splitting in data flow expression nodes, that is,
to represent the exact control flow node instead of just the expression. With that
we get flow from `tainted` to `[b = true] x.ToLower()`, but not from `tainted` to
`[b = false] x.ToLower()`.
The data flow API remains unchanged, but in order for analyses to fully benefit from
CFG splitting, sanitizers in particular should be CFG-based instead of expression-based:
```
if (b)
x = tainted;
if (IsInvalid(x))
return;
Use(x);
```
If the call to `IsInvalid()` is a sanitizer, then defining an expression node to be
a sanitizer using `GuardedExpr` will be too conservative (`x` in `Use(x)` is in fact
not guarded). However, `[b = true] x` in `[b = true] Use(x)` is guarded, and to help
defining guard-based sanitizers, the class `GuardedDataFlowNode` has been introduced.
As we prepare to clarify how conversions are treated, we don't want a
`sink(...)` declaration where it's non-obvious which conversions are
applied to arguments.
This code was clearly using `isConstant` as an indirect way of checking
whether `getValue` would have a result. That's no longer valid, so I
changed it to check `getValue` directly.
These files had come out of sync due to 89148a9ec7 and 8c9c316e1b. I
synced the files by replaying the changes that those commits made in
`aliased_ssa/` to the two other copies.
The CFG construction code previously contained half of an approximation
of which address expressions are constant. Now this this property is
properly modelled by `Expr.isConstant`, we can remove this code.
This fixes most discrepancies between the QL-based CFG and the
extractor-based CFG on Wireshark.
Before this change, `Expr.isConstant` only was only true for those
constant expressions that could be represented as QL values: numbers,
Booleans, and string literals. It was not true for string literals
converted from arrays to pointers, and it was not true for addresses of
variables with static lifetime.
The concept of a "constant expression" varies between C and C++ and
between versions of the standard, but they all include addresses of data
with static lifetime. These are modelled by the new library
`AddressConstantExpression.qll`, which is based on the code in
`EscapesTree.qll` and modified for its new purpose.
I've tested the change for performance on Wireshark and for correctness
with the included tests. I've also checked on Wireshark that all static
initializers in C files are considered constant, which was not the case
before.
This change suppresses results from "Declaration hides parameter" where
the ParameterDeclarationEntry does not link up to the right
FunctionDeclarationEntry.
Bad magic ended up in `LocalVariable.getFunction` and effectively
created a Cartesian product. Before this change, the timing looked like
this:
Variable::LocalVariable::getFunction_dispred#bb ... 50.1s
#select#cpe#123#fff ............................... 20.6s
After this change, those predicates become much faster:
Variable::LocalVariable::getFunction_dispred#ff ... 121ms
DeclarationHidesParameter::localVariableNames#fff . 77ms
#select#cpe#123#fff ............................... 28ms
Introducing the predicate `localVariableNames` ensures that we can do
the main join on two columns simultaneously, so that's a change we
should keep even if we remove the `pragma[nomagic]` later.
This test was intended to catch regressions in the CFG, but it looks
like it's just catching insignificant extractor changes. The test has
started failing after some recent extractor changes, but I have no way
to pinpoint the failure and understand whether it's a problem or not, so
I think it's better to delete this test.
The remaining tests check whether the QL-based CFG generates the same
graph as the extractor-based CFG. Furthermore, the `successor-tests`
check that the extractor-based CFG works as intended.
We agreed in the review of the original PR that `getName` is more
appropriate here than `getQualifiedName`. Using `getName` ensures that
we also match the `std::`-prefixed versions of these functions as well
as user-defined versions.
This commit doesn't change any behavior but just uses the preferred
high-level predicates. The `getChild` predicate inspects the raw
database more or less directly, and the database layout could change in
the future.
This should make the documentation more in line with the documentation
for our other queries. The @name of the query is changed to "Use of
string copy function in a condition".
https://github.com/Semmle/ql/pull/698 removed `document.cookie` as a remote flow source, which some of the tests relied on. We now use `location.search` instead.
This recursive predicate is made faster by working around a known
optimizer problem (QL-796) that causes the optimizer to insert extra
type checks in recursive case even when they are only needed in the
base case.
If a summary does not specify a configuration, it is taken to apply to all configurations without custom sanitisers/barriers.
If a source summary does not specify a flow label, `data` is assumed.
If a sink summary does not specify a flow label, both `data` and `taint` are assumed.
Flow step summaries cannot omit flow labels.
Note that the standard extraction queries always provide explicit configurations and flow labels, and hence do not exercise this functionality.
This reduces the number of bounds computed, and will simplify use of the
library. The resulting locations in the tests may be slightly strange,
because the example `Instruction` for a `ValueNumber` is the first
appearing in the IR, regardless of source order, and may not be the most
closely related `Instruction` to the bounded value. I think that's worth
doing for the performance and usability benefits.
It now uses a facade pattern similar to `InvokeNode`: the range of the class is defined by an abstract class `DataFlow::SourceNode::Range`, while the actual behaviour is defined by the (no longer abstract) `SourceNode` class itself.
Clients that want to add new source nodes need to extend `DataFlow::SourceNode::Range`, those that want to refine the behaviour of existing source nodes should extend `DataFlow::SourceNode` itself.
While this is technically a breaking API change, I think separating the two aspects in this way is cleaner and makes it easier to use, and improves performance as well.
I changed to detect any logical operation usage (i.e. !, ==), but I kept usage in a conditional directly as a separate detection condition. I found no false positives on the projects you shared with me previously.
This implements calculation of the control-flow graph in QL. The new
code is not enabled yet as we'll need more extractor changes first.
The `SyntheticDestructorCalls.qll` file is a temporary solution that can
be removed when the extractor produces this information directly.
Add more tests for duplicated entities, and fix some duplicated entities.
Update the TupleTypes output - some extraneous results gone so it's probably better.
Instead of only considering a fixed set of paths for `dotnet` and `mono`,
first attempt to lookup the paths based on the `PATH` environment variable.
This change also fixes a potential `System.IO.DirectoryNotFoundException` exception,
which could be thrown when the `shared/Microsoft.NETCore.App` folder was not
present.
It turns out that we still need `getURL()` to account for cases where there is no
`getLocation()`. Not having `getURL()` for entities without a `getLocation()` results
in a `file://0:0:0:0` URL, which is not rendered in QL4E, unlike a `""` URL.
The existing unreachable IR removal code only retargeted an infeasible edge to an `Unreached` instruction if the successor of the edge was an unreachable block. This is too conservative, because it doesn't remove an infeasible edge that targets a block that is still reachable via other paths. The trivial example of this is `do { } while (false);`, where the back edge is infeasible, but the body block is still reachable from the loop entry.
This change retargets all infeasible edges to `Unreached` instructions, regardless of the reachability of the successor block.
With these changes we can run `odasa prepareQueries --check-only
--fail-on-warnings` on the C++ query directory. Two changes were needed:
1. The `Metrics/queries.xml` file had to be deleted. It existed because
the built distribution has a different file layout, where `Metrics`
is moved to the top-level query dir `odasa-cpp-metrics`. Since
internal PR 28230 this file is created as needed as part of the dist
build process, so it doesn't need to be checked in with the sources.
2. All uses of the `deprecated` and stubbed-out Objective C classes were
removed.
Computing strings and locations for CIL instructions can be quite time consuming.
The CIL `toString()`s are not very helpful in path explanations, and their locations
are only useful when a PDB source file exists. Therefore, produce a simple constant
`toString()`, and restrict locations to those in PDB files.
A method such as
```
void M()
{
throw new Exception();
}
```
was incorrectly not categorized as a `ThrowingCallable`, that is, a callable
that always throws an exception upon invocation.
For example, in
```
void M(object x)
{
var y = x == null ? 1 : 2;
if (y == 2)
x.ToString();
}
```
the guard `y == 2` implies that the guard `x == null` must be false,
as the assignment of `2` to `y` is unique.
For example, in
```
void M(object x)
{
var y = x != null ? "" : null;
if (y != null)
x.ToString();
}
```
the guard `y != null` implies that the guard `x != null` must be true.
I noticed that queries using the data flow library spent significant
time in `#Dominance::bbIDominates#fbPlus`, which is the body of the
`bbStrictlyDominates` predicate. That predicate took 28 seconds to
compute on Wireshark.
The `b` in the predicate name means that magic was applied, and the
application of magic meant that it could not be evaluated with the
built-in `fastTC` HOP but became an explicit recursion instead. Applying
`pragma[nomagic]` to this predicate means that we will always get it
evaluated with `fastTC`, and that takes less than a second in my test
case.
The internal pre-SSA library was extended on 3e78c2671f
to include fields/properties that are local-scope-like. The CFG splitting logic
uses ranking of SSA definitions to define an (arbitrary) order of splits, but for
fields/properties the implicit entry definition all have the same line and column.
In effect, such SSA definitions incorrectly get the same rank. Adding the name
of the field/property to the lexicographic ordering resolves the issue.
By pulling this out of the condition we can avoid computing its negation for the `else` branch, which could previously lead to quite an enormous pipeline.
Just like syntax elements can be split in the control flow graph, so can SSA
definitions. To make this clear, and to make debugging easier, this commit
adds the splits as a prefix in the textual representation of SSA definitions.
On 03e69e9945, I updated the guards library to account
for control flow graph splitting. However, the logic that relates SSA qualifiers for
the guard and the guarded expression was not updated accordingly.
This generalises our previous handling of sanitisers operating on property accesses to support dynamic property accesses where the property name is an SSA variable by representing them as access paths.
@@ -4,8 +4,8 @@ This open source repository contains the standard QL libraries and queries that
## How do I learn QL and run queries?
LGTM has [extensive documentation](https://lgtm.com/help/ql/introduction-to-ql) on getting started with writing QL.
You can use the [interactive query console](https://lgtm.com/help/lgtm/using-query-console) or the [QL for Eclipse](https://lgtm.com/help/lgtm/running-queries-ide) plugin to try out your queries on any open-source project that's currently being analyzed.
There is [extensive documentation](https://help.semmle.com/QL/learn-ql/) on getting started with writing QL.
You can use the [interactive query console](https://lgtm.com/help/lgtm/using-query-console) on LGTM.com or the [QL for Eclipse](https://lgtm.com/help/lgtm/running-queries-ide) plugin to try out your queries on any open-source project that's currently being analyzed.
## Contributing
@@ -13,4 +13,4 @@ We welcome contributions to our standard library and standard checks. Do you hav
## License
The LGTM queries are licensed under [Apache License 2.0](LICENSE) by [Semmle](https://semmle.com).
The QL queries in this repository are licensed under [Apache License 2.0](LICENSE) by [Semmle](https://semmle.com).
| Array argument size mismatch (`cpp/array-arg-size-mismatch`) | reliability | Finds function calls where the size of an array being passed is smaller than the array size of the declared parameter. Newly displayed on LGTM. |
| Lossy function result cast (`cpp/lossy-function-result-cast`) | correctness | Finds function calls whose result type is a floating point type, which are implicitly cast to an integral type. Newly available on LGTM but results not displayed by default. |
| Returning stack-allocated memory (`cpp/return-stack-allocated-memory`) | reliability, external/cwe/cwe-825 | Finds functions that may return a pointer or reference to stack-allocated memory. This query existed already but has been rewritten from scratch to make the error rate low enough for use on LGTM. Results displayed by default. |
| Use of string copy function in a condition (`cpp/string-copy-return-value-as-boolean`) | correctness | This query identifies calls to string copy functions used in conditions, where it's likely that a different function was intended to be called. Results are displayed by default on LGTM. |
| Array argument size mismatch (`cpp/array-arg-size-mismatch`) | Fewer false positive results | An exception has been added to this query for variable sized arrays. |
| Call to memory access function may overflow buffer (`cpp/overflow-buffer`) | More correct results | This query now recognizes calls to `RtlCopyMemoryNonTemporal` and `RtlSecureZeroMemory`. |
| Call to memory access function may overflow buffer (`cpp/overflow-buffer`) | More correct results | Calls to `fread` are now examined by this query. |
| Lossy function result cast (`cpp/lossy-function-result-cast`) | Fewer false positive results | The whitelist of rounding functions built into this query has been expanded. |
| Memory is never freed (`cpp/memory-never-freed`) | More correct results | Support for more Microsoft-specific memory allocation/de-allocation functions has been added. |
| Memory may not be freed (`cpp/memory-may-not-be-freed`) | More correct results | Support for more Microsoft-specific memory allocation/de-allocation functions has been added. |
| Mismatching new/free or malloc/delete (`cpp/new-free-mismatch`) | More correct results | Data flow through global variables for this query has been improved. |
| 'new[]' array freed with 'delete' (`cpp/new-array-delete-mismatch`) | More correct results | Data flow through global variables for this query has been improved. |
| 'new' object freed with 'delete[]' (`cpp/new-delete-array-mismatch`) | More correct results | Data flow through global variables for this query has been improved. |
| Potential buffer overflow (`cpp/potential-buffer-overflow`) | Deprecated | This query has been deprecated. Use Potentially overrunning write (`cpp/overrunning-write`) and Potentially overrunning write with float to string conversion (`cpp/overrunning-write-with-float`) instead. |
| Resource not released in destructor (`cpp/resource-not-released-in-destructor`) | Fewer false positive results | The query no longer highlights code that releases a resource via a virtual method call, function pointer, or lambda. |
| Returning stack-allocated memory (`cpp/return-stack-allocated-memory`) | More correct results | Many more stack allocated expressions are now recognized. |
| Suspicious add with sizeof (`cpp/suspicious-add-sizeof`) | Fewer false positive results | Pointer arithmetic on `char * const` expressions (and other variations of `char *`) are now correctly excluded from the results. |
| Suspicious pointer scaling (`cpp/suspicious-pointer-scaling`) | Fewer false positive results | False positive results involving types that are not uniquely named in the snapshot have been fixed. |
| Unused static variable (`cpp/unused-static-variable`) | Fewer false positive results | Variables with the attribute `unused` are now excluded from the query. |
| Use of inherently dangerous function (`cpp/potential-buffer-overflow`) | Cleaned up | This query no longer catches uses of `gets`, and has been renamed 'Potential buffer overflow'. |
| Use of potentially dangerous function (`cpp/potentially-dangerous-function`) | More correct results | This query now catches uses of `gets`. |
## Changes to QL libraries
* The `semmle.code.cpp.dataflow.DataFlow` library now supports _definition by reference_ via output parameters of known functions.
* Data flows through `memcpy` and `memmove` by default.
* Custom flow into or out of arguments assigned by reference can be modeled with the new class `DataFlow::DefinitionByReferenceNode`.
* The data flow library adds flow through library functions that are modeled in `semmle.code.cpp.models.interfaces.DataFlow`. Queries can add subclasses of `DataFlowFunction` to specify additional flow.
* There is a new `Namespace.isInline()` predicate, which holds if the namespace was declared as `inline namespace`.
* The `Expr.isConstant()` predicate now also holds for _address constant expressions_, which are addresses that will be constant after the program has been linked. These address constants do not have a result for `Expr.getValue()`.
* There are new `Function.isDeclaredConstexpr()` and `Function.isConstexpr()` predicates. They can be used to tell whether a function was declared as `constexpr`, and whether it actually is `constexpr`.
* There is a new `Variable.isConstexpr()` predicate. It can be used to tell whether a variable is `constexpr`.
| Clear text storage of sensitive information (`cs/cleartext-storage-of-sensitive-information`) | More results | Now includes data sources for user controls in `System.Windows.Forms`. |
| Dereferenced variable is always null (`cs/dereferenced-value-is-always-null`) | Improved results | The query has been rewritten from scratch, and the analysis is now based on static single assignment (SSA) forms. Results are now shown by default in LGTM. |
| Dereferenced variable may be null (`cs/dereferenced-value-may-be-null`) | Improved results | The query has been rewritten from scratch, and the analysis is now based on static single assignment (SSA) forms. Results are now shown by default in LGTM. |
| Double-checked lock is not thread-safe (`cs/unsafe-double-checked-lock`) | Fewer false positive and more true positive results | No longer highlights code where the underlying field was not updated in the `lock` statement, or where the field is a `struct`. Results have been added where there are other statements inside the `lock` statement. |
| Exposure of private information (`cs/exposure-of-sensitive-information`) | More results | Now includes data sources for user controls in `System.Windows.Forms`. |
| Improper control of generation of code (`cs/code-injection`) | More results | Now includes data sources for user controls in `System.Windows.Forms`. |
| Off-by-one comparison against container length (`cs/index-out-of-bounds`) | Fewer false positive results | No longer reports results when there are additional guards on the index. |
| SQL query built from user-controlled sources (`cs/sql-injection`) | More results | Now includes data sources for user controls in `System.Windows.Forms`. |
| Uncontrolled format string (`cs/uncontrolled-format-string`) | More results | Now includes data sources for user controls in `System.Windows.Forms`. |
| Unused format argument (`cs/format-argument-unused`) | Fewer false positive results | No longer reports results where the format string is empty. This is often used as a default value and is not an interesting result. |
| Use of default ToString() (`cs/call-to-object-tostring`) | Fewer false positive results | No longer reports results for `char` arrays passed to `StringBuilder.Append()`, which were incorrectly marked as using `ToString`. |
| Use of default ToString() (`cs/call-to-object-tostring`) | Fewer results | No longer reports results when the object is an interface or an abstract class. |
| Using a package with a known vulnerability (`cs/use-of-vulnerable-package`) | More results | This query detects packages vulnerable to CVE-2019-0657. |
## Changes to code extraction
* Fix extraction of `for` statements where the condition declares new variables using `is`.
* Initializers of `stackalloc` arrays are now extracted.
## Changes to QL libraries
* The class `TrivialProperty` now includes library properties determined to be trivial using CIL analysis. This may increase the number of results for all queries that use data flow.
* Taint-tracking steps have been added for the `Json.NET` package. This will improve results for queries that use taint tracking.
* Support has been added for EntityFrameworkCore, including
- Stored data flow sources
- Sinks for SQL expressions
- Data flow through fields that are mapped to the database
* Support has been added for NHibernate-Core, including
- Stored data flow sources
- Sinks for SQL expressions
- Data flow through fields that are mapped to the database
| Double-checked locking is not thread-safe (`java/unsafe-double-checked-locking`) | reliability, correctness, concurrency, external/cwe/cwe-609 | Identifies wrong implementations of double-checked locking that does not use the `volatile` keyword. |
| Race condition in double-checked locking object initialization (`java/unsafe-double-checked-locking-init-order`) | reliability, correctness, concurrency, external/cwe/cwe-609 | Identifies wrong implementations of double-checked locking that performs additional initialization after exposing the constructed object. |
| Arbitrary file write during archive extraction ("Zip Slip") (`java/zipslip`) | Fewer false positive results | Results involving a sanitization step that converts a destination `Path` to a `File` are no longer reported. |
| Result of multiplication cast to wider type (`java/integer-multiplication-cast-to-long`) | Fewer results | Results involving conversions to `float` or `double` are no longer reported, as they were almost exclusively false positives. |
## Changes to QL libraries
* The deprecated library `semmle.code.java.security.DataFlow` has been removed.
Improved data flow libraries have been available in
`semmle.code.java.dataflow.DataFlow`,
`semmle.code.java.dataflow.TaintTracking`, and
`semmle.code.java.dataflow.FlowSources` since 1.16.
* Taint tracking now includes additional default data-flow steps through
collections, maps, and iterators. This affects all security queries, which
can report more results based on such paths.
* The `FlowSources` and `TaintTracking` libraries are extended to cover additional remote user
input and taint steps from the following frameworks: Guice, Protobuf, Thrift and Struts.
This affects all security queries, which may yield additional results on projects
* File classification now recognizes additional generated files, for example, files from [HTML Tidy](html-tidy.org).
* The taint tracking library now recognizes flow through persistent storage, class fields, and callbacks in certain cases. Handling of regular expressions has also been improved. This may give more results for the security queries.
* Type inference for function calls has been improved. This may give additional results for queries that rely on type inference.
* The [Closure-Library](https://github.com/google/closure-library/wiki/goog.module:-an-ES6-module-like-alternative-to-goog.provide) module system is now supported.
| Arbitrary file write during archive extraction ("Zip Slip") (`js/zipslip`) | security, external/cwe/cwe-022 | Identifies extraction routines that allow arbitrary file overwrite vulnerabilities, indicating a possible violation of [CWE-022](https://cwe.mitre.org/data/definitions/22.html). Results are shown on LGTM by default. |
| Arrow method on Vue instance (`js/vue/arrow-method-on-vue-instance`) | reliability, frameworks/vue | Highlights arrow functions that are used as methods on Vue instances. Results are shown on LGTM by default.|
| Cross-window communication with unrestricted target origin (`js/cross-window-information-leak`) | security, external/cwe/201, external/cwe/359 | Highlights code that sends potentially sensitive information to another window without restricting the receiver window's origin, indicating a possible violation of [CWE-201](https://cwe.mitre.org/data/definitions/201.html). Results are shown on LGTM by default. |
| Double escaping or unescaping (`js/double-escaping`) | correctness, security, external/cwe/cwe-116 | Highlights potential double escaping or unescaping of special characters, indicating a possible violation of [CWE-116](https://cwe.mitre.org/data/definitions/116.html). Results are shown on LGTM by default. |
| Incomplete regular expression for hostnames (`js/incomplete-hostname-regexp`) | correctness, security, external/cwe/cwe-020 | Highlights hostname sanitizers that are likely to be incomplete, indicating a violation of [CWE-020](https://cwe.mitre.org/data/definitions/20.html). Results are shown on LGTM by default.|
| Incomplete URL substring sanitization | correctness, security, external/cwe/cwe-020 | Highlights URL sanitizers that are likely to be incomplete, indicating a violation of [CWE-020](https://cwe.mitre.org/data/definitions/20.html). Results shown on LGTM by default. |
| Incorrect suffix check (`js/incorrect-suffix-check`) | correctness, security, external/cwe/cwe-020 | Highlights error-prone suffix checks based on `indexOf`, indicating a potential violation of [CWE-20](https://cwe.mitre.org/data/definitions/20.html). Results are shown on LGTM by default. |
| Loop iteration skipped due to shifting (`js/loop-iteration-skipped-due-to-shifting`) | correctness | Highlights code that removes an element from an array while iterating over it, causing the loop to skip over some elements. Results are shown on LGTM by default. |
| Unused property (`js/unused-property`) | maintainability | Highlights properties that are unused. Results are shown on LGTM by default. |
| Useless comparison test (`js/useless-comparison-test`) | correctness | Highlights code that is unreachable due to a numeric comparison that is always true or always false. Results are shown on LGTM by default. |
| Ambiguous HTML id attribute | Fewer false positive results | This query now treats templates more conservatively. Its precision has been revised to 'high'. |
| Assignment to exports variable | Fewer results | This query no longer flags code that is also flagged by the query "Useless assignment to local variable". |
| Client-side cross-site scripting | More true positive and fewer false positive results. | This query now recognizes WinJS functions that are vulnerable to HTML injection. It no longer flags certain safe uses of jQuery, and recognizes custom sanitizers. |
| Hard-coded credentials | Fewer false positive results | This query no longer flags the empty string as a hardcoded username. |
| Insecure randomness | More results | This query now flags insecure uses of `crypto.pseudoRandomBytes`. |
| Reflected cross-site scripting | Fewer false positive results. | This query now recognizes custom sanitizers. |
| Stored cross-site scripting | Fewer false positive results. | This query now recognizes custom sanitizers. |
| Unbound event handler receiver (`js/unbound-event-handler-receiver`) | Fewer false positive results | Additional ways that class methods can be bound are now recognized. |
| Uncontrolled data used in network request | More results | This query now recognizes host values that are vulnerable to injection. |
| Uncontrolled data used in path expression | Fewer false positive results | This query now recognizes the Express `root` option, which prevents path traversal. |
| Unneeded defensive code | More true positive and fewer false positive results | This query now recognizes additional defensive code patterns. |
| Unsafe dynamic method access | Fewer false positive results | This query no longer flags concatenated strings as unsafe method names. |
| Unused parameter | Fewer false positive results | This query no longer flags parameters with leading underscore. |
| Unused variable, import, function or class | Fewer false positive results | This query now flags fewer variables that are implictly used by JSX elements. It no longer flags variables with a leading underscore and variables in dead code. |
| Unvalidated dynamic method call | More true positive results | This query now flags concatenated strings as unvalidated method names in more cases. |
| Useless assignment to property. | Fewer false positive results | This query now treats assignments with complex right-hand sides correctly. |
| Useless conditional | Fewer results | Additional defensive coding patterns are now ignored. |
| Useless conditional | More true positive results | This query now flags additional uses of function call values. |
## Changes to QL libraries
*`DataFlow::SourceNode` is no longer an abstract class; to add new source nodes, extend `DataFlow::SourceNode::Range` instead.
* Subclasses of `DataFlow::PropRead` are no longer automatically made source nodes; you now need to additionally define a corresponding subclass of `DataFlow::SourceNode::Range` to achieve this.
* The deprecated libraries `semmle.javascript.DataFlow` and `semmle.javascript.dataflow.CallGraph` have been removed; they are both superseded by `semmle.javascript.dataflow.DataFlow`.
* Overriding `DataFlow::InvokeNode.getACallee()` no longer affects the call graph seen by the interprocedural data flow libraries. To do this, the 1-argument version `getACallee(int imprecision)` can be overridden instead.
* The predicate `DataFlow::returnedPropWrite` was intended for internal use only and is no longer available.
The extractor now parses all Python code from a single unified grammar. This means that almost all Python code will be successfully parsed, even if mutually incompatible Python code is present in the same project. This also means that Python code for any version can be correctly parsed on a worker running any other supported version of Python. For example, Python 3.7 code is parsed correctly, even if the installed version of Python is only 3.5. This will reduce the number of syntax errors found in many projects.
### Regular expression analysis improvements
The Python `re` (regular expressions) module library has a couple of constants called `MULTILINE` and `VERBOSE` which determine the parsing of regular expressions. Python 3.6 changed the implementation of these constants, which resulted in false positive results for some queries. The relevant QL libraries have been updated to support both implementations which will remove false positive results from projects that use Python 3.6 and later versions.
### API improvements
The API has been improved to declutter the global namespace and improve discoverability and readability.
* New predicates `ModuleObject::named(name)` and `ModuleObject.attr(name)` have been added, allowing more readable access to common objects. For example, `(any ModuleObject m | m.getName() = "sys").getAttribute("exit")` can be replaced with `ModuleObject::named("sys").attr("exit")`
* The API for accessing builtin functions has been improved. Predicates of the form `theXXXFunction()`, such as `theLenFunction()`, have been deprecated in favor of `Object::builtin(name)`.
* A configuration based API has been added for writing data flow and taint tracking queries. This is provided as a convenience for query authors who have written data flow or taint tracking queries for other languages, so they can use a similar format of query across multiple languages.
| Default version of SSL/TLS may be insecure (`py/insecure-default-protocol`) | security, external/cwe/cwe-327 | Finds instances where an insecure default protocol may be used. Results are shown on LGTM by default. |
| Incomplete regular expression for hostnames (`py/incomplete-hostname-regexp`) | security, external/cwe/cwe-020 | Finds instances where a hostname is incompletely sanitized because a regular expression contains an unescaped character. Results are shown on LGTM by default. |
| Incomplete URL substring sanitization (`py/incomplete-url-substring-sanitization`) | security, external/cwe/cwe-020 | Finds instances where a URL is incompletely sanitized due to insufficient checks. Results are shown on LGTM by default. |
| Insecure temporary file (`py/insecure-temporary-file`) | security, external/cwe/cwe-377 | Finds uses of the insecure and deprecated `tempfile.mktemp`, `os.tempnam`, and `os.tmpnam` functions. Results are shown on LGTM by default. |
| Overly permissive file permissions (`py/overly-permissive-file`) | security, external/cwe/cwe-732 | Finds instances where a file is created with overly permissive permissions. Results are not shown on LGTM by default. |
| Use of insecure SSL/TLS version (`py/insecure-protocol`) | security, external/cwe/cwe-327 | Finds instances where a known insecure protocol has been specified. Results are shown on LGTM by default. |
| Comparison using is when operands support \_\_eq\_\_ (`py/comparison-using-is`) | Fewer false positive results | Results where one of the objects being compared is an enum member are no longer reported. |
| Modification of parameter with default (`py/modification-of-default-value`) | More true positive results | Instances where the mutable default value is mutated inside other functions are now also reported. |
| Mutation of descriptor in \_\_get\_\_ or \_\_set\_\_ method (`py/mutable-descriptor`) | Fewer false positive results | Results where the mutation does not occur when calling one of the `__get__`, `__set__` or `__delete__` methods are no longer reported. |
| Redundant comparison (`py/redundant-comparison`) | Fewer false positive results | Results in chained comparisons are no longer reported. |
| Unused import (`py/unused-import`) | Fewer false positive results | Results where the imported module is used in a `doctest` string are no longer reported. |
| Unused import (`py/unused-import`) | Fewer false positive results | Results where the imported module is used in a type-hint comment are no longer reported. |
## Changes to QL libraries
* Added support for the `dill` pickle library.
* Added support for the `bottle` web framework.
* Added support for the `CherryPy` web framework.
* Added support for the `falcon` web API framework.
* Added support for the `turbogears` web framework.
* Parallel extraction of JavaScript files (but not TypeScript files) on LGTM is now supported. If LGTM is configured to evaluate queries using multiple threads, then JavaScript files are also extracted using multiple threads.
* Experimental support for [E4X](https://developer.mozilla.org/en-US/docs/Archive/Web/E4X), a legacy language extension developed by Mozilla, is available.
* Additional [Flow](https://flow.org/) syntax is now supported.
* [Nullish Coalescing](https://github.com/tc39/proposal-nullish-coalescing) expressions are now supported.
* [TypeScript 3.2](https://www.typescriptlang.org/docs/handbook/release-notes/typescript-3-2.html) is now supported.
* The TypeScript extractor now handles the control flow of logical operators and destructuring assignments more accurately.
* @description An object larger than 64 bytes is passed by value to a function. Passing large objects by value unnecessarily use up scarce stack space, increase the cost of calling a function and can be a security risk. Use a pointer to the object instead.
* @description An object larger than 64 bytes is passed by value to a function. Passing large objects by value unnecessarily use up scarce stack space, increase the cost of calling a function and can be a security risk. Use a const pointer to the object instead.
* @kind problem
* @problem.severity warning
* @precision high
* @problem.severity recommendation
* @precision very-high
* @id cpp/large-parameter
* @tags efficiency
* readability
@@ -18,6 +18,7 @@ where f.getAParameter() = p
and t.getSize() = size
and size > 64
and not t.getUnderlyingType() instanceof ArrayType
and not f instanceof CopyAssignmentOperator
select
p, "This parameter of type $@ is " + size.toString() + " bytes - consider passing a pointer/reference instead.",
p, "This parameter of type $@ is " + size.toString() + " bytes - consider passing a const pointer/reference instead.",
* @description In an enumerator list, the = construct should not be used to explicitly initialize members other than the first, unless all items are explicitly initialized. An exception is the pattern to use the last element of an enumerator list to get the number of possible values.
* @description Function length should be limited to what can be printed on a single sheet of paper (60 lines). Number of parameters is limited to 6 or fewer.
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.