There was unfortunately a semantic merge conflict between #3419 and
#3587 that caused a performance regression on (at least) OpenJDK.
This reverts commit 982fb38807, reversing
changes made to b841cacb83.
Replace most direct calls to `TRawInstruction()` with calls to `getInstructionTranslatedElement()` and `getInstructionTag()`, matching existing practice. One tiny RA diff in an inconsequential join order in `getInstructionVariable`.
This noopt predicate is no longer necessary. It's equivalent to `instruction = TRawInstruction(element, tag)`, which is already materialized and has a more favorable column order anyway.
These predicates are only used within the new single IR stage, so there's no need to cache them beyond that. RA diffs are trivial. Where previously many of the predicate on `Instruction` were inline wrappers around cached predicates from `IRConstruction`, now the predicates from `IRConstruction` get inlined into the `Instruction` predicates, and the `Instruction` predicates get materialized. The net amount of work is the same, but now it's not getting cached unnecessarily.
We were creating a `TranslatedFunction` even for functions that were not from source code, but then telling the IR package that those functions didn't have IR. This resulted in having prologue/epilogue instructions (e.g. `EnterFunction`, `ExitFunction`) with no enclosing `IRFunction`.
Some of our IR consistency failure query predicates already produced results in the schema as an `@kind problem` query, including `$@` replacements for the enclosing `IRFunction` to make it easier to figure out which function to dump when debugging. This change moves the rest of the query predicates in `IRConsistency.qll` to do the same. In addition, it wraps each call to `getEnclosingIRFunction()` to return an `OptionalIRFunction`, which can be either a real `IRFunction` or a placeholder in case `getEnclosingIRFunction()` returned no results. This exposes a couple new consistency failures in `syntax-zoo`, which will be fixed in a subsequent commit.
This change also deals with consistency failures when the enclosing `IRFunction` has more than one `Function` or `Location`. For multiple `Function`s, we concatenate the function names. For multiple `Location`s, we pick the first one in lexicographical order. This changes the number of results produced in the existing tests, but does't change the actual number of problems.
Generalize logic by recognizing not only calls to
`Microsoft.AspNetCore.Mvc.ViewFeatures.HtmlHelper.Raw()`, but calls to all `Raw()`
methods that implement `Microsoft.AspNetCore.Mvc.Rendering.IHtmlHelper.Raw()`.
Instead of a vague reference to a code comment for another language, the
`controlsBlock` predicate now has the whole comment in it directly.
I've adjusted the wording so it should be reasonably correct for C/C++.
As with the other comments in this file, I don't distinguish between the
condition and its block. I think that makes the explanation clearer
without losing any detail we care about.
To make the code fit the wording of the comment, I changed the
`hasBranchEdge/2` predicate into `getBranchSuccessor/1`.
On kamailio/kamailio the `DataFlowUtil::simpleInstructionLocalFlowStep`
predicate was slow because of the case for single-field structs, where
there was a large tuple-count bulge when joining with
`getFieldSizeOfClass`:
3552902 ~2% {2} r1 = SCAN Instruction::CopyInstruction::getSourceValueOperand_dispred#3#ff AS I OUTPUT I.<1>, I.<0>
2065347 ~2% {2} r35 = JOIN r1 WITH Operand::NonPhiMemoryOperand::getAnyDef_dispred#3#ff AS R ON FIRST 1 OUTPUT r1.<1>, R.<1>
2065827 ~2% {3} r36 = JOIN r35 WITH Instruction::Instruction::getResultType_dispred#3#ff AS R ON FIRST 1 OUTPUT R.<1>, r35.<1>, r35.<0>
2065825 ~3% {3} r37 = JOIN r36 WITH Type::Type::getSize_dispred#ff AS R ON FIRST 1 OUTPUT r36.<1>, r36.<2>, R.<1>
2068334 ~2% {4} r38 = JOIN r37 WITH Instruction::Instruction::getResultType_dispred#3#ff AS R ON FIRST 1 OUTPUT R.<1>, r37.<2>, r37.<0>, r37.<1>
314603817 ~0% {3} r39 = JOIN r38 WITH DataFlowUtil::getFieldSizeOfClass#fff_120#join_rhs AS R ON FIRST 2 OUTPUT r38.<3>, R.<2>, r38.<2>
8 ~0% {2} r40 = JOIN r39 WITH Instruction::Instruction::getResultType_dispred#3#ff AS R ON FIRST 2 OUTPUT r39.<2>, r39.<0>
That's 314M tuples.
Strangely, there is no such bulge on more well-behaved snapshots like
mysql/mysql-server.
With this commit the explosion is gone:
...
2065825 ~0% {4} r37 = JOIN r36 WITH Type::Type::getSize_dispred#ff AS R ON FIRST 1 OUTPUT r36.<0>, R.<1>, r36.<1>, r36.<2>
1521 ~1% {3} r38 = JOIN r37 WITH DataFlowUtil::getFieldSizeOfClass#fff_021#join_rhs AS R ON FIRST 2 OUTPUT r37.<2>, R.<2>, r37.<3>
8 ~0% {2} r39 = JOIN r38 WITH Instruction::Instruction::getResultType_dispred#3#ff AS R ON FIRST 2 OUTPUT r38.<0>, r38.<2>
The `controlsBlock` predicate had some dramatic bulges in its tuple
counts. To make matters worse, those bulges were in materialized
intermediate predicates like `#shared` and `#antijoin_rhs`, not just in
the middle of a pipeline.
The problem was particularly evident on kamailio/kamailio, where
`controlsBlock` was the slowest predicate in the IR libraries:
IRGuards::IRGuardCondition::controlsBlock_dispred#fff#shared#4 ........ 58.8s
IRGuards::IRGuardCondition::controlsBlock_dispred#fff#antijoin_rhs .... 33.4s
IRGuards::IRGuardCondition::controlsBlock_dispred#fff#antijoin_rhs#1 .. 26.7s
The first of the above relations had 201M rows, and the others
had intermediate bulges of similar size.
The bulges could be observed even on small projects although they did
not cause measurable performance issues there. The
`controlsBlock_dispred#fff#shared#4` relation had 3M rows on git/git,
which is a lot for a project with only 1.5M IR instructions.
This commit borrows an efficient implementation from Java's
`Guards.qll`, tweaking it slightly to fit into `IRGuards`. Performance
is now much better:
IRGuards::IRGuardCondition::controlsBlock_dispred#fff ................... 6.1s
IRGuards::IRGuardCondition::hasDominatingEdgeTo_dispred#ff .............. 616ms
IRGuards::IRGuardCondition::hasDominatingEdgeTo_dispred#ff#antijoin_rhs . 540ms
After this commit, the biggest bulge in `controlsBlock` is the size of
`IRBlock::dominates`. On kamailio/kamailio this is an intermediate tuple
count of 18M rows in the calculation of `controlsBlock`, which in the
end produces 11M rows.
The optimizer picked a terrible join order in `VirtualDispatch::DataSensitiveCall::flowsFrom()`. Telling it that `getAnOutNode()` has a unique result convinces it to join first on the `Callable`, rather than on the `ReturnKind`.
There wasn't a good join order for the "store to global var" case in the
virtual dispatch library. When a global variable had millions of
accesses but few stores to it, the `flowsFrom` predicate would join to
see all those millions of accesses before filtering down to stores only.
The solution is to pull out a `storeIntoGlobal` helper predicate that
pre-computes which accesses are stores.
To make the code clearer, I've also pulled out a repeated chunk of code
into a new `addressOfGlobal` helper predicate.
For the kamailio/kamailio project, these are the tuple counts before:
Starting to evaluate predicate DataFlowDispatch::VirtualDispatch::DataSensitiveCall::flowsFrom#fff#cur_delta/3[3]@21a1df (iteration 3)
Tuple counts for DataFlowDispatch::VirtualDispatch::DataSensitiveCall::flowsFrom#fff#cur_delta:
...
59002 ~0% {3} r17 = SCAN DataFlowDispatch::VirtualDispatch::DataSensitiveCall::flowsFrom#fff#prev_delta AS I OUTPUT I.<1>, true, I.<0>
58260 ~1% {3} r31 = JOIN r17 WITH DataFlowUtil::Node::asVariable_dispred#fb AS R ON FIRST 1 OUTPUT R.<1>, true, r17.<2>
2536187389 ~6% {3} r32 = JOIN r31 WITH Instruction::VariableInstruction::getASTVariable_dispred#fb_10#join_rhs AS R ON FIRST 1 OUTPUT R.<1>, true, r31.<2>
2536187389 ~6% {3} r33 = JOIN r32 WITH project#Instruction::VariableAddressInstruction#class#3#ff AS R ON FIRST 1 OUTPUT r32.<0>, true, r32.<2>
58208 ~0% {3} r34 = JOIN r33 WITH Instruction::StoreInstruction::getDestinationAddress_dispred#ff_10#join_rhs AS R ON FIRST 1 OUTPUT R.<1>, true, r33.<2>
Tuple counts after:
Starting to evaluate predicate DataFlowDispatch::VirtualDispatch::DataSensitiveCall::flowsFrom#fff#cur_delta/3[3]@6073c5 (iteration 3)
Tuple counts for DataFlowDispatch::VirtualDispatch::DataSensitiveCall::flowsFrom#fff#cur_delta:
...
59002 ~0% {3} r17 = SCAN DataFlowDispatch::VirtualDispatch::DataSensitiveCall::flowsFrom#fff#prev_delta AS I OUTPUT I.<1>, true, I.<0>
58260 ~1% {3} r23 = JOIN r17 WITH DataFlowUtil::Node::asVariable_dispred#ff AS R ON FIRST 1 OUTPUT R.<1>, true, r17.<2>
58208 ~0% {3} r24 = JOIN r23 WITH DataFlowDispatch::VirtualDispatch::storeIntoGlobal#ff_10#join_rhs AS R ON FIRST 1 OUTPUT R.<1>, true, r23.<2>
58208 ~0% {3} r25 = JOIN r24 WITH DataFlowUtil::InstructionNode#ff_10#join_rhs AS R ON FIRST 1 OUTPUT true, r24.<2>, R.<1>
Notice that the final tuple count, 58208, is the same before and after.
The kamailio/kamailio project seems to have been affected by this issue
because it has global variables to do with logging policy, and these
variables are loaded from in every place where their logging macro is
used.
The four cached predicates used to access common properties of instructions took a `TStageInstruction` as a parameter. This requires the calling code, in `Instruction.qll`, to then join the results with `hasInstruction()` to filter out results for `TRawInstruction`s that were discarded as unreachable. By simply switching the parameter types to `Instruction`, we can force that join to happen in the cached predicate itself. This makes the various accessor predicates on `Instruction` trivially inlinable to the cached predicate, instead of being joins of two huge relations that might have to be recomputed in later stages.
Most of the predicates on `Instruction` are thin wrappers around cached predicates in the `IRConstruction` or `SSAConstruction` modules. However, `getResultIRType()` has to join `Construction::getInstructionResultType()` with `LanguageType::getIRType()`. `getResultIRType()` is called frequently both within the IR code and by IR consumers, and that's a big join to have to repeat in multiple stages.
I looked at most of the other predicates in `Instruction.qll`, and didn't see any other predicates that met all of the criteria of "large, commonly called, and not already inline".
There were a few places in the IR itself where we use `Instruction.getResultType()`, which returns the C++ `Type` of the result, instead of `Instruction.getResultIRType()`, which returns the language-neutral `IRType` of the result. By removing this usage, we can avoid evaluating `getResultType()` at all.
There are still other uses of `Instruction.getResultType()` in other libraries. We should switch those as well.
Before this change, evaluation of the IR was spread out across about 5 stages. This resulted in a lot of redundant evaluation, especially tuple numbering of large IPA types like `TInstruction`. This change makes two small changes that, when combined, ensure that the IR is evaluated all in one stage:
First, we mark `TInstruction` as `cached`. This collapses all of the work to create instructions, across all three IR phases, into a single phase.
Second, we make the `SSA` module in `SSAConstruction.qll` just contain aliases to `cached` predicates defined in the `Cached` module. This ensures that all of the `Operand`-related SSA computation happens in the same stage as all of the `Instruction`-related SSA computation.
- Added experimental/Security/CWE/CWE-094/MvelInjection.ql
- Added experimental/Security/CWE/CWE-094/MvelInjectionLib.qll
- Added a qhelp file with an example of vulnerable code
- Added tests and stubs for mvel2-2.4.7
When running `sync-files` (or `sync-identical-files`) with the `--latest` switch, if one or more of the files in a group does not exist, the script will crash. This happens all the time when I add a new group, or add a new file path in an existing group. This has bothered me for a long time, so I finally fixed it when I ran into it again today.
I've changed the script as follows:
- If _none_ of the paths in the group exist, print an error message listing the paths in the group. This happens with or without `--latest`.
- If `--latest` is specified, copy the master file to the paths of the missing files.
This updates C#'s IR to share `TInstruction` across stages the same way C++ does. The only interesting part is that, since we have not yet ported full alias analysis to C#, I stubbed out the required parts of the aliased SSA interface in `AliasedSSAStub.qll`.
Making these part of the IPA object identity changes the failure mode for cases where we assign multiple result types to an instruction. Previously, we would just have one instruction with two result types, but now we'd have two instructions, which breaks things worse. This change goes back to how things were before, to avoid any new surprises on real-world code with invalid ASTs or IR.
The conflicts came from how `this` is now a parameter but not a
`Parameter` on `master`.
Conflicts:
cpp/ql/src/semmle/code/cpp/ir/dataflow/internal/DataFlowUtil.qll
cpp/ql/test/library-tests/dataflow/DefaultTaintTracking/defaulttainttracking.cpp
cpp/ql/test/library-tests/dataflow/DefaultTaintTracking/tainted.expected
cpp/ql/test/library-tests/dataflow/DefaultTaintTracking/test_diff.expected
cpp/ql/test/library-tests/dataflow/dataflow-tests/dataflow-ir-consistency.expected
cpp/ql/test/library-tests/dataflow/fields/ir-flow.expected
cpp/ql/test/library-tests/syntax-zoo/dataflow-ir-consistency.expected
The taint-tracking configuration in `ExposureOfPrivateInformation.ql`
overlaps with the XSS taint-tracking configuration, as witnessed by this import chain:
```
semmle.code.csharp.security.dataflow.ExposureOfPrivateInformation.qll imports
semmle.code.csharp.security.dataflow.flowsinks.ExternalLocationSink imports
semmle.code.csharp.security.dataflow.flowsinks.Remote imports
semmle.code.csharp.security.dataflow.XSS
```
(The same for `CleartextStorage.qll` and `LogForging.ql`.)
The fix is to use `TaintTracking2` for the XSS configuration.
Now that `TInstruction` is shared between IR stages, several of the per-stage IR construction predicates can now be moved into the `Raw` interface exposed only by the initial construction of IR from the ASTs. This also removed a couple predicates that were not used previously at all.
Each stage of the IR reuses the majority of the instructions from previous stages. Previously, we've been wrapping each reused old instruction in a branch of the `TInstruction` type for the next stage. This causes use to create roughly three times as many `TInstruction` objects as we actually need.
Now that IPA union types are supported in the compiler, we can share a single `TInstruction` IPA type across stages. We create a single `TInstruction` IPA type, with individual branches of this type for instructions created directly from the AST (`TRawInstruction`) and for instructions added by each stage of SSA construction (`T*PhiInstruction`, `T*ChiInstruction`, `T*UnreachedInstruction`). Each stage then defines a `TStageInstruction` type that is a union of all of the branches that can appear in that particular stage. The public `Instruction` class for each phase extends the `TStageInstruction` type for that stage.
The interface that each stage exposes to the pyrameterized modules in the IR is now split into three pieces:
- The `Raw` module, exposed only by the original IR construction stage. This module identifies which functions have IR, which `TRawInstruction`s exist, and which `IRVariable`s exist.
- The `SSA` module, exposed only by the two SSA construction stages. This identifiers which `Phi`, `Chi`, and `Unreached` instructions exist.
- The global module, exposed by all three stages. This module has all of the predicates whose implementation is different for each stage, like gathering definitions of `MemoryOperand`s.
Similarly, there is now a single `TIRFunction` IPA type that is shared across all three stages. There is a single `IRFunctionBase` class that exposes the stage-indepdendent predicates; the `IRFunction` class for each stage extends `IRFunctionBase`.
Most of the other changes are largely mechanical.
- Added experimental/Security/CWE/CWE-094/SpelInjection.ql
and a couple of libraries
- Added a qhelp file with a few examples
- Added tests and stubs for Spring
This QL
```codeql
import python
import semmle.python.dataflow.TaintTracking
import semmle.python.security.strings.Untrusted
from CollectionKind ck
where
ck.(DictKind).getMember() instanceof StringKind
or
ck.getMember().(DictKind).getMember() instanceof StringKind
select ck, ck.getAQlClass(), ck.getMember().getAQlClass()
```
generates these 6 results.
```
1 {externally controlled string} ExternalStringDictKind UntrustedStringKind
2 {externally controlled string} StringDictKind UntrustedStringKind
3 [{externally controlled string}] SequenceKind ExternalStringDictKind
4 [{externally controlled string}] SequenceKind StringDictKind
5 {{externally controlled string}} DictKind ExternalStringDictKind
6 {{externally controlled string}} DictKind StringDictKind
```
StringDictKind was only used in *one* place in our library code. As illustrated
above, it pollutes our set of TaintKinds. Effectively, every time we make a
flow-step for dictionaries with tainted strings as values, we do it TWICE --
once for ExternalStringDictKind, and once for StringDictKind... that is just a
waste.
If I wanted to use my own TaintKind and not have any interaction with
`UntrustedStringKind` that wouldn't be possible today since these standard http
libraries import it directly. (also, I wouldn't get any sources of my custom
TaintKind from turbogears or bottle). I changed them to use the same pattern of
`ExternalStringKind` as everything else does.
While investigating a bug with `TInstruction` sharing, I discovered that we had a case where alias analysis could create two `VirtualVariable`s for the same `Allocation`. For an indirect parameter allocation, we were using the type of the pointer variable as the type of the indirect allocation, instead of just `Unknown`. If the `IRType` of the pointer variable was the same type as the type of at least one access to the indirect allocation, we'd create both an `EntireAllocationVirtualVariable` and a `VariableVirtualVariable` for the allocation.
I added a new consistency test to guard against this in the future. This also turned out to be the root cause of the one existing known consistency failure in the IR tests.
Currently, only `arguments[c]` for a constant value `c` is supported.
This allows us to detect the prototype-pollution vulnerabilities in (old versions of) `extend`, `jquery`, and `node.extend`.
All these queries have been deprecated since 2018. There is
unfortunately no way to deprecate a library, but it's been years since
we populated any databases using the VCS library, so nobody should be
using it.
I took most of the docs from the corresponding predicates in
JavaScript's `CodeDuplication.qll`. Where JavaScript had a corresponding
predicate but didn't have QLDoc, I added new QLDoc to both.
The `cpp/too-few-arguments` query produced alerts for ambiguous
databases where a function had multiple possible declarations, with some
declarations having the right number of parameters and some having too
many. With this change, the query errs on the side of caution in those
cases and does not produce an alert.
This fixes false positives on racket/racket.
The new `hasDefiniteNumberOfParameters` is exactly the negation of the
old `hasZeroParamDecl`.
I got my eyes on this one since it was using a deprecated method, BUT it was
also doing the thing, since File.getName() is the same as
File.getAbsolutePath(), and that doesn't match the description :\
In a near-future release overriding a deprecated predicate without making as
deprecated would give a compiler warning.
Not fixing the XML one. [I can see that this shouldn't be reported
anymore](https://github.com/github/codeql/pull/3520#issuecomment-631552943), and
it's not safe to remove since it was only marked as deprecated in
e6425bb4cf.
In a couple of cases, we use `glval<unknown>` as the result type of an instruction because we can't come up with anything better. Two examples are the result of `VariableAddress[#ellipsis]`, and the address of the temp variable that holds the lvalue result of the conditional operator in `(a ? b : c) = y`. In both cases, we call `getTypeForGLValue(any(UnknownType t))`, but that would have multiple results because `result.hasType(any(UnknownType t), true)` also holds for `CppFunctionGLValueType`. I tightened the result type to ensure we get the right one.
The QL compiler is about to be changed to emit a warning when overriding a deprecated predicate. This PR marks the existing overrides of deprecated predicates as `deprecated` themselves, which avoids the warning.
The `Print.qll` models seem to preserve the `isWideCharDefault()` predicate for backwards compatibility, so we can't remove them and must continue overriding them.
The `XML.qll` override is necessary because both superclasses declare the `getName()` predicate. One is `deprecated`, and the other is `abstract`, so we have to have an override.
These queries are currently run by default, but don't have their results displayed.
Looking through results on LGTM.com, they are either false positives (e.g., `BitwiseSignCheck` which flags many perfectly harmless operations and `CompareIdenticalValues` which mostly flags NaN checks) or harmless results that developers are unlikely to care about (e.g., `EmptyArrayInit` or `MisspelledIdentifier`).
With this PR, the only queries that are still run but not displayed are security queries, where different considerations may apply.
No evidence that Spring Actuators are being used, e.g. `http.authorizeRequests().anyRequest().permitAll()`
Only safe Actuators are enabled, e.g. `EndpointRequest.to("health", "info")`
The previous changes made the optimizer choose a bad join order for the RHS of the antijoin in `addressOperandAllocationAndOffset`. Once again, `unique` to the rescue.
After #3382 changed the escape analysis to model qualifiers as escaping,
there was an imbalance in the SSA library, where `addressTakenVariable`
excludes variables from SSA analysis if they have their address taken
but are _not_ passed by reference. This showed up as a missing result in
`TOCTOUFilesystemRace.ql`, demonstrated with a test case in #3432.
This commit changes the definition of "pass by reference" to include
call qualifiers, which allows SSA modeling of variables that have member
function calls on them.
Adding a new test case leads to changes in all `.expected` files in its
directory.
The new results show that the `DefinitionsAndUses` library does not
model `std::addressof` correctly, but that library is not intended to be
used for new code.
I didn't add this support in `AddressConstantExpression.qll` since I
think it would require extra work and testing to get the constexprness
right. My long-term plan for `AddressConstantExpression.qll` is to move
its functionality to the extractor.
It does so by first defining what a pointer dereference is (on the IR
`Instruction` level), and then using the array length analysis and the range
analysis together to prove that some of these pointer dereferences are safe.
The class org.apache.commons.io.serialization.ValidatingObjectInputStream
is an implementation of ObjectInputStream that validates the deserialized
classes against a white list. Therefore, this class should not be considered an
unsafe deserialization sink.
These already have results for BoundMethodValue, although
1) it's a bit strange that `getParameter(-1)` has results
2) why does `Method(Function C.n, class C)` exists? this would only be relevant
if `n` was a classmethod, but it isn't. It's not a problem that it exsits per
se, but curious.
The `fieldFlow` predicate contained a fragile join that has become
ordered wrong recently, either as result of an unrelated change in the
data-flow library or as part of the stats change for the last dbscheme
change.
The minimal fix is to use `getEnclosingCallable` instead of
`getFunction` since the former uses `unique` to ensure good join
ordering in its callers. A longer-term fix should be applied to the AST
base libraries, but this will be invasive and require independent
testing.
Tuple counts on Wireshark before (cancelled after a few minutes):
(747s) Starting to evaluate predicate DataFlowUtil::localFlowStep#ff/2@bdba82
(848s) Tuple counts for DataFlowUtil::localFlowStep#ff:
1766640980 ~1% {2} r1 = JOIN DataFlowUtil::Node::getFunction_dispred#ff_10#join_rhs AS L WITH DataFlowUtil::Node::getFunction_dispred#ff_10#join_rhs AS R ON FIRST 1 OUTPUT L.<1>, R.<1>
1327 ~0% {2} r2 = JOIN r1 WITH project#DataFlowImplLocal::Configuration::hasFlow#fbb AS R ON FIRST 2 OUTPUT r1.<0>, r1.<1>
9691232 ~0% {2} r3 = DataFlowUtil::simpleLocalFlowStep#ff@staged_ext \/ r2
return r3
After:
(0s) Starting to evaluate predicate DataFlowUtil::localFlowStep#ff/2@a852a0
(0s) Tuple counts for DataFlowUtil::localFlowStep#ff:
49017 ~4% {3} r1 = JOIN project#DataFlowImplLocal::Configuration::hasFlow#fff AS L WITH DataFlowUtil::Node::getEnclosingCallable_dispred#ff AS R ON FIRST 1 OUTPUT L.<1>, R.<1>, R.<0>
42359 ~0% {2} r2 = JOIN r1 WITH DataFlowUtil::Node::getEnclosingCallable_dispred#ff AS R ON FIRST 2 OUTPUT r1.<2>, r1.<0>
9732264 ~0% {2} r3 = DataFlowUtil::simpleLocalFlowStep#ff@staged_ext \/ r2
return r3
Report an alert in _any_ `Web.config` file, as long as it does not have an
`X-Frame-Options` entry (as opposed to only reporting alerts when _all_
`Web.config` files lack the entry).
This gives a slight speedup, and I think it makes the code shorter and
clearer.
On Wireshark, the time from the beginning of the `IRBlock` stage until
just before evaluation of `getInstruction` drops from 44s to 34s.
Flow from a definition by reference of a field into its object was
working inconsistently and in a very syntax-dependent way. For a
function `f` receiving a reference, `f(a->x)` could propagate data back
to `a` via the _reverse read_ mechanism in the shared data-flow library,
but for a function `g` receiving a pointer, `g(&a->x)` would not work.
And `f((*a).x)` would not work either.
In all cases, the issue was that the shared data-flow library propagates
data backwards between `PostUpdateNode`s only, but there is no
`PostUpdateNode` for `a->x` in `g(&a->x)`. This pull request inserts
such post-update nodes where appropriate and links them to their
neighbors. In this exapmle, flow back from the output parameter of `g`
passes first to the `PostUpdateNode` of `&`, then to the (new)
`PostUpdateNode` of `a->x`, and finally, as a _reverse read_ with the
appropriate field projection, to `a`.
This case was added in dccc0f4db. The surrounding code has changed a lot
since then, and the case no longer seems to have an effect except to
create some dead ends and possibly cycles in the local flow graph.
Also clarify the docs on `Call` to decrease the likelyhood of such an
omission happening again.
The updated test reflects that `f1.operator()` lets the address of `f1`
escape from the caller.
For each pointer, we start tracking (starting from the allocation or an array declaration)
1) how long is the chunk of memory allocated
2) where the current pointer is in this chunk of memory.
This information might not always exist, but when it does, it is reliable.
Currently only works intraprocedurally.
We already modelled usage of the protected `MessageDigest(String algo)`
constructor as a crypto algorithm specification. For some reason we did
not model the more commonly used public `MessageDigest.getInstance` method.
I've added a `.vscode/extensions.json` file that will automatically recommend the CodeQL for Visual Studio Code extension to anyone who opens the repo in VS Code (without the extension already installed).
If you're developing one of the libraries that has muiltiple copies auto-generated by `sync-files.py`, you can now run `sync-files.py --latest` by going to the `Terminal | Run Task...` menu in VS Code and selecting the `Sync Identical Files` task. You can set a keyboard binding to run this task for quicker access.
Logic is hard, and I made a mistake inverting the formula for the second case, so the
predicate never held for a sizeExpr like sizeof(int)*sizeof(void).
Now, this case is correctly handled by the fallback.
Based on PR feedback. This will avoid a syntactic wart and make the
invocation do the right thing both with and without
`language[monotonicAggregates]`.
Since this test inheriently has `--max-import-depth=1`, by using six, we would
never look at the actual source-code of urllib.parse/urlparse and therefore the
test would never show if we understood the library code good enough that we
could propagate taint out-of-the-box.
All tests moved by one line... that is why the diff is so big
This yields more precise size information in a lot of the common cases of C allocation code,
as the common pattern malloc(count * sizeof(type)) is now understood.
We don't hyphenate "QL-library" and there were a few typos. Feel free to further revise this if I've changed the meaning too much.
As discussed separately, I was unable to raise this as a PR in GitHub.com and had to resort to a direct commit.
IR generation was not handling the special two-operand flavor of the `?:` operator that GCC supports as an extension. The extractor doesn't quite give us enough information to do this correctly (see github/codeql-c-extractor-team#67), but we can get pretty close.
About half of the code could be shared between the two-operand and three-operand flavors. The main differences for the two-operand flavor are:
1. The "then" operand isn't a child of the `ConditionalExpr`. Instead, we just reuse the original value of the "condition" operand, skipping any implicit cast to `bool` (see comment for rationale).
2. For the three-operand flavor, we generate the condition as control flow rather than the computation of a `bool` value, to avoid creating unnecessarily complicated branching. For the two-operand version, we just compute the value, since we have to reuse that value in the "then" branch anyway.
I've added IR tests for these new cases. I've also updated the expectations for `SignAnalysis.ql` based on the fix. @rdmarsh2, can you please double-check that these diffs look correct? I believe they do, but you're the range/sign analysis expert.
Includes a fairly exhaustive test case for arithmetic operations involving `_Complex` and/or `_Imaginary` types. Thanks to these new tests, I discovered that the extractor treats certain arithmetic operations on `_Imaginary` types as separate expression kinds, so I added support for those kinds in IR construction.
`getVariableType()` is used to compute the actual semantic type of a variable from its declared type. That's where we handle pointer and function decay for parameters, and it's also where we handle arrays of unknown bound initialized with an initializer of known bound.
Previously, even if neither of the above situations applied, the type that we returned was the `getUnspecifiedType()` of the variable. This meant that, for example, `const char* p` would be treated as `char *`. This is inconsistent with how we handle types elsewhere in IR construction, where we preserve typedefs and cv-qualifiers when creating the `CppType` of an `IRVariable`, `Instruction`, or `Operand`.
The only visible effect this fix has is to fix the inferred result type for `Phi` instructions for variables affect by this change in `getVariableType()` behavior. Previously, we would see the variable accessed as both `const char*` and as `char*`, so we'd fall back to the canonical pointer type, which is `decltype(nullptr)`. Now, we see the same type for all accesses to the variable, so we use that type as the type of the SSA memory location and as the result type of the `Phi` instruction.
When `PrintSSA.qll` is imported, IR dumps will be annotated with the alias analysis information used during SSA construction. When printing this information, we incorrectly treated instructions at offset -1, which should only be `Phi` instructions, as `Chi` instructions for the instruction at offset 0. This produced phantom annotations, but did not affect the correctness of the actual IR.
The IR generation for `InitializeIndirection` currently connects its load operand to the result of the corresponding `InitializeParameter` instruction. This isn't exactly wrong, but it doesn't fit the IR invariant of "All unmodeled uses consume `UnmodeledDefinition`". Our current code doesn't care, because we just throw away all of the existing def-use information, modeled or otherwise, when we build unaliased SSA. However, some upcoming SSA changes don't work correctly if this invariant is broken.
I've added the trivial IR generation change, along with a new sanity query.
When the extractor can't prove that control flow will never reach the end of a non-`void`-returning function without reaching an explicit `return` statement, it inserts an implicit `return` without an operand. If control actually reaches this point, the behavior is undefined.
We were previously generating invalid IR for these implicit `return` statements, because the lack of an operand meant that there was no definition of the return value variable along that path. Instead, I've changed the IR generation to emit an `Unreached` instruction for the implicit `return`. This ensures that we don't create a control flow edge from the end of the body to the function epilogue.
The change to the range analysis test avoids having that test depend on the previous bad IR behavior, while still preserving the original spirit of the test.
These classes/predicates are not used by anything in our codebase, and is using
deprecated classes/predicates, so I think it's safe to assume they should also
have been marked with the deprecated annotation.
Changes the QL compiler warnings with:
-WARNING: Type Configuration has been deprecated and may be removed in future (/home/rasmus/code/ql/python/ql/src/semmle/python/dataflow/TaintTracking.qll:663,50-63)
-WARNING: Type Configuration has been deprecated and may be removed in future (/home/rasmus/code/ql/python/ql/src/semmle/python/dataflow/TaintTracking.qll:666,19-32)
-WARNING: Type Configuration has been deprecated and may be removed in future (/home/rasmus/code/ql/python/ql/src/semmle/python/dataflow/TaintTracking.qll:671,19-32)
-WARNING: Type Configuration has been deprecated and may be removed in future (/home/rasmus/code/ql/python/ql/src/semmle/python/dataflow/TaintTracking.qll:733,16-39)
-WARNING: Type CustomPointsToAttribute has been deprecated and may be removed in future (/home/rasmus/code/ql/python/ql/src/semmle/python/types/Extensions.qll:181,28-51)
-WARNING: Type CustomPointsToFact has been deprecated and may be removed in future (/home/rasmus/code/ql/python/ql/src/semmle/python/types/Extensions.qll:155,60-78)
-WARNING: Type CustomPointsToFact has been deprecated and may be removed in future (/home/rasmus/code/ql/python/ql/src/semmle/python/types/Extensions.qll:159,19-37)
-WARNING: Type CustomPointsToFact has been deprecated and may be removed in future (/home/rasmus/code/ql/python/ql/src/semmle/python/types/Extensions.qll:41,33-51)
+WARNING: Type CustomPointsToFact has been deprecated and may be removed in future (/home/rasmus/code/ql/python/ql/src/semmle/python/types/Extensions.qll:41,44-62)
I was considering if this was actually something different than
Value.hasAttribute, and the names were just accidentially the same. But after
looking at the definition for Value, I'm happy about marking this as an
override (I did not test whether it was neede though):
```codeql
class Value extends TObject {
...
/** Holds if this value has the attribute `name` */
predicate hasAttribute(string name) { this.(ObjectInternal).hasAttribute(name) }
```
This workaround from `DataFlowUtil.qll` should be useful for any query
that selects an `Expr`. In particular, it's useful for IR data flow.
This commit does not include test changes.
Every data-flow node should have a unique enclosing function (_callable_
in the terminology of the data-flow library), but this was not evident
for the optimizer, and it led to a bad join order in `pathStep`. This
commit fixes the join order for C++ AST data flow. All other copies of
data flow seem to be fine.
These are the tuple counts for OpenJDK before this commit:
(231s) Tuple counts for DataFlowImplLocal::pathStep#fffff#cur_delta:
5882 ~0% {6} r1 = SCAN DataFlowImplLocal::PathNodeMid#class#ffffff#prev_delta AS I OUTPUT I.<2>, I.<0>, I.<1>, I.<3>, I.<4>, I.<5>
1063406780 ~0% {7} r2 = JOIN r1 WITH DataFlowImplCommon::CallContext::relevantFor_dispred#ff AS R ON FIRST 1 OUTPUT r1.<2>, R.<1>, r1.<1>, r1.<0>, r1.<3>, r1.<4>, r1.<5>
5882 ~1% {6} r3 = JOIN r2 WITH DataFlowUtil::Node::getFunction_dispred#ff AS R ON FIRST 2 OUTPUT r2.<0>, r2.<6>, r2.<2>, r2.<3>, r2.<4>, r2.<5>
105 ~0% {5} r4 = JOIN r3 WITH project#DataFlowImplLocal::LocalFlowBigStep::localFlowBigStep#ffffff_021#join_rhs AS R ON FIRST 2 OUTPUT r3.<2>, r3.<3>, r3.<4>, r3.<5>, R.<2>
5882 ~1% {6} r5 = JOIN r2 WITH DataFlowUtil::Node::getFunction_dispred#ff AS R ON FIRST 2 OUTPUT r2.<5>, r2.<2>, r2.<0>, r2.<3>, r2.<4>, r2.<6>
5882 ~0% {6} r6 = JOIN r5 WITH DataFlowImplLocal::TNil#ff_1#join_rhs AS R ON FIRST 1 OUTPUT r5.<2>, false, r5.<5>, r5.<1>, r5.<3>, r5.<4>
0 ~0% {5} r7 = JOIN r6 WITH DataFlowImplLocal::LocalFlowBigStep::localFlowBigStep#ffffff_02413#join_rhs AS R ON FIRST 3 OUTPUT R.<4>, r6.<3>, r6.<4>, r6.<5>, R.<3>
0 ~0% {5} r8 = JOIN r7 WITH DataFlowImplLocal::TNil#ff AS R ON FIRST 1 OUTPUT r7.<1>, r7.<2>, r7.<3>, R.<1>, r7.<4>
105 ~0% {5} r9 = r4 \/ r8
The problem is that `DataFlowUtil::Node::getFunction_dispred#ff`
(`getEnclosingCallable`) is joined too late.
After this commit, the tuple counts look like this:
(13s) Tuple counts for DataFlowImplLocal::pathStep#fffff#cur_delta:
5882 ~1% {6} r1 = SCAN DataFlowImplLocal::PathNodeMid#class#ffffff#prev_delta AS I OUTPUT I.<1>, I.<0>, I.<2>, I.<3>, I.<4>, I.<5>
5882 ~3% {7} r2 = JOIN r1 WITH DataFlowUtil::Node::getEnclosingCallable_dispred#ff AS R ON FIRST 1 OUTPUT r1.<2>, R.<1>, r1.<1>, r1.<0>, r1.<3>, r1.<4>, r1.<5>
5882 ~1% {6} r3 = JOIN r2 WITH DataFlowImplCommon::CallContext::relevantFor_dispred#ff AS R ON FIRST 2 OUTPUT r2.<3>, r2.<6>, r2.<2>, r2.<0>, r2.<4>, r2.<5>
105 ~0% {5} r4 = JOIN r3 WITH project#DataFlowImplLocal::LocalFlowBigStep::localFlowBigStep#ffffff_021#join_rhs AS R ON FIRST 2 OUTPUT r3.<2>, r3.<3>, r3.<4>, r3.<5>, R.<2>
5882 ~1% {6} r5 = JOIN r2 WITH DataFlowImplCommon::CallContext::relevantFor_dispred#ff AS R ON FIRST 2 OUTPUT r2.<5>, r2.<2>, r2.<3>, r2.<0>, r2.<4>, r2.<6>
5882 ~0% {6} r6 = JOIN r5 WITH DataFlowImplLocal::TNil#ff_1#join_rhs AS R ON FIRST 1 OUTPUT r5.<2>, false, r5.<5>, r5.<1>, r5.<3>, r5.<4>
0 ~0% {5} r7 = JOIN r6 WITH DataFlowImplLocal::LocalFlowBigStep::localFlowBigStep#ffffff_02413#join_rhs AS R ON FIRST 3 OUTPUT R.<4>, r6.<3>, r6.<4>, r6.<5>, R.<3>
0 ~0% {5} r8 = JOIN r7 WITH DataFlowImplLocal::TNil#ff AS R ON FIRST 1 OUTPUT r7.<1>, r7.<2>, r7.<3>, R.<1>, r7.<4>
105 ~0% {5} r9 = r4 \/ r8
There is a slight slowdown coming from the introduction of a new
predicate `DataFlowImplLocal::pathStep#fffff#join_rhs`, which is used
only in the standard order:
(12s) Tuple counts for DataFlowImplLocal::pathStep#fffff#join_rhs:
282057 ~0% {2} r1 = SCAN DataFlowImplCommon::CallContext::relevantFor_dispred#ff AS I OUTPUT I.<1>, I.<0>
9159890 ~1% {2} r2 = JOIN r1 WITH DataFlowUtil::Node::getEnclosingCallable_dispred#ff_10#join_rhs AS R ON FIRST 1 OUTPUT R.<1>, r1.<1>
return r2
The evaluation of `unique` is cheap but not free:
DataFlowUtil::Node::getEnclosingCallable_dispred#ff .............. 3.9s
DataFlowUtil::Node::getEnclosingCallable_dispred#ff_10#join_rhs .. 3.5s
The first of these two predicates evaluates `unique`, and the second
simply reorders columns. They take about the same time, which suggests
that `unique` is about as fast as it can be, given the number of tuples
it needs to push around. Note that the column reordering predicate is
only needed because of the standard order.
This revert brings back the performance problems in
`DataFlowImplLocal.qll` so they can be fixed in a different way. The fix
in #2872 was asymptotically good but had undesired overhead because it
introduced another predicate in the SCC that existed purely for join
ordering.
I did the revert by inlining the helper predicate, eliminating the
`enclosing` variable, and re-ordering the resulting lines to what they
were before #2872.
This PR adds better support for differentiating complex and imaginary floating-point types from real floating-point types, in both the AST and in the IR type system.
*AST Changes*
- Introduces the new class `TypeDomain`, which can be either `RealDomain`, `ImaginaryDomain` or `ComplexDomain`. "type domain" is the term used for this concept in the C standard, and I couldn't think of a better one.
- Introduces `FloatingPointType.getDomain()`, to get the type domain of the type.
- Introduces `FloatingPointType.getBase()`, to get the numeric base of the type (either 2 or 10).
- Introduces three new subtypes of `FloatingPointType`: `RealNumberType`, `ComplexNumberType`, and `ImaginaryNumberType`, which differentiate between the types based on their type domain. Note that the decimal types (e.g., `_Decimal32`) are included in `RealNumberType`.
- Introduces two new subtypes of `FloatingPointType`: `BinaryFloatingPointType` and `DecimalFloatingPointType`, which differentiate between the types based on their numeric base, independent of type domain.
*IR Changes*
- `IRFloatingPointType` now has two additional parameters: the base and the type domain.
- New test that ensures that C++ types get mapped to the correct IR types.
- New IR test that verifies the IR for some basic usage of complex FP types.
With the new `unique` aggregate added to QL, we can express directly
what the "min = max" pattern emulates.
Replacing "min and max" with `unique` might in general lead to fewer
results, but that happens only in cases where the aggregate expression
has multiple values. For the three predicates changed in this commit,
that should only happen on malformed databases.
This code of conduct outlines expectations for participation in the Semmle open source community, including any open source repositories on GitHub.com, as well as steps for reporting unacceptable behavior. We are committed to providing a welcoming and inspiring community for all.
We as members, contributors, and leaders pledge to make participation in our
community a harassment-free experience for everyone, regardless of age, body
size, visible or invisible disability, ethnicity, sex characteristics, gender
identity and expression, level of experience, education, socio-economic status,
nationality, personal appearance, race, religion, or sexual identity
and orientation.
People violating this code of conduct may be banned from the community.
We pledge to act and interact in ways that contribute to an open, welcoming,
diverse, inclusive, and healthy community.
Our community strives to:
* Be friendly and patient: Remember you might not be communicating in someone else’s primary spoken or programming language, and others may not have your level of understanding.
* Be welcoming: Our community welcomes and supports people of all backgrounds and identities. This includes, but is not limited to members of any race, ethnicity, culture, national origin, color, immigration status, social and economic class, educational level, sex, sexual orientation, gender identity and expression, age, size, family status, political belief, religion, and mental and physical ability.
* Be respectful: We are a world-wide community of professionals, and we conduct ourselves professionally. Disagreement is no excuse for poor behavior and poor manners. Disrespectful and unacceptable behavior includes, but is not limited to:
* Violent threats or language.
* Discriminatory or derogatory jokes and language.
* Posting sexually explicit or violent material.
* Posting, or threatening to post, people’s personally identifying information (“doxing”).
* Insults, especially those using discriminatory terms or slurs.
* Behavior that could be perceived as sexual attention.
* Advocating for or encouraging any of the above behaviors.
* Understand disagreements: Disagreements, both social and technical, are useful learning opportunities. Seek to understand others’ viewpoints and resolve differences constructively.
## Our Standards
This code is not exhaustive or complete. It serves to capture our common understanding of a productive, collaborative environment. We expect the code to be followed in spirit as much as in the letter.
Examples of behavior that contributes to a positive environment for our
community include:
# Scope
* Demonstrating empathy and kindness toward other people
* Being respectful of differing opinions, viewpoints, and experiences
* Giving and gracefully accepting constructive feedback
* Accepting responsibility and apologizing to those affected by our mistakes,
and learning from the experience
* Focusing on what is best not just for us as individuals, but for the
overall community
This code of conduct applies to all repositories and communities for Semmle open source projects, regardless of whether or not the repository explicitly calls out its use of this code. The code also applies in public spaces when an individual is representing the Semmle open source community. Examples include using an official project email address, posting via an official social media account, or acting as an appointed representative at an online or offline event.
Examples of unacceptable behavior include:
* The use of sexualized language or imagery, and sexual attention or
advances of any kind
* Trolling, insulting or derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or email
address, without their explicit permission
* Other conduct which could reasonably be considered inappropriate in a
professional setting
# Reporting Code of Conduct Issues
We encourage members of the community to resolve issues on their own whenever possible. This builds a broader and deeper understanding and ultimately a healthier interaction. In the event that an issue cannot be resolved locally, please feel free to report your concerns by contacting code-of-conduct@semmle.com.
In your report please include:
* Your contact information.
* Names (real, usernames or pseudonyms) of any individuals involved. If there are additional witnesses, please include them as well.
* Your account of what occurred, and if you believe the incident is ongoing. If there is a publicly available record (e.g. a mailing list archive or a public chat log), please include a link or attachment.
* Any additional information that may be helpful.
## Enforcement Responsibilities
All reports will be reviewed by a multi-person team and will result in a response that is deemed necessary and appropriate to the circumstances. Where additional perspectives are needed, the team may seek insight from others with relevant expertise or experience. The confidentiality of the person reporting the incident will be kept at all times. Involved parties are never part of the review team.
Community leaders are responsible for clarifying and enforcing our standards of
acceptable behavior and will take appropriate and fair corrective action in
response to any behavior that they deem inappropriate, threatening, offensive,
or harmful.
Anyone asked to stop unacceptable behavior is expected to comply immediately. If an individual engages in unacceptable behavior, the review team may take any action they deem appropriate, including a permanent ban from the community.
Community leaders have the right and responsibility to remove, edit, or reject
comments, commits, code, wiki edits, issues, and other contributions that are
not aligned to this Code of Conduct, and will communicate reasons for moderation
decisions when appropriate.
*This text is licensed under the [CC-BY-4.0](https://creativecommons.org/licenses/by/4.0/) license. It is based on a template established by the [TODO Group](http://todogroup.org/) and variants thereof used by numerous other large communities (e.g., [Microsoft](https://microsoft.github.io/codeofconduct/), [Facebook](https://code.fb.com/codeofconduct/), [Yahoo](https://yahoo.github.io/codeofconduct), [Twitter](https://github.com/twitter/code-of-conduct), [GitHub](https://blog.github.com/2015-07-20-adopting-the-open-code-of-conduct/)) and the Scope section from the [Contributor Covenant version 1.4](http://contributor-covenant.org/version/1/4/).*
## Scope
This Code of Conduct applies within all community spaces, and also applies when
an individual is officially representing the community in public spaces.
Examples of representing our community include using an official e-mail address,
posting via an official social media account, or acting as an appointed
representative at an online or offline event.
## Enforcement
Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported to the community leaders responsible for enforcement at
opensource@github.com.
All complaints will be reviewed and investigated promptly and fairly.
All community leaders are obligated to respect the privacy and security of the
reporter of any incident.
## Enforcement Guidelines
Community leaders will follow these Community Impact Guidelines in determining
the consequences for any action they deem in violation of this Code of Conduct:
### 1. Correction
**Community Impact**: Use of inappropriate language or other behavior deemed
unprofessional or unwelcome in the community.
**Consequence**: A private, written warning from community leaders, providing
clarity around the nature of the violation and an explanation of why the
behavior was inappropriate. A public apology may be requested.
### 2. Warning
**Community Impact**: A violation through a single incident or series
of actions.
**Consequence**: A warning with consequences for continued behavior. No
interaction with the people involved, including unsolicited interaction with
those enforcing the Code of Conduct, for a specified period of time. This
includes avoiding interactions in community spaces as well as external channels
like social media. Violating these terms may lead to a temporary or
permanent ban.
### 3. Temporary Ban
**Community Impact**: A serious violation of community standards, including
sustained inappropriate behavior.
**Consequence**: A temporary ban from any sort of interaction or public
communication with the community for a specified period of time. No public or
private interaction with the people involved, including unsolicited interaction
with those enforcing the Code of Conduct, is allowed during this period.
Violating these terms may lead to a permanent ban.
### 4. Permanent Ban
**Community Impact**: Demonstrating a pattern of violation of community
standards, including sustained inappropriate behavior, harassment of an
individual, or aggression toward or disparagement of classes of individuals.
**Consequence**: A permanent ban from any sort of public interaction within
the community.
## Attribution
This Code of Conduct is adapted from the [Contributor Covenant][homepage],
@@ -53,14 +53,6 @@ After the experimental query is merged, we welcome pull requests to improve it.
## Using your personal data
If you contribute to this project, we will record your name and email
address (as provided by you with your contributions) as part of the code
repositories, which are public. We might also use this information
to contact you in relation to your contributions, as well as in the
normal course of software development. We also store records of your
CLA agreements. Under GDPR legislation, we do this
on the basis of our legitimate interest in creating the CodeQL product.
Please do get in touch (privacy@github.com) if you have any questions about
this or our data protection policies.
If you contribute to this project, we will record your name and email address (as provided by you with your contributions) as part of the code repositories, which are public. We might also use this information to contact you in relation to your contributions, as well as in the normal course of software development. We also store records of CLA agreements signed in the past, but no longer require contributors to sign a CLA. Under GDPR legislation, we do this on the basis of our legitimate interest in creating the CodeQL product.
Please do get in touch (privacy@github.com) if you have any questions about this or our data protection policies.
This open source repository contains the standard CodeQL libraries and queries that power [LGTM](https://lgtm.com) and the other CodeQL products that [GitHub](https://github.com) makes available to its customers worldwide.
This open source repository contains the standard CodeQL libraries and queries that power [LGTM](https://lgtm.com) and the other CodeQL products that [GitHub](https://github.com) makes available to its customers worldwide. For the queries, libraries, and extractor that power Go analysis, visit the [CodeQL for Go repository](https://github.com/github/codeql-go).
## How do I learn CodeQL and run queries?
@@ -13,4 +13,16 @@ We welcome contributions to our standard library and standard checks. Do you hav
## License
The code in this repository is licensed under [Apache License 2.0](LICENSE) by [GitHub](https://github.com).
The code in this repository is licensed under the [MIT License](LICENSE) by [GitHub](https://github.com).
## Visual Studio Code integration
If you use Visual Studio Code to work in this repository, there are a few integration features to make development easier.
### CodeQL for Visual Studio Code
You can install the [CodeQL for Visual Studio Code](https://marketplace.visualstudio.com/items?itemName=GitHub.vscode-codeql) extension to get syntax highlighting, IntelliSense, and code navigation for the QL language, as well as unit test support for testing CodeQL libraries and queries.
### Tasks
The `.vscode/tasks.json` file defines custom tasks specific to working in this repository. To invoke one of these tasks, select the `Terminal | Run Task...` menu option, and then select the desired task from the dropdown. You can also invoke the `Tasks: Run Task` command from the command palette.
* The library `VCS.qll` and all queries that imported it have been removed.
* The data-flow library has been improved, which affects most security queries by potentially
adding more results. Flow through functions now takes nested field reads/writes into account.
For example, the library is able to track flow from `taint()` to `sink()` via the method
`getf2f1()` in
```c
struct C {
int f1;
};
struct C2
{
C f2;
int getf2f1() {
return f2.f1; // Nested field read
}
void m() {
f2.f1 = taint();
sink(getf2f1()); // NEW: taint() reaches here
}
};
```
* The security pack taint tracking library (`semmle.code.cpp.security.TaintTracking`) now considers that equality checks may block the flow of taint. This results in fewer false positive results from queries that use this library.
* The length of a tainted string (such as the return value of a call to `strlen` or `strftime` with tainted parameters) is no longer itself considered tainted by the `models` library. This leads to fewer false positive results in queries that use any of our taint libraries.
| Cross-site scripting through DOM (`js/xss-through-dom`) | security, external/cwe/cwe-079, external/cwe/cwe-116 | Highlights potential XSS vulnerabilities where existing text from the DOM is used as HTML. Results are not shown on LGTM by default. |
| Incomplete HTML attribute sanitization (`js/incomplete-html-attribute-sanitization`) | security, external/cwe/cwe-20, external/cwe/cwe-079, external/cwe/cwe-116 | Highlights potential XSS vulnerabilities due to incomplete sanitization of HTML meta-characters. Results are shown on LGTM by default. |
| Unsafe expansion of self-closing HTML tag (`js/unsafe-html-expansion`) | security, external/cwe/cwe-079, external/cwe/cwe-116 | Highlights potential XSS vulnerabilities caused by unsafe expansion of self-closing HTML tags. |
| Unsafe shell command constructed from library input (`js/shell-command-constructed-from-input`) | correctness, security, external/cwe/cwe-078, external/cwe/cwe-088 | Highlights potential command injections due to a shell command being constructed from library inputs. Results are shown on LGTM by default. |
| Client-side cross-site scripting (`js/xss`) | Fewer results | This query now recognizes additional safe patterns of constructing HTML. |
| Client-side URL redirect (`js/client-side-unvalidated-url-redirection`) | Fewer results | This query now recognizes additional safe patterns of doing URL redirects. |
| Code injection (`js/code-injection`) | More results | More potential vulnerabilities involving NoSQL code operators are now recognized. |
| Expression has no effect (`js/useless-expression`) | Fewer results | This query no longer flags an expression when that expression is the only content of the containing file. |
| Hard-coded credentials (`js/hardcoded-credentials`) | More results | This query now recognizes hard-coded credentials sent via HTTP authorization headers. |
| Incomplete URL scheme check (`js/incomplete-url-scheme-check`) | More results | This query now recognizes additional url scheme checks. |
| Misspelled variable name (`js/misspelled-variable-name`) | Message changed | The message for this query now correctly identifies the misspelled variable in additional cases. |
| Non-linear pattern (`js/non-linear-pattern`) | Fewer duplicates and message changed | This query now generates fewer duplicate alerts and has a clearer explanation in case of type annotations used in a pattern. |
| Prototype pollution in utility function (`js/prototype-pollution-utility`) | More results | This query now recognizes additional utility functions as vulnerable to prototype polution. |
| Uncontrolled command line (`js/command-line-injection`) | More results | This query now recognizes additional command execution calls. |
| Uncontrolled data used in path expression (`js/path-injection`) | More results | This query now recognizes additional file system calls. |
| Unknown directive (`js/unknown-directive`) | Fewer results | This query no longer flags directives generated by the Babel compiler. |
| Unused property (`js/unused-property`) | Fewer results | This query no longer flags properties of objects that are operands of `yield` expressions. |
| Zip Slip (`js/zipslip`) | More results | This query now recognizes additional vulnerabilities. |
The following low-precision queries are no longer run by default on LGTM (their results already were not displayed):
-`js/angular/dead-event-listener`
-`js/angular/unused-dependency`
-`js/bitwise-sign-check`
-`js/comparison-of-identical-expressions`
-`js/conflicting-html-attribute`
-`js/ignored-setter-parameter`
-`js/jsdoc/malformed-param-tag`
-`js/jsdoc/missing-parameter`
-`js/jsdoc/unknown-parameter`
-`js/json-in-javascript-file`
-`js/misspelled-identifier`
-`js/nested-loops-with-same-variable`
-`js/node/cyclic-import`
-`js/node/unused-npm-dependency`
-`js/omitted-array-element`
-`js/return-outside-function`
-`js/single-run-loop`
-`js/too-many-parameters`
-`js/unused-property`
-`js/useless-assignment-to-global`
## Changes to libraries
* A library `semmle.javascript.explore.CallGraph` has been added to help write queries for exploring the call graph.
* Added data flow for `Map` and `Set`, and added matching type-tracking steps that can accessed using the `CollectionsTypeTracking` module.
* The data-flow node representing a parameter or destructuring pattern is now always the `ValueNode` corresponding to that AST node. This has a few consequences:
-`Parameter.flow()` now gets the correct data flow node for a parameter. Previously this had a result, but the node was disconnected from the data flow graph.
-`ParameterNode.asExpr()` and `.getAstNode()` now gets the parameter's AST node, whereas previously it had no result.
-`Expr.flow()` now has a more meaningful result for destructuring patterns. Previously this node was disconnected from the data flow graph. Now it represents the values being destructured by the pattern.
* The global data-flow and taint-tracking libraries now model indirect parameter accesses through the `arguments` object in some cases, which may lead to additional results from some of the security queries, particularly "Prototype pollution in utility function".
* Importing `semmle.python.web.HttpRequest` will no longer import `UntrustedStringKind` transitively. `UntrustedStringKind` is the most commonly used non-abstract subclass of `ExternalStringKind`. If not imported (by one mean or another), taint-tracking queries that concern `ExternalStringKind` will not produce any results. Please ensure such queries contain an explicit import (`import semmle.python.security.strings.Untrusted`).
* Provides a library for working with local (intra-procedural) control-flow
* reachability involving stack variables.
*/
import cpp
/**
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.