The virtual-dispatch code for globals was missing any relationship
between the union field access and the global variable, which meant it
propagated function-pointer flow between any two fields of a global
struct. This resulted in false positives from
`cpp/tainted-format-string` on projects using SDL, such as
WohlSoft/PGE-Project.
In addition to fixing that bug, this commit also brings the code up to
date with the new style of modeling flow through global variables:
`DataFlow::Node.asVariable()`.
This test demonstrates that IR data flow conflates unrelated fields of a
global struct-typed variable and that this bug is not present in the old
AST-based implementation of `semmle.code.cpp.security.TaintTracking`.
This cleans up the test results, which were confusing because functions
like `sink` had multiple locations.
There are some additional results now involving casts to `const char *`
because previously it varied whether `sink` used `const`, and now it
always does.
This `select` clause had become very slow after we started caching
`ElementBase::toString` because the query used string concatenation to
produce alert messages, and those string concatenations were done very
early in the pipeline, producing lots of strings that would be discarded
moments later.
By using `$@` to interpolate elements into strings, the concatenation is
done outside of QL.
Testing on a Chromium snapshot, this commit takes us from
#select#ff ................ 6m2s
to
#select#cpe#134#fff ....... 15.2s
We don't hyphenate "QL-library" and there were a few typos. Feel free to further revise this if I've changed the meaning too much.
As discussed separately, I was unable to raise this as a PR in GitHub.com and had to resort to a direct commit.
(cherry picked from commit e29468135d)
This predicate is effectively a Cartesian product between all enum
types. It's infeasible to compute it in full, so luckily the optimizer
has been able to apply enough magic to make it feasible. That's not a
robust solution, and it has indeed broken on at least one version of the
1.24 release candidate.
On a Chromium snapshot where I ran the LGTM suite overnight, the
`m#MistypedFunctionArguments::arithTypesMatch#bb` predicate (magic for
`arithTypesMatch`) took 170m5s. That was commit b69fdf5 from the
internal repo. I tried to reproduce it in VSCode, this time with commit
646646, but it wasn't quite as bad: the predicate took only 38 seconds.
In any case, making the problematic predicate `pragma[inline]` removes
the slow magic and makes the `MistypedFunctionArguments.ql` query
faster.
- Merged the two notes for `cpp/uncontrolled-allocation-size` into one.
- Added note about renaming of a query id.
- Moved the use of IR in queries from the library section to the queries
section, rephrasing the note in terms of query results/performance
rather than library implementation.
- Grouped, without text changes, the three notes about the `Allocation`
library
- Grouped all the notes about standard-library models, abbreviating them
to eliminate the common text.
- Removed the note about `strlen` (#2647) since that should no longer
affect the results of queries or IR data flow after we started using
unsound IR for data flow.
Flow through partial chi-instruction operands was introduced to make
definition-by-reference work, but its implementation also allowed all
other partial writes to propagate. In particular, tainting a field would
taint the whole struct, which in turn led to taint propagating across
unrelated fields of a struct.
The security test `CWE-134/semmle/argv/argvLocal.c` shows that we also
want to propagate taint from an array element to the whole array, and it
also seems right to propagate taint from a union member to the whole
union.
For some reason I thought that these two queries were special because
they manipulate `SecurityOptions` to change the taint-tracking sources.
It turns out it was just the opposite: the queries used to be special
because they invalidated the cache for the `tainted` predicate, but that
predicate is no longer used, so these queries are no longer special.
This consistency check seems to have value for AST data flow, but I've
disabled it on the IR for now.
This commit also includes two unrelated changes that seem to fix a
semantic merge conflict.
How could the tests fail because of autoformatting, you may ask?
The answer is deprecation warnings. These specify the location of the deprecated
entity, and due to autoformatting these moved around.
We currently use a script to keep certain duplicate QL files in sync across the repo. For historical reasons, this script has lived in the private repo alongside the rest of CodeQL, even though it's only used for files in the public `ql` repo. This PR moves the script into the public `ql` repo. It is still invoked by Jenkins scripts that live in the private repo during CI, but it can also be invoked directly without having a checkout of the private repo. This is useful for anyone who is modifying the dataflow or IR libraries with only a QL checkout.
When running ql/python/ql/src/Security/CWE-079/ReflectedXss.ql against the
database for flask.
Iitially there were 10 million result-tuples for iterable_unpacking_descent.
With this change, we're down to roughly 2100,
I had included `InitializeNonLocal` in the recursion because it made
everything look better in the presence of a bug that's since been fixed.
Taking it out means the sanity test is again aligned with the old
`isChiForAllAliasedMemory`.
`Instruction.getDefinitionOverlap()` depends on `SSAConstruction::getMemoryOperandDefinition()`, which in turn depends on `SSAConstruction::hasMemoryOperandDefinition()`. When the definition in question came from a `Chi` instruction, `hasMemoryOperandDefinition()` incorrectly bound `overlap` to the overlap relationship between the original (non-`Chi`) instruction and the use. The fix is to make use of the `actualDefLocation` parameter to `getDefinitionOrChiInstruction()`, which specifies the location for the result of the `Chi` in that case.
The result of `getDefinitionOverlap()` should never be `MayPartiallyOverlap`, because if that were the case, we should have inserted as `Chi` instruction and hooked the definition up to that instead.
There are quite a few existing failures.
This predicate replaces `isChiForAllAliasedMemory`, which was always
intended to be temporary. A test is added to `IRSanity.qll` to verify
that the new predicate corresponds exactly with (a fixed version of) the
old one.
The implementation of the new predicate,
`Cached::hasConflatedMemoryResult` in `SSAConstruction.qll`, is faster
to compute than the old `isChiForAllAliasedMemory` because it uses
information that's readily available during SSA construction.
In the Unix ABI, `std::va_list` is defined as `typedef struct __va_list_tag { ... } va_list[1];`, which means that any `std::va_list` used as a function parameter decays to `struct __va_list_tag*`. Handling this actually made the QL code slightly cleaner. The only tricky bit is that we have to determine what type to use as the actual `va_list` type when loading, storing, or modifying a `std::va_list`. To do this, we look at the type of the argument to the `va_*` macro. A detailed QLDoc comment explains the details.
I added a test case for passing a `va_list` as an argument, and then manipulating that `va_list` in the callee.
This PR changes the IR we generate for functions that accept a variable argument list. Rather than simply using `BuiltInOperationInstruction` to model the various `va_*` macros as mysterious function-like operations, we now model them in more detail. The intent is to enable better alias analysis and taint flow through varargs.
The `va_start` macro now generates a unary `VarArgsStart` instruction that takes the address of the ellipsis pseudo-parameter as its operand, and returns a value of type `std::va_list`. This value is then stored into the actual `std::va_list` variable via a regular `Store`.
The `va_arg` macro now loads the `std::va_list` argument, then emits a `VarArg` instruction on the result. This returns the address of the vararg argument to be loaded. That address is later used as the address operand of a regular `Load` to return the value of the argument. To model the side effect of moving to the next argument, we emit a `NextVarArg` instruction that takes the previous `std::va_list` value and returns an updated one, which is then stored back into the `std::va_list` variable.
The `va_end` macro just emits a `VarArgsEnd` unary instruction that takes the address of the `std::va_list` argument and does nothing, since `va_end` doesn't really do anything on most compiler implementations anyway.
The `va_copy` macro is just modeled as a plain copy.
Since `runtimeExprInStaticInitializer` only looks at expressions at the
top level of an initializer or directly below some number of top-level
aggregate literals, there is no need for `inStaticInitializer` to
include expressions strictly below those in the AST.
I tested this on Wireshark, which has very large static initializers,
but found no measureable difference in run time. There are some
differences in tuple counts and iteration counts, though:
- `inStaticInitializer` changes from 6,241,153 rows (86 iterations) to
5,031,617 rows (7 iterations).
- `runtimeExprInStaticInitializer` changes from 386,350 rows to 4,705
rows.
- `hasDynamicInitialization` has 410 rows both before and after, which
suggests that this change does not affect results.
Even though there is no impact on this snapshot at this time, things
might look different if/when the restriction on aggregate literals to
100 children is removed in the extractor.
This change introduces a new synthesized `IRVariable` in every varargs function. This variable represents the entire set of arguments passed to the ellipsis by the caller. We give it an opaque type big enough hold all of the arguments passed by the largest vararg call in the database. It is treated just like any other parameter. It is initialized the same, it has indirect buffers, etc.
I had to introduce a couple new APIs to `Call` and `Function`. The QLDoc comments should explain these. I added tests for these new APIs as well.
The next step will be to change the IR generation for the `va_*` macros to manipulate the ellipsis parameter.
The spaceship (<=>) operator adds a new row to the C++ precendence
table. In preparation for that shift the necessary precedences up one
to create a suitable hole.
Note: In investigations I belive precedence 14 was not used. However,
in order to make review easier I have kept that gap.
- Introduce `ReadTaintNode` and `TaintStoreNode` to simplify logic for taint
getters and taint setters, respectively.
- `nodeCandFwd2`: Restrict `stored` column after a read, based on what it might
be before a store of the same field.
- `nodeCand2`: Restrict `read` column (renamed from `stored`) after a store, based
on what it might be after a read of the same field.
- Move big step predicates into a `LocalFlowBigStep` module.
- Define predicates by dispatch in `AccessPath[Front]` class.
- `flowCandFwd0`: Restrict `apf` column after a read, as it should be able to match
a Boolean `read` column from `nodeCand2`.
- `flowFwd0`: Restrict columns `ap` and `apf` after a read, by introducing a
`flowConsCandFwd` predicate (similar to what is done in the previous pruning steps).
- `flowFwd0`: Restrict columns `ap` and `apf` after a store, by introducing a
`flowConsCand` predicate (similar to what is done in the previous pruning steps).
Added a new `StaticLocalVariable` class, which made several other pieces of the original change a bit cleaner.
Fixed test failures due to a mistake in the original `CFG.qll` change.
Added a test case for static local variables with constructors.
Removed the `Uninitialized` instruction from the initialization of a static local, because all objects with static storage duration are zero-initialized at startup.
Fixed expectations for `SignAnalysis.ql` to reflect that a bad result is now fixed.
Previously, the IR for the initialization of a static local variable ran the initialization unconditionally, every time the declaration was reached during execution. This means that we don't model the possibility that an access to the static variable fetches a value that was set on a previous execution of the function.
I've added some simple modelling of the correct behavior to the IR. For each static local variable that has a dynamic initializer, we synthesize a (static) `bool` variable to hold whether the initializer for the original variable has executed. When executing a declaration, we check the value of the synthesized variable, and skip the initialization code if it is `true`. If it is `false`, we execute the initialization code as before, and then set the flag to `true`. This doesn't capture the thread-safe nature of static initialization, but I think it's more than enough to handle anything we're likely to care about for the foreseeable future.
In `TranslatedDeclarationEntry.qll`, I split the translation of a static local variable declaration into two `TranslatedElement`s: one for the declaration itself, and one for the initialization. The declaration part handles the checking and setting of the flag; the initialization just does the initialization as before.
I've added an IR test case that has static variables with constant, zero, and dynamic initialization. I've also verified the new IR generated for @jbj's previous test cases for constant initialization.
I inverted the sense of the `hasConstantInitialization()` predicate to be `hasDynamicInitialization()`. Mostly this just made more sense to me, but I think it also fixed a potential bug where `hasConstantInitialization()` would not hold for a zero-initialized variable. Technically, constant initialization isn't the same as zero initialization, but I believe that most code really cares about the distinction between dynamic initialization and static initialization, where static initialization includes both constant and zero initialization.
I've fixed up the C# side of IR generation to continue working, but it doesn't use any of the dynamic initialization stuff. In theory, it could use something similar to model the initialization of static fields.
For syntax errors, we simply report the major version.
For unused imports, we were getting a result for `typing.py` when run under
Python 3.7.3. To prevent this import from being considered, I've set the maximum
import depth to `0`.
The only Semmle references now left in the public Markdown files are in
URLs and in legal text. There are also two Semmle references left in
`docs/language/vale-styles/README.md` because I didn't understand them
well enough to change them.
Since this code is shared between the AST CFG and the IR construction,
it seems right to have only one copy. That copy lives on a new class
`StaticStorageDurationVariable`, which may prove useful on its own.
Obviously the result module shouldn't be a package 🤦 I was confusing
myself, since I wanted to say that `Module::named("Crypto.Cipher")` should be a package :D
The problem with the deleted code is that it would add flow to what might be an
unrelated ControlFlowNode, which is illustrated in the query below (that gives
results on flask)
from ControlFlowNode arg, CallNode call, CallNode other_call
where
call.getNode().getAKeyword().getValue() = arg.getNode() and
not call.getAnArg() = arg and
other_call.getAnArg() = arg and
not other_call = call
select call, arg, other_call
The problem here was that in the base case, there was no relationship between
`left_parent` and `right_parent`. These could be any two tuples or lists, even
if they were not part of an assignment statement.
To fix this, we add a bit of manual "magic", requiring that both of these
arguments must belong to the left and right-hand sides of the same assignment
statement.
(Note that this is in principle _still_ a gross overapproximation, but since
assignment statements are usually quite restricted in size, I don't expect this
to be a major problem.)
Having that module in `Instruction.qll` slowed down the parsing of that
file both humans and the compiler.
This commit moves the `InstructionSanity` module to `IRSanity.qll`
without making any changes to its contents apart from adding some
imports.
On Wireshark with 6GB RAM, I've observed `definitionReachesRank` to be
the slowest predicate in the IR. It seems that the implementation was
slow because the optimizer failed to eliminate the common
`reachesRank - 1` subexpression. This led to context being pushed into
the `not`, which got implemented as `MATERIALIZE`. That wouldn't
normally be a disaster, but this is one of the largest predicates in the
IR SSA construction, and iteration 2 was very slow.
Before:
(1505s) Starting to evaluate predicate SSAConstruction::DefUse::definitionReachesRank#ffff#cur_delta/4[1]@93f592 (iteration 1)
(1535s) Tuple counts for SSAConstruction::DefUse::definitionReachesRank#ffff#cur_delta:
130670697 ~0% {4} r1 = SCAN project#SSAConstruction::DefUse::hasDefinitionAtRank#fffff AS I OUTPUT I.<0>, I.<1>, I.<2>, (I.<2> + 1)
130670697 ~6% {5} r2 = JOIN r1 WITH SSAConstruction::DefUse::exitRank#fff AS R ON FIRST 2 OUTPUT r1.<0>, r1.<1>, r1.<2>, r1.<3>, R.<2>
130670697 ~6% {5} r3 = SELECT r2 ON r2.<3> <= r2.<4>
130670697 ~0% {4} r4 = SCAN r3 OUTPUT r3.<0>, r3.<1>, r3.<2>, r3.<3>
return r4
(1535s) - SSAConstruction::DefUse::definitionReachesRank#ffff_delta has 130670697 rows (order for disjuncts: delta=<standard>).
(1535s) Starting to evaluate predicate SSAConstruction::DefUse::definitionReachesRank#ffff#cur_delta/4[2]@866c14 (iteration 2)
(1626s) Tuple counts for SSAConstruction::DefUse::definitionReachesRank#ffff#cur_delta:
261341394 ~107% {4} r1 = JOIN SSAConstruction::DefUse::definitionReachesRank#ffff#prev_delta AS L WITH SSAConstruction::DefUse::definitionReachesRank#ffff#join_rhs AS R ON FIRST 3 OUTPUT R.<0>, R.<1>, R.<2>, (1 + L.<3>)
261341394 ~107% {4} r2 = r1 AND NOT SSAConstruction::DefUse::definitionReachesRank#ffff#prev AS R(r1.<0>, r1.<1>, r1.<2>, r1.<3>)
130670697 ~0% {5} r3 = SCAN r2 OUTPUT r2.<0>, r2.<1>, (r2.<3> - 1), r2.<2>, r2.<3>
106034590 ~1% {4} r4 = JOIN r3 WITH project#SSAConstruction::DefUse::hasDefinitionAtRank#fffff AS R ON FIRST 3 OUTPUT r3.<0>, r3.<1>, r3.<3>, r3.<4>
106034590 {4} r5 = MATERIALIZE r4 AS antijoin_rhs
24636107 ~3% {4} r6 = r2 AND NOT r5(r2.<0>, r2.<1>, r2.<2>, r2.<3>)
24636107 ~0% {5} r7 = JOIN r6 WITH SSAConstruction::DefUse::exitRank#fff AS R ON FIRST 2 OUTPUT r6.<0>, r6.<1>, r6.<2>, r6.<3>, R.<2>
2749441 ~0% {5} r8 = SELECT r7 ON r7.<3> <= r7.<4>
2749441 ~4% {4} r9 = SCAN r8 OUTPUT r8.<0>, r8.<1>, r8.<2>, r8.<3>
return r9
(1626s) - SSAConstruction::DefUse::definitionReachesRank#ffff_delta has 2749441 rows (order for disjuncts: delta=<standard>).
After:
(12s) Tuple counts for SSAConstruction::DefUse::definitionReachesRank#ffff#cur_delta:
130670697 ~0% {4} r1 = SCAN project#SSAConstruction::DefUse::hasDefinitionAtRank#fffff AS I OUTPUT I.<0>, I.<1>, I.<2>, (I.<2> + 1)
return r1
(12s) - SSAConstruction::DefUse::definitionReachesRank#ffff_delta has 130670697 rows (order for disjuncts: delta=<standard>).
(12s) Starting to evaluate predicate SSAConstruction::DefUse::definitionReachesRank#ffff#cur_delta/4[2]@fff64c (iteration 2)
(34s) Tuple counts for SSAConstruction::DefUse::definitionReachesRank#ffff#cur_delta:
108784031 ~0% {4} r1 = SSAConstruction::DefUse::definitionReachesRank#ffff#prev_delta AS L AND NOT SSAConstruction::DefUse::exitRank#fff AS R(L.<0>, L.<1>, L.<3>)
2749441 ~5% {4} r2 = r1 AND NOT project#SSAConstruction::DefUse::hasDefinitionAtRank#fffff AS R(r1.<0>, r1.<1>, r1.<3>)
2749441 ~4% {4} r3 = SCAN r2 OUTPUT r2.<0>, r2.<1>, r2.<2>, (r2.<3> + 1)
2749441 ~4% {4} r4 = r3 AND NOT SSAConstruction::DefUse::definitionReachesRank#ffff#prev AS R(r3.<0>, r3.<1>, r3.<2>, r3.<3>)
return r4
(34s) - SSAConstruction::DefUse::definitionReachesRank#ffff_delta has 2749441 rows (order for disjuncts: delta=<standard>).
Note that the row counts are exactly the same before and after.
The charpred of class `GVN` in `ASTValueNumbering.qll` got inlined into
the member predicate `getAnInstruction` and caused a tuple explosion on
Wireshark in the query `StrncpyFlippedArgs.ql`.
I interrupted the predicate after 10 minutes and got these intermediate
tuple counts:
(5208s) Tuple counts for ASTValueNumbering::GVN::getAnInstruction_dispred#ff:
8754900909 ~5% {3} r1 = JOIN ValueNumberingInternal::tvalueNumber#ff_10#join_rhs AS L WITH ValueNumberingInternal::tvalueNumber#ff_10#join_rhs AS R ON FIRST 1 OUTPUT R.<1>, L.<1>, L.<0>
4390274632 ~150085% {2} r2 = JOIN r1 WITH project#SSAConstruction::Cached::getInstructionUnconvertedResultExpression AS R ON FIRST 1 OUTPUT r1.<2>, r1.<1>
return r2
After this change, the `getAnInstruction` predicate is itself inlined,
like it should be. The new non-inlined charpred takes 2.1s and has these
tuple counts:
(2s) Tuple counts for ASTValueNumbering::GVN#f:
9158442 ~117% {1} r1 = JOIN project#SSAConstruction::Cached::getInstructionUnconvertedResultExpression AS L WITH ValueNumberingInternal::tvalueNumber#ff@staged_ext AS R ON FIRST 1 OUTPUT R.<1>
return r1
The code is now quadratic in the number of statements in a basic block,
whereas before it was quadratic in the number of _control-flow nodes_ in
a basic block.
The `iDominates` relation is directly on control-flow nodes, and its
transitive closure is far too large. It got compiled into a recursion
rather than `fastTC`, and I've observed that recursion to take about an
hour on a medium-size customer snapshot.
The fix is to check for dominance at the basic-block level.
Also rewrites `multi_assignment_points_to` to be a bit more readable.
I'm not entirely sure that we want an unknown instance of `object` rather than
just `UnknownInternal`. The latter gets filtered out in the characteristic
predicate for `Value`, though, so I opted for the slightly more permissive
variant.
The `cpp/missing-case-in-switch` performed badly on some snapshots, to
the extent where it was as slow as the most expensive IR stages
(example: ChakraCore). This commit makes it faster, removing a
`pragma[noopt]` along the way.
The intermediate tuple counts on a customer codebase drop from 84M to
3M, while the content hash of `getAMissingCase` is the same.
Before:
(124s) Tuple counts for Stmt::EnumSwitch::getAMissingCase#ff#antijoin_rhs:
20867789 ~0% {3} r1 = JOIN Stmt::SwitchStmt::getASwitchCase_dispred#ff AS L WITH Stmt::EnumSwitch::getAMissingCase#ff#shared AS R ON FIRST 1 OUTPUT L.<1>, R.<0>, R.<1>
20122830 ~0% {3} r2 = JOIN r1 WITH Stmt::SwitchCase::getExpr_dispred#ff AS R ON FIRST 1 OUTPUT R.<1>, r1.<1>, r1.<2>
20122830 ~0% {3} r3 = JOIN r2 WITH Expr::Expr::getValue_dispred#ff AS R ON FIRST 1 OUTPUT r2.<2>, r2.<1>, R.<1>
83961918 ~0% {4} r4 = JOIN r3 WITH Enum::EnumConstant::getInitializer_dispred#ff AS R ON FIRST 1 OUTPUT R.<1>, r3.<1>, r3.<0>, r3.<2>
83961918 ~0% {4} r5 = JOIN r4 WITH initialisers AS R ON FIRST 1 OUTPUT R.<2>, r4.<3>, r4.<1>, r4.<2>
234348 ~185% {2} r6 = JOIN r5 WITH Expr::Expr::getValue_dispred#ff AS R ON FIRST 2 OUTPUT r5.<2>, r5.<3>
return r6
...
(124s) Tuple counts for Stmt::EnumSwitch::getAMissingCase#ff:
663127 ~4% {2} r1 = Stmt::EnumSwitch::getAMissingCase#ff#shared AS L AND NOT Stmt::EnumSwitch::getAMissingCase#ff#antijoin_rhs AS R(L.<0>, L.<1>)
return r1
(124s) Registering Stmt::EnumSwitch::getAMissingCase#ff + [] with content 2060ff326cvhihcsvoph6k9divuv4
(124s) >>> Wrote relation Stmt::EnumSwitch::getAMissingCase#ff with 663127 rows and 2 columns.
After:
(5s) Tuple counts for Stmt::EnumSwitch::getAMissingCase_dispred#ff#antijoin_rhs:
746029 ~0% {2} r1 = JOIN Stmt::EnumSwitch::getAMissingCase_dispred#ff#shared AS L WITH Enum::Enum::getAnEnumConstant_dispred#ff AS R ON FIRST 1 OUTPUT R.<1>, L.<1>
3116197 ~2% {3} r2 = JOIN r1 WITH Enum::EnumConstant::getInitializer_dispred#ff AS R ON FIRST 1 OUTPUT R.<1>, r1.<1>, r1.<0>
3116197 ~0% {3} r3 = JOIN r2 WITH initialisers AS R ON FIRST 1 OUTPUT R.<2>, r2.<1>, r2.<2>
3116197 ~311% {3} r4 = JOIN r3 WITH Expr::Expr::getValue_dispred#ff AS R ON FIRST 1 OUTPUT r3.<1>, R.<1>, r3.<2>
234348 ~185% {2} r5 = JOIN r4 WITH Stmt::EnumSwitch::matchesValue#ff AS R ON FIRST 2 OUTPUT r4.<0>, r4.<2>
return r5
(5s) Registering Stmt::EnumSwitch::getAMissingCase_dispred#ff#antijoin_rhs + [] with content 173483d71508vl534mvlr1g0ehi12
(5s) >>> Wrote relation Stmt::EnumSwitch::getAMissingCase_dispred#ff#antijoin_rhs with 82902 rows and 2 columns.
(5s) Starting to evaluate predicate Stmt::EnumSwitch::getAMissingCase_dispred#ff/2@ae4c0b
(5s) Tuple counts for Stmt::EnumSwitch::getAMissingCase_dispred#ff:
746029 ~2% {2} r1 = JOIN Stmt::EnumSwitch::getAMissingCase_dispred#ff#shared AS L WITH Enum::Enum::getAnEnumConstant_dispred#ff AS R ON FIRST 1 OUTPUT L.<1>, R.<1>
663127 ~4% {2} r2 = r1 AND NOT Stmt::EnumSwitch::getAMissingCase_dispred#ff#antijoin_rhs AS R(r1.<0>, r1.<1>)
return r2
(5s) Registering Stmt::EnumSwitch::getAMissingCase_dispred#ff + [] with content 2060ff326cvhihcsvoph6k9divuv4
(5s) >>> Wrote relation Stmt::EnumSwitch::getAMissingCase_dispred#ff with 663127 rows and 2 columns.
This turned out to be a fairly simple but easy to make bug. When we want to
figure out the value pointed-to in a multi-assignment, we look at the left hand
side to see what value from the right hand side we should assign. Unfortunately,
we accidentally attempted to look up this information in the _left hand side_ of
the assignment, resulting in no points-to information at all. The only thing
needed to fix this was to properly link up the left and right hand sides: using
the left hand side to figure out what index to look at, and then looking up the
points-to information for the corresponding place in the right hand side.
Move the stdlib tests from test/{2,3}/library-tests/ into /test/library-tests/,
and deal with version by using sys.version_info (results should be the same for
both versions).
six tests were moved from /library-tests/web/client/stdlib => /library-tests/web/client/six
To ease the rollout of this test, currently we only report missing points-to
information for nodes that either
- appear as an argument in a call to a function named `check`, or
- appear inside a scope where the first line is annotated with a comment ending
in "check".
The idea behind the second version is that once we have points-to running at a
level where no node inside a scope that _ought_ to have points-to is missing
this information, we can simply remove all uses of `check(...)` from inside this
scope, and annotate the entire scope with `# check`. Once this has been done for
the entire file, we can then remove all the comments and just require
_everything_ to be checked.
Note that I don't expect all nodes to have the need for points-to information.
For instance, there are nodes representing scope entry and exit, and for these
it doesn't make sense to require that they "point-to" anything. Similarly,
`NameNode` appearing in a "store" (i.e. as the left hand side of an assignment)
do not strictly need to have points-to information, although it might be more
intuitive if they did.
Thus, the `relevant_node` predicate will almost certainly need to be extended to
exclude these kinds of nodes.
Fixes issue https://github.com/github/codeql-c-analysis-team/issues/20
This change undoes the workaround in https://github.com/Semmle/ql/pull/2736, and replaces it with a fix for the underlying cause. The problem was that the IR construction code for side effects incorrectly assumed that `BufferAccessOpcode` included `SizedBufferAccessOpcode`. I think that was actually a perfectly reasonable assumption to make, so I changed the `Opcode` hierarchy to make it true.
Following the setup I invented for library-tests/taint/unpacking.
TestStep is still a bit annoying, since the output is not easy to eyeball; but
for now I guess we can live with it :)
I honestly didn't get the point of DistinctStringKinds.ql, other than showing we
can handle multiple taint kinds
This is a temporary fix!
Added minimal working example (MWE) as a regression, so it's easier to fix the
real problem.
only Python 3 is facing the problem -- and without --max-import-depth=1 the test
times out at 10 minutes :O
Introduced a regression, since the old code was:
```
predicate is_a_string_type(ClassObject seqtype) {
seqtype = theBytesType() and major_version() = 2
or
seqtype = theUnicodeType()
}
```
but *now* we're good!
+ Extend PropertyInternal.getSetter to handle non-decorator
+ Add PropertyInternal.getDeleter
It seems like a bit hacky way to do things, since we're not using the
PropertySetterOrDeleter class at all, but for now I'll leave it be.
e.(StrConst).isDocString() can only hold if e instanceof StrConst, since we have
that condition on the line above, we can safely remove this condition.
* Adds the...Type() predicates as foresight modernizations.
* Removes predicates that are not currently ported/portable
* Adds range types
* Update python/ql/src/semmle/python/objects/ObjectAPI.qll
Co-Authored-By: Rasmus Wriedt Larsen <rasmuswriedtlarsen@gmail.com>
* Update python/ql/src/semmle/python/objects/ObjectAPI.qll
Co-Authored-By: Rasmus Wriedt Larsen <rasmuswriedtlarsen@gmail.com>
* Swaps xType for just x, at least when it's new
Co-authored-by: Rasmus Wriedt Larsen <rasmuswriedtlarsen@gmail.com>
The `isInCycle` predicate would take a long time on Wireshark with 6GB
RAM, sometimes OOMing in the fastTC HOP. Analyzing wireshark with 6GB is
important because that's the standard configuration on our Jenkins
workers. With this commit, I can analyze Wireshark with 6GB on my
laptop.
The `getNonPhiOperandDef` predicate on Wireshark is 34M tuples, while
`getDefIfHasNeighbors` is 11M tuples, and the TC of
`getDefIfHasNeighbors` is 23M tuples (487 MB).
When multiple results are available, we usually name the function
`getAnArgument` or `getASomething`. The support for django copied the way bottle
did things, so this commits cleans up both
Without this fix, running the full LGTM suite would get the IR evaluated
twice. That's because we have multiple IPA types and constructors with
the same name (like `TInstruction` and `MkIRFunction`), and the QL
compiler chooses how to disambiguate those names differently depending
on import order.
I've tested that the IR is only evaluated once now by running the whole
suite on a tiny project (jbj/magicrescue) and looking at the output of
perl -ne 'print if /^RESULTS IN:/ .. /^\[/ and not /^\[/' runSnapshotQueries-debug.log | sort |uniq -c |sort -n |less
Instructions that are removed from the normal value numbering recursion
because they have a duplicated type or AST element get unique value
numbers rather than going unnumbered. This ensures comparisons of value
numbers using `!=` hold for filtered instructions.
This entry in `.codeqlmanifest.json` was intended to allow
unpacking the CodeQL CLI as a subdirectory of `ql`, and things
would Just Work.
However, it is not necessary anymore because recent releases of
the CLI will search their own directory as a fallback
_independently_ of the parent directory.
On the contrary, removing this link will make internal testing
easier because you then run a test build of the CLI with
`--search-path` pointing to the `ql` checkout without inadvertently
making extractors in a _different_ build that is unpacked there visible.
On some snapshots, notably ffmpeg, the IR `ValueNumbering` recursion
would generate billions of tuples and eventually run out of space.
It turns out it was fairly common for an `Instruction` to get more than
one `ValueNumber` in the base cases for `VariableAddressInstruction` and
`InitializeParameterInstruction`, and it could also happen in an
instruction with more than one operand of the same `OperandTag`. When a
binary operation was applied to an instruction with `m` value numbers
and another instruction with `n` value numbers, the result would get
`m * n` value numbers. This led to doubly-exponential growth in the
number of value numbers in rare cases.
The underlying reason why a `VariableAddressInstruction` could get
multiple value numbers is that it was keyed on the associated
`IRVariable`, and the `IRVariable` is defined in part by the type of its
underlying `Variable` (or other AST element). If the extractor defines a
variable to have multiple types because of linker ambiguity, this leads
to the creation of multiple `IRVariable`s. That should ideally be solved
in `TIRVariable.qll`, but for now I've put a workaround in
`ValueNumberingInternal.qll` instead.
To remove the problem with instructions having multiple operands, the
construction in `Operand.qll` will now filter out any such operand. It
wasn't enough to apply that filter to the `raw` stage, so I've applied
it to all three stages.
Before I added `--max-import-depth=2`, there was a bit of trouble, where it
would alert on `from pkg_ok import foo2` -- since all the `pkg_ok.foo<n>`
modules were missing, I guess the analysis didn't make any assumptions on
whether `foo2` is a module or a regular attribute.
So I've been thinking a bit about import pkg_ok.foo1 after reading the Python
references for imports of submodules
https://docs.python.org/3/reference/import.html#submodules
> When a submodule is loaded using any mechanism (...) a binding is placed in the
parent module’s namespace to the submodule object. For example, if package spam
has a submodule foo, after importing spam.foo, spam will have an attribute foo
which is bound to the submodule.
That does at least explain what is going on here.
I feel that import pkg_ok.foo1 might be a very contrived example. In principle
it should be an alert, since the module pkg_ok ends up with an import of itself,
but my gut feeling is that in practice it's not a very important piece of code
to give alerts for. if we really care about giving these import related alerts,
we could probably add a new query for this pattern, as it's kind of surprising
that it works when you're just an ordinary python programmer.
This predicate got an unfortunate join order, leading to these tuple
counts on ElektraInitiative/libelektra:
(290s) Tuple counts for AliasedSSA::VariableMemoryLocation::coversEntireVariable_dispred#f:
57117 ~0% {3} r1 = SCAN IRType::IRType::getByteSize_dispred#ff AS I OUTPUT 0, (I.<1> * 8), I.<0>
421445272 ~0% {3} r2 = JOIN r1 WITH AliasedSSA::VariableMemoryLocation#fffffff_5601#join_rhs AS R ON FIRST 2 OUTPUT R.<3>, r1.<2>, R.<2>
103282 ~2% {1} r3 = JOIN r2 WITH AliasConfiguration::Allocation::getIRType_dispred#ff AS R ON FIRST 2 OUTPUT r2.<2>
return r3
With this commit, we get these tuple counts instead:
(0s) Tuple counts for AliasedSSA::VariableMemoryLocation::varIRTypeHasBitRange#bff:
361874 ~0% {3} r1 = SCAN AliasedSSA::VariableMemoryLocation#fffffff AS I OUTPUT I.<1>, 0, I.<0>
361874 ~0% {3} r2 = JOIN r1 WITH AliasConfiguration::Allocation::getIRType_dispred#ff AS R ON FIRST 1 OUTPUT R.<1>, 0, r1.<2>
361874 ~1% {3} r3 = JOIN r2 WITH IRType::IRType::getByteSize_dispred#ff AS R ON FIRST 1 OUTPUT r2.<2>, 0, (R.<1> * 8)
return r3
(0s) Tuple counts for AliasedSSA::VariableMemoryLocation::coversEntireVariable_dispred#f:
103282 ~2% {1} r1 = JOIN AliasedSSA::VariableMemoryLocation#fffffff_056#join_rhs AS L WITH AliasedSSA::VariableMemoryLocation::varIRTypeHasBitRange#bff AS R ON FIRST 3 OUTPUT L.<0>
103282 ~2% {1} r2 = STREAM DEDUP r1
return r2
We had these tuple counts on ElektraInitiative/libelektra (note that the
`modelFlow` predicate got inlined into
`simpleInstructionLocalFlowStep`):
(652s) Tuple counts for DataFlowUtil::simpleInstructionLocalFlowStep#ff:
...
19701 ~1% {4} r27 = JOIN r26 WITH Instruction::SideEffectInstruction::getPrimaryInstruction_dispred#3#ff_10#join_rhs AS R ON FIRST 1 OUTPUT R.<1>, r26.<2>, r26.<1>, r26.<0>
7908 ~0% {3} r28 = JOIN r27 WITH SSAConstruction::Cached::getInstructionIndex#ff@staged_ext AS R ON FIRST 2 OUTPUT r27.<0>, r27.<2>, r27.<3>
4023 ~0% {3} r29 = JOIN r28 WITH Instruction::WriteSideEffectInstruction#class#ff AS R ON FIRST 1 OUTPUT r28.<1>, r28.<2>, r28.<0>
...
1060807009 ~3% {3} r34 = JOIN r33 WITH SSAConstruction::Cached::getInstructionIndex#ff_10#join_rhs AS R ON FIRST 1 OUTPUT R.<1>, r33.<1>, r33.<2>
15670 ~5% {2} r35 = JOIN r34 WITH Instruction::SideEffectInstruction::getPrimaryInstruction_dispred#3#ff AS R ON FIRST 2 OUTPUT r34.<0>, r34.<2>
7973 ~0% {2} r36 = JOIN r35 WITH Instruction::ReadSideEffectInstruction::getSideEffectOperand_dispred#ff AS R ON FIRST 1 OUTPUT R.<1>, r35.<1>
...
In this predicate there are two cases (`WriteSideEffectInstruction` and
`ReadSideEffectInstruction`) where we need to join on both the call and
the argument index of a side effect. It works well enough for the first
case, `WriteSideEffectInstruction`, where the call is joined on before
the index, but it explodes in the second case,
`ReadSideEffectInstruction`, where the index is joined first. To fix the
second case, and to guard against future optimizer accidents in the
first case, this commit changes both of those cases to use a new helper
predicate that makes it possible to join on both columns at once. The
resulting tuple counts are:
(3s) Tuple counts for DataFlowUtil::simpleInstructionLocalFlowStep#ff:
...
7908 ~0% {3} r27 = JOIN r26 WITH DataFlowUtil::getSideEffectFor#fff AS R ON FIRST 2 OUTPUT R.<2>, r26.<2>, r26.<0>
4023 ~0% {3} r28 = JOIN r27 WITH Instruction::WriteSideEffectInstruction#class#ff AS R ON FIRST 1 OUTPUT r27.<1>, r27.<2>, r27.<0>
...
15670 ~5% {2} r33 = JOIN r32 WITH DataFlowUtil::getSideEffectFor#fff AS R ON FIRST 2 OUTPUT R.<2>, r32.<2>
7973 ~0% {2} r34 = JOIN r33 WITH Instruction::ReadSideEffectInstruction::getSideEffectOperand_dispred#ff AS R ON FIRST 1 OUTPUT R.<1>, r33.<1>
...
The bulge is now limited to a factor of two, and that's just because I
didn't write separate versions of `getSideEffectFor` for
`ReadSideEffectInstruction` and `WriteSideEffectInstruction`.
Avoids a Cartesian product on nodes:
```
[2020-02-07 11:01:22] (432s) Tuple counts for dom#DataFlowImpl::TPathNodeSink#ff:
0 ~0% {2} r1 = JOIN DataFlowImpl::Configuration::isSource_dispred#ff AS L WITH DataFlowImpl::Configuration::isSink_dispred#ff AS R ON FIRST 2 OUTPUT R.<1>, R.<0>
101611 ~0% {2} r2 = SCAN DataFlowImpl::PathNodeMid#class#ffffff AS I OUTPUT I.<5>, I.<0>
3534537047 ~3% {3} r3 = JOIN r2 WITH DataFlowImpl::Configuration::isSink_dispred#ff AS R ON FIRST 1 OUTPUT r2.<1>, R.<1>, R.<0>
251 ~41% {3} r4 = JOIN r3 WITH project#DataFlowImpl::pathStep#fffff AS R ON FIRST 2 OUTPUT R.<2>, r3.<2>, r3.<1>
251 ~50% {2} r5 = JOIN r4 WITH DataFlowImpl::TNil#ff_1#join_rhs AS R ON FIRST 1 OUTPUT r4.<2>, r4.<1>
251 ~50% {2} r6 = r1 \/ r5
323 ~67% {3} r7 = JOIN r6 WITH DataFlowImpl::flow#ff AS R ON FIRST 1 OUTPUT r6.<1>, r6.<0>, R.<1>
288 ~58% {3} r8 = SELECT r7 ON r7.<2> >= r7.<0>
251 ~53% {3} r9 = SELECT r8 ON r8.<2> <= r8.<0>
251 ~50% {2} r10 = SCAN r9 OUTPUT r9.<1>, r9.<0>
```
Joining on the enclosing callable before the kind is crucial, as witnessed by this pipeline:
```
[2020-02-06 17:58:21] (1086s) Starting to evaluate predicate DataFlowImplCommon::getReturnPosition#ff/2@83c546
[2020-02-06 18:53:16] (4382s) Tuple counts for DataFlowImplCommon::getReturnPosition#ff:
385478 ~1% {3} r1 = SCAN DataFlowImplCommon::Cached::TReturnPosition0#fff@staged_ext AS I OUTPUT I.<2>, I.<0>, I.<1>
385478 ~2% {3} r2 = JOIN r1 WITH DataFlowImplCommon::Cached::TReturnPosition0#fff_2#join_rhs AS R ON FIRST 1 OUTPUT r1.<2>, r1.<1>, r1.<0>
58638116860 ~0% {3} r3 = JOIN r2 WITH DataFlowImplCommon::ReturnNodeExt::getKind_dispred#ff_10#join_rhs AS R ON FIRST 1 OUTPUT R.<1>, r2.<1>, r2.<2>
914049 ~0% {2} r4 = JOIN r3 WITH DataFlowImplCommon::returnNodeGetEnclosingCallable#ff AS R ON FIRST 2 OUTPUT r3.<0>, r3.<2>
return r4
```
We can't understand the real `six.py` file, so we have some internal plumbing
that enables us to handle six anyway. While updating that, I had a hell of a lot
of trouble with these tests.
What we actually want, is to see that we can understand what the values imported
from six are (i.e., their points-to information). I added a few more, that I
think would be useful. If we can figure out all of these, I don't actually care
if we're doing it by understanding the real `six.py` file, or by some internal
trick.
I verified that we don't get results with the real `six.py` file by disabling
our internal tricks, and putting a copy of six.py just next to test.py.
We used to have an other file that would list all the properties we knew and
their value, but that turned out to be a fragile and annoying test, since the
results differed from which version of python you ran it with (3.5 vs 3.8) and
which machine you ran it on (my machien vs jenkins). I don't care about the
results in this file, and I can certainly not eyeball it to see if it's correct
or not.
Also fixed a minor bug where we should have been treating `AllNonLocalMemory` as _totally_ overlapping an access to a non-local variable, rather than _partially_ overlapping it. This fix is exhibited both in the new test case and in a couple existing test functions in `ssa.cpp`.
The predicate
```
argumentValueFlowsThrough(ArgumentNode arg, OutNode out, CallContext cc)
```
has been generalized to
```
argumentValueFlowsThrough(
DataFlowCall call, ArgumentNode arg, Node out, ContentOption contentIn,
ContentOption contentOut
)
```
This enables us to summarize normal flow-through (as before), getters, setters,
as well as getter-setters.
Replacing `Value.booleanValue`. We wanted to match `Object.booleanValue` that
only gives a result if it is either `true` or `false`, but also wanted to keep
the flexibility to see if the Value _could_ be `true`/`false`. We don't have a
motivating usecase, so let's see if we ever need it :P
+ fix modernisation regression on py/jinja2/autoescape-false
Since `IntObjectInternal` extends `TInt`, and `TInt` is defined for all
instances of `Builtin.intValue`, and `Builtin.intValue` includes both `int` and
`long`, we don't need to handles Longs in a special manner, as we did in NumericObject.
It turns out that the evaluator will evaluate the GVN stage even when no
predicate from it is needed after optimization of the subsequent stages.
The GVN library is expensive to evaluate, and it'll become even more
expensive when we switch its implementation to IR.
This PR disables the use of GVN in `DataFlow::BarrierNode` for the AST
data-flow library, which should improve performance when evaluating a
single data-flow query on a snapshot with no cache. Precision decreases
slightly, leading to a new FP in the qltests.
There is no corresponding change for the IR data-flow library since IR
GVN is not very expensive.
We were hitting a combinatorial explosion in `hasDefinitionAtRank` for functions that contain a large number of string literals. The problem was that every `Chi` instruction for `AliasedVirtualVariable` was treated as a definition of every string literal. We already mark string literals as `isReadOnly()`, but we were allowing `AliasedVirtualVariable` to define read-only locations so that the `AliasedDefinition` instruction would provide the initial definition for all string literals.
To fix this, I've introduced the new `InitializeNonLocal` instruction, which is inserted in the prologue of every function right after `AliasedDefinition`. It provides the initial definition for every non-stack memory location, including read-only locations, but is never written to anywhere else. It is the conterpart of the `AliasedUse` instruction in the function epilogue, which represents the use of all non-stack memory after the function returns. I considered renaming `AliasedUse` to `ReturnNonLocal`, to match the `InitializeXXX`/`ReturnXXX` pattern we already use for parameters and indirections, but held off to avoid unnecessary churn. Any thoughts on whether I should make this name change?
This change has a significant speedup in evaluation time for a few of our troublesome databases:
`attnam/ivan`: 13%
`awslabs/s2n`: 26%
`SinaMostafanejad/OpenRDM`: 7%
`zcoinofficial/zcoin`: 8%
As a temporary workaround in the `DefaultTaintTracking` library, we
funnel flow across calls by conflating pointer and object both at the
caller and the callee.
The three cases in `adjustedSink` were deleted because they are now
covered by the one case for `ReadSideEffectInstruction` in
`instructionTaintStep`.
When enabling `DefaultTaintTracking`, this commit on top of #2736 has
the effect effect of recovering two lost results:
--- a/cpp/ql/test/query-tests/Security/CWE/CWE-119/semmle/tests/OverflowDestination.expected
+++ b/cpp/ql/test/query-tests/Security/CWE/CWE-119/semmle/tests/OverflowDestination.expected
@@ -1,2 +1,4 @@
| overflowdestination.cpp:30:2:30:8 | call to strncpy | To avoid overflow, this operation should be bounded by destination-buffer size, not source-buffer size. |
| overflowdestination.cpp:46:2:46:7 | call to memcpy | To avoid overflow, this operation should be bounded by destination-buffer size, not source-buffer size. |
+| overflowdestination.cpp:53:2:53:7 | call to memcpy | To avoid overflow, this operation should be bounded by destination-buffer size, not source-buffer size. |
+| overflowdestination.cpp:64:2:64:7 | call to memcpy | To avoid overflow, this operation should be bounded by destination-buffer size, not source-buffer size. |
In the internal repo, we recover one lost result. Additionally, there
are two queries that gain an extra source for an existing sink. I'll
classify that as noise. The new results look like this:
foo(argv); // this `argv` is a new source for the sink in `bar`
bar(argv); // this `argv` is the existing source for the sink in `bar`
The type of the source argument to `memcpy` is `void *`, and somehow
that meant that the copied object itself got type `void`. Since that has
size 0, the SSA construction did not model it as reading from the last
write.
This is probably not the right fix, but maybe it's good enough for now.
The right fix would ensure that the type reported by
`hasOperandMemoryAccess` is `UnknownType`.
When `DefaultTaintTracking.qll` is enabled, this commit has the effect
of restoring a lost results:
--- a/cpp/ql/test/query-tests/Security/CWE/CWE-119/semmle/tests/OverflowDestination.expected
+++ b/cpp/ql/test/query-tests/Security/CWE/CWE-119/semmle/tests/OverflowDestination.expected
@@ -1 +1,2 @@
| overflowdestination.cpp:30:2:30:8 | call to strncpy | To avoid overflow, this operation should be bounded by destination-buffer size, not source-buffer size. |
+| overflowdestination.cpp:46:2:46:7 | call to memcpy | To avoid overflow, this operation should be bounded by destination-buffer size, not source-buffer size. |
Adds
- `StringValue` as a new class,
- `Value::booleanValue` which returns the boolean interpretation of the given
value, and
- `ClassValue::str` which returns the value of the `str` class, depending on the
Python version.
This changes the flow to be taint rather than data flow, and it extends
it to include chi instructions with unknown type as long as they're not
for the `AliasedVirtualVariable`.
We're losing three good test results because these tests are not
affected by `DefaultTaintTracking.qll`. The taint step added here can
later be ported to `TaintTrackingUtil.qll` to recover these results, but
we probably want a better API than transitive-closure search through
instructions before doing that.
This change ensures that the diff test will show the difference between
the old and the new library even after we switch the default
implementation of `security.TaintTracking` to be the new one.
The code contained the remains of how `isUserInput` in `Security.qll`
used to be ported to IR. It's wrong to use that port since many queries
call `userInput` directly to get the "cause" string.
This introduces a new type of `MemoryLocation`: `EntireAllocationMemoryLocation`, representing an entire contiguous allocation whose size is not known. This is used to model the memory accesses on `InitializeIndirection` and `ReturnIndirection`.
Instead of tracking `IRVariable`s directly, alias analysis now tracks instances of the `Allocation` type provided by its `Configuration` parameter. For unaliased SSA, an `Allocation` is just an `IRAutomaticVariable`. For aliased SSA, an `Allocation` is either an `IRVariable` or the memory pointed to by an indirect parameter.
Some of the tests currently fail, since they can't reproduce the old tests
results (since the sinks/sources defined in the library code are not
HttpResponseTaintSink/HttpRequestTaintSource)
Our definition of `toString` for the internal tuple objects we create during the
points-to analysis may have been a _tad_ too ambitious. In particular, it can
easily lead to non-termination, e.g. using the following piece of code:
```python
x = ()
while True:
x = (x, x)
```
This commit cuts off the infinite recursion by replacing _nested_ tuples with
the string "...". In particular this means even non-recursive tuples will be cut
off at that point, so that the following tuples
```python
(1, "2")
((3, 4), [5, 6])
(1, 2, 3, 4, 5)
```
Get the following string representations.
```
"(int 1, '2', )"
"(..., List, )"
"(int 1, int 2, int 3, 2 more...)"
```
They are not tainted in assignment, only in use.
I also adopted an attempt at a better test-setup, where it's easy to see if
everything is the way you hoped for, instead of browsing through 100 of lines of
taint-step output :P
We designed the IR's `DataFlow::Node.asExpr` very carefully so that it's
suitable for taint tracking, but then we didn't use it in
`DefaultTaintTracking.qll`. This meant that the sources in
`ArithmeticWithExtremeValues.ql` didn't get associated with any
`Instruction` and thus didn't propagate anywhere.
With this commit, the mapping of `Expr`-based sources to IR data-flow
nodes uses `asExpr`.
This commit undoes the code sharing between `TranslatedAssignExpr` (`=`)
and `TranslatedAssignOperation` (`+=`, `<<=`, ...). In the next commit,
when we change how the `Load` works on the LHS of
`TranslatedAssignOperation`, these classes will become so different that
sharing is no longer helpful.
Previously, the `Load` would be associated with the `CrementOperation`
rather than its operand, which gave surprising results when mapping
taint sinks back to `Expr`.
The changes in `raw_ir.expected` are to add `Copy` operations on the
`x++` in code like `y = x++`. This is now needed because the result that
`x++` would otherwise have (the Load) no longer belongs to the `++`
expression. Copies are inserted to ensure that all expressions are
associated with an `Instruction` result.
The changes in `*aliased_ssa_ir.expected` appear to be just wobble.
There was already a `WriteSideEffectInstruction` class that served as a
superclass for all the specific write side effects. This new class
serves the same purpose for read side effects.
When building SSA, we'll be assuming that stack variables do not escape, at least until we improve our alias analysis. I've added a new `IREscapeAnalysisConfiguration` class to allow the query to control this, and a new `UseSoundEscapeAnalysis.qll` module that can be imported to switch to the sound escape analysis. I've cloned the existing IR and SSA tests to have both sound and unsound versions. There were relatively few diffs in the IR dump tests, and the sanity tests still give the same results after one change described below.
Assuming that stack variables do not escape exposed an existing bug where we do not emit an `Uninitialized` instruction for the temporary variables used by `return` statements and `throw` expressions, even if the initializer is a constructor call or array initializer. I've refactored the code for handling elements that initialize a variable to share a common base class. I added a test case for returning an object initialized by constructor call, and ensured that the IR diffs for the existing `throw` test cases are correct.
The IR data flow library now supports virtual dispatch with a library
that's similar to `security.TaintTracking`. In particular, it should
have the same performance characteristics. The main difference is that
non-recursive callers of `flowsFrom` now pass `_` instead of `true` for
`boolean allowFromArg`. This change allows flow through `return` to
actually work.
The aliased SSA code was assuming that, for every automatic variable, there would be at least one memory access that reads or writes the entire variable. We've encountered a couple cases where that isn't true due to extractor issues. As a workaround, we now always create the `VariableMemoryLocation` for every local variable.
I've also added a sanity test to detect this condition in the future.
Along the way, I had to fix a perf issue in the PrintIR code. When determining the ID of a result based on line number, we were considering all `Instruction`s generated for a particular line, regardless of whether they were all in the same `IRFunction`. In addition, the predicate had what appeared to be a bad join order that made it take forever on large snapshots. I've scoped it down to just consider `Instruction`s in the same function, and outlined that predicate to fix the join order issue. This causes some numbering changes, but they're for the better. I don't think there was actually any nondeterminism there before, but now the numbering won't depend on the number of instantiations of a template, either.
The `overlappingVariableMemoryLocations` predicate was a helper
predicate introduced to fix a join-order issue in
`overlappingIRVariableMemoryLocations`. Unfortunately it caused a
performance issue of its own because it could grow too large. On the
small project (38MB zip) awslabs/s2n there were 181M rows in
`overlappingVariableMemoryLocations`, and it took 134s to evaluate.
The fix is to collapse the two predicates into one and fix join ordering
by including an extra column in the predicates being joined.
In addition, some parameters were reordered to avoid the overhead of
auto-generated `join_rhs` predicates.
Tuple counts of `overlappingVariableMemoryLocations` before:
623285 ~176% {2} r1 = JOIN AliasedSSA::isCoveredOffset#fff_120#join_rhs AS L WITH AliasedSSA::isCoveredOffset#fff_120#join_rhs AS R ON FIRST 2 OUTPUT L.<2>, R.<2>
119138 ~3% {2} r2 = SCAN AliasedSSA::VariableMemoryLocation::getVirtualVariable_dispred#ff AS I OUTPUT I.<1>, I.<0>
172192346 ~0% {2} r3 = JOIN r2 WITH AliasedSSA::hasUnknownOffset#ff_10#join_rhs AS R ON FIRST 1 OUTPUT R.<1>, r2.<1>
172815631 ~0% {2} r4 = r1 \/ r3
172192346 ~0% {2} r5 = JOIN r2 WITH AliasedSSA::hasUnknownOffset#ff_10#join_rhs AS R ON FIRST 1 OUTPUT r2.<1>, R.<1>
345007977 ~87% {2} r6 = r4 \/ r5
return r6
Tuple counts of `overlappingIRVariableMemoryLocations` after:
117021 ~134% {2} r1 = JOIN AliasedSSA::isCoveredOffset#ffff AS L WITH AliasedSSA::isCoveredOffset#ffff AS R ON FIRST 3 OUTPUT L.<3>, R.<3>
201486 ~1% {2} r2 = JOIN AliasedSSA::hasUnknownOffset#fff AS L WITH AliasedSSA::hasVariableAndVirtualVariable#fff AS R ON FIRST 2 OUTPUT L.<2>, R.<2>
318507 ~26% {2} r3 = r1 \/ r2
201486 ~3% {2} r4 = JOIN AliasedSSA::hasUnknownOffset#fff AS L WITH AliasedSSA::hasVariableAndVirtualVariable#fff AS R ON FIRST 2 OUTPUT R.<2>, L.<2>
519993 ~92% {2} r5 = r3 \/ r4
return r5
Previously, we had several predicates on `Instruction` and `Operand` whose values were determined solely by the opcode of the instruction. For large snapshots, this meant that we would populate large tables mapping each of the millions of `Instruction`s to the appropriate value, times three (once for each IR flavor).
This change moves all of these opcode properties onto `Opcode` itself, with inline wrapper predicates on `Instruction` and `Operand` where necessary. On smaller snapshots, like ChakraCore, performance is a wash, but this did speed up Wireshark by about 4%.
Even ignoring the modest performance benefit, having these properties defined on `Opcode` seems like a better organization than having them on `Instruction` and `Operand`.
Original code:
exists(Expr e, For forloop | forloop = loop and e.pointsTo(_, _, capturing) |
not loop.contains(e)
)
The new version will preserve the same semantics. The problem with the first
rewrite was that `not loop.(For).somethingMore` would hold for any AstNode that
was not a For
Our database representation of ASTs does not use assignment patterns, instead encoding the relevant information directly in the associated function/loop/assignment. We convert from an AST with assignment patterns to one without during parsing, so the extractor does not expect any assignment patterns to be present in the AST.
Due to a bug in the parser, this can currently happen for malformed programs. While we should fix that bug once it gets fixed in Acorn, it also makes sense for the extractor to be more robust, so this PR teaches the `ASTExtractor` pass to raise a parse error when it encounters an assignment pattern, and all other passes to simply ignore them.
The `overlappingVariableMemoryLocations` predicate was a helper
predicate introduced to fix a join-order issue in
`overlappingIRVariableMemoryLocations`. Unfortunately it caused a
performance issue of its own because it could grow too large. On the
small project (38MB zip) awslabs/s2n there were 181M rows in
`overlappingVariableMemoryLocations`, and it took 134s to evaluate.
The fix is to collapse the two predicates into one and fix join ordering
by including an extra column in the predicates being joined.
In addition, some parameters were reordered to avoid the overhead of
auto-generated `join_rhs` predicates.
Tuple counts of `overlappingVariableMemoryLocations` before:
623285 ~176% {2} r1 = JOIN AliasedSSA::isCoveredOffset#fff_120#join_rhs AS L WITH AliasedSSA::isCoveredOffset#fff_120#join_rhs AS R ON FIRST 2 OUTPUT L.<2>, R.<2>
119138 ~3% {2} r2 = SCAN AliasedSSA::VariableMemoryLocation::getVirtualVariable_dispred#ff AS I OUTPUT I.<1>, I.<0>
172192346 ~0% {2} r3 = JOIN r2 WITH AliasedSSA::hasUnknownOffset#ff_10#join_rhs AS R ON FIRST 1 OUTPUT R.<1>, r2.<1>
172815631 ~0% {2} r4 = r1 \/ r3
172192346 ~0% {2} r5 = JOIN r2 WITH AliasedSSA::hasUnknownOffset#ff_10#join_rhs AS R ON FIRST 1 OUTPUT r2.<1>, R.<1>
345007977 ~87% {2} r6 = r4 \/ r5
return r6
Tuple counts of `overlappingIRVariableMemoryLocations` after:
117021 ~134% {2} r1 = JOIN AliasedSSA::isCoveredOffset#ffff AS L WITH AliasedSSA::isCoveredOffset#ffff AS R ON FIRST 3 OUTPUT L.<3>, R.<3>
201486 ~1% {2} r2 = JOIN AliasedSSA::hasUnknownOffset#fff AS L WITH AliasedSSA::hasVariableAndVirtualVariable#fff AS R ON FIRST 2 OUTPUT L.<2>, R.<2>
318507 ~26% {2} r3 = r1 \/ r2
201486 ~3% {2} r4 = JOIN AliasedSSA::hasUnknownOffset#fff AS L WITH AliasedSSA::hasVariableAndVirtualVariable#fff AS R ON FIRST 2 OUTPUT R.<2>, L.<2>
519993 ~92% {2} r5 = r3 \/ r4
return r5
Performance on `taers232c/GAMADV-X` (which exhibited pathological behaviour in
the most recent dist upgrade) went from ~670s to ~313s on
`py/hardcoded-credentials`.
There are still a few tuple counts in the 10-100 million range, but this commit
takes care of all of the ones that numbered in the billions. (A single tuple
count in the 100-1000 million range remains, but it appears to be less critical,
taking only two seconds to calculate.)
Ternary conditionals `b ? x : y` mistakenly had taint-tracking steps from both
`b`, `x`, and `y` to the conditional expression itself. Flow from `b` was not
intented, and flow from `x` and `y` is already part of ordinary data flow.
The database type `@xmlparent` is defined a bit too loosely in that it includes all of `@file`, not just XML files. Fixing that would involve fiddling with the extractor/dbscheme, so I have opted to fix it at the QL level instead.
This used to take 30s on `cpython`.
```
Tuple counts for StatementNoEffect::side_effecting_binary#f:
46522 ~0% {2} r1 = ClassObject::ClassObject::hasAttribute_dispred#fb AS L AND NOT StatementNoEffect::side_effecting_binary#f#antijoin_rhs AS R(L.<0>, L.<1>)
46522 ~2% {2} r2 = SCAN r1 OUTPUT r1.<1>, r1.<0>
950960 ~2% {2} r3 = JOIN r2 WITH Operations::Operator::getSpecialMethodName_dispred#ff_10#join_rhs AS R ON FIRST 1 OUTPUT R.<1>, r2.<1>
950960 ~2% {2} r4 = JOIN r3 WITH py_operators AS R ON FIRST 1 OUTPUT R.<2>, r3.<1>
950960 ~0% {3} r5 = JOIN r4 WITH AstGenerated::BinaryExpr_::getLeft_dispred#ff AS R ON FIRST 1 OUTPUT R.<1>, r4.<1>, r4.<0>
122934382 ~0% {2} r6 = JOIN r2 WITH Operations::Cmpop::getSpecialMethodName_dispred#ff_10#join_rhs AS R ON FIRST 1 OUTPUT R.<1>, r2.<1>
122934382 ~3% {3} r7 = JOIN r6 WITH project#Operations::Compare::compares_dispred#ffff#3_201#join_rhs AS R ON FIRST 1 OUTPUT R.<2>, r6.<1>, R.<1>
123885342 ~3% {3} r8 = r5 \/ r7
300 ~8% {1} r9 = JOIN r8 WITH project#Exprs::Expr::refersTo_dispred#ffff AS R ON FIRST 2 OUTPUT r8.<2>
return r9
```
With this commit, it takes a few milliseconds.
This commit adds type information to data flow paths, by mapping node types onto
the smaller set of GVN types, and implementing `ppReprType()`.
The effect is a mere change in `DataFlow::PathNode::toString()`; no type-based
pruning is done yet.
The `queries.xml` file defines which extractor the `codeql test` runner will use
to extract databases for the tests. In the future one will be able to write this
information in `qlpack.yml`, but we can't do that immediately because the
_existing_ CodeQL tooling would refuse to parse a `qlpack.yml` that has
the new field in it.
The `queries.xml` file defines which extractor the `codeql test` runner will use
to extract databases for the tests. In the future one will be able to write this
information in `qlpack.yml`, but we can't do that immediately because the
_existing_ CodeQL tooling would refuse to parse a `qlpack.yml` that has the new
field in it.
Adding a queries.xml file means that the normalization of file names in the test
output changes even with the old QLTest, so there are a number of consequential
updates of expected output files.
With the coming `codeql test` support, the `predefined_macros` file will not
necessarily be located under a `tools` directory. Change the test to hide more
of its actual path, so it will work in both cases.
This is a test of an internal query for the Semmle repository. It cannot
run against the public QL repository alone, and therefore should not be
tested here.
https://git.semmle.com/Semmle/code/pull/35690 adds the test back to the
internal repo.
Previously, we would consider an HTML file to be a dependent of all scripts embedded in it. Now we instead consider each JavaScript toplevel inside the HTML file to be a dependent, which is more sensible anyway.
It looks like allowing statics in `Nullness.qll` is fine since it's a
"may be null" analysis rather than a "must be null" analysis.
This reverts commit f5b9837e19.
These were originally meant to give you the term that is textually matched right before/right after the receiver. When I introduced support for lookbehinds, I changed the behaviour to give you the term that is _operationally_ matched before/after the receiver (remember that lookbehinds are implemented by reverse-matching).
However, I think that's rarely ever what you want, and is wrong for the only two uses of these predicates, where it's the textual matching order that we are after, not the operational order.
Consequently, I've changed the semantics back and updated the comments to hopefully clarify the intention.
In the IR, some memory accesses are "must" accesses (the entire memory location is always read or written), and some are "may" accesses (some, all, or none of the bits in the location are written). We previously had to special case specific "may" accesses in a few places. This change regularizes our handling of "may" accesses.
The `MemoryAccessKind` enumeration now describes only the extent of the access (the set of locations potentially accessed), but does not distinguish "must" from "may". The new predicates `Operand.hasMayMemoryAccess()` and `Instruction.hasResultMayMemoryAccess()` hold when the access is a "may" access.
Unaliased SSA now correctly ignores variables that are ever accessed via a "may" access.
Aliased SSA now distinguishes `MemoryLocation`s for "may" and "must" accesses. I've refactored `getOverlap()` into the core `getExtentOverlap()`, which considers only the extent, but not the "may" vs. "must", and `getOverlap()`, which tweaks the result of `getExtentOverlap()` based on "may" vs. "must" and read-only locations.
When determining the overlap between a `Phi` operand and its definition, we now use the result of the defining `Chi` instruction, if one exists. This gives exact definitions for `Phi` operands for virtual variables.
Should fix#2426.
Essentially, we disregard expressions used inside annotations, if these
annotations occur in a file that has `from __future__ import annotations`, as
this prevents the annotations from being evaluated.
* Adds fix for __init_subclass__ bug.
* Adds test case.
* Move test on name.
I think it makes more sense here, alongside the other "special" method names.
Fixes#1969.
The points-to analysis does not know that the assignment `input = raw_input`
cannot fail under Python 2, and so there are two possible values that `input`
could point-to after exiting the exception handler: the built-in `input`, or the
built-in `raw_input`. In the latter case we do not want to report the alert, and
so adding a check that the given function does not point-to the built-in
`raw_input` suffices.
Files have strange `:0:0:0:0` locations for... reasons. This makes the predicates inherited from `Locatable` meaningless. A particularly bad case is `getNumLines()`, which will always return one. The right predicate to use is, of course, `getNumberOfLines()`, which is defined in `File` itself.
Should fix#1833, #2137, and #2187.
Internally, comprehensions are (at present) elaborated into local functions and
iterators as described in [PEP-289](https://www.python.org/dev/peps/pep-0289/).
That is, something like:
```
g = (x**2 for x in range(10))
```
becomes something akin to
```
def __gen(exp):
for x in exp:
yield x**2
g = __gen(iter(range(10)))
```
In the context of the top-level of a class, this means `__gen` looks as if it is
a method of the class, and in particular `exp` looks like it's the `self`
argument of this method, which leads the points-to analysis to think that `exp`
is an instance of the surrounding class itself.
The fix in this case is pretty simple: we look for occurrences of `exp` (in fact
called `.0` internally -- carefully chosen to _not_ be a valid Python
identifier) and explicitly exclude this parameter from being classified as a
`self` parameter.
Having `toString()` defined to be `none()` is a major headache when debugging,
as `toString`-less results are silently elided. This PR puts dummy `toString`s
in place of the `none()`s.
(I am mostly creating this to see if it impacts our tests and/or the
performance. If not, we may as well merge it.)
This might cause fewer variables to be analysed because not every use of
`LocalScopeVariable` was constrained by the def-use library. Hopefully
this leads to an improved nullness analysis since it avoids treating
`static T *x = nullptr;` the same as `static T *x; x = nullptr;`.
These changes should not affect semantics since these uses of
`LocalScopeVariable` were already constrained to stack variables by
their use of SSA or def-use.
In these files it was possible to remove calls to `isStatic` by
switching from `LocalScopeVariable` to `StackVariable`. This changes
semantics, hopefully for the better, to treat `thread_local` locals the
same as `static` locals.
This is now a copy of `LocalScopeVariableReachability.qll`, just with
`s/LocalScopeVariable/StackVariable/g`. It can be used as a drop-in
replacement since the `LocalScopeVariableReachability.qll` library
implementation was already restricted to `SemanticStackVariable`.
Fixes#1832.
In the taint sink, we add an additional check that the given control-flow node
can indeed point to a value that is mutable. This takes care of the guard on the
type.
If and when we get around to adding configurations for all of the taint
analyses, we may want to implement this as a barrier instead, pruning any steps
that go through a type test where the type is not mutable.
This commit adds support for (classical) NUnit assertions (see
https://github.com/nunit/docs/wiki/Assertions). Modern constraint-based assertions,
such as `Assert.That(o, Is.Not.Null)` are currently not supported, because they
would require a restructuring of the assertion library.
Since we are now able to trace shared compilation builds on Linux and macOS
(starting from .NET Core 3), and always were able to on Windows, there is
no need to set `UseSharedCompilation=false` in those cases. This may have a
positive performance impact, as shared compilation is generally faster then
non-shared compilation.
The `toString` for IR data-flow nodes are now similar to AST data-flow
nodes. This should make it easier to use the IR as a drop-in replacement
in the future. There are still differences because the IR data flow
library takes conversions into account.
I did not attempt to align the new nodes we use for field flow. That can
come later, when we add field flow to IR data flow.
This string looked out of place compared to `ExplicitParameterNode`,
whose string is simply the name of the parameter and therefore
indistinguishable from an access to the parameter without looking at the
location also. This has not been a problem so far, and if we want to
distinguish more clearly between initial values and accesses at some
point, we should do it for `ExplicitParameterNode` and
`UninitializedNode` too.
We welcome contributions to our standard library and standard checks. Got an idea for a new check, or how to improve an existing query? Then please go ahead and open a pull request!
We welcome contributions to our CodeQL libraries and queries. Got an idea for a new check, or how to improve an existing query? Then please go ahead and open a pull request! Contributions to this project are [released](https://help.github.com/articles/github-terms-of-service/#6-contributions-under-repository-license) to the public under the [project's open source license](LICENSE).
Before we accept your pull request, we require that you have agreed to our Contributor License Agreement, this is not something that you need to do before you submit your pull request, but until you've done so, we will be unable to accept your contribution.
There is lots of useful documentation to help you write queries, ranging from information about query file structure to tutorials for specific target languages. For more information on the documentation available, see [CodeQL queries](https://help.semmle.com/QL/learn-ql/writing-queries/writing-queries.html) on [help.semmle.com](https://help.semmle.com).
## Adding a new query
If you have an idea for a query that you would like to share with other Semmle users, please open a pull request to add it to this repository.
Follow the steps below to help other users understand what your query does, and to ensure that your query is consistent with the other Semmle queries.
## Submitting a new experimental query
1.**Consult the documentation for query writers**
If you have an idea for a query that you would like to share with other CodeQL users, please open a pull request to add it to this repository. New queries start out in a `<language>/ql/src/experimental` directory, to which they can be merged when they meet the following requirements.
There is lots of useful documentation to help you write queries, ranging from information about query file structure to tutorials for specific target languages. For more information on the documentation available, see [Writing CodeQL queries](https://help.semmle.com/QL/learn-ql/writing-queries/writing-queries.html) on [help.semmle.com](https://help.semmle.com).
1.**Directory structure**
2.**Format your code correctly**
There are five language-specific query directories in this repository:
All of Semmle's standard queries and libraries are uniformly formatted for clarity and consistency, so we strongly recommend that all contributions follow the same formatting guidelines. If you use CodeQL for VS Code, you can autoformat your query in the [Editor](https://help.semmle.com/codeql/codeql-for-vscode/reference/editor.html#autoformatting). For more information, see the [CodeQL style guide](https://github.com/Semmle/ql/blob/master/docs/ql-style-guide.md).
* C/C++: `cpp/ql/src`
* C#:`csharp/ql/src`
* Java: `java/ql/src`
* JavaScript: `javascript/ql/src`
* Python: `python/ql/src`
3.**Make sure your query has the correct metadata**
Each language-specific directory contains further subdirectories that group queries based on their `@tags` or purpose.
- Experimental queries and libraries are stored in the `experimental` subdirectory within each language-specific directory in the [CodeQL repository](https://github.com/github/codeql). For example, experimental Java queries and libraries are stored in `java/ql/src/experimental` and any corresponding tests in `java/ql/test/experimental`.
- The structure of an `experimental` subdirectory mirrors the structure of its parent directory.
- Select or create an appropriate directory in `experimental` based on the existing directory structure of `experimental` or its parent directory.
Query metadata is used by Semmle's analysis to identify your query and make sure the query results are displayed properly.
The most important metadata to include are the `@name`, `@description`, and the `@kind`.
Other metadata properties (`@precision`, `@severity`, and `@tags`) are usually added after the query has been reviewed by Semmle staff.
For more information on writing query metadata, see the [Query metadata style guide](https://github.com/Semmle/ql/blob/master/docs/query-metadata-style-guide.md).
2.**Query metadata**
4.**Make sure the `select` statement is compatible with the query type**
- The query `@id` must conform to all the requirements in the [guide on query metadata](docs/query-metadata-style-guide.md#query-id-id). In particular, it must not clash with any other queries in the repository, and it must start with the appropriate language-specific prefix.
- The query must have a `@name` and `@description` to explain its purpose.
- The query must have a `@kind` and `@problem.severity` as required by CodeQL tools.
The `select` statement of your query must be compatible with the query type (determined by the `@kind` metadata property) for alert or path results to be displayed correctly in LGTM and CodeQL for VS Code.
For more information on `select` statement format, see [Introduction to query files](https://help.semmle.com/QL/learn-ql/writing-queries/introduction-to-queries.html#select-clause) on help.semmle.com.
For details, see the [guide on query metadata](docs/query-metadata-style-guide.md).
5.**Save your query in a `.ql` file in the correct language directory in this repository**
Make sure the `select` statement is compatible with the query `@kind`. See [About CodeQL queries](https://help.semmle.com/QL/learn-ql/writing-queries/introduction-to-queries.html#select-clause) on help.semmle.com.
There are five language-specific directories in this repository:
* C/C++: `ql/cpp/ql/src`
* C#:`ql/csharp/ql/src`
* Java: `ql/java/ql/src`
* JavaScript: `ql/javascript/ql/src`
* Python: `ql/python/ql/src`
3.**Formatting**
Each language-specific directory contains further subdirectories that group queries based on their `@tags` properties or purpose. Select the appropriate subdirectory for your new query, or create a new one if necessary.
- The queries and libraries must be autoformatted, for example using the "Format Document" command in [CodeQL for Visual Studio Code](https://help.semmle.com/codeql/codeql-for-vscode/procedures/about-codeql-for-vscode.html).
6.**Write a query help file**
4.**Compilation**
Query help files explain the purpose of your query to other users. Write your query help in a `.qhelp` file and save it in the same directory as your new query.
For more information on writing query help, see the [Query help style guide](https://github.com/Semmle/ql/blob/master/docs/query-help-style-guide.md).
- Compilation of the query and any associated libraries and tests must be resilient to future development of the [supported](docs/supported-queries.md) libraries. This means that the functionality cannot use internal libraries, cannot depend on the output of `getAQlClass`, and cannot make use of regexp matching on `toString`.
- The query and any associated libraries and tests must not cause any compiler warnings to be emitted (such as use of deprecated functionality or missing `override` annotations).
5.**Results**
- The query must have at least one true positive result on some revision of a real project.
Experimental queries and libraries may not be actively maintained as the [supported](docs/supported-queries.md) libraries evolve. They may also be changed in backwards-incompatible ways or may be removed entirely in the future without deprecation warnings.
After the experimental query is merged, we welcome pull requests to improve it. Before a query can be moved out of the `experimental` subdirectory, it must satisfy [the requirements for being a supported query](docs/supported-queries.md).
## Using your personal data
If you contribute to this project, we will record your name and email
address (as provided by you with your contributions) as part of the code
repositories, which might be made public. We might also use this information
repositories, which are public. We might also use this information
to contact you in relation to your contributions, as well as in the
normal course of software development. We also store records of your
CLA agreements. Under GDPR legislation, we do this
on the basis of our legitimate interest in creating the CodeQL product.
Please do get in touch (privacy@semmle.com) if you have any questions about
Please do get in touch (privacy@github.com) if you have any questions about
this or our data protection policies.
## Contributor License Agreement
This Contributor License Agreement (“Agreement”) is entered into between Semmle Limited (“Semmle,” “we” or “us” etc.), and You (as defined and further identified below).
Accordingly, You hereby agree to the following terms for Your present and future Contributions submitted to Semmle:
1.**Definitions**.
* "You" (or "Your") shall mean the Contribution copyright owner (whether an individual or organization) or legal entity authorized by the copyright owner that is making this Agreement with Semmle. For legal entities, the entity making a Contribution and all other entities that control, are controlled by, or are under common control with that entity are considered to be a single Contributor. For the purposes of this definition, "control" means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity.
* "Contribution(s)" shall mean the code, documentation or other original works of authorship, including any modifications or additions to an existing work, submitted by You to Semmle for inclusion in, or documentation of, any of the products or projects owned or managed by Semmle (the "Work(s)"). For the purposes of this definition, "submitted" means any form of electronic, verbal, or written communication sent to Semmle or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, Semmle for the purpose of discussing and/or improving the Work, but excluding communication that is conspicuously marked or otherwise designated in writing by You as "Not a Contribution."
2.**Grant of Copyright License**. You hereby grant to Semmle and to recipients of software distributed by Semmle a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare derivative works of, publicly display, publicly perform, sublicense, and distribute Your Contributions and such derivative works.
3.**Grant of Patent License**. You hereby grant to Semmle and to recipients of software distributed by Semmle a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Work, where such license applies only to those patent claims licensable by You that are necessarily infringed by Your Contribution(s) alone or by combination of Your Contribution(s) with the Work to which such Contribution(s) was submitted. If any entity institutes patent litigation against You or any other entity (including a cross-claim or counterclaim in a lawsuit) alleging that Your Contribution, or the Work to which You have contributed, constitutes direct or contributory patent infringement, then any patent licenses granted to that entity under this Agreement for that Contribution or Work shall terminate as of the date such litigation is filed.
4.**Ownership**. Except as set out above, You keep all right, title, and interest in Your Contribution. The rights that You grant to us under this Agreement are effective on the date You first submitted a Contribution to us, even if Your submission took place before the date You entered this Agreement.
5.**Representations**. You represent and warrant that: (i) the Contributions are an original work and that You can legally grant the rights set out in this Agreement; (ii) the Contributions and Semmle’s exercise of any license rights granted hereunder, does not and will not, infringe the rights of any third party; (iii) You are not aware of any pending or threatened claims, suits, actions, or charges pertaining to the Contributions, including without limitation any claims or allegations that any or all of the Contributions infringes, violates, or misappropriate the intellectual property rights of any third party (You further agree that You will notify Semmle immediately if You become aware of any such actual or potential claims, suits, actions, allegations or charges).
6.**Employer**. If Your employer(s) has rights to intellectual property that You create that includes Your Contributions, You represent and warrant that Your employer has waived such rights for Your Contributions to Semmle, or that You have received permission to make Contributions on behalf of that employer and that You are authorized to execute this Agreement on behalf of Your employer.
7.**Inclusion of Code**. We determine the code that is in our Works. You understand that the decision to include the Contribution in any project or source repository is entirely that of Semmle, and this agreement does not guarantee that the Contributions will be included in any product.
8.**Disclaimer**. You are not expected to provide support for Your Contributions, except to the extent You desire to provide support. You may provide support for free, for a fee, or not at all. Except as set forth herein, and unless required by applicable law or agreed to in writing, You provide Your Contributions on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND.
9.**General**. The failure of either party to enforce its rights under this Agreement for any period shall not be construed as a waiver of such rights. No changes or modifications or waivers to this Agreement will be effective unless in writing and signed by both parties. In the event that any provision of this Agreement shall be determined to be illegal or unenforceable, that provision will be limited or eliminated to the minimum extent necessary so that this Agreement shall otherwise remain in full force and effect and enforceable. This Agreement shall be governed by and construed in accordance with the laws of the State of California in the United States without regard to the conflicts of laws provisions thereof. In any action or proceeding to enforce rights under this Agreement, the prevailing party will be entitled to recover costs and attorneys’ fees.
This open source repository contains the standard CodeQL libraries and queries that power [LGTM](https://lgtm.com), and the other products that [Semmle](https://semmle.com) makes available to its customers worldwide.
This open source repository contains the standard CodeQL libraries and queries that power [LGTM](https://lgtm.com) and the other CodeQL products that [GitHub](https://github.com) makes available to its customers worldwide.
## How do I learn CodeQL and run queries?
@@ -9,8 +9,8 @@ You can use the [interactive query console](https://lgtm.com/help/lgtm/using-que
## Contributing
We welcome contributions to our standard library and standard checks. Do you have an idea for a new check, or how to improve an existing query? Then please go ahead and open a pull request! Before you do, though, please take the time to read our [contributing guidelines](CONTRIBUTING.md). You can also consult our [style guides](https://github.com/Semmle/ql/tree/master/docs) to learn how to format your code for consistency and clarity, how to write query metadata, and how to write query help documentation for your query.
We welcome contributions to our standard library and standard checks. Do you have an idea for a new check, or how to improve an existing query? Then please go ahead and open a pull request! Before you do, though, please take the time to read our [contributing guidelines](CONTRIBUTING.md). You can also consult our [style guides](https://github.com/github/codeql/tree/master/docs) to learn how to format your code for consistency and clarity, how to write query metadata, and how to write query help documentation for your query.
## License
The code in this repository is licensed under [Apache License 2.0](LICENSE) by [Semmle](https://semmle.com).
The code in this repository is licensed under [Apache License 2.0](LICENSE) by [GitHub](https://github.com).
| Implicit function declarations (`cpp/Likely Bugs/Underspecified Functions/ImplicitFunctionDeclaration.ql`) | correctness, maintainability | This query finds calls to undeclared functions that are compiled by a C compiler. Results are shown on LGTM by default. |
## Changes to existing queries
A new taint-tracking library is used by all the security queries that track tainted values
| Boost\_asio TLS Settings Misconfiguration (`cpp/boost/tls-settings-misconfiguration`) | Query id change | The identifier was updated to use dashes in place of underscores (previous identifier `cpp/boost/tls_settings_misconfiguration`). |
| Buffer not sufficient for string (`cpp/overflow-calculated`) | More true positive results | This query now identifies a wider variety of buffer allocations using the `semmle.code.cpp.models.interfaces.Allocation` library. |
| Hard-coded Japanese era start date (`cpp/japanese-era/exact-era-date`) | | This query is no longer run on LGTM. |
| Memory is never freed (`cpp/memory-never-freed`) | More true positive results | This query now identifies a wider variety of buffer allocations using the `semmle.code.cpp.models.interfaces.Allocation` library. |
| Memory may not be freed (`cpp/memory-may-not-be-freed`) | More true positive results | This query now identifies a wider variety of buffer allocations using the `semmle.code.cpp.models.interfaces.Allocation` library. |
| Missing return statement (`cpp/missing-return`) | Fewer false positive results and more accurate locations | Functions containing `asm` statements are no longer highlighted by this query. The locations reported by this query are now more accurate in some cases. |
| No space for zero terminator (`cpp/no-space-for-terminator`) | More results with greater precision | The query gives more precise results for a wider variety of buffer allocations. String arguments to formatting functions are now (usually) expected to be null terminated strings. Use of the `semmle.code.cpp.models.interfaces.Allocation` library identifies problems with a wider variety of buffer allocations. This query is also more conservative when identifying which pointers point to null-terminated strings. |
| Overflow in uncontrolled allocation size (`cpp/uncontrolled-allocation-size`) | Fewer false positive results | The query now produces fewer, more accurate results. Cases where the tainted allocation size is range checked are more reliably excluded. |
| Overloaded assignment does not return 'this' (`cpp/assignment-does-not-return-this`) | Fewer false positive results | This query no longer reports incorrect results in template classes. |
| Pointer overflow check (`cpp/pointer-overflow-check`),<br> Possibly wrong buffer size in string copy (`cpp/bad-strncpy-size`),<br> Signed overflow check (`cpp/signed-overflow-check`) | More correct results | A new library is used for determining which expressions have identical value, giving more precise results. There is a performance cost to this, and the LGTM suite will overall run slower than before. |
| Unsafe array for days of the year (`cpp/leap-year/unsafe-array-for-days-of-the-year`) | | This query is no longer run on LGTM. |
| Unsigned comparison to zero (`cpp/unsigned-comparison-zero`) | More correct results | This query now also looks for comparisons of the form `0 <= x`. |
## Changes to libraries
* The built-in C++20 "spaceship operator" (`<=>`) is now supported via the QL
class `SpaceshipExpr`. Overloaded forms are modeled as calls to functions
named `operator<=>`.
* The data-flow library (`semmle.code.cpp.dataflow.DataFlow` and
`semmle.code.cpp.dataflow.TaintTracking`) has been improved, which affects
and improves some security queries. The improvements are:
- Track flow through functions that combine taint tracking with flow through fields.
- Track flow through clone-like functions, that is, functions that read contents of a field from a
parameter and stores the value in the field of a returned object.
* The security pack taint tracking library
(`semmle.code.cpp.security.TaintTracking`) uses a new intermediate
representation. This provides a more precise analysis of flow through
parameters and pointers. For new queries, however, we continue to recommend
using `semmle.code.cpp.dataflow.TaintTracking`.
* The global value numbering library
(`semmle.code.cpp.valuenumbering.GlobalValueNumbering`) uses a new
intermediate representation to provide a more precise analysis of
heap-allocated memory and pointers to stack variables.
* New libraries have been created to provide a more consistent and useful interface
for modeling allocation and deallocation. These replace the old
`semmle.code.cpp.commons.Alloc` library.
* The new `semmle.code.cpp.models.interfaces.Allocation` library models
allocations, such as `new` expressions and calls to `malloc`.
* The new `semmle.code.cpp.models.interfaces.Deallocation` library
models deallocations, such as `delete` expressions and calls to `free`.
* The predicate `freeCall` in `semmle.code.cpp.commons.Alloc` has been
deprecated. The `Allocation` and `Deallocation` models in
`semmle.code.cpp.models.interfaces` should be used instead.
* The new class `StackVariable` should be used in place of `LocalScopeVariable`
in most cases. The difference is that `StackVariable` does not include
variables declared with `static` or `thread_local`.
* As a rule of thumb, custom queries about the _values_ of variables should
be changed from `LocalScopeVariable` to `StackVariable`, while queries
about the _name or scope_ of variables should remain unchanged.
* The `LocalScopeVariableReachability` library is deprecated in favor of
`StackVariableReachability`. The functionality is the same.
* Taint tracking and data flow now features better modeling of commonly-used
| Assembly path injection (`cs/assembly-path-injection`) | security, external/cwe/cwe-114 | Finds user-controlled data used to load an assembly. Results are shown on LGTM by default. |
| Insecure configuration for ASP.NET requestValidationMode (`cs/insecure-request-validation-mode`) | security, external/cwe/cwe-016 | Finds where this attribute has been set to a value less than 4.5, which turns off some validation features and makes the application less secure. By default, the query is not run on LGTM. |
| Insecure SQL connection (`cs/insecure-sql-connection`) | security, external/cwe/cwe-327 | Finds unencrypted SQL connection strings. Results are not shown on LGTM by default. |
| Page request validation is disabled (`cs/web/request-validation-disabled`) | security, frameworks/asp.net, external/cwe/cwe-016 | Finds where ASP.NET page request validation has been disabled, which could make the application less secure. By default, the query is not run on LGTM. |
| Serialization check bypass (`cs/serialization-check-bypass`) | security, external/cwe/cwe-20 | Finds where data is not validated in a deserialization method. Results are not shown on LGTM by default. |
| XML injection (`cs/xml-injection`) | security, external/cwe/cwe-091 | Finds user-controlled data that is used to write directly to an XML document. Results are shown on LGTM by default. |
| Dereferenced variable may be null (`cs/dereferenced-value-may-be-null`) | More results | Results are reported from parameters with a default value of `null`. |
| Information exposure through an exception (`cs/information-exposure-through-exception`) | More results | The query now recognizes writes to cookies, writes to ASP.NET (`Inner`)`Text` properties, and email contents as additional sinks. |
| Information exposure through transmitted data (`cs/sensitive-data-transmission`) | More results | The query now recognizes writes to cookies and writes to ASP.NET (`Inner`)`Text` properties as additional sinks. |
| Potentially dangerous use of non-short-circuit logic (`cs/non-short-circuit`) | Fewer false positive results | Results have been removed when the expression contains an `out` parameter. |
| Useless assignment to local variable (`cs/useless-assignment-to-local`) | Fewer false positive results | Results have been removed when the value assigned is an (implicitly or explicitly) cast default-like value. For example, `var s = (string)null` and `string s = default`. Results have also been removed when the variable is named `_` in a `foreach` statement. |
| XPath injection (`cs/xml/xpath-injection`) | More results | The query now recognizes calls to methods on `System.Xml.XPath.XPathNavigator` objects. |
## Changes to code extraction
* Tuple expressions, for example `(int,bool)` in `default((int,bool))` are now extracted correctly.
* Expression nullability flow state is extracted.
* Implicitly typed `stackalloc` expressions are now extracted correctly.
* The difference between `stackalloc` array creations and normal array creations is extracted.
## Changes to libraries
* The data-flow library has been improved, which affects and improves most security queries. The improvements are:
- Track flow through methods that combine taint tracking with flow through fields.
- Track flow through clone-like methods, that is, methods that read the contents of a field from a
parameter and store the value in the field of a returned object.
* The taint tracking library now tracks flow through (implicit or explicit) conversion operator calls.
* [Code contracts](https://docs.microsoft.com/en-us/dotnet/framework/debug-trace-profile/code-contracts) are now recognized, and are treated like any other assertion methods.
* Expression nullability flow state is given by the predicates `Expr.hasNotNullFlowState()` and `Expr.hasMaybeNullFlowState()`.
*`stackalloc` array creations are now represented by the QL class `Stackalloc`. Previously they were represented by the class `ArrayCreation`.
* A new class `RemoteFlowSink` has been added to model sinks where data might be exposed to external users. Examples include web page output, emails, and cookies.
| Disabled Spring CSRF protection (`java/spring-disabled-csrf-protection`) | security, external/cwe/cwe-352 | Finds disabled Cross-Site Request Forgery (CSRF) protection in Spring. Results are shown on LGTM by default. |
| Failure to use HTTPS or SFTP URL in Maven artifact upload/download (`java/maven/non-https-url`) | security, external/cwe/cwe-300, external/cwe/cwe-319, external/cwe/cwe-494, external/cwe/cwe-829 | Finds use of insecure protocols during Maven dependency resolution. Results are shown on LGTM by default. |
| LDAP query built from user-controlled sources (`java/ldap-injection`) | security, external/cwe/cwe-090 | Finds LDAP queries vulnerable to injection of unsanitized user-controlled input. Results are shown on LGTM by default. |
| Left shift by more than the type width (`java/lshift-larger-than-type-width`) | correctness | Finds left shifts of ints by 32 bits or more and left shifts of longs by 64 bits or more. Results are shown on LGTM by default. |
| Suspicious date format (`java/suspicious-date-format`) | correctness | Finds date format patterns that use placeholders that are likely to be incorrect. Results are shown on LGTM by default. |
| Dereferenced variable may be null (`java/dereferenced-value-may-be-null`) | Fewer false positive results | Final fields with a non-null initializer are no longer reported. |
| Expression always evaluates to the same value (`java/evaluation-to-constant`) | Fewer false positive results | Expressions of the form `0 * x` are usually intended and no longer reported. Also left shift of ints by 32 bits and longs by 64 bits are no longer reported as they are not constant, these results are instead reported by the new query `java/lshift-larger-than-type-width`. |
| Useless null check (`java/useless-null-check`) | More true positive results | Useless checks on final fields with a non-null initializer are now reported. |
## Changes to libraries
* The data-flow library has been improved, which affects and improves most security queries. The improvements are:
- Track flow through methods that combine taint tracking with flow through fields.
- Track flow through clone-like methods, that is, methods that read contents of a field from a
parameter and stores the value in the field of a returned object.
* Identification of test classes has been improved. Previously, one of the
match conditions would classify any class with a name containing the string
"Test" as a test class, but now this matching has been replaced with one that
looks for the occurrence of actual unit-test annotations. This affects the
general file classification mechanism and thus suppression of alerts, and
also any security queries using taint tracking, as test classes act as
default barriers stopping taint flow.
* Parentheses are now no longer modeled directly in the AST, that is, the
`ParExpr` class is empty. Instead, a parenthesized expression can be
identified with the `Expr.isParenthesized()` member predicate.
| Cross-site scripting through exception (`js/xss-through-exception`) | security, external/cwe/cwe-079, external/cwe/cwe-116 | Highlights potential XSS vulnerabilities where an exception is written to the DOM. Results are not shown on LGTM by default. |
| Missing await (`js/missing-await`) | correctness | Highlights expressions that operate directly on a promise object in a nonsensical way, instead of awaiting its result. Results are shown on LGTM by default. |
| Polynomial regular expression used on uncontrolled data (`js/polynomial-redos`) | security, external/cwe/cwe-730, external/cwe/cwe-400 | Highlights expensive regular expressions that may be used on malicious input. Results are shown on LGTM by default. |
| Prototype pollution in utility function (`js/prototype-pollution-utility`) | security, external/cwe/cwe-400, external/cwe/cwe-471 | Highlights recursive assignment operations that are susceptible to prototype pollution. Results are shown on LGTM by default. |
| Regular expression always matches (`js/regex/always-matches`) | correctness, regular-expressions | Highlights regular expression checks that trivially succeed by matching an empty substring. Results are shown on LGTM by default. |
| Unsafe jQuery plugin (`js/unsafe-jquery-plugin`) | | Highlights potential XSS vulnerabilities in unsafely designed jQuery plugins. Results are shown on LGTM by default. |
| Unnecessary use of `cat` process (`js/unnecessary-use-of-cat`) | correctness, security, maintainability | Highlights command executions of `cat` where the fs API should be used instead. Results are shown on LGTM by default. |
| Clear-text logging of sensitive information (`js/clear-text-logging`) | More results | More results involving `process.env` and indirect calls to logging methods are recognized. |
| Duplicate parameter names (`js/duplicate-parameter-name`) | Fewer results | This query now ignores additional parameters that reasonably can have duplicated names. |
| Expression has no effect (`js/useless-expression`) | Fewer false positive results | The query now recognizes block-level flow type annotations and ignores the first statement of a try block. |
| Identical operands (`js/redundant-operation`) | Fewer results | This query now excludes cases where the operands change a value using ++/-- expressions. |
| Incomplete string escaping or encoding (`js/incomplete-sanitization`) | Fewer false positive results | This query now recognizes and excludes additional cases where a single replacement is likely to be intentional. |
| Incomplete URL scheme check (`js/incomplete-url-scheme-check`) | More results | This query now recognizes additional variations of URL scheme checks. |
| Missing CSRF middleware (`js/missing-token-validation`) | Fewer false positive results | The query reports fewer duplicates and only flags handlers that explicitly access cookie data. |
| Superfluous trailing arguments (`js/superfluous-trailing-arguments`) | Fewer results | This query now excludes cases where a function uses the `Function.arguments` value to process a variable number of parameters. |
| Syntax error (`js/syntax-error`) | Lower severity | This results of this query are now displayed with lower severity. |
| Unbound event handler receiver (`js/unbound-event-handler-receiver`) | Fewer false positive results | This query now recognizes additional ways event handler receivers can be bound. |
| Uncontrolled command line (`js/command-line-injection`) | More results | This query now recognizes additional ways of constructing arguments to `cmd.exe` and `/bin/sh`. |
| Uncontrolled data used in path expression (`js/path-injection`) | More results | This query now recognizes additional ways dangerous paths can be constructed and used. |
| Use of call stack introspection in strict mode (`js/strict-mode-call-stack-introspection`) | Fewer false positive results | The query no longer flags expression statements. |
| Use of password hash with insufficient computational effort (`js/insufficient-password-hash`) | Fewer false positive results | This query now recognizes and excludes additional cases that do not require secure hashing. |
| Useless regular-expression character escape (`js/useless-regexp-character-escape`) | Fewer false positive results | This query now distinguishes between escapes in strings and regular expression literals. |
## Changes to libraries
* The predicates `RegExpTerm.getSuccessor` and `RegExpTerm.getPredecessor` have been changed to reflect textual, not operational, matching order. This only makes a difference in lookbehind assertions, which are operationally matched backwards. Previously, `getSuccessor` would mimick this, so in an assertion `(?<=ab)` the term `b` would be considered the predecessor, not the successor, of `a`. Textually, however, `a` is still matched before `b`, and this is the order we now follow.
* An extensible model of the `EventEmitter` pattern has been implemented.
* Taint-tracking configurations now interact differently with the `data` flow label, which may affect queries
that combine taint-tracking and flow labels.
- Sources added by the 1-argument `isSource` predicate are associated with the `taint` label now, instead of the `data` label.
- Sanitizers now only block the `taint` label. As a result, sanitizers no longer block the flow of tainted values wrapped inside a property of an object.
To retain the old behavior, instead use a barrier, or block the `data` flow label using a labeled sanitizer.
| Arbitrary file write during tarfile extraction (`py/tarslip`) | Fewer false negative results | Negations are now handled correctly in conditional expressions that may sanitize tainted values. |
| First parameter of a method is not named 'self' (`py/not-named-self`) | Fewer false positive results | `__class_getitem__` is now recognized as a class method. |
| Import of deprecated module (`py/import-deprecated-module`) | Fewer false positive results | Deprecated modules that are used to provide backwards compatibility are no longer reported.|
| Module imports itself (`py/import-own-module`) | Fewer false positive results | Imports local to a given package are no longer classified as self-imports. |
| Uncontrolled command line (`py/command-line-injection`) | More results | We now model the `fabric` and `invoke` packages for command execution. |
### Web framework support
The CodeQL library has improved support for the web frameworks: Bottle, CherryPy, Falcon, Pyramid, TurboGears, Tornado, and Twisted. They now provide a proper `HttpRequestTaintSource`, instead of a `TaintSource`. This will enable results for the following queries:
- `py/path-injection`
- `py/command-line-injection`
- `py/reflective-xss`
- `py/sql-injection`
- `py/code-injection`
- `py/unsafe-deserialization`
- `py/url-redirection`
The library also has improved support for the web framework Twisted. It now provides a proper
`HttpResponseTaintSink`, instead of a `TaintSink`. This will enable results for the following
queries:
- `py/reflective-xss`
- `py/stack-trace-exposure`
## Changes to libraries
### Taint tracking
- The `urlsplit` and `urlparse` functions now propagate taint appropriately.
- HTTP requests using the `requests` library are now modeled.
* @description A function that uses more functions and variables from another file than functions and variables from its own file. This function might be better placed in the other file, to avoid exposing internals of the file it depends on.
* @description Two files share too much information about each other (accessing many operations or variables in both directions). It would be better to invert some of the dependencies to reduce the coupling between the two files.
<p>Using TLS or SSLv23 protool from the boost::asio library, but not disabling deprecated protocols or disabling minimum-recommended protocols.</p>
<p>Using the TLS or SSLv23 protocol from the boost::asio library, but not disabling deprecated protocols may expose the software to known vulnerabilities or permit weak encryption algorithms to be used. Disabling the minimum-recommended protocols is also flagged.</p>
</overview>
<recommendation>
<p>When using the TLS or SSLv23 protocol, set the <code>no_tlsv1</code> and <code>no_tlsv1_1</code> options, but do not set <code>no_tlsv1_2</code>. When using the SSLv23 protocol, also set the <code>no_sslv3</code> option.</p>
</recommendation>
<example>
<p>In the following example, the <code>no_tlsv1_1</code> option has not been set. Use of TLS 1.1 is not recommended.</p>
<p>In the corrected example, the <code>no_tlsv1</code> and <code>no_tlsv1_1</code> options have both been set, ensuring the use of TLS 1.2 or later.</p>
* @description Using TLS or SSLv23 protool from the boost::asio library, but not disabling deprecated protocols or disabling minimum-recommended protocols
* @description Using the TLS or SSLv23 protocol from the boost::asio library, but not disabling deprecated protocols, or disabling minimum-recommended protocols.
<p>Using boost::asio library but specifying a deprecated hardcoded protocol.</p>
<p>Using a deprecated hardcoded protocol instead of negotiting would lock your application to a protocol that has known vulnerabilities or weaknesses.</p>
</overview>
<recommendation>
<p>Only use modern protocols such as TLS 1.2 or TLS 1.3.</p>
</recommendation>
<example>
<p>In the following example, the <code>sslv2</code> protocol is specified. This protocol is out of date and its use is not recommended.</p>
<li>SEI CERT C Coding Standard: <a href="https://wiki.sei.cmu.edu/confluence/display/c/DCL31-C.+Declare+identifiers+before+using+them">DCL31-C. Declare identifiers before using them</a></li>
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.