Commit Graph

4039 Commits

Author SHA1 Message Date
Dave Bartolomeo
8e977dc6bf C++/C#: Move overrides of IRType::getByteSize() into leaf classes
See https://github.com/github/codeql/pull/2272. I've added code comments in all of the places where future me will be tempted to hoist these overrides.
2020-06-16 16:48:42 -04:00
Dave Bartolomeo
24c3110989 Merge from master 2020-06-16 16:37:38 -04:00
Jonas Jensen
e5e373cff2 Merge pull request #3673 from MathiasVP/assign-op-using-swap
C++: Add tests for taint through swap
2020-06-16 15:43:52 +02:00
Jonas Jensen
d80a033bed Merge pull request #3719 from dbartol/github/codeql-c-analysis-team/69-consistency
C++/C#: Fix a couple new consistency failures, and improve consistency messages
2020-06-16 08:48:35 +02:00
Aditya Sharad
d7d00bddf6 Merge pull request #3718 from adityasharad/cpp/formatting-function-doc
C++: Fix QLDoc on `FormattingFunction` library
2020-06-15 08:39:16 -07:00
Dave Bartolomeo
fecffab8e7 C++: Fix consistency error
`TTranslatedAllocationSideEffects` wasn't limiting itself to functions that actually have IR, so it was getting used even in template definitions.
2020-06-15 10:47:00 -04:00
Dave Bartolomeo
8cbc7e8654 C++/C#: Improve consistency failure result messages
Some of our IR consistency failure query predicates already produced results in the schema of an `@kind problem` query, including `$@` replacements for the enclosing `IRFunction` to make it easier to figure out which function to dump when debugging. This change moves the rest of the query predicates in `IRConsistency.qll` to do the same. In addition, it wraps each call to `getEnclosingIRFunction()` to return an `OptionalIRFunction`, which can be either a real `IRFunction` or a placeholder in case `getEnclosingIRFunction()` returned no results. This exposes a couple of new consistency failures in `syntax-zoo`, which will be fixed in a subsequent commit.

This change also deals with consistency failures when the enclosing `IRFunction` has more than one `Function` or `Location`. For multiple `Function`s, we concatenate the function names. For multiple `Location`s, we pick the first one in lexicographical order. This changes the number of results produced in the existing tests, but doesn't change the actual number of problems.
2020-06-15 10:46:46 -04:00
Aditya Sharad
1033d22d1b C++: Fix QLDoc on FormattingFunction library
Copy-paste typo from `DataFlowFunction`.
2020-06-15 07:32:53 -07:00
Dave Bartolomeo
89a1fd4b4a C++/C#: Fix formatting 2020-06-13 08:22:04 -04:00
Dave Bartolomeo
73d2e09a8d C++/C#: Remove opcode from TRawInstruction 2020-06-12 17:36:01 -04:00
Dave Bartolomeo
978275cbd4 C++/C#: Move irFunc out of various TInstruction branches 2020-06-12 17:26:45 -04:00
Dave Bartolomeo
07c1520b4d C++/C#: Move ast out of TRawInstruction 2020-06-12 17:03:02 -04:00
Dave Bartolomeo
2aabe431f6 C++/C#: Stop caching getOldInstruction() 2020-06-12 16:22:58 -04:00
Dave Bartolomeo
ac169931b3 C++/C#: More efficient evaluation of SSA::hasInstruction() 2020-06-12 16:09:50 -04:00
Dave Bartolomeo
4331b9b54e C++: Simplify logic to an implication 2020-06-12 09:31:19 -04:00
Jonas Jensen
abd05bcff1 Merge pull request #3596 from robertbrignull/more-suites
Add more code-scanning suites
2020-06-12 09:08:20 +02:00
Mathias Vorreiter Pedersen
b78c06559e Merge pull request #3691 from geoffw0/reftest
C++: Add a test case for CWE-114 involving pointers and references.
2020-06-11 22:02:45 +02:00
Dave Bartolomeo
41df7000c5 Merge from master, including fixing up merge conflicts 2020-06-11 12:20:46 -04:00
Ian Lynagh
fd88289e46 C++: Fix reference to Block
We don't call it `BlockStmt`.
2020-06-11 16:50:23 +01:00
Robert Marsh
982fb38807 Merge pull request #3419 from MathiasVP/flat-structs
C++: Add reverse reads to IR field flow
2020-06-10 14:31:00 -07:00
Mathias Vorreiter Pedersen
a38839b446 C++: Include copy of IntWrapper class with two data members 2020-06-10 22:27:40 +02:00
Mathias Vorreiter Pedersen
ca20f17703 C++: Implement move constructor in terms of swap. I haven't found anything online on whether this is good or bad; the only reason not to do it might be performance. 2020-06-10 22:16:58 +02:00
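As a rough illustration of the idiom this commit refers to, here is a minimal C++ sketch of a move constructor written in terms of swap. The class `IntPair` and its members are hypothetical; this is not the test code in `taint.cpp`, only an assumed shape of the pattern.

```cpp
#include <utility>

// Hypothetical class with two data members (not from the CodeQL tests).
class IntPair {
public:
    IntPair() = default;
    IntPair(int a, int b) : first_(a), second_(b) {}

    // Member swap: exchanges every data member with `other`.
    void swap(IntPair& other) noexcept {
        using std::swap;
        swap(first_, other.first_);
        swap(second_, other.second_);
    }

    // Move constructor implemented in terms of swap: delegate to the
    // default constructor (all members zero), then swap with the source,
    // which leaves the source in that default state.
    IntPair(IntPair&& other) noexcept : IntPair() {
        swap(other);
    }

    int first() const { return first_; }
    int second() const { return second_; }

private:
    int first_ = 0;
    int second_ = 0;
};

// Non-member swap so that `using std::swap; swap(a, b);` finds it via ADL.
inline void swap(IntPair& a, IntPair& b) noexcept { a.swap(b); }
```

The cost is a default construction plus a swap rather than a direct member-wise move, which is the performance concern the commit message mentions; in exchange, the moved-from object is left in a well-defined default state.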
Mathias Vorreiter Pedersen
1a95095505 C++: Add default move constructor. Also removed a debug comment I forgot to remove earlier. Luckily, that meant that no line numbers changed in .expected files. 2020-06-10 17:13:04 +02:00
Mathias Vorreiter Pedersen
5abab25c28 Update cpp/ql/test/library-tests/dataflow/taint-tests/taint.cpp
Co-authored-by: Jonas Jensen <jbj@github.com>
2020-06-10 16:51:21 +02:00
Geoffrey White
91b9b78c48 C++: Add a test case for CWE-114 involving pointers and references. 2020-06-10 14:09:46 +01:00
Mathias Vorreiter Pedersen
88dabffd2b C++: Add tests that demonstrate flow through custom swap functions 2020-06-10 15:06:57 +02:00
semmle-qlci
1b8f3c4b84 Merge pull request #3657 from hvitved/dataflow/hidden-nodes
Approved by aschackmull, jbj
2020-06-10 13:22:09 +01:00
Robert Brignull
ded5eec76a rename slow-queries.yml to exclude-slow-queries.yml 2020-06-10 09:59:31 +01:00
Jonas Jensen
a341912da9 C++: Performance tweak for 1-field struct loads
On kamailio/kamailio the `DataFlowUtil::simpleInstructionLocalFlowStep`
predicate was slow because of the case for single-field structs, where
there was a large tuple-count bulge when joining with
`getFieldSizeOfClass`:

    3552902   ~2%       {2} r1 = SCAN Instruction::CopyInstruction::getSourceValueOperand_dispred#3#ff AS I OUTPUT I.<1>, I.<0>
    2065347   ~2%       {2} r35 = JOIN r1 WITH Operand::NonPhiMemoryOperand::getAnyDef_dispred#3#ff AS R ON FIRST 1 OUTPUT r1.<1>, R.<1>
    2065827   ~2%       {3} r36 = JOIN r35 WITH Instruction::Instruction::getResultType_dispred#3#ff AS R ON FIRST 1 OUTPUT R.<1>, r35.<1>, r35.<0>
    2065825   ~3%       {3} r37 = JOIN r36 WITH Type::Type::getSize_dispred#ff AS R ON FIRST 1 OUTPUT r36.<1>, r36.<2>, R.<1>
    2068334   ~2%       {4} r38 = JOIN r37 WITH Instruction::Instruction::getResultType_dispred#3#ff AS R ON FIRST 1 OUTPUT R.<1>, r37.<2>, r37.<0>, r37.<1>
    314603817 ~0%       {3} r39 = JOIN r38 WITH DataFlowUtil::getFieldSizeOfClass#fff_120#join_rhs AS R ON FIRST 2 OUTPUT r38.<3>, R.<2>, r38.<2>
    8         ~0%       {2} r40 = JOIN r39 WITH Instruction::Instruction::getResultType_dispred#3#ff AS R ON FIRST 2 OUTPUT r39.<2>, r39.<0>

That's 314M tuples.

Strangely, there is no such bulge on more well-behaved snapshots like
mysql/mysql-server.

With this commit the explosion is gone:

    ...
    2065825  ~0%       {4} r37 = JOIN r36 WITH Type::Type::getSize_dispred#ff AS R ON FIRST 1 OUTPUT r36.<0>, R.<1>, r36.<1>, r36.<2>
    1521     ~1%       {3} r38 = JOIN r37 WITH DataFlowUtil::getFieldSizeOfClass#fff_021#join_rhs AS R ON FIRST 2 OUTPUT r37.<2>, R.<2>, r37.<3>
    8        ~0%       {2} r39 = JOIN r38 WITH Instruction::Instruction::getResultType_dispred#3#ff AS R ON FIRST 2 OUTPUT r38.<0>, r38.<2>
2020-06-09 14:50:02 +02:00
Tom Hvitved
a371205db1 Data flow: Sync files 2020-06-09 13:55:12 +02:00
Tom Hvitved
8c9f85d04f Data flow: Allow nodes to be hidden from path explanations 2020-06-09 13:53:19 +02:00
Dave Bartolomeo
3fc02ce24e C++: Fix join order in virtual dispatch with unique
The optimizer picked a terrible join order in `VirtualDispatch::DataSensitiveCall::flowsFrom()`. Telling it that `getAnOutNode()` has a unique result convinces it to join first on the `Callable`, rather than on the `ReturnKind`.
2020-06-08 17:15:43 -04:00
Dave Bartolomeo
c511cc3444 C++: Better caching for getPrimaryInstructionForSideEffect() 2020-06-08 15:37:36 -04:00
Dave Bartolomeo
0ae98e78a2 Merge remote-tracking branch 'github/master' into github/codeql-c-analysis-team/69_union 2020-06-08 11:20:14 -04:00
Mathias Vorreiter Pedersen
b48168fc03 C++: Accept tests 2020-06-08 12:26:25 +02:00
Jonas Jensen
c62220e0dc C++: Fix data-flow dispatch perf with globals
There wasn't a good join order for the "store to global var" case in the
virtual dispatch library. When a global variable had millions of
accesses but few stores to it, the `flowsFrom` predicate would join to
see all those millions of accesses before filtering down to stores only.
The solution is to pull out a `storeIntoGlobal` helper predicate that
pre-computes which accesses are stores.

To make the code clearer, I've also pulled out a repeated chunk of code
into a new `addressOfGlobal` helper predicate.

For the kamailio/kamailio project, these are the tuple counts before:

    Starting to evaluate predicate DataFlowDispatch::VirtualDispatch::DataSensitiveCall::flowsFrom#fff#cur_delta/3[3]@21a1df (iteration 3)
    Tuple counts for DataFlowDispatch::VirtualDispatch::DataSensitiveCall::flowsFrom#fff#cur_delta:
    ...
    59002      ~0%     {3} r17 = SCAN DataFlowDispatch::VirtualDispatch::DataSensitiveCall::flowsFrom#fff#prev_delta AS I OUTPUT I.<1>, true, I.<0>
    58260      ~1%     {3} r31 = JOIN r17 WITH DataFlowUtil::Node::asVariable_dispred#fb AS R ON FIRST 1 OUTPUT R.<1>, true, r17.<2>
    2536187389 ~6%     {3} r32 = JOIN r31 WITH Instruction::VariableInstruction::getASTVariable_dispred#fb_10#join_rhs AS R ON FIRST 1 OUTPUT R.<1>, true, r31.<2>
    2536187389 ~6%     {3} r33 = JOIN r32 WITH project#Instruction::VariableAddressInstruction#class#3#ff AS R ON FIRST 1 OUTPUT r32.<0>, true, r32.<2>
    58208      ~0%     {3} r34 = JOIN r33 WITH Instruction::StoreInstruction::getDestinationAddress_dispred#ff_10#join_rhs AS R ON FIRST 1 OUTPUT R.<1>, true, r33.<2>

Tuple counts after:

    Starting to evaluate predicate DataFlowDispatch::VirtualDispatch::DataSensitiveCall::flowsFrom#fff#cur_delta/3[3]@6073c5 (iteration 3)
    Tuple counts for DataFlowDispatch::VirtualDispatch::DataSensitiveCall::flowsFrom#fff#cur_delta:
    ...
    59002    ~0%     {3} r17 = SCAN DataFlowDispatch::VirtualDispatch::DataSensitiveCall::flowsFrom#fff#prev_delta AS I OUTPUT I.<1>, true, I.<0>
    58260    ~1%     {3} r23 = JOIN r17 WITH DataFlowUtil::Node::asVariable_dispred#ff AS R ON FIRST 1 OUTPUT R.<1>, true, r17.<2>
    58208    ~0%     {3} r24 = JOIN r23 WITH DataFlowDispatch::VirtualDispatch::storeIntoGlobal#ff_10#join_rhs AS R ON FIRST 1 OUTPUT R.<1>, true, r23.<2>
    58208    ~0%     {3} r25 = JOIN r24 WITH DataFlowUtil::InstructionNode#ff_10#join_rhs AS R ON FIRST 1 OUTPUT true, r24.<2>, R.<1>

Notice that the final tuple count, 58208, is the same before and after.

The kamailio/kamailio project seems to have been affected by this issue
because it has global variables to do with logging policy, and these
variables are loaded from in every place where their logging macro is
used.
2020-06-08 11:48:40 +02:00
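The fix above is a join-ordering change: instead of joining a variable with all of its accesses and filtering to stores afterwards, a helper relation of store-only pairs is computed once and joined against. The C++ sketch below is only an analogy under simplifying assumptions; it is not QL and not the library's code. The `Access` type, both lookup functions, and the C++ `storeIntoGlobal` function are hypothetical, and the `work` counters stand in for tuple counts.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical, heavily simplified model of the relations involved.
struct Access {
    int var;      // the global variable being accessed
    int id;       // the access instruction
    bool isStore; // true if this access is the destination of a store
};

// Naive order: for each queried variable, visit every access to it and
// only then filter to stores. Work grows with the number of accesses.
std::vector<int> storesNaive(const std::vector<Access>& accesses,
                             const std::vector<int>& queriedVars,
                             std::size_t& work) {
    std::multimap<int, const Access*> byVar;
    for (const Access& a : accesses) byVar.emplace(a.var, &a);
    std::vector<int> out;
    for (int v : queriedVars) {
        auto range = byVar.equal_range(v);
        for (auto it = range.first; it != range.second; ++it) {
            ++work; // every access is visited, store or not
            if (it->second->isStore) out.push_back(it->second->id);
        }
    }
    return out;
}

// Helper order: precompute (var -> store) pairs once, analogous to the
// `storeIntoGlobal` helper predicate, then join against that smaller set.
std::multimap<int, int> storeIntoGlobal(const std::vector<Access>& accesses) {
    std::multimap<int, int> stores;
    for (const Access& a : accesses)
        if (a.isStore) stores.emplace(a.var, a.id);
    return stores;
}

std::vector<int> storesWithHelper(const std::multimap<int, int>& stores,
                                  const std::vector<int>& queriedVars,
                                  std::size_t& work) {
    std::vector<int> out;
    for (int v : queriedVars) {
        auto range = stores.equal_range(v);
        for (auto it = range.first; it != range.second; ++it) {
            ++work; // only stores are visited
            out.push_back(it->second);
        }
    }
    return out;
}
```

Both orders return the same stores, mirroring the unchanged final tuple count of 58208. The difference is that, in the recursive `flowsFrom` predicate, the naive order repeats the scan over every access on each iteration, while the helper is computed once outside the recursion.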
Mathias Vorreiter Pedersen
431cc5c926 C++: Fix inconsistent class name 2020-06-08 11:27:09 +02:00
Mathias Vorreiter Pedersen
01f3793159 C++: Add ReadSideEffect as a possible end instruction for load chains 2020-06-08 11:05:30 +02:00
Mathias Vorreiter Pedersen
a4388e9258 C++: Add example demonstrating missing flow 2020-06-08 11:03:36 +02:00
Dave Bartolomeo
94c2bba584 C++/C#: Fix formatting 2020-06-05 17:14:14 -04:00
Dave Bartolomeo
1c32e4cc68 C++/C#: Do filtering of instructions in cached predicates
The four cached predicates used to access common properties of instructions took a `TStageInstruction` as a parameter. This requires the calling code, in `Instruction.qll`, to then join the results with `hasInstruction()` to filter out results for `TRawInstruction`s that were discarded as unreachable. By simply switching the parameter types to `Instruction`, we can force that join to happen in the cached predicate itself. This makes the various accessor predicates on `Instruction` trivially inlinable to the cached predicate, instead of being joins of two huge relations that might have to be recomputed in later stages.
2020-06-05 15:41:21 -04:00
Dave Bartolomeo
e62b884b48 C++/C#: Cache Instruction.getResultIRType()
Most of the predicates on `Instruction` are thin wrappers around cached predicates in the `IRConstruction` or `SSAConstruction` modules. However, `getResultIRType()` has to join `Construction::getInstructionResultType()` with `LanguageType::getIRType()`. `getResultIRType()` is called frequently both within the IR code and by IR consumers, and that's a big join to have to repeat in multiple stages.

I looked at most of the other predicates in `Instruction.qll`, and didn't see any other predicates that met all of the criteria of "large, commonly called, and not already inline".
2020-06-05 15:17:28 -04:00
Dave Bartolomeo
c708ed1fe9 C++: Remove some usage of Instruction.getResultType()
There were a few places in the IR itself where we use `Instruction.getResultType()`, which returns the C++ `Type` of the result, instead of `Instruction.getResultIRType()`, which returns the language-neutral `IRType` of the result. By removing this usage, we can avoid evaluating `getResultType()` at all.

There are still other uses of `Instruction.getResultType()` in other libraries. We should switch those as well.
2020-06-05 14:08:01 -04:00
Dave Bartolomeo
11818489f5 C++/C#: Use cached to ensure that IR is evaluated in a single stage
Before this change, evaluation of the IR was spread out across about 5 stages. This resulted in a lot of redundant evaluation, especially tuple numbering of large IPA types like `TInstruction`. This commit makes two small changes that, when combined, ensure that the IR is evaluated all in one stage:

First, we mark `TInstruction` as `cached`. This collapses all of the work to create instructions, across all three IR phases, into a single stage.

Second, we make the `SSA` module in `SSAConstruction.qll` just contain aliases to `cached` predicates defined in the `Cached` module. This ensures that all of the `Operand`-related SSA computation happens in the same stage as all of the `Instruction`-related SSA computation.
2020-06-05 14:05:25 -04:00
Mathias Vorreiter Pedersen
7642680ab9 C++: Also remove TInitializeThisValueNumber from the AST wrapper 2020-06-05 15:26:09 +02:00
Mathias Vorreiter Pedersen
1a33a3b7e1 Merge branch 'master' into remove-initialize-this-from-value-numbering 2020-06-05 15:03:54 +02:00
Mathias Vorreiter Pedersen
d49c0f7b67 C++: Sync identical files 2020-06-05 15:01:18 +02:00
Mathias Vorreiter Pedersen
15fa7be09a C++: Remove TInitializeThisValueNumber case from IR value numbering 2020-06-05 15:01:11 +02:00
Mathias Vorreiter Pedersen
7328429ef1 C++: Sync identical files 2020-06-04 11:31:32 +02:00
Mathias Vorreiter Pedersen
36cfe3624b C++: Add TConstantValueNumber case to ValueNumber::getKind 2020-06-04 11:31:02 +02:00