Commit Graph

5276 Commits

Author SHA1 Message Date
Tom Hvitved
8c9f85d04f Data flow: Allow nodes to be hidden from path explanations 2020-06-09 13:53:19 +02:00
Jonas Jensen
4642037dce C++: Speed up IRGuardCondition::controlsBlock
The `controlsBlock` predicate had some dramatic bulges in its tuple
counts. To make matters worse, those bulges were in materialized
intermediate predicates like `#shared` and `#antijoin_rhs`, not just in
the middle of a pipeline.

The problem was particularly evident on kamailio/kamailio, where
`controlsBlock` was the slowest predicate in the IR libraries:

    IRGuards::IRGuardCondition::controlsBlock_dispred#fff#shared#4 ........ 58.8s
    IRGuards::IRGuardCondition::controlsBlock_dispred#fff#antijoin_rhs .... 33.4s
    IRGuards::IRGuardCondition::controlsBlock_dispred#fff#antijoin_rhs#1 .. 26.7s

The first of the above relations had 201M rows, and the others
had intermediate bulges of similar size.

The bulges could be observed even on small projects although they did
not cause measurable performance issues there. The
`controlsBlock_dispred#fff#shared#4` relation had 3M rows on git/git,
which is a lot for a project with only 1.5M IR instructions.

This commit borrows an efficient implementation from Java's
`Guards.qll`, tweaking it slightly to fit into `IRGuards`. Performance
is now much better:

    IRGuards::IRGuardCondition::controlsBlock_dispred#fff ................... 6.1s
    IRGuards::IRGuardCondition::hasDominatingEdgeTo_dispred#ff .............. 616ms
    IRGuards::IRGuardCondition::hasDominatingEdgeTo_dispred#ff#antijoin_rhs . 540ms

After this commit, the biggest bulge in `controlsBlock` is the size of
`IRBlock::dominates`. On kamailio/kamailio this is an intermediate tuple
count of 18M rows in the calculation of `controlsBlock`, which in the
end produces 11M rows.
2020-06-09 12:15:45 +02:00
Jonas Jensen
cade3a3e23 C++: Use the hasBranchEdge helper predicate
This tidies up the code, removing unnecessary repetition.
2020-06-09 10:33:03 +02:00
Dave Bartolomeo
3fc02ce24e C++: Fix join order in virtual dispatch with unique
The optimizer picked a terrible join order in `VirtualDispatch::DataSensitiveCall::flowsFrom()`. Telling it that `getAnOutNode()` has a unique result convinces it to join first on the `Callable`, rather than on the `ReturnKind`.
2020-06-08 17:15:43 -04:00
Robert Marsh
2a96856ca5 C++/C#: Document IRPositionalParameter 2020-06-08 12:41:26 -07:00
Dave Bartolomeo
c511cc3444 C++: Better caching for getPrimaryInstructionForSideEffect() 2020-06-08 15:37:36 -04:00
Dave Bartolomeo
0ae98e78a2 Merge remote-tracking branch 'github/master' into github/codeql-c-analysis-team/69_union 2020-06-08 11:20:14 -04:00
Mathias Vorreiter Pedersen
b48168fc03 C++: Accept tests 2020-06-08 12:26:25 +02:00
Jonas Jensen
c62220e0dc C++: Fix data-flow dispatch perf with globals
There wasn't a good join order for the "store to global var" case in the
virtual dispatch library. When a global variable had millions of
accesses but few stores to it, the `flowsFrom` predicate would join to
see all those millions of accesses before filtering down to stores only.
The solution is to pull out a `storeIntoGlobal` helper predicate that
pre-computes which accesses are stores.

To make the code clearer, I've also pulled out a repeated chunk of code
into a new `addressOfGlobal` helper predicate.

For the kamailio/kamailio project, these are the tuple counts before:

    Starting to evaluate predicate DataFlowDispatch::VirtualDispatch::DataSensitiveCall::flowsFrom#fff#cur_delta/3[3]@21a1df (iteration 3)
    Tuple counts for DataFlowDispatch::VirtualDispatch::DataSensitiveCall::flowsFrom#fff#cur_delta:
    ...
    59002      ~0%     {3} r17 = SCAN DataFlowDispatch::VirtualDispatch::DataSensitiveCall::flowsFrom#fff#prev_delta AS I OUTPUT I.<1>, true, I.<0>
    58260      ~1%     {3} r31 = JOIN r17 WITH DataFlowUtil::Node::asVariable_dispred#fb AS R ON FIRST 1 OUTPUT R.<1>, true, r17.<2>
    2536187389 ~6%     {3} r32 = JOIN r31 WITH Instruction::VariableInstruction::getASTVariable_dispred#fb_10#join_rhs AS R ON FIRST 1 OUTPUT R.<1>, true, r31.<2>
    2536187389 ~6%     {3} r33 = JOIN r32 WITH project#Instruction::VariableAddressInstruction#class#3#ff AS R ON FIRST 1 OUTPUT r32.<0>, true, r32.<2>
    58208      ~0%     {3} r34 = JOIN r33 WITH Instruction::StoreInstruction::getDestinationAddress_dispred#ff_10#join_rhs AS R ON FIRST 1 OUTPUT R.<1>, true, r33.<2>

Tuple counts after:

    Starting to evaluate predicate DataFlowDispatch::VirtualDispatch::DataSensitiveCall::flowsFrom#fff#cur_delta/3[3]@6073c5 (iteration 3)
    Tuple counts for DataFlowDispatch::VirtualDispatch::DataSensitiveCall::flowsFrom#fff#cur_delta:
    ...
    59002    ~0%     {3} r17 = SCAN DataFlowDispatch::VirtualDispatch::DataSensitiveCall::flowsFrom#fff#prev_delta AS I OUTPUT I.<1>, true, I.<0>
    58260    ~1%     {3} r23 = JOIN r17 WITH DataFlowUtil::Node::asVariable_dispred#ff AS R ON FIRST 1 OUTPUT R.<1>, true, r17.<2>
    58208    ~0%     {3} r24 = JOIN r23 WITH DataFlowDispatch::VirtualDispatch::storeIntoGlobal#ff_10#join_rhs AS R ON FIRST 1 OUTPUT R.<1>, true, r23.<2>
    58208    ~0%     {3} r25 = JOIN r24 WITH DataFlowUtil::InstructionNode#ff_10#join_rhs AS R ON FIRST 1 OUTPUT true, r24.<2>, R.<1>

Notice that the final tuple count, 58208, is the same before and after.

The kamailio/kamailio project seems to have been affected by this issue
because it has global variables to do with logging policy, and these
variables are loaded from in every place where their logging macro is
used.
2020-06-08 11:48:40 +02:00
Mathias Vorreiter Pedersen
431cc5c926 C++: Fix inconsistent class name 2020-06-08 11:27:09 +02:00
Mathias Vorreiter Pedersen
01f3793159 C++: Add ReadSideEffect as a possible end instruction for load chains 2020-06-08 11:05:30 +02:00
Mathias Vorreiter Pedersen
a4388e9258 C++: Add example demonstrating missing flow 2020-06-08 11:03:36 +02:00
Robert Marsh
cce99f92a1 C++: exclude conversions in IR field flow tests 2020-06-05 16:19:02 -07:00
Robert Marsh
53a87fa378 C++: accept field flow test changes after merge 2020-06-05 15:41:10 -07:00
Dave Bartolomeo
94c2bba584 C++/C#: Fix formatting 2020-06-05 17:14:14 -04:00
Robert Marsh
0d2f8f3825 Merge branch 'master' into ir-this-parameter-2 2020-06-05 13:52:56 -07:00
Dave Bartolomeo
1c32e4cc68 C++/C#: Do filtering of instructions in cached predicates
The four cached predicates used to access common properties of instructions took a `TStageInstruction` as a parameter. This requires the calling code, in `Instruction.qll`, to then join the results with `hasInstruction()` to filter out results for `TRawInstruction`s that were discarded as unreachable. By simply switching the parameter types to `Instruction`, we can force that join to happen in the cached predicate itself. This makes the various accessor predicates on `Instruction` trivially inlinable to the cached predicate, instead of being joins of two huge relations that might have to be recomputed in later stages.
2020-06-05 15:41:21 -04:00
Dave Bartolomeo
e62b884b48 C++/C#: Cache Instruction.getResultIRType()
Most of the predicates on `Instruction` are thin wrappers around cached predicates in the `IRConstruction` or `SSAConstruction` modules. However, `getResultIRType()` has to join `Construction::getInstructionResultType()` with `LanguageType::getIRType()`. `getResultIRType()` is called frequently both within the IR code and by IR consumers, and that's a big join to have to repeat in multiple stages.

I looked at most of the other predicates in `Instruction.qll`, and didn't see any other predicates that met all of the criteria of "large, commonly called, and not already inline".
2020-06-05 15:17:28 -04:00
Dave Bartolomeo
c708ed1fe9 C++: Remove some usage of Instruction.getResultType()
There were a few places in the IR itself where we use `Instruction.getResultType()`, which returns the C++ `Type` of the result, instead of `Instruction.getResultIRType()`, which returns the language-neutral `IRType` of the result. By removing this usage, we can avoid evaluating `getResultType()` at all.

There are still other uses of `Instruction.getResultType()` in other libraries. We should switch those as well.
2020-06-05 14:08:01 -04:00
Dave Bartolomeo
11818489f5 C++/C#: Use cached to ensure that IR is evaluated in a single stage
Before this change, evaluation of the IR was spread out across about 5 stages. This resulted in a lot of redundant evaluation, especially tuple numbering of large IPA types like `TInstruction`. This change makes two small changes that, when combined, ensure that the IR is evaluated all in one stage:

First, we mark `TInstruction` as `cached`. This collapses all of the work to create instructions, across all three IR phases, into a single phase.

Second, we make the `SSA` module in `SSAConstruction.qll` just contain aliases to `cached` predicates defined in the `Cached` module. This ensures that all of the `Operand`-related SSA computation happens in the same stage as all of the `Instruction`-related SSA computation.
2020-06-05 14:05:25 -04:00
Robert Marsh
4c44c84ec0 C++: Add QLdoc in Initializer.qll-Macro.qll 2020-06-05 10:47:25 -07:00
Mathias Vorreiter Pedersen
7642680ab9 C++: Also remove TInitializeThisValueNumber from the AST wrapper 2020-06-05 15:26:09 +02:00
Mathias Vorreiter Pedersen
1a33a3b7e1 Merge branch 'master' into remove-initialize-this-from-value-numbering 2020-06-05 15:03:54 +02:00
Mathias Vorreiter Pedersen
d49c0f7b67 C++: Sync identical files 2020-06-05 15:01:18 +02:00
Mathias Vorreiter Pedersen
15fa7be09a C++: Remove TInitializeThisValueNumber case from IR value numbering 2020-06-05 15:01:11 +02:00
Mathias Vorreiter Pedersen
7328429ef1 C++: Sync identical files 2020-06-04 11:31:32 +02:00
Mathias Vorreiter Pedersen
36cfe3624b C++: Add TConstantValueNumber case to ValueNumber::getKind 2020-06-04 11:31:02 +02:00
Mathias Vorreiter Pedersen
4b16067af2 C++: Fix testcases after merge from master 2020-06-04 11:02:03 +02:00
Mathias Vorreiter Pedersen
2cf9bcef86 Merge branch 'master' into flat-structs 2020-06-04 10:52:25 +02:00
Dave Bartolomeo
cb2370cc7d C++/C#: Fix formatting 2020-06-04 02:36:51 -04:00
Dave Bartolomeo
a409b9d451 Merge remote-tracking branch 'github/master' into github/codeql-c-analysis-team/69_union 2020-06-03 16:10:22 -04:00
Dave Bartolomeo
15f41c0107 C++/C#: Remove dead QL code 2020-06-03 15:42:30 -04:00
Dave Bartolomeo
e65a5c921e C++: Add missing QLDoc 2020-06-03 13:49:14 -04:00
Dave Bartolomeo
f93c2e4e64 C++: Remove resultType from the IPA constructors for TInstruction
Making these part of the IPA object identity changes the failure mode for cases where we assign multiple result types to an instruction. Previously, we would just have one instruction with two result types, but now we'd have two instructions, which breaks things worse. This change goes back to how things were before, to avoid any new surprises on real-world code with invalid ASTs or IR.
2020-06-03 10:11:27 -04:00
Jonas Jensen
e292eee3d1 C++: Autoformat fixup 2020-06-03 15:48:50 +02:00
Mathias Vorreiter Pedersen
d295e2139a C++: Accept tests after merge from master 2020-06-03 15:13:44 +02:00
Mathias Vorreiter Pedersen
43a0d4c97d Merge branch 'master' into flat-structs 2020-06-03 15:11:14 +02:00
Jonas Jensen
ad292d8fb6 C++: Accept one more test change from last commit 2020-06-03 14:51:05 +02:00
Jonas Jensen
8f702d4b49 C++: Override toString on argument indirections
Without this override, end users would see the string
`BufferReadSideEffect` in path explanations.
2020-06-03 13:04:10 +02:00
Mathias Vorreiter Pedersen
b890b162f4 C++: Restrict the side effect of StoreChainEndInstructionSideEffect to be WriteSideEffectInstructions 2020-06-03 09:28:06 +02:00
Robert Marsh
f7752b0a01 C++/C#: add IRParameter subclass of IRVariable 2020-06-02 17:22:10 -07:00
Jonas Jensen
10dfa497a5 Merge remote-tracking branch 'upstream/master' into dataflow-indirect-args
Fixed a semantic merge conflict by accepting test changes in
`cpp/ql/test/library-tests/dataflow/fields/ir-path-flow.expected`.
2020-06-02 18:03:34 +02:00
Jonas Jensen
9c50acc0f9 Merge pull request #3602 from MathiasVP/path-problem-for-dataflow-tests
C++: Make path-problem versions of ir-flow.ql and flow.ql
2020-06-02 17:59:26 +02:00
Mathias Vorreiter Pedersen
2a1ba6d592 C++: Share configurations in testcases 2020-06-02 16:50:57 +02:00
Mathias Vorreiter Pedersen
b9af1123d9 C++: Make path-problem versions of ir-flow.ql and flow.ql 2020-06-02 16:28:01 +02:00
Jonas Jensen
771fd0b1cc C++: Fixup wording 2020-06-02 15:46:34 +02:00
Jonas Jensen
5f0d283212 Merge remote-tracking branch 'upstream/master' into dataflow-indirect-args
The conflicts came from how `this` is now a parameter but not a
`Parameter` on `master`.

Conflicts:
	cpp/ql/src/semmle/code/cpp/ir/dataflow/internal/DataFlowUtil.qll
	cpp/ql/test/library-tests/dataflow/DefaultTaintTracking/defaulttainttracking.cpp
	cpp/ql/test/library-tests/dataflow/DefaultTaintTracking/tainted.expected
	cpp/ql/test/library-tests/dataflow/DefaultTaintTracking/test_diff.expected
	cpp/ql/test/library-tests/dataflow/dataflow-tests/dataflow-ir-consistency.expected
	cpp/ql/test/library-tests/dataflow/fields/ir-flow.expected
	cpp/ql/test/library-tests/syntax-zoo/dataflow-ir-consistency.expected
2020-06-02 15:35:02 +02:00
Mathias Vorreiter Pedersen
ce34d91a07 C++: Add more QLDoc to StoreNode and LoadNode classes, and related predicates. I also simplified the code a bit by moving common implementations of predicates into shared super classes. Finally, I added a getLocation predicate to StoreNode to match the structure of the LoadNode class. 2020-06-02 13:50:00 +02:00
Mathias Vorreiter Pedersen
e17b486195 Merge pull request #3593 from rdmarsh2/rdmarsh/cpp/add-qldoc-2
C++: Add QLDoc for AST classes up to Include.qll
2020-06-02 10:23:23 +02:00
Robert
a0ee41306a Update cpp/ql/src/codeql-suites/slow-queries.yml
Co-authored-by: Robert Marsh <rdmarsh2@gmail.com>
2020-06-02 09:22:23 +01:00