Commit Graph

3958 Commits

Author SHA1 Message Date
Robert Marsh
982fb38807 Merge pull request #3419 from MathiasVP/flat-structs
C++: Add reverse reads to IR field flow
2020-06-10 14:31:00 -07:00
semmle-qlci
1b8f3c4b84 Merge pull request #3657 from hvitved/dataflow/hidden-nodes
Approved by aschackmull, jbj
2020-06-10 13:22:09 +01:00
Jonas Jensen
a341912da9 C++: Performance tweak for 1-field struct loads
On kamailio/kamailio the `DataFlowUtil::simpleInstructionLocalFlowStep`
predicate was slow because of the case for single-field structs, where
there was a large tuple-count bulge when joining with
`getFieldSizeOfClass`:

    3552902   ~2%       {2} r1 = SCAN Instruction::CopyInstruction::getSourceValueOperand_dispred#3#ff AS I OUTPUT I.<1>, I.<0>
    2065347   ~2%       {2} r35 = JOIN r1 WITH Operand::NonPhiMemoryOperand::getAnyDef_dispred#3#ff AS R ON FIRST 1 OUTPUT r1.<1>, R.<1>
    2065827   ~2%       {3} r36 = JOIN r35 WITH Instruction::Instruction::getResultType_dispred#3#ff AS R ON FIRST 1 OUTPUT R.<1>, r35.<1>, r35.<0>
    2065825   ~3%       {3} r37 = JOIN r36 WITH Type::Type::getSize_dispred#ff AS R ON FIRST 1 OUTPUT r36.<1>, r36.<2>, R.<1>
    2068334   ~2%       {4} r38 = JOIN r37 WITH Instruction::Instruction::getResultType_dispred#3#ff AS R ON FIRST 1 OUTPUT R.<1>, r37.<2>, r37.<0>, r37.<1>
    314603817 ~0%       {3} r39 = JOIN r38 WITH DataFlowUtil::getFieldSizeOfClass#fff_120#join_rhs AS R ON FIRST 2 OUTPUT r38.<3>, R.<2>, r38.<2>
    8         ~0%       {2} r40 = JOIN r39 WITH Instruction::Instruction::getResultType_dispred#3#ff AS R ON FIRST 2 OUTPUT r39.<2>, r39.<0>

That's 314M tuples.

Strangely, there is no such bulge on more well-behaved snapshots like
mysql/mysql-server.

With this commit the explosion is gone:

    ...
    2065825  ~0%       {4} r37 = JOIN r36 WITH Type::Type::getSize_dispred#ff AS R ON FIRST 1 OUTPUT r36.<0>, R.<1>, r36.<1>, r36.<2>
    1521     ~1%       {3} r38 = JOIN r37 WITH DataFlowUtil::getFieldSizeOfClass#fff_021#join_rhs AS R ON FIRST 2 OUTPUT r37.<2>, R.<2>, r37.<3>
    8        ~0%       {2} r39 = JOIN r38 WITH Instruction::Instruction::getResultType_dispred#3#ff AS R ON FIRST 2 OUTPUT r38.<0>, r38.<2>
2020-06-09 14:50:02 +02:00
Tom Hvitved
a371205db1 Data flow: Sync files 2020-06-09 13:55:12 +02:00
Tom Hvitved
8c9f85d04f Data flow: Allow nodes to be hidden from path explanations 2020-06-09 13:53:19 +02:00
Mathias Vorreiter Pedersen
b48168fc03 C++: Accept tests 2020-06-08 12:26:25 +02:00
Jonas Jensen
c62220e0dc C++: Fix data-flow dispatch perf with globals
There wasn't a good join order for the "store to global var" case in the
virtual dispatch library. When a global variable had millions of
accesses but few stores to it, the `flowsFrom` predicate would join to
see all those millions of accesses before filtering down to stores only.
The solution is to pull out a `storeIntoGlobal` helper predicate that
pre-computes which accesses are stores.

To make the code clearer, I've also pulled out a repeated chunk of code
into a new `addressOfGlobal` helper predicate.

For the kamailio/kamailio project, these are the tuple counts before:

    Starting to evaluate predicate DataFlowDispatch::VirtualDispatch::DataSensitiveCall::flowsFrom#fff#cur_delta/3[3]@21a1df (iteration 3)
    Tuple counts for DataFlowDispatch::VirtualDispatch::DataSensitiveCall::flowsFrom#fff#cur_delta:
    ...
    59002      ~0%     {3} r17 = SCAN DataFlowDispatch::VirtualDispatch::DataSensitiveCall::flowsFrom#fff#prev_delta AS I OUTPUT I.<1>, true, I.<0>
    58260      ~1%     {3} r31 = JOIN r17 WITH DataFlowUtil::Node::asVariable_dispred#fb AS R ON FIRST 1 OUTPUT R.<1>, true, r17.<2>
    2536187389 ~6%     {3} r32 = JOIN r31 WITH Instruction::VariableInstruction::getASTVariable_dispred#fb_10#join_rhs AS R ON FIRST 1 OUTPUT R.<1>, true, r31.<2>
    2536187389 ~6%     {3} r33 = JOIN r32 WITH project#Instruction::VariableAddressInstruction#class#3#ff AS R ON FIRST 1 OUTPUT r32.<0>, true, r32.<2>
    58208      ~0%     {3} r34 = JOIN r33 WITH Instruction::StoreInstruction::getDestinationAddress_dispred#ff_10#join_rhs AS R ON FIRST 1 OUTPUT R.<1>, true, r33.<2>

Tuple counts after:

    Starting to evaluate predicate DataFlowDispatch::VirtualDispatch::DataSensitiveCall::flowsFrom#fff#cur_delta/3[3]@6073c5 (iteration 3)
    Tuple counts for DataFlowDispatch::VirtualDispatch::DataSensitiveCall::flowsFrom#fff#cur_delta:
    ...
    59002    ~0%     {3} r17 = SCAN DataFlowDispatch::VirtualDispatch::DataSensitiveCall::flowsFrom#fff#prev_delta AS I OUTPUT I.<1>, true, I.<0>
    58260    ~1%     {3} r23 = JOIN r17 WITH DataFlowUtil::Node::asVariable_dispred#ff AS R ON FIRST 1 OUTPUT R.<1>, true, r17.<2>
    58208    ~0%     {3} r24 = JOIN r23 WITH DataFlowDispatch::VirtualDispatch::storeIntoGlobal#ff_10#join_rhs AS R ON FIRST 1 OUTPUT R.<1>, true, r23.<2>
    58208    ~0%     {3} r25 = JOIN r24 WITH DataFlowUtil::InstructionNode#ff_10#join_rhs AS R ON FIRST 1 OUTPUT true, r24.<2>, R.<1>

Notice that the final tuple count, 58208, is the same before and after.

The kamailio/kamailio project seems to have been affected by this issue
because it has global variables to do with logging policy, and these
variables are loaded from in every place where their logging macro is
used.
2020-06-08 11:48:40 +02:00
Mathias Vorreiter Pedersen
431cc5c926 C++: Fix inconsistent class name 2020-06-08 11:27:09 +02:00
Mathias Vorreiter Pedersen
01f3793159 C++: Add ReadSideEffect as a possible end instruction for load chains 2020-06-08 11:05:30 +02:00
Mathias Vorreiter Pedersen
a4388e9258 C++: Add example demonstrating missing flow 2020-06-08 11:03:36 +02:00
Mathias Vorreiter Pedersen
7642680ab9 C++: Also remove TInitializeThisValueNumber from the AST wrapper 2020-06-05 15:26:09 +02:00
Mathias Vorreiter Pedersen
1a33a3b7e1 Merge branch 'master' into remove-initialize-this-from-value-numbering 2020-06-05 15:03:54 +02:00
Mathias Vorreiter Pedersen
d49c0f7b67 C++: Sync identical files 2020-06-05 15:01:18 +02:00
Mathias Vorreiter Pedersen
15fa7be09a C++: Remove TInitializeThisValueNumber case from IR value numbering 2020-06-05 15:01:11 +02:00
Mathias Vorreiter Pedersen
7328429ef1 C++: Sync identical files 2020-06-04 11:31:32 +02:00
Mathias Vorreiter Pedersen
36cfe3624b C++: Add TConstantValueNumber case to ValueNumber::getKind 2020-06-04 11:31:02 +02:00
Mathias Vorreiter Pedersen
4b16067af2 C++: Fix testcases after merge from master 2020-06-04 11:02:03 +02:00
Mathias Vorreiter Pedersen
2cf9bcef86 Merge branch 'master' into flat-structs 2020-06-04 10:52:25 +02:00
Jonas Jensen
e292eee3d1 C++: Autoformat fixup 2020-06-03 15:48:50 +02:00
Mathias Vorreiter Pedersen
d295e2139a C++: Accept tests after merge from master 2020-06-03 15:13:44 +02:00
Mathias Vorreiter Pedersen
43a0d4c97d Merge branch 'master' into flat-structs 2020-06-03 15:11:14 +02:00
Jonas Jensen
ad292d8fb6 C++: Accept one more test change from last commit 2020-06-03 14:51:05 +02:00
Jonas Jensen
8f702d4b49 C++: Override toString on argument indirections
Without this override, end users would see the string
`BufferReadSideEffect` in path explanations.
2020-06-03 13:04:10 +02:00
Mathias Vorreiter Pedersen
b890b162f4 C++: Restrict the side effect of StoreChainEndInstructionSideEffect to be WriteSideEffectInstructions 2020-06-03 09:28:06 +02:00
Jonas Jensen
10dfa497a5 Merge remote-tracking branch 'upstream/master' into dataflow-indirect-args
Fixed a semantic merge conflict by accepting test changes in
`cpp/ql/test/library-tests/dataflow/fields/ir-path-flow.expected`.
2020-06-02 18:03:34 +02:00
Jonas Jensen
9c50acc0f9 Merge pull request #3602 from MathiasVP/path-problem-for-dataflow-tests
C++: Make path-problem versions of ir-flow.ql and flow.ql
2020-06-02 17:59:26 +02:00
Mathias Vorreiter Pedersen
2a1ba6d592 C++: Share configurations in testcases 2020-06-02 16:50:57 +02:00
Mathias Vorreiter Pedersen
b9af1123d9 C++: Make path-problem versions of ir-flow.ql and flow.ql 2020-06-02 16:28:01 +02:00
Jonas Jensen
771fd0b1cc C++: Fixup wording 2020-06-02 15:46:34 +02:00
Jonas Jensen
5f0d283212 Merge remote-tracking branch 'upstream/master' into dataflow-indirect-args
The conflicts came from how `this` is now a parameter but not a
`Parameter` on `master`.

Conflicts:
	cpp/ql/src/semmle/code/cpp/ir/dataflow/internal/DataFlowUtil.qll
	cpp/ql/test/library-tests/dataflow/DefaultTaintTracking/defaulttainttracking.cpp
	cpp/ql/test/library-tests/dataflow/DefaultTaintTracking/tainted.expected
	cpp/ql/test/library-tests/dataflow/DefaultTaintTracking/test_diff.expected
	cpp/ql/test/library-tests/dataflow/dataflow-tests/dataflow-ir-consistency.expected
	cpp/ql/test/library-tests/dataflow/fields/ir-flow.expected
	cpp/ql/test/library-tests/syntax-zoo/dataflow-ir-consistency.expected
2020-06-02 15:35:02 +02:00
Mathias Vorreiter Pedersen
ce34d91a07 C++: Add more QLDoc to StoreNode and LoadNode classes, and related predicates. I also simplified the code a bit by moving common implementations of predicates into shared super classes. Finally, I added a getLocation predicate to StoreNode to match the structure of the LoadNode class. 2020-06-02 13:50:00 +02:00
Mathias Vorreiter Pedersen
e17b486195 Merge pull request #3593 from rdmarsh2/rdmarsh/cpp/add-qldoc-2
C++: Add QLDoc for AST classes up to Include.qll
2020-06-02 10:23:23 +02:00
Robert Marsh
3460b9d550 C++: autoformat 2020-06-01 15:38:06 -07:00
Mathias Vorreiter Pedersen
cd574e8569 Merge pull request #3589 from rdmarsh2/ir-placement-new-consistency
C++: fix IR control flow for cast in placement new
2020-05-30 13:27:34 +02:00
Robert Marsh
e17adf14dc C++: autoformat 2020-05-29 16:13:40 -07:00
Robert Marsh
f8b6e07391 C++: Added QLDoc for Element.qll-Include.qll 2020-05-29 16:09:19 -07:00
Robert Marsh
1c20714c62 C++: file QLDoc for AutogeneratedFile-Diagnostics 2020-05-29 14:58:01 -07:00
Jonas Jensen
91da0d5567 Merge pull request #3592 from geoffw0/strlen
CPP: Don't taint the return value of strlen
2020-05-29 19:23:47 +02:00
Robert Marsh
6c9051ae6f C++: accept consistency fixes 2020-05-29 09:49:28 -07:00
Mathias Vorreiter Pedersen
3adc10fdb4 C++: Accept tests 2020-05-29 15:33:55 +02:00
Geoffrey White
f534f09784 C++: Autoformat. 2020-05-29 14:05:08 +01:00
Geoffrey White
19c33ab41c C++: Refine StrLenFunction, including removal of taint flow. 2020-05-29 14:04:27 +01:00
Geoffrey White
705529cdf7 C++: Split StrLenFunction from PureStrFunction (without changes). 2020-05-29 14:04:27 +01:00
Geoffrey White
59cb5f9b1e C++: Remove a special case for strlen in DefaultTaintTracking. 2020-05-29 14:04:26 +01:00
Geoffrey White
408e38a4d4 C++: Clarify which taint tracking libraries should be used somewhat. 2020-05-29 14:04:26 +01:00
Geoffrey White
d77092c931 C++: Add taint tests for strlen. 2020-05-29 13:39:40 +01:00
Mathias Vorreiter Pedersen
a0603692cb C++: Add LoadChain and StoreChain nodes to handle reverse reads in dataflow 2020-05-29 13:53:53 +02:00
Jonas Jensen
453de6bf4e Merge pull request #3583 from MathiasVP/qldoc-for-unix-constants
C++: QLDoc for Constants
2020-05-29 12:27:59 +02:00
Mathias Vorreiter Pedersen
335baaef73 C++: Add testcases for partial definitions with long access paths 2020-05-29 12:15:39 +02:00
Jonas Jensen
7d4d435f25 Merge remote-tracking branch 'upstream/master' into Expr-location-workaround
Conflicts:
	cpp/ql/test/library-tests/dataflow/fields/dataflow-ir-consistency.expected
2020-05-29 10:04:12 +02:00