Commit Graph

3581 Commits

Author SHA1 Message Date
Tom Hvitved
28307399f8 Data flow: Sync files 2020-02-17 10:45:35 +01:00
Jonas Jensen
0aba965a9e C++: Don't mention deprecated class
The language tests were failing because they don't tolerate mentioning a
deprecated class anywhere.
2020-02-16 09:43:25 +01:00
Jonas Jensen
a59c0facee C++: Accept test changes for IR libs
This is for the tests in the ql repo. There are also changed tests in
the internal repo.
2020-02-15 21:12:20 +01:00
Jonas Jensen
f4ba56f0c0 C++: Use IR for security.TaintTracking and GVN 2020-02-15 21:10:29 +01:00
Jonas Jensen
e95ebb25a5 C++: Ensure tainted_diff.ql keeps using old lib
Without this, the test will compare the IR to itself after we enable it.
2020-02-15 21:10:29 +01:00
Jonas Jensen
0628625a76 Merge pull request #2835 from MathiasVP/value-number-perf
C++: Value number performance fix
2020-02-15 20:40:53 +01:00
Mathias Vorreiter Pedersen
8cda847dbc C++: Add TLoadTotalOverlapValueNumber to getKind predicate in AST GVN wrapper 2020-02-15 09:37:45 -07:00
Jonas Jensen
49d2f5a60b C++: autoformat 2020-02-15 09:41:27 +01:00
Dave Bartolomeo
867581df91 Merge pull request #2844 from MathiasVP/value-numbering-performance-fix-2
C++: Ensure that there is just one overlap for an operand in value numbering
2020-02-14 16:40:03 -07:00
Robert Marsh
7abd289d7d C++: reinclude IRType in total load value numbers 2020-02-14 13:34:29 -08:00
Mathias Vorreiter Pedersen
8b8a8cae5b C++/C#: Sync identical files 2020-02-14 16:11:57 +01:00
Mathias Vorreiter Pedersen
4a7b865dc0 C++: Move overlap fix into SSAConstruction 2020-02-14 16:11:00 +01:00
Mathias Vorreiter Pedersen
121c5e436d C++: Check that there is only one overlap 2020-02-14 11:13:53 +01:00
Jonas Jensen
928bdbacb0 C++: Change import order for stable cache checksum
Without this fix, running the full LGTM suite would get the IR evaluated
twice. That's because we have multiple IPA types and constructors with
the same name (like `TInstruction` and `MkIRFunction`), and the QL
compiler chooses how to disambiguate those names differently depending
on import order.

I've tested that the IR is only evaluated once now by running the whole
suite on a tiny project (jbj/magicrescue) and looking at the output of

    perl -ne 'print if /^RESULTS IN:/ .. /^\[/ and not /^\[/' runSnapshotQueries-debug.log | sort |uniq -c |sort -n |less
2020-02-14 10:28:52 +01:00
Robert Marsh
b4ff1216cc C++: sync identical files 2020-02-13 17:02:00 -08:00
Robert Marsh
0f58887396 C++: unique value number for filtered instructions
Instructions that are removed from the normal value numbering recursion
because they have a duplicated type or AST element get unique value
numbers rather than going unnumbered. This ensures comparisons of value
numbers using `!=` hold for filtered instructions.
2020-02-13 15:36:42 -08:00
Mathias Vorreiter Pedersen
d4c6f487bc C++/C#: Fix sync config file for value numbering sharing 2020-02-13 22:32:52 +01:00
Mathias Vorreiter Pedersen
ed7888c612 C++: Sync identical files 2020-02-13 21:50:03 +01:00
Mathias Vorreiter Pedersen
57613d5507 C++: Reintroduce the type in TConstantValueNumber to avoid giving constant with different signed-ness the same value number. Instead filter those with more than one type out. 2020-02-13 21:49:40 +01:00
Dave Bartolomeo
9e1ea01be8 Fix typo 2020-02-13 13:01:09 -07:00
Mathias Vorreiter Pedersen
cb510edcf0 C++: Sync up identical files and restore imports 2020-02-13 18:02:56 +01:00
Mathias Vorreiter Pedersen
04c5f1cbb4 C++: Perf fix for value numbering 2020-02-13 18:02:56 +01:00
Jonas Jensen
24396905a5 WIP: Try to reduce ambiguous value numbers
This is not enough to get genome/breakdancer working.
2020-02-13 18:02:56 +01:00
Jonas Jensen
8054cde9fc WIP: Switch on IR 2020-02-13 18:02:56 +01:00
Tom Hvitved
332733a92e Java/C++: Follow-up changes 2020-02-13 16:34:06 +01:00
Tom Hvitved
b5b0c2b8cf Data flow: Sync files 2020-02-13 16:34:06 +01:00
Geoffrey White
4412cea04a Merge pull request #2821 from jbj/ValueNumbering-var-operand
C++: Fix perf of IR value numbering
2020-02-13 09:11:34 +00:00
Robert Marsh
52b164434d C++: remove accidental commit 2020-02-12 15:23:30 -08:00
Robert Marsh
1d5971f8ec C++: accept test changes from extractor update 2020-02-12 13:29:21 -08:00
Jonas Jensen
2abe416670 Merge pull request #2799 from MathiasVP/missing-flow-in-crement
C++: Fix false negatives for postfix crement expressions
2020-02-12 15:03:48 +01:00
Jonas Jensen
033a4c30ea C++: Fix perf of IR value numbering
On some snapshots, notably ffmpeg, the IR `ValueNumbering` recursion
would generate billions of tuples and eventually run out of space.

It turns out it was fairly common for an `Instruction` to get more than
one `ValueNumber` in the base cases for `VariableAddressInstruction` and
`InitializeParameterInstruction`, and it could also happen in an
instruction with more than one operand of the same `OperandTag`. When a
binary operation was applied to an instruction with `m` value numbers
and another instruction with `n` value numbers, the result would get
`m * n` value numbers. This led to doubly-exponential growth in the
number of value numbers in rare cases.

The underlying reason why a `VariableAddressInstruction` could get
multiple value numbers is that it was keyed on the associated
`IRVariable`, and the `IRVariable` is defined in part by the type of its
underlying `Variable` (or other AST element). If the extractor defines a
variable to have multiple types because of linker ambiguity, this leads
to the creation of multiple `IRVariable`s. That should ideally be solved
in `TIRVariable.qll`, but for now I've put a workaround in
`ValueNumberingInternal.qll` instead.

To remove the problem with instructions having multiple operands, the
construction in `Operand.qll` will now filter out any such operand. It
wasn't enough to apply that filter to the `raw` stage, so I've applied
it to all three stages.
2020-02-12 14:38:41 +01:00
Mathias Vorreiter Pedersen
c8be67ce0e C++: Generalize PostfixCrementOperation to CrementOperation to fix false negatives reported by Geoffrey 2020-02-12 13:26:10 +01:00
Robert Marsh
837fe84cec C++/C#: autoformat Opcode.qll 2020-02-11 12:18:45 -08:00
Robert Marsh
f467260815 C++: respond to PR comments. 2020-02-11 12:17:46 -08:00
Mathias Vorreiter Pedersen
1dd5926f41 C++: Generalize new case in adjustedSink to all AssignOperations 2020-02-11 17:15:42 +01:00
Geoffrey White
75a50a1714 C++: Understand formatting function varargs as needing null termination. 2020-02-11 15:25:59 +00:00
Geoffrey White
de8d84dfff C++: Clearer comments in NoSpaceForZeroTerminator.ql. 2020-02-11 15:25:59 +00:00
Geoffrey White
2f290bd528 C++: Additional test cases. 2020-02-11 15:25:59 +00:00
Robert Marsh
d672f8f863 C++: unflip cause strings in FunctionWithWrapper 2020-02-10 15:57:38 -08:00
Robert Marsh
d09f78db29 C++: fix cartesian product in FunctionWithWrapper 2020-02-10 13:02:58 -08:00
Dave Bartolomeo
405850e02b Merge pull request #2805 from jbj/dataflow-sideeffect-join
C++: IR DataFlowUtil::modelFlow join order fix
2020-02-10 13:04:51 -07:00
Robert Marsh
58bba86be4 C++: autoformat 2020-02-10 09:52:23 -08:00
Mathias Vorreiter Pedersen
bcd84efe8d C++: Add += and friends to adjustedSink 2020-02-10 15:50:52 +01:00
Jonas Jensen
cf1bc693b4 C++: Fix coversEntireVariable perf in AliasedSSA
This predicate got an unfortunate join order, leading to these tuple
counts on ElektraInitiative/libelektra:

    (290s) Tuple counts for AliasedSSA::VariableMemoryLocation::coversEntireVariable_dispred#f:
    57117     ~0%     {3} r1 = SCAN IRType::IRType::getByteSize_dispred#ff AS I OUTPUT 0, (I.<1> * 8), I.<0>
    421445272 ~0%     {3} r2 = JOIN r1 WITH AliasedSSA::VariableMemoryLocation#fffffff_5601#join_rhs AS R ON FIRST 2 OUTPUT R.<3>, r1.<2>, R.<2>
    103282    ~2%     {1} r3 = JOIN r2 WITH AliasConfiguration::Allocation::getIRType_dispred#ff AS R ON FIRST 2 OUTPUT r2.<2>
                      return r3

With this commit, we get these tuple counts instead:

    (0s) Tuple counts for AliasedSSA::VariableMemoryLocation::varIRTypeHasBitRange#bff:
    361874 ~0%     {3} r1 = SCAN AliasedSSA::VariableMemoryLocation#fffffff AS I OUTPUT I.<1>, 0, I.<0>
    361874 ~0%     {3} r2 = JOIN r1 WITH AliasConfiguration::Allocation::getIRType_dispred#ff AS R ON FIRST 1 OUTPUT R.<1>, 0, r1.<2>
    361874 ~1%     {3} r3 = JOIN r2 WITH IRType::IRType::getByteSize_dispred#ff AS R ON FIRST 1 OUTPUT r2.<2>, 0, (R.<1> * 8)
                   return r3

    (0s) Tuple counts for AliasedSSA::VariableMemoryLocation::coversEntireVariable_dispred#f:
    103282 ~2%     {1} r1 = JOIN AliasedSSA::VariableMemoryLocation#fffffff_056#join_rhs AS L WITH AliasedSSA::VariableMemoryLocation::varIRTypeHasBitRange#bff AS R ON FIRST 3 OUTPUT L.<0>
    103282 ~2%     {1} r2 = STREAM DEDUP r1
                   return r2
2020-02-10 15:18:34 +01:00
Jonas Jensen
47c12817ad C++: IR DataFlowUtil::modelFlow join order fix
We had these tuple counts on ElektraInitiative/libelektra (note that the
`modelFlow` predicate got inlined into
`simpleInstructionLocalFlowStep`):

    (652s) Tuple counts for DataFlowUtil::simpleInstructionLocalFlowStep#ff:
    ...
    19701      ~1%      {4} r27 = JOIN r26 WITH Instruction::SideEffectInstruction::getPrimaryInstruction_dispred#3#ff_10#join_rhs AS R ON FIRST 1 OUTPUT R.<1>, r26.<2>, r26.<1>, r26.<0>
    7908       ~0%      {3} r28 = JOIN r27 WITH SSAConstruction::Cached::getInstructionIndex#ff@staged_ext AS R ON FIRST 2 OUTPUT r27.<0>, r27.<2>, r27.<3>
    4023       ~0%      {3} r29 = JOIN r28 WITH Instruction::WriteSideEffectInstruction#class#ff AS R ON FIRST 1 OUTPUT r28.<1>, r28.<2>, r28.<0>
    ...
    1060807009 ~3%      {3} r34 = JOIN r33 WITH SSAConstruction::Cached::getInstructionIndex#ff_10#join_rhs AS R ON FIRST 1 OUTPUT R.<1>, r33.<1>, r33.<2>
    15670      ~5%      {2} r35 = JOIN r34 WITH Instruction::SideEffectInstruction::getPrimaryInstruction_dispred#3#ff AS R ON FIRST 2 OUTPUT r34.<0>, r34.<2>
    7973       ~0%      {2} r36 = JOIN r35 WITH Instruction::ReadSideEffectInstruction::getSideEffectOperand_dispred#ff AS R ON FIRST 1 OUTPUT R.<1>, r35.<1>
    ...

In this predicate there are two cases (`WriteSideEffectInstruction` and
`ReadSideEffectInstruction`) where we need to join on both the call and
the argument index of a side effect. It works well enough for the first
case, `WriteSideEffectInstruction`, where the call is joined on before
the index, but it explodes in the second case,
`ReadSideEffectInstruction`, where the index is joined first. To fix the
second case, and to guard against future optimizer accidents in the
first case, this commit changes both of those cases to use a new helper
predicate that makes it possible to join on both columns at once. The
resulting tuple counts are:

    (3s) Tuple counts for DataFlowUtil::simpleInstructionLocalFlowStep#ff:
    ...
    7908    ~0%      {3} r27 = JOIN r26 WITH DataFlowUtil::getSideEffectFor#fff AS R ON FIRST 2 OUTPUT R.<2>, r26.<2>, r26.<0>
    4023    ~0%      {3} r28 = JOIN r27 WITH Instruction::WriteSideEffectInstruction#class#ff AS R ON FIRST 1 OUTPUT r27.<1>, r27.<2>, r27.<0>
    ...
    15670   ~5%      {2} r33 = JOIN r32 WITH DataFlowUtil::getSideEffectFor#fff AS R ON FIRST 2 OUTPUT R.<2>, r32.<2>
    7973    ~0%      {2} r34 = JOIN r33 WITH Instruction::ReadSideEffectInstruction::getSideEffectOperand_dispred#ff AS R ON FIRST 1 OUTPUT R.<1>, r33.<1>
    ...

The bulge is now limited to a factor of two, and that's just because I
didn't write separate versions of `getSideEffectFor` for
`ReadSideEffectInstruction` and `WriteSideEffectInstruction`.
2020-02-10 15:11:30 +01:00
Mathias Vorreiter Pedersen
99a9d7f676 C++: Simplify 2020-02-10 13:01:40 +01:00
Mathias Vorreiter Pedersen
6804018a64 C++: Accept output 2020-02-10 11:37:40 +01:00
Mathias Vorreiter Pedersen
522c629441 C++: Move fix to adjustedSink to avoid generating too many instructions 2020-02-10 11:37:26 +01:00
Mathias Vorreiter Pedersen
52bc25b608 C++: Accept output 2020-02-10 08:50:29 +01:00
Mathias Vorreiter Pedersen
bb30275e2e C++: Fix false negatives for postfix crement expressions 2020-02-09 21:35:07 +01:00