Commit Graph

754 Commits

Author SHA1 Message Date
Ziemowit Laski
e5fc07660d [CPP-386] Print QL AST classes next to elements in PrintAST trees. 2019-07-13 12:23:09 -07:00
Ziemowit Laski
ddb0fd90e9 [CPP-386] Provide getCanonicalQLClass() predicate for many AST elements. 2019-07-13 12:19:40 -07:00
Ziemowit Laski
a4affbebbf [CPP-386] Add ElementBase::getCanonicalQLClass(). 2019-07-13 12:19:40 -07:00
Robert Marsh
41e46f6686 Merge pull request #1584 from geoffw0/swap
CPP: Model std::swap
2019-07-12 10:41:14 -07:00
Geoffrey White
c2fd2e273e CPP: Model taint flow through std::swap. 2019-07-12 18:00:39 +01:00
Dave Bartolomeo
1b38208bab Merge pull request #1567 from jbj/ir-operand-cycles
C++ IR: guard against cycles in operand graph
2019-07-11 13:14:10 -07:00
Dave Bartolomeo
c73b516862 Merge pull request #1541 from jbj/ir-operand-exact
C++ IR: Make instruction operand getters have only exact results
2019-07-11 13:13:20 -07:00
Dave Bartolomeo
00ff2bb6c4 Merge pull request #1554 from jbj/ir-ErrorExpr
C++ IR: support for translating ErrorExpr
2019-07-11 13:05:04 -07:00
Jonas Jensen
23001d5471 Merge pull request #1566 from rdmarsh2/rdmarsh/cpp/pure-functions-effect-model
C++: alias and side effect info for pure functions
2019-07-11 21:21:54 +02:00
Robert Marsh
c195420ba1 C++: respond to PR comments 2019-07-11 11:00:52 -07:00
Jonas Jensen
2324ce77ae C++ IR: Fix soundness of ConstantAnalysis
Now that `PhiInstruction.getAnInput` only has results for congruent
operands, a previous optimization I made to `getConstantValue` is no
longer sound. We have to check that all phi inputs give the same value,
not just the congruent ones. After this change, if there are any
non-congruent operands on a phi instruction, the whole aggregate will
have no result.
2019-07-11 15:51:09 +02:00
Jonas Jensen
7fb43a5a03 C++ IR: getAnyDef -> getDef in RangeUtils.qll
As recommended by Dave in PR review.
2019-07-11 15:35:14 +02:00
ian-semmle
463547f810 Merge pull request #1581 from jbj/revert-noTarget-workaround
Revert "C++: Work around extractor issue CPP-383"
2019-07-11 14:26:15 +01:00
Jonas Jensen
c831c4b58e C++ IR: Fix SignAnalysis after getAnyDef -> getDef
In the `SignAnalysis` abstract interpretation, "unknown sign"
corresponds to the set of _all_ `Sign`, but using `getDef` leads to the
operand having _no_ `Sign`. To fix that, we assign all signs to inexact
operands.
2019-07-11 15:17:55 +02:00
Jonas Jensen
ee5eaef5e4 Revert "C++: Work around extractor issue CPP-383"
The issue is now fixed in the extractor, and I've confirmed that the
workaround is no longer needed for g/an-tao/drogon.

This reverts commit 48a3385809.
2019-07-11 14:18:29 +02:00
Robert Marsh
72f9addd0b C++: move strstr back into main pure str model 2019-07-10 12:27:04 -07:00
Jonas Jensen
52cfbffb95 C++ IR: Fix calls to non-existent predicates
The last commit introduced calls to two predicates that did not exist. I
created `Instruction.getResultAddress` so it now exists and changed the
other call back to use the predicate that does exist.
2019-07-10 15:18:17 +02:00
Jonas Jensen
6d87c05155 Apply suggestions from code review
Co-Authored-By: Dave Bartolomeo <42150477+dave-bartolomeo@users.noreply.github.com>
2019-07-10 15:07:44 +02:00
Jonas Jensen
70f81badcb C++ IR: Move ErrorExpr filter to TranslatedElement
The convention in the IR translation is to handle all ignored
expressions in this central place.
2019-07-10 14:20:09 +02:00
Dave Bartolomeo
e087b6c82a Merge pull request #1571 from jbj/ir-operand-cached
C++ IR: Make TOperand cached
2019-07-09 16:14:58 -07:00
Robert Marsh
3804c1fbcf C++: model returns of strstr and strpbrk 2019-07-09 11:45:27 -07:00
Jonas Jensen
523fc9c1ce C++ IR: make isInCycle fast
Without this `pragma[noopt]`, `isInCycle` gets compiled into RA that
unpacks every tuple of the fast TC:

                      0          ~0%     {2} r1 = SELECT #Operand::getNonPhiOperandDef#3#ffPlus ON FIELDS #Operand::getNonPhiOperandDef#3#ffPlus.<0>=#Operand::getNonPhiOperandDef#3#ffPlus.<1>
                      0          ~0%     {1} r2 = SCAN r1 OUTPUT FIELDS {r1.<0>}
                                         return r2

With this change, it just becomes one lookup in the fast TC data
structure per instruction.
2019-07-09 16:28:55 +02:00
Jonas Jensen
9ee8a89492 C++ IR: Make TOperand cached
Just like `TInstruction` is cached to prevent re-numbering its tuples in
every IR query, I think `TOperand` should be cached too. I tested it on
the small comdb2 snapshot, where it only saves one second of work when
running a second IR query, but the savings should grow when snapshots
are larger and when there are more IR queries in a suite. Tuple
numbering is mildly quadratic, so it should be good to avoid repeating
it.

Adding these annotations adds three cached stages to the existing four
cached stages of the IR. The new cached stages are small and do not
appear to repeat any work from the other stages, so I see no advantage
to merging them with the existing stages.
2019-07-09 16:07:55 +02:00
Jonas Jensen
4324c97d39 C++: Use Opcode::Error for ErrorExpr translation 2019-07-09 13:26:00 +02:00
Jonas Jensen
a86ddd50de C++ IR: Translate ErrorExpr to NoOp 2019-07-09 13:18:11 +02:00
Jonas Jensen
39854a3f7b C++ IR: guard against cycles in operand graph
This doesn't fix the underlying problem that for some reason there are
cycles in the operand graph on our snapshots of the Linux kernel, but it
ensures that the cycles don't lead to non-termination of
`ConstantAnalysis` and `ValueNumbering`.
2019-07-09 11:00:27 +02:00
Jonas Jensen
da13dc6442 C++ IR: Don't propagate GVN through non-exact Copy
The `ValueNumbering` library is supposed to propagate value numberings
through a `CopyInstruction` only when it's _congruent_, meaning it must
have exact overlap with its source. A `CopyInstruction` can be a
`LoadInstruction`, a `StoreInstruction`, or a `CopyValueInstruction`.
The latter is also a `UnaryInstruction`, and the value numbering rule
for `UnaryInstruction` applied to it as well.

This meant that value numbering would propagate even through a
non-congruent `CopyValueInstruction`. That's semantically wrong but
probably only an issue in very rare circumstances, and it should get
corrected when we change the definition of `getUnary` to require
congruence.

What's worse is the performance implications. It meant that the value
numbering IPA witness could take two different paths through every
`CopyValueInstruction`. If multiple `CopyValueInstruction`s were
chained, this would lead to an exponential number of variable numbers
for the same `Instruction`, and we would run out of time and space
while performing value numbering.

This fixes the performance of `ValueNumbering.qll` on
https://github.com/asterisk/asterisk, although this project might also
require a separate change for fixing an infinite loop in the IR constant
analysis.
2019-07-09 10:58:03 +02:00
Dave Bartolomeo
7bbfffec4d Merge pull request #1552 from jbj/ir-builtin_addressof
C++ IR: Support __builtin_addressof
2019-07-08 17:08:38 -07:00
Dave Bartolomeo
52e0f3fb62 Merge pull request #1551 from jbj/ir-DeleteExpr-placeholder
C++: Placeholder translation of delete expressions
2019-07-08 17:07:16 -07:00
Robert Marsh
41e4d920e3 C++: alias and side effect info for pure functions 2019-07-08 12:26:58 -07:00
Robert Marsh
11581e4720 Merge pull request #1562 from geoffw0/models
CPP: Extend StrcpyFunction and update UsingStrcpyAsBoolean.ql
2019-07-08 09:56:16 -07:00
Geoffrey White
29e3e2a5bd CPP: Fix typo. 2019-07-08 09:45:40 +01:00
Jonas Jensen
4b4e7caf9f C++ IR: Support __builtin_addressof 2019-07-05 11:05:00 +02:00
Jonas Jensen
6fe9945c04 C++: Placeholder translation of delete expressions
Before this change, `delete` and `delete[]` expressions had no control
flow after them, which caused the reachability analysis to remove all
code after a delete expression. This commit adds placeholder support for
delete expression by translating them to `NoOp` instructions so their
presence doesn't cause large chunks of the program to be removed.
2019-07-05 10:54:35 +02:00
Jonas Jensen
2111bf5387 C++ IR: getAnyDef -> getDef in RangeAnalysis 2019-07-03 11:05:06 +02:00
Jonas Jensen
c62f73e2a2 C++ IR: getAnyDef -> getDef in SignAnalysis
For signs that follow from guards, we want the guard and the guarded
access to overlap exactly.
2019-07-03 11:05:06 +02:00
Jonas Jensen
a16ed7d613 C++ IR: getAnyDef -> getDef in ValueNumbering
This change seems more in line with what users would expect.
2019-07-03 11:05:06 +02:00
Jonas Jensen
2ce8612a05 C++ IR: allow inexact defs in taint tracking 2019-07-03 11:05:06 +02:00
Jonas Jensen
984405be2e C++ IR: Change many uses of getAnyDef to getDef
This changes all the getters on `Instruction` to use `getDef` instead of
`getAnyDef`, with the result that these getters now only have a result
if the definition is exact.

This is a backwards-INCOMPATIBLE change.
2019-07-03 11:04:57 +02:00
Jonas Jensen
e082451352 C++ IR: add getDef and deprecated predicates
These are the hand-written changes that complete the automatic changes
from the previous commit.
- Add deprecated compatibility wrappers for the renamed predicates.
- Add a new `Operand.getDef` predicate.
- Clarify the QLDoc for all these predicates.
2019-07-03 10:06:48 +02:00
Jonas Jensen
206a96df94 C++ IR: Rename getters for def/use on Operand
This renames `getDefinitionInstruction` to `getAnyDef`, reflecting that
it includes definitions without exact overlap. It renames
`getUseInstruction` to `getUse` for consistency.

    perl -p -i -e 's/\bgetUseInstruction\b/getUse/g; s/\bgetDefinitionInstruction\b/getAnyDef/g' \
      cpp/ql/src/semmle/code/cpp/ir/**/*.ql* \
      cpp/ql/test/**/*.ql* \
      cpp/ql/src/semmle/code/cpp/rangeanalysis/**/*.ql*
2019-07-03 10:06:48 +02:00
Geoffrey White
e079406a5f Merge pull request #1536 from jbj/leap-year-sameBaseType-perf
C++: Fix performance of leap year queries
2019-07-02 17:04:00 +01:00
Jonas Jensen
2a6000c270 C++: getter/setter performance in StructLikeClass
The predicates `getter` and `setter` in `StructLikeClass.qll` were very
slow on some snapshots. On https://github.com/dotnet/coreclr they had
this performance:

    StructLikeClass::getter#fff#antijoin_rhs ........... 3m55s
    Variable::Variable::getAnAssignedValue_dispred#bb .. 3m36s
    StructLikeClass::setter#fff#antijoin_rhs ........... 20.5s

The `getAnAssignedValue_dispred` predicate in the middle was slow due to
magic propagated from `setter`.

With this commit, performance is instead:

   StructLikeClass::getter#fff#antijoin_rhs ........... 497ms
   Variable::Variable::getAnAssignedValue_dispred#ff .. 617ms
   StructLikeClass::setter#fff#antijoin_rhs ........... 158ms

Instead of hand-optimizing the QL for performance, I simplified `setter`
and `getter` to require slightly stronger conditions. Previously, a
function was only considered a setter if it had no writes to other
fields on the same class. That requirement is now relaxed by dropping
the "on the same class" part. I made the corresponding change for what
defines a getter. I think that still captures the spirit of what getters
and setters are.

I also changed the double-negation with `exists` into a `forall`.
2019-07-02 13:49:52 +02:00
Jonas Jensen
5ad0b39f0c C++: Fix performance of leap year queries
The `sameBaseType` predicate was fundamentally quadratic, and this blew
up on large C++ code bases. Replacing it with calls to `Type.stripType`
fixes performance and does not affect the qltests. It looks like
`sameBaseType` was used purely an ad hoc heuristic, so I'm not worried
about the slight semantic difference between `sameBaseType` and
`stripType`.
2019-07-02 11:17:18 +02:00
Jonas Jensen
bf99a0ee15 C++: expand MacroInvocation.getExpr QLDoc 2019-07-01 20:22:24 +02:00
Jonas Jensen
757ec97e7a Merge pull request #1251 from zlaski-semmle/zlaski/cpp370
[CPP-370] Non-constant `format` arguments to `printf` and friends
2019-07-01 14:43:19 +02:00
Pavel Avgustinov
da7591d1f6 Merge pull request #1519 from geoffw0/depkind
CPP: Deprecate Expr.getKind() and Stmt.getKind().
2019-06-27 19:22:57 +01:00
Geoffrey White
65bf778b3a CPP: Deprecate Expr.getKind() and Stmt.getKind(). 2019-06-27 16:15:22 +01:00
Geoffrey White
47644b08b2 CPP: Normalize spacing. 2019-06-26 17:19:56 +01:00
Geoffrey White
4326699aa7 CPP: Extend the StrcpyFunction model. 2019-06-26 17:01:15 +01:00