Commit Graph

647 Commits

Author SHA1 Message Date
Denis Levin
7ff8fcd50e Some more typo fixes and a fix to test files 2019-06-13 17:16:30 -07:00
Denis Levin
1b8117ba3a C++: Mishandling Japanese Era and Leap Year in calculations 2019-05-21 14:49:40 -07:00
Geoffrey White
67527820a1 Merge pull request #1335 from EdoDodo/optimise-preprocessor
C++: Optimise quadratic code in PreprocessorBranchDirective
2019-05-20 15:58:33 +01:00
Anders Schack-Mulligen
9ebeac25ad Merge pull request #1329 from hvitved/dataflow/performance
Data flow: performance improvements
2019-05-20 14:27:03 +02:00
Edoardo Pirovano
30198c326d C++: Optimise quadratic code in PreprocessorBranchDirective 2019-05-20 12:57:47 +01:00
Tom Hvitved
bc00877ff2 Data flow: Add nomagic to storeCand() 2019-05-20 12:05:20 +02:00
Tom Hvitved
360c7a1ac5 Address review comments 2019-05-20 09:59:17 +02:00
Robert Marsh
762c977be7 Merge pull request #1326 from jbj/addressConstantVariable-isConstexpr
C++: Use isConstexpr instead of workaround in AddressConstantExpr
2019-05-16 15:18:56 -07:00
Robert Marsh
5f77ac4cf5 Merge pull request #1325 from jbj/reachableRecursive
C++: reachableRecursive refactor for performance
2019-05-16 14:05:57 -07:00
Jonas Jensen
947aaa9e4e C++: reachableRecursive refactor for performance
The `reachable` predicate is large and slow to compute. It's part of a
mutual recursion that's non-linear, meaning it has a recursive call on
both sides of an `and`.

This change removes a part of the base case that has no effect on
recursive cases. The removed part is added back after the recursion has
finished.

Before, on Wireshark:

    ControlFlowGraph::Cached::reachable#f .......... 20.8s (executed 9800 times)
    ConstantExprs::successors_adapted#ff ........... 4.2s (executed 615 times)
    ConstantExprs::potentiallyReturningFunction#f .. 3.9s (executed 9799 times)
    ConstantExprs::possiblePredecessor#f ........... 2.9s (executed 788 times)

After, on Wireshark:

    ConstantExprs::reachableRecursive#f ............ 13.2s (executed 9800 times)
    ConstantExprs::successors_adapted#ff ........... 4.2s (executed 615 times)
    ConstantExprs::potentiallyReturningFunction#f .. 4.3s (executed 9799 times)
    ConstantExprs::possiblePredecessor#f ........... 2.6s (executed 788 times)

I've verified that this change doesn't change what's computed by
checking that the output of the following query is unchanged:

    import cpp
    import semmle.code.cpp.controlflow.internal.ConstantExprs

    select
      strictcount(ControlFlowNode n | reachable(n)) as reachable,
      strictcount(ControlFlowNode n1, ControlFlowNode n2 | n2 = n1.getASuccessor()) as edges,
      strictcount(FunctionCall c | aborting(c)) as abortingCall,
      strictcount(Function f | abortingFunction(f)) as abortingFunction
2019-05-16 13:39:23 +02:00
Jonas Jensen
db6a807ff6 C++: Move same-stage predicates into cached module
This change only moves code around -- there are no changes to predicate
bodies or signatures.

The predicates that go in `ConstantExprs.Cached` after this change were
already cached in the same stage or, in the case of the `aborting*`
predicates, did not need to be cached. This is a fortunate consequence
of how the mutual recursion between the predicates happens to work, and
it's not going to be the case after the next commit.
2019-05-16 13:34:50 +02:00
Tom Hvitved
02ca09aa43 Data flow: performance improvements 2019-05-16 07:35:10 +02:00
Robert Marsh
14795863e2 Merge pull request #1303 from jbj/hasQualifiedName
C++: Fix `getQualifiedName` performance issues
2019-05-15 12:42:57 -07:00
Jonas Jensen
d820fc9cd2 C++: Address review comments about the comments 2019-05-15 14:55:26 +02:00
Jonas Jensen
f38253da89 C++: Use isConstexpr instead of workaround
The `addressConstantVariable` predicate was the slowest single predicate
when running the full LGTM suite on Chromium. Fortunately it's only
executed once, but it could be easily made faster by using the new
`Variable.isConstexpr` predicate instead of the slow workaround that was
in its place.
2019-05-15 14:41:05 +02:00
Jonas Jensen
8b012b2cab C++: Remove unneeded import 2019-05-15 14:35:05 +02:00
Geoffrey White
4cc23cce13 CPP: Document. 2019-05-10 16:26:39 +01:00
Geoffrey White
581266c347 CPP: Alternative fix. 2019-05-10 16:26:38 +01:00
Jonas Jensen
639d715d03 Merge pull request #1226 from hvitved/dataflow/prepare-for-csharp
Generalize data-flow library in preparation for C# adoption
2019-05-06 14:42:46 +02:00
Jonas Jensen
b52015a584 C++: QLDoc for QualifiedName.qll 2019-05-06 11:28:56 +02:00
Jonas Jensen
56e88cbac0 C++: Use underlyingElement for QualifiedName calls
Since the types in `QualifiedName.qll` are raw db types, callers need to
use `underlyingElement` and `unresolveElement` as appropriate. This has
no effect right now but will be needed when we switch the AST type
hierarchy to `newtype`s.
2019-05-06 11:24:28 +02:00
Tom Hvitved
d9bf0a670e Data flow: Address review comments 2019-05-03 15:00:48 +02:00
Jonas Jensen
b98daae077 C++: Remove deprecated from hasQualifiedName/1
The predicate is still deprecated, but we can't mark it as such until
the queries in our internal repo have migrated away from it.
2019-05-03 13:22:23 +02:00
Jonas Jensen
6d954fe53e C++: Deprecate hasQualifiedName/1
This predicate handles templates differently from the other overloads
with the same name, so it's likely to cause confusion.
2019-05-03 10:37:48 +02:00
Jonas Jensen
5e789901df C++: Remove all uses of hasQualifiedName/1 2019-05-03 10:37:48 +02:00
Jonas Jensen
64a87a863c C++: Remove uses of getQualifiedName
This removes all uses of `Declaration.getQualifiedName` that I think can
be removed without changing any behaviour. The following uses in the
LGTM default suite remain:

* `cpp/ql/src/Security/CWE/CWE-121/UnterminatedVarargsCall.ql` (in `select`).
* `cpp/ql/src/semmle/code/cpp/dataflow/internal/DataFlowDispatch.qll` (needs template args).
* `cpp/ql/src/semmle/code/cpp/security/FunctionWithWrappers.qll` (used for alert messages).
2019-05-03 10:37:48 +02:00
Jonas Jensen
0a2e28858a C++: Rework how qualified names are computed 2019-05-03 10:37:48 +02:00
Jonas Jensen
b51ce87ae8 C++: Autoformat QualifiedName.qll 2019-05-03 10:37:47 +02:00
Jonas Jensen
b97ff1a72f C++: Take QualifiedName.qll from Ian's branch
This imports `QualifiedName.qll` from
2f74a456290b9e0850b7308582e07f5d68de3a36 and makes minimal changes so it
compiles.

Original author: Ian Lynagh <ian@semmle.com>
2019-05-03 10:37:12 +02:00
Tom Hvitved
b6206d7370 Data flow: Introduce ReturnKind 2019-05-02 20:30:50 +02:00
Dave Bartolomeo
7071692373 C++: Clarify comment based on PR feedback 2019-05-02 11:18:10 -07:00
Dave Bartolomeo
fef58ec1ee C++: Add "~" prefix to inexact uses 2019-05-02 11:18:09 -07:00
Dave Bartolomeo
5dcd314908 C++: Update to conform to new API naming 2019-05-02 11:18:09 -07:00
Dave Bartolomeo
65535449d6 C++: Fix merge conflicts 2019-05-02 11:18:09 -07:00
Dave Bartolomeo
0cde86d3c1 C++: Fix PR feedback 2019-05-02 11:18:09 -07:00
Dave Bartolomeo
9869fd32d0 C++: Add implementation documentation for SSA 2019-05-02 11:18:08 -07:00
Dave Bartolomeo
e0f7344676 C++: Imprecise definitions in SSA 2019-05-02 11:18:08 -07:00
Dave Bartolomeo
eed0894029 C++: Add operand labels for more operand tags
I kept forgetting which operand on a Chi instruction was which, so I added dump labels. I added labels for the function target of a `Call`, for positional arguments, and for address operands as well.
2019-05-02 11:18:08 -07:00
Nick Rolfe
50c901d6d9 C++: remove pointless predicate 2019-05-02 11:16:21 +01:00
Nick Rolfe
8da2f0b8dc C++: clarify folds only appear in uninstantiated templates 2019-05-02 11:16:21 +01:00
Nick Rolfe
4352a20be0 C++: add support for C++17 fold expressions 2019-05-02 11:16:21 +01:00
Geoffrey White
ac277ad7ad CPP: Fix %I length specifier. 2019-05-01 11:12:06 +01:00
Robert Marsh
514d405630 C++: Use CallInstruction as DataFlowCall 2019-04-29 14:18:09 -07:00
Tom Hvitved
5f6e9121b3 C++: Generalize FunctionCall to Call in data-flow library 2019-04-29 20:42:07 +02:00
Tom Hvitved
29e59e6d1e Address review comments 2019-04-29 20:19:31 +02:00
semmle-qlci
2ede941097 Merge pull request #1291 from jbj/backEdgeSuccessor-perf
Approved by dave-bartolomeo
2019-04-29 18:18:27 +01:00
semmle-qlci
0ffba8b4eb Merge pull request #1289 from jbj/dominanceFrontier-iterated-ir
Approved by dave-bartolomeo
2019-04-29 18:14:20 +01:00
semmle-qlci
d53f5aac13 Merge pull request #1228 from jbj/ir-result-type-docs
Approved by dave-bartolomeo
2019-04-29 18:07:22 +01:00
Jonas Jensen
5fd425ae95 C++: fix IRBlock::backEdgeSuccessor performance
The `IRBlock::backEdgeSuccessor` predicate, in its three copies, had
become slow:

    6:IRBlock::Cached::backEdgeSuccessor#fff ...... 1m1s
    7:IRBlock::Cached::backEdgeSuccessor#2#fff .... 52.3s
    8:IRBlock::Cached::backEdgeSuccessor#3#fff .... 26.4s

The slow part was finding all the nodes involved in cycles in the
`forwardEdgeRaw` graph. This was done with `forwardEdgeRaw+(pred, pred)`,
but that got compiled into a materialization of `forwardEdgeRaw+`, which
is a huge relation with 1,816,752,107 rows on Wireshark:

    (1474s) Starting to evaluate predicate IRBlock::Cached::backEdgeSuccessor#3#fff
    (1501s) Tuple counts:
    0          ~0%     {2} r1 = SELECT #IRBlock::Cached::forwardEdgeRaw#3#ffPlus ON FIELDS #IRBlock::Cached::forwardEdgeRaw#3#ffPlus.<0>=#IRBlock::Cached::forwardEdgeRaw#3#ffPlus.<1>
    0          ~0%     {1} r2 = SCAN r1 OUTPUT FIELDS {r1.<0>}
    0          ~0%     {3} r3 = JOIN r2 WITH IRBlock::Cached::blockSuccessor#6#fff ON r2.<0>=IRBlock::Cached::blockSuccessor#6#fff.<0> OUTPUT FIELDS {r2.<0>,IRBlock::Cached::blockSuccessor#6#fff.<1>,IRBlock::Cached::blockSuccessor#6#fff.<2>}
    12411      ~7%     {3} r4 = IRBlock::Cached::backEdgeSuccessorRaw#3#fff \/ r3
                       return r4
    (1501s)  >>> Relation IRBlock::Cached::backEdgeSuccessor#3#fff: 12411 rows using 0 MB

The problem is the `SELECT`. It's fast to join on a fastTC result once
we know what we're looking for, so this fix materializes the identity
relation on `IRBlock` and joins with that so the fastTC ends up on the
RHS of a join, where it's fast. I had to introduce a helper predicate
because even with `noopt` I couldn't get `pred = pred2` to come _before_
`forwardEdgeRaw+(pred, pred2)`. The predicate now takes less than a
second to evaluate:

    (539s) Starting to evaluate predicate IRBlock::Cached::backEdgeSuccessor#fff
    (539s)  >>> Relation IRBlock::Cached::blockImmediatelyDominates#ff: 574677 rows using 0 MB
    (539s) 	 ... created with 574677 rows and 2 columns.
    (539s) Tuple counts:
    702445     ~1%     {2} r1 = SELECT IRBlock::Cached::blockIdentity#ff ON FIELDS IRBlock::Cached::blockIdentity#ff.<0>=IRBlock::Cached::blockIdentity#ff.<1>
    702445     ~1%     {2} r2 = SCAN r1 OUTPUT FIELDS {r1.<0>,r1.<0>}
    0          ~0%     {1} r3 = JOIN r2 WITH #IRBlock::Cached::forwardEdgeRaw#ffPlus ON r2.<0>=#IRBlock::Cached::forwardEdgeRaw#ffPlus.<0> AND r2.<1>=#IRBlock::Cached::forwardEdgeRaw#ffPlus.<1> OUTPUT FIELDS {r2.<0>}
    0          ~0%     {3} r4 = JOIN r3 WITH IRBlock::Cached::blockSuccessor#2#fff ON r3.<0>=IRBlock::Cached::blockSuccessor#2#fff.<0> OUTPUT FIELDS {r3.<0>,IRBlock::Cached::blockSuccessor#2#fff.<1>,IRBlock::Cached::blockSuccessor#2#fff.<2>}
    20487      ~0%     {3} r5 = IRBlock::Cached::backEdgeSuccessorRaw#fff \/ r4
                       return r5
    (539s)  >>> Relation IRBlock::Cached::backEdgeSuccessor#fff: 20487 rows using 0 MB
2019-04-29 15:44:50 +02:00
Jonas Jensen
cd7ba176ab C++: iterated dominance frontier algorithm for IR
Use the iterated dominance frontier algorithm to speed up dominance
frontier calculations. The implementation is copied from d310338c9b.

Before this change, the SSA calculations for unaliased and aliased SSA
used 169.9 seconds in total on these predicates:

    7:Dominance::getDominanceFrontier#2#ff .. 49s
    7:Dominance::blockDominates#2#ff ........ 47.5s
    8:Dominance::getDominanceFrontier#ff .... 44.4s
    8:Dominance::blockDominates#ff .......... 29s

After this change, the above predicates are replaced by two copies of
`getDominanceFrontier`, each of which takes less than a second.
2019-04-29 13:01:37 +02:00