Commit Graph

13932 Commits

Author SHA1 Message Date
Dave Bartolomeo
dbe12e7d02 C++: More PR feedback 2019-02-07 14:26:01 -08:00
Dave Bartolomeo
eb7016620b C++: Fix PR feedback 2019-02-07 14:26:00 -08:00
Dave Bartolomeo
7b54db8ca9 C++: Fix getIdentityString for TemplateParameter 2019-02-07 14:26:00 -08:00
Dave Bartolomeo
5d71d06dbc C++: Fix test expectation 2019-02-07 14:26:00 -08:00
Dave Bartolomeo
bd4ecc3e91 C++: Declaration.getIdentityString and Type.getTypeIdentityString
This PR adds new predicates to `Declaration` and `Type` to get a fully-qualified canonical name for the element, suitable for debugging and dumps. It includes template parameters, cv qualifiers, function parameter and return types, and fully-qualified names for all symbols. These strings are too large to compute in productions queries, so they should be used only for dumps and debugging. Feel free to suggest better names for these predicates.

I've updated PrintAST and PrintIR to use these instead of `Function.getFullSignature()`. The biggest advantage of the new predicates is that they handle lambdas and local classes, which `getQualifiedName` and `getFullSignature` do not. This makes IR and AST dumps much more usable for real-world snapshots.

Along the way, I cleaned up some of our handling of `IntegralType` to use a single table for tracking the signed, unsigned, and canonical versions of each type. The canonical part is new, and was necessary for `getTypeIdentityString` so that `signed int` and `int` both appear as `int`.
2019-02-07 14:26:00 -08:00
Dave Bartolomeo
f460d2c1c3 C++: Fix another test expectation 2019-02-07 09:56:56 -08:00
Dave Bartolomeo
f2a0a86c6d C++: Update captures test for closure fields extractor fix 2019-02-07 09:56:56 -08:00
Robert Marsh
3c638b5966 C++: add edge-based predicates to IRGuards
These predicates currently take a pair of `IRBlock`s - as it stands, at
most one edge can exist from one `IRBlock` to a given other `IRBlock`.
We may need to revisit that assumption and create an `IREdge` IPA type
at some future date
2019-02-07 09:38:54 -08:00
Robert Marsh
b85b7744ef C++: refactor branch instruction handling 2019-02-07 09:36:34 -08:00
Robert Marsh
92ba0919cc Merge pull request #899 from Semmle/rdmarsh/cpp/IRRename-rebased
C++: Rename a few problematic IR APIs
2019-02-07 09:28:59 -08:00
Jonas Jensen
ce31b14f21 C++: Add a queries.xml to the test dir
This makes compilation caching work with `*.ql` files in the test dir
when using `odasa qltest --optimize`.
2019-02-07 11:04:20 +01:00
Jonas Jensen
47ad280e34 Merge pull request #842 from geoffw0/gets
CPP: Clean up PotentialBufferOverflow.ql, PotentiallyDangerousFunction.ql
2019-02-07 09:27:00 +01:00
Dave Bartolomeo
f6d392089e C++: Replace getAnOperand().(XXXOperand) with getXXXOperand() 2019-02-06 22:44:53 -08:00
Dave Bartolomeo
4c23ad100e C++: Rename a few IR APIs
There are a few IR APIs that we've found to be confusingly named. This PR renames them to be more consistent within the IR and with the AST API:

`Instruction.getFunction` -> `Instruction.getEnclosingFunction`: This was especially confusing when you'd call `FunctionAddressInstruction.getFunction` to get the function whose address was taken, and wound up with the enclosing function instead.

`Instruction.getXXXOperand` -> `Instruction.getXXX`. Now that `Operand` is an exposed type, we want a way to get a specific `Operand` of an `Instruction`, but more often we want to get the definition instruction of that operand. Now, the pattern is that `getXXXOperand` returns the `Operand`, and `getXXX` is equivalent to `getXXXOperand().getDefinitionInstruction()`.

`Operand.getInstruction` -> `Operand.getUseInstruction`: More consistent with the existing `Operand.getDefinitionInstruction` predicate.
2019-02-06 22:43:49 -08:00
Robert Marsh
97c5b8ee44 Merge pull request #882 from jbj/ir-ConstantAnalysis-perf
C++: Speed up IR ConstantAnalysis
2019-02-06 22:29:09 -08:00
Dave Bartolomeo
1f873d0c9c Merge pull request #890 from aeyerstaylor/more-field-overriding
C++: Use more field overriding in IR construction
2019-02-06 17:04:43 -08:00
Geoffrey White
2321ae911e CPP: Fix the test by adding PotentiallyDangerousFunction. 2019-02-05 17:58:30 +00:00
Geoffrey White
018450500d CPP: Fix closing tag. 2019-02-05 17:58:30 +00:00
Geoffrey White
c05df6ea4c CPP: Add reference. 2019-02-05 17:58:30 +00:00
Geoffrey White
f73a3a6a24 CPP: Explain the danger of gets a bit more in qhelp. 2019-02-05 17:58:30 +00:00
Geoffrey White
0541950c44 CPP: Clean up PotentialBufferOverflow.ql a bit. 2019-02-05 17:58:30 +00:00
Geoffrey White
c32e1b8000 CPP: Change the @name of PotentialBufferOverflow.ql to be in line with everything else. 2019-02-05 17:58:30 +00:00
Geoffrey White
f7e7737789 CPP: Update qhelp. 2019-02-05 17:58:30 +00:00
Geoffrey White
87a25f0cbe CPP: Update CWE tags. 2019-02-05 17:58:30 +00:00
Geoffrey White
429f53ed74 CPP: Move the 'gets' case. 2019-02-05 17:58:30 +00:00
Geoffrey White
a82832e779 CPP: Add a test that uses 'gets'. 2019-02-05 17:58:30 +00:00
Geoffrey White
bbc8e7886b CPP: Rearrange PotentiallyDangerousFunction.ql. 2019-02-05 17:58:30 +00:00
alexet
59a5bec769 CPP: Use more field overriding 2019-02-05 13:07:41 +00:00
Jonas Jensen
cad4bac548 C++: Concretize ConstantAnalysis NegateInstruction
This is just to make the QL shorter. It generates the same DIL.
2019-02-05 11:05:47 +01:00
Jonas Jensen
be35c674a7 C++: Factor out getConstantValueToPhi
This speeds up `getConstantValue`, the main predicate in
`ConstantAnalysis`, from 2.4s to 1.6s on comdb2.
2019-02-05 11:05:47 +01:00
Jonas Jensen
283bb2f6d0 C++: Factor out ConstantAnalysis BinaryInstruction
This speeds up comdb2 constant analysis from 6.5s to 4.5s.
2019-02-05 11:05:47 +01:00
Jonas Jensen
d66578eaa8 C++: Add IntegerPartial, use in ConstantAnalysis
This adds `IntegerPartial.qll`, which is similar to
`IntegerConstant.qll` except that it contains partial functions on
integers instead of total functions on optional integers. This speeds up
the constant analysis so it takes 6.5s instead of 10.3s on comdb2.
2019-02-05 11:05:47 +01:00
semmle-qlci
06ae0c421a Merge pull request #864 from jbj/ir-TIRVariable-shared
Approved by dave-bartolomeo
2019-02-05 07:55:28 +00:00
Dave Bartolomeo
dc209246aa Merge pull request #866 from jbj/ir-TInstruction-normalize
C++: Normalize TInstruction
2019-02-04 12:14:45 -08:00
Dave Bartolomeo
aadd5cf202 Merge pull request #863 from jbj/ir-variableLiveOnEntryToBlock-rhs
C++: Speed up variableLiveOnEntryToBlock in IR
2019-02-04 10:47:29 -08:00
Jonas Jensen
3735cb69ce C++: No InstructionTag in SSAConstruction
This does to `SSAConstruction` what the previous commit did to
`IRConstruction`. An instruction in `SSAConstruction` is now defined in
terms of how it was created rather than what it can be queried for.
Effectively, this defines `TInstruction` as `TInstructionTag` was
defined before and then removes `TInstructionTag` from
`SSAConstruction`. This also has the benefit of removing the concept of
an instruction tag from the public predicates on `Instruction`.
2019-02-04 19:43:17 +01:00
Jonas Jensen
8ae3551ec1 C++: Normalize TInstruction in raw IR
This definition was denormalized to the extent that an instruction was
defined in terms of the six main attributes it could be queried for.
This made it possible to do multi-column joins on those six attributes,
but it doesn't appear that this feature was useful in practice. The main
multi-column join that was in use was on the pair of
(`TranslatedElement, InstructionTag`), but the `TranslatedElement` was
not part of the `TInstruction`.

This commit changes `TInstruction` to be defined in terms of what it's
_built from_ (`TranslatedElement, InstructionTag`) instead. This makes
it possible to do multi-column joins on those two components, and then
there are separate predicates (usually with two columns) to query
instruction attributes, replacing the many uncached projections from
`MkInstruction` that were generated before.

An immediate advantage is that an `Expr` with multiple types will no
longer give rise to multiple `Instruction`s, fixing most of the errors
from the sanity query `ambiguousSuccessors`. The code inside
`IRConstruction.qll` becomes simpler and hopefully faster as there is no
longer a translation from `TranslatedElement` to `Locatable` and back
again.
2019-02-04 19:43:17 +01:00
Jonas Jensen
3e03835630 C++: Only create variables in FunctionIRs
The previous commit had the side effect that `IRVariable`s were created
for all `Functions`, including those that did not have IR. This commit
restricts all `TIRVariable` constructors to functions that have IR.
2019-02-04 19:34:16 +01:00
Dave Bartolomeo
6d3d9025f7 Merge pull request #867 from jbj/ir-ignoreExprAndDescendants-perf
C++: Replace FastTC with iteration in ignoreExpr
2019-02-04 09:26:32 -08:00
Dave Bartolomeo
7345c921d9 Merge pull request #857 from jbj/ir-getInstruction
C++: Fix TranslatedElement.getInstruction perf
2019-02-04 09:24:00 -08:00
Robert Marsh
411c285aa3 Merge pull request #870 from jbj/ir-shortestDistances
C++: Use shortestDistances HOP for IR BB indexes
2019-02-04 09:19:15 -08:00
Jonas Jensen
45a995ba52 C++: Accept test changes from last commit 2019-02-04 13:00:28 +01:00
Jonas Jensen
8368c37781 C++: Use shortestDistances HOP for IR BB indexes
This doesn't make it much faster, but it reduces the debug output
volume. It also simplifies the code.

I've found this change necessary when I compute the full IR on a
Wireshark snapshot in QL4E. Without it, Eclipse runs out of memory
because the console log is too large.
2019-02-04 11:40:11 +01:00
Jonas Jensen
60141bf317 C++: ignoreExprAndDescendants QL-796 workaround
The new predicate `isOrphan` gets inlined into
`ignoreExprAndDescendants`, whose performance improves from

    TranslatedElement::ignoreExprAndDescendants#f .. 23.4s (executed 9 times)

to

    TranslatedElement::ignoreExprAndDescendants#f ... 4.3s (executed 9 times)

This dramatic improvement is not only due to eliminating a type check in
the recursive case. Removing the type check from the other base cases
also enabled them to get better join orders.
2019-02-03 16:55:12 +01:00
Jonas Jensen
66e7c26d4e C++: Replace FastTC with iteration in ignoreExpr
Before, `ignoreExprAndDescendants` and its related predicates had this
timing on Wireshark.

    #TranslatedElement::getRealParent#ffPlus#swapped ......... 25.7s
    TranslatedElement::ignoreExprAndDescendants#f ............ 16.9s
    TranslatedElement::getRealParent#ff ...................... 7.2s
    TranslatedElement::ignoreExpr#f .......................... 4.8s
    TranslatedElement::ignoreExpr#f#antijoin_rhs ............. 3.2s
    TranslatedElement::getRealParent#ff_10#higher_order_body . 2.2s

After, it looks like this

    TranslatedElement::ignoreExprAndDescendants#f ............ 23.4s (executed 9 times)
    TranslatedElement::getRealParent#ff ...................... 6.3s
    TranslatedElement::ignoreExpr#f#antijoin_rhs ............. 4.8s
    TranslatedElement::ignoreExpr#f .......................... 3.7s
    TranslatedElement::getRealParent#ff_10#join_rhs .......... 2.5s
    project#TranslatedElement::getRealParent#ff .............. 1.3s
2019-02-03 16:55:12 +01:00
Patrik Schönfeldt
ac249cdbbe Fix reccomendation for LargeParameter (C++)
The previous reccomentation changed the behaviour of the code.
A user following the advice might have broken her/his code:
With call-by-value, the original parameter is not changed.
With a call-by-reference, however, it may be changed. To be sure,
nothing breaks by blindly following the advice, suggest to pass a
const reference.
2019-02-03 15:44:13 +01:00
Jonas Jensen
f8318ef96f C++: Move TIRVariable to its own file
The `SSAConstruction.getNewIRVariable` was very slow on Wireshark. This
was probably because it couldn't join on multiple columns
simultaneously. Instead of improving the join, I observed that the
`TIRVariable` type was the same between all three IR stages except for a
few occurrences of `FunctionIR` that could easily be changed to
`Function`. By sharing `TIRVariable` between all the stages, we avoid
recomputing it and translating it between every stage, turning the slow
`getNewIRVariable` predicate into a no-op.

This change means that later stages of the IR can't introduce new
variables, but that was already the case because
`config/identical-files.json` forced all three `IRVariable.qll` files to
be identical.
2019-02-03 13:36:30 +01:00
Jonas Jensen
3afefce8ef C++: Improve order of parameters in SSA def/use
This changes the order so the parameter that's sometimes projected away
is the last one, making the projection cheap.
2019-02-03 13:34:02 +01:00
Jonas Jensen
4ac22253eb C++: Speed up variableLiveOnEntryToBlock in IR
This predicate computed a local CP between all defs and uses of the same
virtual variable in a basic block. This wasn't a problem in
`unaliased_ssa`, but it became a huge problem in `aliased_ssa`, probably
because many variables can be modelled with a single virtual variable
there.

Before this commit, evaluation of `aliased_ssa`'s
`variableLiveOnEntryToBlock#ff#antijoin_rhs` on Wireshark took 80
_minutes_. After this commit, that predicate and its immediate
dependencies take around 5 _seconds_.
2019-02-03 13:25:18 +01:00
Jonas Jensen
e81d197ebd C++: Revert doc-related changes to dbscheme
These changes to the dbscheme were made in 7cc1442ecb and a98aae0a24
without a corresponding upgrade script in the internal repo.
2019-02-01 10:01:29 +01:00