My recent changes to suppress FPs in `ReturnStackAllocatedMemory.ql`
caused us to lose all results where there was a `Conversion` at the
initial address escape. We cannot handle conversions in general, but
this commit restores the good results for the trivial types of
conversion that we can handle.
The main source of slowness in `BrokenCryptoAlgorithm.ql` was that the
regexp on function (macro) names was evaluated once per call
(invocation) instead of once per name. Factoring out separate predicates
for the problematic functions (macros) fixes this.
On https://github.com/ericniebler/range-v3, this change reduces the run
time of the two slowest predicates from
BrokenCryptoAlgorithm::InsecureMacroSpec#class#f .... 35.1s
BrokenCryptoAlgorithm::InsecureFunctionCall#class#f . 12.8s
to
BrokenCryptoAlgorithm::getAnInsecureFunction#f . 1.2s
BrokenCryptoAlgorithm::getAnInsecureMacro#f .... 12ms
Instead of `algorithmBlacklistRegex` having 2 * 5 results, it now has
only one result, which is a single regex that represents the union of
the previous 2 * 5 regexes. This means that `BrokenCryptoAlgorithm.ql`
has much less regex matching to do.
On https://github.com/ericniebler/range-v3, this change reduces the run
time of the two slowest predicates from
BrokenCryptoAlgorithm::InsecureMacroSpec#class#f .... 2m21s
BrokenCryptoAlgorithm::InsecureFunctionCall#class#f . 54.5s
to
BrokenCryptoAlgorithm::InsecureMacroSpec#class#f .... 35.1s
BrokenCryptoAlgorithm::InsecureFunctionCall#class#f . 12.8s
Since we cannot track data flow from a fully-converted expression but
only the unconverted expression, we should check whether the address
initially escapes into the unconverted expression, not the
fully-converted one.
This fixes most of the false positives observed on lgtm.com.
Also added `PhiInstruction::getAnInputOperand()`, and renamed `PhiInstruction::getAnOperandDefinitionInstruction()` to `getAnInput()` for consistency with other `Instruction` classes.
The `FlowVar::toString` predicate is purely a debugging aid, but
unfortunately it has to be `cached` because it's in a `cached` class.
Before this commit, it caused `Expr::toString` to be evaluated in full.
The `automaticVariableAddressEscapes` predicate got join-ordered badly
in its `unaliased_ssa` version. These are the tuple counts on Wireshark,
where one pipeline step is seen to have 716 million tuples:
```
[2019-03-02 11:29:41] (42s) Starting to evaluate predicate AliasAnalysis::automaticVariableAddressEscapes#2#f
[2019-03-02 11:30:06] (67s) Tuple counts:
353419 ~0% {1} r1 = JOIN project#Instruction::VariableAddressInstruction#class#2#ff WITH AliasAnalysis::resultEscapesNonReturn#2#f ON project#Instruction::VariableAddressInstruction#class#2#ff.<0>=AliasAnalysis::resultEscapesNonReturn#2#f.<0> OUTPUT FIELDS {AliasAnalysis::resultEscapesNonReturn#2#f.<0>}
353419 ~0% {2} r2 = JOIN r1 WITH IRConstruction::Cached::getInstructionEnclosingFunctionIR#ff@staged_ext ON r1.<0>=IRConstruction::Cached::getInstructionEnclosingFunctionIR#ff@staged_ext.<0> OUTPUT FIELDS {IRConstruction::Cached::getInstructionEnclosingFunctionIR#ff@staged_ext.<1>,r1.<0>}
353419 ~0% {2} r3 = JOIN r2 WITH FunctionIR::FunctionIR::getFunction_dispred#3#ff ON r2.<0>=FunctionIR::FunctionIR::getFunction_dispred#3#ff.<0> OUTPUT FIELDS {FunctionIR::FunctionIR::getFunction_dispred#3#ff.<1>,r2.<1>}
716040298 ~0% {2} r4 = JOIN r3 WITH IRVariable::IRVariable#class#3#ff_10#join_rhs ON r3.<0>=IRVariable::IRVariable#class#3#ff_10#join_rhs.<0> OUTPUT FIELDS {IRVariable::IRVariable#class#3#ff_10#join_rhs.<1>,r3.<1>}
4480139 ~0% {2} r5 = JOIN r4 WITH IRVariable::IRAutomaticVariable#class#3#ff ON r4.<0>=IRVariable::IRAutomaticVariable#class#3#ff.<0> OUTPUT FIELDS {r4.<1>,r4.<0>}
66760 ~91% {1} r6 = JOIN r5 WITH Instruction::VariableInstruction::getVariable_dispred#2#ff ON r5.<0>=Instruction::VariableInstruction::getVariable_dispred#2#ff.<0> AND r5.<1>=Instruction::VariableInstruction::getVariable_dispred#2#ff.<1> OUTPUT FIELDS {r5.<1>}
return r6
[2019-03-02 11:30:06] (67s) >>> Relation AliasAnalysis::automaticVariableAddressEscapes#2#f: 35531 rows using 0 MB
```
The predicate contained a cyclic join, which is always hard to optimize.
I couldn't see a reason to join the `FunctionIR`, so I removed that
part. The predicate is now fast, and there are no changes in the tests.