I think these queries have excellent results on lgtm.com. Many of the
results come from projects that use `sprintf` like it's a templating
engine, trusting that values from `argv` or `getenv` contain the correct
number of `%s`. I think we want to flag that.
The structure of the change note is modeled after 91af51cf46.
Some of our IR consistency failure query predicates already produced results in the schema as an `@kind problem` query, including `$@` replacements for the enclosing `IRFunction` to make it easier to figure out which function to dump when debugging. This change moves the rest of the query predicates in `IRConsistency.qll` to do the same. In addition, it wraps each call to `getEnclosingIRFunction()` to return an `OptionalIRFunction`, which can be either a real `IRFunction` or a placeholder in case `getEnclosingIRFunction()` returned no results. This exposes a couple new consistency failures in `syntax-zoo`, which will be fixed in a subsequent commit.
This change also deals with consistency failures when the enclosing `IRFunction` has more than one `Function` or `Location`. For multiple `Function`s, we concatenate the function names. For multiple `Location`s, we pick the first one in lexicographical order. This changes the number of results produced in the existing tests, but does't change the actual number of problems.
Instead of a vague reference to a code comment for another language, the
`controlsBlock` predicate now has the whole comment in it directly.
I've adjusted the wording so it should be reasonably correct for C/C++.
As with the other comments in this file, I don't distinguish between the
condition and its block. I think that makes the explanation clearer
without losing any detail we care about.
To make the code fit the wording of the comment, I changed the
`hasBranchEdge/2` predicate into `getBranchSuccessor/1`.
On kamailio/kamailio the `DataFlowUtil::simpleInstructionLocalFlowStep`
predicate was slow because of the case for single-field structs, where
there was a large tuple-count bulge when joining with
`getFieldSizeOfClass`:
3552902 ~2% {2} r1 = SCAN Instruction::CopyInstruction::getSourceValueOperand_dispred#3#ff AS I OUTPUT I.<1>, I.<0>
2065347 ~2% {2} r35 = JOIN r1 WITH Operand::NonPhiMemoryOperand::getAnyDef_dispred#3#ff AS R ON FIRST 1 OUTPUT r1.<1>, R.<1>
2065827 ~2% {3} r36 = JOIN r35 WITH Instruction::Instruction::getResultType_dispred#3#ff AS R ON FIRST 1 OUTPUT R.<1>, r35.<1>, r35.<0>
2065825 ~3% {3} r37 = JOIN r36 WITH Type::Type::getSize_dispred#ff AS R ON FIRST 1 OUTPUT r36.<1>, r36.<2>, R.<1>
2068334 ~2% {4} r38 = JOIN r37 WITH Instruction::Instruction::getResultType_dispred#3#ff AS R ON FIRST 1 OUTPUT R.<1>, r37.<2>, r37.<0>, r37.<1>
314603817 ~0% {3} r39 = JOIN r38 WITH DataFlowUtil::getFieldSizeOfClass#fff_120#join_rhs AS R ON FIRST 2 OUTPUT r38.<3>, R.<2>, r38.<2>
8 ~0% {2} r40 = JOIN r39 WITH Instruction::Instruction::getResultType_dispred#3#ff AS R ON FIRST 2 OUTPUT r39.<2>, r39.<0>
That's 314M tuples.
Strangely, there is no such bulge on more well-behaved snapshots like
mysql/mysql-server.
With this commit the explosion is gone:
...
2065825 ~0% {4} r37 = JOIN r36 WITH Type::Type::getSize_dispred#ff AS R ON FIRST 1 OUTPUT r36.<0>, R.<1>, r36.<1>, r36.<2>
1521 ~1% {3} r38 = JOIN r37 WITH DataFlowUtil::getFieldSizeOfClass#fff_021#join_rhs AS R ON FIRST 2 OUTPUT r37.<2>, R.<2>, r37.<3>
8 ~0% {2} r39 = JOIN r38 WITH Instruction::Instruction::getResultType_dispred#3#ff AS R ON FIRST 2 OUTPUT r38.<0>, r38.<2>