Switch from using additional flow steps with a DataFlow::Configuration
in DefaultTaintTracking to using a TaintTracking::Configuration. This
makes future improvements to TaintTracking::Configuration reflected in
DefaultTaintTracking without further effort. It also removes the
predictability constraint in DefaultTaintTracking, which increases the
number of results, with both new true positives and new false positives.
Those may need to be addressed on a per-query basis.
There are some additional regressions from losing pointer/object
conflation for arguments. Those can be worked around by adding that
conflation to TaintTracking::Configuration until precise indirect
parameter flow is ready.
A performance regression in `definitionByReferenceNodeFromArgument#ff`
was ultimately caused by a join on parameter indexes in
`DefinitionByReferenceNode.getArgument`. Joining on numbers in QL is
always fragile, and somehow the changes in #4432 had caused the join
order here to break.
Instead of tweaking the join order in the slow predicate itself, I added
`pragma[noinline]` to one of the predicates involved in the join on
parameter indexes. This should prevent us from getting similar
performance problems in the future when we write code that joins on
parameter numbers. Joining on indexes is always risky, but it's even
more risky when one of the predicates in the join is inlined by the
compiler and expands to further joins.
I tested performance by running `CgiXss.ql` on a ChakraCore snapshot.
Tuple counts before (I interrupted execution after five minutes or so):
(626s) Tuple counts for DataFlowUtil::definitionByReferenceNodeFromArgument#ff:
58162 ~0% {3} r1 = SCAN DataFlowUtil::DefinitionByReferenceNode#class#ff AS I OUTPUT I.<1>, -1, I.<0>
26934 ~0% {2} r2 = JOIN r1 WITH Instruction::IndexedInstruction#ff AS R ON FIRST 2 OUTPUT r1.<0>, r1.<2>
26934 ~1% {2} r3 = JOIN r2 WITH Instruction::SideEffectInstruction::getPrimaryInstruction_dispred#3#ff AS R ON FIRST 1 OUTPUT R.<1>, r2.<1>
26850 ~1% {2} r4 = JOIN r3 WITH Instruction::CallInstruction::getThisArgumentOperand_dispred#ff AS R ON FIRST 1 OUTPUT R.<1>, r3.<1>
26850 ~0% {2} r5 = JOIN r4 WITH Operand::Operand::getDef_dispred#3#ff AS R ON FIRST 1 OUTPUT R.<1>, r4.<1>
26850 ~1% {2} r6 = JOIN r5 WITH Instruction::Instruction::getUnconvertedResultExpression_dispred#ff AS R ON FIRST 1 OUTPUT R.<1>, r5.<1>
58162 ~0% {2} r7 = SCAN DataFlowUtil::DefinitionByReferenceNode#class#ff AS I OUTPUT I.<1>, I.<0>
58162 ~4% {3} r8 = JOIN r7 WITH Instruction::IndexedInstruction#ff AS R ON FIRST 1 OUTPUT R.<1>, r7.<1>, r7.<0>
4026581120 ~0% {4} r9 = JOIN r8 WITH Instruction::CallInstruction::getPositionalArgumentOperand_dispred#fff_102#join_rhs AS R ON FIRST 1 OUTPUT r8.<2>, R.<1>, r8.<1>, R.<2>
31154 ~4% {2} r10 = JOIN r9 WITH Instruction::SideEffectInstruction::getPrimaryInstruction_dispred#3#ff AS R ON FIRST 2 OUTPUT r9.<3>, r9.<2>
31154 ~8% {2} r11 = JOIN r10 WITH Operand::Operand::getDef_dispred#3#ff AS R ON FIRST 1 OUTPUT R.<1>, r10.<1>
31154 ~0% {2} r12 = JOIN r11 WITH Instruction::Instruction::getUnconvertedResultExpression_dispred#ff AS R ON FIRST 1 OUTPUT R.<1>, r11.<1>
58004 ~0% {2} r13 = r6 \/ r12
return r13
Tuple counts after:
(0s) Tuple counts for DataFlowUtil::definitionByReferenceNodeFromArgument#ff:
385785 ~6% {2} r1 = SCAN DataFlowUtil::DefinitionByReferenceNode#class#ff AS I OUTPUT I.<1>, I.<0>
385785 ~0% {3} r2 = JOIN r1 WITH Instruction::IndexedInstruction#ff AS R ON FIRST 1 OUTPUT r1.<0>, r1.<1>, R.<1>
385785 ~1% {3} r3 = JOIN r2 WITH Instruction::SideEffectInstruction::getPrimaryInstruction_dispred#3#ff AS R ON FIRST 1 OUTPUT R.<1>, r2.<2>, r2.<1>
198736 ~4% {2} r4 = JOIN r3 WITH Instruction::CallInstruction::getPositionalArgument#fff AS R ON FIRST 2 OUTPUT R.<2>, r3.<2>
198736 ~0% {2} r5 = JOIN r4 WITH Instruction::Instruction::getUnconvertedResultExpression_dispred#ff AS R ON FIRST 1 OUTPUT R.<1>, r4.<1>
385785 ~1% {3} r6 = SCAN DataFlowUtil::DefinitionByReferenceNode#class#ff AS I OUTPUT I.<1>, -1, I.<0>
186891 ~1% {2} r7 = JOIN r6 WITH Instruction::IndexedInstruction#ff AS R ON FIRST 2 OUTPUT r6.<0>, r6.<2>
186891 ~2% {2} r8 = JOIN r7 WITH Instruction::SideEffectInstruction::getPrimaryInstruction_dispred#3#ff AS R ON FIRST 1 OUTPUT R.<1>, r7.<1>
183201 ~3% {2} r9 = JOIN r8 WITH Instruction::CallInstruction::getThisArgumentOperand_dispred#ff AS R ON FIRST 1 OUTPUT R.<1>, r8.<1>
183201 ~0% {2} r10 = JOIN r9 WITH Operand::Operand::getDef_dispred#3#ff AS R ON FIRST 1 OUTPUT R.<1>, r9.<1>
175449 ~8% {2} r11 = JOIN r10 WITH Instruction::Instruction::getUnconvertedResultExpression_dispred#ff AS R ON FIRST 1 OUTPUT R.<1>, r10.<1>
374185 ~3% {2} r12 = r5 \/ r11
return r12
In `getChild(int childIndex)`, the actual values of `childIndex` don't matter, as long as they are in the correct order. Rather than doing complicated math to compute the indices for the synthesized `.getFullyConverted()` children, just use the `rank` aggregate to order all children first by whether or not the child is a conversion, then by the original child index.