In https://github.com/github/codeql/pull/8641, `localFlowExit` was
changed to use `Stage2::readStepCand` instead of `read`, which means
that the big-step relation is broken up less. This causes test result
changes. Nothing is lost from the `select` clause, but some results may
have fewer paths, and fewer nodes and edges are output in the test
results.
In the `subpaths` section, the last node is now printed without its type
if it is the sink of the path.
This comes from the commit "Dataflow: Bugfix: include subpaths ending at
a sink. " in https://github.com/github/codeql/pull/7526
The TRAP writer already buffers writes before emitting to file, but running gzip compression is also fairly costly (especially if you only do it a couple of bytes at a time). Thus, this injects another buffer that collects the emitted tuples in string form, and only triggers gzip compression once the buffer is full. In my local testing, this buffering was actually more beneficial than the one between gzip and file (likely because the gzip writer already emits data in chunks), but that one is still beneficial.
This query already treats structs differently to usual: it includes field -> whole struct taint steps, but explicitly excludes struct -> field steps. This means that a logging framework sinking an entire struct with a tainted field yields an alert, but we don't get FPs caused by writing field `x` but then reading field `y`.
However, protobuf messages have a special treatment, with taint usually associated with the whole struct and getter methods propagating that taint out. Suppressing these getter method steps specifically for the cleartext-logging query mirrors its treatment of structs in general and avoids this sort of field-mismatch FP.
On the downside we will miss same-field propagation like `m.field = password; Log(m.GetField())` if we don't have source code for the implementation of `m`. However this is hopefully unusual since the typical use of protobufs is to serialize and deserialize, rather than using the struct as a general-purpose datastructure.
Switch the successfully extracted files query to the `location, message` results format so that we get rich location information when exporting the results of this query to SARIF. Previously the query used the `message` results format, which meant the interpreted results lacked a location.