The TRAP writer already buffers writes before emitting to file, but running gzip compression is also fairly costly (especially if you only do it a couple of bytes at a time). Thus, this injects another buffer that collects the emitted tuples in string form, and only triggers gzip compression once the buffer is full. In my local testing, this buffering was actually more beneficial than the one between gzip and file (likely because the gzip writer already emits data in chunks), but that one is still beneficial.
This query already treats structs differently to usual: it includes field -> whole struct taint steps, but explicitly excludes struct -> field steps. This means that a logging framework sinking an entire struct with a tainted field yields an alert, but we don't get FPs caused by writing field `x` but then reading field `y`.
However, protobuf messages have a special treatment, with taint usually associated with the whole struct and getter methods propagating that taint out. Suppressing these getter method steps specifically for the cleartext-logging query mirrors its treatment of structs in general and avoids this sort of field-mismatch FP.
On the downside we will miss same-field propagation like `m.field = password; Log(m.GetField())` if we don't have source code for the implementation of `m`. However this is hopefully unusual since the typical use of protobufs is to serialize and deserialize, rather than using the struct as a general-purpose datastructure.
Switch the successfully extracted files query to the `location, message` results format so that we get rich location information when exporting the results of this query to SARIF. Previously the query used the `message` results format, which meant the interpreted results lacked a location.
Tag the successfully extracted files queries with
`successfully-extracted-files` to make them easier to identify
programmatically in a language-independent way.
This follows the prior art for lines of code queries, which are tagged
`lines-of-code`.