This is because some git flags can specify arbitrary commands to execute, but its positional arguments cannot, and "--" like in many commands instructs git to consume no further flags.
Previously we considered certain fields of `tar` or `zip` file headers to be sources, but this meant subsequent references to the same field were not considered sanitized. For example, at least some real-world projects used a pattern like `if isIllegalPathTraversal(hdr.Name) { return nil; } ... /* other code using hdr.Name */`. By associating a source with the field-read `.Name` rather than the header itself, we were unable to see that the subsequent read was guarded by the sanitizer function.
Relatedly, it is common to use some intermediary taint-propagating function, as in `clean(s string) { if strings.HasPrefix("..", filepath.Clean(filepath.Join(target, s))) ...`, in the implementation of a sanitizer. We now follow the taint propagation (locally) backwards towards the function parameter, marking the predecessor functions and ultimately the parameter `s` as sanitized in addition to the direct argument to `strings.HasPrefix`. Existing sanitizing-function logic can then sometimes lift this out into the caller too.
We look across function-call boundaries to check there is some common enclosing loop, but false-positives are more likely if in practice there is no control-flow path from the archive iterator to the Symlink call and back.
This is usually dangerous because (if the archive is untrusted) the intent is usually to permit within-archive symlinks, e.g. dest/a/parent -> .. -> dest/a is an acceptable link to unpack. However if EvalSymlinks is not used to take already-unpacked symlinks into account, it becomes possible to sneak tricks like dest/escapes -> dest/a/parent/.. through, which create links leading out of the archive for later abuse.
For example, "x, _ := a, b" would produce an incorrect CSV that branched to the next statement after evaluating "b", skipping the assignment to 'x'. We already had test coverage for function returns, so I'm reasonably confident this only affects parallel assigns, not destructuring ones like "x, y := f()".
This is achieved by splitting the query into two pieces: (1) trace flow from indefinitely large object creation to len(...) calls, then (2) considering those particular len(...) calls as taint propagators, trace taint from the same sources all the way to an allocation call. This is more accurate than the previous solution, which considered any len(...) call to propagate taint, potentially confusing an array that stored a large value in one of its cells for an array which is itself of large size.
Inspired by js/useless-regexp-character-escape, but much much simpler because the Go source code parser forbids unrecognised escapes and its regex engine refuses to compile \\x where x is not a character class or other special token (e.g. start-of-word).
These can represent a username, method name or other non-sensitive component of an Authorization header. For greater precision we could split the query into one investigating Authorization headers and one investigating other sources of sensitive data that can't be sanitized by splitting this way.
Previously we only detected these if the marshalling directly fed the request body within the same function; now it's a general sanitiser for the purposes of XSS.
This creates a PostUpdateNode for x in the contexts `x.field[element]`, `x.field.otherfield`, `x[element].field` and so on.
Most uses of PostUpdateNode implicitly assume its old definition, but our protobuf model benefits.
Unlike the old ODASA consistency queries, new consistency queries can have expected results, so there is no need to have special handling of files with expected errors.
Previously, the "good" example still had the "BAD: " comment in it which was confusing.
This change updates the good example to have a "GOOD: " comment instead.