Technically not always true, but my assumption is that +90% of the time
that's what it will be used for, so while we could be more precise by
adding a taint-step from the `input` part of the construction, I'm not
sure it's worth it in this case.
Furthermore, doing so would break with the current way we model
threat-model sources, and how sources are generally modeled in JS... so
for a very pretty setup it would require changing all the other `file`
threat-model sources to start at the constructors such as
`fs.createReadStream()` and have taint-propagation steps towards the
actual use (like we do in Python)...
I couldn't see an easy path forwards for doing this while keeping the
Concepts integration, so I opted for the simpler solution here.
You could argue that proper modeling be done in the same way as
`NodeJSFileSystemAccessRead` is done for the callback based `fs` API (in
NodeJSLib.qll). However, that work is straying from the core goals I'm
working towards right now, so I'll argue that "perfect is the enemy of
good", and leave this as is for now.
This is a bit of a bandaid to cover issues with the push() method on next/router being
treated as an array push, which causes it to flow into other taint sources.
With the capture library we sometimes bails out of handling certain functions for scalability reasons.
This means we have a notion of "captured but imprecisely-tracked" variables and 'this'. In these cases we go back to propagating flow from a post-update node to the local source.
Such that we can reuse the existing modeling, but have it globally
applied as a threat-model as well.
I Basically just moved the modeling. One important aspect is that this
changes is that the previously query-specific `argsParseStep` is now a
globally applied taint-step. This seems reasonable, if someone applied
the argument parsing to any user-controlled string, it seems correct to
propagate that taint for _any_ query.
Integration with RemoteFlowSource is not straightforward, so postponing
that for later
Naming in other languages:
- `SourceNode` (for QL only modeling)
- `ThreatModelFlowSource` (for active sources from QL or data-extensions)
However, since we use `LocalSourceNode` in Python, and `SourceNode` in
JS (for local source nodes), it seems a bit confusing to follow the same
naming convention as other languages, and instead I came up with new names.
We need these to return the dominator instead of declaring it in the parameter list, so that we can use it directly to fulfill part of the signature for the SSA library.
We can't rewrite it with an inline predicate since the SSA module calls with a transitive closure '*', which does not permit inline predicates.