ObjectId is a sanitizer used to sanitize strings into valid MongoDB ids. During research we've found that this method is used.
ObjectId returns a string representing an id. If at any time ObjectId can't parse it's input (like when a tainted dict in passed in), then ObjectId will throw an error preventing the query from running.
This ensures that changes to `API::Node` does not invalidate the cached
`module Impl`. At present, I don't expect this to have any effect (as
the `Node` class is also fairly static, though not explicitly cached),
but I can imagine us making some of the `Node` methods have
user-extensible behaviour, in which case we definitely do not want this
to result in reevaluation of `API::Impl`.
This is basically just a port of the C++/JS queries added in:
- https://github.com/github/codeql/pull/5414 (C++)
- https://github.com/github/codeql/pull/5656 (JS)
SyntaxError should capture all errors we have information about. At least in
`python/ql/src/semmlecode.python.dbscheme` the only match for `error` is
`py_syntax_error_versioned` (which `SyntaxError` is based on).
I introduced a InternalTypeTracking module, since the type-tracking code got so
verbose, that it was impossible to get an overview of the relevant predicates.
(this means the "first" type-tracking predicate that is usually private, cannot
be marked private anymore, since it needs to be exposed in the private module.
I considered using `getInput` like in JS, but things like signature verification
has multiple inputs (message and signature).
Using getAnInput also aligns better with Decoding/Encoding.
Adds a few unbinds to prevent bad joins from occurring.
Firstly, we never want to join `StepSummary::step` with
`TypeTracker::append` on `summary` as the first join, as the resulting
relation is absolutely massive. So we decouple the two occurrences of
`summary` by unbinding each of them.
Secondly, in some cases the node we're stepping to (`nodeTo` for type
trackers, `nodeFrom` for type backtrackers) will get joined eagerly
with the typetracker one is defining, and again this produces an
uncomfortably large intermediate join. A bit of unbinding prevents this
as well.
This requires a bit of explanation, so strap in.
Firstly, because we use `LocalSourceNode`s as the start and end points
of our `StepSummary::step` relation, there's no need to include
`simpleLocalFlowStep` (via `typePreservingStep`) in `smallstep`. Indeed,
since the successor node for a `step` is a `LocalSourceNode`, and local
sources never have incoming flow, this is entirely futile -- we can find
values for `mid` and `nodeTo` that satisfy the body of `step`, but
`nodeTo` will never be a `LocalSourceNode`.
With this in mind, we can simplify `smallstep` to only refer to
`jumpStep`.
This then brings the other uses of `typePreservingStep` into question.
The only other place we use this predicate is in the `TypeTracker` and
`TypeBackTracker` `smallstep` predicates. Note, however, that here we
no longer need `jumpStep` to be part of `typeTrackingStep` (as it is
already accounted for in `StepSummary::smallstep`) so we can simplify
to `simpleLocalFlowStep`. At this point, `typePreservingStep` is unused.
Finally, because of the way `smallstep` is used in `step` (inside
`StepSummary`), `nodeTo` must always be a `LocalSourceNode`, so I have
propagated this restriction to `smallstep` as well. We can always lift
this restriction later, but for now it seems like it's likely to cause
fewer surprises to have made this explicit.
Initially I had called `nameIndicatesSensitiveData` for `maybeSensitiveName`,
which made the relationship with `maybeSensitive` and `notSensitive` quite
strange -- and therefore I added the more informative `maybeSensitiveRegexp` and
`notSensitiveRegexp`.
Although I'm no longer using `maybeSensitiveName`, and I no longer have a strong
argument for making this name change, I still like it. If someone thinks this is
a terrible idea, I'm happy to change it though 👍
Perhaps unsurprisingly, the join orderer was eager and willing to find
the wrong join order in this predicate as well. Applying a similar
fix to the one used in `TypeTracker::step` fixes the problem.
This is merely making explicit what was implicitly enforced. The move
to change the return type of `step` already meant that `this` and
`result` had to be `LocalSourceNode`. By moving these methods to their
rightful place, we should hopefully avoid a bit of suprising behaviour.
This caused some suprising test changes, where suddenly we had flow from
a `ModuleVariableNode` (as a `RemoteFlowSource`) to a sink. This of
course makes little sense, so instead we simply exclude these nodes as
uses in the first place.
The somewhat convoluted `comes_from_cfgnode` was originally introduced
in order to have local sources for instances of global variables. This
was needed because global variables have an implicit "scope entry" SSA
definition that flows to the first actual use of the variable (and so
would not fit the strict "has no incoming flow" definition of a local
source node).
However, a subsequent change means that we include all global variable
reads anyway, and so the old definition is no longer needed.
(See commit 3fafb47b16 for further
context.)
This commit does a lot of stuff all at once, so here are the main
highlights:
In `TypeTracker.qll`, we change `StepSummary::step` to step only between
source nodes. Because reads and writes of global variables happen in two
different (jump) steps, this requires the intermediate
`ModuleVariableNode` to _also_ be a `LocalSourceNode`, and we therefore
modify the charpred for that class accordingly. (This also means
changing a few of the tests to account for these new source nodes.)
In addition, we change `TypeTracker::step` to likewise step between
local source nodes.
Next, to enable the use of the `track` convenience method on nodes, we
add some pragmas to `TypeTracker::step` that prevent bad joins from
occurring. With this, we can eliminate all of the manual type tracker
join predicates.
Next, we observe that because `StepSummary::step` now uses `flowsTo`, it
automatically encapsulates all local-flow steps. In particular this
means we do not have to use `typePreservingStep` in `smallstep`, but can
use `jumpStep` directly. A similar observation applies to
`TypeTracker::smallstep`.
Having done this, we no longer need `typePreservingStep`, so we get rid
of it.