mirror of
https://github.com/github/codeql.git
synced 2026-04-26 09:15:12 +02:00
JS: Type tracking tutorial
This commit is contained in:
@@ -393,6 +393,6 @@ string may be an absolute path and whether it may contain ``..`` components.
|
||||
What next?
|
||||
----------
|
||||
|
||||
- Learn about the QL standard libraries used to write queries for JavaScript in :doc:`Introducing the Javacript libraries <introduce-libraries-js>`.
|
||||
- Learn about the QL standard libraries used to write queries for JavaScript in :doc:`Introducing the JavaScript libraries <introduce-libraries-js>`.
|
||||
- Find out more about QL in the `QL language handbook <https://help.semmle.com/QL/ql-handbook/index.html>`__ and `QL language specification <https://help.semmle.com/QL/ql-spec/language.html>`__.
|
||||
- Learn more about the query console in `Using the query console <https://lgtm.com/help/lgtm/using-query-console>`__.
|
||||
|
||||
@@ -9,6 +9,7 @@ QL for JavaScript
|
||||
introduce-libraries-ts
|
||||
dataflow
|
||||
flow-labels
|
||||
type-tracking
|
||||
ast-class-reference
|
||||
dataflow-cheat-sheet
|
||||
|
||||
|
||||
509
docs/language/learn-ql/javascript/type-tracking.rst
Normal file
509
docs/language/learn-ql/javascript/type-tracking.rst
Normal file
@@ -0,0 +1,509 @@
|
||||
Tutorial: API modelling with type tracking
|
||||
==========================================
|
||||
|
||||
This tutorial demonstrates how to build a simple model of the Firebase API in QL
|
||||
using the JavaScript type tracking library.
|
||||
|
||||
The type tracking library makes it possible to track values through properties and function calls,
|
||||
usually to recognize method calls and properties accessed on a specific type of object.
|
||||
This can act as a substitute for static type information, so for TypeScript analysis, it may be easier to use the
|
||||
`static type system <https://help.semmle.com/QL/learn-ql/javascript/introduce-libraries-ts.html#static-type-information>`__.
|
||||
In this article we'll be working with plain untyped JavaScript.
|
||||
|
||||
This is an advanced topic and is intended for readers already familiar with the
|
||||
`SourceNode <https://help.semmle.com/QL/learn-ql/javascript/dataflow.html#source-nodes>`__ class as well as
|
||||
`taint tracking <https://help.semmle.com/QL/learn-ql/javascript/dataflow.html#using-global-taint-tracking>`__.
|
||||
|
||||
|
||||
The problem of recognizing method calls
|
||||
---------------------------------------
|
||||
|
||||
We'll start with a simple model of the Firebase API and gradually build on it to use type tracking.
|
||||
Knowledge of Firebase is not required.
|
||||
|
||||
Suppose we wish to find places where data is written to a Firebase database, as
|
||||
in the following example:
|
||||
|
||||
.. code-block:: javascript
|
||||
|
||||
var ref = firebase.database().ref("forecast");
|
||||
ref.set("Rain"); // <-- find this call
|
||||
|
||||
A simple way to do this is just to find
|
||||
all method calls named "``set``":
|
||||
|
||||
.. code-block:: ql
|
||||
|
||||
import javascript
|
||||
import DataFlow
|
||||
|
||||
MethodCallNode firebaseSetterCall() {
|
||||
result.getMethodName() = "set"
|
||||
}
|
||||
|
||||
The obvious problem with this is that it finds calls to *all* methods named ``set``,
|
||||
many of which are unrelated to Firebase.
|
||||
|
||||
Another approach is to use local data flow to match the chain of calls that led to this call:
|
||||
|
||||
.. code-block:: ql
|
||||
|
||||
MethodCallNode firebaseSetterCall() {
|
||||
result = globalVarRef("firebase")
|
||||
.getAMethodCall("database")
|
||||
.getAMethodCall("ref")
|
||||
.getAMethodCall("set")
|
||||
}
|
||||
|
||||
This will find the ``set`` call from the example, but no spurious, unrelated ``set`` method calls.
|
||||
We can split it up so each step is its own predicate:
|
||||
|
||||
.. code-block:: ql
|
||||
|
||||
SourceNode firebase() {
|
||||
result = globalVarRef("firebase")
|
||||
}
|
||||
|
||||
SourceNode firebaseDatabase() {
|
||||
result = firebase().getAMethodCall("database")
|
||||
}
|
||||
|
||||
SourceNode firebaseRef() {
|
||||
result = firebaseDatabase().getAMethodCall("ref");
|
||||
}
|
||||
|
||||
MethodCallNode firebaseSetterCall() {
|
||||
result = firebaseRef().getAMethodCall("set")
|
||||
}
|
||||
|
||||
The code above is equivalent to the previous version,
|
||||
but it's easier to tinker with the individual steps.
|
||||
|
||||
The downside is that the model relies entirely on local data flow,
|
||||
which means it won't look through properties and function calls.
|
||||
For instance, ``firebaseSetterCall()`` fails to find anything in this example:
|
||||
|
||||
.. code-block:: javascript
|
||||
|
||||
function getDatabase() {
|
||||
return firebase.database();
|
||||
}
|
||||
var ref = getDatabase().ref("forecast");
|
||||
ref.set("Rain");
|
||||
|
||||
Notice that the QL predicate ``firebaseDatabase()`` still finds the call to ``firebase.database()``,
|
||||
but not the ``getDatabase()`` call.
|
||||
This means ``firebaseRef()`` has no result, which in turn means ``firebaseSetterCall()`` has no result.
|
||||
|
||||
As a simple remedy, let's try to make ``firebaseDatabase()`` recognize the ``getDatabase()`` call:
|
||||
|
||||
.. code-block:: ql
|
||||
|
||||
SourceNode firebaseDatabase() {
|
||||
result = firebase().getAMethodCall("database")
|
||||
or
|
||||
result.(CallNode).getACallee().getAReturn().getALocalSource() = firebaseDatabase()
|
||||
}
|
||||
|
||||
The second clause ensures ``firebaseDatabase()`` finds not only ``firebase.database()`` calls,
|
||||
but also calls to functions that *return* ``firebase.database()``, such as ``getDatabase()`` seen above.
|
||||
It's recursive, so it handles flow out of any number of nested function calls.
|
||||
|
||||
However, it still only tracks *out* of functions, not *into* functions through parameters, nor through properties.
|
||||
Instead of adding these steps by hand, we'll use type tracking.
|
||||
|
||||
Type tracking in general
|
||||
------------------------
|
||||
|
||||
Type tracking is a generalization of the above pattern, where a predicate matches the value to track,
|
||||
and has a recursive clause that tracks the flow of that value.
|
||||
But instead of us having to deal with function calls/returns and property reads/writes,
|
||||
all of these steps are included in a single predicate,
|
||||
`SourceNode.track <https://help.semmle.com/qldoc/javascript/semmle/javascript/dataflow/Sources.qll/predicate.Sources$SourceNode$track.2.html>`__,
|
||||
to be used with the companion class
|
||||
`TypeTracker <https://help.semmle.com/qldoc/javascript/semmle/javascript/dataflow/TypeTracking.qll/type.TypeTracking$TypeTracker.html>`__.
|
||||
|
||||
Predicates that use type tracking usually conform to the following general pattern (explanation follows below):
|
||||
|
||||
.. code-block:: ql
|
||||
|
||||
SourceNode myType(TypeTracker t) {
|
||||
t.start() and
|
||||
result = /* value to track */
|
||||
or
|
||||
exists(TypeTracker t2 |
|
||||
result = myType(t2).track(t2, t)
|
||||
)
|
||||
}
|
||||
|
||||
SourceNode myType() {
|
||||
result = myType(TypeTracker::end())
|
||||
}
|
||||
|
||||
We'll apply the pattern to our example model and use that to explain what's going on.
|
||||
|
||||
|
||||
Tracking the database instance
|
||||
------------------------------
|
||||
|
||||
Applying the above pattern to the ``firebaseDatabase()`` predicate we get the following:
|
||||
|
||||
.. code-block:: ql
|
||||
|
||||
SourceNode firebaseDatabase(TypeTracker t) {
|
||||
t.start() and
|
||||
result = firebase().getAMethodCall("database")
|
||||
or
|
||||
exists(TypeTracker t2 |
|
||||
result = firebaseDatabase(t2).track(t2, t)
|
||||
)
|
||||
}
|
||||
|
||||
SourceNode firebaseDatabase() {
|
||||
result = firebaseDatabase(TypeTracker::end())
|
||||
}
|
||||
|
||||
There are now two predicates named ``firebaseDatabase``.
|
||||
The one with the ``TypeTracker`` parameter is the one actually doing the global data flow tracking
|
||||
-- the other predicate exposes the result in a convenient way.
|
||||
|
||||
The new ``TypeTracker t`` parameter is a summary of the steps needed to track the value of interest to the resulting data flow node.
|
||||
|
||||
In the base case, when matching ``firebase.database()``, we use ``t.start()`` to indicate that no steps were needed, that is,
|
||||
this is the starting point of type tracking:
|
||||
|
||||
.. code-block:: ql
|
||||
|
||||
t.start() and
|
||||
result = firebase().getAMethodCall("database")
|
||||
|
||||
In the recursive case, we apply the ``track`` predicate on a previously-found firebase database node, such as ``firebase.database()``.
|
||||
The ``track`` predicate maps this to a successor of that node, such as ``getDatabase()``, and
|
||||
binds ``t`` to the continuation of ``t2`` with this extra step included:
|
||||
|
||||
.. code-block:: ql
|
||||
|
||||
exists(TypeTracker t2 |
|
||||
result = firebaseDatabase(t2).track(t2, t)
|
||||
)
|
||||
|
||||
To understand the role of ``t`` here, note that type tracking can step *into* a property, which means
|
||||
the data flow node returned from ``track`` is not necessarily a firebase database instance, it could be
|
||||
an object *containing* a firebase database in one of its properties.
|
||||
|
||||
For example, in the program below, the ``firebaseDatabase(t)`` predicate includes the ``obj`` node in its result,
|
||||
but with ``t`` recording the fact that the actual value being tracked is inside the ``DB`` property:
|
||||
|
||||
.. code-block:: javascript
|
||||
|
||||
let obj = { DB: firebase.database() };
|
||||
let db = obj.DB;
|
||||
|
||||
This brings us to the last predicate. This uses ``TypeTracker::end()`` to filter out
|
||||
the paths where the firebase database instance ended up inside a property of another object,
|
||||
so it includes ``db`` but not ``obj``:
|
||||
|
||||
.. code-block:: ql
|
||||
|
||||
SourceNode firebaseDatabase() {
|
||||
result = firebaseDatabase(TypeTracker::end())
|
||||
}
|
||||
|
||||
Here's see an example of what this can handle now:
|
||||
|
||||
.. code-block:: javascript
|
||||
|
||||
class Firebase {
|
||||
constructor() {
|
||||
this.db = firebase.database();
|
||||
}
|
||||
|
||||
getDatabase() { return this.db; }
|
||||
|
||||
setForecast(value) {
|
||||
this.getDatabase().ref("forecast").set(value); // found by firebaseSetterCall()
|
||||
}
|
||||
}
|
||||
|
||||
Tracking in the whole model
|
||||
---------------------------
|
||||
We applied this pattern to ``firebaseDatabase()`` in the previous section, and it
|
||||
can just as easily apply to the other predicates.
|
||||
For reference, here's our simple Firebase model with type tracking on every predicate:
|
||||
|
||||
.. code-block:: ql
|
||||
|
||||
SourceNode firebase(TypeTracker t) {
|
||||
t.start() and
|
||||
result = globalVarRef("firebase")
|
||||
or
|
||||
exists(TypeTracker t2 |
|
||||
result = firebase(t2).track(t2, t)
|
||||
)
|
||||
}
|
||||
|
||||
SourceNode firebase() {
|
||||
result = firebase(TypeTracker::end())
|
||||
}
|
||||
|
||||
SourceNode firebaseDatabase(TypeTracker t) {
|
||||
t.start() and
|
||||
result = firebase().getAMethodCall("database")
|
||||
or
|
||||
exists(TypeTracker t2 |
|
||||
result = firebaseDatabase(t2).track(t2, t)
|
||||
)
|
||||
}
|
||||
|
||||
SourceNode firebaseDatabase() {
|
||||
result = firebaseDatabase(TypeTracker::end())
|
||||
}
|
||||
|
||||
SourceNode firebaseRef(TypeTracker t) {
|
||||
t.start() and
|
||||
result = firebaseDatabase().getAMethodCall("ref")
|
||||
or
|
||||
exists(TypeTracker t2 |
|
||||
result = firebaseRef(t2).track(t2, t)
|
||||
)
|
||||
}
|
||||
|
||||
SourceNode firebaseRef() {
|
||||
result = firebaseRef(TypeTracker::end())
|
||||
}
|
||||
|
||||
MethodCallNode firebaseSetterCall() {
|
||||
result = firebaseRef().getAMethodCall("set")
|
||||
}
|
||||
|
||||
`Here <https://lgtm.com/query/1053770500827789481>`__ is a run of an example query using the model on one of the Firebase sample projects.
|
||||
It's been modified slightly to handle a bit more of the API, which is out of scope of this tutorial.
|
||||
|
||||
Tracking associated data
|
||||
------------------------
|
||||
|
||||
By adding extra parameters to the type tracking predicate we can carry along
|
||||
extra bits of information about the result.
|
||||
|
||||
For example, here's a type tracking version of ``firebaseRef()``, which
|
||||
tracks the string that was passed to the ``ref`` call:
|
||||
|
||||
.. code-block:: ql
|
||||
|
||||
SourceNode firebaseRef(string name, TypeTracker t) {
|
||||
t.start() and
|
||||
exists(CallNode call |
|
||||
call = firebaseDatabase().getAMethodCall("ref") and
|
||||
name = call.getArgument(0).getStringValue() and
|
||||
result = call
|
||||
)
|
||||
or
|
||||
exists(TypeTracker t2 |
|
||||
result = firebaseRef(name, t2).track(t2, t)
|
||||
)
|
||||
}
|
||||
|
||||
SourceNode firebaseRef(string name) {
|
||||
result = firebaseRef(name, TypeTracker::end())
|
||||
}
|
||||
|
||||
MethodCallNode firebaseSetterCall(string refName) {
|
||||
result = firebaseRef(refName).getAMethodCall("set")
|
||||
}
|
||||
|
||||
So now we can use ``firebaseSetterCall("forecast")`` to find assignments to the forecast.
|
||||
|
||||
Back-tracking callbacks
|
||||
-----------------------
|
||||
|
||||
The type tracking predicates we've seen above all use *forward* tracking.
|
||||
That is, they all start with some value of interest and ask "where does this flow?".
|
||||
|
||||
Sometimes it's more useful to work backwards, starting at the desired end-point and asking "what flows to here?".
|
||||
|
||||
As a motivating example, we'll extend our model to look for places where we *read* a value
|
||||
from the database, as opposed to writing it.
|
||||
Reading is an asynchronous operation and the result is obtained through a callback, for example:
|
||||
|
||||
.. code-block:: javascript
|
||||
|
||||
function fetchForecast(callback) {
|
||||
firebase.database().ref("forecast").once("value", callback);
|
||||
}
|
||||
|
||||
function updateReminders() {
|
||||
fetchForecast((snapshot) => {
|
||||
let forecast = snapshot.val(); // <-- find this call
|
||||
addReminder(forecast === "Rain" ? "Umbrella" : "Sunscreen");
|
||||
})
|
||||
}
|
||||
|
||||
The actual forecast is obtained by the call to ``snapshot.val()``.
|
||||
|
||||
Looking for all method calls named ``val`` will in practice find many unrelated methods,
|
||||
so we'll use type tracking again in order to take the receiver type into account.
|
||||
|
||||
The receiver ``snapshot`` is a parameter to a callback function, which ultimately escapes
|
||||
into the ``once()`` call. We'll extend our model from above to use back-tracking to find
|
||||
all functions that flow into the ``once()`` call. Type tracking backwards is not much
|
||||
different from forwards; the differences are:
|
||||
|
||||
- The ``TypeTracker`` parameter instead has type ``TypeBackTracker``.
|
||||
- The call to ``.track()`` is instead a call to ``.backtrack()``
|
||||
- To ensure the initial value is a source node, a call to ``getALocalSource()`` is usually required.
|
||||
|
||||
.. code-block:: ql
|
||||
|
||||
SourceNode firebaseSnapshotCallback(string refName, TypeBackTracker t) {
|
||||
t.start() and
|
||||
result = firebaseRef(refName).getAMethodCall("once").getArgument(1).getALocalSource()
|
||||
or
|
||||
exists(TypeBackTracker t2 |
|
||||
result = firebaseSnapshotCallback(refName, t2).backtrack(t2, t)
|
||||
)
|
||||
}
|
||||
|
||||
FunctionNode firebaseSnapshotCallback(string refName) {
|
||||
result = firebaseSnapshotCallback(refName, TypeBackTracker::end())
|
||||
}
|
||||
|
||||
Now, ``firebaseSnapshotCallback("forecast")`` finds the function being passed to ``fetchForecast``.
|
||||
Based on that we can track the ``snapshot`` value and find the ``val()`` call itself:
|
||||
|
||||
.. code-block:: ql
|
||||
|
||||
SourceNode firebaseSnapshot(string refName, TypeTracker t) {
|
||||
t.start() and
|
||||
result = firebaseSnapshotCallback(refName).getParameter(0)
|
||||
or
|
||||
exists(TypeTracker t2 |
|
||||
result = firebaseSnapshot(refName, t2).track(t2, t)
|
||||
)
|
||||
}
|
||||
|
||||
SourceNode firebaseSnapshot(string refName) {
|
||||
result = firebaseSnapshot(refName, TypeTracker::end())
|
||||
}
|
||||
|
||||
MethodCallNode firebaseDatabaseRead(string refName) {
|
||||
result = firebaseSnapshot(refName).getAMethodCall("val")
|
||||
}
|
||||
|
||||
With this addition, ``firebaseDatabaseRead("forecast")`` finds the call to ``snapshot.val()`` which contains the value of the forecast.
|
||||
|
||||
`Here <https://lgtm.com/query/8761360814276109092>`__ is a run of an example query using the model.
|
||||
|
||||
Summary
|
||||
-------
|
||||
|
||||
This covers the use of the type tracking library. To recap, use this template to define forward type tracking predicates:
|
||||
|
||||
.. code-block:: ql
|
||||
|
||||
SourceNode myType(TypeTracker t) {
|
||||
t.start() and
|
||||
result = /* value to track */
|
||||
or
|
||||
exists(TypeTracker t2 |
|
||||
result = myType(t2).track(t2, t)
|
||||
)
|
||||
}
|
||||
|
||||
SourceNode myType() {
|
||||
result = myType(TypeTracker::end())
|
||||
}
|
||||
|
||||
Use this template to define backward type tracking predicates:
|
||||
|
||||
.. code-block:: ql
|
||||
|
||||
SourceNode myType(TypeBackTracker t) {
|
||||
t.start() and
|
||||
result = (/* argument to track */).getALocalSource()
|
||||
or
|
||||
exists(TypeBackTracker t2 |
|
||||
result = myType(t2).backtrack(t2, t)
|
||||
)
|
||||
}
|
||||
|
||||
SourceNode myType() {
|
||||
result = myType(TypeBackTracker::end())
|
||||
}
|
||||
|
||||
Limitations
|
||||
-----------
|
||||
|
||||
As mentioned, type tracking will track values in and out of function calls and properties,
|
||||
but only within some limits.
|
||||
|
||||
Type tracking does not always track *through* functions, that is, if a value flows into a parameter
|
||||
and back out of the return value, it might not be tracked back out to the call site again.
|
||||
Here's an example that the model from this tutorial won't find:
|
||||
|
||||
.. code-block:: javascript
|
||||
|
||||
function wrapDB(database) {
|
||||
return { db: database }
|
||||
}
|
||||
let wrapper = wrapDB(firebase.database())
|
||||
wrapper.db.ref("forecast"); // <-- not found
|
||||
|
||||
This is an example of where `data flow configurations <https://help.semmle.com/QL/learn-ql/javascript/dataflow.html#global-data-flow>`__ are more powerful.
|
||||
|
||||
When to use type tracking
|
||||
-------------------------
|
||||
|
||||
Type tracking and data flow configurations are essentally competing solutions to the same
|
||||
problem, each with their own tradeoffs.
|
||||
|
||||
Type tracking can be used in any number of predicates, which may depend on each other
|
||||
in fairly unrestricted ways. The result of one predicate may be the starting
|
||||
point for another. Type tracking predicates may be mutually recursive.
|
||||
Type tracking predicates can have any number of extra parameters, making it possible, but optional,
|
||||
to construct source/sink pairs. Omitting source/sink pairs can be useful when there is a huge number
|
||||
of sources and the sinks are not known to the library model.
|
||||
|
||||
Data flow configurations have more restricted dependencies but are more powerful in other ways.
|
||||
For performance reasons,
|
||||
the sources, sinks, and steps of a configuration should not depend on whether a flow path has been found using
|
||||
that configuration or any other configuration.
|
||||
In that sense, the sources, sinks, and steps must be configured "up front" and can't be discovered on-the-fly.
|
||||
The upside is that they track flow through functions and callbacks in some ways that type tracking doesn't,
|
||||
which is particularly important for security queries.
|
||||
Also, path queries can only be defined using data flow configurations.
|
||||
|
||||
Prefer type tracking when:
|
||||
|
||||
- Disambiguating generically named methods or properties.
|
||||
- Making reusable library components to be shared between queries.
|
||||
- The set of source/sink pairs is too large to compute or has insufficient information.
|
||||
- The information is needed as input to a data flow configuration.
|
||||
|
||||
Prefer data flow configurations when:
|
||||
|
||||
- Tracking user-controlled data -- use `taint tracking <https://help.semmle.com/QL/learn-ql/javascript/dataflow.html#using-global-taint-tracking>`__.
|
||||
- Differentiating between different kinds of user-controlled data -- use :doc:`flow labels <flow-labels>`.
|
||||
- Tracking transformations of a value through generic utility functions.
|
||||
- Tracking values through string manipulation.
|
||||
- Generating a path from source to sink -- see :doc:`constructing path queries <../writing-queries/path-queries>`.
|
||||
|
||||
Type tracking in the standard libraries
|
||||
---------------------------------------
|
||||
|
||||
Type tracking is used in a few places in the standard libraries:
|
||||
|
||||
- The `DOM <https://help.semmle.com/qldoc/javascript/semmle/javascript/DOM.qll/module.DOM$DOM.html>`__ predicates,
|
||||
`documentRef <https://help.semmle.com/qldoc/javascript/semmle/javascript/DOM.qll/predicate.DOM$DOM$documentRef.0.html>`__,
|
||||
`locationRef <https://help.semmle.com/qldoc/javascript/semmle/javascript/DOM.qll/predicate.DOM$DOM$locationRef.0.html>`__, and
|
||||
`domValueRef <https://help.semmle.com/qldoc/javascript/semmle/javascript/DOM.qll/predicate.DOM$DOM$domValueRef.0.html>`__,
|
||||
are implemented with type tracking.
|
||||
- The `HTTP <https://help.semmle.com/qldoc/javascript/semmle/javascript/frameworks/HTTP.qll/module.HTTP$HTTP.html>`__ server models, such as `Express <https://help.semmle.com/qldoc/javascript/semmle/javascript/frameworks/Express.qll/module.Express$Express.html>`__, use type tracking to track the installation of router handler functions.
|
||||
- The `Firebase <https://help.semmle.com/qldoc/javascript/semmle/javascript/frameworks/Firebase.qll/module.Firebase$Firebase.html>`__ and
|
||||
`Socket.io <https://help.semmle.com/qldoc/javascript/semmle/javascript/frameworks/SocketIO.qll/module.SocketIO$SocketIO.html>`__ models use type tracking to track objects coming from their respective APIs.
|
||||
|
||||
What next?
|
||||
----------
|
||||
|
||||
- Find out more about QL in the `QL language handbook <https://help.semmle.com/QL/ql-handbook/index.html>`__ and `QL language specification <https://help.semmle.com/QL/ql-spec/language.html>`__.
|
||||
- Learn more about the query console in `Using the query console <https://lgtm.com/help/lgtm/using-query-console>`__.
|
||||
- Learn about writing precise data-flow analyses in :doc:`Advanced data-flow analysis using flow labels <flow-labels>`.
|
||||
Reference in New Issue
Block a user