mirror of
https://github.com/github/codeql.git
synced 2025-12-16 16:53:25 +01:00
JS: Remove docs for the old portal-based flow summaries
@@ -1,227 +0,0 @@
Summary-based information flow analysis
=======================================

Overview
--------

This document presents an approach for running information flow analyses (such as the standard
security queries) on an application that depends on one or more npm packages. Instead of
installing the npm packages during the snapshot build and analyzing them together with application
code, we analyze each package in isolation and compute *flow summaries* that record information
about any sources, sinks and flow steps contributed by the package's API. These flow summaries
are then imported when building a snapshot of the application (usually in the form of CSV files
added as external data), and are picked up by the standard security queries, allowing them to reason
about flow into, out of and through the npm packages as though they had been included as part of the
build.

Note that flow summaries are an experimental technology, and not ready to be used in production
queries or libraries. Also note that flow summaries do not currently work with CodeQL, but require
the legacy Semmle Core toolchain.

Motivating example
------------------

Let us take the `mkdirp <https://www.npmjs.com/package/mkdirp>`_ package as an example. It exports
a function that takes as its first argument a file system path, and creates a folder with that
path, as well as any parent folders that do not exist yet. As further arguments, the function
accepts an optional configuration object and a callback to invoke once the folder has been
created.

An application might use this package as follows:

.. code-block:: js

  const mkdirp = require('mkdirp');
  // ...
  mkdirp(p, opts, function cb(err) {
    // ...
  });

If the value of ``p`` can be controlled by an untrusted user, this would allow them to create arbitrary
folders, which may not be desirable.

By analyzing the application code base together with the source code for the ``mkdirp`` package,
the default path injection analysis would be able to track taint through the call to ``mkdirp`` into its
implementation, which ultimately uses built-in Node.js file system APIs to create the folder. Since
the path injection analysis has built-in models of these APIs, it would then be able to spot and flag this
vulnerability.

However, analyzing ``mkdirp`` from scratch for every client application is wasteful. Moreover, it would
in this case be undesirable to flag the location inside ``mkdirp`` where the folder is actually created
as part of the alert: the developer of the client application did not write that code and hence will
have a hard time understanding why it is being flagged.

Both of these concerns can be addressed by treating the first argument to ``mkdirp`` as a path injection
sink in its own right: the analysis no longer needs to track flow into the implementation of ``mkdirp``,
so we would no longer need to include its source code in the analysis, and the alert would flag the call
to ``mkdirp`` in application code, not its implementation in library code.

The information that the first parameter of ``mkdirp`` is interpreted as a file system path and hence should
be considered a path injection sink is an example of a *flow summary*, or more precisely a *sink summary*.
Besides sink summaries, we also consider *source summaries* and *flow-step summaries*.

In general, a sink summary states that some API interface point (such as a function parameter) should
be considered a sink for a certain analysis, so if data from a known source reaches this point without
undergoing appropriate sanitization, it should be flagged with an alert. A sink summary may also
specify which taint kind the data needs to have in order for the sink to be problematic.

Conversely, a source summary identifies some API interface point (such as the return value of a function)
as a source of tainted data for a certain analysis, again optionally specifying a taint kind.

Finally, a flow-step summary records the fact that data that flows into the package at some point
may propagate to another point (for example, from a function parameter to its return value).
In this case, there are two relevant taint kinds, one describing the taint of the data that
enters, and one describing the taint of the data that emerges. In general, flow steps (like sources
and sinks) are analysis-specific, since we need to know about sanitizers.

In what follows we will first discuss how summaries are generated from a snapshot of an npm package,
and then how they are imported when analyzing client code. Finally, we will discuss the format in which
flow summaries are stored.

Note that flow summaries are considered an experimental feature at this point. Using them involves
some manual configuration, and we make no guarantee that the API will remain stable.

Generating summaries
--------------------

Flow summaries of an npm package can be generated by running special summary extraction queries
either on a snapshot of the package itself, or on a snapshot of a hand-written model of the
package. (Note that this requires a working installation of Semmle Core.)

There are three default summary extraction queries:

- Extract flow step summaries (``js/step-summary-extraction``,
  ``experimental/Summaries/ExtractFlowStepSummaries.ql``)
- Extract sink summaries (``js/sink-summary-extraction``,
  ``experimental/Summaries/ExtractSinkSummaries.ql``)
- Extract source summaries (``js/source-summary-extraction``,
  ``experimental/Summaries/ExtractSourceSummaries.ql``)

You can run these queries individually against a snapshot of the npm package you want to create
flow summaries for using ``odasa runQuery``, and store the output as CSV files named
``additional-steps.csv``, ``additional-sinks.csv`` and ``additional-sources.csv``, respectively.

For example, assuming that folder ``mkdirp-snapshot`` contains a snapshot of the ``mkdirp``
project, we can extract sink summaries using the command

.. code-block:: bash

  odasa runQuery \
    --query $SEMMLE_DIST/queries/semmlecode-javascript-queries/experimental/Summaries/ExtractSinkSummaries.ql \
    --output-file additional-sinks.csv --snapshot mkdirp-snapshot

Instead of generating summaries directly from the package source code, you can also generate
them from a hand-written model of the package. The model should contain a ``package.json`` file
giving the correct package name, and models for the relevant API entry points. The models are
plain JavaScript with special comments annotating certain expressions as sources or sinks.

For example, a model of ``mkdirp`` might look like this:

.. code-block:: js

  module.exports = function mkdirp(path) {
    path /* Semmle: sink: taint, TaintedPath */
  };

Annotation comments start with ``Semmle:``, and contain ``source`` and ``sink`` specifications.
Each such specification lists a flow label (in this case, ``taint``) and a configuration to which
the specification applies (in this case, ``TaintedPath``).

A source specification annotates an expression as being a source of flow with the given label
for the purposes of the given configuration, and similarly for sinks. Annotation comments apply to
any expression (and more generally any data flow node) whose source location ends on the line
where the comment starts.


Using summaries
---------------

Once you have created summaries using the approach outlined above, you have two options for
including them in the analysis of a client application.

External data
:::::::::::::

Firstly, you can include the CSV files generated by running the extraction queries as external
data when building a snapshot of the client application by copying them into the
``$snapshot/external/data`` folder. This is typically done by including a command like this
in your ``project`` file:

.. code-block:: xml

  <build>cp /path/to/additional-sinks.csv ${snapshot}/external/data</build>

If you want to include summaries for multiple libraries, you have to concatenate the
corresponding CSV files before copying them into the external data folder.

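The concatenation step could be scripted in Node.js along the following lines. This is only a
sketch: the helper names ``concatSummaries`` and ``mergeFiles`` and the file paths in the usage
comment are hypothetical, and it assumes that the CSV files contain no header row, which this
document does not specify.

.. code-block:: js

  const fs = require('fs');

  // Joins the contents of several summary CSV files into a single CSV text.
  // Assumption: the files have no header row, so plain concatenation suffices.
  function concatSummaries(contents) {
    const parts = contents
      .map((text) => text.trim())
      .filter((text) => text.length > 0);
    // End the result with a newline unless there is nothing to write.
    return parts.length > 0 ? parts.join('\n') + '\n' : '';
  }

  // Reads the given CSV files and writes their concatenation to outputPath.
  function mergeFiles(inputPaths, outputPath) {
    const contents = inputPaths.map((p) => fs.readFileSync(p, 'utf8'));
    fs.writeFileSync(outputPath, concatSummaries(contents));
  }

  // Hypothetical usage (paths are placeholders):
  // mergeFiles(['mkdirp/additional-sinks.csv', 'other/additional-sinks.csv'],
  //            'additional-sinks.csv');

The merged file can then be copied into the external data folder in place of the per-package files.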

Additionally, you need to import the library ``Security.Summaries.ImportFromCsv`` in your
``javascript.qll``, which will pick up the summaries from external data and interpret them
as additional sources, sinks and flow steps:

.. code-block:: ql

  import Security.Summaries.ImportFromCsv

After these preparatory steps, you can run your analysis without any further changes.

External predicates
:::::::::::::::::::

The second method for including flow summaries is by including the
``Security.Summaries.ImportFromExternalPredicates`` library in your analysis, which declares
three external predicates ``additionalSteps``, ``additionalSinks`` and ``additionalSources`` that
need to be instantiated with the flow summary CSV data.

This is most easily done in QL for Eclipse, which will prompt you for CSV files to populate
the three predicates.

This approach has the advantage that you do not need to include the CSV files during the
snapshot build, so you can use an existing snapshot, for example as downloaded from LGTM.com.

Summary format
--------------

Source and sink summaries are specified as tuples of the form ``(portal, kind, configuration)``,
where ``portal`` is a description of the API element being marked as a source or sink, ``kind``
is a flow label (also known as "taint kind") describing the kind of information being generated
or consumed, and ``configuration`` specifies which flow configuration the summary applies to.

If ``kind`` is empty, it defaults to ``data`` for sources and either ``data`` or ``taint`` for sinks.
If ``configuration`` is empty, the specification applies to all configurations.
The default extraction queries never produce empty ``kind`` or ``configuration`` columns.

Similarly, step summaries are tuples of the form
``(inPortal, inKind, outPortal, outKind, configuration)``, stating that information with label
``inKind`` that flows into ``inPortal`` resurfaces from ``outPortal``, now having kind ``outKind``.
As before, ``configuration`` specifies which configuration this information applies to.

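The defaulting rules for these tuples can be sketched in JavaScript as follows. This is an
illustration only: the function names, the record shape, and the convention that an empty column
arrives as an empty string are assumptions made for the example, not part of the summary libraries.

.. code-block:: js

  // Builds a record from a (portal, kind, configuration) tuple.
  // role is 'source' or 'sink'. An empty kind falls back to the defaults
  // described above; an empty configuration means "all configurations"
  // and is represented here as null (a choice made for this sketch).
  function sourceOrSinkSummary(role, [portal, kind, configuration]) {
    const defaultKinds = role === 'source' ? ['data'] : ['data', 'taint'];
    return {
      role,
      portal,
      kinds: kind === '' ? defaultKinds : [kind],
      configuration: configuration === '' ? null : configuration,
    };
  }

  // Builds a record from an (inPortal, inKind, outPortal, outKind, configuration) tuple.
  function stepSummary([inPortal, inKind, outPortal, outKind, configuration]) {
    return { inPortal, inKind, outPortal, outKind, configuration };
  }

  // Checks whether a source or sink summary is relevant for a configuration.
  function appliesTo(summary, configurationName) {
    return summary.configuration === null || summary.configuration === configurationName;
  }

  // Example: the mkdirp sink summary discussed earlier.
  const mkdirpSink = sourceOrSinkSummary('sink', [
    '(parameter 0 (member default (root https://www.npmjs.com/package/mkdirp)))',
    'taint',
    'TaintedPath',
  ]);
  console.log(mkdirpSink.kinds, appliesTo(mkdirpSink, 'TaintedPath'));
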

In all of the above, ``portal`` is an S-expression that abstractly describes a *portal*, that is,
an API interface point by which data may enter or leave the npm package being analyzed.

Currently, we model five kinds of portals:

- ``(root <uri>)``, representing the ``module`` object of the main module of the npm package
  described by ``<uri>``, which is a URL of the form ``https://www.npmjs.com/package/<pkg>``;
- ``(member <name> <base>)``, representing property ``<name>`` of an object described by
  portal ``<base>``;
- ``(instance <base>)``, representing an instance of a (constructor) function or class
  described by portal ``<base>``;
- ``(parameter <i> <base>)``, representing parameter number ``<i>`` (counting from zero) of a
  function described by portal ``<base>``;
- ``(return <base>)``, representing the return value of a function described by portal ``<base>``.

In our example above, the first parameter of the default export of package ``mkdirp`` is
described by the portal

.. code-block:: lisp

  (parameter 0 (member default (root https://www.npmjs.com/package/mkdirp)))

As a more complicated example,

.. code-block:: lisp

  (parameter 0 (parameter 1 (member then (instance (member Promise (root https://www.npmjs.com/package/bluebird))))))

describes the first parameter of a function passed as second argument to the ``then`` method of
an instance of the ``Promise`` constructor exported by package ``bluebird``.

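To make the nesting of portals concrete, the following JavaScript sketch parses a portal
S-expression into a nested object. It is not the parser used by the summary libraries: the
function names and the resulting object shape are hypothetical, and it assumes that names and
URLs contain neither whitespace nor parentheses.

.. code-block:: js

  // Splits a portal S-expression into parentheses and atoms.
  function tokenize(text) {
    return text.match(/[()]|[^\s()]+/g) || [];
  }

  // Parses one of the five portal kinds described above into a nested object.
  function parsePortal(text) {
    const tokens = tokenize(text);
    let pos = 0;

    function next() {
      if (pos >= tokens.length) throw new Error('unexpected end of portal');
      return tokens[pos++];
    }

    function expect(token) {
      const actual = next();
      if (actual !== token) throw new Error(`expected ${token} but found ${actual}`);
    }

    function parse() {
      expect('(');
      const kind = next();
      let portal;
      switch (kind) {
        case 'root':
          portal = { kind, uri: next() };
          break;
        case 'member':
          portal = { kind, name: next(), base: parse() };
          break;
        case 'parameter':
          portal = { kind, index: Number(next()), base: parse() };
          break;
        case 'instance':
        case 'return':
          portal = { kind, base: parse() };
          break;
        default:
          throw new Error(`unknown portal kind: ${kind}`);
      }
      expect(')');
      return portal;
    }

    const result = parse();
    if (pos !== tokens.length) throw new Error('trailing input after portal');
    return result;
  }

  // Example: the mkdirp portal from above.
  const portal = parsePortal(
    '(parameter 0 (member default (root https://www.npmjs.com/package/mkdirp)))'
  );
  console.log(portal.kind, portal.index, portal.base.name, portal.base.base.uri);

Because every portal other than ``root`` wraps exactly one base portal, the parsed result is a
simple chain of objects, and an expression with unbalanced parentheses is rejected with an error.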