JS: Remove docs for the old portal-based flow summaries

This commit is contained in:
Asger F
2022-09-20 14:20:46 +02:00
parent 91f9e89e95
commit 0294444054

@@ -1,227 +0,0 @@
Summary-based information flow analysis
=======================================

Overview
--------

This document presents an approach for running information flow analyses (such as the standard
security queries) on an application that depends on one or more npm packages. Instead of
installing the npm packages during the snapshot build and analyzing them together with application
code, we analyze each package in isolation and compute *flow summaries* that record information
about any sources, sinks and flow steps contributed by the package's API. These flow summaries
are then imported when building a snapshot of the application (usually in the form of CSV files
added as external data), and are picked up by the standard security queries, allowing them to reason
about flow into, out of and through the npm packages as though they had been included as part of the
build.

Note that flow summaries are an experimental technology, and not ready to be used in production
queries or libraries. Also note that flow summaries do not currently work with CodeQL; they require
the legacy Semmle Core toolchain.

Motivating example
------------------

Let us take the `mkdirp <https://www.npmjs.com/package/mkdirp>`_ package as an example. It exports
a function that takes as its first argument a file system path, and creates a folder with that
path, as well as any parent folders that do not exist yet. As further arguments, the function
accepts an optional configuration object and a callback to invoke once the folder has been
created.

An application might use this package as follows:

.. code-block:: js

   const mkdirp = require('mkdirp');
   // ...
   mkdirp(p, opts, function cb(err) {
     // ...
   });

If the value of ``p`` can be controlled by an untrusted user, this would allow them to create arbitrary
folders, which may not be desirable.
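
To make the threat concrete, here is a sketch of how ``p`` might come to be attacker-controlled in a
web application (the use of Express here is purely illustrative and not part of the package being
discussed):

.. code-block:: js

   const express = require('express');
   const mkdirp = require('mkdirp');

   const app = express();

   app.get('/prepare', function (req, res) {
     // The directory name is taken directly from the request URL,
     // so an untrusted user controls the path passed to mkdirp.
     const p = req.query.dir;
     mkdirp(p, {}, function cb(err) {
       res.send(err ? 'failed' : 'created');
     });
   });

   app.listen(3000);
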
By analyzing the application code base together with the source code for the ``mkdirp`` package,
the default path injection analysis would be able to track taint through the call to ``mkdirp`` into its
implementation, which ultimately uses built-in Node.js file system APIs to create the folder. Since
the path injection analysis has built-in models of these APIs, it would then be able to spot and flag this
vulnerability.

However, analyzing ``mkdirp`` from scratch for every client application is wasteful. Moreover, it would
in this case be undesirable to flag the location inside ``mkdirp`` where the folder is actually created
as part of the alert: the developer of the client application did not write that code and hence will
have a hard time understanding why it is being flagged.

Both of these concerns can be addressed by treating the first argument to ``mkdirp`` as a path injection
sink in its own right: the analysis no longer needs to track flow into the implementation of ``mkdirp``,
so we would no longer need to include its source code in the analysis, and the alert would flag the call
to ``mkdirp`` in application code, not its implementation in library code.

The information that the first parameter of ``mkdirp`` is interpreted as a file system path and hence should
be considered a path injection sink is an example of a *flow summary*, or more precisely a *sink summary*.
Besides sink summaries, we also consider *source summaries* and *flow-step summaries*.

In general, a sink summary states that some API interface point (such as a function parameter) should
be considered a sink for a certain analysis, so if data from a known source reaches this point without
undergoing appropriate sanitization, it should be flagged with an alert. A sink summary may also
specify which taint kind the data needs to have in order for the sink to be problematic.

Conversely, a source summary identifies some API (such as the return value of a function) as a source
of tainted data for a certain analysis, again optionally specifying a taint kind.

Finally, a flow-step summary records the fact that data that flows into the package at some point
may propagate to another point (for example, from a function parameter to its return value).
In this case, there are two relevant taint kinds: one describing the taint of the data as it
enters, and one describing the taint of the data as it emerges. In general, flow steps (like sources
and sinks) are analysis-specific, since we need to know about sanitizers.

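For instance (a hypothetical package export, not taken from ``mkdirp``), a flow-step summary could
record that taint entering the first parameter of the following function resurfaces from its return
value:

.. code-block:: js

   // Data flowing into parameter 0 flows onwards to the return value,
   // so a step summary from the parameter portal to the return portal
   // describes this function.
   module.exports.trimSlashes = function (p) {
     return p.replace(/\/+$/, '');
   };
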
In what follows we will first discuss how summaries are generated from a snapshot of an npm package,
and then how they are imported when analyzing client code. Finally, we will discuss the format in which
flow summaries are stored.

Note that flow summaries are considered an experimental feature at this point. Using them involves
some manual configuration, and we make no guarantee that the API will remain stable.

Generating summaries
--------------------

Flow summaries of an npm package can be generated by running special summary extraction queries
either on a snapshot of the package itself, or on a snapshot of a hand-written model of the
package. (Note that this requires a working installation of Semmle Core.)

There are three default summary extraction queries:

- Extract flow step summaries (``js/step-summary-extraction``,
  ``experimental/Summaries/ExtractStepSummaries.ql``)
- Extract sink summaries (``js/sink-summary-extraction``,
  ``experimental/Summaries/ExtractSinkSummaries.ql``)
- Extract source summaries (``js/source-summary-extraction``,
  ``experimental/Summaries/ExtractSourceSummaries.ql``)

You can run these queries individually against a snapshot of the npm package you want to create
flow summaries for using ``odasa runQuery``, and store the output as CSV files named
``additional-steps.csv``, ``additional-sinks.csv`` and ``additional-sources.csv``, respectively.

For example, assuming that folder ``mkdirp-snapshot`` contains a snapshot of the ``mkdirp``
project, we can extract sink summaries using the command

.. code-block:: bash

   odasa runQuery \
     --query $SEMMLE_DIST/queries/semmlecode-javascript-queries/experimental/Summaries/ExtractSinkSummaries.ql \
     --output-file additional-sinks.csv --snapshot mkdirp-snapshot

Instead of generating summaries directly from the package source code, you can also generate
them from a hand-written model of the package. The model should contain a ``package.json`` file
giving the correct package name, and models for the relevant API entry points. The models are
plain JavaScript with special comments annotating certain expressions as sources or sinks.

For example, a model of ``mkdirp`` might look like this:

.. code-block:: js

   module.exports = function mkdirp(path) {
     path /* Semmle: sink: taint, TaintedPath */
   };

Annotation comments start with ``Semmle:``, and contain ``source`` and ``sink`` specifications.
Each such specification lists a flow label (in this case, ``taint``) and a configuration to which
the specification applies (in this case, ``TaintedPath``).

A source specification annotates an expression as being a source of flow with the given label
for the purposes of the given configuration, and similarly for sinks. Annotation comments apply to
any expression (and more generally any data flow node) whose source location ends on the line
where the comment starts.

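Conversely, assuming the ``source`` syntax mirrors the ``sink`` syntax shown above, a model could
mark a return value as a source of tainted data (the function name here is purely illustrative):

.. code-block:: js

   module.exports.readUserInput = function () {
     // The returned value is annotated as a source of taint
     // for the TaintedPath configuration.
     return {} /* Semmle: source: taint, TaintedPath */
   };
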
Using summaries
---------------

Once you have created summaries using the approach outlined above, you have two options for
including them in the analysis of a client application.

External data
:::::::::::::

Firstly, you can include the CSV files generated by running the extraction queries as external
data when building a snapshot of the client application by copying them into the
``$snapshot/external/data`` folder. This is typically done by including a command like this
in your ``project`` file:

.. code-block:: xml

   <build>cp /path/to/additional-sinks.csv ${snapshot}/external/data</build>

If you want to include summaries for multiple libraries, you have to concatenate the
corresponding CSV files before copying them into the external data folder.

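For instance, assuming that build commands are executed through a shell (the input file names here
are illustrative), the concatenation can happen directly in the ``project`` file:

.. code-block:: xml

   <build>cat /path/to/mkdirp-sinks.csv /path/to/bluebird-sinks.csv > ${snapshot}/external/data/additional-sinks.csv</build>
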
Additionally, you need to import the library ``Security.Summaries.ImportFromCsv`` in your
``javascript.qll``, which will pick up the summaries from external data and interpret them
as additional sources, sinks and flow steps:

.. code-block:: ql

   import Security.Summaries.ImportFromCsv

After these preparatory steps, you can run your analysis without any further changes.

External predicates
:::::::::::::::::::

The second method for including flow summaries is to include the
``Security.Summaries.ImportFromExternalPredicates`` library in your analysis, which declares
three external predicates ``additionalSteps``, ``additionalSinks`` and ``additionalSources`` that
need to be instantiated with the flow summary CSV data.

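Judging from the tuple formats described under "Summary format" below, the declarations can be
pictured along the following lines (a sketch only; consult the library itself for the authoritative
signatures):

.. code-block:: ql

   external predicate additionalSources(string portal, string kind, string configuration);

   external predicate additionalSinks(string portal, string kind, string configuration);

   external predicate additionalSteps(
     string inPortal, string inKind, string outPortal, string outKind, string configuration
   );
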
This is most easily done in QL for Eclipse, which will prompt you for CSV files to populate
the three predicates.

This approach has the advantage that you do not need to include the CSV files during the
snapshot build, so you can use an existing snapshot, for example one downloaded from LGTM.com.

Summary format
--------------

Source and sink summaries are specified as tuples of the form ``(portal, kind, configuration)``,
where ``portal`` is a description of the API element being marked as a source or sink, ``kind``
is a flow label (also known as "taint kind") describing the kind of information being generated
or consumed, and ``configuration`` specifies which flow configuration the summary applies to.
If ``kind`` is empty, it defaults to ``data`` for sources and either ``data`` or ``taint`` for sinks.
If ``configuration`` is empty, the specification applies to all configurations.
The default extraction queries never produce empty ``kind`` or ``configuration`` columns.

Similarly, step summaries are tuples of the form
``(inPortal, inKind, outPortal, outKind, configuration)``, stating that information with label
``inKind`` that flows into ``inPortal`` resurfaces from ``outPortal``, now having kind ``outKind``.
As before, ``configuration`` specifies which configuration this information applies to.

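As an illustration, assuming one comma-separated tuple per CSV row, the sink summary from the
motivating example and a step summary for the hypothetical ``trimSlashes`` function above might be
stored as rows of ``additional-sinks.csv`` and ``additional-steps.csv``, respectively:

.. code-block:: text

   (parameter 0 (member default (root https://www.npmjs.com/package/mkdirp))),taint,TaintedPath
   (parameter 0 (member trimSlashes (root https://www.npmjs.com/package/<pkg>))),taint,(return (member trimSlashes (root https://www.npmjs.com/package/<pkg>))),taint,TaintedPath
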
In all of the above, ``portal`` is an S-expression that abstractly describes a *portal*, that is,
an API interface point by which data may enter or leave the npm package being analyzed.

Currently, we model five kinds of portals:

- ``(root <uri>)``, representing the ``module`` object of the main module of the npm package
  described by ``<uri>``, which is a URL of the form ``https://www.npmjs.com/package/<pkg>``;
- ``(member <name> <base>)``, representing property ``<name>`` of an object described by
  portal ``<base>``;
- ``(instance <base>)``, representing an instance of a (constructor) function or class
  described by portal ``<base>``;
- ``(parameter <i> <base>)``, representing the ``<i>``-th parameter of a function described by
  portal ``<base>``;
- ``(return <base>)``, representing the return value of a function described by portal ``<base>``.

In our example above, the first parameter of the default export of package ``mkdirp`` is
described by the portal

.. code-block:: lisp

   (parameter 0 (member default (root https://www.npmjs.com/package/mkdirp)))

As a more complicated example,

.. code-block:: lisp

   (parameter 0 (parameter 1 (member then (instance (member Promise (root https://www.npmjs.com/package/bluebird))))))

describes the first parameter of a function passed as second argument to the ``then`` method of
an instance of the ``Promise`` constructor exported by package ``bluebird``.