JavaScript: Apply suggestions from code review

Co-Authored-By: jf205 <42464962+jf205@users.noreply.github.com>
2025-12-16 16:53:25 +01:00 · 2019-08-15 14:35:30 +01:00
parent 967a5788b2
commit e92a1c3169
1 changed files with 70 additions and 65 deletions
--- a/javascript/documentation/library-customization.rst
+++ b/javascript/documentation/library-customization.rst
@@ -9,40 +9,39 @@ Customization mechanisms

 The two mechanisms used for customization are subclassing and overriding.

-By subclassing an abstract class used by the JavaScript analysis and implementing its abstract
-member predicates we can teach the analysis to handle further instances of abstract concepts it
-already understands. For example, the standard library defines an abstract class
-``SystemCommandExecution`` that covers various APIs for executing operating-system commands. This
-class is used by the command-injection analysis to identify problematic flows where input from a
-potentially malicious user is interpreted as the name of a system command to execute. By defining
-additional subclasses of ``SystemCommandExecution``, we can make this analysis more powerful without
-touching its implementation.
+We can teach the JavaScript analysis to handle further instances of abstract concepts it already
+understands by subclassing abstract classes and implementing their member predicates. For example,
+the standard library defines an abstract class ``SystemCommandExecution`` that covers various APIs
+for executing operating-system commands. This class is used by the command-injection analysis to
+identify problematic flows where input from a potentially malicious user is interpreted as the name
+of a system command to execute. By defining additional subclasses of ``SystemCommandExecution``, we
+can make this analysis more powerful without touching its implementation.

 By overriding a member predicate defined in the library, we can change its behavior either for all
 its receivers or only a subset. For example, the standard library predicate
 ``ControlFlowNode::getASuccessor`` implements the basic control-flow graph on which many further
-analyses are based. By overriding it, we can add, suppress or modify control-flow graph edges.
+analyses are based. By overriding it, we can add, suppress, or modify control-flow graph edges.

 Once a customization has been defined, it needs to be brought into scope so that the default
 analysis queries pick it up. This can be done by adding the customizing definitions to
 ``Customizations.qll``, an initially empty library file that is imported by the default library
 ``javascript.qll``.

-Sometimes you may want to perform both kinds of customizations at the same time: subclass a base
+Sometimes you may want to perform both kinds of customizations at the same time. That is, subclass a base
 class to provide new implementations of an API, and override some member predicates of the same base
 class to selectively change the implementation of the API. This is not always easy to do, since the
 former requires the base class to be abstract, while the latter requires it to be concrete.

-To work around this, the JavaScript library uses the so-called `range pattern`: the base class
+To work around this, the JavaScript library uses the so-called *range pattern*. In this pattern, the base class
 ``Base`` itself is concrete, but it has an abstract companion class called ``Base::Range`` covering
 the same set of values. To change the implementation of the API, subclass ``Base`` and override its
 member predicates. To provide new implementations of the API, subclass ``Base::Range`` and implement
 its abstract member predicates.

 For example, the class ``Base64::Encode`` in the standard library models base64-encoding libraries
-using the range pattern. To add support for a new library, subclass ``Base64::Encode::Range`` and
-implement the member predicates ``getInput`` and ``getOutput``. (Subclasses for many popular base64
-encoders are included in the standard library.) To customize the definition of ``getInput`` or
+using the range pattern.  It comes with subclasses corresponding to many popular base64 encoders. To
+add support for a new library, subclass ``Base64::Encode::Range`` and implement the member
+predicates ``getInput`` and ``getOutput``. To customize the definition of ``getInput`` or
 ``getOutput`` for a library that is already supported, extend ``Base64::Encode`` itself and override
 the predicate you want to customize.

@@ -57,57 +56,59 @@ The JavaScript analysis libraries have a layered structure with higher-level ana
 lower-level ones. Usually, classes and predicates in a lower layer should not depend on a higher
 layer to avoid performance problems and non-monotonic recursion.

-We briefly survey the most important analysis layers here, starting from the lowest layer. Below we
-will discuss the extension points offered by the individual layers.
+In this section, we briefly introduce the most important analysis layers, starting from the lowest
+layer. Below, we discuss the extension points offered by the individual layers.

-AST
-~~~
+Abstract syntax tree
+~~~~~~~~~~~~~~~~~~~~

-The abstract syntax tree, implemented by class ``ASTNode`` and its subclasses, is the lowest layer
-and more or less directly represents the information stored in the snapshot data base. It
+The abstract syntax tree (AST), implemented by class ``ASTNode`` and its subclasses, is the lowest layer
+and is a good representation of the information stored in the snapshot database. It
 corresponds closely to the syntactic structure of the program, only abstracting away from
 typographical details such as whitespace and indentation.

-CFG
-~~~
+Control-flow graph
+~~~~~~~~~~~~~~~~~~

-The (intra-procedural) control-flow graph, implemented by class ``ControlFlowNode`` and its
-subclasses, is the next higher level. It models flow of control inside functions and top-level
-scripts, and is overlaid on top of the AST in that each AST node has a corresponding CFG node. There
-are also synthetic CFG nodes that do not correspond to an AST node: entry and exit nodes
-(``ControlFlowEntryNode`` and ``ControlFlowExitNode``) mark the beginning and end, respectively, of
-the execution of a function or top-level, while guard nodes (``GuardControlFlowNode``) record the
-fact that some condition is known to hold at some point in the program.
+The (intra-procedural) control-flow graph (CFG), implemented by class ``ControlFlowNode`` and its
+subclasses, is the next layer. It models flow of control inside functions and top-level scripts. The
+CFG is overlaid on top of the AST, meaning that each AST node has a corresponding CFG node. There
+are also synthetic CFG nodes that do not correspond to an AST node. For example, entry and exit
+nodes (``ControlFlowEntryNode`` and ``ControlFlowExitNode``) mark the beginning and end,
+respectively, of the execution of a function or top-level script, while guard nodes
+(``GuardControlFlowNode``) record that some condition is known to hold at some point in the program.

 Basic blocks (class ``BasicBlock``) organize control-flow nodes into maximal sequences of
 straight-line code, which is vital for efficiently reasoning about control flow.

-SSA
-~~~
+Static single-assignment form
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-The static single-assignment representation (class ``SsaVariable`` and ``SsaDefinition``) uses
+The static single-assignment (SSA) representation (class ``SsaVariable`` and ``SsaDefinition``) uses
 control-flow information to split up local variables into SSA variables that each only have a single
-definition. In addition to regular definitions from assignments and increment/decrement expressions,
-the SSA form also introduces pseudo-definitions such as `phi nodes` where multiple possible values
-for a variable are merged and `refinement nodes` (also known as `pi nodes`) marking program points
-where additional information about a variable becomes available that may restrict its possible set
-of values.
+definition.
+
+In addition to regular definitions corresponding to assignments and increment/decrement expressions,
+the SSA form also introduces pseudo-definitions such as
+
+  - `phi nodes` where multiple possible values for a variable are merged
+  - `refinement nodes` (also known as `pi nodes`) marking program points where additional information about a variable becomes available that may restrict its possible set of values.

 Local data flow
 ~~~~~~~~~~~~~~~

 The (intra-procedural) data-flow graph, implemented by class ``DataFlow::Node`` and its subclasses,
-represents the flow of data within a function or top-level. Each expression has a corresponding
-data-flow node. Additionally, there are data-flow nodes that do not correspond to syntactic
-elements; for example, each SSA variable has a corresponding data-flow node. Note that flow between
-functions (through arguments and return values) is not modelled in this layer, except for the
-special case of immediately-invoked function expressions. Flow through object properties is also not
-modelled.
+represents the flow of data within a function or top-level scripts. Each expression has a
+corresponding data-flow node. Additionally, there are data-flow nodes that do not correspond to
+syntactic elements. For example, each SSA variable has a corresponding data-flow node. Note that
+flow between functions (through arguments and return values) is not modeled in this layer, except
+for the special case of immediately-invoked function expressions. Flow through object properties is
+also not modeled.

-This layer also implements the widely-used source-node API: class ``DataFlow::SourceNode`` and its
+This layer also implements the widely-used source-node API. The class ``DataFlow::SourceNode`` and its
 subclasses represent data-flow nodes where new objects are created (such as object expressions), or
 where non-local data flow enters the intra-procedural data-flow graph (such as function parameters
-or property reads). The source-node API provides convenience predicates for reasoning about these
+or property reads). The source-node API provides convenient predicates for reasoning about these
 nodes without having to explicitly encode data-flow graph traversal.

 Type inference
@@ -121,7 +122,7 @@ Call graph
 ~~~~~~~~~~

 The call graph is implemented as a predicate ``getACallee`` on ``DataFlow::InvokeNode``, the class
-of data-flow nodes representing function calls (with or wihout ``new``). It uses local data flow and
+of data-flow nodes representing function calls (with or without ``new``). It uses local data flow and
 type information, as well as type annotations where available.

 Type tracking
@@ -135,8 +136,8 @@ Framework models
 ~~~~~~~~~~~~~~~~

 The libraries under ``semmle/javascript/frameworks`` model a broad range of popular JavaScript
-libraries and frameworks, such as Express or Vue.js. Some framework modeling libraries are located
-under ``semmle/javascript`` directly, for instance ``Base64``, ``EmailClients`` and ``JsonParsers``.
+libraries and frameworks, such as Express and Vue.js. Some framework modeling libraries are located
+under ``semmle/javascript`` directly, for instance ``Base64``, ``EmailClients``, and ``JsonParsers``.

 Global data flow and taint tracking
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -147,13 +148,13 @@ information-flow analyses. Most of our security queries are based on this approa
 Extension points
 ----------------

-Below we discuss the most important extension points for the individual analysis layers introduced
+In this section, we discuss the most important extension points for the individual analysis layers introduced
 above.

 AST
 ~~~

-This layer should not normally be customized. It is technically possible to override, say,
+This layer should not normally be customized. It is technically possible to override, for instance,
 ``ASTNode.getChild`` to change the way the AST structure is represented, but this should normally be
 avoided in the interest of keeping a close correspondence between AST and concrete syntax.

@@ -174,9 +175,9 @@ Local data flow

 The ``DataFlow::SourceNode`` class uses the range pattern, so new kinds of source nodes can be
 added by extending ``Dataflow::SourceNode::Range``. Some of its subclasses can similarly be
-extended: ``DataFlow::ModuleImportNode`` models module imports, and ``DataFlow::ClassNode`` models
-class definitions. The former provides default implementations covering CommonJS, AMD and ECMAScript
-2015 modules, while the latter handles ECMAScript 2015 classes as well as traditional function-based
+extended. For example, ``DataFlow::ModuleImportNode`` models module imports, and ``DataFlow::ClassNode`` models
+class definitions. The former provides default implementations covering CommonJS, AMD, and ECMAScript
+2015 modules, while the latter handles ECMAScript 2015 classes, as well as traditional function-based
 classes. You can extend their corresponding ``::Range`` classes to add support for other module or
 class systems.

@@ -184,10 +185,10 @@ Type inference
 ~~~~~~~~~~~~~~

 You can override ``AnalyzedNode::getAValue`` to customize the type inference. Note that the type
-inference is expected to be sound, that is (as far as practical) the abstract values inferred for a
+inference is expected to be sound, that is (as far as practical), the abstract values inferred for a
 data-flow nodes should cover all possible concrete values this node may take on at runtime.

-You can also extend the set of abstract values: to add individual abstract values that are
+You can also extend the set of abstract values. To add individual abstract values that are
 independent of the program being analyzed, define a subclass of ``CustomAbstractValueTag``
 describing the new abstract value. There will then be a corresponding value of class
 ``CustomAbstractValue`` that you can use in overriding definitions of the ``getAValue`` predicate.
@@ -196,7 +197,7 @@ Call graph
 ~~~~~~~~~~

 You can override ``DataFlow::InvokeNode::getACallee(int)`` to customize the call graph. Note that
-overriding the zero-argument version ``getACallee()`` is not enough since higher layers use the
+overriding the zero-argument version ``getACallee()`` is not enough, since higher layers use the
 one-argument version.

 Type tracking
@@ -220,21 +221,25 @@ The ``semmle.javascript.frameworks.SQL`` module defines abstract classes for mod
 connector libraries, and the ``semmle.javascript.JsonParsers`` and
 ``semmle.javascript.frameworks.XML`` modules for modeling JSON and XML parsers, respectively.

-The ``semmle.javascript.Concepts`` modules defines a few very broad concepts such as system-command
+The ``semmle.javascript.Concepts`` module defines a small number of broad concepts such as system-command
 executions or file-system accesses, which are concretely instantiated in some of the existing
 framework libraries, but can of course be further extended to model additional frameworks.

 Global data flow and taint tracking
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-Most security queries consist of one QL file defining the query, one configuration module defining
-the taint-tracking configuration, and one customization module defining sources, sinks and
-sanitizers. For example, ``Security/CWE-078/CommandInjection.ql`` defines the command-injection
-query. It imports module ``semmle.javascript.security.dataflow.CommandInjection``, which defines the
-configuration class ``CommandInjection::Configuration``, and itself imports module
-``semmle.javascript.security.dataflow.CommandInjectionCustomizations``, which defines sources, sinks
-and sanitizers by means of three abstract classes ``CommandInjection::Source``,
-``CommandInjetion::Sink`` and ``CommandInjection::Sanitizer``, respectively.
+Most security queries consist of:
+
+  - one QL file defining the query
+  - one configuration module defining the taint-tracking configuration
+  - one customization module defining sources, sinks and sanitizers
+
+For example, ``Security/CWE-078/CommandInjection.ql`` defines the command-injection query. It
+imports the module ``semmle.javascript.security.dataflow.CommandInjection``, which defines the
+configuration class ``CommandInjection::Configuration``. This module in turn imports
+``semmle.javascript.security.dataflow.CommandInjectionCustomizations``, which defines three abstract
+classes (``CommandInjection::Source``, ``CommandInjection::Sink``, and
+``CommandInjection::Sanitizer``) that model sources, sinks, and sanitizers, respectively.

 To define additional sources, sinks or sanitizers for this or any other security query, import the
 customization module and extend these abstract classes with additional subclasses.