mirror of
https://github.com/github/codeql.git
synced 2025-12-16 16:53:25 +01:00
JavaScript: Apply suggestions from code review
Co-Authored-By: jf205 <42464962+jf205@users.noreply.github.com>
This commit is contained in:
@@ -9,40 +9,39 @@ Customization mechanisms
|
||||
|
||||
The two mechanisms used for customization are subclassing and overriding.
|
||||
|
||||
By subclassing an abstract class used by the JavaScript analysis and implementing its abstract
|
||||
member predicates we can teach the analysis to handle further instances of abstract concepts it
|
||||
already understands. For example, the standard library defines an abstract class
|
||||
``SystemCommandExecution`` that covers various APIs for executing operating-system commands. This
|
||||
class is used by the command-injection analysis to identify problematic flows where input from a
|
||||
potentially malicious user is interpreted as the name of a system command to execute. By defining
|
||||
additional subclasses of ``SystemCommandExecution``, we can make this analysis more powerful without
|
||||
touching its implementation.
|
||||
We can teach the JavaScript analysis to handle further instances of abstract concepts it already
|
||||
understands by subclassing abstract classes and implementing their member predicates. For example,
|
||||
the standard library defines an abstract class ``SystemCommandExecution`` that covers various APIs
|
||||
for executing operating-system commands. This class is used by the command-injection analysis to
|
||||
identify problematic flows where input from a potentially malicious user is interpreted as the name
|
||||
of a system command to execute. By defining additional subclasses of ``SystemCommandExecution``, we
|
||||
can make this analysis more powerful without touching its implementation.
|
||||
|
||||
By overriding a member predicate defined in the library, we can change its behavior either for all
|
||||
its receivers or only a subset. For example, the standard library predicate
|
||||
``ControlFlowNode::getASuccessor`` implements the basic control-flow graph on which many further
|
||||
analyses are based. By overriding it, we can add, suppress or modify control-flow graph edges.
|
||||
analyses are based. By overriding it, we can add, suppress, or modify control-flow graph edges.
|
||||
|
||||
Once a customization has been defined, it needs to be brought into scope so that the default
|
||||
analysis queries pick it up. This can be done by adding the customizing definitions to
|
||||
``Customizations.qll``, an initially empty library file that is imported by the default library
|
||||
``javascript.qll``.
|
||||
|
||||
Sometimes you may want to perform both kinds of customizations at the same time: subclass a base
|
||||
Sometimes you may want to perform both kinds of customizations at the same time. That is, subclass a base
|
||||
class to provide new implementations of an API, and override some member predicates of the same base
|
||||
class to selectively change the implementation of the API. This is not always easy to do, since the
|
||||
former requires the base class to be abstract, while the latter requires it to be concrete.
|
||||
|
||||
To work around this, the JavaScript library uses the so-called `range pattern`: the base class
|
||||
To work around this, the JavaScript library uses the so-called *range pattern*. In this pattern, the base class
|
||||
``Base`` itself is concrete, but it has an abstract companion class called ``Base::Range`` covering
|
||||
the same set of values. To change the implementation of the API, subclass ``Base`` and override its
|
||||
member predicates. To provide new implementations of the API, subclass ``Base::Range`` and implement
|
||||
its abstract member predicates.
|
||||
|
||||
For example, the class ``Base64::Encode`` in the standard library models base64-encoding libraries
|
||||
using the range pattern. To add support for a new library, subclass ``Base64::Encode::Range`` and
|
||||
implement the member predicates ``getInput`` and ``getOutput``. (Subclasses for many popular base64
|
||||
encoders are included in the standard library.) To customize the definition of ``getInput`` or
|
||||
using the range pattern. It comes with subclasses corresponding to many popular base64 encoders. To
|
||||
add support for a new library, subclass ``Base64::Encode::Range`` and implement the member
|
||||
predicates ``getInput`` and ``getOutput``. To customize the definition of ``getInput`` or
|
||||
``getOutput`` for a library that is already supported, extend ``Base64::Encode`` itself and override
|
||||
the predicate you want to customize.
|
||||
|
||||
@@ -57,57 +56,59 @@ The JavaScript analysis libraries have a layered structure with higher-level ana
|
||||
lower-level ones. Usually, classes and predicates in a lower layer should not depend on a higher
|
||||
layer to avoid performance problems and non-monotonic recursion.
|
||||
|
||||
We briefly survey the most important analysis layers here, starting from the lowest layer. Below we
|
||||
will discuss the extension points offered by the individual layers.
|
||||
In this section, we briefly introduce the most important analysis layers, starting from the lowest
|
||||
layer. Below, we discuss the extension points offered by the individual layers.
|
||||
|
||||
AST
|
||||
~~~
|
||||
Abstract syntax tree
|
||||
~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
The abstract syntax tree, implemented by class ``ASTNode`` and its subclasses, is the lowest layer
|
||||
and more or less directly represents the information stored in the snapshot data base. It
|
||||
The abstract syntax tree (AST), implemented by class ``ASTNode`` and its subclasses, is the lowest layer
|
||||
and is a good representation of the information stored in the snapshot database. It
|
||||
corresponds closely to the syntactic structure of the program, only abstracting away from
|
||||
typographical details such as whitespace and indentation.
|
||||
|
||||
CFG
|
||||
~~~
|
||||
Control-flow graph
|
||||
~~~~~~~~~~~~~~~~~~
|
||||
|
||||
The (intra-procedural) control-flow graph, implemented by class ``ControlFlowNode`` and its
|
||||
subclasses, is the next higher level. It models flow of control inside functions and top-level
|
||||
scripts, and is overlaid on top of the AST in that each AST node has a corresponding CFG node. There
|
||||
are also synthetic CFG nodes that do not correspond to an AST node: entry and exit nodes
|
||||
(``ControlFlowEntryNode`` and ``ControlFlowExitNode``) mark the beginning and end, respectively, of
|
||||
the execution of a function or top-level, while guard nodes (``GuardControlFlowNode``) record the
|
||||
fact that some condition is known to hold at some point in the program.
|
||||
The (intra-procedural) control-flow graph (CFG), implemented by class ``ControlFlowNode`` and its
|
||||
subclasses, is the next layer. It models flow of control inside functions and top-level scripts. The
|
||||
CFG is overlaid on top of the AST, meaning that each AST node has a corresponding CFG node. There
|
||||
are also synthetic CFG nodes that do not correspond to an AST node. For example, entry and exit
|
||||
nodes (``ControlFlowEntryNode`` and ``ControlFlowExitNode``) mark the beginning and end,
|
||||
respectively, of the execution of a function or top-level script, while guard nodes
|
||||
(``GuardControlFlowNode``) record that some condition is known to hold at some point in the program.
|
||||
|
||||
Basic blocks (class ``BasicBlock``) organize control-flow nodes into maximal sequences of
|
||||
straight-line code, which is vital for efficiently reasoning about control flow.
|
||||
|
||||
SSA
|
||||
~~~
|
||||
Static single-assignment form
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
The static single-assignment representation (class ``SsaVariable`` and ``SsaDefinition``) uses
|
||||
The static single-assignment (SSA) representation (class ``SsaVariable`` and ``SsaDefinition``) uses
|
||||
control-flow information to split up local variables into SSA variables that each only have a single
|
||||
definition. In addition to regular definitions from assignments and increment/decrement expressions,
|
||||
the SSA form also introduces pseudo-definitions such as `phi nodes` where multiple possible values
|
||||
for a variable are merged and `refinement nodes` (also known as `pi nodes`) marking program points
|
||||
where additional information about a variable becomes available that may restrict its possible set
|
||||
of values.
|
||||
definition.
|
||||
|
||||
In addition to regular definitions corresponding to assignments and increment/decrement expressions,
|
||||
the SSA form also introduces pseudo-definitions such as
|
||||
|
||||
- `phi nodes` where multiple possible values for a variable are merged
|
||||
- `refinement nodes` (also known as `pi nodes`) marking program points where additional information about a variable becomes available that may restrict its possible set of values.
|
||||
|
||||
Local data flow
|
||||
~~~~~~~~~~~~~~~
|
||||
|
||||
The (intra-procedural) data-flow graph, implemented by class ``DataFlow::Node`` and its subclasses,
|
||||
represents the flow of data within a function or top-level. Each expression has a corresponding
|
||||
data-flow node. Additionally, there are data-flow nodes that do not correspond to syntactic
|
||||
elements; for example, each SSA variable has a corresponding data-flow node. Note that flow between
|
||||
functions (through arguments and return values) is not modelled in this layer, except for the
|
||||
special case of immediately-invoked function expressions. Flow through object properties is also not
|
||||
modelled.
|
||||
represents the flow of data within a function or top-level scripts. Each expression has a
|
||||
corresponding data-flow node. Additionally, there are data-flow nodes that do not correspond to
|
||||
syntactic elements. For example, each SSA variable has a corresponding data-flow node. Note that
|
||||
flow between functions (through arguments and return values) is not modeled in this layer, except
|
||||
for the special case of immediately-invoked function expressions. Flow through object properties is
|
||||
also not modeled.
|
||||
|
||||
This layer also implements the widely-used source-node API: class ``DataFlow::SourceNode`` and its
|
||||
This layer also implements the widely-used source-node API. The class ``DataFlow::SourceNode`` and its
|
||||
subclasses represent data-flow nodes where new objects are created (such as object expressions), or
|
||||
where non-local data flow enters the intra-procedural data-flow graph (such as function parameters
|
||||
or property reads). The source-node API provides convenience predicates for reasoning about these
|
||||
or property reads). The source-node API provides convenient predicates for reasoning about these
|
||||
nodes without having to explicitly encode data-flow graph traversal.
|
||||
|
||||
Type inference
|
||||
@@ -121,7 +122,7 @@ Call graph
|
||||
~~~~~~~~~~
|
||||
|
||||
The call graph is implemented as a predicate ``getACallee`` on ``DataFlow::InvokeNode``, the class
|
||||
of data-flow nodes representing function calls (with or wihout ``new``). It uses local data flow and
|
||||
of data-flow nodes representing function calls (with or without ``new``). It uses local data flow and
|
||||
type information, as well as type annotations where available.
|
||||
|
||||
Type tracking
|
||||
@@ -135,8 +136,8 @@ Framework models
|
||||
~~~~~~~~~~~~~~~~
|
||||
|
||||
The libraries under ``semmle/javascript/frameworks`` model a broad range of popular JavaScript
|
||||
libraries and frameworks, such as Express or Vue.js. Some framework modeling libraries are located
|
||||
under ``semmle/javascript`` directly, for instance ``Base64``, ``EmailClients`` and ``JsonParsers``.
|
||||
libraries and frameworks, such as Express and Vue.js. Some framework modeling libraries are located
|
||||
under ``semmle/javascript`` directly, for instance ``Base64``, ``EmailClients``, and ``JsonParsers``.
|
||||
|
||||
Global data flow and taint tracking
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
@@ -147,13 +148,13 @@ information-flow analyses. Most of our security queries are based on this approa
|
||||
Extension points
|
||||
----------------
|
||||
|
||||
Below we discuss the most important extension points for the individual analysis layers introduced
|
||||
In this section, we discuss the most important extension points for the individual analysis layers introduced
|
||||
above.
|
||||
|
||||
AST
|
||||
~~~
|
||||
|
||||
This layer should not normally be customized. It is technically possible to override, say,
|
||||
This layer should not normally be customized. It is technically possible to override, for instance,
|
||||
``ASTNode.getChild`` to change the way the AST structure is represented, but this should normally be
|
||||
avoided in the interest of keeping a close correspondence between AST and concrete syntax.
|
||||
|
||||
@@ -174,9 +175,9 @@ Local data flow
|
||||
|
||||
The ``DataFlow::SourceNode`` class uses the range pattern, so new kinds of source nodes can be
|
||||
added by extending ``Dataflow::SourceNode::Range``. Some of its subclasses can similarly be
|
||||
extended: ``DataFlow::ModuleImportNode`` models module imports, and ``DataFlow::ClassNode`` models
|
||||
class definitions. The former provides default implementations covering CommonJS, AMD and ECMAScript
|
||||
2015 modules, while the latter handles ECMAScript 2015 classes as well as traditional function-based
|
||||
extended. For example, ``DataFlow::ModuleImportNode`` models module imports, and ``DataFlow::ClassNode`` models
|
||||
class definitions. The former provides default implementations covering CommonJS, AMD, and ECMAScript
|
||||
2015 modules, while the latter handles ECMAScript 2015 classes, as well as traditional function-based
|
||||
classes. You can extend their corresponding ``::Range`` classes to add support for other module or
|
||||
class systems.
|
||||
|
||||
@@ -184,10 +185,10 @@ Type inference
|
||||
~~~~~~~~~~~~~~
|
||||
|
||||
You can override ``AnalyzedNode::getAValue`` to customize the type inference. Note that the type
|
||||
inference is expected to be sound, that is (as far as practical) the abstract values inferred for a
|
||||
inference is expected to be sound, that is (as far as practical), the abstract values inferred for a
|
||||
data-flow nodes should cover all possible concrete values this node may take on at runtime.
|
||||
|
||||
You can also extend the set of abstract values: to add individual abstract values that are
|
||||
You can also extend the set of abstract values. To add individual abstract values that are
|
||||
independent of the program being analyzed, define a subclass of ``CustomAbstractValueTag``
|
||||
describing the new abstract value. There will then be a corresponding value of class
|
||||
``CustomAbstractValue`` that you can use in overriding definitions of the ``getAValue`` predicate.
|
||||
@@ -196,7 +197,7 @@ Call graph
|
||||
~~~~~~~~~~
|
||||
|
||||
You can override ``DataFlow::InvokeNode::getACallee(int)`` to customize the call graph. Note that
|
||||
overriding the zero-argument version ``getACallee()`` is not enough since higher layers use the
|
||||
overriding the zero-argument version ``getACallee()`` is not enough, since higher layers use the
|
||||
one-argument version.
|
||||
|
||||
Type tracking
|
||||
@@ -220,21 +221,25 @@ The ``semmle.javascript.frameworks.SQL`` module defines abstract classes for mod
|
||||
connector libraries, and the ``semmle.javascript.JsonParsers`` and
|
||||
``semmle.javascript.frameworks.XML`` modules for modeling JSON and XML parsers, respectively.
|
||||
|
||||
The ``semmle.javascript.Concepts`` modules defines a few very broad concepts such as system-command
|
||||
The ``semmle.javascript.Concepts`` module defines a small number of broad concepts such as system-command
|
||||
executions or file-system accesses, which are concretely instantiated in some of the existing
|
||||
framework libraries, but can of course be further extended to model additional frameworks.
|
||||
|
||||
Global data flow and taint tracking
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
Most security queries consist of one QL file defining the query, one configuration module defining
|
||||
the taint-tracking configuration, and one customization module defining sources, sinks and
|
||||
sanitizers. For example, ``Security/CWE-078/CommandInjection.ql`` defines the command-injection
|
||||
query. It imports module ``semmle.javascript.security.dataflow.CommandInjection``, which defines the
|
||||
configuration class ``CommandInjection::Configuration``, and itself imports module
|
||||
``semmle.javascript.security.dataflow.CommandInjectionCustomizations``, which defines sources, sinks
|
||||
and sanitizers by means of three abstract classes ``CommandInjection::Source``,
|
||||
``CommandInjetion::Sink`` and ``CommandInjection::Sanitizer``, respectively.
|
||||
Most security queries consist of:
|
||||
|
||||
- one QL file defining the query
|
||||
- one configuration module defining the taint-tracking configuration
|
||||
- one customization module defining sources, sinks and sanitizers
|
||||
|
||||
For example, ``Security/CWE-078/CommandInjection.ql`` defines the command-injection query. It
|
||||
imports the module ``semmle.javascript.security.dataflow.CommandInjection``, which defines the
|
||||
configuration class ``CommandInjection::Configuration``. This module in turn imports
|
||||
``semmle.javascript.security.dataflow.CommandInjectionCustomizations``, which defines three abstract
|
||||
classes (``CommandInjection::Source``, ``CommandInjection::Sink``, and
|
||||
``CommandInjection::Sanitizer``) that model sources, sinks, and sanitizers, respectively.
|
||||
|
||||
To define additional sources, sinks or sanitizers for this or any other security query, import the
|
||||
customization module and extend these abstract classes with additional subclasses.
|
||||
|
||||
Reference in New Issue
Block a user