Introducing the QL libraries for JavaScript
===========================================

Overview
--------

There is an extensive QL library for analyzing JavaScript code. The classes in this library present the data from a snapshot database in an object-oriented form and provide abstractions and predicates to help you with common analysis tasks.

The library is implemented as a set of QL modules, that is, files with the extension ``.qll``. The module ``javascript.qll`` imports most other standard library modules, so you can include the complete library by beginning your query with:

.. code-block:: ql

   import javascript

The rest of this tutorial briefly summarizes the most important QL classes and predicates provided by this library, including references to the `detailed API documentation `__ where applicable.

Introducing the library
-----------------------

The QL JavaScript library presents information about JavaScript source code at different levels:

- **Textual**: classes that represent source code as unstructured text files
- **Lexical**: classes that represent source code as a series of tokens and comments
- **Syntactic**: classes that represent source code as an abstract syntax tree
- **Name binding**: classes that represent scopes and variables
- **Control flow**: classes that represent the flow of control during execution
- **Data flow**: classes that you can use to reason about data flow in JavaScript source code
- **Type inference**: classes that you can use to approximate types for JavaScript expressions and variables
- **Call graph**: classes that represent the caller-callee relationship between functions
- **Inter-procedural data flow**: classes that you can use to define inter-procedural data flow and taint tracking analyses
- **Frameworks**: classes that represent source code entities that have a special meaning to JavaScript tools and frameworks

Note that representations above the textual level (for example the lexical representation or the
flow graphs) are only available for JavaScript code that does not contain fatal syntax errors. For code with such errors, the only information available is at the textual level, as well as information about the errors themselves.

Additionally, there is library support for working with HTML documents, JSON and YAML data, JSDoc comments, and regular expressions.

Textual level
~~~~~~~~~~~~~

At its most basic level, a JavaScript code base can simply be viewed as a collection of files organized into folders, where each file is composed of zero or more lines of text.

Note that the textual content of a program is not included in the snapshot database unless you specifically request it during extraction. In particular, snapshots on LGTM do not normally include textual information.

Files and folders
^^^^^^^^^^^^^^^^^

In QL, files are represented as entities of class `File `__, and folders as entities of class `Folder `__, both of which are subclasses of class `Container `__.

Class `Container `__ provides the following member predicates:

- ``Container.getParentContainer()`` returns the parent folder of the file or folder.
- ``Container.getAFile()`` returns a file within the folder.
- ``Container.getAFolder()`` returns a folder nested within the folder.

Note that while ``getAFile`` and ``getAFolder`` are declared on class `Container `__, they currently only have results for `Folder `__\ s.

Both files and folders have paths, which can be accessed by the predicate ``Container.getAbsolutePath()``. For example, if ``f`` represents a file with the path ``/home/user/project/src/index.js``, then ``f.getAbsolutePath()`` evaluates to the string ``"/home/user/project/src/index.js"``, while ``f.getParentContainer().getAbsolutePath()`` returns ``"/home/user/project/src"``.

These paths are absolute file system paths. If you want to obtain the path of a file relative to the snapshot source location, use ``Container.getRelativePath()`` instead.
Note, however, that a snapshot may contain files that are not located underneath the snapshot source location; for such files, ``getRelativePath()`` will not return anything.

The following member predicates of class `Container `__ provide more information about the name of a file or folder:

- ``Container.getBaseName()`` returns the base name of a file or folder, not including its parent folder, but including its extension. In the above example, ``f.getBaseName()`` would return the string ``"index.js"``.
- ``Container.getStem()`` is similar to ``Container.getBaseName()``, but it does *not* include the file extension; so ``f.getStem()`` returns ``"index"``.
- ``Container.getExtension()`` returns the file extension, not including the dot; so ``f.getExtension()`` returns ``"js"``.

For example, the following query computes, for each folder, the number of JavaScript files (that is, files with extension ``js``) contained in the folder:

.. code-block:: ql

   import javascript

   from Folder d
   select d.getRelativePath(), count(File f | f = d.getAFile() and f.getExtension() = "js")

➤ `See this in the query console `__. When you run the query on most projects, the results include folders that contain files with a ``js`` extension and folders that don't.

Locations
^^^^^^^^^

Most entities in a snapshot database have an associated source location. Locations are identified by five pieces of information: a file, a start line, a start column, an end line, and an end column. Line and column counts are 1-based (so the first character of a file is at line 1, column 1), and the end position is inclusive.

All entities associated with a source location belong to the QL class `Locatable `__. The location itself is modeled by the QL class `Location `__ and can be accessed through the member predicate ``Locatable.getLocation()``.

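As a rough illustration of how ``Locatable.getLocation()`` can be used, the following sketch lists elements whose location covers more than 100 lines. The threshold is an arbitrary value chosen for this example, and the query relies on the ``Location`` member predicates ``getNumLines()``, ``getStartLine()``, and ``getEndLine()`` described in this section:

.. code-block:: ql

   import javascript

   // Sketch: report elements that span many lines.
   // The threshold of 100 lines is an arbitrary, illustrative choice.
   from Locatable l, Location loc
   where
     loc = l.getLocation() and
     loc.getNumLines() > 100
   select l, loc.getStartLine(), loc.getEndLine()

Because the query ranges over every ``Locatable``, it is likely to report nested elements as well; restricting ``l`` to a more specific class narrows the results.
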

The `Location `__ class provides the following member predicates:

- ``Location.getFile()``, ``Location.getStartLine()``, ``Location.getStartColumn()``, ``Location.getEndLine()``, ``Location.getEndColumn()`` return detailed information about the location.
- ``Location.getNumLines()`` returns the number of (whole or partial) lines covered by the location.
- ``Location.startsBefore(Location)`` and ``Location.endsAfter(Location)`` determine whether one location starts before or ends after another location.
- ``Location.contains(Location)`` indicates whether one location completely contains another location; ``l1.contains(l2)`` holds if, and only if, ``l1.startsBefore(l2)`` and ``l1.endsAfter(l2)``.

Lines
^^^^^

Lines of text in files are represented by the QL class `Line `__. This class offers the following member predicates:

- ``Line.getText()`` returns the text of the line, excluding any terminating newline characters.
- ``Line.getTerminator()`` returns the terminator character(s) of the line. The last line in a file may not have any terminator characters, in which case this predicate does not return anything; otherwise it returns either the two-character string ``"\r\n"`` (carriage-return followed by newline), or one of the one-character strings ``"\n"`` (newline), ``"\r"`` (carriage-return), ``"\u2028"`` (Unicode character LINE SEPARATOR), ``"\u2029"`` (Unicode character PARAGRAPH SEPARATOR).

Note that, as mentioned above, the textual representation of the program is not included in the snapshot database by default.

Lexical level
~~~~~~~~~~~~~

A slightly more structured view of a JavaScript program is provided by the classes `Token `__ and `Comment `__, which represent tokens and comments, respectively.

Tokens
^^^^^^

The most important member predicates of class `Token `__ are as follows:

- ``Token.getValue()`` returns the source text of the token.
- ``Token.getIndex()`` returns the index of the token within its enclosing script.
- ``Token.getNextToken()`` and ``Token.getPreviousToken()`` navigate between tokens.

The `Token `__ class has nine subclasses, each representing a particular kind of token:

- `EOFToken `__: a marker token representing the end of a script
- `NullLiteralToken `__, `BooleanLiteralToken `__, `NumericLiteralToken `__, `StringLiteralToken `__ and `RegularExpressionToken `__: different kinds of literals
- `IdentifierToken `__ and `KeywordToken `__: identifiers and keywords (including reserved words) respectively
- `PunctuatorToken `__: operators and other punctuation symbols

As an example of a query operating entirely on the lexical level, consider the following query, which finds consecutive comma tokens arising from an omitted element in an array expression:

.. code-block:: ql

   import javascript

   class CommaToken extends PunctuatorToken {
     CommaToken() {
       getValue() = ","
     }
   }

   from CommaToken comma
   where comma.getNextToken() instanceof CommaToken
   select comma, "Omitted array elements are bad style."

➤ `See this in the query console `__. If the query returns no results, this pattern isn't used in the projects that you analyzed.

You can use the predicates ``Locatable.getFirstToken()`` and ``Locatable.getLastToken()`` to access the first and last token (if any) belonging to an element with a source location.

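As a small sketch of how these two predicates might be combined, the following query looks for elements whose first and last token coincide, that is, elements made up of a single token. It is meant only as an illustration and has not been tuned for any particular project:

.. code-block:: ql

   import javascript

   // Sketch: find elements whose first and last token are the same token.
   from Locatable l, Token t
   where
     t = l.getFirstToken() and
     t = l.getLastToken()
   select l, "This element consists of the single token '" + t.getValue() + "'."

Since the query considers every ``Locatable``, it can produce a large number of results on real code; in practice you would usually restrict ``l`` to a narrower class.
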

Comments
^^^^^^^^

The class `Comment `__ and its subclasses represent the different kinds of comments that can occur in JavaScript programs:

- `Comment `__: any comment

  - `LineComment `__: a single-line comment terminated by an end-of-line character

    - `SlashSlashComment `__: a plain JavaScript single-line comment starting with ``//``
    - `HtmlLineComment `__: a (non-standard) HTML comment

      - `HtmlCommentStart `__: an HTML comment starting with ``<!--``
      - `HtmlCommentEnd `__: an HTML comment starting with ``-->``

  - `BlockComment `__: a block comment potentially spanning multiple lines

    - `SlashStarComment `__: a plain JavaScript block comment surrounded with ``/*...*/``
    - `DocComment `__: a documentation block comment surrounded with ``/**...*/``

The most important member predicates are as follows:

- ``Comment.getText()`` returns the source text of the comment, not including delimiters.
- ``Comment.getLine(i)`` returns the ``i``\ th line of text within the comment (0-based).
- ``Comment.getNumLines()`` returns the number of lines in the comment.
- ``Comment.getNextToken()`` returns the token immediately following a comment. Note that such a token always exists: if a comment appears at the end of a file, its following token is an `EOFToken `__.

As an example of a query using only lexical information, consider the following query for finding HTML comments, which are not a standard ECMAScript feature and should be avoided:

.. code-block:: ql

   import javascript

   from HtmlLineComment c
   select c, "Do not use HTML comments."

➤ `See this in the query console `__. When we ran this query on the *mozilla/pdf.js* project in LGTM.com, we found three HTML comments.

Syntactic level
~~~~~~~~~~~~~~~

Most classes in the QL JavaScript library are concerned with representing a JavaScript program as a collection of `abstract syntax trees `__ (ASTs).


The QL class `ASTNode `__ contains all entities representing nodes in the abstract syntax trees and defines generic tree traversal predicates:

- ``ASTNode.getChild(i)``: returns the ``i``\ th child of this AST node.
- ``ASTNode.getAChild()``: returns any child of this AST node.
- ``ASTNode.getParent()``: returns the parent node of this AST node, if any.

.. pull-quote::

   Note

   These predicates should only be used to perform generic AST traversal. To access children of specific AST node types, the specialized predicates introduced below should be used instead. In particular, queries should not rely on the numeric indices of child nodes relative to their parent nodes: these are considered an implementation detail that may change between versions of the library.

Top-levels
^^^^^^^^^^

From a syntactic point of view, each JavaScript program is composed of one or more top-level code blocks (or *top-levels* for short), which are blocks of JavaScript code that do not belong to a larger code block. Top-levels are represented by the class `TopLevel `__ and its subclasses:

- `TopLevel `__
- `Script `__: a stand-alone file or HTML ``