mirror of
https://github.com/github/codeql.git
synced 2025-12-21 11:16:30 +01:00
69 lines
3.8 KiB
Plaintext
69 lines
3.8 KiB
Plaintext
/**
|
|
* This library implements the analysis described in the following two papers:
|
|
*
|
|
* James Kirrage, Asiri Rathnayake, Hayo Thielecke: Static Analysis for
|
|
* Regular Expression Denial-of-Service Attacks. NSS 2013.
|
|
* (http://www.cs.bham.ac.uk/~hxt/research/reg-exp-sec.pdf)
|
|
* Asiri Rathnayake, Hayo Thielecke: Static Analysis for Regular Expression
|
|
* Exponential Runtime via Substructural Logics. 2014.
|
|
* (https://www.cs.bham.ac.uk/~hxt/research/redos_full.pdf)
|
|
*
|
|
* The basic idea is to search for overlapping cycles in the NFA, that is,
|
|
* states `q` such that there are two distinct paths from `q` to itself
|
|
* that consume the same word `w`.
|
|
*
|
|
* For any such state `q`, an attack string can be constructed as follows:
|
|
* concatenate a prefix `v` that takes the NFA to `q` with `n` copies of
|
|
* the word `w` that leads back to `q` along two different paths, followed
|
|
* by a suffix `x` that is _not_ accepted in state `q`. A backtracking
|
|
* implementation will need to explore at least 2^n different ways of going
|
|
* from `q` back to itself while trying to match the `n` copies of `w`
|
|
* before finally giving up.
|
|
*
|
|
* Now in order to identify overlapping cycles, all we have to do is find
|
|
* pumpable forks, that is, states `q` that can transition to two different
|
|
* states `r1` and `r2` on the same input symbol `c`, such that there are
|
|
* paths from both `r1` and `r2` to `q` that consume the same word. The latter
|
|
* condition is equivalent to saying that `(q, q)` is reachable from `(r1, r2)`
|
|
* in the product NFA.
|
|
*
|
|
* This is what the library does. It makes a simple attempt to construct a
|
|
* prefix `v` leading into `q`, but only to improve the alert message.
|
|
* And the library tries to prove the existence of a suffix that ensures
|
|
* rejection. This check might fail, which can cause false positives.
|
|
*
|
|
* Finally, sometimes it depends on the translation whether the NFA generated
|
|
* for a regular expression has a pumpable fork or not. We implement one
|
|
* particular translation, which may result in false positives or negatives
|
|
* relative to some particular JavaScript engine.
|
|
*
|
|
* More precisely, the library constructs an NFA from a regular expression `r`
|
|
* as follows:
|
|
*
|
|
* * Every sub-term `t` gives rise to an NFA state `Match(t,i)`, representing
|
|
* the state of the automaton before attempting to match the `i`th character in `t`.
|
|
* * There is one accepting state `Accept(r)`.
|
|
* * There is a special `AcceptAnySuffix(r)` state, which accepts any suffix string
|
|
* by using an epsilon transition to `Accept(r)` and an any transition to itself.
|
|
* * Transitions between states may be labelled with epsilon, or an abstract
|
|
* input symbol.
|
|
* * Each abstract input symbol represents a set of concrete input characters:
|
|
* either a single character, a set of characters represented by a
|
|
* character class, or the set of all characters.
|
|
* * The product automaton is constructed lazily, starting with pair states
|
|
* `(q, q)` where `q` is a fork, and proceeding along an over-approximate
|
|
* step relation.
|
|
* * The over-approximate step relation allows transitions along pairs of
|
|
* abstract input symbols where the symbols have overlap in the characters they accept.
|
|
* * Once a trace of pairs of abstract input symbols that leads from a fork
|
|
* back to itself has been identified, we attempt to construct a concrete
|
|
* string corresponding to it, which may fail.
|
|
* * Lastly we ensure that any state reached by repeating `n` copies of `w` has
|
|
* a suffix `x` (possible empty) that is most likely __not__ accepted.
|
|
*/
|
|
|
|
private import semmle.python.RegexTreeView::RegexTreeView as TreeView
|
|
// ExponentialBackTracking should be used directly from the shared pack, and not from this file.
|
|
deprecated private import codeql.regex.nfa.ExponentialBackTracking::Make<TreeView> as Dep
|
|
import Dep
|