Merge master into next.

Conflict in `cpp/ql/test/library-tests/sideEffects/functions/sideEffects.expected`,
resolved by accepting test output (combining changes).
This commit is contained in:
Aditya Sharad
2018-12-12 17:22:16 +00:00
345 changed files with 9800 additions and 2436 deletions

View File

@@ -0,0 +1,88 @@
<!DOCTYPE qhelp PUBLIC
"-//Semmle//qhelp//EN"
"qhelp.dtd">
<qhelp>
<overview>
<p>
Sanitizing untrusted URLs is an important technique for
preventing attacks such as request forgeries and malicious
redirections. Usually, this is done by checking that the host of a URL
is in a set of allowed hosts.
</p>
<p>
However, it is notoriously error-prone to treat the URL as
a string and check if one of the allowed hosts is a substring of the
URL. Malicious URLs can bypass such security checks by embedding one
of the allowed hosts in an unexpected location.
</p>
<p>
Even if the substring check is not used in a
security-critical context, the incomplete check may still cause
undesirable behaviors when the check succeeds accidentally.
</p>
</overview>
<recommendation>
<p>
Parse a URL before performing a check on its host value,
and ensure that the check handles arbitrary subdomain sequences
correctly.
</p>
</recommendation>
<example>
<p>
The following example code checks that a URL redirection
will reach the <code>example.com</code> domain, or one of its
subdomains, and not some malicious site.
</p>
<sample src="examples/IncompleteUrlSubstringSanitization_BAD1.js"/>
<p>
The substring check is, however, easy to bypass. For example
by embedding <code>example.com</code> in the path component:
<code>http://evil-example.net/example.com</code>, or in the query
string component: <code>http://evil-example.net/?x=example.com</code>.
Address these shortcomings by checking the host of the parsed URL instead:
</p>
<sample src="examples/IncompleteUrlSubstringSanitization_BAD2.js"/>
<p>
This is still not a sufficient check as the
following URLs bypass it: <code>http://evil-example.com</code>
<code>http://example.com.evil-example.net</code>.
Instead, use an explicit whitelist of allowed hosts to
make the redirect secure:
</p>
<sample src="examples/IncompleteUrlSubstringSanitization_GOOD.js"/>
</example>
<references>
<li>OWASP: <a href="https://www.owasp.org/index.php/Server_Side_Request_Forgery">SSRF</a></li>
<li>OWASP: <a href="https://www.owasp.org/index.php/Unvalidated_Redirects_and_Forwards_Cheat_Sheet">XSS Unvalidated Redirects and Forwards Cheat Sheet</a>.</li>
</references>
</qhelp>

View File

@@ -0,0 +1,58 @@
/**
* @name Incomplete URL substring sanitization
* @description Security checks on the substrings of an unparsed URL are often vulnerable to bypassing.
* @kind problem
* @problem.severity warning
* @precision high
* @id js/incomplete-url-substring-sanitization
* @tags correctness
* security
* external/cwe/cwe-20
*/
import javascript
private import semmle.javascript.dataflow.InferredTypes
from DataFlow::MethodCallNode call, string name, DataFlow::Node substring, string target
where
(name = "indexOf" or name = "includes" or name = "startsWith" or name = "endsWith") and
call.getMethodName() = name and
substring = call.getArgument(0) and
substring.mayHaveStringValue(target) and
(
// target contains a domain on a common TLD, and perhaps some other URL components
target.regexpMatch("(?i)([a-z]*:?//)?\\.?([a-z0-9-]+\\.)+(com|org|edu|gov|uk|net)(:[0-9]+)?/?") or
// target is a HTTP URL to a domain on any TLD
target.regexpMatch("(?i)https?://([a-z0-9-]+\\.)+([a-z]+)(:[0-9]+)?/?")
) and
// whitelist
not (
name = "indexOf" and
(
// arithmetic on the indexOf-result
any(ArithmeticExpr e).getAnOperand().getUnderlyingValue() = call.asExpr()
or
// non-trivial position check on the indexOf-result
exists(EqualityTest test, Expr n |
test.hasOperands(call.asExpr(), n) |
not n.getIntValue() = [-1..0]
)
)
or
// the leading dot in a subdomain sequence makes the suffix-check safe (if it is performed on the host of the url)
name = "endsWith" and
target.regexpMatch("(?i)\\.([a-z0-9-]+)(\\.[a-z0-9-]+)+")
or
// the trailing slash makes the prefix-check safe
(
name = "startsWith"
or
name = "indexOf" and
exists(EqualityTest test, Expr n |
test.hasOperands(call.asExpr(), n) and
n.getIntValue() = 0
)
) and
target.regexpMatch(".*/")
)
select call, "'$@' may be at an arbitrary position in the sanitized URL.", substring, target

View File

@@ -0,0 +1,72 @@
<!DOCTYPE qhelp PUBLIC
"-//Semmle//qhelp//EN"
"qhelp.dtd">
<qhelp>
<overview>
<p>
The <code>indexOf</code> and <code>lastIndexOf</code> methods are sometimes used to check
if a substring occurs at a certain position in a string. However, if the returned index
is compared to an expression that might evaluate to -1, the check may pass in some
cases where the substring was not found at all.
</p>
<p>
Specifically, this can easily happen when implementing <code>endsWith</code> using
<code>indexOf</code>.
</p>
</overview>
<recommendation>
<p>
Use <code>String.prototype.endsWith</code> if it is available.
Otherwise, explicitly handle the -1 case, either by checking the relative
lengths of the strings, or by checking if the returned index is -1.
</p>
</recommendation>
<example>
<p>
The following example uses <code>lastIndexOf</code> to determine if the string <code>x</code>
ends with the string <code>y</code>:
</p>
<sample src="examples/IncorrectSuffixCheck.js"/>
<p>
However, if <code>y</code> is one character longer than <code>x</code>, the right-hand side
<code>x.length - y.length</code> becomes -1, which then equals the return value
of <code>lastIndexOf</code>. This will make the test pass, even though <code>x</code> does not
end with <code>y</code>.
</p>
<p>
To avoid this, explicitly check for the -1 case:
</p>
<sample src="examples/IncorrectSuffixCheckGood.js"/>
</example>
<references>
<li>MDN: <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/endsWith">String.prototype.endsWith</a></li>
<li>MDN: <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/indexOf">String.prototype.indexOf</a></li>
</references>
</qhelp>

View File

@@ -0,0 +1,150 @@
/**
* @name Incorrect suffix check
* @description Using indexOf to implement endsWith functionality is error-prone if the -1 case is not explicitly handled.
* @kind problem
* @problem.severity error
* @precision high
* @id js/incorrect-suffix-check
* @tags security
* correctness
* external/cwe/cwe-020
*/
import javascript
/**
* A call to `indexOf` or `lastIndexOf`.
*/
class IndexOfCall extends DataFlow::MethodCallNode {
IndexOfCall() {
exists (string name | name = getMethodName() |
name = "indexOf" or
name = "lastIndexOf") and
getNumArgument() = 1
}
/** Gets the receiver or argument of this call. */
DataFlow::Node getAnOperand() {
result = getReceiver() or
result = getArgument(0)
}
/**
* Gets an `indexOf` call with the same receiver, argument, and method name, including this call itself.
*/
IndexOfCall getAnEquivalentIndexOfCall() {
result.getReceiver().getALocalSource() = this.getReceiver().getALocalSource() and
result.getArgument(0).getALocalSource() = this.getArgument(0).getALocalSource() and
result.getMethodName() = this.getMethodName()
}
/**
* Gets an expression that refers to the return value of this call.
*/
Expr getAUse() {
this.flowsToExpr(result)
}
}
/**
* Gets a source of the given string value, or one of its operands if it is a concatenation.
*/
DataFlow::SourceNode getStringSource(DataFlow::Node node) {
result = node.getALocalSource()
or
result = StringConcatenation::getAnOperand(node).getALocalSource()
}
/**
* An expression that takes the length of a string literal.
*/
class LiteralLengthExpr extends DotExpr {
LiteralLengthExpr() {
getPropertyName() = "length" and
getBase() instanceof StringLiteral
}
/**
* Gets the value of the string literal whose length is taken.
*/
string getBaseValue() {
result = getBase().getStringValue()
}
}
/**
* Holds if `length` is derived from the length of the given `indexOf`-operand.
*/
predicate isDerivedFromLength(DataFlow::Node length, DataFlow::Node operand) {
exists (IndexOfCall call | operand = call.getAnOperand() |
length = getStringSource(operand).getAPropertyRead("length")
or
// Find a literal length with the same string constant
exists (LiteralLengthExpr lengthExpr |
lengthExpr.getContainer() = call.getContainer() and
lengthExpr.getBaseValue() = operand.asExpr().getStringValue() and
length = lengthExpr.flow())
or
// Find an integer constants that equals the length of string constant
exists (Expr lengthExpr |
lengthExpr.getContainer() = call.getContainer() and
lengthExpr.getIntValue() = operand.asExpr().getStringValue().length() and
length = lengthExpr.flow())
)
or
isDerivedFromLength(length.getAPredecessor(), operand)
or
exists (SubExpr sub |
isDerivedFromLength(sub.getAnOperand().flow(), operand) and
length = sub.flow())
}
/**
* An equality comparison of form `A.indexOf(B) === A.length - B.length` or similar.
*
* We assume A and B are strings, even A and/or B could be also be arrays.
* The comparison with the length rarely occurs for arrays, however.
*/
class UnsafeIndexOfComparison extends EqualityTest {
IndexOfCall indexOf;
DataFlow::Node testedValue;
UnsafeIndexOfComparison() {
hasOperands(indexOf.getAUse(), testedValue.asExpr()) and
isDerivedFromLength(testedValue, indexOf.getReceiver()) and
isDerivedFromLength(testedValue, indexOf.getArgument(0)) and
// Ignore cases like `x.indexOf("/") === x.length - 1` that can only be bypassed if `x` is the empty string.
// Sometimes strings are just known to be non-empty from the context, and it is unlikely to be a security issue,
// since it's obviously not a domain name check.
not indexOf.getArgument(0).mayHaveStringValue(any(string s | s.length() = 1)) and
// Relative string length comparison, such as A.length > B.length, or (A.length - B.length) > 0
not exists (RelationalComparison compare |
isDerivedFromLength(compare.getAnOperand().flow(), indexOf.getReceiver()) and
isDerivedFromLength(compare.getAnOperand().flow(), indexOf.getArgument(0))
) and
// Check for indexOf being -1
not exists (EqualityTest test, Expr minusOne |
test.hasOperands(indexOf.getAnEquivalentIndexOfCall().getAUse(), minusOne) and
minusOne.getIntValue() = -1
) and
// Check for indexOf being >1, or >=0, etc
not exists (RelationalComparison test |
test.getGreaterOperand() = indexOf.getAnEquivalentIndexOfCall().getAUse() and
exists (int value | value = test.getLesserOperand().getIntValue() |
value >= 0
or
not test.isInclusive() and
value = -1)
)
}
IndexOfCall getIndexOf() {
result = indexOf
}
}
from UnsafeIndexOfComparison comparison
select comparison, "This suffix check is missing a length comparison to correctly handle " + comparison.getIndexOf().getMethodName() + " returning -1."

View File

@@ -0,0 +1,7 @@
app.get('/some/path', function(req, res) {
let url = req.param("url");
// BAD: the host of `url` may be controlled by an attacker
if (url.includes("example.com")) {
res.redirect(url);
}
});

View File

@@ -0,0 +1,8 @@
app.get('/some/path', function(req, res) {
let url = req.param("url"),
host = urlLib.parse(url).host;
// BAD: the host of `url` may be controlled by an attacker
if (host.includes("example.com")) {
res.redirect(url);
}
});

View File

@@ -0,0 +1,13 @@
app.get('/some/path', function(req, res) {
let url = req.param('url'),
host = urlLib.parse(url).host;
// GOOD: the host of `url` can not be controlled by an attacker
let allowedHosts = [
'example.com',
'beta.example.com',
'www.example.com'
];
if (allowedHosts.includes(host)) {
res.redirect(url);
}
});

View File

@@ -0,0 +1,3 @@
function endsWith(x, y) {
return x.lastIndexOf(y) === x.length - y.length;
}

View File

@@ -0,0 +1,4 @@
function endsWith(x, y) {
let index = x.lastIndexOf(y);
return index !== -1 && index === x.length - y.length;
}

View File

@@ -1,5 +1,5 @@
/**
* @name Client side cross-site scripting
* @name Client-side cross-site scripting
* @description Writing user input directly to the DOM allows for
* a cross-site scripting vulnerability.
* @kind path-problem

View File

@@ -0,0 +1,77 @@
<!DOCTYPE qhelp PUBLIC
"-//Semmle//qhelp//EN"
"qhelp.dtd">
<qhelp>
<overview>
<p>
Escaping meta-characters in untrusted input is an important technique for preventing injection
attacks such as cross-site scripting. One particular example of this is HTML entity encoding,
where HTML special characters are replaced by HTML character entities to prevent them from being
interpreted as HTML markup. For example, the less-than character is encoded as <code>&amp;lt;</code>
and the double-quote character as <code>&amp;quot;</code>.
Other examples include backslash-escaping for including untrusted data in string literals and
percent-encoding for URI components.
</p>
<p>
The reverse process of replacing escape sequences with the characters they represent is known as
unescaping.
</p>
<p>
Note that the escape characters themselves (such as ampersand in the case of HTML encoding) play
a special role during escaping and unescaping: they are themselves escaped, but also form part
of the escaped representations of other characters. Hence care must be taken to avoid double escaping
and unescaping: when escaping, the escape character must be escaped first, when unescaping it has
to be unescaped last.
</p>
<p>
If used in the context of sanitization, double unescaping may render the sanitization ineffective.
Even if it is not used in a security-critical context, it may still result in confusing
or garbled output.
</p>
</overview>
<recommendation>
<p>
Use a (well-tested) sanitization library if at all possible. These libraries are much more
likely to handle corner cases correctly than a custom implementation. For URI encoding,
you can use the standard `encodeURIComponent` and `decodeURIComponent` functions.
</p>
<p>
Otherwise, make sure to always escape the escape character first, and unescape it last.
</p>
</recommendation>
<example>
<p>
The following example shows a pair of hand-written HTML encoding and decoding functions:
</p>
<sample src="examples/DoubleEscaping.js" />
<p>
The encoding function correctly handles ampersand before the other characters. For example,
the string <code>me &amp; "you"</code> is encoded as <code>me &amp;amp; &amp;quot;you&amp;quot;</code>,
and the string <code>&quot;</code> is encoded as <code>&amp;quot;</code>.
</p>
<p>
The decoding function, however, incorrectly decodes <code>&amp;amp;</code> into <code>&amp;</code>
before handling the other characters. So while it correctly decodes the first example above,
it decodes the second example (<code>&amp;quot;</code>) to <code>&quot;</code> (a single double quote),
which is not correct.
</p>
<p>
Instead, the decoding function should decode the ampersand last:
</p>
<sample src="examples/DoubleEscapingGood.js" />
</example>
<references>
<li>OWASP Top 10: <a href="https://www.owasp.org/index.php/Top_10-2017_A1-Injection">A1 Injection</a>.</li>
<li>npm: <a href="https://www.npmjs.com/package/html-entities">html-entities</a> package.</li>
<li>npm: <a href="https://www.npmjs.com/package/js-string-escape">js-string-escape</a> package.</li>
</references>
</qhelp>

View File

@@ -0,0 +1,159 @@
/**
* @name Double escaping or unescaping
* @description When escaping special characters using a meta-character like backslash or
* ampersand, the meta-character has to be escaped first to avoid double-escaping,
* and conversely it has to be unescaped last to avoid double-unescaping.
* @kind problem
* @problem.severity warning
* @precision high
* @id js/double-escaping
* @tags correctness
* security
* external/cwe/cwe-116
* external/cwe/cwe-20
*/
import javascript
/**
* Holds if `rl` is a simple constant, which is bound to the result of the predicate.
*
* For example, `/a/g` has string value `"a"` and `/abc/` has string value `"abc"`,
* while `/ab?/` and `/a(?=b)/` do not have a string value.
*
* Flags are ignored, so `/a/i` is still considered to have string value `"a"`,
* even though it also matches `"A"`.
*
* Note the somewhat subtle use of monotonic aggregate semantics, which makes the
* `strictconcat` fail if one of the children of the root is not a constant (legacy
* semantics would simply skip such children).
*/
language[monotonicAggregates]
string getStringValue(RegExpLiteral rl) {
exists (RegExpTerm root | root = rl.getRoot() |
result = root.(RegExpConstant).getValue()
or
result = strictconcat(RegExpTerm ch, int i |
ch = root.(RegExpSequence).getChild(i) |
ch.(RegExpConstant).getValue() order by i
)
)
}
/**
* Gets a predecessor of `nd` that is not an SSA phi node.
*/
DataFlow::Node getASimplePredecessor(DataFlow::Node nd) {
result = nd.getAPredecessor() and
not nd.(DataFlow::SsaDefinitionNode).getSsaVariable().getDefinition() instanceof SsaPhiNode
}
/**
* Holds if `metachar` is a meta-character that is used to escape special characters
* into a form described by regular expression `regex`.
*/
predicate escapingScheme(string metachar, string regex) {
metachar = "&" and regex = "&.*;"
or
metachar = "%" and regex = "%.*"
or
metachar = "\\" and regex = "\\\\.*"
}
/**
* A call to `String.prototype.replace` that replaces all instances of a pattern.
*/
class Replacement extends DataFlow::Node {
RegExpLiteral pattern;
Replacement() {
exists (DataFlow::MethodCallNode mcn | this = mcn |
mcn.getMethodName() = "replace" and
mcn.getArgument(0).asExpr() = pattern and
mcn.getNumArgument() = 2 and
pattern.isGlobal()
)
}
/**
* Holds if this replacement replaces the string `input` with `output`.
*/
predicate replaces(string input, string output) {
exists (DataFlow::MethodCallNode mcn |
mcn = this and
input = getStringValue(pattern) and
output = mcn.getArgument(1).asExpr().getStringValue()
)
}
/**
* Holds if this replacement escapes `char` using `metachar`.
*
* For example, during HTML entity escaping `<` is escaped (to `&lt;`)
* using `&`.
*/
predicate escapes(string char, string metachar) {
exists (string regexp, string repl |
escapingScheme(metachar, regexp) and
replaces(char, repl) and
repl.regexpMatch(regexp)
)
}
/**
* Holds if this replacement unescapes `char` using `metachar`.
*
* For example, during HTML entity unescaping `<` is unescaped (from
* `&lt;`) using `<`.
*/
predicate unescapes(string metachar, string char) {
exists (string regexp, string orig |
escapingScheme(metachar, regexp) and
replaces(orig, char) and
orig.regexpMatch(regexp)
)
}
/**
* Gets the previous replacement in this chain of replacements.
*/
Replacement getPreviousReplacement() {
result = getASimplePredecessor*(this.(DataFlow::MethodCallNode).getReceiver())
}
/**
* Gets an earlier replacement in this chain of replacements that
* performs an escaping.
*/
Replacement getAnEarlierEscaping(string metachar) {
exists (Replacement pred | pred = this.getPreviousReplacement() |
if pred.escapes(_, metachar) then
result = pred
else
result = pred.getAnEarlierEscaping(metachar)
)
}
/**
* Gets an earlier replacement in this chain of replacements that
* performs a unescaping.
*/
Replacement getALaterUnescaping(string metachar) {
exists (Replacement succ | this = succ.getPreviousReplacement() |
if succ.unescapes(metachar, _) then
result = succ
else
result = succ.getALaterUnescaping(metachar)
)
}
}
from Replacement primary, Replacement supplementary, string message, string metachar
where primary.escapes(metachar, _) and
supplementary = primary.getAnEarlierEscaping(metachar) and
message = "may double-escape '" + metachar + "' characters from $@"
or
primary.unescapes(_, metachar) and
supplementary = primary.getALaterUnescaping(metachar) and
message = "may produce '" + metachar + "' characters that are double-unescaped $@"
select primary, "This replacement " + message + ".", supplementary, "here"

View File

@@ -0,0 +1,11 @@
module.exports.encode = function(s) {
return s.replace(/&/g, "&amp;")
.replace(/"/g, "&quot;")
.replace(/'/g, "&apos;");
};
module.exports.decode = function(s) {
return s.replace(/&amp;/g, "&")
.replace(/&quot;/g, "\"")
.replace(/&apos;/g, "'");
};

View File

@@ -0,0 +1,11 @@
module.exports.encode = function(s) {
return s.replace(/&/g, "&amp;")
.replace(/"/g, "&quot;")
.replace(/'/g, "&apos;");
};
module.exports.decode = function(s) {
return s.replace(/&quot;/g, "\"")
.replace(/&apos;/g, "'")
.replace(/&amp;/g, "&");
};