mirror of
https://github.com/github/codeql.git
synced 2025-12-17 17:23:36 +01:00
Merge branch 'main' into python-add-typetrackingnode
This commit is contained in:
108
python/ql/src/Security/CWE-730/PolynomialReDoS.qhelp
Normal file
108
python/ql/src/Security/CWE-730/PolynomialReDoS.qhelp
Normal file
@@ -0,0 +1,108 @@
|
||||
<!DOCTYPE qhelp PUBLIC
|
||||
"-//Semmle//qhelp//EN"
|
||||
"qhelp.dtd">
|
||||
|
||||
<qhelp>
|
||||
|
||||
<include src="ReDoSIntroduction.inc.qhelp" />
|
||||
|
||||
<example>
|
||||
<p>
|
||||
|
||||
Consider this use of a regular expression, which removes
|
||||
all leading and trailing whitespace in a string:
|
||||
|
||||
</p>
|
||||
|
||||
<sample language="python">
|
||||
re.sub(r"^\s+|\s+$", "", text) # BAD
|
||||
</sample>
|
||||
|
||||
<p>
|
||||
|
||||
The sub-expression <code>"\s+$"</code> will match the
|
||||
whitespace characters in <code>text</code> from left to right, but it
|
||||
can start matching anywhere within a whitespace sequence. This is
|
||||
problematic for strings that do <strong>not</strong> end with a whitespace
|
||||
character. Such a string will force the regular expression engine to
|
||||
process each whitespace sequence once per whitespace character in the
|
||||
sequence.
|
||||
|
||||
</p>
|
||||
|
||||
<p>
|
||||
|
||||
This ultimately means that the time cost of trimming a
|
||||
string is quadratic in the length of the string. So a string like
|
||||
<code>"a b"</code> will take milliseconds to process, but a similar
|
||||
string with a million spaces instead of just one will take several
|
||||
minutes.
|
||||
|
||||
</p>
|
||||
|
||||
<p>
|
||||
|
||||
Avoid this problem by rewriting the regular expression to
|
||||
not contain the ambiguity about when to start matching whitespace
|
||||
sequences. For instance, by using a negative look-behind
|
||||
(<code>^\s+|(?<!\s)\s+$</code>), or just by using the built-in strip
|
||||
method (<code>text.strip()</code>).
|
||||
|
||||
</p>
|
||||
|
||||
<p>
|
||||
|
||||
Note that the sub-expression <code>"^\s+"</code> is
|
||||
<strong>not</strong> problematic as the <code>^</code> anchor restricts
|
||||
when that sub-expression can start matching, and as the regular
|
||||
expression engine matches from left to right.
|
||||
|
||||
</p>
|
||||
|
||||
</example>
|
||||
|
||||
<example>
|
||||
|
||||
<p>
|
||||
|
||||
As a similar, but slightly subtler problem, consider the
|
||||
regular expression that matches lines with numbers, possibly written
|
||||
using scientific notation:
|
||||
</p>
|
||||
|
||||
<sample language="python">
|
||||
^0\.\d+E?\d+$ # BAD
|
||||
</sample>
|
||||
|
||||
<p>
|
||||
|
||||
The problem with this regular expression is in the
|
||||
sub-expression <code>\d+E?\d+</code> because the second
|
||||
<code>\d+</code> can start matching digits anywhere after the first
|
||||
match of the first <code>\d+</code> if there is no <code>E</code> in
|
||||
the input string.
|
||||
|
||||
</p>
|
||||
|
||||
<p>
|
||||
|
||||
This is problematic for strings that do <strong>not</strong>
|
||||
end with a digit. Such a string will force the regular expression
|
||||
engine to process each digit sequence once per digit in the sequence,
|
||||
again leading to a quadratic time complexity.
|
||||
|
||||
</p>
|
||||
|
||||
<p>
|
||||
|
||||
To make the processing faster, the regular expression
|
||||
should be rewritten such that the two <code>\d+</code> sub-expressions
|
||||
do not have overlapping matches: <code>^0\.\d+(E\d+)?$</code>.
|
||||
|
||||
</p>
|
||||
|
||||
</example>
|
||||
|
||||
<include src="ReDoSReferences.inc.qhelp"/>
|
||||
|
||||
</qhelp>
|
||||
34
python/ql/src/Security/CWE-730/ReDoS.qhelp
Normal file
34
python/ql/src/Security/CWE-730/ReDoS.qhelp
Normal file
@@ -0,0 +1,34 @@
|
||||
<!DOCTYPE qhelp PUBLIC
|
||||
"-//Semmle//qhelp//EN"
|
||||
"qhelp.dtd">
|
||||
|
||||
<qhelp>
|
||||
|
||||
<include src="ReDoSIntroduction.inc.qhelp" />
|
||||
|
||||
<example>
|
||||
<p>
|
||||
Consider this regular expression:
|
||||
</p>
|
||||
<sample language="python">
|
||||
^_(__|.)+_$
|
||||
</sample>
|
||||
<p>
|
||||
Its sub-expression <code>"(__|.)+?"</code> can match the string <code>"__"</code> either by the
|
||||
first alternative <code>"__"</code> to the left of the <code>"|"</code> operator, or by two
|
||||
repetitions of the second alternative <code>"."</code> to the right. Thus, a string consisting
|
||||
of an odd number of underscores followed by some other character will cause the regular
|
||||
expression engine to run for an exponential amount of time before rejecting the input.
|
||||
</p>
|
||||
<p>
|
||||
This problem can be avoided by rewriting the regular expression to remove the ambiguity between
|
||||
the two branches of the alternative inside the repetition:
|
||||
</p>
|
||||
<sample language="python">
|
||||
^_(__|[^_])+_$
|
||||
</sample>
|
||||
</example>
|
||||
|
||||
<include src="ReDoSReferences.inc.qhelp"/>
|
||||
|
||||
</qhelp>
|
||||
54
python/ql/src/Security/CWE-730/ReDoSIntroduction.inc.qhelp
Normal file
54
python/ql/src/Security/CWE-730/ReDoSIntroduction.inc.qhelp
Normal file
@@ -0,0 +1,54 @@
|
||||
<!DOCTYPE qhelp PUBLIC
|
||||
"-//Semmle//qhelp//EN"
|
||||
"qhelp.dtd">
|
||||
<qhelp>
|
||||
<overview>
|
||||
<p>
|
||||
|
||||
Some regular expressions take a long time to match certain
|
||||
input strings to the point where the time it takes to match a string
|
||||
of length <i>n</i> is proportional to <i>n<sup>k</sup></i> or even
|
||||
<i>2<sup>n</sup></i>. Such regular expressions can negatively affect
|
||||
performance, or even allow a malicious user to perform a Denial of
|
||||
Service ("DoS") attack by crafting an expensive input string for the
|
||||
regular expression to match.
|
||||
|
||||
</p>
|
||||
|
||||
<p>
|
||||
|
||||
The regular expression engine provided by Python uses a backtracking non-deterministic finite
|
||||
automata to implement regular expression matching. While this approach
|
||||
is space-efficient and allows supporting advanced features like
|
||||
capture groups, it is not time-efficient in general. The worst-case
|
||||
time complexity of such an automaton can be polynomial or even
|
||||
exponential, meaning that for strings of a certain shape, increasing
|
||||
the input length by ten characters may make the automaton about 1000
|
||||
times slower.
|
||||
|
||||
</p>
|
||||
|
||||
<p>
|
||||
|
||||
Typically, a regular expression is affected by this
|
||||
problem if it contains a repetition of the form <code>r*</code> or
|
||||
<code>r+</code> where the sub-expression <code>r</code> is ambiguous
|
||||
in the sense that it can match some string in multiple ways. More
|
||||
information about the precise circumstances can be found in the
|
||||
references.
|
||||
|
||||
</p>
|
||||
</overview>
|
||||
|
||||
<recommendation>
|
||||
|
||||
<p>
|
||||
|
||||
Modify the regular expression to remove the ambiguity, or
|
||||
ensure that the strings matched with the regular expression are short
|
||||
enough that the time-complexity does not matter.
|
||||
|
||||
</p>
|
||||
|
||||
</recommendation>
|
||||
</qhelp>
|
||||
16
python/ql/src/Security/CWE-730/ReDoSReferences.inc.qhelp
Normal file
16
python/ql/src/Security/CWE-730/ReDoSReferences.inc.qhelp
Normal file
@@ -0,0 +1,16 @@
|
||||
<!DOCTYPE qhelp PUBLIC
|
||||
"-//Semmle//qhelp//EN"
|
||||
"qhelp.dtd">
|
||||
<qhelp>
|
||||
<references>
|
||||
<li>
|
||||
OWASP:
|
||||
<a href="https://www.owasp.org/index.php/Regular_expression_Denial_of_Service_-_ReDoS">Regular expression Denial of Service - ReDoS</a>.
|
||||
</li>
|
||||
<li>Wikipedia: <a href="https://en.wikipedia.org/wiki/ReDoS">ReDoS</a>.</li>
|
||||
<li>Wikipedia: <a href="https://en.wikipedia.org/wiki/Time_complexity">Time complexity</a>.</li>
|
||||
<li>James Kirrage, Asiri Rathnayake, Hayo Thielecke:
|
||||
<a href="http://www.cs.bham.ac.uk/~hxt/research/reg-exp-sec.pdf">Static Analysis for Regular Expression Denial-of-Service Attack</a>.
|
||||
</li>
|
||||
</references>
|
||||
</qhelp>
|
||||
@@ -76,28 +76,16 @@ private module SensitiveDataModeling {
|
||||
}
|
||||
|
||||
/**
|
||||
* Gets a reference to a string constant that, if used as the key in a lookup,
|
||||
* indicates the presence of sensitive data with `classification`.
|
||||
*/
|
||||
private DataFlow::TypeTrackingNode sensitiveLookupStringConst(
|
||||
DataFlow::TypeTracker t, SensitiveDataClassification classification
|
||||
) {
|
||||
t.start() and
|
||||
nameIndicatesSensitiveData(result.asExpr().(StrConst).getText(), classification)
|
||||
or
|
||||
exists(DataFlow::TypeTracker t2 |
|
||||
result = sensitiveLookupStringConst(t2, classification).track(t2, t)
|
||||
)
|
||||
}
|
||||
|
||||
/**
|
||||
* Gets a reference to a string constant that, if used as the key in a lookup,
|
||||
* indicates the presence of sensitive data with `classification`.
|
||||
*
|
||||
* Also see `extraStepForCalls`.
|
||||
* Gets a reference (in local scope) to a string constant that, if used as the key in
|
||||
* a lookup, indicates the presence of sensitive data with `classification`.
|
||||
*/
|
||||
DataFlow::Node sensitiveLookupStringConst(SensitiveDataClassification classification) {
|
||||
sensitiveLookupStringConst(DataFlow::TypeTracker::end(), classification).flowsTo(result)
|
||||
// Note: If this is implemented with type-tracking, we will get cross-talk as
|
||||
// illustrated in python/ql/test/experimental/dataflow/sensitive-data/test.py
|
||||
exists(DataFlow::LocalSourceNode source |
|
||||
nameIndicatesSensitiveData(source.asExpr().(StrConst).getText(), classification) and
|
||||
source.flowsTo(result)
|
||||
)
|
||||
}
|
||||
|
||||
/** A function call that is considered a source of sensitive data. */
|
||||
@@ -118,6 +106,8 @@ private module SensitiveDataModeling {
|
||||
/**
|
||||
* Tracks any modeled source of sensitive data (with any classification),
|
||||
* to limit the scope of `extraStepForCalls`. See it's QLDoc for more context.
|
||||
*
|
||||
* Also see `extraStepForCalls`.
|
||||
*/
|
||||
private DataFlow::TypeTrackingNode possibleSensitiveCallable(DataFlow::TypeTracker t) {
|
||||
t.start() and
|
||||
@@ -129,6 +119,8 @@ private module SensitiveDataModeling {
|
||||
/**
|
||||
* Tracks any modeled source of sensitive data (with any classification),
|
||||
* to limit the scope of `extraStepForCalls`. See it's QLDoc for more context.
|
||||
*
|
||||
* Also see `extraStepForCalls`.
|
||||
*/
|
||||
private DataFlow::Node possibleSensitiveCallable() {
|
||||
possibleSensitiveCallable(DataFlow::TypeTracker::end()).flowsTo(result)
|
||||
|
||||
@@ -17,6 +17,13 @@ abstract class AttrRef extends Node {
|
||||
*/
|
||||
abstract Node getObject();
|
||||
|
||||
/**
|
||||
* Holds if this data flow node accesses attribute named `attrName` on object `object`.
|
||||
*/
|
||||
predicate accesses(Node object, string attrName) {
|
||||
this.getObject() = object and this.getAttributeName() = attrName
|
||||
}
|
||||
|
||||
/**
|
||||
* Gets the expression node that defines the attribute being accessed, if any. This is
|
||||
* usually an identifier or literal.
|
||||
|
||||
@@ -1393,9 +1393,9 @@ private module Stage2 {
|
||||
exists(RetNodeEx ret, Ap ap0, ReturnKindExt kind, int pos |
|
||||
parameterFlow(p, ap, ap0, c, config) and
|
||||
c = ret.getEnclosingCallable() and
|
||||
revFlow(ret, true, apSome(_), ap0, config) and
|
||||
fwdFlow(ret, any(CcCall ccc), apSome(ap), pragma[only_bind_into](ap0),
|
||||
revFlow(pragma[only_bind_into](ret), true, apSome(_), pragma[only_bind_into](ap0),
|
||||
pragma[only_bind_into](config)) and
|
||||
fwdFlow(ret, any(CcCall ccc), apSome(ap), ap0, config) and
|
||||
kind = ret.getKind() and
|
||||
p.getPosition() = pos and
|
||||
// we don't expect a parameter to return stored in itself
|
||||
@@ -2087,9 +2087,9 @@ private module Stage3 {
|
||||
exists(RetNodeEx ret, Ap ap0, ReturnKindExt kind, int pos |
|
||||
parameterFlow(p, ap, ap0, c, config) and
|
||||
c = ret.getEnclosingCallable() and
|
||||
revFlow(ret, true, apSome(_), ap0, config) and
|
||||
fwdFlow(ret, any(CcCall ccc), apSome(ap), pragma[only_bind_into](ap0),
|
||||
revFlow(pragma[only_bind_into](ret), true, apSome(_), pragma[only_bind_into](ap0),
|
||||
pragma[only_bind_into](config)) and
|
||||
fwdFlow(ret, any(CcCall ccc), apSome(ap), ap0, config) and
|
||||
kind = ret.getKind() and
|
||||
p.getPosition() = pos and
|
||||
// we don't expect a parameter to return stored in itself
|
||||
@@ -2850,9 +2850,9 @@ private module Stage4 {
|
||||
exists(RetNodeEx ret, Ap ap0, ReturnKindExt kind, int pos |
|
||||
parameterFlow(p, ap, ap0, c, config) and
|
||||
c = ret.getEnclosingCallable() and
|
||||
revFlow(ret, true, apSome(_), ap0, config) and
|
||||
fwdFlow(ret, any(CcCall ccc), apSome(ap), pragma[only_bind_into](ap0),
|
||||
revFlow(pragma[only_bind_into](ret), true, apSome(_), pragma[only_bind_into](ap0),
|
||||
pragma[only_bind_into](config)) and
|
||||
fwdFlow(ret, any(CcCall ccc), apSome(ap), ap0, config) and
|
||||
kind = ret.getKind() and
|
||||
p.getPosition() = pos and
|
||||
// we don't expect a parameter to return stored in itself
|
||||
|
||||
@@ -1393,9 +1393,9 @@ private module Stage2 {
|
||||
exists(RetNodeEx ret, Ap ap0, ReturnKindExt kind, int pos |
|
||||
parameterFlow(p, ap, ap0, c, config) and
|
||||
c = ret.getEnclosingCallable() and
|
||||
revFlow(ret, true, apSome(_), ap0, config) and
|
||||
fwdFlow(ret, any(CcCall ccc), apSome(ap), pragma[only_bind_into](ap0),
|
||||
revFlow(pragma[only_bind_into](ret), true, apSome(_), pragma[only_bind_into](ap0),
|
||||
pragma[only_bind_into](config)) and
|
||||
fwdFlow(ret, any(CcCall ccc), apSome(ap), ap0, config) and
|
||||
kind = ret.getKind() and
|
||||
p.getPosition() = pos and
|
||||
// we don't expect a parameter to return stored in itself
|
||||
@@ -2087,9 +2087,9 @@ private module Stage3 {
|
||||
exists(RetNodeEx ret, Ap ap0, ReturnKindExt kind, int pos |
|
||||
parameterFlow(p, ap, ap0, c, config) and
|
||||
c = ret.getEnclosingCallable() and
|
||||
revFlow(ret, true, apSome(_), ap0, config) and
|
||||
fwdFlow(ret, any(CcCall ccc), apSome(ap), pragma[only_bind_into](ap0),
|
||||
revFlow(pragma[only_bind_into](ret), true, apSome(_), pragma[only_bind_into](ap0),
|
||||
pragma[only_bind_into](config)) and
|
||||
fwdFlow(ret, any(CcCall ccc), apSome(ap), ap0, config) and
|
||||
kind = ret.getKind() and
|
||||
p.getPosition() = pos and
|
||||
// we don't expect a parameter to return stored in itself
|
||||
@@ -2850,9 +2850,9 @@ private module Stage4 {
|
||||
exists(RetNodeEx ret, Ap ap0, ReturnKindExt kind, int pos |
|
||||
parameterFlow(p, ap, ap0, c, config) and
|
||||
c = ret.getEnclosingCallable() and
|
||||
revFlow(ret, true, apSome(_), ap0, config) and
|
||||
fwdFlow(ret, any(CcCall ccc), apSome(ap), pragma[only_bind_into](ap0),
|
||||
revFlow(pragma[only_bind_into](ret), true, apSome(_), pragma[only_bind_into](ap0),
|
||||
pragma[only_bind_into](config)) and
|
||||
fwdFlow(ret, any(CcCall ccc), apSome(ap), ap0, config) and
|
||||
kind = ret.getKind() and
|
||||
p.getPosition() = pos and
|
||||
// we don't expect a parameter to return stored in itself
|
||||
|
||||
@@ -1393,9 +1393,9 @@ private module Stage2 {
|
||||
exists(RetNodeEx ret, Ap ap0, ReturnKindExt kind, int pos |
|
||||
parameterFlow(p, ap, ap0, c, config) and
|
||||
c = ret.getEnclosingCallable() and
|
||||
revFlow(ret, true, apSome(_), ap0, config) and
|
||||
fwdFlow(ret, any(CcCall ccc), apSome(ap), pragma[only_bind_into](ap0),
|
||||
revFlow(pragma[only_bind_into](ret), true, apSome(_), pragma[only_bind_into](ap0),
|
||||
pragma[only_bind_into](config)) and
|
||||
fwdFlow(ret, any(CcCall ccc), apSome(ap), ap0, config) and
|
||||
kind = ret.getKind() and
|
||||
p.getPosition() = pos and
|
||||
// we don't expect a parameter to return stored in itself
|
||||
@@ -2087,9 +2087,9 @@ private module Stage3 {
|
||||
exists(RetNodeEx ret, Ap ap0, ReturnKindExt kind, int pos |
|
||||
parameterFlow(p, ap, ap0, c, config) and
|
||||
c = ret.getEnclosingCallable() and
|
||||
revFlow(ret, true, apSome(_), ap0, config) and
|
||||
fwdFlow(ret, any(CcCall ccc), apSome(ap), pragma[only_bind_into](ap0),
|
||||
revFlow(pragma[only_bind_into](ret), true, apSome(_), pragma[only_bind_into](ap0),
|
||||
pragma[only_bind_into](config)) and
|
||||
fwdFlow(ret, any(CcCall ccc), apSome(ap), ap0, config) and
|
||||
kind = ret.getKind() and
|
||||
p.getPosition() = pos and
|
||||
// we don't expect a parameter to return stored in itself
|
||||
@@ -2850,9 +2850,9 @@ private module Stage4 {
|
||||
exists(RetNodeEx ret, Ap ap0, ReturnKindExt kind, int pos |
|
||||
parameterFlow(p, ap, ap0, c, config) and
|
||||
c = ret.getEnclosingCallable() and
|
||||
revFlow(ret, true, apSome(_), ap0, config) and
|
||||
fwdFlow(ret, any(CcCall ccc), apSome(ap), pragma[only_bind_into](ap0),
|
||||
revFlow(pragma[only_bind_into](ret), true, apSome(_), pragma[only_bind_into](ap0),
|
||||
pragma[only_bind_into](config)) and
|
||||
fwdFlow(ret, any(CcCall ccc), apSome(ap), ap0, config) and
|
||||
kind = ret.getKind() and
|
||||
p.getPosition() = pos and
|
||||
// we don't expect a parameter to return stored in itself
|
||||
|
||||
@@ -1393,9 +1393,9 @@ private module Stage2 {
|
||||
exists(RetNodeEx ret, Ap ap0, ReturnKindExt kind, int pos |
|
||||
parameterFlow(p, ap, ap0, c, config) and
|
||||
c = ret.getEnclosingCallable() and
|
||||
revFlow(ret, true, apSome(_), ap0, config) and
|
||||
fwdFlow(ret, any(CcCall ccc), apSome(ap), pragma[only_bind_into](ap0),
|
||||
revFlow(pragma[only_bind_into](ret), true, apSome(_), pragma[only_bind_into](ap0),
|
||||
pragma[only_bind_into](config)) and
|
||||
fwdFlow(ret, any(CcCall ccc), apSome(ap), ap0, config) and
|
||||
kind = ret.getKind() and
|
||||
p.getPosition() = pos and
|
||||
// we don't expect a parameter to return stored in itself
|
||||
@@ -2087,9 +2087,9 @@ private module Stage3 {
|
||||
exists(RetNodeEx ret, Ap ap0, ReturnKindExt kind, int pos |
|
||||
parameterFlow(p, ap, ap0, c, config) and
|
||||
c = ret.getEnclosingCallable() and
|
||||
revFlow(ret, true, apSome(_), ap0, config) and
|
||||
fwdFlow(ret, any(CcCall ccc), apSome(ap), pragma[only_bind_into](ap0),
|
||||
revFlow(pragma[only_bind_into](ret), true, apSome(_), pragma[only_bind_into](ap0),
|
||||
pragma[only_bind_into](config)) and
|
||||
fwdFlow(ret, any(CcCall ccc), apSome(ap), ap0, config) and
|
||||
kind = ret.getKind() and
|
||||
p.getPosition() = pos and
|
||||
// we don't expect a parameter to return stored in itself
|
||||
@@ -2850,9 +2850,9 @@ private module Stage4 {
|
||||
exists(RetNodeEx ret, Ap ap0, ReturnKindExt kind, int pos |
|
||||
parameterFlow(p, ap, ap0, c, config) and
|
||||
c = ret.getEnclosingCallable() and
|
||||
revFlow(ret, true, apSome(_), ap0, config) and
|
||||
fwdFlow(ret, any(CcCall ccc), apSome(ap), pragma[only_bind_into](ap0),
|
||||
revFlow(pragma[only_bind_into](ret), true, apSome(_), pragma[only_bind_into](ap0),
|
||||
pragma[only_bind_into](config)) and
|
||||
fwdFlow(ret, any(CcCall ccc), apSome(ap), ap0, config) and
|
||||
kind = ret.getKind() and
|
||||
p.getPosition() = pos and
|
||||
// we don't expect a parameter to return stored in itself
|
||||
|
||||
@@ -1,7 +1,8 @@
|
||||
private import python
|
||||
private import semmle.python.dataflow.new.DataFlow
|
||||
private import semmle.python.dataflow.new.internal.DataFlowPrivate
|
||||
private import semmle.python.dataflow.new.internal.DataFlowPrivate as DataFlowPrivate
|
||||
private import semmle.python.dataflow.new.internal.TaintTrackingPublic
|
||||
private import semmle.python.ApiGraphs
|
||||
|
||||
/**
|
||||
* Holds if `node` should be a sanitizer in all global taint flow configurations
|
||||
@@ -82,13 +83,13 @@ predicate subscriptStep(DataFlow::CfgNode nodeFrom, DataFlow::CfgNode nodeTo) {
|
||||
*/
|
||||
predicate stringManipulation(DataFlow::CfgNode nodeFrom, DataFlow::CfgNode nodeTo) {
|
||||
// transforming something tainted into a string will make the string tainted
|
||||
exists(CallNode call | call = nodeTo.getNode() |
|
||||
call.getFunction().(NameNode).getId() in ["str", "bytes", "unicode"] and
|
||||
exists(DataFlow::CallCfgNode call | call = nodeTo |
|
||||
(
|
||||
nodeFrom.getNode() = call.getArg(0)
|
||||
call = API::builtin(["str", "bytes", "unicode"]).getACall()
|
||||
or
|
||||
nodeFrom.getNode() = call.getArgByName("object")
|
||||
)
|
||||
call.getFunction().asCfgNode().(NameNode).getId() in ["str", "bytes", "unicode"]
|
||||
) and
|
||||
nodeFrom in [call.getArg(0), call.getArgByName("object")]
|
||||
)
|
||||
or
|
||||
// String methods. Note that this doesn't recognize `meth = "foo".upper; meth()`
|
||||
@@ -155,39 +156,37 @@ predicate stringManipulation(DataFlow::CfgNode nodeFrom, DataFlow::CfgNode nodeT
|
||||
predicate containerStep(DataFlow::CfgNode nodeFrom, DataFlow::Node nodeTo) {
|
||||
// construction by literal
|
||||
// TODO: Not limiting the content argument here feels like a BIG hack, but we currently get nothing for free :|
|
||||
storeStep(nodeFrom, _, nodeTo)
|
||||
DataFlowPrivate::storeStep(nodeFrom, _, nodeTo)
|
||||
or
|
||||
// constructor call
|
||||
exists(CallNode call | call = nodeTo.asCfgNode() |
|
||||
call.getFunction().(NameNode).getId() in [
|
||||
"list", "set", "frozenset", "dict", "defaultdict", "tuple"
|
||||
] and
|
||||
call.getArg(0) = nodeFrom.getNode()
|
||||
exists(DataFlow::CallCfgNode call | call = nodeTo |
|
||||
call = API::builtin(["list", "set", "frozenset", "dict", "tuple"]).getACall() and
|
||||
call.getArg(0) = nodeFrom
|
||||
// TODO: Properly handle defaultdict/namedtuple
|
||||
)
|
||||
or
|
||||
// functions operating on collections
|
||||
exists(CallNode call | call = nodeTo.asCfgNode() |
|
||||
call.getFunction().(NameNode).getId() in ["sorted", "reversed", "iter", "next"] and
|
||||
call.getArg(0) = nodeFrom.getNode()
|
||||
exists(DataFlow::CallCfgNode call | call = nodeTo |
|
||||
call = API::builtin(["sorted", "reversed", "iter", "next"]).getACall() and
|
||||
call.getArg(0) = nodeFrom
|
||||
)
|
||||
or
|
||||
// methods
|
||||
exists(CallNode call, string name | call = nodeTo.asCfgNode() |
|
||||
name in [
|
||||
exists(DataFlow::MethodCallNode call, string methodName | call = nodeTo |
|
||||
methodName in [
|
||||
// general
|
||||
"copy", "pop",
|
||||
// dict
|
||||
"values", "items", "get", "popitem"
|
||||
] and
|
||||
call.getFunction().(AttrNode).getObject(name) = nodeFrom.asCfgNode()
|
||||
call.calls(nodeFrom, methodName)
|
||||
)
|
||||
or
|
||||
// list.append, set.add
|
||||
exists(CallNode call, string name |
|
||||
name in ["append", "add"] and
|
||||
call.getFunction().(AttrNode).getObject(name) =
|
||||
nodeTo.(DataFlow::PostUpdateNode).getPreUpdateNode().asCfgNode() and
|
||||
call.getArg(0) = nodeFrom.getNode()
|
||||
exists(DataFlow::MethodCallNode call, DataFlow::Node obj |
|
||||
call.calls(obj, ["append", "add"]) and
|
||||
obj = nodeTo.(DataFlow::PostUpdateNode).getPreUpdateNode() and
|
||||
call.getArg(0) = nodeFrom
|
||||
)
|
||||
}
|
||||
|
||||
@@ -195,14 +194,9 @@ predicate containerStep(DataFlow::CfgNode nodeFrom, DataFlow::Node nodeTo) {
|
||||
* Holds if taint can flow from `nodeFrom` to `nodeTo` with a step related to copying.
|
||||
*/
|
||||
predicate copyStep(DataFlow::CfgNode nodeFrom, DataFlow::CfgNode nodeTo) {
|
||||
exists(CallNode call | call = nodeTo.getNode() |
|
||||
// Fully qualified: copy.copy, copy.deepcopy
|
||||
(
|
||||
call.getFunction().(NameNode).getId() in ["copy", "deepcopy"]
|
||||
or
|
||||
call.getFunction().(AttrNode).getObject(["copy", "deepcopy"]).(NameNode).getId() = "copy"
|
||||
) and
|
||||
call.getArg(0) = nodeFrom.getNode()
|
||||
exists(DataFlow::CallCfgNode call | call = nodeTo |
|
||||
call = API::moduleImport("copy").getMember(["copy", "deepcopy"]).getACall() and
|
||||
call.getArg(0) = nodeFrom
|
||||
)
|
||||
}
|
||||
|
||||
|
||||
@@ -51,7 +51,7 @@ module CleartextLogging {
|
||||
}
|
||||
|
||||
/** A piece of data printed, considered as a flow sink. */
|
||||
class PrintedDataAsSink extends Sink, DataFlow::CallCfgNode {
|
||||
class PrintedDataAsSink extends Sink {
|
||||
PrintedDataAsSink() {
|
||||
this = API::builtin("print").getACall().getArg(_)
|
||||
or
|
||||
|
||||
@@ -68,14 +68,14 @@ import ReDoSUtil
|
||||
* Holds if state `s` might be inside a backtracking repetition.
|
||||
*/
|
||||
pragma[noinline]
|
||||
predicate stateInsideBacktracking(State s) {
|
||||
private predicate stateInsideBacktracking(State s) {
|
||||
s.getRepr().getParent*() instanceof MaybeBacktrackingRepetition
|
||||
}
|
||||
|
||||
/**
|
||||
* A infinitely repeating quantifier that might backtrack.
|
||||
*/
|
||||
class MaybeBacktrackingRepetition extends InfiniteRepetitionQuantifier {
|
||||
private class MaybeBacktrackingRepetition extends InfiniteRepetitionQuantifier {
|
||||
MaybeBacktrackingRepetition() {
|
||||
exists(RegExpTerm child |
|
||||
child instanceof RegExpAlt or
|
||||
@@ -89,7 +89,7 @@ class MaybeBacktrackingRepetition extends InfiniteRepetitionQuantifier {
|
||||
/**
|
||||
* A state in the product automaton.
|
||||
*/
|
||||
newtype TStatePair =
|
||||
private newtype TStatePair =
|
||||
/**
|
||||
* We lazily only construct those states that we are actually
|
||||
* going to need: `(q, q)` for every fork state `q`, and any
|
||||
@@ -112,7 +112,7 @@ newtype TStatePair =
|
||||
* Gets a unique number for a `state`.
|
||||
* Is used to create an ordering of states, where states with the same `toString()` will be ordered differently.
|
||||
*/
|
||||
int rankState(State state) {
|
||||
private int rankState(State state) {
|
||||
state =
|
||||
rank[result](State s, Location l |
|
||||
l = s.getRepr().getLocation()
|
||||
@@ -124,7 +124,7 @@ int rankState(State state) {
|
||||
/**
|
||||
* A state in the product automaton.
|
||||
*/
|
||||
class StatePair extends TStatePair {
|
||||
private class StatePair extends TStatePair {
|
||||
State q1;
|
||||
State q2;
|
||||
|
||||
@@ -145,7 +145,7 @@ class StatePair extends TStatePair {
|
||||
*
|
||||
* Used in `statePairDist`
|
||||
*/
|
||||
predicate isStatePair(StatePair p) { any() }
|
||||
private predicate isStatePair(StatePair p) { any() }
|
||||
|
||||
/**
|
||||
* Holds if there are transitions from the components of `q` to the corresponding
|
||||
@@ -153,13 +153,13 @@ predicate isStatePair(StatePair p) { any() }
|
||||
*
|
||||
* Used in `statePairDist`
|
||||
*/
|
||||
predicate delta2(StatePair q, StatePair r) { step(q, _, _, r) }
|
||||
private predicate delta2(StatePair q, StatePair r) { step(q, _, _, r) }
|
||||
|
||||
/**
|
||||
* Gets the minimum length of a path from `q` to `r` in the
|
||||
* product automaton.
|
||||
*/
|
||||
int statePairDist(StatePair q, StatePair r) =
|
||||
private int statePairDist(StatePair q, StatePair r) =
|
||||
shortestDistances(isStatePair/1, delta2/2)(q, r, result)
|
||||
|
||||
/**
|
||||
@@ -172,7 +172,7 @@ int statePairDist(StatePair q, StatePair r) =
|
||||
* expression cannot be vulnerable to ReDoS attacks anyway).
|
||||
*/
|
||||
pragma[noopt]
|
||||
predicate isFork(State q, InputSymbol s1, InputSymbol s2, State r1, State r2) {
|
||||
private predicate isFork(State q, InputSymbol s1, InputSymbol s2, State r1, State r2) {
|
||||
stateInsideBacktracking(q) and
|
||||
exists(State q1, State q2 |
|
||||
q1 = epsilonSucc*(q) and
|
||||
@@ -222,7 +222,7 @@ predicate isFork(State q, InputSymbol s1, InputSymbol s2, State r1, State r2) {
|
||||
* Gets the state pair `(q1, q2)` or `(q2, q1)`; note that only
|
||||
* one or the other is defined.
|
||||
*/
|
||||
StatePair mkStatePair(State q1, State q2) {
|
||||
private StatePair mkStatePair(State q1, State q2) {
|
||||
result = MkStatePair(q1, q2) or result = MkStatePair(q2, q1)
|
||||
}
|
||||
|
||||
@@ -230,7 +230,7 @@ StatePair mkStatePair(State q1, State q2) {
|
||||
* Holds if there are transitions from the components of `q` to the corresponding
|
||||
* components of `r` labelled with `s1` and `s2`, respectively.
|
||||
*/
|
||||
predicate step(StatePair q, InputSymbol s1, InputSymbol s2, StatePair r) {
|
||||
private predicate step(StatePair q, InputSymbol s1, InputSymbol s2, StatePair r) {
|
||||
exists(State r1, State r2 | step(q, s1, s2, r1, r2) and r = mkStatePair(r1, r2))
|
||||
}
|
||||
|
||||
@@ -242,7 +242,7 @@ predicate step(StatePair q, InputSymbol s1, InputSymbol s2, StatePair r) {
|
||||
* inside a repetition that might backtrack.
|
||||
*/
|
||||
pragma[noopt]
|
||||
predicate step(StatePair q, InputSymbol s1, InputSymbol s2, State r1, State r2) {
|
||||
private predicate step(StatePair q, InputSymbol s1, InputSymbol s2, State r1, State r2) {
|
||||
exists(State q1, State q2 | q.getLeft() = q1 and q.getRight() = q2 |
|
||||
deltaClosed(q1, s1, r1) and
|
||||
deltaClosed(q2, s2, r2) and
|
||||
@@ -268,7 +268,7 @@ private newtype TTrace =
|
||||
* A list of pairs of input symbols that describe a path in the product automaton
|
||||
* starting from some fork state.
|
||||
*/
|
||||
class Trace extends TTrace {
|
||||
private class Trace extends TTrace {
|
||||
/** Gets a textual representation of this element. */
|
||||
string toString() {
|
||||
this = Nil() and result = "Nil()"
|
||||
@@ -282,7 +282,7 @@ class Trace extends TTrace {
|
||||
/**
|
||||
* Gets a string corresponding to the trace `t`.
|
||||
*/
|
||||
string concretise(Trace t) {
|
||||
private string concretise(Trace t) {
|
||||
t = Nil() and result = ""
|
||||
or
|
||||
exists(InputSymbol s1, InputSymbol s2, Trace rest | t = Step(s1, s2, rest) |
|
||||
@@ -294,7 +294,7 @@ string concretise(Trace t) {
|
||||
* Holds if `r` is reachable from `(fork, fork)` under input `w`, and there is
|
||||
* a path from `r` back to `(fork, fork)` with `rem` steps.
|
||||
*/
|
||||
predicate isReachableFromFork(State fork, StatePair r, Trace w, int rem) {
|
||||
private predicate isReachableFromFork(State fork, StatePair r, Trace w, int rem) {
|
||||
// base case
|
||||
exists(InputSymbol s1, InputSymbol s2, State q1, State q2 |
|
||||
isFork(fork, s1, s2, q1, q2) and
|
||||
@@ -316,7 +316,7 @@ predicate isReachableFromFork(State fork, StatePair r, Trace w, int rem) {
|
||||
* Gets a state in the product automaton from which `(fork, fork)` is
|
||||
* reachable in zero or more epsilon transitions.
|
||||
*/
|
||||
StatePair getAForkPair(State fork) {
|
||||
private StatePair getAForkPair(State fork) {
|
||||
isFork(fork, _, _, _, _) and
|
||||
result = MkStatePair(epsilonPred*(fork), epsilonPred*(fork))
|
||||
}
|
||||
@@ -324,7 +324,7 @@ StatePair getAForkPair(State fork) {
|
||||
/**
|
||||
* Holds if `fork` is a pumpable fork with word `w`.
|
||||
*/
|
||||
predicate isPumpable(State fork, string w) {
|
||||
private predicate isPumpable(State fork, string w) {
|
||||
exists(StatePair q, Trace t |
|
||||
isReachableFromFork(fork, q, t, _) and
|
||||
q = getAForkPair(fork) and
|
||||
|
||||
Reference in New Issue
Block a user