Merge branch 'main' into promote-xxe

Rasmus Wriedt Larsen
2022-04-20 13:42:02 +02:00
307 changed files with 6388 additions and 16068 deletions

View File

@@ -1,3 +1,26 @@
## 0.0.13
## 0.0.12
### Breaking Changes
* The flow state variants of `isBarrier` and `isAdditionalFlowStep` are no longer exposed in the taint tracking library. The `isSanitizer` and `isAdditionalTaintStep` predicates should be used instead.
### Deprecated APIs
* Many classes/predicates/modules that had upper-case acronyms have been renamed to follow our style-guide.
The old name still exists as a deprecated alias.
* Some modules that started with a lowercase letter have been renamed to follow our style-guide.
The old name still exists as a deprecated alias.
### New Features
* The data flow and taint tracking libraries have been extended with versions of `isBarrierIn`, `isBarrierOut`, and `isBarrierGuard`, respectively `isSanitizerIn`, `isSanitizerOut`, and `isSanitizerGuard`, that support flow states.
### Minor Analysis Improvements
* All deprecated predicates/classes/modules that have been deprecated for over a year have been deleted.
## 0.0.11
### Minor Analysis Improvements

View File

@@ -1,4 +0,0 @@
---
category: minorAnalysis
---
* All deprecated predicates/classes/modules that have been deprecated for over a year have been deleted.

View File

@@ -1,5 +0,0 @@
---
category: deprecated
---
* Many classes/predicates/modules that had upper-case acronyms have been renamed to follow our style-guide.
The old name still exists as a deprecated alias.

View File

@@ -1,5 +0,0 @@
---
category: deprecated
---
* Some modules that started with a lowercase letter have been renamed to follow our style-guide.
The old name still exists as a deprecated alias.

View File

@@ -1,4 +0,0 @@
---
category: feature
---
* The data flow and taint tracking libraries have been extended with versions of `isBarrierIn`, `isBarrierOut`, and `isBarrierGuard`, respectively `isSanitizerIn`, `isSanitizerOut`, and `isSanitizerGuard`, that support flow states.

View File

@@ -1,4 +0,0 @@
---
category: breaking
---
* The flow state variants of `isBarrier` and `isAdditionalFlowStep` are no longer exposed in the taint tracking library. The `isSanitizer` and `isAdditionalTaintStep` predicates should be used instead.

View File

@@ -0,0 +1,20 @@
## 0.0.12
### Breaking Changes
* The flow state variants of `isBarrier` and `isAdditionalFlowStep` are no longer exposed in the taint tracking library. The `isSanitizer` and `isAdditionalTaintStep` predicates should be used instead.
### Deprecated APIs
* Many classes/predicates/modules that had upper-case acronyms have been renamed to follow our style-guide.
The old name still exists as a deprecated alias.
* Some modules that started with a lowercase letter have been renamed to follow our style-guide.
The old name still exists as a deprecated alias.
### New Features
* The data flow and taint tracking libraries have been extended with versions of `isBarrierIn`, `isBarrierOut`, and `isBarrierGuard`, respectively `isSanitizerIn`, `isSanitizerOut`, and `isSanitizerGuard`, that support flow states.
### Minor Analysis Improvements
* All deprecated predicates/classes/modules that have been deprecated for over a year have been deleted.

View File

@@ -0,0 +1 @@
## 0.0.13

View File

@@ -1,2 +1,2 @@
---
lastReleaseVersion: 0.0.11
lastReleaseVersion: 0.0.13

View File

@@ -1,5 +1,5 @@
name: codeql/python-all
version: 0.0.12-dev
version: 0.1.0-dev
groups: python
dbscheme: semmlecode.python.dbscheme
extractor: python

View File

@@ -0,0 +1,558 @@
/**
* INTERNAL: Do not use.
*
* Points-to based call-graph.
*/
private import python
private import DataFlowPublic
private import semmle.python.SpecialMethods
/** A parameter position represented by an integer. */
class ParameterPosition extends int {
ParameterPosition() { exists(any(DataFlowCallable c).getParameter(this)) }
}
/** An argument position represented by an integer. */
class ArgumentPosition extends int {
ArgumentPosition() { exists(any(DataFlowCall c).getArg(this)) }
}
/** Holds if arguments at position `apos` match parameters at position `ppos`. */
pragma[inline]
predicate parameterMatch(ParameterPosition ppos, ArgumentPosition apos) { ppos = apos }
/**
* Computes routing of arguments to parameters.
*
* When a call contains more positional arguments than there are positional parameters,
* the extra positional arguments are passed as a tuple to a starred parameter. This is
* achieved by synthesizing a node `TPosOverflowNode(call, callable)`
* that represents the tuple of extra positional arguments. There is a store step from each
* extra positional argument to this node.
*
* CURRENTLY NOT SUPPORTED:
* When a call contains an iterable unpacking argument, such as `func(*args)`, it is expanded into positional arguments.
*
* CURRENTLY NOT SUPPORTED:
* If a call contains an iterable unpacking argument, such as `func(*args)`, and the callee contains a starred parameter, any extra
* positional arguments are passed to the starred parameter.
*
* When a call contains keyword arguments that do not correspond to keyword parameters, these
* extra keyword arguments are passed as a dictionary to a doubly starred parameter. This is
* achieved by synthesizing a node `TKwOverflowNode(call, callable)`
* that represents the dictionary of extra keyword arguments. There is a store step from each
* extra keyword argument to this node.
*
* When a call contains a dictionary unpacking argument, such as `func(**kwargs)`, with entries corresponding to a keyword parameter,
* the value at such a key is unpacked and passed to the parameter. This is achieved
* by synthesizing an argument node `TKwUnpackedNode(call, callable, name)` representing the unpacked
* value. This node is used as the argument passed to the matching keyword parameter. There is a read
* step from the dictionary argument to the synthesized argument node.
*
* When a call contains a dictionary unpacking argument, such as `func(**kwargs)`, and the callee contains a doubly starred parameter,
* entries which are not unpacked are passed to the doubly starred parameter. This is achieved by
* adding a dataflow step from the dictionary argument to `TKwOverflowNode(call, callable)` and a
* step to clear content of that node at any unpacked keys.
*
* ## Examples:
* Assume that we have the callable
* ```python
* def f(x, y, *t, **d):
* pass
* ```
* Then the call
* ```python
* f(0, 1, 2, a=3)
* ```
* will be modeled as
* ```python
* f(0, 1, [*t], [**d])
* ```
* where `[` and `]` denote synthesized nodes, so `[*t]` is the synthesized tuple argument
* `TPosOverflowNode` and `[**d]` is the synthesized dictionary argument `TKwOverflowNode`.
* There will be a store step from `2` to `[*t]` at pos `0` and one from `3` to `[**d]` at key
* `a`.
*
* For the call
* ```python
* f(0, **{"y": 1, "a": 3})
* ```
* no tuple argument is synthesized. It is modeled as
* ```python
* f(0, [y=1], [**d])
* ```
* where `[y=1]` is the synthesized unpacked argument `TKwUnpackedNode` (with `name` = `y`). There is
* a read step from `**{"y": 1, "a": 3}` to `[y=1]` at key `y` to get the value passed to the parameter
* `y`. There is a dataflow step from `**{"y": 1, "a": 3}` to `[**d]` to transfer the content and
* a clearing of content at key `y` for node `[**d]`, since that value has been unpacked.
*/
module ArgumentPassing {
/**
* Holds if `call` represents a `DataFlowCall` to a `DataFlowCallable` represented by `callable`.
*
* It _may not_ be the case that `call = callable.getACall()`, for example when `call` represents a `ClassCall`.
*
* Used to limit the size of predicates.
*/
predicate connects(CallNode call, CallableValue callable) {
exists(DataFlowCall c |
call = c.getNode() and
callable = c.getCallable().getCallableValue()
)
}
/**
* Gets the `n`th parameter of `callable`.
* If the callable has a starred parameter, say `*tuple`, that is matched with `n=-1`.
* If the callable has a doubly starred parameter, say `**dict`, that is matched with `n=-2`.
* Note that, unlike other languages, we do _not_ use -1 for the position of `self` in Python,
* as it is an explicit parameter at position 0.
*/
NameNode getParameter(CallableValue callable, int n) {
// positional parameter
result = callable.getParameter(n)
or
// starred parameter, `*tuple`
exists(Function f |
f = callable.getScope() and
n = -1 and
result = f.getVararg().getAFlowNode()
)
or
// doubly starred parameter, `**dict`
exists(Function f |
f = callable.getScope() and
n = -2 and
result = f.getKwarg().getAFlowNode()
)
}
/**
* A type representing a mapping from argument indices to parameter indices.
* We currently use two mappings: NoShift, the identity, used for ordinary
* function calls, and ShiftOneUp which is used for calls where an extra argument
* is inserted. These include method calls, constructor calls and class calls.
* In these calls, the argument at index `n` is mapped to the parameter at position `n+1`.
*/
newtype TArgParamMapping =
TNoShift() or
TShiftOneUp()
/** A mapping used for parameter passing. */
abstract class ArgParamMapping extends TArgParamMapping {
/** Gets the index of the parameter that corresponds to the argument at index `argN`. */
bindingset[argN]
abstract int getParamN(int argN);
/** Gets a textual representation of this element. */
abstract string toString();
}
/** A mapping that passes argument `n` to parameter `n`. */
class NoShift extends ArgParamMapping, TNoShift {
NoShift() { this = TNoShift() }
override string toString() { result = "NoShift [n -> n]" }
bindingset[argN]
override int getParamN(int argN) { result = argN }
}
/** A mapping that passes argument `n` to parameter `n+1`. */
class ShiftOneUp extends ArgParamMapping, TShiftOneUp {
ShiftOneUp() { this = TShiftOneUp() }
override string toString() { result = "ShiftOneUp [n -> n+1]" }
bindingset[argN]
override int getParamN(int argN) { result = argN + 1 }
}
/**
* Gets the node representing the argument to `call` that is passed to the parameter at
* (zero-based) index `paramN` in `callable`. If this is a positional argument, it must appear
* at an index, `argN`, in `call` which satisfies `paramN = mapping.getParamN(argN)`.
*
* `mapping` will be the identity for function calls, but not for method or constructor calls,
* where the first parameter is `self` and the first positional argument is passed to the second positional parameter.
* Similarly for classmethod calls, where the first parameter is `cls`.
*
* NOT SUPPORTED: Keyword-only parameters.
*/
Node getArg(CallNode call, ArgParamMapping mapping, CallableValue callable, int paramN) {
connects(call, callable) and
(
// positional argument
exists(int argN |
paramN = mapping.getParamN(argN) and
result = TCfgNode(call.getArg(argN))
)
or
// keyword argument
// TODO: Since `getArgName` has no results for keyword-only parameters,
// these are currently not supported.
exists(Function f, string argName |
f = callable.getScope() and
f.getArgName(paramN) = argName and
result = TCfgNode(call.getArgByName(unbind_string(argName)))
)
or
// a synthesized argument passed to the starred parameter (at position -1)
callable.getScope().hasVarArg() and
paramN = -1 and
result = TPosOverflowNode(call, callable)
or
// a synthesized argument passed to the doubly starred parameter (at position -2)
callable.getScope().hasKwArg() and
paramN = -2 and
result = TKwOverflowNode(call, callable)
or
// argument unpacked from dict
exists(string name |
call_unpacks(call, mapping, callable, name, paramN) and
result = TKwUnpackedNode(call, callable, name)
)
)
}
/** Currently required in `getArg` in order to prevent a bad join. */
bindingset[result, s]
private string unbind_string(string s) { result <= s and s <= result }
/** Gets the control flow node that is passed as the `n`th overflow positional argument. */
ControlFlowNode getPositionalOverflowArg(CallNode call, CallableValue callable, int n) {
connects(call, callable) and
exists(Function f, int posCount, int argNr |
f = callable.getScope() and
f.hasVarArg() and
posCount = f.getPositionalParameterCount() and
result = call.getArg(argNr) and
argNr >= posCount and
argNr = posCount + n
)
}
/** Gets the control flow node that is passed as the overflow keyword argument with key `key`. */
ControlFlowNode getKeywordOverflowArg(CallNode call, CallableValue callable, string key) {
connects(call, callable) and
exists(Function f |
f = callable.getScope() and
f.hasKwArg() and
not exists(f.getArgByName(key)) and
result = call.getArgByName(key)
)
}
/**
* Holds if `call` unpacks a dictionary argument in order to pass it via `name`.
* It will then be passed to the parameter of `callable` at index `paramN`.
*/
predicate call_unpacks(
CallNode call, ArgParamMapping mapping, CallableValue callable, string name, int paramN
) {
connects(call, callable) and
exists(Function f |
f = callable.getScope() and
not exists(int argN | paramN = mapping.getParamN(argN) | exists(call.getArg(argN))) and // no positional argument available
name = f.getArgName(paramN) and
// not exists(call.getArgByName(name)) and // only matches keyword arguments not preceded by **
// TODO: make the below logic respect control flow splitting (by not going to the AST).
not call.getNode().getANamedArg().(Keyword).getArg() = name and // no keyword argument available
paramN >= 0 and
paramN < f.getPositionalParameterCount() + f.getKeywordOnlyParameterCount() and
exists(call.getNode().getKwargs()) // dict argument available
)
}
}
import ArgumentPassing
/**
* IPA type for DataFlowCallable.
*
* A callable is either a function value, a class value, or a module (for enclosing `ModuleVariableNode`s).
* A module has no calls.
*/
newtype TDataFlowCallable =
TCallableValue(CallableValue callable) {
callable instanceof FunctionValue and
not callable.(FunctionValue).isLambda()
or
callable instanceof ClassValue
} or
TLambda(Function lambda) { lambda.isLambda() } or
TModule(Module m)
/** A callable. */
abstract class DataFlowCallable extends TDataFlowCallable {
/** Gets a textual representation of this element. */
abstract string toString();
/** Gets a call to this callable. */
abstract CallNode getACall();
/** Gets the scope of this callable. */
abstract Scope getScope();
/** Gets the specified parameter of this callable. */
abstract NameNode getParameter(int n);
/** Gets the name of this callable. */
abstract string getName();
/** Gets a callable value for this callable, if one exists. */
abstract CallableValue getCallableValue();
}
/** A class representing a callable value. */
class DataFlowCallableValue extends DataFlowCallable, TCallableValue {
CallableValue callable;
DataFlowCallableValue() { this = TCallableValue(callable) }
override string toString() { result = callable.toString() }
override CallNode getACall() { result = callable.getACall() }
override Scope getScope() { result = callable.getScope() }
override NameNode getParameter(int n) { result = getParameter(callable, n) }
override string getName() { result = callable.getName() }
override CallableValue getCallableValue() { result = callable }
}
/** A class representing a callable lambda. */
class DataFlowLambda extends DataFlowCallable, TLambda {
Function lambda;
DataFlowLambda() { this = TLambda(lambda) }
override string toString() { result = lambda.toString() }
override CallNode getACall() { result = this.getCallableValue().getACall() }
override Scope getScope() { result = lambda.getEvaluatingScope() }
override NameNode getParameter(int n) { result = getParameter(this.getCallableValue(), n) }
override string getName() { result = "Lambda callable" }
override FunctionValue getCallableValue() {
result.getOrigin().getNode() = lambda.getDefinition()
}
}
/** A class representing the scope in which a `ModuleVariableNode` appears. */
class DataFlowModuleScope extends DataFlowCallable, TModule {
Module mod;
DataFlowModuleScope() { this = TModule(mod) }
override string toString() { result = mod.toString() }
override CallNode getACall() { none() }
override Scope getScope() { result = mod }
override NameNode getParameter(int n) { none() }
override string getName() { result = mod.getName() }
override CallableValue getCallableValue() { none() }
}
/**
* IPA type for DataFlowCall.
*
* Calls corresponding to `CallNode`s are either to callable values or to classes.
* The latter is directed to the callable corresponding to the `__init__` method of the class.
*
* An `__init__` method can also be called directly, so that the callable can be targeted by
* different types of calls. In that case, the parameter mappings will be different,
* as the class call will synthesize an argument node to be mapped to the `self` parameter.
*
* A call corresponding to a special method call is handled by the corresponding `SpecialMethodCallNode`.
*
* TODO: Add `TClassMethodCall` mapping `cls` appropriately.
*/
newtype TDataFlowCall =
TFunctionCall(CallNode call) { call = any(FunctionValue f).getAFunctionCall() } or
/** Bound methods need to make room for the explicit self parameter */
TMethodCall(CallNode call) { call = any(FunctionValue f).getAMethodCall() } or
TClassCall(CallNode call) { call = any(ClassValue c | not c.isAbsent()).getACall() } or
TSpecialCall(SpecialMethodCallNode special)
/** A call. */
abstract class DataFlowCall extends TDataFlowCall {
/** Gets a textual representation of this element. */
abstract string toString();
/** Gets the callable to which this call goes. */
abstract DataFlowCallable getCallable();
/**
* Gets the argument to this call that will be sent
* to the `n`th parameter of the callable.
*/
abstract Node getArg(int n);
/** Gets the control flow node representing this call. */
abstract ControlFlowNode getNode();
/** Gets the enclosing callable of this call. */
abstract DataFlowCallable getEnclosingCallable();
/** Gets the location of this dataflow call. */
Location getLocation() { result = this.getNode().getLocation() }
}
/**
* A call to a function/lambda.
* This excludes calls to bound methods, classes, and special methods.
* Bound method calls and class calls insert an argument for the explicit
* `self` parameter, and special method calls have special argument passing.
*/
class FunctionCall extends DataFlowCall, TFunctionCall {
CallNode call;
DataFlowCallable callable;
FunctionCall() {
this = TFunctionCall(call) and
call = callable.getACall()
}
override string toString() { result = call.toString() }
override Node getArg(int n) { result = getArg(call, TNoShift(), callable.getCallableValue(), n) }
override ControlFlowNode getNode() { result = call }
override DataFlowCallable getCallable() { result = callable }
override DataFlowCallable getEnclosingCallable() { result.getScope() = call.getNode().getScope() }
}
/**
* Represents a call to a bound method.
* The node representing the instance is inserted as argument to the `self` parameter.
*/
class MethodCall extends DataFlowCall, TMethodCall {
CallNode call;
FunctionValue bm;
MethodCall() {
this = TMethodCall(call) and
call = bm.getACall()
}
private CallableValue getCallableValue() { result = bm }
override string toString() { result = call.toString() }
override Node getArg(int n) {
n > 0 and result = getArg(call, TShiftOneUp(), this.getCallableValue(), n)
or
n = 0 and result = TCfgNode(call.getFunction().(AttrNode).getObject())
}
override ControlFlowNode getNode() { result = call }
override DataFlowCallable getCallable() { result = TCallableValue(this.getCallableValue()) }
override DataFlowCallable getEnclosingCallable() { result.getScope() = call.getScope() }
}
/**
* Represents a call to a class.
* The pre-update node for the call is inserted as argument to the `self` parameter.
* That makes the call node be the post-update node holding the value of the object
* after the constructor has run.
*/
class ClassCall extends DataFlowCall, TClassCall {
CallNode call;
ClassValue c;
ClassCall() {
this = TClassCall(call) and
call = c.getACall()
}
private CallableValue getCallableValue() { c.getScope().getInitMethod() = result.getScope() }
override string toString() { result = call.toString() }
override Node getArg(int n) {
n > 0 and result = getArg(call, TShiftOneUp(), this.getCallableValue(), n)
or
n = 0 and result = TSyntheticPreUpdateNode(TCfgNode(call))
}
override ControlFlowNode getNode() { result = call }
override DataFlowCallable getCallable() { result = TCallableValue(this.getCallableValue()) }
override DataFlowCallable getEnclosingCallable() { result.getScope() = call.getScope() }
}
/** A call to a special method. */
class SpecialCall extends DataFlowCall, TSpecialCall {
SpecialMethodCallNode special;
SpecialCall() { this = TSpecialCall(special) }
override string toString() { result = special.toString() }
override Node getArg(int n) { result = TCfgNode(special.(SpecialMethod::Potential).getArg(n)) }
override ControlFlowNode getNode() { result = special }
override DataFlowCallable getCallable() {
result = TCallableValue(special.getResolvedSpecialMethod())
}
override DataFlowCallable getEnclosingCallable() {
result.getScope() = special.getNode().getScope()
}
}
/** Gets a viable run-time target for the call `call`. */
DataFlowCallable viableCallable(DataFlowCall call) { result = call.getCallable() }
private newtype TReturnKind = TNormalReturnKind()
/**
* A return kind. A return kind describes how a value can be returned
* from a callable. For Python, this is simply a method return.
*/
class ReturnKind extends TReturnKind {
/** Gets a textual representation of this element. */
string toString() { result = "return" }
}
/** A data flow node that represents a value returned by a callable. */
class ReturnNode extends CfgNode {
Return ret;
// See `TaintTrackingImplementation::returnFlowStep`
ReturnNode() { node = ret.getValue().getAFlowNode() }
/** Gets the kind of this return node. */
ReturnKind getKind() { any() }
}
/** A data flow node that represents the output of a call. */
class OutNode extends CfgNode {
OutNode() { node instanceof CallNode }
}
/**
* Gets a node that can read the value returned from `call` with return kind
* `kind`.
*/
OutNode getAnOutNode(DataFlowCall call, ReturnKind kind) {
call.getNode() = result.getNode() and
kind = TNormalReturnKind()
}
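
The argument routing documented for `ArgumentPassing` mirrors what Python itself does at run time. The following plain-Python sketch is not part of the library. It reuses the callable `f` from the doc comment, changed to return its bindings (an assumption made purely for illustration), and only shows which values end up in `*t` and `**d`. It says nothing about how the synthesized QL nodes are implemented.

```python
def f(x, y, *t, **d):
    """Return the bound values so the routing can be inspected."""
    return x, y, t, d


# Extra positional and keyword arguments overflow into `*t` and `**d`,
# which is the situation modeled by the two overflow nodes.
overflow = f(0, 1, 2, a=3)
print(overflow)  # (0, 1, (2,), {'a': 3})

# With a dictionary unpacking argument, the entry at key "y" is bound to
# the parameter `y`; the remaining entries overflow into `**d`.
unpacked = f(0, **{"y": 1, "a": 3})
print(unpacked)  # (0, 1, (), {'a': 3})
```

In the second call no positional overflow occurs, so `t` is the empty tuple, and key `y` no longer appears in `d`. This matches the "clearing of content at key `y`" described for `[**d]`.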

View File

@@ -1158,8 +1158,8 @@ private module Stage2 {
if reducedViableImplInReturn(c, call) then result = TReturn(c, call) else result = ccNone()
}
bindingset[node, cc, config]
private LocalCc getLocalCc(NodeEx node, Cc cc, Configuration config) { any() }
bindingset[node, cc]
private LocalCc getLocalCc(NodeEx node, Cc cc) { any() }
bindingset[node1, state1, config]
bindingset[node2, state2, config]
@@ -1246,7 +1246,7 @@ private module Stage2 {
or
exists(NodeEx mid, FlowState state0, Ap ap0, LocalCc localCc |
fwdFlow(mid, state0, cc, argAp, ap0, config) and
localCc = getLocalCc(mid, cc, config)
localCc = getLocalCc(mid, cc)
|
localStep(mid, state0, node, state, true, _, config, localCc) and
ap = ap0
@@ -1951,8 +1951,8 @@ private module Stage3 {
bindingset[call, c, innercc]
private CcNoCall getCallContextReturn(DataFlowCallable c, DataFlowCall call, Cc innercc) { any() }
bindingset[node, cc, config]
private LocalCc getLocalCc(NodeEx node, Cc cc, Configuration config) { any() }
bindingset[node, cc]
private LocalCc getLocalCc(NodeEx node, Cc cc) { any() }
private predicate localStep(
NodeEx node1, FlowState state1, NodeEx node2, FlowState state2, boolean preservesValue,
@@ -2035,7 +2035,7 @@ private module Stage3 {
or
exists(NodeEx mid, FlowState state0, Ap ap0, LocalCc localCc |
fwdFlow(mid, state0, cc, argAp, ap0, config) and
localCc = getLocalCc(mid, cc, config)
localCc = getLocalCc(mid, cc)
|
localStep(mid, state0, node, state, true, _, config, localCc) and
ap = ap0
@@ -2765,12 +2765,11 @@ private module Stage4 {
if reducedViableImplInReturn(c, call) then result = TReturn(c, call) else result = ccNone()
}
bindingset[node, cc, config]
private LocalCc getLocalCc(NodeEx node, Cc cc, Configuration config) {
bindingset[node, cc]
private LocalCc getLocalCc(NodeEx node, Cc cc) {
result =
getLocalCallContext(pragma[only_bind_into](pragma[only_bind_out](cc)),
node.getEnclosingCallable()) and
exists(config)
node.getEnclosingCallable())
}
private predicate localStep(
@@ -2863,7 +2862,7 @@ private module Stage4 {
or
exists(NodeEx mid, FlowState state0, Ap ap0, LocalCc localCc |
fwdFlow(mid, state0, cc, argAp, ap0, config) and
localCc = getLocalCc(mid, cc, config)
localCc = getLocalCc(mid, cc)
|
localStep(mid, state0, node, state, true, _, config, localCc) and
ap = ap0
@@ -5048,6 +5047,7 @@ private module FlowExploration {
)
}
pragma[nomagic]
private predicate revPartialPathStep(
PartialPathNodeRev mid, NodeEx node, FlowState state, TRevSummaryCtx1 sc1, TRevSummaryCtx2 sc2,
TRevSummaryCtx3 sc3, RevPartialAccessPath ap, Configuration config

View File

@@ -1158,8 +1158,8 @@ private module Stage2 {
if reducedViableImplInReturn(c, call) then result = TReturn(c, call) else result = ccNone()
}
bindingset[node, cc, config]
private LocalCc getLocalCc(NodeEx node, Cc cc, Configuration config) { any() }
bindingset[node, cc]
private LocalCc getLocalCc(NodeEx node, Cc cc) { any() }
bindingset[node1, state1, config]
bindingset[node2, state2, config]
@@ -1246,7 +1246,7 @@ private module Stage2 {
or
exists(NodeEx mid, FlowState state0, Ap ap0, LocalCc localCc |
fwdFlow(mid, state0, cc, argAp, ap0, config) and
localCc = getLocalCc(mid, cc, config)
localCc = getLocalCc(mid, cc)
|
localStep(mid, state0, node, state, true, _, config, localCc) and
ap = ap0
@@ -1951,8 +1951,8 @@ private module Stage3 {
bindingset[call, c, innercc]
private CcNoCall getCallContextReturn(DataFlowCallable c, DataFlowCall call, Cc innercc) { any() }
bindingset[node, cc, config]
private LocalCc getLocalCc(NodeEx node, Cc cc, Configuration config) { any() }
bindingset[node, cc]
private LocalCc getLocalCc(NodeEx node, Cc cc) { any() }
private predicate localStep(
NodeEx node1, FlowState state1, NodeEx node2, FlowState state2, boolean preservesValue,
@@ -2035,7 +2035,7 @@ private module Stage3 {
or
exists(NodeEx mid, FlowState state0, Ap ap0, LocalCc localCc |
fwdFlow(mid, state0, cc, argAp, ap0, config) and
localCc = getLocalCc(mid, cc, config)
localCc = getLocalCc(mid, cc)
|
localStep(mid, state0, node, state, true, _, config, localCc) and
ap = ap0
@@ -2765,12 +2765,11 @@ private module Stage4 {
if reducedViableImplInReturn(c, call) then result = TReturn(c, call) else result = ccNone()
}
bindingset[node, cc, config]
private LocalCc getLocalCc(NodeEx node, Cc cc, Configuration config) {
bindingset[node, cc]
private LocalCc getLocalCc(NodeEx node, Cc cc) {
result =
getLocalCallContext(pragma[only_bind_into](pragma[only_bind_out](cc)),
node.getEnclosingCallable()) and
exists(config)
node.getEnclosingCallable())
}
private predicate localStep(
@@ -2863,7 +2862,7 @@ private module Stage4 {
or
exists(NodeEx mid, FlowState state0, Ap ap0, LocalCc localCc |
fwdFlow(mid, state0, cc, argAp, ap0, config) and
localCc = getLocalCc(mid, cc, config)
localCc = getLocalCc(mid, cc)
|
localStep(mid, state0, node, state, true, _, config, localCc) and
ap = ap0
@@ -5048,6 +5047,7 @@ private module FlowExploration {
)
}
pragma[nomagic]
private predicate revPartialPathStep(
PartialPathNodeRev mid, NodeEx node, FlowState state, TRevSummaryCtx1 sc1, TRevSummaryCtx2 sc2,
TRevSummaryCtx3 sc3, RevPartialAccessPath ap, Configuration config

View File

@@ -1158,8 +1158,8 @@ private module Stage2 {
if reducedViableImplInReturn(c, call) then result = TReturn(c, call) else result = ccNone()
}
bindingset[node, cc, config]
private LocalCc getLocalCc(NodeEx node, Cc cc, Configuration config) { any() }
bindingset[node, cc]
private LocalCc getLocalCc(NodeEx node, Cc cc) { any() }
bindingset[node1, state1, config]
bindingset[node2, state2, config]
@@ -1246,7 +1246,7 @@ private module Stage2 {
or
exists(NodeEx mid, FlowState state0, Ap ap0, LocalCc localCc |
fwdFlow(mid, state0, cc, argAp, ap0, config) and
localCc = getLocalCc(mid, cc, config)
localCc = getLocalCc(mid, cc)
|
localStep(mid, state0, node, state, true, _, config, localCc) and
ap = ap0
@@ -1951,8 +1951,8 @@ private module Stage3 {
bindingset[call, c, innercc]
private CcNoCall getCallContextReturn(DataFlowCallable c, DataFlowCall call, Cc innercc) { any() }
bindingset[node, cc, config]
private LocalCc getLocalCc(NodeEx node, Cc cc, Configuration config) { any() }
bindingset[node, cc]
private LocalCc getLocalCc(NodeEx node, Cc cc) { any() }
private predicate localStep(
NodeEx node1, FlowState state1, NodeEx node2, FlowState state2, boolean preservesValue,
@@ -2035,7 +2035,7 @@ private module Stage3 {
or
exists(NodeEx mid, FlowState state0, Ap ap0, LocalCc localCc |
fwdFlow(mid, state0, cc, argAp, ap0, config) and
localCc = getLocalCc(mid, cc, config)
localCc = getLocalCc(mid, cc)
|
localStep(mid, state0, node, state, true, _, config, localCc) and
ap = ap0
@@ -2765,12 +2765,11 @@ private module Stage4 {
if reducedViableImplInReturn(c, call) then result = TReturn(c, call) else result = ccNone()
}
bindingset[node, cc, config]
private LocalCc getLocalCc(NodeEx node, Cc cc, Configuration config) {
bindingset[node, cc]
private LocalCc getLocalCc(NodeEx node, Cc cc) {
result =
getLocalCallContext(pragma[only_bind_into](pragma[only_bind_out](cc)),
node.getEnclosingCallable()) and
exists(config)
node.getEnclosingCallable())
}
private predicate localStep(
@@ -2863,7 +2862,7 @@ private module Stage4 {
or
exists(NodeEx mid, FlowState state0, Ap ap0, LocalCc localCc |
fwdFlow(mid, state0, cc, argAp, ap0, config) and
localCc = getLocalCc(mid, cc, config)
localCc = getLocalCc(mid, cc)
|
localStep(mid, state0, node, state, true, _, config, localCc) and
ap = ap0
@@ -5048,6 +5047,7 @@ private module FlowExploration {
)
}
pragma[nomagic]
private predicate revPartialPathStep(
PartialPathNodeRev mid, NodeEx node, FlowState state, TRevSummaryCtx1 sc1, TRevSummaryCtx2 sc2,
TRevSummaryCtx3 sc3, RevPartialAccessPath ap, Configuration config

View File

@@ -1158,8 +1158,8 @@ private module Stage2 {
if reducedViableImplInReturn(c, call) then result = TReturn(c, call) else result = ccNone()
}
bindingset[node, cc, config]
private LocalCc getLocalCc(NodeEx node, Cc cc, Configuration config) { any() }
bindingset[node, cc]
private LocalCc getLocalCc(NodeEx node, Cc cc) { any() }
bindingset[node1, state1, config]
bindingset[node2, state2, config]
@@ -1246,7 +1246,7 @@ private module Stage2 {
or
exists(NodeEx mid, FlowState state0, Ap ap0, LocalCc localCc |
fwdFlow(mid, state0, cc, argAp, ap0, config) and
localCc = getLocalCc(mid, cc, config)
localCc = getLocalCc(mid, cc)
|
localStep(mid, state0, node, state, true, _, config, localCc) and
ap = ap0
@@ -1951,8 +1951,8 @@ private module Stage3 {
bindingset[call, c, innercc]
private CcNoCall getCallContextReturn(DataFlowCallable c, DataFlowCall call, Cc innercc) { any() }
bindingset[node, cc, config]
private LocalCc getLocalCc(NodeEx node, Cc cc, Configuration config) { any() }
bindingset[node, cc]
private LocalCc getLocalCc(NodeEx node, Cc cc) { any() }
private predicate localStep(
NodeEx node1, FlowState state1, NodeEx node2, FlowState state2, boolean preservesValue,
@@ -2035,7 +2035,7 @@ private module Stage3 {
or
exists(NodeEx mid, FlowState state0, Ap ap0, LocalCc localCc |
fwdFlow(mid, state0, cc, argAp, ap0, config) and
localCc = getLocalCc(mid, cc, config)
localCc = getLocalCc(mid, cc)
|
localStep(mid, state0, node, state, true, _, config, localCc) and
ap = ap0
@@ -2765,12 +2765,11 @@ private module Stage4 {
if reducedViableImplInReturn(c, call) then result = TReturn(c, call) else result = ccNone()
}
bindingset[node, cc, config]
private LocalCc getLocalCc(NodeEx node, Cc cc, Configuration config) {
bindingset[node, cc]
private LocalCc getLocalCc(NodeEx node, Cc cc) {
result =
getLocalCallContext(pragma[only_bind_into](pragma[only_bind_out](cc)),
node.getEnclosingCallable()) and
exists(config)
node.getEnclosingCallable())
}
private predicate localStep(
@@ -2863,7 +2862,7 @@ private module Stage4 {
or
exists(NodeEx mid, FlowState state0, Ap ap0, LocalCc localCc |
fwdFlow(mid, state0, cc, argAp, ap0, config) and
localCc = getLocalCc(mid, cc, config)
localCc = getLocalCc(mid, cc)
|
localStep(mid, state0, node, state, true, _, config, localCc) and
ap = ap0
@@ -5048,6 +5047,7 @@ private module FlowExploration {
)
}
pragma[nomagic]
private predicate revPartialPathStep(
PartialPathNodeRev mid, NodeEx node, FlowState state, TRevSummaryCtx1 sc1, TRevSummaryCtx2 sc2,
TRevSummaryCtx3 sc3, RevPartialAccessPath ap, Configuration config

View File

@@ -0,0 +1,396 @@
/**
* The unpacking assignment takes the general form
* ```python
* sequence = iterable
* ```
* where `sequence` is either a tuple or a list and it can contain wildcards.
* The iterable can be any iterable, which means that (CodeQL modeling of) content
* will need to change type if it should be transferred from the RHS to the LHS.
*
* Note that (CodeQL modeling of) content does not have to change type on data-flow
* paths _inside_ the LHS, as the different allowed syntaxes here are merely a convenience.
* Consequently, we model all LHS sequences as tuples, which have the more precise content
* model, making flow to the elements more precise. If an element is a starred variable,
* we will have to mutate the content type to be list content.
*
* We may for instance have
* ```python
* (a, b) = ["a", SOURCE] # RHS has content `ListElementContent`
* ```
* Due to the abstraction for list content, we do not know whether `SOURCE`
* ends up in `a` or in `b`, so we want to overapproximate and see it in both.
*
* Using wildcards we may have
* ```python
* (a, *b) = ("a", "b", SOURCE) # RHS has content `TupleElementContent(2)`
* ```
* Since the starred variables are always assigned (Python-)type list, `*b` will be
* `["b", SOURCE]`, and we will again overapproximate and assign it
* content corresponding to anything found in the RHS.
*
* For a precise transfer
* ```python
* (a, b) = ("a", SOURCE) # RHS has content `TupleElementContent(1)`
* ```
* we wish to keep the precision, so only `b` receives the tuple content at index 1.
*
* Finally, `sequence` is actually a pattern and can have a more complicated structure,
* such as
* ```python
* (a, [b, *c]) = ("a", ["b", SOURCE]) # RHS has content `TupleElementContent(1); ListElementContent`
* ```
* where `a` should not receive content, but `b` and `c` should. `c` will be `[SOURCE]` so
* should have the content transferred, while `b` should read it.
*
* To transfer content from RHS to the elements of the LHS in the expression `sequence = iterable`,
* we use two synthetic nodes:
*
* - `TIterableSequence(sequence)` which captures the content-modeling the entire `sequence` will have
* (essentially just a copy of the content-modeling the RHS has)
*
* - `TIterableElement(sequence)` which captures the content-modeling that will be assigned to an element.
* Note that an empty access path means that the value we are tracking flows directly to the element.
*
*
* The `TIterableSequence(sequence)` is at this point superfluous but becomes useful when handling recursive
* structures in the LHS, where `sequence` is some internal sequence node. We can have a uniform treatment
* by always having these two synthetic nodes. So we transfer to (or, in the recursive case, read into)
* `TIterableSequence(sequence)`, from which we take a read step to `TIterableElement(sequence)` and then a
* store step to `sequence`.
*
* This allows the unknown content from the RHS to be read into `TIterableElement(sequence)` and tuple content
* to then be stored into `sequence`. If the content is already tuple content, this indirection creates crosstalk
* between indices. Therefore, tuple content is never read into `TIterableElement(sequence)`; it is instead
* transferred directly from `TIterableSequence(sequence)` to `sequence` via a flow step. Such a flow step will
* also transfer other content, but only tuple content is further read from `sequence` into its elements.
*
* The strategy is then via several read-, store-, and flow steps:
* 1. a) [Flow] Content is transferred from `iterable` to `TIterableSequence(sequence)` via a
* flow step. From here, everything happens on the LHS.
*
* b) [Read] If the unpacking happens inside a for as in
* ```python
* for sequence in iterable
* ```
* then content is read from `iterable` to `TIterableSequence(sequence)`.
*
* 2. [Flow] Content is transferred from `TIterableSequence(sequence)` to `sequence` via a
* flow step. (Here only tuple content is relevant.)
*
* 3. [Read] Content is read from `TIterableSequence(sequence)` into `TIterableElement(sequence)`.
* As `sequence` is modeled as a tuple, we will not read tuple content as that would allow
* crosstalk.
*
* 4. [Store] Content is stored from `TIterableElement(sequence)` to `sequence`.
* Content type is `TupleElementContent` with indices taken from the syntax.
* For instance, if `sequence` is `(a, *b, c)`, content is written to index 0, 1, and 2.
* This is adequate as the route through `TIterableElement(sequence)` does not transfer precise content.
*
* 5. [Read] Content is read from `sequence` to its elements.
* a) If the element is a plain variable, the target is the corresponding essa node.
*
* b) If the element is itself a sequence, with control-flow node `seq`, the target is `TIterableSequence(seq)`.
*
* c) If the element is a starred variable, with control-flow node `v`, the target is `TIterableElement(v)`.
*
* 6. [Store] Content is stored from `TIterableElement(v)` to the essa variable for `v`, with
* content type `ListElementContent`.
*
* 7. [Flow, Read, Store] Steps 2 through 6 are repeated for all recursive elements which are sequences.
*
*
* We illustrate the above steps on the assignment
*
* ```python
* (a, b) = ["a", SOURCE]
* ```
*
* Looking at the content propagation to `a`:
* `["a", SOURCE]`: [ListElementContent]
*
* --Step 1a-->
*
* `TIterableSequence((a, b))`: [ListElementContent]
*
* --Step 3-->
*
* `TIterableElement((a, b))`: []
*
* --Step 4-->
*
* `(a, b)`: [TupleElementContent(0)]
*
* --Step 5a-->
*
* `a`: []
*
* This means there is data flow from the RHS to `a` (an overapproximation). The same logic shows that there is data flow to `b`. Note that _Step 3_ and _Step 4_ would not have been needed if the RHS had been a tuple (since that would have been able to use _Step 2_ instead).
*
* Another, more complicated example:
* ```python
* (a, [b, *c]) = ["a", [SOURCE]]
* ```
* where the path to `c` is
*
* `["a", [SOURCE]]`: [ListElementContent; ListElementContent]
*
* --Step 1a-->
*
* `TIterableSequence((a, [b, *c]))`: [ListElementContent; ListElementContent]
*
* --Step 3-->
*
* `TIterableElement((a, [b, *c]))`: [ListElementContent]
*
* --Step 4-->
*
* `(a, [b, *c])`: [TupleElementContent(1); ListElementContent]
*
* --Step 5b-->
*
* `TIterableSequence([b, *c])`: [ListElementContent]
*
* --Step 3-->
*
* `TIterableElement([b, *c])`: []
*
* --Step 4-->
*
* `[b, *c]`: [TupleElementContent(1)]
*
* --Step 5c-->
*
* `TIterableElement(c)`: []
*
* --Step 6-->
*
* `c`: [ListElementContent]
*/
private import python
private import DataFlowPublic
/**
* The target of a `for`, e.g. `x` in `for x in list` or in `[42 for x in list]`.
* This class also records the source, which in both above cases is `list`.
* This class abstracts away the differing representations of comprehensions and
* for statements.
*/
class ForTarget extends ControlFlowNode {
Expr source;
ForTarget() {
exists(For for |
source = for.getIter() and
this.getNode() = for.getTarget() and
not for = any(Comp comp).getNthInnerLoop(0)
)
or
exists(Comp comp |
source = comp.getIterable() and
this.getNode() = comp.getNthInnerLoop(0).getTarget()
)
}
Expr getSource() { result = source }
}
/** The LHS of an assignment; it also records the assigned value. */
class AssignmentTarget extends ControlFlowNode {
Expr value;
AssignmentTarget() {
exists(Assign assign | this.getNode() = assign.getATarget() | value = assign.getValue())
}
Expr getValue() { result = value }
}
/** A direct (or top-level) target of an unpacking assignment. */
class UnpackingAssignmentDirectTarget extends ControlFlowNode {
Expr value;
UnpackingAssignmentDirectTarget() {
this instanceof SequenceNode and
(
value = this.(AssignmentTarget).getValue()
or
value = this.(ForTarget).getSource()
)
}
Expr getValue() { result = value }
}
/** A (possibly recursive) target of an unpacking assignment. */
class UnpackingAssignmentTarget extends ControlFlowNode {
UnpackingAssignmentTarget() {
this instanceof UnpackingAssignmentDirectTarget
or
this = any(UnpackingAssignmentSequenceTarget parent).getAnElement()
}
}
/** A (possibly recursive) target of an unpacking assignment which is also a sequence. */
class UnpackingAssignmentSequenceTarget extends UnpackingAssignmentTarget instanceof SequenceNode {
ControlFlowNode getElement(int i) { result = super.getElement(i) }
ControlFlowNode getAnElement() { result = this.getElement(_) }
}
/**
* Step 1a
* Data flows from `iterable` to `TIterableSequence(sequence)`
*/
predicate iterableUnpackingAssignmentFlowStep(Node nodeFrom, Node nodeTo) {
exists(AssignmentTarget target |
nodeFrom.asExpr() = target.getValue() and
nodeTo = TIterableSequenceNode(target)
)
}
/**
* Step 1b
* Data is read from `iterable` to `TIterableSequence(sequence)`
*/
predicate iterableUnpackingForReadStep(CfgNode nodeFrom, Content c, Node nodeTo) {
exists(ForTarget target |
nodeFrom.asExpr() = target.getSource() and
target instanceof SequenceNode and
nodeTo = TIterableSequenceNode(target)
) and
(
c instanceof ListElementContent
or
c instanceof SetElementContent
)
}
/**
* Step 2
* Data flows from `TIterableSequence(sequence)` to `sequence`
*/
predicate iterableUnpackingTupleFlowStep(Node nodeFrom, Node nodeTo) {
exists(UnpackingAssignmentSequenceTarget target |
nodeFrom = TIterableSequenceNode(target) and
nodeTo.asCfgNode() = target
)
}
/**
* Step 3
* Data is read from `TIterableSequence(sequence)` into `TIterableElement(sequence)`.
* As `sequence` is modeled as a tuple, we will not read tuple content as that would allow
* crosstalk.
*/
predicate iterableUnpackingConvertingReadStep(Node nodeFrom, Content c, Node nodeTo) {
exists(UnpackingAssignmentSequenceTarget target |
nodeFrom = TIterableSequenceNode(target) and
nodeTo = TIterableElementNode(target) and
(
c instanceof ListElementContent
or
c instanceof SetElementContent
// TODO: dict content in iterable unpacking not handled
)
)
}
/**
* Step 4
* Data is stored from `TIterableElement(sequence)` to `sequence`.
* Content type is `TupleElementContent` with indices taken from the syntax.
* For instance, if `sequence` is `(a, *b, c)`, content is written to index 0, 1, and 2.
*/
predicate iterableUnpackingConvertingStoreStep(Node nodeFrom, Content c, Node nodeTo) {
exists(UnpackingAssignmentSequenceTarget target |
nodeFrom = TIterableElementNode(target) and
nodeTo.asCfgNode() = target and
exists(int index | exists(target.getElement(index)) |
c.(TupleElementContent).getIndex() = index
)
)
}
/**
* Step 5
* For a sequence node inside an iterable unpacking, data flows from the sequence to its elements. There are
* three cases for what `nodeTo` should be:
* a) If the element is a plain variable, `nodeTo` is the corresponding essa node.
*
* b) If the element is itself a sequence, with control-flow node `seq`, `nodeTo` is `TIterableSequence(seq)`.
*
* c) If the element is a starred variable, with control-flow node `v`, `nodeTo` is `TIterableElement(v)`.
*/
predicate iterableUnpackingElementReadStep(Node nodeFrom, Content c, Node nodeTo) {
exists(
UnpackingAssignmentSequenceTarget target, int index, ControlFlowNode element, int starIndex
|
target.getElement(starIndex) instanceof StarredNode
or
not exists(target.getAnElement().(StarredNode)) and
starIndex = -1
|
nodeFrom.asCfgNode() = target and
element = target.getElement(index) and
(
if starIndex = -1 or index < starIndex
then c.(TupleElementContent).getIndex() = index
else
// This could get big if big tuples exist
if index = starIndex
then c.(TupleElementContent).getIndex() >= index
else c.(TupleElementContent).getIndex() >= index - 1
) and
(
if element instanceof SequenceNode
then
// Step 5b
nodeTo = TIterableSequenceNode(element)
else
if element instanceof StarredNode
then
// Step 5c
nodeTo = TIterableElementNode(element)
else
// Step 5a
nodeTo.asVar().getDefinition().(MultiAssignmentDefinition).getDefiningNode() = element
)
)
}
/**
* Step 6
* Data flows from `TIterableElement(v)` to the essa variable for `v`, with
* content type `ListElementContent`.
*/
predicate iterableUnpackingStarredElementStoreStep(Node nodeFrom, Content c, Node nodeTo) {
exists(ControlFlowNode starred | starred.getNode() instanceof Starred |
nodeFrom = TIterableElementNode(starred) and
nodeTo.asVar().getDefinition().(MultiAssignmentDefinition).getDefiningNode() = starred and
c instanceof ListElementContent
)
}
/** All read steps associated with unpacking assignment. */
predicate iterableUnpackingReadStep(Node nodeFrom, Content c, Node nodeTo) {
iterableUnpackingForReadStep(nodeFrom, c, nodeTo)
or
iterableUnpackingElementReadStep(nodeFrom, c, nodeTo)
or
iterableUnpackingConvertingReadStep(nodeFrom, c, nodeTo)
}
/** All store steps associated with unpacking assignment. */
predicate iterableUnpackingStoreStep(Node nodeFrom, Content c, Node nodeTo) {
iterableUnpackingStarredElementStoreStep(nodeFrom, c, nodeTo)
or
iterableUnpackingConvertingStoreStep(nodeFrom, c, nodeTo)
}
/** All flow steps associated with unpacking assignment. */
predicate iterableUnpackingFlowStep(Node nodeFrom, Node nodeTo) {
iterableUnpackingAssignmentFlowStep(nodeFrom, nodeTo)
or
iterableUnpackingTupleFlowStep(nodeFrom, nodeTo)
}
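
As a runtime cross-check of the Python semantics that the comment at the top of this file relies on, here is a minimal sketch. `SOURCE` and the numbered variable names are hypothetical stand-ins and are not part of the CodeQL library. The snippet only shows what Python itself does at run time, not how the data-flow steps are implemented.

```python
SOURCE = object()  # hypothetical marker standing in for a tracked value

# List on the RHS, tuple pattern on the LHS: assignment is positional at run time,
# even though list content does not record which index holds SOURCE.
(a1, b1) = ["a", SOURCE]

# A starred target is always bound to a list, even when the RHS is a tuple.
(a2, *b2) = ("a", "b", SOURCE)

# Nested pattern: `b3` reads one element, `c3` collects the remainder as a list.
(a3, [b3, *c3]) = ("a", ["b", SOURCE])

# In a `for`, each element is first read from the iterable and then unpacked.
seen = []
for (key, value) in [("k", SOURCE)]:
    seen.append(value)

print(b1 is SOURCE, type(b2).__name__, c3 == [SOURCE], seen[0] is SOURCE)
```

The second and third assignments are the cases where the comment converts content to `ListElementContent`: the starred names end up holding lists regardless of the type on the right-hand side.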

View File

@@ -0,0 +1,311 @@
/**
* There are a number of patterns available for the match statement.
* Each one transfers data and content differently to its parts.
*
* Furthermore, given a successful match, we can infer some data about
* the subject. Consider the example:
* ```python
* match choice:
* case 'Y':
* ...body
* ```
* Inside `body`, we know that `choice` has the value `'Y'`.
*
* A similar thing happens with the "as pattern". Consider the example:
* ```python
* match choice:
* case ('y'|'Y') as c:
* ...body
* ```
* By the binding rules, there is data flow from `choice` to `c`. But we
* can infer the value of `c` to be either `'y'` or `'Y'` if the match succeeds.
*
* We will treat such inferences separately as guards. First we will model the data flow
* stemming from the bindings and the matching of shape. Below, 'subject' is not necessarily the
* top-level subject of the match, but rather the part recursively matched by the current pattern.
* For instance, in the example:
* ```python
* match command:
* case ('quit' as c) | ('go', ('up'|'down') as c):
* ...body
* ```
* `command` is the subject of the first as-pattern, while the second component of `command`
* is the subject of the second as-pattern. As such, 'subject' refers to the pattern under evaluation.
*
* - as pattern: subject flows to alias as well as to the interior pattern
* - or pattern: subject flows to each alternative
* - literal pattern: flow from the literal to the pattern, to add information
* - capture pattern: subject flows to the variable
* - wildcard pattern: no flow
* - value pattern: flow from the value to the pattern, to add information
* - sequence pattern: each element reads from subject at the associated index
* - star pattern: subject flows to the variable, possibly via a conversion
* - mapping pattern: each value reads from subject at the associated key
* - double star pattern: subject flows to the variable, possibly via a conversion
* - key-value pattern: the value reads from the subject at the key (see mapping pattern)
* - class pattern: all keywords read the appropriate attribute from the subject
* - keyword pattern: the appropriate attribute is read from the subject (see class pattern)
*
* Inside the class pattern, we also find positional arguments. They are converted to
* keyword arguments using the `__match_args__` attribute on the class. We do not
* currently model this.
*/
private import python
private import DataFlowPublic
/**
* Holds when there is flow from the subject `nodeFrom` to the (top-level) pattern `nodeTo` of a `match` statement.
*
* The subject of a match flows to each top-level pattern
* (a pattern directly under a `case` statement).
*
* We could consider a model closer to use-use-flow, where the subject
* only flows to the first top-level pattern and from there to the
* following ones.
*/
predicate matchSubjectFlowStep(Node nodeFrom, Node nodeTo) {
exists(MatchStmt match, Expr subject, Pattern target |
subject = match.getSubject() and
target = match.getCase(_).(Case).getPattern()
|
nodeFrom.asExpr() = subject and
nodeTo.asCfgNode().getNode() = target
)
}
/**
* as pattern: subject flows to alias as well as to the interior pattern
* syntax (toplevel): `case pattern as alias:`
*/
predicate matchAsFlowStep(Node nodeFrom, Node nodeTo) {
exists(MatchAsPattern subject, Name alias | alias = subject.getAlias() |
// We make the subject flow to the interior pattern via the alias.
// That way, information can propagate from the interior pattern to the alias.
//
// the subject flows to the interior pattern
nodeFrom.asCfgNode().getNode() = subject and
nodeTo.asCfgNode().getNode() = subject.getPattern()
or
// the interior pattern flows to the alias
nodeFrom.asCfgNode().getNode() = subject.getPattern() and
nodeTo.asVar().getDefinition().(PatternAliasDefinition).getDefiningNode().getNode() = alias
)
}
/**
* or pattern: subject flows to each alternative
* syntax (toplevel): `case alt1 | alt2:`
*/
predicate matchOrFlowStep(Node nodeFrom, Node nodeTo) {
exists(MatchOrPattern subject, Pattern pattern | pattern = subject.getAPattern() |
nodeFrom.asCfgNode().getNode() = subject and
nodeTo.asCfgNode().getNode() = pattern
)
}
/**
* literal pattern: flow from the literal to the pattern, to add information
* syntax (toplevel): `case literal:`
*/
predicate matchLiteralFlowStep(Node nodeFrom, Node nodeTo) {
exists(MatchLiteralPattern pattern, Expr literal | literal = pattern.getLiteral() |
nodeFrom.asExpr() = literal and
nodeTo.asCfgNode().getNode() = pattern
)
}
/**
* capture pattern: subject flows to the variable
* syntax (toplevel): `case var:`
*/
predicate matchCaptureFlowStep(Node nodeFrom, Node nodeTo) {
exists(MatchCapturePattern capture, Name var | capture.getVariable() = var |
nodeFrom.asCfgNode().getNode() = capture and
nodeTo.asVar().getDefinition().(PatternCaptureDefinition).getDefiningNode().getNode() = var
)
}
/**
* value pattern: flow from the value to the pattern, to add information
* syntax (toplevel): `case Dotted.value:`
*/
predicate matchValueFlowStep(Node nodeFrom, Node nodeTo) {
exists(MatchValuePattern pattern, Expr value | value = pattern.getValue() |
nodeFrom.asExpr() = value and
nodeTo.asCfgNode().getNode() = pattern
)
}
/**
* sequence pattern: each element reads from subject at the associated index
* syntax (toplevel): `case [a, b]:`
*/
predicate matchSequenceReadStep(Node nodeFrom, Content c, Node nodeTo) {
exists(MatchSequencePattern subject, int index, Pattern element |
element = subject.getPattern(index)
|
nodeFrom.asCfgNode().getNode() = subject and
nodeTo.asCfgNode().getNode() = element and
(
// tuple content
c.(TupleElementContent).getIndex() = index
or
// list content
c instanceof ListElementContent
// set content is excluded from sequence patterns,
// see https://www.python.org/dev/peps/pep-0635/#sequence-patterns
)
)
}
/**
* star pattern: subject flows to the variable, possibly via a conversion
* syntax (toplevel): `case *var:`
*
* We decompose this flow into a read step and a store step. The read step
* reads both tuple and list content, the store step only stores list content.
* This way, we convert all content to list content.
*
* This is the read step.
*/
predicate matchStarReadStep(Node nodeFrom, Content c, Node nodeTo) {
exists(MatchSequencePattern subject, int index, MatchStarPattern star |
star = subject.getPattern(index)
|
nodeFrom.asCfgNode().getNode() = subject and
nodeTo = TStarPatternElementNode(star) and
(
// tuple content
c.(TupleElementContent).getIndex() >= index
or
// list content
c instanceof ListElementContent
// set content is excluded from sequence patterns,
// see https://www.python.org/dev/peps/pep-0635/#sequence-patterns
)
)
}
/**
* star pattern: subject flows to the variable, possibly via a conversion
* syntax (toplevel): `case *var:`
*
* We decompose this flow into a read step and a store step. The read step
* reads both tuple and list content, the store step only stores list content.
* This way, we convert all content to list content.
*
* This is the store step.
*/
predicate matchStarStoreStep(Node nodeFrom, Content c, Node nodeTo) {
exists(MatchStarPattern star |
nodeFrom = TStarPatternElementNode(star) and
nodeTo.asCfgNode().getNode() = star.getTarget() and
c instanceof ListElementContent
)
}
/**
* mapping pattern: each value reads from subject at the associated key
* syntax (toplevel): `case {"color": c, "height": x}:`
*/
predicate matchMappingReadStep(Node nodeFrom, Content c, Node nodeTo) {
exists(
MatchMappingPattern subject, MatchKeyValuePattern keyValue, MatchLiteralPattern key,
Pattern value
|
keyValue = subject.getAMapping() and
key = keyValue.getKey() and
value = keyValue.getValue()
|
nodeFrom.asCfgNode().getNode() = subject and
nodeTo.asCfgNode().getNode() = value and
c.(DictionaryElementContent).getKey() = key.getLiteral().(StrConst).getText()
)
}
/**
* double star pattern: subject flows to the variable, possibly via a conversion
* syntax (toplevel): `case {**var}:`
*
* Dictionary content flows to the double star, but all mentioned keys in the
* mapping pattern should be cleared.
*/
predicate matchMappingFlowStep(Node nodeFrom, Node nodeTo) {
exists(MatchMappingPattern subject, MatchDoubleStarPattern dstar | dstar = subject.getAMapping() |
nodeFrom.asCfgNode().getNode() = subject and
nodeTo.asCfgNode().getNode() = dstar.getTarget()
)
}
/**
* Bindings that are mentioned in a mapping pattern will not be available
* to a double star pattern in the same mapping pattern.
*/
predicate matchMappingClearStep(Node n, Content c) {
exists(
MatchMappingPattern subject, MatchKeyValuePattern keyValue, MatchLiteralPattern key,
MatchDoubleStarPattern dstar
|
keyValue = subject.getAMapping() and
key = keyValue.getKey() and
dstar = subject.getAMapping()
|
n.asCfgNode().getNode() = dstar.getTarget() and
c.(DictionaryElementContent).getKey() = key.getLiteral().(StrConst).getText()
)
}
/**
* class pattern: all keywords read the appropriate attribute from the subject
* syntax (toplevel): `case ClassName(attr = val):`
*/
predicate matchClassReadStep(Node nodeFrom, Content c, Node nodeTo) {
exists(MatchClassPattern subject, MatchKeywordPattern keyword, Name attr, Pattern value |
keyword = subject.getKeyword(_) and
attr = keyword.getAttribute() and
value = keyword.getValue()
|
nodeFrom.asCfgNode().getNode() = subject and
nodeTo.asCfgNode().getNode() = value and
c.(AttributeContent).getAttribute() = attr.getId()
)
}
/** All flow steps associated with match. */
predicate matchFlowStep(Node nodeFrom, Node nodeTo) {
matchSubjectFlowStep(nodeFrom, nodeTo)
or
matchAsFlowStep(nodeFrom, nodeTo)
or
matchOrFlowStep(nodeFrom, nodeTo)
or
matchLiteralFlowStep(nodeFrom, nodeTo)
or
matchCaptureFlowStep(nodeFrom, nodeTo)
or
matchValueFlowStep(nodeFrom, nodeTo)
or
matchMappingFlowStep(nodeFrom, nodeTo)
}
/** All read steps associated with match. */
predicate matchReadStep(Node nodeFrom, Content c, Node nodeTo) {
matchClassReadStep(nodeFrom, c, nodeTo)
or
matchSequenceReadStep(nodeFrom, c, nodeTo)
or
matchMappingReadStep(nodeFrom, c, nodeTo)
or
matchStarReadStep(nodeFrom, c, nodeTo)
}
/** All store steps associated with match. */
predicate matchStoreStep(Node nodeFrom, Content c, Node nodeTo) {
matchStarStoreStep(nodeFrom, c, nodeTo)
}
/**
* All clear steps associated with match.
*/
predicate matchClearStep(Node n, Content c) { matchMappingClearStep(n, c) }

View File

@@ -1,138 +0,0 @@
# Using the shared dataflow library
## File organisation
The files currently live in `experimental` (whereas the existing implementation lives in `semmle\python\dataflow`).
In there is found `DataFlow.qll`, `DataFlow2.qll` etc. which refer to `internal\DataFlowImpl`, `internal\DataFlowImpl2` etc. respectively. The `DataFlowImplN`-files are all identical copies to avoid mutual recursion. They start off by including two files `internal\DataFlowImplCommon` and `internal\DataFlowImplSpecific`. The former contains all the language-agnostic definitions, while the latter is where we describe our favorite language. `Specific` simply forwards to two other files `internal\DataFlowPrivate.qll` and `internal\DataFlowPublic.qll`. Definitions in the former will be hidden behind a `private` modifier, while those in the latter can be referred to in data flow queries. For instance, the definition of `DataFlow::Node` should likely be in `DataFlowPublic.qll`.
## Define the dataflow graph
In order to use the dataflow library, we need to define the dataflow graph,
that is define the nodes and the edges.
### Define the nodes
The nodes are defined in the type `DataFlow::Node` (found in `DataFlowPublic.qll`).
This should likely be an IPA type, so we can extend it as needed.
Typical cases needed to construct the call graph include
- argument node
- parameter node
- return node
Typical extensions include
- postupdate nodes
- implicit `this`-nodes
### Define the edges
The edges split into local flow (within a function) and global flow (the call graph, between functions/procedures).
Extra flow, such as reading from and writing to global variables, can be captured in `jumpStep`.
The local flow should be obtainable from an SSA computation.
Local flow nodes are generally either control flow nodes or SSA variables.
Flow from control flow nodes to SSA variables comes from SSA variable definitions, while flow from SSA variables to control flow nodes comes from def-use pairs.
The global flow should be obtainable from a `PointsTo` analysis. It is specified via `viableCallable` and
`getAnOutNode`. Consider making `ReturnKind` a singleton IPA type as in java.
Global flow includes local flow within a consistent call context. Thus, for local flow to count as global flow, all relevant nodes should implement `getEnclosingCallable`.
If complicated dispatch needs to be modelled, try using the `[reduced|pruned]viable*` predicates.
## Field flow
To track flow through fields we need to provide a model of fields, that is the `Content` class.
Field access is specified via `read_step` and `store_step`.
Work is being done to make field flow handle lists and dictionaries and the like.
`PostUpdateNode`s become important when field flow is used, as they track modifications to fields resulting from function calls.
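To make the terminology concrete, here is a minimal sketch of the kind of Python target code that field flow has to handle. The class `Box`, the function `fill`, and the `SOURCE` value are hypothetical and only serve as an illustration.

```python
class Box:
    def __init__(self):
        self.value = None


def fill(box, data):
    # Store: `data` is written into the `value` field of `box`.
    box.value = data


SOURCE = "tainted"  # hypothetical source value
box = Box()
fill(box, SOURCE)   # the call mutates its argument; the state of `box` after
                    # the call is what a post-update node is meant to capture
leaked = box.value  # Read: the field content flows out again
print(leaked)
```

Without a node for the state of `box` after the call to `fill`, there would be nothing to connect the store inside the callee to the later read in the caller.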
## Type pruning
If type information is available, flows can be discarded on the grounds of type mismatch.
Tracked types are given by the class `DataFlowType` and the predicate `getTypeBound`, and compatibility is recorded in the predicate `compatibleTypes`.
If type pruning is not used, `compatibleTypes` should be implemented as `any`; if it is implemented, say, as `none`, all flows will be pruned.
Further, possible casts are given by the class `CastNode`.
---
# Plan
## Stage I, data flow
### Phase 0, setup
Define minimal IPA type for `DataFlow::Node`
Define all required predicates empty (via `none()`),
except `compatibleTypes` which should be `any()`.
Define `ReturnKind`, `DataFlowType`, and `Content` as singleton IPA types.
### Phase 1, local flow
Implement `simpleLocalFlowStep` based on the existing SSA computation
### Phase 2, global flow
Implement `viableCallable` and `getAnOutNode` based on the existing predicate `PointsTo`.
### Phase 3, field flow
Redefine `Content` and implement `read_step` and `store_step`.
Review use of post-update nodes.
### Phase 4, type pruning
Use type trackers to obtain relevant type information and redefine `DataFlowType` to contain appropriate cases. Record the type information in `getTypeBound`.
Implement `compatibleTypes` (perhaps simply as the identity).
If necessary, re-implement `getErasedRepr` and `ppReprType`.
If necessary, redefine `CastNode`.
### Phase 5, bonus
Review possible use of `[reduced|pruned]viable*` predicates.
Review need for more elaborate `ReturnKind`.
Review need for non-empty `jumpStep`.
Review need for non-empty `isUnreachableInCall`.
## Stage II, taint tracking
### Phase 0, setup
Implement all predicates empty.
### Phase 1, experiments
Try recovering an existing taint tracking query by implementing sources, sinks, sanitizers, and barriers.
---
# Status
## Achieved
- Copy of shared library; implemented enough predicates to make it compile.
- Simple flow into, out of, and through functions.
- Some tests, in particular a skeleton for something comprehensive.
## TODO
- Implementation has largely been done by finding a plausibly-sounding predicate in the python library to refer to. We should review that we actually have the intended semantics in all places.
- Comprehensive testing.
- The regression tests track the value of guards in order to eliminate impossible data flow. We currently have regressions because of this. We cannot readily replicate the existing method, as it uses the interdefinedness of data flow and taint tracking (there is a boolean taint kind). C++ [does something similar](https://github.com/github/codeql/blob/master/cpp/ql/src/semmle/code/cpp/controlflow/internal/ConstantExprs.qll#L27-L36) for eliminating impossible control flow, which we might be able to replicate (they infer values of "interesting" control flow nodes, which are those needed to determine values of guards).
- Flow for some syntactic constructs is done via extra taint steps in the existing implementation; we should find a way to get data flow for them. Some of this should be covered by field flow.
- A document is being written about proper use of the shared data flow library; this should be adhered to. In particular, we should consider replacing def-use with def-to-first-use and use-to-next-use in local flow.
- We seem to get duplicated results for global flow, as well as flow with and without type (so four times the "unique" results).
- We currently consider control flow nodes like exit nodes for functions; we should probably filter down which ones are of interest.
- We should probably override `toString` for a number of data flow nodes.
- Test flow through classes, constructors and methods.
- What happens with named arguments? What does C# do?
- What should the enclosing callable for global variables be? C++ [makes it the variable itself](https://github.com/github/codeql/blob/master/cpp/ql/src/semmle/code/cpp/ir/dataflow/internal/DataFlowUtil.qll#L417), C# seems to not have nodes for these but only for their reads and writes.
- Is `yield` another return type? If not, how is it handled?
- Should `OutNode` include magic function calls?
- Consider creating an internal abstract class for nodes as C# does. Among other things, this can help the optimizer by stating that `getEnclosingCallable` [is functional](https://github.com/github/codeql/blob/master/csharp/ql/src/semmle/code/csharp/dataflow/internal/DataFlowPublic.qll#L62).

View File

@@ -243,7 +243,9 @@ module AiohttpWebModel {
/** A class that has a super-type which is an aiohttp.web View class. */
class AiohttpViewClassFromSuperClass extends AiohttpViewClass {
AiohttpViewClassFromSuperClass() { this.getABase() = View::subclassRef().getAUse().asExpr() }
AiohttpViewClassFromSuperClass() {
this.getParent() = View::subclassRef().getAnImmediateUse().asExpr()
}
}
/** A class that is used in a route-setup, therefore being considered an aiohttp.web View class. */

View File

@@ -829,7 +829,7 @@ module PrivateDjango {
/** Gets the (AST) class of the Django model class `modelClass`. */
Class getModelClassClass(API::Node modelClass) {
result.getParent() = modelClass.getAUse().asExpr().(ClassExpr) and
result.getParent() = modelClass.getAnImmediateUse().asExpr() and
modelClass = Model::subclassRef()
}
@@ -2162,7 +2162,9 @@ module PrivateDjango {
* thereby handling user input.
*/
class DjangoFormClass extends Class, SelfRefMixin {
DjangoFormClass() { this.getABase() = Django::Forms::Form::subclassRef().getAUse().asExpr() }
DjangoFormClass() {
this.getParent() = Django::Forms::Form::subclassRef().getAnImmediateUse().asExpr()
}
}
/**
@@ -2195,7 +2197,7 @@ module PrivateDjango {
*/
class DjangoFormFieldClass extends Class {
DjangoFormFieldClass() {
this.getABase() = Django::Forms::Field::subclassRef().getAUse().asExpr()
this.getParent() = Django::Forms::Field::subclassRef().getAnImmediateUse().asExpr()
}
}
@@ -2298,7 +2300,7 @@ module PrivateDjango {
*/
class DjangoViewClassFromSuperClass extends DjangoViewClass {
DjangoViewClassFromSuperClass() {
this.getABase() = Django::Views::View::subclassRef().getAUse().asExpr()
this.getParent() = Django::Views::View::subclassRef().getAnImmediateUse().asExpr()
}
}

View File

@@ -194,8 +194,8 @@ module Flask {
API::Node api_node;
FlaskViewClass() {
this.getABase() = Views::View::subclassRef().getAUse().asExpr() and
api_node.getAnImmediateUse().asExpr() = this.getParent()
api_node = Views::View::subclassRef() and
this.getParent() = api_node.getAnImmediateUse().asExpr()
}
/** Gets a function that could handle incoming requests, if any. */
@@ -219,8 +219,8 @@ module Flask {
*/
class FlaskMethodViewClass extends FlaskViewClass {
FlaskMethodViewClass() {
this.getABase() = Views::MethodView::subclassRef().getAUse().asExpr() and
api_node.getAnImmediateUse().asExpr() = this.getParent()
api_node = Views::MethodView::subclassRef() and
this.getParent() = api_node.getAnImmediateUse().asExpr()
}
override Function getARequestHandler() {

View File

@@ -115,7 +115,7 @@ private module RestFramework {
*/
class RestFrameworkApiViewClass extends PrivateDjango::DjangoViewClassFromSuperClass {
RestFrameworkApiViewClass() {
this.getABase() = any(ModeledApiViewClasses c).getASubclass*().getAUse().asExpr()
this.getParent() = any(ModeledApiViewClasses c).getASubclass*().getAnImmediateUse().asExpr()
}
override Function getARequestHandler() {

View File

@@ -1934,7 +1934,7 @@ private module StdlibPrivate {
/** A HttpRequestHandler class definition (most likely in project code). */
class HttpRequestHandlerClassDef extends Class {
HttpRequestHandlerClassDef() { this.getParent() = subclassRef().getAUse().asExpr() }
HttpRequestHandlerClassDef() { this.getParent() = subclassRef().getAnImmediateUse().asExpr() }
}
/** DEPRECATED: Alias for HttpRequestHandlerClassDef */
@@ -2027,12 +2027,12 @@ private module StdlibPrivate {
private module WsgirefSimpleServer {
class WsgiServerSubclass extends Class, SelfRefMixin {
WsgiServerSubclass() {
this.getABase() =
this.getParent() =
API::moduleImport("wsgiref")
.getMember("simple_server")
.getMember("WSGIServer")
.getASubclass*()
.getAUse()
.getAnImmediateUse()
.asExpr()
}
}

View File

@@ -92,7 +92,7 @@ private module Tornado {
/** A RequestHandler class (most likely in project code). */
class RequestHandlerClass extends Class {
RequestHandlerClass() { this.getParent() = subclassRef().getAUse().asExpr() }
RequestHandlerClass() { this.getParent() = subclassRef().getAnImmediateUse().asExpr() }
/** Gets a function that could handle incoming requests, if any. */
Function getARequestHandler() {

View File

@@ -27,13 +27,13 @@ private module Twisted {
*/
class TwistedResourceSubclass extends Class {
TwistedResourceSubclass() {
this.getABase() =
this.getParent() =
API::moduleImport("twisted")
.getMember("web")
.getMember("resource")
.getMember("Resource")
.getASubclass*()
.getAUse()
.getAnImmediateUse()
.asExpr()
}

View File

@@ -1,3 +1,7 @@
## 0.0.13
## 0.0.12
## 0.0.11
### New Queries

View File

@@ -0,0 +1 @@
## 0.0.12

View File

@@ -0,0 +1 @@
## 0.0.13

View File

@@ -1,2 +1,2 @@
---
lastReleaseVersion: 0.0.11
lastReleaseVersion: 0.0.13

View File

@@ -0,0 +1,56 @@
<!DOCTYPE qhelp PUBLIC
"-//Semmle//qhelp//EN"
"qhelp.dtd">
<qhelp>
<overview>
<p>Extracting files from a malicious zip archive without validating that the destination file path
is within the destination directory can cause files outside the destination directory to be
overwritten, due to the possible presence of directory traversal elements (<code>..</code>) in
archive paths.</p>
<p>Zip archives contain archive entries representing each file in the archive. These entries
include a file path for the entry, but these file paths are not restricted and may contain
unexpected special elements such as the directory traversal element (<code>..</code>). If these
file paths are used to determine an output file to write the contents of the archive item to, then
the file may be written to an unexpected location. This can result in sensitive information being
revealed or deleted, or an attacker being able to influence behavior by modifying unexpected
files.</p>
<p>For example, if a Zip archive contains a file entry <code>..\sneaky-file</code>, and the Zip archive
is extracted to the directory <code>c:\output</code>, then naively combining the paths would result
in an output file path of <code>c:\output\..\sneaky-file</code>, which would cause the file to be
written to <code>c:\sneaky-file</code>.</p>
</overview>
<recommendation>
<p>Ensure that output paths constructed from Zip archive entries are validated
to prevent writing files to unexpected locations.</p>
<p>The recommended way of writing an output file from a Zip archive entry is to call <code>extract()</code> or <code>extractall()</code>.
</p>
</recommendation>
<example>
<p>
In this example an archive is extracted without validating file paths.
</p>
<sample src="zipslip_bad.py" />
<p>To fix this vulnerability, we need to call the function <code>extractall()</code>.
</p>
<sample src="zipslip_good.py" />
</example>
<references>
<li>
Snyk:
<a href="https://snyk.io/research/zip-slip-vulnerability">Zip Slip Vulnerability</a>.
</li>
</references>
</qhelp>
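The path arithmetic described in the overview can be checked with the standard library. This is a minimal sketch, not part of the query or its samples: it uses `ntpath` so that Windows-style paths behave the same on any platform, and the helper name `resolves_outside` is hypothetical.

```python
import ntpath


def resolves_outside(dest_dir: str, entry_name: str) -> bool:
    """Return True if joining entry_name onto dest_dir escapes dest_dir."""
    dest = ntpath.normpath(dest_dir)
    target = ntpath.normpath(ntpath.join(dest, entry_name))
    try:
        # The longest common sub-path must be the destination itself.
        return ntpath.commonpath([dest, target]) != dest
    except ValueError:
        # Raised when the paths are on different drives or mix
        # absolute and relative forms; treat both as escaping.
        return True


# The overview's example: the ".." element climbs out of c:\output.
print(ntpath.normpath(r"c:\output\..\sneaky-file"))
print(resolves_outside(r"c:\output", r"..\sneaky-file"))
print(resolves_outside(r"c:\output", r"docs\readme.txt"))
```

The same comparison works with `os.path` on the host platform; `ntpath` is used here only to reproduce the Windows paths from the text.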

View File

@@ -0,0 +1,22 @@
/**
* @name Arbitrary file write during archive extraction ("Zip Slip")
* @description Extracting files from a malicious archive without validating that the
* destination file path is within the destination directory can cause files outside
* the destination directory to be overwritten.
* @kind path-problem
* @id py/zipslip
* @problem.severity error
* @security-severity 7.5
* @precision high
* @tags security
* external/cwe/cwe-022
*/
import python
import experimental.semmle.python.security.ZipSlip
import DataFlow::PathGraph
from ZipSlipConfig config, DataFlow::PathNode source, DataFlow::PathNode sink
where config.hasFlowPath(source, sink)
select sink.getNode(), source, sink, "Extraction of zipfile from $@", source.getNode(),
"a potentially untrusted source"

View File

@@ -0,0 +1,16 @@
import tarfile
import zipfile
import shutil
def unzip(filename):
    with tarfile.open(filename) as zipf:
        # BAD: This could write any file on the filesystem.
        for entry in zipf:
            shutil.copyfile(entry, "/tmp/unpack/")
def unzip4(filename, dstfile):
    # dstfile is assumed to be an open, writable binary file object.
    zf = zipfile.ZipFile(filename)
    filelist = zf.namelist()
    for x in filelist:
        with zf.open(x) as srcf:
            shutil.copyfileobj(srcf, dstfile)

View File

@@ -0,0 +1,10 @@
import zipfile
def unzip(filename, dir):
    zf = zipfile.ZipFile(filename)
    zf.extractall(dir)
def unzip1(filename, member, dir):
    # extract() takes the member to extract first, then the target directory.
    zf = zipfile.ZipFile(filename)
    zf.extract(member, dir)
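For code that must inspect entry names itself, the validation that the recommendation asks for can be sketched as follows. This is an illustration under stated assumptions, not the query's reference fix: the function name `safe_extract_all` is hypothetical, and the check rejects the whole archive as soon as one entry resolves outside the destination directory.

```python
import os
import zipfile


def safe_extract_all(archive_path: str, dest_dir: str) -> None:
    """Extract archive_path into dest_dir, refusing entries that escape it."""
    dest_root = os.path.realpath(dest_dir)
    with zipfile.ZipFile(archive_path) as zf:
        for name in zf.namelist():
            # Resolve the would-be output path, including any ".." elements.
            target = os.path.realpath(os.path.join(dest_root, name))
            if os.path.commonpath([dest_root, target]) != dest_root:
                raise ValueError(f"unsafe path in archive: {name!r}")
        # Every entry stays inside dest_root, so extraction can proceed.
        zf.extractall(dest_root)
```

Checking all names before extracting anything avoids leaving a partially unpacked archive behind when a malicious entry appears late in the listing.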

View File

@@ -14,6 +14,73 @@ private import semmle.python.dataflow.new.RemoteFlowSources
private import semmle.python.dataflow.new.TaintTracking
private import experimental.semmle.python.Frameworks
/** Provides classes for modeling file-copying APIs. */
module CopyFile {
/**
* A data flow node for copying a file.
*
* Extend this class to model new APIs. If you want to refine existing API models,
* extend `CopyFile` instead.
*/
abstract class Range extends DataFlow::Node {
/**
* Gets the argument containing the path.
*/
abstract DataFlow::Node getAPathArgument();
/**
* Gets the `fsrc` argument.
*/
abstract DataFlow::Node getfsrcArgument();
}
}
/**
* A data flow node for copying a file.
*
* Extend this class to refine existing API models. If you want to model new APIs,
* extend `CopyFile::Range` instead.
*/
class CopyFile extends DataFlow::Node {
CopyFile::Range range;
CopyFile() { this = range }
DataFlow::Node getAPathArgument() { result = range.getAPathArgument() }
DataFlow::Node getfsrcArgument() { result = range.getfsrcArgument() }
}
/** Provides classes for modeling log related APIs. */
module LogOutput {
/**
* A data flow node for log output.
*
* Extend this class to model new APIs. If you want to refine existing API models,
* extend `LogOutput` instead.
*/
abstract class Range extends DataFlow::Node {
/**
* Gets the parameter value of the log output function.
*/
abstract DataFlow::Node getAnInput();
}
}
/**
* A data flow node for log output.
*
* Extend this class to refine existing API models. If you want to model new APIs,
* extend `LogOutput::Range` instead.
*/
class LogOutput extends DataFlow::Node {
LogOutput::Range range;
LogOutput() { this = range }
DataFlow::Node getAnInput() { result = range.getAnInput() }
}
/** Provides classes for modeling LDAP query execution-related APIs. */
module LdapQuery {
/**

View File

@@ -13,3 +13,4 @@ private import experimental.semmle.python.libraries.PyJWT
private import experimental.semmle.python.libraries.Python_JWT
private import experimental.semmle.python.libraries.Authlib
private import experimental.semmle.python.libraries.PythonJose
private import experimental.semmle.python.frameworks.CopyFile

View File

@@ -0,0 +1,42 @@
private import python
private import experimental.semmle.python.Concepts
private import semmle.python.dataflow.new.DataFlow
private import semmle.python.ApiGraphs
private module CopyFileImpl {
/**
* The `shutil` module provides methods to copy or move files.
* See:
* - https://docs.python.org/3/library/shutil.html#shutil.copyfile
* - https://docs.python.org/3/library/shutil.html#shutil.copy
* - https://docs.python.org/3/library/shutil.html#shutil.copy2
* - https://docs.python.org/3/library/shutil.html#shutil.copytree
* - https://docs.python.org/3/library/shutil.html#shutil.move
*/
private class CopyFiles extends DataFlow::CallCfgNode, CopyFile::Range {
CopyFiles() {
this =
API::moduleImport("shutil")
.getMember(["copyfile", "copy", "copy2", "copytree", "move"])
.getACall()
}
override DataFlow::Node getAPathArgument() {
result in [this.getArg(0), this.getArgByName("src")]
}
override DataFlow::Node getfsrcArgument() { none() }
}
// TODO: once we have flow summaries, model `shutil.copyfileobj`, which copies the content between its file-like arguments.
// See https://docs.python.org/3/library/shutil.html#shutil.copyfileobj
private class CopyFileobj extends DataFlow::CallCfgNode, CopyFile::Range {
CopyFileobj() { this = API::moduleImport("shutil").getMember("copyfileobj").getACall() }
override DataFlow::Node getfsrcArgument() {
result in [this.getArg(0), this.getArgByName("fsrc")]
}
override DataFlow::Node getAPathArgument() { none() }
}
}

View File

@@ -0,0 +1,39 @@
import python
import experimental.semmle.python.Concepts
import semmle.python.dataflow.new.DataFlow
import semmle.python.ApiGraphs
import semmle.python.dataflow.new.TaintTracking
class ZipSlipConfig extends TaintTracking::Configuration {
ZipSlipConfig() { this = "ZipSlipConfig" }
override predicate isSource(DataFlow::Node source) {
(
source =
API::moduleImport("zipfile").getMember("ZipFile").getReturn().getMember("open").getACall() or
source =
API::moduleImport("zipfile")
.getMember("ZipFile")
.getReturn()
.getMember("namelist")
.getACall() or
source = API::moduleImport("tarfile").getMember("open").getACall() or
source = API::moduleImport("tarfile").getMember("TarFile").getACall() or
source = API::moduleImport("bz2").getMember("open").getACall() or
source = API::moduleImport("bz2").getMember("BZ2File").getACall() or
source = API::moduleImport("gzip").getMember("GzipFile").getACall() or
source = API::moduleImport("gzip").getMember("open").getACall() or
source = API::moduleImport("lzma").getMember("open").getACall() or
source = API::moduleImport("lzma").getMember("LZMAFile").getACall()
) and
not source.getScope().getLocation().getFile().inStdlib()
}
override predicate isSink(DataFlow::Node sink) {
(
sink = any(CopyFile copyfile).getAPathArgument() or
sink = any(CopyFile copyfile).getfsrcArgument()
) and
not sink.getScope().getLocation().getFile().inStdlib()
}
}

View File

@@ -1,5 +1,5 @@
name: codeql/python-queries
version: 0.0.12-dev
version: 0.1.0-dev
groups:
- python
- queries

View File

@@ -0,0 +1,34 @@
edges
| zipslip_bad.py:8:10:8:31 | ControlFlowNode for Attribute() | zipslip_bad.py:10:13:10:17 | SSA variable entry |
| zipslip_bad.py:10:13:10:17 | SSA variable entry | zipslip_bad.py:11:25:11:29 | ControlFlowNode for entry |
| zipslip_bad.py:14:10:14:28 | ControlFlowNode for Attribute() | zipslip_bad.py:16:13:16:17 | SSA variable entry |
| zipslip_bad.py:16:13:16:17 | SSA variable entry | zipslip_bad.py:17:26:17:30 | ControlFlowNode for entry |
| zipslip_bad.py:20:10:20:27 | ControlFlowNode for Attribute() | zipslip_bad.py:22:13:22:17 | SSA variable entry |
| zipslip_bad.py:22:13:22:17 | SSA variable entry | zipslip_bad.py:23:29:23:33 | ControlFlowNode for entry |
| zipslip_bad.py:27:10:27:22 | ControlFlowNode for Attribute() | zipslip_bad.py:29:13:29:13 | SSA variable x |
| zipslip_bad.py:29:13:29:13 | SSA variable x | zipslip_bad.py:30:25:30:25 | ControlFlowNode for x |
| zipslip_bad.py:34:16:34:28 | ControlFlowNode for Attribute() | zipslip_bad.py:35:9:35:9 | SSA variable x |
| zipslip_bad.py:35:9:35:9 | SSA variable x | zipslip_bad.py:37:32:37:32 | ControlFlowNode for x |
nodes
| zipslip_bad.py:8:10:8:31 | ControlFlowNode for Attribute() | semmle.label | ControlFlowNode for Attribute() |
| zipslip_bad.py:10:13:10:17 | SSA variable entry | semmle.label | SSA variable entry |
| zipslip_bad.py:11:25:11:29 | ControlFlowNode for entry | semmle.label | ControlFlowNode for entry |
| zipslip_bad.py:14:10:14:28 | ControlFlowNode for Attribute() | semmle.label | ControlFlowNode for Attribute() |
| zipslip_bad.py:16:13:16:17 | SSA variable entry | semmle.label | SSA variable entry |
| zipslip_bad.py:17:26:17:30 | ControlFlowNode for entry | semmle.label | ControlFlowNode for entry |
| zipslip_bad.py:20:10:20:27 | ControlFlowNode for Attribute() | semmle.label | ControlFlowNode for Attribute() |
| zipslip_bad.py:22:13:22:17 | SSA variable entry | semmle.label | SSA variable entry |
| zipslip_bad.py:23:29:23:33 | ControlFlowNode for entry | semmle.label | ControlFlowNode for entry |
| zipslip_bad.py:27:10:27:22 | ControlFlowNode for Attribute() | semmle.label | ControlFlowNode for Attribute() |
| zipslip_bad.py:29:13:29:13 | SSA variable x | semmle.label | SSA variable x |
| zipslip_bad.py:30:25:30:25 | ControlFlowNode for x | semmle.label | ControlFlowNode for x |
| zipslip_bad.py:34:16:34:28 | ControlFlowNode for Attribute() | semmle.label | ControlFlowNode for Attribute() |
| zipslip_bad.py:35:9:35:9 | SSA variable x | semmle.label | SSA variable x |
| zipslip_bad.py:37:32:37:32 | ControlFlowNode for x | semmle.label | ControlFlowNode for x |
subpaths
#select
| zipslip_bad.py:11:25:11:29 | ControlFlowNode for entry | zipslip_bad.py:8:10:8:31 | ControlFlowNode for Attribute() | zipslip_bad.py:11:25:11:29 | ControlFlowNode for entry | Extraction of zipfile from $@ | zipslip_bad.py:8:10:8:31 | ControlFlowNode for Attribute() | a potentially untrusted source |
| zipslip_bad.py:17:26:17:30 | ControlFlowNode for entry | zipslip_bad.py:14:10:14:28 | ControlFlowNode for Attribute() | zipslip_bad.py:17:26:17:30 | ControlFlowNode for entry | Extraction of zipfile from $@ | zipslip_bad.py:14:10:14:28 | ControlFlowNode for Attribute() | a potentially untrusted source |
| zipslip_bad.py:23:29:23:33 | ControlFlowNode for entry | zipslip_bad.py:20:10:20:27 | ControlFlowNode for Attribute() | zipslip_bad.py:23:29:23:33 | ControlFlowNode for entry | Extraction of zipfile from $@ | zipslip_bad.py:20:10:20:27 | ControlFlowNode for Attribute() | a potentially untrusted source |
| zipslip_bad.py:30:25:30:25 | ControlFlowNode for x | zipslip_bad.py:27:10:27:22 | ControlFlowNode for Attribute() | zipslip_bad.py:30:25:30:25 | ControlFlowNode for x | Extraction of zipfile from $@ | zipslip_bad.py:27:10:27:22 | ControlFlowNode for Attribute() | a potentially untrusted source |
| zipslip_bad.py:37:32:37:32 | ControlFlowNode for x | zipslip_bad.py:34:16:34:28 | ControlFlowNode for Attribute() | zipslip_bad.py:37:32:37:32 | ControlFlowNode for x | Extraction of zipfile from $@ | zipslip_bad.py:34:16:34:28 | ControlFlowNode for Attribute() | a potentially untrusted source |

View File

@@ -0,0 +1 @@
experimental/Security/CWE-022/ZipSlip.ql

View File

@@ -0,0 +1,39 @@
import tarfile
import shutil
import bz2
import gzip
import zipfile
def unzip(filename):
with tarfile.open(filename) as zipf:
#BAD : This could write any file on the filesystem.
for entry in zipf:
shutil.move(entry, "/tmp/unpack/")
def unzip1(filename):
with gzip.open(filename) as zipf:
#BAD : This could write any file on the filesystem.
for entry in zipf:
shutil.copy2(entry, "/tmp/unpack/")
def unzip2(filename):
with bz2.open(filename) as zipf:
#BAD : This could write any file on the filesystem.
for entry in zipf:
shutil.copyfile(entry, "/tmp/unpack/")
def unzip3(filename):
zf = zipfile.ZipFile(filename)
with zf.namelist() as filelist:
#BAD : This could write any file on the filesystem.
for x in filelist:
shutil.copy(x, "/tmp/unpack/")
def unzip4(filename):
zf = zipfile.ZipFile(filename)
filelist = zf.namelist()
for x in filelist:
with zf.open(x) as srcf:
shutil.copyfileobj(x, "/tmp/unpack/")
import tty # to set the import root so we can identify the standard library

View File

@@ -0,0 +1,14 @@
import zipfile
import tarfile
import shutil
def unzip(filename, dir):
zf = zipfile.ZipFile(filename)
zf.extractall(dir)
def unzip1(filename, dir):
zf = zipfile.ZipFile(filename)
zf.extract(dir)