Merge branch 'main' into new-nosql-examples

This commit is contained in:
Rasmus Wriedt Larsen
2022-05-02 11:21:36 +02:00
1511 changed files with 32566 additions and 14775 deletions


@@ -1,3 +1,48 @@
## 0.1.0
### Breaking Changes
* The recently added flow-state versions of `isBarrierIn`, `isBarrierOut`, `isSanitizerIn`, and `isSanitizerOut` in the data flow and taint tracking libraries have been removed.
### Deprecated APIs
* Queries importing a data-flow configuration from `semmle.python.security.dataflow`
should ensure that the imported file ends with `Query`, and only import its top-level
module. For example, a query that used `CommandInjection::Configuration` from
`semmle.python.security.dataflow.CommandInjection` should now use `Configuration`
from `semmle.python.security.dataflow.CommandInjectionQuery` instead.
### Major Analysis Improvements
* Added data-flow for Django ORM models that are saved in a database (no `models.ForeignKey` support).
### Minor Analysis Improvements
* Improved modeling of Flask `Response` objects, so passing a response body with the keyword argument `response` is now recognized.
## 0.0.13
## 0.0.12
### Breaking Changes
* The flow state variants of `isBarrier` and `isAdditionalFlowStep` are no longer exposed in the taint tracking library. The `isSanitizer` and `isAdditionalTaintStep` predicates should be used instead.
### Deprecated APIs
* Many classes/predicates/modules that had upper-case acronyms have been renamed to follow our style-guide.
The old name still exists as a deprecated alias.
* Some modules that started with a lowercase letter have been renamed to follow our style-guide.
The old name still exists as a deprecated alias.
### New Features
* The data flow and taint tracking libraries have been extended with versions of `isBarrierIn`, `isBarrierOut`, and `isBarrierGuard`, respectively `isSanitizerIn`, `isSanitizerOut`, and `isSanitizerGuard`, that support flow states.
### Minor Analysis Improvements
* All deprecated predicates/classes/modules that have been deprecated for over a year have been deleted.
## 0.0.11
### Minor Analysis Improvements


@@ -1,4 +0,0 @@
---
category: minorAnalysis
---
* All deprecated predicates/classes/modules that have been deprecated for over a year have been deleted.


@@ -1,5 +0,0 @@
---
category: deprecated
---
* Many classes/predicates/modules that had upper-case acronyms have been renamed to follow our style-guide.
The old name still exists as a deprecated alias.


@@ -1,5 +0,0 @@
---
category: deprecated
---
* Some modules that started with a lowercase letter have been renamed to follow our style-guide.
The old name still exists as a deprecated alias.


@@ -1,4 +0,0 @@
---
category: majorAnalysis
---
* Added data-flow for Django ORM models that are saved in a database (no `models.ForeignKey` support).


@@ -1,4 +0,0 @@
---
category: feature
---
* The data flow and taint tracking libraries have been extended with versions of `isBarrierIn`, `isBarrierOut`, and `isBarrierGuard`, respectively `isSanitizerIn`, `isSanitizerOut`, and `isSanitizerGuard`, that support flow states.


@@ -1,4 +0,0 @@
---
category: breaking
---
* The flow state variants of `isBarrier` and `isAdditionalFlowStep` are no longer exposed in the taint tracking library. The `isSanitizer` and `isAdditionalTaintStep` predicates should be used instead.


@@ -1,8 +0,0 @@
---
category: deprecated
---
* Queries importing a data-flow configuration from `semmle.python.security.dataflow`
should ensure that the imported file ends with `Query`, and only import its top-level
module. For example, a query that used `CommandInjection::Configuration` from
`semmle.python.security.dataflow.CommandInjection` should from now use `Configuration`
from `semmle.python.security.dataflow.CommandInjectionQuery` instead.


@@ -1,4 +0,0 @@
---
category: minorAnalysis
---
* Improved modeling of Flask `Response` objects, so passing a response body with the keyword argument `response` is now recognized.


@@ -0,0 +1,4 @@
---
category: breaking
---
The signature of `allowImplicitRead` on `DataFlow::Configuration` and `TaintTracking::Configuration` has changed from `allowImplicitRead(DataFlow::Node node, DataFlow::Content c)` to `allowImplicitRead(DataFlow::Node node, DataFlow::ContentSet c)`.


@@ -0,0 +1,20 @@
## 0.0.12
### Breaking Changes
* The flow state variants of `isBarrier` and `isAdditionalFlowStep` are no longer exposed in the taint tracking library. The `isSanitizer` and `isAdditionalTaintStep` predicates should be used instead.
### Deprecated APIs
* Many classes/predicates/modules that had upper-case acronyms have been renamed to follow our style-guide.
The old name still exists as a deprecated alias.
* Some modules that started with a lowercase letter have been renamed to follow our style-guide.
The old name still exists as a deprecated alias.
### New Features
* The data flow and taint tracking libraries have been extended with versions of `isBarrierIn`, `isBarrierOut`, and `isBarrierGuard`, respectively `isSanitizerIn`, `isSanitizerOut`, and `isSanitizerGuard`, that support flow states.
### Minor Analysis Improvements
* All deprecated predicates/classes/modules that have been deprecated for over a year have been deleted.


@@ -0,0 +1 @@
## 0.0.13


@@ -0,0 +1,21 @@
## 0.1.0
### Breaking Changes
* The recently added flow-state versions of `isBarrierIn`, `isBarrierOut`, `isSanitizerIn`, and `isSanitizerOut` in the data flow and taint tracking libraries have been removed.
### Deprecated APIs
* Queries importing a data-flow configuration from `semmle.python.security.dataflow`
should ensure that the imported file ends with `Query`, and only import its top-level
module. For example, a query that used `CommandInjection::Configuration` from
`semmle.python.security.dataflow.CommandInjection` should now use `Configuration`
from `semmle.python.security.dataflow.CommandInjectionQuery` instead.
### Major Analysis Improvements
* Added data-flow for Django ORM models that are saved in a database (no `models.ForeignKey` support).
### Minor Analysis Improvements
* Improved modeling of Flask `Response` objects, so passing a response body with the keyword argument `response` is now recognized.


@@ -1,2 +1,2 @@
---
lastReleaseVersion: 0.0.11
lastReleaseVersion: 0.1.0


@@ -1,5 +1,5 @@
name: codeql/python-all
version: 0.0.12-dev
version: 0.1.1-dev
groups: python
dbscheme: semmlecode.python.dbscheme
extractor: python


@@ -779,7 +779,7 @@ module API {
MkLabelAwait()
/** A label for a module. */
class LabelModule extends ApiLabel {
class LabelModule extends ApiLabel, MkLabelModule {
string mod;
LabelModule() { this = MkLabelModule(mod) }
@@ -791,7 +791,7 @@ module API {
}
/** A label for the member named `prop`. */
class LabelMember extends ApiLabel {
class LabelMember extends ApiLabel, MkLabelMember {
string member;
LabelMember() { this = MkLabelMember(member) }
@@ -803,14 +803,12 @@ module API {
}
/** A label for a member with an unknown name. */
class LabelUnknownMember extends ApiLabel {
LabelUnknownMember() { this = MkLabelUnknownMember() }
class LabelUnknownMember extends ApiLabel, MkLabelUnknownMember {
override string toString() { result = "getUnknownMember()" }
}
/** A label for parameter `i`. */
class LabelParameter extends ApiLabel {
class LabelParameter extends ApiLabel, MkLabelParameter {
int i;
LabelParameter() { this = MkLabelParameter(i) }
@@ -822,7 +820,7 @@ module API {
}
/** A label for a keyword parameter `name`. */
class LabelKeywordParameter extends ApiLabel {
class LabelKeywordParameter extends ApiLabel, MkLabelKeywordParameter {
string name;
LabelKeywordParameter() { this = MkLabelKeywordParameter(name) }
@@ -834,23 +832,17 @@ module API {
}
/** A label that gets the return value of a function. */
class LabelReturn extends ApiLabel {
LabelReturn() { this = MkLabelReturn() }
class LabelReturn extends ApiLabel, MkLabelReturn {
override string toString() { result = "getReturn()" }
}
/** A label that gets the subclass of a class. */
class LabelSubclass extends ApiLabel {
LabelSubclass() { this = MkLabelSubclass() }
class LabelSubclass extends ApiLabel, MkLabelSubclass {
override string toString() { result = "getASubclass()" }
}
/** A label for awaited values. */
class LabelAwait extends ApiLabel {
LabelAwait() { this = MkLabelAwait() }
class LabelAwait extends ApiLabel, MkLabelAwait {
override string toString() { result = "getAwaited()" }
}
}


@@ -189,7 +189,16 @@ class Call extends Call_ {
*/
Keyword getKeyword(int index) {
result = this.getNamedArg(index) and
not exists(DictUnpacking d, int lower | d = this.getNamedArg(lower) and lower < index)
(
not exists(this.getMinimumUnpackingIndex())
or
index <= this.getMinimumUnpackingIndex()
)
}
/** Gets the minimum index (if any) at which a dictionary unpacking (`**foo`) occurs in this call. */
private int getMinimumUnpackingIndex() {
result = min(int i | this.getNamedArg(i) instanceof DictUnpacking)
}
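The distinction the predicate draws — named arguments at or before the first dictionary unpacking versus those after it — can be seen in ordinary Python, where keyword arguments may legally appear on both sides of a `**` unpacking. A minimal sketch (the function name `f` is illustrative):

```python
def f(**kwargs):
    return kwargs

d = {"b": 2}
# 'a' precedes the ** unpacking; 'c' follows it. getKeyword above only
# reports named arguments up to the minimum unpacking index.
result = f(a=1, **d, c=3)
assert result == {"a": 1, "b": 2, "c": 3}
```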
/**


@@ -1,5 +1,4 @@
import python
private import semmle.python.objects.ObjectAPI
private import semmle.python.objects.Modules
private import semmle.python.internal.CachedStages


@@ -0,0 +1,558 @@
/**
* INTERNAL: Do not use.
*
* Points-to based call-graph.
*/
private import python
private import DataFlowPublic
private import semmle.python.SpecialMethods
/** A parameter position represented by an integer. */
class ParameterPosition extends int {
ParameterPosition() { exists(any(DataFlowCallable c).getParameter(this)) }
}
/** An argument position represented by an integer. */
class ArgumentPosition extends int {
ArgumentPosition() { exists(any(DataFlowCall c).getArg(this)) }
}
/** Holds if arguments at position `apos` match parameters at position `ppos`. */
pragma[inline]
predicate parameterMatch(ParameterPosition ppos, ArgumentPosition apos) { ppos = apos }
/**
* Computes the routing of arguments to parameters.
*
* When a call contains more positional arguments than there are positional parameters,
* the extra positional arguments are passed as a tuple to a starred parameter. This is
* achieved by synthesizing a node `TPosOverflowNode(call, callable)`
* that represents the tuple of extra positional arguments. There is a store step from each
* extra positional argument to this node.
*
* CURRENTLY NOT SUPPORTED:
* When a call contains an iterable unpacking argument, such as `func(*args)`, it is expanded into positional arguments.
*
* CURRENTLY NOT SUPPORTED:
* If a call contains an iterable unpacking argument, such as `func(*args)`, and the callee contains a starred argument, any extra
* positional arguments are passed to the starred argument.
*
* When a call contains keyword arguments that do not correspond to keyword parameters, these
* extra keyword arguments are passed as a dictionary to a doubly starred parameter. This is
* achieved by synthesizing a node `TKwOverflowNode(call, callable)`
* that represents the dictionary of extra keyword arguments. There is a store step from each
* extra keyword argument to this node.
*
* When a call contains a dictionary unpacking argument, such as `func(**kwargs)`, with entries corresponding to a keyword parameter,
* the value at such a key is unpacked and passed to the parameter. This is achieved
* by synthesizing an argument node `TKwUnpacked(call, callable, name)` representing the unpacked
* value. This node is used as the argument passed to the matching keyword parameter. There is a read
* step from the dictionary argument to the synthesized argument node.
*
* When a call contains a dictionary unpacking argument, such as `func(**kwargs)`, and the callee contains a doubly starred parameter,
* entries which are not unpacked are passed to the doubly starred parameter. This is achieved by
* adding a dataflow step from the dictionary argument to `TKwOverflowNode(call, callable)` and a
* step to clear content of that node at any unpacked keys.
*
* ## Examples:
* Assume that we have the callable
* ```python
* def f(x, y, *t, **d):
* pass
* ```
* Then the call
* ```python
* f(0, 1, 2, a=3)
* ```
* will be modeled as
* ```python
* f(0, 1, [*t], [**d])
* ```
* where `[` and `]` denotes synthesized nodes, so `[*t]` is the synthesized tuple argument
* `TPosOverflowNode` and `[**d]` is the synthesized dictionary argument `TKwOverflowNode`.
* There will be a store step from `2` to `[*t]` at pos `0` and one from `3` to `[**d]` at key
* `a`.
*
* For the call
* ```python
* f(0, **{"y": 1, "a": 3})
* ```
* no tuple argument is synthesized. It is modeled as
* ```python
* f(0, [y=1], [**d])
* ```
* where `[y=1]` is the synthesized unpacked argument `TKwUnpacked` (with `name` = `y`). There is
* a read step from `**{"y": 1, "a": 3}` to `[y=1]` at key `y` to get the value passed to the parameter
* `y`. There is a dataflow step from `**{"y": 1, "a": 3}` to `[**d]` to transfer the content and
* a clearing of content at key `y` for node `[**d]`, since that value has been unpacked.
*/
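The two worked examples in the comment above can be checked directly against Python's own call semantics; a minimal sketch:

```python
def f(x, y, *t, **d):
    return x, y, t, d

# f(0, 1, 2, a=3): the extra positional argument 2 overflows into *t,
# and the extra keyword argument a=3 overflows into **d.
assert f(0, 1, 2, a=3) == (0, 1, (2,), {"a": 3})

# f(0, **{"y": 1, "a": 3}): the "y" entry is unpacked and passed to the
# keyword parameter y; only the remaining "a" entry lands in **d.
assert f(0, **{"y": 1, "a": 3}) == (0, 1, (), {"a": 3})
```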
module ArgumentPassing {
/**
* Holds if `call` represents a `DataFlowCall` to a `DataFlowCallable` represented by `callable`.
*
* It _may not_ be the case that `call = callable.getACall()`, i.e. if `call` represents a `ClassCall`.
*
* Used to limit the size of predicates.
*/
predicate connects(CallNode call, CallableValue callable) {
exists(DataFlowCall c |
call = c.getNode() and
callable = c.getCallable().getCallableValue()
)
}
/**
* Gets the `n`th parameter of `callable`.
* If the callable has a starred parameter, say `*tuple`, that is matched with `n=-1`.
* If the callable has a doubly starred parameter, say `**dict`, that is matched with `n=-2`.
* Note that, unlike other languages, we do _not_ use -1 for the position of `self` in Python,
* as it is an explicit parameter at position 0.
*/
NameNode getParameter(CallableValue callable, int n) {
// positional parameter
result = callable.getParameter(n)
or
// starred parameter, `*tuple`
exists(Function f |
f = callable.getScope() and
n = -1 and
result = f.getVararg().getAFlowNode()
)
or
// doubly starred parameter, `**dict`
exists(Function f |
f = callable.getScope() and
n = -2 and
result = f.getKwarg().getAFlowNode()
)
}
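The parameter kinds that the negative indices above stand for can be inspected from Python itself; a minimal sketch using the standard library (the function name `f` is illustrative):

```python
import inspect

def f(x, *t, **d):
    pass

params = inspect.signature(f).parameters
# x is an ordinary positional parameter (position 0); the starred
# parameter *t and doubly starred parameter **d are the ones modeled
# at n = -1 and n = -2 in the predicate above.
assert params["t"].kind is inspect.Parameter.VAR_POSITIONAL
assert params["d"].kind is inspect.Parameter.VAR_KEYWORD
```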
/**
* A type representing a mapping from argument indices to parameter indices.
* We currently use two mappings: NoShift, the identity, used for ordinary
* function calls, and ShiftOneUp which is used for calls where an extra argument
* is inserted. These include method calls, constructor calls and class calls.
* In these calls, the argument at index `n` is mapped to the parameter at position `n+1`.
*/
newtype TArgParamMapping =
TNoShift() or
TShiftOneUp()
/** A mapping used for parameter passing. */
abstract class ArgParamMapping extends TArgParamMapping {
/** Gets the index of the parameter that corresponds to the argument at index `argN`. */
bindingset[argN]
abstract int getParamN(int argN);
/** Gets a textual representation of this element. */
abstract string toString();
}
/** A mapping that passes argument `n` to parameter `n`. */
class NoShift extends ArgParamMapping, TNoShift {
NoShift() { this = TNoShift() }
override string toString() { result = "NoShift [n -> n]" }
bindingset[argN]
override int getParamN(int argN) { result = argN }
}
/** A mapping that passes argument `n` to parameter `n+1`. */
class ShiftOneUp extends ArgParamMapping, TShiftOneUp {
ShiftOneUp() { this = TShiftOneUp() }
override string toString() { result = "ShiftOneUp [n -> n+1]" }
bindingset[argN]
override int getParamN(int argN) { result = argN + 1 }
}
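The shift that `ShiftOneUp` models corresponds to how Python binds methods; a minimal sketch (the class and method names are illustrative):

```python
class C:
    def m(self, a):
        return a

c = C()
# In the bound call, the argument at index 0 (42) is routed to the
# parameter at index 1 (a); self occupies parameter index 0.
assert c.m(42) == 42
# The unbound form makes the shift explicit: the receiver is passed
# as the argument for parameter 0.
assert C.m(c, 42) == 42
```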
/**
* Gets the node representing the argument to `call` that is passed to the parameter at
* (zero-based) index `paramN` in `callable`. If this is a positional argument, it must appear
at an index `argN` in `call` which satisfies `paramN = mapping.getParamN(argN)`.
*
* `mapping` will be the identity for function calls, but not for method- or constructor calls,
* where the first parameter is `self` and the first positional argument is passed to the second positional parameter.
* Similarly for classmethod calls, where the first parameter is `cls`.
*
* NOT SUPPORTED: Keyword-only parameters.
*/
Node getArg(CallNode call, ArgParamMapping mapping, CallableValue callable, int paramN) {
connects(call, callable) and
(
// positional argument
exists(int argN |
paramN = mapping.getParamN(argN) and
result = TCfgNode(call.getArg(argN))
)
or
// keyword argument
// TODO: Since `getArgName` has no results for keyword-only parameters,
// these are currently not supported.
exists(Function f, string argName |
f = callable.getScope() and
f.getArgName(paramN) = argName and
result = TCfgNode(call.getArgByName(unbind_string(argName)))
)
or
// a synthesized argument passed to the starred parameter (at position -1)
callable.getScope().hasVarArg() and
paramN = -1 and
result = TPosOverflowNode(call, callable)
or
// a synthesized argument passed to the doubly starred parameter (at position -2)
callable.getScope().hasKwArg() and
paramN = -2 and
result = TKwOverflowNode(call, callable)
or
// argument unpacked from dict
exists(string name |
call_unpacks(call, mapping, callable, name, paramN) and
result = TKwUnpackedNode(call, callable, name)
)
)
}
/** Currently required in `getArg` in order to prevent a bad join. */
bindingset[result, s]
private string unbind_string(string s) { result <= s and s <= result }
/** Gets the control flow node that is passed as the `n`th overflow positional argument. */
ControlFlowNode getPositionalOverflowArg(CallNode call, CallableValue callable, int n) {
connects(call, callable) and
exists(Function f, int posCount, int argNr |
f = callable.getScope() and
f.hasVarArg() and
posCount = f.getPositionalParameterCount() and
result = call.getArg(argNr) and
argNr >= posCount and
argNr = posCount + n
)
}
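The index arithmetic above (`argNr = posCount + n`) mirrors plain Python; a minimal sketch:

```python
def f(x, *t):
    return t

# posCount is 1 (just x), so the arguments at call indices 1 and 2
# become overflow positional arguments 0 and 1 of the *t tuple.
assert f(0, 1, 2) == (1, 2)
```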
/** Gets the control flow node that is passed as the overflow keyword argument with key `key`. */
ControlFlowNode getKeywordOverflowArg(CallNode call, CallableValue callable, string key) {
connects(call, callable) and
exists(Function f |
f = callable.getScope() and
f.hasKwArg() and
not exists(f.getArgByName(key)) and
result = call.getArgByName(key)
)
}
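The `not exists(f.getArgByName(key))` condition above reflects that a keyword matching a named parameter is not overflow; only unmatched keywords reach `**`. A minimal sketch:

```python
def f(x, **d):
    return d

# x=0 matches the named parameter x, so only the remaining keywords
# overflow into the doubly starred parameter **d.
assert f(x=0, a=1, b=2) == {"a": 1, "b": 2}
```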
/**
* Holds if `call` unpacks a dictionary argument in order to pass it via `name`.
* It will then be passed to the parameter of `callable` at index `paramN`.
*/
predicate call_unpacks(
CallNode call, ArgParamMapping mapping, CallableValue callable, string name, int paramN
) {
connects(call, callable) and
exists(Function f |
f = callable.getScope() and
not exists(int argN | paramN = mapping.getParamN(argN) | exists(call.getArg(argN))) and // no positional argument available
name = f.getArgName(paramN) and
// not exists(call.getArgByName(name)) and // only matches keyword arguments not preceded by **
// TODO: make the below logic respect control flow splitting (by not going to the AST).
not call.getNode().getANamedArg().(Keyword).getArg() = name and // no keyword argument available
paramN >= 0 and
paramN < f.getPositionalParameterCount() + f.getKeywordOnlyParameterCount() and
exists(call.getNode().getKwargs()) // dict argument available
)
}
}
import ArgumentPassing
/**
* IPA type for DataFlowCallable.
*
* A callable is either a function value, a class value, or a module (for enclosing `ModuleVariableNode`s).
* A module has no calls.
*/
newtype TDataFlowCallable =
TCallableValue(CallableValue callable) {
callable instanceof FunctionValue and
not callable.(FunctionValue).isLambda()
or
callable instanceof ClassValue
} or
TLambda(Function lambda) { lambda.isLambda() } or
TModule(Module m)
/** A callable. */
abstract class DataFlowCallable extends TDataFlowCallable {
/** Gets a textual representation of this element. */
abstract string toString();
/** Gets a call to this callable. */
abstract CallNode getACall();
/** Gets the scope of this callable. */
abstract Scope getScope();
/** Gets the specified parameter of this callable. */
abstract NameNode getParameter(int n);
/** Gets the name of this callable. */
abstract string getName();
/** Gets a callable value for this callable, if one exists. */
abstract CallableValue getCallableValue();
}
/** A class representing a callable value. */
class DataFlowCallableValue extends DataFlowCallable, TCallableValue {
CallableValue callable;
DataFlowCallableValue() { this = TCallableValue(callable) }
override string toString() { result = callable.toString() }
override CallNode getACall() { result = callable.getACall() }
override Scope getScope() { result = callable.getScope() }
override NameNode getParameter(int n) { result = getParameter(callable, n) }
override string getName() { result = callable.getName() }
override CallableValue getCallableValue() { result = callable }
}
/** A class representing a callable lambda. */
class DataFlowLambda extends DataFlowCallable, TLambda {
Function lambda;
DataFlowLambda() { this = TLambda(lambda) }
override string toString() { result = lambda.toString() }
override CallNode getACall() { result = this.getCallableValue().getACall() }
override Scope getScope() { result = lambda.getEvaluatingScope() }
override NameNode getParameter(int n) { result = getParameter(this.getCallableValue(), n) }
override string getName() { result = "Lambda callable" }
override FunctionValue getCallableValue() {
result.getOrigin().getNode() = lambda.getDefinition()
}
}
/** A class representing the scope in which a `ModuleVariableNode` appears. */
class DataFlowModuleScope extends DataFlowCallable, TModule {
Module mod;
DataFlowModuleScope() { this = TModule(mod) }
override string toString() { result = mod.toString() }
override CallNode getACall() { none() }
override Scope getScope() { result = mod }
override NameNode getParameter(int n) { none() }
override string getName() { result = mod.getName() }
override CallableValue getCallableValue() { none() }
}
/**
* IPA type for DataFlowCall.
*
* Calls corresponding to `CallNode`s are either to callable values or to classes.
* The latter is directed to the callable corresponding to the `__init__` method of the class.
*
* An `__init__` method can also be called directly, so that the callable can be targeted by
* different types of calls. In that case, the parameter mappings will be different,
* as the class call will synthesize an argument node to be mapped to the `self` parameter.
*
* A call corresponding to a special method call is handled by the corresponding `SpecialMethodCallNode`.
*
* TODO: Add `TClassMethodCall` mapping `cls` appropriately.
*/
newtype TDataFlowCall =
TFunctionCall(CallNode call) { call = any(FunctionValue f).getAFunctionCall() } or
/** Bound methods need to make room for the explicit self parameter. */
TMethodCall(CallNode call) { call = any(FunctionValue f).getAMethodCall() } or
TClassCall(CallNode call) { call = any(ClassValue c | not c.isAbsent()).getACall() } or
TSpecialCall(SpecialMethodCallNode special)
/** A call. */
abstract class DataFlowCall extends TDataFlowCall {
/** Gets a textual representation of this element. */
abstract string toString();
/** Gets the callable to which this call goes. */
abstract DataFlowCallable getCallable();
/**
* Gets the argument to this call that will be sent
* to the `n`th parameter of the callable.
*/
abstract Node getArg(int n);
/** Gets the control flow node representing this call. */
abstract ControlFlowNode getNode();
/** Gets the enclosing callable of this call. */
abstract DataFlowCallable getEnclosingCallable();
/** Gets the location of this dataflow call. */
Location getLocation() { result = this.getNode().getLocation() }
}
/**
* A call to a function/lambda.
* This excludes calls to bound methods, classes, and special methods.
* Bound method calls and class calls insert an argument for the explicit
* `self` parameter, and special method calls have special argument passing.
*/
class FunctionCall extends DataFlowCall, TFunctionCall {
CallNode call;
DataFlowCallable callable;
FunctionCall() {
this = TFunctionCall(call) and
call = callable.getACall()
}
override string toString() { result = call.toString() }
override Node getArg(int n) { result = getArg(call, TNoShift(), callable.getCallableValue(), n) }
override ControlFlowNode getNode() { result = call }
override DataFlowCallable getCallable() { result = callable }
override DataFlowCallable getEnclosingCallable() { result.getScope() = call.getNode().getScope() }
}
/**
* Represents a call to a bound method.
* The node representing the instance is inserted as argument to the `self` parameter.
*/
class MethodCall extends DataFlowCall, TMethodCall {
CallNode call;
FunctionValue bm;
MethodCall() {
this = TMethodCall(call) and
call = bm.getACall()
}
private CallableValue getCallableValue() { result = bm }
override string toString() { result = call.toString() }
override Node getArg(int n) {
n > 0 and result = getArg(call, TShiftOneUp(), this.getCallableValue(), n)
or
n = 0 and result = TCfgNode(call.getFunction().(AttrNode).getObject())
}
override ControlFlowNode getNode() { result = call }
override DataFlowCallable getCallable() { result = TCallableValue(this.getCallableValue()) }
override DataFlowCallable getEnclosingCallable() { result.getScope() = call.getScope() }
}
/**
* Represents a call to a class.
* The pre-update node for the call is inserted as argument to the `self` parameter.
* That makes the call node be the post-update node holding the value of the object
* after the constructor has run.
*/
class ClassCall extends DataFlowCall, TClassCall {
CallNode call;
ClassValue c;
ClassCall() {
this = TClassCall(call) and
call = c.getACall()
}
private CallableValue getCallableValue() { c.getScope().getInitMethod() = result.getScope() }
override string toString() { result = call.toString() }
override Node getArg(int n) {
n > 0 and result = getArg(call, TShiftOneUp(), this.getCallableValue(), n)
or
n = 0 and result = TSyntheticPreUpdateNode(TCfgNode(call))
}
override ControlFlowNode getNode() { result = call }
override DataFlowCallable getCallable() { result = TCallableValue(this.getCallableValue()) }
override DataFlowCallable getEnclosingCallable() { result.getScope() = call.getScope() }
}
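The pre-/post-update modeling above matches what happens in a Python constructor call: `self` receives the freshly allocated object, and the call expression itself evaluates to that object after `__init__` has run. A minimal sketch (the class name `C` is illustrative):

```python
class C:
    def __init__(self, v):
        # self is the freshly allocated object (the pre-update value
        # of the call expression).
        self.v = v

# The call expression evaluates to the object after __init__ has run
# (the post-update value).
obj = C(1)
assert obj.v == 1
```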
/** A call to a special method. */
class SpecialCall extends DataFlowCall, TSpecialCall {
SpecialMethodCallNode special;
SpecialCall() { this = TSpecialCall(special) }
override string toString() { result = special.toString() }
override Node getArg(int n) { result = TCfgNode(special.(SpecialMethod::Potential).getArg(n)) }
override ControlFlowNode getNode() { result = special }
override DataFlowCallable getCallable() {
result = TCallableValue(special.getResolvedSpecialMethod())
}
override DataFlowCallable getEnclosingCallable() {
result.getScope() = special.getNode().getScope()
}
}
/** Gets a viable run-time target for the call `call`. */
DataFlowCallable viableCallable(DataFlowCall call) { result = call.getCallable() }
private newtype TReturnKind = TNormalReturnKind()
/**
* A return kind. A return kind describes how a value can be returned
* from a callable. For Python, this is simply a method return.
*/
class ReturnKind extends TReturnKind {
/** Gets a textual representation of this element. */
string toString() { result = "return" }
}
/** A data flow node that represents a value returned by a callable. */
class ReturnNode extends CfgNode {
Return ret;
// See `TaintTrackingImplementation::returnFlowStep`
ReturnNode() { node = ret.getValue().getAFlowNode() }
/** Gets the kind of this return node. */
ReturnKind getKind() { any() }
}
/** A data flow node that represents the output of a call. */
class OutNode extends CfgNode {
OutNode() { node instanceof CallNode }
}
/**
* Gets a node that can read the value returned from `call` with return kind
* `kind`.
*/
OutNode getAnOutNode(DataFlowCall call, ReturnKind kind) {
call.getNode() = result.getNode() and
kind = TNormalReturnKind()
}


@@ -87,21 +87,9 @@ abstract class Configuration extends string {
/** Holds if data flow into `node` is prohibited. */
predicate isBarrierIn(Node node) { none() }
/**
* Holds if data flow into `node` is prohibited when the flow state is
* `state`
*/
predicate isBarrierIn(Node node, FlowState state) { none() }
/** Holds if data flow out of `node` is prohibited. */
predicate isBarrierOut(Node node) { none() }
/**
* Holds if data flow out of `node` is prohibited when the flow state is
* `state`
*/
predicate isBarrierOut(Node node, FlowState state) { none() }
/** Holds if data flow through nodes guarded by `guard` is prohibited. */
predicate isBarrierGuard(BarrierGuard guard) { none() }
@@ -128,7 +116,7 @@ abstract class Configuration extends string {
* Holds if an arbitrary number of implicit read steps of content `c` may be
* taken at `node`.
*/
predicate allowImplicitRead(Node node, Content c) { none() }
predicate allowImplicitRead(Node node, ContentSet c) { none() }
/**
* Gets the virtual dispatch branching limit when calculating field flow.
@@ -321,7 +309,7 @@ private class RetNodeEx extends NodeEx {
ReturnKindExt getKind() { result = this.asNode().(ReturnNodeExt).getKind() }
}
private predicate fullInBarrier(NodeEx node, Configuration config) {
private predicate inBarrier(NodeEx node, Configuration config) {
exists(Node n |
node.asNode() = n and
config.isBarrierIn(n)
@@ -330,16 +318,7 @@ private predicate fullInBarrier(NodeEx node, Configuration config) {
)
}
private predicate stateInBarrier(NodeEx node, FlowState state, Configuration config) {
exists(Node n |
node.asNode() = n and
config.isBarrierIn(n, state)
|
config.isSource(n, state)
)
}
private predicate fullOutBarrier(NodeEx node, Configuration config) {
private predicate outBarrier(NodeEx node, Configuration config) {
exists(Node n |
node.asNode() = n and
config.isBarrierOut(n)
@@ -348,15 +327,6 @@ private predicate fullOutBarrier(NodeEx node, Configuration config) {
)
}
private predicate stateOutBarrier(NodeEx node, FlowState state, Configuration config) {
exists(Node n |
node.asNode() = n and
config.isBarrierOut(n, state)
|
config.isSink(n, state)
)
}
pragma[nomagic]
private predicate fullBarrier(NodeEx node, Configuration config) {
exists(Node n | node.asNode() = n |
@@ -382,12 +352,6 @@ private predicate stateBarrier(NodeEx node, FlowState state, Configuration confi
exists(Node n | node.asNode() = n |
config.isBarrier(n, state)
or
config.isBarrierIn(n, state) and
not config.isSource(n, state)
or
config.isBarrierOut(n, state) and
not config.isSink(n, state)
or
exists(BarrierGuard g |
config.isBarrierGuard(g, state) and
n = g.getAGuardedNode()
@@ -420,8 +384,8 @@ private predicate sinkNode(NodeEx node, FlowState state, Configuration config) {
/** Provides the relevant barriers for a step from `node1` to `node2`. */
pragma[inline]
private predicate stepFilter(NodeEx node1, NodeEx node2, Configuration config) {
not fullOutBarrier(node1, config) and
not fullInBarrier(node2, config) and
not outBarrier(node1, config) and
not inBarrier(node2, config) and
not fullBarrier(node1, config) and
not fullBarrier(node2, config)
}
@@ -474,8 +438,6 @@ private predicate additionalLocalStateStep(
config.isAdditionalFlowStep(n1, s1, n2, s2) and
getNodeEnclosingCallable(n1) = getNodeEnclosingCallable(n2) and
stepFilter(node1, node2, config) and
not stateOutBarrier(node1, s1, config) and
not stateInBarrier(node2, s2, config) and
not stateBarrier(node1, s1, config) and
not stateBarrier(node2, s2, config)
)
@@ -517,16 +479,15 @@ private predicate additionalJumpStateStep(
config.isAdditionalFlowStep(n1, s1, n2, s2) and
getNodeEnclosingCallable(n1) != getNodeEnclosingCallable(n2) and
stepFilter(node1, node2, config) and
not stateOutBarrier(node1, s1, config) and
not stateInBarrier(node2, s2, config) and
not stateBarrier(node1, s1, config) and
not stateBarrier(node2, s2, config) and
not config.getAFeature() instanceof FeatureEqualSourceSinkCallContext
)
}
private predicate read(NodeEx node1, Content c, NodeEx node2, Configuration config) {
read(node1.asNode(), c, node2.asNode()) and
pragma[nomagic]
private predicate readSet(NodeEx node1, ContentSet c, NodeEx node2, Configuration config) {
readSet(node1.asNode(), c, node2.asNode()) and
stepFilter(node1, node2, config)
or
exists(Node n |
@@ -536,6 +497,25 @@ private predicate read(NodeEx node1, Content c, NodeEx node2, Configuration conf
)
}
// inline to reduce fan-out via `getAReadContent`
pragma[inline]
private predicate read(NodeEx node1, Content c, NodeEx node2, Configuration config) {
exists(ContentSet cs |
readSet(node1, cs, node2, config) and
c = cs.getAReadContent()
)
}
// inline to reduce fan-out via `getAReadContent`
pragma[inline]
private predicate clearsContentEx(NodeEx n, Content c) {
exists(ContentSet cs |
clearsContentCached(n.asNode(), cs) and
c = cs.getAReadContent()
)
}
pragma[nomagic]
private predicate store(
NodeEx node1, TypedContent tc, NodeEx node2, DataFlowType contentType, Configuration config
) {
@@ -613,9 +593,9 @@ private module Stage1 {
)
or
// read
exists(Content c |
fwdFlowRead(c, node, cc, config) and
fwdFlowConsCand(c, config)
exists(ContentSet c |
fwdFlowReadSet(c, node, cc, config) and
fwdFlowConsCandSet(c, _, config)
)
or
// flow into a callable
@@ -639,10 +619,10 @@ private module Stage1 {
private predicate fwdFlow(NodeEx node, Configuration config) { fwdFlow(node, _, config) }
pragma[nomagic]
private predicate fwdFlowRead(Content c, NodeEx node, Cc cc, Configuration config) {
private predicate fwdFlowReadSet(ContentSet c, NodeEx node, Cc cc, Configuration config) {
exists(NodeEx mid |
fwdFlow(mid, cc, config) and
read(mid, c, node, config)
readSet(mid, c, node, config)
)
}
@@ -660,6 +640,16 @@ private module Stage1 {
)
}
/**
* Holds if `cs` may be interpreted in a read as the target of some store
* into `c`, in the flow covered by `fwdFlow`.
*/
pragma[nomagic]
private predicate fwdFlowConsCandSet(ContentSet cs, Content c, Configuration config) {
fwdFlowConsCand(c, config) and
c = cs.getAReadContent()
}
pragma[nomagic]
private predicate fwdFlowReturnPosition(ReturnPosition pos, Cc cc, Configuration config) {
exists(RetNodeEx ret |
@@ -752,9 +742,9 @@ private module Stage1 {
)
or
// read
exists(NodeEx mid, Content c |
read(node, c, mid, config) and
fwdFlowConsCand(c, pragma[only_bind_into](config)) and
exists(NodeEx mid, ContentSet c |
readSet(node, c, mid, config) and
fwdFlowConsCandSet(c, _, pragma[only_bind_into](config)) and
revFlow(mid, toReturn, pragma[only_bind_into](config))
)
or
@@ -780,10 +770,10 @@ private module Stage1 {
*/
pragma[nomagic]
private predicate revFlowConsCand(Content c, Configuration config) {
exists(NodeEx mid, NodeEx node |
exists(NodeEx mid, NodeEx node, ContentSet cs |
fwdFlow(node, pragma[only_bind_into](config)) and
read(node, c, mid, config) and
fwdFlowConsCand(c, pragma[only_bind_into](config)) and
readSet(node, cs, mid, config) and
fwdFlowConsCandSet(cs, c, pragma[only_bind_into](config)) and
revFlow(pragma[only_bind_into](mid), _, pragma[only_bind_into](config))
)
}
@@ -802,6 +792,7 @@ private module Stage1 {
* Holds if `c` is the target of both a read and a store in the flow covered
* by `revFlow`.
*/
pragma[nomagic]
private predicate revFlowIsReadAndStored(Content c, Configuration conf) {
revFlowConsCand(c, conf) and
revFlowStore(c, _, _, conf)
@@ -900,9 +891,9 @@ private module Stage1 {
pragma[nomagic]
predicate readStepCand(NodeEx n1, Content c, NodeEx n2, Configuration config) {
revFlowIsReadAndStored(c, pragma[only_bind_into](config)) and
revFlow(n2, pragma[only_bind_into](config)) and
read(n1, c, n2, pragma[only_bind_into](config))
revFlowIsReadAndStored(pragma[only_bind_into](c), pragma[only_bind_into](config)) and
read(n1, c, n2, pragma[only_bind_into](config)) and
revFlow(n2, pragma[only_bind_into](config))
}
pragma[nomagic]
@@ -912,14 +903,17 @@ private module Stage1 {
predicate revFlow(
NodeEx node, FlowState state, boolean toReturn, ApOption returnAp, Ap ap, Configuration config
) {
revFlow(node, toReturn, config) and exists(state) and exists(returnAp) and exists(ap)
revFlow(node, toReturn, pragma[only_bind_into](config)) and
exists(state) and
exists(returnAp) and
exists(ap)
}
private predicate throughFlowNodeCand(NodeEx node, Configuration config) {
revFlow(node, true, config) and
fwdFlow(node, true, config) and
not fullInBarrier(node, config) and
not fullOutBarrier(node, config)
not inBarrier(node, config) and
not outBarrier(node, config)
}
/** Holds if flow may return from `callable`. */
@@ -1014,8 +1008,8 @@ private predicate flowOutOfCallNodeCand1(
) {
viableReturnPosOutNodeCand1(call, ret.getReturnPosition(), out, config) and
Stage1::revFlow(ret, config) and
not fullOutBarrier(ret, config) and
not fullInBarrier(out, config)
not outBarrier(ret, config) and
not inBarrier(out, config)
}
pragma[nomagic]
@@ -1036,8 +1030,8 @@ private predicate flowIntoCallNodeCand1(
) {
viableParamArgNodeCand1(call, p, arg, config) and
Stage1::revFlow(p, config) and
not fullOutBarrier(arg, config) and
not fullInBarrier(p, config)
not outBarrier(arg, config) and
not inBarrier(p, config)
}
/**
@@ -1189,7 +1183,7 @@ private module Stage2 {
bindingset[node, state, ap, config]
private predicate filter(NodeEx node, FlowState state, Ap ap, Configuration config) {
PrevStage::revFlowState(state, config) and
PrevStage::revFlowState(state, pragma[only_bind_into](config)) and
exists(ap) and
not stateBarrier(node, state, config)
}
@@ -1614,7 +1608,7 @@ private module Stage2 {
Configuration config
) {
exists(Ap ap2, Content c |
store(node1, tc, node2, contentType, config) and
PrevStage::storeStepCand(node1, _, tc, node2, contentType, config) and
revFlowStore(ap2, c, ap1, node1, _, tc, node2, _, _, config) and
revFlowConsCand(ap2, c, ap1, config)
)
@@ -1769,9 +1763,9 @@ private module LocalFlowBigStep {
or
node.asNode() instanceof OutNodeExt
or
store(_, _, node, _, config)
Stage2::storeStepCand(_, _, _, node, _, config)
or
read(_, _, node, config)
Stage2::readStepCand(_, _, node, config)
or
node instanceof FlowCheckNode
or
@@ -1792,8 +1786,8 @@ private module LocalFlowBigStep {
additionalJumpStep(node, next, config) or
flowIntoCallNodeCand1(_, node, next, config) or
flowOutOfCallNodeCand1(_, node, next, config) or
store(node, _, next, _, config) or
read(node, _, next, config)
Stage2::storeStepCand(node, _, _, next, _, config) or
Stage2::readStepCand(node, _, next, config)
)
or
exists(NodeEx next, FlowState s | Stage2::revFlow(next, s, config) |
@@ -1966,7 +1960,24 @@ private module Stage3 {
private predicate flowIntoCall = flowIntoCallNodeCand2/5;
pragma[nomagic]
private predicate clear(NodeEx node, Ap ap) { ap.isClearedAt(node.asNode()) }
private predicate clearSet(NodeEx node, ContentSet c, Configuration config) {
PrevStage::revFlow(node, config) and
clearsContentCached(node.asNode(), c)
}
pragma[nomagic]
private predicate clearContent(NodeEx node, Content c, Configuration config) {
exists(ContentSet cs |
PrevStage::readStepCand(_, pragma[only_bind_into](c), _, pragma[only_bind_into](config)) and
c = cs.getAReadContent() and
clearSet(node, cs, pragma[only_bind_into](config))
)
}
pragma[nomagic]
private predicate clear(NodeEx node, Ap ap, Configuration config) {
clearContent(node, ap.getHead().getContent(), config)
}
pragma[nomagic]
private predicate castingNodeEx(NodeEx node) { node.asNode() instanceof CastingNode }
@@ -1975,7 +1986,7 @@ private module Stage3 {
private predicate filter(NodeEx node, FlowState state, Ap ap, Configuration config) {
exists(state) and
exists(config) and
not clear(node, ap) and
not clear(node, ap, config) and
if castingNodeEx(node) then compatibleTypes(node.getDataFlowType(), ap.getType()) else any()
}
@@ -2403,7 +2414,7 @@ private module Stage3 {
Configuration config
) {
exists(Ap ap2, Content c |
store(node1, tc, node2, contentType, config) and
PrevStage::storeStepCand(node1, _, tc, node2, contentType, config) and
revFlowStore(ap2, c, ap1, node1, _, tc, node2, _, _, config) and
revFlowConsCand(ap2, c, ap1, config)
)
@@ -3230,7 +3241,7 @@ private module Stage4 {
Configuration config
) {
exists(Ap ap2, Content c |
store(node1, tc, node2, contentType, config) and
PrevStage::storeStepCand(node1, _, tc, node2, contentType, config) and
revFlowStore(ap2, c, ap1, node1, _, tc, node2, _, _, config) and
revFlowConsCand(ap2, c, ap1, config)
)
@@ -4242,7 +4253,7 @@ private module Subpaths {
exists(NodeEx n1, NodeEx n2 | n1 = n.getNodeEx() and n2 = result.getNodeEx() |
localFlowBigStep(n1, _, n2, _, _, _, _, _) or
store(n1, _, n2, _, _) or
read(n1, _, n2, _)
readSet(n1, _, n2, _)
)
}
@@ -4597,7 +4608,7 @@ private module FlowExploration {
or
exists(PartialPathNodeRev mid |
revPartialPathStep(mid, node, state, sc1, sc2, sc3, ap, config) and
not clearsContentCached(node.asNode(), ap.getHead()) and
not clearsContentEx(node, ap.getHead()) and
not fullBarrier(node, config) and
not stateBarrier(node, state, config) and
distSink(node.getEnclosingCallable(), config) <= config.explorationLimit()
@@ -4613,7 +4624,7 @@ private module FlowExploration {
partialPathStep(mid, node, state, cc, sc1, sc2, sc3, ap, config) and
not fullBarrier(node, config) and
not stateBarrier(node, state, config) and
not clearsContentCached(node.asNode(), ap.getHead().getContent()) and
not clearsContentEx(node, ap.getHead().getContent()) and
if node.asNode() instanceof CastingNode
then compatibleTypes(node.getDataFlowType(), ap.getType())
else any()
@@ -5047,6 +5058,7 @@ private module FlowExploration {
)
}
pragma[nomagic]
private predicate revPartialPathStep(
PartialPathNodeRev mid, NodeEx node, FlowState state, TRevSummaryCtx1 sc1, TRevSummaryCtx2 sc2,
TRevSummaryCtx3 sc3, RevPartialAccessPath ap, Configuration config


@@ -87,21 +87,9 @@ abstract class Configuration extends string {
/** Holds if data flow into `node` is prohibited. */
predicate isBarrierIn(Node node) { none() }
/**
* Holds if data flow into `node` is prohibited when the flow state is
* `state`
*/
predicate isBarrierIn(Node node, FlowState state) { none() }
/** Holds if data flow out of `node` is prohibited. */
predicate isBarrierOut(Node node) { none() }
/**
* Holds if data flow out of `node` is prohibited when the flow state is
* `state`
*/
predicate isBarrierOut(Node node, FlowState state) { none() }
/** Holds if data flow through nodes guarded by `guard` is prohibited. */
predicate isBarrierGuard(BarrierGuard guard) { none() }
@@ -128,7 +116,7 @@ abstract class Configuration extends string {
* Holds if an arbitrary number of implicit read steps of content `c` may be
* taken at `node`.
*/
predicate allowImplicitRead(Node node, Content c) { none() }
predicate allowImplicitRead(Node node, ContentSet c) { none() }
/**
* Gets the virtual dispatch branching limit when calculating field flow.
@@ -321,7 +309,7 @@ private class RetNodeEx extends NodeEx {
ReturnKindExt getKind() { result = this.asNode().(ReturnNodeExt).getKind() }
}
private predicate fullInBarrier(NodeEx node, Configuration config) {
private predicate inBarrier(NodeEx node, Configuration config) {
exists(Node n |
node.asNode() = n and
config.isBarrierIn(n)
@@ -330,16 +318,7 @@ private predicate fullInBarrier(NodeEx node, Configuration config) {
)
}
private predicate stateInBarrier(NodeEx node, FlowState state, Configuration config) {
exists(Node n |
node.asNode() = n and
config.isBarrierIn(n, state)
|
config.isSource(n, state)
)
}
private predicate fullOutBarrier(NodeEx node, Configuration config) {
private predicate outBarrier(NodeEx node, Configuration config) {
exists(Node n |
node.asNode() = n and
config.isBarrierOut(n)
@@ -348,15 +327,6 @@ private predicate fullOutBarrier(NodeEx node, Configuration config) {
)
}
private predicate stateOutBarrier(NodeEx node, FlowState state, Configuration config) {
exists(Node n |
node.asNode() = n and
config.isBarrierOut(n, state)
|
config.isSink(n, state)
)
}
pragma[nomagic]
private predicate fullBarrier(NodeEx node, Configuration config) {
exists(Node n | node.asNode() = n |
@@ -382,12 +352,6 @@ private predicate stateBarrier(NodeEx node, FlowState state, Configuration confi
exists(Node n | node.asNode() = n |
config.isBarrier(n, state)
or
config.isBarrierIn(n, state) and
not config.isSource(n, state)
or
config.isBarrierOut(n, state) and
not config.isSink(n, state)
or
exists(BarrierGuard g |
config.isBarrierGuard(g, state) and
n = g.getAGuardedNode()
@@ -420,8 +384,8 @@ private predicate sinkNode(NodeEx node, FlowState state, Configuration config) {
/** Provides the relevant barriers for a step from `node1` to `node2`. */
pragma[inline]
private predicate stepFilter(NodeEx node1, NodeEx node2, Configuration config) {
not fullOutBarrier(node1, config) and
not fullInBarrier(node2, config) and
not outBarrier(node1, config) and
not inBarrier(node2, config) and
not fullBarrier(node1, config) and
not fullBarrier(node2, config)
}
@@ -474,8 +438,6 @@ private predicate additionalLocalStateStep(
config.isAdditionalFlowStep(n1, s1, n2, s2) and
getNodeEnclosingCallable(n1) = getNodeEnclosingCallable(n2) and
stepFilter(node1, node2, config) and
not stateOutBarrier(node1, s1, config) and
not stateInBarrier(node2, s2, config) and
not stateBarrier(node1, s1, config) and
not stateBarrier(node2, s2, config)
)
@@ -517,16 +479,15 @@ private predicate additionalJumpStateStep(
config.isAdditionalFlowStep(n1, s1, n2, s2) and
getNodeEnclosingCallable(n1) != getNodeEnclosingCallable(n2) and
stepFilter(node1, node2, config) and
not stateOutBarrier(node1, s1, config) and
not stateInBarrier(node2, s2, config) and
not stateBarrier(node1, s1, config) and
not stateBarrier(node2, s2, config) and
not config.getAFeature() instanceof FeatureEqualSourceSinkCallContext
)
}
private predicate read(NodeEx node1, Content c, NodeEx node2, Configuration config) {
read(node1.asNode(), c, node2.asNode()) and
pragma[nomagic]
private predicate readSet(NodeEx node1, ContentSet c, NodeEx node2, Configuration config) {
readSet(node1.asNode(), c, node2.asNode()) and
stepFilter(node1, node2, config)
or
exists(Node n |
@@ -536,6 +497,25 @@ private predicate read(NodeEx node1, Content c, NodeEx node2, Configuration conf
)
}
// inline to reduce fan-out via `getAReadContent`
pragma[inline]
private predicate read(NodeEx node1, Content c, NodeEx node2, Configuration config) {
exists(ContentSet cs |
readSet(node1, cs, node2, config) and
c = cs.getAReadContent()
)
}
// inline to reduce fan-out via `getAReadContent`
pragma[inline]
private predicate clearsContentEx(NodeEx n, Content c) {
exists(ContentSet cs |
clearsContentCached(n.asNode(), cs) and
c = cs.getAReadContent()
)
}
pragma[nomagic]
private predicate store(
NodeEx node1, TypedContent tc, NodeEx node2, DataFlowType contentType, Configuration config
) {
@@ -613,9 +593,9 @@ private module Stage1 {
)
or
// read
exists(Content c |
fwdFlowRead(c, node, cc, config) and
fwdFlowConsCand(c, config)
exists(ContentSet c |
fwdFlowReadSet(c, node, cc, config) and
fwdFlowConsCandSet(c, _, config)
)
or
// flow into a callable
@@ -639,10 +619,10 @@ private module Stage1 {
private predicate fwdFlow(NodeEx node, Configuration config) { fwdFlow(node, _, config) }
pragma[nomagic]
private predicate fwdFlowRead(Content c, NodeEx node, Cc cc, Configuration config) {
private predicate fwdFlowReadSet(ContentSet c, NodeEx node, Cc cc, Configuration config) {
exists(NodeEx mid |
fwdFlow(mid, cc, config) and
read(mid, c, node, config)
readSet(mid, c, node, config)
)
}
@@ -660,6 +640,16 @@ private module Stage1 {
)
}
/**
* Holds if `cs` may be interpreted in a read as the target of some store
* into `c`, in the flow covered by `fwdFlow`.
*/
pragma[nomagic]
private predicate fwdFlowConsCandSet(ContentSet cs, Content c, Configuration config) {
fwdFlowConsCand(c, config) and
c = cs.getAReadContent()
}
pragma[nomagic]
private predicate fwdFlowReturnPosition(ReturnPosition pos, Cc cc, Configuration config) {
exists(RetNodeEx ret |
@@ -752,9 +742,9 @@ private module Stage1 {
)
or
// read
exists(NodeEx mid, Content c |
read(node, c, mid, config) and
fwdFlowConsCand(c, pragma[only_bind_into](config)) and
exists(NodeEx mid, ContentSet c |
readSet(node, c, mid, config) and
fwdFlowConsCandSet(c, _, pragma[only_bind_into](config)) and
revFlow(mid, toReturn, pragma[only_bind_into](config))
)
or
@@ -780,10 +770,10 @@ private module Stage1 {
*/
pragma[nomagic]
private predicate revFlowConsCand(Content c, Configuration config) {
exists(NodeEx mid, NodeEx node |
exists(NodeEx mid, NodeEx node, ContentSet cs |
fwdFlow(node, pragma[only_bind_into](config)) and
read(node, c, mid, config) and
fwdFlowConsCand(c, pragma[only_bind_into](config)) and
readSet(node, cs, mid, config) and
fwdFlowConsCandSet(cs, c, pragma[only_bind_into](config)) and
revFlow(pragma[only_bind_into](mid), _, pragma[only_bind_into](config))
)
}
@@ -802,6 +792,7 @@ private module Stage1 {
* Holds if `c` is the target of both a read and a store in the flow covered
* by `revFlow`.
*/
pragma[nomagic]
private predicate revFlowIsReadAndStored(Content c, Configuration conf) {
revFlowConsCand(c, conf) and
revFlowStore(c, _, _, conf)
@@ -900,9 +891,9 @@ private module Stage1 {
pragma[nomagic]
predicate readStepCand(NodeEx n1, Content c, NodeEx n2, Configuration config) {
revFlowIsReadAndStored(c, pragma[only_bind_into](config)) and
revFlow(n2, pragma[only_bind_into](config)) and
read(n1, c, n2, pragma[only_bind_into](config))
revFlowIsReadAndStored(pragma[only_bind_into](c), pragma[only_bind_into](config)) and
read(n1, c, n2, pragma[only_bind_into](config)) and
revFlow(n2, pragma[only_bind_into](config))
}
pragma[nomagic]
@@ -912,14 +903,17 @@ private module Stage1 {
predicate revFlow(
NodeEx node, FlowState state, boolean toReturn, ApOption returnAp, Ap ap, Configuration config
) {
revFlow(node, toReturn, config) and exists(state) and exists(returnAp) and exists(ap)
revFlow(node, toReturn, pragma[only_bind_into](config)) and
exists(state) and
exists(returnAp) and
exists(ap)
}
private predicate throughFlowNodeCand(NodeEx node, Configuration config) {
revFlow(node, true, config) and
fwdFlow(node, true, config) and
not fullInBarrier(node, config) and
not fullOutBarrier(node, config)
not inBarrier(node, config) and
not outBarrier(node, config)
}
/** Holds if flow may return from `callable`. */
@@ -1014,8 +1008,8 @@ private predicate flowOutOfCallNodeCand1(
) {
viableReturnPosOutNodeCand1(call, ret.getReturnPosition(), out, config) and
Stage1::revFlow(ret, config) and
not fullOutBarrier(ret, config) and
not fullInBarrier(out, config)
not outBarrier(ret, config) and
not inBarrier(out, config)
}
pragma[nomagic]
@@ -1036,8 +1030,8 @@ private predicate flowIntoCallNodeCand1(
) {
viableParamArgNodeCand1(call, p, arg, config) and
Stage1::revFlow(p, config) and
not fullOutBarrier(arg, config) and
not fullInBarrier(p, config)
not outBarrier(arg, config) and
not inBarrier(p, config)
}
/**
@@ -1189,7 +1183,7 @@ private module Stage2 {
bindingset[node, state, ap, config]
private predicate filter(NodeEx node, FlowState state, Ap ap, Configuration config) {
PrevStage::revFlowState(state, config) and
PrevStage::revFlowState(state, pragma[only_bind_into](config)) and
exists(ap) and
not stateBarrier(node, state, config)
}
@@ -1614,7 +1608,7 @@ private module Stage2 {
Configuration config
) {
exists(Ap ap2, Content c |
store(node1, tc, node2, contentType, config) and
PrevStage::storeStepCand(node1, _, tc, node2, contentType, config) and
revFlowStore(ap2, c, ap1, node1, _, tc, node2, _, _, config) and
revFlowConsCand(ap2, c, ap1, config)
)
@@ -1769,9 +1763,9 @@ private module LocalFlowBigStep {
or
node.asNode() instanceof OutNodeExt
or
store(_, _, node, _, config)
Stage2::storeStepCand(_, _, _, node, _, config)
or
read(_, _, node, config)
Stage2::readStepCand(_, _, node, config)
or
node instanceof FlowCheckNode
or
@@ -1792,8 +1786,8 @@ private module LocalFlowBigStep {
additionalJumpStep(node, next, config) or
flowIntoCallNodeCand1(_, node, next, config) or
flowOutOfCallNodeCand1(_, node, next, config) or
store(node, _, next, _, config) or
read(node, _, next, config)
Stage2::storeStepCand(node, _, _, next, _, config) or
Stage2::readStepCand(node, _, next, config)
)
or
exists(NodeEx next, FlowState s | Stage2::revFlow(next, s, config) |
@@ -1966,7 +1960,24 @@ private module Stage3 {
private predicate flowIntoCall = flowIntoCallNodeCand2/5;
pragma[nomagic]
private predicate clear(NodeEx node, Ap ap) { ap.isClearedAt(node.asNode()) }
private predicate clearSet(NodeEx node, ContentSet c, Configuration config) {
PrevStage::revFlow(node, config) and
clearsContentCached(node.asNode(), c)
}
pragma[nomagic]
private predicate clearContent(NodeEx node, Content c, Configuration config) {
exists(ContentSet cs |
PrevStage::readStepCand(_, pragma[only_bind_into](c), _, pragma[only_bind_into](config)) and
c = cs.getAReadContent() and
clearSet(node, cs, pragma[only_bind_into](config))
)
}
pragma[nomagic]
private predicate clear(NodeEx node, Ap ap, Configuration config) {
clearContent(node, ap.getHead().getContent(), config)
}
pragma[nomagic]
private predicate castingNodeEx(NodeEx node) { node.asNode() instanceof CastingNode }
@@ -1975,7 +1986,7 @@ private module Stage3 {
private predicate filter(NodeEx node, FlowState state, Ap ap, Configuration config) {
exists(state) and
exists(config) and
not clear(node, ap) and
not clear(node, ap, config) and
if castingNodeEx(node) then compatibleTypes(node.getDataFlowType(), ap.getType()) else any()
}
@@ -2403,7 +2414,7 @@ private module Stage3 {
Configuration config
) {
exists(Ap ap2, Content c |
store(node1, tc, node2, contentType, config) and
PrevStage::storeStepCand(node1, _, tc, node2, contentType, config) and
revFlowStore(ap2, c, ap1, node1, _, tc, node2, _, _, config) and
revFlowConsCand(ap2, c, ap1, config)
)
@@ -3230,7 +3241,7 @@ private module Stage4 {
Configuration config
) {
exists(Ap ap2, Content c |
store(node1, tc, node2, contentType, config) and
PrevStage::storeStepCand(node1, _, tc, node2, contentType, config) and
revFlowStore(ap2, c, ap1, node1, _, tc, node2, _, _, config) and
revFlowConsCand(ap2, c, ap1, config)
)
@@ -4242,7 +4253,7 @@ private module Subpaths {
exists(NodeEx n1, NodeEx n2 | n1 = n.getNodeEx() and n2 = result.getNodeEx() |
localFlowBigStep(n1, _, n2, _, _, _, _, _) or
store(n1, _, n2, _, _) or
read(n1, _, n2, _)
readSet(n1, _, n2, _)
)
}
@@ -4597,7 +4608,7 @@ private module FlowExploration {
or
exists(PartialPathNodeRev mid |
revPartialPathStep(mid, node, state, sc1, sc2, sc3, ap, config) and
not clearsContentCached(node.asNode(), ap.getHead()) and
not clearsContentEx(node, ap.getHead()) and
not fullBarrier(node, config) and
not stateBarrier(node, state, config) and
distSink(node.getEnclosingCallable(), config) <= config.explorationLimit()
@@ -4613,7 +4624,7 @@ private module FlowExploration {
partialPathStep(mid, node, state, cc, sc1, sc2, sc3, ap, config) and
not fullBarrier(node, config) and
not stateBarrier(node, state, config) and
not clearsContentCached(node.asNode(), ap.getHead().getContent()) and
not clearsContentEx(node, ap.getHead().getContent()) and
if node.asNode() instanceof CastingNode
then compatibleTypes(node.getDataFlowType(), ap.getType())
else any()
@@ -5047,6 +5058,7 @@ private module FlowExploration {
)
}
pragma[nomagic]
private predicate revPartialPathStep(
PartialPathNodeRev mid, NodeEx node, FlowState state, TRevSummaryCtx1 sc1, TRevSummaryCtx2 sc2,
TRevSummaryCtx3 sc3, RevPartialAccessPath ap, Configuration config


@@ -87,21 +87,9 @@ abstract class Configuration extends string {
/** Holds if data flow into `node` is prohibited. */
predicate isBarrierIn(Node node) { none() }
/**
* Holds if data flow into `node` is prohibited when the flow state is
* `state`
*/
predicate isBarrierIn(Node node, FlowState state) { none() }
/** Holds if data flow out of `node` is prohibited. */
predicate isBarrierOut(Node node) { none() }
/**
* Holds if data flow out of `node` is prohibited when the flow state is
* `state`
*/
predicate isBarrierOut(Node node, FlowState state) { none() }
/** Holds if data flow through nodes guarded by `guard` is prohibited. */
predicate isBarrierGuard(BarrierGuard guard) { none() }
@@ -128,7 +116,7 @@ abstract class Configuration extends string {
* Holds if an arbitrary number of implicit read steps of content `c` may be
* taken at `node`.
*/
predicate allowImplicitRead(Node node, Content c) { none() }
predicate allowImplicitRead(Node node, ContentSet c) { none() }
/**
* Gets the virtual dispatch branching limit when calculating field flow.
@@ -321,7 +309,7 @@ private class RetNodeEx extends NodeEx {
ReturnKindExt getKind() { result = this.asNode().(ReturnNodeExt).getKind() }
}
private predicate fullInBarrier(NodeEx node, Configuration config) {
private predicate inBarrier(NodeEx node, Configuration config) {
exists(Node n |
node.asNode() = n and
config.isBarrierIn(n)
@@ -330,16 +318,7 @@ private predicate fullInBarrier(NodeEx node, Configuration config) {
)
}
private predicate stateInBarrier(NodeEx node, FlowState state, Configuration config) {
exists(Node n |
node.asNode() = n and
config.isBarrierIn(n, state)
|
config.isSource(n, state)
)
}
private predicate fullOutBarrier(NodeEx node, Configuration config) {
private predicate outBarrier(NodeEx node, Configuration config) {
exists(Node n |
node.asNode() = n and
config.isBarrierOut(n)
@@ -348,15 +327,6 @@ private predicate fullOutBarrier(NodeEx node, Configuration config) {
)
}
private predicate stateOutBarrier(NodeEx node, FlowState state, Configuration config) {
exists(Node n |
node.asNode() = n and
config.isBarrierOut(n, state)
|
config.isSink(n, state)
)
}
pragma[nomagic]
private predicate fullBarrier(NodeEx node, Configuration config) {
exists(Node n | node.asNode() = n |
@@ -382,12 +352,6 @@ private predicate stateBarrier(NodeEx node, FlowState state, Configuration confi
exists(Node n | node.asNode() = n |
config.isBarrier(n, state)
or
config.isBarrierIn(n, state) and
not config.isSource(n, state)
or
config.isBarrierOut(n, state) and
not config.isSink(n, state)
or
exists(BarrierGuard g |
config.isBarrierGuard(g, state) and
n = g.getAGuardedNode()
@@ -420,8 +384,8 @@ private predicate sinkNode(NodeEx node, FlowState state, Configuration config) {
/** Provides the relevant barriers for a step from `node1` to `node2`. */
pragma[inline]
private predicate stepFilter(NodeEx node1, NodeEx node2, Configuration config) {
not fullOutBarrier(node1, config) and
not fullInBarrier(node2, config) and
not outBarrier(node1, config) and
not inBarrier(node2, config) and
not fullBarrier(node1, config) and
not fullBarrier(node2, config)
}
@@ -474,8 +438,6 @@ private predicate additionalLocalStateStep(
config.isAdditionalFlowStep(n1, s1, n2, s2) and
getNodeEnclosingCallable(n1) = getNodeEnclosingCallable(n2) and
stepFilter(node1, node2, config) and
not stateOutBarrier(node1, s1, config) and
not stateInBarrier(node2, s2, config) and
not stateBarrier(node1, s1, config) and
not stateBarrier(node2, s2, config)
)
@@ -517,16 +479,15 @@ private predicate additionalJumpStateStep(
config.isAdditionalFlowStep(n1, s1, n2, s2) and
getNodeEnclosingCallable(n1) != getNodeEnclosingCallable(n2) and
stepFilter(node1, node2, config) and
not stateOutBarrier(node1, s1, config) and
not stateInBarrier(node2, s2, config) and
not stateBarrier(node1, s1, config) and
not stateBarrier(node2, s2, config) and
not config.getAFeature() instanceof FeatureEqualSourceSinkCallContext
)
}
private predicate read(NodeEx node1, Content c, NodeEx node2, Configuration config) {
read(node1.asNode(), c, node2.asNode()) and
pragma[nomagic]
private predicate readSet(NodeEx node1, ContentSet c, NodeEx node2, Configuration config) {
readSet(node1.asNode(), c, node2.asNode()) and
stepFilter(node1, node2, config)
or
exists(Node n |
@@ -536,6 +497,25 @@ private predicate read(NodeEx node1, Content c, NodeEx node2, Configuration conf
)
}
// inline to reduce fan-out via `getAReadContent`
pragma[inline]
private predicate read(NodeEx node1, Content c, NodeEx node2, Configuration config) {
exists(ContentSet cs |
readSet(node1, cs, node2, config) and
c = cs.getAReadContent()
)
}
// inline to reduce fan-out via `getAReadContent`
pragma[inline]
private predicate clearsContentEx(NodeEx n, Content c) {
exists(ContentSet cs |
clearsContentCached(n.asNode(), cs) and
c = cs.getAReadContent()
)
}
pragma[nomagic]
private predicate store(
NodeEx node1, TypedContent tc, NodeEx node2, DataFlowType contentType, Configuration config
) {
@@ -613,9 +593,9 @@ private module Stage1 {
)
or
// read
exists(Content c |
fwdFlowRead(c, node, cc, config) and
fwdFlowConsCand(c, config)
exists(ContentSet c |
fwdFlowReadSet(c, node, cc, config) and
fwdFlowConsCandSet(c, _, config)
)
or
// flow into a callable
@@ -639,10 +619,10 @@ private module Stage1 {
private predicate fwdFlow(NodeEx node, Configuration config) { fwdFlow(node, _, config) }
pragma[nomagic]
private predicate fwdFlowRead(Content c, NodeEx node, Cc cc, Configuration config) {
private predicate fwdFlowReadSet(ContentSet c, NodeEx node, Cc cc, Configuration config) {
exists(NodeEx mid |
fwdFlow(mid, cc, config) and
read(mid, c, node, config)
readSet(mid, c, node, config)
)
}
@@ -660,6 +640,16 @@ private module Stage1 {
)
}
/**
* Holds if `cs` may be interpreted in a read as the target of some store
* into `c`, in the flow covered by `fwdFlow`.
*/
pragma[nomagic]
private predicate fwdFlowConsCandSet(ContentSet cs, Content c, Configuration config) {
fwdFlowConsCand(c, config) and
c = cs.getAReadContent()
}
pragma[nomagic]
private predicate fwdFlowReturnPosition(ReturnPosition pos, Cc cc, Configuration config) {
exists(RetNodeEx ret |
@@ -752,9 +742,9 @@ private module Stage1 {
)
or
// read
exists(NodeEx mid, Content c |
read(node, c, mid, config) and
fwdFlowConsCand(c, pragma[only_bind_into](config)) and
exists(NodeEx mid, ContentSet c |
readSet(node, c, mid, config) and
fwdFlowConsCandSet(c, _, pragma[only_bind_into](config)) and
revFlow(mid, toReturn, pragma[only_bind_into](config))
)
or
@@ -780,10 +770,10 @@ private module Stage1 {
*/
pragma[nomagic]
private predicate revFlowConsCand(Content c, Configuration config) {
exists(NodeEx mid, NodeEx node |
exists(NodeEx mid, NodeEx node, ContentSet cs |
fwdFlow(node, pragma[only_bind_into](config)) and
read(node, c, mid, config) and
fwdFlowConsCand(c, pragma[only_bind_into](config)) and
readSet(node, cs, mid, config) and
fwdFlowConsCandSet(cs, c, pragma[only_bind_into](config)) and
revFlow(pragma[only_bind_into](mid), _, pragma[only_bind_into](config))
)
}
@@ -802,6 +792,7 @@ private module Stage1 {
* Holds if `c` is the target of both a read and a store in the flow covered
* by `revFlow`.
*/
pragma[nomagic]
private predicate revFlowIsReadAndStored(Content c, Configuration conf) {
revFlowConsCand(c, conf) and
revFlowStore(c, _, _, conf)
@@ -900,9 +891,9 @@ private module Stage1 {
pragma[nomagic]
predicate readStepCand(NodeEx n1, Content c, NodeEx n2, Configuration config) {
revFlowIsReadAndStored(c, pragma[only_bind_into](config)) and
revFlow(n2, pragma[only_bind_into](config)) and
read(n1, c, n2, pragma[only_bind_into](config))
revFlowIsReadAndStored(pragma[only_bind_into](c), pragma[only_bind_into](config)) and
read(n1, c, n2, pragma[only_bind_into](config)) and
revFlow(n2, pragma[only_bind_into](config))
}
pragma[nomagic]
@@ -912,14 +903,17 @@ private module Stage1 {
predicate revFlow(
NodeEx node, FlowState state, boolean toReturn, ApOption returnAp, Ap ap, Configuration config
) {
revFlow(node, toReturn, config) and exists(state) and exists(returnAp) and exists(ap)
revFlow(node, toReturn, pragma[only_bind_into](config)) and
exists(state) and
exists(returnAp) and
exists(ap)
}
private predicate throughFlowNodeCand(NodeEx node, Configuration config) {
revFlow(node, true, config) and
fwdFlow(node, true, config) and
not fullInBarrier(node, config) and
not fullOutBarrier(node, config)
not inBarrier(node, config) and
not outBarrier(node, config)
}
/** Holds if flow may return from `callable`. */
@@ -1014,8 +1008,8 @@ private predicate flowOutOfCallNodeCand1(
) {
viableReturnPosOutNodeCand1(call, ret.getReturnPosition(), out, config) and
Stage1::revFlow(ret, config) and
not fullOutBarrier(ret, config) and
not fullInBarrier(out, config)
not outBarrier(ret, config) and
not inBarrier(out, config)
}
pragma[nomagic]
@@ -1036,8 +1030,8 @@ private predicate flowIntoCallNodeCand1(
) {
viableParamArgNodeCand1(call, p, arg, config) and
Stage1::revFlow(p, config) and
not fullOutBarrier(arg, config) and
not fullInBarrier(p, config)
not outBarrier(arg, config) and
not inBarrier(p, config)
}
/**
@@ -1189,7 +1183,7 @@ private module Stage2 {
bindingset[node, state, ap, config]
private predicate filter(NodeEx node, FlowState state, Ap ap, Configuration config) {
PrevStage::revFlowState(state, config) and
PrevStage::revFlowState(state, pragma[only_bind_into](config)) and
exists(ap) and
not stateBarrier(node, state, config)
}
@@ -1614,7 +1608,7 @@ private module Stage2 {
Configuration config
) {
exists(Ap ap2, Content c |
store(node1, tc, node2, contentType, config) and
PrevStage::storeStepCand(node1, _, tc, node2, contentType, config) and
revFlowStore(ap2, c, ap1, node1, _, tc, node2, _, _, config) and
revFlowConsCand(ap2, c, ap1, config)
)
@@ -1769,9 +1763,9 @@ private module LocalFlowBigStep {
or
node.asNode() instanceof OutNodeExt
or
store(_, _, node, _, config)
Stage2::storeStepCand(_, _, _, node, _, config)
or
read(_, _, node, config)
Stage2::readStepCand(_, _, node, config)
or
node instanceof FlowCheckNode
or
@@ -1792,8 +1786,8 @@ private module LocalFlowBigStep {
additionalJumpStep(node, next, config) or
flowIntoCallNodeCand1(_, node, next, config) or
flowOutOfCallNodeCand1(_, node, next, config) or
store(node, _, next, _, config) or
read(node, _, next, config)
Stage2::storeStepCand(node, _, _, next, _, config) or
Stage2::readStepCand(node, _, next, config)
)
or
exists(NodeEx next, FlowState s | Stage2::revFlow(next, s, config) |
@@ -1966,7 +1960,24 @@ private module Stage3 {
private predicate flowIntoCall = flowIntoCallNodeCand2/5;
pragma[nomagic]
private predicate clear(NodeEx node, Ap ap) { ap.isClearedAt(node.asNode()) }
private predicate clearSet(NodeEx node, ContentSet c, Configuration config) {
PrevStage::revFlow(node, config) and
clearsContentCached(node.asNode(), c)
}
pragma[nomagic]
private predicate clearContent(NodeEx node, Content c, Configuration config) {
exists(ContentSet cs |
PrevStage::readStepCand(_, pragma[only_bind_into](c), _, pragma[only_bind_into](config)) and
c = cs.getAReadContent() and
clearSet(node, cs, pragma[only_bind_into](config))
)
}
pragma[nomagic]
private predicate clear(NodeEx node, Ap ap, Configuration config) {
clearContent(node, ap.getHead().getContent(), config)
}
pragma[nomagic]
private predicate castingNodeEx(NodeEx node) { node.asNode() instanceof CastingNode }
@@ -1975,7 +1986,7 @@ private module Stage3 {
private predicate filter(NodeEx node, FlowState state, Ap ap, Configuration config) {
exists(state) and
exists(config) and
not clear(node, ap) and
not clear(node, ap, config) and
if castingNodeEx(node) then compatibleTypes(node.getDataFlowType(), ap.getType()) else any()
}
@@ -2403,7 +2414,7 @@ private module Stage3 {
Configuration config
) {
exists(Ap ap2, Content c |
store(node1, tc, node2, contentType, config) and
PrevStage::storeStepCand(node1, _, tc, node2, contentType, config) and
revFlowStore(ap2, c, ap1, node1, _, tc, node2, _, _, config) and
revFlowConsCand(ap2, c, ap1, config)
)
@@ -3230,7 +3241,7 @@ private module Stage4 {
Configuration config
) {
exists(Ap ap2, Content c |
store(node1, tc, node2, contentType, config) and
PrevStage::storeStepCand(node1, _, tc, node2, contentType, config) and
revFlowStore(ap2, c, ap1, node1, _, tc, node2, _, _, config) and
revFlowConsCand(ap2, c, ap1, config)
)
@@ -4242,7 +4253,7 @@ private module Subpaths {
exists(NodeEx n1, NodeEx n2 | n1 = n.getNodeEx() and n2 = result.getNodeEx() |
localFlowBigStep(n1, _, n2, _, _, _, _, _) or
store(n1, _, n2, _, _) or
read(n1, _, n2, _)
readSet(n1, _, n2, _)
)
}
@@ -4597,7 +4608,7 @@ private module FlowExploration {
or
exists(PartialPathNodeRev mid |
revPartialPathStep(mid, node, state, sc1, sc2, sc3, ap, config) and
not clearsContentCached(node.asNode(), ap.getHead()) and
not clearsContentEx(node, ap.getHead()) and
not fullBarrier(node, config) and
not stateBarrier(node, state, config) and
distSink(node.getEnclosingCallable(), config) <= config.explorationLimit()
@@ -4613,7 +4624,7 @@ private module FlowExploration {
partialPathStep(mid, node, state, cc, sc1, sc2, sc3, ap, config) and
not fullBarrier(node, config) and
not stateBarrier(node, state, config) and
not clearsContentCached(node.asNode(), ap.getHead().getContent()) and
not clearsContentEx(node, ap.getHead().getContent()) and
if node.asNode() instanceof CastingNode
then compatibleTypes(node.getDataFlowType(), ap.getType())
else any()
@@ -5047,6 +5058,7 @@ private module FlowExploration {
)
}
pragma[nomagic]
private predicate revPartialPathStep(
PartialPathNodeRev mid, NodeEx node, FlowState state, TRevSummaryCtx1 sc1, TRevSummaryCtx2 sc2,
TRevSummaryCtx3 sc3, RevPartialAccessPath ap, Configuration config


@@ -87,21 +87,9 @@ abstract class Configuration extends string {
/** Holds if data flow into `node` is prohibited. */
predicate isBarrierIn(Node node) { none() }
/**
* Holds if data flow into `node` is prohibited when the flow state is
* `state`
*/
predicate isBarrierIn(Node node, FlowState state) { none() }
/** Holds if data flow out of `node` is prohibited. */
predicate isBarrierOut(Node node) { none() }
/**
* Holds if data flow out of `node` is prohibited when the flow state is
* `state`
*/
predicate isBarrierOut(Node node, FlowState state) { none() }
/** Holds if data flow through nodes guarded by `guard` is prohibited. */
predicate isBarrierGuard(BarrierGuard guard) { none() }
@@ -128,7 +116,7 @@ abstract class Configuration extends string {
* Holds if an arbitrary number of implicit read steps of content `c` may be
* taken at `node`.
*/
predicate allowImplicitRead(Node node, Content c) { none() }
predicate allowImplicitRead(Node node, ContentSet c) { none() }
/**
* Gets the virtual dispatch branching limit when calculating field flow.
@@ -321,7 +309,7 @@ private class RetNodeEx extends NodeEx {
ReturnKindExt getKind() { result = this.asNode().(ReturnNodeExt).getKind() }
}
private predicate fullInBarrier(NodeEx node, Configuration config) {
private predicate inBarrier(NodeEx node, Configuration config) {
exists(Node n |
node.asNode() = n and
config.isBarrierIn(n)
@@ -330,16 +318,7 @@ private predicate fullInBarrier(NodeEx node, Configuration config) {
)
}
private predicate stateInBarrier(NodeEx node, FlowState state, Configuration config) {
exists(Node n |
node.asNode() = n and
config.isBarrierIn(n, state)
|
config.isSource(n, state)
)
}
private predicate fullOutBarrier(NodeEx node, Configuration config) {
private predicate outBarrier(NodeEx node, Configuration config) {
exists(Node n |
node.asNode() = n and
config.isBarrierOut(n)
@@ -348,15 +327,6 @@ private predicate fullOutBarrier(NodeEx node, Configuration config) {
)
}
private predicate stateOutBarrier(NodeEx node, FlowState state, Configuration config) {
exists(Node n |
node.asNode() = n and
config.isBarrierOut(n, state)
|
config.isSink(n, state)
)
}
pragma[nomagic]
private predicate fullBarrier(NodeEx node, Configuration config) {
exists(Node n | node.asNode() = n |
@@ -382,12 +352,6 @@ private predicate stateBarrier(NodeEx node, FlowState state, Configuration confi
exists(Node n | node.asNode() = n |
config.isBarrier(n, state)
or
config.isBarrierIn(n, state) and
not config.isSource(n, state)
or
config.isBarrierOut(n, state) and
not config.isSink(n, state)
or
exists(BarrierGuard g |
config.isBarrierGuard(g, state) and
n = g.getAGuardedNode()
@@ -420,8 +384,8 @@ private predicate sinkNode(NodeEx node, FlowState state, Configuration config) {
/** Provides the relevant barriers for a step from `node1` to `node2`. */
pragma[inline]
private predicate stepFilter(NodeEx node1, NodeEx node2, Configuration config) {
not fullOutBarrier(node1, config) and
not fullInBarrier(node2, config) and
not outBarrier(node1, config) and
not inBarrier(node2, config) and
not fullBarrier(node1, config) and
not fullBarrier(node2, config)
}
@@ -474,8 +438,6 @@ private predicate additionalLocalStateStep(
config.isAdditionalFlowStep(n1, s1, n2, s2) and
getNodeEnclosingCallable(n1) = getNodeEnclosingCallable(n2) and
stepFilter(node1, node2, config) and
not stateOutBarrier(node1, s1, config) and
not stateInBarrier(node2, s2, config) and
not stateBarrier(node1, s1, config) and
not stateBarrier(node2, s2, config)
)
@@ -517,16 +479,15 @@ private predicate additionalJumpStateStep(
config.isAdditionalFlowStep(n1, s1, n2, s2) and
getNodeEnclosingCallable(n1) != getNodeEnclosingCallable(n2) and
stepFilter(node1, node2, config) and
not stateOutBarrier(node1, s1, config) and
not stateInBarrier(node2, s2, config) and
not stateBarrier(node1, s1, config) and
not stateBarrier(node2, s2, config) and
not config.getAFeature() instanceof FeatureEqualSourceSinkCallContext
)
}
private predicate read(NodeEx node1, Content c, NodeEx node2, Configuration config) {
read(node1.asNode(), c, node2.asNode()) and
pragma[nomagic]
private predicate readSet(NodeEx node1, ContentSet c, NodeEx node2, Configuration config) {
readSet(node1.asNode(), c, node2.asNode()) and
stepFilter(node1, node2, config)
or
exists(Node n |
@@ -536,6 +497,25 @@ private predicate read(NodeEx node1, Content c, NodeEx node2, Configuration conf
)
}
// inline to reduce fan-out via `getAReadContent`
pragma[inline]
private predicate read(NodeEx node1, Content c, NodeEx node2, Configuration config) {
exists(ContentSet cs |
readSet(node1, cs, node2, config) and
c = cs.getAReadContent()
)
}
// inline to reduce fan-out via `getAReadContent`
pragma[inline]
private predicate clearsContentEx(NodeEx n, Content c) {
exists(ContentSet cs |
clearsContentCached(n.asNode(), cs) and
c = cs.getAReadContent()
)
}
pragma[nomagic]
private predicate store(
NodeEx node1, TypedContent tc, NodeEx node2, DataFlowType contentType, Configuration config
) {
@@ -613,9 +593,9 @@ private module Stage1 {
)
or
// read
exists(Content c |
fwdFlowRead(c, node, cc, config) and
fwdFlowConsCand(c, config)
exists(ContentSet c |
fwdFlowReadSet(c, node, cc, config) and
fwdFlowConsCandSet(c, _, config)
)
or
// flow into a callable
@@ -639,10 +619,10 @@ private module Stage1 {
private predicate fwdFlow(NodeEx node, Configuration config) { fwdFlow(node, _, config) }
pragma[nomagic]
private predicate fwdFlowRead(Content c, NodeEx node, Cc cc, Configuration config) {
private predicate fwdFlowReadSet(ContentSet c, NodeEx node, Cc cc, Configuration config) {
exists(NodeEx mid |
fwdFlow(mid, cc, config) and
read(mid, c, node, config)
readSet(mid, c, node, config)
)
}
@@ -660,6 +640,16 @@ private module Stage1 {
)
}
/**
* Holds if `cs` may be interpreted in a read as the target of some store
* into `c`, in the flow covered by `fwdFlow`.
*/
pragma[nomagic]
private predicate fwdFlowConsCandSet(ContentSet cs, Content c, Configuration config) {
fwdFlowConsCand(c, config) and
c = cs.getAReadContent()
}
pragma[nomagic]
private predicate fwdFlowReturnPosition(ReturnPosition pos, Cc cc, Configuration config) {
exists(RetNodeEx ret |
@@ -752,9 +742,9 @@ private module Stage1 {
)
or
// read
exists(NodeEx mid, Content c |
read(node, c, mid, config) and
fwdFlowConsCand(c, pragma[only_bind_into](config)) and
exists(NodeEx mid, ContentSet c |
readSet(node, c, mid, config) and
fwdFlowConsCandSet(c, _, pragma[only_bind_into](config)) and
revFlow(mid, toReturn, pragma[only_bind_into](config))
)
or
@@ -780,10 +770,10 @@ private module Stage1 {
*/
pragma[nomagic]
private predicate revFlowConsCand(Content c, Configuration config) {
exists(NodeEx mid, NodeEx node |
exists(NodeEx mid, NodeEx node, ContentSet cs |
fwdFlow(node, pragma[only_bind_into](config)) and
read(node, c, mid, config) and
fwdFlowConsCand(c, pragma[only_bind_into](config)) and
readSet(node, cs, mid, config) and
fwdFlowConsCandSet(cs, c, pragma[only_bind_into](config)) and
revFlow(pragma[only_bind_into](mid), _, pragma[only_bind_into](config))
)
}
@@ -802,6 +792,7 @@ private module Stage1 {
* Holds if `c` is the target of both a read and a store in the flow covered
* by `revFlow`.
*/
pragma[nomagic]
private predicate revFlowIsReadAndStored(Content c, Configuration conf) {
revFlowConsCand(c, conf) and
revFlowStore(c, _, _, conf)
@@ -900,9 +891,9 @@ private module Stage1 {
pragma[nomagic]
predicate readStepCand(NodeEx n1, Content c, NodeEx n2, Configuration config) {
revFlowIsReadAndStored(c, pragma[only_bind_into](config)) and
revFlow(n2, pragma[only_bind_into](config)) and
read(n1, c, n2, pragma[only_bind_into](config))
revFlowIsReadAndStored(pragma[only_bind_into](c), pragma[only_bind_into](config)) and
read(n1, c, n2, pragma[only_bind_into](config)) and
revFlow(n2, pragma[only_bind_into](config))
}
pragma[nomagic]
@@ -912,14 +903,17 @@ private module Stage1 {
predicate revFlow(
NodeEx node, FlowState state, boolean toReturn, ApOption returnAp, Ap ap, Configuration config
) {
revFlow(node, toReturn, config) and exists(state) and exists(returnAp) and exists(ap)
revFlow(node, toReturn, pragma[only_bind_into](config)) and
exists(state) and
exists(returnAp) and
exists(ap)
}
private predicate throughFlowNodeCand(NodeEx node, Configuration config) {
revFlow(node, true, config) and
fwdFlow(node, true, config) and
not fullInBarrier(node, config) and
not fullOutBarrier(node, config)
not inBarrier(node, config) and
not outBarrier(node, config)
}
/** Holds if flow may return from `callable`. */
@@ -1014,8 +1008,8 @@ private predicate flowOutOfCallNodeCand1(
) {
viableReturnPosOutNodeCand1(call, ret.getReturnPosition(), out, config) and
Stage1::revFlow(ret, config) and
not fullOutBarrier(ret, config) and
not fullInBarrier(out, config)
not outBarrier(ret, config) and
not inBarrier(out, config)
}
pragma[nomagic]
@@ -1036,8 +1030,8 @@ private predicate flowIntoCallNodeCand1(
) {
viableParamArgNodeCand1(call, p, arg, config) and
Stage1::revFlow(p, config) and
not fullOutBarrier(arg, config) and
not fullInBarrier(p, config)
not outBarrier(arg, config) and
not inBarrier(p, config)
}
/**
@@ -1189,7 +1183,7 @@ private module Stage2 {
bindingset[node, state, ap, config]
private predicate filter(NodeEx node, FlowState state, Ap ap, Configuration config) {
PrevStage::revFlowState(state, config) and
PrevStage::revFlowState(state, pragma[only_bind_into](config)) and
exists(ap) and
not stateBarrier(node, state, config)
}
@@ -1614,7 +1608,7 @@ private module Stage2 {
Configuration config
) {
exists(Ap ap2, Content c |
store(node1, tc, node2, contentType, config) and
PrevStage::storeStepCand(node1, _, tc, node2, contentType, config) and
revFlowStore(ap2, c, ap1, node1, _, tc, node2, _, _, config) and
revFlowConsCand(ap2, c, ap1, config)
)
@@ -1769,9 +1763,9 @@ private module LocalFlowBigStep {
or
node.asNode() instanceof OutNodeExt
or
store(_, _, node, _, config)
Stage2::storeStepCand(_, _, _, node, _, config)
or
read(_, _, node, config)
Stage2::readStepCand(_, _, node, config)
or
node instanceof FlowCheckNode
or
@@ -1792,8 +1786,8 @@ private module LocalFlowBigStep {
additionalJumpStep(node, next, config) or
flowIntoCallNodeCand1(_, node, next, config) or
flowOutOfCallNodeCand1(_, node, next, config) or
store(node, _, next, _, config) or
read(node, _, next, config)
Stage2::storeStepCand(node, _, _, next, _, config) or
Stage2::readStepCand(node, _, next, config)
)
or
exists(NodeEx next, FlowState s | Stage2::revFlow(next, s, config) |
@@ -1966,7 +1960,24 @@ private module Stage3 {
private predicate flowIntoCall = flowIntoCallNodeCand2/5;
pragma[nomagic]
private predicate clear(NodeEx node, Ap ap) { ap.isClearedAt(node.asNode()) }
private predicate clearSet(NodeEx node, ContentSet c, Configuration config) {
PrevStage::revFlow(node, config) and
clearsContentCached(node.asNode(), c)
}
pragma[nomagic]
private predicate clearContent(NodeEx node, Content c, Configuration config) {
exists(ContentSet cs |
PrevStage::readStepCand(_, pragma[only_bind_into](c), _, pragma[only_bind_into](config)) and
c = cs.getAReadContent() and
clearSet(node, cs, pragma[only_bind_into](config))
)
}
pragma[nomagic]
private predicate clear(NodeEx node, Ap ap, Configuration config) {
clearContent(node, ap.getHead().getContent(), config)
}
pragma[nomagic]
private predicate castingNodeEx(NodeEx node) { node.asNode() instanceof CastingNode }
@@ -1975,7 +1986,7 @@ private module Stage3 {
private predicate filter(NodeEx node, FlowState state, Ap ap, Configuration config) {
exists(state) and
exists(config) and
not clear(node, ap) and
not clear(node, ap, config) and
if castingNodeEx(node) then compatibleTypes(node.getDataFlowType(), ap.getType()) else any()
}
@@ -2403,7 +2414,7 @@ private module Stage3 {
Configuration config
) {
exists(Ap ap2, Content c |
store(node1, tc, node2, contentType, config) and
PrevStage::storeStepCand(node1, _, tc, node2, contentType, config) and
revFlowStore(ap2, c, ap1, node1, _, tc, node2, _, _, config) and
revFlowConsCand(ap2, c, ap1, config)
)
@@ -3230,7 +3241,7 @@ private module Stage4 {
Configuration config
) {
exists(Ap ap2, Content c |
store(node1, tc, node2, contentType, config) and
PrevStage::storeStepCand(node1, _, tc, node2, contentType, config) and
revFlowStore(ap2, c, ap1, node1, _, tc, node2, _, _, config) and
revFlowConsCand(ap2, c, ap1, config)
)
@@ -4242,7 +4253,7 @@ private module Subpaths {
exists(NodeEx n1, NodeEx n2 | n1 = n.getNodeEx() and n2 = result.getNodeEx() |
localFlowBigStep(n1, _, n2, _, _, _, _, _) or
store(n1, _, n2, _, _) or
read(n1, _, n2, _)
readSet(n1, _, n2, _)
)
}
@@ -4597,7 +4608,7 @@ private module FlowExploration {
or
exists(PartialPathNodeRev mid |
revPartialPathStep(mid, node, state, sc1, sc2, sc3, ap, config) and
not clearsContentCached(node.asNode(), ap.getHead()) and
not clearsContentEx(node, ap.getHead()) and
not fullBarrier(node, config) and
not stateBarrier(node, state, config) and
distSink(node.getEnclosingCallable(), config) <= config.explorationLimit()
@@ -4613,7 +4624,7 @@ private module FlowExploration {
partialPathStep(mid, node, state, cc, sc1, sc2, sc3, ap, config) and
not fullBarrier(node, config) and
not stateBarrier(node, state, config) and
not clearsContentCached(node.asNode(), ap.getHead().getContent()) and
not clearsContentEx(node, ap.getHead().getContent()) and
if node.asNode() instanceof CastingNode
then compatibleTypes(node.getDataFlowType(), ap.getType())
else any()
@@ -5047,6 +5058,7 @@ private module FlowExploration {
)
}
pragma[nomagic]
private predicate revPartialPathStep(
PartialPathNodeRev mid, NodeEx node, FlowState state, TRevSummaryCtx1 sc1, TRevSummaryCtx2 sc2,
TRevSummaryCtx3 sc3, RevPartialAccessPath ap, Configuration config


@@ -326,7 +326,7 @@ private module Cached {
predicate jumpStepCached(Node node1, Node node2) { jumpStep(node1, node2) }
cached
predicate clearsContentCached(Node n, Content c) { clearsContent(n, c) }
predicate clearsContentCached(Node n, ContentSet c) { clearsContent(n, c) }
cached
predicate isUnreachableInCallCached(Node n, DataFlowCall call) { isUnreachableInCall(n, call) }
@@ -373,7 +373,7 @@ private module Cached {
// For reads, `x.f`, we want to check that the tracked type after the read (which
// is obtained by popping the head of the access path stack) is compatible with
// the type of `x.f`.
read(_, _, n)
readSet(_, _, n)
}
cached
@@ -469,7 +469,7 @@ private module Cached {
// read
exists(Node mid |
parameterValueFlowCand(p, mid, false) and
read(mid, _, node) and
readSet(mid, _, node) and
read = true
)
or
@@ -657,8 +657,10 @@ private module Cached {
* Holds if `arg` flows to `out` through a call using only
* value-preserving steps and a single read step, not taking call
* contexts into account, thus representing a getter-step.
*
* This predicate is exposed for testing only.
*/
predicate getterStep(ArgNode arg, Content c, Node out) {
predicate getterStep(ArgNode arg, ContentSet c, Node out) {
argumentValueFlowsThrough(arg, TReadStepTypesSome(_, c, _), out)
}
@@ -781,28 +783,30 @@ private module Cached {
parameterValueFlow(p, n.getPreUpdateNode(), TReadStepTypesNone())
}
cached
predicate readSet(Node node1, ContentSet c, Node node2) { readStep(node1, c, node2) }
private predicate store(
Node node1, Content c, Node node2, DataFlowType contentType, DataFlowType containerType
) {
storeStep(node1, c, node2) and
contentType = getNodeDataFlowType(node1) and
containerType = getNodeDataFlowType(node2)
or
exists(Node n1, Node n2 |
n1 = node1.(PostUpdateNode).getPreUpdateNode() and
n2 = node2.(PostUpdateNode).getPreUpdateNode()
|
argumentValueFlowsThrough(n2, TReadStepTypesSome(containerType, c, contentType), n1)
exists(ContentSet cs | c = cs.getAStoreContent() |
storeStep(node1, cs, node2) and
contentType = getNodeDataFlowType(node1) and
containerType = getNodeDataFlowType(node2)
or
read(n2, c, n1) and
contentType = getNodeDataFlowType(n1) and
containerType = getNodeDataFlowType(n2)
exists(Node n1, Node n2 |
n1 = node1.(PostUpdateNode).getPreUpdateNode() and
n2 = node2.(PostUpdateNode).getPreUpdateNode()
|
argumentValueFlowsThrough(n2, TReadStepTypesSome(containerType, cs, contentType), n1)
or
readSet(n2, cs, n1) and
contentType = getNodeDataFlowType(n1) and
containerType = getNodeDataFlowType(n2)
)
)
}
cached
predicate read(Node node1, Content c, Node node2) { readStep(node1, c, node2) }
/**
* Holds if data can flow from `node1` to `node2` via a direct assignment to
* `f`.
@@ -932,16 +936,16 @@ class CastingNode extends Node {
}
private predicate readStepWithTypes(
Node n1, DataFlowType container, Content c, Node n2, DataFlowType content
Node n1, DataFlowType container, ContentSet c, Node n2, DataFlowType content
) {
read(n1, c, n2) and
readSet(n1, c, n2) and
container = getNodeDataFlowType(n1) and
content = getNodeDataFlowType(n2)
}
private newtype TReadStepTypesOption =
TReadStepTypesNone() or
TReadStepTypesSome(DataFlowType container, Content c, DataFlowType content) {
TReadStepTypesSome(DataFlowType container, ContentSet c, DataFlowType content) {
readStepWithTypes(_, container, c, _, content)
}
@@ -950,7 +954,7 @@ private class ReadStepTypesOption extends TReadStepTypesOption {
DataFlowType getContainerType() { this = TReadStepTypesSome(result, _, _) }
Content getContent() { this = TReadStepTypesSome(_, result, _) }
ContentSet getContent() { this = TReadStepTypesSome(_, result, _) }
DataFlowType getContentType() { this = TReadStepTypesSome(_, _, result) }
@@ -1325,8 +1329,6 @@ abstract class AccessPathFront extends TAccessPathFront {
abstract boolean toBoolNonEmpty();
TypedContent getHead() { this = TFrontHead(result) }
predicate isClearedAt(Node n) { clearsContentCached(n, this.getHead().getContent()) }
}
class AccessPathFrontNil extends AccessPathFront, TFrontNil {


@@ -401,8 +401,15 @@ class ModuleVariableNode extends Node, TModuleVariableNode {
private predicate isAccessedThroughImportStar(Module m) { m = ImportStar::getStarImported(_) }
private ModuleVariableNode import_star_read(Node n) {
ImportStar::importStarResolvesTo(n.asCfgNode(), result.getModule()) and
n.asCfgNode().(NameNode).getId() = result.getVariable().getId()
resolved_import_star_module(result.getModule(), result.getVariable().getId(), n)
}
pragma[nomagic]
private predicate resolved_import_star_module(Module m, string name, Node n) {
exists(NameNode nn | nn = n.asCfgNode() |
ImportStar::importStarResolvesTo(pragma[only_bind_into](nn), m) and
nn.getId() = name
)
}
/**
@@ -643,3 +650,20 @@ class AttributeContent extends TAttributeContent, Content {
override string toString() { result = "Attribute " + attr }
}
/**
* An entity that represents a set of `Content`s.
*
* The set may be interpreted differently depending on whether it is
* stored into (`getAStoreContent`) or read from (`getAReadContent`).
*/
class ContentSet instanceof Content {
/** Gets a content that may be stored into when storing into this set. */
Content getAStoreContent() { result = this }
/** Gets a content that may be read from when reading from this set. */
Content getAReadContent() { result = this }
/** Gets a textual representation of this content set. */
string toString() { result = super.toString() }
}


@@ -0,0 +1,396 @@
/**
* The unpacking assignment takes the general form
* ```python
* sequence = iterable
* ```
* where `sequence` is either a tuple or a list and it can contain wildcards.
* The iterable can be any iterable, which means that (CodeQL modeling of) content
* will need to change type if it should be transferred from the RHS to the LHS.
*
* Note that (CodeQL modeling of) content does not have to change type on data-flow
* paths _inside_ the LHS, as the different allowed syntaxes here are merely a convenience.
* Consequently, we model all LHS sequences as tuples, which have the more precise content
* model, making flow to the elements more precise. If an element is a starred variable,
* we will have to mutate the content type to be list content.
*
* We may for instance have
* ```python
* (a, b) = ["a", SOURCE] # RHS has content `ListElementContent`
* ```
* Due to the abstraction for list content, we do not know whether `SOURCE`
* ends up in `a` or in `b`, so we want to overapproximate and see it in both.
*
* Using wildcards we may have
* ```python
* (a, *b) = ("a", "b", SOURCE) # RHS has content `TupleElementContent(2)`
* ```
* Since the starred variables are always assigned (Python-)type list, `*b` will be
* `["b", SOURCE]`, and we will again overapproximate and assign it
* content corresponding to anything found in the RHS.
*
* For a precise transfer
* ```python
* (a, b) = ("a", SOURCE) # RHS has content `TupleElementContent(1)`
* ```
* we wish to keep the precision, so only `b` receives the tuple content at index 1.
*
* Finally, `sequence` is actually a pattern and can have a more complicated structure,
* such as
* ```python
* (a, [b, *c]) = ("a", ["b", SOURCE]) # RHS has content `TupleElementContent(1); ListElementContent`
* ```
* where `a` should not receive content, but `b` and `c` should. `c` will be `[SOURCE]` so
* should have the content transferred, while `b` should read it.
*
* To transfer content from RHS to the elements of the LHS in the expression `sequence = iterable`,
* we use two synthetic nodes:
*
* - `TIterableSequence(sequence)` which captures the content-modeling the entire `sequence` will have
* (essentially just a copy of the content-modeling the RHS has)
*
* - `TIterableElement(sequence)` which captures the content-modeling that will be assigned to an element.
* Note that an empty access path means that the value we are tracking flows directly to the element.
*
*
* The `TIterableSequence(sequence)` is at this point superfluous but becomes useful when handling recursive
* structures in the LHS, where `sequence` is some internal sequence node. We can have a uniform treatment
* by always having these two synthetic nodes. So we transfer to (or, in the recursive case, read into)
* `TIterableSequence(sequence)`, from which we take a read step to `TIterableElement(sequence)` and then a
* store step to `sequence`.
*
* This allows the unknown content from the RHS to be read into `TIterableElement(sequence)` and tuple content
* to then be stored into `sequence`. If the content is already tuple content, this indirection creates crosstalk
* between indices. Therefore, tuple content is never read into `TIterableElement(sequence)`; it is instead
* transferred directly from `TIterableSequence(sequence)` to `sequence` via a flow step. Such a flow step will
* also transfer other content, but only tuple content is further read from `sequence` into its elements.
*
* The strategy is then via several read-, store-, and flow steps:
* 1. a) [Flow] Content is transferred from `iterable` to `TIterableSequence(sequence)` via a
* flow step. From here, everything happens on the LHS.
*
* b) [Read] If the unpacking happens inside a for as in
* ```python
* for sequence in iterable
* ```
* then content is read from `iterable` to `TIterableSequence(sequence)`.
*
* 2. [Flow] Content is transferred from `TIterableSequence(sequence)` to `sequence` via a
* flow step. (Here only tuple content is relevant.)
*
* 3. [Read] Content is read from `TIterableSequence(sequence)` into `TIterableElement(sequence)`.
* As `sequence` is modeled as a tuple, we will not read tuple content as that would allow
* crosstalk.
*
* 4. [Store] Content is stored from `TIterableElement(sequence)` to `sequence`.
* Content type is `TupleElementContent` with indices taken from the syntax.
* For instance, if `sequence` is `(a, *b, c)`, content is written to index 0, 1, and 2.
* This is adequate as the route through `TIterableElement(sequence)` does not transfer precise content.
*
* 5. [Read] Content is read from `sequence` to its elements.
* a) If the element is a plain variable, the target is the corresponding essa node.
*
* b) If the element is itself a sequence, with control-flow node `seq`, the target is `TIterableSequence(seq)`.
*
* c) If the element is a starred variable, with control-flow node `v`, the target is `TIterableElement(v)`.
*
* 6. [Store] Content is stored from `TIterableElement(v)` to the essa variable for `v`, with
* content type `ListElementContent`.
*
* 7. [Flow, Read, Store] Steps 2 through 6 are repeated for all recursive elements which are sequences.
*
*
* We illustrate the above steps on the assignment
*
* ```python
* (a, b) = ["a", SOURCE]
* ```
*
* Looking at the content propagation to `a`:
* `["a", SOURCE]`: [ListElementContent]
*
* --Step 1a-->
*
* `TIterableSequence((a, b))`: [ListElementContent]
*
* --Step 3-->
*
* `TIterableElement((a, b))`: []
*
* --Step 4-->
*
* `(a, b)`: [TupleElementContent(0)]
*
* --Step 5a-->
*
* `a`: []
*
* Meaning there is data-flow from the RHS to `a` (an over-approximation). The same logic would be applied to show there is data-flow to `b`. Note that _Step 3_ and _Step 4_ would not have been needed if the RHS had been a tuple (since that would have been able to use _Step 2_ instead).
*
* Another, more complicated example:
* ```python
* (a, [b, *c]) = ["a", [SOURCE]]
* ```
* where the path to `c` is
*
* `["a", [SOURCE]]`: [ListElementContent; ListElementContent]
*
* --Step 1a-->
*
* `TIterableSequence((a, [b, *c]))`: [ListElementContent; ListElementContent]
*
* --Step 3-->
*
* `TIterableElement((a, [b, *c]))`: [ListElementContent]
*
* --Step 4-->
*
* `(a, [b, *c])`: [TupleElementContent(1); ListElementContent]
*
* --Step 5b-->
*
* `TIterableSequence([b, *c])`: [ListElementContent]
*
* --Step 3-->
*
* `TIterableElement([b, *c])`: []
*
* --Step 4-->
*
* `[b, *c]`: [TupleElementContent(1)]
*
* --Step 5c-->
*
* `TIterableElement(c)`: []
*
* --Step 6-->
*
* `c`: [ListElementContent]
*/
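The Python runtime behavior that the model above over-approximates can be checked directly; a minimal illustrative sketch of the three unpacking shapes discussed (flat list RHS, starred target, nested target), with `SOURCE` as a hypothetical stand-in for a tainted value:

```python
# Runtime behavior of the unpacking forms modeled above (illustrative only).
SOURCE = "tainted"

# Flat unpacking from a list: the content model over-approximates, since
# ListElementContent does not say which element SOURCE is.
(a, b) = ["a", SOURCE]

# Starred target: *rest collects the remaining items into a list,
# so its content is modeled as ListElementContent.
(first, *rest) = ("a", "b", SOURCE)

# Nested target: the inner starred variable again becomes a list.
(x, [y, *z]) = ("a", ["b", SOURCE])
```

At runtime the assignments are precise (`b` is `SOURCE`, `rest` is `["b", SOURCE]`, `z` is `[SOURCE]`); the imprecision described in the comment exists only in the static content model.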
private import python
private import DataFlowPublic
/**
* The target of a `for`, e.g. `x` in `for x in list` or in `[42 for x in list]`.
* This class also records the source, which in both above cases is `list`.
* This class abstracts away the differing representations of comprehensions and
* for statements.
*/
class ForTarget extends ControlFlowNode {
Expr source;
ForTarget() {
exists(For for |
source = for.getIter() and
this.getNode() = for.getTarget() and
not for = any(Comp comp).getNthInnerLoop(0)
)
or
exists(Comp comp |
source = comp.getIterable() and
this.getNode() = comp.getNthInnerLoop(0).getTarget()
)
}
Expr getSource() { result = source }
}
/** The LHS of an assignment; it also records the assigned value. */
class AssignmentTarget extends ControlFlowNode {
Expr value;
AssignmentTarget() {
exists(Assign assign | this.getNode() = assign.getATarget() | value = assign.getValue())
}
Expr getValue() { result = value }
}
/** A direct (or top-level) target of an unpacking assignment. */
class UnpackingAssignmentDirectTarget extends ControlFlowNode {
Expr value;
UnpackingAssignmentDirectTarget() {
this instanceof SequenceNode and
(
value = this.(AssignmentTarget).getValue()
or
value = this.(ForTarget).getSource()
)
}
Expr getValue() { result = value }
}
/** A (possibly recursive) target of an unpacking assignment. */
class UnpackingAssignmentTarget extends ControlFlowNode {
UnpackingAssignmentTarget() {
this instanceof UnpackingAssignmentDirectTarget
or
this = any(UnpackingAssignmentSequenceTarget parent).getAnElement()
}
}
/** A (possibly recursive) target of an unpacking assignment which is also a sequence. */
class UnpackingAssignmentSequenceTarget extends UnpackingAssignmentTarget instanceof SequenceNode {
ControlFlowNode getElement(int i) { result = super.getElement(i) }
ControlFlowNode getAnElement() { result = this.getElement(_) }
}
/**
* Step 1a
* Data flows from `iterable` to `TIterableSequence(sequence)`
*/
predicate iterableUnpackingAssignmentFlowStep(Node nodeFrom, Node nodeTo) {
exists(AssignmentTarget target |
nodeFrom.asExpr() = target.getValue() and
nodeTo = TIterableSequenceNode(target)
)
}
/**
* Step 1b
* Data is read from `iterable` to `TIterableSequence(sequence)`
*/
predicate iterableUnpackingForReadStep(CfgNode nodeFrom, Content c, Node nodeTo) {
exists(ForTarget target |
nodeFrom.asExpr() = target.getSource() and
target instanceof SequenceNode and
nodeTo = TIterableSequenceNode(target)
) and
(
c instanceof ListElementContent
or
c instanceof SetElementContent
)
}
/**
* Step 2
* Data flows from `TIterableSequence(sequence)` to `sequence`
*/
predicate iterableUnpackingTupleFlowStep(Node nodeFrom, Node nodeTo) {
exists(UnpackingAssignmentSequenceTarget target |
nodeFrom = TIterableSequenceNode(target) and
nodeTo.asCfgNode() = target
)
}
/**
* Step 3
* Data flows from `TIterableSequence(sequence)` into `TIterableElement(sequence)`.
* As `sequence` is modeled as a tuple, we will not read tuple content as that would allow
* crosstalk.
*/
predicate iterableUnpackingConvertingReadStep(Node nodeFrom, Content c, Node nodeTo) {
exists(UnpackingAssignmentSequenceTarget target |
nodeFrom = TIterableSequenceNode(target) and
nodeTo = TIterableElementNode(target) and
(
c instanceof ListElementContent
or
c instanceof SetElementContent
// TODO: dict content in iterable unpacking not handled
)
)
}
/**
* Step 4
* Data flows from `TIterableElement(sequence)` to `sequence`.
* Content type is `TupleElementContent` with indices taken from the syntax.
* For instance, if `sequence` is `(a, *b, c)`, content is written to index 0, 1, and 2.
*/
predicate iterableUnpackingConvertingStoreStep(Node nodeFrom, Content c, Node nodeTo) {
exists(UnpackingAssignmentSequenceTarget target |
nodeFrom = TIterableElementNode(target) and
nodeTo.asCfgNode() = target and
exists(int index | exists(target.getElement(index)) |
c.(TupleElementContent).getIndex() = index
)
)
}
/**
* Step 5
* For a sequence node inside an iterable unpacking, data flows from the sequence to its elements. There are
* three cases for what `nodeTo` should be:
* a) If the element is a plain variable, `nodeTo` is the corresponding essa node.
*
* b) If the element is itself a sequence, with control-flow node `seq`, `nodeTo` is `TIterableSequence(seq)`.
*
* c) If the element is a starred variable, with control-flow node `v`, `nodeTo` is `TIterableElement(v)`.
*/
predicate iterableUnpackingElementReadStep(Node nodeFrom, Content c, Node nodeTo) {
exists(
UnpackingAssignmentSequenceTarget target, int index, ControlFlowNode element, int starIndex
|
target.getElement(starIndex) instanceof StarredNode
or
not exists(target.getAnElement().(StarredNode)) and
starIndex = -1
|
nodeFrom.asCfgNode() = target and
element = target.getElement(index) and
(
if starIndex = -1 or index < starIndex
then c.(TupleElementContent).getIndex() = index
else
// This could get big if big tuples exist
if index = starIndex
then c.(TupleElementContent).getIndex() >= index
else c.(TupleElementContent).getIndex() >= index - 1
) and
(
if element instanceof SequenceNode
then
// Step 5b
nodeTo = TIterableSequenceNode(element)
else
if element instanceof StarredNode
then
// Step 5c
nodeTo = TIterableElementNode(element)
else
// Step 5a
nodeTo.asVar().getDefinition().(MultiAssignmentDefinition).getDefiningNode() = element
)
)
}
/**
* Step 6
* Data flows from `TIterableElement(v)` to the essa variable for `v`, with
* content type `ListElementContent`.
*/
predicate iterableUnpackingStarredElementStoreStep(Node nodeFrom, Content c, Node nodeTo) {
exists(ControlFlowNode starred | starred.getNode() instanceof Starred |
nodeFrom = TIterableElementNode(starred) and
nodeTo.asVar().getDefinition().(MultiAssignmentDefinition).getDefiningNode() = starred and
c instanceof ListElementContent
)
}
/** All read steps associated with unpacking assignment. */
predicate iterableUnpackingReadStep(Node nodeFrom, Content c, Node nodeTo) {
iterableUnpackingForReadStep(nodeFrom, c, nodeTo)
or
iterableUnpackingElementReadStep(nodeFrom, c, nodeTo)
or
iterableUnpackingConvertingReadStep(nodeFrom, c, nodeTo)
}
/** All store steps associated with unpacking assignment. */
predicate iterableUnpackingStoreStep(Node nodeFrom, Content c, Node nodeTo) {
iterableUnpackingStarredElementStoreStep(nodeFrom, c, nodeTo)
or
iterableUnpackingConvertingStoreStep(nodeFrom, c, nodeTo)
}
/** All flow steps associated with unpacking assignment. */
predicate iterableUnpackingFlowStep(Node nodeFrom, Node nodeTo) {
iterableUnpackingAssignmentFlowStep(nodeFrom, nodeTo)
or
iterableUnpackingTupleFlowStep(nodeFrom, nodeTo)
}


@@ -0,0 +1,311 @@
/**
* There are a number of patterns available for the match statement.
* Each one transfers data and content differently to its parts.
*
* Furthermore, given a successful match, we can infer some data about
* the subject. Consider the example:
* ```python
* match choice:
* case 'Y':
* ...body
* ```
* Inside `body`, we know that `choice` has the value `'Y'`.
*
* A similar thing happens with the "as pattern". Consider the example:
* ```python
* match choice:
* case ('y'|'Y') as c:
* ...body
* ```
* By the binding rules, there is data flow from `choice` to `c`. But we
* can infer the value of `c` to be either `'y'` or `'Y'` if the match succeeds.
*
* We will treat such inferences separately as guards. First we will model the data flow
* stemming from the bindings and the matching of shape. Below, 'subject' is not necessarily the
* top-level subject of the match, but rather the part recursively matched by the current pattern.
* For instance, in the example:
* ```python
* match command:
* case ('quit' as c) | ('go', ('up'|'down') as c):
* ...body
* ```
* `command` is the subject of the first as-pattern, while the second component of `command`
* is the subject of the second as-pattern. As such, 'subject' refers to the value matched by the pattern under evaluation.
*
* - as pattern: subject flows to alias as well as to the interior pattern
* - or pattern: subject flows to each alternative
* - literal pattern: flow from the literal to the pattern, to add information
* - capture pattern: subject flows to the variable
* - wildcard pattern: no flow
* - value pattern: flow from the value to the pattern, to add information
* - sequence pattern: each element reads from subject at the associated index
* - star pattern: subject flows to the variable, possibly via a conversion
* - mapping pattern: each value reads from subject at the associated key
* - double star pattern: subject flows to the variable, possibly via a conversion
* - key-value pattern: the value reads from the subject at the key (see mapping pattern)
* - class pattern: all keywords read the appropriate attribute from the subject
* - keyword pattern: the appropriate attribute is read from the subject (see class pattern)
*
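* As a concrete runtime check (illustrative only, not part of the analysis),
* a star pattern always binds its variable to a list:
* ```python
* match (1, 2, 3):
*     case (first, *rest):
*         pass
* # first == 1, rest == [2, 3]
* ```
*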
* Inside the class pattern, we also find positional arguments. They are converted to
* keyword arguments using the `__match_args__` attribute on the class. We do not
* currently model this.
*/
private import python
private import DataFlowPublic
/**
* Holds if there is flow from the subject `nodeFrom` to the (top-level) pattern `nodeTo` of a `match` statement.
*
* The subject of a match flows to each top-level pattern
* (a pattern directly under a `case` statement).
*
* We could consider a model closer to use-use-flow, where the subject
* only flows to the first top-level pattern and from there to the
* following ones.
*/
predicate matchSubjectFlowStep(Node nodeFrom, Node nodeTo) {
exists(MatchStmt match, Expr subject, Pattern target |
subject = match.getSubject() and
target = match.getCase(_).(Case).getPattern()
|
nodeFrom.asExpr() = subject and
nodeTo.asCfgNode().getNode() = target
)
}
/**
* as pattern: subject flows to alias as well as to the interior pattern
* syntax (toplevel): `case pattern as alias:`
*/
predicate matchAsFlowStep(Node nodeFrom, Node nodeTo) {
exists(MatchAsPattern subject, Name alias | alias = subject.getAlias() |
// We make the subject flow to the interior pattern via the alias.
// That way, information can propagate from the interior pattern to the alias.
//
// the subject flows to the interior pattern
nodeFrom.asCfgNode().getNode() = subject and
nodeTo.asCfgNode().getNode() = subject.getPattern()
or
// the interior pattern flows to the alias
nodeFrom.asCfgNode().getNode() = subject.getPattern() and
nodeTo.asVar().getDefinition().(PatternAliasDefinition).getDefiningNode().getNode() = alias
)
}
/**
* or pattern: subject flows to each alternative
* syntax (toplevel): `case alt1 | alt2:`
*/
predicate matchOrFlowStep(Node nodeFrom, Node nodeTo) {
exists(MatchOrPattern subject, Pattern pattern | pattern = subject.getAPattern() |
nodeFrom.asCfgNode().getNode() = subject and
nodeTo.asCfgNode().getNode() = pattern
)
}
/**
* literal pattern: flow from the literal to the pattern, to add information
* syntax (toplevel): `case literal:`
*/
predicate matchLiteralFlowStep(Node nodeFrom, Node nodeTo) {
exists(MatchLiteralPattern pattern, Expr literal | literal = pattern.getLiteral() |
nodeFrom.asExpr() = literal and
nodeTo.asCfgNode().getNode() = pattern
)
}
/**
* capture pattern: subject flows to the variable
* syntax (toplevel): `case var:`
*/
predicate matchCaptureFlowStep(Node nodeFrom, Node nodeTo) {
exists(MatchCapturePattern capture, Name var | capture.getVariable() = var |
nodeFrom.asCfgNode().getNode() = capture and
nodeTo.asVar().getDefinition().(PatternCaptureDefinition).getDefiningNode().getNode() = var
)
}
/**
* value pattern: flow from the value to the pattern, to add information
* syntax (toplevel): `case Dotted.value:`
*/
predicate matchValueFlowStep(Node nodeFrom, Node nodeTo) {
exists(MatchValuePattern pattern, Expr value | value = pattern.getValue() |
nodeFrom.asExpr() = value and
nodeTo.asCfgNode().getNode() = pattern
)
}
/**
* sequence pattern: each element reads from subject at the associated index
* syntax (toplevel): `case [a, b]:`
*/
predicate matchSequenceReadStep(Node nodeFrom, Content c, Node nodeTo) {
exists(MatchSequencePattern subject, int index, Pattern element |
element = subject.getPattern(index)
|
nodeFrom.asCfgNode().getNode() = subject and
nodeTo.asCfgNode().getNode() = element and
(
// tuple content
c.(TupleElementContent).getIndex() = index
or
// list content
c instanceof ListElementContent
// set content is excluded from sequence patterns,
// see https://www.python.org/dev/peps/pep-0635/#sequence-patterns
)
)
}
/**
* star pattern: subject flows to the variable, possibly via a conversion
* syntax (toplevel): `case *var:`
*
* We decompose this flow into a read step and a store step. The read step
* reads both tuple and list content, the store step only stores list content.
* This way, we convert all content to list content.
*
* This is the read step.
*/
predicate matchStarReadStep(Node nodeFrom, Content c, Node nodeTo) {
exists(MatchSequencePattern subject, int index, MatchStarPattern star |
star = subject.getPattern(index)
|
nodeFrom.asCfgNode().getNode() = subject and
nodeTo = TStarPatternElementNode(star) and
(
// tuple content
c.(TupleElementContent).getIndex() >= index
or
// list content
c instanceof ListElementContent
// set content is excluded from sequence patterns,
// see https://www.python.org/dev/peps/pep-0635/#sequence-patterns
)
)
}
/**
* star pattern: subject flows to the variable, possibly via a conversion
* syntax (toplevel): `case *var:`
*
* We decompose this flow into a read step and a store step. The read step
* reads both tuple and list content, the store step only stores list content.
* This way, we convert all content to list content.
*
* This is the store step.
*/
predicate matchStarStoreStep(Node nodeFrom, Content c, Node nodeTo) {
exists(MatchStarPattern star |
nodeFrom = TStarPatternElementNode(star) and
nodeTo.asCfgNode().getNode() = star.getTarget() and
c instanceof ListElementContent
)
}
/**
* mapping pattern: each value reads from subject at the associated key
* syntax (toplevel): `case {"color": c, "height": x}:`
*/
predicate matchMappingReadStep(Node nodeFrom, Content c, Node nodeTo) {
exists(
MatchMappingPattern subject, MatchKeyValuePattern keyValue, MatchLiteralPattern key,
Pattern value
|
keyValue = subject.getAMapping() and
key = keyValue.getKey() and
value = keyValue.getValue()
|
nodeFrom.asCfgNode().getNode() = subject and
nodeTo.asCfgNode().getNode() = value and
c.(DictionaryElementContent).getKey() = key.getLiteral().(StrConst).getText()
)
}
/**
* double star pattern: subject flows to the variable, possibly via a conversion
* syntax (toplevel): `case {**var}:`
*
* Dictionary content flows to the double star, but all mentioned keys in the
* mapping pattern should be cleared.
*/
predicate matchMappingFlowStep(Node nodeFrom, Node nodeTo) {
exists(MatchMappingPattern subject, MatchDoubleStarPattern dstar | dstar = subject.getAMapping() |
nodeFrom.asCfgNode().getNode() = subject and
nodeTo.asCfgNode().getNode() = dstar.getTarget()
)
}
/**
* Bindings that are mentioned in a mapping pattern will not be available
* to a double star pattern in the same mapping pattern.
*/
predicate matchMappingClearStep(Node n, Content c) {
exists(
MatchMappingPattern subject, MatchKeyValuePattern keyValue, MatchLiteralPattern key,
MatchDoubleStarPattern dstar
|
keyValue = subject.getAMapping() and
key = keyValue.getKey() and
dstar = subject.getAMapping()
|
n.asCfgNode().getNode() = dstar.getTarget() and
c.(DictionaryElementContent).getKey() = key.getLiteral().(StrConst).getText()
)
}
/**
* class pattern: all keywords read the appropriate attribute from the subject
* syntax (toplevel): `case ClassName(attr = val):`
*/
predicate matchClassReadStep(Node nodeFrom, Content c, Node nodeTo) {
exists(MatchClassPattern subject, MatchKeywordPattern keyword, Name attr, Pattern value |
keyword = subject.getKeyword(_) and
attr = keyword.getAttribute() and
value = keyword.getValue()
|
nodeFrom.asCfgNode().getNode() = subject and
nodeTo.asCfgNode().getNode() = value and
c.(AttributeContent).getAttribute() = attr.getId()
)
}
/** All flow steps associated with match. */
predicate matchFlowStep(Node nodeFrom, Node nodeTo) {
matchSubjectFlowStep(nodeFrom, nodeTo)
or
matchAsFlowStep(nodeFrom, nodeTo)
or
matchOrFlowStep(nodeFrom, nodeTo)
or
matchLiteralFlowStep(nodeFrom, nodeTo)
or
matchCaptureFlowStep(nodeFrom, nodeTo)
or
matchValueFlowStep(nodeFrom, nodeTo)
or
matchMappingFlowStep(nodeFrom, nodeTo)
}
/** All read steps associated with match. */
predicate matchReadStep(Node nodeFrom, Content c, Node nodeTo) {
matchClassReadStep(nodeFrom, c, nodeTo)
or
matchSequenceReadStep(nodeFrom, c, nodeTo)
or
matchMappingReadStep(nodeFrom, c, nodeTo)
or
matchStarReadStep(nodeFrom, c, nodeTo)
}
/** All store steps associated with match. */
predicate matchStoreStep(Node nodeFrom, Content c, Node nodeTo) {
matchStarStoreStep(nodeFrom, c, nodeTo)
}
/** All clear steps associated with match. */
predicate matchClearStep(Node n, Content c) { matchMappingClearStep(n, c) }


@@ -1,138 +0,0 @@
# Using the shared dataflow library
## File organisation
The files currently live in `experimental` (whereas the existing implementation lives in `semmle\python\dataflow`).
There you will find `DataFlow.qll`, `DataFlow2.qll`, etc., which refer to `internal\DataFlowImpl`, `internal\DataFlowImpl2`, etc., respectively. The `DataFlowImplN` files are all identical copies to avoid mutual recursion. They start off by including two files, `internal\DataFlowImplCommon` and `internal\DataFlowImplSpecific`. The former contains all the language-agnostic definitions, while the latter is where we describe our favorite language. `Specific` simply forwards to two other files, `internal\DataFlowPrivate.qll` and `internal\DataFlowPublic.qll`. Definitions in the former will be hidden behind a `private` modifier, while those in the latter can be referred to in data flow queries. For instance, the definition of `DataFlow::Node` should likely be in `DataFlowPublic.qll`.
## Define the dataflow graph
In order to use the dataflow library, we need to define the dataflow graph,
that is, its nodes and edges.
### Define the nodes
The nodes are defined in the type `DataFlow::Node` (found in `DataFlowPublic.qll`).
This should likely be an IPA type, so we can extend it as needed.
Typical cases needed to construct the call graph include
- argument node
- parameter node
- return node
Typical extensions include
- postupdate nodes
- implicit `this`-nodes
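A minimal sketch of such an IPA type (the constructor and member names here are illustrative, not the final API):

```ql
private newtype TNode =
  /** A node corresponding to a control flow node. */
  TCfgNode(ControlFlowNode node) or
  /** A node corresponding to an SSA variable. */
  TEssaNode(EssaVariable var)

class Node extends TNode {
  /** Gets a textual representation of this node. */
  string toString() { result = "data flow node" }
}
```

Argument, parameter, and return nodes can then be carved out as subclasses restricting the underlying control flow node.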
### Define the edges
The edges split into local flow (within a function) and global flow (the call graph, between functions/procedures).
Extra flow, such as reading from and writing to global variables, can be captured in `jumpStep`.
The local flow should be obtainable from an SSA computation.
Local flow nodes are generally either control flow nodes or SSA variables.
Flow from control flow nodes to SSA variables comes from SSA variable definitions, while flow from SSA variables to control flow nodes comes from def-use pairs.
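A sketch of what the local step might look like, assuming `Node` exposes `asCfgNode()` and `asVar()` accessors (the accessor names are assumptions):

```ql
predicate simpleLocalFlowStep(Node nodeFrom, Node nodeTo) {
  // flow from the assigned value to the SSA variable it defines
  exists(AssignmentDefinition def |
    nodeFrom.asCfgNode() = def.getValue() and
    nodeTo.asVar().getDefinition() = def
  )
  or
  // flow from an SSA variable to each control flow node that uses it
  nodeTo.asCfgNode() = nodeFrom.asVar().getAUse()
}
```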
The global flow should be obtainable from a `PointsTo` analysis. It is specified via `viableCallable` and
`getAnOutNode`. Consider making `ReturnKind` a singleton IPA type, as in Java.
Global flow includes local flow within a consistent call context. Thus, for local flow to count as global flow, all relevant nodes should implement `getEnclosingCallable`.
If complicated dispatch needs to be modelled, try using the `[reduced|pruned]viable*` predicates.
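As a sketch (all names here are assumptions; the required signatures live in the shared library), points-to-based call resolution could look like:

```ql
/** Gets a viable run-time target of `call`, resolved by the points-to analysis. */
DataFlowCallable viableCallable(DataFlowCall call) {
  result = call.getNode().getFunction().pointsTo().(CallableValue).getScope()
}

/** Gets the node into which the result of `call` flows, for return kind `kind`. */
OutNode getAnOutNode(DataFlowCall call, ReturnKind kind) {
  result.asCfgNode() = call.getNode() and
  kind = TNormalReturnKind()
}
```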
## Field flow
To track flow through fields we need to provide a model of fields, that is, the `Content` class.
Field access is specified via `read_step` and `store_step`.
Work is being done to make field flow handle lists and dictionaries and the like.
`PostUpdateNode`s become important when field flow is used, as they track modifications to fields resulting from function calls.
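For attribute access, the read step could be sketched as follows (`AttrNode` mirrors the standard Python CFG library; the exact `Content` shape is an assumption):

```ql
predicate readStep(Node nodeFrom, Content c, Node nodeTo) {
  // `x.foo` reads attribute content `foo` from `x`
  exists(AttrNode attr |
    nodeFrom.asCfgNode() = attr.getObject() and
    nodeTo.asCfgNode() = attr and
    c.(AttributeContent).getAttribute() = attr.getName()
  )
}
```

The store step for `x.foo = y` is analogous, writing the attribute content into the post-update node of `x`.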
## Type pruning
If type information is available, flows can be discarded on the grounds of type mismatch.
Tracked types are given by the class `DataFlowType` and the predicate `getTypeBound`, and compatibility is recorded in the predicate `compatibleTypes`.
If type pruning is not used, `compatibleTypes` should be implemented as `any`; if it is implemented, say, as `none`, all flows will be pruned.
Further, possible casts are given by the class `CastNode`.
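With type pruning switched off, the trivial implementation mentioned above amounts to (sketch):

```ql
/** A dummy type: every node gets the same type. */
class DataFlowType extends Unit { }

/** Any two types are compatible, so no flow is pruned. */
predicate compatibleTypes(DataFlowType t1, DataFlowType t2) { any() }
```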
---
# Plan
## Stage I, data flow
### Phase 0, setup
Define a minimal IPA type for `DataFlow::Node`.
Define all required predicates as empty (via `none()`),
except `compatibleTypes`, which should be `any()`.
Define `ReturnKind`, `DataFlowType`, and `Content` as singleton IPA types.
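A sketch of one such singleton IPA type:

```ql
private newtype TReturnKind = TNormalReturnKind()

/** The single kind of return: a normal `return` from a callable. */
class ReturnKind extends TReturnKind {
  string toString() { result = "return" }
}
```

`DataFlowType` and `Content` can be defined in the same way until phases 3 and 4 refine them.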
### Phase 1, local flow
Implement `simpleLocalFlowStep` based on the existing SSA computation.
### Phase 2, global flow
Implement `viableCallable` and `getAnOutNode` based on the existing predicate `PointsTo`.
### Phase 3, field flow
Redefine `Content` and implement `read_step` and `store_step`.
Review use of post-update nodes.
### Phase 4, type pruning
Use type trackers to obtain relevant type information and redefine `DataFlowType` to contain appropriate cases. Record the type information in `getTypeBound`.
Implement `compatibleTypes` (perhaps simply as the identity).
If necessary, re-implement `getErasedRepr` and `ppReprType`.
If necessary, redefine `CastNode`.
### Phase 5, bonus
Review possible use of `[reduced|pruned]viable*` predicates.
Review need for more elaborate `ReturnKind`.
Review need for non-empty `jumpStep`.
Review need for non-empty `isUnreachableInCall`.
## Stage II, taint tracking
### Phase 0, setup
Implement all predicates as empty.
### Phase 1, experiments
Try recovering an existing taint tracking query by implementing sources, sinks, sanitizers, and barriers.
---
# Status
## Achieved
- Copy of shared library; implemented enough predicates to make it compile.
- Simple flow into, out of, and through functions.
- Some tests, in particular a skeleton for something comprehensive.
## TODO
- Implementation has largely been done by finding a plausible-sounding predicate in the Python library to refer to. We should review that we actually have the intended semantics in all places.
- Comprehensive testing.
- The regression tests track the value of guards in order to eliminate impossible data flow. We currently have regressions because of this. We cannot readily replicate the existing method, as it relies on data flow and taint tracking being defined in terms of each other (there is a boolean taint kind). C++ [does something similar](https://github.com/github/codeql/blob/master/cpp/ql/src/semmle/code/cpp/controlflow/internal/ConstantExprs.qll#L27-L36) for eliminating impossible control flow, which we might be able to replicate (they infer values of "interesting" control flow nodes, which are those needed to determine values of guards).
- Flow for some syntactic constructs is handled via extra taint steps in the existing implementation; we should find a way to get data flow for these constructs. Some of this should be covered by field flow.
- A document is being written about proper use of the shared data flow library; this should be adhered to. In particular, we should consider replacing def-use with def-to-first-use and use-to-next-use in local flow.
- We seem to get duplicated results for global flow, as well as flow with and without type (so four times the "unique" results).
- We currently create data flow nodes for control flow nodes such as function exit nodes; we should probably filter down which ones are of interest.
- We should probably override `toString` for a number of data flow nodes.
- Test flow through classes, constructors and methods.
- What happens with named arguments? What does C# do?
- What should the enclosable callable for global variables be? C++ [makes it the variable itself](https://github.com/github/codeql/blob/master/cpp/ql/src/semmle/code/cpp/ir/dataflow/internal/DataFlowUtil.qll#L417), C# seems to not have nodes for these but only for their reads and writes.
- Is `yield` another return kind? If not, how is it handled?
- Should `OutNode` include magic function calls?
- Consider creating an internal abstract class for nodes as C# does. Among other things, this can help the optimizer by stating that `getEnclosingCallable` [is functional](https://github.com/github/codeql/blob/master/csharp/ql/src/semmle/code/csharp/dataflow/internal/DataFlowPublic.qll#L62).


@@ -109,16 +109,6 @@ abstract class Configuration extends DataFlow::Configuration {
/** Holds if taint propagation into `node` is prohibited. */
predicate isSanitizerIn(DataFlow::Node node) { none() }
/**
* Holds if taint propagation into `node` is prohibited when the flow state is
* `state`.
*/
predicate isSanitizerIn(DataFlow::Node node, DataFlow::FlowState state) { none() }
final override predicate isBarrierIn(DataFlow::Node node, DataFlow::FlowState state) {
this.isSanitizerIn(node, state)
}
final override predicate isBarrierIn(DataFlow::Node node) { this.isSanitizerIn(node) }
/** Holds if taint propagation out of `node` is prohibited. */
@@ -126,16 +116,6 @@ abstract class Configuration extends DataFlow::Configuration {
final override predicate isBarrierOut(DataFlow::Node node) { this.isSanitizerOut(node) }
/**
* Holds if taint propagation out of `node` is prohibited when the flow state is
* `state`.
*/
predicate isSanitizerOut(DataFlow::Node node, DataFlow::FlowState state) { none() }
final override predicate isBarrierOut(DataFlow::Node node, DataFlow::FlowState state) {
this.isSanitizerOut(node, state)
}
/** Holds if taint propagation through nodes guarded by `guard` is prohibited. */
predicate isSanitizerGuard(DataFlow::BarrierGuard guard) { none() }
@@ -181,7 +161,7 @@ abstract class Configuration extends DataFlow::Configuration {
this.isAdditionalTaintStep(node1, state1, node2, state2)
}
override predicate allowImplicitRead(DataFlow::Node node, DataFlow::Content c) {
override predicate allowImplicitRead(DataFlow::Node node, DataFlow::ContentSet c) {
(this.isSink(node) or this.isAdditionalTaintStep(node, _)) and
defaultImplicitTaintRead(node, c)
}


@@ -109,16 +109,6 @@ abstract class Configuration extends DataFlow::Configuration {
/** Holds if taint propagation into `node` is prohibited. */
predicate isSanitizerIn(DataFlow::Node node) { none() }
/**
* Holds if taint propagation into `node` is prohibited when the flow state is
* `state`.
*/
predicate isSanitizerIn(DataFlow::Node node, DataFlow::FlowState state) { none() }
final override predicate isBarrierIn(DataFlow::Node node, DataFlow::FlowState state) {
this.isSanitizerIn(node, state)
}
final override predicate isBarrierIn(DataFlow::Node node) { this.isSanitizerIn(node) }
/** Holds if taint propagation out of `node` is prohibited. */
@@ -126,16 +116,6 @@ abstract class Configuration extends DataFlow::Configuration {
final override predicate isBarrierOut(DataFlow::Node node) { this.isSanitizerOut(node) }
/**
* Holds if taint propagation out of `node` is prohibited when the flow state is
* `state`.
*/
predicate isSanitizerOut(DataFlow::Node node, DataFlow::FlowState state) { none() }
final override predicate isBarrierOut(DataFlow::Node node, DataFlow::FlowState state) {
this.isSanitizerOut(node, state)
}
/** Holds if taint propagation through nodes guarded by `guard` is prohibited. */
predicate isSanitizerGuard(DataFlow::BarrierGuard guard) { none() }
@@ -181,7 +161,7 @@ abstract class Configuration extends DataFlow::Configuration {
this.isAdditionalTaintStep(node1, state1, node2, state2)
}
override predicate allowImplicitRead(DataFlow::Node node, DataFlow::Content c) {
override predicate allowImplicitRead(DataFlow::Node node, DataFlow::ContentSet c) {
(this.isSink(node) or this.isAdditionalTaintStep(node, _)) and
defaultImplicitTaintRead(node, c)
}


@@ -109,16 +109,6 @@ abstract class Configuration extends DataFlow::Configuration {
/** Holds if taint propagation into `node` is prohibited. */
predicate isSanitizerIn(DataFlow::Node node) { none() }
/**
* Holds if taint propagation into `node` is prohibited when the flow state is
* `state`.
*/
predicate isSanitizerIn(DataFlow::Node node, DataFlow::FlowState state) { none() }
final override predicate isBarrierIn(DataFlow::Node node, DataFlow::FlowState state) {
this.isSanitizerIn(node, state)
}
final override predicate isBarrierIn(DataFlow::Node node) { this.isSanitizerIn(node) }
/** Holds if taint propagation out of `node` is prohibited. */
@@ -126,16 +116,6 @@ abstract class Configuration extends DataFlow::Configuration {
final override predicate isBarrierOut(DataFlow::Node node) { this.isSanitizerOut(node) }
/**
* Holds if taint propagation out of `node` is prohibited when the flow state is
* `state`.
*/
predicate isSanitizerOut(DataFlow::Node node, DataFlow::FlowState state) { none() }
final override predicate isBarrierOut(DataFlow::Node node, DataFlow::FlowState state) {
this.isSanitizerOut(node, state)
}
/** Holds if taint propagation through nodes guarded by `guard` is prohibited. */
predicate isSanitizerGuard(DataFlow::BarrierGuard guard) { none() }
@@ -181,7 +161,7 @@ abstract class Configuration extends DataFlow::Configuration {
this.isAdditionalTaintStep(node1, state1, node2, state2)
}
override predicate allowImplicitRead(DataFlow::Node node, DataFlow::Content c) {
override predicate allowImplicitRead(DataFlow::Node node, DataFlow::ContentSet c) {
(this.isSink(node) or this.isAdditionalTaintStep(node, _)) and
defaultImplicitTaintRead(node, c)
}


@@ -109,16 +109,6 @@ abstract class Configuration extends DataFlow::Configuration {
/** Holds if taint propagation into `node` is prohibited. */
predicate isSanitizerIn(DataFlow::Node node) { none() }
/**
* Holds if taint propagation into `node` is prohibited when the flow state is
* `state`.
*/
predicate isSanitizerIn(DataFlow::Node node, DataFlow::FlowState state) { none() }
final override predicate isBarrierIn(DataFlow::Node node, DataFlow::FlowState state) {
this.isSanitizerIn(node, state)
}
final override predicate isBarrierIn(DataFlow::Node node) { this.isSanitizerIn(node) }
/** Holds if taint propagation out of `node` is prohibited. */
@@ -126,16 +116,6 @@ abstract class Configuration extends DataFlow::Configuration {
final override predicate isBarrierOut(DataFlow::Node node) { this.isSanitizerOut(node) }
/**
* Holds if taint propagation out of `node` is prohibited when the flow state is
* `state`.
*/
predicate isSanitizerOut(DataFlow::Node node, DataFlow::FlowState state) { none() }
final override predicate isBarrierOut(DataFlow::Node node, DataFlow::FlowState state) {
this.isSanitizerOut(node, state)
}
/** Holds if taint propagation through nodes guarded by `guard` is prohibited. */
predicate isSanitizerGuard(DataFlow::BarrierGuard guard) { none() }
@@ -181,7 +161,7 @@ abstract class Configuration extends DataFlow::Configuration {
this.isAdditionalTaintStep(node1, state1, node2, state2)
}
override predicate allowImplicitRead(DataFlow::Node node, DataFlow::Content c) {
override predicate allowImplicitRead(DataFlow::Node node, DataFlow::ContentSet c) {
(this.isSink(node) or this.isAdditionalTaintStep(node, _)) and
defaultImplicitTaintRead(node, c)
}


@@ -9,7 +9,6 @@
*/
import python
private import semmle.python.pointsto.Base
private import semmle.python.pointsto.PointsTo
private import semmle.python.pointsto.PointsToContext
private import semmle.python.objects.ObjectInternal

View File

@@ -498,13 +498,13 @@ private EssaVariable potential_input(EssaNodeRefinement ref) {
/** An assignment to a variable `v = val` */
class AssignmentDefinition extends EssaNodeDefinition {
ControlFlowNode value;
AssignmentDefinition() {
SsaSource::assignment_definition(this.getSourceVariable(), this.getDefiningNode(), _)
SsaSource::assignment_definition(this.getSourceVariable(), this.getDefiningNode(), value)
}
ControlFlowNode getValue() {
SsaSource::assignment_definition(this.getSourceVariable(), this.getDefiningNode(), result)
}
ControlFlowNode getValue() { result = value }
override string getRepresentation() { result = this.getValue().getNode().toString() }
@@ -764,7 +764,8 @@ class CallsiteRefinement extends EssaNodeRefinement {
/** An implicit (possible) modification of the object referred at a method call */
class MethodCallsiteRefinement extends EssaNodeRefinement {
MethodCallsiteRefinement() {
SsaSource::method_call_refinement(this.getSourceVariable(), _, this.getDefiningNode()) and
SsaSource::method_call_refinement(pragma[only_bind_into](this.getSourceVariable()), _,
this.getDefiningNode()) and
not this instanceof SingleSuccessorGuard
}


@@ -496,8 +496,8 @@ private module SsaComputeImpl {
predicate firstUse(EssaDefinition def, ControlFlowNode use) {
exists(SsaSourceVariable v, BasicBlock b1, int i1, BasicBlock b2, int i2 |
adjacentVarRefs(v, b1, i1, b2, i2) and
definesAt(def, v, b1, i1) and
variableSourceUse(v, use, b2, i2)
definesAt(def, pragma[only_bind_into](v), b1, i1) and
variableSourceUse(pragma[only_bind_into](v), use, b2, i2)
)
or
exists(


@@ -4,7 +4,6 @@
*/
import python
private import semmle.python.pointsto.Base
private import semmle.python.internal.CachedStages
cached


@@ -33,8 +33,8 @@ private module Asyncpg {
string methodName;
SqlExecutionOnConnection() {
methodName in ["copy_from_query", "execute", "fetch", "fetchrow", "fetchval", "executemany"] and
this.calls([connectionPool().getAUse(), connection().getAUse()], methodName)
this = [connectionPool(), connection()].getMember(methodName).getACall() and
methodName in ["copy_from_query", "execute", "fetch", "fetchrow", "fetchval", "executemany"]
}
override DataFlow::Node getSql() {
@@ -51,8 +51,8 @@ private module Asyncpg {
string methodName;
FileAccessOnConnection() {
methodName in ["copy_from_query", "copy_from_table", "copy_to_table"] and
this.calls([connectionPool().getAUse(), connection().getAUse()], methodName)
this = [connectionPool(), connection()].getMember(methodName).getACall() and
methodName in ["copy_from_query", "copy_from_table", "copy_to_table"]
}
// The path argument is keyword only.

View File

@@ -22,7 +22,7 @@ private module CryptographyModel {
* Gets a predefined curve class from
* `cryptography.hazmat.primitives.asymmetric.ec` with a specific key size (in bits).
*/
private API::Node predefinedCurveClass(int keySize) {
API::Node predefinedCurveClass(int keySize) {
exists(string curveName |
result =
API::moduleImport("cryptography")
@@ -73,41 +73,6 @@ private module CryptographyModel {
curveName = "BrainpoolP512R1" and keySize = 512
)
}
/** Gets a reference to a predefined curve class with a specific key size (in bits), as well as the origin of the class. */
private DataFlow::TypeTrackingNode curveClassWithKeySize(
DataFlow::TypeTracker t, int keySize, DataFlow::Node origin
) {
t.start() and
result = predefinedCurveClass(keySize).getAnImmediateUse() and
origin = result
or
exists(DataFlow::TypeTracker t2 |
result = curveClassWithKeySize(t2, keySize, origin).track(t2, t)
)
}
/** Gets a reference to a predefined curve class with a specific key size (in bits), as well as the origin of the class. */
DataFlow::Node curveClassWithKeySize(int keySize, DataFlow::Node origin) {
curveClassWithKeySize(DataFlow::TypeTracker::end(), keySize, origin).flowsTo(result)
}
/** Gets a reference to a predefined curve class instance with a specific key size (in bits), as well as the origin of the class. */
private DataFlow::TypeTrackingNode curveClassInstanceWithKeySize(
DataFlow::TypeTracker t, int keySize, DataFlow::Node origin
) {
t.start() and
result.(DataFlow::CallCfgNode).getFunction() = curveClassWithKeySize(keySize, origin)
or
exists(DataFlow::TypeTracker t2 |
result = curveClassInstanceWithKeySize(t2, keySize, origin).track(t2, t)
)
}
/** Gets a reference to a predefined curve class instance with a specific key size (in bits), as well as the origin of the class. */
DataFlow::Node curveClassInstanceWithKeySize(int keySize, DataFlow::Node origin) {
curveClassInstanceWithKeySize(DataFlow::TypeTracker::end(), keySize, origin).flowsTo(result)
}
}
// ---------------------------------------------------------------------------
@@ -179,9 +144,13 @@ private module CryptographyModel {
DataFlow::Node getCurveArg() { result in [this.getArg(0), this.getArgByName("curve")] }
override int getKeySizeWithOrigin(DataFlow::Node origin) {
this.getCurveArg() = Ecc::curveClassInstanceWithKeySize(result, origin)
or
this.getCurveArg() = Ecc::curveClassWithKeySize(result, origin)
exists(API::Node n |
n = Ecc::predefinedCurveClass(result) and origin = n.getAnImmediateUse()
|
this.getCurveArg() = n.getAUse()
or
this.getCurveArg() = n.getReturn().getAUse()
)
}
// Note: There is not really a key-size argument, since it's always specified by the curve.
@@ -202,9 +171,8 @@ private module CryptographyModel {
}
/** Gets a reference to a Cipher instance using algorithm with `algorithmName`. */
DataFlow::TypeTrackingNode cipherInstance(DataFlow::TypeTracker t, string algorithmName) {
t.start() and
exists(DataFlow::CallCfgNode call | result = call |
API::Node cipherInstance(string algorithmName) {
exists(API::CallNode call | result = call.getReturn() |
call =
API::moduleImport("cryptography")
.getMember("hazmat")
@@ -216,47 +184,6 @@ private module CryptographyModel {
call.getArg(0), call.getArgByName("algorithm")
]
)
or
exists(DataFlow::TypeTracker t2 | result = cipherInstance(t2, algorithmName).track(t2, t))
}
/** Gets a reference to a Cipher instance using algorithm with `algorithmName`. */
DataFlow::Node cipherInstance(string algorithmName) {
cipherInstance(DataFlow::TypeTracker::end(), algorithmName).flowsTo(result)
}
/** Gets a reference to the encryptor of a Cipher instance using algorithm with `algorithmName`. */
DataFlow::TypeTrackingNode cipherEncryptor(DataFlow::TypeTracker t, string algorithmName) {
t.start() and
result.(DataFlow::MethodCallNode).calls(cipherInstance(algorithmName), "encryptor")
or
exists(DataFlow::TypeTracker t2 | result = cipherEncryptor(t2, algorithmName).track(t2, t))
}
/**
* Gets a reference to the encryptor of a Cipher instance using algorithm with `algorithmName`.
*
* You obtain an encryptor by using the `encryptor()` method on a Cipher instance.
*/
DataFlow::Node cipherEncryptor(string algorithmName) {
cipherEncryptor(DataFlow::TypeTracker::end(), algorithmName).flowsTo(result)
}
/** Gets a reference to the decryptor of a Cipher instance using algorithm with `algorithmName`. */
DataFlow::TypeTrackingNode cipherDecryptor(DataFlow::TypeTracker t, string algorithmName) {
t.start() and
result.(DataFlow::MethodCallNode).calls(cipherInstance(algorithmName), "decryptor")
or
exists(DataFlow::TypeTracker t2 | result = cipherDecryptor(t2, algorithmName).track(t2, t))
}
/**
* Gets a reference to the decryptor of a Cipher instance using algorithm with `algorithmName`.
*
 * You obtain a decryptor by using the `decryptor()` method on a Cipher instance.
*/
DataFlow::Node cipherDecryptor(string algorithmName) {
cipherDecryptor(DataFlow::TypeTracker::end(), algorithmName).flowsTo(result)
}
/**
@@ -267,11 +194,12 @@ private module CryptographyModel {
string algorithmName;
CryptographyGenericCipherOperation() {
exists(DataFlow::Node object, string method |
object in [cipherEncryptor(algorithmName), cipherDecryptor(algorithmName)] and
method in ["update", "update_into"] and
this.calls(object, method)
)
this =
cipherInstance(algorithmName)
.getMember(["decryptor", "encryptor"])
.getReturn()
.getMember(["update", "update_into"])
.getACall()
}
override Cryptography::CryptographicAlgorithm getAlgorithm() {
@@ -298,9 +226,8 @@ private module CryptographyModel {
}
/** Gets a reference to a Hash instance using algorithm with `algorithmName`. */
private DataFlow::TypeTrackingNode hashInstance(DataFlow::TypeTracker t, string algorithmName) {
t.start() and
exists(DataFlow::CallCfgNode call | result = call |
private API::Node hashInstance(string algorithmName) {
exists(API::CallNode call | result = call.getReturn() |
call =
API::moduleImport("cryptography")
.getMember("hazmat")
@@ -312,13 +239,6 @@ private module CryptographyModel {
call.getArg(0), call.getArgByName("algorithm")
]
)
or
exists(DataFlow::TypeTracker t2 | result = hashInstance(t2, algorithmName).track(t2, t))
}
/** Gets a reference to a Hash instance using algorithm with `algorithmName`. */
DataFlow::Node hashInstance(string algorithmName) {
hashInstance(DataFlow::TypeTracker::end(), algorithmName).flowsTo(result)
}
/**
@@ -328,7 +248,9 @@ private module CryptographyModel {
DataFlow::MethodCallNode {
string algorithmName;
CryptographyGenericHashOperation() { this.calls(hashInstance(algorithmName), "update") }
CryptographyGenericHashOperation() {
this = hashInstance(algorithmName).getMember("update").getACall()
}
override Cryptography::CryptographicAlgorithm getAlgorithm() {
result.matchesName(algorithmName)

View File

@@ -554,7 +554,7 @@ module PrivateDjango {
/** A `django.db.connection` is a PEP249 compliant DB connection. */
class DjangoDbConnection extends PEP249::Connection::InstanceSource {
DjangoDbConnection() { this = connection().getAUse() }
DjangoDbConnection() { this = connection().getAnImmediateUse() }
}
// -------------------------------------------------------------------------

View File

@@ -403,11 +403,8 @@ module Flask {
}
private class RequestAttrMultiDict extends Werkzeug::MultiDict::InstanceSource {
string attr_name;
RequestAttrMultiDict() {
attr_name in ["args", "values", "form", "files"] and
this.(DataFlow::AttrRead).accesses(request().getAUse(), attr_name)
this = request().getMember(["args", "values", "form", "files"]).getAnImmediateUse()
}
}
@@ -421,7 +418,7 @@ module Flask {
// TODO: This approach for identifying member-access is very ad hoc, and we should
// be able to do something more structured for providing modeling of the members
// of a container-object.
exists(DataFlow::AttrRead files | files.accesses(request().getAUse(), "files") |
exists(DataFlow::AttrRead files | files = request().getMember("files").getAnImmediateUse() |
this.asCfgNode().(SubscriptNode).getObject() = files.asCfgNode()
or
this.(DataFlow::MethodCallNode).calls(files, "get")
@@ -435,15 +432,13 @@ module Flask {
/** An `Headers` instance that originates from a flask request. */
private class FlaskRequestHeadersInstances extends Werkzeug::Headers::InstanceSource {
FlaskRequestHeadersInstances() {
this.(DataFlow::AttrRead).accesses(request().getAUse(), "headers")
}
FlaskRequestHeadersInstances() { this = request().getMember("headers").getAnImmediateUse() }
}
/** An `Authorization` instance that originates from a flask request. */
private class FlaskRequestAuthorizationInstances extends Werkzeug::Authorization::InstanceSource {
FlaskRequestAuthorizationInstances() {
this.(DataFlow::AttrRead).accesses(request().getAUse(), "authorization")
this = request().getMember("authorization").getAnImmediateUse()
}
}

View File

@@ -35,7 +35,7 @@ private module FlaskSqlAlchemy {
/** Access on a DB resulting in an Engine */
private class DbEngine extends SqlAlchemy::Engine::InstanceSource {
DbEngine() {
this = dbInstance().getMember("engine").getAUse()
this = dbInstance().getMember("engine").getAnImmediateUse()
or
this = dbInstance().getMember("get_engine").getACall()
}
@@ -44,7 +44,7 @@ private module FlaskSqlAlchemy {
/** Access on a DB resulting in a Session */
private class DbSession extends SqlAlchemy::Session::InstanceSource {
DbSession() {
this = dbInstance().getMember("session").getAUse()
this = dbInstance().getMember("session").getAnImmediateUse()
or
this = dbInstance().getMember("create_session").getReturn().getACall()
or

View File

@@ -9,7 +9,6 @@
private import python
private import semmle.python.Concepts
private import semmle.python.ApiGraphs
private import semmle.python.dataflow.new.DataFlow
private import semmle.python.dataflow.new.TaintTracking
private import semmle.python.frameworks.internal.InstanceTaintStepsHelper
private import semmle.python.frameworks.Stdlib

View File

@@ -959,13 +959,18 @@ private module StdlibPrivate {
}
}
/** A call to `os.path.samefile` will raise an exception if an `os.stat()` call on either pathname fails. */
/**
* A call to `os.path.samefile` will raise an exception if an `os.stat()` call on either pathname fails.
*
* See https://docs.python.org/3.10/library/os.path.html#os.path.samefile
*/
private class OsPathSamefileCall extends FileSystemAccess::Range, DataFlow::CallCfgNode {
OsPathSamefileCall() { this = OS::path().getMember("samefile").getACall() }
override DataFlow::Node getAPathArgument() {
result in [
this.getArg(0), this.getArgByName("path1"), this.getArg(1), this.getArgByName("path2")
// note that the f1/f2 names don't match the documentation, but are what actually works (tested on 3.8.10)
this.getArg(0), this.getArgByName("f1"), this.getArg(1), this.getArgByName("f2")
]
}
}
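The keyword-name claim in the comment above can be checked directly against CPython (a quick sketch; behavior verified on recent CPython 3.x):

```python
import os
import tempfile

# Create a real file so the os.stat() calls inside samefile succeed.
path = tempfile.NamedTemporaryFile(delete=False).name

# Positional form, as documented:
assert os.path.samefile(path, path)

# CPython's actual parameter names are f1/f2, not the path1/path2
# shown in the documentation:
assert os.path.samefile(f1=path, f2=path)
```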
@@ -2534,6 +2539,56 @@ private module StdlibPrivate {
PathLibOpenCall() { attrbuteName = "open" }
}
/**
* A call to the `link_to`, `hardlink_to`, or `symlink_to` method on a `pathlib.Path` instance.
*
* See
* - https://docs.python.org/3/library/pathlib.html#pathlib.Path.link_to
* - https://docs.python.org/3/library/pathlib.html#pathlib.Path.hardlink_to
* - https://docs.python.org/3/library/pathlib.html#pathlib.Path.symlink_to
*/
private class PathLibLinkToCall extends PathlibFileAccess, API::CallNode {
PathLibLinkToCall() { attrbuteName in ["link_to", "hardlink_to", "symlink_to"] }
override DataFlow::Node getAPathArgument() {
result = super.getAPathArgument()
or
result = this.getParameter(0, "target").getARhs()
}
}
/**
* A call to the `replace` or `rename` method on a `pathlib.Path` instance.
*
* See
* - https://docs.python.org/3/library/pathlib.html#pathlib.Path.replace
* - https://docs.python.org/3/library/pathlib.html#pathlib.Path.rename
*/
private class PathLibReplaceCall extends PathlibFileAccess, API::CallNode {
PathLibReplaceCall() { attrbuteName in ["replace", "rename"] }
override DataFlow::Node getAPathArgument() {
result = super.getAPathArgument()
or
result = this.getParameter(0, "target").getARhs()
}
}
/**
* A call to the `samefile` method on a `pathlib.Path` instance.
*
* See https://docs.python.org/3/library/pathlib.html#pathlib.Path.samefile
*/
private class PathLibSameFileCall extends PathlibFileAccess, API::CallNode {
PathLibSameFileCall() { attrbuteName = "samefile" }
override DataFlow::Node getAPathArgument() {
result = super.getAPathArgument()
or
result = this.getParameter(0, "other_path").getARhs()
}
}
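The extra path arguments modeled by the three classes above can be exercised with a short sketch (temporary paths, for illustration only):

```python
import tempfile
from pathlib import Path

base = Path(tempfile.mkdtemp())
src = base / "src.txt"
src.write_text("data")

# symlink_to: the `target` parameter is the extra path argument being modeled.
link = base / "link.txt"
link.symlink_to(src)
assert link.read_text() == "data"

# rename/replace: the `target` parameter is likewise a path argument.
dst = base / "dst.txt"
src.rename(dst)
assert dst.read_text() == "data"

# samefile: the `other_path` parameter is a path argument.
assert dst.samefile(base / "dst.txt")
```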
/** An additional taint step for objects of type `pathlib.Path`. */
private class PathlibPathTaintStep extends TaintTracking::AdditionalTaintStep {
override predicate step(DataFlow::Node nodeFrom, DataFlow::Node nodeTo) {

View File

@@ -204,7 +204,7 @@ private module NotExposed {
FindSubclassesSpec spec, string newSubclassQualified, ClassExpr classExpr, Module mod,
Location loc
) {
classExpr = newOrExistingModeling(spec).getASubclass*().getAUse().asExpr() and
classExpr = newOrExistingModeling(spec).getASubclass*().getAnImmediateUse().asExpr() and
classExpr.getScope() = mod and
newSubclassQualified = mod.getName() + "." + classExpr.getName() and
loc = classExpr.getLocation() and

View File

@@ -136,6 +136,7 @@ class PackageObjectInternal extends ModuleObjectInternal, TPackageObject {
/** Gets the init module of this package */
PythonModuleObjectInternal getInitModule() { result = TPythonModule(this.getSourceModule()) }
/** Holds if the folder for this package has no init module. */
predicate hasNoInitModule() {
exists(Folder f |
f = this.getFolder() and

View File

@@ -2129,7 +2129,7 @@ module Conditionals {
/** INTERNAL: Do not use. */
predicate declaredAttributeVar(PythonClassObjectInternal cls, string name, EssaVariable var) {
name = var.getName() and
var.getAUse() = cls.getScope().getANormalExit()
pragma[only_bind_into](pragma[only_bind_into](var).getAUse()) = cls.getScope().getANormalExit()
}
cached

View File

@@ -75,7 +75,7 @@ private string canonical_name(API::Node flag) {
*/
private DataFlow::TypeTrackingNode re_flag_tracker(string flag_name, DataFlow::TypeTracker t) {
t.start() and
exists(API::Node flag | flag_name = canonical_name(flag) and result = flag.getAUse())
exists(API::Node flag | flag_name = canonical_name(flag) and result = flag.getAnImmediateUse())
or
exists(BinaryExprNode binop, DataFlow::Node operand |
operand.getALocalSource() = re_flag_tracker(flag_name, t.continue()) and

View File

@@ -3,7 +3,6 @@ import semmle.python.types.Exceptions
private import semmle.python.pointsto.PointsTo
private import semmle.python.objects.Callables
private import semmle.python.libraries.Zope
private import semmle.python.pointsto.Base
private import semmle.python.objects.ObjectInternal
private import semmle.python.types.Builtins

View File

@@ -1,5 +1,4 @@
import python
private import semmle.python.objects.ObjectAPI
private import semmle.python.objects.ObjectInternal
private import semmle.python.types.Builtins
private import semmle.python.internal.CachedStages

View File

@@ -2,7 +2,6 @@ import python
import semmle.python.dataflow.TaintTracking
import semmle.python.web.Http
import semmle.python.web.falcon.General
import semmle.python.security.strings.External
/** https://falcon.readthedocs.io/en/stable/api/request_and_response.html */
deprecated class FalconRequest extends TaintKind {

View File

@@ -2,7 +2,6 @@ import python
import semmle.python.dataflow.TaintTracking
import semmle.python.web.Http
import semmle.python.web.falcon.General
import semmle.python.security.strings.External
/** https://falcon.readthedocs.io/en/stable/api/request_and_response.html */
deprecated class FalconResponse extends TaintKind {

View File

@@ -3,7 +3,6 @@ import semmle.python.dataflow.TaintTracking
import semmle.python.security.strings.Basic
import semmle.python.web.Http
private import semmle.python.web.pyramid.View
private import semmle.python.web.Http
/**
* A pyramid response, which is vulnerable to any sort of

View File

@@ -1,3 +1,9 @@
## 0.1.0
## 0.0.13
## 0.0.12
## 0.0.11
### New Queries

View File

@@ -11,7 +11,6 @@
*/
import python
import semmle.python.SelfAttribute
import Equality
predicate class_stores_to_attribute(ClassValue cls, SelfAttributeStore store, string name) {

View File

@@ -11,7 +11,6 @@
*/
import python
import semmle.python.SelfAttribute
import ClassAttributes
predicate guarded_by_other_attribute(SelfAttributeRead a, CheckClass c) {

View File

@@ -11,7 +11,6 @@
*/
import python
import semmle.python.SelfAttribute
import ClassAttributes
predicate undefined_class_attribute(SelfAttributeRead a, CheckClass c, int line, string name) {

View File

@@ -10,7 +10,6 @@
* @id py/str-format/surplus-argument
*/
import python
import python
import AdvancedFormatting

View File

@@ -10,7 +10,6 @@
import python
import Lexical.CommentedOutCode
import python
from File f, int n
where n = count(CommentedOutCodeLine c | not c.maybeExampleCode() and c.getLocation().getFile() = f)

View File

@@ -10,36 +10,30 @@
*/
import python
import semmle.python.dataflow.new.DataFlow
import semmle.python.ApiGraphs
predicate squareOp(BinaryExpr e) {
e.getOp() instanceof Pow and e.getRight().(IntegerLiteral).getN() = "2"
}
predicate squareMul(BinaryExpr e) {
e.getOp() instanceof Mult and e.getRight().(Name).getId() = e.getLeft().(Name).getId()
}
predicate squareRef(Name e) {
e.isUse() and
exists(SsaVariable v, Expr s | v.getVariable() = e.getVariable() |
s = v.getDefinition().getNode().getParentNode().(AssignStmt).getValue() and
square(s)
DataFlow::ExprNode squareOp() {
exists(BinaryExpr e | e = result.asExpr() |
e.getOp() instanceof Pow and e.getRight().(IntegerLiteral).getN() = "2"
)
}
predicate square(Expr e) {
squareOp(e)
or
squareMul(e)
or
squareRef(e)
DataFlow::ExprNode squareMul() {
exists(BinaryExpr e | e = result.asExpr() |
e.getOp() instanceof Mult and e.getRight().(Name).getId() = e.getLeft().(Name).getId()
)
}
from Call c, BinaryExpr s
DataFlow::ExprNode square() { result in [squareOp(), squareMul()] }
from DataFlow::CallCfgNode c, BinaryExpr s, DataFlow::ExprNode left, DataFlow::ExprNode right
where
c.getFunc().toString() = "sqrt" and
c.getArg(0) = s and
c = API::moduleImport("math").getMember("sqrt").getACall() and
c.getArg(0).asExpr() = s and
s.getOp() instanceof Add and
square(s.getLeft()) and
square(s.getRight())
left.asExpr() = s.getLeft() and
right.asExpr() = s.getRight() and
left.getALocalSource() = square() and
right.getALocalSource() = square()
select c, "Pythagorean calculation with sub-optimal numerics"
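The pattern this query flags, and the numerically robust alternative, can be sketched as follows (not part of the query itself):

```python
import math

x, y = 3.0, 4.0

# Flagged: sqrt of a sum of squares; the intermediate squares can
# overflow or underflow for extreme inputs.
naive = math.sqrt(x**2 + y**2)

# Preferred: math.hypot computes the same value without the fragile intermediate.
robust = math.hypot(x, y)
assert naive == robust == 5.0

# For large magnitudes the naive squares would overflow, but hypot still works:
big = 1e200
assert math.isclose(math.hypot(big, big), big * math.sqrt(2))
```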

View File

@@ -1,7 +1,6 @@
/** Contains predicates concerning when and where files are opened and closed. */
import python
import semmle.python.GuardedControlFlow
import semmle.python.pointsto.Filters
/** Holds if `open` is a call that returns a newly opened file */

View File

@@ -27,9 +27,9 @@ private DataFlow::TypeTrackingNode truthyLiteral(DataFlow::TypeTracker t) {
/** Gets a reference to a truthy literal. */
DataFlow::Node truthyLiteral() { truthyLiteral(DataFlow::TypeTracker::end()).flowsTo(result) }
from DataFlow::CallCfgNode call, DataFlow::Node debugArg
from API::CallNode call, DataFlow::Node debugArg
where
call.getFunction() = Flask::FlaskApp::instance().getMember("run").getAUse() and
call = Flask::FlaskApp::instance().getMember("run").getACall() and
debugArg in [call.getArg(2), call.getArgByName("debug")] and
debugArg = truthyLiteral()
select call,

View File

@@ -21,7 +21,7 @@ Ensure that all required modules and packages can be found when running the extr
</recommendation>
<references>
<li>Semmle Tutorial: <a href="https://help.semmle.com/codeql/codeql-cli/procedures/create-codeql-database.html">Creating a CodeQL database</a>.</li>
<li>CodeQL Tutorial: <a href="https://codeql.github.com/docs/codeql-cli/creating-codeql-databases">Creating CodeQL databases</a>.</li>
</references>

View File

@@ -0,0 +1 @@
## 0.0.12

View File

@@ -0,0 +1 @@
## 0.0.13

View File

@@ -0,0 +1 @@
## 0.1.0

View File

@@ -1,2 +1,2 @@
---
lastReleaseVersion: 0.0.11
lastReleaseVersion: 0.1.0

View File

@@ -0,0 +1,56 @@
<!DOCTYPE qhelp PUBLIC
"-//Semmle//qhelp//EN"
"qhelp.dtd">
<qhelp>
<overview>
<p>Extracting files from a malicious zip archive without validating that the destination file path
is within the destination directory can cause files outside the destination directory to be
overwritten, due to the possible presence of directory traversal elements (<code>..</code>) in
archive paths.</p>
<p>Zip archives contain archive entries representing each file in the archive. These entries
include a file path for the entry, but these file paths are not restricted and may contain
unexpected special elements such as the directory traversal element (<code>..</code>). If these
file paths are used to determine an output file to write the contents of the archive item to, then
the file may be written to an unexpected location. This can result in sensitive information being
revealed or deleted, or an attacker being able to influence behavior by modifying unexpected
files.</p>
<p>For example, if a Zip archive contains a file entry <code>..\sneaky-file</code>, and the Zip archive
is extracted to the directory <code>c:\output</code>, then naively combining the paths would result
in an output file path of <code>c:\output\..\sneaky-file</code>, which would cause the file to be
written to <code>c:\sneaky-file</code>.</p>
</overview>
<recommendation>
<p>Ensure that output paths constructed from Zip archive entries are validated
to prevent writing files to unexpected locations.</p>
<p>The recommended way of writing an output file from a Zip archive entry is to call <code>extract()</code> or <code>extractall()</code>.
</p>
</recommendation>
<example>
<p>
In this example an archive is extracted without validating file paths.
</p>
<sample src="zipslip_bad.py" />
<p>To fix this vulnerability, we need to call the function <code>extractall()</code>.
</p>
<sample src="zipslip_good.py" />
</example>
<references>
<li>
Snyk:
<a href="https://snyk.io/research/zip-slip-vulnerability">Zip Slip Vulnerability</a>.
</li>
</references>
</qhelp>

View File

@@ -0,0 +1,22 @@
/**
* @name Arbitrary file write during archive extraction ("Zip Slip")
* @description Extracting files from a malicious archive without validating that the
* destination file path is within the destination directory can cause files outside
* the destination directory to be overwritten.
* @kind path-problem
* @id py/zipslip
* @problem.severity error
* @security-severity 7.5
* @precision high
* @tags security
* external/cwe/cwe-022
*/
import python
import experimental.semmle.python.security.ZipSlip
import DataFlow::PathGraph
from ZipSlipConfig config, DataFlow::PathNode source, DataFlow::PathNode sink
where config.hasFlowPath(source, sink)
select sink.getNode(), source, sink, "Extraction of an archive from $@", source.getNode(),
"a potentially untrusted source"

View File

@@ -0,0 +1,16 @@
import tarfile
import zipfile
import shutil
def unzip(filename):
    with tarfile.open(filename) as tf:
        # BAD: entry names may contain "..", so this could write any file on the filesystem.
        for entry in tf:
            shutil.copyfile(entry.name, "/tmp/unpack/" + entry.name)
def unzip4(filename):
    zf = zipfile.ZipFile(filename)
    filelist = zf.namelist()
    for x in filelist:
        # BAD: `x` is attacker-controlled and may escape the destination directory.
        with zf.open(x) as srcf, open(x, "wb") as dstfile:
            shutil.copyfileobj(srcf, dstfile)

View File

@@ -0,0 +1,10 @@
import zipfile
def unzip(filename, dir):
zf = zipfile.ZipFile(filename)
zf.extractall(dir)
def unzip1(filename, member, dir):
    zf = zipfile.ZipFile(filename)
    zf.extract(member, dir)

View File

@@ -1,8 +1,8 @@
private import python
private import semmle.python.Concepts
private import semmle.python.ApiGraphs
private import semmle.python.dataflow.new.DataFlow
private import semmle.python.dataflow.new.RemoteFlowSources
private import semmle.python.dataflow.new.DataFlow
/**
* A data flow source of the client ip obtained according to the remote endpoint identifier specified

View File

@@ -14,6 +14,73 @@ private import semmle.python.dataflow.new.RemoteFlowSources
private import semmle.python.dataflow.new.TaintTracking
private import experimental.semmle.python.Frameworks
/** Provides classes for modeling file-copying APIs. */
module CopyFile {
/**
 * A data flow node for copying a file.
*
* Extend this class to model new APIs. If you want to refine existing API models,
* extend `CopyFile` instead.
*/
abstract class Range extends DataFlow::Node {
/**
* Gets the argument containing the path.
*/
abstract DataFlow::Node getAPathArgument();
/**
* Gets fsrc argument.
*/
abstract DataFlow::Node getfsrcArgument();
}
}
/**
 * A data flow node for copying a file.
*
* Extend this class to refine existing API models. If you want to model new APIs,
* extend `CopyFile::Range` instead.
*/
class CopyFile extends DataFlow::Node {
CopyFile::Range range;
CopyFile() { this = range }
DataFlow::Node getAPathArgument() { result = range.getAPathArgument() }
DataFlow::Node getfsrcArgument() { result = range.getfsrcArgument() }
}
/** Provides classes for modeling log related APIs. */
module LogOutput {
/**
* A data flow node for log output.
*
* Extend this class to model new APIs. If you want to refine existing API models,
* extend `LogOutput` instead.
*/
abstract class Range extends DataFlow::Node {
/**
 * Gets the value passed to the log output function.
*/
abstract DataFlow::Node getAnInput();
}
}
/**
* A data flow node for log output.
*
* Extend this class to refine existing API models. If you want to model new APIs,
* extend `LogOutput::Range` instead.
*/
class LogOutput extends DataFlow::Node {
LogOutput::Range range;
LogOutput() { this = range }
DataFlow::Node getAnInput() { result = range.getAnInput() }
}
/**
 * Since there is an XML module in both the normal and the experimental Concepts,
 * we have to rename the experimental module like this.

View File

@@ -14,3 +14,4 @@ private import experimental.semmle.python.libraries.PyJWT
private import experimental.semmle.python.libraries.Python_JWT
private import experimental.semmle.python.libraries.Authlib
private import experimental.semmle.python.libraries.PythonJose
private import experimental.semmle.python.frameworks.CopyFile

View File

@@ -0,0 +1,42 @@
private import python
private import experimental.semmle.python.Concepts
private import semmle.python.dataflow.new.DataFlow
private import semmle.python.ApiGraphs
private module CopyFileImpl {
/**
* The `shutil` module provides methods to copy or move files.
* See:
* - https://docs.python.org/3/library/shutil.html#shutil.copyfile
* - https://docs.python.org/3/library/shutil.html#shutil.copy
* - https://docs.python.org/3/library/shutil.html#shutil.copy2
* - https://docs.python.org/3/library/shutil.html#shutil.copytree
* - https://docs.python.org/3/library/shutil.html#shutil.move
*/
private class CopyFiles extends DataFlow::CallCfgNode, CopyFile::Range {
CopyFiles() {
this =
API::moduleImport("shutil")
.getMember(["copyfile", "copy", "copy2", "copytree", "move"])
.getACall()
}
override DataFlow::Node getAPathArgument() {
result in [this.getArg(0), this.getArgByName("src")]
}
override DataFlow::Node getfsrcArgument() { none() }
}
// TODO: once we have flow summaries, model `shutil.copyfileobj`, which copies the content between its file-like arguments.
// See https://docs.python.org/3/library/shutil.html#shutil.copyfileobj
private class CopyFileobj extends DataFlow::CallCfgNode, CopyFile::Range {
CopyFileobj() { this = API::moduleImport("shutil").getMember("copyfileobj").getACall() }
override DataFlow::Node getfsrcArgument() {
result in [this.getArg(0), this.getArgByName("fsrc")]
}
override DataFlow::Node getAPathArgument() { none() }
}
}
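As a sketch of the two call shapes being modeled (path arguments vs. file objects; temporary files for illustration):

```python
import shutil
import tempfile

src = tempfile.NamedTemporaryFile(delete=False)
src.write(b"payload")
src.close()
dst = src.name + ".copy"

# shutil.copyfile: the first positional / `src` keyword is the path argument.
shutil.copyfile(src.name, dst)

# shutil.copyfileobj: works on file objects, so only the `fsrc` argument is modeled.
with open(src.name, "rb") as fsrc, open(dst, "wb") as fdst:
    shutil.copyfileobj(fsrc, fdst)

with open(dst, "rb") as f:
    assert f.read() == b"payload"
```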

View File

@@ -27,17 +27,8 @@ module ExperimentalFlask {
}
/** Gets a reference to a header instance. */
private DataFlow::LocalSourceNode headerInstance(DataFlow::TypeTracker t) {
t.start() and
result.(DataFlow::AttrRead).getObject().getALocalSource() =
[Flask::Response::classRef(), flaskMakeResponse()].getReturn().getAUse()
or
exists(DataFlow::TypeTracker t2 | result = headerInstance(t2).track(t2, t))
}
/** Gets a reference to a header instance use. */
private DataFlow::Node headerInstance() {
headerInstance(DataFlow::TypeTracker::end()).flowsTo(result)
private DataFlow::LocalSourceNode headerInstance() {
result = [Flask::Response::classRef(), flaskMakeResponse()].getReturn().getAMember().getAUse()
}
/** Gets a reference to a header instance call/subscript */

View File

@@ -119,7 +119,7 @@ private module SaxBasedParsing {
*
* See https://docs.python.org/3.10/library/xml.sax.reader.html#xml.sax.xmlreader.XMLReader.setFeature
*/
class SaxParserSetFeatureCall extends DataFlow::MethodCallNode {
class SaxParserSetFeatureCall extends API::CallNode, DataFlow::MethodCallNode {
SaxParserSetFeatureCall() {
this =
API::moduleImport("xml")
@@ -132,27 +132,9 @@ private module SaxBasedParsing {
// The keyword argument names do not match the documentation. I checked (with Python
// 3.9.5) that the names used here actually work.
DataFlow::Node getFeatureArg() { result in [this.getArg(0), this.getArgByName("name")] }
API::Node getFeatureArg() { result = this.getParameter(0, "name") }
DataFlow::Node getStateArg() { result in [this.getArg(1), this.getArgByName("state")] }
}
/** Gets a back-reference to the `setFeature` state argument `arg`. */
private DataFlow::TypeTrackingNode saxParserSetFeatureStateArgBacktracker(
DataFlow::TypeBackTracker t, DataFlow::Node arg
) {
t.start() and
arg = any(SaxParserSetFeatureCall c).getStateArg() and
result = arg.getALocalSource()
or
exists(DataFlow::TypeBackTracker t2 |
result = saxParserSetFeatureStateArgBacktracker(t2, arg).backtrack(t2, t)
)
}
/** Gets a back-reference to the `setFeature` state argument `arg`. */
DataFlow::LocalSourceNode saxParserSetFeatureStateArgBacktracker(DataFlow::Node arg) {
result = saxParserSetFeatureStateArgBacktracker(DataFlow::TypeBackTracker::end(), arg)
API::Node getStateArg() { result = this.getParameter(1, "state") }
}
/**
@@ -163,16 +145,13 @@ private module SaxBasedParsing {
private DataFlow::Node saxParserWithFeatureExternalGesTurnedOn(DataFlow::TypeTracker t) {
t.start() and
exists(SaxParserSetFeatureCall call |
call.getFeatureArg() =
call.getFeatureArg().getARhs() =
API::moduleImport("xml")
.getMember("sax")
.getMember("handler")
.getMember("feature_external_ges")
.getAUse() and
saxParserSetFeatureStateArgBacktracker(call.getStateArg())
.asExpr()
.(BooleanLiteral)
.booleanValue() = true and
call.getStateArg().getAValueReachingRhs().asExpr().(BooleanLiteral).booleanValue() = true and
result = call.getObject()
)
or
@@ -182,16 +161,13 @@ private module SaxBasedParsing {
// take account of that we can set the feature to False, which makes the parser safe again
not exists(SaxParserSetFeatureCall call |
call.getObject() = result and
call.getFeatureArg() =
call.getFeatureArg().getARhs() =
API::moduleImport("xml")
.getMember("sax")
.getMember("handler")
.getMember("feature_external_ges")
.getAUse() and
saxParserSetFeatureStateArgBacktracker(call.getStateArg())
.asExpr()
.(BooleanLiteral)
.booleanValue() = false
call.getStateArg().getAValueReachingRhs().asExpr().(BooleanLiteral).booleanValue() = false
)
}
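The unsafe parser state tracked above, and the `False` reset that the `not exists(...)` clause accounts for, look like this in user code (a sketch):

```python
import xml.sax
import xml.sax.handler

parser = xml.sax.make_parser()

# Turning external general entities on makes the parser vulnerable to XXE;
# this is the state the predicate detects.
parser.setFeature(xml.sax.handler.feature_external_ges, True)
assert parser.getFeature(xml.sax.handler.feature_external_ges)

# Setting the feature back to False makes the parser safe again.
parser.setFeature(xml.sax.handler.feature_external_ges, False)
assert not parser.getFeature(xml.sax.handler.feature_external_ges)
```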

View File

@@ -0,0 +1,39 @@
import python
import experimental.semmle.python.Concepts
import semmle.python.dataflow.new.DataFlow
import semmle.python.ApiGraphs
import semmle.python.dataflow.new.TaintTracking
class ZipSlipConfig extends TaintTracking::Configuration {
ZipSlipConfig() { this = "ZipSlipConfig" }
override predicate isSource(DataFlow::Node source) {
(
source =
API::moduleImport("zipfile").getMember("ZipFile").getReturn().getMember("open").getACall() or
source =
API::moduleImport("zipfile")
.getMember("ZipFile")
.getReturn()
.getMember("namelist")
.getACall() or
source = API::moduleImport("tarfile").getMember("open").getACall() or
source = API::moduleImport("tarfile").getMember("TarFile").getACall() or
source = API::moduleImport("bz2").getMember("open").getACall() or
source = API::moduleImport("bz2").getMember("BZ2File").getACall() or
source = API::moduleImport("gzip").getMember("GzipFile").getACall() or
source = API::moduleImport("gzip").getMember("open").getACall() or
source = API::moduleImport("lzma").getMember("open").getACall() or
source = API::moduleImport("lzma").getMember("LZMAFile").getACall()
) and
not source.getScope().getLocation().getFile().inStdlib()
}
override predicate isSink(DataFlow::Node sink) {
(
sink = any(CopyFile copyfile).getAPathArgument() or
sink = any(CopyFile copyfile).getfsrcArgument()
) and
not sink.getScope().getLocation().getFile().inStdlib()
}
}
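This configuration tracks entries read from the listed archive APIs (`zipfile`, `tarfile`, `bz2`, `gzip`, `lzma`) into file-copy sinks. The underlying zip-slip issue is that a crafted member name such as `../../etc/passwd` can escape the extraction directory. A hypothetical mitigation (not part of the query or this commit) would canonicalize the destination and verify it stays under the extraction root:

```python
import os

def is_safe_member(base_dir, member_name):
    """Reject archive member names that would escape base_dir (zip-slip)."""
    # Resolve the would-be destination, then check it is still under base_dir.
    target = os.path.realpath(os.path.join(base_dir, member_name))
    base = os.path.realpath(base_dir)
    return os.path.commonpath([base, target]) == base

# A traversal entry escapes the extraction directory and must be rejected:
print(is_safe_member("/tmp/unpack", "../etc/passwd"))   # prints False
print(is_safe_member("/tmp/unpack", "docs/readme.txt")) # prints True
```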

View File

@@ -1,5 +1,5 @@
name: codeql/python-queries
version: 0.0.12-dev
version: 0.1.1-dev
groups:
- python
- queries

View File

@@ -5,6 +5,9 @@
* there is a def/use feature reachable from the root along the given path, and its
* associated data-flow node must start on the same line as the comment.
*
* We also support negative assertions of the form `MISSING: def <path>` or `MISSING: use <path>`, which assert
* that there _isn't_ a node with the given path on the same line.
*
* The query only produces output for failed assertions, meaning that it should have no output
* under normal circumstances.
*

View File

@@ -5,7 +5,7 @@ import semmle.python.ApiGraphs
private DataFlow::TypeTrackingNode module_tracker(TypeTracker t) {
t.start() and
result = API::moduleImport("module").getAUse()
result = API::moduleImport("module").getAnImmediateUse()
or
exists(TypeTracker t2 | result = module_tracker(t2).track(t2, t))
}

View File

@@ -120,7 +120,7 @@ class TrackedSelfTest extends InlineExpectationsTest {
/** Gets a reference to `foo` (fictive module). */
private DataFlow::TypeTrackingNode foo(DataFlow::TypeTracker t) {
t.start() and
result = API::moduleImport("foo").getAUse()
result = API::moduleImport("foo").getAnImmediateUse()
or
exists(DataFlow::TypeTracker t2 | result = foo(t2).track(t2, t))
}
@@ -131,7 +131,7 @@ DataFlow::Node foo() { foo(DataFlow::TypeTracker::end()).flowsTo(result) }
/** Gets a reference to `foo.bar` (fictive module). */
private DataFlow::TypeTrackingNode foo_bar(DataFlow::TypeTracker t) {
t.start() and
result = API::moduleImport("foo.bar").getAUse()
result = API::moduleImport("foo.bar").getAnImmediateUse()
or
t.startInAttr("bar") and
result = foo()
@@ -145,7 +145,7 @@ DataFlow::Node foo_bar() { foo_bar(DataFlow::TypeTracker::end()).flowsTo(result)
/** Gets a reference to `foo.bar.baz` (fictive attribute on `foo.bar` module). */
private DataFlow::TypeTrackingNode foo_bar_baz(DataFlow::TypeTracker t) {
t.start() and
result = API::moduleImport("foo.bar.baz").getAUse()
result = API::moduleImport("foo.bar.baz").getAnImmediateUse()
or
t.startInAttr("baz") and
result = foo_bar()

View File

@@ -0,0 +1,34 @@
edges
| zipslip_bad.py:8:10:8:31 | ControlFlowNode for Attribute() | zipslip_bad.py:10:13:10:17 | SSA variable entry |
| zipslip_bad.py:10:13:10:17 | SSA variable entry | zipslip_bad.py:11:25:11:29 | ControlFlowNode for entry |
| zipslip_bad.py:14:10:14:28 | ControlFlowNode for Attribute() | zipslip_bad.py:16:13:16:17 | SSA variable entry |
| zipslip_bad.py:16:13:16:17 | SSA variable entry | zipslip_bad.py:17:26:17:30 | ControlFlowNode for entry |
| zipslip_bad.py:20:10:20:27 | ControlFlowNode for Attribute() | zipslip_bad.py:22:13:22:17 | SSA variable entry |
| zipslip_bad.py:22:13:22:17 | SSA variable entry | zipslip_bad.py:23:29:23:33 | ControlFlowNode for entry |
| zipslip_bad.py:27:10:27:22 | ControlFlowNode for Attribute() | zipslip_bad.py:29:13:29:13 | SSA variable x |
| zipslip_bad.py:29:13:29:13 | SSA variable x | zipslip_bad.py:30:25:30:25 | ControlFlowNode for x |
| zipslip_bad.py:34:16:34:28 | ControlFlowNode for Attribute() | zipslip_bad.py:35:9:35:9 | SSA variable x |
| zipslip_bad.py:35:9:35:9 | SSA variable x | zipslip_bad.py:37:32:37:32 | ControlFlowNode for x |
nodes
| zipslip_bad.py:8:10:8:31 | ControlFlowNode for Attribute() | semmle.label | ControlFlowNode for Attribute() |
| zipslip_bad.py:10:13:10:17 | SSA variable entry | semmle.label | SSA variable entry |
| zipslip_bad.py:11:25:11:29 | ControlFlowNode for entry | semmle.label | ControlFlowNode for entry |
| zipslip_bad.py:14:10:14:28 | ControlFlowNode for Attribute() | semmle.label | ControlFlowNode for Attribute() |
| zipslip_bad.py:16:13:16:17 | SSA variable entry | semmle.label | SSA variable entry |
| zipslip_bad.py:17:26:17:30 | ControlFlowNode for entry | semmle.label | ControlFlowNode for entry |
| zipslip_bad.py:20:10:20:27 | ControlFlowNode for Attribute() | semmle.label | ControlFlowNode for Attribute() |
| zipslip_bad.py:22:13:22:17 | SSA variable entry | semmle.label | SSA variable entry |
| zipslip_bad.py:23:29:23:33 | ControlFlowNode for entry | semmle.label | ControlFlowNode for entry |
| zipslip_bad.py:27:10:27:22 | ControlFlowNode for Attribute() | semmle.label | ControlFlowNode for Attribute() |
| zipslip_bad.py:29:13:29:13 | SSA variable x | semmle.label | SSA variable x |
| zipslip_bad.py:30:25:30:25 | ControlFlowNode for x | semmle.label | ControlFlowNode for x |
| zipslip_bad.py:34:16:34:28 | ControlFlowNode for Attribute() | semmle.label | ControlFlowNode for Attribute() |
| zipslip_bad.py:35:9:35:9 | SSA variable x | semmle.label | SSA variable x |
| zipslip_bad.py:37:32:37:32 | ControlFlowNode for x | semmle.label | ControlFlowNode for x |
subpaths
#select
| zipslip_bad.py:11:25:11:29 | ControlFlowNode for entry | zipslip_bad.py:8:10:8:31 | ControlFlowNode for Attribute() | zipslip_bad.py:11:25:11:29 | ControlFlowNode for entry | Extraction of zipfile from $@ | zipslip_bad.py:8:10:8:31 | ControlFlowNode for Attribute() | a potentially untrusted source |
| zipslip_bad.py:17:26:17:30 | ControlFlowNode for entry | zipslip_bad.py:14:10:14:28 | ControlFlowNode for Attribute() | zipslip_bad.py:17:26:17:30 | ControlFlowNode for entry | Extraction of zipfile from $@ | zipslip_bad.py:14:10:14:28 | ControlFlowNode for Attribute() | a potentially untrusted source |
| zipslip_bad.py:23:29:23:33 | ControlFlowNode for entry | zipslip_bad.py:20:10:20:27 | ControlFlowNode for Attribute() | zipslip_bad.py:23:29:23:33 | ControlFlowNode for entry | Extraction of zipfile from $@ | zipslip_bad.py:20:10:20:27 | ControlFlowNode for Attribute() | a potentially untrusted source |
| zipslip_bad.py:30:25:30:25 | ControlFlowNode for x | zipslip_bad.py:27:10:27:22 | ControlFlowNode for Attribute() | zipslip_bad.py:30:25:30:25 | ControlFlowNode for x | Extraction of zipfile from $@ | zipslip_bad.py:27:10:27:22 | ControlFlowNode for Attribute() | a potentially untrusted source |
| zipslip_bad.py:37:32:37:32 | ControlFlowNode for x | zipslip_bad.py:34:16:34:28 | ControlFlowNode for Attribute() | zipslip_bad.py:37:32:37:32 | ControlFlowNode for x | Extraction of zipfile from $@ | zipslip_bad.py:34:16:34:28 | ControlFlowNode for Attribute() | a potentially untrusted source |

View File

@@ -0,0 +1 @@
experimental/Security/CWE-022/ZipSlip.ql

View File

@@ -0,0 +1,39 @@
import tarfile
import shutil
import bz2
import gzip
import zipfile
def unzip(filename):
with tarfile.open(filename) as zipf:
#BAD : This could write any file on the filesystem.
for entry in zipf:
shutil.move(entry, "/tmp/unpack/")
def unzip1(filename):
with gzip.open(filename) as zipf:
#BAD : This could write any file on the filesystem.
for entry in zipf:
shutil.copy2(entry, "/tmp/unpack/")
def unzip2(filename):
with bz2.open(filename) as zipf:
#BAD : This could write any file on the filesystem.
for entry in zipf:
shutil.copyfile(entry, "/tmp/unpack/")
def unzip3(filename):
zf = zipfile.ZipFile(filename)
with zf.namelist() as filelist:
#BAD : This could write any file on the filesystem.
for x in filelist:
shutil.copy(x, "/tmp/unpack/")
def unzip4(filename):
zf = zipfile.ZipFile(filename)
filelist = zf.namelist()
for x in filelist:
with zf.open(x) as srcf:
shutil.copyfileobj(x, "/tmp/unpack/")
import tty # to set the import root so we can identify the standard library

View File

@@ -0,0 +1,14 @@
import zipfile
import tarfile
import shutil
def unzip(filename, dir):
zf = zipfile.ZipFile(filename)
zf.extractall(dir)
def unzip1(filename, dir):
zf = zipfile.ZipFile(filename)
zf.extract(dir)

View File

@@ -1,5 +1,4 @@
import python
import semmle.python.objects.ObjectAPI
from int line, ControlFlowNode f, Value v
where

View File

@@ -1,5 +1,4 @@
import python
import python
import semmle.python.pointsto.PointsTo
import semmle.python.pointsto.PointsToContext
import semmle.python.objects.ObjectInternal

View File

@@ -1,5 +1,4 @@
import python
import semmle.python.pointsto.Base
from ClassObject cls, string name
where class_declares_attribute(cls, name)

View File

@@ -1,5 +1,4 @@
import python
import semmle.python.types.Descriptors
import Util
from ClassMethodObject cm, CallNode call

View File

@@ -1,7 +1,6 @@
import python
import Util
import semmle.python.pointsto.PointsTo
import semmle.python.objects.ObjectInternal
/* This test should return _no_ results. */
predicate relevant_node(ControlFlowNode n) {

View File

@@ -1,7 +1,6 @@
import python
import Util
import semmle.python.pointsto.PointsTo
import semmle.python.objects.ObjectInternal
from ControlFlowNode f, ControlFlowNode x
where PointsTo::pointsTo(f, _, ObjectInternal::unknown(), x)

View File

@@ -1,5 +1,4 @@
import python
import semmle.python.SelfAttribute
from SelfAttributeRead sa, int line, string g, string l
where

View File

@@ -1,5 +1,4 @@
import python
import semmle.python.types.Descriptors
int lineof(Object o) { result = o.getOrigin().getLocation().getStartLine() }

View File

@@ -1,5 +1,4 @@
import python
import semmle.python.types.Descriptors
from PropertyValue p, string method_name, FunctionValue method
where

View File

@@ -21,3 +21,15 @@ o(name) # $ getAPathArgument=name
wb = p.write_bytes
wb(b"hello") # $ getAPathArgument=p fileWriteData=b"hello"
p.link_to("target") # $ getAPathArgument=p getAPathArgument="target"
p.link_to(target="target") # $ getAPathArgument=p getAPathArgument="target"
p.samefile("other_path") # $ getAPathArgument=p getAPathArgument="other_path"
p.samefile(other_path="other_path") # $ getAPathArgument=p getAPathArgument="other_path"
p.rename("target") # $ getAPathArgument=p getAPathArgument="target"
p.rename(target="target") # $ getAPathArgument=p getAPathArgument="target"
p.replace("target") # $ getAPathArgument=p getAPathArgument="target"
p.replace(target="target") # $ getAPathArgument=p getAPathArgument="target"
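The new test lines exercise each path argument both positionally and by keyword. As a quick illustration of one of these APIs, `pathlib.Path.replace` moves a file to the target path, so both the receiver and the target are modeled as path arguments (standard-library behavior; sketch only, directory names are arbitrary):

```python
import pathlib
import tempfile

# Work in a throwaway directory.
d = pathlib.Path(tempfile.mkdtemp())
src = d / "a.txt"
src.write_text("hello")

# replace() moves the file; the receiver path `src` and the target path are
# both path arguments, which is why the model records getAPathArgument twice.
dst = d / "b.txt"
src.replace(target=dst)
```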

View File

@@ -48,6 +48,9 @@ os.path.islink(path="path") # $ getAPathArgument="path"
os.path.ismount("path") # $ getAPathArgument="path"
os.path.ismount(path="path") # $ getAPathArgument="path"
os.path.samefile("f1", "f2") # $ getAPathArgument="f1" getAPathArgument="f2"
os.path.samefile(f1="f1", f2="f2") # $ getAPathArgument="f1" getAPathArgument="f2"
# actual os.path implementations
import posixpath
import ntpath
@@ -269,4 +272,4 @@ shutil.copystat("src", "dst") # $ getAPathArgument="src" getAPathArgument="dst"
shutil.copystat(src="src", dst="dst") # $ getAPathArgument="src" getAPathArgument="dst"
shutil.disk_usage("path") # $ getAPathArgument="path"
shutil.disk_usage(path="path") # $ getAPathArgument="path"
shutil.disk_usage(path="path") # $ getAPathArgument="path"

View File

@@ -1,4 +1,4 @@
WARNING: Type CommandSink has been deprecated and may be removed in future (CommandSinks.ql:5,6-17)
WARNING: Type CommandSink has been deprecated and may be removed in future (CommandSinks.ql:4,6-17)
| fabric_v1_test.py:8:7:8:28 | FabricV1Commands | externally controlled string |
| fabric_v1_test.py:9:5:9:27 | FabricV1Commands | externally controlled string |
| fabric_v1_test.py:10:6:10:38 | FabricV1Commands | externally controlled string |

View File

@@ -1,6 +1,5 @@
import python
import semmle.python.security.injection.Command
import semmle.python.security.strings.Untrusted
from CommandSink sink, TaintKind kind
where sink.sinks(kind)

View File

@@ -2,7 +2,6 @@ import python
import semmle.python.security.injection.Sql
import semmle.python.web.django.Db
import semmle.python.web.django.Model
import semmle.python.security.strings.Untrusted
from SqlInjectionSink sink, TaintKind kind
where sink.sinks(kind)

Some files were not shown because too many files have changed in this diff.