mirror of
https://github.com/github/codeql.git
synced 2026-04-30 11:15:13 +02:00
Python: Taint tracking setup à la Go
## TaintFlow sources

The class `RemoteFlowSource` is defined very similarly to the other languages: [C++](ac22e7950c/cpp/ql/src/semmle/code/cpp/security/FlowSources.qll), [Java](6de612a566/java/ql/src/semmle/code/java/dataflow/FlowSources.qll), [C#](fddbce0b7b/csharp/ql/src/semmle/code/csharp/security/dataflow/flowsources/Remote.qll), [JS](78334af354/javascript/ql/src/semmle/javascript/security/dataflow/RemoteFlowSources.qll), and [Go](24b3133e0c/ql/src/semmle/go/security/FlowSources.qll).

There are some minor differences:

- Java/C++ define the class in `FlowSources.qll`
- C# uses `csharp/ql/src/semmle/code/csharp/security/dataflow/flowsources/Remote.qll`, and provides `StoredFlowSource` and `LocalFlowSource` in separate classes.
- JS uses `RemoteFlowSources.qll`.
- JS defines an additional predicate `RemoteFlowSource.isUserControlledObject`
- Go uses the class name `UntrustedFlowSource`, but it is still defined in `ql/src/semmle/go/security/FlowSources.qll`
- Go uses the `::Range` pattern to allow both extensibility and refinement

The big difference is how a `RemoteFlowSource` is specified:

- Java and C# have all subclasses of `RemoteFlowSource` defined in the same file
- Go and JS define subclasses for frameworks in the actual framework `.qll` file, and all frameworks are transitively imported by `import go` or `import javascript` (so subclasses are always in scope).
- C++ uses class `RemoteFlowFunction` to do all the heavy lifting (and its subclasses are transitively imported).

### What we will do

Use file `RemoteFlowSources.qll`, and define subclasses in the framework libraries.

_Why? Personally I really like it, Go/JS is already doing it, and Tom expressed a preference for doing the same for C# (although that is not what they are doing today)._

Jonas gave this advice:

> Whether you split the definitions between multiple files or keep them all in one file, the property you want is that all definitions are included when the abstract class is included. Otherwise you can get unexpected results via transitive includes.

We will import all frameworks in the same file that defines `RemoteFlowSource`, as it seems to be the least intrusive change. If that turns out to be a problem, we can also move the imports to `python.qll` (the other way is not so easy).

## TaintFlow sinks

[JS](473787a426/javascript/ql/src/semmle/javascript/Concepts.qll) and [Go](ecff1e6a16/ql/src/semmle/go/Concepts.qll) define abstract base classes for interesting sinks in `Concepts.qll` (and in Go all of them use the `::Range` pattern). I really like this idea, since it allows multiple queries to reuse the same sink definitions, and it makes it _easy_ to discover what default sinks are available.

Personally I'm not 100% on board with the naming, but I don't have any good reason to change the naming convention.

## Framework modeling

Following the model from Go ([example](https://github.com/github/codeql-go/blob/main/ql/src/semmle/go/frameworks/Gin.qll)), I propose that we make every definition in a framework model `private`. This allows greater flexibility in changing our modeling, since we don't need to think about keeping deprecated versions around for a whole year.

It _does_ have the downside that someone writing a query can't reuse the classes/predicates for a framework, but that didn't seem to be too big of a concern. If we need to provide access, we can always make the definitions non-private (the other way is not so easy).

## Customizations

Also introduced `Customizations.qll` like in JS/Java/Go (to replace `site.qll`)
20
python/ql/src/experimental/Customizations.qll
Normal file
@@ -0,0 +1,20 @@
/**
 * Contains customizations to the standard library.
 *
 * This module is imported by `python.qll`, so any customizations defined here automatically
 * apply to all queries.
 *
 * Typical examples of customizations include adding new subclasses of abstract classes such as
 * the `RemoteFlowSource::Range` and `AdditionalTaintStep` classes associated with the security
 * queries to model frameworks that are not covered by the standard library.
 */

import python
/* General import that is useful */
// import experimental.dataflow.DataFlow
//
/* for extending `TaintTracking::AdditionalTaintStep` */
// import experimental.dataflow.TaintTracking
//
/* for extending `RemoteFlowSource::Range` */
// import experimental.dataflow.RemoteFlowSources
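As a rough sketch of the kind of customization the header comment of `Customizations.qll` describes, the snippet below adds a new subclass of `RemoteFlowSource::Range`. The function name `read_untrusted_input` and the class name are hypothetical, and the characteristic predicate assumes that `DataFlow::Node` in the experimental library exposes its control-flow node through `asCfgNode()`.

```ql
import python
import experimental.dataflow.DataFlow
import experimental.dataflow.RemoteFlowSources

/**
 * Hypothetical example: treats every call to a function named
 * `read_untrusted_input` as a source of remote user input.
 */
class ReadUntrustedInputSource extends RemoteFlowSource::Range {
  ReadUntrustedInputSource() {
    // Assumption: `asCfgNode()` gives the control-flow node behind this data-flow node.
    this.asCfgNode().(CallNode).getFunction().(NameNode).getId() = "read_untrusted_input"
  }

  // `getSourceType` is abstract in `Range`, so every new model has to describe itself.
  override string getSourceType() { result = "hypothetical read_untrusted_input() call" }
}
```

Because `RemoteFlowSource` is defined in terms of `RemoteFlowSource::Range`, a class like this would be picked up by any query that uses `RemoteFlowSource`, as long as the file defining it is imported.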
33
python/ql/src/experimental/dataflow/RemoteFlowSources.qll
Normal file
@@ -0,0 +1,33 @@
private import python
private import experimental.dataflow.DataFlow
// Need to import since frameworks can extend `RemoteFlowSource::Range`
private import experimental.semmle.python.Frameworks

/**
 * A data flow source of remote user input.
 *
 * Extend this class to refine existing API models. If you want to model new APIs,
 * extend `RemoteFlowSource::Range` instead.
 */
class RemoteFlowSource extends DataFlow::Node {
  RemoteFlowSource::Range self;

  RemoteFlowSource() { this = self }

  /** Gets a string that describes the type of this remote flow source. */
  string getSourceType() { result = self.getSourceType() }
}

/** Provides a class for modeling new sources of remote user input. */
module RemoteFlowSource {
  /**
   * A data flow source of remote user input.
   *
   * Extend this class to model new APIs. If you want to refine existing API models,
   * extend `RemoteFlowSource` instead.
   */
  abstract class Range extends DataFlow::Node {
    /** Gets a string that describes the type of this remote flow source. */
    abstract string getSourceType();
  }
}
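The doc comments in `RemoteFlowSources.qll` distinguish refining existing models (extend `RemoteFlowSource`) from modeling new APIs (extend `RemoteFlowSource::Range`). A minimal sketch of the refinement side could look like the query below. The class name and the `"%flask%"` pattern are made up for illustration, and since the framework modules in this commit are still empty the query would not be expected to return anything yet.

```ql
import python
import experimental.dataflow.DataFlow
import experimental.dataflow.RemoteFlowSources

/**
 * Hypothetical refinement: the subset of already-modeled remote flow sources
 * whose description mentions "flask". No new sources are introduced here.
 */
class FlaskLikeSource extends RemoteFlowSource {
  FlaskLikeSource() { this.getSourceType().matches("%flask%") }
}

from FlaskLikeSource source
select source, source.getSourceType()
```

Extending the non-abstract `RemoteFlowSource` can only narrow the set of values, which is why new APIs go through `Range` instead.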
@@ -6,6 +6,8 @@
private import python
private import TaintTrackingPrivate
private import experimental.dataflow.DataFlow
// Need to import since frameworks can extend `AdditionalTaintStep`
private import experimental.semmle.python.Frameworks

// Local taint flow and helpers
/**
40
python/ql/src/experimental/semmle/python/Concepts.qll
Normal file
@@ -0,0 +1,40 @@
/**
 * Provides abstract classes representing generic concepts such as file system
 * access or system command execution, for which individual framework libraries
 * provide concrete subclasses.
 */

import python
private import experimental.dataflow.DataFlow
private import experimental.semmle.python.Frameworks

/**
 * A data-flow node that executes an operating system command,
 * for instance by spawning a new process.
 *
 * Extend this class to refine existing API models. If you want to model new APIs,
 * extend `SystemCommandExecution::Range` instead.
 */
class SystemCommandExecution extends DataFlow::Node {
  SystemCommandExecution::Range self;

  SystemCommandExecution() { this = self }

  /** Gets the argument that specifies the command to be executed. */
  DataFlow::Node getCommand() { result = self.getCommand() }
}

/** Provides a class for modeling new system-command execution APIs. */
module SystemCommandExecution {
  /**
   * A data-flow node that executes an operating system command,
   * for instance by spawning a new process.
   *
   * Extend this class to model new APIs. If you want to refine existing API models,
   * extend `SystemCommandExecution` instead.
   */
  abstract class Range extends DataFlow::Node {
    /** Gets the argument that specifies the command to be executed. */
    abstract DataFlow::Node getCommand();
  }
}
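To illustrate how a query might reuse the shared sink definition from `Concepts.qll`, here is a sketch of a taint-tracking configuration from `RemoteFlowSource` to `SystemCommandExecution.getCommand()`. It is not part of this commit. The configuration name is invented, and the sketch assumes that `experimental.dataflow.TaintTracking` provides the usual `TaintTracking::Configuration` class with `isSource`, `isSink` and `hasFlow`, as the data-flow libraries for the other languages do.

```ql
import python
import experimental.dataflow.DataFlow
import experimental.dataflow.TaintTracking
import experimental.dataflow.RemoteFlowSources
import experimental.semmle.python.Concepts

/** Sketch: tracks remote user input into the command of a system-command execution. */
class CommandInjectionSketch extends TaintTracking::Configuration {
  CommandInjectionSketch() { this = "CommandInjectionSketch" }

  // Any modeled remote flow source, regardless of which framework file defines it.
  override predicate isSource(DataFlow::Node source) { source instanceof RemoteFlowSource }

  // The sink is the command argument, not the execution node itself.
  override predicate isSink(DataFlow::Node sink) {
    sink = any(SystemCommandExecution e).getCommand()
  }
}

from CommandInjectionSketch config, DataFlow::Node source, DataFlow::Node sink
where config.hasFlow(source, sink)
select sink, "Command built from $@.", source, "remote user input"
```

The query never names a framework: new models only need to extend the two `Range` classes to be covered.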
6
python/ql/src/experimental/semmle/python/Frameworks.qll
Normal file
@@ -0,0 +1,6 @@
/**
 * Helper file that imports all framework modeling.
 */
private import experimental.semmle.python.frameworks.Flask
private import experimental.semmle.python.frameworks.Django
private import experimental.semmle.python.frameworks.Stdlib
@@ -0,0 +1,12 @@
/**
 * Provides classes modeling security-relevant aspects of the `django` package.
 */

private import python
private import experimental.dataflow.DataFlow
private import experimental.dataflow.RemoteFlowSources
private import experimental.semmle.python.Concepts

private module Django {

}
@@ -0,0 +1,12 @@
/**
 * Provides classes modeling security-relevant aspects of the `flask` package.
 */

private import python
private import experimental.dataflow.DataFlow
private import experimental.dataflow.RemoteFlowSources
private import experimental.semmle.python.Concepts

private module Flask {

}
@@ -0,0 +1,9 @@
/**
 * Provides classes modeling security-relevant aspects of the standard libraries.
 * Note: some modeling is done internally in the dataflow/taint tracking implementation.
 */

private import python
private import experimental.dataflow.DataFlow
private import experimental.dataflow.RemoteFlowSources
private import experimental.semmle.python.Concepts
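The framework files in this commit are only skeletons. As a hedged sketch of what a private model inside one of them might eventually look like, the snippet below marks calls that syntactically look like `os.system(cmd)` as a `SystemCommandExecution::Range`. The module and class names are hypothetical, the match is purely syntactic (it does not resolve what `os` refers to), and it assumes `DataFlow::Node` offers `asCfgNode()`.

```ql
private import python
private import experimental.dataflow.DataFlow
private import experimental.semmle.python.Concepts

private module Stdlib {
  /** Hypothetical model: a call of the syntactic form `os.system(cmd)`. */
  private class OsSystemCall extends SystemCommandExecution::Range {
    CallNode call;

    OsSystemCall() {
      // Assumption: `asCfgNode()` relates a data-flow node to its control-flow node.
      this.asCfgNode() = call and
      exists(AttrNode attr | attr = call.getFunction() |
        attr.getName() = "system" and
        attr.getObject().(NameNode).getId() = "os"
      )
    }

    // The first positional argument is the command string.
    override DataFlow::Node getCommand() { result.asCfgNode() = call.getArg(0) }
  }
}
```

Keeping the class `private` follows the framework-modeling convention from the commit message: queries would reach it only through `SystemCommandExecution`, so the model can change without a deprecation period.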