Summary of recent meeting.

Perhaps a not-python-specific version of this could go into the shared implementation.
2026-04-29 10:45:15 +02:00 · 2020-06-15 08:01:02 +02:00
parent 375da38765
commit 1af2e56894
1 changed files with 106 additions and 0 deletions
--- a/python/ql/src/semmle/code/python/dataflow/internal/readme.md
+++ b/python/ql/src/semmle/code/python/dataflow/internal/readme.md
@@ -0,0 +1,106 @@
+# Using the shared dataflow library
+
+## File organisation
+
+The files currently live in `semmle/code/python` (whereas the exisitng implementation lives in `semmle/python/dataflow`).
+
+In there is found `DataFlow.qll`, `DataFlow2.qll` etc. which refer to `internal\DataFlowImpl`, `internal\DataFlowImpl2` etc. respectively. The `DataFlowImplN`-files are all identical copies to avoid mutual recursion. They start off by including two files `internal\DataFlowImplCommon` and `internal\DataFlowImplSpecific`. The former contains all the language-agnostic definitions, while the latter is where we describe our favorite language. `Sepcific` simply forwards to two other files `internal/DataFlowPrivate.qll` and `internal/DataFlowPublic.qll`. Definitions in the former will be hidden behind a `private` modifier, while those in the latter can be referred to in data flow queries. For instance, the definition of `DataFlow::Node` should likely be in `DataFlowPublic.qll`.
+
+## Define the dataflow graph
+
+In order to use the dataflow library, we need to define the dataflow graph,
+that is define the nodes and the edges.
+
+### Define the nodes
+
+The nodes are defined in the type `DataFlow::Node` (found in `DataFlowPublic.qll`).
+This should likely be an IPA type, so we can extend it as needed.
+
+Typical cases needed to construct the call graph include
+ - argument node
+ - parameter node
+ - return node
+
+Typical extensions include
+ - postupdate nodes
+ - implicit `this`-nodes
+
+### Define the edges
+
+The edges split into local flow (within a function) and global flow (the call graph, between functions/procedures).
+
+Extra flow, such as reading from and writing to global variables, can be captured in `jumpStep`.
+The local flow should be obtainalble from an SSA computation.
+
+The global flow should be obtainable from a `PointsTo` analysis. It is specified via `viableCallable` and
+`getAnOutNode`. Consider making `ReturnKind` a singleton IPA type as in java.
+
+If complicated dispatch needs to be modelled, try using the `[reduced|pruned]viable*` predicates.
+
+## Field flow
+
+To track flow through fields we need to provide a model of fields, that is the `Content` class.
+
+Field access is specified via `read_step` and `store_step`.
+
+Work is being done to make field flow handle lists and dictionaries and the like.
+
+`PostUpdateNode`s become important when field flow is used, as they track modifications to fields resulting from function calls.
+
+## Type pruning
+
+If type information is available, flows can be discarded on the grounds of type mismatch.
+
+Tracked types are given by the class `DataFlowType` and the predicate `getTypeBound`, and compatibility is recorded in the predicate `compatibleTypes`.
+
+Further, possible casts are given by the class `CastNode`.
+
+---
+
+# Plan
+
+## Stage I, data flow
+
+### Phase 0, setup
+Define minimal IPA type for `DataFlow::Node`
+Define all required predicates empty (via `none()`),
+except `compatibleTypes` which should be `any()`.
+Define `ReturnKind`, `DataFlowType`, and `Content` as singleton IPA types.
+
+
+### Phase 1, local flow
+Implement `simpleLocalFlowStep` based on the existing SSA computation
+
+### Phase 2, local flow
+Implement `viableCallable` and `getAnOutNode` based on the existing predicate `PointsTo`.
+
+### Phase 3, field flow
+Redefine `Content` and implement `read_step` and `store_step`.
+
+Review use of post-update nodes.
+
+### Phase 4, type pruning
+Use type trackers to obtain relevant type information and redefine `DataFlowType` to contain appropriate cases. Record the type information in `getTypeBound`.
+
+Implement `compatibleTypes` (perhaps simply as the identity).
+
+If necessary, re-implement `getErasedRepr` and `ppReprType`.
+
+If necessary, redefine `CastNode`.
+
+### Phase 5, bonus
+Review possible use of `[reduced|pruned]viable*` predicates.
+
+Review need for more elaborate `ReturnKind`.
+
+Review need for non-empty `jumpStep`.
+
+Review need for non-empty `isUnreachableInCall`.
+
+## Stage II, taint tracking
+
+# Phase 0, setup
+Implement all predicates empty.
+
+# Phase 1, experiments
+Try recovering an existing taint tracking query by implementing sources, sinks, sanitizers, and barriers.