// codeql/go/ql/lib/semmle/go/dataflow/ExternalFlow.qll (2026-03-27 22:15:45 +00:00)

/**
* INTERNAL use only. This is an experimental API subject to change without notice.
*
* Provides classes and predicates for dealing with flow models specified
* in data extensions and CSV format.
*
* The extensible relations have the following columns:
* - Sources:
* `package; type; subtypes; name; signature; ext; output; kind; provenance`
* - Sinks:
* `package; type; subtypes; name; signature; ext; input; kind; provenance`
* - Summaries:
* `package; type; subtypes; name; signature; ext; input; output; kind; provenance`
* - Barriers:
* `package; type; subtypes; name; signature; ext; output; kind; provenance`
* - BarrierGuards:
* `package; type; subtypes; name; signature; ext; input; acceptingValue; kind; provenance`
* - Neutrals:
* `package; type; name; signature; kind; provenance`
* A neutral is used to indicate that a callable is neutral with respect to flow (no summary), source (is not a source) or sink (is not a sink).
*
* The interpretation of a row is similar to API-graphs with a left-to-right
* reading.
* 1. The `package` column selects a package. Note that if the package does not
* contain a major version suffix (like "/v2") then we will match all major
* versions. This can be disabled by putting `fixed-version:` at the start
* of the package path. Also, instead of a package path, if this column is
* "group:<groupname>" then it indicates that the row applies to all
* packages in the group `<groupname>` according to the `packageGrouping`
* predicate.
* 2. The `type` column selects a type within that package.
* 3. The `subtypes` column is a boolean that controls what restrictions we
* place on the type `t` of the selector base when accessing a field or
* calling a method. When it is false, `t` must be the exact type specified
* by this row. When it is true, `t` may be a type which embeds the specified
* type, and for interface methods `t` may be a type which implements the
* interface.
* 4. The `name` column optionally selects a specific named member of the type.
* 5. The `signature` column is always empty.
* 6. The `ext` column is always empty.
* 7. The `input` column specifies how data enters the element selected by the
* first 6 columns, and the `output` column specifies how data leaves it. An
* `input` can be one of "", "Argument[n]" or "Argument[n1..n2]":
* - "": Selects a write to the selected element in case this is a field or
* package-level variable.
* - "Argument[n]": Selects an argument in a call to the selected element.
* The arguments are zero-indexed, and the index `receiver` selects the receiver.
* - "Argument[n1..n2]": Similar to "Argument[n]" but selects any argument
* in the given range. The range is inclusive at both ends.
*
* An `output` can be one of "", "Argument[n]", "Argument[n1..n2]", "Parameter",
* "Parameter[n]", "Parameter[n1..n2]", "ReturnValue", "ReturnValue[n]" or
* "ReturnValue[n1..n2]":
* - "": Selects a read of a selected field or package-level variable.
* - "Argument[n]": Selects the post-update value of an argument in a call to the
* selected element. That is, the value of the argument after the call returns.
* The arguments are zero-indexed, and the index `receiver` selects the receiver.
* - "Argument[n1..n2]": Similar to "Argument[n]" but selects any argument in
* the given range. The range is inclusive at both ends.
* - "Parameter": Selects the value of a parameter of the selected element.
* - "Parameter[n]": Similar to "Parameter" but restricted to a specific
* numbered parameter (zero-indexed; the index `receiver` selects the receiver).
* - "Parameter[n1..n2]": Similar to "Parameter[n]" but selects any parameter
* in the given range. The range is inclusive at both ends.
* - "ReturnValue": Selects the first value being returned by the selected
* element. This requires that the selected element is a method with a
* body.
* - "ReturnValue[n]": Similar to "ReturnValue" but selects the specified
* return value. The return values are zero-indexed.
* - "ReturnValue[n1..n2]": Similar to "ReturnValue[n]" but selects any
* return value in the given range. The range is inclusive at both ends.
*
* For summaries, `input` and `output` may be suffixed by any number of the
* following, separated by ".":
* - "Field[pkg.className.fieldname]": Selects the contents of the field `f`
* which satisfies `f.hasQualifiedName(pkg, className, fieldname)`.
* - "SyntheticField[f]": Selects the contents of the synthetic field `f`.
* - "ArrayElement": Selects an element in an array or slice.
* - "Element": Selects an element in a collection.
* - "MapKey": Selects a key in a map.
* - "MapValue": Selects a value in a map.
* - "Dereference": Selects the value referenced by a pointer.
*
* 8. The `acceptingValue` column of barrier guard models specifies the condition
* under which the guard blocks flow. It can be either "true" or "false". Support
* for "no-exception", "not-zero", "null" and "not-null" may be added in the future.
* 9. The `kind` column is a tag that can be referenced from QL to determine to
* which classes the interpreted elements should be added. For example, for
* sources "remote" indicates a default remote flow source, and for summaries
* "taint" indicates a default additional taint step and "value" indicates a
* globally applicable value-preserving step.
* 10. The `provenance` column is a tag to indicate the origin and verification of a model.
* The format is {origin}-{verification} or just "manual", where {origin} describes
* how the model was created and {verification} describes how it has been verified.
* Some examples are:
* - "df-generated": The model has been generated by the model generator tool.
* - "df-manual": The model has been generated by the model generator and verified by a human.
* - "manual": The model has been written by hand.
* This information is used in a heuristic for dataflow analysis to determine whether
* a model or the source code should be used for determining flow.
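*
* For illustration only (the package `example.com/lib` and all of its members
* are hypothetical, not models shipped with this library), the semicolon-separated
* rows below follow the column orders listed above, with empty columns left blank.
* They model the return value of `Server.ReadRequest` as a remote source, the
* first argument of `DB.Exec` as a sink of kind `sql-injection`, and taint flowing
* from either of the first two arguments of the package-level function `Join` to
* its return value:
* ```
* example.com/lib;Server;true;ReadRequest;;;ReturnValue;remote;manual
* example.com/lib;DB;true;Exec;;;Argument[0];sql-injection;manual
* example.com/lib;;false;Join;;;Argument[0..1];ReturnValue;taint;manual
* ```
* In a data extension the same rows appear as YAML lists added to the
* `sourceModel`, `sinkModel` and `summaryModel` extensible predicates.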
*/
overlay[local?]
module;
private import go
private import internal.ExternalFlowExtensions::Extensions as Extensions
private import FlowSummary as FlowSummary
private import internal.DataFlowPrivate
private import internal.FlowSummaryImpl
private import internal.FlowSummaryImpl::Public as Public
private import internal.FlowSummaryImpl::Private
private import internal.FlowSummaryImpl::Private::External
private import codeql.mad.ModelValidation as SharedModelVal
private import codeql.mad.static.ModelsAsData as SharedMaD
private module MadInput implements SharedMaD::InputSig {
string namespaceSegmentSeparator() { result = "/" }
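// Illustrative sketch, assuming `majorVersionSuffixRegex()` matches a path
// segment such as "/v2": `cleanNamespace` drops an optional "fixed-version:"
// prefix and then deletes every major version suffix, so both
// "github.com/a/b/v2" and "fixed-version:github.com/a/b/v2" are expected to be
// normalized to "github.com/a/b".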
bindingset[p]
string cleanNamespace(string p) {
exists(string noPrefix |
p = fixedVersionPrefix() + noPrefix
or
not p = fixedVersionPrefix() + any(string s) and
noPrefix = p
|
result = noPrefix.regexpReplaceAll(majorVersionSuffixRegex(), "")
)
}
}
private module MaD = SharedMaD::ModelsAsData<Extensions, MadInput>;
import MaD
module FlowExtensions = Extensions;
/** Gets the prefix for a group of packages. */
private string groupPrefix() { result = "group:" }
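// Illustrative sketch (the group "mylib" and the package "example.com/lib" are
// hypothetical): if the `packageGrouping` extensible relation relates the group
// "mylib" to the package "example.com/lib", then a model row whose `package`
// column is "group:mylib" applies to "example.com/lib" and to every other
// package in that group. A "group:" value naming a group that has no
// `packageGrouping` row is reported by `ModelValidation::invalidModelRow`.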
/** Provides a query predicate to check the MaD models for validation errors. */
module ModelValidation {
private import codeql.dataflow.internal.AccessPathSyntax as AccessPathSyntax
private predicate getRelevantAccessPath(string path) {
summaryModel(_, _, _, _, _, _, path, _, _, _, _) or
summaryModel(_, _, _, _, _, _, _, path, _, _, _) or
sinkModel(_, _, _, _, _, _, path, _, _, _) or
sourceModel(_, _, _, _, _, _, path, _, _, _) or
barrierModel(_, _, _, _, _, _, path, _, _, _) or
barrierGuardModel(_, _, _, _, _, _, path, _, _, _, _)
}
private module MkAccessPath = AccessPathSyntax::AccessPath<getRelevantAccessPath/1>;
class AccessPath = MkAccessPath::AccessPath;
class AccessPathToken = MkAccessPath::AccessPathToken;
private string getInvalidModelInput() {
exists(string pred, AccessPath input, AccessPathToken part |
sinkModel(_, _, _, _, _, _, input, _, _, _) and pred = "sink"
or
barrierGuardModel(_, _, _, _, _, _, input, _, _, _, _) and pred = "barrier guard"
or
summaryModel(_, _, _, _, _, _, input, _, _, _, _) and pred = "summary"
|
(
invalidSpecComponent(input, part) and
not part = "" and
not (part = "Argument" and pred = "sink") and
not parseArg(part, _) and
// If the database does not contain any fields/pointer types then no
// FieldContent/PointerContent exists, so we spuriously think that
// these spec components are invalid.
not part.getName() = ["Field", "Dereference"]
or
part = input.getToken(0) and
parseParam(part, _)
or
invalidIndexComponent(input, part)
) and
result = "Unrecognized input specification \"" + part + "\" in " + pred + " model."
)
}
private string getInvalidModelOutput() {
exists(string pred, AccessPath output, AccessPathToken part |
sourceModel(_, _, _, _, _, _, output, _, _, _) and pred = "source"
or
barrierModel(_, _, _, _, _, _, output, _, _, _) and pred = "barrier"
or
summaryModel(_, _, _, _, _, _, _, output, _, _, _) and pred = "summary"
|
(
invalidSpecComponent(output, part) and
not part = "" and
not (part = ["Argument", "Parameter"] and pred = "source") and
// If the database does not contain any fields/pointer types then no
// FieldContent/PointerContent exists, so we spuriously think that
// these spec components are invalid.
not part.getName() = ["Field", "Dereference"]
or
invalidIndexComponent(output, part)
) and
result = "Unrecognized output specification \"" + part + "\" in " + pred + " model."
)
}
private module KindValConfig implements SharedModelVal::KindValidationConfigSig {
predicate summaryKind(string kind) { summaryModel(_, _, _, _, _, _, _, _, kind, _, _) }
predicate sinkKind(string kind) {
sinkModel(_, _, _, _, _, _, _, kind, _, _)
or
barrierModel(_, _, _, _, _, _, _, kind, _, _)
or
barrierGuardModel(_, _, _, _, _, _, _, _, kind, _, _)
}
predicate sourceKind(string kind) { sourceModel(_, _, _, _, _, _, _, kind, _, _) }
predicate neutralKind(string kind) { neutralModel(_, _, _, _, kind, _) }
}
private module KindVal = SharedModelVal::KindValidation<KindValConfig>;
private string getInvalidModelSignature() {
exists(
string pred, string package, string type, string name, string signature, string ext,
string provenance
|
sourceModel(package, type, _, name, signature, ext, _, _, provenance, _) and pred = "source"
or
sinkModel(package, type, _, name, signature, ext, _, _, provenance, _) and pred = "sink"
or
barrierModel(package, type, _, name, signature, ext, _, _, provenance, _) and pred = "barrier"
or
barrierGuardModel(package, type, _, name, signature, ext, _, _, _, provenance, _) and
pred = "barrier guard"
or
summaryModel(package, type, _, name, signature, ext, _, _, _, provenance, _) and
pred = "summary"
or
neutralModel(package, type, name, signature, _, provenance) and
ext = "" and
pred = "neutral"
|
not package.replaceAll(fixedVersionPrefix(), "").regexpMatch("[a-zA-Z0-9_\\./-]*") and
result = "Dubious package \"" + package + "\" in " + pred + " model."
or
not type.regexpMatch("[a-zA-Z0-9_\\$<>]*") and
result = "Dubious type \"" + type + "\" in " + pred + " model."
or
not name.regexpMatch("[a-zA-Z0-9_]*") and
result = "Dubious name \"" + name + "\" in " + pred + " model."
or
not signature.regexpMatch("|\\([a-zA-Z0-9_\\.\\$<>,\\[\\]]*\\)") and
result = "Dubious signature \"" + signature + "\" in " + pred + " model."
or
not ext.regexpMatch("|Annotated") and
result = "Unrecognized extra API graph element \"" + ext + "\" in " + pred + " model."
or
invalidProvenance(provenance) and
result = "Unrecognized provenance description \"" + provenance + "\" in " + pred + " model."
)
or
exists(string acceptingValue |
barrierGuardModel(_, _, _, _, _, _, _, acceptingValue, _, _, _) and
invalidAcceptingValue(acceptingValue) and
result =
"Unrecognized accepting value description \"" + acceptingValue +
"\" in barrier guard model."
)
}
private string getInvalidPackageGroup() {
exists(string pred, string group, string package |
FlowExtensions::sourceModel(package, _, _, _, _, _, _, _, _, _) and pred = "source"
or
FlowExtensions::sinkModel(package, _, _, _, _, _, _, _, _, _) and pred = "sink"
or
FlowExtensions::barrierModel(package, _, _, _, _, _, _, _, _, _) and pred = "barrier"
or
FlowExtensions::barrierGuardModel(package, _, _, _, _, _, _, _, _, _, _) and
pred = "barrier guard"
or
FlowExtensions::summaryModel(package, _, _, _, _, _, _, _, _, _, _) and
pred = "summary"
or
FlowExtensions::neutralModel(package, _, _, _, _, _) and
pred = "neutral"
|
package = groupPrefix() + group and
not FlowExtensions::packageGrouping(group, _) and
result = "Dubious package group \"" + package + "\" in " + pred + " model."
)
}
/** Holds if some row in a MaD flow model appears to contain typos. */
query predicate invalidModelRow(string msg) {
msg =
[
getInvalidModelSignature(), getInvalidModelInput(), getInvalidModelOutput(),
KindVal::getInvalidModelKind(), getInvalidPackageGroup()
]
}
}
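// Illustrative sketch (hypothetical row): a sink row whose `package` column is
// "github.com/a b" fails the package regular expression in
// `getInvalidModelSignature` because of the space, so `invalidModelRow` would be
// expected to report
//
//   Dubious package "github.com/a b" in sink model.
//
// A test query could surface such messages by importing this module, for
// example (an assumed setup, not a query shipped with this file):
//
//   import go
//   import semmle.go.dataflow.ExternalFlow
//   import ModelValidation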
pragma[nomagic]
private predicate elementSpec(
string package, string type, boolean subtypes, string name, string signature, string ext
) {
sourceModel(package, type, subtypes, name, signature, ext, _, _, _, _)
or
sinkModel(package, type, subtypes, name, signature, ext, _, _, _, _)
or
barrierModel(package, type, subtypes, name, signature, ext, _, _, _, _)
or
barrierGuardModel(package, type, subtypes, name, signature, ext, _, _, _, _, _)
or
summaryModel(package, type, subtypes, name, signature, ext, _, _, _, _, _)
or
neutralModel(package, type, name, signature, _, _) and ext = "" and subtypes = false
}
private string fixedVersionPrefix() { result = "fixed-version:" }
/**
* Gets the string for the package path corresponding to `p`, if one exists.
*
* We attempt to account for major version suffixes as follows: if `p` is
* `github.com/a/b/c/d` then we will return any path for a package that was
* imported which matches that, possibly with a major version suffix in it,
* so if `github.com/a/b/c/d/v2` or `github.com/a/b/v3/c/d` were imported then
* they will be in the results. There are two situations where we do not do
* this: (1) when `p` already contains a major version suffix; (2) if `p` has
* `fixed-version:` at the start (which we remove).
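*
* As a worked example, assuming the database imports `github.com/a/b/v2/c` but
* not `github.com/a/b/c` (both hypothetical), and that `majorVersionSuffixRegex()`
* matches the `/v2` segment:
* ```
* interpretPackage("github.com/a/b/c")               = "github.com/a/b/v2/c"
* interpretPackage("github.com/a/b/v2/c")            = "github.com/a/b/v2/c"
* interpretPackage("fixed-version:github.com/a/b/c") has no result
* ```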
*/
bindingset[p]
private string interpretPackage(string p) {
exists(Package pkg | result = pkg.getPath() |
p = fixedVersionPrefix() + result
or
not p = fixedVersionPrefix() + any(string s) and
(
if exists(p.regexpFind(majorVersionSuffixRegex(), 0, _))
then result = p
else p = pkg.getPathWithoutMajorVersionSuffix()
)
)
or
// Special case for built-in functions, which are not in any package, but
// satisfy `hasQualifiedName` with the package path "".
p = "" and result = ""
}
/** Gets the source/sink/summary element corresponding to the supplied parameters. */
cached
SourceSinkInterpretationInput::SourceOrSinkElement interpretElement(
string pkg, string type, boolean subtypes, string name, string signature, string ext
) {
elementSpec(pkg, type, subtypes, name, signature, ext) and
// Go has no function overloading, so elements need not be distinguished by signature
signature = "" and
exists(string p | p = interpretPackage(pkg) |
exists(Entity e | result.hasFullInfo(e, p, type, subtypes) |
e.(Field).hasQualifiedName(p, type, name) or
e.(Method).hasQualifiedName(p, type, name)
)
or
subtypes = true and
// p.type is an interface and we include types which implement it
exists(Method m2, string pkg2, string type2 |
m2.getReceiverType().implements(p, type) and
m2.getName() = name and
m2.getReceiverBaseType().hasQualifiedName(pkg2, type2)
|
result.hasFullInfo(m2, pkg2, type2, subtypes)
)
or
type = "" and
exists(Entity e | e.hasQualifiedName(p, name) | result.asOtherEntity() = e)
)
}
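// Illustrative sketch (hypothetical package `example.com/lib`): with `subtypes`
// set to true, a row naming the interface method `Reader.Read` in
// `example.com/lib` also selects the `Read` method of every type whose receiver
// type implements that interface. A row with an empty `type` column, such as
// `example.com/lib;;false;Join;;;...`, instead selects the package-level entity
// `example.com/lib.Join`.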
private predicate parseField(AccessPathToken c, DataFlow::FieldContent f) {
exists(
string fieldRegex, string qualifiedName, string package, string className, string fieldName
|
c.getName() = "Field" and
qualifiedName = c.getAnArgument() and
fieldRegex = "^(.*)\\.([^.]+)\\.([^.]+)$" and
package = qualifiedName.regexpCapture(fieldRegex, 1) and
className = qualifiedName.regexpCapture(fieldRegex, 2) and
fieldName = qualifiedName.regexpCapture(fieldRegex, 3) and
f.getField().hasQualifiedName(package, className, fieldName)
)
}
/** A string representing a synthetic instance field. */
class SyntheticField extends string {
SyntheticField() { parseSynthField(_, this) }
/**
* Gets the type of this field. The default type is `interface{}`, but this can be
* overridden.
*/
Type getType() { result instanceof EmptyInterfaceType }
}
private predicate parseSynthField(AccessPathToken c, string f) {
c.getName() = "SyntheticField" and
f = c.getAnArgument()
}
/** Holds if the specification component parses as a `Content`. */
predicate parseContent(AccessPathToken component, DataFlow::Content content) {
parseField(component, content)
or
parseSynthField(component, content.(DataFlow::SyntheticFieldContent).getField())
or
component = "ArrayElement" and content instanceof DataFlow::ArrayContent
or
component = "Element" and content instanceof DataFlow::CollectionContent
or
component = "MapKey" and content instanceof DataFlow::MapKeyContent
or
component = "MapValue" and content instanceof DataFlow::MapValueContent
or
component = "Dereference" and content instanceof DataFlow::PointerContent
}
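// Illustrative sketch (hypothetical names): in a summary access path such as
// `ReturnValue.Field[github.com/a/b.T.f].MapValue`, `parseField` splits the
// `Field[...]` argument with the regular expression above into package
// "github.com/a/b", type "T" and field "f" (the greedy first group keeps the
// dots inside the package path), and `parseContent` maps `MapValue` to
// `DataFlow::MapValueContent`. The `Field` token only parses if the database
// contains a field with that qualified name.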
cached
private module Cached {
/**
* Holds if `node` is specified as a source with the given kind in a MaD flow
* model.
*/
cached
predicate sourceNode(DataFlow::Node node, string kind, string model) {
exists(SourceSinkInterpretationInput::InterpretNode n |
isSourceNode(n, kind, model) and n.asNode() = node
)
}
/**
* Holds if `node` is specified as a sink with the given kind in a MaD flow
* model.
*/
cached
predicate sinkNode(DataFlow::Node node, string kind, string model) {
exists(SourceSinkInterpretationInput::InterpretNode n |
isSinkNode(n, kind, model) and n.asNode() = node
)
}
private newtype TKindModelPair =
TMkPair(string kind, string model) { isBarrierGuardNode(_, _, kind, model) }
private boolean convertAcceptingValue(Public::AcceptingValue av) {
av.isTrue() and result = true
or
av.isFalse() and result = false
// The remaining cases are not supported yet because they depend on the shared Guards library.
// or
// av.isNoException() and result.getDualValue().isThrowsException()
// or
// av.isZero() and result.asIntValue() = 0
// or
// av.isNotZero() and result.getDualValue().asIntValue() = 0
// or
// av.isNull() and result.isNullValue()
// or
// av.isNotNull() and result.isNonNullValue()
}
private predicate barrierGuardChecks(DataFlow::Node g, Expr e, boolean gv, TKindModelPair kmp) {
exists(
SourceSinkInterpretationInput::InterpretNode n, Public::AcceptingValue acceptingValue,
string kind, string model
|
isBarrierGuardNode(n, acceptingValue, kind, model) and
n.asNode().asExpr() = e and
kmp = TMkPair(kind, model) and
gv = convertAcceptingValue(acceptingValue)
|
g.asExpr().(CallExpr).getAnArgument() = e // TODO: qualifier?
)
}
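// Illustrative sketch (hypothetical package and function): given the
// barrier-guard row
//
//   example.com/lib;;false;IsSafe;;;Argument[0];true;sql-injection;manual
//
// `barrierGuardChecks` holds with `g` the call `IsSafe(x)`, `e` the argument
// `x` and `gv = true`, so `barrierNode` below treats uses of `x` that are
// guarded by that call returning `true` as `sql-injection` barriers.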
/**
* Holds if `node` is specified as a barrier with the given kind in a MaD flow
* model.
*/
cached
predicate barrierNode(DataFlow::Node node, string kind, string model) {
exists(SourceSinkInterpretationInput::InterpretNode n |
isBarrierNode(n, kind, model) and n.asNode() = node
)
or
DataFlow::ParameterizedBarrierGuard<TKindModelPair, barrierGuardChecks/4>::getABarrierNode(TMkPair(kind,
model)) = node
}
}
import Cached
/**
* Holds if `node` is specified as a source with the given kind in a MaD flow
* model.
*/
predicate sourceNode(DataFlow::Node node, string kind) { sourceNode(node, kind, _) }
/**
* Holds if `node` is specified as a sink with the given kind in a MaD flow
* model.
*/
predicate sinkNode(DataFlow::Node node, string kind) { sinkNode(node, kind, _) }
/**
* Holds if `node` is specified as a barrier with the given kind in a MaD flow
* model.
*/
predicate barrierNode(DataFlow::Node node, string kind) { barrierNode(node, kind, _) }
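// Example usage (a sketch; the module names `ExampleConfig` and `ExampleFlow`
// are hypothetical): a query can consume MaD-modeled sources, sinks and
// barriers by kind through the predicates above.
//
//   module ExampleConfig implements DataFlow::ConfigSig {
//     predicate isSource(DataFlow::Node n) { sourceNode(n, "remote") }
//
//     predicate isSink(DataFlow::Node n) { sinkNode(n, "sql-injection") }
//
//     predicate isBarrier(DataFlow::Node n) { barrierNode(n, "sql-injection") }
//   }
//
//   module ExampleFlow = TaintTracking::Global<ExampleConfig>;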
// Adapter class for converting MaD summaries to `SummarizedCallable`s
private class SummarizedCallableAdapter extends Public::SummarizedCallable {
string input_;
string output_;
string kind;
Public::Provenance p_;
string model_;
SummarizedCallableAdapter() { summaryElement(this, input_, output_, kind, p_, model_) }
override predicate propagatesFlow(
string input, string output, boolean preservesValue, Public::Provenance p, boolean isExact,
string model
) {
input = input_ and
output = output_ and
(if kind = "value" then preservesValue = true else preservesValue = false) and
p = p_ and
isExact = true and
model = model_
}
}
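// Illustrative sketch (hypothetical row): the summary row
//
//   example.com/lib;;false;Clone;;;Argument[0];ReturnValue;value;manual
//
// makes the adapter for `Clone` satisfy
// `propagatesFlow("Argument[0]", "ReturnValue", true, _, true, _)`;
// the same row with kind `taint` would instead give `preservesValue = false`.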