mirror of
https://github.com/github/codeql.git
synced 2026-05-16 04:09:27 +02:00
Compare commits
4 Commits
codeql-cli
...
experiment
| Author | SHA1 | Date | |
|---|---|---|---|
| | a1af496216 | | |
| | 70b0fe38e3 | | |
| | a5a3c047a8 | | |
| | 1d05f98eb6 | | |
@@ -0,0 +1,125 @@
private import javascript as raw

/**
 * EXPERIMENTAL. This API may change in the future.
 *
 * A configuration class for defining known endpoints and endpoint filters for adaptive threat
 * modeling (ATM). Each boosted query must define its own extension of this abstract class.
 *
 * A configuration defines a set of known sources (`isKnownSource`) and sinks (`isKnownSink`).
 * It must also define a sink endpoint filter (`isEffectiveSink`) that filters candidate sinks
 * predicted by the machine learning model to a set of effective sinks.
 *
 * Optionally, a configuration may also define additional edges beyond the base data flow edges
 * (`isAdditionalFlowStep`) and sanitizers (`isSanitizer` and `isSanitizerGuard`).
 *
 * To get started with ATM, you can copy-paste an implementation of the `DataFlow::Configuration`
 * class for a standard security query, for example `SqlInjection::Configuration`. Note that if
 * the security query configuration defines additional edges beyond the standard data flow edges,
 * such as `NosqlInjection::Configuration`, you may need to replace the definition of
 * `isAdditionalFlowStep` with a more generalised definition of additional edges. See
 * `NosqlInjectionATM.ql` for an example of doing this.
 *
 * Technical information:
 *
 * - Conceptually, this class is very similar to the subclass of `DataFlow::Configuration` that is
 *   used to define the base security query. The reason why we define a new class to provide this
 *   information to ATM is due to performance implications of QL's dispatch behaviour: defining
 *   another `DataFlow::Configuration` instance would slow the evaluation of the boosted query.
 *
 * - Furthermore, we cannot use the approach used by the `ForwardExploration` and
 *   `BackwardExploration` modules to implement ATM, since ATM needs access to the sets of sources
 *   and sinks from the *original* dataflow configuration in order to perform similarity search.
 */
abstract class ATMConfig extends string {
  bindingset[this]
  ATMConfig() { any() }

  /**
   * EXPERIMENTAL. This API may change in the future.
   *
   * Holds if `source` is a known source of flow.
   */
  predicate isKnownSource(raw::DataFlow::Node source) { none() }

  /**
   * EXPERIMENTAL. This API may change in the future.
   *
   * Holds if `source` is a known source of flow labeled with `lbl`.
   */
  predicate isKnownSource(raw::DataFlow::Node source, raw::DataFlow::FlowLabel lbl) { none() }

  /**
   * EXPERIMENTAL. This API may change in the future.
   *
   * Holds if `sink` is a known sink of flow.
   */
  predicate isKnownSink(raw::DataFlow::Node sink) { none() }

  /**
   * EXPERIMENTAL. This API may change in the future.
   *
   * Holds if `sink` is a known sink of flow labeled with `lbl`.
   */
  predicate isKnownSink(raw::DataFlow::Node sink, raw::DataFlow::FlowLabel lbl) { none() }

  /**
   * EXPERIMENTAL. This API may change in the future.
   *
   * Holds if the candidate sink `candidateSink` predicted by the machine learning model should be
   * an effective sink, i.e. one considered as a possible sink of flow in the boosted query.
   */
  abstract predicate isEffectiveSink(raw::DataFlow::Node candidateSink);

  /**
   * EXPERIMENTAL. This API may change in the future.
   *
   * Holds if the intermediate node `node` is a taint sanitizer.
   */
  predicate isSanitizer(raw::DataFlow::Node node) { none() }

  /**
   * EXPERIMENTAL. This API may change in the future.
   *
   * Holds if for the boosted query the data flow node `guard` can act as a sanitizer when
   * appearing in a condition.
   *
   * For example, if `guard` is the comparison expression in
   * `if(x == 'some-constant'){ ... x ... }`, it could sanitize flow of `x` into the "then"
   * branch.
   */
  predicate isSanitizerGuard(raw::TaintTracking::SanitizerGuardNode guard) { none() }

  /**
   * EXPERIMENTAL. This API may change in the future.
   *
   * Holds if the additional taint propagation step from `src` to `trg` must be taken into account
   * in the boosted query.
   */
  predicate isAdditionalTaintStep(raw::DataFlow::Node src, raw::DataFlow::Node trg) { none() }

  /**
   * EXPERIMENTAL. This API may change in the future.
   *
   * Holds if `src -> trg` should be considered as a flow edge in addition to standard data flow
   * edges in the boosted query.
   */
  predicate isAdditionalFlowStep(
    raw::DataFlow::Node src, raw::DataFlow::Node trg, raw::DataFlow::FlowLabel inlbl,
    raw::DataFlow::FlowLabel outlbl
  ) {
    none()
  }
}

// To debug the ATMConfig module, import this module by adding "import ATMConfigDebugging" to the
// top-level.
module ATMConfigDebugging {
  query predicate knownSources(ATMConfig config, raw::DataFlow::Node source) {
    config.isKnownSource(source) or config.isKnownSource(source, _)
  }

  query predicate anchorSinks(ATMConfig config, raw::DataFlow::Node sink) {
    config.isKnownSink(sink) or config.isKnownSink(sink, _)
  }
}
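Reviewer note: the `ATMConfig` doc comment describes three inputs (known sources, known sinks, and the `isEffectiveSink` endpoint filter). The following is a minimal Python sketch of how those pieces might combine into the sink set of a boosted query. It is not part of the change set and not the QL implementation: `AtmConfigSketch`, `boosted_sinks`, and the string node names are hypothetical, and the model's predictions are stubbed as a plain set.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Set


@dataclass(frozen=True)
class AtmConfigSketch:
    """Hypothetical stand-in for an `ATMConfig` extension; nodes are plain strings."""

    known_sources: FrozenSet[str]
    known_sinks: FrozenSet[str]
    # Plays the role of `isEffectiveSink`: the filter applied to predicted candidates.
    is_effective_sink: Callable[[str], bool]

    def boosted_sinks(self, predicted_candidates: Set[str]) -> Set[str]:
        """Return known sinks plus the predicted candidates that pass the filter."""
        effective = {c for c in predicted_candidates if self.is_effective_sink(c)}
        return set(self.known_sinks) | effective


# Assumed example data: one known sink, two predicted candidates, one filtered out.
config = AtmConfigSketch(
    known_sources=frozenset({"req.query.id"}),
    known_sinks=frozenset({"db.query(sql)"}),
    is_effective_sink=lambda node: not node.startswith("console."),
)
sinks = config.boosted_sinks({"orm.raw(sql)", "console.log(msg)"})
print(sorted(sinks))
```

The known sinks are always kept, while predicted candidates only count once the filter accepts them, which mirrors the role the doc comment assigns to `isEffectiveSink`.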
@@ -0,0 +1,432 @@
|
||||
external private predicate adaptiveThreatModelingModels(
|
||||
string modelChecksum, string modelLanguage, string modelName, string modelType
|
||||
);
|
||||
|
||||
private import javascript as raw
|
||||
private import raw::DataFlow as DataFlow
|
||||
import ATMConfig
|
||||
|
||||
module ATMEmbeddings {
|
||||
private import CodeToFeatures::DatabaseFeatures as DatabaseFeatures
|
||||
|
||||
class Entity = DatabaseFeatures::Entity;
|
||||
|
||||
/* Currently the only label is a label marking an embedding as derived from an entity in the current database. */
|
||||
private newtype TEmbeddingLabel = TEntityLabel(Entity entity)
|
||||
|
||||
/**
|
||||
* An abstract label that can be used to mark an embedding with the object from which it has been
|
||||
* derived.
|
||||
*/
|
||||
abstract class EmbeddingLabel extends TEmbeddingLabel {
|
||||
abstract string toString();
|
||||
}
|
||||
|
||||
/**
|
||||
* A label marking an embedding as derived from an entity in the current database, i.e. the
|
||||
* database we're running the query on.
|
||||
*/
|
||||
class EntityLabel extends EmbeddingLabel {
|
||||
private Entity entity;
|
||||
|
||||
EntityLabel() { this = TEntityLabel(entity) }
|
||||
|
||||
Entity getEntity() { result = entity }
|
||||
|
||||
override string toString() { result = "EntityLabel(" + entity.toString() + ")" }
|
||||
}
|
||||
|
||||
/**
|
||||
* `entities` relation suitable for passing to the `codeEmbedding` HOP.
|
||||
*
|
||||
* The `codeEmbedding` HOP expects an entities relation with eight columns, while
|
||||
* `DatabaseFeatures` generates one with nine columns.
|
||||
*/
|
||||
predicate entities(
|
||||
Entity entity, string entityName, string entityType, string path, int startLine,
|
||||
int startColumn, int endLine, int endColumn
|
||||
) {
|
||||
DatabaseFeatures::entities(entity, entityName, entityType, path, startLine, startColumn,
|
||||
endLine, endColumn, _)
|
||||
}
|
||||
|
||||
private predicate databaseEmbeddingsByEntity(
|
||||
Entity entity, int embeddingIndex, float embeddingValue
|
||||
) =
|
||||
codeEmbedding(entities/8, DatabaseFeatures::astNodes/5, DatabaseFeatures::nodeAttributes/2,
|
||||
modelChecksum/0)(entity, embeddingIndex, embeddingValue)
|
||||
|
||||
/** Embeddings for each entity in the current database. */
|
||||
predicate databaseEmbeddings(EntityLabel label, int embeddingIndex, float embeddingValue) {
|
||||
exists(Entity entity |
|
||||
databaseEmbeddingsByEntity(entity, embeddingIndex, embeddingValue) and
|
||||
label.getEntity() = entity
|
||||
)
|
||||
}
|
||||
|
||||
/** Checksum of the model that should be used. */
|
||||
string modelChecksum() { adaptiveThreatModelingModels(result, "javascript", _, _) }
|
||||
}
|
||||
|
||||
private module ATMEmbeddingsDebugging {
|
||||
query predicate databaseEmbeddingsDebug = ATMEmbeddings::databaseEmbeddings/3;
|
||||
|
||||
query predicate modelChecksumDebug = ATMEmbeddings::modelChecksum/0;
|
||||
}
|
||||
|
||||
private ATMConfig getCfg() { any() }
|
||||
|
||||
/**
|
||||
* This module provides functionality that takes a sink and provides an entity that encloses that
|
||||
* sink and is suitable for similarity analysis.
|
||||
*/
|
||||
module SinkToEntity {
|
||||
private import CodeToFeatures
|
||||
|
||||
private raw::Function getNamedEnclosingFunction(raw::Function f) {
|
||||
if not exists(f.getName())
|
||||
then result = getNamedEnclosingFunction(f.getEnclosingContainer())
|
||||
else result = f
|
||||
}
|
||||
|
||||
private raw::Function nodeToNamedFunction(DataFlow::Node node) {
|
||||
result = getNamedEnclosingFunction(node.getContainer())
|
||||
}
|
||||
|
||||
/**
|
||||
* We use the innermost named function that encloses a sink, if one exists.
|
||||
* Otherwise, we use the innermost function that encloses the sink.
|
||||
*/
|
||||
private raw::Function sinkToFunction(DataFlow::Node sink) {
|
||||
if exists(raw::Function f | f = nodeToNamedFunction(sink))
|
||||
then result = nodeToNamedFunction(sink)
|
||||
else result = sink.getContainer()
|
||||
}
|
||||
|
||||
private DatabaseFeatures::Entity getFirstExtractedEntity(raw::Function e) {
|
||||
if
|
||||
DatabaseFeatures::entities(result, _, _, _, _, _, _, _, _) and
|
||||
result.getDefinedFunction() = e
|
||||
then any()
|
||||
else result = getFirstExtractedEntity(e.getEnclosingContainer())
|
||||
}
|
||||
|
||||
/** Get an entity enclosing the sink that is suitable for similarity analysis. */
|
||||
DatabaseFeatures::Entity getEntityForSink(DataFlow::Node sink) {
|
||||
result = getFirstExtractedEntity(sinkToFunction(sink))
|
||||
}
|
||||
}
|
||||
|
||||
/**
|
||||
* This module provides functionality that takes an entity and provides sink candidates within
|
||||
* that entity.
|
||||
*/
|
||||
module EntityToSinkCandidate {
|
||||
private import CodeToFeatures
|
||||
|
||||
/** Get a sink candidate enclosed within the specified entity. */
|
||||
DataFlow::Node getASinkCandidate(DatabaseFeatures::Entity entity) {
|
||||
getCfg().isEffectiveSink(result) and
|
||||
result.getContainer().getEnclosingContainer*() = entity.getDefinedFunction()
|
||||
}
|
||||
}
|
||||
|
||||
// To debug the EntityToSinkCandidate module, import this module by adding
|
||||
// "import EntityToSinkCandidateDebugging" to the top-level.
|
||||
module EntityToSinkCandidateDebugging {
|
||||
private import CodeToFeatures
|
||||
|
||||
query predicate databaseSinks(DataFlow::Node sink) {
|
||||
exists(DatabaseFeatures::Entity entity |
|
||||
DatabaseFeatures::entities(entity, _, _, _, _, _, _, _, _) and
|
||||
sink = EntityToSinkCandidate::getASinkCandidate(entity)
|
||||
)
|
||||
}
|
||||
}
|
||||
|
||||
module ATM {
|
||||
import ATMEmbeddings
|
||||
|
||||
private int getNumberOfSinkSemSearchResults() { result = 100000000 }
|
||||
|
||||
private predicate sinkSemSearchResults(
|
||||
EmbeddingLabel searchLabel, EmbeddingLabel resultLabel, float score
|
||||
) =
|
||||
semanticSearch(sinkQueryEmbeddings/3, databaseEmbeddings/3, getNumberOfSinkSemSearchResults/0)(searchLabel,
|
||||
resultLabel, score)
|
||||
|
||||
/** `DataFlow::Configuration` for adaptive threat modeling (ATM). */
|
||||
class Configuration extends raw::TaintTracking::Configuration {
|
||||
Configuration() { this = "AdaptiveThreatModeling" }
|
||||
|
||||
override predicate isSource(DataFlow::Node source) {
|
||||
// Is an existing source
|
||||
getCfg().isKnownSource(source)
|
||||
}
|
||||
|
||||
override predicate isSource(DataFlow::Node source, DataFlow::FlowLabel lbl) {
|
||||
// Is an existing source
|
||||
getCfg().isKnownSource(source, lbl)
|
||||
}
|
||||
|
||||
override predicate isSink(DataFlow::Node sink) {
|
||||
// Is in a result entity that is similar to a known sink-containing entity according to
|
||||
// semantic search
|
||||
exists(Entity resultEntity, EntityLabel resultLabel |
|
||||
sinkSemSearchResults(_, resultLabel, _) and
|
||||
sink = EntityToSinkCandidate::getASinkCandidate(resultEntity) and
|
||||
resultLabel.getEntity() = resultEntity
|
||||
)
|
||||
or
|
||||
// Is an existing sink
|
||||
getCfg().isKnownSink(sink)
|
||||
}
|
||||
|
||||
override predicate isSink(DataFlow::Node sink, DataFlow::FlowLabel lbl) {
|
||||
// Is in a result entity that is similar to a known sink-containing entity according to
|
||||
// semantic search
|
||||
exists(DataFlow::Node originalSink, EntityLabel seedLabel, EntityLabel resultLabel |
|
||||
getCfg().isKnownSink(originalSink, lbl) and
|
||||
seedLabel.getEntity() = SinkToEntity::getEntityForSink(originalSink) and
|
||||
sinkSemSearchResults(seedLabel, resultLabel, _) and
|
||||
sink = EntityToSinkCandidate::getASinkCandidate(resultLabel.getEntity())
|
||||
)
|
||||
or
|
||||
// Is an existing sink
|
||||
getCfg().isKnownSink(sink, lbl)
|
||||
}
|
||||
|
||||
override predicate isSanitizer(DataFlow::Node node) {
|
||||
super.isSanitizer(node) or
|
||||
getCfg().isSanitizer(node)
|
||||
}
|
||||
|
||||
override predicate isSanitizerGuard(raw::TaintTracking::SanitizerGuardNode guard) {
|
||||
super.isSanitizerGuard(guard) or
|
||||
getCfg().isSanitizerGuard(guard)
|
||||
}
|
||||
|
||||
override predicate isAdditionalTaintStep(DataFlow::Node src, DataFlow::Node trg) {
|
||||
getCfg().isAdditionalTaintStep(src, trg)
|
||||
}
|
||||
|
||||
override predicate isAdditionalFlowStep(
|
||||
DataFlow::Node src, DataFlow::Node trg, DataFlow::FlowLabel inlbl, DataFlow::FlowLabel outlbl
|
||||
) {
|
||||
getCfg().isAdditionalFlowStep(src, trg, inlbl, outlbl)
|
||||
}
|
||||
}
|
||||
|
||||
private Entity getSeedSinkEntity() {
|
||||
exists(DataFlow::Node sink |
|
||||
(getCfg().isKnownSink(sink) or getCfg().isKnownSink(sink, _)) and
|
||||
result = SinkToEntity::getEntityForSink(sink)
|
||||
)
|
||||
}
|
||||
|
||||
private predicate sinkQueryEmbeddings(
|
||||
EmbeddingLabel label, int embeddingIndex, float embeddingValue
|
||||
) {
|
||||
label.(EntityLabel).getEntity() = getSeedSinkEntity() and
|
||||
databaseEmbeddings(label, embeddingIndex, embeddingValue)
|
||||
}
|
||||
|
||||
/**
|
||||
* EXPERIMENTAL. This API may change in the future.
|
||||
*
|
||||
* This module contains informational predicates about the results returned by adaptive threat
|
||||
* modeling (ATM).
|
||||
*/
|
||||
module ResultsInfo {
|
||||
/**
|
||||
* Holds if the node `source` is a source in the standard security query.
|
||||
*/
|
||||
private predicate isSourceASeed(DataFlow::Node source) {
|
||||
getCfg().isKnownSource(source) or getCfg().isKnownSource(source, _)
|
||||
}
|
||||
|
||||
/**
|
||||
* Holds if the node `sink` is a sink in the standard security query.
|
||||
*/
|
||||
private predicate isSinkASeed(DataFlow::Node sink) {
|
||||
getCfg().isKnownSink(sink) or getCfg().isKnownSink(sink, _)
|
||||
}
|
||||
|
||||
private float scoreForSink(DataFlow::Node sink) {
|
||||
if isSinkASeed(sink)
|
||||
then result = 1.0
|
||||
else
|
||||
result =
|
||||
max(float score |
|
||||
exists(ATMEmbeddings::EntityLabel entityLabel |
|
||||
sinkSemSearchResults(_, entityLabel, score) and
|
||||
sink = EntityToSinkCandidate::getASinkCandidate(entityLabel.getEntity())
|
||||
)
|
||||
)
|
||||
}
|
||||
|
||||
/**
|
||||
* EXPERIMENTAL. This API may change in the future.
|
||||
*
|
||||
* Returns the score for the flow between the source `source` and the sink `sink` in the
|
||||
* boosted query.
|
||||
*/
|
||||
float scoreForFlow(DataFlow::Node source, DataFlow::Node sink) { result = scoreForSink(sink) }
|
||||
|
||||
/**
|
||||
* Pad a score returned from `scoreForFlow` to a particular length by adding a decimal point
|
||||
* if one does not already exist, and "0"s after that decimal point.
|
||||
*
|
||||
* Note that this predicate must itself define an upper bound on `length`, so that it has a
|
||||
* finite number of results. Currently this is defined as 12.
|
||||
*/
|
||||
private string paddedScore(float score, int length) {
|
||||
// In this definition, we must restrict the values that `length` and `score` can take on so that the
|
||||
// predicate has a finite number of results.
|
||||
score = scoreForFlow(_, _) and
|
||||
length = result.length() and
|
||||
(
|
||||
// We need to make sure the padded score contains a "." so lexically sorting the padded scores is
|
||||
// equivalent to numerically sorting the scores.
|
||||
score.toString().charAt(_) = "." and
|
||||
result = score.toString()
|
||||
or
|
||||
not score.toString().charAt(_) = "." and
|
||||
result = score.toString() + "."
|
||||
)
|
||||
or
|
||||
result = paddedScore(score, length - 1) + "0" and
|
||||
length <= 12
|
||||
}
|
||||
|
||||
/**
|
||||
* EXPERIMENTAL. This API may change in the future.
|
||||
*
|
||||
* Return a string representing the score of the flow between `source` and `sink` in the
|
||||
* boosted query.
|
||||
*
|
||||
* The returned string is a fixed length, such that lexically sorting the strings returned by
|
||||
* this predicate gives the same sort order as numerically sorting the scores of the flows.
|
||||
*/
|
||||
string scoreStringForFlow(DataFlow::Node source, DataFlow::Node sink) {
|
||||
exists(float score |
|
||||
score = scoreForFlow(source, sink) and
|
||||
(
|
||||
// A length of 12 is equivalent to 10 decimal places.
|
||||
score.toString().length() >= 12 and
|
||||
result = score.toString().substring(0, 12)
|
||||
or
|
||||
score.toString().length() < 12 and
|
||||
result = paddedScore(score, 12)
|
||||
)
|
||||
)
|
||||
}
|
||||
|
||||
private ATMEmbeddings::EmbeddingLabel bestSearchLabelsForSink(DataFlow::Node sink) {
|
||||
exists(ATMEmbeddings::EntityLabel resultLabel |
|
||||
sinkSemSearchResults(result, resultLabel, scoreForSink(sink)) and
|
||||
sink = EntityToSinkCandidate::getASinkCandidate(resultLabel.getEntity())
|
||||
)
|
||||
}
|
||||
|
||||
private newtype TEndpointOrigins =
|
||||
TOrigins(boolean isKnown, boolean isSimilarToKnown) {
|
||||
isKnown = [true, false] and isSimilarToKnown = [true, false]
|
||||
}
|
||||
|
||||
/**
|
||||
* EXPERIMENTAL. This API may change in the future.
|
||||
*
|
||||
* A class representing the origins of an endpoint.
|
||||
*/
|
||||
class EndpointOrigins extends TEndpointOrigins {
|
||||
/**
|
||||
* EXPERIMENTAL. This API may change in the future.
|
||||
*
|
||||
* Whether the endpoint is a known endpoint in the database.
|
||||
*/
|
||||
boolean isKnown;
|
||||
/**
|
||||
* EXPERIMENTAL. This API may change in the future.
|
||||
*
|
||||
* Whether the endpoint is a predicted endpoint that is similar to a known endpoint in
|
||||
* the database.
|
||||
*/
|
||||
boolean isSimilarToKnown;
|
||||
|
||||
EndpointOrigins() { this = TOrigins(isKnown, isSimilarToKnown) }
|
||||
|
||||
/**
|
||||
* EXPERIMENTAL. This API may change in the future.
|
||||
*
|
||||
* A string listing the origins of a predicted endpoint.
|
||||
*
|
||||
* Origins include:
|
||||
*
|
||||
* - `known`: The endpoint is a known endpoint in the database.
|
||||
* - `similar_to_known`: The endpoint is a predicted endpoint that is similar to a known
|
||||
* endpoint in the database.
|
||||
*/
|
||||
string listOfOriginComponents() {
|
||||
// Ensure that this predicate has exactly one result.
|
||||
result =
|
||||
any(string x | if isKnown = true then x = "known" else x = "") +
|
||||
any(string x | if isKnown = true and isSimilarToKnown = true then x = "," else x = "") +
|
||||
any(string x | if isSimilarToKnown = true then x = "similar_to_known" else x = "")
|
||||
}
|
||||
|
||||
string toString() { result = "EndpointOrigins(" + listOfOriginComponents() + ")" }
|
||||
}
|
||||
|
||||
/**
|
||||
* EXPERIMENTAL. This API may change in the future.
|
||||
*
|
||||
* The highest-scoring origins of the source.
|
||||
*/
|
||||
EndpointOrigins originsForSource(DataFlow::Node source) {
|
||||
result =
|
||||
TOrigins(any(boolean b | if isSourceASeed(source) then b = true else b = false), false)
|
||||
}
|
||||
|
||||
/**
|
||||
* EXPERIMENTAL. This API may change in the future.
|
||||
*
|
||||
* The highest-scoring origins of the sink.
|
||||
*/
|
||||
EndpointOrigins originsForSink(DataFlow::Node sink) {
|
||||
result =
|
||||
TOrigins(any(boolean b | if isSinkASeed(sink) then b = true else b = false),
|
||||
any(boolean b |
|
||||
if
|
||||
not isSinkASeed(sink) and
|
||||
exists(ATMEmbeddings::EntityLabel label | label = bestSearchLabelsForSink(sink))
|
||||
then b = true
|
||||
else b = false
|
||||
))
|
||||
}
|
||||
|
||||
/**
|
||||
* EXPERIMENTAL. This API may change in the future.
|
||||
*
|
||||
* Indicates whether the flow from source to sink is likely to be reported by the base security
|
||||
* query.
|
||||
*
|
||||
* Currently this is a heuristic: it ignores potential differences in the definitions of
|
||||
* additional flow steps.
|
||||
*/
|
||||
predicate isFlowLikelyInBaseQuery(DataFlow::Node source, DataFlow::Node sink) {
|
||||
isSourceASeed(source) and isSinkASeed(sink)
|
||||
}
|
||||
}
|
||||
|
||||
// To debug the ATM module, import this module by adding "import ATM::Debugging" to the top-level.
|
||||
module Debugging {
|
||||
query predicate sinkSemSearchResultsDebug = sinkSemSearchResults/3;
|
||||
|
||||
query predicate atmSources(DataFlow::Node source) {
|
||||
any(ATM::Configuration cfg).isSource(source)
|
||||
}
|
||||
|
||||
query predicate atmSinks(DataFlow::Node sink) { any(ATM::Configuration cfg).isSink(sink) }
|
||||
}
|
||||
}
|
||||
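Reviewer note: `paddedScore` and `scoreStringForFlow` in `ResultsInfo` produce a fixed-length string so that sorting the strings lexically matches sorting the scores numerically. Here is a small Python sketch of that idea, assuming scores lie in the range 0 to 1 and that `str(score)` yields plain decimal notation (true for the values shown; QL's `toString()` on floats may format differently). The function name `score_string` is hypothetical.

```python
def score_string(score: float, width: int = 12) -> str:
    """Render `score` as exactly `width` characters: truncate long text, pad short text with "0"."""
    text = str(score)
    # Make sure there is a decimal point before padding, so the zeros land after it.
    if "." not in text:
        text += "."
    if len(text) >= width:
        return text[:width]
    return text.ljust(width, "0")


# Assumed example scores; a width of 12 corresponds to ten decimal places for values below 10.
scores = [0.5, 1.0, 0.25, 0.123456789012345]
strings = [score_string(s) for s in scores]
print(strings)
```

Truncating rather than rounding follows the `substring(0, 12)` branch in the QL; very small scores that Python prints in scientific notation are outside this sketch's assumptions.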
@@ -0,0 +1,442 @@
|
||||
/*
|
||||
* For internal use only.
|
||||
*
|
||||
* Extracts data about the functions in the database for use in adaptive threat modeling (ATM).
|
||||
*/
|
||||
|
||||
module Raw {
|
||||
private import javascript as raw
|
||||
|
||||
class RawAstNode = raw::ASTNode;
|
||||
|
||||
class Entity = raw::Function;
|
||||
|
||||
class Location = raw::Location;
|
||||
|
||||
/**
|
||||
* Exposed as a tool for defining anchors for semantic search.
|
||||
*/
|
||||
class UnderlyingFunction = raw::Function;
|
||||
|
||||
/**
|
||||
* Determines whether an entity should be omitted from ATM.
|
||||
*/
|
||||
predicate isEntityIgnored(Entity entity) {
|
||||
// Ignore entities which don't have definitions, for example those in TypeScript
|
||||
// declaration files.
|
||||
not exists(entity.getBody())
|
||||
}
|
||||
|
||||
newtype WrappedAstNode = TAstNode(RawAstNode rawNode)
|
||||
|
||||
/**
|
||||
* This class represents nodes in the AST.
|
||||
*/
|
||||
class AstNode extends TAstNode {
|
||||
RawAstNode rawNode;
|
||||
|
||||
AstNode() { this = TAstNode(rawNode) }
|
||||
|
||||
AstNode getAChildNode() { result = TAstNode(rawNode.getAChild()) }
|
||||
|
||||
AstNode getParentNode() { result = TAstNode(rawNode.getParent()) }
|
||||
|
||||
/**
|
||||
* Holds if the AST node has `result` as its `index`th attribute.
|
||||
*
|
||||
* The index is not intended to mean anything, and is only here for disambiguation.
|
||||
* There are no guarantees about any particular index being used (or not being used).
|
||||
*/
|
||||
string astNodeAttribute(int index) {
|
||||
(
|
||||
// NB: Unary and binary operator expressions e.g. -a, a + b and compound
|
||||
// assignments e.g. a += b can be identified by the expression type.
|
||||
result = "ID:" + rawNode.(raw::Identifier).getName()
|
||||
or
|
||||
// Add an ID: for computed property accesses for which we can predetermine the property being accessed.
|
||||
// Slight lie but useful for the model.
|
||||
// NB: May alias with operators e.g. could have '+' as a property name.
|
||||
result = "ID:" + rawNode.(raw::IndexExpr).getPropertyName()
|
||||
or
|
||||
// Want to have distinct representations for `0xa`, `0xA`, and `10`.
|
||||
result = "LIT:" + rawNode.(raw::NumberLiteral).getRawValue()
|
||||
or
|
||||
// Want to map `"a"` and `'a'` onto the same representation.
|
||||
not rawNode instanceof raw::NumberLiteral and
|
||||
result = "LIT:" + rawNode.(raw::Literal).getValue()
|
||||
or
|
||||
result = "LIT:" + rawNode.(raw::TemplateElement).getRawValue()
|
||||
) and
|
||||
index = 0
|
||||
}
|
||||
|
||||
/**
|
||||
* Returns a string indicating the "type" of the AST node.
|
||||
*/
|
||||
string astNodeType() {
|
||||
// The definition of this method should correspond with that of the `@ast_node` entry in the
|
||||
// dbscheme.
|
||||
result = "js_exprs." + any(int kind | exprs(rawNode, kind, _, _, _))
|
||||
or
|
||||
result = "js_properties." + any(int kind | properties(rawNode, _, _, kind, _))
|
||||
or
|
||||
result = "js_stmts." + any(int kind | stmts(rawNode, kind, _, _, _))
|
||||
or
|
||||
result = "js_toplevel" and rawNode instanceof raw::TopLevel
|
||||
or
|
||||
result = "js_typeexprs." + any(int kind | typeexprs(rawNode, kind, _, _, _))
|
||||
}
|
||||
|
||||
/**
|
||||
* Holds if `result` is the `index`'th child of the AST node, for some arbitrary indexing.
|
||||
* A root of the AST should be its own child, with an arbitrary (though conventionally
|
||||
* 0) index.
|
||||
*
|
||||
* Notably, the order in which child nodes are visited is not required to be meaningful,
|
||||
* and no particular index is required to be meaningful. However, `(parent, index)`
|
||||
* should be a keyset.
|
||||
*/
|
||||
pragma[nomagic]
|
||||
AstNode astNodeChild(int index) {
|
||||
result =
|
||||
rank[index - 1](AstNode child, raw::Location l |
|
||||
child = this.getAChildNode() and l = child.getLocation()
|
||||
|
|
||||
child
|
||||
order by
|
||||
l.getStartLine(), l.getStartColumn(), l.getEndLine(), l.getEndColumn(),
|
||||
child.astNodeType()
|
||||
)
|
||||
or
|
||||
not exists(result.getParentNode()) and this = result and index = 0
|
||||
}
|
||||
|
||||
raw::Location getLocation() { result = rawNode.getLocation() }
|
||||
|
||||
string toString() { result = rawNode.toString() }
|
||||
|
||||
predicate isEntityNameNode(Entity entity) {
|
||||
exists(int index |
|
||||
TAstNode(entity) = getParentNode() and
|
||||
this = getParentNode().astNodeChild(index) and
|
||||
// An entity name node must be the first child of the entity.
|
||||
index = min(int otherIndex | exists(getParentNode().astNodeChild(otherIndex))) and
|
||||
entity.getName() = rawNode.(raw::VarDecl).getName()
|
||||
)
|
||||
}
|
||||
}
|
||||
|
||||
/**
|
||||
* Holds if `result` is the `index`'th child of the `parent` entity. Such
|
||||
* a node is a root of an AST associated with this entity.
|
||||
*/
|
||||
AstNode entityChild(AstNode parent, int index) {
|
||||
// In JavaScript, entities appear in the AST parent/child relationship.
|
||||
result = parent.astNodeChild(index)
|
||||
}
|
||||
|
||||
/**
|
||||
* Holds if `node` is contained in `entity`. Note that a single node may be contained
|
||||
* in multiple entities, if they are nested. An entity, in particular, should be
|
||||
* reported as contained within itself.
|
||||
*/
|
||||
predicate entityContains(Entity entity, AstNode node) {
|
||||
node.getParentNode*() = TAstNode(entity) and not node.isEntityNameNode(entity)
|
||||
}
|
||||
|
||||
/**
|
||||
* Get the name of the entity.
|
||||
*
|
||||
* We attempt to assign unnamed entities approximate names if they are passed to a likely
|
||||
* external library function. If we can't assign them an approximate name, we give them the name
|
||||
* `""`, so that these entities are included in `AdaptiveThreatModeling.qll`.
|
||||
*
|
||||
* For entities which have multiple names, we choose the lexically smallest name.
|
||||
*/
|
||||
string getEntityName(Entity entity) {
|
||||
if exists(entity.getName())
|
||||
then
|
||||
// https://github.com/github/ml-ql-adaptive-threat-modeling/issues/244 discusses making use
|
||||
// of all the names during training.
|
||||
result = min(entity.getName())
|
||||
else
|
||||
if exists(getApproximateNameForEntity(entity))
|
||||
then result = getApproximateNameForEntity(entity)
|
||||
else result = ""
|
||||
}
|
||||
|
||||
/**
|
||||
* Holds if the call `call` has `entity` as its `argumentIndex`th argument.
|
||||
*/
|
||||
private predicate entityUsedAsArgumentToCall(
|
||||
Entity entity, raw::DataFlow::CallNode call, int argumentIndex
|
||||
) {
|
||||
raw::DataFlow::localFlowStep*(call.getArgument(argumentIndex), entity.flow())
|
||||
}
|
||||
|
||||
/**
|
||||
* Returns a generated name for the entity. This name is generated such that
|
||||
* entities with the same names have similar behaviour.
|
||||
*/
|
||||
private string getApproximateNameForEntity(Entity entity) {
|
||||
count(raw::DataFlow::CallNode call, int index | entityUsedAsArgumentToCall(entity, call, index)) =
|
||||
1 and
|
||||
exists(raw::DataFlow::CallNode call, int index, string basePart |
|
||||
entityUsedAsArgumentToCall(entity, call, index) and
|
||||
(
|
||||
if count(getReceiverName(call)) = 1
|
||||
then basePart = getReceiverName(call) + "."
|
||||
else basePart = ""
|
||||
) and
|
||||
result = basePart + call.getCalleeName() + "#functional_argument_" + index
|
||||
)
|
||||
}
|
||||
|
||||
private string getReceiverName(raw::DataFlow::CallNode call) {
|
||||
result = call.getReceiver().asExpr().(raw::VarAccess).getName()
|
||||
}
|
||||
|
||||
/** Sanity checks: these predicates should each have no results */
|
||||
module Sanity {
|
||||
/** `getEntityName` should assign each entity a single name. */
|
||||
query predicate entityWithManyNames(Entity entity, string name) {
|
||||
name = getEntityName(entity) and
|
||||
count(getEntityName(entity)) > 1
|
||||
}
|
||||
|
||||
query predicate nodeWithNoType(AstNode node) { not exists(node.astNodeType()) }
|
||||
|
||||
query predicate nodeWithManyTypes(AstNode node, string type) {
|
||||
type = node.astNodeType() and
|
||||
count(node.astNodeType()) > 1
|
||||
}
|
||||
|
||||
query predicate nodeWithNoParent(AstNode node, string type) {
|
||||
not node = any(AstNode parent).astNodeChild(_) and
|
||||
type = node.astNodeType() and
|
||||
not exists(RawAstNode rawNode | node = TAstNode(rawNode) and rawNode instanceof raw::Module)
|
||||
}
|
||||
|
||||
query predicate duplicateChildIndex(AstNode parent, int index, AstNode child) {
|
||||
child = parent.astNodeChild(index) and
|
||||
count(parent.astNodeChild(index)) > 1
|
||||
}
|
||||
|
||||
query predicate duplicateAttributeIndex(AstNode node, int index) {
|
||||
exists(node.astNodeAttribute(index)) and
|
||||
count(node.astNodeAttribute(index)) > 1
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
module Wrapped {
|
||||
/*
|
||||
* We require any node with attributes to be a leaf. Where a non-leaf node
|
||||
* has an attribute, we instead create a synthetic leaf node that has that
|
||||
* attribute.
|
||||
*/
|
||||
|
||||
/**
|
||||
* Holds if the AST node `e` is a leaf node.
|
||||
*/
|
||||
private predicate isLeaf(Raw::AstNode e) { not exists(e.astNodeChild(_)) }
|
||||
|
||||
newtype WrappedEntity =
|
||||
TEntity(Raw::Entity entity) {
|
||||
exists(entity.getLocation().getFile().getRelativePath()) and
|
||||
Raw::entityContains(entity, _)
|
||||
}
|
||||
|
||||
/**
|
||||
* A type ranging over the kinds of entities for which we want to consider embeddings.
|
||||
*/
|
||||
class Entity extends WrappedEntity {
|
||||
Raw::Entity rawEntity;
|
||||
|
||||
Entity() { this = TEntity(rawEntity) and not Raw::isEntityIgnored(rawEntity) }
|
||||
|
||||
string getName() { result = Raw::getEntityName(rawEntity) }
|
||||
|
||||
AstNode getAstRoot(int index) {
|
||||
result = TAstNode(rawEntity, Raw::entityChild(Raw::TAstNode(rawEntity), index))
|
||||
}
|
||||
|
||||
string toString() { result = rawEntity.toString() }
|
||||
|
||||
Raw::Location getLocation() { result = rawEntity.getLocation() }
|
||||
|
||||
Raw::UnderlyingFunction getDefinedFunction() { result = rawEntity }
|
||||
}
|
||||
|
||||
newtype WrappedAstNode =
|
||||
TAstNode(Raw::Entity enclosingEntity, Raw::AstNode node) {
|
||||
Raw::entityContains(enclosingEntity, node)
|
||||
} or
|
||||
TSyntheticNode(
|
||||
Raw::Entity enclosingEntity, Raw::AstNode node, int syntheticChildIndex, int attrIndex
|
||||
) {
|
||||
Raw::entityContains(enclosingEntity, node) and
|
||||
exists(node.astNodeAttribute(attrIndex)) and
|
||||
not isLeaf(node) and
|
||||
if exists(node.astNodeChild(_))
|
||||
then
|
||||
syntheticChildIndex =
|
||||
attrIndex - min(int other | exists(node.astNodeAttribute(other))) +
|
||||
max(int other | exists(node.astNodeChild(other))) + 1
|
||||
else syntheticChildIndex = attrIndex
|
||||
}
|
||||
|
||||
pragma[nomagic]
|
||||
private AstNode injectedChild(Raw::Entity enclosingEntity, Raw::AstNode parent, int index) {
|
||||
result = TAstNode(enclosingEntity, parent.astNodeChild(index)) or
|
||||
result = TSyntheticNode(enclosingEntity, parent, index, _)
|
||||
}
|
||||
|
||||
/**
|
||||
* A type ranging over AST nodes. Ultimately, only nodes contained in entities will
|
||||
* be considered.
|
||||
*/
|
||||
class AstNode extends WrappedAstNode {
|
||||
Raw::Entity enclosingEntity;
|
||||
Raw::AstNode rawNode;
|
||||
|
||||
AstNode() {
|
||||
(
|
||||
this = TAstNode(enclosingEntity, rawNode) or
|
||||
this = TSyntheticNode(enclosingEntity, rawNode, _, _)
|
||||
) and
|
||||
not Raw::isEntityIgnored(enclosingEntity)
|
||||
}
|
||||
|
||||
string getAttribute(int index) {
|
||||
result = rawNode.astNodeAttribute(index) and
|
||||
not exists(TSyntheticNode(enclosingEntity, rawNode, _, index))
|
||||
}
|
||||
|
||||
string getType() { result = rawNode.astNodeType() }
|
||||
|
||||
AstNode getChild(int index) { result = injectedChild(enclosingEntity, rawNode, index) }
|
||||
|
||||
string toString() { result = getType() }
|
||||
|
||||
Raw::Location getLocation() { result = rawNode.getLocation() }
|
||||
}
|
||||
|
||||
/**
|
||||
* A synthetic AST node, created to be a leaf for an otherwise non-leaf attribute.
|
||||
*/
|
||||
class SyntheticAstNode extends AstNode, TSyntheticNode {
|
||||
int childIndex;
|
||||
int attributeIndex;
|
||||
|
||||
SyntheticAstNode() {
|
||||
this = TSyntheticNode(enclosingEntity, rawNode, childIndex, attributeIndex)
|
||||
}
|
||||
|
||||
override string getAttribute(int index) {
|
||||
result = rawNode.astNodeAttribute(attributeIndex) and index = attributeIndex
|
||||
}
|
||||
|
||||
override string getType() {
|
||||
result = rawNode.astNodeType() + "::<synthetic " + childIndex + ">"
|
||||
}
|
||||
|
||||
override AstNode getChild(int index) { none() }
|
||||
}
|
||||
}
|
||||
|
||||
module DatabaseFeatures {
|
||||
/**
|
||||
* Exposed as a tool for defining anchors for semantic search.
|
||||
*/
|
||||
class UnderlyingFunction = Raw::UnderlyingFunction;
|
||||
|
||||
private class Location = Raw::Location;
|
||||
|
||||
private newtype TEntityOrAstNode =
|
||||
TEntity(Wrapped::Entity entity) or
|
||||
TAstNode(Wrapped::AstNode astNode)
|
||||
|
||||
class EntityOrAstNode extends TEntityOrAstNode {
|
||||
abstract string getType();
|
||||
|
||||
abstract string toString();
|
||||
|
||||
abstract Location getLocation();
|
||||
}
|
||||
|
||||
class Entity extends EntityOrAstNode, TEntity {
|
||||
Wrapped::Entity entity;
|
||||
|
||||
Entity() { this = TEntity(entity) }
|
||||
|
||||
string getName() { result = entity.getName() }
|
||||
|
||||
AstNode getAstRoot(int index) { result = TAstNode(entity.getAstRoot(index)) }
|
||||
|
||||
override string getType() { result = "javascript function" }
|
||||
|
||||
override string toString() { result = "Entity: " + getName() }
|
||||
|
||||
override Location getLocation() { result = entity.getLocation() }
|
||||
|
||||
UnderlyingFunction getDefinedFunction() { result = entity.getDefinedFunction() }
|
||||
}
|
||||
|
||||
class AstNode extends EntityOrAstNode, TAstNode {
|
||||
Wrapped::AstNode rawNode;
|
||||
|
||||
AstNode() { this = TAstNode(rawNode) }
|
||||
|
||||
AstNode getChild(int index) { result = TAstNode(rawNode.getChild(index)) }
|
||||
|
||||
string getAttribute(int index) { result = rawNode.getAttribute(index) }
|
||||
|
||||
override string getType() { result = rawNode.getType() }
|
||||
|
||||
override string toString() { result = this.getType() }
|
||||
|
||||
override Location getLocation() { result = rawNode.getLocation() }
|
||||
}
|
||||
|
||||
/** Sanity checks: these predicates should each have no results */
|
||||
module Sanity {
|
||||
query predicate nonLeafAttribute(AstNode node, int index, string attribute) {
|
||||
attribute = node.getAttribute(index) and
|
||||
exists(node.getChild(_))
|
||||
}
|
||||
}
|
||||
|
||||
query predicate entities(
|
||||
Entity entity, string entity_name, string entity_type, string path, int startLine,
|
||||
int startColumn, int endLine, int endColumn, string absolutePath
|
||||
) {
|
||||
entity_name = entity.getName() and
|
||||
entity_type = entity.getType() and
|
||||
exists(Location l | l = entity.getLocation() |
|
||||
path = l.getFile().getRelativePath() and
|
||||
absolutePath = l.getFile().getAbsolutePath() and
|
||||
l.hasLocationInfo(_, startLine, startColumn, endLine, endColumn)
|
||||
)
|
||||
}
|
||||
|
||||
query predicate astNodes(
|
||||
Entity enclosingEntity, EntityOrAstNode parent, int index, AstNode node, string node_type
|
||||
) {
|
||||
node = enclosingEntity.getAstRoot(index) and
|
||||
parent = enclosingEntity and
|
||||
node_type = node.getType()
|
||||
or
|
||||
astNodes(enclosingEntity, _, _, parent, _) and
|
||||
node = parent.(AstNode).getChild(index) and
|
||||
node_type = node.getType()
|
||||
}
|
||||
|
||||
query predicate nodeAttributes(AstNode node, string attr) {
|
||||
// Only get attributes of AST nodes we extract.
|
||||
// This excludes nodes in standard libraries since the standard library files
|
||||
// are located outside the source root.
|
||||
astNodes(_, _, _, node, _) and
|
||||
attr = node.getAttribute(_)
|
||||
}
|
||||
}
|
||||
@@ -0,0 +1,109 @@
|
||||
/**
|
||||
* Provides predicates that expose the knowledge of models
|
||||
* in the core CodeQL JavaScript libraries.
|
||||
*/
|
||||
|
||||
private import javascript
|
||||
private import semmle.javascript.security.dataflow.XxeCustomizations
|
||||
private import semmle.javascript.security.dataflow.RemotePropertyInjectionCustomizations
|
||||
private import semmle.javascript.security.dataflow.TypeConfusionThroughParameterTamperingCustomizations
|
||||
private import semmle.javascript.security.dataflow.ZipSlipCustomizations
|
||||
private import semmle.javascript.security.dataflow.TaintedPathCustomizations
|
||||
private import semmle.javascript.security.dataflow.CleartextLoggingCustomizations
|
||||
private import semmle.javascript.security.dataflow.XpathInjectionCustomizations
|
||||
private import semmle.javascript.security.dataflow.Xss::Shared as Xss
|
||||
private import semmle.javascript.security.dataflow.StackTraceExposureCustomizations
|
||||
private import semmle.javascript.security.dataflow.ClientSideUrlRedirectCustomizations
|
||||
private import semmle.javascript.security.dataflow.CodeInjectionCustomizations
|
||||
private import semmle.javascript.security.dataflow.RequestForgeryCustomizations
|
||||
private import semmle.javascript.security.dataflow.CorsMisconfigurationForCredentialsCustomizations
|
||||
private import semmle.javascript.security.dataflow.ShellCommandInjectionFromEnvironmentCustomizations
|
||||
private import semmle.javascript.security.dataflow.DifferentKindsComparisonBypassCustomizations
|
||||
private import semmle.javascript.security.dataflow.CommandInjectionCustomizations
|
||||
private import semmle.javascript.security.dataflow.PrototypePollutionCustomizations
|
||||
private import semmle.javascript.security.dataflow.UnvalidatedDynamicMethodCallCustomizations
|
||||
private import semmle.javascript.security.dataflow.TaintedFormatStringCustomizations
|
||||
private import semmle.javascript.security.dataflow.NosqlInjectionCustomizations
|
||||
private import semmle.javascript.security.dataflow.PostMessageStarCustomizations
|
||||
private import semmle.javascript.security.dataflow.RegExpInjectionCustomizations
|
||||
private import semmle.javascript.security.dataflow.SqlInjectionCustomizations
|
||||
private import semmle.javascript.security.dataflow.InsecureRandomnessCustomizations
|
||||
private import semmle.javascript.security.dataflow.XmlBombCustomizations
|
||||
private import semmle.javascript.security.dataflow.InsufficientPasswordHashCustomizations
|
||||
private import semmle.javascript.security.dataflow.UnsafeJQueryPluginCustomizations
|
||||
private import semmle.javascript.security.dataflow.HardcodedCredentialsCustomizations
|
||||
private import semmle.javascript.security.dataflow.FileAccessToHttpCustomizations
|
||||
private import semmle.javascript.security.dataflow.UnsafeDynamicMethodAccessCustomizations
|
||||
private import semmle.javascript.security.dataflow.UnsafeDeserializationCustomizations
|
||||
private import semmle.javascript.security.dataflow.HardcodedDataInterpretedAsCodeCustomizations
|
||||
private import semmle.javascript.security.dataflow.ServerSideUrlRedirectCustomizations
|
||||
private import semmle.javascript.security.dataflow.IndirectCommandInjectionCustomizations
|
||||
private import semmle.javascript.security.dataflow.ConditionalBypassCustomizations
|
||||
private import semmle.javascript.security.dataflow.HttpToFileAccessCustomizations
|
||||
private import semmle.javascript.security.dataflow.BrokenCryptoAlgorithmCustomizations
|
||||
private import semmle.javascript.security.dataflow.LoopBoundInjectionCustomizations
|
||||
private import semmle.javascript.security.dataflow.CleartextStorageCustomizations
|
||||
|
||||
/**
|
||||
* Holds if the node `n` is a known sink in a modeled library.
|
||||
*/
|
||||
predicate isKnownLibrarySink(DataFlow::Node n) {
|
||||
n instanceof Xxe::Sink or
|
||||
n instanceof ZipSlip::Sink or
|
||||
n instanceof TaintedPath::Sink or
|
||||
n instanceof CleartextLogging::Sink or
|
||||
n instanceof XpathInjection::Sink or
|
||||
n instanceof Xss::Sink or
|
||||
n instanceof StackTraceExposure::Sink or
|
||||
n instanceof ClientSideUrlRedirect::Sink or
|
||||
n instanceof CodeInjection::Sink or
|
||||
n instanceof RequestForgery::Sink or
|
||||
n instanceof CorsMisconfigurationForCredentials::Sink or
|
||||
n instanceof ShellCommandInjectionFromEnvironment::Sink or
|
||||
n instanceof CommandInjection::Sink or
|
||||
n instanceof PrototypePollution::Sink or
|
||||
n instanceof UnvalidatedDynamicMethodCall::Sink or
|
||||
n instanceof TaintedFormatString::Sink or
|
||||
n instanceof NosqlInjection::Sink or
|
||||
n instanceof PostMessageStar::Sink or
|
||||
n instanceof RegExpInjection::Sink or
|
||||
n instanceof SqlInjection::Sink or
|
||||
n instanceof InsecureRandomness::Sink or
|
||||
n instanceof XmlBomb::Sink or
|
||||
n instanceof FileAccessToHttp::Sink or
|
||||
n instanceof UnsafeDeserialization::Sink or
|
||||
n instanceof ServerSideUrlRedirect::Sink or
|
||||
n instanceof IndirectCommandInjection::Sink or
|
||||
n instanceof HttpToFileAccess::Sink or
|
||||
n instanceof CleartextStorage::Sink
|
||||
}
|
||||
|
||||
/**
|
||||
* Holds if the node `n` is known as the predecessor in a modeled flow step.
|
||||
*/
|
||||
predicate isKnownStepSrc(DataFlow::Node n) {
|
||||
any(TaintTracking::AdditionalTaintStep s).step(n, _) or
|
||||
any(DataFlow::AdditionalFlowStep s).step(n, _) or
|
||||
any(DataFlow::AdditionalFlowStep s).step(n, _, _, _)
|
||||
}
|
||||
|
||||
/**
|
||||
* Holds if the node `n` is an unlikely sink for a security query.
|
||||
*/
|
||||
predicate isUnlikelySink(DataFlow::Node n) {
|
||||
any(LodashUnderscore::Member m).getACall().getAnArgument() = n
|
||||
or
|
||||
exists(ClientRequest r |
|
||||
r.getAnArgument() = n or n = r.getUrl() or n = r.getHost() or n = r.getADataNode()
|
||||
)
|
||||
or
|
||||
exists(DataFlow::CallNode call |
|
||||
n = call.getAnArgument() and
|
||||
// Heuristically remove calls that look like logging calls
|
||||
call.getCalleeName() = getAStandardLoggerMethodName()
|
||||
)
|
||||
or
|
||||
exists(PromiseDefinition p |
|
||||
n = [p.getResolveParameter(), p.getRejectParameter()].getACall().getAnArgument()
|
||||
)
|
||||
}
|
||||
@@ -0,0 +1,47 @@
|
||||
private import javascript
|
||||
|
||||
/**
|
||||
* Holds if the receiver of `call` is a global variable for which we cannot find a global variable
|
||||
* definition with the same name.
|
||||
*/
|
||||
private predicate isReceiverUndefinedGlobalVar(DataFlow::CallNode call) {
|
||||
exists(GlobalVariable var |
|
||||
var.getAnAccess().flow() = call.getReceiver() and
|
||||
not exists(VarDef def | def.getAVariable().getName() = var.getName())
|
||||
)
|
||||
}
|
||||
|
||||
/**
|
||||
* Gets a node that flows to callback-parameter `p`.
|
||||
*/
|
||||
private DataFlow::SourceNode getACallback(DataFlow::ParameterNode p, DataFlow::TypeBackTracker t) {
|
||||
t.start() and
|
||||
result = p and
|
||||
any(DataFlow::FunctionNode f).getLastParameter() = p and
|
||||
exists(p.getACall())
|
||||
or
|
||||
exists(DataFlow::TypeBackTracker t2 | result = getACallback(p, t2).backtrack(t2, t))
|
||||
}
|
||||
|
||||
/**
|
||||
* Gets a call for which we do not have the callee (i.e. the definition of the called function). This
|
||||
* acts as a heuristic for identifying calls to external library functions.
|
||||
*/
|
||||
private DataFlow::CallNode getACallWithoutCallee() {
|
||||
not exists(result.getACallee()) and
|
||||
not exists(DataFlow::ParameterNode param, DataFlow::FunctionNode callback |
|
||||
param.flowsTo(result.getCalleeNode()) and
|
||||
callback = getACallback(param, DataFlow::TypeBackTracker::end())
|
||||
)
|
||||
}
|
||||
|
||||
/**
|
||||
* Gets a call that is likely to be to an external non-built-in library.
|
||||
*
|
||||
* We filter out any function call where the receiver is a global variable without definition as a
|
||||
* heuristic for identifying built-in global variables.
|
||||
*/
|
||||
DataFlow::CallNode getALikelyExternalLibraryCall() {
|
||||
result = getACallWithoutCallee() and
|
||||
not isReceiverUndefinedGlobalVar(result)
|
||||
}
|
||||
@@ -0,0 +1,145 @@
|
||||
/**
|
||||
* @name NoSQL database query built from user-controlled sources (boosted)
|
||||
* @description Building a database query from user-controlled sources is vulnerable to insertion of
|
||||
* malicious code by the user.
|
||||
* @kind path-problem
|
||||
* @problem.severity error
|
||||
*/
|
||||
|
||||
import experimental.adaptivethreatmodeling.AdaptiveThreatModeling
|
||||
import experimental.adaptivethreatmodeling.CoreKnowledge as CoreKnowledge
|
||||
import experimental.adaptivethreatmodeling.EndpointFilterUtils as EndpointFilterUtils
|
||||
import semmle.javascript.security.dataflow.NosqlInjectionCustomizations
|
||||
import ATM::ResultsInfo
|
||||
import DataFlow::PathGraph
|
||||
|
||||
/**
|
||||
* This module provides logic to filter candidate sinks to those which are likely NoSQL injection
|
||||
* sinks.
|
||||
*/
|
||||
module SinkEndpointFilter {
|
||||
private import javascript
|
||||
private import NoSQL
|
||||
|
||||
/**
|
||||
* Holds if local data flow reaches `node` from the base of a property write.
|
||||
*
|
||||
* For example, this predicate would be true for a node corresponding to
|
||||
* `{ password : req.body.password }`, but false for a node corresponding to just
|
||||
* `req.body.password`.
|
||||
*
|
||||
* This is appropriate for NoSQL injection as we are looking for a query object built up from
|
||||
* user-controlled data. Rarely is the query object itself user-controlled data.
|
||||
*/
|
||||
predicate containsAPropertyThatIsWrittenTo(DataFlow::Node node) {
|
||||
exists(DataFlow::PropWrite pw, DataFlow::Node base |
|
||||
(
|
||||
base = pw.getBase() or
|
||||
base = pw.getBase().getImmediatePredecessor()
|
||||
) and
|
||||
DataFlow::localFlowStep*(base, node)
|
||||
)
|
||||
}
|
||||
|
||||
/**
|
||||
* Holds if `sinkCandidate` is an argument of a call that satisfies the following conditions:
|
||||
* - The call is likely to be to an external non-built-in library
|
||||
* - The argument is not explicitly modelled as a sink, and is not an unlikely sink
|
||||
* - The argument contains a property that is written to. This condition means that we look for
|
||||
* arguments that have the shape of a NoSQL query. See `containsAPropertyThatIsWrittenTo` for
|
||||
* further details.
|
||||
*/
|
||||
predicate isEffectiveSink(DataFlow::Node sinkCandidate) {
|
||||
exists(DataFlow::CallNode call |
|
||||
call = EndpointFilterUtils::getALikelyExternalLibraryCall() and
|
||||
sinkCandidate = call.getAnArgument() and
|
||||
containsAPropertyThatIsWrittenTo(sinkCandidate) and
|
||||
not (
|
||||
// Remove modeled sinks
|
||||
CoreKnowledge::isKnownLibrarySink(sinkCandidate) or
|
||||
// Remove common kinds of unlikely sinks
|
||||
CoreKnowledge::isKnownStepSrc(sinkCandidate) or
|
||||
CoreKnowledge::isUnlikelySink(sinkCandidate) or
|
||||
// Remove modeled database calls. Arguments to modeled calls are very likely to be modeled
|
||||
// as sinks if they are true positives. Therefore arguments that are not modeled as sinks
|
||||
// are unlikely to be true positives.
|
||||
call instanceof DatabaseAccess or
|
||||
// Remove calls to APIs that aren't relevant to NoSQL injection
|
||||
call.getReceiver().asExpr() instanceof HTTP::RequestExpr or
|
||||
call.getReceiver().asExpr() instanceof HTTP::ResponseExpr
|
||||
)
|
||||
)
|
||||
}
|
||||
}
|
||||
|
||||
/**
|
||||
* This predicate allows us to propagate data flow through property writes and array constructors
|
||||
* within a query object, enabling the security query to pick up NoSQL injection vulnerabilities
|
||||
* involving more complex queries.
|
||||
*/
|
||||
private DataFlow::Node getASubexpressionWithinQuery(DataFlow::SourceNode query) {
|
||||
// The right-hand side of a property write is a query subexpression
|
||||
result = getASubexpressionWithinQuery*(query).(DataFlow::SourceNode).getAPropertyWrite().getRhs() or
|
||||
// An element within an array constructor is also a query subexpression
|
||||
result = getASubexpressionWithinQuery*(query).(DataFlow::ArrayCreationNode).getAnElement()
|
||||
}
|
||||
|
||||
class NosqlInjectionATMConfig extends ATMConfig {
|
||||
NosqlInjectionATMConfig() { this = "NosqlInjectionATMConfig" }
|
||||
|
||||
override predicate isKnownSource(DataFlow::Node source) {
|
||||
source instanceof NosqlInjection::Source
|
||||
}
|
||||
|
||||
override predicate isKnownSource(DataFlow::Node source, DataFlow::FlowLabel lbl) {
|
||||
TaintedObject::isSource(source, lbl)
|
||||
}
|
||||
|
||||
override predicate isKnownSink(DataFlow::Node sink, DataFlow::FlowLabel lbl) {
|
||||
sink.(NosqlInjection::Sink).getAFlowLabel() = lbl
|
||||
}
|
||||
|
||||
override predicate isEffectiveSink(DataFlow::Node sinkCandidate) {
|
||||
SinkEndpointFilter::isEffectiveSink(sinkCandidate)
|
||||
}
|
||||
|
||||
override predicate isSanitizer(DataFlow::Node node) { node instanceof NosqlInjection::Sanitizer }
|
||||
|
||||
override predicate isSanitizerGuard(TaintTracking::SanitizerGuardNode guard) {
|
||||
guard instanceof TaintedObject::SanitizerGuard
|
||||
}
|
||||
|
||||
override predicate isAdditionalFlowStep(
|
||||
DataFlow::Node src, DataFlow::Node trg, DataFlow::FlowLabel inlbl, DataFlow::FlowLabel outlbl
|
||||
) {
|
||||
TaintedObject::step(src, trg, inlbl, outlbl)
|
||||
or
|
||||
// additional flow step to track taint through NoSQL query objects
|
||||
inlbl = TaintedObject::label() and
|
||||
outlbl = TaintedObject::label() and
|
||||
exists(NoSQL::Query query, DataFlow::SourceNode queryObj |
|
||||
queryObj.flowsToExpr(query) and
|
||||
queryObj.flowsTo(trg) and
|
||||
src = queryObj.getAPropertyWrite().getRhs()
|
||||
)
|
||||
or
|
||||
// relaxed version of previous additional flow step to track taint through unmodeled NoSQL
|
||||
// query objects
|
||||
any(ATM::Configuration cfg).isSink(trg) and
|
||||
src = getASubexpressionWithinQuery(trg)
|
||||
}
|
||||
}
|
||||
|
||||
from
|
||||
ATM::Configuration cfg, DataFlow::PathNode source, DataFlow::PathNode sink, string scoreString,
|
||||
string sourceSinkOriginReport
|
||||
where
|
||||
cfg.hasFlowPath(source, sink) and
|
||||
not isFlowLikelyInBaseQuery(source.getNode(), sink.getNode()) and
|
||||
scoreString = scoreStringForFlow(source.getNode(), sink.getNode()) and
|
||||
sourceSinkOriginReport =
|
||||
"Source origin: " + originsForSource(source.getNode()).listOfOriginComponents() + " " +
|
||||
" Sink origin: " + originsForSink(sink.getNode()).listOfOriginComponents()
|
||||
select sink.getNode(), source, sink,
|
||||
"[Score = " + scoreString + "] This may be a NoSQL query depending on $@ " +
|
||||
sourceSinkOriginReport as msg, source.getNode(), "a user-provided value"
|
||||
@@ -0,0 +1,3 @@
|
||||
# Adaptive Threat Modeling for JavaScript

This directory contains CodeQL libraries and queries that power adaptive threat modeling for JavaScript. All APIs are experimental and may change in the future.
@@ -0,0 +1,63 @@
|
||||
# About adaptive threat modeling

**Note: Adaptive threat modeling is in beta and subject to change.
It is currently only available for JavaScript and TypeScript code.**


Adaptive threat modeling (ATM) is an extension for CodeQL that uses machine learning to boost queries to find more security vulnerabilities.

A "threat model" is a specification of potential security holes in a software application. Typical security queries, available from the [CodeQL repository](https://github.com/github/codeql), use manually supplied threat models. But threat models are often incomplete. ATM is a semi-automatic method to adaptively enlarge, or boost, manually defined threat models. Boosted queries can identify security vulnerabilities that were missed by the original query.

ATM is for existing CodeQL users who want to find more security vulnerabilities in JavaScript codebases. Specifically, you may want to use ATM if:

- you already write CodeQL JavaScript and TypeScript security queries.
- you are a security researcher who wants to find vulnerabilities in JavaScript and TypeScript code.

For example, GitHub's internal JavaScript security team used ATM to surface previously unknown NoSQL injection sinks (across 50 JavaScript projects) that generated 118 new candidate security vulnerabilities.

## Using adaptive threat modeling to boost security queries

Almost any security query that relies on data flow can be boosted by ATM.

A typical security query includes a data flow configuration that defines endpoints where user data enters a system (tainted sources) and where it may exit a system to cause a security vulnerability (tainted sinks). These endpoints comprise part of the “threat model” we mentioned earlier. Data flow configurations are manually defined in CodeQL, either in the standard libraries or via bespoke extensions.

However, data flow configurations are always incomplete, which means that queries miss sources and sinks, and therefore miss security vulnerabilities. ATM addresses this problem by predicting additional taint endpoints from known endpoints.

In most use cases, you write a small amount of QL code to supply the right information to the ATM library. Specifically, you must:

- provide a CodeQL data flow configuration for "known endpoints".
- define an "endpoint filter" and then expose this information to the ATM library.

Known endpoints are typically the set of taint endpoints defined by an existing query. For example, if we are boosting a command injection query then the known endpoints are simply the set of sources and sinks defined for this query.
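
As a rough illustration of what "known endpoints" could look like in QL, the sketch below collects the sources and sinks that the standard command injection library already models. It assumes that the `CommandInjection` module in `CommandInjectionCustomizations` exposes both a `Source` and a `Sink` class; the two predicate names are hypothetical helpers, not part of any library.

```ql
import javascript
import semmle.javascript.security.dataflow.CommandInjectionCustomizations

/** Holds if `node` is a source already modeled for command injection (hypothetical helper). */
predicate isKnownCommandInjectionSource(DataFlow::Node node) {
  // Assumption: `CommandInjection::Source` is the class of modeled sources.
  node instanceof CommandInjection::Source
}

/** Holds if `node` is a sink already modeled for command injection (hypothetical helper). */
predicate isKnownCommandInjectionSink(DataFlow::Node node) {
  node instanceof CommandInjection::Sink
}
```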

The endpoint filter removes predicted endpoints that cannot be taint sources or sinks. For example, for a command injection query, the endpoint filter should reject candidates that CodeQL knows are not arguments to a function.
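
A minimal sketch of such a filter, assuming the only requirement is that a candidate is an argument of some call. The predicate name is hypothetical, and a practical filter would usually add further conditions.

```ql
import javascript

/**
 * Holds if `candidate` is an argument of some call, and so could plausibly be a
 * command injection sink (hypothetical helper). Candidates that are not call
 * arguments are rejected simply because this predicate does not hold for them.
 */
predicate isPlausibleSinkCandidate(DataFlow::Node candidate) {
  candidate = any(DataFlow::CallNode call).getAnArgument()
}
```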

![ATM architecture](atm_diagram.svg)

Once you have supplied this information:
1. ATM forwards the known endpoints to a machine learning model.
2. The model predicts a set of candidate endpoints. This step is an over-approximation because ATM predicts lots of candidates and many will not be true endpoints.
3. ATM then applies the endpoint filter to reduce the candidates to a smaller, effective set that is more likely to contain true endpoints.
4. The effective endpoints are forwarded to CodeQL’s data flow analysis.
5. The queries generate a set of security alerts with scores.
   The score for each alert is based on how confident the machine learning model is in the effective endpoints.
   It is not related to the feasibility of the flow between a source and a sink.

Results with higher scores are more likely to be actual security vulnerabilities (true positives) than results with lower scores (which may be false positives). ATM uses experimental machine learning techniques and therefore you should expect a higher incidence of false positives in boosted queries compared to standard CodeQL security queries.

Note that a boosted query only generates additional results not found by the unboosted query.
For full coverage, run the boosted query and the standard query together.

## Refining your results

You can help ATM find more security vulnerabilities in two ways:

- You can improve the recall and precision of the candidate endpoints by adding more true positives to and removing any false positives from the set of known endpoints.
  This will improve the scoring, increasing the likelihood that higher scoring results are true positives.
- You can refine the endpoint filter such that it allows more true candidate endpoints to pass through and excludes more false candidate endpoints.
  This has the effect of increasing the number of true positives and reducing the number of false positives.
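
To make the second option concrete, here is a hedged sketch of one possible refinement: an endpoint filter that accepts call arguments but excludes arguments to calls that look like logging. Both predicate names and the list of method names are illustrative assumptions, not part of the ATM library.

```ql
import javascript

/** Holds if `candidate` is an argument of a call whose name looks like a logging method. */
predicate isLikelyLoggingArgument(DataFlow::Node candidate) {
  exists(DataFlow::CallNode call |
    candidate = call.getAnArgument() and
    // Illustrative list of names; tune it for the codebases you analyze.
    call.getCalleeName() = ["log", "debug", "info", "warn", "error"]
  )
}

/** A refined filter: keep call arguments, but drop likely logging arguments. */
predicate isRefinedSinkCandidate(DataFlow::Node candidate) {
  candidate = any(DataFlow::CallNode call).getAnArgument() and
  not isLikelyLoggingArgument(candidate)
}
```

Each exclusion of this kind trades recall for precision, so it is worth checking the effect on a database where the true sinks are known.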

## Further reading

- [GitHub Security Lab](https://securitylab.github.com/)
- [CodeQL for JavaScript](https://help.semmle.com/QL/learn-ql/javascript/ql-for-javascript.html)
@@ -0,0 +1,263 @@
|
||||
# Creating a boosted security query

You can use adaptive threat modeling (ATM) to boost security queries by enlarging their threat model to find more potential vulnerabilities.

Adaptive threat modeling is a set of CodeQL libraries for writing boosted security queries.
Boosted security queries identify potential vulnerabilities missed by an existing query by semi-automatically enlarging the query's threat model.
For more information about ATM, see "[About adaptive threat modeling](./about-adaptive-threat-modeling.md)."

**Note: Adaptive threat modeling is in beta and subject to change.
It is currently only available for JavaScript and TypeScript code.**

## About boosted queries

Boosted queries use machine learning to semi-automatically enlarge a query's threat model.
A boosted query predicts new security vulnerabilities that the standard query cannot identify.
Each extra result is scored, and results with higher scores are more likely to be true positives.

**Note: Currently, adaptive threat modeling only supports boosting sinks.**

To create a boosted query, you supply an ATM configuration that provides the following information:

- The [data flow configuration](https://help.semmle.com/QL/learn-ql/javascript/dataflow.html#global-data-flow) for the known endpoints (known sources and sinks), as a starting point for boosting.
- A sink endpoint filter, which is used to filter out implausible sinks from the set of candidate sinks predicted by the machine learning model.
  The sinks that remain after applying the sink endpoint filter are known as the effective sinks.
- Optionally, new additional flow steps.
  These may be needed to find data flow paths from the known sources to the effective sinks.
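
The overall shape of such a configuration might look like the sketch below. It uses command injection purely as an illustration, and it assumes that `ATMConfig` offers single-argument `isKnownSource` and `isKnownSink` predicates and that `CommandInjection::Source` exists. The class name `ExampleATMConfig` is hypothetical.

```ql
import javascript
import experimental.adaptivethreatmodeling.AdaptiveThreatModeling
import semmle.javascript.security.dataflow.CommandInjectionCustomizations

class ExampleATMConfig extends ATMConfig {
  ExampleATMConfig() { this = "ExampleATMConfig" }

  // Known endpoints: the starting point for boosting.
  override predicate isKnownSource(DataFlow::Node source) {
    source instanceof CommandInjection::Source
  }

  override predicate isKnownSink(DataFlow::Node sink) { sink instanceof CommandInjection::Sink }

  // Sink endpoint filter: candidates for which this holds become effective sinks.
  // This deliberately simple filter only requires the candidate to be a call argument.
  override predicate isEffectiveSink(DataFlow::Node sinkCandidate) {
    sinkCandidate = any(DataFlow::CallNode call).getAnArgument()
  }
}
```

Additional flow steps are left out here because they are optional. A configuration that needs them would also override `isAdditionalFlowStep`.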

For more information about how adaptive threat modeling works, see "[About adaptive threat modeling](./about-adaptive-threat-modeling.md)."

## Example: boosting the standard NoSQL injection JavaScript query

Before working through this example, ensure that you have correctly set up adaptive threat modeling and can run a boosted query.
For more information, see "[Setting up adaptive threat modeling](https://github.com/github/vscode-codeql-starter/tree/experimental/atm#readme)."

A potential NoSQL injection vulnerability occurs when a user-controlled object becomes part of a query that is run against a NoSQL database.
The CodeQL library for JavaScript contains a [standard query](https://github.com/github/codeql/blob/master/javascript/ql/src/Security/CWE-089/SqlInjection.ql) that discovers such vulnerabilities.
This standard query works by defining a data flow configuration that marks user-controlled objects as taint sources and specific arguments of database access calls as taint sinks.

To know which arguments should be taint sinks, the CodeQL library for JavaScript models a set of popular NoSQL libraries by defining the structure of these libraries in QL.
However, the CodeQL library for JavaScript does not model rare or bespoke NoSQL libraries.
Consequently, the standard NoSQL injection query will not detect vulnerabilities in projects which use these libraries.
This guide will show you how to use adaptive threat modeling to boost the standard NoSQL injection query to find new candidate vulnerabilities.

### Downloading the example CodeQL database

For this example, we have created a CodeQL database for you to test your boosted query against.
To follow along with this guide, download the CodeQL database for this project by visiting https://drive.google.com/open?id=1I8M0yySyIH9xPzkPER85azZ6pWp0C5mb.
Add the downloaded database to CodeQL for VS Code and select it as your current database.
For more information about CodeQL for VS Code, visit "[CodeQL for VS Code](https://help.semmle.com/codeql/codeql-for-vscode.html)."

### About the standard NoSQL injection JavaScript query

The standard query for NoSQL injection in the CodeQL library for JavaScript is located at [`ql/javascript/ql/src/Security/CWE-089/SqlInjection.ql`](https://github.com/github/codeql/blob/master/javascript/ql/src/Security/CWE-089/SqlInjection.ql).
It uses a data flow configuration located at [`ql/javascript/ql/src/semmle/javascript/security/dataflow/NosqlInjection.qll`](https://github.com/github/codeql/blob/master/javascript/ql/src/semmle/javascript/security/dataflow/NosqlInjection.qll) to specify a threat model for NoSQL injection.
This threat model incorporates sources (`isSource`), sinks (`isSink`), sanitizers (`isSanitizer`, `isSanitizerGuard`), and additional flow steps (`isAdditionalFlowStep`).
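
For readers who have not seen one, the general shape of a JavaScript taint tracking configuration is sketched below. This is not the content of `NosqlInjection.qll`: the class name, the `find` sink and the `sanitize` sanitizer are hypothetical placeholders chosen only to show where each predicate fits.

```ql
import javascript

class ExampleConfiguration extends TaintTracking::Configuration {
  ExampleConfiguration() { this = "ExampleConfiguration" }

  // Sources: where user-controlled data enters the program.
  override predicate isSource(DataFlow::Node source) { source instanceof RemoteFlowSource }

  // Sinks: hypothetically, the first argument of any call to a function named `find`.
  override predicate isSink(DataFlow::Node sink) {
    sink = any(DataFlow::CallNode call | call.getCalleeName() = "find").getArgument(0)
  }

  // Sanitizers: hypothetically, the result of any call to a function named `sanitize`.
  override predicate isSanitizer(DataFlow::Node node) {
    node.(DataFlow::CallNode).getCalleeName() = "sanitize"
  }
}
```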
|
||||
|
||||
### Specifying the data flow configuration for the known endpoints
|
||||
|
||||
First, create a stub boosted query by creating a new file named `NosqlInjectionATM.ql` and copying over the contents of the [`step1.ql` stub query](resources/step1.ql).
|
||||
|
||||
Now, we will specify the data flow configuration for the known endpoints.
|
||||
Since we are boosting an existing security query we can reuse the predicates from the existing data flow configuration.
|
||||
|
||||
1. Open the data flow configuration for the standard security query.
|
||||
For NoSQL injection, the data flow configuration is located at [`ql/javascript/ql/src/semmle/javascript/security/dataflow/NosqlInjection.qll`](https://github.com/github/codeql/blob/master/javascript/ql/src/semmle/javascript/security/dataflow/NosqlInjection.qll).
|
||||
|
||||
2. Port over all the predicates from the standard query to the ATM configuration class.
|
||||
|
||||
- Copy and paste the predicates from the data flow configuration for the standard query to the ATM configuration class.
|
||||
For NoSQL injection, copy the predicates from the data flow configuration within `NosqlInjection.qll` to the `NosqlInjectionATMConfig` class.
|
||||
- Rename the `isSource` and `isSink` predicates to `isKnownSource` and `isKnownSink` respectively.
|
||||
- Add the required import statements alongside the other import statements in the boosted query file.
|
||||
For NoSQL injection, add `import semmle.javascript.security.TaintedObject` and `import semmle.javascript.security.dataflow.NosqlInjectionCustomizations::NosqlInjection` alongside the other import statements in `NosqlInjectionATM.ql`.
|
||||
|
||||
Your query should now look like the contents of the [`step2.ql` query](resources/step2.ql).
|
||||
|
||||
3. Test both `isKnownSource` predicates and the `isKnownSink` predicate by [quick-evaluating them](https://help.semmle.com/codeql/codeql-for-vscode/procedures/using-extension.html#running-a-specific-part-of-a-query-or-library) with CodeQL for VS Code and checking that they have results.
|
||||
The database must contain at least one known source and at least one known sink; otherwise, ATM will not produce any results.
|
||||
|
||||
4. Check whether the standard query has an `isAdditionalFlowStep` or `isAdditionalTaintStep` predicate defined in its data flow configuration.
|
||||
|
||||
For some standard queries, the additional flow steps defined within these predicates will only work with modeled objects.
|
||||
However, ATM generates results that include non-modeled objects, so in these circumstances the additional flow steps will not apply to the predicted endpoints and boosting will fail.
|
||||
To make sure the query is boosted properly, we need to explicitly include non-modeled objects in the additional flow steps.
|
||||
|
||||
For NoSQL injection, the standard query includes the following logic in the `isAdditionalFlowStep` predicate to propagate taint flow from a property of a query object to the query object itself:
|
||||
|
||||
```codeql
|
||||
// additional flow step to track taint through NoSQL query objects
|
||||
inlbl = TaintedObject::label() and
|
||||
outlbl = TaintedObject::label() and
|
||||
exists(NoSQL::Query query, DataFlow::SourceNode queryObj |
|
||||
queryObj.flowsToExpr(query) and
|
||||
queryObj.flowsTo(trg) and
|
||||
src = queryObj.getAPropertyWrite().getRhs()
|
||||
)
|
||||
```
|
||||
|
||||
This additional flow step only propagates taint within objects modeled by the CodeQL library for JavaScript.
|
||||
We therefore need to relax the additional flow step to include objects predicted by ATM.
|
||||
|
||||
One of the ways that we can do this is by observing that query objects, both modeled and unmodeled, are sinks for NoSQL injection.
|
||||
Specifically, we can relax the additional flow step such that all objects that are sinks, rather than just all modeled query objects, are included as possible targets `trg` of the flow step.
|
||||
|
||||
Relax the additional flow step for NoSQL injection by adding the following QL code to the bottom of the `isAdditionalFlowStep` predicate within the `NosqlInjectionATMConfig` class:
|
||||
|
||||
```codeql
|
||||
or
|
||||
// relaxed version of previous step to track taint through predicted NoSQL query objects
|
||||
any(ATM::Configuration cfg).isSink(trg) and
|
||||
src = trg.(DataFlow::SourceNode).getAPropertyWrite().getRhs()
|
||||
```
|
||||
|
||||
Your query should now look like the [`step3.ql` query](resources/step3.ql).
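
As an illustration of the code pattern this relaxed flow step is aimed at, the sketch below passes an object literal with a user-controlled property to a query call. The `db` object and its `find` method are hypothetical stand-ins for an unmodeled library. Under that assumption, the object literal passed to `find` plays the role of `trg`, and the value written to its `ownerToken` property plays the role of `src`.

```js
// Hypothetical stand-in for an unmodeled NoSQL client; it only records queries.
const db = {
  received: [],
  find(query) {
    this.received.push(query);
    return [];
  },
};

// Simulated request body carrying an operator object instead of a string.
const reqBody = { token: { $ne: null } };

// The object literal is the argument to the call (the candidate sink, `trg`).
// The value of its `ownerToken` property is the right-hand side of a
// property write on that object (`src`).
db.find({ ownerToken: reqBody.token });

console.log(JSON.stringify(db.received[0])); // {"ownerToken":{"$ne":null}}
```

Without the relaxed step, taint on `reqBody.token` would stop at the property value and never reach the object that is actually passed to the unmodeled call.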
|
||||
|
||||
### Creating a sink endpoint filter
|
||||
|
||||
We have now defined all the known endpoints for our boosted query and included non-modeled objects in any additional flow steps.
|
||||
The next step is to create an endpoint filter.
|
||||
This will exclude candidate endpoints predicted by machine learning that we can easily recognize as incorrect.
|
||||
|
||||
1. Consider the main properties shared by the sinks of the security query you are boosting.
|
||||
For NoSQL injection, a typical sink looks like the query object `{ password }` in the following snippet:
|
||||
|
||||
```js
|
||||
MongoClient.connect("mongodb://someHost:somePort/", (err, client) => {
|
||||
if (err) throw err;
|
||||
let db = client.db("someDbName");
|
||||
db.collection("someCollection").find({ password }).toArray((err, result) => {
|
||||
if (err) throw err;
|
||||
console.log(result);
|
||||
client.close();
|
||||
});
|
||||
});
|
||||
```
|
||||
|
||||
One of the common properties of these sinks is that they are typically arguments to API calls.
|
||||
We therefore filter out candidate sinks that don't meet this criterion.
|
||||
|
||||
2. Add logic to the `isEffectiveSink` predicate such that it holds only when the data flow node `candidateSink` has these properties.
|
||||
For NoSQL injection, add the following logic, which uses the `EndpointFilterUtils` library to restrict sinks to arguments of likely external library calls:
|
||||
|
||||
```codeql
|
||||
override predicate isEffectiveSink(DataFlow::Node candidateSink) {
|
||||
exists(DataFlow::CallNode call |
|
||||
call = EndpointFilterUtils::getALikelyExternalLibraryCall() and
|
||||
candidateSink = call.getAnArgument()
|
||||
)
|
||||
}
|
||||
```
|
||||
|
||||
Your query should now look like the [`step4.ql` query](resources/step4.ql).
|
||||
|
||||
3. Run the boosted query and examine the results.
|
||||
Look for common sources of false positive results.
|
||||
In the example project, one source of false positives is arguments to logging calls such as `Logger.log`, for instance:
|
||||
|
||||
```js
|
||||
// Logger.log = (message, ...objs) => console.log(message, objs);
|
||||
Logger.log("/updateName called with new name", req.body.name);
|
||||
```
|
||||
|
||||
4. Add additional logic to the `isEffectiveSink` predicate to remove common sources of false positives.
|
||||
In the example project, you can remove arguments to likely logging calls from the set of effective sinks using the `CoreKnowledge` library:
|
||||
|
||||
```codeql
|
||||
override predicate isEffectiveSink(DataFlow::Node candidateSink) {
|
||||
candidateSink = EndpointFilterUtils::getALikelyExternalLibraryCall().getAnArgument() and
|
||||
not (
|
||||
// Remove modeled sinks
|
||||
CoreKnowledge::isKnownLibrarySink(candidateSink) or
|
||||
// Remove common kinds of unlikely sinks
|
||||
CoreKnowledge::isKnownStepSrc(candidateSink) or
|
||||
CoreKnowledge::isUnlikelySink(candidateSink)
|
||||
)
|
||||
}
|
||||
```
|
||||
|
||||
Your query should now look like the [`step5.ql` query](resources/step5.ql).
|
||||
|
||||
5. Where possible, continue this process to eliminate whole classes of false positives by adding filtering logic to the `isEffectiveSink` predicate.
|
||||
|
||||
In the example project, another source of false positives is arguments to [Express](https://expressjs.com/) API calls such as `res.json`, for instance:
|
||||
|
||||
```js
|
||||
router.post('/updateName', async (req, res) => {
|
||||
// ...
|
||||
res.json({
|
||||
name: req.body.name
|
||||
});
|
||||
});
|
||||
```
|
||||
|
||||
To remove these, we can use the `HTTP` module from the CodeQL library for JavaScript:
|
||||
|
||||
```codeql
|
||||
override predicate isEffectiveSink(DataFlow::Node candidateSink) {
|
||||
  exists(DataFlow::CallNode call |
    // Bind `call` so that its receiver can be inspected below
    call = EndpointFilterUtils::getALikelyExternalLibraryCall() and
    candidateSink = call.getAnArgument() and
    not (
      // Remove modeled sinks
      CoreKnowledge::isKnownLibrarySink(candidateSink) or
      // Remove common kinds of unlikely sinks
      CoreKnowledge::isKnownStepSrc(candidateSink) or
      CoreKnowledge::isUnlikelySink(candidateSink) or
      // Remove calls to APIs that aren't relevant to NoSQL injection
      call.getReceiver().asExpr() instanceof HTTP::RequestExpr or
      call.getReceiver().asExpr() instanceof HTTP::ResponseExpr
    )
  )
|
||||
}
|
||||
```
|
||||
|
||||
Your query should now look like the [`step6.ql` query](resources/step6.ql).
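
To summarize the filtering logic built up in the steps above, the following sketch mirrors it in plain JavaScript over a hand-written list of call records. It is only an analogy: the record shape, the `isEffectiveSink` helper, and the `external`, `receiverKind`, and `isLikelyLogging` fields are hypothetical, and stand in for what `EndpointFilterUtils`, `CoreKnowledge`, and the `HTTP` module compute over the real data flow graph.

```js
// Hypothetical call records: each describes one call site with one argument.
const candidates = [
  { callee: "Note.find", external: true, receiverKind: "other", isLikelyLogging: false },
  { callee: "Logger.log", external: true, receiverKind: "other", isLikelyLogging: true },
  { callee: "res.json", external: true, receiverKind: "httpResponse", isLikelyLogging: false },
  { callee: "formatName", external: false, receiverKind: "other", isLikelyLogging: false },
];

// Keep arguments to likely external library calls, then drop the
// categories that were identified as common false positives.
function isEffectiveSink(c) {
  return (
    c.external &&
    !(
      c.isLikelyLogging ||
      c.receiverKind === "httpRequest" ||
      c.receiverKind === "httpResponse"
    )
  );
}

const effective = candidates.filter(isEffectiveSink).map((c) => c.callee);
console.log(effective.join(", ")); // Note.find
```

Each clause inside the negation removes a whole class of candidates at once, which is the same shape as the `not (... or ...)` block in `isEffectiveSink`.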
|
||||
|
||||
## Conclusion
|
||||
|
||||
Congratulations, you've boosted your first security query!
|
||||
Run it and take a look at the new alerts.
|
||||
Higher scores typically indicate a higher chance of a true positive.
|
||||
Note that the boosted query does not include results from the standard query.
|
||||
For full coverage, the boosted query and the standard query should be run together.
|
||||
|
||||
Here are three ways you can improve your boosted query further:
|
||||
|
||||
- You can remove more false positives by refining the sink endpoint filter.
|
||||
- You can add more true positives by adding more known endpoints.
|
||||
- You can also recover more true positives by implementing further additional flow steps.
|
||||
|
||||
## Further information
|
||||
|
||||
### Improving your boosted query further
|
||||
|
||||
To see an example of how to further improve your boosted query, check out the [boosted NoSQL injection query](https://github.com/github/codeql/blob/experimental/atm/javascript/ql/src/experimental/adaptivethreatmodeling/NosqlInjectionATM.ql) provided with the ATM libraries.
|
||||
One of the ways in which this query improves on the query described in this guide is by implementing further additional flow steps to recover more true positives.
|
||||
Specifically, this query generalizes the additional flow step described earlier in this guide.
|
||||
The additional flow step now includes more complex query objects, such as in the following code:
|
||||
|
||||
```js
|
||||
const notes = await Note.find({
|
||||
$or: [
|
||||
{
|
||||
isPublic: true
|
||||
},
|
||||
{
|
||||
ownerToken: req.body.token
|
||||
}
|
||||
]
|
||||
}).exec();
|
||||
```
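
The sketch below illustrates why looking only at the direct properties of the query object is not enough for such code. It walks a nested query object and lists the path to every leaf value. The helper name `leafPaths` and the simulated `req` object are hypothetical and exist only for this illustration.

```js
// Recursively collect "path = value" strings for all leaf values.
function leafPaths(value, prefix = "") {
  if (value !== null && typeof value === "object") {
    return Object.entries(value).flatMap(([key, child]) =>
      leafPaths(child, prefix === "" ? key : `${prefix}.${key}`)
    );
  }
  return [`${prefix} = ${JSON.stringify(value)}`];
}

// Simulated request carrying the user-controlled token.
const req = { body: { token: "abc123" } };
const query = { $or: [{ isPublic: true }, { ownerToken: req.body.token }] };

const paths = leafPaths(query);
for (const line of paths) {
  console.log(line);
}
// $or.0.isPublic = true
// $or.1.ownerToken = "abc123"
```

The user-controlled token sits inside an array element nested under `$or`, so a flow step that only considers property writes directly on the sink object would not connect it to the argument of `find`.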
|
||||
|
||||
### Including the standard results
|
||||
|
||||
By default, ATM only provides results when there is data flow from a known source to an effective sink.
|
||||
This means that the results of a boosted query do not contain results from the standard query.
|
||||
|
||||
To get all of the results in the boosted query:
|
||||
|
||||
1. Stop the sink endpoint filter from excluding the modeled sinks that are relevant to NoSQL injection.
|
||||
You can do this by replacing `CoreKnowledge::isKnownLibrarySink(candidateSink)` with `(CoreKnowledge::isKnownLibrarySink(candidateSink) and not candidateSink instanceof Sink)` in the `isEffectiveSink` predicate, where `Sink` is the NoSQL injection sink class brought into scope by the `NosqlInjectionCustomizations::NosqlInjection` import.
|
||||
|
||||
2. Remove the standard results filter from the [select clause](https://help.semmle.com/QL/ql-handbook/queries.html#select-clauses).
|
||||
You can do this by deleting the line `not isFlowLikelyInBaseQuery(source.getNode(), sink.getNode()) and` from the `where` clause of the query.
|
||||
|
||||
Your final query should look like the [`optional-step6-all-results.ql`](./resources/optional-step6-all-results.ql) query.
|
||||
@@ -0,0 +1,93 @@
|
||||
/**
|
||||
* @name NoSQL database query built from user-controlled sources (boosted)
|
||||
* @description Building a database query from user-controlled sources is vulnerable to insertion of
|
||||
* malicious code by the user.
|
||||
* @kind path-problem
|
||||
* @problem.severity error
|
||||
*/
|
||||
|
||||
import javascript
|
||||
import experimental.adaptivethreatmodeling.AdaptiveThreatModeling
|
||||
import experimental.adaptivethreatmodeling.CoreKnowledge as CoreKnowledge
|
||||
import experimental.adaptivethreatmodeling.EndpointFilterUtils as EndpointFilterUtils
|
||||
import ATM::ResultsInfo
|
||||
import DataFlow::PathGraph
|
||||
import semmle.javascript.security.TaintedObject
|
||||
import semmle.javascript.security.dataflow.NosqlInjectionCustomizations::NosqlInjection
|
||||
|
||||
class NosqlInjectionATMConfig extends ATMConfig {
|
||||
NosqlInjectionATMConfig() { this = "NosqlInjectionATMConfig" }
|
||||
|
||||
override predicate isKnownSource(DataFlow::Node source) { source instanceof Source }
|
||||
|
||||
override predicate isKnownSource(DataFlow::Node source, DataFlow::FlowLabel label) {
|
||||
TaintedObject::isSource(source, label)
|
||||
}
|
||||
|
||||
override predicate isKnownSink(DataFlow::Node sink, DataFlow::FlowLabel label) {
|
||||
sink.(Sink).getAFlowLabel() = label
|
||||
}
|
||||
|
||||
override predicate isSanitizer(DataFlow::Node node) {
|
||||
super.isSanitizer(node) or
|
||||
node instanceof Sanitizer
|
||||
}
|
||||
|
||||
override predicate isSanitizerGuard(TaintTracking::SanitizerGuardNode guard) {
|
||||
guard instanceof TaintedObject::SanitizerGuard
|
||||
}
|
||||
|
||||
override predicate isAdditionalFlowStep(
|
||||
DataFlow::Node src, DataFlow::Node trg, DataFlow::FlowLabel inlbl, DataFlow::FlowLabel outlbl
|
||||
) {
|
||||
TaintedObject::step(src, trg, inlbl, outlbl)
|
||||
or
|
||||
// additional flow step to track taint through NoSQL query objects
|
||||
inlbl = TaintedObject::label() and
|
||||
outlbl = TaintedObject::label() and
|
||||
exists(NoSQL::Query query, DataFlow::SourceNode queryObj |
|
||||
queryObj.flowsToExpr(query) and
|
||||
queryObj.flowsTo(trg) and
|
||||
src = queryObj.getAPropertyWrite().getRhs()
|
||||
)
|
||||
or
|
||||
// relaxed version of previous flow step to track taint through predicted NoSQL query objects
|
||||
any(ATM::Configuration cfg).isSink(trg) and
|
||||
src = trg.(DataFlow::SourceNode).getAPropertyWrite().getRhs()
|
||||
}
|
||||
|
||||
override predicate isEffectiveSink(DataFlow::Node candidateSink) {
|
||||
exists(DataFlow::CallNode call |
|
||||
call = EndpointFilterUtils::getALikelyExternalLibraryCall() and
|
||||
candidateSink = call.getAnArgument() and
|
||||
not (
|
||||
// Remove modeled sinks, apart from the modeled NoSQL injection sinks
|
||||
CoreKnowledge::isKnownLibrarySink(candidateSink) and
|
||||
not candidateSink instanceof Sink
|
||||
or
|
||||
// Remove common kinds of unlikely sinks
|
||||
CoreKnowledge::isKnownStepSrc(candidateSink)
|
||||
or
|
||||
CoreKnowledge::isUnlikelySink(candidateSink)
|
||||
or
|
||||
// Remove calls to APIs that aren't relevant to NoSQL injection
|
||||
call.getReceiver().asExpr() instanceof HTTP::RequestExpr
|
||||
or
|
||||
call.getReceiver().asExpr() instanceof HTTP::ResponseExpr
|
||||
)
|
||||
)
|
||||
}
|
||||
}
|
||||
|
||||
from
|
||||
ATM::Configuration cfg, DataFlow::PathNode source, DataFlow::PathNode sink, string scoreString,
|
||||
string sourceSinkOriginReport
|
||||
where
|
||||
cfg.hasFlowPath(source, sink) and
|
||||
scoreString = scoreStringForFlow(source.getNode(), sink.getNode()) and
|
||||
sourceSinkOriginReport =
|
||||
"Source origin: " + originsForSource(source.getNode()).listOfOriginComponents() + " " +
|
||||
" Sink origin: " + originsForSink(sink.getNode()).listOfOriginComponents()
|
||||
select sink.getNode(), source, sink,
|
||||
"[Score = " + scoreString + "] This may be a NoSQL query depending on $@ " +
|
||||
sourceSinkOriginReport as msg, source.getNode(), "a user-provided value"
|
||||
@@ -0,0 +1,36 @@
|
||||
/**
|
||||
* @name NoSQL database query built from user-controlled sources (boosted)
|
||||
* @description Building a database query from user-controlled sources is vulnerable to insertion of
|
||||
* malicious code by the user.
|
||||
* @kind path-problem
|
||||
* @problem.severity error
|
||||
*/
|
||||
|
||||
import javascript
|
||||
import experimental.adaptivethreatmodeling.AdaptiveThreatModeling
|
||||
import ATM::ResultsInfo
|
||||
import DataFlow::PathGraph
|
||||
|
||||
class NosqlInjectionATMConfig extends ATMConfig {
|
||||
NosqlInjectionATMConfig() { this = "NosqlInjectionATMConfig" }
|
||||
|
||||
override predicate isKnownSource(DataFlow::Node source) { none() }
|
||||
|
||||
override predicate isKnownSink(DataFlow::Node sink) { none() }
|
||||
|
||||
override predicate isEffectiveSink(DataFlow::Node candidateSink) { none() }
|
||||
}
|
||||
|
||||
from
|
||||
ATM::Configuration cfg, DataFlow::PathNode source, DataFlow::PathNode sink, string scoreString,
|
||||
string sourceSinkOriginReport
|
||||
where
|
||||
cfg.hasFlowPath(source, sink) and
|
||||
not isFlowLikelyInBaseQuery(source.getNode(), sink.getNode()) and
|
||||
scoreString = scoreStringForFlow(source.getNode(), sink.getNode()) and
|
||||
sourceSinkOriginReport =
|
||||
"Source origin: " + originsForSource(source.getNode()).listOfOriginComponents() + " " +
|
||||
" Sink origin: " + originsForSink(sink.getNode()).listOfOriginComponents()
|
||||
select sink.getNode(), source, sink,
|
||||
"[Score = " + scoreString + "] This may be a NoSQL query depending on $@ " +
|
||||
sourceSinkOriginReport as msg, source.getNode(), "a user-provided value"
|
||||
@@ -0,0 +1,68 @@
|
||||
/**
|
||||
* @name NoSQL database query built from user-controlled sources (boosted)
|
||||
* @description Building a database query from user-controlled sources is vulnerable to insertion of
|
||||
* malicious code by the user.
|
||||
* @kind path-problem
|
||||
* @problem.severity error
|
||||
*/
|
||||
|
||||
import javascript
|
||||
import experimental.adaptivethreatmodeling.AdaptiveThreatModeling
|
||||
import ATM::ResultsInfo
|
||||
import DataFlow::PathGraph
|
||||
import semmle.javascript.security.TaintedObject
|
||||
import semmle.javascript.security.dataflow.NosqlInjectionCustomizations::NosqlInjection
|
||||
|
||||
class NosqlInjectionATMConfig extends ATMConfig {
|
||||
NosqlInjectionATMConfig() { this = "NosqlInjectionATMConfig" }
|
||||
|
||||
override predicate isKnownSource(DataFlow::Node source) { source instanceof Source }
|
||||
|
||||
override predicate isKnownSource(DataFlow::Node source, DataFlow::FlowLabel label) {
|
||||
TaintedObject::isSource(source, label)
|
||||
}
|
||||
|
||||
override predicate isKnownSink(DataFlow::Node sink, DataFlow::FlowLabel label) {
|
||||
sink.(Sink).getAFlowLabel() = label
|
||||
}
|
||||
|
||||
override predicate isSanitizer(DataFlow::Node node) {
|
||||
super.isSanitizer(node) or
|
||||
node instanceof Sanitizer
|
||||
}
|
||||
|
||||
override predicate isSanitizerGuard(TaintTracking::SanitizerGuardNode guard) {
|
||||
guard instanceof TaintedObject::SanitizerGuard
|
||||
}
|
||||
|
||||
override predicate isAdditionalFlowStep(
|
||||
DataFlow::Node src, DataFlow::Node trg, DataFlow::FlowLabel inlbl, DataFlow::FlowLabel outlbl
|
||||
) {
|
||||
TaintedObject::step(src, trg, inlbl, outlbl)
|
||||
or
|
||||
// additional flow step to track taint through NoSQL query objects
|
||||
inlbl = TaintedObject::label() and
|
||||
outlbl = TaintedObject::label() and
|
||||
exists(NoSQL::Query query, DataFlow::SourceNode queryObj |
|
||||
queryObj.flowsToExpr(query) and
|
||||
queryObj.flowsTo(trg) and
|
||||
src = queryObj.getAPropertyWrite().getRhs()
|
||||
)
|
||||
}
|
||||
|
||||
override predicate isEffectiveSink(DataFlow::Node candidateSink) { none() }
|
||||
}
|
||||
|
||||
from
|
||||
ATM::Configuration cfg, DataFlow::PathNode source, DataFlow::PathNode sink, string scoreString,
|
||||
string sourceSinkOriginReport
|
||||
where
|
||||
cfg.hasFlowPath(source, sink) and
|
||||
not isFlowLikelyInBaseQuery(source.getNode(), sink.getNode()) and
|
||||
scoreString = scoreStringForFlow(source.getNode(), sink.getNode()) and
|
||||
sourceSinkOriginReport =
|
||||
"Source origin: " + originsForSource(source.getNode()).listOfOriginComponents() + " " +
|
||||
" Sink origin: " + originsForSink(sink.getNode()).listOfOriginComponents()
|
||||
select sink.getNode(), source, sink,
|
||||
"[Score = " + scoreString + "] This may be a NoSQL query depending on $@ " +
|
||||
sourceSinkOriginReport as msg, source.getNode(), "a user-provided value"
|
||||
@@ -0,0 +1,72 @@
|
||||
/**
|
||||
* @name NoSQL database query built from user-controlled sources (boosted)
|
||||
* @description Building a database query from user-controlled sources is vulnerable to insertion of
|
||||
* malicious code by the user.
|
||||
* @kind path-problem
|
||||
* @problem.severity error
|
||||
*/
|
||||
|
||||
import javascript
|
||||
import experimental.adaptivethreatmodeling.AdaptiveThreatModeling
|
||||
import ATM::ResultsInfo
|
||||
import DataFlow::PathGraph
|
||||
import semmle.javascript.security.TaintedObject
|
||||
import semmle.javascript.security.dataflow.NosqlInjectionCustomizations::NosqlInjection
|
||||
|
||||
class NosqlInjectionATMConfig extends ATMConfig {
|
||||
NosqlInjectionATMConfig() { this = "NosqlInjectionATMConfig" }
|
||||
|
||||
override predicate isKnownSource(DataFlow::Node source) { source instanceof Source }
|
||||
|
||||
override predicate isKnownSource(DataFlow::Node source, DataFlow::FlowLabel label) {
|
||||
TaintedObject::isSource(source, label)
|
||||
}
|
||||
|
||||
override predicate isKnownSink(DataFlow::Node sink, DataFlow::FlowLabel label) {
|
||||
sink.(Sink).getAFlowLabel() = label
|
||||
}
|
||||
|
||||
override predicate isSanitizer(DataFlow::Node node) {
|
||||
super.isSanitizer(node) or
|
||||
node instanceof Sanitizer
|
||||
}
|
||||
|
||||
override predicate isSanitizerGuard(TaintTracking::SanitizerGuardNode guard) {
|
||||
guard instanceof TaintedObject::SanitizerGuard
|
||||
}
|
||||
|
||||
override predicate isAdditionalFlowStep(
|
||||
DataFlow::Node src, DataFlow::Node trg, DataFlow::FlowLabel inlbl, DataFlow::FlowLabel outlbl
|
||||
) {
|
||||
TaintedObject::step(src, trg, inlbl, outlbl)
|
||||
or
|
||||
// additional flow step to track taint through NoSQL query objects
|
||||
inlbl = TaintedObject::label() and
|
||||
outlbl = TaintedObject::label() and
|
||||
exists(NoSQL::Query query, DataFlow::SourceNode queryObj |
|
||||
queryObj.flowsToExpr(query) and
|
||||
queryObj.flowsTo(trg) and
|
||||
src = queryObj.getAPropertyWrite().getRhs()
|
||||
)
|
||||
or
|
||||
// relaxed version of previous flow step to track taint through predicted NoSQL query objects
|
||||
any(ATM::Configuration cfg).isSink(trg) and
|
||||
src = trg.(DataFlow::SourceNode).getAPropertyWrite().getRhs()
|
||||
}
|
||||
|
||||
override predicate isEffectiveSink(DataFlow::Node candidateSink) { none() }
|
||||
}
|
||||
|
||||
from
|
||||
ATM::Configuration cfg, DataFlow::PathNode source, DataFlow::PathNode sink, string scoreString,
|
||||
string sourceSinkOriginReport
|
||||
where
|
||||
cfg.hasFlowPath(source, sink) and
|
||||
not isFlowLikelyInBaseQuery(source.getNode(), sink.getNode()) and
|
||||
scoreString = scoreStringForFlow(source.getNode(), sink.getNode()) and
|
||||
sourceSinkOriginReport =
|
||||
"Source origin: " + originsForSource(source.getNode()).listOfOriginComponents() + " " +
|
||||
" Sink origin: " + originsForSink(sink.getNode()).listOfOriginComponents()
|
||||
select sink.getNode(), source, sink,
|
||||
"[Score = " + scoreString + "] This may be a NoSQL query depending on $@ " +
|
||||
sourceSinkOriginReport as msg, source.getNode(), "a user-provided value"
|
||||
@@ -0,0 +1,78 @@
|
||||
/**
|
||||
* @name NoSQL database query built from user-controlled sources (boosted)
|
||||
* @description Building a database query from user-controlled sources is vulnerable to insertion of
|
||||
* malicious code by the user.
|
||||
* @kind path-problem
|
||||
* @problem.severity error
|
||||
*/
|
||||
|
||||
import javascript
|
||||
import experimental.adaptivethreatmodeling.AdaptiveThreatModeling
|
||||
import experimental.adaptivethreatmodeling.EndpointFilterUtils as EndpointFilterUtils
|
||||
import ATM::ResultsInfo
|
||||
import DataFlow::PathGraph
|
||||
import semmle.javascript.security.TaintedObject
|
||||
import semmle.javascript.security.dataflow.NosqlInjectionCustomizations::NosqlInjection
|
||||
|
||||
class NosqlInjectionATMConfig extends ATMConfig {
|
||||
NosqlInjectionATMConfig() { this = "NosqlInjectionATMConfig" }
|
||||
|
||||
override predicate isKnownSource(DataFlow::Node source) { source instanceof Source }
|
||||
|
||||
override predicate isKnownSource(DataFlow::Node source, DataFlow::FlowLabel label) {
|
||||
TaintedObject::isSource(source, label)
|
||||
}
|
||||
|
||||
override predicate isKnownSink(DataFlow::Node sink, DataFlow::FlowLabel label) {
|
||||
sink.(Sink).getAFlowLabel() = label
|
||||
}
|
||||
|
||||
override predicate isSanitizer(DataFlow::Node node) {
|
||||
super.isSanitizer(node) or
|
||||
node instanceof Sanitizer
|
||||
}
|
||||
|
||||
override predicate isSanitizerGuard(TaintTracking::SanitizerGuardNode guard) {
|
||||
guard instanceof TaintedObject::SanitizerGuard
|
||||
}
|
||||
|
||||
override predicate isAdditionalFlowStep(
|
||||
DataFlow::Node src, DataFlow::Node trg, DataFlow::FlowLabel inlbl, DataFlow::FlowLabel outlbl
|
||||
) {
|
||||
TaintedObject::step(src, trg, inlbl, outlbl)
|
||||
or
|
||||
// additional flow step to track taint through NoSQL query objects
|
||||
inlbl = TaintedObject::label() and
|
||||
outlbl = TaintedObject::label() and
|
||||
exists(NoSQL::Query query, DataFlow::SourceNode queryObj |
|
||||
queryObj.flowsToExpr(query) and
|
||||
queryObj.flowsTo(trg) and
|
||||
src = queryObj.getAPropertyWrite().getRhs()
|
||||
)
|
||||
or
|
||||
// relaxed version of previous flow step to track taint through predicted NoSQL query objects
|
||||
any(ATM::Configuration cfg).isSink(trg) and
|
||||
src = trg.(DataFlow::SourceNode).getAPropertyWrite().getRhs()
|
||||
}
|
||||
|
||||
override predicate isEffectiveSink(DataFlow::Node candidateSink) {
|
||||
exists(DataFlow::CallNode call |
|
||||
call = EndpointFilterUtils::getALikelyExternalLibraryCall() and
|
||||
candidateSink = call.getAnArgument()
|
||||
)
|
||||
}
|
||||
}
|
||||
|
||||
from
|
||||
ATM::Configuration cfg, DataFlow::PathNode source, DataFlow::PathNode sink, string scoreString,
|
||||
string sourceSinkOriginReport
|
||||
where
|
||||
cfg.hasFlowPath(source, sink) and
|
||||
not isFlowLikelyInBaseQuery(source.getNode(), sink.getNode()) and
|
||||
scoreString = scoreStringForFlow(source.getNode(), sink.getNode()) and
|
||||
sourceSinkOriginReport =
|
||||
"Source origin: " + originsForSource(source.getNode()).listOfOriginComponents() + " " +
|
||||
" Sink origin: " + originsForSink(sink.getNode()).listOfOriginComponents()
|
||||
select sink.getNode(), source, sink,
|
||||
"[Score = " + scoreString + "] This may be a NoSQL query depending on $@ " +
|
||||
sourceSinkOriginReport as msg, source.getNode(), "a user-provided value"
|
||||
@@ -0,0 +1,86 @@
|
||||
/**
|
||||
* @name NoSQL database query built from user-controlled sources (boosted)
|
||||
* @description Building a database query from user-controlled sources is vulnerable to insertion of
|
||||
* malicious code by the user.
|
||||
* @kind path-problem
|
||||
* @problem.severity error
|
||||
*/
|
||||
|
||||
import javascript
|
||||
import experimental.adaptivethreatmodeling.AdaptiveThreatModeling
|
||||
import experimental.adaptivethreatmodeling.CoreKnowledge as CoreKnowledge
|
||||
import experimental.adaptivethreatmodeling.EndpointFilterUtils as EndpointFilterUtils
|
||||
import ATM::ResultsInfo
|
||||
import DataFlow::PathGraph
|
||||
import semmle.javascript.security.TaintedObject
|
||||
import semmle.javascript.security.dataflow.NosqlInjectionCustomizations::NosqlInjection
|
||||
|
||||
class NosqlInjectionATMConfig extends ATMConfig {
|
||||
NosqlInjectionATMConfig() { this = "NosqlInjectionATMConfig" }
|
||||
|
||||
override predicate isKnownSource(DataFlow::Node source) { source instanceof Source }
|
||||
|
||||
override predicate isKnownSource(DataFlow::Node source, DataFlow::FlowLabel label) {
|
||||
TaintedObject::isSource(source, label)
|
||||
}
|
||||
|
||||
override predicate isKnownSink(DataFlow::Node sink, DataFlow::FlowLabel label) {
|
||||
sink.(Sink).getAFlowLabel() = label
|
||||
}
|
||||
|
||||
override predicate isSanitizer(DataFlow::Node node) {
|
||||
super.isSanitizer(node) or
|
||||
node instanceof Sanitizer
|
||||
}
|
||||
|
||||
override predicate isSanitizerGuard(TaintTracking::SanitizerGuardNode guard) {
|
||||
guard instanceof TaintedObject::SanitizerGuard
|
||||
}
|
||||
|
||||
override predicate isAdditionalFlowStep(
|
||||
DataFlow::Node src, DataFlow::Node trg, DataFlow::FlowLabel inlbl, DataFlow::FlowLabel outlbl
|
||||
) {
|
||||
TaintedObject::step(src, trg, inlbl, outlbl)
|
||||
or
|
||||
// additional flow step to track taint through NoSQL query objects
|
||||
inlbl = TaintedObject::label() and
|
||||
outlbl = TaintedObject::label() and
|
||||
exists(NoSQL::Query query, DataFlow::SourceNode queryObj |
|
||||
queryObj.flowsToExpr(query) and
|
||||
queryObj.flowsTo(trg) and
|
||||
src = queryObj.getAPropertyWrite().getRhs()
|
||||
)
|
||||
or
|
||||
// relaxed version of previous flow step to track taint through predicted NoSQL query objects
|
||||
any(ATM::Configuration cfg).isSink(trg) and
|
||||
src = trg.(DataFlow::SourceNode).getAPropertyWrite().getRhs()
|
||||
}
|
||||
|
||||
override predicate isEffectiveSink(DataFlow::Node candidateSink) {
|
||||
exists(DataFlow::CallNode call |
|
||||
call = EndpointFilterUtils::getALikelyExternalLibraryCall() and
|
||||
candidateSink = call.getAnArgument() and
|
||||
not (
|
||||
// Remove modeled sinks
|
||||
CoreKnowledge::isKnownLibrarySink(candidateSink) or
|
||||
// Remove common kinds of unlikely sinks
|
||||
CoreKnowledge::isKnownStepSrc(candidateSink) or
|
||||
CoreKnowledge::isUnlikelySink(candidateSink)
|
||||
)
|
||||
)
|
||||
}
|
||||
}
|
||||
|
||||
from
|
||||
ATM::Configuration cfg, DataFlow::PathNode source, DataFlow::PathNode sink, string scoreString,
|
||||
string sourceSinkOriginReport
|
||||
where
|
||||
cfg.hasFlowPath(source, sink) and
|
||||
not isFlowLikelyInBaseQuery(source.getNode(), sink.getNode()) and
|
||||
scoreString = scoreStringForFlow(source.getNode(), sink.getNode()) and
|
||||
sourceSinkOriginReport =
|
||||
"Source origin: " + originsForSource(source.getNode()).listOfOriginComponents() + " " +
|
||||
" Sink origin: " + originsForSink(sink.getNode()).listOfOriginComponents()
|
||||
select sink.getNode(), source, sink,
|
||||
"[Score = " + scoreString + "] This may be a NoSQL query depending on $@ " +
|
||||
sourceSinkOriginReport as msg, source.getNode(), "a user-provided value"
|
||||
@@ -0,0 +1,89 @@
|
||||
/**
|
||||
* @name NoSQL database query built from user-controlled sources (boosted)
|
||||
* @description Building a database query from user-controlled sources is vulnerable to insertion of
|
||||
* malicious code by the user.
|
||||
* @kind path-problem
|
||||
* @problem.severity error
|
||||
*/
|
||||
|
||||
import javascript
|
||||
import experimental.adaptivethreatmodeling.AdaptiveThreatModeling
|
||||
import experimental.adaptivethreatmodeling.CoreKnowledge as CoreKnowledge
|
||||
import experimental.adaptivethreatmodeling.EndpointFilterUtils as EndpointFilterUtils
|
||||
import ATM::ResultsInfo
|
||||
import DataFlow::PathGraph
|
||||
import semmle.javascript.security.TaintedObject
|
||||
import semmle.javascript.security.dataflow.NosqlInjectionCustomizations::NosqlInjection
|
||||
|
||||
class NosqlInjectionATMConfig extends ATMConfig {
  NosqlInjectionATMConfig() { this = "NosqlInjectionATMConfig" }

  override predicate isKnownSource(DataFlow::Node source) { source instanceof Source }

  override predicate isKnownSource(DataFlow::Node source, DataFlow::FlowLabel label) {
    TaintedObject::isSource(source, label)
  }

  override predicate isKnownSink(DataFlow::Node sink, DataFlow::FlowLabel label) {
    sink.(Sink).getAFlowLabel() = label
  }

  override predicate isSanitizer(DataFlow::Node node) {
    super.isSanitizer(node) or
    node instanceof Sanitizer
  }

  override predicate isSanitizerGuard(TaintTracking::SanitizerGuardNode guard) {
    guard instanceof TaintedObject::SanitizerGuard
  }

  override predicate isAdditionalFlowStep(
    DataFlow::Node src, DataFlow::Node trg, DataFlow::FlowLabel inlbl, DataFlow::FlowLabel outlbl
  ) {
    TaintedObject::step(src, trg, inlbl, outlbl)
    or
    // additional flow step to track taint through NoSQL query objects
    inlbl = TaintedObject::label() and
    outlbl = TaintedObject::label() and
    exists(NoSQL::Query query, DataFlow::SourceNode queryObj |
      queryObj.flowsToExpr(query) and
      queryObj.flowsTo(trg) and
      src = queryObj.getAPropertyWrite().getRhs()
    )
    or
    // relaxed version of previous flow step to track taint through predicted NoSQL query objects
    any(ATM::Configuration cfg).isSink(trg) and
    src = trg.(DataFlow::SourceNode).getAPropertyWrite().getRhs()
  }

  override predicate isEffectiveSink(DataFlow::Node candidateSink) {
    exists(DataFlow::CallNode call |
      call = EndpointFilterUtils::getALikelyExternalLibraryCall() and
      candidateSink = call.getAnArgument() and
      not (
        // Remove modeled sinks
        CoreKnowledge::isKnownLibrarySink(candidateSink) or
        // Remove common kinds of unlikely sinks
        CoreKnowledge::isKnownStepSrc(candidateSink) or
        CoreKnowledge::isUnlikelySink(candidateSink) or
        // Remove calls to APIs that aren't relevant to NoSQL injection
        call.getReceiver().asExpr() instanceof HTTP::RequestExpr or
        call.getReceiver().asExpr() instanceof HTTP::ResponseExpr
      )
    )
  }
}

from
  ATM::Configuration cfg, DataFlow::PathNode source, DataFlow::PathNode sink, string scoreString,
  string sourceSinkOriginReport
where
  cfg.hasFlowPath(source, sink) and
  not isFlowLikelyInBaseQuery(source.getNode(), sink.getNode()) and
  scoreString = scoreStringForFlow(source.getNode(), sink.getNode()) and
  sourceSinkOriginReport =
    "Source origin: " + originsForSource(source.getNode()).listOfOriginComponents() + " " +
      " Sink origin: " + originsForSink(sink.getNode()).listOfOriginComponents()
select sink.getNode(), source, sink,
  "[Score = " + scoreString + "] This may be a NoSQL query depending on $@ " +
    sourceSinkOriginReport as msg, source.getNode(), "a user-provided value"
@@ -0,0 +1,35 @@
# Support and feedback

We're eager to help you use adaptive threat modeling (ATM) and to learn from your experiences of using it to find new security vulnerabilities. Your feedback will help us improve ATM and contribute to securing the world's software.

## How do I get help?

You should have access to the Slack channel [`#codeql-atm-beta`](https://ghsecuritylab.slack.com/archives/C011BJD7279) on the GitHub Security Lab Slack instance. Please raise any issues there, and the ATM team will get back to you as quickly as possible.

## How can I give feedback?

We'd like as much feedback on ATM as you're willing to give, but we know your time is precious. So here's a list of suggested feedback items, ordered from quickest to (a bit) longer.

- Say hi on `#codeql-atm-beta` with a short sentence to explain why you're interested in ATM. This will help us gauge the interest in advanced security features.

- Share examples of boosted queries. We're interested in how you generate known endpoints, and what kinds of endpoint filters turn out to be useful.

- Once you've completed your experiments with ATM, please answer the following questions, and share your answers on the channel:

  1. Did you find security vulnerabilities with ATM? (yes or no)

  2. What do you like best about ATM?

  3. What would you most like to improve about ATM?

  4. How useful is ATM to you? Answer from 1 to 5 (1 = "definitely useful", 5 = "definitely not useful").

  5. How likely would you be to recommend ATM to a colleague?
     Answer from 1 to 5 (1 = "extremely likely", 5 = "not at all likely").

- Share any vulnerabilities you find (that you can disclose) by privately emailing `codeql-atm-beta@github.com`. We'd like to understand the efficacy of different types of boosted queries. Please tell us (i) which repos you ran the query against, (ii) the number of alerts generated, (iii) how many alerts you manually checked, and (iv) how many of those turned out to be actual vulnerabilities (whether exploitable or not). We don't expect you to eyeball all the results, so it makes sense to sub-sample with a bias towards alerts with higher scores. Complete a table, with a row for every repo you run the query against, and email it to us. For example:

| URL of repo (if open source) or short description (if closed source) | # alerts | # alerts checked | # true positives |
|----------------------------------------------------------------------|----------|------------------|------------------|
| https://github.com/example1/example1                                 | 97       | 10               | 1                |
| https://github.com/example2/example2                                 | 120      | 11               | 2                |
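
One possible way to sub-sample alerts with a bias towards higher scores, and to produce a row of the table above, is sketched below in Python. This is only an illustration: the alert records, the `id` and `score` field names, and the `sample_alerts` and `table_row` helpers are all hypothetical and not part of ATM. It assumes each score is a positive number taken from the `[Score = ...]` prefix of the alert message.

```python
import random


def sample_alerts(alerts, k, seed=0):
    """Pick up to k alerts without replacement, favouring higher scores.

    Hypothetical helper: uses weighted sampling (Efraimidis-Spirakis keys),
    so an alert with a larger score is more likely to be picked.
    Alerts with a non-positive score are never picked.
    """
    rng = random.Random(seed)
    keyed = [
        (rng.random() ** (1.0 / alert["score"]), alert)
        for alert in alerts
        if alert["score"] > 0
    ]
    # Largest keys win; sort on the key only so alert dicts are never compared.
    keyed.sort(key=lambda pair: pair[0], reverse=True)
    return [alert for _, alert in keyed[:k]]


def table_row(repo, alerts, checked, true_positive_ids):
    """Format one Markdown row: repo, # alerts, # checked, # true positives."""
    true_positives = sum(1 for alert in checked if alert["id"] in true_positive_ids)
    return f"| {repo} | {len(alerts)} | {len(checked)} | {true_positives} |"


# Made-up alerts standing in for the results of a boosted query.
alerts = [{"id": i, "score": s} for i, s in enumerate([0.9, 0.8, 0.5, 0.2, 0.1, 0.05])]
checked = sample_alerts(alerts, k=3)
# After manual review, record which checked alerts were real (here: alert 0).
print(table_row("https://github.com/example1/example1", alerts, checked, {0}))
```

Fixing the seed keeps the sample reproducible, which makes it easier to report exactly which alerts were checked.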