Compare commits

...

9 Commits

Author SHA1 Message Date
Aditya Sharad
e60c8470a3 JS: Improve performance of DominatingPaths::hasDominatingWrite
Check that the read node is in a *reachable* basic block
before looking for a dominating write block.
2022-01-18 16:23:04 +00:00
Aditya Sharad
ee97acefcd JS: Minor simplication of ranked basic block calculation
We have to look up the  node index within the block anyway,
so include it as an aggregation variable.
2022-01-18 16:23:04 +00:00
Aditya Sharad
c553330ee3 JS: Fix possible typo in DominatingPaths::hasWrite
Should most likely refer to `AccessPathWrite` and look for write nodes.
This also improves the performance of `rankedAccessPath`, since the
set of candidate blocks is now limited to blocks with both a read and a write.
2022-01-18 16:23:04 +00:00
Aditya Sharad
1d507f1993 JS: Improve performance of StandardEndpointFilters::isNumeric
Factor the regex-independent logic of `isReadFrom` into its own predicate.

Call this predicate directly from `isNumeric`, which doesn't have much
restrictive context on the set of starting nodes.
Use a binding hint to discourage starting with all expr nodes in this case.

Other callers may have more restrictive context on the set of nodes,
so they are not changed.
2022-01-18 16:23:04 +00:00
Aditya Sharad
1b7088abde JS: Tweak performance of CorsOriginHeaderWithAssociatedCredentialHeader
On databases with a large number of Exprs, it can be better
to start with the set of route handlers, then find their
response headers, then find the expression values set in
those headers.
2022-01-18 16:23:03 +00:00
Aditya Sharad
eec7b926b0 JS: Add this reference for clarity
No behaviour change.
2022-01-18 16:23:03 +00:00
Aditya Sharad
0a003edaa7 JS: Improve performance of Xss::isOptionallySanitizedEdge
When join-ordering and evaluating this conjunction,
it is preferable to start with the relatively small set of
`sanitizer` calls, then compute the set of SSA variables accessed
as the arguments of those sanitizer calls, then reason about how
those variables are used in phi nodes.

Use directional binding pragmas to encourage this join order
by picking `sanitizer` first, and discourage picking the
opposite join order starting with `phi`.

This impacts performance of the ATM XSS queries on large databases like Node,
where computing all variable accesses from phi nodes leads to 435M+ tuples.
2022-01-18 16:23:03 +00:00
Aditya Sharad
b7852cec7a JS: Improve performance of ClassifyFiles::isTestFile
One of the heuristics for test files looks for source files
of the form `base.ext`, then looks for sibling test files
of the form `base.test.ext` or `base.spec.ext`.

On large databases, the result join order computed all source files,
the containers of those files, then all other files within those
containers, before computing the test file names and filtering using
those names.

The product of all files with all other files in the same containers
is of the same order of magnitude as the product of the `files`
table with itself, which on large DBs like Node can be 12M+ tuples.

As a performance optimisation, factor out a helper predicate that
computes the likely test file names for each source file, so these
can be determined with a single join against the files table.
This results in much better join orders, such as computing the set
of files and their containers, then the test file names, then the
sibling files with those names.

This loses some flexibility because the set of 'test' extension names
is hardcoded in the library rather than provided by the caller predicate.
The original predicate remains to avoid breaking other callers, but could
eventually be deprecated.
2022-01-18 16:23:03 +00:00
Henry Mercer
03bb3cce73 JS: Update featurization for absent features optimization
Absent features are now represented implicitly by the absence of a row
in the `tokenFeatures` relation, rather than explicitly by an empty
string. This leads to improved runtime performance. To enable this
implicit representation, we pass the set of supported token features to
the `scoreEndpoints` HOP. Requires CodeQL CLI v2.7.4.
2022-01-13 16:29:39 +00:00
10 changed files with 89 additions and 33 deletions

View File

@@ -284,7 +284,7 @@ private module FunctionNames {
}
/** Get a name of a supported generic token-based feature. */
private string getASupportedFeatureName() {
string getASupportedFeatureName() {
result =
[
"enclosingFunctionName", "calleeName", "receiverName", "argumentIndex", "calleeApiName",
@@ -301,12 +301,5 @@ private string getASupportedFeatureName() {
predicate tokenFeatures(DataFlow::Node endpoint, string featureName, string featureValue) {
// Performance optimization: Restrict feature extraction to endpoints we've explicitly asked to featurize.
endpoint = any(FeaturizationConfig cfg).getAnEndpointToFeaturize() and
(
if strictcount(getTokenFeature(endpoint, featureName)) = 1
then featureValue = getTokenFeature(endpoint, featureName)
else (
// Performance note: this is a Cartesian product between all endpoints and feature names.
featureValue = "" and featureName = getASupportedFeatureName()
)
)
featureValue = getTokenFeature(endpoint, featureName)
}

View File

@@ -36,9 +36,9 @@ module ModelScoring {
private int getARequestedEndpointType() { result = any(EndpointType type).getEncoding() }
predicate endpointScores(DataFlow::Node endpoint, int encodedEndpointType, float score) =
scoreEndpoints(getARequestedEndpoint/0, getARequestedEndpointType/0,
EndpointFeatures::tokenFeatures/3, getACompatibleModelChecksum/0)(endpoint,
encodedEndpointType, score)
scoreEndpoints(getARequestedEndpoint/0, EndpointFeatures::tokenFeatures/3,
EndpointFeatures::getASupportedFeatureName/0, getARequestedEndpointType/0,
getACompatibleModelChecksum/0)(endpoint, encodedEndpointType, score)
}
/**

View File

@@ -53,7 +53,11 @@ predicate isSomeModeledArgument(DataFlow::Node n) {
/**
* Holds if `n` appears to be a numeric value.
*/
predicate isNumeric(DataFlow::Node n) { isReadFrom(n, ".*index.*") }
// Performance optimisation: This predicate operates on a large set of
// starting nodes, so use binding hints to suggest computing that set last.
predicate isNumeric(DataFlow::Node n) {
getAnAccessedName(pragma[only_bind_into](n)).regexpMatch(".*index.*")
}
/**
* Holds if `n` is an argument to a library without sinks.

View File

@@ -462,15 +462,15 @@ module AccessPath {
ReachableBasicBlock bb, Root root, string path, int ranking, AccessPathKind type
) {
result =
rank[ranking](ControlFlowNode ref |
rank[ranking](ControlFlowNode ref, int i |
ref = getAccessTo(root, path, _) and
ref.getBasicBlock() = bb and
ref = bb.getNode(i) and
// Prunes the accesses where there does not exists a read and write within the same basicblock.
// This could be more precise, but doing it like this avoids massive joins.
hasRead(bb) and
hasWrite(bb)
|
ref order by any(int i | ref = bb.getNode(i))
ref order by i
) and
result = getAccessTo(root, path, type)
}
@@ -492,7 +492,7 @@ module AccessPath {
*/
pragma[noinline]
private predicate hasWrite(ReachableBasicBlock bb) {
bb = getAccessTo(_, _, AccessPathRead()).getBasicBlock()
bb = getAccessTo(_, _, AccessPathWrite()).getBasicBlock()
}
/**
@@ -565,9 +565,12 @@ module AccessPath {
)
or
// across basic blocks.
exists(Root root, string path |
exists(Root root, string path, ReachableBasicBlock readBlock |
read.asExpr() = getAccessTo(root, path, AccessPathRead()) and
getAWriteBlock(root, path).strictlyDominates(read.getBasicBlock())
readBlock = read.getBasicBlock() and
// Performance optimisation: check that `read` is in a *reachable* basic block
// before looking for a dominating write block.
getAWriteBlock(root, path).strictlyDominates(pragma[only_bind_out](readBlock))
)
}
}

View File

@@ -217,7 +217,7 @@ private class AnalyzedImplicitInit extends AnalyzedSsaDefinition, SsaImplicitIni
*/
private class AnalyzedVariableCapture extends AnalyzedSsaDefinition, SsaVariableCapture {
override AbstractValue getAnRhsValue() {
exists(LocalVariable v | v = getSourceVariable() |
exists(LocalVariable v | v = this.getSourceVariable() |
result = v.(AnalyzedCapturedVariable).getALocalValue()
or
result = any(AnalyzedExplicitDefinition def | def.getSourceVariable() = v).getAnRhsValue()

View File

@@ -56,9 +56,7 @@ predicate isGeneratedCodeFile(File f) { isGenerated(f.getATopLevel()) }
predicate isTestFile(File f) {
exists(Test t | t.getFile() = f)
or
exists(string stemExt | stemExt = "test" or stemExt = "spec" |
f = getTestFile(any(File orig), stemExt)
)
f = getATestFile(_)
or
f.getAbsolutePath().regexpMatch(".*/__(mocks|tests)__/.*")
}

View File

@@ -40,7 +40,7 @@ class BDDTest extends Test, @call_expr {
/**
* Gets the test file for `f` with stem extension `stemExt`.
* That is, a file named file named `<base>.<stemExt>.<ext>` in the
* That is, a file named `<base>.<stemExt>.<ext>` in the
* same directory as `f` which is named `<base>.<ext>`.
*/
bindingset[stemExt]
@@ -48,6 +48,33 @@ File getTestFile(File f, string stemExt) {
result = f.getParentContainer().getFile(f.getStem() + "." + stemExt + "." + f.getExtension())
}
/**
* Gets a test file for `f`.
* That is, a file named `<base>.<stemExt>.<ext>` in the
* same directory as `f`, where `f` is named `<base>.<ext>` and
* `<stemExt>` is a well-known test file identifier, such as `test` or `spec`.
*/
File getATestFile(File f) {
result = f.getParentContainer().getFile(getATestFileName(f))
}
/**
* Gets a name of a test file for `f`.
* That is, `<base>.<stemExt>.<ext>` where
* `f` is named `<base>.<ext>` and `<stemExt>` is
* a well-known test file identifier, such as `test` or `spec`.
*/
// Helper predicate factored out for performance.
// This predicate is linear in the size of f, and forces
// callers to join only once against f rather than two separate joins
// when computing the stem and the extension.
// This loses some flexibility because callers cannot specify
// an arbitrary stemExt.
pragma[nomagic]
private string getATestFileName(File f) {
result = f.getStem() + "." + ["test", "spec"] + "." + f.getExtension()
}
/**
* A Jest test, that is, an invocation of a global function named
* `test` where the first argument is a string and the second

View File

@@ -16,15 +16,23 @@ import javascript
*/
bindingset[regexp]
predicate isReadFrom(DataFlow::Node read, string regexp) {
getAnAccessedName(read).regexpMatch(regexp)
}
/**
* Gets the "name" accessed by `read`. The "name" is one of:
* - the name of the read variable, if `read` is a variable read
* - the name of the read property, if `read` is a property read
* - the suffix of the getter-method name, if `read` is a getter invocation, for example "Number" in "getNumber"
*/
string getAnAccessedName(DataFlow::Node read) {
exists(DataFlow::Node actualRead |
actualRead = read.asExpr().getUnderlyingValue().(LogOrExpr).getAnOperand().flow() or // unfold `x || y` once
actualRead = read
|
exists(string name | name.regexpMatch(regexp) |
actualRead.asExpr().getUnderlyingValue().(VarAccess).getName() = name or
actualRead.(DataFlow::PropRead).getPropertyName() = name or
actualRead.(DataFlow::InvokeNode).getCalleeName() = "get" + name
)
actualRead.asExpr().getUnderlyingValue().(VarAccess).getName() = result or
actualRead.(DataFlow::PropRead).getPropertyName() = result or
actualRead.(DataFlow::InvokeNode).getCalleeName() = "get" + result
)
}

View File

@@ -50,8 +50,12 @@ module CorsMisconfigurationForCredentials {
|
routeHandler.getAResponseHeader(_) = origin and
routeHandler.getAResponseHeader(_) = credentials and
origin.definesExplicitly("access-control-allow-origin", this.asExpr()) and
credentials.definesExplicitly("access-control-allow-credentials", credentialsValue)
// Performance optimisation: start with the set of all route handlers
// rather than the set of all exprs.
pragma[only_bind_into](origin)
.definesExplicitly("access-control-allow-origin", this.asExpr()) and
pragma[only_bind_into](credentials)
.definesExplicitly("access-control-allow-credentials", credentialsValue)
|
credentialsValue.mayHaveBooleanValue(true) or
credentialsValue.mayHaveStringValue("true")

View File

@@ -437,8 +437,27 @@ module DomBasedXss {
b = phi.getAnInput().getDefinition() and
count(phi.getAnInput()) = 2 and
not a = b and
sanitizer = DataFlow::valueNode(a.getDef().getSource()) and
sanitizer.getAnArgument().asExpr().(VarAccess).getVariable() = b.getSourceVariable()
/*
* Performance optimisation:
*
* When join-ordering and evaluating this conjunction,
* it is preferable to start with the relatively small set of
* `sanitizer` calls, then compute the set of SSA variables accessed
* as the arguments of those sanitizer calls, then reason about how
* those variables are used in phi nodes.
*
* Use directional binding pragmas to encourage this join order,
* starting with `sanitizer`.
*
* Without these pragmas, the join orderer may choose the opposite order:
* start with all `phi` nodes, then compute the set of SSA variables involved,
* then the (potentially large) set of accesses to those variables,
* then the set of accesses used as the argument of a sanitizer call.
*/
pragma[only_bind_out](sanitizer) = DataFlow::valueNode(a.getDef().getSource()) and
pragma[only_bind_out](sanitizer.getAnArgument().asExpr()) =
b.getSourceVariable().getAnAccess()
|
pred = DataFlow::ssaDefinitionNode(b) and
succ = DataFlow::ssaDefinitionNode(phi)