JS: Improve performance of ClassifyFiles::isTestFile

One of the heuristics for test files looks for source files
of the form `base.ext`, then looks for sibling test files
of the form `base.test.ext` or `base.spec.ext`.

On large databases, the resulting join order computed all source files,
the containers of those files, and then all other files within those
containers, before computing the test file names and filtering on
those names.

The product of all files with all other files in the same containers
is of the same order of magnitude as the product of the `files`
table with itself, which on large DBs like Node can be 12M+ tuples.

As a performance optimisation, factor out a helper predicate that
computes the likely test file names for each source file, so these
can be determined with a single join against the files table.
This results in much better join orders, such as computing the set
of files and their containers, then the test file names, then the
sibling files with those names.

This loses some flexibility because the set of 'test' extension names
is hardcoded in the library rather than provided by the caller predicate.
The original predicate remains to avoid breaking other callers, but could
eventually be deprecated.
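
For illustration only, a caller that needs a stem extension outside the
hardcoded set could keep using the original predicate. This is a minimal
sketch, not part of this change: it assumes `getTestFile` is reachable
through `import javascript`, and the `e2e` stem extension is a
hypothetical example.

```ql
// Sketch under stated assumptions: `getTestFile` is assumed to be
// visible via `import javascript`; "e2e" is a hypothetical stem extension.
import javascript

from File orig, File test
where
  // `stemExt` must be bound by the caller (see `bindingset[stemExt]`),
  // so a literal is passed here.
  test = getTestFile(orig, "e2e")
select orig, test
```

Such a caller does not get the improved join order, since the helper
that precomputes names only covers `test` and `spec`.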
Aditya Sharad
2021-12-21 16:44:34 -08:00
committed by Henry Mercer
parent 03bb3cce73
commit b7852cec7a
2 changed files with 29 additions and 4 deletions


@@ -56,9 +56,7 @@ predicate isGeneratedCodeFile(File f) { isGenerated(f.getATopLevel()) }
predicate isTestFile(File f) {
exists(Test t | t.getFile() = f)
or
-exists(string stemExt | stemExt = "test" or stemExt = "spec" |
-f = getTestFile(any(File orig), stemExt)
-)
+f = getATestFile(_)
or
f.getAbsolutePath().regexpMatch(".*/__(mocks|tests)__/.*")
}


@@ -40,7 +40,7 @@ class BDDTest extends Test, @call_expr {
/**
* Gets the test file for `f` with stem extension `stemExt`.
- * That is, a file named file named `<base>.<stemExt>.<ext>` in the
+ * That is, a file named `<base>.<stemExt>.<ext>` in the
* same directory as `f` which is named `<base>.<ext>`.
*/
bindingset[stemExt]
@@ -48,6 +48,33 @@ File getTestFile(File f, string stemExt) {
result = f.getParentContainer().getFile(f.getStem() + "." + stemExt + "." + f.getExtension())
}
/**
* Gets a test file for `f`.
* That is, a file named `<base>.<stemExt>.<ext>` in the
* same directory as `f`, where `f` is named `<base>.<ext>` and
* `<stemExt>` is a well-known test file identifier, such as `test` or `spec`.
*/
File getATestFile(File f) {
result = f.getParentContainer().getFile(getATestFileName(f))
}
/**
* Gets a name of a test file for `f`.
* That is, `<base>.<stemExt>.<ext>` where
* `f` is named `<base>.<ext>` and `<stemExt>` is
* a well-known test file identifier, such as `test` or `spec`.
*/
// Helper predicate factored out for performance.
// This predicate is linear in the size of f, and forces
// callers to join only once against f rather than two separate joins
// when computing the stem and the extension.
// This loses some flexibility because callers cannot specify
// an arbitrary stemExt.
pragma[nomagic]
private string getATestFileName(File f) {
result = f.getStem() + "." + ["test", "spec"] + "." + f.getExtension()
}
/**
* A Jest test, that is, an invocation of a global function named
* `test` where the first argument is a string and the second