Commit Graph

218 Commits

Author SHA1 Message Date
Henry Mercer
a586be956e JS: Remove versions from packs we don't intend to publish 2022-02-02 18:10:57 +00:00
Henry Mercer
7018f6ad40 JS: Add missing @id for endpoint types query 2022-02-02 13:15:15 +00:00
Henry Mercer
fbcb8d6857 JS: Migrate CodeQL tests for ML-powered queries 2022-02-02 13:15:04 +00:00
Henry Mercer
14601316a5 JS: Autoformat 2022-02-01 17:08:21 +00:00
Henry Mercer
368839edfc JS: Fix QLDoc style in ExtractMisclassifiedEndpointFeatures.ql 2022-02-01 15:39:15 +00:00
Henry Mercer
db0b4fc463 JS: Add model building pack for ML-powered queries
Tests are currently still internal. They will be migrated to
`github/codeql` in a subsequent PR.
2022-02-01 15:03:26 +00:00
Ian Wright
83ecc065ab restrict size of strings 2022-01-31 12:28:46 +00:00
Ian Wright
aceeb7324c restrict AST nodes according to string length 2022-01-28 15:06:10 +00:00
Henry Mercer
70f7535988 JS: Move experimental notice to the bottom of the ML-powered query help
The Code Scanning UI shows just the first paragraph of the query help
as a summary, until a user chooses to expand the help.
We decided it was more useful to display the standard query help in this
summary compared to the experimental query notice, since there is
already a notice about experimental queries on the alert show page.
2022-01-25 15:52:09 +00:00
Henry Mercer
84907f91f1 JS: Fix copy/paste error in XSS ML-powered queries results patterns
We didn’t catch this because our unit tests test only library code due
to the previous difficulty of running queries with an ML model (the ML
models in packs work should fix that), and because the end-to-end
evaluation runs separate queries that have different result patterns.

Going forward we should create unit tests for the queries themselves,
which will require using the ML model in tests. We should also be able
to catch this type of error using DCA.
2022-01-21 15:17:52 +00:00
Henry Mercer
c134e6c9ef JS: Bump ML-powered query packs to v0.0.6 2022-01-19 14:40:42 +00:00
Henry Mercer
d467725ccd JS: Bump ML-powered query packs to v0.0.5 2022-01-19 12:08:33 +00:00
Henry Mercer
63672ca394 Merge pull request #7616 from github/henrymercer/js-atm-add-query-help
JS: Add query help for ML-powered queries
2022-01-18 18:11:53 +00:00
Henry Mercer
be0c26f83d Merge pull request #7617 from github/henrymercer/js-atm-update-alert-messages
JS: Update alert messages for ML-powered queries
2022-01-18 11:37:02 +00:00
Henry Mercer
1893b9f7a9 Merge pull request #7376 from github/henrymercer/js-atm-absent-features-optimization
JS: Update featurization for absent features optimization
2022-01-18 10:15:53 +00:00
Henry Mercer
ffa4135cbe JS: Update alert messages for ML-powered queries 2022-01-17 17:19:49 +00:00
Henry Mercer
e9128466d4 JS: Add query help for ML-powered queries
Query help is identical to the original query, except for a new
paragraph prepended to the overview explaining that the queries are
experimental.

We add Markdown query help since only Markdown query help is embedded in
SARIF via `--sarif-add-query-help`.
2022-01-17 16:34:50 +00:00
Henry Mercer
568d37e9b9 JS: Update definition of ATM query suite
It's simpler to just run all the queries in the pack instead of
specifying the IDs.
2022-01-17 16:34:50 +00:00
Henry Mercer
ed28b7f174 Merge pull request #7575 from github/henrymercer/atm-remove-code-to-features
JS: Remove ATM `CodeToFeatures` library
2022-01-14 15:31:34 +00:00
Henry Mercer
e9bb9f5294 JS: Update names, IDs, and tags for ML-powered queries 2022-01-13 17:45:40 +00:00
Henry Mercer
8e9d8c112d JS: Improve comments in FunctionBodyFeatures.qll 2022-01-13 17:20:42 +00:00
Henry Mercer
2aea3257cb JS: Improve documentation for getTokenizedAstNode 2022-01-13 17:20:41 +00:00
Henry Mercer
92d6fecc73 Optimize performance of body tokens
The refactoring to remove the `CodeToFeatures` AST reintroduced a
performance problem. This commit resolves it by pushing size
restrictions into intermediate predicates.
2022-01-13 16:29:04 +00:00
Henry Mercer
9abc3411a4 JS: Bump ATM pack versions to 0.0.4 2022-01-12 15:19:13 +00:00
Henry Mercer
7f61738a23 Use US English spelling 2022-01-12 13:07:09 +00:00
Henry Mercer
6e37a65e84 Remove CodeToFeatures AST library 2022-01-12 12:47:28 +00:00
Henry Mercer
957e34d8a7 Make function body features library independent of CodeToFeatures AST 2022-01-12 12:47:28 +00:00
Henry Mercer
9e50ce873d Move function body features into their own file 2022-01-12 12:47:28 +00:00
Henry Mercer
865fb5d0ef Migrate representative entity -> representative function 2022-01-12 12:47:27 +00:00
Henry Mercer
0e5b493d0e Remove CodeToFeatures AST consistency checks
We no longer use the `CodeToFeatures` AST, therefore these checks are
defunct.
2022-01-12 12:47:27 +00:00
Henry Mercer
387829bbb4 Extract body tokens from the JS AST, not the CodeToFeatures AST 2022-01-12 12:47:25 +00:00
Henry Mercer
3f70476c87 ATM: Optimize body tokens by pushing in size limit
Pushing the restriction to 256 tokens into the `bodyTokens` predicate
means we avoid this predicate blowing up due to very large functions.

This results in a runtime improvement from 1800s+ to 294s as measured
on a problematic repo on my machine (I didn't wait for the query to
finish running).
2022-01-11 16:16:54 +00:00
Andrew Eisenberg
6d62227576 Merge pull request #7431 from aeisenberg/aeisenberg/solorigate-publish
Solorigate: Extract to separate qlpack
2022-01-06 08:53:32 -08:00
Nick Rolfe
f18492e39b Merge pull request #7443 from github/nickrolfe/behavior
QL4QL: catch behaviour/behavior in ql/non-us-spelling
2021-12-20 13:23:53 +00:00
Henry Mercer
144ec8c629 JS: Update featurization for absent features optimization
Absent features are now represented implicitly by the absence of a row
in the `tokenFeatures` relation, rather than explicitly by an empty
string. This leads to improved runtime performance. To enable this
implicit representation, we pass the set of supported token features to
the `scoreEndpoints` HOP. Requires CodeQL CLI v2.7.4.
2021-12-17 18:04:42 +00:00
Henry Mercer
bebf4ca8fc Merge pull request #7357 from github/henrymercer/js-atm-only-featurize-with-flow
JS: Only featurize endpoints that are part of a flow path
2021-12-17 18:03:40 +00:00
Henry Mercer
055432530f Bump ATM pack version to 0.0.2 2021-12-17 16:49:59 +00:00
Henry Mercer
c1864531cd JS: Push FeaturizationConfig context into more predicates 2021-12-17 16:31:56 +00:00
Henry Mercer
383437c571 JS: Only featurize endpoints that are part of a flow path 2021-12-17 16:31:56 +00:00
Nick Rolfe
28912c508f Fix non-US spelling of 'behavior' 2021-12-17 15:29:31 +00:00
Andrew Eisenberg
50ee4ab330 Solorigate: Extract to separate qlpack
Extracts solorigate to separate qlpacks in preparation for
publishing them to the registry.
2021-12-16 16:09:20 -08:00
Ian Wright
1c79d1f985 Merge pull request #7352 from github/esbena/atm-endpoint-polish
ATM Endpoint filtering improvements
2021-12-14 08:19:23 +00:00
Esben Sparre Andreasen
1949a4e59a autoformat 2021-12-13 22:21:52 +01:00
Esben Sparre Andreasen
13288be7fc make ATM anti sink model for dojo.require 2021-12-10 15:07:51 +01:00
Esben Sparre Andreasen
cfd2dcffa0 recognize more modelled database accesses 2021-12-10 14:54:59 +01:00
Esben Sparre Andreasen
b0f6cf1491 expose more marsdb calls as database accesses 2021-12-10 13:46:19 +01:00
Esben Sparre Andreasen
10498c3643 treat jQuery as fully modelled 2021-12-10 12:51:45 +01:00
Esben Sparre Andreasen
a1ee900f50 treat Base64 manipulations as non-sinks 2021-12-10 12:37:44 +01:00
Henry Mercer
f08f07e19e JS: Improve handling of heuristic sinks in endpoint filters
Previously heuristic sinks were always included, to avoid us filtering
them out due to not being an argument to an external library call.
In this commit we move the argument to an external library call
filtering to the query-specific endpoint filters.
This lets us filter out heuristic sinks if they match one of the other
endpoint filters, reducing FPs.
2021-12-09 15:00:54 +00:00
Henry Mercer
322e39446d JS: Autoformat 2021-12-07 14:17:11 +00:00