Commit Graph

307 Commits

Author SHA1 Message Date
tiferet
7eb04f8c32 Bug fix 2022-12-01 15:48:50 -08:00
tiferet
58bd6ac504 Fix implicit uses of this 2022-12-01 15:48:50 -08:00
tiferet
47b8f1420c Tokenize a neighborhood around the endpoint and properly create the last row of the codex prompt 2022-12-01 15:48:50 -08:00
tiferet
486752d19e Code improvements:
- Replace the containment logic with built in `Location` functionality.
- Generalize `tokenize` to output the tokens that fall within any location.
2022-12-01 15:48:50 -08:00
tiferet
a79bdf1cbc Generalize endpoint tokenization to work correctly across multiple lines 2022-12-01 15:48:50 -08:00
tiferet
3be6b42200 Add in Aditya's endpoint tokenization 2022-12-01 15:48:50 -08:00
tiferet
55839c8df2 small update 2022-12-01 15:48:50 -08:00
tiferet
0c87b25698 Typo fix 2022-12-01 15:48:50 -08:00
tiferet
33a8962f5a For now hardcode a training prompt string 2022-12-01 15:48:50 -08:00
tiferet
456aab0497 Make predicates private if they don't need to be public 2022-12-01 15:48:50 -08:00
tiferet
1919206e1e Start adding the codex prompt feature 2022-12-01 15:48:50 -08:00
tiferet
ad13f5585d Extract only a single feature, the codex prompt for the current endpoint. 2022-12-01 15:48:50 -08:00
tiferet
4a6de3e444 Apply suggestion from code review 2022-11-30 17:25:19 -08:00
tiferet
a0a742eb82 Rename predicates to fit style guide:
- `getEndpoints` → `appliesToEndpoint`
- `getImplications` → `hasImplications`
- `getAlerts` → `hasAlert`
2022-11-30 17:01:56 -08:00
tiferet
c5184d37e7 Suggestion from code review:
Name the query configuration e.g. `NosqlInjectionATMConfig` rather than `Configuration`.
2022-11-29 15:46:05 -08:00
tiferet
6f807e9d43 Doc suggestion from code review 2022-11-29 13:20:47 -08:00
tiferet
75cd7a9ebc Remove code duplication in query .ql files:
Define the query for finding ATM alerts in the base class `AtmConfig`, and call it from each query's .ql file.
2022-11-29 13:20:47 -08:00
tiferet
a710b723d1 Move the definition of isSink to the base class:
Holds if `sink` is a known taint sink or an "effective" sink.
2022-11-29 13:20:47 -08:00
tiferet
cd24ec88d6 Move the definition of isSource to the base class:
A long as we're not boosting sources, `isSource` is identical to `isKnownSource`.
2022-11-29 13:20:47 -08:00
tiferet
50291c7b7c AtmConfig inherits from TaintTracking::Configuration.
That way the specific configs which inherit from `AtmConfig` also inherit from `TaintTracking::Configuration`.

This removes the need for two separate config classes for each query.
2022-11-29 13:20:47 -08:00
tiferet
05a943c9b5 Delete StandardEndpointFilters.
All remaining functionality in `StandardEndpointFilters` is only being used in `EndpointCharacteristics`, so it can be moved there as a small set of helper predicates.
2022-11-29 13:20:47 -08:00
tiferet
5402f047bf Delete CoreKnowledge.
All remaining functionality in `CoreKnowledge` is only being used in `EndpointCharacteristics`, so it can be moved there as a small set of helper predicates.
2022-11-29 13:20:47 -08:00
tiferet
1d4b2ccab4 Merge branch 'main' into tiferet/complexity-reduction 2022-11-29 12:47:18 -08:00
Tiferet Gazit
f375b0cc1b Merge pull request #11281 from github/tiferet/endpoint-filters
ATM: Implement the current endpoint filters as EndpointCharacteristics
2022-11-29 12:38:12 -08:00
tiferet
4580b55673 Oops -- forgot to stage one file in the previous commit :) 2022-11-28 11:34:34 -08:00
tiferet
210644e87d Delete StandardEndpointFilters.
All remaining functionality in `StandardEndpointFilters` is only being used in `EndpointCharacteristics`, so it can be moved there as a small set of helper predicates.
2022-11-28 11:34:34 -08:00
tiferet
15121931b4 Delete CoreKnowledge.
All remaining functionality in `CoreKnowledge` is only being used in `EndpointCharacteristics`, so it can be moved there as a small set of helper predicates.
2022-11-28 11:34:34 -08:00
tiferet
1c679378e7 FilteringReason is no longer being used and can be deleted 2022-11-28 11:34:33 -08:00
tiferet
99de397a5f Remove redundant code
`isOtherModeledArgument` and `isArgumentToBuiltinFunction` contained the old logic for selecting negative endpoints for training.

These can now be deleted, and replaced by a single base class that collects all EndpointCharacteristics that are currently used to indicate negative training samples: `OtherModeledArgumentCharacteristic`.

This in turn lets us delete code from `StandardEndpointFilters` that effectively said that endpoints that are high-confidence non-sinks shouldn't be scored at inference time, either.
2022-11-28 11:34:33 -08:00
tiferet
7b0269c999 Fix British spelling that code scanning didn't like.
I've been working with Brits for too long :)
2022-11-28 11:28:08 -08:00
tiferet
963407de4c Update the documentation 2022-11-28 11:16:06 -08:00
Henry Mercer
56e5f01ce0 Merge branch 'main' into codeql-ci/atm/release-0.4.2 2022-11-24 14:41:49 +00:00
github-actions[bot]
78d49e44b1 JS: Bump version of ML-powered library and query packs to 0.4.3 2022-11-24 14:22:14 +00:00
github-actions[bot]
8d96bfe973 JS: Bump patch version of ML-powered library and query packs 2022-11-24 14:18:13 +00:00
Erik Krogh Kristensen
1eec067474 Merge pull request #11294 from erik-krogh/fileDoc
QL: improve the "this block-comment should have been a QLDoc"-query
2022-11-23 22:23:36 +01:00
tiferet
03b8e649f1 Filter endpoints by confidence
Select endpoints to score at inference time base purely on their confidence level, and not on whether they fit the historical definition of endpoint filters.
2022-11-23 10:46:27 -08:00
Henry Mercer
3b69821630 ATM: Add descriptions to ML-powered packs 2022-11-23 10:46:23 +00:00
tiferet
1c9545e49a Address comment from code review:
Make `SyntacticHeuristics` an explicit import
2022-11-21 08:00:31 -08:00
tiferet
8d22fd25f1 Suggestions from code review 2022-11-18 15:57:46 -08:00
tiferet
4a1382925e Remove some imports that are no longer used 2022-11-16 14:01:16 -08:00
tiferet
ccbf1ca2a9 Add a comment 2022-11-16 13:05:06 -08:00
tiferet
38c40a7192 isEffectiveSink can't be final because ExtractMisclassifiedEndpointFeatures overrides it. 2022-11-16 12:12:50 -08:00
tiferet
8fee9cb0d5 Fix CodeQL warnings 2022-11-16 12:06:52 -08:00
tiferet
c2035e85d2 Be explicit in requiring that each ATM config set its endpoint type. 2022-11-16 11:55:23 -08:00
tiferet
0fd013f9fd Update the reason names in FilteredTruePositives.expected.
This is needed because we changed the names of three endpoint filters that were all called "not a direct argument to a likely external library call or a heuristic sink" in order to disambiguate them (fc56c5a022).
2022-11-16 11:54:10 -08:00
tiferet
eab270eb84 Move the definitions of isEffectiveSink and getAReasonSinkExcluded to the base class.
They can now be implemented generically for all sink types.
2022-11-16 11:47:24 -08:00
tiferet
fc56c5a022 Implement the type-specific endpoint filters as EndpointCharacteristics.
Also disambiguate three filters from three different sink types that all have the same name, "not a direct argument to a likely external library call or a heuristic sink".
2022-11-16 11:14:25 -08:00
erik-krogh
9eaeaf7322 ATM: convert some block-comments that could be QLDoc to QLDoc 2022-11-16 13:41:52 +01:00
tiferet
13cb0ab554 Fix CodeQL warning 2022-11-15 17:32:30 -08:00
tiferet
2ecdfd1ff6 Delete some code that's no longer in use 2022-11-15 17:29:03 -08:00