Commit Graph

271 Commits

Author SHA1 Message Date
tiferet
99c47fa626 Delete CoreKnowledge.
All remaining functionality in `CoreKnowledge` is only being used in `EndpointCharacteristics`, so it can be moved there as a small set of helper predicates.
2022-11-18 16:06:15 -08:00
tiferet
f88bae36d6 FilteringReason is no longer being used and can be deleted 2022-11-18 16:06:15 -08:00
tiferet
ed59fc39f6 Remove redundant code
`isOtherModeledArgument` and `isArgumentToBuiltinFunction` contained the old logic for selecting negative endpoints for training.

These can now be deleted, and replaced by a single base class that collects all EndpointCharacteristics that are currently used to indicate negative training samples: `OtherModeledArgumentCharacteristic`.

This in turn lets us delete code from `StandardEndpointFilters` that effectively said that endpoints that are high-confidence non-sinks shouldn't be scored at inference time, either.
2022-11-18 16:06:14 -08:00
tiferet
8d22fd25f1 Suggestions from code review 2022-11-18 15:57:46 -08:00
tiferet
4a1382925e Remove some imports that are no longer used 2022-11-16 14:01:16 -08:00
tiferet
ccbf1ca2a9 Add a comment 2022-11-16 13:05:06 -08:00
tiferet
38c40a7192 isEffectiveSink can't be final because ExtractMisclassifiedEndpointFeatures overrides it. 2022-11-16 12:12:50 -08:00
tiferet
8fee9cb0d5 Fix CodeQL warnings 2022-11-16 12:06:52 -08:00
tiferet
c2035e85d2 Be explicit in requiring that each ATM config set its endpoint type. 2022-11-16 11:55:23 -08:00
tiferet
0fd013f9fd Update the reason names in FilteredTruePositives.expected.
This is needed because we changed the names of three endpoint filters that were all called "not a direct argument to a likely external library call or a heuristic sink" in order to disambiguate them (fc56c5a022).
2022-11-16 11:54:10 -08:00
tiferet
eab270eb84 Move the definitions of isEffectiveSink and getAReasonSinkExcluded to the base class.
They can now be implemented generically for all sink types.
2022-11-16 11:47:24 -08:00
tiferet
fc56c5a022 Implement the type-specific endpoint filters as EndpointCharacteristics.
Also disambiguate three filters from three different sink types that all have the same name, "not a direct argument to a likely external library call or a heuristic sink".
2022-11-16 11:14:25 -08:00
tiferet
13cb0ab554 Fix CodeQL warning 2022-11-15 17:32:30 -08:00
tiferet
2ecdfd1ff6 Delete some code that's no longer in use 2022-11-15 17:29:03 -08:00
tiferet
fedb98ddb5 Implement the standard getAReasonSinkExcluded using StandardEndpointFilterCharacteristics 2022-11-15 17:22:00 -08:00
tiferet
cf4e37a0ab Implement the standard endpoint filters as EndpointCharacteristics 2022-11-15 17:20:20 -08:00
tiferet
cb632b3534 Delete the file ExtractEndpointData.expected which was leftover in the last PR 2022-11-15 17:11:34 -08:00
Tiferet Gazit
710b215c38 Merge pull request #11263 from github/tiferet/extract-training-data
ATM: Extract training data
2022-11-15 12:08:13 -08:00
tiferet
fc078a47fd Apply suggestion from code review 2022-11-15 11:14:01 -08:00
Tiferet Gazit
092e019de9 Apply suggestions from code review
Co-authored-by: Stephan Brandauer <kaeluka@github.com>
2022-11-15 10:48:32 -08:00
Andrew Eisenberg
88750a7000 Add more information about ATM queries for external users 2022-11-15 10:17:56 -08:00
Stephan Brandauer
ec3578364e remove superfluous class in EndpointCharacteristics hierarchy 2022-11-15 10:17:38 +01:00
tiferet
9ecff0723c Fix non-ascii character in docs 2022-11-14 16:34:24 -08:00
tiferet
6b7612fed7 Fix import errors in DebugResultInclusion.ql 2022-11-14 15:33:46 -08:00
tiferet
b47723d607 Delete ExtractEndpointData.
Also remove the associated test files.
2022-11-14 14:57:59 -08:00
tiferet
9d7e7735d5 Extract training data:
Implement the new query that selects data for training. For now we include clauses that implement logic that is identical to the old queries.

Include a temporary wrapper query that converts the resulting data into the format expected by the endpoint pipeline.

Move the small pieces of `ExtractEndpointData` that are still needed into `ExtractEndpointDataTraining.qll`.
2022-11-14 14:33:08 -08:00
Tiferet Gazit
855eddab80 Merge pull request #11174 from github/tiferet/non-sink-endpoint-characteristics
Non-sink endpoint characteristics
2022-11-14 09:37:25 -08:00
github-actions[bot]
b5b69e9357 JS: Bump version of ML-powered library and query packs to 0.4.2 2022-11-11 12:48:00 +00:00
github-actions[bot]
3e5e695325 JS: Bump patch version of ML-powered library and query packs 2022-11-11 12:36:19 +00:00
tiferet
dbcdc2209e Use names constants for confidence levels 2022-11-09 14:25:08 -08:00
tiferet
b6532fa9a0 Fix QLDoc style warning 2022-11-09 13:10:54 -08:00
tiferet
243980ef73 Documentation improvements 2022-11-09 13:04:16 -08:00
Tiferet Gazit
6cb01a210f Apply suggestions from code review
Co-authored-by: Stephan Brandauer <kaeluka@github.com>
2022-11-09 12:53:52 -08:00
tiferet
ac14b6d685 Create EndpointCharacteristics to replace all existing NotASinkReasons and LikelyNotASinkReasons 2022-11-08 13:37:49 -08:00
tiferet
fadbdc1f63 Documentation improvements suggested by Andrew 2022-11-08 11:45:33 -08:00
github-actions[bot]
69df9f9daa JS: Bump version of ML-powered library and query packs to 0.4.1 2022-11-07 13:06:46 +00:00
github-actions[bot]
82277d8f56 JS: Bump minor version of ML-powered library and query packs 2022-11-07 13:00:28 +00:00
github-actions[bot]
268a990aa6 JS: Bump version of ML-powered model pack to 0.3.1 2022-11-07 13:00:28 +00:00
github-actions[bot]
a1e0bf022e ATM: Update model pack dependency of ML-powered model building and query packs 2022-11-07 13:00:27 +00:00
github-actions[bot]
be808deb59 JS: Bump minor version of ML-powered model pack 2022-11-07 12:59:44 +00:00
tiferet
833041c62e Fix QLDoc style errors 2022-11-04 09:30:31 -07:00
tiferet
2aa4651534 Remove predicates not yet used from the current PR 2022-11-04 09:30:31 -07:00
tiferet
74c8bfff4f Minor changes from code review 2022-11-04 09:30:31 -07:00
tiferet
e60c016fc6 Format fixes 2022-11-04 09:30:31 -07:00
tiferet
cbf81b8839 Improve the import structure 2022-11-04 09:30:31 -07:00
tiferet
300456cd3e Enforce the abstraction over characteristics:
Make the implementations of specific `EndpointCharacteristic`s private.
2022-11-04 09:30:31 -07:00
tiferet
c0cc754fb5 Rename ClassificationReasons
Change the name to EndpointCharacteristics.
2022-11-04 09:30:30 -07:00
tiferet
a4939b91e7 Generalize the definition of a known sink:
If the list of reasons includes positive indicators with maximal confidence for this class, it's a known sink for the class.

This negates the need for each query config to define the isKnownSink predicate individually.
2022-11-04 09:30:29 -07:00
tiferet
08bbe596a2 Create the sink ClassificationReasons
Write the reasons that indicate that an endpoint is a sink for each sink type.

Also fix import error.
2022-11-04 09:30:29 -07:00
Henry Mercer
3e863a539a ATM: Fix CodeQL pack workspace references
This fixes the
[ATM PR checks](https://github.com/github/codeql/actions/runs/3392995797/jobs/5639827326)
breaking on main as a result of
https://github.com/github/codeql/pull/11004.
2022-11-04 14:03:34 +00:00