Commit Graph

51324 Commits

Author SHA1 Message Date
tiferet
ec5425d952 When extracting positive and negative examples for the Java prompt, extract the data used in the MaD extensible predicate.
This will enable the codex prompt to optionally use this data in additional columns.
2023-03-14 12:49:28 -07:00
tiferet
7666843316 Resolve two TODO items 2023-03-14 12:49:28 -07:00
tiferet
e06bcc3112 Exclude negative examples that are type access nodes.
These will never be on a flow path so they're not useful negative examples.
2023-03-14 12:49:28 -07:00
tiferet
3229b37436 Increase diversity of negative prompt examples by creating finer sub-types 2023-03-14 12:49:28 -07:00
tiferet
559570419d If a node satisfies the logic for both isSink and isSanitizer, don't include it as a positive or negative example in the prompt, because it's too ambiguous and will confuse the model. 2023-03-14 12:49:28 -07:00
tiferet
844171a28e Simplify the definition of ExtractPositiveExamples.ql 2023-03-14 12:49:28 -07:00
tiferet
ecf4d4dc02 Avoid accidentally extracting positive prompt examples when there is a codex-generated data extension file in java/ql/lib/ext 2023-03-14 12:49:28 -07:00
tiferet
0d4e85ff93 Add a predicate that finds endpoints with logically-inconsistent characteristics, and exclude such endpoints from both positive and negative examples extracted for the codex prompt. 2023-03-14 12:49:28 -07:00
tiferet
1211197914 Fix codeql-pack.lock.yml so it's not looking for an ML model 2023-03-14 12:49:28 -07:00
tiferet
41df8df182 Typo fix 2023-03-14 12:49:28 -07:00
tiferet
125245aa62 Delete TODO items that are done 2023-03-14 12:49:28 -07:00
tiferet
8bb2b2eaea Have each EndpointType keep track of the sink/source kind for this endpoint type as used in Models as Data 2023-03-14 12:49:28 -07:00
tiferet
27efe524da Fix the extraction of data for the data extension YML file. 2023-03-14 12:49:28 -07:00
tiferet
ae4668c488 Add data needed for the data extension YML file to ExtractSinkCandidatesWithFlow.ql: first pass. 2023-03-14 12:49:28 -07:00
tiferet
3987d8d374 Small update to SafeExternalApiMethodCharacteristic 2023-03-14 12:49:28 -07:00
tiferet
fd75952c1e Improvements to ExtractSinkCandidatesWithFlow.ql 2023-03-14 12:49:28 -07:00
tiferet
4db0dec82e Minor improvement 2023-03-14 12:49:28 -07:00
tiferet
a73b52adef Improvements to ExtractSinkCandidatesWithFlow.ql 2023-03-14 12:49:28 -07:00
tiferet
39a4513fcc Delete the queries the Java team isn't currently interested in boosting 2023-03-14 12:49:28 -07:00
tiferet
3c44332f17 Move isFlowLikelyInBaseQuery to the ATMConfig and delete AdaptiveThreatModeling.qll 2023-03-14 12:49:27 -07:00
tiferet
06c7f1012c Rename request forgery sink to server-side request forgery sink 2023-03-14 12:49:27 -07:00
tiferet
9421ba5303 Add an implementation of request forgery sinks and corresponding positive EndpointCharacteristic in Java 2023-03-14 12:49:27 -07:00
tiferet
f5109be2ac Bug fixes 2023-03-14 12:49:27 -07:00
tiferet
c14a4c4d93 Add an implementation of TaintedPathATM.qll and corresponding positive EndpointCharacteristic in Java 2023-03-14 12:49:27 -07:00
tiferet
4546dbe51b Subsample negative examples to 1% to prevent huge numbers. 2023-03-14 12:49:26 -07:00
tiferet
5d62dc3d2e Add a Java NotASinkCharacteristic safe external API method 2023-03-14 12:49:26 -07:00
tiferet
0acd06a6d3 Add queries to surface high-confidence Java sinks and non-sinks to use as examples in the codex prompt. 2023-03-14 12:49:26 -07:00
tiferet
04abb87fef Rewrite ExtractSinkCandidatesWithFlow.ql as a problem query so we can run it with codeql database analyze to output SARIF results. 2023-03-14 12:49:26 -07:00
tiferet
5dc5c3fb3f Add a couple of endpoint filters for Java 2023-03-14 12:49:26 -07:00
tiferet
653b0128f5 Try implementing SqlInjectionATM.qll in Java 2023-03-14 12:49:26 -07:00
tiferet
c0f58371b4 Start making the additions needed to surface candidate Java sinks for codex classification outside the evaluator. 2023-03-14 12:49:26 -07:00
tiferet
cf289d57e9 Go back to the prompt of https://github.com/github/codeql-dca-main/issues/9475 2023-03-14 12:49:26 -07:00
tiferet
459050151a Give more explicit instructions in the codex prompt, but don't solicit rare sink types. 2023-03-14 12:49:26 -07:00
tiferet
01979aeb62 Give more explicit instructions in the codex prompt. 2023-03-14 12:49:26 -07:00
tiferet
ef95f4c419 Minor prompt improvements:
- Tell codex explicitly that this is JavaScript code
- Replace "Dataflow node" with "Code snippet"
2023-03-14 12:49:26 -07:00
tiferet
ac5434b3f3 Minor prompt improvements:
Remove spaces that break the code syntax or make for strange code styling.
2023-03-14 12:49:26 -07:00
tiferet
ce17d94f80 In-line predicates that are costing a lot of compute time 2023-03-14 12:49:26 -07:00
tiferet
bcc4cdd376 Add a test that can be used to determine the alerts codex will surface for each query. 2023-03-14 12:49:25 -07:00
tiferet
9aba7a0bca Bug fixes for things that interfere with using the codex model 2023-03-14 12:49:25 -07:00
tiferet
9a21539fca Add a test that can be used to determine how well codex reproduces the manual modeling for each sink type. 2023-03-14 12:49:25 -07:00
tiferet
d76d11bd27 Fix endpointScores 2023-03-14 12:49:25 -07:00
tiferet
4603a66411 Bug fix in selecting a node's location:
Locations only exist where there are locatable structures in the DB. Thus, select the largest location that contains the node and at most `neighborhoodSize` lines before and after the node.
2023-03-14 12:49:25 -07:00
tiferet
b130b2e82f Give endpoint types more intuitive names and then use those names directly in composing the codex prompt. 2023-03-14 12:49:25 -07:00
tiferet
94676ed713 Further improve the structure of endpoint scoring 2023-03-14 12:49:25 -07:00
tiferet
4ed57e71db Remove tokens from the prompt that the Java side can't handle 2023-03-14 12:49:25 -07:00
tiferet
12def779e6 Change the prompt to use sink names defined in EndpointType 2023-03-14 12:49:25 -07:00
tiferet
a6c01042eb Improve the structure of endpoint scoring 2023-03-14 12:49:25 -07:00
tiferet
fa36fc838b Pull in the prompt work from branch tiferet/codex-prompt 2023-03-14 12:49:25 -07:00
tiferet
09bf2218d4 Merge in aeisenberg/atm-codex 2023-03-14 12:49:24 -07:00
Harry Maclean
aaeb8a0aa0 Merge pull request #12493 from hmac/ar-sinks 2023-03-15 07:59:07 +13:00