tiferet
ec5425d952
When extracting positive and negative examples for the Java prompt, extract the data used in the MaD extensible predicate.
This will enable the codex prompt to optionally use this data in additional columns.
2023-03-14 12:49:28 -07:00
tiferet
7666843316
Resolve two TODO items
2023-03-14 12:49:28 -07:00
tiferet
e06bcc3112
Exclude negative examples that are type access nodes.
These will never be on a flow path so they're not useful negative examples.
2023-03-14 12:49:28 -07:00
tiferet
3229b37436
Increase diversity of negative prompt examples by creating finer sub-types
2023-03-14 12:49:28 -07:00
tiferet
559570419d
If a node satisfies the logic for both isSink and isSanitizer, don't include it as a positive or negative example in the prompt, because it's too ambiguous and will confuse the model.
2023-03-14 12:49:28 -07:00
tiferet
844171a28e
Simplify the definition of ExtractPositiveExamples.ql
2023-03-14 12:49:28 -07:00
tiferet
ecf4d4dc02
Avoid accidentally extracting positive prompt examples when there is a codex-generated data extension file in java/ql/lib/ext
2023-03-14 12:49:28 -07:00
tiferet
0d4e85ff93
Add a predicate that finds endpoints with logically inconsistent characteristics, and exclude such endpoints from both positive and negative examples extracted for the codex prompt.
2023-03-14 12:49:28 -07:00
tiferet
1211197914
Fix codeql-pack.lock.yml so it's not looking for an ML model
2023-03-14 12:49:28 -07:00
tiferet
41df8df182
Typo fix
2023-03-14 12:49:28 -07:00
tiferet
125245aa62
Delete TODO items that are done
2023-03-14 12:49:28 -07:00
tiferet
8bb2b2eaea
Have each EndpointType keep track of the sink/source kind for this endpoint type as used in Models as Data
2023-03-14 12:49:28 -07:00
tiferet
27efe524da
Fix the extraction of data for the data extension YML file.
2023-03-14 12:49:28 -07:00
tiferet
ae4668c488
Add data needed for the data extension YML file to ExtractSinkCandidatesWithFlow.ql: first pass.
2023-03-14 12:49:28 -07:00
tiferet
3987d8d374
Small update to SafeExternalApiMethodCharacteristic
2023-03-14 12:49:28 -07:00
tiferet
fd75952c1e
Improvements to ExtractSinkCandidatesWithFlow.ql
2023-03-14 12:49:28 -07:00
tiferet
4db0dec82e
Minor improvement
2023-03-14 12:49:28 -07:00
tiferet
a73b52adef
Improvements to ExtractSinkCandidatesWithFlow.ql
2023-03-14 12:49:28 -07:00
tiferet
39a4513fcc
Delete the queries the Java team isn't currently interested in boosting
2023-03-14 12:49:28 -07:00
tiferet
3c44332f17
Move isFlowLikelyInBaseQuery to the ATMConfig and delete AdaptiveThreatModeling.qll
2023-03-14 12:49:27 -07:00
tiferet
06c7f1012c
Rename request forgery sink to server-side request forgery sink
2023-03-14 12:49:27 -07:00
tiferet
9421ba5303
Add an implementation of request forgery sinks and corresponding positive EndpointCharacteristic in Java
2023-03-14 12:49:27 -07:00
tiferet
f5109be2ac
Bug fixes
2023-03-14 12:49:27 -07:00
tiferet
c14a4c4d93
Add an implementation of TaintedPathATM.qll and corresponding positive EndpointCharacteristic in Java
2023-03-14 12:49:27 -07:00
tiferet
4546dbe51b
Subsample negative examples to 1% to prevent huge numbers.
2023-03-14 12:49:26 -07:00
tiferet
5d62dc3d2e
Add a Java NotASinkCharacteristic safe external API method
2023-03-14 12:49:26 -07:00
tiferet
0acd06a6d3
Add queries to surface high-confidence Java sinks and non-sinks to use as examples in the codex prompt.
2023-03-14 12:49:26 -07:00
tiferet
04abb87fef
Rewrite ExtractSinkCandidatesWithFlow.ql as a problem query so we can run it with codeql database analyze to output SARIF results.
2023-03-14 12:49:26 -07:00
tiferet
5dc5c3fb3f
Add a couple of endpoint filters for Java
2023-03-14 12:49:26 -07:00
tiferet
653b0128f5
Try implementing SqlInjectionATM.qll in Java
2023-03-14 12:49:26 -07:00
tiferet
c0f58371b4
Start making the additions needed to surface candidate Java sinks for codex classification outside the evaluator.
2023-03-14 12:49:26 -07:00
tiferet
cf289d57e9
Go back to the prompt of https://github.com/github/codeql-dca-main/issues/9475
2023-03-14 12:49:26 -07:00
tiferet
459050151a
Give more explicit instructions in the codex prompt, but don't solicit rare sink types.
2023-03-14 12:49:26 -07:00
tiferet
01979aeb62
Give more explicit instructions in the codex prompt.
2023-03-14 12:49:26 -07:00
tiferet
ef95f4c419
Minor prompt improvements:
- Tell codex explicitly that this is JavaScript code
- Replace "Dataflow node" with "Code snippet"
2023-03-14 12:49:26 -07:00
tiferet
ac5434b3f3
Minor prompt improvements:
Remove spaces that break the code syntax or make for strange code styling.
2023-03-14 12:49:26 -07:00
tiferet
ce17d94f80
In-line predicates that are costing a lot of compute time
2023-03-14 12:49:26 -07:00
tiferet
bcc4cdd376
Add a test that can be used to determine the alerts codex will surface for each query.
2023-03-14 12:49:25 -07:00
tiferet
9aba7a0bca
Bug fixes for things that interfere with using the codex model
2023-03-14 12:49:25 -07:00
tiferet
9a21539fca
Add a test that can be used to determine how well codex reproduces the manual modeling for each sink type.
2023-03-14 12:49:25 -07:00
tiferet
d76d11bd27
Fix endpointScores
2023-03-14 12:49:25 -07:00
tiferet
4603a66411
Bug fix in selecting a node's location:
Locations only exist where there are locatable structures in the DB. Thus, select the largest location that contains the node and at most `neighborhoodSize` lines before and after the node.
2023-03-14 12:49:25 -07:00
tiferet
b130b2e82f
Give endpoint types more intuitive names and then use those names directly in composing the codex prompt.
2023-03-14 12:49:25 -07:00
tiferet
94676ed713
Further improve the structure of endpoint scoring
2023-03-14 12:49:25 -07:00
tiferet
4ed57e71db
Remove tokens from the prompt that the Java side can't handle
2023-03-14 12:49:25 -07:00
tiferet
12def779e6
Change the prompt to use sink names defined in EndpointType
2023-03-14 12:49:25 -07:00
tiferet
a6c01042eb
Improve the structure of endpoint scoring
2023-03-14 12:49:25 -07:00
tiferet
fa36fc838b
Pull in the prompt work from branch tiferet/codex-prompt
2023-03-14 12:49:25 -07:00
tiferet
09bf2218d4
Merge in aeisenberg/atm-codex
2023-03-14 12:49:24 -07:00
Harry Maclean
aaeb8a0aa0
Merge pull request #12493 from hmac/ar-sinks
2023-03-15 07:59:07 +13:00