Commit Graph

8937 Commits

Author SHA1 Message Date
tiferet
10b81eebb7 Improve EndpointTypes:
- Create two derived classes for EndpointType: SinkType and SourceType.
- EndpointTypes don't use a `newtype`, but rather extend string, with their characteristic predicate replacing the current getDescription predicate.
2023-03-14 12:49:30 -07:00
tiferet
91109c826d List the MaD provenance as "ai-generated" rather than "manual"
See https://github.com/github/codeql/pull/12228
2023-03-14 12:49:30 -07:00
tiferet
abe3a2dae1 Improve positive prompt examples:
Include only sinks that are arguments to an external API call, because these are the sinks we are most interested in.
2023-03-14 12:49:30 -07:00
tiferet
4db03cf4ae Remove IsMaDTaintStepCharacteristic for now because it's catching all our known sinks as well as taint steps 2023-03-14 12:49:30 -07:00
tiferet
f73b3e0d97 Add endpoint filters:
- Filter out MaD taint steps
2023-03-14 12:49:30 -07:00
tiferet
3b508f7879 Remove redundancy from ExceptionCharacteristic 2023-03-14 12:49:30 -07:00
tiferet
9b028476b8 Add endpoint filters:
- Filter out exceptions
- Filter out endpoints in test files
2023-03-14 12:49:30 -07:00
tiferet
24e01104a2 As part of the metadata extraction predicate, surface whether or not the argument is being passed to an external API 2023-03-14 12:49:29 -07:00
tiferet
8f6db6b244 Switch back to one sink type per supported query, rather than existing MaD kinds. 2023-03-14 12:49:29 -07:00
tiferet
d6c897c9fd Small bug fix for handling queries with multiple sink types:
`getAReasonSinkExcluded` excludes an endpoint when, _for every sink type relevant to this query_, the endpoint has a characteristic implying it is not a sink of that type.
2023-03-14 12:49:29 -07:00
tiferet
8d8a21b100 Fix a bug that allowed some known sinks to end up as sink candidates for codex 2023-03-14 12:49:29 -07:00
tiferet
a27ae27101 In the MaD data, set the subtypes field to false for final classes / methods. 2023-03-14 12:49:29 -07:00
tiferet
4b6d1f7b78 Create a new sink class for other sinks:
See https://github.com/github/atm-codex/pull/3

- Add a sink type `OtherMaDSinkType`, and corresponding characteristic `OtherMaDSinkCharacteristic`, for other sinks modeled by a MaD `kind` but not belonging to any of the existing sink types.
- Extract positive prompt examples for the new sink type, together with the corresponding MaD `kind`.
2023-03-14 12:49:29 -07:00
tiferet
66c77e890c Bug fix 2023-03-14 12:49:29 -07:00
tiferet
be9c6500b8 In the MaD data, extract the argument index as an int rather than a string wrapped up in "Argument[]" 2023-03-14 12:49:29 -07:00
tiferet
831830831c Fix the MaD signature to the correct format 2023-03-14 12:49:29 -07:00
tiferet
ae69a2bcd9 Separate out the sink types to align with the MaD kinds that currently exist, adding a sink type for all sinks of a given query that are not currently mapped in the MaD kinds. 2023-03-14 12:49:29 -07:00
tiferet
65923ed2c1 Add support for multiple sink types per query 2023-03-14 12:49:29 -07:00
tiferet
a7269075e2 As part of the metadata extraction predicate, surface whether or not the callee is a public method 2023-03-14 12:49:29 -07:00
tiferet
d3a5ee53c6 Refactor the CodeQL code that extracts metadata for methods presented to Codex, to make it easy to add another field 2023-03-14 12:49:29 -07:00
tiferet
f32bb65c54 Refactor the CodeQL code that extracts metadata for methods presented to Codex, to make it easy to add another field 2023-03-14 12:49:29 -07:00
tiferet
633bfdba28 Broaden the Java endpoint filter that filters out flow steps, and document it 2023-03-14 12:49:28 -07:00
tiferet
db9cec6ea6 Add an endpoint filter to filter out flow steps 2023-03-14 12:49:28 -07:00
tiferet
ec5425d952 When extracting positive and negative examples for the Java prompt, extract the data used in the MaD extensible predicate.
This will enable the codex prompt to optionally use this data in additional columns.
2023-03-14 12:49:28 -07:00
tiferet
7666843316 Resolve two TODO items 2023-03-14 12:49:28 -07:00
tiferet
e06bcc3112 Exclude negative examples that are type access nodes.
These will never be on a flow path so they're not useful negative examples.
2023-03-14 12:49:28 -07:00
tiferet
3229b37436 Increase diversity of negative prompt examples by creating finer sub-types 2023-03-14 12:49:28 -07:00
tiferet
559570419d If a node satisfies the logic for both isSink and isSanitizer, don't include it as a positive or negative example in the prompt, because it's too ambiguous and will confuse the model. 2023-03-14 12:49:28 -07:00
tiferet
844171a28e Simplify the definition of ExtractPositiveExamples.ql 2023-03-14 12:49:28 -07:00
tiferet
ecf4d4dc02 Avoid accidentally extracting positive prompt examples when there is a codex-generated data extension file in java/ql/lib/ext 2023-03-14 12:49:28 -07:00
tiferet
0d4e85ff93 Add a predicate that finds endpoints with logically-inconsistent characteristics, and exclude such endpoints from both positive and negative examples extracted for the codex prompt. 2023-03-14 12:49:28 -07:00
tiferet
1211197914 Fix codeql-pack.lock.yml so it's not looking for an ML model 2023-03-14 12:49:28 -07:00
tiferet
41df8df182 Typo fix 2023-03-14 12:49:28 -07:00
tiferet
125245aa62 Delete TODO items that are done 2023-03-14 12:49:28 -07:00
tiferet
8bb2b2eaea Have each EndpointType keep track of the sink/source kind for this endpoint type as used in Models as Data 2023-03-14 12:49:28 -07:00
tiferet
27efe524da Fix the extraction of data for the data extension YML file. 2023-03-14 12:49:28 -07:00
tiferet
ae4668c488 Add data needed for the data extension YML file to ExtractSinkCandidatesWithFlow.ql: first pass. 2023-03-14 12:49:28 -07:00
tiferet
3987d8d374 Small update to SafeExternalApiMethodCharacteristic 2023-03-14 12:49:28 -07:00
tiferet
fd75952c1e Improvements to ExtractSinkCandidatesWithFlow.ql 2023-03-14 12:49:28 -07:00
tiferet
4db0dec82e Minor improvement 2023-03-14 12:49:28 -07:00
tiferet
a73b52adef Improvements to ExtractSinkCandidatesWithFlow.ql 2023-03-14 12:49:28 -07:00
tiferet
39a4513fcc Delete the queries the Java team isn't currently interested in boosting 2023-03-14 12:49:28 -07:00
tiferet
3c44332f17 Move isFlowLikelyInBaseQuery to the ATMConfig and delete AdaptiveThreatModeling.qll 2023-03-14 12:49:27 -07:00
tiferet
06c7f1012c Rename request forgery sink to server-side request forgery sink 2023-03-14 12:49:27 -07:00
tiferet
9421ba5303 Add an implementation of request forgery sinks and corresponding positive EndpointCharacteristic in Java 2023-03-14 12:49:27 -07:00
tiferet
f5109be2ac Bug fixes 2023-03-14 12:49:27 -07:00
tiferet
c14a4c4d93 Add an implementation of TaintedPathATM.qll and corresponding positive EndpointCharacteristic in Java 2023-03-14 12:49:27 -07:00
tiferet
4546dbe51b Subsample negative examples to 1% to prevent huge numbers. 2023-03-14 12:49:26 -07:00
tiferet
5d62dc3d2e Add a Java NotASinkCharacteristic for safe external API methods 2023-03-14 12:49:26 -07:00
tiferet
0acd06a6d3 Add queries to surface high-confidence Java sinks and non-sinks to use as examples in the codex prompt. 2023-03-14 12:49:26 -07:00