Same pattern we've seen many times before: a field on an anonymous node
gets attached to the parent node instead.
I'm not 100% sure this is the right solution, but it seemed wrong to
just make `_parenthesized_type` named instead (we don't usually name
parentheticals). At the very least, this cleans up the spurious
navigation_expression.element and tuple_type_item.element fields.
Because `_type` was anonymous, its body was inlined in all of the places
it appeared. Because this body contained a `name` field, this field was
_also_ inlined. This caused a bunch of nodes to have spurious `name`
fields, and for some of them (that already had such a field) it caused
that field have multiplicity greater than one.
To fix this, we make the `_type` node named, which prevents the errant
field from escaping.
Adds a new type `nested_type_identifier`, which contains the
choice-branch that previously allowed those tokens to bleed through into
the closest parent field.
You know the drill. We just make an anonymous node named instead. In
this case, however, we have to be a bit more clever about how to rewrite
it. We turn the sequence of a type followed by an optional ! into a
_choice_ between mere type or type followed by bang (the latter being
our new named node).
Supertypes are a honking great idea. We should use more of them.
This massively cleans up the node types, without polluting the AST with
`expression` nodes.
Before, the `condition` field of an if statement supposedly could
contain things like parentheses and commas, due to bleeding from
referenced anonymous nodes. Making the node named makes this issue go
away.
We make _referenceable_operator a named node. This prevents it from
bleeding through to the _expression definition. It likely also makes the
output easier to deal with, as bare operators used as arguments now have
a named node wrapping them in the AST.
Also removes a duplicated inclusion of _comparison_operator that served
no purpose.
This caused any field containing an _expression to appear as if it could
countain any number of such nodes. It also threw away the information
that there was a `?` marker there.
To fix it, we simply move the definition into its own named node.
The astute reader will note that we seem to _lose_ some node types in
the process. Apparently, these were unreachable in the grammar, and the
newer version of tree-sitter removes such "dead code".
Changes based on code review:
1. Remove redundant strings.Contains check in isExactTestPackage
The equality check on the next line handles both cases, making
the early return unnecessary.
2. Extract package selection logic into selectBestPackages function
This reduces code duplication and allows the test to call the
actual implementation rather than copying the logic.
3. Add TestSelectBestPackages to test the new function
Comprehensive test covering single packages, test vs production,
exact vs nested tests, and multiple packages.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Generated by manually applying the output from CI's Gazelle check.
This adds the go_test target for the new extractor_test.go file.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
This test verifies that root internal test files (package foo, not
foo_test) are correctly extracted when the repository has both:
1. Root-level internal tests (main_test.go with package main)
2. Nested packages with tests (nested/nested_test.go)
This scenario reproduces the bug that was fixed: the old extractor
would select the wrong package variant and miss root internal test
files.
The test ensures:
- main_test.go (root internal test) is extracted
- nested/nested_test.go (nested test) is extracted
- All test functions from both files are present in the database
This prevents regression of the bug fix.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
When CODEQL_EXTRACTOR_GO_OPTION_EXTRACT_TESTS=true is set, the Go
extractor was incorrectly skipping internal test files (package foo)
at repository roots when the project contains nested test packages.
Root Cause:
The extractor selected package variants by longest ID string, but this
heuristic fails when nested packages have tests. For a package like
"github.com/go-git/go-git/v6", packages.Load returns multiple variants:
1. "github.com/go-git/go-git/v6" (19 files, production only)
2. "github.com/go-git/go-git/v6 [github.com/go-git/go-git/v6.test]"
(39 files, production + 20 root tests) ← Should select this
3. "github.com/go-git/go-git/v6 [github.com/go-git/go-git/v6/plumbing/format/packfile.test]"
(19 files, test dependency) ← Was incorrectly selected (longest string)
The old logic selected variant #3 (76 chars) over #2 (68 chars),
causing 20 root test files to be missing from the database.
Fix:
Replace string length comparison with a better heuristic that prefers:
1. Exact test packages (e.g., "pkg [pkg.test]") over nested dependencies
2. Packages with more Syntax nodes (more files to extract)
3. String length as a tiebreaker
This ensures the extractor selects the variant with the most complete
test coverage, particularly for root-level internal tests.
Testing:
- Added comprehensive unit tests covering the selection logic
- Tests simulate the real-world go-git scenario
- All tests pass
Impact:
Root-level external tests (package foo_test) were already extracted
correctly. This fix ensures internal tests (package foo) at the root
are now also extracted when they exist alongside nested test packages.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Previously, apply_rules_inner snapshotted a node's fields by cloning
the BTreeMap into a Vec<(FieldId, Vec<Id>)>, then built a fresh
BTreeMap of new_fields for the rewritten Ids. For a node with N
fields, this allocated 2N+1 things per visit (the snapshot Vec, N
cloned children Vecs, the new BTreeMap entries) — even when nothing
in the subtree was rewritten.
Use std::mem::take to swap the parent's fields out by ownership: the
recursion can mutate the AST (including pushing new nodes from rule
firings) without any conflict, since we hold the owned BTreeMap
locally. Iterate values_mut() and only allocate a fresh children Vec
on the first divergence (lazy alloc): unchanged children stay in the
existing slot. When done, swap the fields back.
For a subtree with no rewrites, this is now zero allocations per node
(modulo the recursion itself). For nodes with rewrites, it's one Vec
allocation per field that contains a rewritten child, instead of two
plus the BTreeMap rebuild.
apply_rules_inner used to handle the "child was rewritten, so the
parent needs new field IDs" case by cloning the parent node, swapping
in the new fields, pushing the clone onto the arena, and returning the
new Id. Every ancestor on the path from the rewrite up to the root was
duplicated this way, with the originals retained as garbage in the
arena.
Switch to in-place mutation: assign `ast.nodes[id].fields = new_fields`
and return the same Id. Rule firings still produce genuinely new nodes
via BuildCtx (their structure differs from the input), but the
ancestor-rebuild spine no longer copies anything.
This is safe because apply_rules_inner already works entirely by Id:
the field snapshot is cloned out before recursing, no &Node references
are held across mutations of the arena, and captures are scoped to a
single rule firing so the now-stable Ids do not break anything.
Memory effect: a desugaring pass that rewrites R leaves of a tree of
average depth d previously appended R*d ancestor clones to the arena.
Now appends 0.
With Ids stable for the lifetime of an Ast, the Node::id field becomes
truly redundant and is removed (along with the Node::id() accessor).
AstCursor switches from caching `node: &Node` to tracking `node_id:
Id` and looking the node up via the arena on each access; ChildrenIter
now yields Ids directly. A new AstCursor::node_id() method gives
callers access to the cursor position by Id.
Previously, a bare child pattern in a query took whatever the next
child of the iterator was and either matched or failed: it would not
scan ahead to find a match. So `(foo ("baz"))` against a `foo` whose
implicit `child` field was `["bar", "baz"]` would fail (the pattern
took "bar" first).
Switch to forward-scan semantics: a SingleNode matcher advances through
the iterator until it finds a child that matches its sub-query. Patterns
that are named-only continue to skip past unnamed children for free.
Order is preserved across multiple bare patterns at the same level —
each pattern advances the shared iterator past whatever it consumed —
so a query cannot match children out of source order.
Captures from a failed match attempt are rolled back via a snapshot, so
partial captures from a complex sub-query do not leak across attempts.
Add two regression tests against the `do` body wrapper in a Ruby
for-loop, whose implicit `child` field contains [do, identifier, end]:
- a query for ("end") matches by skipping past `do` and the identifier
- a query for ("end") then ("do") fails, demonstrating order preservation
Schema::from_language registered unnamed kinds via or_insert(id), where
`id` came from iterating 0..node_kind_count. For names with multiple
unnamed IDs (notably "end" in tree-sitter-ruby has IDs 0 and 13, where
ID 0 is the reserved error token), this picked the first encountered
ID — typically the wrong one.
The visitor sets node.kind via language.id_for_node_kind(name, false),
which returns the canonical ID. So a query for ("end") would compare
node.kind=13 against schema=0 and silently fail to match, with no
diagnostic.
Use language.id_for_node_kind(name, false) to obtain the canonical ID
when registering, mirroring the named-kind path that already does the
same with id_for_node_kind(name, true).
Three improvements to the query parser, all aimed at allowing query
patterns to refer to unnamed tokens:
1. Bare-literal capture: `"=" @op` now captures the unnamed `=` token,
matching the parenthesized form `("=") @op`. Previously the literal
branch in parse_query_list skipped the maybe_wrap_capture call, so
the `@op` was a leftover token and would error.
2. Bare `_` matches any node, named or unnamed. Previously bare `_` and
`(_)` both produced QueryNode::Any with the same matches_named_only
behaviour, so bare `_` would skip unnamed children. Now Any carries a
match_unnamed flag: false for `(_)` (named-only, tree-sitter default)
and true for bare `_` (any node).
3. Named fields and bare child patterns may be intermixed in any order.
Previously, once parse_query_fields saw a bare pattern it would stop
accepting named fields. The fix accumulates bare patterns into the
implicit `child` field and keeps parsing.
Each named field independently selects its target field for matching, so
the source-order of fields in the query is purely cosmetic and intermixing
is safe.
Add tests covering parenthesized capture, bare-literal capture, and the
named-vs-any distinction between `(_)` and bare `_`. Update query-syntax
docs to reflect all three.
Extend the desugaring config from a single flat list of rules to an
ordered sequence of named Phases. Each phase runs to completion (a
full traversal applying its rules) before the next phase starts.
Rules in different phases never compete for matches.
The config is built via the new chainable API:
DesugaringConfig::new()
.add_phase("cleanup", cleanup_rules)
.add_phase("desugar", desugar_rules)
.with_output_node_types_yaml(yaml);
Single-phase configs are just .add_phase(...) called once.
A single FreshScope is shared across phases so generated identifier
names (e.g. $tmp-N) are unique throughout the run.
Phase names appear in error messages, e.g. "Phase `desugar`:
exceeded maximum rewrite depth".
Add two regression tests: one verifying basic two-phase chained
desugaring, and one verifying that errors include the failing phase
name.
Previously, after a rule fired the engine would always re-try that
same rule on the result root. A rule whose output matched its own
query (intentionally or by accident) would loop until the global
MAX_REWRITE_DEPTH safety net kicked in.
Make the default behavior fire-once-per-node: after a rule fires on
node N, the engine no longer tries that same rule on the result root.
Other rules and child traversal are unaffected. Rules that
intentionally rewrite iteratively can opt into the old behavior via
the new Rule::repeated() builder method.
Add two regression tests using a self-swapping assignment rule:
- with .repeated(), the swap loops and trips the depth limit
- without it (default), the swap fires once and terminates
Adds a regression test verifying that desugaring rules can chain across
output-only node kinds: a first rule rewrites an input kind to an
output-only kind, and a second rule then rewrites that output-only
kind into another output-only kind. This exercises the schema lookup
for query patterns whose root kind is not present in the input
tree-sitter grammar.
Add BUILD.bazel files for the yeast and yeast-macros crates, register
them as dependencies of the shared tree-sitter extractor, and refresh
the vendored crate dependencies via update_tree_sitter_extractors_deps.sh.
Language and LanguageSpec gain optional output_node_types field.
When set, the generator produces dbscheme/QL from the output types
and the extractor validates TRAP against them.
All existing extractors pass None (no behavior change).
Ruby extract() calls gain vec![] for the new rules parameter.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
extract() gains a rules parameter. When empty, uses tree-sitter native
traversal (no behavior change). When non-empty, runs yeast desugaring
and extracts via traverse_yeast.
Adds AstNode trait abstracting over tree_sitter::Node and yeast::Node,
with minimal changes to existing Visitor methods (Node -> &N in 6
signatures).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
12 tests covering parsing, queries, tree building, desugaring rules,
cursor navigation, and the shorthand rule! syntax.
Tests use a custom output node-types.yml with named fields for all
children (parameter, stmt, index), loaded via
schema_from_yaml_with_language.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Produces indented text showing node kinds, named fields, and leaf
content. Unnamed tokens are hidden unless inside a named field.
Used by tests for readable assertions.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Human-friendly YAML alternative to tree-sitter node-types.json with
three sections: supertypes, named, unnamed. Supports bidirectional
conversion and building Schema objects from YAML.
Includes CLI binary (node_types_yaml) and documentation.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Per review feedback on #21741: File.canRead/canWrite/canExecute,
exists/isDirectory/isFile/isHidden only inspect a path, so move them
under the path-injection[read] sub-kind. Update TaintedPath.expected
and the experimental CWE-073 expected to match.
- shared/mad/codeql/mad/ModelValidation.qll: shorten the comment
for `path-injection[%]` to `// Java-only currently`, matching the
style of other language-scoped entries and dropping API examples
and the java/zipslip reference.
- java/ql/lib/semmle/code/java/security/ZipSlipQuery.qll: replace
the `File.exists` example in the QLDoc with `FileReader`, since
`File.exists` is still labelled plain `path-injection`, not
`path-injection[read]`.
Address review feedback by moving the shared method-name-based encryption/hash/digest
check into Sanitizers.qll, and reference it from both CleartextStorageQuery.qll and
SensitiveLoggingQuery.qll instead of duplicating the definition.
Address review feedback by introducing dedicated subclasses of
TrustBoundaryValidationSanitizer for SimpleTypeSanitizer, RegexpCheckBarrier,
and the HttpServletSession type check, so isBarrier only references the
abstract class.
Vercel API handlers more often return JSON than HTML, so res.send is
not the only response body sink that matters. Mirror Express's
ResponseJsonCall by also matching res.json(...) and res.jsonp(...) on
the response (direct and chained), and exercise the new behavior in
the library-test fixture.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The CWE-089/untyped/vercel.ts fixture added in this PR introduces a
conn.query(...) call that DatabaseAccesses.ql reports, so its
.expected baseline needs the corresponding entry. Output produced by
`codeql test accept`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Consider two implementations of the same trait to be siblings when the
type being implemented by one is an instantiation of the type being
implemented by the other.
The sink-model generator and the experimental java/file-path-injection
query now observe the new path-injection[read] sub-kind for the
FileInputStream and Files.copy source-argument models.
- CWE-073 FilePathInjection.expected: refresh the models table for the
renamed kind on FileInputStream(File); alerts unchanged.
- modelgenerator Sinks.java: update the inline sink annotation for
copyFileToDirectory(Path,Path,CopyOption[]) Argument[0] to the new
path-injection[read] sub-kind, mirroring the library change.
Introduce a new Models-as-Data sink sub-kind path-injection[read] for
models that only read from or inspect a path. The general
java/path-injection query and its PathInjectionSanitizer barrier
continue to consider both path-injection and path-injection[read]
sinks, so no alerts are lost. The java/zipslip query deliberately
selects only path-injection sinks, since read-only accesses such as
ClassLoader.getResource or FileInputStream are outside the archive
extraction threat model.
Addresses https://github.com/github/codeql/issues/21606 along the lines
proposed on the issue thread: prefer path-injection[read] over a
[create] sub-kind so that miscategorizing a sink causes a false
positive (easy to spot) rather than a false negative.
- shared/mad/codeql/mad/ModelValidation.qll: allow path-injection[...]
as a valid sink kind.
- java/ql/lib/ext/*.model.yml: relabel the models that PR #12916
migrated from the historical read-file kind (plus the newer
ClassLoader resource-lookup variants that share the same read-only
semantics).
- java/ql/lib/semmle/code/java/security/TaintedPathQuery.qll and
PathSanitizer.qll: select both path-injection and
path-injection[read] sinks/barriers.
- java/ql/lib/semmle/code/java/security/ZipSlipQuery.qll: keep only
path-injection, with a comment explaining why path-injection[read]
is excluded.
- java/ql/test/query-tests/security/CWE-022/semmle/tests/ZipTest.java:
add m7 regression covering the Dubbo-style classpath lookup from
issue #21606 and assert no alert is produced.
- Update TaintedPath.expected for the renamed kinds in the models list.
- Add change-notes under java/ql/lib/change-notes and
java/ql/src/change-notes.
Replace the five-way result = ... or result = ... disjunction with a
single equality on a set literal. Addresses the CodeQL style alert
"Use a set literal in place of or" reported by the self-scan on this
PR. Pure refactor, no semantic change.
`com.ctc.wstx.stax.WstxInputFactory` overrides `createXMLStreamReader`,
`createXMLEventReader` and `setProperty` from `XMLInputFactory`, so the
existing `XmlInputFactory` model in `XmlParsers.qll` does not match calls
where the static receiver type is `WstxInputFactory` (or its supertype
`org.codehaus.stax2.XMLInputFactory2`). Woodstox is vulnerable to XXE in
its default configuration, so these missed sinks were false negatives in
`java/xxe`.
This adds a scoped framework model under
`semmle/code/java/frameworks/woodstox/WoodstoxXml.qll` (registered in the
`Frameworks` module of `XmlParsers.qll`) that recognises these calls as
XXE sinks and treats the factory as safe when both
`javax.xml.stream.supportDTD` and
`javax.xml.stream.isSupportingExternalEntities` are disabled — mirroring
the existing `XMLInputFactory` safe-configuration logic.
Add the 'Publish data extension files in a CodeQL model pack to share'
section, matching the structure used in C#, C++, Go, and Java docs.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
We won't be able to run these tests until Python 3.15 is actually out
(and our CI is using it), so it seemed easiest to just put them in their
own test directory.
Add barrierModel and barrierGuardModel sections to the Rust library
models documentation, following the pattern established in PR #21523
for other languages.
Includes:
- New extensible predicate descriptions in the overview
- Example: barrier for SQL injection using escape_sql
- Example: barrier guard for path injection using is_safe_path
- Reference material for both barrierModel and barrierGuardModel
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add documentation for customizing library models for Rust using data
extension files. This follows the pattern of existing documentation for
other languages (Java, Python, Ruby, Go, C#, C++, JavaScript).
The documentation covers:
- Rust-specific extensible predicates (sourceModel, sinkModel,
summaryModel, neutralModel) with their simplified schema
- Canonical path syntax for identifying Rust functions and methods
- Examples using real models from the codebase (sqlx, reqwest,
std::env, std::path, Iterator::map)
- Access path token reference (Argument, Parameter, ReturnValue,
Element, Field, Reference, Future)
- Source and sink kind reference
- Threat model integration
Also updates codeql-for-rust.rst to include the new page in the
toctree.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
First, we extend the various location overriding hacks to also accept
list and dict splats in various places. Having done this, we then have
to tackle how to actually desugar these new comprehension forms (as this
is what we currently do for the old forms).
As a reminder, a list comprehension like `[x for x in y]` currently gets
desugared into a small local function, something like
```python
def listcomp(a):
for x in a:
yield x
listcomp(y)
```
For `[*x for x in y]`, the behaviour we want is that we unpack `x`
before yielding its elements in turn. This is essentially what we would
get if we were to use `yield from x` instead of `yield x` in the above
desugaring, so that's what we do. This also works for set
comprehensions.
For dict comprehensions, it's slightly more complicated. Here, the
generator function instead yields a stream of `(key, value)` tuples.
(And apparently the old parser got this wrong and emitted `(value, key)`
pairs instead, which we faithfully recreated in the new parser as well.
We fix that bug in both parsers while we're at it). So, a bare `yield
from` is not enough, we also need a `.items()` call to get the
double-starred expression to emit its items as a stream of tuples (that
we then `yield from`.
To make this (hopefully) less verbose in the implementation, we defer
the decision of whether to use `yield` or `yield from` by introducing a
`yield_kind` scoped variable that determines the type of the actual AST
node. And of course for dict comprehensions with unpacking we need to
synthesise the extra machinery mentioned above.
On the plus side, this means we don't have to mess with control-flow, as
the existing machinery should be able to handle the desugared syntax
just fine.
Fixes US spelling (recognised -> recognized) across docs, QLDoc,
change note, and test fixture comments. Clarifies the handler QLDoc
to note sync/async support. Renames the supported-frameworks entry
from "vercel" to "Vercel (@vercel/node)" to avoid implying broader
platform coverage.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Extends the Vercel serverless handler detection to also match the
deprecated Zeit-era @now/node package with NowRequest/NowResponse
types. Per-review feedback from asgerf, these aliases still appear
in real-world code.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Before on Abseil Windows for `cpp/too-few-arguments:`:
```
Pipeline standard for TooFewArguments::isCompiledAsC/1#52fe29e8@994f9bgp was evaluated in 12 iterations totaling 2ms (delta sizes total: 50).
1198778 ~3% {1} r1 = JOIN `TooFewArguments::isCompiledAsC/1#52fe29e8#prev_delta` WITH `Element::Element.getFile/0#2b8c8740_10#join_rhs` ON FIRST 1 OUTPUT Rhs.1
83 ~26% {1} | JOIN WITH includes ON FIRST 1 OUTPUT Rhs.1
50 ~4% {1} | AND NOT `TooFewArguments::isCompiledAsC/1#52fe29e8#prev`(FIRST 1)
return r1
```
After:
```
Pipeline standard for #File::File.getAnIncludedFile/0#dispred#e8d44cd1Plus#bf@b8d290i6 was evaluated in 11 iterations totaling 0ms (delta sizes total: 43).
47 ~0% {2} r1 = SCAN `#File::File.getAnIncludedFile/0#dispred#e8d44cd1Plus#bf#prev_delta` OUTPUT In.1, In.0
78 ~28% {2} | JOIN WITH `File::File.getAnIncludedFile/0#dispred#e8d44cd1` ON FIRST 1 OUTPUT Lhs.1, Rhs.1
43 ~0% {2} | AND NOT `#File::File.getAnIncludedFile/0#dispred#e8d44cd1Plus#bf#prev`(FIRST 2)
return r1
[2026-04-13 11:05:25] Evaluated non-recursive predicate TooFewArguments::isCompiledAsC/1#52fe29e8@4a3eb9jk in 0ms (size: 49).
Evaluated relational algebra for predicate TooFewArguments::isCompiledAsC/1#52fe29e8@4a3eb9jk with tuple counts:
1 ~0% {3} r1 = CONSTANT(unique int, unique string, unique string)[1,"compiled as c","1"]
1 ~0% {1} | JOIN WITH #fileannotationsMerge_1230#join_rhs ON FIRST 3 OUTPUT Rhs.3
48 ~0% {1} r2 = JOIN r1 WITH `#File::File.getAnIncludedFile/0#dispred#e8d44cd1Plus#bf` ON FIRST 1 OUTPUT Rhs.1
49 ~0% {1} r3 = r1 UNION r2
return r3
```
This adds a framework model for Vercel serverless functions so that
CodeQL's existing JavaScript security queries can detect vulnerabilities
in handlers of the form
export default function handler(req: VercelRequest, res: VercelResponse) { ... }
Handlers are identified as the default export of a module whose first
two parameters are typed as `VercelRequest`/`VercelResponse` from
`@vercel/node`. The default-export constraint excludes private helpers
that share the same signature. Type-based detection follows the same
pattern already used by `NextReqResHandler` in `Next.qll`.
The framework model covers:
- Route handler recognition (default-exported typed handlers only)
- Request input sources: `query`, `body`, `cookies`, and `url`
(the last inherited from Node's `IncomingMessage`)
- Named header accesses like `req.headers.host` and `req.headers.referer`,
modelled as `Http::RequestHeaderAccess` so header-specific queries fire
- Response sinks: `res.send`, `res.status(...).send`, `res.redirect`
- Header definitions via `res.setHeader`
Includes a library test exercising each model predicate (including a
negative case for private helpers) and query consistency fixtures
demonstrating end-to-end detection for js/reflected-xss,
js/request-forgery, js/sql-injection, and js/command-line-injection.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds a new `isLazy` predicate to the relevant classes, and adds the
relevant dbscheme (and up/downgrade) changes. On upgrades we do nothing,
and on downgrades we remove the `is_lazy` bits.
As defined in PEP-810. We implement this in much the same way as how we
handle `async` annotations currently. The relevant nodes get an
`is_lazy` field that defaults to being false.
The additions are unintentional, but the fault lies with the shared
SignAnalysis code. The removals are due to compile-time constant
initializers no longer having CFG nodes.
This allows us to build and test the extractor (for actual QL extraction
-- not just the extractor unit tests) entirely from within the
`github/codeql` repo, just as we do with Ruby. All that's needed is a
`--search-path` argument that points to the repo root.
At least in the case of function declarations there can be multiple
identical ones within the same module, causing data set check errors
if not differentiated.
Adds a dedicated test verifying that fields annotated with
@javax.validation.constraints.Pattern are recognized as sanitized
by RegexpCheckBarrier, in addition to the existing String.matches()
guard test.
secretQuestion is ambiguous: it could be the question text (not
sensitive) or a security question answer. Worse, the regex
secrets?(question) also matches secretQuestionAnswer, which is
clearly sensitive. Drop it to avoid false negatives.
The trust-boundary-violation query only recognized OWASP ESAPI validators
as sanitizers. ESAPI is rarely used in modern Java projects, while regex
validation via String.matches() and @javax.validation.constraints.Pattern
is the standard approach in Spring/Jakarta applications.
RegexpCheckBarrier already exists in Sanitizers.qll and is used by other
queries (e.g., RequestForgery). This wires it into TrustBoundaryConfig,
so patterns like input.matches("[a-zA-Z0-9]+") and @Pattern annotations
are recognized as sanitizers, consistent with the existing ESAPI treatment.
The sensitive-log query (CWE-532) lacked sanitizers for hashed or
encrypted data, while the sibling cleartext-storage query (CWE-312)
already recognized methods with "encrypt", "hash", or "digest" in their
names as sanitizers (CleartextStorageQuery.qll:86).
This adds an EncryptionBarrier to SensitiveLoggingQuery that applies the
same name-based heuristic, making the two queries consistent. Calls like
DigestUtils.sha256Hex(password) or hashPassword(secret) are no longer
flagged when their results are logged.
PathNormalizeSanitizer recognized Path.normalize() and
File.getCanonicalPath()/getCanonicalFile(), but not Path.toRealPath().
toRealPath() is strictly stronger than normalize() (resolves symlinks
and verifies file existence in addition to normalizing ".." components),
and is functionally equivalent to File.getCanonicalPath() for the NIO.2
API. CERT FIO16-J and OWASP both recommend it for path traversal defense.
This adds toRealPath to PathNormalizeSanitizer alongside normalize,
reducing false positives for code using idiomatic NIO.2 path handling.
The upstream repo (`okdshin/PicoSHA2`) is a personal GitHub account,
at risk of suspension — the same scenario that hit `rules_antlr`.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Clarify that arithmeticUsedInBoundsCheck applies to if-condition
comparisons, not all comparisons
- Update expected test line numbers to reflect added test calls
The java/tainted-arithmetic query now recognizes when an arithmetic
expression appears directly as an operand of a comparison (e.g.,
`if (off + len > array.length)`). Such expressions are bounds checks,
not vulnerable computations, and are excluded via the existing
overflowIrrelevant predicate.
Add test cases for bounds-checking patterns that should not be flagged.
- Model Signature.getInstance() as CryptoAlgoSpec sink (previously only
Signature constructor was modeled)
- Add HMAC-based algorithms (HMACSHA1/256/384/512, HmacSHA1/256/384/512)
and PBKDF2 to the secure algorithm whitelist
- Fix XDH/X25519/X448 tests to use KeyAgreement.getInstance() instead of
KeyPairGenerator.getInstance() to match their key agreement semantics
- Add test cases for SHA384withECDSA, HMACSHA*, and PBKDF2WithHmacSHA1
from user-reported false positives
- Update change note to document all additions
Looking at the results of the the previous DCA run, there was a bunch of
false positives where `bind` was being used with a `AF_UNIX` socket (a
filesystem path encoded as a string), not a `(host, port)` tuple. These
results should be excluded from the query, as they are not vulnerable.
Ideally, we would just add `.TupleElement[0]` to the MaD sink, except we
don't actually support this in Python MaD...
So, instead I opted for a more low-tech solution: check that the
argument in question flows from a tuple in the local scope.
This eliminates a bunch of false positives on `python/cpython` leaving
behind four true positive results.
This takes care of most of the false negatives from the preceding
commit.
Additionally, we add models for some known wrappers of `socket.socket`
from the `gevent` and `eventlet` packages.
Adds test cases from github/codeql#21582 demonstrating false negatives:
- Address stored in class attribute (`self.bind_addr`)
- `os.environ.get` with insecure default value
- `gevent.socket` (alternative socket module)
The `context` input is passed as a single array element through
`docker/actions-toolkit` and `@actions/exec` all the way to
`child_process.spawn()`, which does not perform shell splitting.
No code injection is possible.
Fixes https://github.com/github/codeql/issues/21428
The `allowed-endpoints` input only flows to `execFileSync("echo", [content])`
(no shell) and `fs.writeFileSync` (JSON config), neither of which is a
command injection vector.
Fixes https://github.com/github/codeql/issues/21568
SnakeYAML 2.3 has [a bug](https://bitbucket.org/snakeyaml/snakeyaml/issues/1098) where it crashes with an `IndexOutOfBoundsException` when a Unicode surrogate pair (e.g. an emoji) straddles the 1024 character internal buffer boundary. This happens because the high surrogate can end up as the last character in the data window, and the reader tries to read the low surrogate past the end of the buffer.
This caused languages that extract YAML, most notably JavaScript and Actions, to fail when the codebase contained a YAML file with an emoji at an unlucky position in the file.
These need to be looked at, but because data flow through default field
initialization is currently not working, let's postpone this as part of that
work.
Add `DOTNET_CLI_TELEMETRY_OPTOUT=1` to the minimal environment used for
all `dotnet` invocations. The telemetry is unnecessary and may even be
causing segfaults in some cases.
By limiting the results to the class that actually defines the `__del__`
method, we eliminate a bunch of FPs where a _subclass_ of such a class
would also get flagged.
On `wireshark` this reduces the intermediate tuple count from roughly 88
million tuples to roughly 3000 (with the new helper predicate
materialising ~300 tuples).
For module-level metaclass declarations, we now also check that the
right hand side in a `__metaclass__ = type` assignment is in fact the
built-in `type`.
These could arguably be moved to `Class` itself, but for now I'm
choosing to limit the changes to the `DuckTyping` module (until we
decide on a proper API).
This module (which for convenience currently resides inside
`DataFlowDispatch`, but this may change later) contains convenience
predicates for bridging the gap between the data-flow layer and the old
points-to analysis.
This way the changes do not alter the meaning of `UninitializedNode`.
In the meantime, the code still provides a specialized `Node` type
`IndirectUninitializedNode` to access the nodes behind levels of
indirection.
The Go libraries follow their own naming convention for "query
libraries". These need to be exempted from automatic `overlay[local?]`
annotations since otherwise it appears that too many predicates are
evaluated, possibly because of inadequate use of sentinels.
- Fix misplaced semicolons in test files (was inside comment, moved before it)
- Update QLdoc comments to reference new browser source kind names
- Update docs to list browser source kinds and fix outdated 'only remote' note
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The fix may look a bit obscure, so here's what's going on.
When we see `from . import helper`, we create an `ImportExpr` with level
equal to 1 (corresponding to the number of dots). To resolve such
imports, we compute the name of the enclosing package, as part of
`ImportExpr.qualifiedTopName()`. For this form of import expression, it
is equivalent to `this.getEnclosingModule().getPackageName()`. But
`qualifiedTopName` requires that `valid_module_name` holds for its
result, and this was _not_ the case for namespace packages.
To fix this, we extend `valid_module_name` to include the module names
of _any_ folder, not just regular package (which are the ones where
there's a `__init__.py` in the folder). Note that this doesn't simply
include all folders -- only the ones that result in valid module names
in Python.
Add test files with #!/usr/bin/env bun, #!/usr/bin/env tsx, and
#!/usr/bin/env node shebangs. The query lists extracted .ts files,
verifying that all three shebangs are recognized and the files are
not skipped by the extractor.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Update the shebang regexp (renamed NODE_INVOCATION -> JS_INVOCATION) to
also match 'bun' and 'tsx' so that scripts using these runtimes are
correctly identified as JavaScript files.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This join had badness 1127 on the project FiacreT/M-moire, producing ~31
million tuples in order to end up with only ~27k tuples later in the
pipeline. With the fix, we reduce this by roughly the full 31 million
(the new materialised helper predicate accounting for roughly 130k
tuples on its own).
Co-authored-by: Mathias Vorreiter Pedersen <mathiasvp@github.com>
Adds `hasOverloadDecorator` as a predicate on functions. It looks for
decorators called `overload` or `something.overload` (usually
`typing.overload` or `t.overload`). These are then filtered out in the
predicates that (approximate) resolving methods according to the MRO.
As the test introduced in the previous commit shows, this removes the
spurious resolutions we had before.
Turns out in https://github.com/github/codeql/pull/21371 I was right
about `java_*` rules not relying on autoload anywhere, but it turns out
some `cc_*` rules still relied on autoload. This autoload is currently
configured in the internal repository, but we want to remove it
eventually. This patch:
* adds explicit loads to `rules_cc`
* removes an obsolete file (that depedency has its own bazel module
since some time, we just forgot to remove the old file)
https://github.com/github/codeql/pull/21276 worked together with the
internal changes but broke the standalone build of the Go extractor of
this repo in isolation.
The root cause was the lack of an auto-loaded `java_library` rule
definition. This fixes it.
I also checked this doesn't happen anywhere else.
The ones that no longer require points-to no longer import
`LegacyPointsTo`. The ones that do use the specific
`...MetricsWithPointsTo` classes that are applicable.
Moves the classes/predicates that _actually_ depend on points-to to the
`LegacyPointsTo` module, leaving behind a module that contains all of
the metrics-related stuff (line counts, nesting depth, etc.) that don't
need points-to to be evaluated.
Consequently, `Metrics` is now no longer a private import in
`python.qll`.
Update the Rust nightly toolchain from nightly/2025-08-01 to nightly/2026-01-22
(rustc 1.95.0-nightly), and rules_rust from 0.66.0 to 0.68.1.codeql.1.
The new nightly changed how stdlib metadata is distributed: .rlib files now
contain only a metadata stub, with full metadata in separate .rmeta files.
rules_rust's stdlib glob doesn't include *.rmeta, causing 'only metadata stub
found' errors. This is patched via a custom registry entry (0.68.1.codeql.1).
Upstream bug: https://github.com/bazelbuild/rules_rust/issues/3859
On `keras-team/keras`, this was producing ~200 million intermediate
tuples in order to produce a total of ... 2 tuples.
After the refactor, max intermediate tuple count is ~80k for the
charpred (and 4 for the new helper predicate).
These were causing the repo `gufolabs/noc` to spend ~30 seconds
evaluating `ControlFlowNode.strictlyDominates`. Just in case, I added
`overlay[caller] to the other instances of `pragma[inline]` as well.
Upgrade from 2.2.0-codeql.1 to 2.2.2-codeql.1 which includes:
- Fix Windows bzlmod builder classpath issue
- Move to official bazel worker api
This eliminates the need for --legacy_external_runfiles on Windows.
Also fix codegen templates to be included in runfiles.
rules_android has repository visibility issues with Bazel 9 when the
Android SDK is present. Since we don't use Android, disable detection
by setting ANDROID_HOME to empty.
Bazel 9 moves the C++ runfiles library from @bazel_tools to @rules_cc.
Update zipmerge_test.cpp:
- Change include from tools/cpp/runfiles to rules_cc/cc/runfiles
- Update namespace from bazel::tools::cpp::runfiles to rules_cc::cc::runfiles
Note: The BUILD.bazel dependency change is in a separate commit.
Add explicit load statements for java_library and java_test from
@rules_java//java:defs.bzl in:
- javascript/extractor/BUILD.bazel
- javascript/extractor/test/com/semmle/js/extractor/test/BUILD.bazel
Add +@rules_cc to --incompatible_autoload_externally to enable
graceful migration path for cc_* rule usages before all files
are updated with explicit imports.
rules_python 1.x requires explicit toolchain setup and no longer
auto-registers toolchains. Register Python 3.12 toolchain to ensure
Python tools work correctly with Bazel 9.
* Removed false positive injection sink models for the `context` input of `docker/build-push-action` and the `allowed-endpoints` input of `step-security/harden-runner`.
* Altered 2 patterns in the `poisonable_steps` modelling. Extra sinks are detected in the following cases: scripts executed via python modules and `go run` in directories are detected as potential mechanisms of injection. For the go execution pattern, the pattern is updated to now ignore flags that occur between go and the specific command. This change may lead to more results being detected by the following queries: `actions/untrusted-checkout/high`, `actions/untrusted-checkout/critical`, `actions/untrusted-checkout-toctou/high`, `actions/untrusted-checkout-toctou/critical`, `actions/cache-poisoning/poisonable-step`, `actions/cache-poisoning/direct-cache` and `actions/artifact-poisoning/path-traversal`.
* Removed false positive injection sink models for the `context` input of `docker/build-push-action` and the `allowed-endpoints` input of `step-security/harden-runner`.
* Fixed alert messages in `actions/artifact-poisoning/critical` and `actions/artifact-poisoning/medium` as they previously included a redundant placeholder in the alert message that would on occasion contain a long block of yml that makes the alert difficult to understand. Also improved the wording to make it clearer that it is not the artifact that is being poisoned, but instead a potentially untrusted artifact that is consumed. Finally, changed the alert location to be the source, to align more with other queries reporting an artifact (e.g. zipslip) which is more useful.
### Minor Analysis Improvements
* The query `actions/missing-workflow-permissions` no longer produces false positive results on reusable workflows where all callers set permissions.
GitHub workflows can be triggered through various repository events, including incoming pull requests (PRs) or comments on Issues/PRs. A potentially dangerous misuse of the triggers such as `pull_request_target` or `issue_comment` followed by an explicit checkout of untrusted code (Pull Request HEAD) may lead to repository compromise if untrusted code gets executed (e.g., due to a modified build script) in a privileged job.
GitHub workflows can be triggered through various repository events, including incoming pull requests (PRs) or comments on Issues/PRs. Under certain conditions described below, attackers can take over a repository by opening malicious PRs from forks. The attacks can result in malicious code execution causing unauthorized changes to the repository or exfiltration of repository secrets and a compromise of connected systems.
## Workflow Security Model
In GitHub Actions, there is a distinction between unprivileged and privileged workflows. For example, a workflow with a `pull_request` trigger is unprivileged while a workflow with `pull_request_target` is privileged.
This is relevant especially for PRs from forks. Normal PRs can only be submitted by people who have write access to a repository, while PRs from forks can be submitted by anyone.
On a PR from a fork, an unprivileged `pull_request` workflow has only limited capabilities but a privileged `pull_request_target` workflow is much more dangerous. A privileged workflow:
* Runs in the context of the base repository
* Has access to organization and repository secrets (e.g., API keys, deployment tokens)
* Has a read/write `GITHUB_TOKEN` by default
* Can access private resources
Certain triggers automatically grant a workflow elevated privileges:
*`pull_request_target` as described above
*`workflow_run`: Triggered when another workflow completes.
*`issue_comment`: Triggered when a comment is made on an issue or PR.
## Attack Details
* A repository has a privileged workflow
* An attacker forks the repository and adds malicious code (e.g., in the build script)
* The attacker opens a PR from the fork, and, if needed, comments on the PR
* The workflow in the base repository checks out the forked code
* The workflow runs, (e.g. the build script etc.), which contains the malicious code
Please note that not only build scripts can be malicious code vectors. There is a large number of other possibilities. Some of them are listed in the [LOTP](https://boostsecurityio.github.io/lotp/) catalog.
## Recommendation
@@ -133,3 +162,5 @@ jobs:
## References
- GitHub Security Lab Research: [Keeping your GitHub Actions and workflows secure Part 1: Preventing pwn requests](https://securitylab.github.com/research/github-actions-preventing-pwn-requests/).
- Mitigating risks of untrusted checkout: [GitHub Docs](https://docs.github.com/en/enterprise-cloud@latest/actions/reference/security/secure-use#mitigating-the-risks-of-untrusted-code-checkout).
- Living Off the Pipeline: [LOTP](https://boostsecurityio.github.io/lotp/).
GitHub workflows can be triggered through various repository events, including incoming pull requests (PRs) or comments on Issues/PRs. A potentially dangerous misuse of the triggers such as `pull_request_target` or `issue_comment` followed by an explicit checkout of untrusted code (Pull Request HEAD) may lead to repository compromise if untrusted code gets executed (e.g., due to a modified build script) in a privileged job.
GitHub workflows can be triggered through various repository events, including incoming pull requests (PRs) or comments on Issues/PRs. Under certain conditions described below, attackers can take over a repository by opening malicious PRs from forks. The attacks can result in malicious code execution causing unauthorized changes to the repository or exfiltration of repository secrets and a compromise of connected systems.
## Workflow Security Model
In GitHub Actions, there is a distinction between unprivileged and privileged workflows. For example, a workflow with a `pull_request` trigger is unprivileged while a workflow with `pull_request_target` is privileged.
This is relevant especially for PRs from forks. Normal PRs can only be submitted by people who have write access to a repository, while PRs from forks can be submitted by anyone.
On a PR from a fork, an unprivileged `pull_request` workflow has only limited capabilities but a privileged `pull_request_target` workflow is much more dangerous. A privileged workflow:
* Runs in the context of the base repository
* Has access to organization and repository secrets (e.g., API keys, deployment tokens)
* Has a read/write `GITHUB_TOKEN` by default
* Can access private resources
Certain triggers automatically grant a workflow elevated privileges:
*`pull_request_target` as described above
*`workflow_run`: Triggered when another workflow completes.
*`issue_comment`: Triggered when a comment is made on an issue or PR.
## Attack Details
* A repository has a privileged workflow
* An attacker forks the repository and adds malicious code (e.g., in the build script)
* The attacker opens a PR from the fork, and, if needed, comments on the PR
* The workflow in the base repository checks out the forked code
* The workflow runs, (e.g. the build script etc.), which contains the malicious code
Please note that not only build scripts can be malicious code vectors. There is a large number of other possibilities. Some of them are listed in the [LOTP](https://boostsecurityio.github.io/lotp/) catalog.
## Recommendation
@@ -133,3 +162,5 @@ jobs:
## References
- GitHub Security Lab Research: [Keeping your GitHub Actions and workflows secure Part 1: Preventing pwn requests](https://securitylab.github.com/research/github-actions-preventing-pwn-requests/).
- Mitigating risks of untrusted checkout: [GitHub Docs](https://docs.github.com/en/enterprise-cloud@latest/actions/reference/security/secure-use#mitigating-the-risks-of-untrusted-code-checkout).
- Living Off the Pipeline: [LOTP](https://boostsecurityio.github.io/lotp/).
GitHub workflows can be triggered through various repository events, including incoming pull requests (PRs) or comments on Issues/PRs. A potentially dangerous misuse of the triggers such as `pull_request_target` or `issue_comment` followed by an explicit checkout of untrusted code (Pull Request HEAD) may lead to repository compromise if untrusted code gets executed (e.g., due to a modified build script) in a privileged job.
GitHub workflows can be triggered through various repository events, including incoming pull requests (PRs) or comments on Issues/PRs. Under certain conditions described below, attackers can take over a repository by opening malicious PRs from forks. The attacks can result in malicious code execution causing unauthorized changes to the repository or exfiltration of repository secrets and a compromise of connected systems.
## Workflow Security Model
In GitHub Actions, there is a distinction between unprivileged and privileged workflows. For example, a workflow with a `pull_request` trigger is unprivileged while a workflow with `pull_request_target` is privileged.
This is relevant especially for PRs from forks. Normal PRs can only be submitted by people who have write access to a repository, while PRs from forks can be submitted by anyone.
On a PR from a fork, an unprivileged `pull_request` workflow has only limited capabilities but a privileged `pull_request_target` workflow is much more dangerous. A privileged workflow:
* Runs in the context of the base repository
* Has access to organization and repository secrets (e.g., API keys, deployment tokens)
* Has a read/write `GITHUB_TOKEN` by default
* Can access private resources
Certain triggers automatically grant a workflow elevated privileges:
*`pull_request_target` as described above
*`workflow_run`: Triggered when another workflow completes.
*`issue_comment`: Triggered when a comment is made on an issue or PR.
## Attack Details
* A repository has a privileged workflow
* An attacker forks the repository and adds malicious code (e.g., in the build script)
* The attacker opens a PR from the fork, and, if needed, comments on the PR
* The workflow in the base repository checks out the forked code
* The workflow runs, (e.g. the build script etc.), which contains the malicious code
Please note that not only build scripts can be malicious code vectors. There is a large number of other possibilities. Some of them are listed in the [LOTP](https://boostsecurityio.github.io/lotp/) catalog.
## Recommendation
@@ -133,3 +162,5 @@ jobs:
## References
- GitHub Security Lab Research: [Keeping your GitHub Actions and workflows secure Part 1: Preventing pwn requests](https://securitylab.github.com/research/github-actions-preventing-pwn-requests/).
- Mitigating risks of untrusted checkout: [GitHub Docs](https://docs.github.com/en/enterprise-cloud@latest/actions/reference/security/secure-use#mitigating-the-risks-of-untrusted-code-checkout).
- Living Off the Pipeline: [LOTP](https://boostsecurityio.github.io/lotp/).
* Fixed help file descriptions for queries: `actions/untrusted-checkout/critical`, `actions/untrusted-checkout/high`, `actions/untrusted-checkout/medium`. Previously the messages were unclear as to why and how the vulnerabilities could occur.
* Fixed alert messages in `actions/artifact-poisoning/critical` and `actions/artifact-poisoning/medium` as they previously included a redundant placeholder in the alert message that would on occasion contain a long block of yml that makes the alert difficult to understand. Also improved the wording to make it clearer that it is not the artifact that is being poisoned, but instead a potentially untrusted artifact that is consumed. Finally, changed the alert location to be the source, to align more with other queries reporting an artifact (e.g. zipslip) which is more useful.
### Minor Analysis Improvements
* The query `actions/missing-workflow-permissions` no longer produces false positive results on reusable workflows where all callers set permissions.
| .github/workflows/artifactpoisoning11.yml:38:11:38:77 | ./sonarcloud-data/x.py build -j$(nproc) --compiler gcc --skip-build | .github/workflows/artifactpoisoning11.yml:13:9:32:6 | Uses Step | .github/workflows/artifactpoisoning11.yml:38:11:38:77 | ./sonarcloud-data/x.py build -j$(nproc) --compiler gcc --skip-build | Potential artifact poisoning in $@, which may be controlled by an external user ($@). | .github/workflows/artifactpoisoning11.yml:38:11:38:77 | ./sonarcloud-data/x.py build -j$(nproc) --compiler gcc --skip-build | ./sonarcloud-data/x.py build -j$(nproc) --compiler gcc --skip-build | .github/workflows/artifactpoisoning11.yml:4:3:4:14 | workflow_run | workflow_run |
| .github/workflows/artifactpoisoning12.yml:38:11:38:25 | python foo/x.py | .github/workflows/artifactpoisoning12.yml:13:9:32:6 | Uses Step | .github/workflows/artifactpoisoning12.yml:38:11:38:25 | python foo/x.py | Potential artifact poisoning in $@, which may be controlled by an external user ($@). | .github/workflows/artifactpoisoning12.yml:38:11:38:25 | python foo/x.py | python foo/x.py | .github/workflows/artifactpoisoning12.yml:4:3:4:14 | workflow_run | workflow_run |
| .github/workflows/artifactpoisoning21.yml:19:14:20:21 | sh foo/cmd\n | .github/workflows/artifactpoisoning21.yml:13:9:18:6 | Uses Step | .github/workflows/artifactpoisoning21.yml:19:14:20:21 | sh foo/cmd\n | Potential artifact poisoning in $@, which may be controlled by an external user ($@). | .github/workflows/artifactpoisoning21.yml:19:14:20:21 | sh foo/cmd\n | sh foo/cmd\n | .github/workflows/artifactpoisoning21.yml:4:3:4:14 | workflow_run | workflow_run |
| .github/workflows/artifactpoisoning22.yml:18:14:18:19 | sh cmd | .github/workflows/artifactpoisoning22.yml:13:9:17:6 | Uses Step | .github/workflows/artifactpoisoning22.yml:18:14:18:19 | sh cmd | Potential artifact poisoning in $@, which may be controlled by an external user ($@). | .github/workflows/artifactpoisoning22.yml:18:14:18:19 | sh cmd | sh cmd | .github/workflows/artifactpoisoning22.yml:4:3:4:14 | workflow_run | workflow_run |
| .github/workflows/artifactpoisoning31.yml:19:14:19:22 | ./foo/cmd | .github/workflows/artifactpoisoning31.yml:13:9:15:6 | Run Step | .github/workflows/artifactpoisoning31.yml:19:14:19:22 | ./foo/cmd | Potential artifact poisoning in $@, which may be controlled by an external user ($@). | .github/workflows/artifactpoisoning31.yml:19:14:19:22 | ./foo/cmd | ./foo/cmd | .github/workflows/artifactpoisoning31.yml:4:3:4:14 | workflow_run | workflow_run |
| .github/workflows/artifactpoisoning32.yml:17:14:18:20 | ./bar/cmd\n | .github/workflows/artifactpoisoning32.yml:13:9:16:6 | Run Step | .github/workflows/artifactpoisoning32.yml:17:14:18:20 | ./bar/cmd\n | Potential artifact poisoning in $@, which may be controlled by an external user ($@). | .github/workflows/artifactpoisoning32.yml:17:14:18:20 | ./bar/cmd\n | ./bar/cmd\n | .github/workflows/artifactpoisoning32.yml:4:3:4:14 | workflow_run | workflow_run |
| .github/workflows/artifactpoisoning33.yml:17:14:18:20 | ./bar/cmd\n | .github/workflows/artifactpoisoning33.yml:13:9:16:6 | Run Step | .github/workflows/artifactpoisoning33.yml:17:14:18:20 | ./bar/cmd\n | Potential artifact poisoning in $@, which may be controlled by an external user ($@). | .github/workflows/artifactpoisoning33.yml:17:14:18:20 | ./bar/cmd\n | ./bar/cmd\n | .github/workflows/artifactpoisoning33.yml:4:3:4:14 | workflow_run | workflow_run |
| .github/workflows/artifactpoisoning34.yml:20:14:22:23 | npm install\nnpm run lint\n | .github/workflows/artifactpoisoning34.yml:13:9:16:6 | Run Step | .github/workflows/artifactpoisoning34.yml:20:14:22:23 | npm install\nnpm run lint\n | Potential artifact poisoning in $@, which may be controlled by an external user ($@). | .github/workflows/artifactpoisoning34.yml:20:14:22:23 | npm install\nnpm run lint\n | npm install\nnpm run lint\n | .github/workflows/artifactpoisoning34.yml:4:3:4:14 | workflow_run | workflow_run |
| .github/workflows/artifactpoisoning41.yml:22:14:22:22 | ./foo/cmd | .github/workflows/artifactpoisoning41.yml:13:9:21:6 | Run Step | .github/workflows/artifactpoisoning41.yml:22:14:22:22 | ./foo/cmd | Potential artifact poisoning in $@, which may be controlled by an external user ($@). | .github/workflows/artifactpoisoning41.yml:22:14:22:22 | ./foo/cmd | ./foo/cmd | .github/workflows/artifactpoisoning41.yml:4:3:4:14 | workflow_run | workflow_run |
| .github/workflows/artifactpoisoning42.yml:22:14:22:18 | ./cmd | .github/workflows/artifactpoisoning42.yml:13:9:21:6 | Run Step | .github/workflows/artifactpoisoning42.yml:22:14:22:18 | ./cmd | Potential artifact poisoning in $@, which may be controlled by an external user ($@). | .github/workflows/artifactpoisoning42.yml:22:14:22:18 | ./cmd | ./cmd | .github/workflows/artifactpoisoning42.yml:4:3:4:14 | workflow_run | workflow_run |
| .github/workflows/artifactpoisoning71.yml:17:14:18:40 | sed -f config foo.md > bar.md\n | .github/workflows/artifactpoisoning71.yml:9:9:16:6 | Uses Step | .github/workflows/artifactpoisoning71.yml:17:14:18:40 | sed -f config foo.md > bar.md\n | Potential artifact poisoning in $@, which may be controlled by an external user ($@). | .github/workflows/artifactpoisoning71.yml:17:14:18:40 | sed -f config foo.md > bar.md\n | sed -f config foo.md > bar.md\n | .github/workflows/artifactpoisoning71.yml:4:5:4:16 | workflow_run | workflow_run |
| .github/workflows/artifactpoisoning81.yml:31:14:31:27 | python test.py | .github/workflows/artifactpoisoning81.yml:28:9:31:6 | Uses Step | .github/workflows/artifactpoisoning81.yml:31:14:31:27 | python test.py | Potential artifact poisoning in $@, which may be controlled by an external user ($@). | .github/workflows/artifactpoisoning81.yml:31:14:31:27 | python test.py | python test.py | .github/workflows/artifactpoisoning81.yml:3:5:3:23 | pull_request_target | pull_request_target |
| .github/workflows/artifactpoisoning92.yml:28:9:29:6 | Uses Step | .github/actions/download-artifact-2/action.yaml:6:7:25:4 | Uses Step | .github/workflows/artifactpoisoning92.yml:28:9:29:6 | Uses Step | Potential artifact poisoning in $@, which may be controlled by an external user ($@). | .github/workflows/artifactpoisoning92.yml:28:9:29:6 | Uses Step | Uses Step | .github/workflows/artifactpoisoning92.yml:3:3:3:14 | workflow_run | workflow_run |
| .github/workflows/artifactpoisoning92.yml:29:14:29:26 | make snapshot | .github/actions/download-artifact-2/action.yaml:6:7:25:4 | Uses Step | .github/workflows/artifactpoisoning92.yml:29:14:29:26 | make snapshot | Potential artifact poisoning in $@, which may be controlled by an external user ($@). | .github/workflows/artifactpoisoning92.yml:29:14:29:26 | make snapshot | make snapshot | .github/workflows/artifactpoisoning92.yml:3:3:3:14 | workflow_run | workflow_run |
| .github/workflows/artifactpoisoning96.yml:18:14:18:24 | npm install | .github/workflows/artifactpoisoning96.yml:13:9:18:6 | Uses Step | .github/workflows/artifactpoisoning96.yml:18:14:18:24 | npm install | Potential artifact poisoning in $@, which may be controlled by an external user ($@). | .github/workflows/artifactpoisoning96.yml:18:14:18:24 | npm install | npm install | .github/workflows/artifactpoisoning96.yml:2:3:2:14 | workflow_run | workflow_run |
| .github/workflows/artifactpoisoning101.yml:17:14:19:59 | PR_NUMBER=$(./get_pull_request_number.sh pr_number.txt)\necho "PR_NUMBER=$PR_NUMBER" >> $GITHUB_OUTPUT \n | .github/workflows/artifactpoisoning101.yml:10:9:16:6 | Uses Step | .github/workflows/artifactpoisoning101.yml:17:14:19:59 | PR_NUMBER=$(./get_pull_request_number.sh pr_number.txt)\necho "PR_NUMBER=$PR_NUMBER" >> $GITHUB_OUTPUT \n | Potential artifact poisoning in $@, which may be controlled by an external user ($@). | .github/workflows/artifactpoisoning101.yml:17:14:19:59 | PR_NUMBER=$(./get_pull_request_number.sh pr_number.txt)\necho "PR_NUMBER=$PR_NUMBER" >> $GITHUB_OUTPUT \n | PR_NUMBER=$(./get_pull_request_number.sh pr_number.txt)\necho "PR_NUMBER=$PR_NUMBER" >> $GITHUB_OUTPUT \n | .github/workflows/artifactpoisoning101.yml:4:3:4:21 | pull_request_target | pull_request_target |
| .github/workflows/test18.yml:36:15:40:58 | Uses Step | .github/workflows/test18.yml:12:15:33:12 | Uses Step | .github/workflows/test18.yml:36:15:40:58 | Uses Step | Potential artifact poisoning in $@, which may be controlled by an external user ($@). | .github/workflows/test18.yml:36:15:40:58 | Uses Step | Uses Step | .github/workflows/test18.yml:3:5:3:16 | workflow_run | workflow_run |
| .github/workflows/test25.yml:39:14:40:45 | ./gradlew buildScanPublishPrevious\n | .github/workflows/test25.yml:22:9:32:6 | Uses Step: downloadBuildScan | .github/workflows/test25.yml:39:14:40:45 | ./gradlew buildScanPublishPrevious\n | Potential artifact poisoning in $@, which may be controlled by an external user ($@). | .github/workflows/test25.yml:39:14:40:45 | ./gradlew buildScanPublishPrevious\n | ./gradlew buildScanPublishPrevious\n | .github/workflows/test25.yml:2:3:2:14 | workflow_run | workflow_run |
| .github/actions/download-artifact-2/action.yaml:6:7:25:4 | Uses Step | .github/actions/download-artifact-2/action.yaml:6:7:25:4 | Uses Step | .github/workflows/artifactpoisoning92.yml:28:9:29:6 | Uses Step | Potential artifact poisoning; the artifact being consumed has contents that may be controlled by an external user ($@). | .github/workflows/artifactpoisoning92.yml:3:3:3:14 | workflow_run | workflow_run |
| .github/actions/download-artifact-2/action.yaml:6:7:25:4 | Uses Step | .github/actions/download-artifact-2/action.yaml:6:7:25:4 | Uses Step | .github/workflows/artifactpoisoning92.yml:29:14:29:26 | make snapshot | Potential artifact poisoning; the artifact being consumed has contents that may be controlled by an external user ($@). | .github/workflows/artifactpoisoning92.yml:3:3:3:14 | workflow_run | workflow_run |
| .github/workflows/artifactpoisoning11.yml:13:9:32:6 | Uses Step | .github/workflows/artifactpoisoning11.yml:13:9:32:6 | Uses Step | .github/workflows/artifactpoisoning11.yml:38:11:38:77 | ./sonarcloud-data/x.py build -j$(nproc) --compiler gcc --skip-build | Potential artifact poisoning; the artifact being consumed has contents that may be controlled by an external user ($@). | .github/workflows/artifactpoisoning11.yml:4:3:4:14 | workflow_run | workflow_run |
| .github/workflows/artifactpoisoning12.yml:13:9:32:6 | Uses Step | .github/workflows/artifactpoisoning12.yml:13:9:32:6 | Uses Step | .github/workflows/artifactpoisoning12.yml:38:11:38:25 | python foo/x.py | Potential artifact poisoning; the artifact being consumed has contents that may be controlled by an external user ($@). | .github/workflows/artifactpoisoning12.yml:4:3:4:14 | workflow_run | workflow_run |
| .github/workflows/artifactpoisoning21.yml:13:9:18:6 | Uses Step | .github/workflows/artifactpoisoning21.yml:13:9:18:6 | Uses Step | .github/workflows/artifactpoisoning21.yml:19:14:20:21 | sh foo/cmd\n | Potential artifact poisoning; the artifact being consumed has contents that may be controlled by an external user ($@). | .github/workflows/artifactpoisoning21.yml:4:3:4:14 | workflow_run | workflow_run |
| .github/workflows/artifactpoisoning22.yml:13:9:17:6 | Uses Step | .github/workflows/artifactpoisoning22.yml:13:9:17:6 | Uses Step | .github/workflows/artifactpoisoning22.yml:18:14:18:19 | sh cmd | Potential artifact poisoning; the artifact being consumed has contents that may be controlled by an external user ($@). | .github/workflows/artifactpoisoning22.yml:4:3:4:14 | workflow_run | workflow_run |
| .github/workflows/artifactpoisoning31.yml:13:9:15:6 | Run Step | .github/workflows/artifactpoisoning31.yml:13:9:15:6 | Run Step | .github/workflows/artifactpoisoning31.yml:19:14:19:22 | ./foo/cmd | Potential artifact poisoning; the artifact being consumed has contents that may be controlled by an external user ($@). | .github/workflows/artifactpoisoning31.yml:4:3:4:14 | workflow_run | workflow_run |
| .github/workflows/artifactpoisoning32.yml:13:9:16:6 | Run Step | .github/workflows/artifactpoisoning32.yml:13:9:16:6 | Run Step | .github/workflows/artifactpoisoning32.yml:17:14:18:20 | ./bar/cmd\n | Potential artifact poisoning; the artifact being consumed has contents that may be controlled by an external user ($@). | .github/workflows/artifactpoisoning32.yml:4:3:4:14 | workflow_run | workflow_run |
| .github/workflows/artifactpoisoning33.yml:13:9:16:6 | Run Step | .github/workflows/artifactpoisoning33.yml:13:9:16:6 | Run Step | .github/workflows/artifactpoisoning33.yml:17:14:18:20 | ./bar/cmd\n | Potential artifact poisoning; the artifact being consumed has contents that may be controlled by an external user ($@). | .github/workflows/artifactpoisoning33.yml:4:3:4:14 | workflow_run | workflow_run |
| .github/workflows/artifactpoisoning34.yml:13:9:16:6 | Run Step | .github/workflows/artifactpoisoning34.yml:13:9:16:6 | Run Step | .github/workflows/artifactpoisoning34.yml:20:14:22:23 | npm install\nnpm run lint\n | Potential artifact poisoning; the artifact being consumed has contents that may be controlled by an external user ($@). | .github/workflows/artifactpoisoning34.yml:4:3:4:14 | workflow_run | workflow_run |
| .github/workflows/artifactpoisoning41.yml:13:9:21:6 | Run Step | .github/workflows/artifactpoisoning41.yml:13:9:21:6 | Run Step | .github/workflows/artifactpoisoning41.yml:22:14:22:22 | ./foo/cmd | Potential artifact poisoning; the artifact being consumed has contents that may be controlled by an external user ($@). | .github/workflows/artifactpoisoning41.yml:4:3:4:14 | workflow_run | workflow_run |
| .github/workflows/artifactpoisoning42.yml:13:9:21:6 | Run Step | .github/workflows/artifactpoisoning42.yml:13:9:21:6 | Run Step | .github/workflows/artifactpoisoning42.yml:22:14:22:18 | ./cmd | Potential artifact poisoning; the artifact being consumed has contents that may be controlled by an external user ($@). | .github/workflows/artifactpoisoning42.yml:4:3:4:14 | workflow_run | workflow_run |
| .github/workflows/artifactpoisoning71.yml:9:9:16:6 | Uses Step | .github/workflows/artifactpoisoning71.yml:9:9:16:6 | Uses Step | .github/workflows/artifactpoisoning71.yml:17:14:18:40 | sed -f config foo.md > bar.md\n | Potential artifact poisoning; the artifact being consumed has contents that may be controlled by an external user ($@). | .github/workflows/artifactpoisoning71.yml:4:5:4:16 | workflow_run | workflow_run |
| .github/workflows/artifactpoisoning81.yml:28:9:31:6 | Uses Step | .github/workflows/artifactpoisoning81.yml:28:9:31:6 | Uses Step | .github/workflows/artifactpoisoning81.yml:31:14:31:27 | python test.py | Potential artifact poisoning; the artifact being consumed has contents that may be controlled by an external user ($@). | .github/workflows/artifactpoisoning81.yml:3:5:3:23 | pull_request_target | pull_request_target |
| .github/workflows/artifactpoisoning96.yml:13:9:18:6 | Uses Step | .github/workflows/artifactpoisoning96.yml:13:9:18:6 | Uses Step | .github/workflows/artifactpoisoning96.yml:18:14:18:24 | npm install | Potential artifact poisoning; the artifact being consumed has contents that may be controlled by an external user ($@). | .github/workflows/artifactpoisoning96.yml:2:3:2:14 | workflow_run | workflow_run |
| .github/workflows/artifactpoisoning101.yml:10:9:16:6 | Uses Step | .github/workflows/artifactpoisoning101.yml:10:9:16:6 | Uses Step | .github/workflows/artifactpoisoning101.yml:17:14:19:59 | PR_NUMBER=$(./get_pull_request_number.sh pr_number.txt)\necho "PR_NUMBER=$PR_NUMBER" >> $GITHUB_OUTPUT \n | Potential artifact poisoning; the artifact being consumed has contents that may be controlled by an external user ($@). | .github/workflows/artifactpoisoning101.yml:4:3:4:21 | pull_request_target | pull_request_target |
| .github/workflows/test18.yml:12:15:33:12 | Uses Step | .github/workflows/test18.yml:12:15:33:12 | Uses Step | .github/workflows/test18.yml:36:15:40:58 | Uses Step | Potential artifact poisoning; the artifact being consumed has contents that may be controlled by an external user ($@). | .github/workflows/test18.yml:3:5:3:16 | workflow_run | workflow_run |
| .github/workflows/test25.yml:22:9:32:6 | Uses Step: downloadBuildScan | .github/workflows/test25.yml:22:9:32:6 | Uses Step: downloadBuildScan | .github/workflows/test25.yml:39:14:40:45 | ./gradlew buildScanPublishPrevious\n | Potential artifact poisoning; the artifact being consumed has contents that may be controlled by an external user ($@). | .github/workflows/test25.yml:2:3:2:14 | workflow_run | workflow_run |
* A new predicate `getSwitchCase` was added to the `SwitchStmt` class, which yields the `n`th `case` statement from a `switch` statement.
* Data flow barriers and barrier guards can now be added using data extensions. For more information see [Customizing library models for C and C++](https://codeql.github.com/docs/codeql-language-guides/customizing-library-models-for-cpp/).
### Minor Analysis Improvements
* Added taint flow models for the `Strsafe.h` header from the Windows SDK.
## 10.0.0
### Breaking Changes
* The deprecated `NonThrowingFunction` class has been removed, use `NonCppThrowingFunction` instead.
* The deprecated `ThrowingFunction` class has been removed, use `AlwaysSehThrowingFunction` instead.
### New Features
* Added a subclass `AutoconfConfigureTestFile` of `ConfigurationTestFile` that represents files created by GNU autoconf configure scripts to test the build configuration.
## 9.0.0
### Breaking Changes
* The `SourceModelCsv`, `SinkModelCsv`, and `SummaryModelCsv` classes and the associated CSV parsing infrastructure have been removed from `ExternalFlow.qll`. New models should be added as `.model.yml` files in the `ext/` directory.
### New Features
* Added a subclass `MesonPrivateTestFile` of `ConfigurationTestFile` that represents files created by Meson to test the build configuration.
* Added a class `ConstructorDirectFieldInit` to represent field initializations that occur in member initializer lists.
* Added a class `ConstructorDefaultFieldInit` to represent default field initializations.
* Added a class `DataFlow::IndirectParameterNode` to represent the indirection of a parameter as a dataflow node.
* Added a predicate `Node::asIndirectInstruction` which returns the `Instruction` that defines the indirect dataflow node, if any.
* Added a class `IndirectUninitializedNode` to represent the indirection of an uninitialized local variable as a dataflow node.
### Minor Analysis Improvements
* Added `HttpReceiveHttpRequest`, `HttpReceiveRequestEntityBody`, and `HttpReceiveClientCertificate` from Win32's `http.h` as remote flow sources.
* Added dataflow through members initialized via non-static data member initialization (NSDMI).
## 8.0.3
No user-facing changes.
## 8.0.2
No user-facing changes.
## 8.0.1
### Minor Analysis Improvements
* Inline expectations test comments, which are of the form `// $ tag` or `// $ tag=value`, are now parsed more strictly and will not be recognized if there isn't a space after the `$` symbol.
## 8.0.0
### Breaking Changes
* CodeQL version 2.24.2 accidentally introduced a syntactical breaking change to `BarrierGuard<...>::getAnIndirectBarrierNode` and `InstructionBarrierGuard<...>::getAnIndirectBarrierNode`. These breaking changes have now been reverted so that the original code compiles again.
*`MustFlow`, the inter-procedural must-flow data flow analysis library, has been re-worked to use parameterized modules. Like in the case of data flow and taint tracking, instead of extending the `MustFlowConfiguration` class, the user should now implement a module with the `MustFlow::ConfigSig` signature, and instantiate the `MustFlow::Global` parameterized module with the implemented module.
### Minor Analysis Improvements
* Refactored the "Year field changed using an arithmetic operation without checking for leap year" query (`cpp/leap-year/unchecked-after-arithmetic-year-modification`) to address large numbers of false positive results.
### Bug Fixes
* The `allowInterproceduralFlow` predicate of must-flow data flow configurations now correctly handles direct recursion.
* The deprecated `NonThrowingFunction` class has been removed, use `NonCppThrowingFunction` instead.
* The deprecated `ThrowingFunction` class has been removed, use `AlwaysSehThrowingFunction` instead.
### New Features
* Added a subclass `AutoconfConfigureTestFile` of `ConfigurationTestFile` that represents files created by GNU autoconf configure scripts to test the build configuration.
* A new predicate `getSwitchCase` was added to the `SwitchStmt` class, which yields the `n`th `case` statement from a `switch` statement.
* Data flow barriers and barrier guards can now be added using data extensions. For more information see [Customizing library models for C and C++](https://codeql.github.com/docs/codeql-language-guides/customizing-library-models-for-cpp/).
### Minor Analysis Improvements
* Added taint flow models for the `Strsafe.h` header from the Windows SDK.
* CodeQL version 2.24.2 accidentally introduced a syntactical breaking change to `BarrierGuard<...>::getAnIndirectBarrierNode` and `InstructionBarrierGuard<...>::getAnIndirectBarrierNode`. These breaking changes have now been reverted so that the original code compiles again.
*`MustFlow`, the inter-procedural must-flow data flow analysis library, has been re-worked to use parameterized modules. Like in the case of data flow and taint tracking, instead of extending the `MustFlowConfiguration` class, the user should now implement a module with the `MustFlow::ConfigSig` signature, and instantiate the `MustFlow::Global` parameterized module with the implemented module.
### Minor Analysis Improvements
* Refactored the "Year field changed using an arithmetic operation without checking for leap year" query (`cpp/leap-year/unchecked-after-arithmetic-year-modification`) to address large numbers of false positive results.
### Bug Fixes
* The `allowInterproceduralFlow` predicate of must-flow data flow configurations now correctly handles direct recursion.
* Inline expectations test comments, which are of the form `// $ tag` or `// $ tag=value`, are now parsed more strictly and will not be recognized if there isn't a space after the `$` symbol.
* The `SourceModelCsv`, `SinkModelCsv`, and `SummaryModelCsv` classes and the associated CSV parsing infrastructure have been removed from `ExternalFlow.qll`. New models should be added as `.model.yml` files in the `ext/` directory.
### New Features
* Added a subclass `MesonPrivateTestFile` of `ConfigurationTestFile` that represents files created by Meson to test the build configuration.
* Added a class `ConstructorDirectFieldInit` to represent field initializations that occur in member initializer lists.
* Added a class `ConstructorDefaultFieldInit` to represent default field initializations.
* Added a class `DataFlow::IndirectParameterNode` to represent the indirection of a parameter as a dataflow node.
* Added a predicate `Node::asIndirectInstruction` which returns the `Instruction` that defines the indirect dataflow node, if any.
* Added a class `IndirectUninitializedNode` to represent the indirection of an uninitialized local variable as a dataflow node.
### Minor Analysis Improvements
* Added `HttpReceiveHttpRequest`, `HttpReceiveRequestEntityBody`, and `HttpReceiveClientCertificate` from Win32's `http.h` as remote flow sources.
* Added dataflow through members initialized via non-static data member initialization (NSDMI).
File diff suppressed because it is too large
Load Diff
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.