Problematic part is
```codeql
/** A escape from string format with `markupsafe.Markup` as the format string. */
private class MarkupEscapeFromStringFormat extends MarkupSafeEscape, Markup::StringFormat {
override DataFlow::Node getAnInput() {
result in [this.getArg(_), this.getArgByName(_)] and
not result = Markup::instance()
}
override DataFlow::Node getOutput() { result = this }
}
```
since the char-pred still holds even if `getAnInput` has no results...
I will say that doing it this way feels kinda dirty, and we _could_ fix
this by including the logic in `getAnInput` in the char-pred as well.
But as I see it, that would just lead to a lot of code duplication,
which isn't very nice.
Since expectation tests had so many changes from ConceptsTest, I'm going
to do the changes for that on in a separate commit. The important part
is the changes to taint-tracking, which is highlighted in this commit.
The other approach felt a bit too much like specifying magic strings
that you had to get right. (crossing your fingers that no-one writes
`HTML` instead of `html`)
According to the official documentation, the purpose of `__main__.py`
files is that their presence in a package (say, `foo`) means one can
execute the package directly using `python -m foo` (which will run the
aforementioned `foo/__main__.py` file).
In principle this means that adding `if __name__ == "__main__"` in these
files is superfluous, as they are only intended to be executed (and not
imported by some other file).
However, in practice people often _do_ include the above construct.
Here are some instances of this on LGTM.com:
https://lgtm.com/query/7521266095072095777/
In particular, 10 out of 33 files in `cpython` have this construct.
This causes some confusion in our module naming, as we usually see the
presence of `__name__ == "__main__"` as an indication that a file may
be run directly (and hence with "absolute import" semantics). However,
when run with `python -m`, the interpreter uses the usual package
semantics, and this leads to modules getting multiple names.
For this reason, I think it makes sense to simply exclude `__main__.py`
files from consideration. Note that if there is a `#!` line mentioning
the Python interpreter, then they will still be included as entry
points.
Just like our support for the `PyYAML` PyPI package that you import with
`import yaml` is in `Yaml.qll`.
Since this file does not provide any public predicates/modules, it
should be safe to rename it.
So it better matches what is in `py/code-injection`. I had my doubts
about CWE-95, but after reading
https://owasp.org/www-community/attacks/Direct_Dynamic_Code_Evaluation_Eval%20Injection
I think it's fine to add CWE-95 as well 👍
Definitions are:
CWE-78: Improper Neutralization of Special Elements used in an OS
Command ('OS Command Injection')
CWE-94: Improper Control of Generation of Code ('Code Injection')
CWE-95: Improper Neutralization of Directives in Dynamically Evaluated
Code ('Eval Injection')
This seems like something we have been missing for a while now, so I
figured it might be useful to add. It is roughly based on the JavaScript
equivalent, with one major difference: in the JavaScript libraries,
`getAMethodCall` is reserved for syntactic method calls (`obj.m(...)`)
whereas `getAMemberInvocation` is used for both this and the case where
the bound method `obj.m` is stored in a temporary variable and then
subsequently invoked in the same local scope.
It seems to me that the more general predicate is more useful, and hence
should have the simpler name. (And also we don't really work with a
notion of "invocation" in the Python libraries, so we would need a
better name for it anyway.)
I think as long as the documentation makes the behaviour clear, it
should be okay.
This was slightly messier than anticipated, as I hadn't accounted for
the dozen uses of `startInAttr` in our codebase. To circumvent this,
I decided to put the type tracking implementation in the `internal`
directory, and wrap it with a file that ensures the old interface still
works.