Going all the way to the AST layer seemed excessive to me, so I rewrote
it to do most of the logic at the data-flow layer. In principle this
_could_ result in more names being computed (due to splitting), but in
practice I don't expect this make a big difference.
Problematic part is
```codeql
/** A escape from string format with `markupsafe.Markup` as the format string. */
private class MarkupEscapeFromStringFormat extends MarkupSafeEscape, Markup::StringFormat {
override DataFlow::Node getAnInput() {
result in [this.getArg(_), this.getArgByName(_)] and
not result = Markup::instance()
}
override DataFlow::Node getOutput() { result = this }
}
```
since the char-pred still holds even if `getAnInput` has no results...
I will say that doing it this way feels kinda dirty, and we _could_ fix
this by including the logic in `getAnInput` in the char-pred as well.
But as I see it, that would just lead to a lot of code duplication,
which isn't very nice.
Since expectation tests had so many changes from ConceptsTest, I'm going
to do the changes for that on in a separate commit. The important part
is the changes to taint-tracking, which is highlighted in this commit.
The other approach felt a bit too much like specifying magic strings
that you had to get right. (crossing your fingers that no-one writes
`HTML` instead of `html`)
According to the official documentation, the purpose of `__main__.py`
files is that their presence in a package (say, `foo`) means one can
execute the package directly using `python -m foo` (which will run the
aforementioned `foo/__main__.py` file).
In principle this means that adding `if __name__ == "__main__"` in these
files is superfluous, as they are only intended to be executed (and not
imported by some other file).
However, in practice people often _do_ include the above construct.
Here are some instances of this on LGTM.com:
https://lgtm.com/query/7521266095072095777/
In particular, 10 out of 33 files in `cpython` have this construct.
This causes some confusion in our module naming, as we usually see the
presence of `__name__ == "__main__"` as an indication that a file may
be run directly (and hence with "absolute import" semantics). However,
when run with `python -m`, the interpreter uses the usual package
semantics, and this leads to modules getting multiple names.
For this reason, I think it makes sense to simply exclude `__main__.py`
files from consideration. Note that if there is a `#!` line mentioning
the Python interpreter, then they will still be included as entry
points.