This commit sets initial dbschemes for cpp, csharp, java, javascript, and
python so that automated testing for upgrade scripts would also cover legacy
upgrades.
Observed on `mozilla/bugbug` on the 2.8.0 CLI branch, we had the
following line in the timing report:
```
FullServerSideRequestForgery.ql-17:regex::RegexString::charSet_dispred#fff#antijoin_rhs ............... 1m13s
```
Inspecting the logs, we see the following join:
```
(644s) Tuple counts for regex::RegexString::charSet_dispred#fff#antijoin_rhs/5@f295d1bk after 1m13s:
1 ~0% {1} r1 = CONSTANT(unique string)["]"]
2389 ~4% {3} r2 = JOIN r1 WITH regex::RegexString::nonEscapedCharAt_dispred#fff_201#join_rhs ON FIRST 1 OUTPUT Rhs.1 'arg0', Rhs.2 'arg1', (Rhs.2 'arg1' + 1)
668873 ~0% {6} r3 = JOIN r2 WITH regex::RegexString::char_set_start_dispred#fff ON FIRST 1 OUTPUT Lhs.0 'arg0', "]", Lhs.1 'arg1', Lhs.2 'arg2', Rhs.1 'arg3', Rhs.2 'arg4'
537501371 ~4% {7} r4 = JOIN r3 WITH regex::RegexString::nonEscapedCharAt_dispred#fff_021#join_rhs ON FIRST 2 OUTPUT Lhs.0 'arg0', Lhs.2 'arg1', Lhs.3 'arg2', Lhs.4 'arg3', Lhs.5 'arg4', "]", Rhs.2
269085087 ~0% {7} r5 = SELECT r4 ON In.6 > In.4 'arg4'
89583155 ~3% {7} r6 = SELECT r5 ON In.6 < In.1 'arg1'
89583155 ~26634% {5} r7 = SCAN r6 OUTPUT In.0 'arg0', In.1 'arg1', In.2 'arg2', In.3 'arg3', In.4 'arg4'
return r7
```
Now, this is problematic not just because of the large intermediary join
but also because of the large number of tuples being materialised at the
end. The culprit in this case turns out to be this bit of `charSet`:
```
not exists(int mid | this.nonEscapedCharAt(mid) = "]" | mid > inner_start and mid < inner_end)
```
Rewriting this to instead look for the minimum index at which a `]`
appears resulted in a much nicer join.
I also fixed up a similar issue surrounding the `\N` unicode escape.
Not that I think this will necessarily be relevant, but the `min`-based
solution is more robust either way.
Since this caused bad performance (as we had to evaluate points-to).
Fixes https://github.com/github/codeql/issues/6964
This approach was motivated by the comment on the issue from @tausbn:
> We discussed this internally in the CodeQL Python team, and have
> agreed that the best approach for now is to disable the printing of
> regex ASTs.
I tried to keep our RegExpTerm logic, but doing the fix below did not
work, and still evaluated RegExpTerm :| I guess we will just have to
revert this PR if we want it back
```diff
TRegExpTermNode(RegExpTerm term) {
+ none() and
exists(StrConst str | term.getRootTerm() = getParsedRegExp(str) and shouldPrint(str, _))
}
```