This is a developer QoL improvement, where running codegen will skip
writing (and especially formatting) any files that were not changed.
**Why?** While code generation in itself was pretty much instant, QL
formatting of generated code was starting to take a long time. This made
unconditionally running codegen quite annoying, for example before each
test run as part of an IDE workflow or as part of the pre-commit hook.
**How?** This was not completely straightforward as we could not work
with the contents of the file prior to code generation as that was
already post-processed by the QL formatting, so we had no chance of
comparing the output of template rendering with that. We therefore store
the hashes of the files _prior_ to QL formatting in a checked-in file
(`swift/ql/.generated.list`). We can therefore load those hashes at
the beginning of code generation, use them to compare the template
rendering output and update them in this special registry file.
**What else?** We also extend this mechanism to detect accidental
modification of generated files in a more robust way. Before this patch,
we were doing it with a rough regexp based heuristic. Now, we just store
the hashes of the files _after_ QL formatting in the same checked file,
so we can check that and stop generation if a generated file was
modified, or a stub was modified without removing the `// generated`
header.
In those kinds of tests the results may have different final classes
that are not necessarily visible (or tested) solely through the string
representation. For better testing and reading of expected results,
`getQlPrimaryClasses` is added in these cases.
The information that was contained in `schema.yml` is now in
`swift/schema.py`, which allows a more integrated IDE experience
for writing and navigating it.
Another minor change is that `schema.Class` now has a `str` `group`
field instead of a `pathlib.Path` `dir` one.
This makes the result of code generation independent of the order
in which classes are defined in the schema, and makes additional
topological sorting not required.
Being independent from schema order will be important for reviewing the
move to a pure python schema, as generated code will be left untouched.
There was an issue in case multiple inheritance from classes with
children was involved, where indexes would overlap.
The generated code structure has been reshuffled a bit, with
`Impl::getImmediateChildOf<Class>` predicates giving 0-based children
for a given class, including those coming from bases, and the final
`Impl::getImmediateChild` disjuncting the above on final classes only.
This removes the need of `getMaximumChildrenIndex<Class>`, and also
removes the code scanning alerts.
Also, comments were fixed addressing the review.
It turns out the threshold of 5 lines for stub modification detection
was too strict: in case of a long class name the QL formatter will put
the closing brace of the empty class definition on a new line, leading
to codegen fail with an error thinking the stub was modified.
On the other side of things, also adding a base to a stub class was not
being detected as a modification.
Now the modification test is slightly smarter. If the stub still marked
as generated and
* has more than 6 lines, or
* the contents does not match a regexp aproximation of a plain stub
then codegen will abort. The test will still avoid reading the whole
contents of all the stubs.
If one modifies a QL stub but forgets to remove the `// generated`
header comment, codegen will now abort with an error rather than
silently reverting the change.
This is based on the rough heuristic of just counting the lines. If any
change is done to the stub class, the number of lines is bound to be
5 or more.
By explicitly marking children in the `schema.yml` file, an internal
`getAChild` predicate is implemented, that is in turn used in `AstNode`
to implement `getParent`.
This is yet to be used in the control flow library to replace the
hand-rolled implementation.
A further, more complex step is to use the same information to fully
generate the core implementation of `PrintAst` (including the
accessor string). This will be done later.
The `parent` tests use the same swift code as the extractor tests, and
this is currently enforced by `sync-files.py`. Notice that `qltest.sh`
had to be modified to deal with multiple files, which was not working
yet.
This is solving a papercut, where the C++ build was relying on the
local dbscheme file to be up-to-date, even if all the information for
building is actually in `schema.yml`. This made a pure C++ development
cycle with changes to `schema.yml` clumsy, as it required a further
dbscheme generation step.
Now for C++ the dbscheme is generated internally in the build files, and
thus a change in `schema.yml` is reflected immediately in the C++ build.
A `swift/codegen` step for checked in generated code (including the
dbscheme) is still required, but a developer can do it just before
running QL tests or committing, instead of during each C++
recompilation.
Some directory reorganization was also carried out, moving specific
generator modules to a new `generators` python package, and only leaving
the two drivers at the top level.
Properties marked with `predicate` in the schema are now accepted.
* in the dbscheme, they will translate to a table with a single `id`
column (and the table name will not be pluralized)
* in C++ classes, they will translate to `bool` fields
* in QL classes, they will translate to predicates
Closes https://github.com/github/codeql-c-team/issues/1016
Tests can be run with
```
bazel test //swift/codegen:tests
```
Coverage can be checked installing `pytest-cov` and running
```
pytest --cov=swift/codegen swift/codegen/test
```