80 Commits

Author SHA1 Message Date
Tom Hvitved
cee6f003fd Tree-sitter: Split up ast_node_info table into two tables 2024-03-19 10:52:37 +01:00
Rasmus Wriedt Larsen
07223031e8 Merge branch 'main' into lgtm_index_filter_handling 2024-02-26 09:56:02 +01:00
Nick Rolfe
514a92d5bd Tree-sitter extractors: use fresh IDs for locations
Since locations for any given source file are never referenced in any
TRAP files besides the one for that particular source file, it's not
necessary to use global IDs. Using fresh IDs will reduce the size of the
ID pool (both on disk and in memory) and improve the speed of
multi-threaded TRAP import.

The one exception is the empty location, which still uses a global ID.
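
The difference between fresh and global IDs can be sketched roughly as follows. This is an illustration under stated assumptions, not the extractor's code: `Label`, `Span`, `EMPTY`, and `Writer` are hypothetical names, and the `global` map only stands in for the shared ID pool.

```rust
use std::collections::HashMap;

/// A label handed out by the writer (hypothetical type for illustration).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Label(u32);

/// A source span; all zeros stands in for the "empty location".
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Span {
    start_line: u32,
    start_col: u32,
    end_line: u32,
    end_col: u32,
}

const EMPTY: Span = Span { start_line: 0, start_col: 0, end_line: 0, end_col: 0 };

/// Hypothetical per-file writer: `global` plays the role of the shared ID pool.
#[derive(Default)]
struct Writer {
    next: u32,
    global: HashMap<String, Label>,
}

impl Writer {
    /// A fresh ID is just the next counter value; nothing is recorded in the pool.
    fn fresh_id(&mut self) -> Label {
        let label = Label(self.next);
        self.next += 1;
        label
    }

    /// A global ID is looked up by key, so the key has to be stored in the pool.
    fn global_id(&mut self, key: &str) -> Label {
        if let Some(label) = self.global.get(key) {
            return *label;
        }
        let label = self.fresh_id();
        self.global.insert(key.to_string(), label);
        label
    }

    /// Only the empty location goes through the pool; every other location is fresh.
    fn location(&mut self, span: Span) -> Label {
        if span == EMPTY {
            self.global_id("loc,empty")
        } else {
            self.fresh_id()
        }
    }
}
```

In this sketch the pool grows only when the empty location is first requested, no matter how many ordinary locations are written.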
2024-02-02 15:06:10 +00:00
Rasmus Wriedt Larsen
f20d4e22fe Handle only exclude 2024-01-18 13:54:45 +01:00
Rasmus Wriedt Larsen
54c7c5e8be Tree sitter extractor: Proper handling of LGTM_INDEX_FILTERS
If someone had used `LGTM_INDEX_FILTERS=exclude:**/*\ninclude:*.rb`
before, we would have mistakenly excluded all files :|
(LGTM_INDEX_FILTERS is a prioritized list where later matches take
priority over earlier ones)
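
The "later matches take priority" rule can be sketched as follows. This is a minimal illustration, not the extractor's code: `Filter`, `parse_filters`, `matches`, and `is_included` are hypothetical names, the toy matcher only understands `**/*` and `*.ext` rather than real glob syntax, and including a file when no filter matches is an assumption.

```rust
/// One entry of a parsed filter list (hypothetical type for illustration).
enum Filter {
    Include(String),
    Exclude(String),
}

/// Parse newline-separated `include:<pattern>` / `exclude:<pattern>` entries.
/// Lines with any other prefix are ignored in this sketch.
fn parse_filters(spec: &str) -> Vec<Filter> {
    spec.lines()
        .filter_map(|line| {
            let line = line.trim();
            if let Some(p) = line.strip_prefix("include:") {
                Some(Filter::Include(p.to_string()))
            } else if let Some(p) = line.strip_prefix("exclude:") {
                Some(Filter::Exclude(p.to_string()))
            } else {
                None
            }
        })
        .collect()
}

/// Toy matcher: `**/*` matches everything, `*suffix` matches by suffix,
/// anything else must equal the path exactly. Real glob matching is richer.
fn matches(pattern: &str, path: &str) -> bool {
    if pattern == "**/*" {
        true
    } else if let Some(suffix) = pattern.strip_prefix('*') {
        path.ends_with(suffix)
    } else {
        pattern == path
    }
}

/// Walk the filters in order; the last one that matches decides the outcome.
fn is_included(filters: &[Filter], path: &str) -> bool {
    let mut included = true; // assumed default when nothing matches
    for filter in filters {
        match filter {
            Filter::Include(p) if matches(p, path) => included = true,
            Filter::Exclude(p) if matches(p, path) => included = false,
            _ => {}
        }
    }
    included
}
```

With the filter list from the message above, `lib/foo.rb` is first excluded by `**/*` and then re-included by `*.rb`, while `lib/foo.py` stays excluded.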

This change is needed to support adding `exclude:**/*` as the first
filter if `paths` include a glob, which currently causes bad behavior in
the Python extractor. However, we can only introduce that change once
this PR has been merged.

I realize this change can cause more folders and files to be traversed
(since they are not just skipped with --exclude). We plan to make a
better long-term fix, which should bring back the previous performance.
2024-01-18 11:44:31 +01:00
Taus
ff35f9fb8c Shared: Clean up NodeInfo in shared extractor
I was perusing the shared extractor the other day, when I came across
the `NodeInfo` struct. I noticed that the `fields` and `subtypes` fields
on this struct had two seemingly identical ways of expressing the same
thing: `None` and `Some(empty)` (where `empty` is respectively the empty
map and the empty vector). As far as I can tell, there's no semantic
difference in either case, so we can just elide the option type entirely
and use the empty value directly. This has the nice side-effect of
cleaning up some of the other code.
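
A rough before/after sketch of this cleanup is shown below. The struct shapes are simplified and hypothetical (`NodeInfoBefore`, `NodeInfoAfter`, the `String` element types, and the helper functions are not taken from the real code); only the `fields` and `subtypes` names come from the message above.

```rust
use std::collections::BTreeMap;

/// Before: `None` and `Some(empty)` both mean "nothing here".
#[derive(Default)]
struct NodeInfoBefore {
    fields: Option<BTreeMap<String, String>>,
    subtypes: Option<Vec<String>>,
}

/// After: the empty collection is the only way to say "nothing here".
#[derive(Default)]
struct NodeInfoAfter {
    fields: BTreeMap<String, String>,
    subtypes: Vec<String>,
}

/// With the option type, every caller has to handle two equivalent cases.
fn is_empty_before(info: &NodeInfoBefore) -> bool {
    info.fields.as_ref().map_or(true, |f| f.is_empty())
        && info.subtypes.as_ref().map_or(true, |s| s.is_empty())
}

/// Without it, the check reads directly off the collections.
fn is_empty_after(info: &NodeInfoAfter) -> bool {
    info.fields.is_empty() && info.subtypes.is_empty()
}

/// Converting loses nothing, because `None` collapses to the empty value.
fn simplify(before: NodeInfoBefore) -> NodeInfoAfter {
    NodeInfoAfter {
        fields: before.fields.unwrap_or_default(),
        subtypes: before.subtypes.unwrap_or_default(),
    }
}
```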
2023-09-27 12:29:07 +00:00
Harry Maclean
b76842ad3d Shared: Fix clippy lint 2023-08-23 16:24:57 +01:00
Harry Maclean
3680613f2d Shared: Restrict extractor file globs to filenames 2023-08-23 16:09:56 +01:00
Harry Maclean
cc7ef5dac1 Shared: Fix clippy lint in shared extractor 2023-08-23 14:11:22 +01:00
Harry Maclean
ed40d72e4f Shared: Bump extractor version 2023-08-23 14:11:22 +01:00
Harry Maclean
7e2abf20c6 Shared: Support glob patterns in shared extractor
Replace the `file_extensions` field with `file_globs`, which supports
UNIX-style glob patterns powered by the `globset` crate.

This allows files with no extension (e.g. Dockerfiles) to be extracted,
by specifying a glob such as `*Dockerfile`.

One surprising aspect of this change is that the globs match against the
whole path, rather than just the file name.
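
To make the whole-path versus file-name distinction concrete, here is a small sketch. It does not use `globset`: `wildcard_match` is a hypothetical stand-in that supports only `*`, and letting `*` match across `/` is an assumption made for this illustration.

```rust
use std::path::Path;

/// Minimal wildcard matcher: `*` matches any run of characters (including `/`),
/// every other character matches itself. A stand-in for a real glob library.
fn wildcard_match(pattern: &str, text: &str) -> bool {
    fn go(p: &[char], t: &[char]) -> bool {
        match p.split_first() {
            // Pattern exhausted: succeed only if the text is exhausted too.
            None => t.is_empty(),
            // `*`: try every possible split point of the remaining text.
            Some((&'*', rest)) => (0..=t.len()).any(|i| go(rest, &t[i..])),
            // Literal character: must equal the next character of the text.
            Some((c, rest)) => t.first() == Some(c) && go(rest, &t[1..]),
        }
    }
    let p: Vec<char> = pattern.chars().collect();
    let t: Vec<char> = text.chars().collect();
    go(&p, &t)
}

/// Match the pattern against the whole path.
fn matches_whole_path(pattern: &str, path: &str) -> bool {
    wildcard_match(pattern, path)
}

/// Match the pattern against the final path component only.
fn matches_file_name(pattern: &str, path: &str) -> bool {
    Path::new(path)
        .file_name()
        .and_then(|name| name.to_str())
        .map_or(false, |name| wildcard_match(pattern, name))
}
```

Under these assumptions, `Dockerfile` matches `docker/Dockerfile` only when compared with the file name, whereas `test*` matches `test_data/main.rb` only when compared with the whole path.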

This is a breaking change.
2023-08-23 14:11:21 +01:00
Arthur Baars
2416568489 Tree-sitter-extractor: fix clippy warnings 2023-05-22 19:37:58 +02:00
Arthur Baars
d2bc66e393 QL: switch to shared YAML extractor 2023-05-22 19:28:59 +02:00
Arthur Baars
9f83dd5c7a Tree-sitter extractor: extract shared dbscheme fragments into 'prefix.dbscheme' 2023-05-22 19:28:51 +02:00
Harry Maclean
48f22681a5 Merge pull request #13029 from hmac/ruby-autobuilder-refactor
Shared: Share autobuilder code between Ruby and QL
2023-05-12 18:24:06 +07:00
Harry Maclean
9203efbdc4 Shared: Share autobuilder code between Ruby and QL 2023-05-05 07:20:14 +00:00
Harry Maclean
c7e8f0d12a Shared: Pin rust version for shared extractor 2023-05-05 06:36:55 +00:00
Harry Maclean
a577bec22c Shared: Fix clippy warnings in shared extractor 2023-05-05 06:30:12 +00:00
Harry Maclean
8a89aec220 Shared: Handle trap compression option properly
Extracting the compression setting from an environment variable is the
responsibility of the API consumer.
2023-04-27 05:06:57 +00:00
Harry Maclean
3f6087e179 Shared: formatting 2023-04-23 06:04:55 +00:00
Harry Maclean
9005684b10 Shared: Add integration test for shared extractor
This is a very basic test but provides some confidence that the extractor is
working.
2023-04-23 05:29:22 +00:00
Harry Maclean
ac1d250596 Shared: fix language prefix in extractor 2023-04-21 15:07:47 +07:00
Harry Maclean
8091d57f03 Shared: Remove unused type 2023-04-20 08:07:40 +07:00
Harry Maclean
c4d7658cc6 Shared: high level API for the shared extractor
This API makes it easy to create an extractor for simple use cases.
2023-04-20 08:07:40 +07:00
Harry Maclean
2107533822 Shared: Clippy fixes
Use clearer methods where appropriate.
2023-04-05 18:46:57 +08:00
Harry Maclean
6a8d417588 Shared: Clippy fixes
Remove unnecessary borrows and lifetime specifiers.
2023-04-05 18:46:57 +08:00
Harry Maclean
a59215f3b9 Shared: Clippy fixes 2023-04-05 18:46:57 +08:00
Harry Maclean
b6c071a10b Shared: Further consolidate generators 2023-04-05 18:46:57 +08:00
Harry Maclean
f74d13cf06 Shared: Add db generation functions
These are currently duplicated across the Ruby and QL extractors. Adding
them to the shared extractor library will get rid of this duplication.
2023-04-05 18:46:56 +08:00
Harry Maclean
6b2e8847f5 Rename shared extractor
It is now called `tree-sitter-extractor`, to make it clearer that it
builds on tree-sitter grammars.
2023-03-25 10:43:07 +13:00