Commit Graph

148 Commits

Author SHA1 Message Date
vanaigr
5edbabdbec perf: add on_range in treesitter highlighting 2025-08-28 08:22:38 -05:00
bfredl
442f297c63 refactor(build): remove INCLUDE_GENERATED_DECLARATIONS guards
These are not needed after #35129 but making uncrustify still play nice
with them was a bit tricky.

Unfortunately `uncrustify --update-config-with-doc` breaks strings
with backslashes. This issue has been reported upstream,
and in the meanwhile auto-update on every single run has been disabled.
2025-08-14 09:34:38 +02:00
Rodrigodd
168bf0024e fix(treesitter): ensure TSLuaTree is always immutable
Problem:

The previous fix in #34314 relies on copying the tree in `tree_root` to
ensure the `TSNode`'s tree cannot be mutated. But that causes the
problem where two calls to `tree_root` return nodes from different
copies of a tree, which do not compare as equal. This has broken at
least one plugin.

Solution:

Make all `TSTree`s on the Lua side always immutable, avoiding the need
to copy the tree in `tree_root`, and make the only mutation point,
`tree_edit`, copy the tree instead.
2025-07-02 17:05:17 +01:00
Rodrigodd
99e6294819 fix(treesitter): ensure TSNode's tree is immutable
Problem:

TSNode contains a `const TSTree*` and a `const void *id`. The `id`
points to Tree-sitter's internal type `Subtree`, which resides inside
the `TSTree` but may be deallocated if the `TSTree` is mutated (which
is likely why it is `const`).

The Lua method `TSTree:edit()` mutates the tree, which can deallocate
`id`.

See #25254 and #31758.

Solution:

To avoid this, we now make a copy of the tree before pushing its root to
the Lua stack. This also removes the fenv from TSLuaTree, as it was only
used when pushing the tree root to the Lua stack.

We also copy the tree in `node_tree`.

`ts_tree_copy()` just increments a couple of reference counters, so it's
relatively cheap to call.
2025-06-06 15:35:52 +01:00
Lewis Russell
40b64e9100 refactor(treesitter): move functions from executor.c to treesitter.c 2025-05-13 15:44:42 +01:00
Riley Bruins
7369f80b19 refactor(treesitter): remove empty parse callback
Now that we have bumped to tree-sitter 0.25.4, we no longer need to do
this since upstream does it for us when calling the regular parse
method.
2025-05-11 20:14:08 +02:00
Justin M. Keyes
fc2dee1736 feat(messages): cleanup Lua error messages
"Error" in error messages is redundant. Just provide the context, don't
say "Error ...".
2025-05-04 11:22:57 -04:00
Riley Bruins
60af1a1db2 fix(treesitter): clear parse options state #33437
Apparently after parsing with options in tree-sitter, the options data
persists in the parser object, and thus successive calls to
`ts_parser_parse()` will act like `ts_parser_parse_with_options()`. This
is problematic because `languagetree.lua` makes coroutine-environment
assumptions based on if a nullptr has been returned by the parser
function. This commit makes it so that the parse options state is reset
upon a regular parse (would be nice if this was done upstream).

Fixes #33277
2025-04-12 15:51:29 -07:00
Riley Bruins
f4fc769c81 refactor(treesitter): migrate to ts parser callback API #33141
Remove the `set_timeout` functions for `TSParser` and instead add a timeout
parameter to the regular parse function. Remove these deprecated tree-sitter
API functions and replace them with the preferred `TSParseOptions` style.
2025-03-29 10:57:22 -07:00
Ian Chamberlain
8b5a0a00c8 feat(treesitter): allow disabling captures and patterns on TSQuery (#32790)
Problem: Cannot disable individual captures and patterns in treesitter queries.

Solution: 
* Expose the corresponding tree-sitter API functions for `TSQuery` object. 
* Add documentation for `TSQuery`.
* Return the pattern ID from `get_captures_at_pos()` (and hence `:Inspect!`).
2025-03-11 14:45:01 +01:00
Lewis Russell
ec8922978e feat(treesitter): add more metadata to language.inspect() (#32657)
Problem: No way to check the version of a treesitter parser.

Solution: Add version metadata (ABI 15 parsers only) as well as parser state count and supertype information (ABI 15) in `vim.treesitter.language.inspect()`. Also graduate the `abi_version` field, as this is now the official upstream name.

---------

Co-authored-by: Christian Clason <c.clason@uni-graz.at>
2025-03-01 15:51:09 +00:00
Riley Bruins
55b165ac15 fix(treesitter): TSNode:field() returns all children with the given field 2025-02-21 09:47:02 +00:00
Riley Bruins
77be44563a refactor(treesitter): always return valid range from parse() #32273
Problem:
When running an initial parse, parse() returns an empty table rather
than an actual range. In `languagetree.lua`, we manually check if
a parse was incremental to determine the changed parse region.

Solution:
- Always return a range (in the C side) from parse().
- Simplify the language tree code a bit.
- Logger no longer shows empty ranges on the initial parse.
2025-02-02 03:46:26 -08:00
Christian Clason
eb60cd74fb build(deps)!: bump tree-sitter to HEAD, wasmtime to v29.0.1 (#32200)
Breaking change: `ts_node_child_containing_descendant()` was removed

Breaking change: tree-sitter 0.25 (HEAD) required
2025-01-27 16:16:06 +01:00
Horror Proton
5a54681025 fix(treesitter): uv_dlclose after uv_dlerror 2025-01-14 09:15:35 +00:00
Riley Bruins
45e606b1fd feat(treesitter): async parsing
**Problem:** Parsing can be slow for large files, and it is a blocking
operation which can be disruptive and annoying.

**Solution:** Provide a function for asynchronous parsing, which accepts
a callback to be run after parsing completes.

Co-authored-by: Lewis Russell <lewis6991@gmail.com>
Co-authored-by: Luuk van Baal <luukvbaal@gmail.com>
Co-authored-by: VanaIgr <vanaigranov@gmail.com>
2025-01-12 08:10:47 -08:00
dundargoc
25abcd243e fix: fix broken wasmtime build
Regression from 2a7d0ed614, which removed
header that is only needed if wasmtime support is enabled. Prevent this
from happening again by wrapping the include in a `HAVE_WASMTIME` check.
2024-12-23 16:07:09 +01:00
Justin M. Keyes
2a7d0ed614 refactor: iwyu #31637
Result of `make iwyu` (after some "fixups").
2024-12-23 05:43:52 -08:00
Riley Bruins
36990f324d fix(treesitter): show proper node name error messages
**Problem:** Currently node names with non-alphanumeric, non
underscore/hyphen characters (only possible with anonymous nodes) are
not given a proper error message. See tree-sitter issue 3892 for more
details.

**Solution:** Apply a different scanning logic to anonymous nodes to
correctly identify the entire node name (i.e., up until the final double
quote)
2024-11-13 13:32:58 +01:00
Amaan Qureshi
7a20f93a92 fix(treesitter): correct condition in __has_ancestor 2024-10-27 17:56:06 +00:00
Riley Bruins
4b90952851 fix(treesitter): mark supertype nodes as named
**Problem:** Tree-sitter 0.24.0 introduced a new symbol type to denote
supertype nodes (`TSSymbolTypeSupertype`). Now, `language.inspect()`
(and the query `omnifunc`) return supertype symbols, but with double
quotes around them.

**Solution:** Mark a symbol as "named" based on it *not* being an
anonymous node, rather than checking that it is a regular node (which a
supertype also is not).
2024-10-12 09:59:44 +02:00
Riley Bruins
d3193afc25 fix(treesitter): remove duplicate symbol names in language.inspect()
**Problems:**

- `vim.treesitter.language.inspect()` returns duplicate
  symbol names, sometimes up to 6 of one kind in the case of `markdown`
- The list-like `symbols` table can have holes and is thus not even a
  valid msgpack table anyway, mentioned in a test

**Solution:** Return symbols as a map, rather than a list, where field
names are the names of the symbol. The boolean value associated with the
field encodes whether or not the symbol is named.

Note that anonymous nodes are surrounded with double quotes (`"`) to
prevent potential collisions with named counterparts that have the same
identifier.
2024-10-11 18:15:07 +02:00
Riley Bruins
267c7525f7 feat(treesitter): introduce child_with_descendant()
This commit also marks `child_containing_descendant()` as deprecated
(per upstream's documentation), and uses `child_with_descendant()` in
its place. Minimum required tree-sitter version will now be `0.24`.
2024-10-11 17:29:45 +02:00
Lewis Russell
c6abc97006 perf(treesitter): do not use tree cursors with a small lifetime
Problem:
Tree cursors can only be efficient when they are re-used.
Short-lived cursors are very slow.

Solution:
Reimplement functions that use short-lived cursors.
2024-10-03 11:19:59 +01:00
Lewis Russell
688b961d13 feat(treesitter): add support for wasm parsers
Problem: Installing treesitter parser is hard (harder than
climbing to heaven).

Solution: Add optional support for wasm parsers with `wasmtime`.

Notes:

* Needs to be enabled by setting `ENABLE_WASMTIME` for tree-sitter and
  Neovim. Build with
  `make CMAKE_EXTRA_FLAGS=-DENABLE_WASMTIME=ON
  DEPS_CMAKE_FLAGS=-DENABLE_WASMTIME=ON`
* Adds optional Rust (obviously) and C11 dependencies.
* Wasmtime comes with a lot of features that can negatively affect
  Neovim performance due to library and symbol table size. Make sure to
  build with minimal features and full LTO.
* To reduce re-compilation times, install `sccache` and build with
  `RUSTC_WRAPPER=<path/to/sccache> make ...`
2024-08-26 16:44:03 +02:00
James Tirta Halim
200e7ad157 fixup: apply the change on more files 2024-06-04 09:42:19 +01:00
vanaigr
4b02916334 perf(treesitter): use child_containing_descendant() in has-ancestor? (#28512)
Problem: `has-ancestor?` is O(n²) for the depth of the tree since it iterates over each of the node's ancestors (bottom-up), and each ancestor takes O(n) time.
This happens because tree-sitter's nodes don't store their parent nodes, and the tree is searched (top-down) each time a new parent is requested.

Solution: Make use of new `ts_node_child_containing_descendant()` in tree-sitter v0.22.6 (which is now the minimum required version) to rewrite the `has-ancestor?` predicate in C to become O(n).

For a sample file, decreases the time taken by `has-ancestor?` from 360ms to 6ms.
2024-05-16 16:57:58 +02:00
bfredl
0df681a91d fix(treesitter): make tests for memoize more robust
Instead of painfully messing with timing to determine if queries were
reparsed, we can simply keep a counter next to the call to ts_query_new

Also memoization had a hidden dependency on the garbage collection of
the the key, a hash value which never is kept around in memory. this was
done intentionally as the hash does not capture all relevant state for the
query (external included files) even if actual query objects still
would be reachable in memory. To make the test fully deterministic in
CI, we explicitly control GC.
2024-04-29 16:20:46 +02:00
Lewis Russell
032df963bb refactor(treesitter): language loading 2024-04-21 14:09:27 +01:00
Lewis Russell
47388614cb refactor(treesitter): handle coverity warnings better 2024-03-20 12:22:54 +00:00
Lewis Russell
0f85aeb478 fix(treesitter): treecursor regression
- Also address some coverity warnings

Fixes #27942
2024-03-20 10:56:16 +00:00
Lewis Russell
597d4c63bd refactor(treesitter): reorder functions 2024-03-19 18:40:08 +00:00
Lewis Russell
aca6c93002 refactor(treesitter): simplify argument checks for userdata 2024-03-19 16:16:54 +00:00
Lewis Russell
aca2048bcd refactor(treesitter): redesign query iterating
Problem:

  `TSNode:_rawquery()` is complicated, has known issues and the Lua and
  C code is awkwardly coupled (see logic with `active`).

Solution:

  - Add `TSQueryCursor` and `TSQueryMatch` bindings.
  - Replace `TSNode:_rawquery()` with `TSQueryCursor:next_capture()` and `TSQueryCursor:next_match()`
  - Do more stuff in Lua
  - API for `Query:iter_captures()` and `Query:iter_matches()` remains the same.
  - `treesitter.c` no longer contains any logic related to predicates.
  - Add `match_limit` option to `iter_matches()`. Default is still 256.
2024-03-19 14:24:59 +00:00
zeertzjq
ac8cd5368d refactor: use ml_get_buf_len() in API code (#27825) 2024-03-12 10:44:53 +08:00
Thomas Vigouroux
bd5008de07 fix(treesitter): correctly handle query quantifiers (#24738)
Query patterns can contain quantifiers (e.g. (foo)+ @bar), so a single
capture can map to multiple nodes. The iter_matches API can not handle
this situation because the match table incorrectly maps capture indices
to a single node instead of to an array of nodes.

The match table should be updated to map capture indices to an array of
nodes. However, this is a massively breaking change, so must be done
with a proper deprecation period.

`iter_matches`, `add_predicate` and `add_directive` must opt-in to the
correct behavior for backward compatibility. This is done with a new
"all" option. This option will become the default and removed after the
0.10 release.

Co-authored-by: Christian Clason <c.clason@uni-graz.at>
Co-authored-by: MDeiml <matthias@deiml.net>
Co-authored-by: Gregory Anders <greg@gpanders.com>
2024-02-16 11:54:47 -06:00
Jongwook Choi
800134ea5e refactor(treesitter): typing for Query, TSQuery, and TSQueryInfo
- `TSQuery`: userdata object for parsed query.

- `vim.treesitter.Query`: renamed from `Query`.
  - Add a new field `lang`.

- `TSQueryInfo`:
  - Move to `vim/treesitter/_meta.lua`, because C code owns it.
  - Correct typing for `patterns`, should be a map from `integer`
    (pattern_id) to `(integer|string)[][]` (list of predicates or
    directives).

- `vim.treesitter.QueryInfo` is added.
  - This currently has the same structure as `TSQueryInfo` (exported
    from C code).
  - Document the fields (see `TSQuery:inspect`).

- Add typing for `vim._ts_parse_query()`.
2024-02-08 12:40:16 +00:00
Jongwook Choi
5b1b765610 docs: enforce "treesitter" spelling #27110
It's the "tree-sitter" project, but "treesitter" in our code and docs.
2024-01-28 17:53:14 -08:00
Christian Clason
83b51b36aa fixup: raise TS min version 2024-01-25 23:39:25 +01:00
dundargoc
79b6ff28ad refactor: fix headers with IWYU 2023-11-28 22:23:56 +01:00
dundargoc
6c14ae6bfa refactor: rename types.h to types_defs.h 2023-11-27 21:57:51 +01:00
dundargoc
f4aedbae4c build(IWYU): fix includes for undo_defs.h 2023-11-27 19:33:17 +01:00
zeertzjq
574d25642f refactor: move Arena and ArenaMem to memory_defs.h (#26240) 2023-11-27 17:21:58 +08:00
dundargoc
353a4be7e8 build: remove PVS
We already have an extensive suite of static analysis tools we use,
which causes a fair bit of redundancy as we get duplicate warnings. PVS
is also prone to give false warnings which creates a lot of work to
identify and disable.
2023-11-12 21:26:39 +01:00
dundargoc
5f03a1eaab build(lint): remove unnecessary clint.py rules
Uncrustify is the source of truth where possible.
Remove any redundant checks from clint.py.
2023-10-23 20:06:21 +02:00
dundargoc
8e932480f6 refactor: the long goodbye
long is 32 bits on windows, while it is 64 bits on other architectures.
This makes the type suboptimal for a codebase meant to be
cross-platform. Replace it with more appropriate integer types.
2023-10-09 11:45:46 +02:00
zeertzjq
cf8b2c0e74 build(iwyu): add a few more _defs.h mappings (#25435) 2023-09-30 12:05:28 +08:00
bfredl
5970157e1d refactor(map): enhanced implementation, Clean Code™, etc etc
This involves two redesigns of the map.c implementations:

1. Change of macro style and code organization

The old khash.h and map.c implementation used huge #define blocks with a
lot of backslash line continuations.

This instead uses the "implementation file" .c.h pattern. Such a file is
meant to be included multiple times, with different macros set prior to
inclusion as parameters. we already use this pattern e.g. for
eval/typval_encode.c.h to implement different typval encoders reusing a
similar structure.

We can structure this code into two parts. one that only depends on key
type and is enough to implement sets, and one which depends on both key
and value to implement maps (as a wrapper around sets, with an added
value[] array)

2. Separate the main hash buckets from the key / value arrays

Change the hack buckets to only contain an index into separate key /
value arrays
This is a common pattern in modern, state of the art hashmap
implementations. Even though this leads to one more allocated array, it
is this often is a net reduction of memory consumption. Consider
key+value consuming at least 12 bytes per pair. On average, we will have
twice as many buckets per item.
Thus old implementation:

  2*12 = 24 bytes per item

New implementation

  1*12 + 2*4 = 20 bytes per item

And the difference gets bigger with larger items.
One might think we have pulled a fast one here, as wouldn't the average size of
the new key/value arrays be 1.5 slots per items due to amortized grows?
But remember, these arrays are fully dense, and thus the accessed memory,
measured in _cache lines_, the unit which actually matters, will be the
fully used memory but just rounded up to the nearest cache line
boundary.

This has some other interesting properties, such as an insert-only
set/map will be fully ordered by insert only. Preserving this ordering
in face of deletions is more tricky tho. As we currently don't use
ordered maps, the "delete" operation maintains compactness of the item
arrays in the simplest way by breaking the ordering. It would be
possible to implement an order-preserving delete although at some cost,
like allowing the items array to become non-dense until the next rehash.

Finally, in face of these two major changes, all code used in khash.h
has been integrated into map.c and friends. Given the heavy edits it
makes no sense to "layer" the code into a vendored and a wrapper part.
Rather, the layered cake follows the specialization depth: code shared
for all maps, code specialized to a key type (and its equivalence
relation), and finally code specialized to value+key type.
2023-09-08 12:48:46 +02:00
Lewis Russell
dd0e77d48a fix(query_error): multiline bug 2023-08-31 15:12:17 +01:00
Amaan Qureshi
845d5b8b64 feat(treesitter): improve query error message 2023-08-31 13:33:40 +01:00