neovim

mirror of https://github.com/neovim/neovim.git synced 2026-08-02 21:59:11 +00:00

zeertzjq 869000e7ce vim-patch:9.1.1627: fuzzy matching can be improved

Problem:  fuzzy-matching can be improved
Solution: Implement a better fuzzy matching algorithm
          (Girish Palya)

Replace fuzzy matching algorithm with improved fzy-based implementation

The
[current](https://www.forrestthewoods.com/blog/reverse_engineering_sublime_texts_fuzzy_match/)
fuzzy matching algorithm has several accuracy issues:

* It struggles with CamelCase.
* It fails to prioritize matches at the beginning of strings, often
  ranking middle matches higher.

After evaluating alternatives (see my comments
[here](https://github.com/vim/vim/issues/17531#issuecomment-3112046897)
and
[here](https://github.com/vim/vim/issues/17531#issuecomment-3121593900)),
I chose to adopt the [fzy](https://github.com/jhawthorn/fzy) algorithm,
which:

* Resolves the aforementioned issues.
* Runs faster (see Performance).
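
To illustrate how an fzy-style scorer addresses both ranking issues, here is a minimal ASCII-only sketch of the dynamic-programming idea. It is not the code from this patch: the function names, the `MAX_LEN` cap, and the exact weights are assumptions chosen for illustration (the weights are modeled on fzy's defaults). The first haystack character is treated as if it followed a `/`, which is what gives matches at the start of a string their bonus, and an uppercase letter after a lowercase one earns a CamelCase bonus.

```c
#include <ctype.h>
#include <math.h>
#include <string.h>

#define SCORE_MIN (-INFINITY)
#define MAX_LEN 256                /* hypothetical cap to keep the sketch simple */

/* Illustrative weights modeled on fzy's defaults; treat them as assumptions. */
#define GAP_LEADING       (-0.005)
#define GAP_TRAILING      (-0.005)
#define GAP_INNER         (-0.01)
#define MATCH_CONSECUTIVE 1.0
#define BONUS_SLASH       0.9
#define BONUS_WORD        0.8
#define BONUS_CAPITAL     0.7
#define BONUS_DOT         0.6

static double max2(double a, double b) { return a > b ? a : b; }

/* Bonus for matching `cur` given the character that precedes it. */
static double position_bonus(unsigned char prev, unsigned char cur)
{
    if (!isalnum(cur)) return 0.0;
    if (prev == '/') return BONUS_SLASH;
    if (prev == '-' || prev == '_' || prev == ' ') return BONUS_WORD;
    if (prev == '.') return BONUS_DOT;
    if (isupper(cur) && islower(prev)) return BONUS_CAPITAL;  /* CamelCase */
    return 0.0;
}

/* Returns the best score, or SCORE_MIN when needle is not a subsequence. */
double fuzzy_score(const char *needle, const char *haystack)
{
    size_t n = strlen(needle), m = strlen(haystack);
    if (n == 0 || n > m || m > MAX_LEN) return SCORE_MIN;

    /* Pretend the string follows a '/', so position 0 gets the top bonus. */
    double bonus[MAX_LEN];
    unsigned char prev = '/';
    for (size_t j = 0; j < m; j++) {
        bonus[j] = position_bonus(prev, (unsigned char)haystack[j]);
        prev = (unsigned char)haystack[j];
    }

    /* D: best score with needle[i] matched exactly at j.
     * M: best score for needle[0..i] using haystack[0..j]. */
    double D_prev[MAX_LEN], M_prev[MAX_LEN], D_cur[MAX_LEN], M_cur[MAX_LEN];
    for (size_t i = 0; i < n; i++) {
        double best = SCORE_MIN;
        double gap = (i == n - 1) ? GAP_TRAILING : GAP_INNER;
        int nc = tolower((unsigned char)needle[i]);
        for (size_t j = 0; j < m; j++) {
            double score = SCORE_MIN;
            if (nc == tolower((unsigned char)haystack[j])) {
                if (i == 0)
                    score = (double)j * GAP_LEADING + bonus[j];
                else if (j > 0)
                    score = max2(M_prev[j - 1] + bonus[j],
                                 D_prev[j - 1] + MATCH_CONSECUTIVE);
            }
            D_cur[j] = score;
            best = max2(score, best + gap);   /* match here, or extend a gap */
            M_cur[j] = best;
        }
        memcpy(D_prev, D_cur, m * sizeof(double));
        memcpy(M_prev, M_cur, m * sizeof(double));
    }
    return M_prev[m - 1];
}
```

With these weights, `fuzzy_score("in", "init.c")` is higher than `fuzzy_score("in", "main.c")`, and `fuzzy_score("fb", "FooBar")` is higher than `fuzzy_score("fb", "foobar")`.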

Implementation details

This version is based on the original fzy
[algorithm](https://github.com/jhawthorn/fzy/blob/master/src/match.c),
with one key enhancement: **multibyte character support**.

* The original implementation supports only ASCII.
* This patch replaces ASCII lookup tables with function calls, making it
  compatible with multibyte character sets.
* Core logic (`match_row()` and `match_positions()`) remains faithful to
  the original, but now operates on codepoints rather than single-byte
  characters.
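
As a rough illustration of what operating on codepoints means, the sketch below decodes UTF-8 into codepoints and folds case through a function call instead of indexing a 256-entry table. All names here (`utf8_decode`, `cp_tolower`, `has_match_utf8`, `utf8_length`) are hypothetical and are not the helpers used by the patch; the case folding covers only ASCII and Latin-1 to keep the example short.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 sequence at s into *cp and return the bytes consumed.
 * Malformed or truncated sequences fall back to a single byte. */
static size_t utf8_decode(const unsigned char *s, uint32_t *cp)
{
    size_t len = 1;
    uint32_t c = s[0];
    if (c >= 0xF0)      { len = 4; c &= 0x07; }
    else if (c >= 0xE0) { len = 3; c &= 0x0F; }
    else if (c >= 0xC0) { len = 2; c &= 0x1F; }
    for (size_t i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) { *cp = s[0]; return 1; }
        c = (c << 6) | (s[i] & 0x3F);
    }
    *cp = c;
    return len;
}

/* Case folding as a function call rather than a lookup table.
 * Assumption: only ASCII and Latin-1 uppercase letters are folded. */
static uint32_t cp_tolower(uint32_t cp)
{
    if (cp >= 'A' && cp <= 'Z') return cp + 32;
    if (cp >= 0xC0 && cp <= 0xDE && cp != 0xD7) return cp + 32;
    return cp;
}

/* Number of codepoints (not bytes) in a NUL-terminated UTF-8 string. */
size_t utf8_length(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t count = 0;
    uint32_t cp;
    while (*p) {
        p += utf8_decode(p, &cp);
        count++;
    }
    return count;
}

/* Case-insensitive subsequence test that compares codepoints, not bytes. */
int has_match_utf8(const char *needle, const char *haystack)
{
    const unsigned char *n = (const unsigned char *)needle;
    const unsigned char *h = (const unsigned char *)haystack;
    while (*n) {
        uint32_t nc, hc;
        n += utf8_decode(n, &nc);
        nc = cp_tolower(nc);
        for (;;) {
            if (!*h) return 0;            /* haystack exhausted: no match */
            h += utf8_decode(h, &hc);
            if (cp_tolower(hc) == nc) break;
        }
    }
    return 1;
}
```

For example, `utf8_length("na\xC3\xAFve")` counts five codepoints in a six-byte string, and `has_match_utf8("\xC3\xA9", "CAF\xC3\x89")` succeeds because `é` and `É` fold to the same codepoint.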

Performance

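Tested against a dataset of **90,000 Linux kernel filenames**. Results
(in milliseconds) show a **~2x performance improvement** over the
current fuzzy matching algorithm for most multi-character queries;
single-character queries such as `a`, `b`, and `c` take about the same
time with either algorithm.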

```
Search String                 Current ms FZY ms
-----------------------------------------------
init                          131.759    66.916
main                          83.688     40.861
sig                           98.348     39.699
index                         109.222    30.738
ab                            72.222     44.357
cd                            83.036     54.739
a                             58.94      62.242
b                             43.612     43.442
c                             64.39      67.442
k                             40.585     36.371
z                             34.708     22.781
w                             38.033     30.109
cpa                           82.596     38.116
arz                           84.251     23.964
zzzz                          35.823     22.75
dimag                         110.686    29.646
xa                            43.188     29.199
nha                           73.953     31.001
nedax                         94.775     29.568
dbue                          79.846     25.902
fp                            46.826     31.641
tr                            90.951     55.883
kw                            38.875     23.194
rp                            101.575    55.775
kkkkkkkkkkkkkkkkkkkkkkkkkkkkk 48.519     30.921
```

```vim
vim9script

# Dataset file with one filename per line (path is specific to the author's machine).
var haystack = readfile('/Users/gp/linux.files')

var needles = [
    'init', 'main', 'sig', 'index', 'ab', 'cd', 'a', 'b', 'c', 'k',
    'z', 'w', 'cpa', 'arz', 'zzzz', 'dimag', 'xa', 'nha', 'nedax', 'dbue',
    'fp', 'tr', 'kw', 'rp', 'kkkkkkkkkkkkkkkkkkkkkkkkkkkkk',
]

# Time one matchfuzzy() call per needle and log the elapsed milliseconds.
for needle in needles
    var start = reltime()
    var tmp = matchfuzzy(haystack, needle)
    echom $'{needle}' (start->reltime()->reltimefloat() * 1000)
endfor
```

Additional changes

* Removed the "camelcase" option from both `matchfuzzy()` and
  `matchfuzzypos()`, as it's now obsolete with the improved algorithm.

related: neovim/neovim#34101
fixes vim/vim#17531
closes: vim/vim#17900

7e0df5eee9

Co-authored-by: Girish Palya <girishji@gmail.com>