Problem: fuzzy-matching can be improved
Solution: Implement a better fuzzy matching algorithm
(Girish Palya)
Replace fuzzy matching algorithm with improved fzy-based implementation
The
[current](https://www.forrestthewoods.com/blog/reverse_engineering_sublime_texts_fuzzy_match/)
fuzzy matching algorithm has several accuracy issues:
* It struggles with CamelCase
* It fails to prioritize matches at the beginning of strings, often
ranking middle matches higher.
After evaluating alternatives (see my comments
[here](https://github.com/vim/vim/issues/17531#issuecomment-3112046897)
and
[here](https://github.com/vim/vim/issues/17531#issuecomment-3121593900)),
I chose to adopt the [fzy](https://github.com/jhawthorn/fzy) algorithm,
which:
* Resolves the aforementioned issues.
* Performs better.
Implementation details
This version is based on the original fzy
[algorithm](https://github.com/jhawthorn/fzy/blob/master/src/match.c),
with one key enhancement: **multibyte character support**.
* The original implementation supports only ASCII.
* This patch replaces ascii lookup tables with function calls, making it
compatible with multibyte character sets.
* Core logic (`match_row()` and `match_positions()`) remains faithful to
the original, but now operates on codepoints rather than single-byte
characters.
Performance
Tested against a dataset of **90,000 Linux kernel filenames**. Results
(in milliseconds) show a **\~2x performance improvement** over the
current fuzzy matching algorithm.
```
Search String Current Algo FZY Algo
-------------------------------------------------
init 131.759 66.916
main 83.688 40.861
sig 98.348 39.699
index 109.222 30.738
ab 72.222 44.357
cd 83.036 54.739
a 58.94 62.242
b 43.612 43.442
c 64.39 67.442
k 40.585 36.371
z 34.708 22.781
w 38.033 30.109
cpa 82.596 38.116
arz 84.251 23.964
zzzz 35.823 22.75
dimag 110.686 29.646
xa 43.188 29.199
nha 73.953 31.001
nedax 94.775 29.568
dbue 79.846 25.902
fp 46.826 31.641
tr 90.951 55.883
kw 38.875 23.194
rp 101.575 55.775
kkkkkkkkkkkkkkkkkkkkkkkkkkkkk 48.519 30.921
```
```vim
vim9script
var haystack = readfile('/Users/gp/linux.files')
var needles = ['init', 'main', 'sig', 'index', 'ab', 'cd', 'a', 'b',
'c', 'k',
'z', 'w', 'cpa', 'arz', 'zzzz', 'dimag', 'xa', 'nha', 'nedax',
'dbue',
'fp', 'tr', 'kw', 'rp', 'kkkkkkkkkkkkkkkkkkkkkkkkkkkkk']
for needle in needles
var start = reltime()
var tmp = matchfuzzy(haystack, needle)
echom $'{needle}' (start->reltime()->reltimefloat() * 1000)
endfor
```
Additional changes
* Removed the "camelcase" option from both matchfuzzy() and
matchfuzzypos(), as it's now obsolete with the improved algorithm.
related: neovim/neovim#34101fixesvim/vim#17531closes: vim/vim#179007e0df5eee9
Co-authored-by: Girish Palya <girishji@gmail.com>
Problem: negative matchfuzzy scores although there is a match
(Maxim Kim)
Solution: reset the score if a match has been found but the score is
negative (Girish Palya)
The fuzzy algorithm may miss some matches in long strings due to recursion
limits. As a result, the score can end up negative even when matches exist.
In such cases, reset the score to ensure it is non-negative.
fixes: #vim/vim#17449
closes: vim/vim#17469328332b0b0
Co-authored-by: Girish Palya <girishji@gmail.com>
Problem: Strange error with type for matchfuzzy() "camelcase".
Solution: Show the error "Invalid value for argument camelcase" instead
of "Invalid argument: camelcase" (zeertzjq).
Note that using tv_get_string() will lead to confusion, as when the
value cannot be converted to a string tv_get_string() will also give an
error about that, but "camelcase" takes a boolean, not a string. Also
don't use tv_get_string() for the "limit" argument above.
closes: vim/vim#16926c4815c157b
Problem: tests: typos in test_matchfuzzy.vim (after 9.1.1214).
Solution: Fix the typos. Consistently put the function call on the
second line in assertions for camelcase (zeertzjq).
closes: vim/vim#1690785627732e0
Problem: When searching for "Cur", CamelCase matches like "lCursor" score
higher than exact prefix matches like Cursor, which is
counter-intuitive (Maxim Kim).
Solution: Add a 'camelcase' option to matchfuzzy() that lets users disable
CamelCase bonuses when needed, making prefix matches rank higher.
(glepnir)
fixes: vim/vim#16504closes: vim/vim#1679728e40a7b55
Co-authored-by: glepnir <glephunter@gmail.com>
Problem: fuzzy-matching does not prefer full match
(Maxim Kim)
Solution: add additional score for a full match
(glepnir)
fixes: vim/vim#15654closes: vim/vim#163005a04999a74
Problem: Using uninitialized memory with fuzzy matching.
Solution: Initialize the arrays used to store match positions.
caf642c25d
Co-authored-by: Bram Moolenaar <Bram@vim.org>
Problem: Checks for Dictionary argument often give a vague error message.
Solution: Give a useful error message. (Yegappan Lakshmanan, closesvim/vim#11009)
04c4c5746e
Cherry-pick removal of E922 from docs from patch 9.0.1403.
Co-authored-by: Yegappan Lakshmanan <yegappan@yahoo.com>