vim-patch:9.1.1276: inline word diff treats multibyte chars as word char (#33323)

Problem:  inline word diff treats multibyte chars as word char
          (after 9.1.1243)
Solution: treat all non-alphanumeric characters as non-word characters
          (Yee Cheng Chin)

Previously, inline word diff simply used Vim's definition of keyword to
determine what is a word. As a result, multi-byte character classes such as
emojis and CJK (Chinese/Japanese/Korean) characters were all classified as
word characters, so entire sentences were grouped as a single word, which
does not provide meaningful information in a diff highlight.

Fix this by treating all non-alphanumeric characters (with class number
above 2) as non-word characters, as there is usually no benefit in using
word diff on them. These include CJK characters, emojis, and also
subscript/superscript numbers. Meanwhile, multi-byte characters like
Cyrillic and Greek letters will continue to be considered words.

Note that this is slightly inconsistent with how words are defined
elsewhere, as Vim usually considers any character with class >=2 to be a
"word".

related: vim/vim#16881 (diff inline highlight)
closes: vim/vim#17050

9aa120f7ad

Co-authored-by: Yee Cheng Chin <ychin.git@gmail.com>
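
The difference between the two rules described above can be sketched in isolation. This is a minimal illustration, not the code from the patch: `toy_class` and `count_word_runs` are hypothetical helpers, and the code point ranges are a small assumed subset standing in for Vim's real class tables. It only shows how "class >= 2" and "class == 2" group the same input differently.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical stand-in for Vim's character classes:
// 0 = blank, 1 = punctuation, 2 = word character, above 2 = other classes.
// The ranges are an illustrative subset, NOT Vim's actual tables.
static int toy_class(uint32_t c)
{
  if (c == ' ' || c == '\t') {
    return 0;
  }
  if (c < 0x80) {
    bool alnum = (c >= '0' && c <= '9') || (c >= 'A' && c <= 'Z')
                 || (c >= 'a' && c <= 'z') || c == '_';
    return alnum ? 2 : 1;
  }
  if (c >= 0x4E00 && c <= 0x9FFF) {
    return 0x4E00;  // CJK ideographs (assumed class value)
  }
  if (c >= 0x1F300 && c <= 0x1FAFF) {
    return 3;       // emoji, approximate range (assumed class value)
  }
  if (c >= 0x2070 && c <= 0x209F) {
    return 0x2070;  // superscripts and subscripts (assumed class value)
  }
  return 2;         // everything else, e.g. Cyrillic and Greek letters
}

// Count maximal runs of "word" code points.
// only_class2 == false: any class >= 2 is a word character (old behavior).
// only_class2 == true:  only class 2 is a word character (new behavior).
static size_t count_word_runs(const uint32_t *cps, size_t n, bool only_class2)
{
  size_t runs = 0;
  bool in_word = false;
  for (size_t i = 0; i < n; i++) {
    int cls = toy_class(cps[i]);
    bool is_word = only_class2 ? (cls == 2) : (cls >= 2);
    if (is_word && !in_word) {
      runs++;  // a new word starts here
    }
    in_word = is_word;
  }
  return runs;
}
```

For the code points `'a'`, U+4E16, `'b'`, the ">= 2" rule sees one word run because the ideograph glues the letters together, while the "== 2" rule sees two separate runs.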
2026-07-15 22:00:40 +00:00 · 2025-04-05 09:42:00 +08:00
parent 1e1384b6dd
commit e8785c2e94
7 changed files with 45 additions and 10 deletions
--- a/src/nvim/diff.c
+++ b/src/nvim/diff.c
@@ -2990,10 +2990,15 @@ static void diff_find_change_inline_diff(diff_T *dp)

      char *s = curline;
      while (*s != NUL) {
-        // Always use the first buffer's 'iskeyword' to have a consistent diff
        bool new_in_keyword = false;
        if (diff_flags & DIFF_INLINE_WORD) {
-          new_in_keyword = vim_iswordp_buf(s, curtab->tp_diffbuf[file1_idx]);
+          // Always use the first buffer's 'iskeyword' to have a
+          // consistent diff.
+          // For multibyte chars, only treat alphanumeric chars
+          // (class 2) as "word", as other classes such as emojis and
+          // CJK ideographs do not usually benefit from word diff as
+          // Vim doesn't have a good way to segment them.
+          new_in_keyword = (mb_get_class_tab(s, curtab->tp_diffbuf[file1_idx]->b_chartab) == 2);
        }
        if (in_keyword && !new_in_keyword) {
          ga_append(curstr, NL);