feat(mbyte): support extended grapheme clusters including more emoji

Use the grapheme break algorithm from utf8proc to support grapheme
clusters from recent Unicode versions.
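
As a rough illustration of what cluster segmentation means here, the sketch below counts clusters in a UTF-8 string using a deliberately simplified rule set. It is not the utf8proc algorithm and not Nvim's code: `decode_utf8`, `toy_is_extend` and `toy_count_clusters` are hypothetical names, the "extend" set covers only U+0300..U+036F, VS16 and ZWJ, and the ZWJ rule is looser than UAX#29 (which joins only pictographic codepoints). The one Nvim-specific detail it mirrors is that an ASCII char always starts a new cluster.

```c
#include <stddef.h>
#include <stdint.h>

// Decode one UTF-8 sequence at p into *cp and return its byte length.
// Assumes well-formed input; this sketch does no validation.
static size_t decode_utf8(const char *p, uint32_t *cp)
{
  const unsigned char *s = (const unsigned char *)p;
  if (s[0] < 0x80) {
    *cp = s[0];
    return 1;
  }
  if ((s[0] & 0xE0) == 0xC0) {
    *cp = ((uint32_t)(s[0] & 0x1F) << 6) | (uint32_t)(s[1] & 0x3F);
    return 2;
  }
  if ((s[0] & 0xF0) == 0xE0) {
    *cp = ((uint32_t)(s[0] & 0x0F) << 12) | ((uint32_t)(s[1] & 0x3F) << 6)
          | (uint32_t)(s[2] & 0x3F);
    return 3;
  }
  *cp = ((uint32_t)(s[0] & 0x07) << 18) | ((uint32_t)(s[1] & 0x3F) << 12)
        | ((uint32_t)(s[2] & 0x3F) << 6) | (uint32_t)(s[3] & 0x3F);
  return 4;
}

// Toy "extends the previous character" test (assumption: a tiny subset of
// the real Unicode property data).
static int toy_is_extend(uint32_t cp)
{
  return (cp >= 0x0300 && cp <= 0x036F)  // combining diacritical marks
         || cp == 0xFE0F                 // VS16
         || cp == 0x200D;                // ZWJ (zero width joiner)
}

// Count clusters in a NUL-terminated UTF-8 string under the toy rules.
static size_t toy_count_clusters(const char *s)
{
  size_t clusters = 0;
  uint32_t prev = 0;
  int have_prev = 0;
  while (*s != '\0') {
    uint32_t cp;
    s += decode_utf8(s, &cp);
    // A codepoint joins the previous cluster if it is non-ASCII and either
    // extends the previous character or directly follows a ZWJ.
    int joins = have_prev && cp >= 0x80
                && (toy_is_extend(cp) || prev == 0x200D);
    if (!joins) {
      clusters++;
    }
    prev = cp;
    have_prev = 1;
  }
  return clusters;
}
```

Under these toy rules, two emoji joined by a ZWJ count as one cluster, while the same two emoji without the ZWJ count as two.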

Handle variation selector VS16 (U+FE0F) turning some codepoints into
double-width emoji. Since the width now depends on the codepoint that
follows, we need to use ptr2cells rather than char2cells when possible.
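
To see why a per-codepoint lookup is not enough, here is a minimal sketch of the two lookup styles. It is not Nvim's implementation: `toy_char_width` and `toy_ptr_width` are hypothetical stand-ins for the char2cells and ptr2cells styles, and the width table and "text emoji" set are toy assumptions chosen for the example.

```c
#include <stddef.h>
#include <stdint.h>

#define VS16 0xFE0F  // U+FE0F VARIATION SELECTOR-16

// Decode one UTF-8 sequence at p into *cp and return its byte length.
// Assumes well-formed input; this sketch does no validation.
static size_t decode_utf8(const char *p, uint32_t *cp)
{
  const unsigned char *s = (const unsigned char *)p;
  if (s[0] < 0x80) {
    *cp = s[0];
    return 1;
  }
  if ((s[0] & 0xE0) == 0xC0) {
    *cp = ((uint32_t)(s[0] & 0x1F) << 6) | (uint32_t)(s[1] & 0x3F);
    return 2;
  }
  if ((s[0] & 0xF0) == 0xE0) {
    *cp = ((uint32_t)(s[0] & 0x0F) << 12) | ((uint32_t)(s[1] & 0x3F) << 6)
          | (uint32_t)(s[2] & 0x3F);
    return 3;
  }
  *cp = ((uint32_t)(s[0] & 0x07) << 18) | ((uint32_t)(s[1] & 0x3F) << 12)
        | ((uint32_t)(s[2] & 0x3F) << 6) | (uint32_t)(s[3] & 0x3F);
  return 4;
}

// Per-codepoint lookup: it sees a single codepoint and nothing after it.
// Toy table (assumption): U+1F600..U+1F64F are wide, everything else narrow.
static int toy_char_width(uint32_t cp)
{
  return (cp >= 0x1F600 && cp <= 0x1F64F) ? 2 : 1;
}

// Toy "text emoji" set (assumption): narrow by default, widened by VS16.
static int toy_is_text_emoji(uint32_t cp)
{
  return cp == 0x2764     // HEAVY BLACK HEART
         || cp == 0x2600; // BLACK SUN WITH RAYS
}

// Pointer-based lookup: it can peek at the next codepoint in the buffer,
// so it can notice a trailing VS16 and report the wider width.
static int toy_ptr_width(const char *p)
{
  uint32_t cp;
  size_t len = decode_utf8(p, &cp);
  int width = toy_char_width(cp);
  if (width == 1 && toy_is_text_emoji(cp) && p[len] != '\0') {
    uint32_t next;
    decode_utf8(p + len, &next);
    if (next == VS16) {
      width = 2;
    }
  }
  return width;
}
```

With these toy tables, `toy_char_width(0x2764)` can only ever answer 1, whereas `toy_ptr_width` answers 1 for a bare U+2764 and 2 when the same codepoint is followed by U+FE0F in the buffer.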
bfredl
2024-08-08 10:42:08 +02:00
parent 4353996d0f
commit cfdf68a7ac
34 changed files with 657 additions and 221 deletions

@@ -646,6 +646,12 @@ widespread as file format.
A composing or combining character is used to change the meaning of the
character before it. The combining characters are drawn on top of the
preceding character.
Nvim largely follows the definition of extended grapheme clusters in UAX#29
in the Unicode standard, with some modifications: An ASCII char will always
start a new cluster. In addition, 'arabicshape' enables the combining of some
Arabic letters when they are shaped to be displayed together in a single cell.
Combined characters that are too large cannot be displayed, but they can still
be inspected using the |g8| and |ga| commands described below.
When editing text a composing character is mostly considered part of the

@@ -200,6 +200,12 @@ These existing features changed their behavior.
top lines are calculated using screen line numbers which take virtual lines
into account.
• The implementation of grapheme clusters (or combining chars |mbyte-combining|)
was upgraded to closely follow extended grapheme clusters as defined by UAX#29
in the Unicode standard. Notably, this enables proper display of many more
emoji characters than before, including those encoded as multiple emoji
codepoints joined by ZWJ (zero width joiner) codepoints.
==============================================================================
REMOVED FEATURES *news-removed*

@@ -2217,9 +2217,12 @@ A jump table for the options with a short description can be found at |Q_op|.
global
When on, all Unicode emoji characters are considered to be full width.
This excludes "text emoji" characters, which are normally displayed as
single width. Unfortunately there is no good specification for this
and it has been determined on a trial-and-error basis. Use the
|setcellwidths()| function to change the behavior.
single width. However, such "text emoji" are treated as full-width
emoji if they are followed by the U+FE0F variation selector.
Unfortunately there is no good specification for this and it has been
determined on a trial-and-error basis. Use the |setcellwidths()|
function to change the behavior.
*'encoding'* *'enc'*
'encoding' 'enc' string (default "utf-8")

@@ -1829,9 +1829,12 @@ vim.go.ead = vim.go.eadirection
--- When on, all Unicode emoji characters are considered to be full width.
--- This excludes "text emoji" characters, which are normally displayed as
--- single width. Unfortunately there is no good specification for this
--- and it has been determined on a trial-and-error basis. Use the
--- `setcellwidths()` function to change the behavior.
--- single width. However, such "text emoji" are treated as full-width
--- emoji if they are followed by the U+FE0F variation selector.
---
--- Unfortunately there is no good specification for this and it has been
--- determined on a trial-and-error basis. Use the `setcellwidths()`
--- function to change the behavior.
---
--- @type boolean
vim.o.emoji = true