Problem: Wrong cursor position after using setcellwidths().
Solution: Invalidate cursor position in addition to redrawing.
(zeertzjq)
closes: vim/vim#1454505aacec6ab
Reorder functions in test_utf8.vim to match upstream.
Problem: Patch 9.1.0296 causes too many issues
(Tony Mechelynck, chdiza, CI)
Solution: Back out the change for now
Revert "patch 9.1.0296: regexp: engines do not handle case-folding well"
This reverts commit 7a27c108e0509f3255ebdcb6558e896c223e4d23 it causes
issues with syntax highlighting and breaks the FreeBSD and MacOS CI. It
needs more work.
fixes: vim/vim#14487c97f4d61cd
Co-authored-by: Christian Brabandt <cb@256bit.org>
Problem: Regex engines do not handle case-folding well
Solution: Correctly calculate byte length of characters to skip
When the regexp engine compares two utf-8 codepoints case insensitively
it may match an adjacent character, because it assumes it can step over
as many bytes as the pattern contains.
This however is not necessarily true because of case-folding, a
multi-byte UTF-8 character can be considered equal to some single-byte
value.
Let's consider the pattern 'ſ' and the string 's'. When comparing and
ignoring case, the single character 's' matches, and since it matches
Vim will try to step over the match (by the amount of bytes of the
pattern), assuming that since it matches, the length of both strings is
the same.
However in that case, it should only step over the single byte
value 's' so by 1 byte and try to start matching after it again. So for the
backtracking engine we need to ensure:
- we try to match the correct length for the pattern and the text
- in case of a match, we step over it correctly
The same thing can happen for the NFA engine, when skipping to the next
character to test for a match. We are skipping over the regstart
pointer, however we do not consider the case that because of
case-folding we may need to adjust the number of bytes to skip over. So
this needs to be adjusted in find_match_text() as well.
A related issue turned out, when prog->match_text is actually empty. In
that case we should try to find the next match and skip this condition.
fixes: vim/vim#14294closes: vim/vim#144337a27c108e0
Co-authored-by: Christian Brabandt <cb@256bit.org>
Problems:
- Illegal bytes after valid UTF-8 char cause utf_cp_*_off() to fail.
- When stream isn't NUL-terminated, utf_cp_*_off() may go over the end.
Solution: Don't go over end of the char of end of the string.
Problem: upper-case of ß should be U+1E9E (CAPITAL LETTER SHARP S)
(fenuks)
Solution: Make gU, ~ and g~ convert the U+00DF LATIN SMALL LETTER SHARP S (ß)
to U+1E9E LATIN CAPITAL LETTER SHARP S (ẞ), update tests
(glepnir)
This is part of Unicode 5.1.0 from April 2008, so should be fairly safe
to use now and since 2017 is part of the German standard orthography,
according to Wikipedia:
https://en.wikipedia.org/wiki/Capital_%E1%BA%9E#cite_note-auto-12
There is however one exception: UnicodeData.txt for U+00DF
LATIN SMALL LETTER SHARP S does NOT define U+1E9E LATIN CAPITAL LETTER
SHARP S as its upper case version. Therefore, toupper() won't be able
to convert from lower sharp s to upper case sharp s (the other way
around however works, since U+00DF is considered the lower case
character of U+1E9E and therefore tolower() works correctly for the
upper case version).
fixes: vim/vim#5573closes: vim/vim#14018bd1232a1fa
Co-authored-by: glepnir <glephunter@gmail.com>
Problem: qsort() comparison functions should be transitive
Solution: Do not subtract values, but rather use explicit comparisons
Improve qsort() comparison functions
There has been a recent report on qsort() causing out-of-bounds read &
write in glibc for non transitive comparison functions
https://www.qualys.com/2024/01/30/qsort.txt
Even so the bug is in glibc's implementation of the qsort() algorithm,
it's bad style to just use substraction for the comparison functions,
which may cause overflow issues and as hinted at in OpenBSD's manual
page for qsort(): "It is almost always an error to use subtraction to
compute the return value of the comparison function."
So check the qsort() comparison functions and change them to be safe.
closes: vim/vim#13980e06e437665
Co-authored-by: Christian Brabandt <cb@256bit.org>
`utf_char2cells()` calls `utf_printable()` twice (sometimes indirectly,
through `vim_isprintc()`) for characters >= 128. The function can be
refactored to call to it only once.
`utf_printable()` uses binary search on ranges of unprintable characters
to determine if a given character is printable. Since there are only 9
ranges, and the first range contains only one character, binary search
can be replaced with SSE2 SIMD comparisons that check 8 ranges at a
time, and the first range is checked separately. SSE2 is enabled by
default in GCC, Clang and MSVC for x86-64.
Add 3-byte utf-8 to screenpos_spec benchmarks.
The optimized virtual column calculation loop in getvcol()
was decoding the current character twice: once in ptr2cells()
and the second time in utfc_ptr2len(). For combining charcters, they were
decoded up to 2 times in utfc_ptr2len(). Additionally, the function used to
decode the character could be further optimised.
Remove `export` pramgas from defs headers as it causes IWYU to believe
that the definitions from the defs headers comes from main header, which
is not what we really want.
Problem: buffer text with composing chars are converted from UTF-8
to an array of up to seven UTF-32 values and then converted back
to UTF-8 strings.
Solution: Convert buffer text directly to UTF-8 based schar_T values.
The limit of the text size is now in schar_T bytes, which is currently
31+1 but easily could be raised as it no longer multiplies the size
of the entire screen grid when not used, the full size is only required
for temporary scratch buffers.
Also does some general cleanup to win_line text handling, which was
unnecessarily complicated due to multibyte rendering being an "opt-in"
feature long ago. Nowadays, a char is just a char, regardless if it consists
of one ASCII byte or multiple bytes.
We already have an extensive suite of static analysis tools we use,
which causes a fair bit of redundancy as we get duplicate warnings. PVS
is also prone to give false warnings which creates a lot of work to
identify and disable.
long is 32 bits on windows, while it is 64 bits on other architectures.
This makes the type suboptimal for a codebase meant to be
cross-platform. Replace it with more appropriate integer types.
long is 32 bits on windows, while it is 64 bits on other architectures.
This makes the type suboptimal for a codebase meant to be
cross-platform. Replace it with more appropriate integer types.
Problem: cannot complete option values
Solution: Add completion functions for several options
Add cmdline tab-completion for setting string options
Add tab-completion for setting string options on the cmdline using
`:set=` (along with `:set+=` and `:set-=`).
The existing tab completion for setting options currently only works
when nothing is typed yet, and it only fills in with the existing value,
e.g. when the user does `:set diffopt=<Tab>` it will be completed to
`set diffopt=internal,filler,closeoff` and nothing else. This isn't too
useful as a user usually wants auto-complete to suggest all the possible
values, such as 'iblank', or 'algorithm:patience'.
For set= and set+=, this adds a new optional callback function for each
option that can be invoked when doing completion. This allows for each
option to have control over how completion works. For example, in
'diffopt', it will suggest the default enumeration, but if `algorithm:`
is selected, it will further suggest different algorithm types like
'meyers' and 'patience'. When using set=, the existing option value will
be filled in as the first choice to preserve the existing behavior. When
using set+= this won't happen as it doesn't make sense.
For flag list options (e.g. 'mouse' and 'guioptions'), completion will
take into account existing typed values (and in the case of set+=, the
existing option value) to make sure it doesn't suggest duplicates.
For set-=, there is a new `ExpandSettingSubtract` function which will
handle flag list and comma-separated options smartly, by only suggesting
values that currently exist in the option.
Note that Vim has some existing code that adds special handling for
'filetype', 'syntax', and misc dir options like 'backupdir'. This change
preserves them as they already work, instead of converting to the new
callback API for each option.
closes: vim/vim#13182900894b09a
Co-authored-by: Yee Cheng Chin <ychin.git@gmail.com>
- Move vimoption_T to option.h
- option_defs.h is for option-related types
- option_vars.h corresponds to Vim's option.h
- option_defs.h and option_vars.h don't include each other
ml_get_buf() takes a third parameters to indicate whether the
caller wants to mutate the memline data in place. However
the vast majority of the call sites is using this function
just to specify a buffer but without any mutation. This makes
it harder to grep for the places which actually perform mutation.
Solution: Remove the bool param from ml_get_buf(). it now works
like ml_get() except for a non-current buffer. Add a new
ml_get_buf_mut() function for the mutating use-case, which can
be grepped along with the other ml_replace() etc functions which
can modify the memline.
Problem: Functions for string manipulation are spread out.
Solution: Move string related functions to a new source file. (Yegappan
Lakshmanan, closesvim/vim#8470)
a2438132a6
Co-authored-by: Yegappan Lakshmanan <yegappan@yahoo.com>
libnvim couldn't be easily used in C++ due to the use of reserved keywords.
Additionally, add explicit casts to *alloc function calls used in inline
functions, as C++ doesn't allow implicit casts from void pointers.
drawscreen.c vs screen.c makes absolutely no sense.
The screen exists only to draw upon it, therefore helper functions
are distributed randomly between screen.c and the file that
does the redrawing. In addition screen.c does a lot of drawing on the
screen.
It made more sense for vim/vim as our grid.c is their screen.c
Not sure if we want to dump all the code for option chars into
optionstr.c, so keep these in a optionchar.c for now.