Mirror of https://github.com/neovim/neovim.git (synced 2025-09-06 19:38:20 +00:00)
encoding: update documentation
@@ -1029,8 +1029,8 @@ A string constant accepts these special characters:
 \x. byte specified with one hex number (must be followed by non-hex char)
 \X.. same as \x..
 \X. same as \x.
-\u.... character specified with up to 4 hex numbers, stored according to the
-current value of 'encoding' (e.g., "\u02a4")
+\u.... character specified with up to 4 hex numbers, stored as UTF-8
+(e.g., "\u02a4")
 \U.... same as \u but allows up to 8 hex numbers.
 \b backspace <BS>
 \e escape <Esc>
@@ -1045,8 +1045,7 @@ A string constant accepts these special characters:
 utf-8 character, use \uxxxx as mentioned above.

 Note that "\xff" is stored as the byte 255, which may be invalid in some
-encodings. Use "\u00ff" to store character 255 according to the current value
-of 'encoding'.
+encodings. Use "\u00ff" to store character 255 correctly as UTF-8.

 Note that "\000" and "\x00" force the end of the string.

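
Reviewer note on the two string-constant hunks above: a minimal sketch of the difference between "\xff" and "\u00ff", assuming an Nvim build in which strings are always UTF-8 as this commit documents. The variable names `raw` and `utf` are made up for the illustration.

```vim
" "\xff" stores one raw byte, "\u00ff" stores U+00FF encoded as UTF-8.
let raw = "\xff"
let utf = "\u00ff"

" Length in bytes of each string.
echo strlen(raw)
echo strlen(utf)

" Character value of the UTF-8 encoded string.
echo char2nr(utf)
```

The raw form is one byte long, while the "\u" form occupies two bytes (0xC3 0xBF) and reads back as character 255.
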
@@ -2532,8 +2531,6 @@ byteidxcomp({expr}, {nr}) *byteidxcomp()*
 < The first and third echo result in 3 ('e' plus composing
 character is 3 bytes), the second echo results in 1 ('e' is
 one byte).
-Only works different from byteidx() when 'encoding' is set to
-a Unicode encoding.

 call({func}, {arglist} [, {dict}]) *call()* *E699*
 Call function {func} with the items in |List| {arglist} as
@@ -2568,11 +2565,11 @@ char2nr({expr}[, {utf8}]) *char2nr()*
 Return number value of the first char in {expr}. Examples: >
 char2nr(" ") returns 32
 char2nr("ABC") returns 65
-< When {utf8} is omitted or zero, the current 'encoding' is used.
-Example for "utf-8": >
 char2nr("á") returns 225
 char2nr("á"[0]) returns 195
-< With {utf8} set to 1, always treat as utf-8 characters.
+< Non-ASCII characters are always treated as UTF-8 characters.
+{utf8} has no effect, and exists only for
+backwards-compatibility.
 A combining character is a separate character.
 |nr2char()| does the opposite.

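
Reviewer note on the char2nr() hunk above: a small sketch of the documented behaviour, assuming UTF-8 is always used internally. The variable `ch` is hypothetical.

```vim
" U+00E1 (a with acute) is stored as the two bytes 0xC3 0xA1.
let ch = "\u00e1"

" Value of the whole character, then of its first byte only.
echo char2nr(ch)
echo char2nr(ch[0])

" The {utf8} argument is accepted but should make no difference.
echo char2nr(ch, 0) == char2nr(ch, 1)

" nr2char() is the inverse operation.
echo nr2char(char2nr(ch)) ==# ch
```
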
@@ -4225,11 +4222,7 @@ iconv({expr}, {from}, {to}) *iconv()*
 Most conversions require Vim to be compiled with the |+iconv|
 feature. Otherwise only UTF-8 to latin1 conversion and back
 can be done.
-This can be used to display messages with special characters,
-no matter what 'encoding' is set to. Write the message in
-UTF-8 and use: >
-echo iconv(utf8_str, "utf-8", &enc)
-< Note that Vim uses UTF-8 for all Unicode encodings, conversion
+Note that Vim uses UTF-8 for all Unicode encodings, conversion
 from/to UCS-2 is automatically changed to use UTF-8. You
 cannot use UCS-2 in a string anyway, because of the NUL bytes.
 {only available when compiled with the |+multi_byte| feature}
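
Reviewer note on the iconv() hunk above: with the 'encoding'-based example removed, a sketch of the conversion the help text says is always available (UTF-8 to latin1 and back) might look like this. The variable names are illustrative only.

```vim
" "café" in UTF-8: the last character takes two bytes.
let utf8_str = "caf\u00e9"

" Convert to latin1, where every character is a single byte.
let latin1_str = iconv(utf8_str, "utf-8", "latin1")
echo strlen(utf8_str)
echo strlen(latin1_str)

" Converting back should give the original UTF-8 string.
echo iconv(latin1_str, "latin1", "utf-8") ==# utf8_str
```
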
@@ -4513,9 +4506,7 @@ join({list} [, {sep}]) *join()*
 json_decode({expr}) *json_decode()*
 Convert {expr} from JSON object. Accepts |readfile()|-style
 list as the input, as well as regular string. May output any
-Vim value. When 'encoding' is not UTF-8 string is converted
-from UTF-8 to 'encoding', failing conversion fails
-json_decode(). In the following cases it will output
+Vim value. In the following cases it will output
 |msgpack-special-dict|:
 1. Dictionary contains duplicate key.
 2. Dictionary contains empty key.
@@ -4523,33 +4514,22 @@ json_decode({expr}) *json_decode()*
 dictionary and for string will be emitted in case string
 with NUL byte was a dictionary key.

-Note: function treats its input as UTF-8 always regardless of
-'encoding' value. This is needed because JSON source is
-supposed to be external (e.g. |readfile()|) and JSON standard
-allows only a few encodings, of which UTF-8 is recommended and
-the only one required to be supported. Non-UTF-8 characters
-are an error.
+Note: function treats its input as UTF-8 always. The JSON
+standard allows only a few encodings, of which UTF-8 is
+recommended and the only one required to be supported.
+Non-UTF-8 characters are an error.

 json_encode({expr}) *json_encode()*
 Convert {expr} into a JSON string. Accepts
-|msgpack-special-dict| as the input. Converts from 'encoding'
-to UTF-8 when encoding strings. Will not convert |Funcref|s,
+|msgpack-special-dict| as the input. Will not convert |Funcref|s,
 mappings with non-string keys (can be created as
 |msgpack-special-dict|), values with self-referencing
 containers, strings which contain non-UTF-8 characters,
 pseudo-UTF-8 strings which contain codepoints reserved for
 surrogate pairs (such strings are not valid UTF-8 strings).
-When converting 'encoding' is taken into account, if it is not
-"utf-8", then conversion is performed before encoding strings.
 Non-printable characters are converted into "\u1234" escapes
 or special escapes like "\t", other are dumped as-is.
-
-Note: all characters above U+0079 are considered non-printable
-when 'encoding' is not UTF-8. This function always outputs
-UTF-8 strings as required by the standard thus when 'encoding'
-is not unicode resulting string will look incorrect if
-"\u1234" notation is not used.

 keys({dict}) *keys()*
 Return a |List| with all the keys of {dict}. The |List| is in
 arbitrary order.
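
Reviewer note on the json_decode()/json_encode() hunks above: a brief sketch of the UTF-8-only behaviour they describe, assuming Nvim's built-in JSON functions. The variables `decoded` and `encoded` are hypothetical.

```vim
" A JSON "\u00e9" escape decodes to the UTF-8 string for U+00E9.
let decoded = json_decode('"\u00e9"')
echo decoded ==# "\u00e9"
echo strlen(decoded)

" A tab is emitted as the special escape \t.
let encoded = json_encode("a\tb")
echo encoded

" Printable non-ASCII text should survive a round trip unchanged.
echo json_decode(json_encode(decoded)) ==# decoded
```
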
@@ -4651,9 +4631,9 @@ line2byte({lnum}) *line2byte()*
 Return the byte count from the start of the buffer for line
 {lnum}. This includes the end-of-line character, depending on
 the 'fileformat' option for the current buffer. The first
-line returns 1. 'encoding' matters, 'fileencoding' is ignored.
-This can also be used to get the byte count for the line just
-below the last line: >
+line returns 1. UTF-8 encoding is used, 'fileencoding' is
+ignored. This can also be used to get the byte count for the
+line just below the last line: >
 line2byte(line("$") + 1)
 < This is the buffer size plus one. If 'fileencoding' is empty
 it is the file size plus one.
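
Reviewer note on the line2byte() hunk above: a sketch of how the byte counts follow from UTF-8 storage, assuming a scratch window with 'fileformat' set to "unix" (one end-of-line byte per line). The buffer contents and variable names are made up for the example.

```vim
" Scratch buffer with a two-byte character on line 1 and "x" on line 2.
new
setlocal fileformat=unix
call setline(1, ["\u00e9", "x"])

" Line 1 starts at byte 1; line 2 starts after 2 bytes plus the newline.
let first = line2byte(1)
let second = line2byte(2)

" One past the last line: buffer size (5 bytes) plus one.
let past_end = line2byte(line("$") + 1)
echo [first, second, past_end]

bwipeout!
```
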
@@ -5172,10 +5152,10 @@ nr2char({expr}[, {utf8}]) *nr2char()*
 value {expr}. Examples: >
 nr2char(64) returns "@"
 nr2char(32) returns " "
-< When {utf8} is omitted or zero, the current 'encoding' is used.
-Example for "utf-8": >
+< Example for "utf-8": >
 nr2char(300) returns I with bow character
-< With {utf8} set to 1, always return utf-8 characters.
+< UTF-8 encoding is always used, {utf8} option has no effect,
+and exists only for backwards-compatibility.
 Note that a NUL character in the file is specified with
 nr2char(10), because NULs are represented with newline
 characters. nr2char(0) is a real NUL and terminates the
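
Reviewer note on the nr2char() hunk above: a short sketch of the "nr2char(300)" example from the help text, assuming UTF-8 is always used. The variable `ch` is illustrative.

```vim
" 300 is 0x12C, LATIN CAPITAL LETTER I WITH BREVE.
let ch = nr2char(300)
echo ch ==# "\u012c"

" The character is stored as two UTF-8 bytes.
echo strlen(ch)

" The {utf8} argument should not change the result.
echo nr2char(300, 0) ==# nr2char(300, 1)
```
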
@@ -5417,7 +5397,7 @@ py3eval({expr}) *py3eval()*
 converted to Vim data structures.
 Numbers and strings are returned as they are (strings are
 copied though, Unicode strings are additionally converted to
-'encoding').
+UTF-8).
 Lists are represented as Vim |List| type.
 Dictionaries are represented as Vim |Dictionary| type with
 keys converted to strings.
@@ -5467,8 +5447,7 @@ readfile({fname} [, {binary} [, {max}]])
 Otherwise:
 - CR characters that appear before a NL are removed.
 - Whether the last line ends in a NL or not does not matter.
-- When 'encoding' is Unicode any UTF-8 byte order mark is
-removed from the text.
+- Any UTF-8 byte order mark is removed from the text.
 When {max} is given this specifies the maximum number of lines
 to be read. Useful if you only want to check the first ten
 lines of a file: >
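
Reviewer note on the readfile() hunk above: a sketch of the byte order mark handling it describes, using a temporary file. The names `fname`, `text` and `raw` are hypothetical.

```vim
" Write a line that starts with the UTF-8 byte order mark (EF BB BF).
let fname = tempname()
call writefile(["\xef\xbb\xbfhello"], fname)

" In the default (non-binary) mode the BOM is dropped.
let text = readfile(fname)

" In binary mode the bytes are returned untouched.
let raw = readfile(fname, "b")

echo text
echo strlen(raw[0])
call delete(fname)
```
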
@@ -6621,8 +6600,7 @@ string({expr}) Return {expr} converted to a String. If {expr} is a Number,
 for infinite and NaN floating-point values representations
 which use |str2float()|. Strings are also dumped literally,
 only single quote is escaped, which does not allow using YAML
-for parsing back binary strings (including text when
-'encoding' is not UTF-8). |eval()| should always work for
+for parsing back binary strings. |eval()| should always work for
 strings and floats though and this is the only official
 method, use |msgpackdump()| or |json_encode()| if you need to
 share data with other application.

@@ -70,29 +70,24 @@ See |mbyte-locale| for details.

 ENCODING

-If your locale works properly, Vim will try to set the 'encoding' option
-accordingly. If this doesn't work you can overrule its value: >
+Nvim always uses UTF-8 internally. Thus 'encoding' option is always set
+to "utf-8" and cannot be changed.

-:set encoding=utf-8
+All the text that is used inside Vim will be in UTF-8. Not only the text in
+the buffers, but also in registers, variables, etc.

-See |encoding-values| for a list of acceptable values.
-
-The result is that all the text that is used inside Vim will be in this
-encoding. Not only the text in the buffers, but also in registers, variables,
-etc. 'encoding' is read-only after startup because changing it would make the
-existing text invalid.
-
-You can edit files in another encoding than what 'encoding' is set to. Vim
+You can edit files in different encodings than UTF-8. Nvim
 will convert the file when you read it and convert it back when you write it.
 See 'fileencoding', 'fileencodings' and |++enc|.


 DISPLAY AND FONTS

-If you are working in a terminal (emulator) you must make sure it accepts the
-same encoding as which Vim is working with.
+If you are working in a terminal (emulator) you must make sure it accepts
+UTF-8, the encoding which Vim is working with. Otherwise only ASCII can
+be displayed and edited correctly.

-For the GUI you must select fonts that work with the current 'encoding'. This
+For the GUI you must select fonts that work with UTF-8. This
 is the difficult part. It depends on the system you are using, the locale and
 a few other things. See the chapters on fonts: |mbyte-fonts-X11| for
 X-Windows and |mbyte-fonts-MSwin| for MS-Windows.
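
Reviewer note on the ENCODING hunk above: a sketch of editing a file stored in another encoding while the text inside the editor stays UTF-8, using the |++enc| argument mentioned in the help text. The temporary file and the variables `line` and `fenc` are made up for the illustration.

```vim
" Create a file holding "café" in latin1 (0xE9 is a single byte there).
let fname = tempname()
call writefile(["caf\xe9"], fname)

" Open it in a new window, telling the editor the file is latin1.
execute "new ++enc=latin1 " . fnameescape(fname)

" Inside the buffer the text is UTF-8; 'fileencoding' remembers latin1,
" so writing the buffer would convert it back.
let line = getline(1)
let fenc = &fileencoding
echo line ==# "caf\u00e9"
echo fenc

bwipeout!
call delete(fname)
```
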
@@ -216,10 +211,9 @@ You could make a small shell script for this.
 ==============================================================================
 3. Encoding *mbyte-encoding*

-Vim uses the 'encoding' option to specify how characters are identified and
-encoded when they are used inside Vim. This applies to all the places where
-text is used, including buffers (files loaded into memory), registers and
-variables.
+In Nvim UTF-8 is always used internally to encode characters.
+This applies to all the places where text is used, including buffers (files
+loaded into memory), registers and variables.

 *charset* *codeset*
 Charset is another name for encoding. There are subtle differences, but these
@@ -240,7 +234,7 @@ matter what language is used. Thus you might see the right text even when the
 encoding was set wrong.

 *encoding-names*
-Vim can use many different character encodings. There are three major groups:
+Vim can edit files in different character encodings. There are three major groups:

 1 8bit Single-byte encodings, 256 different characters. Mostly used
 in USA and Europe. Example: ISO-8859-1 (Latin1). All
@@ -255,11 +249,10 @@ u Unicode Universal encoding, can replace all others. ISO 10646.
 Millions of different characters. Example: UTF-8. The
 relation between bytes and screen cells is complex.

-Other encodings cannot be used by Vim internally. But files in other
+Only UTF-8 is used by Vim internally. But files in other
 encodings can be edited by using conversion, see 'fileencoding'.
-Note that all encodings must use ASCII for the characters up to 128.

-Supported 'encoding' values are: *encoding-values*
+Recognized 'fileencoding' values include: *encoding-values*
 1 latin1 8-bit characters (ISO 8859-1, also used for cp1252)
 1 iso-8859-n ISO_8859 variant (n = 2 to 15)
 1 koi8-r Russian
@@ -311,11 +304,11 @@ u ucs-4 32 bit UCS-4 encoded Unicode (ISO/IEC 10646-1)
 u ucs-4le like ucs-4, little endian

 The {name} can be any encoding name that your system supports. It is passed
-to iconv() to convert between the encoding of the file and the current locale.
+to iconv() to convert between UTF-8 and the encoding of the file.
 For MS-Windows "cp{number}" means using codepage {number}.
 Examples: >
-:set encoding=8bit-cp1252
-:set encoding=2byte-cp932
+:set fileencoding=8bit-cp1252
+:set fileencoding=2byte-cp932

 The MS-Windows codepage 1252 is very similar to latin1. For practical reasons
 the same encoding is used and it's called latin1. 'isprint' can be used to
@@ -337,8 +330,7 @@ u ucs-2be same as ucs-2 (big endian)
 u ucs-4be same as ucs-4 (big endian)
 u utf-32 same as ucs-4
 u utf-32le same as ucs-4le
-default stands for the default value of 'encoding', depends on the
-environment
+default the encoding of the current locale.

 For the UCS codes the byte order matters. This is tricky, use UTF-8 whenever
 you can. The default is to use big-endian (most significant byte comes
@@ -363,13 +355,12 @@ or when conversion is not possible:
 CONVERSION *charset-conversion*

 Vim will automatically convert from one to another encoding in several places:
-- When reading a file and 'fileencoding' is different from 'encoding'
-- When writing a file and 'fileencoding' is different from 'encoding'
+- When reading a file and 'fileencoding' is different from "utf-8"
+- When writing a file and 'fileencoding' is different from "utf-8"
 - When displaying messages and the encoding used for LC_MESSAGES differs from
-'encoding' (requires a gettext version that supports this).
+"utf-8" (requires a gettext version that supports this).
 - When reading a Vim script where |:scriptencoding| is different from
-'encoding'.
-- When reading or writing a |shada| file.
+"utf-8".
 Most of these require the |+iconv| feature. Conversion for reading and
 writing files may also be specified with the 'charconvert' option.

@@ -408,11 +399,11 @@ Useful utilities for converting the charset:


 *mbyte-conversion*
-When reading and writing files in an encoding different from 'encoding',
+When reading and writing files in an encoding different from "utf-8",
 conversion needs to be done. These conversions are supported:
 - All conversions between Latin-1 (ISO-8859-1), UTF-8, UCS-2 and UCS-4 are
 handled internally.
-- For MS-Windows, when 'encoding' is a Unicode encoding, conversion from and
+- For MS-Windows, conversion from and
 to any codepage should work.
 - Conversion specified with 'charconvert'
 - Conversion with the iconv library, if it is available.
@@ -468,8 +459,6 @@ and you will have a working UTF-8 terminal emulator. Try both >
 with the demo text that comes with ucs-fonts.tar.gz in order to see
 whether there are any problems with UTF-8 in your xterm.

-For Vim you may need to set 'encoding' to "utf-8".
-
 ==============================================================================
 5. Fonts on X11 *mbyte-fonts-X11*

@@ -864,11 +853,11 @@ between two keyboard settings.
 The value of the 'keymap' option specifies a keymap file to use. The name of
 this file is one of these two:

-keymap/{keymap}_{encoding}.vim
+keymap/{keymap}_utf-8.vim
 keymap/{keymap}.vim

-Here {keymap} is the value of the 'keymap' option and {encoding} of the
-'encoding' option. The file name with the {encoding} included is tried first.
+Here {keymap} is the value of the 'keymap' option.
+The file name with "utf-8" included is tried first.

 'runtimepath' is used to find these files. To see an overview of all
 available keymap files, use this: >
@@ -950,7 +939,7 @@ this is unusual. But you can use various ways to specify the character: >
 A <char-0141> octal value
 x <Space> special key name

-The characters are assumed to be encoded for the current value of 'encoding'.
+The characters are assumed to be encoded in UTF-8.
 It's possible to use ":scriptencoding" when all characters are given
 literally. That doesn't work when using the <char-> construct, because the
 conversion is done on the keymap file, not on the resulting character.
@@ -1170,21 +1159,13 @@ Useful commands:
 message is truncated, use ":messages").
 - "g8" shows the bytes used in a UTF-8 character, also the composing
 characters, as hex numbers.
-- ":set encoding=utf-8 fileencodings=" forces using UTF-8 for all files. The
-default is to use the current locale for 'encoding' and set 'fileencodings'
-to automatically detect the encoding of a file.
+- ":set fileencodings=" forces using UTF-8 for all files. The
+default is to automatically detect the encoding of a file.


 STARTING VIM

-If your current locale is in an utf-8 encoding, Vim will automatically start
-in utf-8 mode.
-
-If you are using another locale: >
-
-set encoding=utf-8
-
-You might also want to select the font used for the menus. Unfortunately this
+You might want to select the font used for the menus. Unfortunately this
 doesn't always work. See the system specific remarks below, and 'langmenu'.


@@ -1245,10 +1226,9 @@ not everybody is able to type a composing character.
 These options are relevant for editing multi-byte files. Check the help in
 options.txt for detailed information.

-'encoding' Encoding used for the keyboard and display. It is also the
-default encoding for files.
+'encoding' Internal text encoding, always "utf-8".

-'fileencoding' Encoding of a file. When it's different from 'encoding'
+'fileencoding' Encoding of a file. When it's different from "utf-8"
 conversion is done when reading or writing the file.

 'fileencodings' List of possible encodings of a file. When opening a file

@@ -52,7 +52,6 @@ achieve special effects. These options come in three forms:
 :se[t] all& Set all options to their default value. The values of
 these options are not changed:
 'columns'
-'encoding'
 'lines'
 Warning: This may have a lot of side effects.

@@ -615,7 +614,6 @@ A jump table for the options with a short description can be found at |Q_op|.
 global
 {only available when compiled with the |+multi_byte|
 feature}
-Only effective when 'encoding' is "utf-8" or another Unicode encoding.
 Tells Vim what to do with characters with East Asian Width Class
 Ambiguous (such as Euro, Registered Sign, Copyright Sign, Greek
 letters, Cyrillic letters).
@@ -668,7 +666,6 @@ A jump table for the options with a short description can be found at |Q_op|.
 - Set the 'keymap' option to "arabic"; in Insert mode CTRL-^ toggles
 between typing English and Arabic key mapping.
 - Set the 'delcombine' option
-Note that 'encoding' must be "utf-8" for working with Arabic text.

 Resetting this option will:
 - Reset the 'rightleft' option.
@@ -1078,8 +1075,7 @@ A jump table for the options with a short description can be found at |Q_op|.
 {not available when compiled without the |+linebreak|
 feature}
 This option lets you choose which characters might cause a line
-break if 'linebreak' is on. Only works for ASCII and also for 8-bit
-characters when 'encoding' is an 8-bit encoding.
+break if 'linebreak' is on. Only works for ASCII characters.

 *'breakindent'* *'bri'*
 'breakindent' 'bri' boolean (default off)
@@ -1214,11 +1210,9 @@ A jump table for the options with a short description can be found at |Q_op|.
 Specifies details about changing the case of letters. It may contain
 these words, separated by a comma:
 internal Use internal case mapping functions, the current
-locale does not change the case mapping. This only
-matters when 'encoding' is a Unicode encoding,
-"latin1" or "iso-8859-15". When "internal" is
-omitted, the towupper() and towlower() system library
-functions are used when available.
+locale does not change the case mapping. When
+"internal" is omitted, the towupper() and towlower()
+system library functions are used when available.
 keepascii For the ASCII characters (0x00 to 0x7f) use the US
 case mapping, the current locale is not effective.
 This probably only matters for Turkish.
@@ -1271,13 +1265,12 @@ A jump table for the options with a short description can be found at |Q_op|.
 file to convert from. You will have to save the text in a file first.
 The expression must return zero or an empty string for success,
 non-zero for failure.
-The possible encoding names encountered are in 'encoding'.
+See |encoding-names| for possible encoding names.
 Additionally, names given in 'fileencodings' and 'fileencoding' are
 used.
 Conversion between "latin1", "unicode", "ucs-2", "ucs-4" and "utf-8"
 is done internally by Vim, 'charconvert' is not used for this.
-'charconvert' is also used to convert the shada file, if 'encoding' is
-not "utf-8". Also used for Unicode conversion.
+Also used for Unicode conversion.
 Example: >
 set charconvert=CharConvert()
 fun CharConvert()
@@ -1292,8 +1285,6 @@ A jump table for the options with a short description can be found at |Q_op|.
 v:fname_in name of the input file
 v:fname_out name of the output file
 Note that v:fname_in and v:fname_out will never be the same.
-Note that v:charconvert_from and v:charconvert_to may be different
-from 'encoding'. Vim internally uses UTF-8 instead of UCS-2 or UCS-4.
 This option cannot be set from a |modeline| or in the |sandbox|, for
 security reasons.

@@ -2140,44 +2131,14 @@ A jump table for the options with a short description can be found at |Q_op|.


 *'encoding'* *'enc'* *E543*
-'encoding' 'enc' string (default: "utf-8")
-global
-{only available when compiled with the |+multi_byte|
-feature}
-Sets the character encoding used inside Vim. It applies to text in
-the buffers, registers, Strings in expressions, text stored in the
-shada file, etc. It sets the kind of characters which Vim can work
-with. See |encoding-names| for the possible values.
+'encoding' 'enc' Removed. |vim-differences| {Nvim}
+Nvim always uses UTF-8 internally. RPC communication
+(remote plugins/GUIs) must use UTF-8 strings.

-'encoding' cannot be changed after startup, because (1) it causes
-non-ASCII text inside Vim to become invalid, and (2) it complicates
-runtime logic. The recommended 'encoding' is "utf-8". Remote plugins
-and GUIs only support utf-8. See |multibyte|.
-
-The character encoding of files can be different from 'encoding'.
+The character encoding of files can be different than UTF-8.
 This is specified with 'fileencoding'. The conversion is done with
 iconv() or as specified with 'charconvert'.

-If you need to know whether 'encoding' is a multi-byte encoding, you
-can use: >
-if has("multi_byte_encoding")
-<
-When you set this option, it fires the |EncodingChanged| autocommand
-event so that you can set up fonts if necessary.
-
-When the option is set, the value is converted to lowercase. Thus
-you can set it with uppercase values too. Underscores are translated
-to '-' signs.
-When the encoding is recognized, it is changed to the standard name.
-For example "Latin-1" becomes "latin1", "ISO_88592" becomes
-"iso-8859-2" and "utf8" becomes "utf-8".
-
-When "unicode", "ucs-2" or "ucs-4" is used, Vim internally uses utf-8.
-You don't notice this while editing, but it does matter for the
-|shada-file|. And Vim expects the terminal to use utf-8 too. Thus
-setting 'encoding' to one of these values instead of utf-8 only has
-effect for encoding used for files when 'fileencoding' is empty.
-
 *'endofline'* *'eol'* *'noendofline'* *'noeol'*
 'endofline' 'eol' boolean (default on)
 local to buffer
@@ -2304,20 +2265,14 @@ A jump table for the options with a short description can be found at |Q_op|.
 feature}
 Sets the character encoding for the file of this buffer.

-When 'fileencoding' is different from 'encoding', conversion will be
+When 'fileencoding' is different from "utf-8", conversion will be
 done when writing the file. For reading see below.
-When 'fileencoding' is empty, the same value as 'encoding' will be
-used (no conversion when reading or writing a file).
-Conversion will also be done when 'encoding' and 'fileencoding' are
-both a Unicode encoding and 'fileencoding' is not utf-8. That's
-because internally Unicode is always stored as utf-8.
-WARNING: Conversion can cause loss of information! When
-'encoding' is "utf-8" or another Unicode encoding, conversion
-is most likely done in a way that the reverse conversion
-results in the same text. When 'encoding' is not "utf-8" some
-characters may be lost!
+When 'fileencoding' is empty, the file will be saved with utf-8
+encoding. (no conversion when reading or writing a file).
+WARNING: Conversion to a non-Unicode encoding can cause loss of
+information!

-See 'encoding' for the possible values. Additionally, values may be
+See |encoding-names| for the possible values. Additionally, values may be
 specified that can be handled by the converter, see
 |mbyte-conversion|.

@@ -2330,8 +2285,8 @@ A jump table for the options with a short description can be found at |Q_op|.
 Prepending "8bit-" and "2byte-" has no meaning here, they are ignored.
 When the option is set, the value is converted to lowercase. Thus
 you can set it with uppercase values too. '_' characters are
-replaced with '-'. If a name is recognized from the list for
-'encoding', it is replaced by the standard name. For example
+replaced with '-'. If a name is recognized from the list at
+|encoding-names|, it is replaced by the standard name. For example
 "ISO8859-2" becomes "iso-8859-2".

 When this option is set, after starting to edit a file, the 'modified'
@@ -2354,12 +2309,8 @@ A jump table for the options with a short description can be found at |Q_op|.
 mentioned character encoding. If an error is detected, the next one
 in the list is tried. When an encoding is found that works,
 'fileencoding' is set to it. If all fail, 'fileencoding' is set to
-an empty string, which means the value of 'encoding' is used.
-WARNING: Conversion can cause loss of information! When
-'encoding' is "utf-8" (or one of the other Unicode variants)
-conversion is most likely done in a way that the reverse
-conversion results in the same text. When 'encoding' is not
-"utf-8" some non-ASCII characters may be lost! You can use
+an empty string, which means that UTF-8 is used.
+WARNING: Conversion can cause loss of information! You can use
 the |++bad| argument to specify what is done with characters
 that can't be converted.
 For an empty file or a file with only ASCII characters most encodings
@@ -2385,11 +2336,11 @@ A jump table for the options with a short description can be found at |Q_op|.
 because Vim cannot detect an error, thus the encoding is always
 accepted.
 The special value "default" can be used for the encoding from the
-environment. It is useful when 'encoding' is set to "utf-8" and
-your environment uses a non-latin1 encoding, such as Russian.
-When 'encoding' is "utf-8" and a file contains an illegal byte
-sequence it won't be recognized as UTF-8. You can use the |8g8|
-command to find the illegal byte sequence.
+environment. It is useful when your environment uses a non-latin1
+encoding, such as Russian.
+When a file contains an illegal UTF-8 byte sequence it won't be
+recognized as "utf-8". You can use the |8g8| command to find the
+illegal byte sequence.
 WRONG VALUES: WHAT'S WRONG:
 latin1,utf-8 "latin1" will always be used
 utf-8,ucs-bom,latin1 BOM won't be recognized in an utf-8
@@ -3048,8 +2999,7 @@ A jump table for the options with a short description can be found at |Q_op|.
 Note: The size of these fonts must be exactly twice as wide as the one
 specified with 'guifont' and the same height.

-'guifontwide' is only used when 'encoding' is set to "utf-8" and
-'guifontset' is empty or invalid.
+'guifontwide' is only used when 'guifontset' is empty or invalid.
 When 'guifont' is set and a valid font is found in it and
 'guifontwide' is empty Vim will attempt to find a matching
 double-width font and set 'guifontwide' to it.
@@ -3702,7 +3652,7 @@ A jump table for the options with a short description can be found at |Q_op|.
 128 - 159 "~@" - "~_"
 160 - 254 "| " - "|~"
 255 "~?"
-When 'encoding' is a Unicode one, illegal bytes from 128 to 255 are
+Illegal bytes from 128 to 255 (invalid UTF-8) are
 displayed as <xx>, with the hexadecimal value of the byte.
 When 'display' contains "uhex" all unprintable characters are
 displayed as <xx>.
@@ -3980,8 +3930,7 @@ A jump table for the options with a short description can be found at |Q_op|.
 omitted.

 The characters ':' and ',' should not be used. UTF-8 characters can
-be used when 'encoding' is "utf-8", otherwise only printable
-characters are allowed. All characters must be single width.
+be used. All characters must be single width.

 Examples: >
 :set lcs=tab:>-,trail:-
@@ -4078,7 +4027,6 @@ A jump table for the options with a short description can be found at |Q_op|.
 {only available when compiled with the |+multi_byte|
 feature}
 The maximum number of combining characters supported for displaying.
-Only used when 'encoding' is "utf-8".
 The default is OK for most languages. Hebrew may require 4.
 Maximum value is 6.
 Even when this option is set to 2 you can still edit text with more
@@ -5825,9 +5773,6 @@ A jump table for the options with a short description can be found at |Q_op|.
 (_xx is an underscore, two letters and followed by a non-letter).
 This is mainly for testing purposes. You must make sure the correct
 encoding is used, Vim doesn't check it.
-When 'encoding' is set the word lists are reloaded. Thus it's a good
-idea to set 'spelllang' after setting 'encoding' to avoid loading the
-files twice.
 How the related spell files are found is explained here: |spell-load|.

 If the |spellfile.vim| plugin is active and you use a language name

@@ -40,7 +40,6 @@ these differences.
 - 'complete' doesn't include "i"
 - 'directory' defaults to ~/.local/share/nvim/swap// (|xdg|), auto-created
 - 'display' defaults to "lastline"
-- 'encoding' defaults to "utf-8"
 - 'formatoptions' defaults to "tcqj"
 - 'history' defaults to 10000 (the maximum)
 - 'hlsearch' is set by default
@@ -159,7 +158,7 @@ are always available and may be used simultaneously in separate plugins. The
 'p')) mkdir() will silently exit. In Vim this was an error.
 3. mkdir() error messages now include strerror() text when mkdir fails.

-'encoding' cannot be changed after startup.
+'encoding' is always "utf-8".

 |string()| and |:echo| behaviour changed:
 1. No maximum recursion depth limit is applied to nested container
@@ -266,6 +265,7 @@ Highlight groups:
 Other options:
 'antialias'
 'cpoptions' ("g", "w", "H", "*", "-", "j", and all POSIX flags were removed)
+'encoding' ("utf-8" is always used)
 'guioptions' "t" flag was removed
 *'guipty'* (Nvim uses pipes and PTYs consistently on all platforms.)
 *'imactivatefunc'* *'imaf'*