mirror of https://github.com/neovim/neovim.git
synced 2025-09-08 20:38:18 +00:00
encoding: update documentation
@@ -1029,8 +1029,8 @@ A string constant accepts these special characters:
 \x. byte specified with one hex number (must be followed by non-hex char)
 \X.. same as \x..
 \X. same as \x.
-\u.... character specified with up to 4 hex numbers, stored according to the
-current value of 'encoding' (e.g., "\u02a4")
+\u.... character specified with up to 4 hex numbers, stored as UTF-8
+(e.g., "\u02a4")
 \U.... same as \u but allows up to 8 hex numbers.
 \b backspace <BS>
 \e escape <Esc>
@@ -1045,8 +1045,7 @@ A string constant accepts these special characters:
 utf-8 character, use \uxxxx as mentioned above.
 
 Note that "\xff" is stored as the byte 255, which may be invalid in some
-encodings. Use "\u00ff" to store character 255 according to the current value
-of 'encoding'.
+encodings. Use "\u00ff" to store character 255 correctly as UTF-8.
 
 Note that "\000" and "\x00" force the end of the string.
 
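The two hunks above separate "\u" escapes, which are now always stored as UTF-8, from "\x" escapes, which still insert a raw byte. The small Python sketch below makes that concrete. Python stands in for Vim script purely for illustration, and `escape_u_bytes` and `is_valid_utf8` are hypothetical helper names. It shows the UTF-8 bytes that the documented examples "\u02a4" and "\u00ff" occupy, and it checks that the lone byte 255 produced by "\xff" does not decode as UTF-8.

```python
# Illustration only: Python standing in for Vim script; helper names are hypothetical.

def escape_u_bytes(codepoint: int) -> bytes:
    """Bytes that a "\\u" escape for `codepoint` occupies in a UTF-8 string."""
    return chr(codepoint).encode("utf-8")


def is_valid_utf8(data: bytes) -> bool:
    """True if `data` decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    print(escape_u_bytes(0x02A4).hex())  # caa4: two bytes for U+02A4
    print(escape_u_bytes(0x00FF).hex())  # c3bf: "\u00ff" stores the character as UTF-8
    print(is_valid_utf8(bytes([0xFF])))  # False: "\xff" is a bare byte 255
```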
@@ -2532,8 +2531,6 @@ byteidxcomp({expr}, {nr}) *byteidxcomp()*
 < The first and third echo result in 3 ('e' plus composing
 character is 3 bytes), the second echo results in 1 ('e' is
 one byte).
-Only works different from byteidx() when 'encoding' is set to
-a Unicode encoding.
 
 call({func}, {arglist} [, {dict}]) *call()* *E699*
 Call function {func} with the items in |List| {arglist} as
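The composing-character example kept in the hunk above can be approximated in Python. This is a sketch, not Nvim's implementation. `byte_index` is a hypothetical helper, and it uses `unicodedata.combining()` as a stand-in for Vim's own test for composing characters, so results may differ for some marks. The return values for strings that are too short (byte length when there are exactly {nr} characters, -1 when there are fewer) are assumed to match byteidx().

```python
import unicodedata


def byte_index(s: str, nr: int, count_composing: bool) -> int:
    """Byte offset of character `nr` in the UTF-8 encoding of `s`.

    count_composing=False approximates byteidx(): a composing character is
    counted together with the character before it.
    count_composing=True approximates byteidxcomp(): each composing
    character counts on its own.
    """
    offset = 0
    seen = 0
    for ch in s:
        composing = unicodedata.combining(ch) != 0
        if count_composing or not composing:
            if seen == nr:
                return offset
            seen += 1
        offset += len(ch.encode("utf-8"))
    # Exactly `nr` characters: return the byte length; fewer: return -1.
    return offset if seen == nr else -1


if __name__ == "__main__":
    s = "e\u0301"  # 'e' followed by COMBINING ACUTE ACCENT (2 bytes in UTF-8)
    print(byte_index(s, 1, count_composing=False))  # 3, like byteidx(s, 1)
    print(byte_index(s, 1, count_composing=True))   # 1, like byteidxcomp(s, 1)
    print(byte_index(s, 2, count_composing=True))   # 3, like byteidxcomp(s, 2)
```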
@@ -2568,11 +2565,11 @@ char2nr({expr}[, {utf8}]) *char2nr()*
 Return number value of the first char in {expr}. Examples: >
 char2nr(" ") returns 32
 char2nr("ABC") returns 65
-< When {utf8} is omitted or zero, the current 'encoding' is used.
-Example for "utf-8": >
 char2nr("á") returns 225
 char2nr("á"[0]) returns 195
-< With {utf8} set to 1, always treat as utf-8 characters.
+< Non-ASCII characters are always treated as UTF-8 characters.
+{utf8} has no effect, and exists only for
+backwards-compatibility.
 A combining character is a separate character.
 |nr2char()| does the opposite.
 
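Below is a Python sketch of the char2nr() examples this hunk keeps. The helper names are hypothetical, and Python only illustrates the idea. In Vim script, indexing a string with [0] yields its first byte. That is why the "á"[0] example returns the UTF-8 lead byte 195 rather than the code point 225. The last line shows that a combining character is not folded into the preceding character.

```python
def char2nr_sketch(s: str) -> int:
    """Code point of the first character of a non-empty string (like char2nr())."""
    return ord(s[0])


def first_byte(s: str) -> int:
    """Value of the first byte of the UTF-8 encoding (like char2nr(s[0]))."""
    return s.encode("utf-8")[0]


if __name__ == "__main__":
    print(char2nr_sketch(" "), char2nr_sketch("ABC"))  # 32 65
    print(char2nr_sketch("\u00e1"))   # 225 for "á"
    print(first_byte("\u00e1"))       # 195, i.e. 0xC3, the lead byte of "á"
    print(char2nr_sketch("e\u0301"))  # 101: the combining accent is a separate character
```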
@@ -4225,11 +4222,7 @@ iconv({expr}, {from}, {to}) *iconv()*
 Most conversions require Vim to be compiled with the |+iconv|
 feature. Otherwise only UTF-8 to latin1 conversion and back
 can be done.
-This can be used to display messages with special characters,
-no matter what 'encoding' is set to. Write the message in
-UTF-8 and use: >
-echo iconv(utf8_str, "utf-8", &enc)
-< Note that Vim uses UTF-8 for all Unicode encodings, conversion
+Note that Vim uses UTF-8 for all Unicode encodings, conversion
 from/to UCS-2 is automatically changed to use UTF-8. You
 cannot use UCS-2 in a string anyway, because of the NUL bytes.
 {only available when compiled with the |+multi_byte| feature}
@@ -4513,9 +4506,7 @@ join({list} [, {sep}]) *join()*
 json_decode({expr}) *json_decode()*
 Convert {expr} from JSON object. Accepts |readfile()|-style
 list as the input, as well as regular string. May output any
-Vim value. When 'encoding' is not UTF-8 string is converted
-from UTF-8 to 'encoding', failing conversion fails
-json_decode(). In the following cases it will output
+Vim value. In the following cases it will output
 |msgpack-special-dict|:
 1. Dictionary contains duplicate key.
 2. Dictionary contains empty key.
@@ -4523,33 +4514,22 @@ json_decode({expr}) *json_decode()*
 dictionary and for string will be emitted in case string
 with NUL byte was a dictionary key.
 
-Note: function treats its input as UTF-8 always regardless of
-'encoding' value. This is needed because JSON source is
-supposed to be external (e.g. |readfile()|) and JSON standard
-allows only a few encodings, of which UTF-8 is recommended and
-the only one required to be supported. Non-UTF-8 characters
-are an error.
+Note: function treats its input as UTF-8 always. The JSON
+standard allows only a few encodings, of which UTF-8 is
+recommended and the only one required to be supported.
+Non-UTF-8 characters are an error.
 
 json_encode({expr}) *json_encode()*
 Convert {expr} into a JSON string. Accepts
-|msgpack-special-dict| as the input. Converts from 'encoding'
-to UTF-8 when encoding strings. Will not convert |Funcref|s,
+|msgpack-special-dict| as the input. Will not convert |Funcref|s,
 mappings with non-string keys (can be created as
 |msgpack-special-dict|), values with self-referencing
 containers, strings which contain non-UTF-8 characters,
 pseudo-UTF-8 strings which contain codepoints reserved for
 surrogate pairs (such strings are not valid UTF-8 strings).
-When converting 'encoding' is taken into account, if it is not
-"utf-8", then conversion is performed before encoding strings.
 Non-printable characters are converted into "\u1234" escapes
 or special escapes like "\t", other are dumped as-is.
 
-Note: all characters above U+0079 are considered non-printable
-when 'encoding' is not UTF-8. This function always outputs
-UTF-8 strings as required by the standard thus when 'encoding'
-is not unicode resulting string will look incorrect if
-"\u1234" notation is not used.
-
 keys({dict}) *keys()*
 Return a |List| with all the keys of {dict}. The |List| is in
 arbitrary order.
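According to the reworded notes, json_decode() always treats its input as UTF-8 and rejects non-UTF-8 characters. json_encode() refuses strings that contain surrogate code points. A rough Python analogue using the standard `json` module is sketched below. `json_decode_strict` and `json_encode_strict` are hypothetical names. Only the UTF-8 checks are modeled; Nvim's own escaping of non-printable characters into "\u1234" form is not reproduced.

```python
import json


def json_decode_strict(data: bytes):
    """Decode JSON from raw bytes, treating the input as UTF-8 only."""
    text = data.decode("utf-8")  # raises UnicodeDecodeError on non-UTF-8 input
    return json.loads(text)


def json_encode_strict(value) -> str:
    """Encode to JSON text, refusing strings that are not valid UTF-8."""
    # ensure_ascii=False keeps characters as-is, so a lone surrogate stays in
    # the output and the UTF-8 check below can catch it.
    text = json.dumps(value, ensure_ascii=False)
    text.encode("utf-8")  # raises UnicodeEncodeError for surrogate code points
    return text


if __name__ == "__main__":
    print(json_decode_strict(b'{"a": "\xc3\xa1"}'))
    try:
        json_decode_strict(b'"\xff"')
    except UnicodeDecodeError:
        print("non-UTF-8 input rejected")
    try:
        json_encode_strict("\ud800")
    except UnicodeEncodeError:
        print("surrogate code point rejected")
```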
@@ -4651,9 +4631,9 @@ line2byte({lnum}) *line2byte()*
 Return the byte count from the start of the buffer for line
 {lnum}. This includes the end-of-line character, depending on
 the 'fileformat' option for the current buffer. The first
-line returns 1. 'encoding' matters, 'fileencoding' is ignored.
-This can also be used to get the byte count for the line just
-below the last line: >
+line returns 1. UTF-8 encoding is used, 'fileencoding' is
+ignored. This can also be used to get the byte count for the
+line just below the last line: >
 line2byte(line("$") + 1)
 < This is the buffer size plus one. If 'fileencoding' is empty
 it is the file size plus one.
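This sketch shows the arithmetic behind line2byte() after the change. Each preceding line contributes its UTF-8 byte length plus the end-of-line bytes chosen by 'fileformat'. Asking for the line just below the last one gives the buffer size plus one. The function name and the "unix"/"dos"/"mac" table are assumptions made for illustration. Returning -1 for other out-of-range line numbers is also an assumption.

```python
EOL_BY_FILEFORMAT = {"unix": "\n", "dos": "\r\n", "mac": "\r"}


def line2byte_sketch(lines: list, lnum: int, fileformat: str = "unix") -> int:
    """1-based byte offset at which line `lnum` starts, counting UTF-8 bytes.

    lnum may be len(lines) + 1, which yields the buffer size plus one.
    """
    if lnum < 1 or lnum > len(lines) + 1:
        return -1  # assumed to match Vim for invalid line numbers
    eol_len = len(EOL_BY_FILEFORMAT[fileformat])
    offset = 1
    for line in lines[: lnum - 1]:
        offset += len(line.encode("utf-8")) + eol_len
    return offset


if __name__ == "__main__":
    buf = ["abc", "\u00e1"]             # "á" is 2 bytes in UTF-8
    print(line2byte_sketch(buf, 2))     # 5: "abc" plus one NL, then +1
    print(line2byte_sketch(buf, 3))     # 8: buffer size (7) plus one
    print(line2byte_sketch(buf, 3, "dos"))  # 10: CR NL after each line
```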
@@ -5172,10 +5152,10 @@ nr2char({expr}[, {utf8}]) *nr2char()*
 value {expr}. Examples: >
 nr2char(64) returns "@"
 nr2char(32) returns " "
-< When {utf8} is omitted or zero, the current 'encoding' is used.
-Example for "utf-8": >
+< Example for "utf-8": >
 nr2char(300) returns I with bow character
-< With {utf8} set to 1, always return utf-8 characters.
+< UTF-8 encoding is always used, {utf8} option has no effect,
+and exists only for backwards-compatibility.
 Note that a NUL character in the file is specified with
 nr2char(10), because NULs are represented with newline
 characters. nr2char(0) is a real NUL and terminates the
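nr2char() now always produces UTF-8. A hand-written UTF-8 encoder shows where the bytes for nr2char(300) come from. The Python sketch below uses a hypothetical `utf8_encode` function. It follows the standard 1 to 4 byte UTF-8 layout and is checked against Python's built-in codec. Rejecting surrogate code points is a choice made by this sketch, not a claim about how nr2char() behaves.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one Unicode code point as UTF-8 (1 to 4 bytes)."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp < 0x80:                      # 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:                     # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:                   # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),   # 11110xxx followed by three 10xxxxxx
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


if __name__ == "__main__":
    print(utf8_encode(300).hex())  # c4ac: U+012C, the "I with bow" (breve) character
    for cp in (64, 32, 300, 0x20AC, 0x1F600):
        assert utf8_encode(cp) == chr(cp).encode("utf-8")
```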
@@ -5417,7 +5397,7 @@ py3eval({expr}) *py3eval()*
 converted to Vim data structures.
 Numbers and strings are returned as they are (strings are
 copied though, Unicode strings are additionally converted to
-'encoding').
+UTF-8).
 Lists are represented as Vim |List| type.
 Dictionaries are represented as Vim |Dictionary| type with
 keys converted to strings.
@@ -5467,8 +5447,7 @@ readfile({fname} [, {binary} [, {max}]])
 Otherwise:
 - CR characters that appear before a NL are removed.
 - Whether the last line ends in a NL or not does not matter.
-- When 'encoding' is Unicode any UTF-8 byte order mark is
-removed from the text.
+- Any UTF-8 byte order mark is removed from the text.
 When {max} is given this specifies the maximum number of lines
 to be read. Useful if you only want to check the first ten
 lines of a file: >
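This hunk lists the non-binary readfile() rules: drop a CR before NL, ignore whether the last line ends in NL, remove a UTF-8 byte order mark, and stop after {max} lines. They can be sketched in Python as shown below. `readfile_lines` is hypothetical. It assumes the BOM can only appear at the very start of the data, and it handles only a non-negative {max}.

```python
from typing import List, Optional

BOM = b"\xef\xbb\xbf"  # UTF-8 byte order mark


def readfile_lines(data: bytes, max_lines: Optional[int] = None) -> List[bytes]:
    """Split file contents into lines following the non-binary rules above."""
    if data.startswith(BOM):  # assumption: only a leading BOM is removed
        data = data[len(BOM):]
    if not data:
        return []
    ends_with_nl = data.endswith(b"\n")
    parts = data.split(b"\n")
    if ends_with_nl:
        parts.pop()  # a final NL does not create an extra empty line
    lines = []
    for i, part in enumerate(parts):
        followed_by_nl = i < len(parts) - 1 or ends_with_nl
        if followed_by_nl and part.endswith(b"\r"):
            part = part[:-1]  # a CR directly before NL is removed
        lines.append(part)
    if max_lines is not None:
        lines = lines[:max_lines]  # only non-negative {max} is handled here
    return lines


if __name__ == "__main__":
    print(readfile_lines(BOM + b"one\r\ntwo\n"))  # [b'one', b'two']
    print(readfile_lines(b"one\ntwo"))            # [b'one', b'two']
```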
@@ -6621,8 +6600,7 @@ string({expr}) Return {expr} converted to a String. If {expr} is a Number,
 for infinite and NaN floating-point values representations
 which use |str2float()|. Strings are also dumped literally,
 only single quote is escaped, which does not allow using YAML
-for parsing back binary strings (including text when
-'encoding' is not UTF-8). |eval()| should always work for
+for parsing back binary strings. |eval()| should always work for
 strings and floats though and this is the only official
 method, use |msgpackdump()| or |json_encode()| if you need to
 share data with other application.
@@ -70,29 +70,24 @@ See |mbyte-locale| for details.
 
 ENCODING
 
-If your locale works properly, Vim will try to set the 'encoding' option
-accordingly. If this doesn't work you can overrule its value: >
+Nvim always uses UTF-8 internally. Thus 'encoding' option is always set
+to "utf-8" and cannot be changed.
 
-:set encoding=utf-8
+All the text that is used inside Vim will be in UTF-8. Not only the text in
+the buffers, but also in registers, variables, etc.
 
-See |encoding-values| for a list of acceptable values.
-
-The result is that all the text that is used inside Vim will be in this
-encoding. Not only the text in the buffers, but also in registers, variables,
-etc. 'encoding' is read-only after startup because changing it would make the
-existing text invalid.
-
-You can edit files in another encoding than what 'encoding' is set to. Vim
+You can edit files in different encodings than UTF-8. Nvim
 will convert the file when you read it and convert it back when you write it.
 See 'fileencoding', 'fileencodings' and |++enc|.
 
 
 DISPLAY AND FONTS
 
-If you are working in a terminal (emulator) you must make sure it accepts the
-same encoding as which Vim is working with.
+If you are working in a terminal (emulator) you must make sure it accepts
+UTF-8, the encoding which Vim is working with. Otherwise only ASCII can
+be displayed and edited correctly.
 
-For the GUI you must select fonts that work with the current 'encoding'. This
+For the GUI you must select fonts that work with UTF-8. This
 is the difficult part. It depends on the system you are using, the locale and
 a few other things. See the chapters on fonts: |mbyte-fonts-X11| for
 X-Windows and |mbyte-fonts-MSwin| for MS-Windows.
@@ -216,10 +211,9 @@ You could make a small shell script for this.
 ==============================================================================
 3. Encoding *mbyte-encoding*
 
-Vim uses the 'encoding' option to specify how characters are identified and
-encoded when they are used inside Vim. This applies to all the places where
-text is used, including buffers (files loaded into memory), registers and
-variables.
+In Nvim UTF-8 is always used internally to encode characters.
+This applies to all the places where text is used, including buffers (files
+loaded into memory), registers and variables.
 
 *charset* *codeset*
 Charset is another name for encoding. There are subtle differences, but these
@@ -240,7 +234,7 @@ matter what language is used. Thus you might see the right text even when the
 encoding was set wrong.
 
 *encoding-names*
-Vim can use many different character encodings. There are three major groups:
+Vim can edit files in different character encodings. There are three major groups:
 
 1 8bit Single-byte encodings, 256 different characters. Mostly used
 in USA and Europe. Example: ISO-8859-1 (Latin1). All
@@ -255,11 +249,10 @@ u Unicode Universal encoding, can replace all others. ISO 10646.
 Millions of different characters. Example: UTF-8. The
 relation between bytes and screen cells is complex.
 
-Other encodings cannot be used by Vim internally. But files in other
+Only UTF-8 is used by Vim internally. But files in other
 encodings can be edited by using conversion, see 'fileencoding'.
-Note that all encodings must use ASCII for the characters up to 128.
 
-Supported 'encoding' values are: *encoding-values*
+Recognized 'fileencoding' values include: *encoding-values*
 1 latin1 8-bit characters (ISO 8859-1, also used for cp1252)
 1 iso-8859-n ISO_8859 variant (n = 2 to 15)
 1 koi8-r Russian
@@ -311,11 +304,11 @@ u ucs-4 32 bit UCS-4 encoded Unicode (ISO/IEC 10646-1)
 u ucs-4le like ucs-4, little endian
 
 The {name} can be any encoding name that your system supports. It is passed
-to iconv() to convert between the encoding of the file and the current locale.
+to iconv() to convert between UTF-8 and the encoding of the file.
 For MS-Windows "cp{number}" means using codepage {number}.
 Examples: >
-:set encoding=8bit-cp1252
-:set encoding=2byte-cp932
+:set fileencoding=8bit-cp1252
+:set fileencoding=2byte-cp932
 
 The MS-Windows codepage 1252 is very similar to latin1. For practical reasons
 the same encoding is used and it's called latin1. 'isprint' can be used to
@@ -337,8 +330,7 @@ u ucs-2be same as ucs-2 (big endian)
 u ucs-4be same as ucs-4 (big endian)
 u utf-32 same as ucs-4
 u utf-32le same as ucs-4le
-default stands for the default value of 'encoding', depends on the
-environment
+default the encoding of the current locale.
 
 For the UCS codes the byte order matters. This is tricky, use UTF-8 whenever
 you can. The default is to use big-endian (most significant byte comes
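The byte-order remark above can be checked by encoding the same text several ways. So can the iconv() note that UCS-2 cannot be used in a string because of its NUL bytes. This Python sketch uses Python's codec names, and `encoded_forms` is a hypothetical helper. UTF-16 stands in for UCS-2 here; the two give identical bytes for text inside the Basic Multilingual Plane, as in this example.

```python
def encoded_forms(text: str) -> dict:
    """Encode `text` in several of the Unicode encodings listed above."""
    return {
        "utf-8": text.encode("utf-8"),
        # UTF-16 stands in for UCS-2; the bytes match for BMP-only text.
        "ucs-2be": text.encode("utf-16-be"),
        "ucs-4be": text.encode("utf-32-be"),  # big endian, the documented default
        "ucs-4le": text.encode("utf-32-le"),  # little endian
    }


if __name__ == "__main__":
    for name, data in encoded_forms("A\u00e9").items():
        print(name, data.hex(), "contains NUL" if b"\x00" in data else "")
```

Only the UTF-8 form is free of NUL bytes, which is what lets Nvim keep all text in ordinary strings.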
@@ -363,13 +355,12 @@ or when conversion is not possible:
 CONVERSION *charset-conversion*
 
 Vim will automatically convert from one to another encoding in several places:
-- When reading a file and 'fileencoding' is different from 'encoding'
-- When writing a file and 'fileencoding' is different from 'encoding'
+- When reading a file and 'fileencoding' is different from "utf-8"
+- When writing a file and 'fileencoding' is different from "utf-8"
 - When displaying messages and the encoding used for LC_MESSAGES differs from
-'encoding' (requires a gettext version that supports this).
+"utf-8" (requires a gettext version that supports this).
 - When reading a Vim script where |:scriptencoding| is different from
-'encoding'.
-- When reading or writing a |shada| file.
+"utf-8".
 Most of these require the |+iconv| feature. Conversion for reading and
 writing files may also be specified with the 'charconvert' option.
 
@@ -408,11 +399,11 @@ Useful utilities for converting the charset:
 
 
 *mbyte-conversion*
-When reading and writing files in an encoding different from 'encoding',
+When reading and writing files in an encoding different from "utf-8",
 conversion needs to be done. These conversions are supported:
 - All conversions between Latin-1 (ISO-8859-1), UTF-8, UCS-2 and UCS-4 are
 handled internally.
-- For MS-Windows, when 'encoding' is a Unicode encoding, conversion from and
+- For MS-Windows, conversion from and
 to any codepage should work.
 - Conversion specified with 'charconvert'
 - Conversion with the iconv library, if it is available.
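When a file's 'fileencoding' is not "utf-8", reading it means converting its bytes to UTF-8, and writing means converting back. Below is a minimal Python sketch of that round trip. The function names are hypothetical, and the encoding names are Python codec names (such as "latin-1") rather than Vim's. The failing case shows a character that latin1 cannot represent; how Nvim itself reports such a failure is not modeled.

```python
def read_as_utf8(raw: bytes, fileencoding: str) -> bytes:
    """Convert file bytes in `fileencoding` to the internal UTF-8 form."""
    return raw.decode(fileencoding).encode("utf-8")


def write_from_utf8(internal: bytes, fileencoding: str) -> bytes:
    """Convert internal UTF-8 text back to `fileencoding` for writing."""
    return internal.decode("utf-8").encode(fileencoding)


if __name__ == "__main__":
    on_disk = "caf\u00e9".encode("latin-1")       # b'caf\xe9'
    in_memory = read_as_utf8(on_disk, "latin-1")  # b'caf\xc3\xa9'
    assert write_from_utf8(in_memory, "latin-1") == on_disk
    try:
        write_from_utf8("\u20ac".encode("utf-8"), "latin-1")
    except UnicodeEncodeError:
        print("the euro sign cannot be written as latin1")
```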
@@ -468,8 +459,6 @@ and you will have a working UTF-8 terminal emulator. Try both >
 with the demo text that comes with ucs-fonts.tar.gz in order to see
 whether there are any problems with UTF-8 in your xterm.
 
-For Vim you may need to set 'encoding' to "utf-8".
-
 ==============================================================================
 5. Fonts on X11 *mbyte-fonts-X11*
 
@@ -864,11 +853,11 @@ between two keyboard settings.
 The value of the 'keymap' option specifies a keymap file to use. The name of
 this file is one of these two:
 
-keymap/{keymap}_{encoding}.vim
+keymap/{keymap}_utf-8.vim
 keymap/{keymap}.vim
 
-Here {keymap} is the value of the 'keymap' option and {encoding} of the
-'encoding' option. The file name with the {encoding} included is tried first.
+Here {keymap} is the value of the 'keymap' option.
+The file name with "utf-8" included is tried first.
 
 'runtimepath' is used to find these files. To see an overview of all
 available keymap files, use this: >
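The lookup described in this hunk tries keymap/{keymap}_utf-8.vim first, then keymap/{keymap}.vim, searching 'runtimepath'. It might be sketched like this in Python. `find_keymap` and the keymap name "demo" are hypothetical. The search order is also an assumption, not something the hunk states: every 'runtimepath' directory is searched for the "_utf-8" name before the plain name is tried.

```python
import os
import tempfile
from typing import List, Optional


def find_keymap(runtimepath: List[str], keymap: str) -> Optional[str]:
    """Return the first keymap file found, preferring the "_utf-8" variant."""
    for name in (f"{keymap}_utf-8.vim", f"{keymap}.vim"):
        for directory in runtimepath:
            path = os.path.join(directory, "keymap", name)
            if os.path.isfile(path):
                return path
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        first, second = os.path.join(root, "a"), os.path.join(root, "b")
        for d in (first, second):
            os.makedirs(os.path.join(d, "keymap"))
        open(os.path.join(first, "keymap", "demo.vim"), "w").close()
        open(os.path.join(second, "keymap", "demo_utf-8.vim"), "w").close()
        # Under the assumed order, the "_utf-8" file wins even in a later directory.
        print(find_keymap([first, second], "demo"))
```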
@@ -950,7 +939,7 @@ this is unusual. But you can use various ways to specify the character: >
 A <char-0141> octal value
 x <Space> special key name
 
-The characters are assumed to be encoded for the current value of 'encoding'.
+The characters are assumed to be encoded in UTF-8.
 It's possible to use ":scriptencoding" when all characters are given
 literally. That doesn't work when using the <char-> construct, because the
 conversion is done on the keymap file, not on the resulting character.
@@ -1170,21 +1159,13 @@ Useful commands:
 message is truncated, use ":messages").
 - "g8" shows the bytes used in a UTF-8 character, also the composing
 characters, as hex numbers.
-- ":set encoding=utf-8 fileencodings=" forces using UTF-8 for all files. The
-default is to use the current locale for 'encoding' and set 'fileencodings'
-to automatically detect the encoding of a file.
+- ":set fileencodings=" forces using UTF-8 for all files. The
+default is to automatically detect the encoding of a file.
 
 
 STARTING VIM
 
-If your current locale is in an utf-8 encoding, Vim will automatically start
-in utf-8 mode.
-
-If you are using another locale: >
-
-set encoding=utf-8
-
-You might also want to select the font used for the menus. Unfortunately this
+You might want to select the font used for the menus. Unfortunately this
 doesn't always work. See the system specific remarks below, and 'langmenu'.
 
 
@@ -1245,10 +1226,9 @@ not everybody is able to type a composing character.
 These options are relevant for editing multi-byte files. Check the help in
 options.txt for detailed information.
 
-'encoding' Encoding used for the keyboard and display. It is also the
-default encoding for files.
+'encoding' Internal text encoding, always "utf-8".
 
-'fileencoding' Encoding of a file. When it's different from 'encoding'
+'fileencoding' Encoding of a file. When it's different from "utf-8"
 conversion is done when reading or writing the file.
 
 'fileencodings' List of possible encodings of a file. When opening a file
@@ -52,7 +52,6 @@ achieve special effects. These options come in three forms:
 :se[t] all& Set all options to their default value. The values of
 these options are not changed:
 'columns'
-'encoding'
 'lines'
 Warning: This may have a lot of side effects.
 
@@ -615,7 +614,6 @@ A jump table for the options with a short description can be found at |Q_op|.
 global
 {only available when compiled with the |+multi_byte|
 feature}
-Only effective when 'encoding' is "utf-8" or another Unicode encoding.
 Tells Vim what to do with characters with East Asian Width Class
 Ambiguous (such as Euro, Registered Sign, Copyright Sign, Greek
 letters, Cyrillic letters).
@@ -668,7 +666,6 @@ A jump table for the options with a short description can be found at |Q_op|.
 - Set the 'keymap' option to "arabic"; in Insert mode CTRL-^ toggles
 between typing English and Arabic key mapping.
 - Set the 'delcombine' option
-Note that 'encoding' must be "utf-8" for working with Arabic text.
 
 Resetting this option will:
 - Reset the 'rightleft' option.
@@ -1078,8 +1075,7 @@ A jump table for the options with a short description can be found at |Q_op|.
 {not available when compiled without the |+linebreak|
 feature}
 This option lets you choose which characters might cause a line
-break if 'linebreak' is on. Only works for ASCII and also for 8-bit
-characters when 'encoding' is an 8-bit encoding.
+break if 'linebreak' is on. Only works for ASCII characters.
 
 *'breakindent'* *'bri'*
 'breakindent' 'bri' boolean (default off)
@@ -1214,11 +1210,9 @@ A jump table for the options with a short description can be found at |Q_op|.
 Specifies details about changing the case of letters. It may contain
 these words, separated by a comma:
 internal Use internal case mapping functions, the current
-locale does not change the case mapping. This only
-matters when 'encoding' is a Unicode encoding,
-"latin1" or "iso-8859-15". When "internal" is
-omitted, the towupper() and towlower() system library
-functions are used when available.
+locale does not change the case mapping. When
+"internal" is omitted, the towupper() and towlower()
+system library functions are used when available.
 keepascii For the ASCII characters (0x00 to 0x7f) use the US
 case mapping, the current locale is not effective.
 This probably only matters for Turkish.
@@ -1271,13 +1265,12 @@ A jump table for the options with a short description can be found at |Q_op|.
 file to convert from. You will have to save the text in a file first.
 The expression must return zero or an empty string for success,
 non-zero for failure.
-The possible encoding names encountered are in 'encoding'.
+See |encoding-names| for possible encoding names.
 Additionally, names given in 'fileencodings' and 'fileencoding' are
 used.
 Conversion between "latin1", "unicode", "ucs-2", "ucs-4" and "utf-8"
 is done internally by Vim, 'charconvert' is not used for this.
-'charconvert' is also used to convert the shada file, if 'encoding' is
-not "utf-8". Also used for Unicode conversion.
+Also used for Unicode conversion.
 Example: >
 set charconvert=CharConvert()
 fun CharConvert()
@@ -1292,8 +1285,6 @@ A jump table for the options with a short description can be found at |Q_op|.
 v:fname_in name of the input file
 v:fname_out name of the output file
 Note that v:fname_in and v:fname_out will never be the same.
-Note that v:charconvert_from and v:charconvert_to may be different
-from 'encoding'. Vim internally uses UTF-8 instead of UCS-2 or UCS-4.
 This option cannot be set from a |modeline| or in the |sandbox|, for
 security reasons.
 
@@ -2140,44 +2131,14 @@ A jump table for the options with a short description can be found at |Q_op|.
 
 
 *'encoding'* *'enc'* *E543*
-'encoding' 'enc' string (default: "utf-8")
-global
-{only available when compiled with the |+multi_byte|
-feature}
-Sets the character encoding used inside Vim. It applies to text in
-the buffers, registers, Strings in expressions, text stored in the
-shada file, etc. It sets the kind of characters which Vim can work
-with. See |encoding-names| for the possible values.
+'encoding' 'enc' Removed. |vim-differences| {Nvim}
+Nvim always uses UTF-8 internally. RPC communication
+(remote plugins/GUIs) must use UTF-8 strings.
 
-'encoding' cannot be changed after startup, because (1) it causes
-non-ASCII text inside Vim to become invalid, and (2) it complicates
-runtime logic. The recommended 'encoding' is "utf-8". Remote plugins
-and GUIs only support utf-8. See |multibyte|.
-
-The character encoding of files can be different from 'encoding'.
+The character encoding of files can be different than UTF-8.
 This is specified with 'fileencoding'. The conversion is done with
 iconv() or as specified with 'charconvert'.
 
-If you need to know whether 'encoding' is a multi-byte encoding, you
-can use: >
-if has("multi_byte_encoding")
-<
-When you set this option, it fires the |EncodingChanged| autocommand
-event so that you can set up fonts if necessary.
-
-When the option is set, the value is converted to lowercase. Thus
-you can set it with uppercase values too. Underscores are translated
-to '-' signs.
-When the encoding is recognized, it is changed to the standard name.
-For example "Latin-1" becomes "latin1", "ISO_88592" becomes
-"iso-8859-2" and "utf8" becomes "utf-8".
-
-When "unicode", "ucs-2" or "ucs-4" is used, Vim internally uses utf-8.
-You don't notice this while editing, but it does matter for the
-|shada-file|. And Vim expects the terminal to use utf-8 too. Thus
-setting 'encoding' to one of these values instead of utf-8 only has
-effect for encoding used for files when 'fileencoding' is empty.
-
 *'endofline'* *'eol'* *'noendofline'* *'noeol'*
 'endofline' 'eol' boolean (default on)
 local to buffer
@@ -2304,20 +2265,14 @@ A jump table for the options with a short description can be found at |Q_op|.
 feature}
 Sets the character encoding for the file of this buffer.
 
-When 'fileencoding' is different from 'encoding', conversion will be
+When 'fileencoding' is different from "utf-8", conversion will be
 done when writing the file. For reading see below.
-When 'fileencoding' is empty, the same value as 'encoding' will be
-used (no conversion when reading or writing a file).
-Conversion will also be done when 'encoding' and 'fileencoding' are
-both a Unicode encoding and 'fileencoding' is not utf-8. That's
-because internally Unicode is always stored as utf-8.
-WARNING: Conversion can cause loss of information! When
-'encoding' is "utf-8" or another Unicode encoding, conversion
-is most likely done in a way that the reverse conversion
-results in the same text. When 'encoding' is not "utf-8" some
-characters may be lost!
+When 'fileencoding' is empty, the file will be saved with utf-8
+encoding. (no conversion when reading or writing a file).
+WARNING: Conversion to a non-Unicode encoding can cause loss of
+information!
 
-See 'encoding' for the possible values. Additionally, values may be
+See |encoding-names| for the possible values. Additionally, values may be
 specified that can be handled by the converter, see
 |mbyte-conversion|.
 
@@ -2330,8 +2285,8 @@ A jump table for the options with a short description can be found at |Q_op|.
 Prepending "8bit-" and "2byte-" has no meaning here, they are ignored.
 When the option is set, the value is converted to lowercase. Thus
 you can set it with uppercase values too. '_' characters are
-replaced with '-'. If a name is recognized from the list for
-'encoding', it is replaced by the standard name. For example
+replaced with '-'. If a name is recognized from the list at
+|encoding-names|, it is replaced by the standard name. For example
 "ISO8859-2" becomes "iso-8859-2".
 
 When this option is set, after starting to edit a file, the 'modified'
@@ -2354,12 +2309,8 @@ A jump table for the options with a short description can be found at |Q_op|.
 mentioned character encoding. If an error is detected, the next one
 in the list is tried. When an encoding is found that works,
 'fileencoding' is set to it. If all fail, 'fileencoding' is set to
-an empty string, which means the value of 'encoding' is used.
-WARNING: Conversion can cause loss of information! When
-'encoding' is "utf-8" (or one of the other Unicode variants)
-conversion is most likely done in a way that the reverse
-conversion results in the same text. When 'encoding' is not
-"utf-8" some non-ASCII characters may be lost! You can use
+an empty string, which means that UTF-8 is used.
+WARNING: Conversion can cause loss of information! You can use
 the |++bad| argument to specify what is done with characters
 that can't be converted.
 For an empty file or a file with only ASCII characters most encodings
@@ -2385,11 +2336,11 @@ A jump table for the options with a short description can be found at |Q_op|.
 because Vim cannot detect an error, thus the encoding is always
 accepted.
 The special value "default" can be used for the encoding from the
-environment. It is useful when 'encoding' is set to "utf-8" and
-your environment uses a non-latin1 encoding, such as Russian.
-When 'encoding' is "utf-8" and a file contains an illegal byte
-sequence it won't be recognized as UTF-8. You can use the |8g8|
-command to find the illegal byte sequence.
+environment. It is useful when your environment uses a non-latin1
+encoding, such as Russian.
+When a file contains an illegal UTF-8 byte sequence it won't be
+recognized as "utf-8". You can use the |8g8| command to find the
+illegal byte sequence.
 WRONG VALUES: WHAT'S WRONG:
 latin1,utf-8 "latin1" will always be used
 utf-8,ucs-bom,latin1 BOM won't be recognized in an utf-8
@@ -3048,8 +2999,7 @@ A jump table for the options with a short description can be found at |Q_op|.
 Note: The size of these fonts must be exactly twice as wide as the one
 specified with 'guifont' and the same height.
 
-'guifontwide' is only used when 'encoding' is set to "utf-8" and
-'guifontset' is empty or invalid.
+'guifontwide' is only used when 'guifontset' is empty or invalid.
 When 'guifont' is set and a valid font is found in it and
 'guifontwide' is empty Vim will attempt to find a matching
 double-width font and set 'guifontwide' to it.
@@ -3702,7 +3652,7 @@ A jump table for the options with a short description can be found at |Q_op|.
 128 - 159 "~@" - "~_"
 160 - 254 "| " - "|~"
 255 "~?"
-When 'encoding' is a Unicode one, illegal bytes from 128 to 255 are
+Illegal bytes from 128 to 255 (invalid UTF-8) are
 displayed as <xx>, with the hexadecimal value of the byte.
 When 'display' contains "uhex" all unprintable characters are
 displayed as <xx>.
@@ -3980,8 +3930,7 @@ A jump table for the options with a short description can be found at |Q_op|.
 omitted.
 
 The characters ':' and ',' should not be used. UTF-8 characters can
-be used when 'encoding' is "utf-8", otherwise only printable
-characters are allowed. All characters must be single width.
+be used. All characters must be single width.
 
 Examples: >
 :set lcs=tab:>-,trail:-
@@ -4078,7 +4027,6 @@ A jump table for the options with a short description can be found at |Q_op|.
 {only available when compiled with the |+multi_byte|
 feature}
 The maximum number of combining characters supported for displaying.
-Only used when 'encoding' is "utf-8".
 The default is OK for most languages. Hebrew may require 4.
 Maximum value is 6.
 Even when this option is set to 2 you can still edit text with more
@@ -5825,9 +5773,6 @@ A jump table for the options with a short description can be found at |Q_op|.
 (_xx is an underscore, two letters and followed by a non-letter).
 This is mainly for testing purposes. You must make sure the correct
 encoding is used, Vim doesn't check it.
-When 'encoding' is set the word lists are reloaded. Thus it's a good
-idea to set 'spelllang' after setting 'encoding' to avoid loading the
-files twice.
 How the related spell files are found is explained here: |spell-load|.
 
 If the |spellfile.vim| plugin is active and you use a language name
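The 'fileencodings' rules in the hunks above work as follows. Each name is tried in order. The first one that decodes without an error is accepted. If every name fails, 'fileencoding' is left empty, which the new text says means UTF-8. This can be illustrated outside Nvim with the minimal Python sketch below. It is not Nvim's implementation. It assumes Python's strict decoders are a fair stand-in for Nvim's error detection. `detect_fileencoding` is a hypothetical name, and the special "ucs-bom" and "default" values are not modelled.

```python
from typing import Sequence


def detect_fileencoding(data: bytes, fileencodings: Sequence[str]) -> str:
    """Hypothetical helper mimicking the 'fileencodings' loop.

    Returns the first name whose strict decoder accepts `data`, or ""
    when every candidate fails (empty 'fileencoding' => UTF-8 is used).
    """
    for name in fileencodings:
        try:
            data.decode(name)  # strict mode: raises on an illegal byte sequence
        except (UnicodeDecodeError, LookupError):
            continue  # error detected (or unknown codec name): try the next one
        return name
    return ""


if __name__ == "__main__":
    utf8_text = "caf\u00e9".encode("utf-8")     # b"caf\xc3\xa9"
    latin1_text = "caf\u00e9".encode("latin1")  # b"caf\xe9"

    print(detect_fileencoding(utf8_text, ["utf-8", "latin1"]))    # utf-8
    print(detect_fileencoding(latin1_text, ["utf-8", "latin1"]))  # latin1
    # The "WRONG VALUES" case: latin1 accepts every byte, so it always wins.
    print(detect_fileencoding(utf8_text, ["latin1", "utf-8"]))    # latin1
    print(repr(detect_fileencoding(b"\xff", ["utf-8"])))          # ''
```

The third call reproduces the "latin1,utf-8" row of the WRONG VALUES table. A single-byte encoding never reports an error, so it only makes sense as the last entry in the list.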
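The updated 'isprint' text says that bytes from 128 to 255 which are not valid UTF-8 are displayed as <xx>, using the byte's hexadecimal value. The minimal Python sketch below illustrates that rule with a custom codec error handler. The handler name "hexbyte" and the function `render_for_display` are hypothetical. Lowercase hex digits are an assumption, and this is not how Nvim renders the screen internally.

```python
import codecs


def _hex_each_bad_byte(err: UnicodeError):
    """Codec error handler: replace each undecodable byte with "<xx>"."""
    if not isinstance(err, UnicodeDecodeError):
        raise err
    bad = err.object[err.start:err.end]
    # Return the replacement text and the position where decoding resumes.
    return "".join(f"<{b:02x}>" for b in bad), err.end


codecs.register_error("hexbyte", _hex_each_bad_byte)


def render_for_display(data: bytes) -> str:
    """Decode UTF-8, showing illegal bytes (always >= 0x80) as <xx>."""
    return data.decode("utf-8", errors="hexbyte")


if __name__ == "__main__":
    print(render_for_display("caf\u00e9".encode("utf-8")) == "caf\u00e9")  # True
    print(render_for_display(b"caf\xe9"))  # caf<e9>
    print(render_for_display(b"a\xffb"))   # a<ff>b
```

Bytes 0x00 to 0x7f are always valid UTF-8, so only bytes from 128 to 255 can reach the handler. That matches the range named in the documentation.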
@@ -40,7 +40,6 @@ these differences.
 - 'complete' doesn't include "i"
 - 'directory' defaults to ~/.local/share/nvim/swap// (|xdg|), auto-created
 - 'display' defaults to "lastline"
-- 'encoding' defaults to "utf-8"
 - 'formatoptions' defaults to "tcqj"
 - 'history' defaults to 10000 (the maximum)
 - 'hlsearch' is set by default
@@ -159,7 +158,7 @@ are always available and may be used simultaneously in separate plugins. The
 'p')) mkdir() will silently exit. In Vim this was an error.
 3. mkdir() error messages now include strerror() text when mkdir fails.
 
-'encoding' cannot be changed after startup.
+'encoding' is always "utf-8".
 
 |string()| and |:echo| behaviour changed:
 1. No maximum recursion depth limit is applied to nested container
@@ -266,6 +265,7 @@ Highlight groups:
 Other options:
 'antialias'
 'cpoptions' ("g", "w", "H", "*", "-", "j", and all POSIX flags were removed)
+'encoding' ("utf-8" is always used)
 'guioptions' "t" flag was removed
 *'guipty'* (Nvim uses pipes and PTYs consistently on all platforms.)
 *'imactivatefunc'* *'imaf'*