Updated whitespace ranges

Ranges sourced from `<http://www.unicode.org/Public/7.0.0/ucd/PropList.txt>`_. Wikipedia uses the same ranges on its `information page <http://en.wikipedia.org/wiki/Whitespace_character#Unicode>`_. 0xfeff isn't in that list, but it is the zero-width no-break space, so keeping it seems reasonable. 0x200b is classified as a format character, but it is the zero-width space, so it stays as well. To match Unicode exactly, both 0x200b and 0xfeff would have to be removed.
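
For illustration only, here is a minimal sketch of how a flat table of inclusive (low, high) pairs like ``spaceRanges`` can be checked. The names ``sketchSpaceRanges`` and ``inRanges`` are hypothetical, and the linear scan is an assumption made for clarity, not necessarily how the module itself performs the lookup. The table mirrors the ranges as they stand after this commit.

```nim
# Hypothetical copy of the table after this commit: consecutive
# entries form inclusive (low, high) code point pairs.
const sketchSpaceRanges = [
  0x0009, 0x000d,
  0x0020, 0x0020,
  0x0085, 0x0085,
  0x00a0, 0x00a0,
  0x1680, 0x1680,
  0x2000, 0x200b,
  0x200e, 0x200f,
  0x2028, 0x2029,
  0x202f, 0x202f,
  0x205f, 0x205f,
  0x3000, 0x3000,
  0xfeff, 0xfeff]

# Hypothetical helper: walk the pairs and report whether `c`
# falls inside any of them (bounds included).
proc inRanges(c: int, ranges: openArray[int]): bool =
  var i = 0
  while i + 1 < ranges.len:
    if ranges[i] <= c and c <= ranges[i + 1]:
      return true
    i += 2
  return false

doAssert inRanges(0x000b, sketchSpaceRanges)      # vertical tab, newly covered
doAssert inRanges(0x200b, sketchSpaceRanges)      # zero-width space, kept
doAssert not inRanges(0x200c, sketchSpaceRanges)  # gap between 0x200b and 0x200e
doAssert not inRanges(0x0041, sketchSpaceRanges)  # 'A'
```

With the old first pair (0x0009 to 0x000a), the same check would have rejected 0x000b through 0x000d, which is the gap the widened range closes.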
apense
2015-06-08 19:48:57 -04:00
parent c4009c6182
commit 0ee1672d69

@@ -372,11 +372,17 @@ const
     0xfe74] #
   spaceRanges = [
-    0x0009, 0x000a, # tab and newline
+    0x0009, 0x000d, # tab .. carriage return
     0x0020, 0x0020, # space
+    0x0085, 0x0085, # next line
     0x00a0, 0x00a0, #
-    0x2000, 0x200b, # -
+    0x1680, 0x1680, # Ogham space mark
+    0x2000, 0x200b, # en quad .. zero-width space
+    0x200e, 0x200f, # LTR mark .. RTL mark (pattern whitespace)
     0x2028, 0x2029, # - 0x3000, 0x3000, #
+    0x202f, 0x202f, # narrow no-break space
+    0x205f, 0x205f, # medium mathematical space
+    0x3000, 0x3000, # ideographic space
     0xfeff, 0xfeff] #
   toupperRanges = [