Because of a bug in `tools/parse_unicodedata.nim`, CJK Ideographs were
not considered letters by `isAlpha()`, even though they have category
Lo. This is because they are specified as a range in `UnicodeData.txt`,
not as separate characters:
```
4E00;<CJK Ideograph, First>;Lo;0;L;;;;;N;;;;;
9FEF;<CJK Ideograph, Last>;Lo;0;L;;;;;N;;;;;
```
The parser was not prepared to parse such ranges and thus omitted almost
all CJK Ideographs from consideration.
To fix this, we need to consider ranges from `UnicodeData.txt` in
`tools/parse_unicodedata.nim`.
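
For illustration only, the range handling described above can be sketched
as follows. This is not the actual change to `tools/parse_unicodedata.nim`:
it is written in Python rather than Nim, and the names `iter_ranges`,
`is_letter` and `SAMPLE` are hypothetical. It assumes a `<..., First>` line
is always followed by its matching `<..., Last>` line.

```python
from typing import Iterable, Iterator, List, Tuple

# Two ordinary entries plus the First/Last pair quoted above.
SAMPLE = """\
0030;DIGIT ZERO;Nd;0;EN;;0;0;0;N;;;;;
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;;
4E00;<CJK Ideograph, First>;Lo;0;L;;;;;N;;;;;
9FEF;<CJK Ideograph, Last>;Lo;0;L;;;;;N;;;;;
"""

Range = Tuple[int, int, str]  # (first code point, last code point, category)


def iter_ranges(lines: Iterable[str]) -> Iterator[Range]:
    """Yield one range per entry; First/Last pairs become a single range."""
    pending = None  # (start, category) of an open "<..., First>" line
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        fields = line.split(";")
        code = int(fields[0], 16)
        name, category = fields[1], fields[2]
        if name.endswith(", First>"):
            # Remember the start; nothing is emitted until "Last" is seen.
            pending = (code, category)
        elif name.endswith(", Last>"):
            if pending is None:
                raise ValueError(f"'Last' without 'First' at {fields[0]}")
            start, start_category = pending
            if start_category != category:
                raise ValueError(f"category mismatch in range ending {fields[0]}")
            yield (start, code, category)
            pending = None
        else:
            # Ordinary entry: a range covering a single code point.
            yield (code, code, category)


def is_letter(cp: int, ranges: List[Range]) -> bool:
    """True if cp lies in any range whose general category starts with 'L'."""
    return any(lo <= cp <= hi and cat.startswith("L") for lo, hi, cat in ranges)


ranges = list(iter_ranges(SAMPLE.splitlines()))
```

With this approach a code point in the middle of the block, such as U+4E2D,
is covered by the `(0x4E00, 0x9FEF, "Lo")` range instead of being dropped.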
Fixes an issue that comes up when using strutils.`%` or any other
strutils/strformat feature that uses the unicode lookup tables behind
the scenes, on systems where ints are less than 32 bits wide.
Tested with:
```bash
./koch test cat lib
```
Refer to the discussion in #23125.
* update unicode.nim
* create a script to create the needed unicode data
* make unicode.nim compatible with Unicode v12.0.0
* slightly improve unicode.nim documentation (fixes #4795)
* more documentation