Commit Graph

38 Commits

Author SHA1 Message Date
Jacob Sandlund
36c3295806 unicode: don't narrow invalid text presentation (VS15) sequences 2025-11-23 22:39:21 -05:00
Jacob Sandlund
97926ca307 Update uucode to the latest, for future width and grapheme break changes 2025-11-23 17:26:25 -05:00
Mitchell Hashimoto
3d58dc51c9 terminal: keypad variation sequences should respect VS16
This fixes the VS16 issues found in this test:
https://ucs-detect.readthedocs.io/sw_results/ghostty.html#ghostty
This is also a more robust way to handle VS15/16 in general. 

This commit also changes our properties to be a packed struct which
reduces its size from 4 bytes to 1 and likewise drops our unicode table
size 4x.
2025-11-06 07:05:57 -08:00
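
Illustration for 3d58dc51c9 (not part of the commit): a minimal sketch of how packing per-codepoint properties into bit fields can bring an entry down to one byte. The field names and bit widths below are assumptions chosen for the example. The actual struct layout is not shown in this log.

```python
# Hypothetical layout: 2 bits of width, 4 bits of boundary class, 1 flag bit.
# These fields and sizes are assumptions, not the project's real struct.
WIDTH_BITS = 2
CLASS_BITS = 4


def pack_props(width: int, boundary_class: int, emoji_vs_base: bool) -> int:
    """Pack three properties into a single integer that fits in one byte."""
    if not 0 <= width < (1 << WIDTH_BITS):
        raise ValueError("width out of range")
    if not 0 <= boundary_class < (1 << CLASS_BITS):
        raise ValueError("boundary_class out of range")
    return (
        width
        | (boundary_class << WIDTH_BITS)
        | (int(emoji_vs_base) << (WIDTH_BITS + CLASS_BITS))
    )


def unpack_props(packed: int) -> tuple[int, int, bool]:
    """Recover the three properties from the packed byte."""
    width = packed & ((1 << WIDTH_BITS) - 1)
    boundary_class = (packed >> WIDTH_BITS) & ((1 << CLASS_BITS) - 1)
    emoji_vs_base = bool((packed >> (WIDTH_BITS + CLASS_BITS)) & 1)
    return width, boundary_class, emoji_vs_base
```

Seven bits are used in total, so every entry fits in a byte, and a table holding one entry per codepoint shrinks in proportion to the entry size.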
Mitchell Hashimoto
913d2dfb23 unicode: fix lookup table generation 2025-10-03 07:10:43 -07:00
Mitchell Hashimoto
16deea2761 nuke ziglyph from orbit
Since we now use uucode, we don't need ziglyph anymore. Ziglyph was kept
around as a test-only dep so we can verify matching but this is
complicating our Zig 0.15 upgrade because ziglyph doesn't support Zig
0.15. Let's just drop it.
2025-09-30 12:17:10 -07:00
Jacob Sandlund
c7ad29ca91 move tests over to _uucode.zig files to avoid needing deps for vt tests 2025-09-23 10:49:59 -04:00
Jacob Sandlund
b5c6c044a7 Fix merge 2025-09-23 10:01:23 -04:00
Jacob Sandlund
b01770c21c Merge remote-tracking branch 'upstream/main' into jacob/uucode 2025-09-23 09:36:41 -04:00
Mitchell Hashimoto
10dc9353b7 unicode: delete props.zig and clean up symbols deps too
Follow up to #8810

Same reasoning.
2025-09-20 20:28:25 -07:00
Mitchell Hashimoto
bf1278deff unicode: isolate properties, tables, and ziglyph into separate files
This makes it cleaner to add new sources of table generation and also
avoids inadvertently depending on different modules (despite Zig's lazy
analysis). 

This also fixes up terminal to only use our look up tables which avoids
bringing ziglyph in for the terminal module.
2025-09-20 15:00:55 -07:00
Jacob Sandlund
7b0722bf16 Remove comment above test. it's not too slow 2025-09-19 01:26:17 -04:00
Jacob Sandlund
cf3b514efc pr feedback: get, remove todos for case_folding_simple 2025-09-19 01:24:13 -04:00
Jacob Sandlund
b83315cb81 set max for unicode grapheme executable 2025-09-18 14:26:04 -04:00
Jacob Sandlund
69594119c3 fix up diff from benchmarks, and add tests against ziglyph 2025-09-18 11:46:05 -04:00
Jacob Sandlund
3275903611 update uucode and cleanups 2025-09-18 09:26:09 -04:00
Jacob Sandlund
4d37853f6c benchmark sources 2025-09-11 10:30:01 -04:00
Jacob Sandlund
cffa52e658 changes after benchmarking 2025-09-09 11:38:10 -04:00
Jacob Sandlund
b0db51c45e fast getX(.is_symbol) 2025-09-06 15:01:29 -04:00
Jacob Sandlund
f86a3a9b50 Merge remote-tracking branch 'upstream/main' into jacob/uucode 2025-09-06 14:31:41 -04:00
Jacob Sandlund
2af08bdbe3 trying a bunch of things to get performance to match 2025-09-06 10:42:02 -04:00
Jeffrey C. Ollie
1ef220a679 render: address review feedback
1. `inline` the table get.
2. Delete unused functions on the LUT table.
3. Disable the isSymbol test under valgrind
2025-09-05 11:40:03 -05:00
Jeffrey C. Ollie
e024b77ad5 drop the new LUT type as no performance advantage detected 2025-09-05 07:58:05 -05:00
Jeffrey C. Ollie
a7da96faee add two LUT-based implementations of isSymbol 2025-09-05 07:58:01 -05:00
Jacob Sandlund
0444c614da update for new grapheme_break 2025-08-21 22:29:34 -04:00
Jacob Sandlund
e84d8535f5 removing all ziglyph imports (aside from unicode/grapheme.zig) 2025-08-17 21:24:27 -04:00
Jacob Sandlund
f5a036a6a0 update after refactor (string field config, etc) 2025-08-12 09:43:12 -04:00
Jacob Sandlund
0c393299b0 using just get 2025-08-05 23:59:30 -04:00
Qwerasd
2384bd69cc style: use decl literals
This commit changes a LOT of areas of the code to use decl literals
instead of redundantly referring to the type.

These changes were mostly driven by some regex searches and then manual
adjustment on a case-by-case basis.

I almost certainly missed quite a few places where decl literals could
be used, but this is a good first step in converting things, and other
instances can be addressed when they're discovered.

I tested GLFW+Metal and building the framework on macOS and tested a GTK
build on Linux, so I'm 99% sure I didn't introduce any syntax errors or
other problems with this. (fingers crossed)
2025-05-26 21:50:14 -06:00
Mitchell Hashimoto
0f4d2bb237 Lots of 0.14 changes 2025-03-12 09:55:52 -07:00
Ryan Liptak
2d3db866e6 unigen: Remove libc dependency, use ArenaAllocator
Not linking libc avoids potential problems when compiling from/for certain targets (see https://github.com/ghostty-org/ghostty/discussions/3218), and using an ArenaAllocator makes unigen run just as fast (in both release and debug modes) while also taking less memory.

Benchmark 1 (3 runs): ./zig-out/bin/unigen-release-c
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          1.75s  ± 15.8ms    1.73s  … 1.76s           0 ( 0%)        0%
  peak_rss           2.23MB ±    0      2.23MB … 2.23MB          0 ( 0%)        0%
  cpu_cycles         7.22G  ± 62.8M     7.16G  … 7.29G           0 ( 0%)        0%
  instructions       11.5G  ± 16.0      11.5G  … 11.5G           0 ( 0%)        0%
  cache_references    436M  ± 6.54M      430M  …  443M           0 ( 0%)        0%
  cache_misses        310K  ±  203K      134K  …  532K           0 ( 0%)        0%
  branch_misses      1.03M  ± 29.9K      997K  … 1.06M           0 ( 0%)        0%
Benchmark 2 (3 runs): ./zig-out/bin/unigen-release-arena
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          1.73s  ± 6.40ms    1.72s  … 1.73s           0 ( 0%)          -  1.0% ±  1.6%
  peak_rss           1.27MB ± 75.7KB    1.18MB … 1.31MB          0 ( 0%)        - 43.1% ±  5.4%
  cpu_cycles         7.16G  ± 26.5M     7.13G  … 7.18G           0 ( 0%)          -  0.9% ±  1.5%
  instructions       11.4G  ± 28.2      11.4G  … 11.4G           0 ( 0%)          -  0.8% ±  0.0%
  cache_references    441M  ± 2.89M      439M  …  444M           0 ( 0%)          +  1.2% ±  2.6%
  cache_misses        152K  ±  102K     35.2K  …  220K           0 ( 0%)          - 50.8% ± 117.8%
  branch_misses      1.05M  ± 13.4K     1.04M  … 1.06M           0 ( 0%)          +  2.0% ±  5.1%

Benchmark 1 (3 runs): ./zig-out/bin/unigen-debug-c
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          1.75s  ± 32.4ms    1.71s  … 1.77s           0 ( 0%)        0%
  peak_rss           2.23MB ±    0      2.23MB … 2.23MB          0 ( 0%)        0%
  cpu_cycles         7.23G  ±  136M     7.08G  … 7.34G           0 ( 0%)        0%
  instructions       11.5G  ± 37.9      11.5G  … 11.5G           0 ( 0%)        0%
  cache_references    448M  ± 1.03M      447M  …  449M           0 ( 0%)        0%
  cache_misses        148K  ± 42.6K     99.3K  …  180K           0 ( 0%)        0%
  branch_misses       987K  ± 5.27K      983K  …  993K           0 ( 0%)        0%
Benchmark 2 (3 runs): ./zig-out/bin/unigen-debug-arena
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          1.76s  ± 4.12ms    1.76s  … 1.76s           0 ( 0%)          +  0.4% ±  3.0%
  peak_rss           1.22MB ± 75.7KB    1.18MB … 1.31MB          0 ( 0%)        - 45.1% ±  5.4%
  cpu_cycles         7.27G  ± 17.1M     7.26G  … 7.29G           0 ( 0%)          +  0.6% ±  3.0%
  instructions       11.4G  ± 3.79      11.4G  … 11.4G           0 ( 0%)          -  0.8% ±  0.0%
  cache_references    440M  ± 4.52M      435M  …  444M           0 ( 0%)          -  1.7% ±  1.7%
  cache_misses       43.6K  ± 19.2K     26.5K  … 64.3K           0 ( 0%)        - 70.5% ± 50.8%
  branch_misses      1.04M  ± 2.25K     1.04M  … 1.05M           0 ( 0%)        💩+  5.8% ±  0.9%
2025-01-20 18:30:22 -08:00
Mitchell Hashimoto
fd1201323e unicode: emoji modifier requires emoji modifier base preceding to not break
Fixes #2941

This fixes the rendering of the text below. For those that can't see it,
it is the following in UTF-32: `0x22 0x1F3FF 0x22`.

```
"🏿"
```

`0x1F3FF` is the Fitzpatrick modifier for dark skin tone. It has the
Unicode property `Emoji_Modifier`. Emoji modifiers are defined in UTS
#51 and are only valid based on ED-13:

```
emoji_modifier_sequence := emoji_modifier_base emoji_modifier
emoji_modifier_base := \p{Emoji_Modifier_Base}
emoji_modifier := \p{Emoji_Modifier}
```

Additional quote from UTS #51:

> To have an effect on an emoji, an emoji modifier must immediately follow
> that base emoji character. Emoji presentation selectors are neither needed
> nor recommended for emoji characters when they are followed by emoji
> modifiers, and should not be used in newly generated emoji modifier
> sequences; the emoji modifier automatically implies the emoji presentation
> style.

Our precomputed grapheme break table was mistakenly not following this
rule. This commit fixes that by adding a check that every
`Emoji_Modifier` character must be preceded by an `Emoji_Modifier_Base`.
This only has a cost during compilation (table generation). The runtime
cost is identical; the table size didn't increase since we had leftover
bits we could use.
2024-12-12 12:53:08 -08:00
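
Illustration for fd1201323e (not part of the commit): a minimal sketch of the rule described above, in which an emoji modifier only attaches to the preceding codepoint when that codepoint is an emoji modifier base. The function name is hypothetical and the base set is a small illustrative subset. The real table is generated from Unicode data and covers far more than this one rule.

```python
# Emoji_Modifier covers U+1F3FB..U+1F3FF (the five Fitzpatrick modifiers).
EMOJI_MODIFIERS = range(0x1F3FB, 0x1F3FF + 1)

# Illustrative subset only (assumption for the example); the full
# Emoji_Modifier_Base set comes from the Unicode emoji data files.
EMOJI_MODIFIER_BASES = {0x261D, 0x1F44B, 0x1F44D}


def modifier_joins(prev_cp: int, cp: int) -> bool:
    """Return True when cp is an emoji modifier that attaches to prev_cp.

    This models only the emoji modifier rule, not full grapheme breaking.
    """
    return cp in EMOJI_MODIFIERS and prev_cp in EMOJI_MODIFIER_BASES
```

With the sequence from the commit message, `modifier_joins(0x22, 0x1F3FF)` is false, so the modifier after the quote character stands alone, while `modifier_joins(0x1F44D, 0x1F3FF)` is true and the two codepoints form one sequence.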
Mitchell Hashimoto
004405ccf9 terminal: only apply VS15/16 to emoji
Fixes #1482
2024-02-10 17:26:45 -08:00
Mitchell Hashimoto
5275d44e7d unicode: precompute grapheme break data 2024-02-09 20:50:13 -08:00
Mitchell Hashimoto
132fbb3a46 unicode: use packed struct for break state 2024-02-09 20:29:36 -08:00
Mitchell Hashimoto
c47ad97f62 unicode: remove unused 2024-02-09 20:23:29 -08:00
Mitchell Hashimoto
5f3574a4bf unicode: direct port of ziglyph to start 2024-02-09 19:44:57 -08:00
Mitchell Hashimoto
0632410857 unicode: get grapheme boundary class 2024-02-09 12:22:23 -08:00
Mitchell Hashimoto
9755d0696e unicode: generate our own lookup tables 2024-02-08 21:01:11 -08:00