This PR builds on https://github.com/ghostty-org/ghostty/pull/9678 ~so
the diff from there is included here (it's not possible to stack PRs
unless it's a PR against my own fork)--review that one first!~
This PR updates the `graphemeBreak` calculation to use `uucode`'s
`computeGraphemeBreakNoControl`, which has [tests in
uucode](215ff09730/src/x/grapheme.zig (L753))
that confirm it passes the `GraphemeBreakTest.txt` (minus some
exceptions).
Note that the `grapheme_break` (and `grapheme_break_no_control`)
property in `uucode` incorporates `emoji_modifier` and
`emoji_modifier_base`, diverging from UAX #29 but matching UTS #51. See
[this comment in
uucode](215ff09730/src/grapheme.zig (L420-L434))
for details.
The `grapheme_break_no_control` property and
`computeGraphemeBreakNoControl` both assume `control`, `cr`, and `lf`
have been filtered out, matching the current grapheme break logic in
Ghostty.
This PR keeps the `Precompute.data` logic mostly equivalent, since the
`uucode` `precomputedGraphemeBreak` lacks benchmarks in the `uucode`
repository (it was benchmarked in [the original PR adding `uucode` to
Ghostty](https://github.com/ghostty-org/ghostty/pull/8757)). Note
however, that due to `grapheme_break` being one bit larger than
`grapheme_boundary_class` and the new `BreakState` also being one bit
larger, the state jumps up by a factor of 8 (u10 -> u13), to 8KB.
## Benchmarks
~I benchmarked the old `main` version versus this PR for
`+grapheme-break` and surprisingly this PR is 2% faster (?). Looking at
the assembly though, I'm thinking something else might be causing that.
Once I get to the bottom of that I'll remove the below TODO and include
the benchmark results here.~
When seeing the speedup with `data.txt` and maybe a tiny speedup on
English wiki, I was surprised given the 1KB -> 8KB tables. Here's what
AI said when I asked it to inspect the assembly:
https://ampcode.com/threads/T-979b1743-19e7-47c9-8074-9778b4b2a61e, and
here's what it said when I asked it to predict the faster version:
https://ampcode.com/threads/T-3291dcd3-7a21-4d24-a192-7b3f6e18cd31
It looks like two loads got reordered and that put the load that
depended on stage1 -> stage2 -> stage3 second, "hiding memory latency".
So that makes the new one faster when looking up the `grapheme_break`
property. These gains go away with the Japanese and Arabic benchmarks,
which spend more time processing utf8, and may even have more grapheme
clusters too.
### with data.txt (200 MB ghostty-gen random utf8)
<img width="1822" height="464" alt="CleanShot 2025-11-26 at 08 42 03@2x"
src="https://github.com/user-attachments/assets/56d4ee98-21db-4eab-93ab-a0463a653883"
/>
### with English wiki dump
<img width="2012" height="506" alt="CleanShot 2025-11-26 at 08 43 15@2x"
src="https://github.com/user-attachments/assets/230fbfb7-272d-4a2a-93e7-7268962a9814"
/>
### with Japanese wiki dump
<img width="2008" height="518" alt="CleanShot 2025-11-26 at 08 43 49@2x"
src="https://github.com/user-attachments/assets/edb408c8-a604-4a8f-bd5b-80f19e3d65ee"
/>
### with Arabic wiki dump
<img width="2010" height="512" alt="CleanShot 2025-11-26 at 08 44 25@2x"
src="https://github.com/user-attachments/assets/81a29ac8-0586-4e82-8276-d7fa90c31c90"
/>
TODO:
* [x] Take a closer look at the assembly and understand why this PR (8
KB vs 1 KB table) is faster on my machine.
* [x] _(**edit**: checking this off because it seems unnecessary)_ If
this turns out to actually be unacceptably slower, one possibility is to
switch to `uucode`'s `precomputedGraphemeBreak` which uses a 1445 byte
table since it uses a dense table (indexed using multiplication instead
of bitCast, though, which did show up in the initial benchmarks from
https://github.com/ghostty-org/ghostty/pull/8757 a small amount.)
AI was used in some of the uucode changes in
https://github.com/ghostty-org/ghostty/pull/9678 (Amp--primarily for
tests), but everything was carefully vetted and much of it done by hand.
This PR was made without AI with the exception of consulting AI about
whether the "Prepend + ASCII" scenario is common (hopefully it's right
about that being uncommon).
This fixes the VS16 issues found in this test:
https://ucs-detect.readthedocs.io/sw_results/ghostty.html#ghostty
This is also a more robust way to handle VS15/16 in general.
This commit also changes our propeties to be a packed struct which
reduces its size from 4 bytes to 1 and likewise drops our unicode table
size 4x.
Since we now use uucode, we don't need ziglyph anymore. Ziglyph was kept
around as a test-only dep so we can verify matching but this is
complicating our Zig 0.15 upgrade because ziglyph doesn't support Zig
0.15. Let's just drop it.
This makes it cleaner to add new sources of table generation and also
avoids inadvertently depending on different modules (despite Zig's lazy
analysis).
This also fixes up terminal to only use our look up tables which avoids
bringing ziglyph in for the terminal module.
This commit changes a LOT of areas of the code to use decl literals
instead of redundantly referring to the type.
These changes were mostly driven by some regex searches and then manual
adjustment on a case-by-case basis.
I almost certainly missed quite a few places where decl literals could
be used, but this is a good first step in converting things, and other
instances can be addressed when they're discovered.
I tested GLFW+Metal and building the framework on macOS and tested a GTK
build on Linux, so I'm 99% sure I didn't introduce any syntax errors or
other problems with this. (fingers crossed)
Fixes#2941
This fixes the rendering of the text below. For those that can't see it,
it is the following in UTF-32: `0x22 0x1F3FF 0x22`.
```
"🏿"
```
`0x1F3FF` is the Fitzpatrick modifier for dark skin tone. It has the
Unicode property `Emoji_Modifier`. Emoji modifiers are defined in UTS
#51 and are only valid based on ED-13:
```
emoji_modifier_sequence := emoji_modifier_base emoji_modifier
emoji_modifier_base := \p{Emoji_Modifier_Base}
emoji_modifier := \p{Emoji_Modifier}
```
Additional quote from UTS #51:
> To have an effect on an emoji, an emoji modifier must immediately follow
> that base emoji character. Emoji presentation selectors are neither needed
> nor recommended for emoji characters when they are followed by emoji
> modifiers, and should not be used in newly generated emoji modifier
> sequences; the emoji modifier automatically implies the emoji presentation
> style.
Our precomputed grapheme break table was mistakingly not following this
rule. This commit fixes that by adding a check for that every
`Emoji_Modifier` character must be preceded by an `Emoji_Modifier_Base`.
This only has a cost during compilation (table generation). The runtime
cost is identical; the table size didn't increase since we had leftover
bits we could use.