Mirror of https://github.com/ghostty-org/ghostty.git, synced 2026-04-14 03:25:50 +00:00
This PR builds on https://github.com/ghostty-org/ghostty/pull/9678 ~so the diff from there is included here (it's not possible to stack PRs unless it's a PR against my own fork)--review that one first!~

This PR updates the `graphemeBreak` calculation to use `uucode`'s `computeGraphemeBreakNoControl`, which has [tests in uucode](215ff09730/src/x/grapheme.zig (L753)) confirming that it passes `GraphemeBreakTest.txt` (minus some exceptions).

Note that the `grapheme_break` (and `grapheme_break_no_control`) property in `uucode` incorporates `emoji_modifier` and `emoji_modifier_base`, diverging from UAX #29 but matching UTS #51. See [this comment in uucode](215ff09730/src/grapheme.zig (L420-L434)) for details. The `grapheme_break_no_control` property and `computeGraphemeBreakNoControl` both assume `control`, `cr`, and `lf` have already been filtered out, matching the current grapheme break logic in Ghostty.

This PR keeps the `Precompute.data` logic mostly equivalent, since `uucode`'s `precomputedGraphemeBreak` has no benchmarks in the `uucode` repository (it was benchmarked in [the original PR adding `uucode` to Ghostty](https://github.com/ghostty-org/ghostty/pull/8757)). Note, however, that because `grapheme_break` is one bit larger than `grapheme_boundary_class`, and the new `BreakState` is also one bit larger, the precomputed table grows by a factor of 8 (u10 -> u13), to 8 KB.

## Benchmarks

~I benchmarked the old `main` version versus this PR for `+grapheme-break` and surprisingly this PR is 2% faster (?). Looking at the assembly though, I'm thinking something else might be causing that. Once I get to the bottom of that I'll remove the below TODO and include the benchmark results here.~

Seeing the speedup with `data.txt` (and maybe a tiny speedup on the English wiki dump), I was surprised given the 1 KB -> 8 KB tables.
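The factor-of-8 growth follows directly from the bit widths. A quick arithmetic sketch (assuming one byte per table entry, as the 1 KB / 8 KB figures imply):

```python
# Old packed key: two boundary classes plus state packed into a u10.
old_key_bits = 10

# New packed key: each grapheme_break class is one bit wider (+2 bits total)
# and the new BreakState is one bit wider (+1 bit), giving a u13.
new_key_bits = 13

old_entries = 2 ** old_key_bits  # 1024 -> ~1 KB at one byte per entry
new_entries = 2 ** new_key_bits  # 8192 -> ~8 KB at one byte per entry

print(old_entries, new_entries, new_entries // old_entries)  # 1024 8192 8
```

Three extra key bits means 2^3 = 8 times as many slots, hence 1 KB becoming 8 KB.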
Here's what AI said when I asked it to inspect the assembly: https://ampcode.com/threads/T-979b1743-19e7-47c9-8074-9778b4b2a61e, and here's what it said when I asked it to predict the faster version: https://ampcode.com/threads/T-3291dcd3-7a21-4d24-a192-7b3f6e18cd31

It looks like two loads got reordered, which put the load that depends on the stage1 -> stage2 -> stage3 chain second, "hiding memory latency". That makes the new version faster when looking up the `grapheme_break` property. These gains go away with the Japanese and Arabic benchmarks, which spend more time processing UTF-8 and may also contain more grapheme clusters.

### with data.txt (200 MB ghostty-gen random utf8)

<img width="1822" height="464" alt="CleanShot 2025-11-26 at 08 42 03@2x" src="https://github.com/user-attachments/assets/56d4ee98-21db-4eab-93ab-a0463a653883" />

### with English wiki dump

<img width="2012" height="506" alt="CleanShot 2025-11-26 at 08 43 15@2x" src="https://github.com/user-attachments/assets/230fbfb7-272d-4a2a-93e7-7268962a9814" />

### with Japanese wiki dump

<img width="2008" height="518" alt="CleanShot 2025-11-26 at 08 43 49@2x" src="https://github.com/user-attachments/assets/edb408c8-a604-4a8f-bd5b-80f19e3d65ee" />

### with Arabic wiki dump

<img width="2010" height="512" alt="CleanShot 2025-11-26 at 08 44 25@2x" src="https://github.com/user-attachments/assets/81a29ac8-0586-4e82-8276-d7fa90c31c90" />

TODO:

* [x] Take a closer look at the assembly and understand why this PR (8 KB vs 1 KB table) is faster on my machine.
* [x] _(**edit**: checking this off because it seems unnecessary)_ If this turns out to actually be unacceptably slower, one possibility is to switch to `uucode`'s `precomputedGraphemeBreak`, which uses a 1445-byte dense table (indexed using multiplication instead of `bitCast`, though, which did show up a small amount in the initial benchmarks from https://github.com/ghostty-org/ghostty/pull/8757).
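For context on the packed-vs-dense tradeoff in the last TODO item, here is a minimal Python sketch. The field widths and radices are illustrative, reverse-engineered from the numbers in this PR (a u13 packed key and a 1445-byte dense table, since 17 * 17 * 5 = 1445), not `uucode`'s actual layout:

```python
# Packed index: each field's width is rounded up to a whole number of bits
# (5 bits per class for 17 classes, 3 bits for 5 states), combined with
# shifts. Cheap to compute (bitCast-style), but 2**13 = 8192 slots, many
# of them unused holes.
def packed_index(class1: int, class2: int, state: int) -> int:
    return (class1 << 8) | (class2 << 3) | state

# Dense index: exact radices combined by multiplication. No holes, so only
# 17 * 17 * 5 = 1445 slots, but each lookup pays for the multiplies.
def dense_index(class1: int, class2: int, state: int,
                num_classes: int = 17, num_states: int = 5) -> int:
    return (class1 * num_classes + class2) * num_states + state

print(packed_index(3, 7, 2))  # 826, somewhere in an 8192-slot table
print(dense_index(3, 7, 2))   # 292, somewhere in a 1445-slot table
```

The dense form is what showed up as a small cost in the original benchmarks, which is why this PR sticks with the packed layout despite the larger table.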
AI was used in some of the uucode changes in https://github.com/ghostty-org/ghostty/pull/9678 (Amp, primarily for tests), but everything was carefully vetted and much of it was done by hand. This PR was made without AI, with the exception of consulting AI about whether the "Prepend + ASCII" scenario is common (hopefully it's right that it is uncommon).