Commit Graph

45 Commits

Author SHA1 Message Date
Mitchell Hashimoto
49b2b8d644 unicode: switch to uucode grapheme break to (mostly) match unicode spec (#9680)
This PR builds on https://github.com/ghostty-org/ghostty/pull/9678 ~so
the diff from there is included here (it's not possible to stack PRs
unless it's a PR against my own fork)--review that one first!~

This PR updates the `graphemeBreak` calculation to use `uucode`'s
`computeGraphemeBreakNoControl`, which has [tests in
uucode](215ff09730/src/x/grapheme.zig (L753))
that confirm it passes the `GraphemeBreakTest.txt` (minus some
exceptions).

Note that the `grapheme_break` (and `grapheme_break_no_control`)
property in `uucode` incorporates `emoji_modifier` and
`emoji_modifier_base`, diverging from UAX #29 but matching UTS #51. See
[this comment in
uucode](215ff09730/src/grapheme.zig (L420-L434))
for details.

The `grapheme_break_no_control` property and
`computeGraphemeBreakNoControl` both assume `control`, `cr`, and `lf`
have been filtered out, matching the current grapheme break logic in
Ghostty.

This PR keeps the `Precompute.data` logic mostly equivalent, since the
`uucode` `precomputedGraphemeBreak` lacks benchmarks in the `uucode`
repository (it was benchmarked in [the original PR adding `uucode` to
Ghostty](https://github.com/ghostty-org/ghostty/pull/8757)). Note
however, that due to `grapheme_break` being one bit larger than
`grapheme_boundary_class` and the new `BreakState` also being one bit
larger, the state jumps up by a factor of 8 (u10 -> u13), to 8KB.

## Benchmarks

~I benchmarked the old `main` version versus this PR for
`+grapheme-break` and surprisingly this PR is 2% faster (?). Looking at
the assembly though, I'm thinking something else might be causing that.
Once I get to the bottom of that I'll remove the below TODO and include
the benchmark results here.~

When seeing the speedup with `data.txt` and maybe a tiny speedup on
English wiki, I was surprised given the 1KB -> 8KB tables. Here's what
AI said when I asked it to inspect the assembly:
https://ampcode.com/threads/T-979b1743-19e7-47c9-8074-9778b4b2a61e, and
here's what it said when I asked it to predict the faster version:
https://ampcode.com/threads/T-3291dcd3-7a21-4d24-a192-7b3f6e18cd31

It looks like two loads got reordered and that put the load that
depended on stage1 -> stage2 -> stage3 second, "hiding memory latency".
So that makes the new one faster when looking up the `grapheme_break`
property. These gains go away with the Japanese and Arabic benchmarks,
which spend more time processing utf8, and may even have more grapheme
clusters too.

### with data.txt (200 MB ghostty-gen random utf8)

<img width="1822" height="464" alt="CleanShot 2025-11-26 at 08 42 03@2x"
src="https://github.com/user-attachments/assets/56d4ee98-21db-4eab-93ab-a0463a653883"
/>

### with English wiki dump

<img width="2012" height="506" alt="CleanShot 2025-11-26 at 08 43 15@2x"
src="https://github.com/user-attachments/assets/230fbfb7-272d-4a2a-93e7-7268962a9814"
/>

### with Japanese wiki dump

<img width="2008" height="518" alt="CleanShot 2025-11-26 at 08 43 49@2x"
src="https://github.com/user-attachments/assets/edb408c8-a604-4a8f-bd5b-80f19e3d65ee"
/>

### with Arabic wiki dump

<img width="2010" height="512" alt="CleanShot 2025-11-26 at 08 44 25@2x"
src="https://github.com/user-attachments/assets/81a29ac8-0586-4e82-8276-d7fa90c31c90"
/>


TODO:

* [x] Take a closer look at the assembly and understand why this PR (8
KB vs 1 KB table) is faster on my machine.
* [x] _(**edit**: checking this off because it seems unnecessary)_ If
this turns out to actually be unacceptably slower, one possibility is to
switch to `uucode`'s `precomputedGraphemeBreak` which uses a 1445 byte
table since it uses a dense table (indexed using multiplication instead
of bitCast, though, which did show up in the initial benchmarks from
https://github.com/ghostty-org/ghostty/pull/8757 a small amount.)

AI was used in some of the uucode changes in
https://github.com/ghostty-org/ghostty/pull/9678 (Amp--primarily for
tests), but everything was carefully vetted and much of it done by hand.
This PR was made without AI with the exception of consulting AI about
whether the "Prepend + ASCII" scenario is common (hopefully it's right
about that being uncommon).
2026-01-20 09:44:15 -08:00
Jeffrey C. Ollie
f180f1c9b8 osc: remove inline from Parser.next 2026-01-08 14:12:16 -06:00
Jeffrey C. Ollie
fb1268a908 benchmark: add doNotOptimizeAway to OSC benchmark 2026-01-08 13:50:43 -06:00
Jeffrey C. Ollie
01a75ceec4 benchmark: add option to microbenchmark OSC parser 2025-12-10 22:31:27 -06:00
Jacob Sandlund
c49d50eb80 Merge remote-tracking branch 'upstream/main' into grapheme-break 2025-11-30 08:57:08 -05:00
Mitchell Hashimoto
dbfc3eb679 Remove unused imports 2025-11-27 13:37:53 -08:00
Jacob Sandlund
f68abdef75 Merge branch 'main' into grapheme-break 2025-11-26 17:31:47 -05:00
Jacob Sandlund
807febcb5e benchmarks: align read_buf to cache line 2025-11-25 09:07:21 -05:00
Jacob Sandlund
97aff0743e unicode: switch to uucode grapheme break to match unicode 16 spec 2025-11-23 22:33:01 -05:00
Mitchell Hashimoto
4caefb807c terminal: fix up some performance issues with render state 2025-11-20 22:00:42 -08:00
Mitchell Hashimoto
5d85f2382e terminal: render state needs to preserve as much allocation as possible 2025-11-20 22:00:42 -08:00
Mitchell Hashimoto
7195cab7d3 benchmark: add RenderState to ScreenClone benchmark 2025-11-20 22:00:42 -08:00
Mitchell Hashimoto
2f49e0c902 remove screenclone test cause it leaks memory on purpose 2025-11-17 06:42:00 -10:00
Mitchell Hashimoto
9a46397b59 benchmark: screen clone 2025-11-17 05:08:19 -10:00
Mitchell Hashimoto
2ef89c153a terminal: convert C0 2025-10-23 15:52:42 -07:00
Mitchell Hashimoto
f7189d14b9 terminal: convert Stream to use Action tagged union 2025-10-23 15:50:56 -07:00
Mitchell Hashimoto
06ad3b77b7 Zig 0.15.2 (#9200) 2025-10-14 07:11:10 -07:00
Mitchell Hashimoto
cb295b84a0 Zig 0.15: zig build test 2025-10-03 07:10:43 -07:00
Jacob Sandlund
b01770c21c Merge remote-tracking branch 'upstream/main' into jacob/uucode 2025-09-23 09:36:41 -04:00
Mitchell Hashimoto
10dc9353b7 unicode: delete props.zig and clean up symbols deps too
Follow up to #8810

Same reasoning.
2025-09-20 20:28:25 -07:00
Jacob Sandlund
cf3b514efc pr feedback: get, remove todos for case_folding_simple 2025-09-19 01:24:13 -04:00
Jacob Sandlund
18e9989f63 forgot to align buf 2025-09-18 14:20:41 -04:00
Jacob Sandlund
69594119c3 fix up diff from benchmarks, and add tests against ziglyph 2025-09-18 11:46:05 -04:00
Jacob Sandlund
3275903611 update uucode and cleanups 2025-09-18 09:26:09 -04:00
Jacob Sandlund
8a40c1f59f Merge remote-tracking branch 'upstream/main' into jacob/uucode 2025-09-10 10:01:06 -04:00
Jacob Sandlund
cffa52e658 changes after benchmarking 2025-09-09 11:38:10 -04:00
Jacob Sandlund
77b4c52634 [benchmarks] Align buf to cache line for consistency 2025-09-08 17:59:17 -04:00
Jacob Sandlund
113e89b389 [benchmarks] Use std.mem.doNotOptimizeAway to avoid data collisions 2025-09-06 15:06:00 -04:00
Jacob Sandlund
b0db51c45e fast getX(.is_symbol) 2025-09-06 15:01:29 -04:00
Jacob Sandlund
c3994347c0 doNotOptimizeAway 2025-09-06 14:55:21 -04:00
Jacob Sandlund
f86a3a9b50 Merge remote-tracking branch 'upstream/main' into jacob/uucode 2025-09-06 14:31:41 -04:00
Jacob Sandlund
2af08bdbe3 trying a bunch of things to get performance to match 2025-09-06 10:42:02 -04:00
Jeffrey C. Ollie
e024b77ad5 drop the new LUT type as no performance advantage detected 2025-09-05 07:58:05 -05:00
Jeffrey C. Ollie
a7da96faee add two LUT-based implementations of isSymbol 2025-09-05 07:58:01 -05:00
Jeffrey C. Ollie
0bc90b2a20 ci: build on freebsd 2025-08-30 13:58:25 -05:00
Mitchell Hashimoto
131f170f89 terminal: change OSC parser to explicit init to set undefined
This works around: https://github.com/ziglang/zig/issues/19148
This lets our `test-valgrind` command catch some issues. We'll have to
follow this pattern in more places but I want to do it incrementally so
things keep passing.

I **do not** want to blindly follow this pattern everywhere. I want to
start by focusing in only on the structs that set `undefined` as default
fields that we're also about to test in isolation with Valgrind. Its
just too much noise otherwise and not a general style I'm sure of; it's
worth it for Valgrind though.
2025-08-20 12:38:29 -07:00
Mitchell Hashimoto
a28b7e9205 synthetic cli (ghostty-gen) 2025-07-09 15:06:24 -07:00
Mitchell Hashimoto
99ed984af2 benchmark: add GraphemeBreak and TerminalParser benchmarks 2025-07-09 15:06:24 -07:00
Mitchell Hashimoto
5890826356 benchmark: add codepoint width benchmark 2025-07-09 15:06:24 -07:00
Mitchell Hashimoto
c990d35d6d macos: add benchmark tests to our Xcode project 2025-07-09 15:06:24 -07:00
Mitchell Hashimoto
20bb71c627 libghostty: export benchmark CLI API 2025-07-09 15:06:24 -07:00
Mitchell Hashimoto
01b2545d1d macos: fix signpost API to use proper mach header base addr 2025-07-09 15:06:24 -07:00
Mitchell Hashimoto
d30771ecff pkg/macos: use new @ptrcast for os.log 2025-07-09 15:06:24 -07:00
Mitchell Hashimoto
b8f5cf9d52 initial ghostty-bench program 2025-07-09 15:06:24 -07:00
Mitchell Hashimoto
0e8ccc7352 benchmark: a new package and framework for benchmarking 2025-07-09 15:06:23 -07:00