mirror of
https://github.com/ghostty-org/ghostty.git
synced 2025-12-29 01:24:41 +00:00
I used the new CPU counter mode in Instruments.app to track down functions that had instruction delivery bottlenecks (indicating i-cache misses) and picked a bunch of trivial functions to mark as inline (plus a couple that are only used once or twice and which benefit from inlining).

The size of `macos-arm64/libghostty-fat.a` built with `zig build -Doptimize=ReleaseFast -Dxcframework-target=native` goes from `145,538,856` bytes on `main` to `145,595,952` bytes on this branch, a negligible increase.

These changes resulted in some pretty sizable improvements in vtebench results on my machine (Apple M3 Max):

<img width="983" height="696" alt="image" src="https://github.com/user-attachments/assets/cac595ca-7616-48ed-983c-208c2ca2023f" />

With this, the only vtebench test where we're slower than Alacritty (on my machine, at a 130x51 window size) is `dense_cells` (which, IMO, is so artificial that optimizing for it might actually negatively impact real-world performance).

I also made a pretty simple improvement to how we copy the screen in the renderer: it now gets its own page pool, for less memory churn. Further optimization in that area should be explored, since in some scenarios as much as 35% of the time on the `io-reader` thread seems to be spent waiting for the lock.

> [!NOTE]
> Before this is merged, someone really ought to test this on an x86 processor to see how the performance compares there, since this *is* tuning for my processor specifically, and I know that M chips have pretty big i-caches compared to some x86 processors, which could impact the performance characteristics of these changes.
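For anyone unfamiliar with what "mark as inline" means in Zig, here is a minimal sketch. It is not code from this branch: `Cell`, `isEmpty`, and `countEmpty` are hypothetical names made up for illustration. The only real language feature used is the `inline` keyword on a function, which makes Zig expand the body at every call site instead of leaving that decision to the optimizer.

```zig
const std = @import("std");

// Hypothetical cell layout for illustration only; not Ghostty's real type.
const Cell = packed struct(u32) {
    codepoint: u21 = 0,
    wide: bool = false,
    _padding: u10 = 0,
};

/// A trivial accessor of the kind that is cheap to inline. With `inline`,
/// the body is expanded at each call site, so no call instruction is emitted.
inline fn isEmpty(cell: Cell) bool {
    return cell.codepoint == 0;
}

/// Counts empty cells; the loop body contains the expanded `isEmpty` check.
fn countEmpty(cells: []const Cell) usize {
    var n: usize = 0;
    for (cells) |cell| {
        if (isEmpty(cell)) n += 1;
    }
    return n;
}

pub fn main() void {
    const cells = [_]Cell{ .{}, .{ .codepoint = 'a' }, .{} };
    std.debug.print("{d} empty cells\n", .{countEmpty(&cells)});
}
```

The trade-off is the one described above: each call site gets its own copy of the body, so this tends to pay off only for functions that are very small or have very few callers, and the effect on binary size and i-cache behavior is worth measuring per target.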