This makes `cursorStyle` utilize `RenderState` to determine the
appropriate cursor style. This moves the cursor style logic outside the
critical area, although it was cheap to begin with.
This always removes `viewport_is_bottom` which had no practical use.
This adds a new API called `RenderState` that produces the proper state
required for a renderer to draw a viewport and updates our renderer to
use it instead of the prior screen clone method.
The newsworthy change is here is that we've shortened the critical area
where the renderer holds the terminal lock and blocks IO by anywhere
from **2x to 5x faster**, and in about half the frames we're now in the
critical area for **zero microseconds** because `RenderState` has much
better dirty/damage tracking.
**For libghostty Zig users**, this API is available to the Zig module
and can be used to create your own renderers (to any artifact, it
doesn't have to be graphical! It is even useful for text like a tmux
clone or something).
## Differences vs Old Method
The renderer previously called `Screen.clone`. This produces a copy of
all the screen data (even stuff the renderer may not need) and attempts
to create a standalone, fully functional `Screen` allocation. I didn't
microbenchmark this to understand exactly where the slowdown was, but I
think it was based primarily in two places:
- Screens have a minimum allocation of two Ghostty "pages" which are
quite large. I think this allocation was the primary expensive part.
Without refactoring our entire Screen/PageList work, this was
unavoidable.
- The clone was unconditional and would produce a fully new Screen on
every frame.
The new structure `RenderState` is stateful and each frame calls
`update` on the prior value and it modifies itself in place. This
addresses the above two points in a few ways:
- Allocation is minimized to only what we need, no full pages.
- Since it is stateful (updating in-place), we only change dirty data
and don't need to copy everything. **Note the benchmarks below only test
_full updates_, so you aren't even seeing the benefits of this!**
- Also since it is stateful, we cache expensive calculations (such as
for selections) so future screen updates can reuse those cached values
rather than recompute them.
- We retain memory from prior updates even when the screen is dirty, so
even in dirty states, it's unlikely we allocate.
More details in the "Design" section.
## Benchmarks
Here are some microbenchmarks on the previous render critical area
versus the new one.
For each of the benchmarks below, ignore the time units (milliseconds)
and instead **focus on the relative speedup.** The benchmarks are all
doing a full render frame setup around 1000 times, because the actual
cost of a frame update is in dozens or hundreds of microseconds.
> [!NOTE]
>
> I'm still working on some more benchmarks before merging. I'll update
this space.
### Full screen, single style
<img width="1542" height="480" alt="CleanShot 2025-11-21 at 07 35 13@2x"
src="https://github.com/user-attachments/assets/c28c9f33-d1aa-4723-8a8e-3c6d70fe3667"
/>
### Full screen, plaintext
<img width="1642" height="612" alt="CleanShot 2025-11-21 at 07 36 06@2x"
src="https://github.com/user-attachments/assets/b51f57cf-7c48-46c8-a347-8ecc0bdd3d47"
/>
### Full screen, different style per cell (pathological case)
<img width="1704" height="456" alt="CleanShot 2025-11-21 at 07 37 55@2x"
src="https://github.com/user-attachments/assets/71a98250-d8d1-47ab-ae69-5e6b3b60bf2d"
/>
### Critical Area: Typing at Shell Prompt
| This PR | Main Branch |
---------|--------
| <img width="1064" height="764" alt="CleanShot 2025-11-21 at 14 44
31@2x"
src="https://github.com/user-attachments/assets/8a0ab3a1-3d68-41f0-9469-bc08a4569286"
/> | <img width="1040" height="684" alt="CleanShot 2025-11-21 at 14 47
10@2x"
src="https://github.com/user-attachments/assets/04ffa607-8841-436b-b6e9-eeeb6ee9482d"
/> |
### Critical Area: Neovim Scrolling
| This PR | Main Branch |
---------|--------
| <img width="1054" height="748" alt="CleanShot 2025-11-21 at 14 45
06@2x"
src="https://github.com/user-attachments/assets/ccafaee8-720f-41be-820d-fd705835607a"
/> | <img width="1068" height="796" alt="CleanShot 2025-11-21 at 14 47
48@2x"
src="https://github.com/user-attachments/assets/68087496-d371-4c7c-8b4c-b967dbaeaa7c"
/> |
### Critical Area: `btop -u 100`
This is closer to a pathological case, about as close as you get with a
real tool in the wild. `btop` uses hundreds of unique styles and updates
many cells across many rows very frequently (every 100ms in this case).
You can see that some of our frame times in this case are similar but
there are _so many more near-idle frames_ thanks to our dirty tracking.
| This PR | Main Branch |
---------|--------
| <img width="1088" height="900" alt="CleanShot 2025-11-21 at 14 45
44@2x"
src="https://github.com/user-attachments/assets/ea63f0eb-f06e-4d00-95a3-c55a3755cc67"
/> | <img width="1078" height="906" alt="CleanShot 2025-11-21 at 14 48
18@2x"
src="https://github.com/user-attachments/assets/cef360de-2b12-440f-8c4c-6a69b5ce4058"
/> |
### "DOOM Fire"
Fullscreen on my macOS when from 770 FPS to ~808 FPS consistently, a
solid 5% increase repeatedly.
<img width="3520" height="2392" alt="CleanShot 2025-11-21 at 07 45
29@2x"
src="https://github.com/user-attachments/assets/033effca-0abb-4ff8-a21b-83214d118d12"
/>
### IO
While this was rendering focused, the smaller critical area does help IO
performance a bit.
We've already tracked down the remaining issues to the IO thread going
to sleep and overhead with context switching. We're investigating
switching to a spin lock for the IO thread only in another track of
work.
> [!NOTE]
>
> **This is comparing with `main`, which already has a 20-30%
performance improvement over v1.2.3.**
<img width="982" height="698" alt="image"
src="https://github.com/user-attachments/assets/52a86f6c-6f09-45fe-9ac7-ca62c7ac6ee4"
/>
## Design
The design of the API is a _stateful_ `RenderState` struct that you call
`update` on each frame with the target terminal, and it only updates
what is needed. RenderState keeps track of rows, cells, hyperlinks,
selections, etc.
```zig
// Start empty
var state: terminal.RenderState = .empty;
defer state.deinit(alloc);
// Each frame update it with a terminal.
// THIS IS THE ONLY PART THAT ACCESS `t`
try state.update(alloc, &t);
// Access render data (can be outside any locking for `t`)
...
```
The ergonomics of the `RenderState` structure a wee bit clunky because
we make use of struct-of-arrays (SoA, Zig's MultiArrayList) to better
optimize cache locality for renderer data vs. update data (what we need
to update the render state is different from what we need to draw).
Once you get used to the API though, it's pretty beautiful. I mean, look
at this:
```zig
for (
0..,
row_data.items(.raw),
row_data.items(.cells),
) |y, row, cells| {
const cells_slice = cells.slice();
for (
0..,
cells_slice.items(.raw),
cells_slice.items(.grapheme),
) |x, cell, graphemes| {
```
## Improvements
This PR makes various improvements across the board:
- It bears repeating in case it was missed previously that the critical
area time of a render has gone down 2x to 5x when there is work and is
now free when there is no work (the previous implementation always did
work).
- Font shaping is much more efficient now and only requires access to a
render state.
- Selection handling is now cached and works with dirty tracking.
Previously, if you had an active selection, we'd search the entire
screen multiple times (like... once per row). Yikes.
- Hyperlink handling is _much_ more efficient. Instead of iterating
through the entire screen contents _per configured link_ we now cache
the screen contents as a string and search one whole string multiple
times. Obvious, but we didn't do this before.
- The `contrainedWidth` and `rowNeverExtendBg` helper methods are now
both much more efficient and live within the renderer package rather
than being awkwardly in the terminal package.
## Future Notes
- Our `terminal.Selection` API is very bad. It conceptually makes sense
and I understand why I designed it this way (easy) but it makes it hard
to render or manipulate performantly.
**AI Disclosure:** AI was used only to assist with writing some tests
and converting some tests. The primary logic is all organic, meatbag
produced.
This change should give more consistent results between high and low DPI
displays, and generally approximate the authorial intent of the metrics
a little better.
Also changed the cell height adjustment to prioritize the top or bottom
when adjusting by an odd number depending on whether the face is higher
or lower in the cell than it "should" be. This should make it easier for
users who have an issue with a glyph protruding from the cell to adjust
the height and resolve it.
The code that re-creates the font descriptor from scratch using the same
attributes also rubs off the magic dust that makes CoreText not throw a
fit at us for using a "hidden" system font (name prefixed with a dot) by
name when we use the descriptor. This means that a small subset of chars
that only have glyphs in these fallback system fonts like ".CJK Symbols
Fallback HK Regular" and ".DecoType Nastaleeq Urdu UI" would not be able
to be rendered, since when we requested the font with the non-magical
descriptor CoreText would complain in the console and give us Times New
Roman instead.
Using `CTFontDescriptorCreateCopyWithAttributes` to clear the charset
attribute instead of recreating from scratch makes the copy come out
magical, and CoreText lets us instantiate the font from it, yippee!
### ℹ️ For anyone who came to this PR from a search engine rabbit hole
trying to figure this out for your own code:
It seems like if you want to use one of these fonts, you have to avoid
creating a new descriptor from scratch with the name. To create a
descriptor with one of these names that CoreText won't get upset about,
you have to get it in one of these ways (there may be others but these
are ones that definitely work at time of writing):
- Get it from `CTFontCreateForString`(`WithLanguage`) with a character
only provided by that font.
- Get it from `CTFontCreateUIFontForLanguage` or any of the mechanisms
for generally receiving a system font that you don't necessarily have
full control over the exact choice of font with.
- Get it from a collection created with
`CTFontCollectionCreateMatchingFontDescriptors`
- Get it from `CTFontCreateWithGraphicsFont` ... and it seems that for
now, `CGFontCreateWithFontName` lets you use these "hidden" names
without complaining :)
I wouldn't trust any of these methods to necessarily *continue* working
in to the future, since Apple seems to be trying to dissuade
*intentional* use of any particular named hidden system font.
Realistically, the `CreateForString` option is the most likely to stick
around.
If all else fails, check FireFox or Chrome's source code to see how
they're solving it in current year.
And to help it get indexed for those looking for it...
<details>
<summary>CoreText's angry message</summary>
```
CoreText note: Client requested name ".CJKSymbolsFallbackHK-Regular", it will get TimesNewRomanPSMT rather than the intended font. All system UI font access should be through proper APIs such as CTFontCreateUIFontForLanguage() or +[NSFont systemFontOfSize:].
CoreText note: Set a breakpoint on CTFontLogSystemFontNameRequest to debug.
```
</details>
Draw proper anti-aliased dots now instead of rectangles, thanks to z2d
this is very easy to do, and the results are very nice, no more weird
gaps in dotted underlines if your cell is the wrong number of pixels
across.
Use z2d to draw the undercurl instead of the manual raster code we had
before- the code was cool but unnecessarily complicated. Plus z2d lets
us have rounded caps on the undercurl which is neat.
Also make sure we won't draw off the canvas with our underlines-- the
canvas has padding but it's not infinite.
This change should give more consistent results between high and low DPI
displays, and generally approximate the authorial intent of the metrics
a little better.
Also changed the cell height adjustment to prioritize the top or bottom
when adjusting by an odd number depending on whether the face is higher
or lower in the cell than it "should" be. This should make it easier for
users who have an issue with a glyph protruding from the cell to adjust
the height and resolve it.
Add descriptions to fish shell completions
Claude Code used to understand the codebase and reason through the edge
cases (several iterations)
- Using first sentence from the Config to add description for each
config, even if sentence spans few lines
- Several options can share the same description, code doesn't duplicate
description, see screenshot in the posted issue thread below
Fixes [#9531](https://github.com/ghostty-org/ghostty/issues/9531)
The code that re-creates the font descriptor from scratch using the same
attributes also rubs off the magic dust that makes CoreText not throw a
fit at us for using a "hidden" system font (name prefixed with a dot) by
name when we use the descriptor. This means that a small subset of chars
that only have glyphs in these fallback system fonts like ".CJK Symbols
Fallback HK Regular" and ".DecoType Nastaleeq Urdu UI" would not be able
to be rendered, since when we requested the font with the non-magical
descriptor CoreText would complain in the console and give us Times New
Roman instead.
Using `CTFontDescriptorCreateCopyWithAttributes` to clear the charset
attribute instead of recreating from scratch makes the copy come out
magical, and CoreText lets us instantiate the font from it, yippee!