Add descriptions to fish shell completions
Claude Code was used to understand the codebase and reason through the
edge cases (several iterations).
- Uses the first sentence of each option's documentation in the Config
as its description, even if that sentence spans multiple lines
- Several options can share the same description; the code doesn't
duplicate the description text (see the screenshot in the issue thread
linked below)
Fixes [#9531](https://github.com/ghostty-org/ghostty/issues/9531)
A whole bunch of optimizations in hot paths in the IO processing areas
of our code (well, one of them covers everything). I validated that each
commit either improved one or more of our vtebench results, or improved
the time it takes to process 2 years' worth (2.4 GB) of data from
asciinema.
## vtebench
<img width="1278" height="903" alt="image"
src="https://github.com/user-attachments/assets/bad46777-4606-4870-b7d7-8df0c4bb3b39"
/>
(I decided to patch vtebench to report in nanoseconds instead of
milliseconds, since it clearly was not designed for a machine as fast as
mine. Reporting in nanoseconds gives much more useful results when the
numbers are this low.)
Do note the *slight* regression in the "unicode" test. This is probably
because I added a branch hint in `Terminal.print` to optimize for
printing narrow characters, since they make up the vast majority of
characters typically printed in the terminal, while the vtebench
"unicode" test is almost entirely wide characters.
This shouldn't have a negative effect on users of CJK languages: it's a
*very* slight reduction in speed, and they will still be printing many
narrow characters, especially in TUIs (spaces, box-drawing characters,
symbols, punctuation, etc.).
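As a rough illustration (this is not the actual `Terminal.print` code,
just a sketch of the pattern with made-up names), a Zig branch hint
biasing the narrow-character path looks like this:
```zig
const std = @import("std");

// Sketch only: `cellsForWidth` is a made-up stand-in for the narrow/wide
// split inside `Terminal.print`.
fn cellsForWidth(width: u2) u8 {
    switch (width) {
        1 => {
            // Narrow glyphs are by far the most common case in practice.
            @branchHint(.likely);
            return 1;
        },
        // Wide (mostly CJK) glyphs occupy two cells and take the slower path.
        2 => return 2,
        else => unreachable,
    }
}

test "narrow and wide glyph widths" {
    try std.testing.expectEqual(@as(u8, 1), cellsForWidth(1));
    try std.testing.expectEqual(@as(u8, 2), cellsForWidth(2));
}
```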
## asciinema processing
I wrote a program that uses libghostty to push 2 years' worth (2.4 GB)
of data from publicly uploaded asciinema recordings into the terminal as
fast as possible. Since it's just libghostty, there's no renderer
overhead happening; it's just the core terminal emulation, effectively
everything the io-reader thread does if it never had to wait for the
renderer.
On main, this took roughly 26.1–26.7 seconds to process; on this branch
it takes just 18.4–18.6 seconds. That's a ~30% improvement in raw IO
processing speed on real-world data!
## Summary of changes
In order of commits:
- Fixed a bug that I hit when trying to have Ghostty process all that
asciinema data: in certain bad cases it was possible to accidentally
insert the `0` hyperlink ID into a page, which would then cause a
lockup in ReleaseFast mode when cloning that page, since the string
alloc would try to iterate `1..0` to allocate 0 chunks.
- While profiling Ghostty I noticed that `std.debug.assert` was showing
up in the profile, which it should not have been, since its doc comment
promises that it will be optimized out in ReleaseFast. Evidently
something is wrong with Zig, or that promise relies on behavior LLVM
fails to deliver here; either way, replacing all uses of `assert` with a
version that is explicitly marked `inline` avoids that function call
overhead in tight loops and hot paths (see the first sketch after this
list). This change alone accounts for roughly a third of the IO
processing time improvement, though it had minimal impact on vtebench
scores.
- I optimized the SGR parser somewhat by adding branch hints and
removing the `.reset_underline` action, replacing it with `.{ .underline
= .none }`.
- Gated a somewhat expensive assert in RefCountedSet behind a runtime
safety check.
- Improved the performance of `Style.eql` and `Style.hash`; these are
hot functions, called extremely frequently because adding styles to the
style set is a very common operation. Achieved this by making `eql` less
generic (explicitly comparing each part of the style rather than looping
over fields) and ordering the checks from most likely to differ to least
likely, so that differences are found as soon as possible; and by
changing the hash from xxhash to simply folding the packed struct down
to 64 bits and then using `std.hash.int`. Also manually inlined the code
from `std.meta.activeTag` in `Packed.fromStyle`, since profiling showed
it in the call stack and it's a single cast, so it really should not
have function call overhead.
- Explicitly marked some trivial functions as `inline`; the optimizer
would (probably) already have been doing this, but doing it explicitly
gives it more time to spend on other things. Added cold branch hints to
"should be impossible" and error-returning paths that should be very
rare, and unlikely branch hints to a lot of "invalid" paths, to optimize
for receiving valid data.
- Removed a branch in the parser's CSI param action: we now
unconditionally multiply by 10 before adding the digit value, even if
it's the first digit. This codepath is rarely hit since we have a fast
path for it in the stream code, and the stream code already had this
optimization, so I just copied it over.
- `CharsetState.charsets` used to be an `EnumArray`, but the
layout/access logic for that was less than ideal and the access
functions were not inlining. These are very hot since we access this for
every single print, so I wrote a bespoke struct to hold that info
instead, gaining a couple percent of IO perf.
- Added branch hints based on the data I derived from the asciinema
dump, which gave a big boost to vtebench results, especially for the
cursor movement and dense cells tests (which makes sense, since cursor
movement and setting attributes both got `likely` hints :p). Data at
https://github.com/qwerasd205/asciinema-stats
- This is probably the most invasive change in this PR: I removed the
dirty bitset from `Page` and replaced it with a dirty flag on each row.
For the majority of operations this is faster to write, since the row
being dirtied is probably already loaded and will probably be written to
for other changes as well. This gave a couple percent IO processing
improvement. The only exception is scrolling-type operations, which are
extremely efficient because they just move rows around with a single
memmove, so looping through the rows to mark each one dirty slows them
down; indeed, after this change the scrolling benchmarks in vtebench
regressed, *however*...
- Added a "full page dirty" flag on `Page`, set when an operation
dirties most or all of the rows in the page; this is what scrolling-type
operations now use. This *does* make the dirty tracking slightly less
precise for those operations, but with the caching and such we do in the
renderer, I don't think `rebuildCells` is a bottleneck, so rebuilding a
few extra rows shouldn't hurt. After this change, all the scrolling
benchmarks in vtebench improved drastically.
- Tiny micro-improvements to RefCountedSet: streamlined the control flow
in `lookup`, and added an unlikely branch hint in `insert` for the
branch that resurrects dead items, since dead items aren't that common.
- Improved SGR parser performance again by using `@call(.always_inline,
...)` to explicitly inline calls to `StaticBitSet.isSet` (for the
separator list), since I noticed they weren't being inlined, causing
function call overhead in a hot path (see the second sketch after this
list).
- I noticed that `clearGrapheme` and `clearHyperlink` would check every
cell in the row after they were done, in order to update the
`grapheme`/`hyperlink` flag on the row if none were left. That isn't
great, since `clearCells` calls these functions for multiple cells in
the same row back-to-back, which leads to a ton of excess work. I
separated out the flag-updating parts of these functions and called them
only if necessary (if the cells being cleared covered the full row, the
flag could unconditionally be set to false) and only after all the cells
were cleared. This gave a nice improvement to IO processing, since
`clearCells` is evidently a very hot function.
- Removed inline annotations on `Page.clearGrapheme` and
`Page.clearHyperlink` in favor of inlining directly at the one callsite
that benefited from it; this improved IO processing speed.
- Inlined trivial function `Charset.table`.
- Inlined `size.getOffset` and `size.intFromBase` as they are both
trivial pointer math that often benefits from surrounding context.
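For reference, here is a minimal sketch of the explicitly-`inline`
assert described in the `std.debug.assert` bullet above. It has the same
body as the stdlib version; the only difference is the `inline` keyword:
```zig
const std = @import("std");

/// Sketch of an explicitly inlined assert. Same semantics as
/// `std.debug.assert` (traps on false, optimized out in ReleaseFast), but
/// `inline` guarantees no function-call overhead in tight loops.
pub inline fn assert(ok: bool) void {
    if (!ok) unreachable;
}

test "inline assert" {
    assert(1 + 1 == 2);
}
```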
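And a sketch of the `@call(.always_inline, ...)` technique from the SGR
parser bullet; the bit set and indices here are made up for
illustration, not the actual separator-list code:
```zig
const std = @import("std");

test "force-inline a StaticBitSet.isSet call" {
    // Hypothetical bit set marking which parameter positions used a ':'
    // separator; the real SGR parser state differs.
    const SepList = std.StaticBitSet(32);
    var seps = SepList.initEmpty();
    seps.set(1);
    // Equivalent to `seps.isSet(1)`, but `.always_inline` guarantees the
    // call is inlined rather than left to the optimizer's judgment.
    try std.testing.expect(@call(.always_inline, SepList.isSet, .{ seps, 1 }));
    try std.testing.expect(!@call(.always_inline, SepList.isSet, .{ seps, 2 }));
}
```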
---
If you'd like me to separate out the trivial improvements (branch hints,
inline annotations, 1-line changes) from the functionality-changing ones
(pretty much just the changes to dirty tracking), just let me know!
These were actually hurting performance lol, except in the places where
I added the `.always_inline` calls. For some reason, if these functions
aren't inlined there it really messes up the top-region scrolling
benchmark in vtebench, and I'm not entirely certain why...
This PR partially addresses #4504 with a one-liner; all feedback is very
welcome.
### AI disclaimer
I used Claude Code to help navigate the various layers of the rendering
stack, to instrument a ton of intermediate now-deleted log statements,
and ultimately to identify and fix this bug. I directed it
conversationally, gave it experiments to run, audited all of its work,
threw most of it out, and finally landed on this extremely small and
simple change that fixes the issue _for me_ but I think for a lot of
other cases as well. The fix ended up being small, so hopefully it's
easy to review and discuss.
Details:
# Fix horizontal glyph spacing on non-Retina displays
On external monitors (1.0x scale, 72 DPI), text had excessive horizontal
spacing between glyphs. The issue was less noticeable on Retina displays
(2.0x scale, 144 DPI) due to higher pixel density, but the proportional
spacing error was identical.
## Before
Background is ghostty 1.2.3 from Homebrew, foreground is iTerm2:
<img width="862" height="938" alt="Screenshot 2025-10-31 at 3 04 47 PM"
src="https://github.com/user-attachments/assets/feff5279-05cc-4008-b2f5-8ea3b4d6d14b"
/>
## After
Background is ghostty tip + this change, foreground is iTerm2:
<img width="659" height="774" alt="Screenshot 2025-10-31 at 3 00 42 PM"
src="https://github.com/user-attachments/assets/702dc7f8-bb46-43ec-8156-f69d003e8a37"
/>
(my iTerm2 has some custom thickening / brightening you can see; that's
not a part of this change)
## Root Cause
The metrics calculation in Metrics.zig used `@ceil()` to round the cell
width from CoreText's glyph measurements, which created a mismatch
between cell width (which determines glyph positions) and the actual
glyph advances.
At 72 DPI with a 12pt font:
- CoreText returns glyph advance: 7.224609375 pixels
- Cell width was ceiled to: 8 pixels
- Gap per character: 0.78 pixels (~10.8% error)
At 144 DPI with the same font:
- CoreText returns glyph advance: 14.44921875 pixels (exactly 2x)
- Cell width was ceiled to: 15 pixels
- Gap per character: 0.55 pixels (~3.8% error)
The relative error at high DPI is much smaller than at low DPI, since
the absolute error is always less than 1 px.
## Fix
Changed `@ceil(face_width)` to `@round(face_width)`. This makes cell
width match the glyph advances better, reducing the error to at most
0.5px:
- 72 DPI: round(7.22) = 7
- 144 DPI: round(14.45) = 14
Height continues using `@ceil()` since vertical space can be slightly
larger without visual issues, but... should it? I'm not sure; this is a
good topic for discussion.
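To make the arithmetic concrete, here's a small sketch (not the actual
Metrics.zig code) of the before/after rounding at 72 DPI:
```zig
const std = @import("std");

test "cell width rounding at 72 DPI" {
    // The 12pt-font glyph advance CoreText reports in the example above.
    const face_width: f64 = 7.224609375;
    // Old behavior: ceil to 8 px, leaving a ~0.78 px gap per glyph.
    try std.testing.expectEqual(@as(f64, 8.0), @ceil(face_width));
    // New behavior: round to 7 px, an error of ~0.22 px (at most 0.5 px).
    try std.testing.expectEqual(@as(f64, 7.0), @round(face_width));
}
```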
This improves the `clearCells` function, since it only has to update the
flags once after clearing all of the individual cells, or not at all if
the whole row was cleared, because then it knows for sure that it
cleared them all. This also ensures the row style flag is properly
tracked when cells are cleared but the whole row is not.
I am so sick and tired of people complaining that the build instructions
on the website are wrong when they clearly haven't realized the
difference between Git-based and tarball-based builds, so here's the
extra work to make sure people actually realize that difference.
Bumps [actions/checkout](https://github.com/actions/checkout) from 5.0.0
to 5.0.1.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/actions/checkout/releases">actions/checkout's
releases</a>.</em></p>
<blockquote>
<h2>v5.0.1</h2>
<h2>What's Changed</h2>
<ul>
<li>Port v6 cleanup to v5 by <a
href="https://github.com/ericsciple"><code>@ericsciple</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2301">actions/checkout#2301</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/actions/checkout/compare/v5...v5.0.1">https://github.com/actions/checkout/compare/v5...v5.0.1</a></p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="93cb6efe18"><code>93cb6ef</code></a>
Cleanup actions/checkout@v6 auth style (<a
href="https://redirect.github.com/actions/checkout/issues/2301">#2301</a>)</li>
<li>See full diff in <a
href="08c6903cd8...93cb6efe18">compare
view</a></li>
</ul>
</details>
<br />
[Dependabot compatibility score](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
</details>
Bumps
[cachix/install-nix-action](https://github.com/cachix/install-nix-action)
from 31.8.3 to 31.8.4.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/cachix/install-nix-action/releases">cachix/install-nix-action's
releases</a>.</em></p>
<blockquote>
<h2>v31.8.4</h2>
<h2>What's Changed</h2>
<ul>
<li>nix: 2.32.3 -> 2.32.4 by <a
href="https://github.com/github-actions"><code>@github-actions</code></a>[bot]
in <a
href="https://redirect.github.com/cachix/install-nix-action/pull/261">cachix/install-nix-action#261</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/cachix/install-nix-action/compare/v31.8.3...v31.8.4">https://github.com/cachix/install-nix-action/compare/v31.8.3...v31.8.4</a></p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="0b0e072294"><code>0b0e072</code></a>
Merge pull request <a
href="https://redirect.github.com/cachix/install-nix-action/issues/261">#261</a>
from cachix/create-pull-request/patch</li>
<li><a
href="16d2e3294d"><code>16d2e32</code></a>
nix: 2.32.3 -> 2.32.4</li>
<li>See full diff in <a
href="7ec16f2c06...0b0e072294">compare
view</a></li>
</ul>
</details>
<br />
[Dependabot compatibility score](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
Currently, the scroller's appearance is the same as the window's.
So, with the light system appearance and the following config:
```
window-theme = system
macos-titlebar-style = native
```
It's hard to see where the scroller is. This PR changes the scroller's
(ScrollView) appearance to match the surface's background colour, so
it's always easier to find.
> Changing `verticalScroller?.appearance` doesn't seem to work
<img width="601" height="630" alt="image"
src="https://github.com/user-attachments/assets/9dc18439-9dcb-479a-802a-de439b7dc9d8"
/>
This adds a `screen-clone` benchmark and some test coverage for it. It
benchmarks screen cloning, which is a hot spot for lock contention
between the renderer and IO threads. I wasn't able to meaningfully speed
this up, but still want to commit this benchmark.
### Problem
Custom icon configuration (`macos-icon = custom-style` with
`macos-icon-screen-color`) stopped working, reverting to the default
icon.
### Root cause
The `ColorList.clone()` method only cloned the `colors` array but not
the `colors_c` array. The Swift code reads from `colors_c` via the C API
(`ghostty_config_get`), so when configs were cloned, the C-accessible
color list was empty.
### Why it broke
This bug was introduced in the original implementation in 29929a473 (Dec
2024), but remained dormant until commit f60bdb0fa (Sep 19, 2025), which
moved the icon-setting `switch` statement into `syncAppearance()`. Since
`syncAppearance()` is called with cloned configs from
`ghosttyConfigDidChange()` (which receives cloned configs from
`Ghostty.App.swift:1639`), the icon code now ran with cloned configs
that had empty `colors_c` arrays.
### Fix
Clone both arrays in `ColorList.clone()`:
```zig
.colors = try self.colors.clone(alloc),
.colors_c = try self.colors_c.clone(alloc), // Added
```
### Testing
- Added a `ColorList.test.clone` test case that verifies both the
`colors` and `colors_c` arrays are properly cloned
- Verified the test fails without the fix (expected 3 `colors_c` items,
found 0)
- Verified test passes with the fix
- Confirmed custom icon now persists correctly with both initial config
load and subsequent config change notifications
### Discussion
I opened a discussion reporting this
([#9616](https://github.com/ghostty-org/ghostty/discussions/9616)),
which this PR will resolve.
> [!NOTE]
> **LLM Usage Disclosure**
> This bug was investigated and debugged with assistance from Claude
Code. The root cause analysis and fix were developed through interactive
debugging.
This could cause a 0-length hyperlink to be present in the screen,
which, in ReleaseFast, causes a lockup as the string alloc tries to
iterate `1..0` to allocate 0 chunks.
I encountered these bugs while trying to benchmark Ghostty for
performance work.
- Tmux control mode parsing would start accessing deallocated memory
after entering the "broken" state, and worse yet, would cause a
double-free once it was supposed to be deinited.
- Despite our best efforts, CoreText can still produce non-monotonic
(non-LTR) runs. Our renderer code relies on monotonic LTR ordering, so
in the rare case where this happens we just sort the buffer before
returning it.
- C1 (8-bit) controls can be executed in certain parser states, so we
need to handle them in the stream's `execute` function. Luckily this was
pretty straightforward since all C1 controls are equivalent to `ESC`
followed by `C1 - 0x40`.
- `Terminal.Screen`'s `cursorScrollDown` function could cause memory
corruption because of `eraseRow` moving the cursor's tracked pin to a
different page. In fixing this, I actually reduced the complexity of
that codepath.
- **Bonus!** Added a nice helper function to `Offset.Slice` so that you
can just do `offset_slice.slice()` instead of
`offset_slice.offset.ptr(base)[0..offset_slice.len]`. Much more
readable.
### `vtebench` before/after
<img width="984" height="691" alt="image"
src="https://github.com/user-attachments/assets/ef20dcc5-d611-4763-9107-355d715a6c0b"
/>
Doesn't seem like any of these changes caused a performance regression.
It was previously possible for `eraseRow` to move the cursor pin to a
different page, and then the call to `cursorChangePin` would try to free
the cursor style from that page even though that's not the page the
style belongs to, which causes memory corruption in release modes and
integrity violations or assertion failures in debug mode.
As a bonus, this should actually be faster than the old code, since it
avoids needless work that `cursorChangePin` otherwise does.
These can be unambiguously invoked in certain parser states, and as such
we need to handle them. In real world use they are extremely rare, hence
the branch hint. Without this, we get illegal behavior by trying to cast
the value to the 7-bit C0 enum.
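As a concrete illustration of the C1 equivalence mentioned above (this
is a sketch with a made-up helper name, not the actual stream code):
```zig
const std = @import("std");

/// Sketch: an 8-bit C1 control byte (0x80-0x9F) behaves like ESC followed by
/// that byte minus 0x40, which lets the stream reuse its existing ESC handling.
fn c1ToEscFinal(c1: u8) u8 {
    std.debug.assert(c1 >= 0x80 and c1 <= 0x9F);
    return c1 - 0x40;
}

test "C1 controls map to their ESC equivalents" {
    // 0x84 (IND) is equivalent to ESC D; 0x9B (CSI) is equivalent to ESC [.
    try std.testing.expectEqual(@as(u8, 'D'), c1ToEscFinal(0x84));
    try std.testing.expectEqual(@as(u8, '['), c1ToEscFinal(0x9B));
}
```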