A whole bunch of optimizations in hot paths in the IO processing areas
of our code (well, one of them covers everything). I validated that each
commit either improved one or more of our vtebench results, or improved
the time it takes to process 2 years' worth (2.4GB) of data from
asciinema.
## vtebench
<img width="1278" height="903" alt="image"
src="https://github.com/user-attachments/assets/bad46777-4606-4870-b7d7-8df0c4bb3b39"
/>
(I decided to patch vtebench to report in nanoseconds instead of
milliseconds since clearly it was not designed for a machine as fast as
mine. Nanoseconds gives much more useful results when the numbers are
this low.)
Do note the *slight* regression in the "unicode" test. This is probably
because I added a branch hint in `Terminal.print` to optimize
for printing narrow characters, since they make up the vast majority of
characters typically printed in the terminal, whereas the vtebench "unicode"
test is pretty much all wide characters.
This shouldn't have a negative effect on users of CJK languages since
it's a *very* slight reduction in speed and they will still be printing
many narrow characters, especially in TUIs; spaces, box drawing
characters, symbols, punctuation, etc.
## asciinema processing
I wrote a program that uses libghostty to push 2 years' worth (2.4GB) of
data from publicly uploaded asciinema recordings into the terminal as
fast as possible. Since it's just libghostty, there's no renderer
overhead; it's just the core terminal emulation, effectively
everything the io-reader thread would do if it never had to wait for the
renderer.

On main, this took roughly 26.1–26.7 seconds to process; on this branch
it takes just 18.4–18.6 seconds. That's a ~30% improvement in raw IO
processing speed when processing real-world data!
## Summary of changes
In order of commits:
- Fixed a bug that I hit when trying to have Ghostty process all that
asciinema data: in certain bad cases it was possible to accidentally
insert the `0` hyperlink ID into a page, which would then cause a
lockup in ReleaseFast mode when trying to clone that page, since the
string alloc would try to iterate `1..0` to allocate 0 chunks.
- I noticed while profiling Ghostty that `std.debug.assert` was showing up
in the profile, which it should not have been, since its doc comment
promises that it will be optimized out in ReleaseFast. Evidently
something is wrong with Zig, or that comment's promise is based on an
expectation of LLVM that isn't being met. Either way, replacing all
uses of `assert` with a version that is explicitly marked `inline`
avoids that function call overhead in tight loops and hot paths. This
change alone accounts for roughly a third of the IO processing time
improvement, though it had minimal impact on vtebench scores.
- I optimized the SGR parser somewhat by adding branch hints and
removing the `.reset_underline` action, replacing it with `.{ .underline
= .none }`.
- Gated a somewhat expensive assert in RefCountedSet behind a runtime
safety check.
- Improved the performance of `Style.eql` and `Style.hash` since these
are hot functions, called extremely frequently since adding styles to
the style set is a very common operation. Achieved this by making `eql`
less generic - explicitly comparing each part of the style rather than
looping over fields - and ordering checks from most likely to differ to
least likely to differ so that differences can be found as soon as
possible; and changed the hash from xxhash to simply folding the packed
struct down to 64 bits and then using `std.hash.int`. Also manually
inlined the code from `std.meta.activeTag` in `Packed.fromStyle`, since
profiling showed it in the callstack and it's a single cast so it really
should not have the function call overhead.
- Explicitly marked some trivial functions as inline. The optimizer
would (probably) already have been doing this, but doing it explicitly
gives the optimizer more time to spend on other things. Added cold
branch hints to "should be impossible" and error-returning paths that
should be very rare, and unlikely branch hints to a lot of "invalid"
paths, to optimize for receiving valid data.
- Removed a branch in the parser csi param action, just unconditionally
multiply by 10 before adding digit value, even if it's the first digit.
This codepath is rarely hit since we have a fast path for this in the
stream code, but the stream code already has this optimization so I just
copied it over.
- `CharsetState.charsets` used to be an `EnumArray`, but the
layout/access logic for that was less than ideal, and the access
functions were not being inlined. These are very hot since we access
this for every single print, so I wrote a bespoke struct to hold that
info instead and gained a couple percent of IO perf with that.
- Added branch hints based on the data I derived from the asciinema
dump, which gave a big boost to vtebench results, especially for the
cursor movement and dense cells tests (which makes sense, since cursor
movement and setting attributes both got `likely` hints :p). Data at
https://github.com/qwerasd205/asciinema-stats
- This is probably the most invasive change in this PR: I removed the
dirty bitset from `Page` and replaced it with a dirty flag on each row.
For the majority of operations this is faster to write, since the row
being dirtied is probably already loaded and probably will be written to
for other changes as well. This gave a couple percent IO processing
improvement. The only exception is scrolling-type operations, which are
extremely efficient because they just move rows around with a single
memmove, so looping through the rows to mark each dirty slows them down.
Indeed, after this change the scrolling benchmarks in vtebench regressed,
*however*...
- Added a "full page dirty" flag on `Page`, which is set when an
operation is performed that dirties most or all the rows in the page,
which is used for scrolling-type operations. This *does* make the dirty
tracking slightly less precise for these operations, but with the
caching and stuff we do in the renderer, I don't think `rebuildCells` is
a bottleneck, so rebuilding a few extra rows shouldn't hurt. After this
change, all the scrolling benchmarks in vtebench improved drastically.
- Tiny micro-improvements to RefCountedSet; streamlined the control flow
in `lookup`, added an unlikely branch hint in `insert` for the branch
that resurrects dead items since dead items aren't that common.
- Improved SGR parser performance again by using `@call(.always_inline, ...)`
to explicitly inline calls to `StaticBitSet.isSet` (for the separator
list), since I noticed they weren't being inlined, causing function call
overhead in a hot path.
- I noticed that `clearGrapheme` and `clearHyperlink` would check every
cell in the row after they were done in order to update the
`grapheme`/`hyperlink` flag on the row if there were none left, which
isn't great since `clearCells` called these functions for multiple cells
in the same row back-to-back, which leads to a ton of excess work. I
separated the flag updating parts of these functions out and called them
only if necessary (if the cells being cleared were the full row then the
flag could unconditionally be set to false) and only after all the cells
were cleared. This gave a nice improvement to IO processing since
clearCells is evidently a very hot function.
- Removed inline annotations on `Page.clearGrapheme` and
`Page.clearHyperlink` in favor of inlining directly at the one callsite
that benefited from inlining, this improved IO processing speed.
- Inlined trivial function `Charset.table`.
- Inlined `size.getOffset` and `size.intFromBase` as they are both
trivial pointer math that often benefits from surrounding context.
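
The dirty-tracking change described in the bullets above (per-row dirty flags plus a "full page dirty" flag for scrolling-type operations) can be sketched roughly as follows. This is a Python illustration of the idea only, not Ghostty's Zig code: the `Row` and `Page` classes and the `write`, `scroll_up`, and `take_dirty_rows` names are hypothetical and chosen for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Row:
    cells: List[str]
    dirty: bool = False  # per-row flag (stands in for the old page-wide bitset)


@dataclass
class Page:
    rows: List[Row] = field(default_factory=list)
    dirty: bool = False  # hypothetical "full page dirty" flag

    def write(self, y: int, x: int, ch: str) -> None:
        # The row is being touched anyway, so setting its flag is cheap.
        row = self.rows[y]
        row.cells[x] = ch
        row.dirty = True

    def scroll_up(self, width: int) -> None:
        # Scrolling moves rows in bulk. Rather than looping over every row
        # to mark it dirty, set the single page-level flag.
        self.rows.append(Row([" "] * width))
        del self.rows[0]
        self.dirty = True

    def take_dirty_rows(self) -> List[int]:
        # A consumer (e.g. a renderer) treats every row as dirty when the
        # page-level flag is set; this is less precise but cheap to record.
        if self.dirty:
            result = list(range(len(self.rows)))
        else:
            result = [i for i, row in enumerate(self.rows) if row.dirty]
        self.dirty = False
        for row in self.rows:
            row.dirty = False
        return result
```

The trade-off mirrors the one described above: ordinary writes pay for one flag store on a row they already touch, while scroll-type operations pay for one flag store on the page and accept that a few extra rows may be rebuilt.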
---
If you'd like me to separate out the trivial improvements (branch hints,
inline annotations, 1-line changes) from the functionality-changing ones
(pretty much just the changes to dirty tracking), just let me know!
If a UTF-8 byte order mark starts a config file, it should be ignored.
This also refactors config file loading a bit to reduce redundant code
and to make it possible to test loading config from a file.
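
As a rough illustration of the BOM handling (in Python rather than Ghostty's Zig, and with a deliberately simplified `key = value` parser), ignoring a leading UTF-8 byte order mark could look like the sketch below. The function names `strip_utf8_bom` and `parse_config` are hypothetical and only exist for this example.

```python
import codecs
from typing import Dict


def strip_utf8_bom(data: bytes) -> bytes:
    # The UTF-8 BOM is the three bytes EF BB BF at the very start of a file.
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def parse_config(data: bytes) -> Dict[str, str]:
    # Simplified stand-in for a config loader: "key = value" lines,
    # with blank lines and "#" comment lines skipped.
    entries: Dict[str, str] = {}
    for line in strip_utf8_bom(data).decode("utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        entries[key.strip()] = value.strip()
    return entries
```

Without the strip step, the BOM would end up glued to the first key on the first line, so that key would not be recognized.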
Fixes #9490
This replaces the logic of Screen.selectionString with calls to
ScreenFormatter.
This means that all our various selection-based features, like copying to
clipboards, now use the new formatter. The formatter code is now
user-facing.
This forced us to pass all selectionString tests which revealed some
edge cases that were not handled correctly before in the formatter! The
formatter now handles:
- Plain text now emits `\n` instead of `\r\n`. VT emits `\r\n`
- Rectangular selections
- Various wide character edge cases
- Selection is now inclusive on the end, not exclusive
Closes #8430
A few questions:
* Should I set a default keybind for `toggle-mouse-reporting`? The issue
mentioned one, it's currently unset.
* Am I handling the `toggle-mouse-reporting` action properly in
`performAction` (gtk) / `action` (macos)?
Copilot was used to understand the codebase, but code was authored
manually.
- update nixpkgs now that Zig 0.15.2 is available in nixpkgs
- drop hack that worked around compile failures on systems with more
than 32 cores
- enforce patch version of Zig
Resolves #8689
For various reasons, Ghostty wants to have a unique file extension for
its config files. We settled on the name `config.ghostty`. This will
help with tooling. See #8438 (original discussion) for more details.
This PR introduces the preferred default of `.ghostty` while still
supporting the previous `config` file. If both files exist, a warning
is logged.
The docs / website will need to be updated to reflect this change.
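
A minimal sketch of the lookup described above, written in Python for illustration only. The `resolve_config_path` helper is hypothetical, and the choice to load `config.ghostty` when both files exist is an assumption: the description above only says that a warning is logged in that case.

```python
import logging
from pathlib import Path
from typing import Optional

log = logging.getLogger("config")


def resolve_config_path(config_dir: Path) -> Optional[Path]:
    preferred = config_dir / "config.ghostty"
    legacy = config_dir / "config"

    if preferred.is_file() and legacy.is_file():
        # Assumption: the new name wins when both files are present.
        log.warning("both %s and %s exist; using %s", preferred, legacy, preferred)
        return preferred
    if preferred.is_file():
        return preferred
    if legacy.is_file():
        return legacy
    return None
```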
> [!NOTE]
> Only tested on macOS 26.0.
---------
Co-authored-by: Mitchell Hashimoto <m@mitchellh.com>
As pointed out in #9156, an unintended consequence of all the work to
get icon sizing right is that `adjust-icon-height` now only applies to
the small icons you get when the next cell is not whitespace. Large
icons are unaffected.
With this PR, `adjust-icon-height` affects the maximum height of every
symbol specifying the `.icon` constraint height, regardless of
constraint width. This includes most Nerd Font icons, but excludes emoji
and other unicode symbols, and also excludes terminal graphics-oriented
Nerd Font symbols such as Powerline symbols.
In the following screenshots, **Baseline** is without
`adjust-icon-height`, while **Before** and **After** are with
`adjust-icon-height = -25%`.
**Baseline**
<img width="711" height="95" alt="Screenshot 2025-10-11 at 23 28 20"
src="https://github.com/user-attachments/assets/7499db4d-75a4-4dbd-b107-8cb5849e31a3"
/>
**Before** (only small icons affected)
<img width="711" height="95" alt="Screenshot 2025-10-11 at 23 20 12"
src="https://github.com/user-attachments/assets/9afd9fbf-ef25-44cc-9d8e-c39a69875163"
/>
**After** (both small and large icons affected, but not emoji)
<img width="711" height="95" alt="Screenshot 2025-10-11 at 23 21 05"
src="https://github.com/user-attachments/assets/90999f59-3b43-4684-9c8e-2c3c1edd6d18"
/>
This modernizes `KeyEncoder` to a new `std.Io.Writer`-based API.
Additionally, instead of a single struct, it is now an `encode` function
that takes a series of more focused options. This is more idiomatic Zig
while also making it easier to expose via libghostty-vt.
libghostty-vt also gains access to key encoding APIs.
Fixes #8991
Uses OSC 133 escape sequences to keep track of how long commands take to
execute. If the user chooses, commands that take longer than a
user-specified limit will trigger a notification. The user can choose between
a bell notification or a desktop notification.
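
To illustrate the idea, here is a small Python sketch of timing commands from OSC 133 marks. It assumes the timer starts at the `C` mark (command output begins) and stops at the `D` mark (command finished, optionally followed by an exit code); the exact marks Ghostty uses are not stated here, and the `CommandTimer` class and its method names are hypothetical.

```python
from typing import Optional


class CommandTimer:
    def __init__(self, notify_after_secs: float) -> None:
        self.notify_after_secs = notify_after_secs
        self._started_at: Optional[float] = None

    def on_osc133(self, payload: str, now: float) -> Optional[float]:
        """Handle the payload of an OSC 133 sequence (e.g. "C" or "D;0").

        Returns the elapsed seconds if a command just finished and took at
        least the configured limit, otherwise None.
        """
        kind = payload.split(";", 1)[0]
        if kind == "C":
            self._started_at = now
            return None
        if kind == "D" and self._started_at is not None:
            elapsed = now - self._started_at
            self._started_at = None
            if elapsed >= self.notify_after_secs:
                return elapsed
        return None
```

A caller would pass a monotonic timestamp as `now` and, when a value is returned, trigger whichever notification the user selected (bell or desktop).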
Fixes #8849
Previously, the `parseAutoStruct` function that was used to parse
generic structs for the config simply split the input value on commas
without taking into account quoting or escapes. This led to problems
because it was impossible to include a comma in the value of config
entries that were parsed by `parseAutoStruct`. This is particularly
problematic because `ghostty +show-config --default` would produce
output like the following:
```
command-palette-entry = title:Focus Split: Next,description:Focus the next split, if any.,action:goto_split:next
```
Because the `description` contains a comma, Ghostty is unable to parse
this correctly. The value would be split into four parts:
```
title:Focus Split: Next
description:Focus the next split
if any.
action:goto_split:next
```
Instead of three parts:
```
title:Focus Split: Next
description:Focus the next split, if any.
action:goto_split:next
```
Because `parseAutoStruct` simply looked for commas to split on, no
amount of quoting or escaping would allow that to be parsed correctly.
This is fixed by (1) introducing a parser that will split the input to
`parseAutoStruct` into fields while taking into account quotes and
escaping, and (2) changing the `ghostty +show-config` output to put the
values in `command-palette-entry` into quotes so that Ghostty can parse
its own output.
`parseAutoStruct` will also now parse double quoted values as a Zig
string literal. This makes it easier to embed control codes, whitespace,
and commas in values.
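
The difference between naive comma splitting and quote-aware splitting can be sketched in Python as below. This is not the Zig parser from the PR: `split_fields` is a hypothetical helper, it only handles double quotes and backslash escapes, and it leaves the quotes in place rather than decoding them as a Zig string literal. The quoted form `description:"..."` used in the example is likewise an assumption about what the quoted output looks like.

```python
from typing import List


def split_fields(value: str) -> List[str]:
    """Split on commas that are outside double quotes."""
    fields: List[str] = []
    current: List[str] = []
    in_quotes = False
    i = 0
    while i < len(value):
        ch = value[i]
        if ch == "\\" and i + 1 < len(value):
            # Keep an escape pair (e.g. \" or \\) intact so that an escaped
            # quote does not toggle the quoting state.
            current.append(value[i:i + 2])
            i += 2
            continue
        if ch == "," and not in_quotes:
            fields.append("".join(current))
            current = []
            i += 1
            continue
        if ch == '"':
            in_quotes = not in_quotes
        current.append(ch)
        i += 1
    if in_quotes:
        raise ValueError("unterminated double quote")
    fields.append("".join(current))
    return fields
```

With the unquoted entry from the example above, this still yields four parts, which is why the `+show-config` output also needs to quote the values; with the description quoted, the comma inside it no longer splits the field.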