Mitchell Hashimoto 6529baea46 Change renderer from screen clones to new RenderState (#9662)
This adds a new API called `RenderState` that produces the proper state
required for a renderer to draw a viewport and updates our renderer to
use it instead of the prior screen clone method.

The newsworthy change here is that the critical area, where the renderer
holds the terminal lock and blocks IO, is now anywhere from
**2x to 5x faster**, and in about half the frames we're now in the
critical area for **zero microseconds** because `RenderState` has much
better dirty/damage tracking.

**For libghostty Zig users**, this API is available to the Zig module
and can be used to create your own renderers (to any artifact, it
doesn't have to be graphical! It is even useful for text like a tmux
clone or something).

## Differences vs Old Method

The renderer previously called `Screen.clone`. This produces a copy of
all the screen data (even stuff the renderer may not need) and attempts
to create a standalone, fully functional `Screen` allocation. I didn't
microbenchmark this to understand exactly where the slowdown was, but I
think it was based primarily in two places:

- Screens have a minimum allocation of two Ghostty "pages" which are
quite large. I think this allocation was the primary expensive part.
Without refactoring our entire Screen/PageList work, this was
unavoidable.
- The clone was unconditional and would produce a fully new Screen on
every frame.

The new structure `RenderState` is stateful: each frame calls `update`
on the prior value, which modifies itself in place. This addresses the
above two points in a few ways:

- Allocation is minimized to only what we need, no full pages.
- Since it is stateful (updating in-place), we only change dirty data
and don't need to copy everything. **Note the benchmarks below only test
_full updates_, so you aren't even seeing the benefits of this!**
- Also since it is stateful, we cache expensive calculations (such as
for selections) so future screen updates can reuse those cached values
rather than recompute them.
- We retain memory from prior updates even when the screen is dirty, so
even in dirty states, it's unlikely we allocate.
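
To make the stateful, in-place idea concrete, here is a minimal sketch of an update that copies only dirty rows and reuses memory from prior frames. It is not the real `RenderState` implementation: `FakeTerminal`, `SketchRenderState`, and the per-row `dirty` flags are hypothetical names invented for illustration, and the real structure tracks far more than bytes per row.

```zig
const std = @import("std");

/// Hypothetical stand-in for the terminal; NOT Ghostty's real `Terminal`.
/// `rows` and `dirty` must have the same length.
const FakeTerminal = struct {
    rows: []const []const u8,
    dirty: []bool,
};

/// Hypothetical, heavily simplified render state (rows of bytes only).
const SketchRenderState = struct {
    rows: std.ArrayListUnmanaged(std.ArrayListUnmanaged(u8)) = .empty,

    pub fn deinit(self: *SketchRenderState, alloc: std.mem.Allocator) void {
        for (self.rows.items) |*row| row.deinit(alloc);
        self.rows.deinit(alloc);
    }

    /// Copies only rows that are dirty (or new) and returns how many were
    /// copied. Shrinking the terminal is ignored to keep the sketch short.
    pub fn update(
        self: *SketchRenderState,
        alloc: std.mem.Allocator,
        t: *FakeTerminal,
    ) !usize {
        const old_len = self.rows.items.len;
        if (t.rows.len > old_len) {
            try self.rows.resize(alloc, t.rows.len);
            for (self.rows.items[old_len..]) |*row| row.* = .empty;
        }

        var copied: usize = 0;
        for (t.rows, t.dirty, 0..) |src, *dirty, y| {
            if (!dirty.* and y < old_len) continue; // clean row: no work
            const dst = &self.rows.items[y];
            dst.clearRetainingCapacity(); // keep the allocation from prior frames
            try dst.appendSlice(alloc, src);
            dirty.* = false;
            copied += 1;
        }
        return copied;
    }
};

pub fn main() !void {
    const alloc = std.heap.page_allocator;
    var dirty = [_]bool{ true, true };
    var t: FakeTerminal = .{ .rows = &.{ "hello", "world" }, .dirty = &dirty };

    var state: SketchRenderState = .{};
    defer state.deinit(alloc);

    std.debug.print("first frame copied {d} rows\n", .{try state.update(alloc, &t)});
    std.debug.print("idle frame copied {d} rows\n", .{try state.update(alloc, &t)});
}
```

In this sketch an idle frame does no copying at all, and a dirty row reuses its existing buffer, which mirrors the last two bullets above in spirit.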

More details in the "Design" section.

## Benchmarks

Here are some microbenchmarks on the previous render critical area
versus the new one.

For each of the benchmarks below, ignore the time units (milliseconds)
and instead **focus on the relative speedup.** The benchmarks are all
doing a full render frame setup around 1000 times, because the actual
cost of a frame update is in dozens or hundreds of microseconds.
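
As a rough illustration of why the work is repeated, the sketch below times many iterations and divides by the count to get a per-frame figure. It is not the benchmark harness used for the charts: `fakeFrameSetup` and `meanNsPerCall` are hypothetical names, and the workload is a placeholder loop. With made-up numbers, a total of 250 ms over 1000 iterations would correspond to 250 microseconds per frame.

```zig
const std = @import("std");

/// Hypothetical stand-in for one "render frame setup"; the real benchmarks
/// exercise the renderer's critical area instead.
fn fakeFrameSetup() void {
    var sum: usize = 0;
    for (0..10_000) |i| sum +%= i;
    std.mem.doNotOptimizeAway(sum);
}

/// Runs `func` `iterations` times and returns the mean wall-clock time per
/// call in nanoseconds.
fn meanNsPerCall(iterations: u64, comptime func: fn () void) !u64 {
    std.debug.assert(iterations > 0);
    var timer = try std.time.Timer.start();
    var i: u64 = 0;
    while (i < iterations) : (i += 1) func();
    return timer.read() / iterations;
}

pub fn main() !void {
    const mean_ns = try meanNsPerCall(1000, fakeFrameSetup);
    std.debug.print("mean per frame: {d} ns (~{d} us)\n", .{
        mean_ns,
        mean_ns / std.time.ns_per_us,
    });
}
```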

> [!NOTE]
>
> I'm still working on some more benchmarks before merging. I'll update
this space.

### Full screen, single style

<img width="1542" height="480" alt="CleanShot 2025-11-21 at 07 35 13@2x"
src="https://github.com/user-attachments/assets/c28c9f33-d1aa-4723-8a8e-3c6d70fe3667"
/>

### Full screen, plaintext

<img width="1642" height="612" alt="CleanShot 2025-11-21 at 07 36 06@2x"
src="https://github.com/user-attachments/assets/b51f57cf-7c48-46c8-a347-8ecc0bdd3d47"
/>

### Full screen, different style per cell (pathological case)

<img width="1704" height="456" alt="CleanShot 2025-11-21 at 07 37 55@2x"
src="https://github.com/user-attachments/assets/71a98250-d8d1-47ab-ae69-5e6b3b60bf2d"
/>

### Critical Area: Typing at Shell Prompt

| This PR | Main Branch |
---------|--------
| <img width="1064" height="764" alt="CleanShot 2025-11-21 at 14 44
31@2x"
src="https://github.com/user-attachments/assets/8a0ab3a1-3d68-41f0-9469-bc08a4569286"
/> | <img width="1040" height="684" alt="CleanShot 2025-11-21 at 14 47
10@2x"
src="https://github.com/user-attachments/assets/04ffa607-8841-436b-b6e9-eeeb6ee9482d"
/> |

### Critical Area: Neovim Scrolling

| This PR | Main Branch |
---------|--------
| <img width="1054" height="748" alt="CleanShot 2025-11-21 at 14 45
06@2x"
src="https://github.com/user-attachments/assets/ccafaee8-720f-41be-820d-fd705835607a"
/> | <img width="1068" height="796" alt="CleanShot 2025-11-21 at 14 47
48@2x"
src="https://github.com/user-attachments/assets/68087496-d371-4c7c-8b4c-b967dbaeaa7c"
/> |

### Critical Area: `btop -u 100`

This is closer to a pathological case, about as close as you get with a
real tool in the wild. `btop` uses hundreds of unique styles and updates
many cells across many rows very frequently (every 100ms in this case).
You can see that some of our frame times in this case are similar but
there are _so many more near-idle frames_ thanks to our dirty tracking.

| This PR | Main Branch |
---------|--------
| <img width="1088" height="900" alt="CleanShot 2025-11-21 at 14 45
44@2x"
src="https://github.com/user-attachments/assets/ea63f0eb-f06e-4d00-95a3-c55a3755cc67"
/> | <img width="1078" height="906" alt="CleanShot 2025-11-21 at 14 48
18@2x"
src="https://github.com/user-attachments/assets/cef360de-2b12-440f-8c4c-6a69b5ce4058"
/> |

### "DOOM Fire" 

Fullscreen on my macOS machine went from 770 FPS to ~808 FPS consistently, a
solid 5% increase.

<img width="3520" height="2392" alt="CleanShot 2025-11-21 at 07 45
29@2x"
src="https://github.com/user-attachments/assets/033effca-0abb-4ff8-a21b-83214d118d12"
/>


### IO 

While this work was rendering-focused, the smaller critical area does help IO
performance a bit.

We've already tracked down the remaining issues to the IO thread going
to sleep and overhead with context switching. We're investigating
switching to a spin lock for the IO thread only in another track of
work.
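
For context on what a spin lock trades off, here is a generic test-and-set sketch. It is not the design being investigated for Ghostty's IO thread: `SpinLock`, `worker`, and `countWithThreads` are hypothetical names. The point it shows is that a waiting thread busy-loops instead of going to sleep, which avoids the sleep/wake context switch at the cost of burning CPU while waiting.

```zig
const std = @import("std");

/// Hypothetical minimal test-and-set spin lock.
const SpinLock = struct {
    locked: std.atomic.Value(bool) = std.atomic.Value(bool).init(false),

    pub fn lock(self: *SpinLock) void {
        // `swap` returns the previous value: `true` means someone else holds it.
        while (self.locked.swap(true, .acquire)) {
            std.atomic.spinLoopHint();
        }
    }

    pub fn unlock(self: *SpinLock) void {
        self.locked.store(false, .release);
    }
};

fn worker(l: *SpinLock, counter: *u64) void {
    for (0..10_000) |_| {
        l.lock();
        defer l.unlock();
        counter.* += 1;
    }
}

/// Spawns `n` threads that each increment a shared counter under the lock.
fn countWithThreads(comptime n: usize) !u64 {
    var l: SpinLock = .{};
    var counter: u64 = 0;
    var threads: [n]std.Thread = undefined;
    var spawned: usize = 0;
    // If a spawn fails, wait for the threads already running before returning.
    errdefer for (threads[0..spawned]) |t| t.join();
    while (spawned < n) : (spawned += 1) {
        threads[spawned] = try std.Thread.spawn(.{}, worker, .{ &l, &counter });
    }
    for (threads) |t| t.join();
    return counter;
}

pub fn main() !void {
    std.debug.print("total: {d}\n", .{try countWithThreads(2)});
}
```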

> [!NOTE]
>
> **This is comparing with `main`, which already has a 20-30%
performance improvement over v1.2.3.**

<img width="982" height="698" alt="image"
src="https://github.com/user-attachments/assets/52a86f6c-6f09-45fe-9ac7-ca62c7ac6ee4"
/>

## Design

The design of the API is a _stateful_ `RenderState` struct that you call
`update` on each frame with the target terminal, and it only updates
what is needed. RenderState keeps track of rows, cells, hyperlinks,
selections, etc.

```zig
// Start empty
var state: terminal.RenderState = .empty;
defer state.deinit(alloc);

// Each frame, update it with a terminal.
// THIS IS THE ONLY PART THAT ACCESSES `t`
try state.update(alloc, &t);

// Access render data (can be outside any locking for `t`)
...
```

The ergonomics of the `RenderState` structure are a wee bit clunky because
we make use of struct-of-arrays (SoA, Zig's MultiArrayList) to better
optimize cache locality for renderer data vs. update data (what we need
to update the render state is different from what we need to draw).

Once you get used to the API though, it's pretty beautiful. I mean, look
at this:

```zig
for (
    0..,
    row_data.items(.raw),
    row_data.items(.cells),
) |y, row, cells| {
    const cells_slice = cells.slice();
    for (
        0..,
        cells_slice.items(.raw),
        cells_slice.items(.grapheme),
    ) |x, cell, graphemes| {
        // ...
    }
}
```
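
For readers who haven't used `std.MultiArrayList`, the following self-contained sketch shows the struct-of-arrays access pattern in isolation. The `Cell` layout and `countDirty` are hypothetical and much simpler than the real render state. Each field lives in its own contiguous column, so a draw loop can walk only the columns it needs while an update pass walks a different one.

```zig
const std = @import("std");

/// Hypothetical cell layout, invented for this sketch.
const Cell = struct {
    codepoint: u21, // needed when drawing
    style_id: u16, // needed when drawing
    dirty: bool, // needed only when updating
};

/// Touches only the `dirty` column; the other columns are never loaded.
fn countDirty(cells: *const std.MultiArrayList(Cell)) usize {
    var n: usize = 0;
    for (cells.items(.dirty)) |d| {
        if (d) n += 1;
    }
    return n;
}

pub fn main() !void {
    const alloc = std.heap.page_allocator;
    var cells: std.MultiArrayList(Cell) = .{};
    defer cells.deinit(alloc);

    try cells.append(alloc, .{ .codepoint = 'h', .style_id = 1, .dirty = true });
    try cells.append(alloc, .{ .codepoint = 'i', .style_id = 1, .dirty = false });

    // A "draw" pass: iterate two columns in lockstep, with an index.
    const slice = cells.slice();
    for (0.., slice.items(.codepoint), slice.items(.style_id)) |x, cp, style| {
        std.debug.print("x={d} codepoint={d} style={d}\n", .{ x, cp, style });
    }

    std.debug.print("dirty cells: {d}\n", .{countDirty(&cells)});
}
```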

## Improvements

This PR makes various improvements across the board:

- It bears repeating in case it was missed previously that the critical
area time of a render has gone down 2x to 5x when there is work and is
now free when there is no work (the previous implementation always did
work).
- Font shaping is much more efficient now and only requires access to a
render state.
- Selection handling is now cached and works with dirty tracking.
Previously, if you had an active selection, we'd search the entire
screen multiple times (like... once per row). Yikes.
- Hyperlink handling is _much_ more efficient. Instead of iterating
through the entire screen contents _per configured link_ we now cache
the screen contents as a string and search one whole string multiple
times. Obvious, but we didn't do this before.
- The `contrainedWidth` and `rowNeverExtendBg` helper methods are now
both much more efficient and live within the renderer package rather
than being awkwardly in the terminal package.
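
The hyperlink change can be sketched roughly as follows: flatten the screen into one string once, then run every configured matcher over that single string. This is an illustration only. `flattenScreen` and `countMatches` are hypothetical names, and plain substring search stands in for whatever matching the configured links really use.

```zig
const std = @import("std");

/// Joins rows into one newline-separated string. This is the value that
/// would be cached and reused for every link. Caller owns the result.
fn flattenScreen(alloc: std.mem.Allocator, rows: []const []const u8) ![]u8 {
    return std.mem.join(alloc, "\n", rows);
}

/// Counts non-overlapping occurrences of `needle` in `haystack`.
/// Substring search is a stand-in for a real link matcher.
fn countMatches(haystack: []const u8, needle: []const u8) usize {
    if (needle.len == 0) return 0;
    var count: usize = 0;
    var pos: usize = 0;
    while (std.mem.indexOfPos(u8, haystack, pos, needle)) |idx| {
        count += 1;
        pos = idx + needle.len;
    }
    return count;
}

pub fn main() !void {
    const alloc = std.heap.page_allocator;
    const rows = [_][]const u8{
        "see https://a.example and",
        "http://b.example or https://c.example",
    };

    // Built once per screen update...
    const text = try flattenScreen(alloc, &rows);
    defer alloc.free(text);

    // ...then searched once per configured link.
    for ([_][]const u8{ "https://", "http://" }) |needle| {
        std.debug.print("{s}: {d}\n", .{ needle, countMatches(text, needle) });
    }
}
```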

## Future Notes

- Our `terminal.Selection` API is very bad. It conceptually makes sense
and I understand why I designed it this way (easy) but it makes it hard
to render or manipulate performantly.

**AI Disclosure:** AI was used only to assist with writing some tests
and converting some tests. The primary logic is all organic, meatbag
produced.
2025-11-22 06:36:05 -08:00

Ghostty

Fast, native, feature-rich terminal emulator pushing modern features.
About · Download · Documentation · Contributing · Developing

About

Ghostty is a terminal emulator that differentiates itself by being fast, feature-rich, and native. While there are many excellent terminal emulators available, they all force you to choose between speed, features, or native UIs. Ghostty provides all three.

In all categories, I am not trying to claim that Ghostty is the best (i.e. the fastest, most feature-rich, or most native). But Ghostty is competitive in all three categories and Ghostty doesn't make you choose between them.

Ghostty also intends to push the boundaries of what is possible with a terminal emulator by exposing modern, opt-in features that enable CLI tool developers to build more feature rich, interactive applications.

While aiming for this ambitious goal, our first step is to make Ghostty one of the best fully standards-compliant terminal emulators, remaining compatible with all existing shells and software while supporting all of the latest terminal innovations in the ecosystem. You can use Ghostty as a drop-in replacement for your existing terminal emulator.

For more details, see About Ghostty.

Download

See the download page on the Ghostty website.

Documentation

See the documentation on the Ghostty website.

Contributing and Developing

If you have any ideas, issues, etc. regarding Ghostty, or would like to contribute to Ghostty through pull requests, please check out our "Contributing to Ghostty" document. Those who would like to get involved with Ghostty's development as well should also read the "Developing Ghostty" document for more technical details.

Roadmap and Status

The high-level ambitious plan for the project, in order:

| # | Step | Status |
|:---:|------|:------:|
| 1 | Standards-compliant terminal emulation | |
| 2 | Competitive performance | |
| 3 | Basic customizability -- fonts, bg colors, etc. | |
| 4 | Richer windowing features -- multi-window, tabbing, panes | |
| 5 | Native Platform Experiences (i.e. Mac Preference Panel) | ⚠️ |
| 6 | Cross-platform libghostty for Embeddable Terminals | ⚠️ |
| 7 | Windows Terminals (including PowerShell, Cmd, WSL) | |
| N | Fancy features (to be expanded upon later) | |

Additional details for each step in the big roadmap below:

Standards-Compliant Terminal Emulation

Ghostty implements enough control sequences to be used by hundreds of testers daily for over a year. Further, we've done a comprehensive xterm audit comparing Ghostty's behavior to xterm and building a set of conformance test cases.

We believe Ghostty is one of the most compliant terminal emulators available.

Terminal behavior is partially a de jure standard (i.e. ECMA-48) but mostly a de facto standard as defined by popular terminal emulators worldwide. Ghostty takes the approach that our behavior is defined by (1) standards, if available, (2) xterm, if the feature exists, (3) other popular terminals, in that order. This defines what the Ghostty project views as a "standard."

Competitive Performance

We need better benchmarks to continuously verify this, but Ghostty is generally in the same performance category as the other highest performing terminal emulators.

For rendering, we have a multi-renderer architecture that uses OpenGL on Linux and Metal on macOS. As far as I'm aware, we're the only terminal emulator other than iTerm that uses Metal directly. And we're the only terminal emulator that has a Metal renderer that supports ligatures (iTerm uses a CPU renderer if ligatures are enabled). We can maintain around 60fps under heavy load and much more in general -- though the terminal usually renders far fewer frames than that because the screen changes so little.

For IO, we have a dedicated IO thread that maintains very little jitter under heavy IO load (i.e. cat <big file>.txt). On benchmarks for IO, we're usually within a small margin of other fast terminal emulators. For example, reading a dump of plain text is 4x faster compared to iTerm and Kitty, and 2x faster than Terminal.app. Alacritty is very fast but we're still around the same speed (give or take) and our app experience is much more feature rich.

Note

Despite being very fast, there is a lot of room for improvement here.

Richer Windowing Features

The Mac and Linux (built with GTK) apps support multi-window, tabbing, and splits.

Native Platform Experiences

Ghostty is a cross-platform terminal emulator but we don't aim for a least-common-denominator experience. There is a large, shared core written in Zig but we do a lot of platform-native things:

  • The macOS app is a true SwiftUI-based application with all the things you would expect such as real windowing, menu bars, a settings GUI, etc.
  • macOS uses a true Metal renderer with CoreText for font discovery.
  • The Linux app is built with GTK.

There are more improvements to be made. The macOS settings window is still a work-in-progress. Similar improvements will follow with Linux.

Cross-platform libghostty for Embeddable Terminals

In addition to being a standalone terminal emulator, Ghostty is a C-compatible library for embedding a fast, feature-rich terminal emulator in any 3rd party project. This library is called libghostty.

Due to the scope of this project, we're breaking libghostty down into separate libraries, starting with libghostty-vt. The goal of libghostty-vt is to focus on parsing terminal sequences and maintaining terminal state. This is covered in more detail in this blog post.

libghostty-vt is already available and usable today for Zig and C and is compatible with macOS, Linux, Windows, and WebAssembly. At the time of writing this, the API isn't stable yet and we haven't tagged an official release, but the core logic is well proven (since Ghostty uses it) and we're working hard on it now.

The ultimate goal is not hypothetical! The macOS app is a libghostty consumer. The macOS app is a native Swift app developed in Xcode and main() is within Swift. The Swift app links to libghostty and uses the C API to render terminals.

Crash Reports

Ghostty has a built-in crash reporter that will generate and save crash reports to disk. The crash reports are saved to the $XDG_STATE_HOME/ghostty/crash directory. If $XDG_STATE_HOME is not set, the default is ~/.local/state. Crash reports are not automatically sent anywhere off your machine.
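
The directory rule above can be sketched as a small function. This is an illustration, not Ghostty's actual lookup code: `crashDir` is a hypothetical name, the environment values are passed in as parameters rather than read from the process, and treating an empty `$XDG_STATE_HOME` the same as an unset one is an assumption borrowed from the XDG Base Directory convention.

```zig
const std = @import("std");

/// Returns the crash report directory for the given environment values.
/// `xdg_state_home` is the value of $XDG_STATE_HOME (null if unset) and
/// `home` is the user's home directory. Caller owns the returned memory.
fn crashDir(
    alloc: std.mem.Allocator,
    xdg_state_home: ?[]const u8,
    home: []const u8,
) ![]u8 {
    if (xdg_state_home) |base| {
        // Assumption: an empty value is treated like an unset one.
        if (base.len > 0) {
            return std.fmt.allocPrint(alloc, "{s}/ghostty/crash", .{base});
        }
    }
    // Default state directory is ~/.local/state.
    return std.fmt.allocPrint(alloc, "{s}/.local/state/ghostty/crash", .{home});
}

pub fn main() !void {
    const alloc = std.heap.page_allocator;
    // Example inputs only; a real caller would read the environment.
    const dir = try crashDir(alloc, null, "/home/user");
    defer alloc.free(dir);
    std.debug.print("{s}\n", .{dir});
}
```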

Crash reports are only generated the next time Ghostty is started after a crash. If Ghostty crashes and you want to generate a crash report, you must restart Ghostty at least once. You should see a message in the log that a crash report was generated.

Note

Use the ghostty +crash-report CLI command to get a list of available crash reports. A future version of Ghostty will make the contents of the crash reports more easily viewable through the CLI and GUI.

Crash reports end in the .ghosttycrash extension. The crash reports are in Sentry envelope format. You can upload these to your own Sentry account to view their contents, but the format is also publicly documented so any other available tools can also be used. The ghostty +crash-report CLI command can be used to list any crash reports. A future version of Ghostty will show you the contents of the crash report directly in the terminal.

To send the crash report to the Ghostty project, you can use the following CLI command using the Sentry CLI:

SENTRY_DSN=https://e914ee84fd895c4fe324afa3e53dac76@o4507352570920960.ingest.us.sentry.io/4507850923638784 sentry-cli send-envelope --raw <path to ghostty crash>

Warning

The crash report can contain sensitive information. The report doesn't purposely contain sensitive information, but it does contain the full stack memory of each thread at the time of the crash. This information is used to rebuild the stack trace but can also contain sensitive data depending on when the crash occurred.

Description
👻 Ghostty is a fast, feature-rich, and cross-platform terminal emulator that uses platform-native UI and GPU acceleration.