buf to cache line for consistency (#8569)
This aligns the `buf` of `4096` bytes in the benchmarks to the cache line, to ensure a consistent number of cache lines are used, and also to avoid any sub-`usize` alignment issues as seen in https://github.com/ghostty-org/ghostty/pull/8548. This has less of an effect than https://github.com/ghostty-org/ghostty/pull/8548, and comparing the before and after of the current benchmarks in the repo doesn't show any noticeable difference. In my case, I've been comparing the `table` option with [uucode in this branch](https://github.com/ghostty-org/ghostty/compare/main...jacobsandlund:jacob/uucode?expand=1), and there I did see a difference.

### Before

I ran the before code several times (6 with the exact same binary, and several more with essentially the same code), always getting something like this, with `table` edging out `uucode` by about 3-4 ms:

```
Benchmark 1: zig-out/bin/ghostty-bench +codepoint-width --data=data.txt --mode=table
  Time (mean ± σ):     927.8 ms ±   1.3 ms    [User: 883.7 ms, System: 42.5 ms]
  Range (min … max):   926.0 ms … 929.8 ms    10 runs

Benchmark 2: zig-out/bin/ghostty-bench +codepoint-width --data=data.txt --mode=uucode
  Time (mean ± σ):     930.9 ms ±   1.4 ms    [User: 886.8 ms, System: 42.5 ms]
  Range (min … max):   928.5 ms … 933.4 ms    10 runs
```

### After

After this change, `uucode` comes in 10-11 ms (~1%) faster:

```
Benchmark 1: zig-out/bin/ghostty-bench +codepoint-width --data=data.txt --mode=table
  Time (mean ± σ):     930.6 ms ±   1.3 ms    [User: 886.5 ms, System: 42.4 ms]
  Range (min … max):   928.9 ms … 932.4 ms    10 runs

Benchmark 2: zig-out/bin/ghostty-bench +codepoint-width --data=data.txt --mode=uucode
  Time (mean ± σ):     920.1 ms ±   1.4 ms    [User: 876.3 ms, System: 42.1 ms]
  Range (min … max):   918.4 ms … 923.3 ms    10 runs

Summary
  zig-out/bin/ghostty-bench +codepoint-width --data=data.txt --mode=uucode ran
    1.01 ± 0.00 times faster than zig-out/bin/ghostty-bench +codepoint-width --data=data.txt --mode=table
```

This ~1% speedup checks out: looking at the assembly, the two are an exact match except for one small spot where the compiler can optimize `uucode` a little better:

```
# both table.asm/uucode.asm:
    140           const high = cp >> 8;
    141           const low = cp & 0xFF;
 ** 142           return self.stage3[self.stage2[self.stage1[high] + low]];

    <+464>: ubfx   x12, x11, #8, #13
    <+468>: ldrh   w12, [x27, x12, lsl #1]
    <+472>: add    x11, x28, w11, uxtb #1
    <+476>: ldrh   w11, [x11, x12, lsl #1]

# table.asm:
    <+480>: lsl    x11, x11, #1

 ** 158                   table.get(@intCast(cp)).width);
    159           }
    160       }

    <+484>: ldrb   w11, [x22, x11]

# uucode.asm:
 ** 148           return @field(data(stages, cp), name);

    <+480>: ldrh   w11, [x22, x11, lsl #1]
```

### More confusion with showing addresses

Confusingly, when I added `std.debug.print("buf addr={}\n", .{@intFromPtr(&buf)})` to show the addresses, this somehow made the `before` case show `uucode` as being faster. Then, when I added alignment, `uucode` and `table` were taking about the same time (**edit:** _uucode was only ~4 ms faster, but see more in "Edit: more investigation"_).

If I run without the `std.debug.print` and with `--show-output`, the times are different, so I'm just making a note of this:

```
Benchmark 1: zig-out/bin/ghostty-bench +codepoint-width --data=data.txt --mode=table
  Time (mean ± σ):     904.2 ms ±   1.2 ms    [User: 884.6 ms, System: 40.3 ms]
  Range (min … max):   902.8 ms … 906.1 ms    10 runs

Benchmark 2: zig-out/bin/ghostty-bench +codepoint-width --data=data.txt --mode=uucode
  Time (mean ± σ):     892.7 ms ±   2.0 ms    [User: 873.2 ms, System: 40.1 ms]
  Range (min … max):   887.9 ms … 895.6 ms    10 runs

Summary
  zig-out/bin/ghostty-bench +codepoint-width --data=data.txt --mode=uucode ran
    1.01 ± 0.00 times faster than zig-out/bin/ghostty-bench +codepoint-width --data=data.txt --mode=table
```

I think, even with this confusing case, aligning is going to be more consistent than not.
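For readers less familiar with the structure in the listing: the hot line, `self.stage3[self.stage2[self.stage1[high] + low]]`, is a three-stage lookup table. Below is a minimal Zig sketch of that shape, assuming tiny hand-made tables; `ThreeStage`, `Props`, and the `toy_stage*` data are hypothetical and are not the tables Ghostty or uucode actually generate. A 21-bit `cp` is consistent with the `ubfx x12, x11, #8, #13` instruction, which extracts the 13 bits above the low byte. Both listings appear to use 2-byte stage-3 elements; `table` then loads a single byte with `ldrb`, whose register-offset form cannot scale the index (hence the extra `lsl`), while `uucode` folds the scaling into one `ldrh`.

```zig
const std = @import("std");

/// Hypothetical 2-byte property record, mirroring the 2-byte stage-3
/// elements implied by the listing (only `width` is used here).
const Props = extern struct {
    width: u8,
    flags: u8 = 0,
};

/// Three-stage lookup: stage1 maps the high bits of a codepoint to a block
/// offset in stage2, stage2 maps (offset + low byte) to an index into
/// stage3, and stage3 holds the deduplicated property values.
fn ThreeStage(comptime T: type) type {
    return struct {
        stage1: []const u16,
        stage2: []const u16,
        stage3: []const T,

        const Self = @This();

        fn get(self: Self, cp: u21) T {
            const high = cp >> 8;
            const low = cp & 0xFF;
            return self.stage3[self.stage2[self.stage1[high] + low]];
        }
    };
}

// Toy data covering U+0000..U+01FF only (two 256-entry blocks).
const toy_stage1 = [_]u16{ 0, 256 };
const toy_stage2 = blk: {
    // Every codepoint maps to stage3[0] (width 1) ...
    var s = [_]u16{0} ** 512;
    // ... except C0 controls, which map to stage3[1] (width 0).
    for (0..0x20) |i| s[i] = 1;
    break :blk s;
};
const toy_stage3 = [_]Props{ .{ .width = 1 }, .{ .width = 0 } };

pub fn main() void {
    const t = ThreeStage(Props){
        .stage1 = &toy_stage1,
        .stage2 = &toy_stage2,
        .stage3 = &toy_stage3,
    };
    std.debug.print("U+0041 width={d}, U+0007 width={d}\n", .{
        t.get('A').width,
        t.get(0x07).width,
    });
}
```

With these toy tables, `get('A').width` is 1 and `get(0x07).width` is 0. Sharing stage-2 blocks and stage-3 values between codepoints is what keeps real tables of this kind compact.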
### Edit: more investigation

I wasn't satisfied with the discovery that adding `std.debug.print` made this difference, and I wanted to dig in and figure out exactly what's going on, but I didn't get a satisfactory answer. Here's what I tried:

* I compared the unaligned addresses from `stepTable` and `stepUucode`, but both seemed similar (not aligned to 128, different each run, but aligned to 8). Note, though, that `uucode` was still running ~1% faster, similar to the aligned case, even though here it was unaligned.
* Instead of doing `std.debug.print` in the step function, I printed in teardown, just in case. This made no difference in the unaligned case, but with alignment it brought the ~4 ms faster `uucode` (as noted above) back closer to the original "after" at around 11-12 ms faster (~1%).
* I forced the `buf` in `stepUucode` to not be aligned (e.g. by making it `= other_aligned_buf[3..4096 + 3]`). It was still ~1% faster.
* I compared the assembly of `stepTable` and `stepUucode` for both the aligned and unaligned cases, including a diff of the diffs between the two across the aligned and unaligned builds. The only difference between `stepTable` and `stepUucode` is what's noted above, and nothing stood out in the double diff.
* I tried going back to the original unaligned, non-printing code, but swapped the lines that get from `table` or `uucode`, so that `stepTable` and `stepUucode` were actually doing the opposite. The result was that `stepTable` (actually `uucode`) was 10-11 ms (~1%) faster, just like the aligned case!

In summary, I wasn't able to replicate the original benchmark behavior _and print out buffer addresses that pointed to alignment being the issue_. I still feel that, in theory, aligning the buffer ought to make the benchmark more reliable, and indeed the original unaligned version gives the result that is more of an outlier, but the evidence here is weak, so I'm alright if we stick with the status quo and close.
I think a lesson here is that benchmarks are hard to get precise.
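The alignment this PR adds, and the misaligned `[3..4096 + 3]` experiment described above, can be illustrated with a small sketch. It assumes the `std.atomic.cache_line` constant from recent Zig standard libraries (its value depends on the target; on aarch64 it is 128 in current Zig, matching the "not aligned to 128" observation). `cacheLinesSpanned` is a hypothetical helper, not part of Ghostty; it counts how many cache lines a buffer touches, which is the quantity the alignment is meant to keep constant.

```zig
const std = @import("std");

/// Hypothetical helper: number of cache lines touched by `len` bytes
/// starting at address `addr`, for cache lines of `line` bytes.
fn cacheLinesSpanned(addr: usize, len: usize, line: usize) usize {
    if (len == 0) return 0;
    const first = addr / line;
    const last = (addr + len - 1) / line;
    return last - first + 1;
}

pub fn main() void {
    // Aligned buffer: always starts on a cache-line boundary.
    var aligned: [4096]u8 align(std.atomic.cache_line) = undefined;
    @memset(&aligned, 0);

    // Deliberately misaligned view, similar to the `[3..4096 + 3]` experiment:
    // 4096 bytes starting 3 bytes past a cache-line boundary.
    var backing: [4096 + 3]u8 align(std.atomic.cache_line) = undefined;
    @memset(&backing, 0);
    const misaligned_addr = @intFromPtr(&backing[3]);

    std.debug.print("aligned: {d} lines, misaligned: {d} lines\n", .{
        cacheLinesSpanned(@intFromPtr(&aligned), aligned.len, std.atomic.cache_line),
        cacheLinesSpanned(misaligned_addr, 4096, std.atomic.cache_line),
    });
}
```

For any power-of-two line size up to 4096, the aligned buffer touches exactly `4096 / cache_line` lines, while the view starting 3 bytes in touches one more. Whether that extra line matters for a given benchmark is, as the investigation above shows, hard to pin down.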
Fast, native, feature-rich terminal emulator pushing modern features.
About
Ghostty is a terminal emulator that differentiates itself by being fast, feature-rich, and native. While there are many excellent terminal emulators available, they all force you to choose between speed, features, or native UIs. Ghostty provides all three.
In all categories, I am not trying to claim that Ghostty is the best (i.e. the fastest, most feature-rich, or most native). But Ghostty is competitive in all three categories and Ghostty doesn't make you choose between them.
Ghostty also intends to push the boundaries of what is possible with a terminal emulator by exposing modern, opt-in features that enable CLI tool developers to build more feature-rich, interactive applications.
While aiming for this ambitious goal, our first step is to make Ghostty one of the best fully standards-compliant terminal emulators, remaining compatible with all existing shells and software while supporting all of the latest terminal innovations in the ecosystem. You can use Ghostty as a drop-in replacement for your existing terminal emulator.
For more details, see About Ghostty.
Download
See the download page on the Ghostty website.
Documentation
See the documentation on the Ghostty website.
Contributing and Developing
If you have any ideas, issues, etc. regarding Ghostty, or would like to contribute to Ghostty through pull requests, please check out our "Contributing to Ghostty" document. Those who would like to get involved with Ghostty's development as well should also read the "Developing Ghostty" document for more technical details.
Roadmap and Status
The high-level ambitious plan for the project, in order:
| # | Step | Status |
|---|---|---|
| 1 | Standards-compliant terminal emulation | ✅ |
| 2 | Competitive performance | ✅ |
| 3 | Basic customizability -- fonts, bg colors, etc. | ✅ |
| 4 | Richer windowing features -- multi-window, tabbing, panes | ✅ |
| 5 | Native Platform Experiences (e.g. Mac Preference Panel) | ⚠️ |
| 6 | Cross-platform libghostty for Embeddable Terminals | ⚠️ |
| 7 | Windows Terminals (including PowerShell, Cmd, WSL) | ❌ |
| N | Fancy features (to be expanded upon later) | ❌ |
Additional details for each step in the big roadmap below:
Standards-Compliant Terminal Emulation
Ghostty implements enough control sequences to have been used daily by hundreds of testers for over a year. Further, we've done a comprehensive xterm audit, comparing Ghostty's behavior to xterm and building a set of conformance test cases.
We believe Ghostty is one of the most compliant terminal emulators available.
Terminal behavior is partially a de jure standard (i.e. ECMA-48) but mostly a de facto standard as defined by popular terminal emulators worldwide. Ghostty takes the approach that our behavior is defined by (1) standards, if available, (2) xterm, if the feature exists, (3) other popular terminals, in that order. This defines what the Ghostty project views as a "standard."
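The precedence rule reads naturally as a fallback chain. A toy Zig sketch, purely to illustrate the ordering; `Behavior` and `resolve` are hypothetical names and do not describe how Ghostty is structured internally:

```zig
const std = @import("std");

/// Hypothetical description of how a control sequence behaves.
const Behavior = enum { ignore, move_cursor, set_mode };

/// Resolve behavior using the documented precedence:
/// (1) a written standard, (2) xterm, (3) other popular terminals.
/// Each argument is null when that source says nothing about the sequence.
fn resolve(standard: ?Behavior, xterm: ?Behavior, others: ?Behavior) ?Behavior {
    return standard orelse xterm orelse others;
}

pub fn main() void {
    // No standard covers this hypothetical sequence, so xterm decides.
    const b = resolve(null, .set_mode, .move_cursor);
    std.debug.print("{s}\n", .{@tagName(b.?)});
}
```

A `null` result means no source defines the behavior at all, which is where a project has to make its own call.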
Competitive Performance
We need better benchmarks to continuously verify this, but Ghostty is generally in the same performance category as the other highest performing terminal emulators.
For rendering, we have a multi-renderer architecture that uses OpenGL on Linux and Metal on macOS. As far as I'm aware, we're the only terminal emulator other than iTerm that uses Metal directly. And we're the only terminal emulator that has a Metal renderer that supports ligatures (iTerm uses a CPU renderer if ligatures are enabled). We can maintain around 60 fps under heavy load, and much more in general, though in practice the terminal usually renders at a much lower rate because little on the screen changes.
For IO, we have a dedicated IO thread that maintains very little jitter
under heavy IO load (e.g. cat <big file>.txt). On benchmarks for IO,
we're usually within a small margin of other fast terminal emulators.
For example, reading a dump of plain text is 4x faster than iTerm and
Kitty, and 2x faster than Terminal.app. Alacritty is very fast but we're still
around the same speed (give or take) and our app experience is much more
feature-rich.
Note
Although Ghostty is already very fast, there is a lot of room for improvement here.
Richer Windowing Features
The Mac and Linux (built with GTK) apps support multi-window, tabbing, and splits.
Native Platform Experiences
Ghostty is a cross-platform terminal emulator but we don't aim for a least-common-denominator experience. There is a large, shared core written in Zig but we do a lot of platform-native things:
- The macOS app is a true SwiftUI-based application with all the things you would expect such as real windowing, menu bars, a settings GUI, etc.
- macOS uses a true Metal renderer with CoreText for font discovery.
- The Linux app is built with GTK.
There are more improvements to be made. The macOS settings window is still a work-in-progress. Similar improvements will follow with Linux.
Cross-platform libghostty for Embeddable Terminals
In addition to being a standalone terminal emulator, Ghostty is a
C-compatible library for embedding a fast, feature-rich terminal emulator
in any 3rd party project. This library is called libghostty.
This goal is not hypothetical! The macOS app is a libghostty consumer.
The macOS app is a native Swift app developed in Xcode and main() is
within Swift. The Swift app links to libghostty and uses the C API to
render terminals.
This step encompasses expanding libghostty support to more platforms
and more use cases. At the time of writing this, libghostty is very
Mac-centric -- particularly around rendering -- and we have work to do to
expand this to other platforms.
Crash Reports
Ghostty has a built-in crash reporter that will generate and save crash
reports to disk. The crash reports are saved to the $XDG_STATE_HOME/ghostty/crash
directory. If $XDG_STATE_HOME is not set, the default is ~/.local/state.
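A minimal sketch of how that directory resolves, assuming a POSIX-style environment; `crashDir` is a hypothetical helper rather than Ghostty's own code, and it also treats an empty `XDG_STATE_HOME` as unset, as the XDG Base Directory specification recommends:

```zig
const std = @import("std");

/// Hypothetical helper: $XDG_STATE_HOME/ghostty/crash, falling back to
/// $HOME/.local/state/ghostty/crash when XDG_STATE_HOME is unset or empty.
/// Environment values are passed in so the logic is easy to test.
/// The caller owns the returned slice.
fn crashDir(
    allocator: std.mem.Allocator,
    xdg_state_home: ?[]const u8,
    home: []const u8,
) ![]u8 {
    if (xdg_state_home) |state| {
        if (state.len > 0) {
            return std.fs.path.join(allocator, &.{ state, "ghostty", "crash" });
        }
    }
    return std.fs.path.join(allocator, &.{ home, ".local", "state", "ghostty", "crash" });
}

pub fn main() !void {
    const allocator = std.heap.page_allocator;

    const home = try std.process.getEnvVarOwned(allocator, "HOME");
    defer allocator.free(home);

    // A missing XDG_STATE_HOME becomes null and triggers the fallback.
    const state = std.process.getEnvVarOwned(allocator, "XDG_STATE_HOME") catch null;
    defer if (state) |s| allocator.free(s);

    const dir = try crashDir(allocator, state, home);
    defer allocator.free(dir);
    std.debug.print("{s}\n", .{dir});
}
```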
Crash reports are not automatically sent anywhere off your machine.
Crash reports are only generated the next time Ghostty is started after a crash. If Ghostty crashes and you want to generate a crash report, you must restart Ghostty at least once. You should see a message in the log that a crash report was generated.
Note
Use the ghostty +crash-report CLI command to get a list of available crash reports. A future version of Ghostty will make the contents of the crash reports more easily viewable through the CLI and GUI.
Crash reports end in the .ghosttycrash extension. The crash reports are in
Sentry envelope format. You can
upload these to your own Sentry account to view their contents, but the format
is also publicly documented so any other available tools can also be used.
The ghostty +crash-report CLI command can be used to list any crash reports.
A future version of Ghostty will show you the contents of the crash report
directly in the terminal.
To send the crash report to the Ghostty project, you can use the following CLI command using the Sentry CLI:
SENTRY_DSN=https://e914ee84fd895c4fe324afa3e53dac76@o4507352570920960.ingest.us.sentry.io/4507850923638784 sentry-cli send-envelope --raw <path to ghostty crash>
Warning
The crash report can contain sensitive information. The report doesn't purposely contain sensitive information, but it does contain the full stack memory of each thread at the time of the crash. This information is used to rebuild the stack trace but can also contain sensitive data depending on when the crash occurred.