These now properly match the FreeType API- compared directly in the unit
tests against the values provided by the FreeType header itself.
This was ridiculously wrong before, like... wow.
FcLangSetHasLang returns FcLangResult enum values:
- FcLangEqual (0): Exact match
- FcLangDifferentTerritory (1): Same language, different territory
- FcLangDifferentLang (2): Different language
The previous comparison to FcTrue (1) caused:
- Exact matches (0) to incorrectly return false
- Partial matches (1) to incorrectly return true
This fix changes the comparison to FcLangEqual (0) so hasLang()
correctly returns true only for exact language matches.
Fixes emoji font detection which relies on checking for 'und-zsye'
language tag support.
## Summary
Adds a new environment variable `GHOSTTY_QUICK_TERMINAL=1` that is set
when a terminal surface is launched as a quick terminal. This allows
shell configurations to conditionally adjust behavior based on whether
the terminal is a quick terminal.
Closes#3985
<img width="1430" height="228" alt="CleanShot 2025-11-23 at 12 39 27
png"
src="https://github.com/user-attachments/assets/1085c19b-9d58-4603-a03d-662bfde47095"
/>
## Motivation
Quick terminals are designed as ephemeral, singleton windows for quick
commands. Many users (especially tmux users) have shell configurations
that automatically attach to or start tmux sessions on terminal launch.
This automatic behavior conflicts with the quick terminal use case.
Example from the issue:
```zsh
# Users want to skip this in quick terminals
if [[ -z "$TMUX" ]]; then
tmux attach || tmux new
fi
With this PR, users can now detect quick terminals and adjust their shell behavior:
if [[ -z "$TMUX" ]] && [[ -z "$GHOSTTY_QUICK_TERMINAL" ]]; then
tmux attach || tmux new
fi
```
## Implementation
- macOS: Added GHOSTTY_QUICK_TERMINAL=1 to SurfaceConfiguration
environment variables in QuickTerminalController.swift:345-351
- Linux/GTK: Added environment variable check in
src/apprt/gtk/class/surface.zig:1469-1472 within defaultTermioEnv()
The environment variable is automatically inherited by split surfaces.
## Note on Scripting API
I'm aware that @rhodes-b mentioned this functionality was planned for
the scripting API (issue #3985). However, I believe this simpler
environment variable approach provides immediate value for tmux users
and shell configuration use cases, while the scripting API can later
provide more comprehensive surface introspection capabilities.
## AI Assistance Disclosure
This PR was written with Claude Code assistance. The Linux/GTK
implementation was fully AI-generated. I implemented the mac version
myself after Claude Code helped me understand the codebase architecture
and locate the appropriate files.
This PR builds on https://github.com/ghostty-org/ghostty/pull/9678 ~so
the diff from there is included here (it's not possible to stack PRs
unless it's a PR against my own fork)--review that one first!~
This PR fixes an issue with the VS15/VS16 handling in terminal `print`,
where before we didn't check for valid sequences if the base code point
is an emoji. This meant that a VS15 after an emoji that isn't part of a
valid sequence would narrow the cell, despite that the emoji doesn't
have a text presentation.
This PR also uses a single `is_emoji_vs_base` instead of the
`is_emoji_vs_emoji` and `is_emoji_vs_text`. See the [uucode
comment](215ff09730/src/config.zig (L239-L254))
for why I'm recommending this.
AI was used in some of the uucode changes in
https://github.com/ghostty-org/ghostty/pull/9678 (Amp--primarily for
tests), but everything was carefully vetted and much of it done by hand.
This PR was made without AI.
This updates uucode to the latest version, with the following changes:
b309dfb4e2...31655fba3c
For this PR, the new `uucode` version has two changes that contribute to
the diff:
* The `grapheme_break` now includes `emoji_modifier` and
`emoji_modifier_base`, so we don't need to grab those from
`is_emoji_modifier` and `is_emoji_modifier_base`.
* The `wcwidth` calculation has been split into `wcwidth_standalone`
(width of a code point when it's the only code point in a grapheme) and
`wcwidth_zero_in_grapheme` (whether the code point is _not_ contributing
to the width of a multi-code-point grapheme). To keep the current
Ghostty behavior for this PR, this sets `width` to 0 if
`wcwidth_zero_in_grapheme`, but with the exception of
`is_emoji_modifier` (see comment).
* While this PR isn't affected by it, take a look at
[wcwidth.zig](https://github.com/jacobsandlund/uucode/blob/main/src/x/config_x/wcwidth.zig)
for the considerations that went into determining the width of a code
point, along with many comments.
* See
[x/grapheme.zig](https://github.com/jacobsandlund/uucode/blob/main/src/x/grapheme.zig)
for the calculation of the width of a grapheme based on the width of the
code points with exceptions, again with many comments.
* See
[resources/wcwidth](https://github.com/jacobsandlund/uucode/tree/main/resources/wcwidth)
for a comparison of other unicode libraries calculation of "wcwidth".
PRs will follow this in a moment to also take advantage of the new
`uucode` version for:
* Better grapheme segmentation
* Correct VS15/VS16 handling
* More correct cell width calculation (especially certain scripts such
as Devanagari)
This makes `cursorStyle` utilize `RenderState` to determine the
appropriate cursor style. This moves the cursor style logic outside the
critical area, although it was cheap to begin with.
This always removes `viewport_is_bottom` which had no practical use.
This makes `cursorStyle` utilize `RenderState` to determine the
appropriate cursor style. This moves the cursor style logic outside the
critical area, although it was cheap to begin with.
This always removes `viewport_is_bottom` which had no practical use.
This adds a new API called `RenderState` that produces the proper state
required for a renderer to draw a viewport and updates our renderer to
use it instead of the prior screen clone method.
The newsworthy change is here is that we've shortened the critical area
where the renderer holds the terminal lock and blocks IO by anywhere
from **2x to 5x faster**, and in about half the frames we're now in the
critical area for **zero microseconds** because `RenderState` has much
better dirty/damage tracking.
**For libghostty Zig users**, this API is available to the Zig module
and can be used to create your own renderers (to any artifact, it
doesn't have to be graphical! It is even useful for text like a tmux
clone or something).
## Differences vs Old Method
The renderer previously called `Screen.clone`. This produces a copy of
all the screen data (even stuff the renderer may not need) and attempts
to create a standalone, fully functional `Screen` allocation. I didn't
microbenchmark this to understand exactly where the slowdown was, but I
think it was based primarily in two places:
- Screens have a minimum allocation of two Ghostty "pages" which are
quite large. I think this allocation was the primary expensive part.
Without refactoring our entire Screen/PageList work, this was
unavoidable.
- The clone was unconditional and would produce a fully new Screen on
every frame.
The new structure `RenderState` is stateful and each frame calls
`update` on the prior value and it modifies itself in place. This
addresses the above two points in a few ways:
- Allocation is minimized to only what we need, no full pages.
- Since it is stateful (updating in-place), we only change dirty data
and don't need to copy everything. **Note the benchmarks below only test
_full updates_, so you aren't even seeing the benefits of this!**
- Also since it is stateful, we cache expensive calculations (such as
for selections) so future screen updates can reuse those cached values
rather than recompute them.
- We retain memory from prior updates even when the screen is dirty, so
even in dirty states, it's unlikely we allocate.
More details in the "Design" section.
## Benchmarks
Here are some microbenchmarks on the previous render critical area
versus the new one.
For each of the benchmarks below, ignore the time units (milliseconds)
and instead **focus on the relative speedup.** The benchmarks are all
doing a full render frame setup around 1000 times, because the actual
cost of a frame update is in dozens or hundreds of microseconds.
> [!NOTE]
>
> I'm still working on some more benchmarks before merging. I'll update
this space.
### Full screen, single style
<img width="1542" height="480" alt="CleanShot 2025-11-21 at 07 35 13@2x"
src="https://github.com/user-attachments/assets/c28c9f33-d1aa-4723-8a8e-3c6d70fe3667"
/>
### Full screen, plaintext
<img width="1642" height="612" alt="CleanShot 2025-11-21 at 07 36 06@2x"
src="https://github.com/user-attachments/assets/b51f57cf-7c48-46c8-a347-8ecc0bdd3d47"
/>
### Full screen, different style per cell (pathological case)
<img width="1704" height="456" alt="CleanShot 2025-11-21 at 07 37 55@2x"
src="https://github.com/user-attachments/assets/71a98250-d8d1-47ab-ae69-5e6b3b60bf2d"
/>
### Critical Area: Typing at Shell Prompt
| This PR | Main Branch |
---------|--------
| <img width="1064" height="764" alt="CleanShot 2025-11-21 at 14 44
31@2x"
src="https://github.com/user-attachments/assets/8a0ab3a1-3d68-41f0-9469-bc08a4569286"
/> | <img width="1040" height="684" alt="CleanShot 2025-11-21 at 14 47
10@2x"
src="https://github.com/user-attachments/assets/04ffa607-8841-436b-b6e9-eeeb6ee9482d"
/> |
### Critical Area: Neovim Scrolling
| This PR | Main Branch |
---------|--------
| <img width="1054" height="748" alt="CleanShot 2025-11-21 at 14 45
06@2x"
src="https://github.com/user-attachments/assets/ccafaee8-720f-41be-820d-fd705835607a"
/> | <img width="1068" height="796" alt="CleanShot 2025-11-21 at 14 47
48@2x"
src="https://github.com/user-attachments/assets/68087496-d371-4c7c-8b4c-b967dbaeaa7c"
/> |
### Critical Area: `btop -u 100`
This is closer to a pathological case, about as close as you get with a
real tool in the wild. `btop` uses hundreds of unique styles and updates
many cells across many rows very frequently (every 100ms in this case).
You can see that some of our frame times in this case are similar but
there are _so many more near-idle frames_ thanks to our dirty tracking.
| This PR | Main Branch |
---------|--------
| <img width="1088" height="900" alt="CleanShot 2025-11-21 at 14 45
44@2x"
src="https://github.com/user-attachments/assets/ea63f0eb-f06e-4d00-95a3-c55a3755cc67"
/> | <img width="1078" height="906" alt="CleanShot 2025-11-21 at 14 48
18@2x"
src="https://github.com/user-attachments/assets/cef360de-2b12-440f-8c4c-6a69b5ce4058"
/> |
### "DOOM Fire"
Fullscreen on my macOS when from 770 FPS to ~808 FPS consistently, a
solid 5% increase repeatedly.
<img width="3520" height="2392" alt="CleanShot 2025-11-21 at 07 45
29@2x"
src="https://github.com/user-attachments/assets/033effca-0abb-4ff8-a21b-83214d118d12"
/>
### IO
While this was rendering focused, the smaller critical area does help IO
performance a bit.
We've already tracked down the remaining issues to the IO thread going
to sleep and overhead with context switching. We're investigating
switching to a spin lock for the IO thread only in another track of
work.
> [!NOTE]
>
> **This is comparing with `main`, which already has a 20-30%
performance improvement over v1.2.3.**
<img width="982" height="698" alt="image"
src="https://github.com/user-attachments/assets/52a86f6c-6f09-45fe-9ac7-ca62c7ac6ee4"
/>
## Design
The design of the API is a _stateful_ `RenderState` struct that you call
`update` on each frame with the target terminal, and it only updates
what is needed. RenderState keeps track of rows, cells, hyperlinks,
selections, etc.
```zig
// Start empty
var state: terminal.RenderState = .empty;
defer state.deinit(alloc);
// Each frame update it with a terminal.
// THIS IS THE ONLY PART THAT ACCESS `t`
try state.update(alloc, &t);
// Access render data (can be outside any locking for `t`)
...
```
The ergonomics of the `RenderState` structure a wee bit clunky because
we make use of struct-of-arrays (SoA, Zig's MultiArrayList) to better
optimize cache locality for renderer data vs. update data (what we need
to update the render state is different from what we need to draw).
Once you get used to the API though, it's pretty beautiful. I mean, look
at this:
```zig
for (
0..,
row_data.items(.raw),
row_data.items(.cells),
) |y, row, cells| {
const cells_slice = cells.slice();
for (
0..,
cells_slice.items(.raw),
cells_slice.items(.grapheme),
) |x, cell, graphemes| {
```
## Improvements
This PR makes various improvements across the board:
- It bears repeating in case it was missed previously that the critical
area time of a render has gone down 2x to 5x when there is work and is
now free when there is no work (the previous implementation always did
work).
- Font shaping is much more efficient now and only requires access to a
render state.
- Selection handling is now cached and works with dirty tracking.
Previously, if you had an active selection, we'd search the entire
screen multiple times (like... once per row). Yikes.
- Hyperlink handling is _much_ more efficient. Instead of iterating
through the entire screen contents _per configured link_ we now cache
the screen contents as a string and search one whole string multiple
times. Obvious, but we didn't do this before.
- The `contrainedWidth` and `rowNeverExtendBg` helper methods are now
both much more efficient and live within the renderer package rather
than being awkwardly in the terminal package.
## Future Notes
- Our `terminal.Selection` API is very bad. It conceptually makes sense
and I understand why I designed it this way (easy) but it makes it hard
to render or manipulate performantly.
**AI Disclosure:** AI was used only to assist with writing some tests
and converting some tests. The primary logic is all organic, meatbag
produced.