I used the new CPU counter mode in Instruments.app to track down
functions that had instruction delivery bottlenecks (indicating i-cache
misses) and picked a bunch of trivial functions to mark as inline (plus
a couple that are only used once or twice and which benefit from
inlining).
The size of `macos-arm64/libghostty-fat.a` built with `zig build
-Doptimize=ReleaseFast -Dxcframework-target=native` goes from
`145,538,856` bytes on `main` to `145,595,952` on this branch, a
negligible increase.
These changes resulted in some pretty sizable improvements in vtebench
results on my machine (Apple M3 Max):
<img width="983" height="696" alt="image"
src="https://github.com/user-attachments/assets/cac595ca-7616-48ed-983c-208c2ca2023f"
/>
With this, the only vtebench test we're slower than Alacritty in (on my
machine, at 130x51 window size) is `dense_cells` (which, IMO, is so
artificial that optimizing for it might actually negatively impact real
world performance).
I also did a pretty simple improvement to how we copy the screen in the
renderer, gave it its own page pool for less memory churn. Further
optimization in that area should be explored since in some scenarios it
seems like as much as 35% of the time on the `io-reader` thread is spent
waiting for the lock.
> [!NOTE]
> Before this is merged, someone really ought to test this on an x86
processor to see how the performance compares there, since this *is*
tuning for my processor specifically, and I know that M chips have
pretty big i-cache compared to some x86 processors which could impact
the performance characteristics of these changes.
Powerline glyphs were treated as whitespace, giving the preceding cell a
constraint width of 2 and cutting off icons in people's prompts and
statuslines. It is however correct to not treat Powerline glyphs like
other Nerd Font symbols; they should simply be treated as normal
characters, just like their relatives in the block elements unicode
block.
This resolves
https://discord.com/channels/1005603569187160125/1417236683266592798
(never promoted to an issue, but real and easy to reproduce).
**Tip**
<img width="215" height="63" alt="Screenshot 2025-09-21 at 16 57 58"
src="https://github.com/user-attachments/assets/81e770c5-d688-4d8e-839c-1f4288703c06"
/>
**This PR**
<img width="215" height="63" alt="Screenshot 2025-09-21 at 16 58 42"
src="https://github.com/user-attachments/assets/5d2dd770-0314-46f6-99b5-237a0933998e"
/>
The constraint width logic was untested but contains some quite subtle
interactions, so I wrote a suite of tests covering the cases I'm aware
of.
While working on this code I also resolved a TODO comment to add all the
box drawing/block element type characters to the set of codepoints
excluded from the minimum contrast settings.
Without this change, a phantom space appears after any character with
default emoji presentation that is converted to text with VS15. The only
other terminal I know of that respects variation selectors is Kitty, and
it walks the cursor back, which feels like the best choice, since that
way the behavior is observable (no way to know if the terminal supports
variation selectors otherwise without hard-coding that info per term)
and "dumb" programs like `cat` will output things correctly, and not
gain a phantom space after any VS15'd emoji.
This is a small, but I think worthwhile micro optimization in style.zig,
I uncovered while investigating wider ranging optimizations in the
rendering section.
For me it results in ~4-5% increase in fps for DOOM-fire-zig benchmark,
which maximally stresses this code path.
Comparing the fields directly is actually faster than PackedStyle.
I wrote the code in style.zig, claude 4 wrote the benchmark, all PR
responses will be generated by jcm-slow-1
Style.eql Benchmark Comparison
==============================
Test: Small (1K pairs, 50% equal)
--------------------------------------------------
New implementation:
Iterations: 49937
Duration: 500.01 ms
Throughput: 99872402 comparisons/sec
Old implementation:
Iterations: 8508
Duration: 500.06 ms
Throughput: 17014026 comparisons/sec
Performance improvement:
Speedup: 5.87x
Improvement: +487.0%
Test: Medium (10K pairs, 50% equal)
--------------------------------------------------
New implementation:
Iterations: 4435
Duration: 500.09 ms
Throughput: 88684746 comparisons/sec
Old implementation:
Iterations: 850
Duration: 500.50 ms
Throughput: 16983017 comparisons/sec
Performance improvement:
Speedup: 5.22x
Improvement: +422.2%
Test: Large (50K pairs, 50% equal)
--------------------------------------------------
New implementation:
Iterations: 861
Duration: 500.41 ms
Throughput: 86030144 comparisons/sec
Old implementation:
Iterations: 171
Duration: 501.70 ms
Throughput: 17041989 comparisons/sec
Performance improvement:
Speedup: 5.05x
Improvement: +404.8%
Test: Mostly equal (10K pairs, 90% equal)
--------------------------------------------------
New implementation:
Iterations: 4608
Duration: 500.03 ms
Throughput: 92154471 comparisons/sec
Old implementation:
Iterations: 854
Duration: 500.45 ms
Throughput: 17064744 comparisons/sec
Performance improvement:
Speedup: 5.40x
Improvement: +440.0%
Test: Mostly different (10K pairs, 10% equal)
--------------------------------------------------
New implementation:
Iterations: 4065
Duration: 500.03 ms
Throughput: 81294960 comparisons/sec
Old implementation:
Iterations: 848
Duration: 500.21 ms
Throughput: 16952948 comparisons/sec
Performance improvement:
Speedup: 4.80x
Improvement: +379.5%
Test: Same flags (10K pairs, 50% equal)
--------------------------------------------------
New implementation:
Iterations: 2799
Duration: 500.00 ms
Throughput: 55979776 comparisons/sec
Old implementation:
Iterations: 859
Duration: 500.13 ms
Throughput: 17175672 comparisons/sec
Performance improvement:
Speedup: 3.26x
Improvement: +225.9%
Adds tests to ensure CSI and SGR sequences with 17 or more parameters are correctly parsed, fixing a bug where later parameters were previously dropped.
- Add more comments, and make existing ones more consistent.
- Rename commands so they consitently have a `conemu_` prefix.
- Ensure that OSC 9 desktop notifications can be sent in the maximum
number of circumstances. There are still many notifications that can't
be sent because of our support for the ConEmu OSCs but that's the
tradeoff we have chosen. We recommend that you switch to OSC 777 to
ensure desktop notifications can be sent in all circumstances.
- Make sure that the tests that exercise the ConEmu OSCs have a
consistent naming structure. That will make them easier to find
through searching as well as make it easier to filter only the ConEmu
OSC tests.
- Add more tests to make sure that desktop notifications are sent
properly.
This works around: https://github.com/ziglang/zig/issues/19148
This lets our `test-valgrind` command catch some issues. We'll have to
follow this pattern in more places but I want to do it incrementally so
things keep passing.
I **do not** want to blindly follow this pattern everywhere. I want to
start by focusing in only on the structs that set `undefined` as default
fields that we're also about to test in isolation with Valgrind. Its
just too much noise otherwise and not a general style I'm sure of; it's
worth it for Valgrind though.
It's possible for the hyperlink or string capacity to be exceeded in a
single row, in which case it doesn't matter if we move the row to a new
page, it will still be a problem. This was causing actual crashes under
some circumstances.
This previously had logic in it that was very wrong and could lead to
memory corruption or a failure to properly mark data as freed.
Also introduces a bunch of tests for various edge case behavior.
xterm docs explicitly say that empty payloads should be permitted and
are used to clear the selected clipboards, so we need to implement that
correctly. The GTK apprt still shows a "Copied to Clipboard" toast though
and we might want to change that too
Even though the viewport pin isn't used unless the `viewport` is `pin`,
it's still possible to access undefined data through `clone`. Valgrind
found this:
```
==107091== Conditional jump or move depends on uninitialised value(s)
==107091== at 0x392B96A: terminal.PageList.clone (PageList.zig:540)
==107091== by 0x392C9A0: terminal.Screen.clonePool (Screen.zig:348)
==107091== by 0x392DF7A: terminal.Screen.clone (Screen.zig:330)
==107091== by 0x394E6D4: renderer.generic.Renderer(renderer.OpenGL).updateFrame (generic.zig:1129)
==107091== by 0x3919BF8: renderer.Thread.renderCallback (Thread.zig:607)
==107091== by 0x3919A6F: renderer.Thread.wakeupCallback (Thread.zig:524)
==107091== by 0x394FA6E: callback (async.zig:679)
==107091== by 0x394FA6E: watcher.async.AsyncEventFd(api.Xev(.io_uring,backend.io_uring)).waitPoll__anon_436371__struct_440666.callback (async.zig:181)
==107091== by 0x38F781E: backend.io_uring.Completion.invoke (io_uring.zig:804)
==107091== by 0x38FA448: backend.io_uring.Loop.tick___anon_431479 (io_uring.zig:193)
==107091== by 0x38FA53D: backend.io_uring.Loop.run (io_uring.zig:84)
==107091== by 0x38FEFE3: dynamic.Xev(&.{ .io_uring, .epoll }[0..2]).Loop.run (dynamic.zig:172)
==107091== by 0x38FF2E2: renderer.Thread.threadMain_ (Thread.zig:263)
==107091== by 0x38DDF80: renderer.Thread.threadMain (Thread.zig:202)
==107091== by 0x38B5C0A: Thread.callFn__anon_421402 (Thread.zig:488)
==107091== by 0x3888604: Thread.PosixThreadImpl.spawn__anon_418943.Instance.entryFn (Thread.zig:757)
==107091== by 0x6C6E7EA: start_thread (pthread_create.c:448)
==107091== by 0x6CF1FB3: clone (clone.S:100)
==107091==
```
Ghostty has had support for a while (since PR #3124) for parsing progress
reports but never did anything with them. This PR adds the core
infrastructure and an implementation for GTK.
On GTK, the progress bar will show up as a thin bar along the top of
the terminal. Under normal circumstances it will use whatever you have
set as your accent color. If the progam sending the progress report
indicates an error, it will change to a reddish color.