Use relative cluster positioning to allow identical text runs in
different row positions to share the same cache entry.
I am opening this PR clean, without the cache size change. There could be
some benefit to a larger shaper cache (256 -> 512), but this already
performs very well, and I don't know the full memory impact of increasing
the cache size.
https://github.com/ghostty-org/ghostty/discussions/8547#discussioncomment-14329590
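A minimal sketch of the idea, not the actual shaper code: if the cache key stores each cluster as an offset from the start of the run instead of an absolute position, the same text hashes to the same key wherever it sits in the row. The `Run` model and the `run_key`, `make_run`, and `shape` helpers are hypothetical names used only for illustration.

```python
from typing import Hashable

# Hypothetical model: a run is a list of (cluster, codepoint) pairs, where
# `cluster` is the absolute column the codepoint belongs to.
Run = list[tuple[int, int]]


def run_key(run: Run) -> Hashable:
    """Build a cache key that ignores where the run starts in the row."""
    if not run:
        return ()
    start = run[0][0]
    # Store each cluster as an offset from the first cluster of the run.
    return tuple((cluster - start, cp) for cluster, cp in run)


def make_run(text: str, start_col: int) -> Run:
    """Place one codepoint per column, beginning at `start_col`."""
    return [(start_col + i, ord(ch)) for i, ch in enumerate(text)]


cache: dict[Hashable, str] = {}


def shape(run: Run) -> str:
    """Stand-in for an expensive shaping call, memoized by the relative key."""
    key = run_key(run)
    if key not in cache:
        cache[key] = "shaped:" + "".join(chr(cp) for _, cp in run)
    return cache[key]
```

With absolute clusters in the key, `make_run("hello", 0)` and `make_run("hello", 40)` would occupy two cache entries; with the relative key they share one.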
These keys are present on some old Unix keyboards, but more importantly,
their keycodes can be mapped to physical keys in modern programmable
keyboards.
Using them in Linux is a way to be able to have the same keys for
copy/pasting in GUI apps and in terminal apps instead of switching between
ctrl-c/ctrl-v and ctrl-shift-c/ctrl-shift-v.
This test previously didn't fail when accessing freed members of the config
because deinitializing `command_arena` was a no-op: `command_arena` was
derived from `arena`, which allocated memory after `command_arena` was
created/used.
Without this change, a phantom space appears after any character with
default emoji presentation that is converted to text with VS15. The only
other terminal I know of that respects variation selectors is Kitty, and
it walks the cursor back. That feels like the best choice: the behavior
is observable (otherwise there is no way to know whether the terminal
supports variation selectors without hard-coding that info per terminal),
and "dumb" programs like `cat` will output things correctly without
gaining a phantom space after any VS15'd emoji.
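The effect can be modeled with a toy cursor, as a sketch under simplifying assumptions and not the terminal's real code: a wide emoji advances the cursor by two cells, and a following VS15 narrows it to one cell. The `WIDE_BY_DEFAULT` table and the `cursor_after` helper are hypothetical.

```python
VS15 = "\ufe0e"  # VARIATION SELECTOR-15: request text presentation

# Hypothetical width table: only U+231A WATCH is treated as a wide emoji here.
WIDE_BY_DEFAULT = {"\u231a"}


def cursor_after(text: str, walk_back: bool) -> int:
    """Return the cursor column after printing `text` from column 0."""
    cursor = 0
    prev_wide = False
    for ch in text:
        if ch == VS15:
            # The previous emoji becomes one cell wide. Walking the cursor
            # back reclaims the second cell it had already advanced past.
            if prev_wide and walk_back:
                cursor -= 1
            prev_wide = False
            continue
        wide = ch in WIDE_BY_DEFAULT
        cursor += 2 if wide else 1
        prev_wide = wide
    return cursor
```

In this model, printing the watch emoji, VS15, and then `x` leaves the cursor at column 2 when walking back and at column 3 otherwise; the extra column in the second case is the phantom space.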
I've been playing with benchmarks over in my [branch swapping out
ziglyph for
uucode](https://github.com/ghostty-org/ghostty/compare/main...jacobsandlund:jacob/uucode?expand=1),
and I ran into an interesting issue where benchmarks were giving odd
numbers.
TL;DR: writing to `buf[0]` ends up slowing down the benchmark in
inconsistent ways because it's the same buffer that's being written and
read in the loop, so switching to `std.mem.doNotOptimizeAway` fixes
this.
## Full story:
I ran the `codepoint-width` benchmark with the following (and also did
similarly for `grapheme-bench` and `is-symbol`):
```
zig-out/bin/ghostty-gen +utf8 | head -c 200000000 > data.txt
hyperfine --warmup 4 'zig-out/bin/ghostty-bench +codepoint-width --data=data.txt --mode=table' 'zig-out/bin/ghostty-bench +codepoint-width --data=data.txt --mode=uucode'
```
... and I was surprised to see that `uucode` was 3% slower than Ghostty,
despite similar implementations. I debugged this, bringing the `uucode`
implementation to the exact same assembly (minus offsets) as Ghostty,
even re-using the same table data (fun fact I learned is that even
though these tables are large, zig or LLVM saw they were byte-by-byte
equivalent and optimized them down to one table). Still though, 3%
slower.
Then I realized that if I wrote to a separate `buf` on `self` the
difference went away, and I figured out it's this writing to `buf[0]`
that is tripping up the CPU, because in the next outer loop it'll write
over that again when reading from the data file, and then it's read as
part of getting the code point.
### with buf[0]
```
Benchmark 1: zig-out/bin/ghostty-bench +codepoint-width --data=data.txt --mode=table
Time (mean ± σ): 944.7 ms ± 0.8 ms [User: 900.2 ms, System: 42.8 ms]
Range (min … max): 943.4 ms … 945.9 ms 10 runs
Benchmark 2: zig-out/bin/ghostty-bench +codepoint-width --data=data.txt --mode=uucode
Time (mean ± σ): 974.0 ms ± 0.7 ms [User: 929.3 ms, System: 43.1 ms]
Range (min … max): 973.3 ms … 975.2 ms 10 runs
Summary
zig-out/bin/ghostty-bench +codepoint-width --data=data.txt --mode=table ran
1.03 ± 0.00 times faster than zig-out/bin/ghostty-bench +codepoint-width --data=data.txt --mode=uucode
```
### with mem.doNotOptimizeAway
```
Benchmark 1: zig-out/bin/ghostty-bench +codepoint-width --data=data.txt --mode=table
Time (mean ± σ): 929.4 ms ± 2.7 ms [User: 884.8 ms, System: 43.0 ms]
Range (min … max): 926.7 ms … 936.3 ms 10 runs
Benchmark 2: zig-out/bin/ghostty-bench +codepoint-width --data=data.txt --mode=uucode
Time (mean ± σ): 931.2 ms ± 2.5 ms [User: 886.6 ms, System: 42.9 ms]
Range (min … max): 927.3 ms … 935.7 ms 10 runs
Summary
zig-out/bin/ghostty-bench +codepoint-width --data=data.txt --mode=table ran
1.00 ± 0.00 times faster than zig-out/bin/ghostty-bench +codepoint-width --data=data.txt --mode=uucode
```
### with buf[0], mode = .uucode
Another interesting thing is that with `buf[0]`, it's highly dependent
on the offsets somehow. If I switched the default mode line from `mode:
Mode = .noop` to `mode: Mode = .uucode`, it shifts the offsets ever so
slightly and even though that default mode is not getting used (since
it's passed in), it flips the results of the benchmark around:
```
Benchmark 1: zig-out/bin/ghostty-bench +codepoint-width --data=data.txt --mode=table
Time (mean ± σ): 973.3 ms ± 2.2 ms [User: 928.9 ms, System: 42.9 ms]
Range (min … max): 968.0 ms … 975.9 ms 10 runs
Benchmark 2: zig-out/bin/ghostty-bench +codepoint-width --data=data.txt --mode=uucode
Time (mean ± σ): 945.8 ms ± 1.4 ms [User: 901.2 ms, System: 42.8 ms]
Range (min … max): 943.5 ms … 948.5 ms 10 runs
Summary
zig-out/bin/ghostty-bench +codepoint-width --data=data.txt --mode=uucode ran
1.03 ± 0.00 times faster than zig-out/bin/ghostty-bench +codepoint-width --data=data.txt --mode=table
```
looking at the assembly with `mode: Mode = .noop`:
```
# table.txt:
165 // away
** 166 buf[0] = @intCast(width);
ghostty-bench[0x100017370] <+508>: strb w11, [x21, #0x4]
ghostty-bench[0x100017374] <+512>: b 0x100017288 ; <+276> at CodepointWidth.zig:168:9
ghostty-bench[0x100017378] <+516>: mov w0, #0x0 ; =0
# uucode.txt:
** 229 buf[0] = @intCast(width);
ghostty-bench[0x1000177bc] <+508>: strb w11, [x21, #0x4]
ghostty-bench[0x1000177c0] <+512>: b 0x1000176d4 ; <+276> at CodepointWidth.zig:231:9
ghostty-bench[0x1000177c4] <+516>: mov w0, #0x0 ; =0
```
vs `mode: Mode = .uucode`:
```
# table.txt:
** 166 buf[0] = @intCast(width);
ghostty-bench[0x100017374] <+508>: strb w11, [x21, #0x4]
ghostty-bench[0x100017378] <+512>: b 0x10001728c ; <+276> at CodepointWidth.zig:168:9
ghostty-bench[0x10001737c] <+516>: mov w0, #0x0 ; =0
# uucode.txt:
** 229 buf[0] = @intCast(width);
ghostty-bench[0x1000177c0] <+508>: strb w11, [x21, #0x4]
ghostty-bench[0x1000177c4] <+512>: b 0x1000176d8 ; <+276> at CodepointWidth.zig:231:9
ghostty-bench[0x1000177c8] <+516>: mov w0, #0x0 ; =0
```
... shows the only difference is the offsets, which somehow have a large
impact on the result of the benchmark.
Use fast hash function on key for better distribution.
Compare glyph directly in eql to avoid Packed.from() if not necessary.
16% -> 6.4% reduction during profiling runs.
I noticed that there was an off-by-one error in cell height adjustment
when the number of pixels to add/subtract is odd. The metrics measured
from the top would be shifted by one less than they should, so, for
example, the underline position would move one pixel closer to the
baseline than it had been (or one pixel further away if subtracting).
Also noticed that the overline position was missing here, so added that.
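A small sketch of the arithmetic, under the assumption that the height change is split so the bottom gets the rounded-down half and the top gets the remainder. The metric names and the `split_height_delta` and `adjust_metrics` helpers are hypothetical and only illustrate why an odd delta needs care.

```python
def split_height_delta(delta: int) -> tuple[int, int]:
    """Split a cell height change into (top, bottom) parts.

    Assumption: the bottom receives the rounded-down half and the top the
    remainder, so the two parts always sum to `delta`.
    """
    magnitude = abs(delta)
    bottom = magnitude // 2
    top = magnitude - bottom
    sign = 1 if delta >= 0 else -1
    return sign * top, sign * bottom


def adjust_metrics(metrics: dict[str, int], delta: int) -> dict[str, int]:
    """Apply a cell height change and shift the dependent positions."""
    top, bottom = split_height_delta(delta)
    adjusted = dict(metrics)
    adjusted["cell_height"] += delta
    # The baseline is measured from the bottom of the cell.
    adjusted["cell_baseline"] += bottom
    # These positions are measured from the top, so they must move by the
    # top part. Using the bottom part here is the off-by-one for odd deltas.
    for name in ("underline_position", "strikethrough_position", "overline_position"):
        adjusted[name] += top
    return adjusted
```

For a delta of 3 the split is 2 on top and 1 on the bottom, so shifting a top-measured position by 1 instead of 2 leaves it one pixel too high.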
Fixes #5934
This was never confirmed to be a real issue on GTK, but it is
theoretically possible and good hygiene in general. Typically, we'd get
the title through a binding which comes from a bindinggroup which comes
from the active surface in the active tab. All of this takes multiple
event loop ticks to settle, if you will.
This commit changes it so that if an explicit, static title is set, we
set that title on startup before the window is mapped. The syncing still
happens later, but at least the window will have a title from the
initialization.
Fixes #8533
Replace the usage of `Stacked` for error pages with programmatically
swapping the child of the `adw.Bin`.
I regret to say I don't know the root cause of this. I only know that
the usage of `Stacked` plus `Gtk.Paned` and the way we programmatically
change the paned position and stack child during initialization causes
major issues.
This change isn't without its warts, too, and you can see them heavily
commented in the diff.
(1) We have to work around a GTK template double-free bug that is well
known to us: if you bind a template child that is also the direct child
of the template class, GTK does a double free on dispose. We work around
this by removing our child in dispose. Valgrind verifies the fix.
(2) We have to work around an issue where setting an `Adw.Bin` child
during a glarea realize causes some kind of critical GTK error that
results in a hard crash. We delay changing our bin child to an idle
tick.
This removes `launched-from` entirely and changes our `gtk-single-instance`
detection logic to assume true unless we detect a CLI launch, instead of
assuming false unless we detect desktop/dbus/systemd.
The "assume true" scenario for single instance is desirable because
detecting a CLI instance is much more reliable.
Removing `launched-from` fixes an issue where we had a
difficult-to-understand relationship between `launched-from`,
`gtk-single-instance`, and `initial-window`. Now, only
`gtk-single-instance` has some heuristic logic. And `initial-window`
ALWAYS sends a GTK activation signal regardless of single instance or
not.
As a result, we need to be explicit in our systemd, dbus, desktop files
about what we want Ghostty to do, but everything works as you'd mostly
expect.
Now, if you put plain old `ghostty` in your terminal, you get a new
Ghostty instance. If you put it anywhere else, you get a GTK single
instance activation call (either creates a first instance or opens a new
window in the existing instance). Works for launchers and so on.
Detecting the launch source frequently failed because various launchers
fail to sanitize the environment variables that Ghostty used to
detect the launch source. For example, if your desktop environment was
launched by `systemd`, but your desktop environment did not sanitize the
`INVOCATION_ID` or the `JOURNAL_STREAM` environment variables, Ghostty
would assume that it had been launched by `systemd` and behave as such.
This led to complaints about Ghostty not creating new windows when users
expected that it would.
To remedy this, Ghostty no longer does any detection of the launch
source. If your launch source is something other than the CLI, it must
be explicitly specified on the CLI. All of Ghostty's default desktop
and service files do this. Users or packagers that create custom desktop
or service files will need to take this into account.
On GTK, the `desktop` setting for `gtk-single-instance` is replaced with
`detect`. `detect` behaves as `gtk-single-instance=true` if one of the
following conditions is true:
1. If no CLI arguments have been set.
2. If `--launched-from` has been set to `desktop`, `dbus`, or `systemd`.
Otherwise `detect` behaves as `gtk-single-instance=false`.
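The two conditions above can be sketched as a small predicate. This is only an illustration of the stated rule, not Ghostty's implementation: the `detect_single_instance` name is hypothetical, and the `--launched-from=value` spelling and the "last occurrence wins" behavior are assumptions.

```python
# Launch sources that make `detect` behave like gtk-single-instance=true.
SINGLE_INSTANCE_SOURCES = {"desktop", "dbus", "systemd"}


def detect_single_instance(args: list[str]) -> bool:
    """Resolve `detect` from the CLI arguments (program name excluded)."""
    # Condition 1: no CLI arguments have been set.
    if not args:
        return True
    # Condition 2: --launched-from is set to desktop, dbus, or systemd.
    # Assumption: if the flag is repeated, the last occurrence wins.
    launched_from = None
    for arg in args:
        if arg.startswith("--launched-from="):
            launched_from = arg.split("=", 1)[1]
    return launched_from in SINGLE_INSTANCE_SOURCES
```

Under these assumptions, a bare invocation and `--launched-from=desktop` both resolve to single instance, while any other set of arguments resolves to a separate instance.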