Previously we were just feeding the compiler source files generated by
capturing stdout, which by default all have the filename `stdout`.
Under some particular yet uncertain circumstances in Zig 0.14 (the only
factor we've found so far is a large number of cores/compilation shards),
the compiler will actually crash on an unreachable code path that assumes
all source files end in .zig or .zon, causing crashes for packagers
for distros like Nixpkgs and Gentoo.
Given this has been explicitly made illegal in Zig 0.15
(see ziglang/zig#24957), I don't really see why we shouldn't fix this for
1.2 too. We have to do this at some point no matter what anyway.
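A minimal sketch of one way to do that in `build.zig`, assuming the standard
`WriteFiles` step; the `some_generator` artifact and the import name are
placeholders, not the actual code:
```
// Rename the captured stdout so the compiler sees a file ending in .zig.
const run = b.addRunArtifact(some_generator);
const captured = run.captureStdOut(); // LazyPath whose basename is "stdout"

const wf = b.addWriteFiles();
const renamed = wf.addCopyFile(captured, "generated.zig");

exe.root_module.addAnonymousImport("generated", .{ .root_source_file = renamed });
```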
The documentation shows that the enum values should be "new-window" and
"new-tab". However, "new-window" currently fails to parse, while "window"
is still accepted.
The GLSL to MSL conversion process uses a passed-in sampler state for
the `iChannel0` parameter and we weren't providing it. This magically
worked on Apple Silicon for unknown reasons but failed on Intel GPUs.
In normal, hand-written MSL, we'd explicitly create the sampler state as
a normal variable (we do this in `shaders.metal` already!), but the
Shadertoy conversion stuff doesn't do this, probably because the exact
sampler parameters can't be safely known.
This fixes a Metal validation error when using custom shaders:
```
-[MTLDebugRenderCommandEncoder validateCommonDrawErrors:]:5970: failed
assertion `Draw Errors Validation Fragment Function(main0): missing Sampler
binding at index 0 for iChannel0Smplr[0].
```
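For reference, a rough sketch of what creating and binding the sampler could
look like with the zig-objc-style bindings the Metal renderer already uses
(the function name is illustrative and ownership details are simplified, not
the exact change):
```
const objc = @import("objc");

fn bindShadertoySampler(device: objc.Object, encoder: objc.Object) objc.Object {
    // Create a sampler state (default descriptor for simplicity).
    const desc_class = objc.getClass("MTLSamplerDescriptor").?;
    const desc = desc_class.msgSend(objc.Object, objc.sel("new"), .{});
    defer desc.msgSend(void, objc.sel("release"), .{});

    const sampler = device.msgSend(
        objc.Object,
        objc.sel("newSamplerStateWithDescriptor:"),
        .{desc.value},
    );

    // Bind it for the fragment stage at the index SPIRV-Cross assigned to
    // iChannel0 (index 0, per the validation error above).
    encoder.msgSend(
        void,
        objc.sel("setFragmentSamplerState:atIndex:"),
        .{ sampler.value, @as(c_ulong, 0) },
    );

    return sampler; // caller owns the MTLSamplerState
}
```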
This fixes a Metal validation error in Xcode when using custom shaders.
I suspect this is one part of why custom shaders don't work properly on
Intel Macs (probably anything with a discrete GPU).
This happens to work on Apple Silicon but this is undefined behavior and
we're just getting lucky.
There is one more issue I'm chasing down that I think is also still
blocking custom shaders from working on Intel Macs.
Fixes #8683
The selection scrolling logic should only depend on the y value of the
cursor position, not the x value. Depending on x produced unwanted scroll
behaviors, such as reversing the scroll direction, which was just a side
effect of how the scroll tick was computed to begin with.
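A tiny sketch of the intended behavior (function name and units are
illustrative, not the actual code):
```
// The autoscroll delta comes only from how far the cursor's y position is
// outside the viewport; x plays no role.
fn selectionScrollDelta(y: f64, viewport_height: f64) f64 {
    if (y < 0) return y; // above the viewport: scroll up
    if (y > viewport_height) return y - viewport_height; // below: scroll down
    return 0; // inside the viewport: no autoscroll
}
```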
This was a very common pitfall for users. The new logic will reload the
font-size at runtime, but only if the font size wasn't manually set by the
user via actions such as `increase_font_size`, `decrease_font_size`,
or `set_font_size`. The `reset_font_size` action will reset our state
to assume the font-size wasn't manually set.
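A rough sketch of the idea (field and function names are illustrative, not
Ghostty's actual internals):
```
const Surface = struct {
    font_size: f32,
    font_size_set_by_user: bool = false,

    // increase_font_size / decrease_font_size / set_font_size
    fn setFontSize(self: *Surface, new: f32) void {
        self.font_size = new;
        self.font_size_set_by_user = true;
    }

    // reset_font_size: back to the configured value, and future config
    // reloads apply again.
    fn resetFontSize(self: *Surface, configured: f32) void {
        self.font_size = configured;
        self.font_size_set_by_user = false;
    }

    // Config reload: only pick up `font-size` if the user hasn't overridden it.
    fn reloadConfig(self: *Surface, configured: f32) void {
        if (!self.font_size_set_by_user) self.font_size = configured;
    }
};
```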
I also updated a comment about `font-family` not reloading at runtime;
this wasn't true even prior to this commit.
Fixes #8667
The binding `a=text:=` didn't parse properly.
This is a band-aid solution. It works, and thankfully we have test coverage
for it. Longer term we should move the parser to a fully state-machine-based
parser that parses the trigger first and then the action, to avoid these
kinds of issues.
Use relative cluster positioning to allow identical text runs in
different row positions to share the same cache entry.
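An illustrative sketch of the rebasing step (not the actual shaper code):
```
// Rebase cluster values to the start of the run so two identical runs that
// start at different columns produce the same cache key.
fn relativizeClusters(clusters: []u32) void {
    if (clusters.len == 0) return;
    const base = clusters[0];
    for (clusters) |*c| c.* -= base;
}
```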
I am opening this PR clean w/o the cache size change. There could be
some benefit to a larger 256->512 shaper cache, but this still performs
amazingly well and I don't know the full memory impact of raising the
cache size.
https://github.com/ghostty-org/ghostty/discussions/8547#discussioncomment-14329590
These keys are present on some old Unix keyboards, but more importantly,
their keycodes can be mapped to physical keys on modern programmable
keyboards.
Using them on Linux makes it possible to use the same keys for copy/paste
in GUI apps and in terminal apps, instead of switching between
ctrl-c/ctrl-v and ctrl-shift-c/ctrl-shift-v.
This test previously didn't fail when accessing freed members of the config
because deiniting `command_arena` was a no-op: `command_arena` was derived
from `arena`, which allocated memory after `command_arena` was created/used.
Without this change, a phantom space appears after any character with
default emoji presentation that is converted to text presentation with
VS15. The only other terminal I know of that respects variation selectors
is Kitty, and it walks the cursor back, which feels like the best choice:
that way the behavior is observable (there's otherwise no way to know
whether the terminal supports variation selectors without hard-coding that
info per terminal), and "dumb" programs like `cat` will output things
correctly, without gaining a phantom space after every VS15'd emoji.
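A hypothetical sketch of that cursor-walk-back approach (not Ghostty's
actual terminal state code):
```
// When VS15 narrows a cell that was laid out wide, shrink it and reclaim the
// spacer cell by walking the cursor back one column.
fn applyVs15(cell_widths: []u2, emoji_cell: usize, cursor_x: *usize) void {
    if (cell_widths[emoji_cell] == 2) {
        cell_widths[emoji_cell] = 1; // text presentation is narrow
        cursor_x.* -= 1; // no phantom space left behind
    }
}
```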
I've been playing with benchmarks over in my [branch swapping out
ziglyph for
uucode](https://github.com/ghostty-org/ghostty/compare/main...jacobsandlund:jacob/uucode?expand=1),
and I ran into an interesting issue where benchmarks were giving odd
numbers.
TL;DR: writing to `buf[0]` ends up slowing down the benchmark in
inconsistent ways because it's the same buffer that's being written and
read in the loop, so switching to `std.mem.doNotOptimizeAway` fixes
this.
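Concretely, the change in the benchmark's inner loop is roughly this
(sketch; the surrounding loop in CodepointWidth.zig is elided):
```
// Before: store the result into the same buffer the loop reads its input
// from, creating a store/load dependency between iterations.
buf[0] = @intCast(width);

// After: keep `width` from being optimized away without touching the buffer.
std.mem.doNotOptimizeAway(width);
```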
## Full story:
I ran the `codepoint-width` benchmark with the following (and also did
similarly for `grapheme-bench` and `is-symbol`):
```
zig-out/bin/ghostty-gen +utf8 | head -c 200000000 > data.txt
hyperfine --warmup 4 'zig-out/bin/ghostty-bench +codepoint-width --data=data.txt --mode=table' 'zig-out/bin/ghostty-bench +codepoint-width --data=data.txt --mode=uucode'
```
... and I was surprised to see that `uucode` was 3% slower than Ghostty,
despite similar implementations. I debugged this, bringing the `uucode`
implementation to the exact same assembly (minus offsets) as Ghostty,
even re-using the same table data (a fun fact I learned: even though these
tables are large, Zig or LLVM saw they were byte-for-byte identical and
merged them down to one table). Still though, 3% slower.
Then I realized that if I wrote to a separate `buf` on `self`, the
difference went away, and I figured out it's this write to `buf[0]` that is
tripping up the CPU: on the next outer loop iteration the same buffer is
written over again when reading from the data file, and then read as part
of decoding the code point.
### with buf[0]
```
Benchmark 1: zig-out/bin/ghostty-bench +codepoint-width --data=data.txt --mode=table
Time (mean ± σ): 944.7 ms ± 0.8 ms [User: 900.2 ms, System: 42.8 ms]
Range (min … max): 943.4 ms … 945.9 ms 10 runs
Benchmark 2: zig-out/bin/ghostty-bench +codepoint-width --data=data.txt --mode=uucode
Time (mean ± σ): 974.0 ms ± 0.7 ms [User: 929.3 ms, System: 43.1 ms]
Range (min … max): 973.3 ms … 975.2 ms 10 runs
Summary
zig-out/bin/ghostty-bench +codepoint-width --data=data.txt --mode=table ran
1.03 ± 0.00 times faster than zig-out/bin/ghostty-bench +codepoint-width --data=data.txt --mode=uucode
```
### with mem.doNotOptimizeAway
```
Benchmark 1: zig-out/bin/ghostty-bench +codepoint-width --data=data.txt --mode=table
Time (mean ± σ): 929.4 ms ± 2.7 ms [User: 884.8 ms, System: 43.0 ms]
Range (min … max): 926.7 ms … 936.3 ms 10 runs
Benchmark 2: zig-out/bin/ghostty-bench +codepoint-width --data=data.txt --mode=uucode
Time (mean ± σ): 931.2 ms ± 2.5 ms [User: 886.6 ms, System: 42.9 ms]
Range (min … max): 927.3 ms … 935.7 ms 10 runs
Summary
zig-out/bin/ghostty-bench +codepoint-width --data=data.txt --mode=table ran
1.00 ± 0.00 times faster than zig-out/bin/ghostty-bench +codepoint-width --data=data.txt --mode=uucode
```
### with buf[0], mode = .uucode
Another interesting thing is that with `buf[0]`, the result is somehow
highly dependent on the code offsets. If I switch the default mode line
from `mode: Mode = .noop` to `mode: Mode = .uucode`, it shifts the offsets
ever so slightly, and even though that default mode is never used (the mode
is passed in), it flips the results of the benchmark around:
```
Benchmark 1: zig-out/bin/ghostty-bench +codepoint-width --data=data.txt --mode=table
Time (mean ± σ): 973.3 ms ± 2.2 ms [User: 928.9 ms, System: 42.9 ms]
Range (min … max): 968.0 ms … 975.9 ms 10 runs
Benchmark 2: zig-out/bin/ghostty-bench +codepoint-width --data=data.txt --mode=uucode
Time (mean ± σ): 945.8 ms ± 1.4 ms [User: 901.2 ms, System: 42.8 ms]
Range (min … max): 943.5 ms … 948.5 ms 10 runs
Summary
zig-out/bin/ghostty-bench +codepoint-width --data=data.txt --mode=uucode ran
1.03 ± 0.00 times faster than zig-out/bin/ghostty-bench +codepoint-width --data=data.txt --mode=table
```
Looking at the assembly with `mode: Mode = .noop`:
```
# table.txt:
165 // away
** 166 buf[0] = @intCast(width);
ghostty-bench[0x100017370] <+508>: strb w11, [x21, #0x4]
ghostty-bench[0x100017374] <+512>: b 0x100017288 ; <+276> at CodepointWidth.zig:168:9
ghostty-bench[0x100017378] <+516>: mov w0, #0x0 ; =0
# uucode.txt:
** 229 buf[0] = @intCast(width);
ghostty-bench[0x1000177bc] <+508>: strb w11, [x21, #0x4]
ghostty-bench[0x1000177c0] <+512>: b 0x1000176d4 ; <+276> at CodepointWidth.zig:231:9
ghostty-bench[0x1000177c4] <+516>: mov w0, #0x0 ; =0
```
vs `mode: Mode = .uucode`:
```
# table.txt:
** 166 buf[0] = @intCast(width);
ghostty-bench[0x100017374] <+508>: strb w11, [x21, #0x4]
ghostty-bench[0x100017378] <+512>: b 0x10001728c ; <+276> at CodepointWidth.zig:168:9
ghostty-bench[0x10001737c] <+516>: mov w0, #0x0 ; =0
# uucode.txt:
** 229 buf[0] = @intCast(width);
ghostty-bench[0x1000177c0] <+508>: strb w11, [x21, #0x4]
ghostty-bench[0x1000177c4] <+512>: b 0x1000176d8 ; <+276> at CodepointWidth.zig:231:9
ghostty-bench[0x1000177c8] <+516>: mov w0, #0x0 ; =0
```
... shows the only difference is the offsets, which somehow have a large
impact on the result of the benchmark.
Use a fast hash function on the key for better distribution.
Compare the glyph directly in `eql` to avoid `Packed.from()` when it isn't
necessary.
16% -> 6.4% during profiling runs.
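Roughly the shape of the change, with hypothetical stand-in key types (the
real fields differ):
```
const std = @import("std");

const Glyph = struct { index: u32, atlas_x: u16, atlas_y: u16 };
const Key = struct { glyph: Glyph, opts: u32 };

const KeyContext = struct {
    pub fn hash(_: KeyContext, key: Key) u64 {
        // Fast non-cryptographic hash over the whole key for better
        // bucket distribution.
        var h = std.hash.Wyhash.init(0);
        std.hash.autoHash(&h, key);
        return h.final();
    }

    pub fn eql(_: KeyContext, a: Key, b: Key) bool {
        // Compare the glyph (and the rest of the key) directly; no
        // Packed.from() round-trip needed.
        return std.meta.eql(a, b);
    }
};
```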
I noticed that there was an off-by-one error in the cell height adjustment
when the number of pixels to add/subtract is odd. The metrics measured
from the top would be shifted by one pixel less than they should be, so,
for example, the underline position would end up one pixel closer to the
baseline than before (or one pixel further away when subtracting).
I also noticed that the overline position was missing here, so I added that.