std.net.isValidHostname is currently too generous. It considers strings
like ".example.com", "exa..mple.com", and "-example.com" to be valid
hostnames, which is incorrect according to RFC 1123 (the currently
accepted standard).
Until the standard library function is improved, we can use this local
implementation that does follow the RFC 1123 standard.
I asked Claude to perform an audit of the code based on its understand
of the RFC. It suggested some additional test cases and considers the
overall implementation to be robust (its words) and standards compliant.
Ref: https://www.rfc-editor.org/rfc/rfc1123
This reimplements the MAC address-aware URI parsing logic used by the
OSC 7 handler and adds an additional .raw_path option that returns the
full, unencoded path string (including query and fragment values), which
is needed for compliant kitty-shell-cwd:// handling.
Notably, this implementation takes an options-based approach that allows
these additional behaviors to be enabled at runtime. It also leverages
two std.Uri.parse guarantees:
1. Return slices point into the original text string.
2. .raw components don't require unescaping (.percent_encoded does).
The implementation is in a new 'os.uri' module because its now generic
enough to not be hostname-oriented.
We use os.uri.parseUri and its parsing options to reimplement our OSC 7
file-style URI handling. This has two advantages:
First, it fixes kitty-shell-cwd scheme handling. This scheme expects the
full, unencoded path string, whereas the file scheme expects normal URI
percent encoding. This was preventing paths containing "special" URI
characters (like "path?") from working correctly in our bash, zsh, and
elvish shell integrations, which report working directories using the
kitty-shell-cwd scheme. (fish uses file URIs, which work as expected.)
Second, we can greatly simplify our hostname and path string handling
because we can now rely on the "raw" std.Uri component form to always
provide the correct representation.
Lastly, this lets us remove the previous URI-related code from the
os.hostname module, restoring its focus to hostname-related functions.
See: #5289
I hit an edge case when using Private Wi-Fi addressing on macOS where
the last section of the randomized mac address starts with a '0'. The
hostname parsing for the shell integration didn't handle this case, so
issue #2512 reappeared.
This fixes that by explicitly handling port numbers < 10.
Note that this includes some failing tests because I want to make the
uri handling better and more specific. It's a little bit too general
right now so I want to lock it down to:
1. macOS only; and
2. valid mac address values
because that's how the macOS private Wi-Fi address thing works;
randomizes your mac address and sets that as your hostname.