feat(stdlib): overload vim.str_byteindex, vim.str_utfindex #30735

PROBLEM:
There are several limitations to vim.str_byteindex, vim.str_utfindex:
1. They throw given out-of-range indexes. An invalid (often user/lsp-provided)
   index doesn't feel exceptional and should be handled by the caller.
   `:help dev-error-patterns` suggests that `retval, errmsg` is the preferred
   way to handle this kind of failure.
2. They cannot accept an encoding. So LSP needs wrapper functions. #25272
3. The current signatures are not extensible.
    * Calling: The function currently uses a fairly opaque boolean value to
      indicate to identify the encoding.
    * Returns: The fact it can throw requires wrapping in pcall.
4. The current name doesn't follow suggestions in `:h dev-naming` and I think
   `get` would be suitable.

SOLUTION:
- Because these are performance-sensitive, don't introduce `opts`.
- Introduce an "overload" that accepts `encoding:string` and
  `strict_indexing:bool` params.

```lua
local col = vim.str_utfindex(line, encoding, [index, [no_out_of_range]])
```

Support the old versions by dispatching on the type of argument 2, and
deprecate that form.

```lua
vim.str_utfindex(line)                             -- (utf-32 length, utf-16 length), deprecated
vim.str_utfindex(line, index)                      -- (utf-32 index, utf-16 index), deprecated
vim.str_utfindex(line, 'utf-16')                   -- utf-16 length
vim.str_utfindex(line, 'utf-16', index)            -- utf-16 index
vim.str_utfindex(line, 'utf-16', math.huge)        -- error: index out of range
vim.str_utfindex(line, 'utf-16', math.huge, false) -- utf-16 length
```
This commit is contained in:
Tristan Knight
2024-10-23 14:33:57 +01:00
committed by GitHub
parent 3a86b60032
commit 230b0c7f02
5 changed files with 283 additions and 68 deletions

View File

@@ -181,7 +181,9 @@ int nlua_str_utfindex(lua_State *const lstate) FUNC_ATTR_NONNULL_ALL
} else {
idx = luaL_checkinteger(lstate, 2);
if (idx < 0 || idx > (intptr_t)s1_len) {
return luaL_error(lstate, "index out of range");
lua_pushnil(lstate);
lua_pushnil(lstate);
return 2;
}
}
@@ -272,7 +274,8 @@ int nlua_str_byteindex(lua_State *const lstate) FUNC_ATTR_NONNULL_ALL
const char *s1 = luaL_checklstring(lstate, 1, &s1_len);
intptr_t idx = luaL_checkinteger(lstate, 2);
if (idx < 0) {
return luaL_error(lstate, "index out of range");
lua_pushnil(lstate);
return 1;
}
bool use_utf16 = false;
if (lua_gettop(lstate) >= 3) {
@@ -281,7 +284,8 @@ int nlua_str_byteindex(lua_State *const lstate) FUNC_ATTR_NONNULL_ALL
ssize_t byteidx = mb_utf_index_to_bytes(s1, s1_len, (size_t)idx, use_utf16);
if (byteidx == -1) {
return luaL_error(lstate, "index out of range");
lua_pushnil(lstate);
return 1;
}
lua_pushinteger(lstate, (lua_Integer)byteidx);
@@ -695,10 +699,10 @@ void nlua_state_add_stdlib(lua_State *const lstate, bool is_thread)
lua_setfield(lstate, -2, "stricmp");
// str_utfindex
lua_pushcfunction(lstate, &nlua_str_utfindex);
lua_setfield(lstate, -2, "str_utfindex");
lua_setfield(lstate, -2, "__str_utfindex");
// str_byteindex
lua_pushcfunction(lstate, &nlua_str_byteindex);
lua_setfield(lstate, -2, "str_byteindex");
lua_setfield(lstate, -2, "__str_byteindex");
// str_utf_pos
lua_pushcfunction(lstate, &nlua_str_utf_pos);
lua_setfield(lstate, -2, "str_utf_pos");