Give callers a clean way to pre-size their own buffers so the encode/decode
hot paths never allocate or resize. This replaces decode() silently reserving
the caller's arrays itself (removed). The library allocates nothing of its
own: the helpers only grow the caller's dynamic arrays, and only when they
are not already big enough (Odin's reserve is a no-op when capacity already
suffices).
Size-only helpers (caller manages its own memory), keyed off the input slice:
  encode_max_code_size(instructions)         - exact code bytes
  encode_max_relocation_count(instructions)  - exact reloc upper bound
  decode_max_instruction_count(data)         - exact ceiling (1 byte/inst)
  decode_estimate_instruction_count(data)    - typical estimate (~3 B/inst)
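
The arithmetic behind the two decode helpers can be sketched as follows. This is an illustration in C, not the library's Odin code: the function names are borrowed from the list above, but the bodies, the size_t signature, and the round-up in the estimate are assumptions.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the decode size helpers. Names follow the
   commit text; the bodies are assumptions of this sketch. */

/* Ceiling: no instruction is shorter than one byte, so n bytes of input
   can never decode to more than n instructions. */
static size_t decode_max_instruction_count(size_t data_len) {
    return data_len;
}

/* Typical estimate at roughly 3 bytes per instruction. Rounding up is an
   assumption here; the real helper may round differently. */
static size_t decode_estimate_instruction_count(size_t data_len) {
    return (data_len + 2) / 3;
}
```

With these definitions a 10-byte input gives a ceiling of 10 instructions and an estimate of 4, which is the trade-off between reserving exactly once and reserving less memory.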
Reserve helpers (pre-size the caller's dynamic arrays; pass nil to skip an
array):
  encode_reserve(code, relocs, instructions)
      code is a [dynamic]u8 grown by LENGTH (so code[:] is a valid emit
      target); relocs has capacity reserved on top of its existing elements.
  decode_reserve(instructions, inst_info, label_defs, data, exact=false)
      reserves capacity on top of the existing elements; pass exact=true to
      reserve the ceiling.
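
The difference between growing by length and reserving capacity on top of existing elements can be sketched with a minimal byte vector. This is C standing in for an Odin [dynamic]u8; ByteVec and the three functions are hypothetical, and whether the real encode_reserve extends from the current length (as here) or sets an absolute length is not stated in this message.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an Odin [dynamic]u8: data, length, capacity. */
typedef struct {
    unsigned char *data;
    size_t len;
    size_t cap;
} ByteVec;

/* Ensure cap >= want. Does nothing when capacity already suffices,
   mirroring the reserve no-op. Returns 0 on allocation failure. */
static int vec_reserve(ByteVec *v, size_t want) {
    if (want <= v->cap) return 1;
    unsigned char *p = realloc(v->data, want);
    if (!p) return 0;
    v->data = p;
    v->cap = want;
    return 1;
}

/* "Grown by length": extend len by `extra` zeroed bytes so the whole
   range [0, len) is addressable as an emit target. */
static int vec_grow_len(ByteVec *v, size_t extra) {
    size_t new_len = v->len + extra;
    if (!vec_reserve(v, new_len)) return 0;
    if (extra > 0) memset(v->data + v->len, 0, extra);
    v->len = new_len;
    return 1;
}

/* "Reserved by capacity on top of existing elements": room for `extra`
   more items beyond len, with len itself left unchanged. */
static int vec_reserve_extra(ByteVec *v, size_t extra) {
    return vec_reserve(v, v->len + extra);
}
```

Growing the code buffer by length lets the encoder write through plain indexing, while the capacity-only form keeps the array's length truthful for arrays that are appended to.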
Error arrays grow only on the failure path, so they are intentionally not
covered. check/test green (2282 cases). Exercised end-to-end: the
[dynamic]u8 code pattern, factoring in existing elements, nil args, the
exact ceiling, and the reserve no-op.
Three layers of optimization on the x86 encode/decode hot paths, all
byte-exact (2246 LLVM-verified cases) and roundtrip-clean:
1. Branchless: legacy-prefix emission (speculative write + conditional
   advance), REX/VEX/EVEX extension-bit accumulation (gate-and-mask),
   ModR/M mod/disp-size selection (cmov selects), displacement emission
   (widened store + ENCODE_TAIL_SLACK); decoder REX/VEX/EVEX register
   extensions (arithmetic instead of if/+=8).
2. Resolve-operands-once: the previous code re-derived each user operand
   ~5-10x per instruction (a fresh O(n) scan of enc.ops per emission
   pass). Now resolved into a [4]^Operand map a single time.
3. Single-pass gather: fold the opcode-+rb and ModR/M slot-detection
   scans into that one resolve pass (3 enc.enc passes -> 1).
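
The branchless idioms named in layer 1 can be sketched in C as below. This is an illustration of the general techniques, not the library's Odin code: every function name is hypothetical, TAIL_SLACK only echoes ENCODE_TAIL_SLACK with an assumed value, and disp_size ignores the rBP/r13 base case that forces a displacement byte.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Spare bytes the caller leaves after the longest real encoding so the
   widened stores below stay in bounds. The value is an assumption. */
#define TAIL_SLACK 8

/* Speculative write + conditional advance: always store the byte, then
   move the cursor by 0 or 1. An unwanted byte is overwritten by the
   next emission. */
static size_t emit_prefix(uint8_t *buf, size_t pos, uint8_t prefix, int present) {
    buf[pos] = prefix;
    return pos + (size_t)(present != 0);
}

/* Displacement size (0, 1 or 4 bytes) from two comparisons and no
   if/else chain; a compiler is free to lower this to setcc/cmov. */
static size_t disp_size(int32_t disp) {
    size_t nonzero = (size_t)(disp != 0);
    size_t wide = (size_t)((disp < -128) | (disp > 127));
    return nonzero + 3 * wide;
}

/* Widened store: always write four little-endian bytes, advance by the
   real size. Relies on TAIL_SLACK bytes of headroom after pos. */
static size_t emit_disp(uint8_t *buf, size_t pos, int32_t disp, size_t size) {
    uint32_t u = (uint32_t)disp;
    uint8_t le[4] = { (uint8_t)u, (uint8_t)(u >> 8),
                      (uint8_t)(u >> 16), (uint8_t)(u >> 24) };
    memcpy(buf + pos, le, sizeof le);
    return pos + size;
}

/* Decoder side: fold REX.R (bit 2 of the REX byte) into the 3-bit
   ModR/M reg field with arithmetic instead of `if (rex_r) reg += 8`. */
static unsigned decode_modrm_reg(uint8_t modrm, uint8_t rex) {
    return ((modrm >> 3) & 7u) | ((unsigned)(rex & 0x04u) << 1);
}
```

Each helper trades a conditional jump for an unconditional store or a small arithmetic expression, which only pays off when the replaced branch was poorly predicted.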
Net on a 100k mixed-instruction benchmark: encode ~58 -> ~54 ns/inst
(best 52). Branchless alone was a ~7% encode regression (the branches were
already well predicted, so there was no misprediction cost to recover); the
algorithmic passes recovered it and beat baseline.
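
Layers 2 and 3 amount to resolving operands once and recording the special slots in the same loop. The sketch below shows that shape in C under loud assumptions: SlotKind, Operand, Resolved and resolve_once are hypothetical, and it maps slot i straight to user operand i, which the real enc.ops matching need not do.

```c
#include <stddef.h>

enum { MAX_OPS = 4 };

/* Hypothetical slot kinds an encoding form can assign to an operand. */
typedef enum {
    SLOT_NONE, SLOT_MODRM_REG, SLOT_MODRM_RM, SLOT_OPCODE_RB, SLOT_IMM
} SlotKind;

/* Simplified user operand; the real type carries far more. */
typedef struct { long value; } Operand;

/* Result of the single pass: one pointer per slot plus the indices the
   later emission steps would otherwise rescan for (-1 when absent). */
typedef struct {
    const Operand *by_slot[MAX_OPS];
    int modrm_reg;
    int modrm_rm;
    int opcode_rb;
} Resolved;

/* One loop over the slots: bind each operand and note the ModR/M and
   opcode+rb slots on the way, instead of three separate scans. */
static Resolved resolve_once(const SlotKind slots[MAX_OPS],
                             const Operand *user_ops, size_t n_ops) {
    Resolved r = { {0}, -1, -1, -1 };
    for (size_t i = 0; i < MAX_OPS; i++) {
        if (slots[i] == SLOT_NONE || i >= n_ops) continue;
        r.by_slot[i] = &user_ops[i];
        switch (slots[i]) {
        case SLOT_MODRM_REG: r.modrm_reg = (int)i; break;
        case SLOT_MODRM_RM:  r.modrm_rm  = (int)i; break;
        case SLOT_OPCODE_RB: r.opcode_rb = (int)i; break;
        default: break;
        }
    }
    return r;
}
```

Every later emission step then reads r.by_slot or the recorded indices directly, so the per-instruction cost of operand lookup is paid once rather than once per pass.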
Move all ten ISA packages (x86, arm32, arm64, mips, riscv, ppc, ppc_vle,
rsp, mos6502, mos65816) from core/rexcode/<arch> to core/rexcode/isa/<arch>,
so the import pattern is now `import "core:rexcode/isa/x86"`. The shared
core stays at core:rexcode/isa.
Mechanical: relative `import "../isa"` / `import "../../isa"` -> absolute
`import "core:rexcode/isa"`. Only the absolute path to the shared core
survives the move; the "../" and "../.." self/generated imports need no
change because they move with their packages. build.lua now builds paths as
<root>/isa/<name>; the stale `cd <arch>` hints in the verify tools and the
paths in doc.odin are updated.
WASM stays at core/rexcode/wasm for now -- it is an IR, not an ISA, and
will move under the forthcoming core:rexcode/ir once that layer lands.
All 10 arches gen/builders/check/test green; import core:rexcode/isa/x86
verified working; wasm still compiles.