mirrors/Odin - Odin - Kyren's Code

mirror of https://github.com/odin-lang/Odin.git synced 2026-06-19 16:42:33 +00:00

Author	SHA1	Message	Date
gingerBill	7b58aa8eba	Minor style changes	2026-06-19 09:30:58 +01:00
Brendan Punsky	fae15847a3	rexcode: buffer-sizing helpers across all ISAs + naming-contract doc. Roll the encode/decode buffer-sizing helpers (added for x86 in `49787b7de`) out to every other ISA, and document them in the cross-arch naming contract. Per arch (arm32, arm64, mips, riscv, ppc, ppc_vle, rsp, mos6502, mos65816): - encode_max_code_size / encode_max_relocation_count now key off the []Instruction slice (were int counts); bodies unchanged (* MAX_INST_SIZE). - encode_reserve(code, relocs, instructions): grows the caller's code []u8 by length and reserves relocs by capacity; allocates no new buffers. - decode_max_instruction_count / decode_estimate_instruction_count: exact ceiling and typical estimate, keyed off the min/avg instruction size per arch (fixed-4: arm64/mips/ppc/rsp; min-2: arm32/riscv/ppc_vle; min-1: mos). - decode_reserve(instructions, inst_info, label_defs, data, exact=false). docs/cross_arch_design.md: helpers added to the naming contract. No behavior change to the existing size helpers (signature only). All 10 ISAs check + test green (x86 2282, arm32 600, arm64 461, mips 281, riscv 154, ppc 31, ppc_vle 281, rsp 70, mos6502 148, mos65816 53).	2026-06-19 04:11:30 -04:00
Brendan Punsky	49787b7de4	rexcode/x86: buffer-sizing helpers for encode and decode. Give callers a clean way to pre-size their own buffers so the encode/decode hot paths never allocate or resize, instead of decode() silently reserving the caller's arrays itself (removed). The library allocates nothing -- these only grow the caller's own dynamic arrays, and only when not already big enough (Odin's reserve no-ops when capacity already suffices). Size-only helpers (caller manages its own memory), keyed off the input slice: encode_max_code_size(instructions) - exact code bytes; encode_max_relocation_count(instructions) - exact reloc upper bound; decode_max_instruction_count(data) - exact ceiling (1 byte/inst); decode_estimate_instruction_count(data) - typical estimate (~3 B/inst). Reserve helpers (pre-size the caller's dynamic arrays; nil to skip an array): encode_reserve(code, relocs, instructions) - code is a [dynamic]u8 grown by LENGTH (so code[:] is a valid emit target); relocs reserved by capacity on top of existing elements. decode_reserve(instructions, inst_info, label_defs, data, exact=false) - reserves capacity on top of existing; exact=true for the ceiling. Error arrays grow only on the failure path, so they are intentionally not covered. check/test green; 2282 cases; exercised end-to-end (the [dynamic]u8 code pattern, factor-in-existing, nil args, exact ceiling, reserve no-op).	2026-06-19 03:48:36 -04:00
Brendan Punsky	3341898437	rexcode/x86: flatten per-instruction loops + hint immediates (~1.5x encode). Data-oriented pass on the encode hot path. Profiling showed bounds checks already elided by -o:speed; the cost was per-instruction loop/scan machinery and immediates falling off the hint path. - Gather the immediate slot in the single resolve pass and emit it straight-line (no scan over enc.enc); likewise drive the legacy REX prefix from the precomputed reg/mr/opr slots instead of a per-form scan. - Fold the separate needs_66 (GPR16) and SPL/BPL/SIL/DIL operand loops into the resolve pass, so user operands are visited exactly once. This was the big one: mov r,r 27 -> 21 ns. - Gate the whole legacy-prefix block on a single flags!=0 test (a legacy prefix is almost always absent) instead of four branches per instruction. - Make immediate forms hintable. A typed immediate builder names its width (inst_add_r32_imm32), the matcher already keys off the operand's declared size, so baking the form is byte-identical AND drops immediates from the full match scan: mov r32,imm32 55.7 -> 17.8 ns (3.1x). Floor (no-op) 14.55 -> 10.3 ns; realistic immediate-heavy typed mix 30 -> 20.5 ns/inst (~49 M inst/s). gen/builders/check/test/idempotent green; 2282 cases (typed==generic byte-identical, incl. the new immediate cases).	2026-06-19 01:56:05 -04:00
Brendan Punsky	078015bc34	rexcode/x86: pre-matched encode hint + repair the typed builders. Targeted branchless revert + the pre-matched form fast path, and a fix for a pre-existing bug the latter surfaced. (a) Revert the two speculative-write spots from the prior branchless pass (legacy-prefix emission, widened displacement store, ENCODE_TAIL_SLACK) back to predicted branches. In real streams a legacy prefix is almost always absent and disp size is stable, so those branches are ~free and the unconditional stores only added work. Every class got faster (RET 19->17.5, MOV r,r 52->46.6, VADDPS 42.8->39.3 ns). (b) Pre-matched form hint. Instruction.enc_hint (in the existing 11-byte padding, idx+1 biased; 0 = matcher path) lets a typed builder that maps to a single value-independent form bake the global form index, so encode() skips the O(forms) match scan -- and, in a varied stream, its unpredictable branches. Generated for non-immediate forms only (value-dependent imm8/imm32 selection stays on the matcher). On a 100k mixed typed-builder stream: 47.3 -> 30.2 ns/inst (-36%), byte-identical to the matcher path -- ~2x the original baseline for codegen. Repair the typed inst_/emit_ builders. They were non-functional: the generator cast the hw-only typed enum straight to Register (Register(GPR64.RAX) -> class 0), so every typed-builder operand was rejected by the matcher (encode returned empty). Untested because the suite builds via the generic constructors. Now they build through the class-correct op_gpr64/op_xmm/... path (op_* already used by 3+ operand builders), emit_ reuses inst_, and a new 30-case consistency suite asserts typed == generic (llvm-verified) and hint == matcher. gen/builders/check/test/idempotent all green; 2276 cases.	2026-06-18 21:04:18 -04:00
Brendan Punsky	8387731357	rexcode/x86: branchless hot paths + single-pass operand resolution. Three layers on the x86 encode/decode hot paths, all byte-exact (2246 LLVM-verified cases) and roundtrip-clean: 1. Branchless: legacy-prefix emission (speculative write + conditional advance), REX/VEX/EVEX extension-bit accumulation (gate-and-mask), ModRM mod/disp-size selection (cmov selects), displacement emission (widened store + ENCODE_TAIL_SLACK); decoder REX/VEX/EVEX register extensions (arithmetic instead of if/+=8). 2. Resolve-operands-once: the previous code re-derived each user operand ~5-10x per instruction (a fresh O(n) scan of enc.ops per emission pass). Now resolved into a [4]^Operand map a single time. 3. Single-pass gather: fold the opcode-+rb and ModR/M slot-detection scans into that one resolve pass (3 enc.enc passes -> 1). Net on a 100k mixed-instruction benchmark: encode ~58 -> ~54 ns/inst (best 52). Branchless alone was a ~7% encode regression (predicted branches, nothing to recover); the algorithmic passes recovered it and beat baseline.	2026-06-18 20:16:26 -04:00
Brendan Punsky	95df04fbe1	rexcode: re-house ISA packages under core:rexcode/isa/<arch>. Move all ten ISA packages (x86, arm32, arm64, mips, riscv, ppc, ppc_vle, rsp, mos6502, mos65816) from core/rexcode/<arch> to core/rexcode/isa/<arch>, so the import pattern is now `import "core:rexcode/isa/x86"`. The shared core stays at core:rexcode/isa. Mechanical: relative `import "../isa"` / "../../isa" -> absolute "core:rexcode/isa" (the only path that survives the move; the "../" and "../.." self/generated imports move with their packages). build.lua now builds paths as <root>/isa/<name>; stale `cd <arch>` hints in the verify tools and the doc.odin paths updated. WASM stays at core/rexcode/wasm for now -- it is an IR, not an ISA, and will move under the forthcoming core:rexcode/ir once that layer lands. All 10 arches gen/builders/check/test green; import core:rexcode/isa/x86 verified working; wasm still compiles.	2026-06-18 19:03:27 -04:00
gingerBill	c9ce8794c7	Replace `-> isa.Result` with `-> (byte_code: u32, ok: bool)`	2026-06-15 21:43:58 +01:00
Flāvius	a4f08f8307	Load rexcode encode/decode tables from committed binary blobs. Each ISA's hand-written ENCODING_TABLE (the single source of truth) now lives in a per-arch tablegen/ metaprogram that flattens it and serializes committed binary blobs; the library #loads those into @(rodata) at compile time rather than compiling a table body. No arch keeps encoding_table.odin or decoding_tables.odin -- only a generated tables.odin loader and tables/.bin. Two-stage, type-checked pipeline: tablegen Stage A emits human-readable generated Odin, which compiles and serializes the blobs in Stage B. * encode() goes through encoding_forms(m); decoders are unchanged apart from x86's flattened 2-D index. Decode tables are byte-identical to the old ones. * build.lua: a LuaJIT driver for the metaprograms, validations, and tests, with cross-platform gating and a clear report. * Docs refreshed; the obsolete forward-looking plan in cross_arch_design.md trimmed to what was actually built. * Attribution headers added to all rexcode source files; the generators emit them so generated files keep them.	2026-06-15 07:43:29 -04:00
gingerBill	d75624ccbd	Add @(require_results) where appropriate to `isa`	2026-06-14 19:41:05 +01:00
gingerBill	2e58cc51d9	Improve mnemonic_builders for x86	2026-06-14 17:03:19 +01:00
gingerBill	d6ae77b67e	`core:rexcode`	2026-06-14 16:30:18 +01:00

12 Commits
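
The size-only helpers described in commits `49787b7de4` and `fae15847a3` come down to simple arithmetic over the input length and a per-ISA instruction size. The following is a rough Python sketch of that arithmetic only, not the Odin implementation. The function names and the x86 figures of 1 byte/inst (ceiling) and ~3 B/inst (estimate) are taken from the commit messages. The `IsaSizes` record, the value 15 for x86's maximum instruction size (the architectural limit), the average of 4 for arm64, and the use of floor division are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IsaSizes:
    """Hypothetical per-ISA size parameters (not a type from the library)."""
    min_inst_size: int  # smallest possible encoded instruction, in bytes
    avg_inst_size: int  # typical encoded instruction, in bytes
    max_inst_size: int  # largest possible encoded instruction, in bytes


# x86: min 1 and avg ~3 are from the commit message; max 15 is an assumption
# based on the architectural instruction-length limit.
X86 = IsaSizes(min_inst_size=1, avg_inst_size=3, max_inst_size=15)
# arm64 is listed as "fixed-4", so min, avg and max are all taken to be 4.
ARM64 = IsaSizes(min_inst_size=4, avg_inst_size=4, max_inst_size=4)


def encode_max_code_size(instructions, isa):
    """Upper bound on code bytes: one max-size encoding per instruction."""
    return len(instructions) * isa.max_inst_size


def decode_max_instruction_count(data, isa):
    """Ceiling on decoded instructions: every instruction is minimum size."""
    return len(data) // isa.min_inst_size


def decode_estimate_instruction_count(data, isa):
    """Typical count: every instruction is average size."""
    return len(data) // isa.avg_inst_size


if __name__ == "__main__":
    blob = bytes(12)  # 12 bytes of input to decode
    for name, isa in (("x86", X86), ("arm64", ARM64)):
        print(name,
              decode_max_instruction_count(blob, isa),
              decode_estimate_instruction_count(blob, isa))
```

A caller would use the ceiling when a buffer must never be resized (the `exact=true` case of `decode_reserve`) and the estimate when a smaller up-front reservation is acceptable.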