core:rexcode

2026-06-20 00:52:33 +00:00 · 2026-06-14 16:30:18 +01:00
parent 4b482366c1
commit d6ae77b67e
194 changed files with 107075 additions and 0 deletions
--- a/core/rexcode/docs/cross_arch_design.md
+++ b/core/rexcode/docs/cross_arch_design.md
@@ -0,0 +1,469 @@
+# rexcode — Cross-Architecture API Design
+
+> How to grow rexcode from an x86-only encoder/decoder into a multi-target
+> library (x86, RISC-V, ARM64, MIPS, …) **without** flattening every
+> architecture to a lowest common denominator and **without** adding
+> runtime overhead to the single-target hot path.
+>
+> Companion to [x86_api.md](x86_api.md). Written ahead of the RISC-V
+> subpackage.
+
+---
+
+## 0. The guiding principle
+
+> **Share the bookkeeping, specialize the bytes.**
+
+An encoder/decoder is two things stitched together:
+
+1. **Orchestration & bookkeeping** — labels, relocations, the two-pass
+   encode/decode loops, error/result reporting, the print framework,
+   buffer management, the table-gen tooling pattern. This is *the same
+   problem on every ISA* and should be written once.
+2. **The instruction model & the bytes** — what a register/memory/operand
+   *is*, what the encoding tables look like, and the actual
+   bit/byte-twiddling of `encode_one`/`decode_one`. This is *irreducibly
+   per-architecture* and must stay native and zero-cost.
+
+Every decision below follows from drawing the line in exactly that place.
+We do **not** try to invent one `Instruction` type that fits all ISAs.
+That path forces x86's `segment`/SIB, ARM's writeback, and RISC-V's
+split immediates into one bloated struct, which is precisely the
+"compromise performance/effectiveness" outcome to avoid. Instead, each
+arch owns its concrete types, and uniformity comes from a **naming
+contract** (§6) plus a small **shared core** (§4) plus **opt-in**
+generic glue (§5, §7).
+
+---
+
+## 1. The universal shape
+
+Strip away the x86 specifics and every target needs the same nine things:
+
+| # | Concept | Example in x86 |
+|---|---|---|
+| 1 | A **register** = (class, hw number, size) | `Register` distinct u16 |
+| 2 | **Operands** tagged reg / mem / imm / relative | `Operand` + `Operand_Kind` |
+| 3 | An **instruction** = mnemonic + operands + flags | `Instruction` |
+| 4 | A **mnemonic** enum | `Mnemonic` (u16, INVALID=0) |
+| 5 | **Labels** + forward refs + named labels | `Label_Definition`, `Label_Map` |
+| 6 | **Relocations** left over after local resolution | `Relocation` |
+| 7 | `encode([]Inst) -> bytes (+relocs +errors)` | `encode()` |
+| 8 | `decode(bytes) -> []Inst (+info +labels +errors)` | `decode()` |
+| 9 | `print([]Inst) -> text (+tokens)` | `print()`/`tprint()`/… |
+
+Plus two cross-cutting concerns: **errors/result** reporting and a
+**table-driven core** fed by **codegen tooling**.
+
+The *shape* of items 5–9 (their signatures and the types they pass around)
+is architecture-independent. That is the surface we standardize.
+
+---
+
+## 2. Where architectures actually diverge
+
+This is the heart of the analysis. Ranked from "diverges hardest" to
+"barely diverges."
+
+### 2.1 Encoding mechanics — **maximal divergence**
+
+| ISA | Width | Mechanism |
+|---|---|---|
+| x86 | 1–15 B, variable | legacy prefixes → REX/VEX/EVEX → escape → opcode → ModRM → SIB → disp → imm |
+| RISC-V | 4 B (2 B for "C") | pack fixed bitfields; ~6 formats (R/I/S/B/U/J) |
+| ARM64 | 4 B fixed | pack per-class bitfields; many classes; bitmask-imm encoder |
+| MIPS | 4 B fixed | 3 formats (R/I/J), very regular |
+
+`encode()`'s ~500-line body and the whole `Encoding`/`Encoding_Flags`
+schema (esc/prefix/vex_*) are **x86-only**. RISC-V's `encode_one` is a
+dozen lines of shifts. **Conclusion: the `encode_one`/`decode_one` core
+and the `Encoding` struct do not generalize — but the loop that drives
+them does (§7).**
+
+### 2.2 Memory addressing — **high divergence**
+
+| ISA | Addressing modes |
+|---|---|
+| x86 | `[base + index*scale + disp32]`, RIP-relative, segment override, addr-size override |
+| RISC-V | `disp12(base)` only — no index, no scale |
+| MIPS | `imm16(base)` only |
+| ARM64 | `[base]`, `[base,#imm]`, `[base,Xm{,LSL#n}]`, `[base,Wm,SXTW]`, pre/post-index `[base,#imm]!` / `[base],#imm`, PC-rel literal |
+
+The x86 `Memory` bit_field (with `segment`, `addr_size_override`,
+index+scale) is deeply x86-flavored. RISC-V's memory is `{base, i32 disp}`.
+ARM adds **writeback** (a mode x86 cannot express) and extend/shift on the
+index. **Conclusion: `Memory` is per-arch.** What generalizes is only the
+*role*: a `MEMORY`-kind operand carrying an arch-defined payload.
+
+### 2.3 Immediates & operand size — **moderate divergence**
+
+- The *value* (an `i64`) generalizes perfectly.
+- The *encoding* does not: RISC-V scatters immediate bits across fields
+  (B-type, J-type) and shifts them; ARM has bitmask-immediate and shifted
+  forms. All of that lives inside `encode_one`; the `Operand` just holds
+  the clean value.
+- **Size association differs:** x86 carries an explicit `size: u8` and
+  uses it to select an encoding; RISC-V bakes width into the mnemonic
+  (`LW` vs `LD`) and ARM64 into the register view (`W0` vs `X0`). Keep
+  `size` in the shared operand shape as a *carrier*; let each arch decide
+  how much it matters.
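+
+As a rough illustration of the split-immediate point, here is a minimal
+Odin sketch of packing a RISC-V B-type branch. The bit positions follow
+the RISC-V base ISA; `encode_b_type` and the package name are
+hypothetical and not part of rexcode.
+
+```odin
+package riscv_imm_sketch
+
+import "core:fmt"
+
+// Hypothetical helper (not a rexcode API): pack a B-type branch.
+// The operand carries a plain byte offset; the encoder scatters it as
+// imm[12] -> bit 31, imm[10:5] -> bits 30:25, imm[4:1] -> bits 11:8,
+// imm[11] -> bit 7. Bit 0 of the offset is implicit (always zero).
+encode_b_type :: proc(opcode, funct3, rs1, rs2: u32, offset: i32) -> u32 {
+    imm := u32(offset)
+    word := opcode | (funct3 << 12) | (rs1 << 15) | (rs2 << 20)
+    word |= ((imm >> 12) & 0x01) << 31
+    word |= ((imm >> 5) & 0x3F) << 25
+    word |= ((imm >> 1) & 0x0F) << 8
+    word |= ((imm >> 11) & 0x01) << 7
+    return word
+}
+
+main :: proc() {
+    // beq x1, x2, +8  (BRANCH opcode 0x63, funct3 0)
+    fmt.printf("%08x\n", encode_b_type(0x63, 0, 1, 2, 8))
+}
+```
+
+The caller only ever sees the clean offset `8`; the scatter stays private
+to the per-arch encoder, which is the division of labor argued for here.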
+
+### 2.4 Relocations — **moderate divergence (structurally aligned)**
+
+The `Relocation` *struct* (offset, symbol/label, addend, type, size)
+mirrors ELF `rela` and is universal. The *type enum* is per-arch and much
+larger on RISC-V (paired `PCREL_HI20`/`PCREL_LO12`, `CALL`, `BRANCH`,
+`JAL`, `HI20`, `LO12_I/S`, …) because PC-relative addressing needs
+instruction *pairs* (AUIPC+ADDI). **Conclusion: share the struct shape,
+make the type enum a per-arch parameter.**
+
+### 2.5 Registers — **low/structural divergence**
+
+The `(class, hw_number)`-packed `distinct u16` scheme generalizes well.
+What differs:
+- x86: REX/EVEX extension bits, AH↔SPL aliasing, RIP pseudo-reg.
+- RISC-V: clean 5-bit fields, `x0`=hardwired zero, ABI names
+  (`zero/ra/sp/gp/tp/t0../s0../a0..`), separate `f`/`v` files.
+- ARM64: reg #31 means **SP or XZR depending on instruction** (a
+  decode/print-time disambiguation x86 never needs); `w`/`x` and
+  `b/h/s/d/q` views.
+**Conclusion: share the *layout convention* + `reg_hw`/`reg_class`
+accessors; per-arch owns classes, enums, names, and extension semantics.**
+
+### 2.6 Mnemonics — **content differs, shape identical**
+
+Per-arch `enum u16`, `INVALID=0`. Nothing to share but the convention.
+
+### 2.7 Labels — **no divergence**
+
+`labels.odin` is pure bookkeeping. The array-index model
+(`Label_Definition`, `label`, `label_forward`, `label_set_at`,
+`Label_Map`, `label_named`, `label_reserve`, `label_set`) lives in
+`isa/labels.odin` and is parametric over the Instruction type. **Fully
+shared.** Each arch's `encode()` rewrites label_defs from instruction
+indices to byte offsets between pass 1 and pass 2.
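+
+The index-to-offset rewrite between the passes can be sketched as
+follows. The real `Label_Definition` layout is not shown in this
+document, so the sketch assumes a bare `u32` and an assumed
+`LABEL_UNDEFINED` sentinel value; treat both as placeholders.
+
+```odin
+package label_rewrite_sketch
+
+import "core:fmt"
+
+// Assumed stand-ins; the real isa types may differ.
+Label_Definition :: u32
+LABEL_UNDEFINED  :: max(u32)
+
+// After pass 1, inst_offsets[i] holds the byte offset of instruction i.
+// Rewrite every defined label from "instruction index" to "byte offset".
+rewrite_labels :: proc(label_defs: []Label_Definition, inst_offsets: []u32) {
+    for &def in label_defs {
+        if def != LABEL_UNDEFINED {
+            def = inst_offsets[def]
+        }
+    }
+}
+
+main :: proc() {
+    inst_offsets := []u32{0, 3, 5, 10} // variable-length, x86-style
+    labels := []Label_Definition{2, LABEL_UNDEFINED, 0}
+    rewrite_labels(labels, inst_offsets)
+    fmt.println(labels)
+}
+```
+
+Nothing in the loop depends on the ISA, which is why this step can live
+in the shared package.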
+
+### 2.8 Errors / Result — **low divergence**
+
+`Result` is universal. `Error` is universal in shape. `Error_Code` splits
+into a **shared core** (`NONE, BUFFER_OVERFLOW, INVALID_MNEMONIC,
+NO_MATCHING_ENCODING, BUFFER_TOO_SHORT, INVALID_OPCODE, LABEL_OUT_OF_RANGE,
+…`) and **arch-specific** extras (`INVALID_MODRM/SIB/VEX/EVEX,
+TOO_MANY_PREFIXES` on x86; RISC-V would add `MISALIGNED_IMMEDIATE`,
+`INVALID_ROUNDING_MODE`, …).
+
+### 2.9 Printer — **framework universal, formatting per-arch**
+
+Shareable: `Token`, `Token_Kind` (the kinds are generic), `Print_Options`,
+the builder/number-formatting helpers, and the whole family of output
+sinks (`sbprint/print/aprint/tprint/bprint/fprint/wprint` + `ln`). Per-arch:
+`register_name`, `print_memory` (syntax differs wildly),
+`mnemonic_to_string`, and the size-suffix convention (x86's `.b/.w/.d` is
+x86-only; RISC-V puts width in the mnemonic).
+
+### Divergence summary
+
+| Component | Verdict | What's shared | What's per-arch |
+|---|---|---|---|
+| Labels | ✅ shared | everything | — |
+| Result / Error struct | ✅ shared | struct shapes | error-code extras |
+| Relocation struct | ✅ shared | struct shape | type enum |
+| Printer framework | ◑ split | tokens, options, sinks, num-fmt | reg/mem/mnemonic formatting |
+| Register scheme | ◑ split | layout + `reg_hw`/`reg_class` | classes, enums, names, ext bits |
+| Operand model | ◑ split | kind tag + union discipline + `size` carrier | `Memory`, `flags` payloads |
+| Encode/decode **driver** | ◑ shared via generics | two-pass loops, label/reloc resolution | the per-instruction hook |
+| `Instruction` | ✗ per-arch | shape convention only | concrete struct |
+| `Mnemonic` | ✗ per-arch | convention (u16, INVALID=0) | the enum |
+| `Encoding` + tables | ✗ per-arch | codegen *pattern* | schema + data |
+| `encode_one`/`decode_one` | ✗ per-arch | nothing | all of it |
+| Memory addressing | ✗ per-arch | operand *role* | the model |
+
+---
+
+## 3. Why not the "obvious" unifications
+
+Three tempting designs that **violate** the no-compromise rule:
+
+1. **One universal `Operand`/`Memory` for all ISAs.** Forces the union of
+   x86 SIB+segment, ARM writeback+extend, and RISC-V's nothing into a
+   single struct. Bloats every operand, leaks `segment` into RISC-V, and
+   still can't represent ARM writeback cleanly. ✗
+
+2. **A runtime `interface`/vtable the encoder calls per instruction.**
+   Adds an indirect call to the hottest loop (x86 does ~17 M inst/s — a
+   per-instruction `proc` pointer is a measurable tax) and defeats
+   inlining. ✗ on the default path.
+
+3. **`any`/tagged-union `Instruction` passed through a generic `encode`.**
+   Gives up monomorphization and adds runtime type checks in the hot
+   loop. ✗
+
+The design instead gets uniformity from **compile-time** mechanisms
+(naming contract + parametric polymorphism), and reserves runtime dispatch
+for an **opt-in** facade (§5.3) that only multi-target *tools* pay for.
+
+---
+
+## 4. Proposed package layout
+
+```
+rexcode/
+  isa/                     # shared, architecture-independent core
+    labels.odin            #   Label, Label_Definition, Label_Map, resolution
+    reloc.odin             #   Relocation (type field is generic/u8)
+    status.odin            #   Result, Error, shared Error_Code core
+    print.odin             #   Token, Token_Kind, Print_Options, sinks, num-fmt
+    register.odin          #   distinct-u16 layout convention + reg_hw/reg_class
+    pipeline.odin          #   parametric encode_stream/decode_stream (§7)
+    target.odin            #   optional runtime Target vtable (§5.3)
+
+  x86/                     # exists today; refactor to import isa
+    registers.odin operands.odin instructions.odin mnemonics.odin
+    encoding_types.odin encoder.odin decoder.odin printer.odin
+    encoding_table.odin decoding_tables.odin mnemonic_builders.odin
+    tests/  tools/
+
+  riscv/                   # next: same shape as x86/
+    registers.odin operands.odin instructions.odin mnemonics.odin
+    encoding_types.odin encoder.odin decoder.odin printer.odin
+    encoding_table.odin decoding_tables.odin mnemonic_builders.odin
+    tests/  tools/
+
+  arm64/  mips/  …         # future, same template
+```
+
+- **`isa` depends on nothing.** Each arch package depends on `isa` and
+  **re-exports** the shared types (e.g. `x86.Result`, `x86.Label_Map`)
+  so a consumer of `x86` sees one coherent namespace and never imports
+  `isa` directly unless writing arch-generic tooling.
+- Each arch package is **self-contained** (its own tests/tools), matching
+  the move already done for x86.
+
+---
+
+## 5. Three layers of generality (pick per use case)
+
+### 5.1 Layer A — direct single-arch use (default, zero overhead)
+
+```odin
+import "rexcode/x86"
+code: [4096]u8
+res := x86.encode(insts[:], labels[:], code[:], &relocs, &errors)
+```
+Fully static, fully inlined, exactly as fast as today. **99% of consumers
+live here.**
+
+### 5.2 Layer B — source-portable code via the naming contract
+
+Because every arch package exposes the *same names with the same
+signatures* (§6), code that only touches the shared vocabulary
+(`Label_Map`, `encode`, `tprint`, `Result`, `Relocation`) can be written
+against `import arch "rexcode/x86"` and re-pointed at `rexcode/riscv` by
+changing one import — as long as the arch-specific operand construction is
+isolated (e.g. behind your own per-arch helper). Still 100% compile-time,
+zero overhead.
+
+### 5.3 Layer C — runtime multi-target facade (opt-in, for tools)
+
+For a disassembler or JIT that selects the arch *at runtime*, `isa`
+provides a vtable populated by each arch:
+
+```odin
+// isa/target.odin
+Target :: struct {
+    name:       string,
+    decode:     proc(data: []u8, out: ^Decoded) -> Result,   // bytes → generic Decoded
+    print:      proc(d: ^Decoded, opts: ^Print_Options) -> string,
+    inst_align: u32,   // 1 for x86, 2 for riscv with the C extension (4 without), 4 for arm64/mips
+    max_inst:   u32,   // 15 for x86, 4 for riscv (8 to fit a two-instruction pair), 4 for arm64
+}
+// each arch: x86.TARGET: isa.Target = { … }
+```
+This boundary trades in **bytes and a generic `Decoded` view**, not the
+concrete `Instruction`, so it never forces a unified instruction struct.
+It carries a proc-pointer indirection — acceptable for a tool that has
+already paid a `switch arch` somewhere, and never on Layer A's path.
+
+---
+
+## 6. The naming contract (the most important artifact)
+
+Every architecture package **MUST** expose these names with these
+signatures. This is what makes the family feel like one library and what
+the RISC-V implementation is built against as a checklist.
+
+### Types (concrete per arch, identical names)
+
+```
+Register      Memory        Operand       Operand_Kind
+Instruction   Mnemonic      Encoding      Instruction_Info
+```
+
+### Re-exported shared types (from `isa`)
+
+```
+Label  Label_Definition  Label_Map  LABEL_UNDEFINED
+Relocation  Relocation_Type   Error  Error_Code  Result
+Token  Token_Kind  Print_Options  DEFAULT_PRINT_OPTIONS
+```
+
+### Operand constructors
+
+```
+op_reg(r) op_mem(m, size) op_imm(v, size) op_label(id, size)
+mem_*(…)            # arch-specific set; at minimum mem_base_disp
+                    # (mem_base in x86 is an accessor, not a constructor;
+                    # use mem_base_only for the no-displacement case)
+op_<class>(typed)   # typed safe constructors where the arch has classes
+```
+
+### Instruction builders & emitters
+
+Builder names spell out each operand kind separated by underscores
+(matches x86's existing convention):
+
+```
+inst_none / inst_r / inst_r_r / inst_r_i / inst_r_m / inst_m_r / …
+emit_none / emit_r / emit_rr / emit_ri / emit_rm / emit_mr / …
+            # NB: emit_* uses concatenated suffixes (legacy x86 spelling)
+inst_<mnemonic>(…) / emit_<mnemonic>(…)   # generated typed overloads
+```
+
+### Entry points (identical signatures across arches)
+
+```odin
+encode(instructions: []Instruction, label_defs: []Label_Definition,
+       code: []u8, relocs: ^[dynamic]Relocation, errors: ^[dynamic]Error,
+       resolve := true, base_address: u64 = 0) -> Result
+
+decode(data: []u8, relocs: []Relocation,
+       instructions: ^[dynamic]Instruction, inst_info: ^[dynamic]Instruction_Info,
+       label_defs: ^[dynamic]Label_Definition, errors: ^[dynamic]Error) -> Result
+
+print/println/aprint/tprint/bprint/fprint/wprint(+ln)(
+       instructions: []Instruction, inst_info: []Instruction_Info,
+       label_defs: []Label_Definition, tokens=nil, options=nil, label_names=nil)
+```
+
+### Register/label/print helpers
+
+```
+reg_hw  reg_class  reg_size  register_name  mnemonic_to_string
+label  label_forward  label_named  label_reserve  label_set
+```
+
+> Anything an arch genuinely lacks (e.g. RISC-V has no `mem_base_index`)
+> is simply **absent**, not stubbed. Portable (Layer B) code stays within
+> the intersection; arch-aware code uses the extras.
+
+---
+
+## 7. Zero-cost code reuse via parametric polymorphism
+
+The encode/decode **drivers** are arch-independent control flow. Factor
+them into `isa` as procedures generic over the instruction type `$I`,
+parameterized by an arch-provided per-instruction hook. Odin monomorphizes
+these at compile time → **no runtime cost, real code sharing.**
+
+```odin
+// isa/pipeline.odin  (sketch)
+encode_stream :: proc(
+    instructions: []$I,
+    label_defs:   []Label_Definition,
+    code:         []u8,
+    relocs:       ^[dynamic]Relocation,
+    errors:       ^[dynamic]Error,
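+    // `$` makes the hook a compile-time constant, so each arch gets its own
+    // specialization and the call is direct rather than through a proc pointer.
+    $encode_one:  proc(inst: ^I, out: []u8, code_pos: u32,
+                       relocs: ^[dynamic]Relocation, errors: ^[dynamic]Error) -> (n: u32, ok: bool),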
+    resolve := true, base_address: u64 = 0,
+) -> Result {
+    // PASS 1: for each inst → record offset, call encode_one, advance
+    // PASS 1.5: rewrite label_defs inst-index → byte-offset   (identical on every arch)
+    // PASS 2: resolve relocations / patch / spill unresolved   (identical on every arch)
+}
+```
+
+x86's current `encode()` becomes a thin wrapper that passes its
+`encode_one` (the prefix/ModRM/SIB body); RISC-V's wrapper passes its
+12-line bitfield packer. The label/relocation machinery — the part that's
+easy to get subtly wrong — is written and tested **once**.
+
+Caveats (arch-specific passes that stay out of the shared driver):
+- **RISC-V pseudo-ops** (`li`, `call`, `la`, `j`) expand to one or more
+  real instructions (a 64-bit `li` can take several); this needs an arch
+  pre-lowering pass.
+- **Branch relaxation** (short↔long form) is arch-specific.
+- **ARM literal pools / constant islands** are an extra emission phase.
+
+These plug in *around* the shared driver, not inside it.
+
+---
+
+## 8. Concrete RISC-V mapping (RV64GC as the first target)
+
+What each contract item becomes, to validate the design before coding:
+
+| Contract item | RISC-V realization |
+|---|---|
+| `Register` | `distinct u16`, classes `REG_X` (x0–31), `REG_F` (f0–31), `REG_V` (v0–31). No REX/EVEX bits. `x0` semantic = zero. |
+| typed enums | `XREG{ZERO,RA,SP,GP,TP,T0,T1,T2,S0,S1,A0..A7,S2..S11,T3..T6}`, `FREG`, `VREG` |
+| `Memory` | `struct { base: Register, disp: i32 }` — no index/scale/segment |
+| `mem_*` | `mem_base_only(base)`, `mem_base_disp(base, disp)` only |
+| `Operand` | same kind-tagged shape; `size` mostly informational (width is in the mnemonic) |
+| `Mnemonic` | `enum u16` — RV32I/64I + M,A,F,D,C,V (`ADDI, LW, LD, BEQ, JAL, AUIPC, FADD_D, …`) |
+| `Encoding` | `struct { format: Format, opcode, funct3, funct7: u8, … }`, `Format{R,I,S,B,U,J,R4,…}` |
+| `encode_one` | switch on `format`, pack fields, scatter immediate bits |
+| `Encoding_Flags` | tiny (e.g. `is_compressible`, `rounding_ok`) vs x86's 11 fields |
+| `Relocation_Type` | `R_RISCV_BRANCH, JAL, CALL, PCREL_HI20, PCREL_LO12_I/S, HI20, LO12_I/S, RVC_BRANCH/JUMP, …` |
+| `Instruction_Info` | `offset`, `is_compressed: bool`, rounding mode — no prefix/VEX fields |
+| printer | `register_name` uses ABI names; `print_memory` emits `disp(base)`; width lives in the mnemonic (no `.b/.w` suffix) |
+| tables | `gen_decode_tables` becomes near-trivial: a fixed-field instruction decodes by `(opcode, funct3, funct7)` keys |
+| `MAX_INST_SIZE` | `4` (or `8` to cover a two-instruction pair such as AUIPC+JALR); `inst_align` = 2 |
+
+Notable RISC-V-only concerns the design already accommodates:
+- **Split immediates** → hidden in `encode_one`; operand stays a clean value.
+- **Paired PC-relative relocs** (AUIPC+ADDI) → expressed via the shared
+  `Relocation` struct with RISC-V's type enum; resolution of the *pair* is
+  a RISC-V detail layered on the shared reloc list.
+- **Compressed (C) extension** → variable 2/4-byte width handled by
+  `decode_one` returning a length, exactly like x86's variable length —
+  the shared decode driver already threads instruction length.
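+
+For the compressed-extension point, a minimal sketch of how a RISC-V
+`decode_one` could determine the length it reports back to the driver.
+`inst_length` is a hypothetical helper; the rule it applies (low two bits
+`0b11` mean a 32-bit instruction) comes from the RISC-V base encoding
+scheme, and longer encodings are deliberately ignored.
+
+```odin
+package rvc_length_sketch
+
+import "core:fmt"
+
+// Hypothetical helper: length in bytes of the instruction whose first
+// (little-endian) halfword is `lo`. Low bits 0b11 mark a 32-bit
+// instruction; anything else is a 16-bit compressed one.
+// Encodings of 48 bits and up are not handled in this sketch.
+inst_length :: proc(lo: u16) -> u32 {
+    return 4 if (lo & 0b11) == 0b11 else 2
+}
+
+main :: proc() {
+    fmt.println(inst_length(0x8463)) // low half of a 32-bit branch word
+    fmt.println(inst_length(0x0001)) // c.nop
+}
+```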
+
+If RISC-V slots cleanly into the contract (it does above), the contract is
+sound for the regular fixed-width ISAs (ARM64, MIPS) too.
+
+---
+
+## 9. Recommended next steps
+
+1. **Stabilize x86 first.** Resolve the constructor-rename drift noted in
+   [x86_api.md](x86_api.md#known-drift) (tests/README vs `operands.odin`)
+   so x86 is the clean reference the contract is extracted from.
+2. **Extract `isa`** by lifting the *already-arch-independent* files:
+   `labels.odin`, the `Relocation`/`Error`/`Result` types, and the printer
+   framework (tokens/options/sinks/number-formatting). Make `x86`
+   re-export them. This is a low-risk refactor that proves the split.
+3. **Add the parametric `encode_stream`/`decode_stream`** to `isa` and
+   reduce x86's `encode`/`decode` to wrappers. Validate against the
+   existing test suite (same bytes out).
+4. **Write the RISC-V package against the contract** (§6) and the mapping
+   (§8), reusing `isa` wholesale. Build its `encoding_table.odin` by hand,
+   then port the two generators.
+5. **Only if a runtime-multi-target tool appears**, add the `Target`
+   vtable (§5.3). Don't build it speculatively.
+
+The deliverable order matters: every step is independently shippable, and
+x86 keeps working (and keeps its performance) throughout.
+
+---
+
+## 10. One-paragraph summary
+
+Make `isa` own the parts that are the same on every ISA — labels,
+relocations, errors/result, the print framework, and (via Odin
+parametric polymorphism) the encode/decode driver loops. Make each arch
+package own its registers, memory model, operands, mnemonics, encoding
+tables, and the actual `encode_one`/`decode_one` bytes. Bind the family
+together with a strict **naming contract** so packages are drop-in
+swappable at source level with zero runtime cost, and reserve a single
+opt-in runtime `Target` vtable for the rare tool that needs to choose an
+architecture dynamically. x86 keeps every cycle of its current
+performance; RISC-V (and later ARM/MIPS) gets the boring 60% for free and
+writes only the 40% that is genuinely its own.
--- a/core/rexcode/docs/mips_platforms.md
+++ b/core/rexcode/docs/mips_platforms.md
@@ -0,0 +1,79 @@
+# MIPS targets and extensions — platform catalog
+
+> What's worth supporting in `rexcode/mips/` (or a sibling subpackage) and
+> what isn't, framed around the actual hardware that runs MIPS.
+
+## Mainline consoles (MIPS-family CPUs)
+
+| Platform | CPU | Base ISA | Custom extension | Status |
+|---|---|---|---|---|
+| **PS1 / PSX** | Sony R3000A | MIPS I (no MMU) | **GTE** (COP2) — geometry transformation engine | ✅ done |
+| **PSX IOP / PS3 IOP** | LSI CW33300 / "IOP" | MIPS I | (none — same as PS1 CPU) | ✅ covered by MIPS I |
+| **N64** | NEC VR4300i | MIPS III + partial MIPS IV FPU | none on main CPU | ✅ covered by MIPS III + IV + FPU |
+| **N64 RSP** | "Reality Signal Processor" (inside the RCP) | custom MIPS R4000 subset | **VU** (128-bit vector unit, 32 vec regs); also drops mult/div/FPU/TLB | ⚠ **needs its own subpackage** (different ISA) |
+| **N64 RDP** | (display processor) | not a CPU (command stream) | n/a | not in scope |
+| **PS2 EE** | Sony R5900 (Toshiba) | MIPS III + MIPS IV (MOVN/MOVZ) | **MMI** (128-bit packed SIMD via MMI0-3), **LQ/SQ**, second HI/LO, VU0-macro | ✅ done (MMI; VU0-macro forms TBD) |
+| **PS2 VU0 / VU1** | "Vector Unit" | not MIPS — VLIW pair (upper + lower microcode) | — | 🚧 **separate ISA** — sibling `vu/` subpackage if needed |
+| **PS2 IOP** | (R3000A reused) | MIPS I | — | ✅ covered |
+| **PSP** | Sony "Allegrex" | MIPS32 R2 (+ R2 bitfield + rotates + SEB/SEH + BITREV) | **VFPU** (vector FPU, 128 32-bit regs in 8×4×4 matrices), Allegrex-specific BITREV/etc. | ⚠ Mnemonics enumerated, encodings TBD |
+| **PSP Media Engine** | (second Allegrex) | same as Allegrex | same VFPU | (covered when PSP CPU is) |
+| **PSV / Vita PS1-mode** | Cortex-A9 emulating R3000 | — (host is ARM) | — |  |
+
+## Arcade and other
+
+| Platform | CPU | Base ISA | Extension | Status |
+|---|---|---|---|---|
+| **SNK Hyper Neo Geo 64** | NEC VR4300 | MIPS III | none | ✅ covered |
+| **Konami Hornet** (arcade) | IBM PowerPC 403GA | not MIPS (PowerPC) | n/a | out of scope for `mips/` |
+| **Sega Model 3** step 1.x | PowerPC 603 / 603e | not MIPS (PowerPC) | n/a | out of scope for `mips/` |
+
+## Modern / embedded MIPS with vendor extensions
+
+| Platform | CPU | Base | Extension | Status |
+|---|---|---|---|---|
+| **Ingenic XBurst** (Jz47xx) — old MP3/Android handhelds | XBurst | MIPS32 R2 | **MXU** (Multimedia Unit, custom SIMD), DSP ASE | 🚧 DSP enumerated, **MXU is XBurst-only** — defer |
+| **Broadcom MIPS** (older routers) | bcm473x / bcm63xx | MIPS32 R2/R5 | DSP ASE common | DSP enumerated; encodings TBD |
+| **Atheros / Qualcomm** (router SoC) | MIPS32 R2 | MIPS32 R2 | DSP common | as above |
+| **MediaTek MIPS** (older routers) | MIPS32 R2 | MIPS32 R2 | DSP | as above |
+| **Loongson 2/3** (China desktop) | Loongson | MIPS64 + custom | **Loongson MMI** (note: different from PS2 MMI!), **LSX** (128-bit), **LASX** (256-bit). Modern Loongson uses LoongArch instead. | 🚧 niche, defer |
+| **Microchip PIC32** | MIPS M4K / microAptiv | MIPS32 R1/R2 + microMIPS | none | ✅ covered (microMIPS not in scope) |
+| **Cavium Octeon** (server) | OCTEON | MIPS64 R2 | **OCTEON specific** (crypto, packet) | defer |
+
+## Workstations (historical)
+
+| Vendor | CPU | ISA | Notes |
+|---|---|---|---|
+| SGI Indy/Indigo/Octane/Origin | R4000/R5000/R8000/R10000/R12000/R14000 | MIPS III–IV | stock MIPS — ✅ covered |
+| DECstation | R3000 / R4000 | MIPS I–III | ✅ covered |
+| Various Unix workstations | MIPS family | various | ✅ covered |
+
+## **NOT** MIPS (mentioned because users sometimes ask)
+
+- **GBA / DS / 3DS / Switch** — ARM. Out of scope for `mips/`.
+- **Sega Saturn** — dual SH-2. **Dreamcast** — SH-4. Not MIPS.
+- **3DO** — ARM60. Not MIPS.
+- **Atari Jaguar** — 68k + custom Tom/Jerry RISCs. Not MIPS.
+- **Apple PowerBook / Macintosh** — PowerPC / Motorola 68k. Not MIPS.
+- **Sega Genesis / Mega Drive** — 68000. **Sega 32X** — SH-2. **Sega CD** — 68k. Not MIPS.
+
+## Recommended priority for `rexcode`
+
+Given typical demand (emulation, decompiling old console games, romhacking, RE):
+
+1. **What's done is the bulk of console value:** PS1, PS2, N64 main CPU, FPU, COP0.
+2. **N64 RSP** — high value for N64 emulation/microcode work. Should be `rexcode/rsp/` (separate ISA — see below).
+3. **PSP VFPU encodings** — high value for PSP emulation, completes the Allegrex story. Stays inside `mips/`.
+4. **DSP ASE encodings** — useful for modern router/embedded reversing. Stays inside `mips/`.
+5. **PS2 VU microcode** — distinct from MIPS (VLIW). Worth `rexcode/vu/` only if a real consumer appears.
+6. **MSA encodings** — modern MIPS only; used by some Linux distributions that target MIPS workstations. Lower priority.
+7. **Loongson / Octeon / MXU** — defer until someone needs them.
+
+## Why N64 RSP wants its own subpackage
+
+The RSP is a **subset** of MIPS (no MULT/DIV/FPU/TLB; no doubleword ops) **plus** a heavily custom COP2 vector unit. Trying to share `mips/` with it would mean:
+
+- The shared `Mnemonic` enum would pick up ~60 RSP-only vector ops (VMULF/VMACF/VADDC/VCH/VCL/VCR/VRCP/VRCPL/VRSQ/VRSQL/VRNDP/VRNDN/...) plus vector load/store variants (LBV/LSV/LDV/LQV/LRV/LPV/LUV/LHV/LFV/LWV/LTV + their store equivalents), polluting the MIPS namespace.
+- The RSP's COP2 encoding *collides* with PS1 GTE bit patterns (both use op=0x12 with the CO bit) so a single decode table can't disambiguate without an ISA gate.
+- The RSP's vector loads encode element offset + size in the cofun bits in ways that have no MIPS analogue.
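+
+The COP2 collision can be illustrated with a small Odin sketch. `Cop2_Flavor`, `is_cop2_co`, and `classify` are hypothetical names, not rexcode APIs; the only facts relied on are the ones stated above (primary opcode 0x12 with the CO bit, bit 25, set).
+
+```odin
+package cop2_gate_sketch
+
+import "core:fmt"
+
+// Hypothetical ISA selector; not a rexcode type.
+Cop2_Flavor :: enum { GTE, RSP_VU }
+
+// True when `word` is a COP2 "coprocessor operation": primary opcode
+// 0x12 in bits 31:26 with the CO bit (bit 25) set.
+is_cop2_co :: proc(word: u32) -> bool {
+    return (word >> 26) == 0x12 && ((word >> 25) & 1) == 1
+}
+
+// The bit pattern alone cannot say which unit is meant, so the caller's
+// ISA choice picks the decode table.
+classify :: proc(word: u32, flavor: Cop2_Flavor) -> string {
+    if !is_cop2_co(word) { return "not COP2 CO" }
+    switch flavor {
+    case .GTE:    return "decode with GTE table"
+    case .RSP_VU: return "decode with RSP vector table"
+    }
+    return "unreachable"
+}
+
+main :: proc() {
+    word: u32 = 0x4A00_0000 // opcode 0x12, CO = 1, remaining bits zero
+    fmt.println(classify(word, .GTE))
+    fmt.println(classify(word, .RSP_VU))
+}
+```
+
+The same word lands in two different tables depending on a flag the bytes do not carry; separate packages make that flag the import itself.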
+
+Cleaner: `rexcode/rsp/` as a sibling subpackage. It will reuse `isa/` (labels, relocs, errors, print framework) and parallel `mips/`'s shape (registers / operands / instructions / mnemonics / encoding_table / encoder / decoder / printer). Users targeting N64 import either `mips` (for the VR4300 main CPU) or `rsp` (for RSP microcode) — or both, side-by-side.
--- a/core/rexcode/docs/x86_api.md
+++ b/core/rexcode/docs/x86_api.md
@@ -0,0 +1,518 @@
+# rexcode `x86` — Complete API Extraction
+
+> Snapshot of the entire public surface of the `x86` subpackage
+> (`rexcode/x86/`), grouped by module. This is the reference the
+> cross-architecture design ([cross_arch_design.md](cross_arch_design.md))
+> is built against.
+
+The package is **table-driven**: a hand-written master encoding table
+(`ENCODING_TABLE`) is the single source of truth, from which the decode
+tables and the typed builder procedures are *generated*. The runtime is
+zero-allocation (caller owns every buffer) and the hot paths are fully
+inlined.
+
+```
+                       ENCODING_TABLE  (hand-written, source of truth)
+                              │
+              ┌───────────────┼────────────────┐
+        gen_decode_tables           gen_mnemonic_builders
+              │                              │
+       decoding_tables.odin          mnemonic_builders.odin
+       (decode() reads these)        (typed inst_*/emit_* helpers)
+```
+
+Pipeline at a glance:
+
+```
+[]Instruction ──encode()──▶ []u8 (+ []Relocation, []Error)
+        ▲                          │
+        │                          ▼
+     builders                  decode()
+        │                          │
+   inst_*/emit_*                   ▼
+                          []Instruction + []Instruction_Info + []Label_Definition
+                                   │
+                                   ▼
+                            print()/tprint()/… ──▶ text (+ []Token)
+```
+
+---
+
+## 1. Registers (`registers.odin`)
+
+### Core type
+
+```odin
+Register :: distinct u16   // bit layout: 0b_0000_CCCC_EEEN_NNNN
+//   NNNNN = hardware register number (0–31)
+//   E     = needs REX/VEX .B/.R/.X extension (hw >= 8)
+//   EE    = needs EVEX (hw 16–31)
+//   CCCC  = register class (high byte)
+```
+
+### Class constants (high byte)
+
+`REG_NONE`, `REG_GPR64`, `REG_GPR32`, `REG_GPR16`, `REG_GPR8`, `REG_GPR8H`
+(legacy AH/CH/DH/BH), `REG_XMM`, `REG_YMM`, `REG_ZMM`, `REG_K` (opmask),
+`REG_SEG`, `REG_CR` (control), `REG_DR` (debug), `REG_BND` (MPX), `REG_MM`
+(MMX), `REG_ST` (x87).
+
+### Sentinels
+
+`NONE :: Register(0xFFFF)`, `RIP :: Register(0xFFFE)`.
+
+### Typed register enums (compile-time safety, value == hardware number)
+
+`GPR64`, `GPR32`, `GPR16`, `GPR8`, `GPR8H` (`AH=4..BH=7`), `XMM`, `YMM`,
+`ZMM` (each 0–31), `KREG` (K0–K7), `SREG` (ES,CS,SS,DS,FS,GS), `MM`
+(MM0–7), `CREG` (CR0,2,3,4,8), `DREG` (DR0–3,6,7), `ST` (ST0–7), `BND`
+(BND0–3).
+
+### Named register constants
+
+Every register has a package-level constant: `RAX`…`R15`, `EAX`…`R15D`,
+`AX`…`R15W`, `AL`…`R15B`, `AH/CH/DH/BH`, `XMM0`…`XMM31`, `YMM0`…`YMM31`,
+`ZMM0`…`ZMM31`, `K0`…`K7`, `ES/CS/SS/DS/FS/GS`, `CR0/2/3/4/8`,
+`DR0/1/2/3/6/7`, `BND0`…`BND3`, `MM0`…`MM7`, `ST0`…`ST7`, plus `RIP`.
+
+### Utility functions (all branchless, `contextless`)
+
+| Proc | Signature | Purpose |
+|---|---|---|
+| `reg_hw` | `(Register) -> u8` | hardware number (low 5 bits) |
+| `reg_class` | `(Register) -> u16` | class (high byte) |
+| `reg_needs_rex` | `(Register) -> bool` | hw >= 8 |
+| `reg_needs_rex_ext` | `(Register) -> bool` | hw >= 8 and class < K |
+| `reg_needs_evex` | `(Register) -> bool` | hw >= 16 |
+| `reg_is_gpr` | `(Register) -> bool` | any GPR class |
+| `reg_is_vector` | `(Register) -> bool` | XMM/YMM/ZMM |
+| `reg_is_high_byte` | `(Register) -> bool` | AH/CH/DH/BH |
+| `reg_size` | `(Register) -> u16` | size in **bits** |
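+
+A minimal sketch of what the first few accessors could look like under
+the bit layout documented above. It is an assumption-laden stand-in, not
+the package source: the class constant value is invented for
+illustration, `reg_class` is assumed to return the class bits unshifted,
+and the `E` flag bits are ignored.
+
+```odin
+package reg_layout_sketch
+
+import "core:fmt"
+
+Register :: distinct u16
+
+// Assumed value for illustration only; the real REG_* constants live in
+// registers.odin and are not reproduced here.
+REG_GPR64_ASSUMED :: 0x0100
+
+// Low 5 bits: hardware register number.
+reg_hw :: proc "contextless" (r: Register) -> u8 {
+    return u8(r & 0x1F)
+}
+
+// Class bits, kept in place in the high byte (assumption: unshifted).
+reg_class :: proc "contextless" (r: Register) -> u16 {
+    return u16(r) & 0xFF00
+}
+
+// hw >= 8 needs a REX/VEX extension bit.
+reg_needs_rex :: proc "contextless" (r: Register) -> bool {
+    return reg_hw(r) >= 8
+}
+
+main :: proc() {
+    r9 := Register(REG_GPR64_ASSUMED | 9) // class bits + hw number only
+    fmt.println(reg_hw(r9), reg_class(r9) == REG_GPR64_ASSUMED, reg_needs_rex(r9))
+}
+```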
+
+### Register-from-number constructors
+
+`gpr64_from_num`, `gpr32_from_num`, `gpr16_from_num` `(u8) -> Register`;
+`gpr8_from_num(num: u8, has_rex: bool) -> Register` (handles AH↔SPL
+aliasing); `xmm_from_num`, `ymm_from_num`, `zmm_from_num`,
+`mm_from_num`. Each returns `NONE` if out of range. Pure casts, no table.
+
+---
+
+## 2. Operands (`operands.odin`)
+
+### Operand kind
+
+```odin
+Operand_Kind :: enum u8 { NONE, REGISTER, MEMORY, IMMEDIATE, RELATIVE }
+```
+
+### Memory operand (packed)
+
+```odin
+Memory :: bit_field u64 {
+    base_hw:            u8   | 5,
+    base_ext:           bool | 1,
+    index_hw:           u8   | 5,
+    index_ext:          bool | 1,
+    scale_enc:          u8   | 2,
+    displacement:       i32  | 32,
+    segment:            u8   | 3,
+    addr_size_override: bool | 1,
+    base_class:         u8   | 5,
+    index_class:        u8   | 5,
+}
+MEM_BASE_RIP   :: 30
+MEM_BASE_NONE  :: 31
+MEM_INDEX_NONE :: 31
+```
+
+**Constructor:** `mem_make(base, index: Register, scale: u8, displacement: i32, segment: Register) -> Memory`
+
+**Convenience constructors** (current names after the in-tree refactor):
+`mem_base_only(base)`, `mem_base_disp(base, disp)`,
+`mem_base_index(base, index, scale)`,
+`mem_base_index_disp(base, index, scale, disp)`, `mem_rip_disp(disp)`.
+
+> ⚠️ The README and `tests/test.odin` still use the *old* names
+> (`mem_base`, `mem_base_displacement`, `mem_base_index_displacement`,
+> `mem_rip_relative`). `mem_base` is now an **accessor**, not a
+> constructor. See the "Known drift" note at the end.
+
+**Accessors:** `mem_scale`, `mem_is_rip_relative`, `mem_has_base`,
+`mem_has_index` `(Memory) -> …`; `mem_base`, `mem_index` `(Memory) -> Register`.
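+
+To show how an address fits into the packed word, the following sketch
+copies the `Memory` layout from above and fills it by hand. The
+`sketch_*` helpers are hypothetical, and the assumption that `scale_enc`
+holds log2 of the scale (as the SIB byte does) is not confirmed by this
+document.
+
+```odin
+package main
+
+import "core:fmt"
+
+// Field layout copied from the bit_field above (60 of 64 bits used).
+Memory :: bit_field u64 {
+    base_hw:            u8   | 5,
+    base_ext:           bool | 1,
+    index_hw:           u8   | 5,
+    index_ext:          bool | 1,
+    scale_enc:          u8   | 2,
+    displacement:       i32  | 32,
+    segment:            u8   | 3,
+    addr_size_override: bool | 1,
+    base_class:         u8   | 5,
+    index_class:        u8   | 5,
+}
+MEM_INDEX_NONE :: 31
+
+#assert(size_of(Memory) == 8)
+
+// Assumption: scale 1/2/4/8 is stored as 0/1/2/3.
+sketch_scale_enc :: proc(scale: u8) -> u8 {
+    switch scale {
+    case 2: return 1
+    case 4: return 2
+    case 8: return 3
+    }
+    return 0
+}
+
+// Hypothetical helper; the library's own constructor is mem_make.
+sketch_mem :: proc(base_hw, index_hw, scale: u8, disp: i32) -> Memory {
+    m: Memory
+    m.base_hw      = base_hw
+    m.index_hw     = index_hw
+    m.scale_enc    = sketch_scale_enc(scale)
+    m.displacement = disp
+    return m
+}
+
+main :: proc() {
+    // [rbx + rcx*4 - 16]: rbx is hardware register 3, rcx is 1.
+    m := sketch_mem(3, 1, 4, -16)
+    assert(m.displacement == -16)
+    assert(m.scale_enc == 2)
+    // An absent index is marked with the sentinel rather than a separate flag.
+    no_index := sketch_mem(3, MEM_INDEX_NONE, 1, 0)
+    assert(no_index.index_hw == MEM_INDEX_NONE)
+    fmt.printf("%016x\n", transmute(u64)m)
+}
+```
+
+The sentinel values explain why the hardware-number fields are 5 bits
+wide: 30 and 31 are reserved for RIP and "none".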
+
+### The unified operand
+
+```odin
+Operand :: struct #packed {              // 16 bytes
+    using _: struct #raw_union {
+        reg:       Register,
+        mem:       Memory,
+        immediate: i64,
+        relative:  i64,      // offset or label id
+    },
+    kind:  Operand_Kind,
+    size:  u8,               // operand size in bytes (1,2,4,8,16,32,64)
+    flags: Operand_Flags,
+    _pad:  [4]u8,
+}
+
+Broadcast :: enum u8 { NONE, B1TO2, B1TO4, B1TO8, B1TO16 }   // EVEX
+
+Operand_Flags :: bit_field u16 {   // EVEX-specific
+    mask:      u8        | 3,   // opmask K1–K7
+    zeroing:   bool      | 1,   // merge vs zero masking
+    broadcast: Broadcast | 3,
+    er_sae:    u8        | 2,   // embedded rounding / SAE
+}
+```
+
+### Generic operand constructors
+
+`op_reg(r)`, `op_mem(m, size)`, `op_mem_from_parts(base, index, scale, disp, size)`,
+`op_imm8/16/32/64(v)`, `op_rel8/32(offset)`, `op_label(label_id, size=4)`.
+
+### Typed operand constructors (compile-time class safety)
+
+`op_gpr64`, `op_gpr32`, `op_gpr16`, `op_gpr8`, `op_gpr8h`, `op_xmm`,
+`op_ymm`, `op_zmm`, `op_kreg`, `op_sreg`, `op_mm`, `op_creg`, `op_dreg`,
+`op_st`, `op_bnd` — each takes the matching typed enum and returns an
+`Operand` (e.g. `op_gpr64(.XMM0)` is a *compile error*).
+
+---
+
+## 3. Instructions (`instructions.odin`)
+
+```odin
+Rep :: enum u8 { NONE, REP, REPNE }
+
+Instruction_Flags :: bit_field u8 {
+    lock: bool|1, rep: Rep|2, segment: u8|3, addr32: bool|1, data16: bool|1,
+}
+
+Instruction :: struct #packed {          // 72 bytes
+    ops:           [4]Operand,
+    mnemonic:      Mnemonic,
+    operand_count: u8,
+    flags:         Instruction_Flags,
+    length:        u8,        // filled by decoder
+    _pad:          [3]u8,
+}
+```
+
+### Generic instruction builders (`inst_*`, all `contextless`)
+
+| Builder | Shape |
+|---|---|
+| `inst_none(m)` | no operands |
+| `inst_r(m, r)` | one register |
+| `inst_m(m, mem, size)` | one memory |
+| `inst_i(m, imm, imm_size)` | one immediate |
+| `inst_rel(m, label_id, size=4)` | branch to label |
+| `inst_rel_offset(m, offset, size)` | branch to raw offset |
+| `inst_r_r(m, dst, src)` | reg, reg |
+| `inst_r_m(m, dst, src_mem, size)` | reg, mem |
+| `inst_m_r(m, dst_mem, size, src)` | mem, reg |
+| `inst_r_i(m, dst, imm, imm_size)` | reg, imm |
+| `inst_m_i(m, dst_mem, size, imm, imm_size)` | mem, imm |
+| `inst_r_r_r(m, dst, s1, s2)` | 3× reg (VEX/EVEX) |
+| `inst_r_r_m(m, dst, s1, m2, size)` | reg, reg, mem |
+| `inst_r_r_i(m, dst, src, imm, imm_size)` | reg, reg, imm |
+| `inst_r_m_i(m, dst, mem, msize, imm, isize)` | reg, mem, imm |
+| `inst_m_r_i(m, mem, msize, src, imm, isize)` | mem, reg, imm |
+| `inst_r_m_r(m, dst, m1, msize, s2)` | reg, mem, reg |
+| `inst_r_r_r_r(m, dst, s1, s2, s3)` | 4× reg |
+| `inst_r_r_r_i(m, dst, s1, s2, imm, isize)` | 3 reg + imm |
+| `inst_r_r_m_i(m, dst, s1, m2, msize, imm, isize)` | 2 reg + mem + imm |
+| `inst_r_r_m_r(m, dst, s1, m2, msize, s3)` | 2 reg + mem + reg |
+
+### Dynamic-array emitters (`emit_*`, in `encoder.odin`)
+
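+The emitters mirror the `inst_*` shapes, with the underscores between
+operand letters dropped (`inst_r_r` → `emit_rr`): `emit_none, emit_r,
+emit_rr, emit_ri, emit_rm, emit_mr, emit_m, emit_mi, emit_rel, emit_rrr,
+emit_rrm, emit_rri, emit_rrrr, emit_i, emit_rmi, emit_mri,
+emit_rel_offset`. This list covers 17 of the 21 shapes above; `inst_r_m_r`,
+`inst_r_r_r_i`, `inst_r_r_m_i`, and `inst_r_r_m_r` have no emitter listed.
+Each is `(instructions: ^[dynamic]Instruction, mnemonic, …)` and appends
+the built instruction.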
+
+---
+
+## 4. Mnemonics (`mnemonics.odin`, generated)
+
+```odin
+Mnemonic :: enum u16 { INVALID = 0, MOV, MOVABS, MOVZX, …, /* ~1176 total */ }
+```
+
+Grouped by family (data transfer, arithmetic, logical, …, SSE, AVX,
+AVX-512, BMI, FMA, AES, …). `INVALID = 0` is the sentinel.
+
+---
+
+## 5. Labels & references (`labels.odin`)
+
+Labels use a lightweight **array-index** model (`Label_Definition`) that
+`encode()`/`decode()` consume directly. The label-construction procedures
+live in `isa/labels.odin` and are parametric over the Instruction type, so
+they work for any arch without per-arch wrappers.
+
+### Array-index model (used by encode/decode)
+
+```odin
+Label_Definition :: distinct u32          // label_id -> instruction index, then byte offset
+LABEL_UNDEFINED  :: Label_Definition(0xFFFFFFFF)
+```
+`label(labels: ^[dynamic]Label_Definition, instructions: ^[dynamic]Instruction) -> u32`
+(define at current position), `label_forward(labels) -> u32` (reserve).
+
+### Named labels
+
+```odin
+Label_Map :: struct { labels: [dynamic]Label_Definition, names: map[string]u32 }
+```
+`label_map_init(^, allocator)`, `label_map_destroy(^)`,
+`label_named(^, name, instructions) -> u32`, `label_reserve(^, name) -> u32`,
+`label_set(^, name, instructions)`.
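+
+A rough sketch of how reserve-then-set could work on top of the
+array-index model. The types are copied from above; `sketch_reserve` and
+`sketch_set` are hypothetical and take a plain instruction count instead
+of the instruction array, so their behavior is an assumption about the
+real `label_reserve`/`label_set`.
+
+```odin
+package main
+
+import "core:fmt"
+
+Label_Definition :: distinct u32
+LABEL_UNDEFINED  :: Label_Definition(0xFFFFFFFF)
+Label_Map :: struct { labels: [dynamic]Label_Definition, names: map[string]u32 }
+
+// Reserve an id for a label whose position is not known yet.
+sketch_reserve :: proc(lm: ^Label_Map, name: string) -> u32 {
+    id := u32(len(lm.labels))
+    append(&lm.labels, LABEL_UNDEFINED)
+    lm.names[name] = id
+    return id
+}
+
+// Bind a reserved label to the current position (instruction count so far).
+sketch_set :: proc(lm: ^Label_Map, name: string, inst_count: int) {
+    if id, ok := lm.names[name]; ok {
+        lm.labels[id] = Label_Definition(inst_count)
+    }
+}
+
+main :: proc() {
+    lm: Label_Map
+    defer { delete(lm.labels); delete(lm.names) }
+    done := sketch_reserve(&lm, "done") // forward reference, target unknown
+    assert(lm.labels[done] == LABEL_UNDEFINED)
+    sketch_set(&lm, "done", 7)          // seven instructions emitted so far
+    assert(lm.labels[done] == 7)
+    fmt.println(done, lm.labels[done])
+}
+```
+
+The label id is just an index into `labels`, so a branch can capture it
+before the target position exists.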
+
+---
+
+## 6. Encoding types (`encoding_types.odin`)
+
+These describe **how** an instruction is encoded; they are the schema of
+`ENCODING_TABLE` and are shared by encoder and decoder.
+
+```odin
+Operand_Type :: enum u8 {            // ~70 values
+    NONE, R8,R16,R32,R64, RM8,RM16,RM32,RM64, M,M8..M512,
+    IMM8,IMM16,IMM32,IMM64, IMM8SX, REL8,REL32,
+    AL_IMPL,AX_IMPL,EAX_IMPL,RAX_IMPL,CL_IMPL,DX_IMPL,ONE_IMPL,
+    SREG, CR, DR, XMM,YMM,ZMM, XMM_M32,XMM_M64,XMM_M128,YMM_M256,ZMM_M512,
+    MM,MM_M64, ST0_IMPL,STI, XMM0_IMPL, K,K_M8..K_M64,
+    MOFFS8..MOFFS64, PTR16_16,PTR16_32,PTR16_64, M16_16,M16_32,M16_64,
+}
+
+Operand_Encoding :: enum u8 {        // where an operand's bits go
+    NONE, MR, REG, VVVV, OP_R, IB,IW,ID,IQ, IMPL, IS4, AAA,
+}
+
+Escape   :: enum u8 { NONE, _0F, _0F38, _0F3A }
+VEX_Type :: enum u8 { NONE, VEX, EVEX, XOP }
+VEX_W    :: enum u8 { WIG, W0, W1 }
+VEX_L    :: enum u8 { LIG, L0, L1, L2 }
+
+Encoding_Flags :: bit_field u16 {
+    esc: Escape|2, prefix: u8|2, vex_type: VEX_Type|2, vex_w: VEX_W|2,
+    vex_l: VEX_L|2, default_64: bool|1, force_rex_w: bool|1, no_rex: bool|1,
+    lock_ok: bool|1, rep_ok: bool|1, modrm_reg_ext: bool|1,
+}
+
+Encoding :: struct #packed {         // 14 bytes — one encoding form
+    mnemonic: Mnemonic, ops: [4]Operand_Type, enc: [4]Operand_Encoding,
+    opcode: u8, ext: u8, flags: Encoding_Flags,
+}
+PREFIX_66 :: 1
+PREFIX_F3 :: 2
+PREFIX_F2 :: 3
+```
+Helper: `encoding_flags(esc=…, prefix=…, …) -> Encoding_Flags`.
+
+### Shared status / interop types
+
+```odin
+Relocation_Type :: enum u8 { NONE, REL8, REL32, ABS32, ABS64 }
+Relocation :: struct #packed {       // 16 bytes (ELF-rela-like)
+    offset: u32, label_id: u32, addend: i32,
+    type: Relocation_Type, size: u8, inst_idx: u16,
+}
+
+Error_Code :: enum u8 {
+    NONE,
+    // encode
+    INVALID_MNEMONIC, NO_MATCHING_ENCODING, OPERAND_MISMATCH,
+    IMMEDIATE_OUT_OF_RANGE, BUFFER_OVERFLOW, LABEL_OUT_OF_RANGE,
+    INVALID_OPERAND_COUNT,
+    // decode
+    BUFFER_TOO_SHORT, INVALID_OPCODE, INVALID_MODRM, INVALID_SIB,
+    INVALID_PREFIX, INVALID_VEX, INVALID_EVEX, TOO_MANY_PREFIXES,
+}
+Error  :: struct #packed { inst_idx: u32, code: Error_Code, _pad: [3]u8 }   // 8 bytes
+Result :: struct { byte_count: u32, success: bool }
+```
+Helper: `op_type_to_size(Operand_Type) -> u8`.
+
+---
+
+## 7. Encoder (`encoder.odin`)
+
+```odin
+MAX_INST_SIZE :: 15
+
+encode :: proc(
+    instructions: []Instruction,
+    label_defs:   []Label_Definition,  // in: inst index; MODIFIED to byte offsets
+    code:         []u8,                 // output machine code
+    relocs:       ^[dynamic]Relocation, // unresolved relocations appended
+    errors:       ^[dynamic]Error,
+    resolve:      bool = true,          // patch resolvable relocs in place
+    base_address: u64  = 0,             // for ABS relocations
+) -> Result
+```
+
+Two-pass: (1) encode each instruction into `code`, recording byte offsets
+and emitting pending relocations; (1.5) rewrite `label_defs` from
+instruction indices to byte offsets; (2) resolve relocations, appending
+the unresolvable ones to `relocs`. `encode` keeps no global or shared
+state and touches only its arguments (`label_defs` is rewritten in place),
+so independent calls are trivially parallelizable.
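+
+The pass structure above can be sketched on a toy instruction stream.
+`Toy_Inst`, `Pending`, and `sketch_encode` are hypothetical; real
+instruction bytes are omitted and only the rel32 field is written, on the
+assumption that it occupies the last 4 bytes of a branch.
+
+```odin
+package main
+
+import "core:fmt"
+
+Label_Definition :: distinct u32
+
+// Toy instruction: a fixed length plus an optional rel32 branch to a label.
+Toy_Inst :: struct { length: u32, has_branch: bool, label_id: u32 }
+
+// Fix-up recorded in pass 1: where the rel32 field sits and what it targets.
+Pending :: struct { field_offset: u32, label_id: u32 }
+
+sketch_encode :: proc(insts: []Toy_Inst, label_defs: []Label_Definition, code: []u8) -> u32 {
+    inst_offsets := make([]u32, len(insts) + 1)
+    defer delete(inst_offsets)
+    pending: [dynamic]Pending
+    defer delete(pending)
+
+    // Pass 1: lay out instructions, remembering each one's byte offset.
+    pos: u32 = 0
+    for inst, i in insts {
+        inst_offsets[i] = pos
+        if inst.has_branch {
+            append(&pending, Pending{pos + inst.length - 4, inst.label_id})
+        }
+        pos += inst.length
+    }
+    inst_offsets[len(insts)] = pos // a label may sit after the last instruction
+
+    // Pass 1.5: label_defs go from instruction indices to byte offsets, in place.
+    for &def in label_defs {
+        def = Label_Definition(inst_offsets[u32(def)])
+    }
+
+    // Pass 2: rel32 = target - address of the next instruction, little-endian.
+    for p in pending {
+        rel := i32(i64(label_defs[p.label_id]) - i64(p.field_offset + 4))
+        for b in 0..<4 {
+            code[p.field_offset + u32(b)] = u8(u32(rel) >> (8 * uint(b)))
+        }
+    }
+    return pos
+}
+
+main :: proc() {
+    // inst 0: 5-byte forward branch to label 0; inst 1: 3 bytes.
+    insts := [2]Toy_Inst{{5, true, 0}, {3, false, 0}}
+    label_defs := [1]Label_Definition{2} // label 0 = instruction index 2 (the end)
+    code := make([]u8, 15 * len(insts))  // worst case: MAX_INST_SIZE each
+    defer delete(code)
+
+    n := sketch_encode(insts[:], label_defs[:], code)
+    assert(n == 8)
+    assert(label_defs[0] == 8) // index 2 became byte offset 8
+    assert(code[1] == 3)       // 8 - (1 + 4) = 3
+    fmt.println(n, label_defs, code[:n])
+}
+```
+
+Forward references need no special case here, because no displacement is
+computed until every label has a byte offset.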
+
+Buffer-sizing helpers: `encode_max_code_size(n) -> int` (`n*15`),
+`encode_max_relocation_count(n) -> int` (`n`).
+
+Internal matcher (file-local, inlined): `encoding_matches_inline`,
+`operand_matches_inline`, `reg_matches_inline`, `mem_matches_inline`,
+`imm_matches_inline`, `implicit_operand_matches`, `is_implicit_op_inline`,
+`get_user_op_inline`.
+
+---
+
+## 8. Decoder (`decoder.odin`)
+
+```odin
+Instruction_Info :: struct {     // parallel metadata, one per decoded inst
+    offset: u32,
+    rex: u8, has_lock: bool, rep: Rep, segment: Register,
+    vex_type: VEX_Type, vex_l: VEX_L, vex_w: VEX_W,
+    evex_b: bool, evex_z: bool, opmask: u8,
+}
+
+decode :: proc(
+    data:         []u8,
+    relocs:       []Relocation,             // optional in: label IDs to reuse
+    instructions: ^[dynamic]Instruction,    // out
+    inst_info:    ^[dynamic]Instruction_Info, // out (parallel)
+    label_defs:   ^[dynamic]Label_Definition, // out: inferred branch labels
+    errors:       ^[dynamic]Error,
+) -> Result
+```
+
+Two-pass: (1) decode each instruction (prefixes → opcode → operands),
+collecting branch targets; (2) infer labels for in-region branch targets,
+reusing IDs from `relocs` when available.
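+
+The label-inference pass can be sketched as follows, leaving out the
+reuse of IDs from `relocs`. `sketch_infer_labels` is a hypothetical
+helper: it takes the branch targets collected in pass 1 as byte offsets
+and assumes that targets outside the decoded region simply get no label.
+
+```odin
+package main
+
+import "core:fmt"
+
+Label_Definition :: distinct u32
+
+// Turn branch targets into label ids. Targets outside [0, region_len)
+// get id -1; branches to the same target share one id.
+sketch_infer_labels :: proc(
+    targets:    []u32,
+    region_len: u32,
+    label_defs: ^[dynamic]Label_Definition,
+    ids:        []i32,
+) {
+    for target, i in targets {
+        ids[i] = -1
+        if target >= region_len {
+            continue
+        }
+        // Reuse an existing label for this offset if there is one.
+        for def, id in label_defs^ {
+            if u32(def) == target {
+                ids[i] = i32(id)
+                break
+            }
+        }
+        if ids[i] < 0 {
+            ids[i] = i32(len(label_defs^))
+            append(label_defs, Label_Definition(target))
+        }
+    }
+}
+
+main :: proc() {
+    label_defs: [dynamic]Label_Definition
+    defer delete(label_defs)
+
+    // Two branches to 0x10, one to 0x04, one leaving a 0x20-byte region.
+    targets := [4]u32{0x10, 0x04, 0x10, 0x400}
+    ids: [4]i32
+    sketch_infer_labels(targets[:], 0x20, &label_defs, ids[:])
+
+    assert(ids == [4]i32{0, 1, 0, -1})
+    assert(len(label_defs) == 2)
+    fmt.println(ids, label_defs)
+}
+```
+
+The linear search keeps the sketch short; a map from offset to id would
+be the obvious alternative for large regions.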
+
+`Decoder_State` (file-internal) holds prefix/VEX/EVEX decode state. The
+decoder relies on the generated tables in §10. Mostly file-internal procs:
+`decode_prefixes`, `decode_vex2/3`, `decode_evex`, `decode_opcode(_vex)`,
+`decode_operands(_vex)`, `decode_single_operand(_vex)`,
+`decode_memory_operand`, `decode_register`, `decode_implicit_operand`.
+
+---
+
+## 9. Printer (`printer.odin`)
+
+Modified Intel syntax: size suffix on the mnemonic (`.b .w .d .q .x .y
+.z`) instead of `PTR`, clean `[base + index*scale + disp]` memory.
+
+```odin
+Token_Kind :: enum u8 { WHITESPACE, NEWLINE, LABEL_DEF, LABEL_REF, OFFSET,
+    MNEMONIC, REGISTER, IMMEDIATE, MEMORY_BRACKET, MEMORY_OPERATOR,
+    MEMORY_DISP, MEMORY_SCALE, PUNCTUATION, COMMENT }
+
+Token :: struct { offset: u32, length: u16, kind: Token_Kind, instruction_index: u16 }
+
+Print_Options :: struct {
+    uppercase: bool, hex_prefix: string, hex_lowercase: bool,
+    label_prefix: string, show_offsets: bool, indent: string,
+    separator: string, space_after_comma: bool,
+}
+DEFAULT_PRINT_OPTIONS :: Print_Options{ … }
+
+Print_Result :: struct { text: string, tokens: []Token }
+```
+
+Helpers: `mnemonic_to_string(m, lowercase) -> string`,
+`register_name(r, lowercase) -> string`, `token_kind_to_string`,
+`size_to_suffix(size) -> u8`.
+
+### Output variants
+
+All variants share the same trailing parameter set
+`tokens=nil, options=nil, label_names=nil`.
+
+| Family | Sink |
+|---|---|
+| `sbprint` / `sbprintln` | into a `^strings.Builder` |
+| `print` / `println` | stdout |
+| `aprint` / `aprintln` | newly allocated string (`allocator` param) |
+| `tprint` / `tprintln` | temp-allocator string |
+| `bprint` / `bprintln` | caller `[]u8` buffer |
+| `fprint` / `fprintln` | `^os.File` |
+| `wprint` / `wprintln` | `io.Writer` |
+
+All take `(instructions: []Instruction, inst_info: []Instruction_Info,
+label_defs: []Label_Definition, …)`.
+
+---
+
+## 10. Generated tables & builders
+
+### `encoding_table.odin` (hand-written master)
+
+```odin
+ENCODING_TABLE: [Mnemonic][]Encoding = { .MOV = { …forms… }, … }
+```
+The single source of truth. `encode()` does `ENCODING_TABLE[mnemonic]`
+(O(1)) then linear-scans the forms via `encoding_matches_inline`.
+
+### `decoding_tables.odin` (generated from `ENCODING_TABLE`)
+
+```odin
+ModRM_Info :: struct #packed { mod, reg, rm: u8, has_sib: bool, disp_size: u8 }
+SIB_Info   :: struct #packed { /* scale, index, base */ }
+Decode_Entry     :: struct { esc: Escape, prefix, opcode, ext: u8,
+                             mnemonic: Mnemonic, ops: [4]Operand_Type,
+                             enc: [4]Operand_Encoding, flags: Encoding_Flags }
+VEX_Decode_Entry :: struct { …Decode_Entry fields + vex_w: VEX_W, vex_l: VEX_L }
+Decode_Index     :: struct { start: u16, count: u8 }   // range into entries
+
+MODRM_TABLE[256], SIB_TABLE[256]
+LEGACY_DECODE_ENTRIES[1266], VEX_DECODE_ENTRIES[667], EVEX_DECODE_ENTRIES[418]
+DECODE_INDEX_LEGACY[4][256], DECODE_INDEX_ESC_0F/_0F38/_0F3A[4][256]
+VEX_INDEX_0F/_0F38/_0F3A[4][256], EVEX_INDEX_0F/_0F38/_0F3A[4][256]
+```
+`[prefix][opcode] -> Decode_Index` gives O(1) opcode resolution; the
+small `count` range is scanned for ModR/M-ext, operand-size, or VEX.W/L
+disambiguation.
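+
+A toy version of the index-then-scan lookup, restricted to ModR/M-ext
+disambiguation. `Toy_Entry`, `ENTRIES`, `INDEX`, and `sketch_lookup` are
+hypothetical and far smaller than the generated tables; the two entries
+use the real `F7 /2` (NOT) and `F7 /3` (NEG) encodings.
+
+```odin
+package main
+
+import "core:fmt"
+
+Mnemonic :: enum u16 { INVALID, NOT, NEG }
+
+Toy_Entry :: struct {
+    mnemonic: Mnemonic,
+    ext:      u8, // required ModR/M.reg value
+}
+Decode_Index :: struct { start: u16, count: u8 }
+
+ENTRIES := [?]Toy_Entry{
+    {.NOT, 2}, // F7 /2
+    {.NEG, 3}, // F7 /3
+}
+
+// [prefix][opcode] -> range into ENTRIES; zero-initialized means "no entries".
+INDEX: [4][256]Decode_Index
+
+sketch_lookup :: proc(prefix, opcode, modrm: u8) -> Mnemonic {
+    idx := INDEX[prefix][opcode] // O(1) opcode resolution
+    reg := (modrm >> 3) & 7      // ModR/M.reg selects the form
+    for e in ENTRIES[idx.start:][:idx.count] {
+        if e.ext == reg {
+            return e.mnemonic
+        }
+    }
+    return .INVALID
+}
+
+main :: proc() {
+    INDEX[0][0xF7] = Decode_Index{start = 0, count = 2}
+
+    assert(sketch_lookup(0, 0xF7, 0xD8) == .NEG)     // ModR/M 0xD8: reg = 3
+    assert(sketch_lookup(0, 0xF7, 0xD0) == .NOT)     // ModR/M 0xD0: reg = 2
+    assert(sketch_lookup(0, 0x90, 0x00) == .INVALID) // empty range
+    fmt.println(sketch_lookup(0, 0xF7, 0xD8))
+}
+```
+
+The scan touches at most `count` entries, so the cost stays bounded by
+the number of forms sharing one opcode byte.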
+
+### `mnemonic_builders.odin` (generated, ~7,477 procs + ~2,338 overload groups)
+
+Typed memory wrappers `Mem8 … Mem512` (distinct structs over `Memory`)
+with constructors `mem8 … mem512`. Per-form typed procs like
+`inst_mov_r64_r64(dst: GPR64, src: GPR64) -> Instruction`, each grouped
+into an overload set:
+
+```odin
+inst_mov :: proc{ inst_mov_r8_r8, inst_mov_r64_r64, inst_mov_r64_imm64, … }
+emit_mov :: proc{ emit_mov_r8_r8, … }
+```
+So `x86.inst_mov(.RAX, .RBX)` resolves the right encoding at compile time
+with full type checking, no runtime dispatch.
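+
+The overload mechanism can be shown in isolation. The enums and
+`sketch_mov*` procs below are hypothetical miniatures that return a tag
+instead of building an `Instruction`, and the call sites spell out the
+enum type rather than rely on implicit selectors.
+
+```odin
+package main
+
+import "core:fmt"
+
+GPR64 :: enum u8 { RAX, RCX, RDX, RBX }
+XMM   :: enum u8 { XMM0, XMM1 }
+
+// Stand-in for the encoding form each typed proc would select.
+Form :: enum u8 { MOV_R64_R64, MOV_R64_IMM64 }
+
+sketch_mov_r64_r64   :: proc(dst: GPR64, src: GPR64) -> Form { return .MOV_R64_R64 }
+sketch_mov_r64_imm64 :: proc(dst: GPR64, imm: i64) -> Form { return .MOV_R64_IMM64 }
+
+// Explicit overload set: the compiler picks a member by argument types.
+sketch_mov :: proc{ sketch_mov_r64_r64, sketch_mov_r64_imm64 }
+
+main :: proc() {
+    assert(sketch_mov(GPR64.RAX, GPR64.RBX) == .MOV_R64_R64)
+    assert(sketch_mov(GPR64.RAX, i64(42)) == .MOV_R64_IMM64)
+    // sketch_mov(GPR64.RAX, XMM.XMM0) // rejected at compile time: no match
+    fmt.println(sketch_mov(GPR64.RAX, GPR64.RBX))
+}
+```
+
+Each register class being its own enum is what turns a wrong-class
+operand into a failed overload match rather than a runtime error.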
+
+---
+
+## 11. Tools (`x86/tools/`)
+
+| File | Package | Role |
+|---|---|---|
+| `gen_decode_tables.odin` | `main` (`-file`) | walk `ENCODING_TABLE` → emit `decoding_tables.odin` |
+| `gen_mnemonic_builders.odin` | `main` (`-file`) | walk `ENCODING_TABLE` → emit `mnemonic_builders.odin` |
+| `verify_tables.odin` | `main`, imports `x86 "../"` | check decode tables consistent with `ENCODING_TABLE` |
+
+Tests live in `x86/tests/test.odin` (`package x86_tests`, `import x86 "../"`),
+run with `odin run x86/tests`.
+
+---
+
+## Known drift (pre-existing, not from the move)
+
+The working tree had uncommitted edits to `operands.odin`/`printer.odin`
+that **renamed the memory constructors** but did not update callers:
+
+- `mem_base_displacement` → `mem_base_disp`
+- `mem_base_index_displacement` → `mem_base_index_disp`
+- `mem_rip_relative` → `mem_rip_disp`
+- `mem_base` repurposed from *constructor* to *accessor*
+
+Result: the library compiles, but `tests/test.odin` (and the README
+examples) reference the old names and currently fail to type-check.
+Fixing requires either restoring the old constructor names or sweeping
+the tests/README to the new ones — a deliberate decision left to you.