The 5 Branch Future mnemonics (BF/BFI_BR/BFL/BFLX/BFCSEL) are left
enum-only on purpose: deprecated ARMv8.1-M, not disassemblable by
llvm-objdump (so unverifiable), and a correct encoder needs dual-offset
PC-relative relocation infrastructure that doesn't exist. Noted in the
enum for future readers.
Implement VMLSV/VMLSVA (MVE multiply-subtract reduce) properly: new
VN_Q_MVE (Qn at 19:17) and VM_Q_MVE (Qm at 3:1) encodings -- the actual
3-bit MVE Q fields -- with Rd at 15:12 (RDLO_A32). The earlier collision
was from reusing the 4-bit VN_Q (19:16) and RD_T32 (11:8), which place
the fields wrong; byte-exact vs llvm-mc now with distinct Qn/Qm/Rd.
Drop three placeholder/redundant enum entries: VRINT and VPRINT (not real
instructions -- llvm rejects bare 'vrint'; VPRINT is a printf-like debug
pseudo-op), and VRSHL_MVE (the author's own comment marks it a
placeholder; 'vrshl q,q,q' already decodes via VRSHL's MVE form). 600
tests green, verify matches llvm-mc.
New MVE_ROT_HCADD (#90/#270 at bit12) and MVE_ROT_CMLA (#0/90/180/270 at
bits 24:23) rotation encodings -- the rotation degrees round-trip
properly (unlike the existing FCMA VCMLA which leaves it unencoded). One
form each with the element-size bits left variable (MVE convention).
Verify round-trips; all rotations byte-exact vs llvm-mc; 600 tests green.
(VMLSV/VMLSVA reduce ops deferred: their format decode-collides with
other MVE encodings given the 4-bit VN_Q vs MVE's 3-bit Qn.)
New VMOV_LANE_8/16/32 encodings: Dd at bits 19:16+bit7, lane bits per
element size (.8 = bit21:bit6:bit5 with bit22 size marker; .16 =
bit21:bit6 with bit5 marker; .32 = bit21). Verify round-trips all three
sizes; spot-checked .8 byte-exact incl. max lane; 600 tests green.
New NEON_VM_SCALAR16/32 encodings for the Dm[lane] scalar operand: .16
places Dm in D0..D7 (bits 2:0) with the lane split bit5:bit3, .32 places
Dm in D0..D15 (bits 3:0) with the lane at bit5. VQDMULH_LANE and
VQRDMULH_LANE across .s16/.s32, D and Q destinations (8 forms). Verify
round-trips; spot-checked byte-exact incl. max register/lane and
decode-clean; 600 tests green.
LDRT/LDRBT/STRT/STRBT (imm12) and LDRHT/STRHT/LDRSBT/LDRSHT (imm8 split):
each is the corresponding post-indexed load/store with the W bit (21)
set. Hand-written, reusing the existing MEM_POST_INDEX encoding. All 8
byte-exact vs llvm-mc and decode-clean; 600 tests green.
New arm32 specgen (llvm-mc --triple=armv8a --mattr=+neon as the bits
oracle, empirical masks): VADDL/VSUBL/VABAL/VABDL (Qd,Dn,Dm) and
VADDW/VSUBW (Qd,Qn,Dm) across s/u 8/16/32; the compare aliases
VCLE/VCLT (= VCGE/VCGT with Vn/Vm swapped) and VACLE/VACLT (= VACGE/VACGT
swapped, f32); and VQRSHL shift-by-vector. 84 forms over 11 mnemonics.
Built-in llvm round-trip verify passes; spot-checked byte-exact with
distinct Q/D registers; 600 tests green.
Every mnemonic with an encode form now has a generated inst_<mnem>/emit_<mnem> overload group. The per-arch generators map ALL operand types — nothing is skipped: arm64 gains shifted/extended registers (multi-param via op_shifted/op_extended), SVE Z-regs + predicates, SME tile/slice, NEON arrangements/lanes, bitmask/sysreg/pattern immediates and condition codes (427 -> 777 mnemonics); arm32 gains shifted/register-shifted regs, register lists, NEON lanes and all encoded-immediate subclasses (479 -> 592); x86 gains m80 and descriptor-table memory operands — FBLD/FBSTP, LGDT/SGDT/LIDT/SIDT, FLD/FSTP, far-indirect JMP/CALL, BOUND (1167 -> 1175).
Mnemonic-specific builders are now fully generated, not hand-written: deleted the hand-written helpers the generated groups collided with — riscv inst_jal/inst_jalr, arm64 inst_b_cond/inst_cbz/inst_tbz/inst_csel, mos6502 inst_tst — and let the generators own those names (arm64 also gains inst_cbnz/tbnz/csinc/csinv/csneg). Updated the affected test call-sites. The generic operand-shape helpers (inst_r_r, inst_r_r_i, inst_ldst, ...) remain as delegation targets.
Decode-only mnemonics with no encode form are correctly left without builders. ppc/ppc_vle/rsp/mos65816 were already complete.
All 10 ISAs: structure + compile + tests pass; generators idempotent.
Add generated mnemonic_builders.odin (inst_<mnem>/emit_<mnem> typed overload sets) for arm32, arm64, mips, riscv, ppc, ppc_vle, rsp, mos6502 and mos65816, matching the existing x86 builders. Each is produced by a per-arch tools/gen_mnemonic_builders.odin that walks ENCODE_FORMS and maps operand types to typed params + op_* constructors.
Anchor every generator's output via #directory so regeneration is CWD-independent; previously the bare "mnemonic_builders.odin" path wrote to the current directory and misfired when run from the repo root.
Wire a --builders task into build.lua (folded into 'all', covered by --idempotent, enforced by the structural invariants) and document it in the README.
Each ISA's hand-written ENCODING_TABLE (the single source of truth) now lives
in a per-arch tablegen/ metaprogram that flattens it and serializes committed
binary blobs; the library #loads those into @(rodata) at compile time rather
than compiling a table body. No arch keeps encoding_table.odin or
decoding_tables.odin -- only a generated tables.odin loader and tables/*.bin.
* Two-stage, type-checked pipeline: tablegen Stage A emits human-readable
generated Odin, which compiles and serializes the blobs in Stage B.
* encode() goes through encoding_forms(m); decoders are unchanged apart from
x86's flattened 2-D index. Decode tables are byte-identical to the old ones.
* build.lua: a LuaJIT driver for the metaprograms, validations, and tests,
with cross-platform gating and a clear report.
* Docs refreshed; the obsolete forward-looking plan in cross_arch_design.md
trimmed to what was actually built.
* Attribution headers added to all rexcode source files; the generators emit
them so generated files keep them.