Commit Graph

7850 Commits

Author SHA1 Message Date
Brendan Punsky
078015bc34 rexcode/x86: pre-matched encode hint + repair the typed builders
Targeted branchless revert + the pre-matched form fast path, and a fix
for a pre-existing bug the latter surfaced.

(a) Revert the two speculative-write spots from the prior branchless pass
    (legacy-prefix emission; widened displacement store + ENCODE_TAIL_SLACK)
    back to predicted branches. In real streams a legacy prefix is almost
    always absent and disp size is stable, so those branches are ~free and
    the unconditional stores only added work. Every class got faster
    (RET 19->17.5, MOV r,r 52->46.6, VADDPS 42.8->39.3 ns).

(b) Pre-matched form hint. Instruction.enc_hint (in the existing 11-byte
    padding, idx+1 biased; 0 = matcher path) lets a typed builder that maps
    to a single value-independent form bake the global form index, so
    encode() skips the O(forms) match scan -- and, in a varied stream, its
    unpredictable branches. Generated for non-immediate forms only (value-
    dependent imm8/imm32 selection stays on the matcher). On a 100k mixed
    typed-builder stream: 47.3 -> 30.2 ns/inst (-36%), byte-identical to the
    matcher path -- ~2x the original baseline for codegen.
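
The biased hint in (b) can be sketched roughly as follows. This is a minimal Python illustration, not the Odin implementation: the form table, the Instruction fields other than enc_hint, and the helper names match_form, select_form and typed_mov_r64_r64 are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical form table; the real one is generated and far larger.
FORMS = [
    ("RET", ()),
    ("MOV", ("r64", "r64")),
    ("ADD", ("r64", "r64")),
]


@dataclass
class Instruction:
    mnemonic: str
    operand_kinds: tuple
    enc_hint: int = 0  # 0 = no hint (matcher path); otherwise form index + 1


def match_form(inst):
    """Slow path: linear scan over every form (the O(forms) match)."""
    for idx, (mnemonic, kinds) in enumerate(FORMS):
        if mnemonic == inst.mnemonic and kinds == inst.operand_kinds:
            return idx
    return -1


def select_form(inst):
    """Use the baked index when present, otherwise fall back to the scan."""
    if inst.enc_hint != 0:
        return inst.enc_hint - 1
    return match_form(inst)


def typed_mov_r64_r64():
    """A typed builder that maps to exactly one form bakes index 1, biased by +1."""
    return Instruction("MOV", ("r64", "r64"), enc_hint=1 + 1)
```

Biasing by one keeps a zero-initialised Instruction on the matcher path, so builders that never set the field behave as before.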

Repair the typed inst_/emit_ builders. They were non-functional: the
generator cast the hw-only typed enum straight to Register
(Register(GPR64.RAX) -> class 0), so every typed-builder operand was
rejected by the matcher (encode returned empty). Untested because the
suite builds via the generic constructors. Now they build through the
class-correct op_gpr64/op_xmm/... path (op_* already used by 3+ operand
builders), emit_ reuses inst_, and a new 30-case consistency suite
asserts typed == generic (llvm-verified) and hint == matcher.
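
The cast bug described above can be illustrated with a small Python sketch. The bit layout (class above an 8-bit hardware number), the RegClass values and the helper names are assumptions made for the example; only the shape of the failure (a hw-only enum value read as a Register with class 0) comes from the message.

```python
from enum import IntEnum

CLASS_SHIFT = 8  # hypothetical: register class stored above the hardware number


class RegClass(IntEnum):
    NONE = 0
    GPR64 = 1
    XMM = 2


class GPR64(IntEnum):
    # Hardware numbers only, no class bits.
    RAX = 0
    RCX = 1


def register_class(reg):
    """Read the class bits back out of a packed register value."""
    return RegClass(reg >> CLASS_SHIFT)


def op_gpr64(hw):
    """Class-correct construction: tag the hardware number with its class."""
    return (int(RegClass.GPR64) << CLASS_SHIFT) | int(hw)


broken = int(GPR64.RAX)        # direct cast: class bits are all zero
fixed = op_gpr64(GPR64.RAX)    # class-correct path
```

With this layout the direct cast yields class NONE, which a class-checking matcher would reject, while the op_gpr64 path carries the class along.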

gen/builders/check/test/idempotent all green; 2276 cases.
2026-06-18 21:04:18 -04:00
Brendan Punsky
8387731357 rexcode/x86: branchless hot paths + single-pass operand resolution
Three layers on the x86 encode/decode hot paths, all byte-exact (2246
LLVM-verified cases) and roundtrip-clean:

1. Branchless: legacy-prefix emission (speculative write + conditional
   advance), REX/VEX/EVEX extension-bit accumulation (gate-and-mask),
   ModRM mod/disp-size selection (cmov selects), displacement emission
   (widened store + ENCODE_TAIL_SLACK); decoder REX/VEX/EVEX register
   extensions (arithmetic instead of if/+=8).

2. Resolve-operands-once: the previous code re-derived each user operand
   ~5-10x per instruction (a fresh O(n) scan of enc.ops per emission
   pass). Now resolved into a [4]^Operand map a single time.

3. Single-pass gather: fold the opcode-+rb and ModR/M slot-detection
   scans into that one resolve pass (3 enc.enc passes -> 1).
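
Two of the branchless shapes from item 1 can be sketched as below. Python cannot show the machine-level effect, so this only illustrates the structure; the function names are hypothetical, and the speculative store assumes the buffer has spare room past the write position.

```python
def emit_prefix_branchy(buf, pos, prefix, has_prefix):
    """Conventional form: store and advance only when the prefix is wanted."""
    if has_prefix:
        buf[pos] = prefix
        pos += 1
    return pos


def emit_prefix_speculative(buf, pos, prefix, has_prefix):
    """Always store the byte, then advance only when it is wanted.
    If the prefix was not needed, the next write overwrites the slot."""
    buf[pos] = prefix
    return pos + int(has_prefix)


def extend_reg_branchy(low3, ext_bit):
    """Decoder register extension written as if / += 8."""
    reg = low3
    if ext_bit:
        reg += 8
    return reg


def extend_reg_arith(low3, ext_bit):
    """Same result with the extension bit folded in arithmetically."""
    return low3 | (ext_bit << 3)
```

Both pairs produce the same bytes and register numbers; only the control flow differs.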

Net on a 100k mixed-instruction benchmark: encode ~58 -> ~54 ns/inst
(best 52). Branchless alone was a ~7% encode regression (predicted
branches, nothing to recover); the algorithmic passes recovered it and
beat baseline.
2026-06-18 20:16:26 -04:00
Brendan Punsky
daa5b7cb79 rexcode: add core:rexcode/ir — the IR API layer (no concrete IR yet)
A sibling to core:rexcode/isa for the intermediate representations (WASM,
SPIR-V, LLVM bitcode + the LLVM dialects AIR/DXIL). Holds the shared
vocabulary every IR package builds on, implements no specific IR.

Design stance (see docs/ir_design.md): keep the ISA layer's spirit, but
where IRs are structurally MORE uniform than ISAs (SSA + a type system
regularize the operand/module shape), the shared core is richer. ir/ owns:

  status.odin  Error/Error_Code (shape-identical to isa.Error)
  refs.odin    Id/Ref/Ref_Space/Symbol_Table (the label analog: structural
               id references, not PC-relative byte offsets)
  types.odin   Type/Type_Ref/Type_Kind (the type table -- no ISA analog)
  module.odin  Module/Function/Block/Operation/Operand/Result/Dataflow
               (the structured model; Operation = isa.Instruction + an
               optional typed Result, opcode a u16 like Mnemonic)
  print.odin   token kinds + options + num-fmt (parallels isa.print)

Three honest concessions vs the ISA API, made explicit not inert: a
structured Module replaces the flat []Instruction; a first-class type
system; id-based entity refs replace labels. The encode/decode verbs take
a Module and drop label_defs/resolve/base_address. Dataflow hosts both the
WASM value stack and SSA; the codec is pluggable (table for WASM/SPIR-V,
bitstream for the LLVM family -- AIR/DXIL are LLVM dialects, not peers).

Package compiles; a hand-built SSA module round-trips through the types.
2026-06-18 19:03:27 -04:00
Brendan Punsky
95df04fbe1 rexcode: re-house ISA packages under core:rexcode/isa/<arch>
Move all ten ISA packages (x86, arm32, arm64, mips, riscv, ppc, ppc_vle,
rsp, mos6502, mos65816) from core/rexcode/<arch> to core/rexcode/isa/<arch>,
so the import pattern is now `import "core:rexcode/isa/x86"`. The shared
core stays at core:rexcode/isa.

Mechanical: relative `import "../isa"` / "../../isa" -> absolute
"core:rexcode/isa" (the only path that survives the move; the "../" and
"../.." self/generated imports move with their packages). build.lua now
builds paths as <root>/isa/<name>; stale `cd <arch>` hints in the verify
tools and the doc.odin paths updated.

WASM stays at core/rexcode/wasm for now -- it is an IR, not an ISA, and
will move under the forthcoming core:rexcode/ir once that layer lands.

All 10 arches gen/builders/check/test green; import core:rexcode/isa/x86
verified working; wasm still compiles.
2026-06-18 19:03:27 -04:00
gingerBill
1060fd4c72 Factor out reloc group logic 2026-06-18 15:21:05 +01:00
gingerBill
84e7e04816 Handle relocation groups 2026-06-18 15:15:53 +01:00
gingerBill
51436077c9 Begin work on printing the WAT format 2026-06-18 15:10:42 +01:00
gingerBill
3199ea266e Update ENCODING_TABLE to support arity count and tail-call instructions 2026-06-18 14:45:36 +01:00
gingerBill
5272f5a4f0 Simplify the printer even further to only show relevant things 2026-06-18 11:35:53 +01:00
gingerBill
c002470b8d Do not print out the redundant aspects of the custom name section 2026-06-18 10:58:55 +01:00
gingerBill
e94d57f650 Remove dead parameter 2026-06-18 10:58:32 +01:00
gingerBill
e404dafaf0 Merge branch 'bill/rexcode' of https://github.com/odin-lang/Odin into bill/rexcode 2026-06-18 10:49:34 +01:00
Brendan Punsky
4cc6977321 Merge origin/bill/rexcode: struct repack (#raw_union #packed), wasm arch
Merge gingerBill's latest into bill/rexcode. His changes: minimize the
Instruction/Operand structs across ISAs with packed raw-unions (+ the
compiler support for #raw_union #packed), the new core:rexcode/wasm arch
and wasm/module, encode() now returns (byte_count, ok) instead of a Result
struct, decode_one made public, and assorted formatting/inlining.

Conflict: arm64/tests/pipeline_smoke.odin CSEL test -- kept the generated
4-arg inst_csel(dst,src,src2,cond) (mnemonic_builders.odin is generated,
not from Bill's branch) and adopted Bill's (byte_count, success) encode
signature.

Required rebuilding ./odin from the merged source for the packed-union
syntax. Re-validated after the repack: regenerated all artifacts
(idempotent -- no spurious churn), all 10 arches gen/builders/check/test
green, and byte-compared the new arm32 BF + mips PS/MMI/DSP/R6 forms to
confirm no field truncation. arm64/arm32/mips still 100%.
2026-06-18 05:44:48 -04:00
Brendan Punsky
83bdd501a3 rexcode: remove dead BFCSEL else-target scaffolding; tidy mips COPY specgen
BFCSEL's else-target turned out to be the implicit fall-through, so the
BF_BELSE operand encoding, the BFCSEL_ELSE_T32 relocation, and their
encoder/decoder cases were never referenced by any table entry. Remove
them. Also restructure the MSA COPY specgen loop so COPY_U only iterates
.B/.H (COPY_U.W is mips64-only and emitted in the mips64 section), which
drops the spurious 'skipped COPY_U_W' message. No functional change to any
generated encode form; arm64/arm32/mips all still 100%, 461/600/281 tests
green.
2026-06-18 05:29:20 -04:00
Brendan Punsky
c8851c546d rexcode/arm32: BFCSEL -> Branch Future complete, arm32 100%
BFCSEL = bf-point + true-target (hw1, like BF) + 4-bit condition at
hw0[5:2], base 0xF002E001 (hw0[1] is a static marker). The else-target is
the architectural fall-through, so it is not a separate operand -- BFCSEL
is modelled as three operands and reproduces llvm-mc's bytes exactly
(f082e003 / f102e803 / f086e003 across boff/true/cond variations).
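
As a rough check of the layout, the sketch below packs raw field values at the stated positions (bf-point and hw1 layout as in the BF commit 808716517e). The function name is hypothetical, and the field values used to reproduce the three quoted words are inferred from the bit positions, not stated in the message. How a target offset splits across J and imm10 is not modelled.

```python
BFCSEL_BASE = 0xF002E001  # base word from the commit message, hw0 in the high half


def pack_bfcsel(boff, cond, j, imm10):
    """Place raw field values at the documented positions.
    hw0 occupies bits 31:16 of the word, hw1 bits 15:0."""
    assert 0 <= boff < 16 and 0 <= cond < 16
    assert j in (0, 1) and 0 <= imm10 < 1024
    word = BFCSEL_BASE
    word |= boff << (16 + 7)   # bf-point imm4 at hw0[10:7]
    word |= cond << (16 + 2)   # condition at hw0[5:2]
    word |= j << 11            # J at hw1[11]
    word |= imm10 << 1         # imm10 at hw1[10:1]
    return word
```

Under these assumptions, pack_bfcsel(1, 0, 0, 1), pack_bfcsel(2, 0, 1, 1) and pack_bfcsel(1, 1, 0, 1) give f082e003, f102e803 and f086e003 respectively.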

Every encodable arm32 Mnemonic now has an encode form (gap = 0). 600
tests green.
2026-06-18 04:55:26 -04:00
Brendan Punsky
808716517e rexcode/arm32: Branch Future BF/BFL/BFLX/BFI_BR encode forms
Reverse-engineered the ARMv8.1-M Branch Future T32 encoding from llvm-mc:
bf-point imm4 = (label-(PC+4))/2 at hw0[10:7]; branch target val =
(label-(PC+4))/2 with J at hw1[11] and imm10 at hw1[10:1]; BFLX/BFX target
is Rm at hw0[3:0]. New REL_BF operand + BF_BOFF/BF_BLOC/BF_RM encodings +
BF_BOFF_T32/BF_BLOC_T32 relocations with resolver. BF=0xF040E001,
BFL=0xF000C001, BFLX=0xF070E001, BFI_BR=0xF060E001.

Tightened the WLSTP/DLSTP masks to mark hw0[6] static (it is always 0 for
valid B/H/W/D sizes) so they no longer shadow the BF register forms.
Byte-exact vs llvm-mc with resolved bf-point/target offsets; 600 tests
green. (BFCSEL still pending -- it adds an else-target + condition.)
2026-06-18 04:47:03 -04:00
Brendan Punsky
c6edd6d5cd rexcode/mips: R5900 MMI MADD/MSUB, RDPGPR/WRPGPR; drop BPOSGE64 -> 100%
PS2 R5900 MMI: MSUB1/MSUBU1 (second-MAC, SPECIAL2 func +0x20 exactly like
the implemented MADD1/MADDU1) and the three-operand MADD_EE/MADDU_EE/
MSUB_EE/MSUBU_EE (write Rd as well as HI/LO; the Rd!=0 form selected by a
less-specific mask after the two-operand MADD/MSUB and PLZCW match).
RDPGPR/WRPGPR (COP0 shadow-GPR move, hand-encoded from the MIPS32r2 manual
since llvm-mc gates them). Drop BPOSGE64: not a real ISA instruction
(DSPControl.pos is 6-bit, only BPOSGE32 exists; llvm rejects it).

Every encodable mips Mnemonic now has an encode form (gap = 0). All
self-consistent and decode-clean; 281 tests green.
2026-06-18 04:17:50 -04:00
Brendan Punsky
61a62185b8 rexcode/mips: R6 compact branches (BEQC/BNEC/BLTC/BGEC/.../BLTZC)
All ten two-/one-register R6 compact branches, byte-exact vs llvm-mc. The
signed forms share POP26/POP27 (opcodes 22/23) with the pre-R6 BLEZL/BGTZL
and with each other; the decode-entry mask sort tries the more-specific
rt=0 / rs=0 forms first, and a small operand-aware hook in
decode_one_inline recovers BGEZC/BLTZC (rs==rt) from the general BGEC/BLTC.
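
The operand-aware step can be sketched as a small classifier. The rs/rt rules below follow the MIPS R6 definitions of POP26/POP27 as I understand them rather than anything quoted in the message, and the function is a hypothetical stand-in for the hook in decode_one_inline.

```python
def classify_compact_branch(opcode, rs, rt):
    """Pick the R6 compact branch for the shared POP26 (22) / POP27 (23) opcodes."""
    names = {22: ("BLEZC", "BGEZC", "BGEC"), 23: ("BGTZC", "BLTZC", "BLTC")}
    zero_rs, same_reg, general = names[opcode]
    if rt == 0:
        return "legacy"    # pre-R6 BLEZL/BGTZL territory, not a compact branch
    if rs == 0:
        return zero_rs     # more-specific rs=0 form, caught by the mask sort
    if rs == rt:
        return same_reg    # the case an opcode+mask entry cannot express
    return general
```

The rs == rt test is the only part that needs to look at decoded operands; the other cases fall out of ordering more-specific masks first.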

Where R6 reuses a pre-R6/PSP major opcode (BEQC vs ADDI at opcode 8, etc.)
decode is inherently ISA-mode-dependent and resolves to the legacy form;
the R6 encode side is exact. 281 tests green.
2026-06-18 04:12:20 -04:00
Brendan Punsky
ff2bf13121 rexcode/mips: R6 PC-relative loads LWPC/LWUPC/LDPC
New REL19/REL18 operand types + BRANCH_19/BRANCH_18 encodings + REL_PC19/
REL_PC18 relocations (R6 PC-relative semantics: offset is relative to the
instruction's own address, no delay-slot adjustment; LDPC aligns the PC
down to 8 and scales by 8). LWPC (mips32r6), LWUPC/LDPC (mips64r6).
Byte-exact vs llvm-mc and decode-clean; 281 tests green.
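
A minimal sketch of the LDPC offset computation, assuming the 18-bit signed field is the one LDPC uses (the message lists REL19/REL18 without pairing them to mnemonics). The function name and error handling are hypothetical.

```python
def ldpc_offset_field(pc, target):
    """18-bit field for an LDPC-style load.
    The base is the instruction's own address aligned down to 8; the offset is scaled by 8."""
    base = pc & ~7
    delta = target - base
    if delta % 8 != 0:
        raise ValueError("target must be 8-byte aligned")
    scaled = delta // 8
    if not -(1 << 17) <= scaled < (1 << 17):
        raise ValueError("offset out of range for an 18-bit signed field")
    return scaled & 0x3FFFF  # two's-complement truncation to 18 bits
```

Note that an LDPC at 0x1004 and one at 0x1000 produce the same field for the same target, because the base is aligned down before subtracting.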
2026-06-18 04:05:32 -04:00
Brendan Punsky
eab483a527 rexcode/mips: paired-single FMA + conditional-move forms (spec-derived)
MADD/MSUB/NMADD/NMSUB.PS and MOVN/MOVZ/MOVF/MOVT.PS. This llvm-mc build only
knows the .S/.D variants, so these are derived from the llvm-verified
single forms by switching the data-format field to PS (COP1X FMA fmt is
bits 2:0, S=0 -> PS=6; COP1 conditional-move fmt is bits 25:21, S=16 ->
PS=22), per the MIPS64 manual. Same operand slots/masks. Decode-clean and
281 tests green.
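
The fmt switch can be sketched as a plain bit-field replacement. The helper names are hypothetical, and the sample words in the usage note are my own illustration of the single-precision encodings with operand fields zeroed, not values taken from the commit.

```python
def set_field(word, hi, lo, value):
    """Return word with bits hi..lo (inclusive) replaced by value."""
    width = hi - lo + 1
    assert 0 <= value < (1 << width)
    mask = ((1 << width) - 1) << lo
    return (word & ~mask) | (value << lo)


def cop1x_fma_to_ps(single_word):
    """COP1X FMA: fmt lives at bits 2:0, S = 0 becomes PS = 6."""
    assert (single_word & 0x7) == 0
    return set_field(single_word, 2, 0, 6)


def cop1_cmov_to_ps(single_word):
    """COP1 conditional move: fmt lives at bits 25:21, S = 16 becomes PS = 22."""
    assert ((single_word >> 21) & 0x1F) == 16
    return set_field(single_word, 25, 21, 22)
```

For example, an illustrative MADD.S word 0x4C000020 becomes 0x4C000026, and an illustrative MOVN.S word 0x46000013 becomes 0x46C00013; every other bit is left untouched, which is why the operand slots and masks carry over.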
2026-06-18 03:59:04 -04:00
Brendan Punsky
09c1d5ba0f rexcode/mips: paired-single FP + mips64 MSA element forms
Parameterize the specgen oracle with a per-family llvm-mc command so
64-bit-FPU and mips64 forms can be assembled. Paired-single CVT_PS_S,
CVT_S_PL/PU, PLL/PLU/PUL/PUU.PS (via -mcpu=mips64r2). mips64-only MSA
INSERT_D and COPY_U_W (via the mips64 triple). Byte-exact vs llvm-mc and
decode-clean; 281 tests green.
2026-06-18 03:55:30 -04:00
Brendan Punsky
f290347c24 rexcode/mips: DSP ASE replicate-immediate forms (REPL.PH/QB)
REPL.PH (signed 10-bit broadcast, reuses MSA_S10) and REPL.QB (8-bit,
reuses MSA_I8). Byte-exact vs llvm-mc including a negative .PH immediate;
281 tests green.
2026-06-18 03:46:50 -04:00
Brendan Punsky
5b91624cd3 rexcode/mips: DSP ASE extract-from-accumulator forms
New EXT_SIZE encoding (5-bit extract size at 25:21). EXTPDP (immediate
size), and the variable forms EXTPDPV / EXTRV_R.W / EXTRV_RS.W / EXTRV_S.H
(extract via a GPR-specified position). Byte-exact vs llvm-mc and decode-
clean; 281 tests green.
2026-06-18 03:45:23 -04:00
Brendan Punsky
82f62ce9a9 rexcode/mips: DSP ASE accumulator multiply-add / shift forms
New AC_NUM (accumulator ac0..ac3 at bits 12:11) and SHILO_IMM (signed
6-bit at 25:20) encodings. DPA/DPAX/DPS/DPSX.W.PH and MAQ_S/MAQ_SA.W.PHL/
PHR (multiply-accumulate into a DSP accumulator), plus MTHLIP, SHILOV and
SHILO (accumulator shift). Spot-checked byte-exact vs llvm-mc and decode-
clean, including a negative SHILO immediate; 281 tests green.
2026-06-18 03:43:05 -04:00
Brendan Punsky
8fed538afc rexcode/mips: MSA branch-on-zero/non-zero forms (BZ/BNZ)
BZ/BNZ .B/.H/.W/.D/.V (branch if any/all elements zero/non-zero): a
specgen branch emitter that derives the opcode+Wt bits then marks the
16-bit PC-relative offset variable, reusing the existing REL16/BRANCH_16
relocation machinery. The offset is emitted as a relocation (label
target). 10 forms, opcode+Wt byte-exact vs llvm-mc and decode-clean.

The R6 two-/one-register compact branches (BEQC/BNEC/BLTC/BGEC/.../BLTZC)
are deferred: they share POP major opcodes disambiguated only by the
rs/rt relationship, which the opcode+mask decode model can't express
without operand-aware logic. 281 tests green.
2026-06-18 03:39:55 -04:00
Brendan Punsky
56cfbc675a rexcode/mips: DSP ASE shift-by-immediate forms
New DSP_SA encoding (shift amount at bits 24:21). SHRA.QB/SHRA_R.QB
(.QB 3-bit), SHRA_R.PH/SHRL.PH (.PH 4-bit). Byte-exact vs llvm-mc;
281 tests green.
2026-06-18 03:33:47 -04:00
Brendan Punsky
c2de507bb0 rexcode/mips: FPU FMA, MSA COPY/INSERT, DSP 2-register, DI/EI/RDHWR
New FR (FP reg at 25:21) encoding for the COP1X 4-register fused
multiply-adds MADD/MSUB/NMADD/NMSUB.S/.D. New GPR_AT_6 / GPR_AT_11
encodings (GPR in a vector-register slot, with correct GPR decode) for
MSA COPY_S/U (lane->GPR) and INSERT (GPR->lane). DSP two-register
PRECEQU/PRECEU (.PH.QBLA/QBRA) and REPLV (.PH/.QB). Control ops DI/EI and
RDHWR. 25 forms; spot-checked byte-exact vs llvm-mc and decode-clean; 281
tests green.
2026-06-18 03:31:40 -04:00
Brendan Punsky
930b988ebf rexcode/mips: FPU conditional-move + convert-to-FP forms
MOVN/MOVZ.S/.D (FP move on GPR nonzero/zero, enc {FD,FS,RT}), MOVF/MOVT.
S/.D (FP move on FP condition code, enc {FD,FS,FCC_BC}), and the
convert-to-FP forms FCVT_D_W/S_D/S_W (cvt.d.w/cvt.s.d/cvt.s.w). 11 forms.
Spot-checked byte-exact vs llvm-mc and decode-clean; 281 tests green.
2026-06-18 03:27:23 -04:00
Brendan Punsky
5b47f0ca29 rexcode/mips: MSA INSVE + DSP ASE 3-register/compare/shift forms
MSA INSVE (.B/.H/.W/.D element insert). DSP ASE three-register ops
(ADDU/SUBU/MULEQ/MULEU/MULQ/PRECRQ*/PICK/CMPGU, enc {RD,RS,RT}), the
variable shifts SHLLV/SHRAV/SHRLV (enc {RD,RT,RS} -- value is Rt, shift is
Rs), and the compares CMP/CMPU (.PH/.QB, {RS,RT}). 38 forms reusing the
existing GPR R-type slots. Spot-checked byte-exact vs llvm-mc; 281 tests
green.
2026-06-18 03:24:20 -04:00
Brendan Punsky
4ab24007b7 rexcode/mips: MSA BIT-shift, element-index, GPR-index, I8 forms
New MSA_BIT_SHIFT / MSA_ELM_IDX / MSA_I8 encodings (the data-format marker
is fixed in the entry bits; the operand drives the low bits; decode infers
df from the marker). SLLI/SRAI/SRLI (.B/.H/.W/.D shift), SPLATI/SLDI
(element index), SPLAT/SLD (GPR index), VSHF (.B/.H/.W/.D shuffle), and
the I8 forms ANDI/ORI/XORI/NORI/BMNZI/BMZI/BSELI.B + SHF.B/H/W. 42 forms.
Spot-checked byte-exact vs llvm-mc and decode-clean across all formats;
281 tests green.
2026-06-18 03:17:39 -04:00
Brendan Punsky
307aa2a9dd rexcode/mips: MSA 3RF/3R/2R/2RF/VEC encode forms (specgen)
New mips specgen (llvm-mc --triple=mips --mattr=+msa as the bits oracle,
big-endian words, empirical masks): vector FP arithmetic/compare FADD/
FSUB/FMUL/FDIV/FMAX/FMIN/FCEQ/FCLE/FCLT/FCNE (.W/.D), dot product DOTP_S/U
(.H/.W/.D), count/popcount NLOC/NLZC/PCNT (.B/.H/.W/.D), one-source FP
FSQRT/FRSQRT/FRCP/FRINT/FTRUNC_S/U/FFINT_S/U (.W/.D), and bit-select
BMNZ/BMZ/BSEL.V. 57 forms reusing the existing WD/WS/WT slots. Spot-
checked byte-exact vs llvm-mc and decode-clean; 281 tests green.
2026-06-18 03:11:41 -04:00
Brendan Punsky
e4cff78a70 rexcode/arm32: document BF family as intentionally unimplemented
The 5 Branch Future mnemonics (BF/BFI_BR/BFL/BFLX/BFCSEL) are left
enum-only on purpose: deprecated ARMv8.1-M, not disassemblable by
llvm-objdump (so unverifiable), and a correct encoder needs dual-offset
PC-relative relocation infrastructure that doesn't exist. Noted in the
enum for future readers.
2026-06-18 03:05:25 -04:00
Brendan Punsky
a63fb51fdd rexcode/arm32: MVE VMLSV/VMLSVA (correct 3-bit Q regs); drop placeholders
Implement VMLSV/VMLSVA (MVE multiply-subtract reduce) properly: new
VN_Q_MVE (Qn at 19:17) and VM_Q_MVE (Qm at 3:1) encodings -- the actual
3-bit MVE Q fields -- with Rd at 15:12 (RDLO_A32). The earlier collision
was from reusing the 4-bit VN_Q (19:16) and RD_T32 (11:8), which place
the fields wrong; byte-exact vs llvm-mc now with distinct Qn/Qm/Rd.

Drop three placeholder/redundant enum entries: VRINT and VPRINT (not real
instructions -- llvm rejects bare 'vrint'; VPRINT is a printf-like debug
pseudo-op), and VRSHL_MVE (the author's own comment marks it a
placeholder; 'vrshl q,q,q' already decodes via VRSHL's MVE form). 600
tests green, verify matches llvm-mc.
2026-06-18 01:58:19 -04:00
Brendan Punsky
239dea4f55 rexcode/arm32: MVE VHCADD (halving complex add) + VCMLA
New MVE_ROT_HCADD (#90/#270 at bit12) and MVE_ROT_CMLA (#0/90/180/270 at
bits 24:23) rotation encodings -- the rotation degrees round-trip
properly (unlike the existing FCMA VCMLA which leaves it unencoded). One
form each with the element-size bits left variable (MVE convention).
Verify round-trips; all rotations byte-exact vs llvm-mc; 600 tests green.

(VMLSV/VMLSVA reduce ops deferred: their format decode-collides with
other MVE encodings given the 4-bit VN_Q vs MVE's 3-bit Qn.)
2026-06-18 01:47:44 -04:00
Brendan Punsky
55463b6719 rexcode/arm32: VMOV (ARM core register to scalar) Dd[lane], Rt
New VMOV_LANE_8/16/32 encodings: Dd at bits 19:16+bit7, lane bits per
element size (.8 = bit21:bit6:bit5 with bit22 size marker; .16 =
bit21:bit6 with bit5 marker; .32 = bit21). Verify round-trips all three
sizes; spot-checked .8 byte-exact incl. max lane; 600 tests green.
2026-06-18 01:34:48 -04:00
Brendan Punsky
5df81b5117 rexcode/arm32: VQDMULH/VQRDMULH by-scalar-lane
New NEON_VM_SCALAR16/32 encodings for the Dm[lane] scalar operand: .16
places Dm in D0..D7 (bits 2:0) with the lane split bit5:bit3, .32 places
Dm in D0..D15 (bits 3:0) with the lane at bit5. VQDMULH_LANE and
VQRDMULH_LANE across .s16/.s32, D and Q destinations (8 forms). Verify
round-trips; spot-checked byte-exact incl. max register/lane and
decode-clean; 600 tests green.
2026-06-18 01:29:19 -04:00
Brendan Punsky
acc14864f3 rexcode/arm32: DCPS1/DCPS2/DCPS3 (debug change PE state)
Fixed T32 encodings (0xF78F8001/2/3), no operands. Verify round-trips;
600 tests green.
2026-06-18 01:25:51 -04:00
Brendan Punsky
b2b14998f7 rexcode/arm32: VRSRA, VRECPE_F/VRSQRTE_F, VPADD_F, VCVTR
VRSRA (NEON rounding shift-right-accumulate, D/Q, mirrors VSRA's raw
imm6 convention), VRECPE_F/VRSQRTE_F (FP reciprocal/rsqrt estimate, D/Q),
VPADD_F (FP pairwise add, f32/f16), and VCVTR (VFP convert-to-integer
using the FPSCR rounding mode; s32/u32 from f32 and f64). Hand-written
mirroring the existing VSRA/VRECPE/VPADD/VCVT forms. Built-in llvm
round-trip verify passes; spot-checked byte-exact; 600 tests green.
2026-06-18 01:22:12 -04:00
Brendan Punsky
59750926d9 rexcode/arm32: unprivileged (translate) post-indexed loads/stores
LDRT/LDRBT/STRT/STRBT (imm12) and LDRHT/STRHT/LDRSBT/LDRSHT (imm8 split):
each is the corresponding post-indexed load/store with the W bit (21)
set. Hand-written, reusing the existing MEM_POST_INDEX encoding. All 8
byte-exact vs llvm-mc and decode-clean; 600 tests green.
2026-06-18 01:17:34 -04:00
Brendan Punsky
6fd233f041 rexcode/arm32: NEON long/wide/compare/shift encode forms (specgen)
New arm32 specgen (llvm-mc --triple=armv8a --mattr=+neon as the bits
oracle, empirical masks): VADDL/VSUBL/VABAL/VABDL (Qd,Dn,Dm) and
VADDW/VSUBW (Qd,Qn,Dm) across s/u 8/16/32; the compare aliases
VCLE/VCLT (= VCGE/VCGT with Vn/Vm swapped) and VACLE/VACLT (= VACGE/VACGT
swapped, f32); and VQRSHL shift-by-vector. 84 forms over 11 mnemonics.
Built-in llvm round-trip verify passes; spot-checked byte-exact with
distinct Q/D registers; 600 tests green.
2026-06-18 01:15:22 -04:00
Brendan Punsky
fe7b81d64f rexcode/arm64: drop vestigial/redundant mnemonics; alias redundant SME names
Remove from the Mnemonic enum: LDARB_X/LDARH_X/STLRB_X/STLRH_X (no
distinct byte/half acquire-release 'X' encoding exists -- LDARB/LDARH/
STLRB/STLRH already cover them), and the 12 redundant SME names
SME_LD1{B,H,W,D,Q}_ZA / SME_ST1{...}_ZA / SME_MOVA_TO_Z / SME_MOVA_TO_ZA
(same instructions as the canonical *_TILE / MOVA_*_FROM_* forms).

The builder generator now emits delegating aliases for the redundant SME
names (inst_sme_ld1b_za :: inst_sme_ld1b_tile, ...), so the convenient
names keep working and resolve to the canonical, decode-unambiguous
encodings. With XAR_Z landed, the arm64 Mnemonic enum is now 100%
covered: every entry has an encode form. 461 tests green.
2026-06-18 00:42:37 -04:00
Brendan Punsky
303fa9e509 rexcode/arm64: SVE2 XAR (exclusive-or and rotate) encode form
XAR Zdn.T, Zdn.T, Zm.T, #rotate across .B/.H/.S/.D. New SVE_XAR_SHIFT
encoding: the rotate amount is V = 2*esize - amount, split across
tszh(23:22):tszl(20:19):imm3(18:16); the element size is selected by the
Z register type on encode and recovered from the highest set bit of
tszh:tszl on decode (so the amount round-trips for every esize).
vec_esize now also handles Z_REG_B/H/S/D. All six representative forms
byte-exact vs llvm-mc and decode-clean; 461 tests green.
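
The rotate-amount scheme can be sketched as an encode/decode pair. This is a Python illustration with hypothetical function names; it assumes the amount ranges over 1..esize and covers only the shift field, not the rest of the instruction word.

```python
def xar_encode_shift(esize, amount):
    """Pack V = 2*esize - amount into tszh(23:22):tszl(20:19):imm3(18:16)."""
    assert esize in (8, 16, 32, 64) and 1 <= amount <= esize
    v = 2 * esize - amount            # 7-bit value tszh:tszl:imm3
    tszh = (v >> 5) & 0x3
    tszl = (v >> 3) & 0x3
    imm3 = v & 0x7
    return (tszh << 22) | (tszl << 19) | (imm3 << 16)


def xar_decode_shift(word):
    """Recover (esize, amount) from the shift field alone."""
    tsz = (((word >> 22) & 0x3) << 2) | ((word >> 19) & 0x3)
    imm3 = (word >> 16) & 0x7
    assert tsz != 0
    esize = 8 << (tsz.bit_length() - 1)   # highest set bit of tszh:tszl
    v = (tsz << 3) | imm3
    return esize, 2 * esize - v
```

Because V always lies in [esize, 2*esize), its top set bit identifies the element size, which is what lets the amount round-trip without a separate size field.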
2026-06-18 00:39:48 -04:00
Brendan Punsky
33e5202f05 rexcode/arm64: single-structure lane load/store (LD1-4_LANE / ST1-4_LANE)
All eight LD#_LANE / ST#_LANE mnemonics across .B/.H/.S/.D (32 forms).
New NEON_LANE_B/H/S/D encodings split the lane index across Q (bit 30),
S (bit 12) and size (bits 11:10) per element size; the list length and
load/store bit are fixed in the entry bits. All 11 representative forms
(every element size, structure count, and lane extremes) byte-exact vs
llvm-mc and decode-clean; 461 tests green.
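
The lane-index split can be sketched as below. The per-size rule (index left-aligned in the 4-bit string Q:S:size) follows Arm's description of the single-structure forms as I understand it, not the commit text; the function name is hypothetical, and the marker bits that distinguish element sizes are left out.

```python
def lane_index_bits(esize, index):
    """Scatter a lane index across Q (bit 30), S (bit 12) and size (bits 11:10)."""
    assert esize in (8, 16, 32, 64)
    assert 0 <= index < 128 // esize
    # Left-align the index in Q:S:size; low bits not used by the index stay zero here.
    shift = {8: 0, 16: 1, 32: 2, 64: 3}[esize]
    qss = index << shift
    q = (qss >> 3) & 1
    s = (qss >> 2) & 1
    size = qss & 0x3
    return (q << 30) | (s << 12) | (size << 10)
```

So a .B lane uses all four bits, .H uses Q:S:size<1>, .S uses Q:S, and .D uses Q alone.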
2026-06-18 00:21:43 -04:00
Brendan Punsky
2c8768b39a rexcode/arm64: TBL/TBX + structured LD2-4/ST2-4 + LD1R-4R encode forms
Table lookup TBL/TBX (.8b/.16b, single-register table) and the multi-
register structured load/store LD2/LD3/LD4, ST2/ST3/ST4 plus load-and-
replicate LD1R/LD2R/LD3R/LD4R (.16b). Following the existing LD1/ST1
convention: the register list is encoded by its first register, with the
list length + arrangement fixed in the bits. All 13 representative forms
byte-exact vs llvm-mc and decode-clean; 461 tests green.

(The single-lane _LANE variants need the Q:S:size lane-index split and
are left for a follow-up.)
2026-06-18 00:17:12 -04:00
Brendan Punsky
69157b7ec5 rexcode/arm64: SME ADDHA/ADDVA (ZA outer-sum accumulate)
ADDHA/ADDVA ZAda.S, Pn/m, Pm/m, Zn.S via a new ZA_TILE_LOW encoding
(accumulator tile at bits 2:0; Pn at 12:10, Pm at 15:13, Zn at 9:5).
Byte-exact vs llvm-mc and decode-clean across tile/predicate/Zn fields.

The other 11 missing SME enum names (SME_LD1*/ST1*_ZA, SME_MOVA_TO_Z/ZA)
are redundant aliases of the already-implemented SME_LD1*/ST1*_TILE and
SME_MOVA_*_FROM_* forms -- adding duplicate encodings collides in the
decode table (broke a roundtrip test), so they are intentionally left to
the existing canonical forms. 461 tests green.
2026-06-18 00:14:21 -04:00
Brendan Punsky
68aac263d0 rexcode/arm64: SVE FFR/BRKN/CPY/EXT/MOV aliases (10 more, SVE 47/48)
FFR ops (SETFFR/RDFFR/WRFFR) and BRKN (destructive, Pdm re-packs Pd) via
specgen; CPY (predicated from GPR), EXT (destructive, imm8 split via new
SVE_EXT_IMM), MOV-predicated (=SEL with Zm=Zd, via ZD_ZM_DUP), and the
predicate aliases NOT/MOVS/MOV (EOR/ORR/AND with a duplicated predicate
field, via PG4_PM_DUP/PN_PM_DUP/PN_PG_PM_DUP). All byte-exact vs llvm-mc;
the predicate aliases decode to their canonical base op (identical bytes,
as expected). 461 tests green.

(SVE_XAR_Z deferred: its tsz:imm3 shift field does not follow the NEON
immh:immb scheme and needs a bespoke esize-from-Z encoder.)
2026-06-18 00:09:21 -04:00
Brendan Punsky
cd8703acd4 rexcode/arm64: SVE predicated/compare/predicate-logical/SVE2 encode forms (37)
Predicated FP round (FRINTN/P/M/Z/A/X/I, FRECPX), reversed predicated
shifts (ASRR/LSLR/LSRR) and FP (FSUBR/FDIVR), FP compare (FCMEQ/GE/GT/
NE/UO + vs-zero FCMLE/FCMLT), integer compare aliases (CMPLE/LO/LS/LT),
predicate logical (NANDS/NORS/ORNS), predicate break (BRKPA/BRKPB,
BRKA/BRKB + flag-setting BRKAS/BRKBS), SVE2 EOR3/BCAX, INSR, COMPACT.

New specgen SVE section: a generic emitter assembles each form all-zero
then one variant per field at its max (Z 31, 3-bit Pg 7, 4-bit Pd/Pg/Pn/
Pm 15, GPR wzr/xzr) and derives mask = ~union. Operand placements
verified vs llvm-mc: the reversed/destructive ops put Zm at VN (5-9); the
CMPLE/LO/LS/LT aliases swap operands (VM/VN); EOR3/BCAX place the 3rd src
at VM and 4th at VN. All 22 representative forms byte-exact and
decode-clean; 461 tests green. (BRKN + CPY/EXT/MOV/NOT_P/FFR/XAR
stragglers next.)
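
The mask derivation described above (assemble all-zero, then one variant per field at its maximum, then mask = ~union) can be sketched as follows. The function names and the sample words in the usage note are hypothetical; the real oracle is llvm-mc.

```python
def derive_mask(base_word, variant_words, width=32):
    """base_word: the form assembled with every operand field at zero.
    variant_words: one assembly per field with that field at its maximum."""
    variable = 0
    for word in variant_words:
        variable |= word ^ base_word   # bits that moved belong to an operand field
    all_ones = (1 << width) - 1
    return ~variable & all_ones


def matches(word, bits, mask):
    """A word belongs to the form when its static bits agree."""
    return (word & mask) == (bits & mask)
```

For a made-up base word 0x04A00000 with a 5-bit field at 4:0, another at 9:5 and a 3-bit field at 12:10, the three max-value variants give a mask of 0xFFFFE000. This only works because each maximum sets every bit of its field.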
2026-06-17 23:59:23 -04:00
Brendan Punsky
8006b5f7e2 rexcode/arm64: NEON MOVI/MVNI + FMOV scalar/vector immediate forms
MOVI (8B/16B/4H/8H/2S/4S/2D) and MVNI (4H/8H/2S/4S) via specgen (imm8 in
abc:defgh, cmode/op/Q static per arrangement; .2D probed with all-ones
since its asm immediate is the replicated 64-bit value). FMOV_IMM (scalar
Sd/Dd/Hd, 8-bit float at 20:13 via new FMOV_SCALAR_IMM encoding) and
FMOV_V_IMM (Vd.<2S|4S|2D|4H|8H>, fimm8 in abc:defgh, cmode=1111) hand-
written -- canonical bits with the imm8 fields zeroed (the live float
example would otherwise bake operand bits into the static pattern). All
14 representative forms byte-exact vs llvm-mc and decode-clean; 461 tests
green. (LSL/MSL-shifted MOVI/MVNI variants share the operand signature
and are omitted.)
2026-06-17 23:47:45 -04:00
Brendan Punsky
ab7f20a129 rexcode/arm64: byte/half/signed loads-stores + vector LDP/STP/LDUR/STUR
LDRB/LDRH/STRB/STRH (post-index, pre-index, register-offset),
LDRSB/LDRSH (register-offset, W and X) and LDRSW (register-offset), plus
the vector pair/unscaled forms LDP_V/STP_V (S/D/Q) and LDUR_V/STUR_V
(S/D/Q). Hand-written, reusing the existing OFFSET_BASE_POST/PRE/REG/S9
addressing encodings; canonical bits taken from llvm-mc (operand fields
zeroed). All 23 representative forms byte-exact vs llvm-mc and
decode-clean; 461 tests green.

(LDARB_X/LDARH_X/STLRB_X/STLRH_X left unimplemented: LDARB/LDARH/STLRB/
STLRH are byte/half acquire-release into a W register with no distinct
64-bit 'X' encoding -- these enum entries are vestigial.)
2026-06-17 23:39:01 -04:00
Brendan Punsky
aabcdd41b6 rexcode/arm64: CCMP/CCMN-imm, HINT, MSR-imm, USDOT encode forms
Conditional compare immediate (CCMP_IMM/CCMN_IMM: imm5 at 20:16 via a new
IMM5_HI encoding, bit 11 set), HINT #imm7, MSR <pstatefield>,#imm (new
MSR_PSTATE encoding placing op1 at 18:16 / op2 at 7:5, CRm via the shared
BARRIER_FIELD), and USDOT (I8MM unsigned-by-signed dot product, .2S/.4S).
Hand-written into the core (outside the specgen region). All forms
byte-exact vs llvm-mc and decode-clean; 461 tests green.
2026-06-17 23:34:10 -04:00