Commit Graph

8 Commits

Author SHA1 Message Date
Brendan Punsky
7cd39f1d0d rexcode/arm64: NEON FP two-register + FP across-lanes encode forms
Adds 16 mnemonics (72 forms) via specgen: FP two-register FABS/FNEG/FSQRT, FRINTA/I/M/N/P/X/Z, FRECPE/FRSQRTE; and FP across-lanes FMAXV/FMINV/FMAXNMV/FMINNMV (scalar S/H dst). Single- and double-precision forms are tagged .NEON; half-precision forms are tagged .FP16.

specgen's per-form feature tagging now scans all operands for an FP16 arrangement (this handles scalar-dst across-lanes forms, where the FP16 arrangement lives on the source operand). Verified: decode round-trips incl. FP16 (fabs/fsqrt.8h/frecpe/fmaxv), arm64 check + 461 tests pass.
2026-06-15 21:35:41 -04:00
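
The "scan all operands" rule from 7cd39f1d0d can be pictured roughly as below. This is a minimal Python sketch, not the code in specgen.lua: the operand token format ("V.8H" for a vector, "H" for a scalar), the function name, and the assumption that the rule is applied only to floating-point families (integer 4H/8H forms stay .NEON) are all illustrative.

```python
# Hypothetical sketch of per-form feature tagging for FP families.
# Names and token format are assumptions, not taken from specgen.lua.
FP16_ARRANGEMENTS = {"4H", "8H"}

def form_feature(operands):
    """Return "FP16" if any operand has a half-precision vector arrangement, else "NEON"."""
    for op in operands:
        # "V.8H" -> arrangement "8H"; a scalar token such as "H" has no arrangement.
        _, _, arrangement = op.partition(".")
        if arrangement.upper() in FP16_ARRANGEMENTS:
            return "FP16"
    return "NEON"

print(form_feature(["V.4S", "V.4S"]))  # FABS Vd.4S, Vn.4S
print(form_feature(["H", "V.8H"]))     # FMAXV Hd, Vn.8H: FP16 is visible only on the source
print(form_feature(["S", "V.4S"]))     # FMAXV Sd, Vn.4S
```

Checking only the first operand would miss the FMAXV case, since a scalar destination carries no arrangement; scanning every operand avoids that.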
Brendan Punsky
57fbe873d8 rexcode/arm64: NEON floating-point three-same encode forms (incl. FP16)
Adds 17 FP three-same mnemonics (85 forms) via specgen: FMAX/FMIN/FMAXNM/FMINNM, FMULX/FRECPS/FRSQRTS, FACGE/FACGT, FCMEQ/FCMGE/FCMGT (register form), FADDP/FMAXP/FMINP/FMAXNMP/FMINNMP. Single/double forms (2S/4S/2D) are .NEON; half-precision (4H/8H) use the distinct V_4H_FP16/V_8H_FP16 operand types and are tagged .FP16.

specgen gains: --mattr=+fullfp16, FP16 arrangement tokens, per-form feature tagging (derived from the operand arrangement), and per-family arrangement lists. Verified: decode round-trips incl. an FP16 form (fmax/fcmeq.4h/fmulx/faddp), arm64 check + 461 tests pass.
2026-06-15 21:33:25 -04:00
Brendan Punsky
824421853f rexcode/arm64: NEON across-lanes + pairwise-long encode forms
Adds 11 mnemonics (59 forms) via specgen: across-lanes reductions ADDV/SMAXV/SMINV/UMAXV/UMINV (scalar B/H/S dst) and SADDLV/UADDLV (widened scalar dst), plus pairwise-long SADDLP/UADDLP/SADALP/UADALP. The scalar destination packs into the same VD field, so still no encoder change.

specgen.emit generalized to accept scalar-register operand tokens (B/H/S/D) alongside vector arrangements. Verified: decode round-trips (ADDV/SADDLV/UMINV/SADDLP), arm64 check + 461 tests pass.
2026-06-15 21:27:40 -04:00
Brendan Punsky
7ebe042277 rexcode/arm64: NEON wide / narrow / XTN encode forms
Adds 24 more mixed-arrangement mnemonics (72 forms) via specgen: three-different wide (SADDW/UADDW/SSUBW/USUBW), three-different narrow (ADDHN/SUBHN/RADDHN/RSUBHN), and two-register narrowing (XTN/SQXTN/UQXTN/SQXTUN), plus their high-half '2' variants. All register-only (VD/VN/VM or VD/VN), no encoder change.

specgen refactored to a general arrangement-tuple mechanism (uniform + long/wide/narrow/XTN share one emit path). Verified: decode round-trips (SADDW/ADDHN/XTN/SQXTUN2), arm64 check + 461 tests pass.
2026-06-15 21:22:07 -04:00
Brendan Punsky
f78a3a5573 rexcode/arm64: NEON three-different (long) encode forms
Adds 26 widening long mnemonics (72 forms) via specgen: SADDL/UADDL/SSUBL/USUBL, SMULL/UMULL, SMLAL/UMLAL/SMLSL/UMLSL, SQDMULL/SQDMLAL/SQDMLSL and their high-half '2' variants. Destination arrangement is wider than the sources (Vd.8H, Vn.8B, Vm.8B; the '2' forms read the high half). Encoding stays VD/VN/VM, so no encoder change.

specgen gains a mixed-arrangement THREE_DIFF shape (low/high source-half pairs). Verified: decode round-trips (SMULL/SADDL2/SQDMULL/UMLAL), arm64 check + 461 tests pass.
2026-06-15 21:16:56 -04:00
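
The low/high source-half pairing described in f78a3a5573 can be sketched as follows. This is an illustrative Python sketch under stated assumptions: the table and function names are hypothetical, and the real tool additionally relies on llvm-mc to drop arrangements a mnemonic does not support (for example, SQDMULL has no 8B source form), which this sketch does not model.

```python
# Hypothetical arrangement table for the "long" shape:
# (low-half source, high-half source, widened destination).
LONG_ARRANGEMENTS = [
    ("8B", "16B", "8H"),
    ("4H", "8H", "4S"),
    ("2S", "4S", "2D"),
]

def three_diff_forms(mnemonic):
    """List (mnemonic, Vd, Vn, Vm) arrangement tuples for a long op and its '2' variant."""
    forms = []
    for low, high, wide in LONG_ARRANGEMENTS:
        forms.append((mnemonic, wide, low, low))          # reads the low half of the sources
        forms.append((mnemonic + "2", wide, high, high))  # '2' form reads the high half
    return forms

for form in three_diff_forms("SADDL"):
    print(form)
```

The destination arrangement is the same for both members of a pair; only the source arrangement and the mnemonic suffix change.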
Brendan Punsky
00b666bbc0 rexcode/arm64: NEON pairwise + variable-shift encode forms
Adds 9 register-only three-same mnemonics (59 forms) via specgen: ADDP/SMAXP/SMINP/UMAXP/UMINP (pairwise) and SSHL/USHL/SRSHL/URSHL (per-lane variable shift). Verified: decode round-trips (ADDP/SSHL/SMAXP/URSHL), arm64 check + 461 tests pass.

Skipped the already-implemented logical/compare/mul forms (AND_V/ORR_V/EOR_V/BIC_V/ORN_V/BSL/BIT/BIF/CMEQ/CMGT/CMHI/MUL_V) to avoid duplicate keys.
2026-06-15 13:01:26 -04:00
Brendan Punsky
d83065e3b8 rexcode/arm64: NEON two-register-misc encode forms
Adds 10 Advanced-SIMD two-register-misc mnemonics (34 forms across valid arrangements) via specgen: NOT/RBIT/REV16/REV32/REV64/CLS/CLZ/CNT/URECPE/URSQRTE. llvm-mc filters illegal arrangements (CNT/NOT only 8B/16B, URECPE/URSQRTE only 2S/4S, ...).

specgen.lua generalized to a SHAPE table (THREE_SAME / TWO_SAME), so adding a family is one row. Verified: decode round-trips (NOT/REV64/CNT/URECPE), arm64 check + 461 tests pass.
2026-06-15 12:57:37 -04:00
Brendan Punsky
e21fa59733 rexcode/arm64: NEON three-same (integer) encode forms + specgen tool
Adds 25 Advanced-SIMD three-same integer mnemonics (153 forms across arrangements) to ENCODING_TABLE: SHADD/UHADD/SHSUB/UHSUB/SRHADD/URHADD, SQADD/UQADD/SQSUB/UQSUB, SMAX/UMAX/SMIN/UMIN, SABD/UABD/SABA/UABA, MLA/MLS, CMGE/CMHS/CMTST, SQDMULH/SQRDMULH.

Introduces tablegen/specgen.lua: compact specs (mnemonic + llvm name + arrangements) -> ENCODING_TABLE blocks, with bits taken from llvm-mc (the oracle) and mask derived empirically (vary registers 0..31). Invalid arrangements are auto-detected via llvm-mc and skipped. Output fills the SPECGEN:BEGIN..END region of encoding_table.odin in place; the hand-written core is untouched.

Verified: decode round-trips for SHADD/SQADD/CMGE/SQRDMULH; arm64 check + 461 tests pass; builders auto-generate (780 -> 805). Caveat: NEON builders currently collapse arrangements (one Register param per V operand) so inst_<mnem> exposes only the first arrangement -- an arrangement-aware builder-gen pass follows.

Author: Brendan Punsky (machine git config user.name is the login 'Flāvius').
2026-06-15 12:52:10 -04:00
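
The empirical mask derivation described in e21fa59733 (vary registers 0..31, see which bits move) might look like the sketch below. It is written in Python for illustration and is not the code in tablegen/specgen.lua. The oracle here is a stand-in function that computes the SHADD Vd.8B, Vn.8B, Vm.8B encoding directly, whereas the commit takes the bits from llvm-mc; the function names are hypothetical.

```python
def encode_shadd_8b(rd, rn, rm):
    """Stand-in oracle for SHADD Vd.8B, Vn.8B, Vm.8B (the real tool asks llvm-mc)."""
    return 0x0E200400 | (rm << 16) | (rn << 5) | rd

def derive_bits_and_mask(encode, num_operands):
    """Vary each register operand over 0..31 and mark every bit that changes."""
    base = encode(*([0] * num_operands))
    varying = 0
    for slot in range(num_operands):
        for reg in range(32):
            regs = [0] * num_operands
            regs[slot] = reg
            varying |= encode(*regs) ^ base
    mask = ~varying & 0xFFFFFFFF  # fixed opcode bits are those that never changed
    return base & mask, mask

bits, mask = derive_bits_and_mask(encode_shadd_8b, 3)
print(f"bits={bits:#010x} mask={mask:#010x}")  # bits=0x0e200400 mask=0xffe0fc00
```

Deriving the mask this way needs no prior knowledge of where the register fields sit, which is presumably why the same path could later serve two-operand and scalar-destination forms.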