mirror of https://github.com/odin-lang/Odin.git synced 2026-06-19 16:42:33 +00:00

Author	SHA1	Message	Date
Brendan Punsky	06eb3de6a2	rexcode/arm64: NEON copy/permute (MOV/MVN/DUP/INS/EXT) encode forms: MOV_V (ORR alias: source feeds both Vn and Vm via a new VN_VM_DUP encoding), MVN_V (NOT alias, plain 2-register), DUP_V (element form Vd.T,Vn.Ts[i] and general form Vd.T,Wn/Xn), INS (element-to-element and from-GPR), EXT_V (imm4 byte index). Adds a VEC_INDEX operand type plus NEON_IDX5/NEON_IDX4/NEON_EXT_IDX encodings: the element-size marker rides in the entry bits, the lane index drives the bits above it, and the decoder recovers the element size from imm5's marker. Element size now rides in op.size (B=1/H=2/S=4/D=8) via op_v_elem_b/h/s/d so the matcher can disambiguate DUP/INS element forms; the builder generator maps V_ELEM_* to those constructors. specgen derives the mask by varying registers and each index field to its max -- the GPR-source forms vary Vd and Rn independently (Rn 31 = wzr/xzr) so the low bit of each field toggles. All 19 representative forms byte-exact vs llvm-mc and decode-clean; 461 tests green. (TBL/TBX register-list forms deferred.)	2026-06-17 23:23:44 -04:00
Brendan Punsky	5761c23ba4	rexcode/arm64: NEON narrowing-shift (SHRN/SQSHRN/...) encode forms: SHRN/2, RSHRN/2, SQSHRN/2, UQSHRN/2, SQRSHRN/2, UQRSHRN/2, SQSHRUN/2, SQRSHRUN/2 (16 forms). Verified against llvm-mc that immh keys on the NARROW destination element (not the wide source), so the existing NEON_SHR_IMM encoder/decoder (esize from ops[0]) is already correct -- this is a specgen-only change: right-shift esize now uses ESIZE[dst] (equal to ESIZE[src] for same-arrangement shifts) plus narrowing {narrow-dst, wide-src} arrangement tuples. All forms byte-exact and decode-roundtrip verified across B/H/S element sizes; 461 tests green.	2026-06-17 23:07:38 -04:00
Brendan Punsky	a1e359b64a	rexcode/arm64: NEON permute, compare-zero, SXTL/UXTL encode forms: ZIP1/2, UZP1/2, TRN1/2 (three-same permute); CMLE/CMLT and FP FCMLE/FCMLT (compare against zero, with the literal #0 / #0.0 operand); SXTL/SXTL2/UXTL/UXTL2 (= SSHLL/USHLL #0, plain 2-register widen, shift implicit in the static bits). All reuse the VD/VN/VM register slots, so no encoder change. specgen gains an emit_cmp0 shape plus permute and widen families. All forms byte-exact vs llvm-mc; 461 tests green.	2026-06-17 23:03:53 -04:00
Brendan Punsky	55a141be4f	rexcode/arm64: NEON widening left-shift (SSHLL/USHLL) encode forms. Adds SSHLL/SSHLL2/USHLL/USHLL2 (12 forms) via specgen, reusing NEON_SHL_IMM (left shifts need no esize; the size marker is in bits). specgen's shift shape generalized to arrangement pairs {dst, src} with the shift element size taken from the source. Verified: encode matches llvm-mc + decode recovers mnemonic + amount (sshll/sshll2/ushll across widths); arm64 check + 461 tests pass.	2026-06-16 02:22:37 -04:00
Brendan Punsky	e52953c7ff	rexcode/arm64: NEON shift-by-immediate encode forms + encoder extension. First encoder-extension family. Adds Operand_Type VEC_SHIFT and Operand_Encoding NEON_SHL_IMM/NEON_SHR_IMM: the element-size marker bit sits in the entry's bits, the encoder packs the amount into the low immh:immb bits (left = shift; right = esize - shift, esize from the vector operand via vec_esize/form.ops[0]), and the decoder recovers esize from immh to compute the amount. Adds 13 mnemonics (91 forms) via specgen: left SHL/SLI/SQSHLU/SQSHL, right SSHR/USHR/SRSHR/URSHR/SSRA/USRA/SRSRA/URSRA/SRI. specgen derives bits/mask empirically by varying registers AND the shift (canon = operand bits zero; other extreme sets all shift bits), so per-arrangement immh discrimination + the growing shift-field width fall out automatically. Verified end-to-end: encode matches llvm-mc byte-for-byte AND decode recovers mnemonic + amount (sshr/shl/sli/ushr/srsra across B/H/S/D); arm64 check + 461 tests pass. First of the encoder-extension phase ([[rexcode-encode-coverage]]); CCMP_IMM imm5@20:16 pattern generalizes here.	2026-06-16 02:19:03 -04:00
Brendan Punsky	ff3a1acdc7	rexcode/arm64: NEON FP widen/narrow (FCVTL/FCVTN/FCVTXN) encode forms. Adds 6 mnemonics (10 forms) via specgen: FCVTL/FCVTL2 (widen half->single, single->double), FCVTN/FCVTN2 and FCVTXN/FCVTXN2 (narrow), with FP16 (V_4H_FP16/V_8H_FP16) on the half side. Completes the register-only NEON phase. Verified: decode round-trips, arm64 check + 461 tests pass.	2026-06-15 21:39:29 -04:00
Brendan Punsky	77c0265df9	rexcode/arm64: NEON ABS/NEG + FP vector-convert encode forms. Adds 14 mnemonics (74 forms) via specgen: integer two-register ABS/NEG, and the FP vector-convert (register form) family FCVTAS/AU/MS/MU/NS/NU/PS/PU/ZS/ZU + SCVTF/UCVTF. SP/DP .NEON, half-precision .FP16; the fixed-point (#fbits) convert forms come later with the immediate phase. Verified: decode round-trips incl. FP16 (abs/neg/fcvtzs.8h/scvtf), arm64 check + 461 tests pass.	2026-06-15 21:37:31 -04:00
Brendan Punsky	7cd39f1d0d	rexcode/arm64: NEON FP two-register + FP across-lanes encode forms. Adds 16 mnemonics (72 forms) via specgen: FP two-register FABS/FNEG/FSQRT, FRINTA/I/M/N/P/X/Z, FRECPE/FRSQRTE; and FP across-lanes FMAXV/FMINV/FMAXNMV/FMINNMV (scalar S/H dst). SP/DP are .NEON, half-precision .FP16. specgen per-form feature now scans all operands for an FP16 arrangement (handles scalar-dst across-lanes, where FP16 lives on the source). Verified: decode round-trips incl. FP16 (fabs/fsqrt.8h/frecpe/fmaxv), arm64 check + 461 tests pass.	2026-06-15 21:35:41 -04:00
Brendan Punsky	57fbe873d8	rexcode/arm64: NEON floating-point three-same encode forms (incl. FP16). Adds 17 FP three-same mnemonics (85 forms) via specgen: FMAX/FMIN/FMAXNM/FMINNM, FMULX/FRECPS/FRSQRTS, FACGE/FACGT, FCMEQ/FCMGE/FCMGT (register form), FADDP/FMAXP/FMINP/FMAXNMP/FMINNMP. Single/double forms (2S/4S/2D) are .NEON; half-precision (4H/8H) use the distinct V_4H_FP16/V_8H_FP16 operand types and are tagged .FP16. specgen gains: --mattr=+fullfp16, FP16 arrangement tokens, per-form feature tagging (derived from the operand arrangement), and per-family arrangement lists. Verified: decode round-trips incl. an FP16 form (fmax/fcmeq.4h/fmulx/faddp), arm64 check + 461 tests pass.	2026-06-15 21:33:25 -04:00
Brendan Punsky	824421853f	rexcode/arm64: NEON across-lanes + pairwise-long encode forms. Adds 11 mnemonics (59 forms) via specgen: across-lanes reductions ADDV/SMAXV/SMINV/UMAXV/UMINV (scalar B/H/S dst) and SADDLV/UADDLV (widened scalar dst), plus pairwise-long SADDLP/UADDLP/SADALP/UADALP. The scalar destination packs into the same VD field, so still no encoder change. specgen.emit generalized to accept scalar-register operand tokens (B/H/S/D) alongside vector arrangements. Verified: decode round-trips (ADDV/SADDLV/UMINV/SADDLP), arm64 check + 461 tests pass.	2026-06-15 21:27:40 -04:00
Brendan Punsky	7ebe042277	rexcode/arm64: NEON wide / narrow / XTN encode forms. Adds 24 more mixed-arrangement mnemonics (72 forms) via specgen: three-different wide (SADDW/UADDW/SSUBW/USUBW), narrowing-halving (ADDHN/SUBHN/RADDHN/RSUBHN), and two-register narrowing (XTN/SQXTN/UQXTN/SQXTUN), plus their high-half '2' variants. All register-only (VD/VN/VM or VD/VN), no encoder change. specgen refactored to a general arrangement-tuple mechanism (uniform + long/wide/narrow/XTN share one emit path). Verified: decode round-trips (SADDW/ADDHN/XTN/SQXTUN2), arm64 check + 461 tests pass.	2026-06-15 21:22:07 -04:00
Brendan Punsky	f78a3a5573	rexcode/arm64: NEON three-different (long) encode forms. Adds 26 widening long mnemonics (72 forms) via specgen: SADDL/UADDL/SSUBL/USUBL, SMULL/UMULL, SMLAL/UMLAL/SMLSL/UMLSL, SQDMULL/SQDMLAL/SQDMLSL and their high-half '2' variants. Destination arrangement is wider than the sources (Vd.8H, Vn.8B, Vm.8B; the '2' forms read the high half). Encoding stays VD/VN/VM, so no encoder change. specgen gains a mixed-arrangement THREE_DIFF shape (low/high source-half pairs). Verified: decode round-trips (SMULL/SADDL2/SQDMULL/UMLAL), arm64 check + 461 tests pass.	2026-06-15 21:16:56 -04:00
Brendan Punsky	00b666bbc0	rexcode/arm64: NEON pairwise + variable-shift encode forms. Adds 9 register-only three-same mnemonics (59 forms) via specgen: ADDP/SMAXP/SMINP/UMAXP/UMINP (pairwise) and SSHL/USHL/SRSHL/URSHL (per-lane variable shift). Verified: decode round-trips (ADDP/SSHL/SMAXP/URSHL), arm64 check + 461 tests pass. Skipped the already-implemented logical/compare/mul forms (AND_V/ORR_V/EOR_V/BIC_V/ORN_V/BSL/BIT/BIF/CMEQ/CMGT/CMHI/MUL_V) to avoid duplicate keys.	2026-06-15 13:01:26 -04:00
Brendan Punsky	d83065e3b8	rexcode/arm64: NEON two-register-misc encode forms. Adds 10 Advanced-SIMD two-register-misc mnemonics (34 forms across valid arrangements) via specgen: NOT/RBIT/REV16/REV32/REV64/CLS/CLZ/CNT/URECPE/URSQRTE. llvm-mc filters illegal arrangements (CNT/NOT only 8B/16B, URECPE/URSQRTE only 2S/4S, ...). specgen.lua generalized to a SHAPE table (THREE_SAME / TWO_SAME), so adding a family is one row. Verified: decode round-trips (NOT/REV64/CNT/URECPE), arm64 check + 461 tests pass.	2026-06-15 12:57:37 -04:00
Brendan Punsky	e21fa59733	rexcode/arm64: NEON three-same (integer) encode forms + specgen tool. Adds 25 Advanced-SIMD three-same integer mnemonics (153 forms across arrangements) to ENCODING_TABLE: SHADD/UHADD/SHSUB/UHSUB/SRHADD/URHADD, SQADD/UQADD/SQSUB/UQSUB, SMAX/UMAX/SMIN/UMIN, SABD/UABD/SABA/UABA, MLA/MLS, CMGE/CMHS/CMTST, SQDMULH/SQRDMULH. Introduces tablegen/specgen.lua: compact specs (mnemonic + llvm name + arrangements) -> ENCODING_TABLE blocks, with bits taken from llvm-mc (the oracle) and mask derived empirically (vary registers 0..31). Invalid arrangements are auto-detected via llvm-mc and skipped. Output fills the SPECGEN:BEGIN..END region of encoding_table.odin in place; the hand-written core is untouched. Verified: decode round-trips for SHADD/SQADD/CMGE/SQRDMULH; arm64 check + 461 tests pass; builders auto-generate (780 -> 805). Caveat: NEON builders currently collapse arrangements (one Register param per V operand) so inst_<mnem> exposes only the first arrangement -- an arrangement-aware builder-gen pass follows. Author: Brendan Punsky (machine git config user.name is the login 'Flāvius').	2026-06-15 12:52:10 -04:00
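
The "bits from the oracle, mask derived empirically" idea in the specgen commits can be illustrated with a small Python sketch. It is not the actual tablegen/specgen.lua code: the function names are hypothetical, and a hard-coded encoder for ADD Vd.16B, Vn.16B, Vm.16B stands in for an llvm-mc invocation. The sketch also samples only the register extremes 0 and 31 rather than the full 0..31 range, on the assumption that register 31 sets every bit of a 5-bit field.

```python
from itertools import product


def encode_add_16b(rd: int, rn: int, rm: int) -> int:
    """Stand-in oracle (hypothetical): ADD Vd.16B, Vn.16B, Vm.16B.

    A real generator would assemble the instruction with llvm-mc instead.
    """
    return 0x4E208400 | (rm << 16) | (rn << 5) | rd


def derive_bits_and_mask(oracle, operand_count: int, extremes=(0, 31)):
    """Return (bits, mask) for one encoding form.

    canon is the encoding with every operand set to zero. Any bit that
    differs from canon under some operand combination is an operand bit;
    all remaining bits are fixed and therefore belong to the mask.
    """
    canon = oracle(*([0] * operand_count))
    varying = 0
    for regs in product(extremes, repeat=operand_count):
        varying |= oracle(*regs) ^ canon
    mask = ~varying & 0xFFFFFFFF
    return canon & mask, mask


bits, mask = derive_bits_and_mask(encode_add_16b, 3)
print(f"bits={bits:#010x} mask={mask:#010x}")
# prints "bits=0x4e208400 mask=0xffe0fc00"
```

A decoder can then match a 32-bit word against the entry with `(word & mask) == bits`, which is why fields such as the shift amount or lane index have to be varied as well before the mask is trusted.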

15 Commits
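
The immh:immb packing described in the shift-by-immediate commit (left = shift, right = esize - shift, element size recovered from the marker bit on decode) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the rexcode encoder: the function names are hypothetical, esize is taken in bits (8/16/32/64), and the field is treated as a single 7-bit value whose highest set bit is the element-size marker.

```python
def encode_shift_field(esize: int, shift: int, right: bool) -> int:
    """Pack a shift amount into a 7-bit immh:immb value (hypothetical helper)."""
    if esize not in (8, 16, 32, 64):
        raise ValueError("esize must be 8, 16, 32 or 64 bits")
    if right:
        # Right shifts range over 1..esize and are stored as esize - shift.
        if not 1 <= shift <= esize:
            raise ValueError("right shift out of range")
        amount = esize - shift
    else:
        # Left shifts range over 0..esize-1 and are stored directly.
        if not 0 <= shift < esize:
            raise ValueError("left shift out of range")
        amount = shift
    # esize is a power of two, so it doubles as the marker bit above the amount.
    return esize | amount


def decode_shift_field(field: int, right: bool) -> tuple[int, int]:
    """Recover (esize, shift) from a 7-bit immh:immb value."""
    if not 8 <= field <= 0x7F:
        raise ValueError("immh must be non-zero")
    esize = 1 << (field.bit_length() - 1)  # highest set bit is the marker
    amount = field - esize
    return esize, (esize - amount if right else amount)


# SSHR on a .4S arrangement with #3: immh:immb = 0b0111101
print(bin(encode_shift_field(32, 3, right=True)))
```

Because the marker is always the highest set bit, the shift field grows by one bit each time the element size doubles, which matches the "growing shift-field width" noted in the commit message.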