mirrors/Odin - Odin - Kyren's Code

mirror of https://github.com/odin-lang/Odin.git synced 2026-06-19 16:42:33 +00:00

Author	SHA1	Message	Date
Brendan Punsky	89ef64c490	rexcode/x86: form-match memoization cache for the matcher path The matcher path (generic builders, hand-built, decode->re-encode) resolves an instruction to its encoding form by linearly scanning the forms for its mnemonic and operand-matching each -- the dominant cost on that path. Memoize it: pack (mnemonic, per-operand shape) into a key (immediates folded to the smallest size class they fit, matching imm_matches_inline) and cache key -> form so a repeated instruction shape skips the scan. Direct-mapped, fixed 8192-slot table (64 KB, no allocation). Each slot packs the full 48-bit key and form index into one u64, read/written with relaxed atomics, so concurrent encode() stays safe -- a reader sees a matching key or rescans, never a torn entry. The scan stays the source of truth (a miss runs it and records the result), so the cache is exact. Lookup + scan live in a non-inlined find_form() so they don't bloat encode()'s hot loop and slow the hint path that shares it. (Routing the matcher path through the recipe emit was tried and dropped: it costs the hint path ~1.2-1.5 ns however isolated -- the hot loop is too codegen-sensitive -- while the cache alone is free for the hint path.) Realistic generic-builder mix: matcher ~52 -> ~35 ns/inst (~1.49x); hint path unchanged. Byte-exact across 2282 + idempotent.	2026-06-19 10:49:10 -04:00
Brendan Punsky	9be899e6c7	rexcode/x86: recipe fast path handles memory r/m operands Extend emit_recipe to the full ModR/M + SIB + displacement addressing (register direct, RIP-relative, absolute [disp32], and base/index/scale/disp), mirroring the interpreter byte-for-byte, and drop the caller's reg-direct guard so memory operands take the fast path too. Only a label/relative immediate (a relocation) still falls back. Realistic immediate-heavy mix: ~20.1 -> ~12.9 ns/inst vs the pre-recipe base (~1.55x, 50 -> 77 M/s). Byte-exact across 2282 + idempotent.	2026-06-19 10:49:10 -04:00
Brendan Punsky	ac0589daa1	rexcode/x86: emit-descriptor fast path (precompiled per-form recipe) Precompute each encoding form into a flat Form_Recipe -- prefix byte, escape+ opcode blob, role->operand-index slots, ext, imm size, flags -- so the encoder replays common forms straight-line instead of re-interpreting enc.ops/enc.enc on every instruction (the resolve scan, escape ladder, prefix/REX selection). encode() takes the fast path when the form is hinted, eligible, has a register r/m and a literal immediate; everything else falls through to the existing interpreter, which stays the byte-exact source of truth. First cut: - reg-direct ModR/M only (memory r/m falls back) - hint path only (matcher / generic builders fall back) - ~33% of forms eligible (VEX/EVEX, 16-bit operand-size, x87 fixed-ModR/M, moffs/far/rel/implicit operands are marked ineligible) Recipes are built at startup into static storage (no heap); this moves into the table generator (#loaded like every other table) once the shape settles. Realistic immediate-heavy mix: ~19.0 -> ~16.3 ns/inst (52.7 -> 61.3 M/s). Byte-exact across 2282 cases + idempotent. Next: memory r/m addressing in the fast path, then the matcher path, then the gen-time port.	2026-06-19 10:49:10 -04:00
gingerBill	de0d2ae178	Optimize `append_elem` for different optimization levels * For `-o:size` and below, uses the type erased approach * For `-o:speed` and above, the inlined form is used This is necessary because a generic `mem_copy_non_overlapping` cannot be optimized when type erasure is used, meaning in a hot path where `append_elem` is used a lot; thus `mem_copy_non_overlapping` becomes a bottleneck.	2026-06-19 11:19:16 +01:00
gingerBill	7b58aa8eba	Minor style changes	2026-06-19 09:30:58 +01:00
gingerBill	70768e447f	Merge branch 'master' into bill/rexcode	2026-06-19 09:20:52 +01:00
gingerBill	abbfe793e0	`fmt` on `h'` floats: force the width to always be bit_size/4 to accurately represent the number	2026-06-19 09:20:26 +01:00
gingerBill	cd73a467ef	Merge branch 'bill/rexcode' of https://github.com/odin-lang/Odin into bill/rexcode	2026-06-19 09:16:44 +01:00
gingerBill	ceb242fc53	Merge branch 'master' into bill/rexcode	2026-06-19 09:15:04 +01:00
gingerBill	85db8c68a9	Remove `-stack-protector:default` as `none` is now the default	2026-06-19 09:14:11 +01:00
gingerBill	11e7cff116	Change `-stack-protector:` default to `none`	2026-06-19 09:12:57 +01:00
gingerBill	69daa4d184	Merge pull request #6830 from A1029384756/stack-canaries Stack canaries	2026-06-19 09:11:40 +01:00
Brendan Punsky	fae15847a3	rexcode: buffer-sizing helpers across all ISAs + naming-contract doc Roll the encode/decode buffer-sizing helpers (added for x86 in `49787b7de`) out to every other ISA, and document them in the cross-arch naming contract. Per arch (arm32, arm64, mips, riscv, ppc, ppc_vle, rsp, mos6502, mos65816): - encode_max_code_size / encode_max_relocation_count now key off the []Instruction slice (were int counts); bodies unchanged (* MAX_INST_SIZE). - encode_reserve(code, relocs, instructions): grows the caller's code []u8 by length and reserves relocs by capacity; allocates no new buffers. - decode_max_instruction_count / decode_estimate_instruction_count: exact ceiling and typical estimate, keyed off the min/avg instruction size per arch (fixed-4: arm64/mips/ppc/rsp; min-2: arm32/riscv/ppc_vle; min-1: mos). - decode_reserve(instructions, inst_info, label_defs, data, exact=false). docs/cross_arch_design.md: helpers added to the naming contract. No behavior change to the existing size helpers (signature only). All 10 ISAs check + test green (x86 2282, arm32 600, arm64 461, mips 281, riscv 154, ppc 31, ppc_vle 281, rsp 70, mos6502 148, mos65816 53).	2026-06-19 04:11:30 -04:00
gingerBill	d2813b978d	Merge branch 'master' into bill/rexcode	2026-06-19 09:10:30 +01:00
gingerBill	417aa0ea9e	Remove `0h` float panic which will have been caught previously by the tokenizer	2026-06-19 09:08:52 +01:00
gingerBill	f3e2262705	Update all_main.odin	2026-06-19 09:04:32 +01:00
Brendan Punsky	49787b7de4	rexcode/x86: buffer-sizing helpers for encode and decode Give callers a clean way to pre-size their own buffers so the encode/decode hot paths never allocate or resize, instead of decode() silently reserving the caller's arrays itself (removed). The library allocates nothing -- these only grow the caller's own dynamic arrays, and only when not already big enough (Odin's reserve no-ops when capacity already suffices). Size-only helpers (caller manages its own memory), keyed off the input slice: encode_max_code_size(instructions) - exact code bytes encode_max_relocation_count(instructions) - exact reloc upper bound decode_max_instruction_count(data) - exact ceiling (1 byte/inst) decode_estimate_instruction_count(data) - typical estimate (~3 B/inst) Reserve helpers (pre-size the caller's dynamic arrays; nil to skip an array): encode_reserve(code, relocs, instructions) code is a [dynamic]u8 grown by LENGTH (so code[:] is a valid emit target); relocs reserved by capacity on top of existing elements. decode_reserve(instructions, inst_info, label_defs, data, exact=false) reserves capacity on top of existing; exact=true for the ceiling. Error arrays grow only on the failure path, so they are intentionally not covered. check/test green; 2282 cases; exercised end-to-end (the [dynamic]u8 code pattern, factor-in-existing, nil args, exact ceiling, reserve no-op).	2026-06-19 03:48:36 -04:00
Brendan Punsky	3341898437	rexcode/x86: flatten per-instruction loops + hint immediates (~1.5x encode) Data-oriented pass on the encode hot path. Profiling showed bounds checks already elided by -o:speed; the cost was per-instruction loop/scan machinery and immediates falling off the hint path. - Gather the immediate slot in the single resolve pass and emit it straight- line (no scan over enc.enc); likewise drive the legacy REX prefix from the precomputed reg/mr/opr slots instead of a per-form scan. - Fold the separate needs_66 (GPR16) and SPL/BPL/SIL/DIL operand loops into the resolve pass, so user operands are visited exactly once. This was the big one: mov r,r 27 -> 21 ns. - Gate the whole legacy-prefix block on a single flags!=0 test (a legacy prefix is almost always absent) instead of four branches per instruction. - Make immediate forms hintable. A typed immediate builder names its width (inst_add_r32_imm32), the matcher already keys off the operand's declared size, so baking the form is byte-identical AND drops immediates from the full match scan: mov r32,imm32 55.7 -> 17.8 ns (3.1x). Floor (no-op) 14.55 -> 10.3 ns; realistic immediate-heavy typed mix 30 -> 20.5 ns/inst (~49 M inst/s). gen/builders/check/test/idempotent green; 2282 cases (typed==generic byte-identical, incl. the new immediate cases).	2026-06-19 01:56:05 -04:00
Brendan Punsky	a86f13b856	gitignore: un-ignore core/rexcode/isa/<arch> after the re-house The broad VS-style `x86/` build-artifact rule was shadowing the re-housed core/rexcode/isa/x86 source package (new files needed `git add -f`), and the stale `core/rexcode//tables/.bin` un-ignore no longer matched the deeper isa/<arch>/tables path. Retarget both to isa/: allow the directory tree and every arch's committed table blobs, while stray .bin/.obj/*.exe build output inside it stays ignored.	2026-06-18 22:54:41 -04:00
Brendan Punsky	078015bc34	rexcode/x86: pre-matched encode hint + repair the typed builders Targeted branchless revert + the pre-matched form fast path, and a fix for a pre-existing bug the latter surfaced. (a) Revert the two speculative-write spots from the prior branchless pass (legacy-prefix emission, widened displacement store, ENCODE_TAIL_SLACK) back to predicted branches. In real streams a legacy prefix is almost always absent and disp size is stable, so those branches are ~free and the unconditional stores only added work. Every class got faster (RET 19->17.5, MOV r,r 52->46.6, VADDPS 42.8->39.3 ns). (b) Pre-matched form hint. Instruction.enc_hint (in the existing 11-byte padding, idx+1 biased; 0 = matcher path) lets a typed builder that maps to a single value-independent form bake the global form index, so encode() skips the O(forms) match scan -- and, in a varied stream, its unpredictable branches. Generated for non-immediate forms only (value- dependent imm8/imm32 selection stays on the matcher). On a 100k mixed typed-builder stream: 47.3 -> 30.2 ns/inst (-36%), byte-identical to the matcher path -- ~2x the original baseline for codegen. Repair the typed inst_/emit_ builders. They were non-functional: the generator cast the hw-only typed enum straight to Register (Register(GPR64.RAX) -> class 0), so every typed-builder operand was rejected by the matcher (encode returned empty). Untested because the suite builds via the generic constructors. Now they build through the class-correct op_gpr64/op_xmm/... path (op_* already used by 3+ operand builders), emit_ reuses inst_, and a new 30-case consistency suite asserts typed == generic (llvm-verified) and hint == matcher. gen/builders/check/test/idempotent all green; 2276 cases.	2026-06-18 21:04:18 -04:00
Brendan Punsky	8387731357	rexcode/x86: branchless hot paths + single-pass operand resolution Three layers on the x86 encode/decode hot paths, all byte-exact (2246 LLVM-verified cases) and roundtrip-clean: 1. Branchless: legacy-prefix emission (speculative write + conditional advance), REX/VEX/EVEX extension-bit accumulation (gate-and-mask), ModRM mod/disp-size selection (cmov selects), displacement emission (widened store + ENCODE_TAIL_SLACK); decoder REX/VEX/EVEX register extensions (arithmetic instead of if/+=8). 2. Resolve-operands-once: the previous code re-derived each user operand ~5-10x per instruction (a fresh O(n) scan of enc.ops per emission pass). Now resolved into a [4]^Operand map a single time. 3. Single-pass gather: fold the opcode-+rb and ModR/M slot-detection scans into that one resolve pass (3 enc.enc passes -> 1). Net on a 100k mixed-instruction benchmark: encode ~58 -> ~54 ns/inst (best 52). Branchless alone was a ~7% encode regression (predicted branches, nothing to recover); the algorithmic passes recovered it and beat baseline.	2026-06-18 20:16:26 -04:00
Brendan Punsky	daa5b7cb79	rexcode: add core:rexcode/ir — the IR API layer (no concrete IR yet) A sibling to core:rexcode/isa for the intermediate representations (WASM, SPIR-V, LLVM bitcode + the LLVM dialects AIR/DXIL). Holds the shared vocabulary every IR package builds on, implements no specific IR. Design stance (see docs/ir_design.md): keep the ISA layer's spirit, but where IRs are structurally MORE uniform than ISAs (SSA + a type system regularize the operand/module shape), the shared core is richer. ir/ owns: status.odin Error/Error_Code (shape-identical to isa.Error) refs.odin Id/Ref/Ref_Space/Symbol_Table (the label analog: structural id references, not PC-relative byte offsets) types.odin Type/Type_Ref/Type_Kind (the type table -- no ISA analog) module.odin Module/Function/Block/Operation/Operand/Result/Dataflow (the structured model; Operation = isa.Instruction + an optional typed Result, opcode a u16 like Mnemonic) print.odin token kinds + options + num-fmt (parallels isa.print) Three honest concessions vs the ISA API, made explicit not inert: a structured Module replaces the flat []Instruction; a first-class type system; id-based entity refs replace labels. The encode/decode verbs take a Module and drop label_defs/resolve/base_address. Dataflow hosts both the WASM value stack and SSA; the codec is pluggable (table for WASM/SPIR-V, bitstream for the LLVM family -- AIR/DXIL are LLVM dialects, not peers). Package compiles; a hand-built SSA module round-trips through the types.	2026-06-18 19:03:27 -04:00
Brendan Punsky	95df04fbe1	rexcode: re-house ISA packages under core:rexcode/isa/<arch> Move all ten ISA packages (x86, arm32, arm64, mips, riscv, ppc, ppc_vle, rsp, mos6502, mos65816) from core/rexcode/<arch> to core/rexcode/isa/<arch>, so the import pattern is now `import "core:rexcode/isa/x86"`. The shared core stays at core:rexcode/isa. Mechanical: relative `import "../isa"` / "../../isa" -> absolute "core:rexcode/isa" (the only path that survives the move; the "../" and "../.." self/generated imports move with their packages). build.lua now builds paths as <root>/isa/<name>; stale `cd <arch>` hints in the verify tools and the doc.odin paths updated. WASM stays at core/rexcode/wasm for now -- it is an IR, not an ISA, and will move under the forthcoming core:rexcode/ir once that layer lands. All 10 arches gen/builders/check/test green; import core:rexcode/isa/x86 verified working; wasm still compiles.	2026-06-18 19:03:27 -04:00
gingerBill	1060fd4c72	Factor out reloc group logic	2026-06-18 15:21:05 +01:00
gingerBill	84e7e04816	Handle relocation groups	2026-06-18 15:15:53 +01:00
gingerBill	51436077c9	Begin work on printing the WAT format	2026-06-18 15:10:42 +01:00
gingerBill	3199ea266e	Update ENCODING_TABLE to support arity count and tail-call instructions	2026-06-18 14:45:36 +01:00
gingerBill	5272f5a4f0	Simplify the printer even further to how only show relevant things	2026-06-18 11:35:53 +01:00
gingerBill	c002470b8d	Do not print out the redundant aspects of the custom `name` section	2026-06-18 10:58:55 +01:00
gingerBill	e94d57f650	Remove dead parameter	2026-06-18 10:58:32 +01:00
gingerBill	e404dafaf0	Merge branch 'bill/rexcode' of https://github.com/odin-lang/Odin into bill/rexcode	2026-06-18 10:49:34 +01:00
Brendan Punsky	4cc6977321	Merge origin/bill/rexcode: struct repack (#raw_union #packed), wasm arch Merge gingerBill's latest into bill/rexcode. His changes: minimize the Instruction/Operand structs across ISAs with packed raw-unions (+ the compiler support for #raw_union #packed), the new core:rexcode/wasm arch and wasm/module, encode() now returns (byte_count, ok) instead of a Result struct, decode_one made public, and assorted formatting/inlining. Conflict: arm64/tests/pipeline_smoke.odin CSEL test -- kept the generated 4-arg inst_csel(dst,src,src2,cond) (mnemonic_builders.odin is generated, not from Bill's branch) and adopted Bill's (byte_count, success) encode signature. Required rebuilding ./odin from the merged source for the packed-union syntax. Re-validated after the repack: regenerated all artifacts (idempotent -- no spurious churn), all 10 arches gen/builders/check/test green, and byte-compared the new arm32 BF + mips PS/MMI/DSP/R6 forms to confirm no field truncation. arm64/arm32/mips still 100%.	2026-06-18 05:44:48 -04:00
Brendan Punsky	83bdd501a3	rexcode: remove dead BFCSEL else-target scaffolding; tidy mips COPY specgen BFCSEL's else-target turned out to be the implicit fall-through, so the BF_BELSE operand encoding, the BFCSEL_ELSE_T32 relocation, and their encoder/decoder cases were never referenced by any table entry. Remove them. Also restructure the MSA COPY specgen loop so COPY_U only iterates .B/.H (COPY_U.W is mips64-only and emitted in the mips64 section), which drops the spurious 'skipped COPY_U_W' message. No functional change to any generated encode form; arm64/arm32/mips all still 100%, 461/600/281 tests green.	2026-06-18 05:29:20 -04:00
Brendan Punsky	c8851c546d	rexcode/arm32: BFCSEL -> Branch Future complete, arm32 100% BFCSEL = bf-point + true-target (hw1, like BF) + 4-bit condition at hw0[5:2], base 0xF002E001 (hw0[1] is a static marker). The else-target is the architectural fall-through, so it is not a separate operand -- BFCSEL is modelled as three operands and reproduces llvm-mc's bytes exactly (f082e003 / f102e803 / f086e003 across boff/true/cond variations). Every encodable arm32 Mnemonic now has an encode form (gap = 0). 600 tests green.	2026-06-18 04:55:26 -04:00
Brendan Punsky	808716517e	rexcode/arm32: Branch Future BF/BFL/BFLX/BFI_BR encode forms Reverse-engineered the ARMv8.1-M Branch Future T32 encoding from llvm-mc: bf-point imm4 = (label-(PC+4))/2 at hw0[10:7]; branch target val = (label-(PC+4))/2 with J at hw1[11] and imm10 at hw1[10:1]; BFLX/BFX target is Rm at hw0[3:0]. New REL_BF operand + BF_BOFF/BF_BLOC/BF_RM encodings + BF_BOFF_T32/BF_BLOC_T32 relocations with resolver. BF=0xF040E001, BFL=0xF000C001, BFLX=0xF070E001, BFI_BR=0xF060E001. Tightened the WLSTP/DLSTP masks to mark hw0[6] static (it is always 0 for valid B/H/W/D sizes) so they no longer shadow the BF register forms. Byte-exact vs llvm-mc with resolved bf-point/target offsets; 600 tests green. (BFCSEL still pending -- it adds an else-target + condition.)	2026-06-18 04:47:03 -04:00
Brendan Punsky	c6edd6d5cd	rexcode/mips: R5900 MMI MADD/MSUB, RDPGPR/WRPGPR; drop BPOSGE64 -> 100% PS2 R5900 MMI: MSUB1/MSUBU1 (second-MAC, SPECIAL2 func +0x20 exactly like the implemented MADD1/MADDU1) and the three-operand MADD_EE/MADDU_EE/ MSUB_EE/MSUBU_EE (write Rd as well as HI/LO; the Rd!=0 form selected by a less-specific mask after the two-operand MADD/MSUB and PLZCW match). RDPGPR/WRPGPR (COP0 shadow-GPR move, hand-encoded from the MIPS32r2 manual since llvm-mc gates them). Drop BPOSGE64: not a real ISA instruction (DSPControl.pos is 6-bit, only BPOSGE32 exists; llvm rejects it). Every encodable mips Mnemonic now has an encode form (gap = 0). All self-consistent and decode-clean; 281 tests green.	2026-06-18 04:17:50 -04:00
Brendan Punsky	61a62185b8	rexcode/mips: R6 compact branches (BEQC/BNEC/BLTC/BGEC/.../BLTZC) All ten two-/one-register R6 compact branches, byte-exact vs llvm-mc. The signed forms share POP26/POP27 (opcodes 22/23) with the pre-R6 BLEZL/BGTZL and with each other; the decode-entry mask sort tries the more-specific rt=0 / rs=0 forms first, and a small operand-aware hook in decode_one_inline recovers BGEZC/BLTZC (rs==rt) from the general BGEC/BLTC. Where R6 reuses a pre-R6/PSP major opcode (BEQC vs ADDI at opcode 8, etc.) decode is inherently ISA-mode-dependent and resolves to the legacy form; the R6 encode side is exact. 281 tests green.	2026-06-18 04:12:20 -04:00
Brendan Punsky	ff2bf13121	rexcode/mips: R6 PC-relative loads LWPC/LWUPC/LDPC New REL19/REL18 operand types + BRANCH_19/BRANCH_18 encodings + REL_PC19/ REL_PC18 relocations (R6 PC-relative semantics: offset is relative to the instruction's own address, no delay-slot adjustment; LDPC aligns the PC down to 8 and scales by 8). LWPC (mips32r6), LWUPC/LDPC (mips64r6). Byte-exact vs llvm-mc and decode-clean; 281 tests green.	2026-06-18 04:05:32 -04:00
Brendan Punsky	eab483a527	rexcode/mips: paired-single FMA + conditional-move forms (spec-derived) MADD/MSUB/NMADD/NMSUB.PS and MOVN/MOVZ/MOVF/MOVT.PS. This llvm-mc only knows the .S/.D variants, so these are derived from the llvm-verified single forms by switching the data-format field to PS (COP1X FMA fmt is bits 2:0, S=0 -> PS=6; COP1 conditional-move fmt is bits 25:21, S=16 -> PS=22), per the MIPS64 manual. Same operand slots/masks. Decode-clean and 281 tests green.	2026-06-18 03:59:04 -04:00
Brendan Punsky	09c1d5ba0f	rexcode/mips: paired-single FP + mips64 MSA element forms Parameterize the specgen oracle with a per-family llvm-mc command so 64-bit-FPU and mips64 forms can be assembled. Paired-single CVT_PS_S, CVT_S_PL/PU, PLL/PLU/PUL/PUU.PS (via -mcpu=mips64r2). mips64-only MSA INSERT_D and COPY_U_W (via the mips64 triple). Byte-exact vs llvm-mc and decode-clean; 281 tests green.	2026-06-18 03:55:30 -04:00
Brendan Punsky	f290347c24	rexcode/mips: DSP ASE replicate-immediate forms (REPL.PH/QB) REPL.PH (signed 10-bit broadcast, reuses MSA_S10) and REPL.QB (8-bit, reuses MSA_I8). Byte-exact vs llvm-mc including a negative .PH immediate; 281 tests green.	2026-06-18 03:46:50 -04:00
Brendan Punsky	5b91624cd3	rexcode/mips: DSP ASE extract-from-accumulator forms New EXT_SIZE encoding (5-bit extract size at 25:21). EXTPDP (immediate size), and the variable forms EXTPDPV / EXTRV_R.W / EXTRV_RS.W / EXTRV_S.H (extract via a GPR-specified position). Byte-exact vs llvm-mc and decode- clean; 281 tests green.	2026-06-18 03:45:23 -04:00
Brendan Punsky	82f62ce9a9	rexcode/mips: DSP ASE accumulator multiply-add / shift forms New AC_NUM (accumulator ac0..ac3 at bits 12:11) and SHILO_IMM (signed 6-bit at 25:20) encodings. DPA/DPAX/DPS/DPSX.W.PH and MAQ_S/MAQ_SA.W.PHL/ PHR (multiply-accumulate into a DSP accumulator), plus MTHLIP, SHILOV and SHILO (accumulator shift). Spot-checked byte-exact vs llvm-mc and decode- clean, including a negative SHILO immediate; 281 tests green.	2026-06-18 03:43:05 -04:00
Brendan Punsky	8fed538afc	rexcode/mips: MSA branch-on-zero/non-zero forms (BZ/BNZ) BZ/BNZ .B/.H/.W/.D/.V (branch if any/all elements zero/non-zero): a specgen branch emitter that derives the opcode+Wt bits then marks the 16-bit PC-relative offset variable, reusing the existing REL16/BRANCH_16 relocation machinery. The offset is emitted as a relocation (label target). 10 forms, opcode+Wt byte-exact vs llvm-mc and decode-clean. The R6 two-/one-register compact branches (BEQC/BNEC/BLTC/BGEC/.../BLTZC) are deferred: they share POP major opcodes disambiguated only by the rs/rt relationship, which the opcode+mask decode model can't express without operand-aware logic. 281 tests green.	2026-06-18 03:39:55 -04:00
Brendan Punsky	56cfbc675a	rexcode/mips: DSP ASE shift-by-immediate forms New DSP_SA encoding (shift amount at bits 24:21). SHRA.QB/SHRA_R.QB (.QB 3-bit), SHRA_R.PH/SHRL.PH (.PH 4-bit). Byte-exact vs llvm-mc; 281 tests green.	2026-06-18 03:33:47 -04:00
Brendan Punsky	c2de507bb0	rexcode/mips: FPU FMA, MSA COPY/INSERT, DSP 2-register, DI/EI/RDHWR New FR (FP reg at 25:21) encoding for the COP1X 4-register fused multiply-adds MADD/MSUB/NMADD/NMSUB.S/.D. New GPR_AT_6 / GPR_AT_11 encodings (GPR in a vector-register slot, with correct GPR decode) for MSA COPY_S/U (lane->GPR) and INSERT (GPR->lane). DSP two-register PRECEQU/PRECEU (.PH.QBLA/QBRA) and REPLV (.PH/.QB). Control ops DI/EI and RDHWR. 25 forms; spot-checked byte-exact vs llvm-mc and decode-clean; 281 tests green.	2026-06-18 03:31:40 -04:00
Brendan Punsky	930b988ebf	rexcode/mips: FPU conditional-move + convert-to-FP forms MOVN/MOVZ.S/.D (FP move on GPR nonzero/zero, enc {FD,FS,RT}), MOVF/MOVT. S/.D (FP move on FP condition code, enc {FD,FS,FCC_BC}), and the convert-to-FP forms FCVT_D_W/S_D/S_W (cvt.d.w/cvt.s.d/cvt.s.w). 11 forms. Spot-checked byte-exact vs llvm-mc and decode-clean; 281 tests green.	2026-06-18 03:27:23 -04:00
Brendan Punsky	5b47f0ca29	rexcode/mips: MSA INSVE + DSP ASE 3-register/compare/shift forms MSA INSVE (.B/.H/.W/.D element insert). DSP ASE three-register ops (ADDU/SUBU/MULEQ/MULEU/MULQ/PRECRQ*/PICK/CMPGU, enc {RD,RS,RT}), the variable shifts SHLLV/SHRAV/SHRLV (enc {RD,RT,RS} -- value is Rt, shift is Rs), and the compares CMP/CMPU (.PH/.QB, {RS,RT}). 38 forms reusing the existing GPR R-type slots. Spot-checked byte-exact vs llvm-mc; 281 tests green.	2026-06-18 03:24:20 -04:00
Brendan Punsky	4ab24007b7	rexcode/mips: MSA BIT-shift, element-index, GPR-index, I8 forms New MSA_BIT_SHIFT / MSA_ELM_IDX / MSA_I8 encodings (the data-format marker is fixed in the entry bits; the operand drives the low bits; decode infers df from the marker). SLLI/SRAI/SRLI (.B/.H/.W/.D shift), SPLATI/SLDI (element index), SPLAT/SLD (GPR index), VSHF (.B/.H/.W/.D shuffle), and the I8 forms ANDI/ORI/XORI/NORI/BMNZI/BMZI/BSELI.B + SHF.B/H/W. 42 forms. Spot-checked byte-exact vs llvm-mc and decode-clean across all formats; 281 tests green.	2026-06-18 03:17:39 -04:00
Brendan Punsky	307aa2a9dd	rexcode/mips: MSA 3RF/3R/2R/2RF/VEC encode forms (specgen) New mips specgen (llvm-mc --triple=mips --mattr=+msa as the bits oracle, big-endian words, empirical masks): vector FP arithmetic/compare FADD/ FSUB/FMUL/FDIV/FMAX/FMIN/FCEQ/FCLE/FCLT/FCNE (.W/.D), dot product DOTP_S/U (.H/.W/.D), count/popcount NLOC/NLZC/PCNT (.B/.H/.W/.D), one-source FP FSQRT/FRSQRT/FRCP/FRINT/FTRUNC_S/U/FFINT_S/U (.W/.D), and bit-select BMNZ/BMZ/BSEL.V. 57 forms reusing the existing WD/WS/WT slots. Spot- checked byte-exact vs llvm-mc and decode-clean; 281 tests green.	2026-06-18 03:11:41 -04:00
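
Note on commit 89ef64c490: the form-match memoization cache it describes (a direct-mapped, fixed 8192-slot table whose slots each pack the full 48-bit key and the form index into one 64-bit word, with the linear scan kept as the source of truth) can be sketched roughly as below. This is an illustrative sketch only, not the rexcode implementation. The names `FormCache`, `slot_index` and `scan`, the low-bits slot selection, and the +1 bias that lets an all-zero slot mean "empty" are all assumptions. The relaxed atomic loads and stores from the commit message are not modelled; a plain Python list stands in for the table.

```python
SLOT_COUNT = 8192                 # fixed table size named in the commit message
KEY_BITS = 48                     # the full key is stored in every slot
KEY_MASK = (1 << KEY_BITS) - 1
EMPTY = 0                         # assumption: an all-zero slot means "nothing cached"


def slot_index(key):
    # Assumption: the slot is picked from the low bits of the key.
    return key & (SLOT_COUNT - 1)


class FormCache:
    """Direct-mapped memo of key -> form index; the scan stays authoritative."""

    def __init__(self, scan):
        self.scan = scan                    # slow path: key -> form index
        self.slots = [EMPTY] * SLOT_COUNT   # 8192 slots x 8 bytes = 64 KB

    def find_form(self, key):
        assert 0 <= key <= KEY_MASK
        i = slot_index(key)
        entry = self.slots[i]
        # Hit only if the slot is occupied and holds exactly this key.
        if entry != EMPTY and (entry & KEY_MASK) == key:
            return (entry >> KEY_BITS) - 1  # undo the +1 bias
        # Miss (empty slot or another key): run the scan and record the result.
        form = self.scan(key)
        assert 0 <= form < (1 << 16) - 1    # biased index must fit in 16 bits
        self.slots[i] = ((form + 1) << KEY_BITS) | key
        return form
```

Because each slot carries the whole key, two instruction shapes that land in the same slot simply evict each other and fall back to the scan, so a lookup never returns a form that the scan would not have returned.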

17778 Commits