rexcode: add core:rexcode/ir — the IR API layer (no concrete IR yet)

A sibling to core:rexcode/isa for the intermediate representations (WASM,
SPIR-V, LLVM bitcode + the LLVM dialects AIR/DXIL). Holds the shared vocabulary
every IR package builds on, implements no specific IR.

Design stance (see docs/ir_design.md): keep the ISA layer's spirit, but where
IRs are structurally MORE uniform than ISAs (SSA + a type system regularize the
operand/module shape), the shared core is richer. ir/ owns:

  status.odin  Error/Error_Code (shape-identical to isa.Error)
  refs.odin    Id/Ref/Ref_Space/Symbol_Table (the label analog: structural id
               references, not PC-relative byte offsets)
  types.odin   Type/Type_Ref/Type_Kind (the type table -- no ISA analog)
  module.odin  Module/Function/Block/Operation/Operand/Result/Dataflow (the
               structured model; Operation = isa.Instruction + an optional
               typed Result, opcode a u16 like Mnemonic)
  print.odin   token kinds + options + num-fmt (parallels isa.print)

Three honest concessions vs the ISA API, made explicit not inert: a structured
Module replaces the flat []Instruction; a first-class type system; id-based
entity refs replace labels. The encode/decode verbs take a Module and drop
label_defs/resolve/base_address. Dataflow hosts both the WASM value stack and
SSA; the codec is pluggable (table for WASM/SPIR-V, bitstream for the LLVM
family -- AIR/DXIL are LLVM dialects, not peers).

Package compiles; a hand-built SSA module round-trips through the types.
2026-06-19 16:42:33 +00:00 · 2026-06-18 18:59:50 -04:00
parent 95df04fbe1
commit daa5b7cb79
8 changed files with 852 additions and 1 deletions
--- a/core/rexcode/doc.odin
+++ b/core/rexcode/doc.odin
@@ -214,8 +214,9 @@ rexcode/
 		ppc_vle/        # Freescale VLE (sibling of ppc)
 		riscv/          # RISC-V
 		rsp/            # N64 RSP
+	ir/                 # shared IR core (parallels isa/; see docs/ir_design.md)
 	wasm/               # WebAssembly (an IR; destined for ir/wasm once the IR layer settles)
-	docs/               # cross-arch design + per-arch design docs
+	docs/               # cross-arch design + IR design + per-arch design docs
 ```

 Each ISA is imported as `core:rexcode/isa/<arch>` (e.g. `core:rexcode/isa/x86`); the
--- a/core/rexcode/docs/ir_design.md
+++ b/core/rexcode/docs/ir_design.md
@@ -0,0 +1,270 @@
+<!-- rexcode  ·  Brendan Punsky (dotbmp@github), original author -->
+
+# rexcode — IR API Design
+
+> Why the rexcode IR family (`wasm`, and the planned `spirv`, `llvm`, with
+> `air` / `dxil` as LLVM dialects) gets its own API layer (`core:rexcode/ir`)
+> **parallel** to the ISA layer (`core:rexcode/isa`) — sharing the ISA layer's
+> spirit and as much of its shape as honestly survives, while conceding exactly
+> the three places an IR is not an ISA.
+
+Read [cross_arch_design.md](cross_arch_design.md) first; this document is its
+sibling and assumes its vocabulary.
+
+---
+
+## The guiding principle
+
+The ISA layer's rule was *“share the bookkeeping, specialize the bytes.”* The IR
+layer keeps it and adds one clause:
+
+> **Share the bookkeeping *and the structure*, specialize the dialect and the codec.**
+
+An ISA only ever shares bookkeeping because its *content* (registers, operands,
+the bit-twiddling) diverges maximally per arch. An IR shares **more** — the whole
+`Module → Function → Block → Operation` structure and the operand/type model are
+genuinely the same problem on every IR — because SSA and a type system regularize
+what ISAs leave ad hoc. So `ir/` is a richer shared core than `isa/`: it owns the
+*structural model*, not just labels and errors. What stays per-IR is the *opcode
+set*, the *codec* (the wire format), and the *dialect* (intrinsic/metadata
+conventions).
+
+---
+
+## 0. First: how many IRs are there really?
+
+Fewer than the list suggests. **AIR and DXIL are not peers of LLVM — they are
+LLVM bitcode.** AIR is LLVM bitcode + a Metal dialect; DXIL is LLVM ~3.7 bitcode +
+a DirectX dialect inside a DXContainer. So the field is **three codec
+families**, not five:
+
+| family | members | wire format |
+|---|---|---|
+| WASM | wasm | byte stream + LEB128, one form per opcode |
+| SPIR-V | spirv | 32-bit words, uniform `wordCount<<16 \| opcode` header |
+| LLVM bitstream | llvm, **air**, **dxil** | self-describing block/record/abbreviation bitstream |
+
+The implementation cost is therefore *3 codecs + N dialects*, and `air`/`dxil`
+should reuse the `llvm` codec wholesale. That single fact shapes the package
+tree: `ir/llvm/`, `ir/llvm/air/`, `ir/llvm/dxil/`.
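+
+To make the two table-family wire formats in the table above concrete, here is a
+minimal, self-contained sketch. It is not code from this package: `pack_header`,
+`unpack_header`, and `write_uleb128` are hypothetical helper names chosen for
+illustration. It shows the SPIR-V instruction header word and unsigned LEB128
+as WASM uses it.
+
+```odin
+package wire_format_example
+
+// SPIR-V: the first word of every instruction is (word count << 16) | opcode.
+pack_header :: proc(word_count: u16, opcode: u16) -> u32 {
+	return (u32(word_count) << 16) | u32(opcode)
+}
+
+unpack_header :: proc(word: u32) -> (word_count: u16, opcode: u16) {
+	return u16(word >> 16), u16(word & 0xFFFF)
+}
+
+// WASM: unsigned LEB128, seven payload bits per byte, high bit = "more follows".
+write_uleb128 :: proc(buf: ^[dynamic]u8, value: u32) {
+	v := value
+	for {
+		b := u8(v & 0x7F)
+		v >>= 7
+		if v != 0 {
+			b |= 0x80
+		}
+		append(buf, b)
+		if v == 0 {
+			break
+		}
+	}
+}
+
+main :: proc() {
+	// OpTypeVoid is opcode 19 and occupies two words (header + result id).
+	header := pack_header(2, 19)
+	assert(header == 0x0002_0013)
+	wc, op := unpack_header(header)
+	assert(wc == 2 && op == 19)
+
+	buf: [dynamic]u8
+	defer delete(buf)
+	write_uleb128(&buf, 624485)
+	assert(len(buf) == 3 && buf[0] == 0xE5 && buf[1] == 0x8E && buf[2] == 0x26)
+}
+```
+
+The LLVM bitstream has no equally small counterpart, because its record layout
+is described by abbreviation records inside the stream itself.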
+
+---
+
+## 1. The universal IR shape
+
+Strip away specifics and every IR needs these — the same checklist `isa` has,
+shifted up one level of structure:
+
+| # | Concept | `ir` type |
+|---|---|---|
+| 1 | A **type** = (kind, width/elem/fields) | `Type`, `Type_Ref` |
+| 2 | An **operand** = literal \| entity-ref \| type | `Operand`, `Operand_Kind` |
+| 3 | An **operation** = opcode + operands + *optional result* | `Operation` |
+| 4 | An **opcode** enum | per-IR `Opcode` (u16, INVALID=0) |
+| 5 | **References** to entities by id (+ named symbols) | `Id`, `Ref`, `Symbol_Table` |
+| 6 | **Relocations** for object-file symbol fixups | per-IR `Relocation` |
+| 7 | `encode(Module) -> bytes (+relocs +errors)` | per-IR `encode()` |
+| 8 | `decode(bytes) -> Module (+errors)` | per-IR `decode()` |
+| 9 | `print(Module) -> text (+tokens)` | per-IR `print()`/`tprint()` |
+| + | A **structured module** of functions→blocks→operations | `Module`/`Function`/`Block` |
+| + | A **dataflow discipline** (stack or SSA) | `Dataflow` |
+
+Items 1–9 are item-for-item the ISA's nine, re-aimed: *type* generalizes the
+ISA's implicit-width; *operand* keeps the kind tag; *operation* is `Instruction` +
+a `Result`; *opcode* is `Mnemonic`; *references* replace *labels*. The two `+`
+rows are the genuinely new structure (§3).
+
+---
+
+## 2. Where IRs diverge from ISAs
+
+Three real divergences, then a long tail of things that *look* different but are
+the same shape.
+
+### The three real concessions
+
+1. **The unit of work is a structured `Module`, not a flat `[]Instruction`.**
+   An ISA program is a byte-addressed instruction stream; an IR program is a
+   typed graph: `Module → []Function → []Block → []Operation`, where an op may
+   define an SSA value that later ops use. So `decode` is a *structured parse*,
+   not a linear scan, and `ir` owns `Module`/`Function`/`Block` where `isa` owns
+   no `Instruction`. `Operation.operands` is **variable-arity** (`[]Operand`) —
+   the ISA `Instruction`'s fixed `[4]Operand` is the one leaf shape that does not
+   survive (calls, `switch`, `phi`).
+
+2. **A first-class type system.** Operations and results carry a `Type_Ref` into
+   the module's type table. ISAs bake width into the mnemonic and never need
+   this. `Type_Kind` is the WASM∪SPIR-V∪LLVM denominator (`INT/FLOAT/VECTOR/
+   POINTER/STRUCT/FUNCTION/...`).
+
+3. **Entity references replace PC-relative labels.** ISA branches resolve as
+   instruction-index→byte-offset (`isa.Label_Definition`, rewritten by `encode`).
+   IR operands reference entities by **`Id`** — SSA results, blocks, functions,
+   globals, types — resolved *structurally*, with no PC-relative pass. (Object-
+   file *symbol* fixups still produce `Relocation`s for `EXTERNAL` refs.)
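+
+The three concessions can be seen together in a small hand-built fragment. This
+is a sketch only, assuming the `ir` package from this commit is importable as
+`core:rexcode/ir` and that `Type_Ref` converts from an integer index; `OP_ADD`
+and `OP_BR` are hypothetical opcode values, since no concrete IR defines an
+`Opcode` enum yet.
+
+```odin
+package ir_refs_example
+
+import ir "core:rexcode/ir"
+
+// Hypothetical opcode values; a concrete IR package would supply its own
+// `Opcode` enum (u16, INVALID = 0).
+OP_ADD :: u16(1)
+OP_BR  :: u16(2)
+
+main :: proc() {
+	// Assumption: index 0 of Module.types holds a 32-bit INT type.
+	i32_type := ir.Type_Ref(0)
+
+	// SSA style: value 2 = add(value 0, value 1). Uses are ids, not byte offsets.
+	add_operands := []ir.Operand{ir.op_value(ir.Id(0)), ir.op_value(ir.Id(1))}
+	add := ir.Operation{
+		operands = add_operands,
+		result   = ir.Result{id = ir.Id(2), type = i32_type},
+		opcode   = OP_ADD,
+	}
+
+	// A branch names its target block by id and defines no value.
+	br_operands := []ir.Operand{ir.op_block(ir.Id(7))}
+	br := ir.Operation{
+		operands = br_operands,
+		result   = ir.Result{id = ir.ID_NONE},
+		opcode   = OP_BR,
+	}
+	br.flags.terminator = true
+
+	assert(ir.operand_id(add.operands[0]) == ir.Id(0))
+	assert(add.result.type == i32_type)
+	assert(br.operands[0].space == .BLOCK)
+	assert(br.result.id == ir.ID_NONE)
+}
+```
+
+Nothing in the fragment carries a byte offset or a label definition: the branch
+target and the value uses are plain ids that the codec resolves structurally.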
+
+### Two axes that sort the IRs
+
+Everything else sorts onto two orthogonal axes. Note the clustering is
+counterintuitive — the encoding mates and the model mates are *different* pairs:
+
+| IR | encoding model | dataflow model |
+|---|---|---|
+| WASM | **table** (byte/LEB, one form per opcode) | **stack** (implicit) |
+| SPIR-V | **table** (32-bit words, uniform header) | **SSA** (result ids, typed) |
+| LLVM / AIR / DXIL | **bitstream** (data-defined abbreviations) | **SSA** (+ metadata graph) |
+
+- On **encoding**, WASM and SPIR-V are siblings — a static `opcode → operand-
+  layout` table, *exactly* the ISA `ENCODING_TABLE` shape. LLVM is the outlier:
+  its layout is defined by abbreviation records *in the stream*, so **no static
+  table can describe it**.
+- On **dataflow**, SPIR-V/LLVM are siblings (SSA + types); **WASM is the
+  outlier** — a stack bytecode with no SSA, no named results, minimal types.
+
+So WASM is encoding-kin to SPIR-V but model-kin to nothing, and the one thing
+you most want to share (LLVM) breaks the table assumption the others share. The
+`Dataflow` trait and the *pluggable codec* (§5) exist precisely to absorb these
+two splits without forking the API.
+
+### Divergence summary
+
+| Component | Verdict | Shared (`ir/`) | Per-IR |
+|---|---|---|---|
+| References / `Id` | ✅ shared | the whole id + symbol model | which `Ref_Space`s exist |
+| Error / status | ✅ shared | struct shape (= `isa.Error`) | error-code subset |
+| Type model | ✅ shared | `Type`/`Type_Ref`/`Type_Kind` | wire⇄`Type` lowering |
+| Operand model | ✅ shared* | `Operand` + kinds (SSA homogenizes it) | dialect `aux` encodings |
+| Structural model | ✅ shared | `Module`/`Function`/`Block`/`Operation` | — |
+| Printer framework | ◑ split | tokens, options, num-fmt | type/value/block syntax |
+| Relocation | ◑ split | struct-shape convention | type enum (per-IR file) |
+| `Opcode` | ✗ per-IR | convention (u16, INVALID=0) | the enum |
+| Opcode table / codec | ✗ per-IR | codec *strategy* (§5) | schema + data (or bitstream) |
+| `encode`/`decode` driver | ✗ per-IR | verb signature | the whole parse/emit |
+
+> *`Operand` is shared here where `isa.Operand` is per-arch. ISA operands diverge
+> wildly (ModRM/SIB vs shifted-register vs split immediates); SSA collapses IR
+> operands to "a literal, a reference, or a type," uniform enough to define once.
+> Dialect-specific encodings (WASM memarg, SPIR-V enum masks) are an *encoding*
+> detail carried in `Operand.aux` + the IR's opcode table — not a new shape.
+
+---
+
+## 3. The shared core (`ir/`) and why this much is shared
+
+`ir/` depends on nothing (it does **not** depend on `isa/`) and owns the parts
+that are the same problem on every IR:
+
+- `status.odin` — `Error`/`Error_Code`; the `Error` struct is byte-identical to
+  `isa.Error` so one tool surfaces both.
+- `refs.odin` — `Id`, `Ref`, `Ref_Space`, `Symbol_Table` (the `isa.labels`
+  analog, re-cast from byte-offsets to structural ids).
+- `types.odin` — `Type`, `Type_Ref`, `Type_Kind` (no ISA analog).
+- `module.odin` — `Module`/`Function`/`Block`/`Operation`/`Operand`/`Result`/
+  `Dataflow` (the structural model; the heart of the layer).
+- `print.odin` — token kinds (with IR-only `TYPE`/`VALUE_REF`/`RESULT`/
+  `BLOCK_LABEL`), print options, number-formatting helpers.
+
+Each concrete IR package **re-exports** these (e.g. `wasm.Module`,
+`spirv.Operation`) so a consumer sees one namespace, mirroring how arch packages
+re-export `isa`.
+
+### The validating precedent and the rejected alternatives
+
+The `Operation`-with-blocks-and-regions spine is exactly **MLIR's** structural
+model, which is field-tested proof that one model cleanly subsumes a CFG (LLVM/SPIR-V),
+structured control (WASM, as block regions), *and* a flat ISA (the degenerate
+one-block, no-SSA case). We take MLIR's spine, not its open-ended generality
+(no region/trait/dialect-registry machinery) — the lean version.
+
+Rejected, for the same reasons the ISA layer rejected its three:
+
+1. **Fold ISAs into the IR API** (ISA = "degenerate IR"). True in theory, but it
+   taxes the fast, flat ISA hot path with type/SSA/module machinery it never
+   needs. Keep them **siblings**; share only the leaf vocabulary in spirit.
+2. **One concrete codec for all IRs.** LLVM's bitstream is not a static table;
+   forcing WASM/SPIR-V and LLVM through one table breaks LLVM. The codec is
+   *pluggable* behind the verbs (§5).
+3. **Bake in SSA** (mandatory results + value-refs). Excludes WASM. `Dataflow`
+   + optional `Result.id == ID_NONE` keeps the stack machine first-class.
+
+---
+
+## 4. The naming contract
+
+Every IR package exposes these names with these signatures — the checklist each
+new IR is built against.
+
+**Re-exported shared types (from `ir`):**
+`Module Function Block Operation Operand Operand_Kind Result Type Type_Ref
+Type_Kind Id Ref Ref_Space Symbol_Table Dataflow Error Error_Code Token
+Token_Kind Print_Options DEFAULT_PRINT_OPTIONS`
+
+**Per-IR concrete types (identical names):**
+`Opcode` (u16, `INVALID = 0`) and `Relocation` / `Relocation_Type`.
+
+**Operand constructors (shared):** `op_int op_float op_type op_ref op_value
+op_block`, plus the IR's own dialect helpers where an opcode needs a structured
+immediate (e.g. a WASM `op_memarg`).
+
+**Operation builders & emitters** — by *shape*, mnemonic passed in (an IR has
+hundreds of opcodes over a handful of shapes, so per-opcode typed builders are
+optional, not the default): `op_none(opcode) op_unary(opcode, a)
+op_binary(opcode, a, b) op_call(callee, args) op_branch(target) …` and `emit_*`.
+
+**Entry points (identical signatures across IRs):**
+
+```odin
+encode(m: Module, code: []u8,
+       relocs: ^[dynamic]Relocation, errors: ^[dynamic]Error) -> (byte_count: u32, ok: bool)
+
+decode(data: []u8, m: ^Module, errors: ^[dynamic]Error,
+       allocator := context.allocator) -> (byte_count: u32, ok: bool)
+
+print/tprint/…(m: Module, options := ir.DEFAULT_PRINT_OPTIONS) -> (Print_Result | string)
+```
+
+Note the *deliberate* differences from the ISA verbs: they take a **`Module`**,
+not `[]Instruction`, and they **drop `label_defs` / `resolve` / `base_address`**
+— an IR has no PC-relative resolution pass, so those parameters would be dead.
+This is the divergence made explicit rather than carried inert. (It is also why
+WASM, currently shaped like an ISA package, will move to `ir/wasm`: its real
+`encode`/`decode` already dropped those parameters.)
+
+> Anything an IR genuinely lacks (WASM has no `VALUE` refs; an untyped IR no
+> `TYPE` refs) is simply **absent**, not stubbed — same rule as the ISA layer.
+
+---
+
+## 5. Codecs — the one place the strategy, not just the data, differs
+
+For an ISA, every codec is the same *kind* of thing (a bit/byte packer driven by
+a static table). For IRs there are **two kinds**, and the API contract is the
+verbs (§4), not the table — so a package picks its strategy underneath:
+
+- **Table-driven (WASM, SPIR-V).** A static `OPCODE → [operand layout]` table,
+  literally the ISA `ENCODING_TABLE` pattern: hand-written single source of
+  truth, O(1) dispatch. WASM's existing `ENCODING_TABLE` and SPIR-V's grammar
+  JSON both fit this.
+- **Bitstream (LLVM, AIR, DXIL).** A generic block/record/abbreviation engine;
+  operand layout is defined by abbreviation records encountered in the stream,
+  so there is no static opcode table. This is a real subsystem (shared by the
+  three LLVM-family members) that the LLVM IR reader sits on top of.
+
+Both satisfy the same `encode`/`decode` signatures; callers never see which.
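+
+For the table-driven strategy, the shape being described can be sketched as
+follows. Everything here is hypothetical (the `Layout` tags, the `Entry` struct,
+and the four-opcode subset); it is not WASM's existing `ENCODING_TABLE`, only an
+illustration of a static opcode-indexed table with O(1) lookup. The opcode bytes
+are the standard WASM encodings.
+
+```odin
+package table_codec_example
+
+// Hypothetical operand-layout tags; the real per-IR schema is not specified here.
+Layout :: enum u8 {
+	NONE,
+	LEB_U32,
+	LEB_S32,
+	MEMARG,
+}
+
+// A four-opcode subset; 0 stays INVALID per the ir/ convention.
+Opcode :: enum u16 {
+	INVALID = 0,
+	LOCAL_GET,
+	I32_LOAD,
+	I32_CONST,
+	I32_ADD,
+}
+
+Entry :: struct {
+	wire:   u8,     // the WASM opcode byte
+	layout: Layout, // how the immediate after the opcode byte is laid out
+}
+
+ENCODING_TABLE := [Opcode]Entry{
+	.INVALID   = {},
+	.LOCAL_GET = {wire = 0x20, layout = .LEB_U32},
+	.I32_LOAD  = {wire = 0x28, layout = .MEMARG},
+	.I32_CONST = {wire = 0x41, layout = .LEB_S32},
+	.I32_ADD   = {wire = 0x6A, layout = .NONE},
+}
+
+main :: proc() {
+	// Dispatch is a single array index by opcode.
+	e := ENCODING_TABLE[.I32_CONST]
+	assert(e.wire == 0x41 && e.layout == .LEB_S32)
+}
+```
+
+A bitstream codec cannot be written this way, because the equivalent of
+`layout` only becomes known while reading the stream.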
+
+---
+
+## 6. One-paragraph summary
+
+Make `ir` own what is the same on every IR — and for IRs that is *more* than for
+ISAs: not just errors/refs/printing but the whole typed `Module → Function →
+Block → Operation` structure, because SSA and a type system regularize it. Keep
+the leaf ISA-shaped (`Operation` = `Instruction` + an optional `Result`, opcode a
+u16), keep the three verbs, and make exactly three concessions where an IR is not
+an ISA: a structured module instead of a flat stream, a first-class type table,
+and id-based entity references instead of PC-relative labels. Let `Dataflow`
+host both the stack machine and SSA, and let the codec be pluggable so the LLVM
+bitstream and the WASM/SPIR-V tables live under one contract. The result is a
+sibling to the ISA API, not a generalization of it: each new IR gets the shared
+structure and vocabulary for free and writes only its opcode set, its codec, and
+its dialect.
--- a/core/rexcode/ir/doc.odin
+++ b/core/rexcode/ir/doc.odin
@@ -0,0 +1,89 @@
+// rexcode  ·  Brendan Punsky (dotbmp@github), original author
+
+/*
+# rexcode/ir — the IR API layer
+
+`core:rexcode/ir` is to the intermediate representations (WASM, SPIR-V, LLVM
+bitcode, and the LLVM dialects AIR / DXIL) what `core:rexcode/isa` is to the
+machine ISAs: the **shared core** every concrete IR package builds on. It holds
+the parts that are the same for every IR, and defines the contract each IR
+package follows. It implements **no specific IR** — the concrete packages
+(`core:rexcode/ir/wasm`, `.../spirv`, `.../llvm`, …) are added separately.
+
+See `docs/ir_design.md` for the full design rationale and the ISA↔IR comparison.
+
+## Why a sibling, not a generalization of `isa`
+
+The ISA API works because every arch follows one *shape contract*
+(`Mnemonic` / `Instruction` / `Operand` / `encode` / `decode` / `print`) while
+the shared `isa` package carries only the universal bookkeeping. The IR API
+keeps that spirit, with three honest concessions where IRs truly diverge:
+
+  1. **A structured module replaces the flat instruction stream.** The unit of
+     work is a `Module` (`Module → []Function → []Block → []Operation`), not a
+     `[]Instruction`. So `ir` owns the *structural model* (module/function/block/
+     operation), where `isa` owns no `Instruction`.
+
+  2. **A first-class type system.** Operations and results reference a
+     module-level type table by `Type_Ref`. ISAs bake width into the mnemonic.
+
+  3. **Entity references replace PC-relative labels.** Operands reference SSA
+     values / blocks / functions / globals / types by `Id`, resolved
+     structurally — there is no instruction-index→byte-offset rewrite. (Object-
+     file *symbol* fixups still produce Relocations, defined per-IR.)
+
+Everything else is deliberately ISA-shaped: the leaf `Operation` is
+`isa.Instruction` + an optional typed `Result`, `opcode` is a u16 just like
+`isa.Mnemonic`, `Operand` is one discriminated value, and the verbs are the same
+three. `Dataflow` lets one model host both an implicit value stack (WASM) and
+explicit SSA (SPIR-V/LLVM) without baking in either.
+
+## What this package provides (shared)
+
+  * `status.odin` — `Error` / `Error_Code` (shape-identical to `isa.Error`).
+  * `refs.odin`   — `Id` / `Ref` / `Ref_Space` / `Symbol_Table` (the label analog).
+  * `types.odin`  — `Type` / `Type_Ref` / `Type_Kind` (the type table).
+  * `module.odin` — `Module` / `Function` / `Block` / `Operation` / `Operand` /
+                    `Result` / `Dataflow` (the structural model).
+  * `print.odin`  — token kinds, print options, number-formatting helpers.
+
+## What a concrete IR package provides (the contract)
+
+Each `core:rexcode/ir/<name>` package supplies, mirroring an arch package:
+
+  * `Opcode` — the IR's operation enum (`u16`, `INVALID = 0`), stored in
+    `Operation.opcode`. (Analogous to a `Mnemonic`.)
+  * A **codec** — the wire format. Two strategies cover the field:
+      - *table-driven* (WASM byte/LEB, SPIR-V 32-bit words): a static
+        `OPCODE → operand-layout` table, exactly like an ISA `ENCODING_TABLE`.
+      - *bitstream* (LLVM bitcode, and thus AIR / DXIL): a block/record/
+        abbreviation engine; the operand layout is data-defined, so there is no
+        static table. The codec is pluggable behind the verbs below.
+  * The three verbs, on a `Module` (vs the ISA verbs' `[]Instruction`):
+
+        encode :: proc(m:      Module,
+                       code:   []u8,
+                       relocs: ^[dynamic]Relocation,
+                       errors: ^[dynamic]Error) -> (byte_count: u32, ok: bool)
+
+        decode :: proc(data:   []u8,
+                       m:      ^Module,
+                       errors: ^[dynamic]Error,
+                       allocator := context.allocator) -> (byte_count: u32, ok: bool)
+
+        print  :: proc(m: Module, options := ir.DEFAULT_PRINT_OPTIONS) -> ir.Print_Result
+        tprint :: proc(m: Module, options := ir.DEFAULT_PRINT_OPTIONS) -> string
+
+    (`encode`/`decode` deliberately *drop* the ISA verbs' `label_defs` /
+    `resolve` / `base_address` — there is no PC-relative resolution pass — and
+    take a `Module` rather than an instruction array. That is the whole point of
+    the divergence, made explicit rather than left inert.)
+
+  * `Relocation` / `Relocation_Type` — per-IR (the linker fixups for `EXTERNAL`
+    references), exactly as each arch owns its `reloc.odin`.
+  * Type lowering — how the IR's wire types map to/from `ir.Type`.
+
+A *dialect* (AIR over LLVM, DXIL over LLVM) reuses its base IR's codec wholesale
+and adds only the intrinsic/metadata conventions and any container wrapper.
+*/
+package rexcode_ir
--- a/core/rexcode/ir/module.odin
+++ b/core/rexcode/ir/module.odin
@@ -0,0 +1,154 @@
+// rexcode  ·  Brendan Punsky (dotbmp@github), original author
+
+package rexcode_ir
+
+// =============================================================================
+// STRUCTURAL MODEL  (the core of the IR API)
+// =============================================================================
+//
+// The central divergence from the ISA API. An ISA program is a flat
+// `[]Instruction`; an IR program is a *typed, structured module* --
+//
+//     Module → []Function → []Block → []Operation
+//
+// where an operation may define an SSA result that later operations reference.
+//
+// Design stance vs the ISA API:
+//
+//   * The leaf is kept ISA-shaped on purpose. `Operation` is `isa.Instruction`
+//     plus an optional typed `Result`; `opcode` is the concrete IR's Opcode
+//     enum stored as a u16, exactly as `isa` stores `Mnemonic` as a u16. So the
+//     opcode-table dispatch, the encode/decode/print verbs, and the relocation
+//     model all carry over.
+//
+//   * `Operand` is *shared* here, where `isa.Operand` is per-arch. Justified:
+//     ISA operands diverge wildly (ModRM/SIB vs shifted-register vs ...), but
+//     SSA collapses IR operands to "a literal, a reference to an entity, or a
+//     type", which is uniform enough to define once. Dialect-specific operand
+//     encodings (WASM memarg, SPIR-V enum masks) ride in `aux` + the IR's own
+//     opcode table -- they are an encoding detail, not a new operand shape.
+//
+//   * Both dataflow styles are first-class. `Dataflow` is a per-IR trait, NOT a
+//     baked-in assumption: a stack IR (WASM) leaves `Result.id == ID_NONE` and
+//     references nothing through VALUE; an SSA IR (SPIR-V/LLVM) names results
+//     and threads them as REF operands. The model excludes neither.
+//
+// What is deliberately NOT here: the wire codec (`encode`/`decode`) and the
+// printer. Those are per-IR -- just as `isa` defines no `encode`, each concrete
+// IR provides its own, against the contract in `doc.odin`. This package is the
+// shared *vocabulary*, not an implementation.
+
+// Per-IR dataflow discipline. WASM = STACK; SPIR-V / LLVM / AIR / DXIL = SSA.
+Dataflow :: enum u8 { STACK, SSA }
+
+// -----------------------------------------------------------------------------
+// Operand  (generalizes isa.Operand)
+// -----------------------------------------------------------------------------
+
+Operand_Kind :: enum u8 {
+	NONE,
+	LIT_INT,     // integer literal (value in `imm`)
+	LIT_FLOAT,   // float literal (IEEE bits in `imm`, width in `aux`)
+	REF,         // reference to an entity: `imm` is the Id, `space` the Ref_Space
+	TYPE,        // a Type_Ref (in the low 32 bits of `imm`)
+	ATTRIBUTE,   // a dialect enum / decoration / mask (value in `imm`, tag in `aux`)
+}
+
+// 16 bytes. The payload is one i64 (covers an Id, a Type_Ref, an int/float-bits
+// literal, or an attribute value); `space`/`aux` discriminate the entity space
+// or dialect tag. Large/aggregate constants are *entities* (a CONSTANT ref), not
+// inline operands -- the SSA way -- so no inline byte blob is needed here.
+Operand :: struct #packed {
+	imm:   i64,
+	kind:  Operand_Kind,
+	space: Ref_Space,    // REF: which id space
+	aux:   u16,          // LIT_FLOAT width / ATTRIBUTE tag / dialect bits
+	flags: u32,
+}
+#assert(size_of(Operand) == 16)
+
+@(require_results) op_int   :: #force_inline proc "contextless" (v: i64)             -> Operand { return Operand{kind = .LIT_INT,   imm = v} }
+@(require_results) op_float :: #force_inline proc "contextless" (bits: u64, w: u16)  -> Operand { return Operand{kind = .LIT_FLOAT, imm = i64(bits), aux = w} }
+@(require_results) op_type  :: #force_inline proc "contextless" (t: Type_Ref)        -> Operand { return Operand{kind = .TYPE, imm = i64(u32(t))} }
+
+@(require_results)
+op_ref :: #force_inline proc "contextless" (space: Ref_Space, id: Id) -> Operand {
+	return Operand{kind = .REF, space = space, imm = i64(u32(id))}
+}
+
+@(require_results) op_value :: #force_inline proc "contextless" (id: Id) -> Operand { return op_ref(.VALUE, id) }
+@(require_results) op_block :: #force_inline proc "contextless" (id: Id) -> Operand { return op_ref(.BLOCK, id) }
+
+// Reconstruct the Id / Type_Ref carried by an operand.
+@(require_results) operand_id   :: #force_inline proc "contextless" (o: Operand) -> Id       { return Id(u32(o.imm)) }
+@(require_results) operand_type :: #force_inline proc "contextless" (o: Operand) -> Type_Ref { return Type_Ref(u32(o.imm)) }
+
+// -----------------------------------------------------------------------------
+// Operation  (the leaf -- parallels isa.Instruction)
+// -----------------------------------------------------------------------------
+
+Operation_Flags :: bit_field u8 {
+	terminator: bool | 1,   // ends a block (br / ret / switch / unreachable)
+	control:    bool | 1,   // structured-control op (block/loop/if/... for stack IRs)
+	memory:     bool | 1,   // touches linear memory / pointers
+	_:          u8   | 5,
+}
+
+// `opcode` is the concrete IR's Opcode enum, stored as u16 (like isa.Mnemonic).
+// `operands` is variable-arity (calls, switch, phi) and caller-owned, like the
+// rest of the decoded module -- the fixed `[4]Operand` of the ISA Instruction is
+// the one shape that does not survive into IRs.
+Operation :: struct {
+	operands: []Operand,
+	result:   Result,       // SSA def; `.id == ID_NONE` for stack/void ops
+	opcode:   u16,
+	flags:    Operation_Flags,
+	_:        u8,
+}
+
+// What an operation produces.
+Result :: struct #packed {
+	id:   Id,         // ID_NONE if the op defines no value
+	type: Type_Ref,
+}
+#assert(size_of(Result) == 8)
+
+// -----------------------------------------------------------------------------
+// Containers  (no ISA parallel -- the structured-module concession)
+// -----------------------------------------------------------------------------
+
+// A basic block (SSA) or a structured region (stack IRs). `params` are block
+// arguments (SSA, phi-free form); empty for stack IRs. The terminator is the
+// final operation (Operation_Flags.terminator).
+Block :: struct {
+	ops:    []Operation,
+	params: []Result,
+	id:     Id,
+}
+
+Function :: struct {
+	blocks:    []Block,
+	name:      string,
+	signature: Type_Ref,   // a FUNCTION type in Module.types
+}
+
+// A module-level mutable/immutable value.
+Global :: struct {
+	name:    string,
+	init:    Id,          // a CONSTANT ref, or ID_NONE
+	type:    Type_Ref,
+	mutable: bool,
+}
+
+// The module -- the unit the IR verbs operate on (where the ISA verbs take a
+// flat `[]Instruction`). Metadata, decorations, and dialect custom sections are
+// carried by the concrete IR alongside this core, the way each arch carries its
+// own reloc.odin.
+Module :: struct {
+	target:    string,        // triple / capability profile / version tag
+	types:     []Type,        // the type table; Type_Ref indexes here
+	globals:   []Global,
+	functions: []Function,
+	symbols:   Symbol_Table,  // externally-visible names
+	dataflow:  Dataflow,
+}
--- a/core/rexcode/ir/print.odin
+++ b/core/rexcode/ir/print.odin
@@ -0,0 +1,130 @@
+// rexcode  ·  Brendan Punsky (dotbmp@github), original author
+
+package rexcode_ir
+
+// =============================================================================
+// PRINTER FRAMEWORK  (shared scaffolding -- parallels isa.print)
+// =============================================================================
+//
+// Same role as isa.print: the universal pieces of textual output (token kinds
+// for highlighting, print options, the result type, number-formatting helpers).
+// A concrete IR's printer (WAT / SPIR-V disasm / LLVM `.ll`) owns the syntax of
+// types, value names, blocks, and the output-sink procedures, and calls these
+// helpers for hex/decimal. Kept independent of isa.print so the two siblings do
+// not couple; the `Token_Kind` set adds the IR-only categories.
+
+import "core:strings"
+import "core:reflect"
+
+Token_Kind :: enum u8 {
+	WHITESPACE,
+	NEWLINE,
+	OFFSET,          // byte/word offset prefix
+	KEYWORD,         // `func` / `block` / `define` / `OpLabel` style keywords
+	OPCODE,          // the operation mnemonic
+	TYPE,            // a type reference / spelling           (IR-only)
+	VALUE_REF,       // a use of an SSA value / local         (IR-only)
+	RESULT,          // a value definition (`%3 =`)           (IR-only)
+	BLOCK_LABEL,     // a basic-block / branch-target label   (IR-only)
+	GLOBAL_REF,      // function / global / symbol reference  (IR-only)
+	IMMEDIATE,       // literal constant
+	ATTRIBUTE,       // dialect attribute / decoration / flag (IR-only)
+	PUNCTUATION,     // `(`, `)`, `,`, `=`, `:`
+	COMMENT,
+}
+
+Token :: struct {
+	offset:          u32,   // byte offset in the output string
+	length:          u16,
+	kind:            Token_Kind,
+	operation_index: u16,   // which operation (0xFFFF for module-level / whitespace)
+}
+
+@(require_results)
+token_kind_to_string :: proc(k: Token_Kind) -> string {
+	if name, ok := reflect.enum_name_from_value(k); ok {
+		return name
+	}
+	return "???"
+}
+
+// -----------------------------------------------------------------------------
+// Print options & result  (same shape as isa, IR-flavoured defaults)
+// -----------------------------------------------------------------------------
+
+Print_Options :: struct {
+	uppercase:     bool,
+	hex_prefix:    string,   // default "0x"
+	hex_lowercase: bool,
+	value_prefix:  string,   // SSA value sigil, default "%"
+	block_prefix:  string,   // block-label sigil, default "^"
+	show_offsets:  bool,
+	indent:        string,   // default "  "
+	separator:     string,   // default "\n"
+}
+
+DEFAULT_PRINT_OPTIONS :: Print_Options{
+	uppercase     = false,
+	hex_prefix    = "0x",
+	hex_lowercase = true,
+	value_prefix  = "%",
+	block_prefix  = "^",
+	show_offsets  = false,
+	indent        = "  ",
+	separator     = "\n",
+}
+
+Print_Result :: struct {
+	text:   string,
+	tokens: []Token,   // nil unless requested
+}
+
+// -----------------------------------------------------------------------------
+// Number formatting helpers (used by every IR printer -- arch/IR-agnostic)
+// -----------------------------------------------------------------------------
+
+print_hex :: proc(sb: ^strings.Builder, value: u64, options: ^Print_Options) {
+	strings.write_string(sb, options.hex_prefix)
+	print_hex_digits(sb, value, options)
+}
+
+print_hex_digits :: proc(sb: ^strings.Builder, value: u64, options: ^Print_Options) {
+	if value == 0 {
+		strings.write_byte(sb, '0')
+		return
+	}
+	buf: [16]u8
+	i := 0
+	v := value
+	for v > 0 {
+		digit := u8(v & 0xF)
+		buf[i] = digit < 10 ? '0' + digit : 'a' + digit - 10
+		v >>= 4
+		i += 1
+	}
+	for j := i - 1; j >= 0; j -= 1 {
+		c := buf[j]
+		if options.uppercase && c >= 'a' && c <= 'f' {
+			c -= 32
+		}
+		strings.write_byte(sb, c)
+	}
+}
+
+print_decimal :: proc(sb: ^strings.Builder, value: u32) {
+	if value == 0 {
+		strings.write_byte(sb, '0')
+		return
+	}
+	buf: [10]u8
+	i := 0
+	v := value
+	for v > 0 {
+		buf[i] = '0' + u8(v % 10)
+		v /= 10
+		i += 1
+	}
+	for j := i - 1; j >= 0; j -= 1 {
+		strings.write_byte(sb, buf[j])
+	}
+}
--- a/core/rexcode/ir/refs.odin
+++ b/core/rexcode/ir/refs.odin
@@ -0,0 +1,99 @@
+// rexcode  ·  Brendan Punsky (dotbmp@github), original author
+
+package rexcode_ir
+
+// =============================================================================
+// REFERENCES  (the IR analog of isa.labels)
+// =============================================================================
+//
+// This is the first place the IR API genuinely diverges from the ISA API.
+//
+// An ISA resolves control flow as *PC-relative labels*: `Label_Definition`
+// maps a label id to an instruction index and `encode()` rewrites it to a byte
+// offset (isa.labels.rewrite_label_defs_to_offsets). That model is wrong for an
+// IR: IR operands reference *entities by id* -- SSA results, blocks, functions,
+// globals, types -- which are stable indices into the module's entity tables,
+// not byte offsets, and resolve *structurally* (no PC-relative pass).
+//
+// So the label machinery is replaced, not re-exported. What survives in spirit:
+//   * a small distinct-u32 id type with an "undefined" sentinel (forward refs),
+//   * a name<->id table for the externally-visible symbols (the Label_Map analog).
+//
+// Object-file *symbol* fixups (a linker patching a function/global index) are
+// still real and still produce Relocations -- but that is a codec concern,
+// defined per-IR (parallel to each arch's reloc.odin), not here.
+
+// A stable id into one of the module's entity spaces (see Ref_Space).
+Id :: distinct u32
+
+ID_NONE :: Id(0xFFFFFFFF)
+
+// Which id space a reference addresses. Drives validation, printer annotation,
+// and (for EXTERNAL) relocation-type selection. This is the union of the spaces
+// the modelled IRs use; a concrete IR uses only the subset it needs: a stack
+// IR (WASM) produces no SSA-result VALUE refs, an untyped IR no TYPE refs.
+Ref_Space :: enum u8 {
+	NONE,
+	VALUE,      // an SSA result (or a local/stack slot)
+	BLOCK,      // a basic block / structured-control label (branch target)
+	FUNCTION,
+	GLOBAL,
+	TYPE,
+	CONSTANT,   // a constant-pool entry
+	MEMORY,     // a linear memory / address space
+	METADATA,   // a metadata/debug node
+	EXTERNAL,   // an imported/exported symbol -- relocatable across object files
+}
+
+// A typed reference: which space, plus the id within it. Carried by REF operands
+// and by branch targets. 8 bytes: cheap to copy, in the spirit of the u32-cheap isa.Label_Definition.
+Ref :: struct #packed {
+	id:    Id,
+	space: Ref_Space,
+	_:     [3]u8,
+}
+#assert(size_of(Ref) == 8)
+
+@(require_results)
+ref :: #force_inline proc "contextless" (space: Ref_Space, id: Id) -> Ref {
+	return Ref{id = id, space = space}
+}
+
+// -----------------------------------------------------------------------------
+// Symbol table  (the IR analog of isa.Label_Map: name <-> id for visible names)
+// -----------------------------------------------------------------------------
+
+Symbol_Table :: struct {
+	names: map[string]Id,
+	space: Ref_Space,   // what these names address (usually FUNCTION/GLOBAL)
+}
+
+symbol_table_init :: #force_inline proc(st: ^Symbol_Table, space := Ref_Space.EXTERNAL, allocator := context.allocator) {
+	st.names = make(map[string]Id, allocator = allocator)
+	st.space = space
+}
+
+symbol_table_destroy :: #force_inline proc(st: ^Symbol_Table) {
+	delete(st.names)
+}
+
+// Bind a name to an id (e.g. when a definition is emitted).
+symbol_define :: #force_inline proc(st: ^Symbol_Table, name: string, id: Id) {
+	st.names[name] = id
+}
+
+// Reserve a name for a forward reference (returns the bound id if one exists, else ID_NONE); resolve later with symbol_define.
+@(require_results)
+symbol_reserve :: #force_inline proc(st: ^Symbol_Table, name: string) -> Id {
+	if existing, ok := st.names[name]; ok {
+		return existing
+	}
+	st.names[name] = ID_NONE
+	return ID_NONE
+}
+
+@(require_results)
+symbol_lookup :: #force_inline proc(st: ^Symbol_Table, name: string) -> (id: Id, ok: bool) {
+	id, ok = st.names[name] // ok is also true for reserved-but-undefined names; their id is ID_NONE
+	return
+}
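A minimal sketch of the forward-reference flow the symbol table supports: reserve a name at the use site, bind it once the definition is seen, then wrap the id in a `Ref`. The import path `core:rexcode/ir` is assumed from the commit message; `resolve_helper`, the name "helper", and the id 3 are illustrative only.

```odin
package ir_refs_example

import "core:fmt"
import ir "core:rexcode/ir" // assumed import path, taken from the commit message

// Hypothetical walk-through of reserve -> define -> lookup -> ref.
resolve_helper :: proc() -> ir.Ref {
	st: ir.Symbol_Table
	ir.symbol_table_init(&st, .FUNCTION)
	defer ir.symbol_table_destroy(&st)

	// A call site mentions "helper" before its definition: reserve the name.
	pending := ir.symbol_reserve(&st, "helper")
	assert(pending == ir.ID_NONE) // not bound yet

	// The definition arrives later and binds the real id.
	ir.symbol_define(&st, "helper", ir.Id(3))

	id, ok := ir.symbol_lookup(&st, "helper")
	assert(ok && id != ir.ID_NONE)

	// Operands carry the id together with the space it indexes.
	return ir.ref(st.space, id)
}

main :: proc() {
	r := resolve_helper()
	fmt.println(r.space, r.id)
}
```

Because `symbol_lookup` reports `ok` for reserved names too, a validator would compare the returned id against `ID_NONE` to detect references that were never defined.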
--- a/core/rexcode/ir/status.odin
+++ b/core/rexcode/ir/status.odin
@@ -0,0 +1,42 @@
+// rexcode  ·  Brendan Punsky (dotbmp@github), original author
+
+package rexcode_ir
+
+// =============================================================================
+// ERROR / RESULT TYPES (shared by every IR codec)
+// =============================================================================
+//
+// Parallels isa.status. The `Error` struct shape is intentionally identical to
+// `isa.Error` (8 bytes: a u32 location + a 1-byte code + 3 pad) so a tool can surface
+// ISA and IR diagnostics through one path. `Error_Code` keeps the encode/decode
+// codes shared with the ISA side, then adds the codes only a *typed, structured*
+// IR can produce. Per-IR codecs emit the subset that applies to them.
+
+Error_Code :: enum u8 {
+	NONE = 0,
+
+	// Shared with the ISA side (encode/decode of the byte/word stream).
+	INVALID_OPCODE,
+	NO_MATCHING_ENCODING,
+	OPERAND_MISMATCH,
+	IMMEDIATE_OUT_OF_RANGE,
+	BUFFER_OVERFLOW,
+	BUFFER_TOO_SHORT,
+
+	// IR-specific (no ISA analog -- these need a type system / SSA / a module).
+	INVALID_TYPE,         // malformed or out-of-range Type_Ref
+	TYPE_MISMATCH,        // an operand/result type disagrees with the op signature
+	UNDEFINED_REF,        // a Ref to an id/symbol that is never defined
+	DUPLICATE_DEFINITION, // an id/symbol defined twice
+	MALFORMED_MODULE,     // structural violation (block without terminator, ...)
+	UNSUPPORTED_FEATURE,  // a capability/extension the codec does not implement
+}
+
+// `location` is the operation index on encode, or the byte offset on decode --
+// mirroring isa.Error.inst_idx.
+Error :: struct #packed {
+	location: u32,
+	code:     Error_Code,
+	_:        [3]u8,
+}
+#assert(size_of(Error) == 8)
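A small sketch of how a tool might turn an `Error` into a message. The `describe` proc and its wording are hypothetical; only `Error`, `Error_Code`, and the field names come from the file above, and the import path `core:rexcode/ir` is assumed from the commit message.

```odin
package ir_status_example

import "core:fmt"
import ir "core:rexcode/ir" // assumed import path, taken from the commit message

// Hypothetical helper: maps a few codes to text; everything else falls through.
describe :: proc(err: ir.Error) -> string {
	#partial switch err.code {
	case .NONE:          return "ok"
	case .UNDEFINED_REF: return "reference to an undefined id or symbol"
	case .TYPE_MISMATCH: return "operand/result type disagrees with the op signature"
	}
	return "other error"
}

main :: proc() {
	// On encode, `location` is an operation index; on decode, a byte offset.
	err := ir.Error{location = 12, code = .UNDEFINED_REF}
	loc := err.location // copy out of the packed struct before formatting
	fmt.printf("at %d: %s\n", loc, describe(err))
}
```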
--- a/core/rexcode/ir/types.odin
+++ b/core/rexcode/ir/types.odin
@@ -0,0 +1,66 @@
+// rexcode  ·  Brendan Punsky (dotbmp@github), original author
+
+package rexcode_ir
+
+// =============================================================================
+// TYPE MODEL
+// =============================================================================
+//
+// The second genuine divergence from the ISA API: IRs have a *first-class type
+// system*. An ISA bakes width into the mnemonic (`ADD` vs `ADDB`); operands are
+// just bit patterns. An IR carries an explicit type table and operations /
+// results reference types by `Type_Ref` (an index into `Module.types`).
+//
+// `Type_Kind` is the common denominator across the modelled IRs:
+//   * WASM:   i32/i64/f32/f64/v128 + funcref/externref  (a *degenerate* table --
+//             a handful of primitives, no user structs).
+//   * SPIR-V: OpTypeInt / Float / Vector / Pointer / Struct / Function / ...
+//   * LLVM:   iN / float / pointer / vector / array / struct / function / opaque.
+//
+// A concrete IR lowers its wire types onto this set on decode and back on
+// encode. Anything a dialect needs beyond the common shape rides in `aux` (e.g.
+// pointer address space) or in the concrete IR's own side tables.
+
+Type_Ref :: distinct u32
+
+TYPE_NONE :: Type_Ref(0xFFFFFFFF)
+
+Type_Kind :: enum u8 {
+	VOID,
+	INT,        // `bits` = width (1/8/16/32/64/...); signedness is op-level in most IRs
+	FLOAT,      // `bits` = width (16/32/64/128)
+	VECTOR,     // `elem` x `count`   (fixed-width SIMD)
+	ARRAY,      // `elem` x `count`
+	POINTER,    // `elem`, address space in `aux`
+	STRUCT,     // members in `fields`
+	FUNCTION,   // `fields` = params ++ [result]; `count` = param count
+	OPAQUE,     // named / forward-declared / abstract handle (images, tokens, ...)
+	REF,        // funcref / externref / typed GC reference (`elem` for typed refs)
+}
+
+// One node in a module's type table. `fields` (struct members / function
+// signature) is caller-owned, like the rest of the decoded module.
+Type :: struct {
+	fields: []Type_Ref,   // STRUCT members, or FUNCTION params ++ result
+	name:   string,       // OPAQUE / named struct
+	elem:   Type_Ref,     // VECTOR / ARRAY / POINTER / typed REF element
+	count:  u32,          // VECTOR / ARRAY length, or FUNCTION param count
+	bits:   u16,          // INT / FLOAT width
+	aux:    u16,          // POINTER address space, packed kind flags, ...
+	kind:   Type_Kind,
+	_:      [3]u8,
+}
+
+@(require_results) type_void  :: #force_inline proc "contextless" ()           -> Type { return Type{kind = .VOID} }
+@(require_results) type_int   :: #force_inline proc "contextless" (bits: u16)   -> Type { return Type{kind = .INT,   bits = bits} }
+@(require_results) type_float :: #force_inline proc "contextless" (bits: u16)   -> Type { return Type{kind = .FLOAT, bits = bits} }
+
+@(require_results)
+type_vector :: #force_inline proc "contextless" (elem: Type_Ref, count: u32) -> Type {
+	return Type{kind = .VECTOR, elem = elem, count = count}
+}
+
+@(require_results)
+type_pointer :: #force_inline proc "contextless" (elem: Type_Ref, address_space: u16 = 0) -> Type {
+	return Type{kind = .POINTER, elem = elem, aux = address_space}
+}
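A minimal sketch of building a type table with the constructors above. A local `[dynamic]ir.Type` stands in for `Module.types` (the `Module` struct lives in module.odin and is not shown here), `add_type` is a hypothetical helper, and the import path `core:rexcode/ir` is assumed from the commit message.

```odin
package ir_types_example

import "core:fmt"
import ir "core:rexcode/ir" // assumed import path, taken from the commit message

// Hypothetical helper: append a type and return its index as a Type_Ref.
add_type :: proc(table: ^[dynamic]ir.Type, t: ir.Type) -> ir.Type_Ref {
	append(table, t)
	return ir.Type_Ref(len(table^) - 1)
}

main :: proc() {
	types: [dynamic]ir.Type // stand-in for Module.types
	defer delete(types)

	i32_ref := add_type(&types, ir.type_int(32))
	f32_ref := add_type(&types, ir.type_float(32))
	v4f32   := add_type(&types, ir.type_vector(f32_ref, 4)) // <4 x f32>
	p_i32   := add_type(&types, ir.type_pointer(i32_ref))   // address space 0

	// (i32, i32) -> i32: fields = params ++ [result], count = param count.
	// `sig` is caller-owned and must outlive the table entry that points at it.
	sig := []ir.Type_Ref{i32_ref, i32_ref, i32_ref}
	fn  := add_type(&types, ir.Type{kind = .FUNCTION, fields = sig, count = 2})

	fmt.println(len(types), v4f32, p_i32, fn)
}
```

There is no constructor for STRUCT or FUNCTION in the section shown, so the sketch fills the `Type` literal directly, following the field comments on `Type_Kind`.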