A sibling to core:rexcode/isa for the intermediate representations (WASM,
SPIR-V, LLVM bitcode + the LLVM dialects AIR/DXIL). Holds the shared
vocabulary every IR package builds on, implements no specific IR.
Design stance (see docs/ir_design.md): keep the ISA layer's spirit, but
where IRs are structurally MORE uniform than ISAs (SSA + a type system
regularize the operand/module shape), the shared core is richer. ir/ owns:
  status.odin   Error/Error_Code (shape-identical to isa.Error)
  refs.odin     Id/Ref/Ref_Space/Symbol_Table (the label analog: structural
                id references, not PC-relative byte offsets)
  types.odin    Type/Type_Ref/Type_Kind (the type table -- no ISA analog)
  module.odin   Module/Function/Block/Operation/Operand/Result/Dataflow
                (the structured model; Operation = isa.Instruction + an
                optional typed Result, opcode a u16 like Mnemonic)
  print.odin    token kinds + options + num-fmt (parallels isa.print)
Three honest concessions vs the ISA API, made explicit not inert: a
structured Module replaces the flat []Instruction; a first-class type
system; id-based entity refs replace labels. The encode/decode verbs take
a Module and drop label_defs/resolve/base_address. Dataflow hosts both the
WASM value stack and SSA; the codec is pluggable (table for WASM/SPIR-V,
bitstream for the LLVM family -- AIR/DXIL are LLVM dialects, not peers).
Package compiles; a hand-built SSA module round-trips through the types.
rexcode — IR API Design
Why the rexcode IR family (wasm, and the planned spirv, llvm, with air/dxil as LLVM dialects) gets its own API layer (core:rexcode/ir) parallel to the ISA layer (core:rexcode/isa): it shares the ISA layer's spirit and as much of its shape as honestly survives, while conceding exactly the three places an IR is not an ISA.
Read cross_arch_design.md first; this document is its sibling and assumes its vocabulary.
The guiding principle
The ISA layer's rule was “share the bookkeeping, specialize the bytes.” The IR layer keeps it and adds one clause:
Share the bookkeeping and the structure, specialize the dialect and the codec.
An ISA only ever shares bookkeeping because its content (registers, operands,
the bit-twiddling) diverges maximally per arch. An IR shares more — the whole
Module → Function → Block → Operation structure and the operand/type model are
genuinely the same problem on every IR — because SSA and a type system regularize
what ISAs leave ad hoc. So ir/ is a richer shared core than isa/: it owns the
structural model, not just labels and errors. What stays per-IR is the opcode
set, the codec (the wire format), and the dialect (intrinsic/metadata
conventions).
0. First: how many IRs are there really?
Fewer than the list suggests. AIR and DXIL are not peers of LLVM; they are LLVM bitcode. AIR is LLVM bitcode + a Metal dialect; DXIL is LLVM ~3.7 bitcode + a DirectX dialect inside a DXContainer. So the field is three codec families, not five:
| family | members | wire format |
|---|---|---|
| WASM | wasm | byte stream + LEB128, one form per opcode |
| SPIR-V | spirv | 32-bit words, uniform (wordCount << 16) \| opcode header |
| LLVM bitstream | llvm, air, dxil | self-describing block/record/abbreviation bitstream |
The implementation cost is therefore 3 codecs + N dialects, and air/dxil
should reuse the llvm codec wholesale. That single fact shapes the package
tree: ir/llvm/, ir/llvm/air/, ir/llvm/dxil/.
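To make the two table-friendly wire formats concrete, here is a minimal Odin sketch of the SPIR-V instruction header and of WASM's unsigned LEB128 integers. The procedure names `spirv_header` and `leb128_u32` are hypothetical and are not part of rexcode; only the bit layouts come from the respective formats.

```odin
package main

import "core:fmt"

// SPIR-V: the first word of every instruction packs the total word count
// (high 16 bits) and the opcode (low 16 bits).
// Hypothetical helper name, not a rexcode API.
spirv_header :: proc(word_count: u16, opcode: u16) -> u32 {
	return (u32(word_count) << 16) | u32(opcode)
}

// WASM: integers are LEB128, 7 payload bits per byte, with the high bit set
// on every byte except the last. Appends the encoding of `value` to `out`.
// Hypothetical helper name, not a rexcode API.
leb128_u32 :: proc(out: ^[dynamic]u8, value: u32) {
	v := value
	for {
		b := u8(v & 0x7f)
		v >>= 7
		if v != 0 {
			append(out, b | 0x80) // more bytes follow
		} else {
			append(out, b) // final byte
			return
		}
	}
}

main :: proc() {
	// SPIR-V OpIAdd is opcode 128 and is 5 words long including this header.
	header := spirv_header(5, 128)
	assert(header == 0x0005_0080)

	buf: [dynamic]u8
	defer delete(buf)
	leb128_u32(&buf, 624485)
	assert(len(buf) == 3 && buf[0] == 0xE5 && buf[1] == 0x8E && buf[2] == 0x26)

	fmt.printf("%x %v\n", header, buf[:])
}
```

Both encoders are fixed functions of their inputs, which is why a static table can describe them. The LLVM bitstream has no equivalent one-liner because its record layout is declared inside the stream itself.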
1. The universal IR shape
Strip away specifics and every IR needs these — the same checklist isa has,
shifted up one level of structure:
| # | Concept | ir type |
|---|---|---|
| 1 | A type = (kind, width/elem/fields) | Type, Type_Ref |
| 2 | An operand = literal \| entity-ref \| type | Operand, Operand_Kind |
| 3 | An operation = opcode + operands + optional result | Operation |
| 4 | An opcode enum | per-IR Opcode (u16, INVALID=0) |
| 5 | References to entities by id (+ named symbols) | Id, Ref, Symbol_Table |
| 6 | Relocations for object-file symbol fixups | per-IR Relocation |
| 7 | encode(Module) -> bytes (+relocs +errors) | per-IR encode() |
| 8 | decode(bytes) -> Module (+errors) | per-IR decode() |
| 9 | print(Module) -> text (+tokens) | per-IR print()/tprint() |
| + | A structured module of functions→blocks→operations | Module/Function/Block |
| + | A dataflow discipline (stack or SSA) | Dataflow |
Items 1–9 are item-for-item the ISA's nine, re-aimed: type generalizes the
ISA's implicit-width; operand keeps the kind tag; operation is Instruction
+ a Result; opcode is Mnemonic; references replace labels. The two + rows
are the genuinely new structure (§3).
2. Where IRs diverge from ISAs
Three real divergences, then a long tail of things that look different but are the same shape.
The three real concessions
- The unit of work is a structured Module, not a flat []Instruction. An ISA program is a byte-addressed instruction stream; an IR program is a typed graph: Module → []Function → []Block → []Operation, where an op may define an SSA value that later ops use. So decode is a structured parse, not a linear scan, and ir owns Module/Function/Block where isa owns no Instruction. Operation.operands is variable-arity ([]Operand); the ISA Instruction's fixed [4]Operand is the one leaf shape that does not survive (calls, switch, phi).
- A first-class type system. Operations and results carry a Type_Ref into the module's type table. ISAs bake width into the mnemonic and never need this. Type_Kind is the WASM ∪ SPIR-V ∪ LLVM denominator (INT/FLOAT/VECTOR/POINTER/STRUCT/FUNCTION/...).
- Entity references replace PC-relative labels. ISA branches resolve as instruction-index → byte-offset (isa.Label_Definition, rewritten by encode). IR operands reference entities by Id (SSA results, blocks, functions, globals, types), resolved structurally, with no PC-relative pass. (Object-file symbol fixups still produce Relocations for EXTERNAL refs.)
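The three concessions can be sketched together in Odin. The type names below follow the document, but every field layout, the `type_ref` field name, the opcode numbers, and the `values_resolve` helper are assumptions made for illustration; the real definitions live in ir/ and may differ.

```odin
package main

import "core:fmt"

// Names mirror the document; all field layouts are assumptions.
Id :: distinct u32
ID_NONE :: Id(0)

Type_Ref  :: distinct u32
Type_Kind :: enum u8 { VOID, INT, FLOAT, POINTER, FUNCTION }
Type      :: struct { kind: Type_Kind, width: u16 }

Operand_Kind :: enum u8 { INT, VALUE, BLOCK, TYPE }
Operand      :: struct { kind: Operand_Kind, bits: u64 } // literal or referenced id

Result :: struct {
	id:       Id, // ID_NONE when the operation defines no value
	type_ref: Type_Ref,
}

Operation :: struct {
	opcode:   u16,
	operands: []Operand, // variable arity, unlike a fixed [4]Operand
	result:   Result,
}

Block    :: struct { id: Id, ops: []Operation }
Function :: struct { id: Id, blocks: []Block }
Module   :: struct { types: []Type, functions: []Function }

// Hypothetical helper: true when every VALUE operand names a result defined
// by an earlier operation. The lookup is by id; no byte offsets are involved.
values_resolve :: proc(f: Function) -> bool {
	defined := make(map[Id]bool)
	defer delete(defined)
	for block in f.blocks {
		for op in block.ops {
			for operand in op.operands {
				if operand.kind == .VALUE && Id(operand.bits) not_in defined {
					return false
				}
			}
			if op.result.id != ID_NONE {
				defined[op.result.id] = true
			}
		}
	}
	return true
}

main :: proc() {
	OP_CONST :: 1 // made-up opcode numbers, not from any real IR
	OP_ADD   :: 2

	// %1 = const 2 ; %2 = add %1, %1
	const_operands := []Operand{{kind = .INT, bits = 2}}
	add_operands   := []Operand{{kind = .VALUE, bits = 1}, {kind = .VALUE, bits = 1}}
	ops := []Operation{
		{opcode = OP_CONST, operands = const_operands, result = {id = 1, type_ref = 0}},
		{opcode = OP_ADD, operands = add_operands, result = {id = 2, type_ref = 0}},
	}
	blocks := []Block{{id = 10, ops = ops}}
	types  := []Type{{kind = .INT, width = 32}}
	m := Module{types = types, functions = []Function{{id = 20, blocks = blocks}}}

	assert(values_resolve(m.functions[0]))
	fmt.println("ops in entry block:", len(m.functions[0].blocks[0].ops))
}
```

The check walks the structure once and consults an id set, which is the sense in which references resolve "structurally": nothing depends on where an operation ends up in the encoded bytes.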
Two axes that sort the IRs
Everything else sorts onto two orthogonal axes. Note the clustering is counterintuitive — the encoding mates and the model mates are different pairs:
| IR | encoding model | dataflow model |
|---|---|---|
| WASM | table (byte/LEB, one form per opcode) | stack (implicit) |
| SPIR-V | table (32-bit words, uniform header) | SSA (result ids, typed) |
| LLVM / AIR / DXIL | bitstream (data-defined abbreviations) | SSA (+ metadata graph) |
- On encoding, WASM and SPIR-V are siblings: a static opcode → operand-layout table, exactly the ISA ENCODING_TABLE shape. LLVM is the outlier: its layout is defined by abbreviation records in the stream, so no static table can describe it.
- On dataflow, SPIR-V/LLVM are siblings (SSA + types); WASM is the outlier: a stack bytecode with no SSA, no named results, minimal types.
So WASM is encoding-kin to SPIR-V but model-kin to nothing, and the one thing
you most want to share (LLVM) breaks the table assumption the others share. The
Dataflow trait and the pluggable codec (§5) exist precisely to absorb these
two splits without forking the API.
Divergence summary
| Component | Verdict | Shared (ir/) | Per-IR |
|---|---|---|---|
| References / Id | ✅ shared | the whole id + symbol model | which Ref_Spaces exist |
| Error / status | ✅ shared | struct shape (= isa.Error) | error-code subset |
| Type model | ✅ shared | Type/Type_Ref/Type_Kind | wire⇄Type lowering |
| Operand model | ✅ shared\* | Operand + kinds (SSA homogenizes it) | dialect aux encodings |
| Structural model | ✅ shared | Module/Function/Block/Operation | (nothing) |
| Printer framework | ◑ split | tokens, options, num-fmt | type/value/block syntax |
| Relocation | ◑ split | struct-shape convention | type enum (per-IR file) |
| Opcode | ✗ per-IR | convention (u16, INVALID=0) | the enum |
| Opcode table / codec | ✗ per-IR | codec strategy (§5) | schema + data (or bitstream) |
| encode/decode driver | ✗ per-IR | verb signature | the whole parse/emit |
\* Operand is shared here where isa.Operand is per-arch. ISA operands diverge wildly (ModRM/SIB vs shifted-register vs split immediates); SSA collapses IR operands to "a literal, a reference, or a type," uniform enough to define once. Dialect-specific encodings (WASM memarg, SPIR-V enum masks) are an encoding detail carried in Operand.aux + the IR's opcode table, not a new shape.
3. The shared core (ir/) and why this much is shared
ir/ depends on nothing (it does not depend on isa/) and owns the parts
that are the same problem on every IR:
- status.odin: Error/Error_Code; the Error struct is byte-identical to isa.Error so one tool surfaces both.
- refs.odin: Id, Ref, Ref_Space, Symbol_Table (the isa.labels analog, re-cast from byte-offsets to structural ids).
- types.odin: Type, Type_Ref, Type_Kind (no ISA analog).
- module.odin: Module/Function/Block/Operation/Operand/Result/Dataflow (the structural model; the heart of the layer).
- print.odin: token kinds (with IR-only TYPE/VALUE_REF/RESULT/BLOCK_LABEL), print options, number-formatting helpers.
Each concrete IR package re-exports these (e.g. wasm.Module,
spirv.Operation) so a consumer sees one namespace, mirroring how arch packages
re-export isa.
The validating precedent and the rejected alternatives
The Operation-with-blocks-and-regions spine is exactly MLIR's structural
model, which is proof from the field that one model cleanly subsumes a CFG (LLVM/SPIR-V),
structured control (WASM, as block regions), and a flat ISA (the degenerate
one-block, no-SSA case). We take MLIR's spine, not its open-ended generality
(no region/trait/dialect-registry machinery) — the lean version.
Rejected, for the same reasons the ISA layer rejected its three:
- Fold ISAs into the IR API (ISA = "degenerate IR"). True in theory, but it taxes the fast, flat ISA hot path with type/SSA/module machinery it never needs. Keep them siblings; share only the leaf vocabulary in spirit.
- One concrete codec for all IRs. LLVM's bitstream is not a static table; forcing WASM/SPIR-V and LLVM through one table breaks LLVM. The codec is pluggable behind the verbs (§5).
- Bake in SSA (mandatory results + value-refs). Excludes WASM. Dataflow + optional Result.id == ID_NONE keeps the stack machine first-class.
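A minimal Odin sketch of how one Operation shape might host both disciplines follows. Only the names Dataflow, Result, and ID_NONE come from the text; the enum members, field layouts, opcode numbers, and the `fits_dataflow` check are assumptions for illustration.

```odin
package main

import "core:fmt"

// Hypothetical layouts; only Dataflow, Result and ID_NONE are named in the text.
Id :: distinct u32
ID_NONE :: Id(0)

Dataflow :: enum u8 { STACK, SSA }

Operand_Kind :: enum u8 { INT, VALUE }
Operand      :: struct { kind: Operand_Kind, bits: u64 }
Result       :: struct { id: Id }
Operation    :: struct { opcode: u16, operands: []Operand, result: Result }

// A stack-discipline body never names values: every result stays ID_NONE
// and no operand is a VALUE reference. An SSA body may use both.
fits_dataflow :: proc(d: Dataflow, ops: []Operation) -> bool {
	if d == .SSA {
		return true
	}
	for op in ops {
		if op.result.id != ID_NONE {
			return false
		}
		for operand in op.operands {
			if operand.kind == .VALUE {
				return false
			}
		}
	}
	return true
}

main :: proc() {
	OP_CONST :: 1 // made-up opcode numbers
	OP_ADD   :: 2

	// Stack form of 2 + 3: operands are implicit on the value stack.
	two   := []Operand{{kind = .INT, bits = 2}}
	three := []Operand{{kind = .INT, bits = 3}}
	stack_ops := []Operation{
		{opcode = OP_CONST, operands = two},
		{opcode = OP_CONST, operands = three},
		{opcode = OP_ADD},
	}

	// SSA form: %1 = const 2 ; %2 = const 3 ; %3 = add %1, %2
	add_operands := []Operand{{kind = .VALUE, bits = 1}, {kind = .VALUE, bits = 2}}
	ssa_ops := []Operation{
		{opcode = OP_CONST, operands = two, result = {id = 1}},
		{opcode = OP_CONST, operands = three, result = {id = 2}},
		{opcode = OP_ADD, operands = add_operands, result = {id = 3}},
	}

	assert(fits_dataflow(.STACK, stack_ops))
	assert(fits_dataflow(.SSA, ssa_ops))
	assert(!fits_dataflow(.STACK, ssa_ops)) // named values are not stack-shaped
	fmt.println("both bodies use the same Operation struct")
}
```

Because the result id is optional rather than mandatory, the stack body needs no placeholder ids and the SSA body needs no second operation type.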
4. The naming contract
Every IR package exposes these names with these signatures — the checklist each new IR is built against.
Re-exported shared types (from ir):
Module Function Block Operation Operand Operand_Kind Result Type Type_Ref Type_Kind Id Ref Ref_Space Symbol_Table Dataflow Error Error_Code Token Token_Kind Print_Options DEFAULT_PRINT_OPTIONS
Per-IR concrete types (identical names):
Opcode (u16, INVALID = 0) and Relocation / Relocation_Type.
Operand constructors (shared): op_int op_float op_type op_ref op_value op_block, plus the IR's own dialect helpers where an opcode needs a structured
immediate (e.g. a WASM op_memarg).
Operation builders & emitters, by shape, with the opcode passed in (an IR has
hundreds of opcodes over a handful of shapes, so per-opcode typed builders are
optional, not the default): op_none(opcode) op_unary(opcode, a) op_binary(opcode, a, b) op_call(callee, args) op_branch(target) … and emit_*.
Entry points (identical signatures across IRs):
encode(m: Module, code: []u8,
relocs: ^[dynamic]Relocation, errors: ^[dynamic]Error) -> (byte_count: u32, ok: bool)
decode(data: []u8, m: ^Module, errors: ^[dynamic]Error,
allocator := context.allocator) -> (byte_count: u32, ok: bool)
print/tprint/…(m: Module, options := ir.DEFAULT_PRINT_OPTIONS) -> (Print_Result | string)
Note the deliberate differences from the ISA verbs: they take a Module,
not []Instruction, and they drop label_defs / resolve / base_address
— an IR has no PC-relative resolution pass, so those parameters would be dead.
This is the divergence made explicit rather than carried inert. (It is also why
WASM, currently shaped like an ISA package, will move to ir/wasm: its real
encode/decode already dropped those parameters.)
Anything an IR genuinely lacks (WASM has no VALUE refs; an untyped IR no TYPE refs) is simply absent, not stubbed; this is the same rule as the ISA layer.
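The shape-based builders named in this section could look roughly like the following Odin sketch. The builder and constructor names come from the contract above; the struct layouts, the three-entry Opcode enum, and the trailing `allocator` parameter are assumptions, not the package's actual signatures.

```odin
package main

import "core:fmt"

// Names follow the contract; layouts and the tiny Opcode enum are assumptions.
Id :: distinct u32

Opcode :: enum u16 {
	INVALID = 0,
	CONST,
	ADD,
}

Operand_Kind :: enum u8 { INT, VALUE }

Operand :: struct {
	kind: Operand_Kind,
	bits: u64,
}

Operation :: struct {
	opcode:   Opcode,
	operands: []Operand,
}

// Operand constructors.
op_int :: proc(v: i64) -> Operand {
	return Operand{kind = .INT, bits = u64(v)}
}

op_value :: proc(id: Id) -> Operand {
	return Operand{kind = .VALUE, bits = u64(id)}
}

// Shape builders: one per arity, with the opcode passed in.
// The allocator parameter is an assumption of this sketch.
op_unary :: proc(opcode: Opcode, a: Operand, allocator := context.allocator) -> Operation {
	operands := make([]Operand, 1, allocator)
	operands[0] = a
	return Operation{opcode = opcode, operands = operands}
}

op_binary :: proc(opcode: Opcode, a, b: Operand, allocator := context.allocator) -> Operation {
	operands := make([]Operand, 2, allocator)
	operands[0] = a
	operands[1] = b
	return Operation{opcode = opcode, operands = operands}
}

main :: proc() {
	c := op_unary(.CONST, op_int(2))
	defer delete(c.operands)
	sum := op_binary(.ADD, op_value(1), op_value(1))
	defer delete(sum.operands)

	assert(len(sum.operands) == 2 && sum.operands[0].kind == .VALUE)
	fmt.println(c.opcode, sum.opcode)
}
```

Two builders cover every unary and binary opcode an IR defines, which is the reason the contract prefers shapes over one typed builder per opcode.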
5. Codecs — the one place the strategy, not just the data, differs
For an ISA, every codec is the same kind of thing (a bit/byte packer driven by a static table). For IRs there are two kinds, and the API contract is the verbs (§4), not the table — so a package picks its strategy underneath:
- Table-driven (WASM, SPIR-V). A static OPCODE → [operand layout] table, literally the ISA ENCODING_TABLE pattern: hand-written single source of truth, O(1) dispatch. WASM's existing ENCODING_TABLE and SPIR-V's grammar JSON both fit this.
- Bitstream (LLVM, AIR, DXIL). A generic block/record/abbreviation engine; operand layout is defined by abbreviation records encountered in the stream, so there is no static opcode table. This is a real subsystem (shared by the three LLVM-family members) that the LLVM IR reader sits on top of.
Both satisfy the same encode/decode signatures; callers never see which.
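As a rough illustration of the table-driven strategy only, the Odin sketch below decodes four real WASM opcodes through a static opcode → layout table. The `Layout` enum, the `Decoded` struct, the table contents, and the simplified `decode` signature are assumptions of this sketch; they are not the package's ENCODING_TABLE or its real verb signature, and the bitstream strategy is not shown.

```odin
package main

import "core:fmt"

// Operand layouts this toy decoder understands (hypothetical names).
Layout :: enum u8 { UNKNOWN, NONE, ULEB, SLEB }

// Static opcode -> layout table; unlisted opcodes stay UNKNOWN (the zero value).
ENCODING_TABLE := [256]Layout{
	0x0B = .NONE, // end
	0x20 = .ULEB, // local.get <localidx>
	0x41 = .SLEB, // i32.const <i32>
	0x6A = .NONE, // i32.add
}

Decoded :: struct { opcode: u8, imm: i64 }

// Reads one LEB128 integer at data[pos]; sign-extends when `signed` is set.
read_leb :: proc(data: []u8, pos: int, signed: bool) -> (value: i64, n: int, ok: bool) {
	result: u64
	shift: uint
	for i := pos; i < len(data); i += 1 {
		if shift >= 64 {
			return 0, 0, false // over-long encoding
		}
		b := data[i]
		result |= u64(b & 0x7f) << shift
		shift += 7
		if (b & 0x80) == 0 {
			if signed && shift < 64 && (b & 0x40) != 0 {
				result |= ~u64(0) << shift // sign-extend
			}
			return i64(result), i - pos + 1, true
		}
	}
	return 0, 0, false // ran off the end of the buffer
}

// One table lookup per opcode decides how the operand bytes are read.
decode :: proc(data: []u8, out: ^[dynamic]Decoded) -> (byte_count: int, ok: bool) {
	pos := 0
	for pos < len(data) {
		opcode := data[pos]
		pos += 1
		imm: i64
		switch ENCODING_TABLE[opcode] {
		case .UNKNOWN:
			return pos - 1, false
		case .NONE:
		case .ULEB, .SLEB:
			value, n, leb_ok := read_leb(data, pos, ENCODING_TABLE[opcode] == .SLEB)
			if !leb_ok { return pos, false }
			imm = value
			pos += n
		}
		append(out, Decoded{opcode, imm})
	}
	return pos, true
}

main :: proc() {
	// local.get 0 ; i32.const -1 ; i32.add ; end
	code := []u8{0x20, 0x00, 0x41, 0x7F, 0x6A, 0x0B}
	ops: [dynamic]Decoded
	defer delete(ops)

	n, ok := decode(code, &ops)
	assert(ok && n == 6 && len(ops) == 4)
	assert(ops[1].opcode == 0x41 && ops[1].imm == -1)
	fmt.println(ops[:])
}
```

A bitstream codec cannot be written this way because the equivalent of ENCODING_TABLE is built while reading the stream, yet a caller of the shared verbs would not need to know which strategy sits underneath.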
6. One-paragraph summary
Make ir own what is the same on every IR — and for IRs that is more than for
ISAs: not just errors/refs/printing but the whole typed Module → Function → Block → Operation structure, because SSA and a type system regularize it. Keep
the leaf ISA-shaped (Operation = Instruction + an optional Result, opcode a
u16), keep the three verbs, and make exactly three concessions where an IR is not
an ISA: a structured module instead of a flat stream, a first-class type table,
and id-based entity references instead of PC-relative labels. Let Dataflow
host both the stack machine and SSA, and let the codec be pluggable so the LLVM
bitstream and the WASM/SPIR-V tables live under one contract. The result is a
sibling to the ISA API, not a generalization of it: each new IR gets the shared
structure and vocabulary for free and writes only its opcode set, its codec, and
its dialect.