rexcode: add core:rexcode/ir — the IR API layer (no concrete IR yet)

A sibling to core:rexcode/isa for the intermediate representations (WASM,
SPIR-V, LLVM bitcode + the LLVM dialects AIR/DXIL). It holds the shared
vocabulary every IR package builds on and implements no specific IR.

Design stance (see docs/ir_design.md): keep the ISA layer's spirit, but
where IRs are structurally MORE uniform than ISAs (SSA + a type system
regularize the operand/module shape), the shared core is richer. ir/ owns:

  status.odin  Error/Error_Code (shape-identical to isa.Error)
  refs.odin    Id/Ref/Ref_Space/Symbol_Table (the label analog: structural
               id references, not PC-relative byte offsets)
  types.odin   Type/Type_Ref/Type_Kind (the type table -- no ISA analog)
  module.odin  Module/Function/Block/Operation/Operand/Result/Dataflow
               (the structured model; Operation = isa.Instruction + an
               optional typed Result, opcode a u16 like Mnemonic)
  print.odin   token kinds + options + num-fmt (parallels isa.print)

Three honest concessions vs the ISA API, made explicit rather than left
inert: a structured Module replaces the flat []Instruction; a first-class type
system; id-based entity refs replace labels. The encode/decode verbs take
a Module and drop label_defs/resolve/base_address. Dataflow hosts both the
WASM value stack and SSA; the codec is pluggable (table for WASM/SPIR-V,
bitstream for the LLVM family -- AIR/DXIL are LLVM dialects, not peers).

Package compiles; a hand-built SSA module round-trips through the types.
Brendan Punsky
2026-06-18 18:59:50 -04:00
committed by Flāvius
parent 95df04fbe1
commit daa5b7cb79
8 changed files with 852 additions and 1 deletion


@@ -214,8 +214,9 @@ rexcode/
ppc_vle/ # Freescale VLE (sibling of ppc)
riscv/ # RISC-V
rsp/ # N64 RSP
+ir/ # shared IR core (parallels isa/; see docs/ir_design.md)
wasm/ # WebAssembly (an IR; destined for ir/wasm once the IR layer settles)
-docs/ # cross-arch design + per-arch design docs
+docs/ # cross-arch design + IR design + per-arch design docs
```
Each ISA is imported as `core:rexcode/isa/<arch>` (e.g. `core:rexcode/isa/x86`); the


@@ -0,0 +1,270 @@
<!-- rexcode · Brendan Punsky (dotbmp@github), original author -->
# rexcode — IR API Design
> Why the rexcode IR family (`wasm`, and the planned `spirv`, `llvm`, with
> `air` / `dxil` as LLVM dialects) gets its own API layer (`core:rexcode/ir`)
> **parallel** to the ISA layer (`core:rexcode/isa`) — sharing the ISA layer's
> spirit and as much of its shape as honestly survives, while conceding exactly
> the three places an IR is not an ISA.
Read [cross_arch_design.md](cross_arch_design.md) first; this document is its
sibling and assumes its vocabulary.
---
## The guiding principle
The ISA layer's rule was *“share the bookkeeping, specialize the bytes.”* The IR
layer keeps it and adds one clause:
> **Share the bookkeeping *and the structure*, specialize the dialect and the codec.**
An ISA only ever shares bookkeeping because its *content* (registers, operands,
the bit-twiddling) diverges maximally per arch. An IR shares **more** — the whole
`Module → Function → Block → Operation` structure and the operand/type model are
genuinely the same problem on every IR — because SSA and a type system regularize
what ISAs leave ad hoc. So `ir/` is a richer shared core than `isa/`: it owns the
*structural model*, not just labels and errors. What stays per-IR is the *opcode
set*, the *codec* (the wire format), and the *dialect* (intrinsic/metadata
conventions).
---
## 0. First: how many IRs are there really?
Fewer than the list suggests. **AIR and DXIL are not peers of LLVM — they are
LLVM bitcode.** AIR is LLVM bitcode plus a Metal dialect; DXIL is LLVM ~3.7 bitcode
plus a DirectX dialect inside a DXContainer. So the field is **three codec
families**, not five:
| family | members | wire format |
|---|---|---|
| WASM | wasm | byte stream + LEB128, one form per opcode |
| SPIR-V | spirv | 32-bit words, uniform `wordCount<<16 \| opcode` header |
| LLVM bitstream | llvm, **air**, **dxil** | self-describing block/record/abbreviation bitstream |
The implementation cost is therefore *3 codecs + N dialects*, and `air`/`dxil`
should reuse the `llvm` codec wholesale. That single fact shapes the package
tree: `ir/llvm/`, `ir/llvm/air/`, `ir/llvm/dxil/`.
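
Why can AIR and DXIL reuse the `llvm` codec wholesale? Everything in LLVM bitcode, including the abbreviation records, is read with one primitive: an LSB-first bit stream split into fixed-width fields and variable-width (VBR) fields. The sketch below shows that primitive in Odin. `Bit_Reader`, `read_fixed`, and `read_vbr` are hypothetical names, not rexcode API. Under these rules the single byte `0x3B` reads back as the VBR-4 value 27: chunk `1011` carries 3 and sets the continuation bit, and chunk `0011` adds 3 << 3.

```odin
package main

// LSB-first bit reader: the primitive under LLVM's bitstream container, and so
// under AIR and DXIL too. Sketch only: names are hypothetical and there is no
// bounds checking beyond Odin's own slice checks.
Bit_Reader :: struct {
	data: []u8,
	pos:  uint, // absolute bit position in `data`
}

// Read a fixed-width field, least-significant bit first.
read_fixed :: proc(r: ^Bit_Reader, width: uint) -> u64 {
	value: u64
	for i in 0..<width {
		b := r.data[r.pos >> 3]
		bit := u64((b >> (r.pos & 7)) & 1)
		value |= bit << i
		r.pos += 1
	}
	return value
}

// Read a VBR-n field: n-bit chunks whose low n-1 bits are payload and whose
// top bit means "another chunk follows"; chunks arrive least-significant first.
read_vbr :: proc(r: ^Bit_Reader, n: uint) -> u64 {
	hi := u64(1) << (n - 1) // continuation bit of one chunk
	value: u64
	shift: uint
	for {
		chunk := read_fixed(r, n)
		value |= (chunk & (hi - 1)) << shift
		if (chunk & hi) == 0 {
			break
		}
		shift += n - 1
	}
	return value
}
```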
---
## 1. The universal IR shape
Strip away specifics and every IR needs these — the same checklist `isa` has,
shifted up one level of structure:
| # | Concept | `ir` type |
|---|---|---|
| 1 | A **type** = (kind, width/elem/fields) | `Type`, `Type_Ref` |
| 2 | An **operand** = literal \| entity-ref \| type | `Operand`, `Operand_Kind` |
| 3 | An **operation** = opcode + operands + *optional result* | `Operation` |
| 4 | An **opcode** enum | per-IR `Opcode` (u16, INVALID=0) |
| 5 | **References** to entities by id (+ named symbols) | `Id`, `Ref`, `Symbol_Table` |
| 6 | **Relocations** for object-file symbol fixups | per-IR `Relocation` |
| 7 | `encode(Module) -> bytes (+relocs +errors)` | per-IR `encode()` |
| 8 | `decode(bytes) -> Module (+errors)` | per-IR `decode()` |
| 9 | `print(Module) -> text (+tokens)` | per-IR `print()`/`tprint()` |
| + | A **structured module** of functions→blocks→operations | `Module`/`Function`/`Block` |
| + | A **dataflow discipline** (stack or SSA) | `Dataflow` |
Items 1–9 are item-for-item the ISA's nine, re-aimed: *type* generalizes the
ISA's implicit-width; *operand* keeps the kind tag; *operation* is `Instruction`
+ a `Result`; *opcode* is `Mnemonic`; *references* replace *labels*. The two `+`
rows are the genuinely new structure (§3).
---
## 2. Where IRs diverge from ISAs
Three real divergences, then a long tail of things that *look* different but are
the same shape.
### The three real concessions
1. **The unit of work is a structured `Module`, not a flat `[]Instruction`.**
An ISA program is a byte-addressed instruction stream; an IR program is a
typed graph: `Module → []Function → []Block → []Operation`, where an op may
define an SSA value that later ops use. So `decode` is a *structured parse*,
not a linear scan, and `ir` owns `Module`/`Function`/`Block` where `isa` owns
no `Instruction`. `Operation.operands` is **variable-arity** (`[]Operand`) —
the ISA `Instruction`'s fixed `[4]Operand` is the one leaf shape that does not
survive (calls, `switch`, `phi`).
2. **A first-class type system.** Operations and results carry a `Type_Ref` into
the module's type table. ISAs bake width into the mnemonic and never need
this. `Type_Kind` is the common denominator of WASM, SPIR-V, and LLVM
(`INT/FLOAT/VECTOR/POINTER/STRUCT/FUNCTION/...`).
3. **Entity references replace PC-relative labels.** ISA branches resolve as
instruction-index→byte-offset (`isa.Label_Definition`, rewritten by `encode`).
IR operands reference entities by **`Id`** — SSA results, blocks, functions,
globals, types — resolved *structurally*, with no PC-relative pass. (Object-
file *symbol* fixups still produce `Relocation`s for `EXTERNAL` refs.)
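
Taken together, the three concessions put every operation inside a typed structure and have it name its inputs by `Id` instead of by position in a byte stream. Below is a minimal hand-built function, as a sketch. The types are simplified stand-ins for the `ir/` ones: the real `Type` table lives in `types.odin`, and the real `Operand` also carries `aux`/`flags`. The opcode values, the type-table slots, and the convention that the entry block's params are the function's arguments are assumptions made for illustration.

```odin
package main

// Simplified stand-ins for the ir/ types (not the real definitions).
Id       :: distinct u32
ID_NONE  :: Id(0xFFFFFFFF)
Type_Ref :: distinct u32

Ref_Space    :: enum u8 { NONE, VALUE, BLOCK, FUNCTION }
Operand_Kind :: enum u8 { NONE, LIT_INT, REF, TYPE }

Operand   :: struct { imm: i64, kind: Operand_Kind, space: Ref_Space }
Result    :: struct { id: Id, type: Type_Ref }
Operation :: struct { operands: []Operand, result: Result, opcode: u16 }
Block     :: struct { ops: []Operation, params: []Result, id: Id }
Function  :: struct { blocks: []Block, name: string, signature: Type_Ref }

op_value :: proc(id: Id) -> Operand {
	return Operand{kind = .REF, space = .VALUE, imm = i64(u32(id))}
}

// Hypothetical opcodes and type-table slots, for this sketch only.
OP_ADD  :: u16(1)
OP_RET  :: u16(2)
T_I32   :: Type_Ref(0) // assume types[0] = i32
T_ADDFN :: Type_Ref(1) // assume types[1] = FUNCTION (i32, i32) -> i32

// add(%0: i32, %1: i32) -> i32 { %2 = add %0, %1; ret %2 }
// Slices are heap-allocated because the module owns them after return.
build_add :: proc() -> Function {
	params := make([]Result, 2)
	params[0] = Result{id = 0, type = T_I32}
	params[1] = Result{id = 1, type = T_I32}

	add_args := make([]Operand, 2)
	add_args[0], add_args[1] = op_value(0), op_value(1)
	ret_args := make([]Operand, 1)
	ret_args[0] = op_value(2)

	ops := make([]Operation, 2)
	ops[0] = Operation{opcode = OP_ADD, operands = add_args, result = Result{id = 2, type = T_I32}}
	ops[1] = Operation{opcode = OP_RET, operands = ret_args, result = Result{id = ID_NONE}}

	blocks := make([]Block, 1)
	blocks[0] = Block{ops = ops, params = params, id = 0}
	return Function{blocks = blocks, name = "add", signature = T_ADDFN}
}
```

Note that `ret` leaves `result.id == ID_NONE`. It defines nothing, so the same `Operation` shape covers both value-producing and void ops.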
### Two axes that sort the IRs
Everything else sorts onto two orthogonal axes. Note the clustering is
counterintuitive — the encoding mates and the model mates are *different* pairs:
| IR | encoding model | dataflow model |
|---|---|---|
| WASM | **table** (byte/LEB, one form per opcode) | **stack** (implicit) |
| SPIR-V | **table** (32-bit words, uniform header) | **SSA** (result ids, typed) |
| LLVM / AIR / DXIL | **bitstream** (data-defined abbreviations) | **SSA** (+ metadata graph) |
- On **encoding**, WASM and SPIR-V are siblings — a static `opcode → operand-
layout` table, *exactly* the ISA `ENCODING_TABLE` shape. LLVM is the outlier:
its layout is defined by abbreviation records *in the stream*, so **no static
table can describe it**.
- On **dataflow**, SPIR-V/LLVM are siblings (SSA + types); **WASM is the
outlier** — a stack bytecode with no SSA, no named results, minimal types.
So WASM is encoding-kin to SPIR-V but model-kin to nothing, and the one thing
you most want to share (LLVM) breaks the table assumption the others share. The
`Dataflow` trait and the *pluggable codec* (§5) exist precisely to absorb these
two splits without forking the API.
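
The dataflow split is easiest to see on one expression. The sketch below writes `(a + b) * c` twice with the same `Operation` shape:

- as a stack body (WASM style): `add`/`mul` carry no operands, and no op names its result;
- as an SSA body: operands are value `Id`s, and each op defines a fresh one.

Both tiny evaluators are hypothetical and exist only to show that the two bodies compute the same thing. The opcodes are a made-up subset, and `local.get`'s local index is carried as a plain integer immediate.

```odin
package main

Id      :: distinct u32
ID_NONE :: Id(0xFFFFFFFF)

Opcode    :: enum u16 { INVALID, LOCAL_GET, ADD, MUL } // hypothetical subset
Operand   :: struct { imm: i64 }                        // simplified stand-in
Operation :: struct { opcode: Opcode, operands: []Operand, result: Id }

// STACK discipline: values flow implicitly through an operand stack.
eval_stack :: proc(body: []Operation, locals: []i64) -> i64 {
	stack: [dynamic]i64
	defer delete(stack)
	for op in body {
		switch op.opcode {
		case .LOCAL_GET:
			append(&stack, locals[op.operands[0].imm])
		case .ADD, .MUL:
			rhs := pop(&stack)
			lhs := pop(&stack)
			append(&stack, op.opcode == .ADD ? lhs + rhs : lhs * rhs)
		case .INVALID:
			panic("invalid opcode")
		}
	}
	return pop(&stack)
}

// SSA discipline: params are values 0..n-1; every op reads values by Id and
// defines `op.result`. This sketch assumes only ADD/MUL appear in the body.
eval_ssa :: proc(body: []Operation, params: []i64) -> i64 {
	values := make(map[Id]i64)
	defer delete(values)
	for p, i in params { values[Id(i)] = p }
	last := ID_NONE
	for op in body {
		lhs := values[Id(op.operands[0].imm)]
		rhs := values[Id(op.operands[1].imm)]
		values[op.result] = op.opcode == .ADD ? lhs + rhs : lhs * rhs
		last = op.result
	}
	return values[last]
}
```

With `a, b, c = 2, 3, 4`, both calls return 20:

- `eval_stack` on `local.get 0; local.get 1; add; local.get 2; mul`;
- `eval_ssa` on `%3 = add %0, %1; %4 = mul %3, %2`.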
### Divergence summary
| Component | Verdict | Shared (`ir/`) | Per-IR |
|---|---|---|---|
| References / `Id` | ✅ shared | the whole id + symbol model | which `Ref_Space`s exist |
| Error / status | ✅ shared | struct shape (= `isa.Error`) | error-code subset |
| Type model | ✅ shared | `Type`/`Type_Ref`/`Type_Kind` | wire⇄`Type` lowering |
| Operand model | ✅ shared* | `Operand` + kinds (SSA homogenizes it) | dialect `aux` encodings |
| Structural model | ✅ shared | `Module`/`Function`/`Block`/`Operation` | — |
| Printer framework | ◑ split | tokens, options, num-fmt | type/value/block syntax |
| Relocation | ◑ split | struct-shape convention | type enum (per-IR file) |
| `Opcode` | ✗ per-IR | convention (u16, INVALID=0) | the enum |
| Opcode table / codec | ✗ per-IR | codec *strategy* (§5) | schema + data (or bitstream) |
| `encode`/`decode` driver | ✗ per-IR | verb signature | the whole parse/emit |
> *`Operand` is shared here where `isa.Operand` is per-arch. ISA operands diverge
> wildly (ModRM/SIB vs shifted-register vs split immediates); SSA collapses IR
> operands to "a literal, a reference, or a type," uniform enough to define once.
> Dialect-specific encodings (WASM memarg, SPIR-V enum masks) are an *encoding*
> detail carried in `Operand.aux` + the IR's opcode table — not a new shape.
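
To test that claim, a WASM memory access's `memarg` (an alignment exponent plus a byte offset) can be shown to fit the existing 16-byte `Operand` without a new shape. In the sketch below, `op_float` matches `module.odin`. `op_memarg` and `memarg_fields` are hypothetical: the doc names an `op_memarg` but not its signature. Tagging the memarg `LIT_INT` rather than `ATTRIBUTE` is also an assumption. For reference, the f32 value 1.5 has IEEE bits `0x3FC00000`.

```odin
package main

Operand_Kind :: enum u8 { NONE, LIT_INT, LIT_FLOAT, REF, TYPE, ATTRIBUTE }
Ref_Space    :: enum u8 { NONE, VALUE, BLOCK } // subset, for the sketch

// Same layout as module.odin's Operand.
Operand :: struct #packed {
	imm:   i64,
	kind:  Operand_Kind,
	space: Ref_Space,
	aux:   u16,
	flags: u32,
}
#assert(size_of(Operand) == 16)

op_float :: proc "contextless" (bits: u64, w: u16) -> Operand {
	return Operand{kind = .LIT_FLOAT, imm = i64(bits), aux = w}
}

// Hypothetical WASM dialect helper: memarg = (log2 alignment, byte offset).
// The offset rides in `imm` and the alignment in `aux`; the kind tag is assumed.
op_memarg :: proc "contextless" (align_log2: u16, offset: u64) -> Operand {
	return Operand{kind = .LIT_INT, imm = i64(offset), aux = align_log2}
}

memarg_fields :: proc "contextless" (o: Operand) -> (align_log2: u16, offset: u64) {
	return o.aux, u64(o.imm)
}
```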
---
## 3. The shared core (`ir/`) and why this much is shared
`ir/` depends on nothing (it does **not** depend on `isa/`) and owns the parts
that are the same problem on every IR:
- `status.odin` — `Error`/`Error_Code`; the `Error` struct is byte-identical to
`isa.Error` so one tool surfaces both.
- `refs.odin` — `Id`, `Ref`, `Ref_Space`, `Symbol_Table` (the `isa.labels`
analog, re-cast from byte-offsets to structural ids).
- `types.odin` — `Type`, `Type_Ref`, `Type_Kind` (no ISA analog).
- `module.odin` — `Module`/`Function`/`Block`/`Operation`/`Operand`/`Result`/
`Dataflow` (the structural model; the heart of the layer).
- `print.odin` — token kinds (with IR-only `TYPE`/`VALUE_REF`/`RESULT`/
`BLOCK_LABEL`), print options, number-formatting helpers.
Each concrete IR package **re-exports** these (e.g. `wasm.Module`,
`spirv.Operation`) so a consumer sees one namespace, mirroring how arch packages
re-export `isa`.
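
`Symbol_Table` also covers forward references: a name used before its definition is reserved as `ID_NONE` and bound later. The sketch mirrors `symbol_reserve`/`symbol_define` from `refs.odin`, minus the `Ref_Space` tag. `symbol_unresolved` is a hypothetical addition showing how a codec could report names that were used but never defined.

```odin
package main

Id      :: distinct u32
ID_NONE :: Id(0xFFFFFFFF)

// Mirrors refs.odin's Symbol_Table, without the `space` field.
Symbol_Table :: struct { names: map[string]Id }

// A forward use: return the bound id if known, else reserve the name as ID_NONE.
symbol_reserve :: proc(st: ^Symbol_Table, name: string) -> Id {
	if existing, ok := st.names[name]; ok {
		return existing
	}
	st.names[name] = ID_NONE
	return ID_NONE
}

// A definition: bind (or rebind) the name to its real id.
symbol_define :: proc(st: ^Symbol_Table, name: string, id: Id) {
	st.names[name] = id
}

// Hypothetical: collect names that are still reserved but never defined.
symbol_unresolved :: proc(st: ^Symbol_Table, out: ^[dynamic]string) {
	for name, id in st.names {
		if id == ID_NONE {
			append(out, name)
		}
	}
}
```

Reserving an already-defined name returns its real `Id`, so repeated forward uses are idempotent.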
### The validating precedent and the rejected alternatives
The `Operation`-with-blocks-and-regions spine is exactly **MLIR's** structural
model, which is field-proof that one model cleanly subsumes a CFG (LLVM/SPIR-V),
structured control (WASM, as block regions), *and* a flat ISA (the degenerate
one-block, no-SSA case). We take MLIR's spine, not its open-ended generality
(no region/trait/dialect-registry machinery) — the lean version.
Rejected, for the same reasons the ISA layer rejected its three:
1. **Fold ISAs into the IR API** (ISA = "degenerate IR"). True in theory, but it
taxes the fast, flat ISA hot path with type/SSA/module machinery it never
needs. Keep them **siblings**; share only the leaf vocabulary in spirit.
2. **One concrete codec for all IRs.** LLVM's bitstream is not a static table;
forcing WASM/SPIR-V and LLVM through one table breaks LLVM. The codec is
*pluggable* behind the verbs (§5).
3. **Bake in SSA** (mandatory results + value-refs). Excludes WASM. `Dataflow`
+ optional `Result.id == ID_NONE` keeps the stack machine first-class.
---
## 4. The naming contract
Every IR package exposes these names with these signatures — the checklist each
new IR is built against.
**Re-exported shared types (from `ir`):**
`Module Function Block Global Operation Operand Operand_Kind Result Type Type_Ref
Type_Kind Id Ref Ref_Space Symbol_Table Dataflow Error Error_Code Token
Token_Kind Print_Options Print_Result DEFAULT_PRINT_OPTIONS`
**Per-IR concrete types (identical names):**
`Opcode` (u16, `INVALID = 0`) and `Relocation` / `Relocation_Type`.
**Operand constructors (shared):** `op_int op_float op_type op_ref op_value
op_block`, plus the IR's own dialect helpers where an opcode needs a structured
immediate (e.g. a WASM `op_memarg`).
**Operation builders & emitters**: by *shape*, with the opcode passed in (an IR has
hundreds of opcodes over a handful of shapes, so per-opcode typed builders are
optional, not the default): `op_none(opcode) op_unary(opcode, a)
op_binary(opcode, a, b) op_call(callee, args) op_branch(target) …` and `emit_*`.
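
Shape builders keep the API small: one builder per operand shape, with the opcode passed in. Below is a minimal sketch of `op_none`, `op_binary`, and an emitter, written against simplified stand-in types. Several pieces are assumptions, not the contract's final form:

- the signatures, including the allocator parameter (needed because `Operation.operands` is a caller-owned slice);
- `Builder`;
- `emit_value`.

```odin
package main

Id       :: distinct u32
ID_NONE  :: Id(0xFFFFFFFF)
Type_Ref :: distinct u32

Operand   :: struct { imm: i64 } // simplified stand-in
Result    :: struct { id: Id, type: Type_Ref }
Operation :: struct { operands: []Operand, result: Result, opcode: u16 }

// Shape: no operands.
op_none :: proc(opcode: u16) -> Operation {
	return Operation{opcode = opcode, result = Result{id = ID_NONE}}
}

// Shape: two operands. Operand storage is allocated for the caller to own.
op_binary :: proc(opcode: u16, a, b: Operand, allocator := context.allocator) -> Operation {
	operands := make([]Operand, 2, allocator)
	operands[0], operands[1] = a, b
	return Operation{opcode = opcode, operands = operands, result = Result{id = ID_NONE}}
}

// Hypothetical emitter: append to a block under construction and give the
// op the next SSA id, returning that id so later ops can reference it.
Builder :: struct { ops: [dynamic]Operation, next_id: Id }

emit_value :: proc(b: ^Builder, op: Operation, t: Type_Ref) -> Id {
	o := op
	o.result = Result{id = b.next_id, type = t}
	b.next_id += 1
	append(&b.ops, o)
	return o.result.id
}
```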
**Entry points (identical signatures across IRs):**
```odin
encode :: proc(m: Module, code: []u8,
               relocs: ^[dynamic]Relocation, errors: ^[dynamic]Error) -> (byte_count: u32, ok: bool)
decode :: proc(data: []u8, m: ^Module, errors: ^[dynamic]Error,
               allocator := context.allocator) -> (byte_count: u32, ok: bool)
print  :: proc(m: Module, options := ir.DEFAULT_PRINT_OPTIONS) -> Print_Result
tprint :: proc(m: Module, options := ir.DEFAULT_PRINT_OPTIONS) -> string
// … and the other print variants, with the same (m, options) parameters
```
Note the *deliberate* differences from the ISA verbs: they take a **`Module`**,
not `[]Instruction`, and they **drop `label_defs` / `resolve` / `base_address`**
— an IR has no PC-relative resolution pass, so those parameters would be dead.
This is the divergence made explicit rather than carried inert. (It is also why
WASM, currently shaped like an ISA package, will move to `ir/wasm`: its real
`encode`/`decode` already dropped those parameters.)
> Anything an IR genuinely lacks (WASM has no `VALUE` refs; an untyped IR no
> `TYPE` refs) is simply **absent**, not stubbed — same rule as the ISA layer.
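
Per-IR printers build on the shared `Print_Options`, whose `value_prefix` defaults to `%`. The sketch below prints one SSA operation in that style. `print_op` is hypothetical, and its `Print_Options` is trimmed to the one field it uses. For result 2 and arguments 0 and 1 it writes `%2 = add %0, %1`; a stack op whose result is `ID_NONE` prints as just its name.

```odin
package main

import "core:strings"

Id      :: distinct u32
ID_NONE :: Id(0xFFFFFFFF)

// Trimmed stand-in for ir.Print_Options (the full struct is in print.odin).
Print_Options :: struct {
	value_prefix: string, // SSA value sigil, "%" by default
}

// Hypothetical helper: "<prefix><result> = <name> <prefix><arg>, ...".
// `name` is passed in because opcode spelling is per-IR.
print_op :: proc(sb: ^strings.Builder, name: string, result: Id, args: []Id, options: Print_Options) {
	if result != ID_NONE {
		strings.write_string(sb, options.value_prefix)
		strings.write_uint(sb, uint(result))
		strings.write_string(sb, " = ")
	}
	strings.write_string(sb, name)
	for arg, i in args {
		strings.write_string(sb, i == 0 ? " " : ", ")
		strings.write_string(sb, options.value_prefix)
		strings.write_uint(sb, uint(arg))
	}
}
```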
---
## 5. Codecs — the one place the strategy, not just the data, differs
For an ISA, every codec is the same *kind* of thing (a bit/byte packer driven by
a static table). For IRs there are **two kinds**, and the API contract is the
verbs (§4), not the table — so a package picks its strategy underneath:
- **Table-driven (WASM, SPIR-V).** A static `OPCODE → [operand layout]` table,
literally the ISA `ENCODING_TABLE` pattern: hand-written single source of
truth, O(1) dispatch. WASM's existing `ENCODING_TABLE` and SPIR-V's grammar
JSON both fit this.
- **Bitstream (LLVM, AIR, DXIL).** A generic block/record/abbreviation engine;
operand layout is defined by abbreviation records encountered in the stream,
so there is no static opcode table. This is a real subsystem (shared by the
three LLVM-family members) that the LLVM IR reader sits on top of.
Both satisfy the same `encode`/`decode` signatures; callers never see which.
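
On the table-driven side, encoding reduces to one table lookup per operation plus a writer for that opcode's immediate layout. The sketch below covers a four-opcode WASM subset. The opcode bytes (`0x20` local.get, `0x41` i32.const, `0x6A` i32.add, `0x0B` end) and the LEB128 rules are WASM's. The enum, the table shape, and `encode_body` are assumptions, not the wasm package's actual `ENCODING_TABLE`. With it, `local.get 0; i32.const -1; i32.add; end` encodes to `20 00 41 7F 6A 0B`. `i32.const 64` needs two immediate bytes (`C0 00`), because a single byte's bit 6 would be read as the sign.

```odin
package main

// Hypothetical WASM opcode subset (u16 with INVALID = 0, per the naming contract).
Opcode :: enum u16 { INVALID, LOCAL_GET, I32_CONST, I32_ADD, END }

// Immediate layout per opcode: the static-table shape shared with ISA packages.
Imm_Layout :: enum u8 { NONE, ULEB, SLEB }
Encoding   :: struct { code: u8, imm: Imm_Layout }

ENCODING_TABLE := [Opcode]Encoding{
	.INVALID   = {0x00, .NONE},
	.LOCAL_GET = {0x20, .ULEB}, // local.get <localidx: u32>
	.I32_CONST = {0x41, .SLEB}, // i32.const <value: s32>
	.I32_ADD   = {0x6A, .NONE}, // i32.add
	.END       = {0x0B, .NONE}, // end
}

// Stand-in for ir.Operation: at most one integer immediate.
Operation :: struct { opcode: Opcode, imm: i64 }

// Unsigned LEB128: 7 payload bits per byte, high bit set on all but the last.
uleb128 :: proc(out: ^[dynamic]u8, value: u64) {
	v := value
	for {
		b := u8(v & 0x7F)
		v >>= 7
		if v == 0 {
			append(out, b)
			break
		}
		append(out, b | 0x80)
	}
}

// Signed LEB128: stop once the remaining bits are sign extension of bit 6.
sleb128 :: proc(out: ^[dynamic]u8, value: i64) {
	v := value
	for {
		b := u8(v & 0x7F)
		v >>= 7 // arithmetic shift on i64 keeps the sign
		done := (v == 0 && (b & 0x40) == 0) || (v == -1 && (b & 0x40) != 0)
		if done {
			append(out, b)
			break
		}
		append(out, b | 0x80)
	}
}

// Table-driven encode: one lookup per op, then the layout's immediate writer.
encode_body :: proc(body: []Operation, out: ^[dynamic]u8) -> bool {
	for op in body {
		if op.opcode == .INVALID {
			return false
		}
		enc := ENCODING_TABLE[op.opcode]
		append(out, enc.code)
		switch enc.imm {
		case .NONE:
		case .ULEB:
			uleb128(out, u64(op.imm))
		case .SLEB:
			sleb128(out, op.imm)
		}
	}
	return true
}
```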
---
## 6. One-paragraph summary
Make `ir` own what is the same on every IR — and for IRs that is *more* than for
ISAs: not just errors/refs/printing but the whole typed `Module → Function →
Block → Operation` structure, because SSA and a type system regularize it. Keep
the leaf ISA-shaped (`Operation` = `Instruction` + an optional `Result`, opcode a
u16), keep the three verbs, and make exactly three concessions where an IR is not
an ISA: a structured module instead of a flat stream, a first-class type table,
and id-based entity references instead of PC-relative labels. Let `Dataflow`
host both the stack machine and SSA, and let the codec be pluggable so the LLVM
bitstream and the WASM/SPIR-V tables live under one contract. The result is a
sibling to the ISA API, not a generalization of it: each new IR gets the shared
structure and vocabulary for free and writes only its opcode set, its codec, and
its dialect.

core/rexcode/ir/doc.odin

@@ -0,0 +1,89 @@
// rexcode · Brendan Punsky (dotbmp@github), original author
/*
# rexcode/ir: the IR API layer
`core:rexcode/ir` is to the intermediate representations (WASM, SPIR-V, LLVM
bitcode, and the LLVM dialects AIR / DXIL) what `core:rexcode/isa` is to the
machine ISAs: the **shared core** every concrete IR package builds on. It holds
the parts that are the same for every IR, and defines the contract each IR
package follows. It implements **no specific IR**; the concrete packages
(`core:rexcode/ir/wasm`, `.../spirv`, `.../llvm`, …) are added separately.
See `docs/ir_design.md` for the full design rationale and the ISA/IR comparison.
## Why a sibling, not a generalization of `isa`
The ISA API works because every arch follows one *shape contract*
(`Mnemonic` / `Instruction` / `Operand` / `encode` / `decode` / `print`) while
the shared `isa` package carries only the universal bookkeeping. The IR API
keeps that spirit, with three honest concessions where IRs truly diverge:
1. **A structured module replaces the flat instruction stream.** The unit of
work is a `Module` (`Module → []Function → []Block → []Operation`), not a
`[]Instruction`. So `ir` owns the *structural model* (module/function/block/
operation), where `isa` owns no `Instruction`.
2. **A first-class type system.** Operations and results reference a
module-level type table by `Type_Ref`. ISAs bake width into the mnemonic.
3. **Entity references replace PC-relative labels.** Operands reference SSA
values / blocks / functions / globals / types by `Id`, resolved
structurally; there is no instruction-index→byte-offset rewrite. (Object-
file *symbol* fixups still produce Relocations, defined per-IR.)
Everything else is deliberately ISA-shaped: the leaf `Operation` is
`isa.Instruction` + an optional typed `Result`, `opcode` is a u16 just like
`isa.Mnemonic`, `Operand` is one discriminated value, and the verbs are the same
three. `Dataflow` lets one model host both an implicit value stack (WASM) and
explicit SSA (SPIR-V/LLVM) without baking in either.
## What this package provides (shared)
* `status.odin`: `Error` / `Error_Code` (shape-identical to `isa.Error`).
* `refs.odin`: `Id` / `Ref` / `Ref_Space` / `Symbol_Table` (the label analog).
* `types.odin`: `Type` / `Type_Ref` / `Type_Kind` (the type table).
* `module.odin`: `Module` / `Function` / `Block` / `Operation` / `Operand` /
`Result` / `Dataflow` (the structural model).
* `print.odin`: token kinds, print options, number-formatting helpers.
## What a concrete IR package provides (the contract)
Each `core:rexcode/ir/<name>` package supplies, mirroring an arch package:
* `Opcode`: the IR's operation enum (`u16`, `INVALID = 0`), stored in
`Operation.opcode`. (Analogous to a `Mnemonic`.)
* A **codec**: the wire format. Two strategies cover the field:
- *table-driven* (WASM byte/LEB, SPIR-V 32-bit words): a static
`OPCODE → operand-layout` table, exactly like an ISA `ENCODING_TABLE`.
- *bitstream* (LLVM bitcode, and thus AIR / DXIL): a block/record/
abbreviation engine; the operand layout is data-defined, so there is no
static table. The codec is pluggable behind the verbs below.
* The three verbs, on a `Module` (vs the ISA verbs' `[]Instruction`):
	encode :: proc(m: Module,
	               code: []u8,
	               relocs: ^[dynamic]Relocation,
	               errors: ^[dynamic]Error) -> (byte_count: u32, ok: bool)
	decode :: proc(data: []u8,
	               m: ^Module,
	               errors: ^[dynamic]Error,
	               allocator := context.allocator) -> (byte_count: u32, ok: bool)
	print  :: proc(m: Module, options := ir.DEFAULT_PRINT_OPTIONS) -> ir.Print_Result
	tprint :: proc(m: Module, options := ir.DEFAULT_PRINT_OPTIONS) -> string
(`encode`/`decode` deliberately *drop* the ISA verbs' `label_defs` /
`resolve` / `base_address`, since there is no PC-relative resolution pass,
and take a `Module` rather than an instruction array. That is the whole point
of the divergence, made explicit rather than left inert.)
* `Relocation` / `Relocation_Type`: per-IR (the linker fixups for `EXTERNAL`
references), exactly as each arch owns its `reloc.odin`.
* Type lowering: how the IR's wire types map to/from `ir.Type`.
A *dialect* (AIR over LLVM, DXIL over LLVM) reuses its base IR's codec wholesale
and adds only the intrinsic/metadata conventions and any container wrapper.
*/
package rexcode_ir

core/rexcode/ir/module.odin

@@ -0,0 +1,154 @@
// rexcode · Brendan Punsky (dotbmp@github), original author
package rexcode_ir
// =============================================================================
// STRUCTURAL MODEL (the core of the IR API)
// =============================================================================
//
// The central divergence from the ISA API. An ISA program is a flat
// `[]Instruction`; an IR program is a *typed, structured module* --
//
// Module → []Function → []Block → []Operation
//
// where an operation may define an SSA result that later operations reference.
//
// Design stance vs the ISA API:
//
// * The leaf is kept ISA-shaped on purpose. `Operation` is `isa.Instruction`
// plus an optional typed `Result`; `opcode` is the concrete IR's Opcode
// enum stored as a u16, exactly as `isa` stores `Mnemonic` as a u16. So the
// opcode-table dispatch, the encode/decode/print verbs, and the relocation
// model all carry over.
//
// * `Operand` is *shared* here, where `isa.Operand` is per-arch. Justified:
// ISA operands diverge wildly (ModRM/SIB vs shifted-register vs ...), but
// SSA collapses IR operands to "a literal, a reference to an entity, or a
// type", which is uniform enough to define once. Dialect-specific operand
// encodings (WASM memarg, SPIR-V enum masks) ride in `aux` + the IR's own
// opcode table -- they are an encoding detail, not a new operand shape.
//
// * Both dataflow styles are first-class. `Dataflow` is a per-IR trait, NOT a
// baked-in assumption: a stack IR (WASM) leaves `Result.id == ID_NONE` and
// references nothing through VALUE; an SSA IR (SPIR-V/LLVM) names results
// and threads them as REF operands. The model excludes neither.
//
// What is deliberately NOT here: the wire codec (`encode`/`decode`) and the
// printer. Those are per-IR -- just as `isa` defines no `encode`, each concrete
// IR provides its own, against the contract in `doc.odin`. This package is the
// shared *vocabulary*, not an implementation.
// Per-IR dataflow discipline. WASM = STACK; SPIR-V / LLVM / AIR / DXIL = SSA.
Dataflow :: enum u8 { STACK, SSA }
// -----------------------------------------------------------------------------
// Operand (generalizes isa.Operand)
// -----------------------------------------------------------------------------
Operand_Kind :: enum u8 {
NONE,
LIT_INT, // integer literal (value in `imm`)
LIT_FLOAT, // float literal (IEEE bits in `imm`, width in `aux`)
REF, // reference to an entity: `imm` is the Id, `space` the Ref_Space
TYPE, // a Type_Ref (in the low 32 bits of `imm`)
ATTRIBUTE, // a dialect enum / decoration / mask (value in `imm`, tag in `aux`)
}
// 16 bytes. The payload is one i64 (covers an Id, a Type_Ref, an int/float-bits
// literal, or an attribute value); `space`/`aux` discriminate the entity space
// or dialect tag. Large/aggregate constants are *entities* (a CONSTANT ref), not
// inline operands -- the SSA way -- so no inline byte blob is needed here.
Operand :: struct #packed {
imm: i64,
kind: Operand_Kind,
space: Ref_Space, // REF: which id space
aux: u16, // LIT_FLOAT width / ATTRIBUTE tag / dialect bits
flags: u32,
}
#assert(size_of(Operand) == 16)
@(require_results) op_int :: #force_inline proc "contextless" (v: i64) -> Operand { return Operand{kind = .LIT_INT, imm = v} }
@(require_results) op_float :: #force_inline proc "contextless" (bits: u64, w: u16) -> Operand { return Operand{kind = .LIT_FLOAT, imm = i64(bits), aux = w} }
@(require_results) op_type :: #force_inline proc "contextless" (t: Type_Ref) -> Operand { return Operand{kind = .TYPE, imm = i64(u32(t))} }
@(require_results)
op_ref :: #force_inline proc "contextless" (space: Ref_Space, id: Id) -> Operand {
return Operand{kind = .REF, space = space, imm = i64(u32(id))}
}
@(require_results) op_value :: #force_inline proc "contextless" (id: Id) -> Operand { return op_ref(.VALUE, id) }
@(require_results) op_block :: #force_inline proc "contextless" (id: Id) -> Operand { return op_ref(.BLOCK, id) }
// Reconstruct the Id / Type_Ref carried by an operand.
@(require_results) operand_id :: #force_inline proc "contextless" (o: Operand) -> Id { return Id(u32(o.imm)) }
@(require_results) operand_type :: #force_inline proc "contextless" (o: Operand) -> Type_Ref { return Type_Ref(u32(o.imm)) }
// -----------------------------------------------------------------------------
// Operation (the leaf -- parallels isa.Instruction)
// -----------------------------------------------------------------------------
Operation_Flags :: bit_field u8 {
terminator: bool | 1, // ends a block (br / ret / switch / unreachable)
control: bool | 1, // structured-control op (block/loop/if/... for stack IRs)
memory: bool | 1, // touches linear memory / pointers
_: u8 | 5,
}
// `opcode` is the concrete IR's Opcode enum, stored as u16 (like isa.Mnemonic).
// `operands` is variable-arity (calls, switch, phi) and caller-owned, like the
// rest of the decoded module -- the fixed `[4]Operand` of the ISA Instruction is
// the one shape that does not survive into IRs.
Operation :: struct {
operands: []Operand,
result: Result, // SSA def; `.id == ID_NONE` for stack/void ops
opcode: u16,
flags: Operation_Flags,
_: u8,
}
// What an operation produces.
Result :: struct #packed {
id: Id, // ID_NONE if the op defines no value
type: Type_Ref,
}
#assert(size_of(Result) == 8)
// -----------------------------------------------------------------------------
// Containers (no ISA parallel -- the structured-module concession)
// -----------------------------------------------------------------------------
// A basic block (SSA) or a structured region (stack IRs). `params` are block
// arguments (SSA, phi-free form); empty for stack IRs. The terminator is the
// final operation (Operation_Flags.terminator).
Block :: struct {
ops: []Operation,
params: []Result,
id: Id,
}
Function :: struct {
blocks: []Block,
name: string,
signature: Type_Ref, // a FUNCTION type in Module.types
}
// A module-level mutable/immutable value.
Global :: struct {
name: string,
init: Id, // a CONSTANT ref, or ID_NONE
type: Type_Ref,
mutable: bool,
}
// The module -- the unit the IR verbs operate on (where the ISA verbs take a
// flat `[]Instruction`). Metadata, decorations, and dialect custom sections are
// carried by the concrete IR alongside this core, the way each arch carries its
// own reloc.odin.
Module :: struct {
target: string, // triple / capability profile / version tag
types: []Type, // the type table; Type_Ref indexes here
globals: []Global,
functions: []Function,
symbols: Symbol_Table, // externally-visible names
dataflow: Dataflow,
}

core/rexcode/ir/print.odin

@@ -0,0 +1,130 @@
// rexcode · Brendan Punsky (dotbmp@github), original author
package rexcode_ir
// =============================================================================
// PRINTER FRAMEWORK (shared scaffolding -- parallels isa.print)
// =============================================================================
//
// Same role as isa.print: the universal pieces of textual output (token kinds
// for highlighting, print options, the result type, number-formatting helpers).
// A concrete IR's printer (WAT / SPIR-V disasm / LLVM `.ll`) owns the syntax of
// types, value names, blocks, and the output-sink procedures, and calls these
// helpers for hex/decimal. Kept independent of isa.print so the two siblings do
// not couple; the `Token_Kind` set adds the IR-only categories.
import "core:strings"
import "core:reflect"
Token_Kind :: enum u8 {
WHITESPACE,
NEWLINE,
OFFSET, // byte/word offset prefix
KEYWORD, // `func` / `block` / `define` / `OpLabel` style keywords
OPCODE, // the operation mnemonic
TYPE, // a type reference / spelling (IR-only)
VALUE_REF, // a use of an SSA value / local (IR-only)
RESULT, // a value definition (`%3 =`) (IR-only)
BLOCK_LABEL, // a basic-block / branch-target label (IR-only)
GLOBAL_REF, // function / global / symbol reference (IR-only)
IMMEDIATE, // literal constant
ATTRIBUTE, // dialect attribute / decoration / flag (IR-only)
PUNCTUATION, // `(`, `)`, `,`, `=`, `:`
COMMENT,
}
Token :: struct {
offset: u32, // byte offset in the output string
length: u16,
kind: Token_Kind,
operation_index: u16, // which operation (0xFFFF for module-level / whitespace)
}
@(require_results)
token_kind_to_string :: proc(k: Token_Kind) -> string {
if name, ok := reflect.enum_name_from_value(k); ok {
return name
}
return "???"
}
// -----------------------------------------------------------------------------
// Print options & result (same shape as isa, IR-flavoured defaults)
// -----------------------------------------------------------------------------
Print_Options :: struct {
uppercase: bool,
hex_prefix: string, // default "0x"
hex_lowercase: bool,
value_prefix: string, // SSA value sigil, default "%"
block_prefix: string, // block-label sigil, default "^"
show_offsets: bool,
indent: string, // default " "
separator: string, // default "\n"
}
DEFAULT_PRINT_OPTIONS :: Print_Options{
uppercase = false,
hex_prefix = "0x",
hex_lowercase = true,
value_prefix = "%",
block_prefix = "^",
show_offsets = false,
indent = " ",
separator = "\n",
}
Print_Result :: struct {
text: string,
tokens: []Token, // nil unless requested
}
// -----------------------------------------------------------------------------
// Number formatting helpers (used by every IR printer -- arch/IR-agnostic)
// -----------------------------------------------------------------------------
print_hex :: proc(sb: ^strings.Builder, value: u64, options: ^Print_Options) {
strings.write_string(sb, options.hex_prefix)
print_hex_digits(sb, value, options)
}
print_hex_digits :: proc(sb: ^strings.Builder, value: u64, options: ^Print_Options) {
if value == 0 {
strings.write_byte(sb, '0')
return
}
buf: [16]u8
i := 0
v := value
for v > 0 {
digit := u8(v & 0xF)
buf[i] = digit < 10 ? '0' + digit : 'a' + digit - 10
v >>= 4
i += 1
}
for j := i - 1; j >= 0; j -= 1 {
c := buf[j]
if options.uppercase && c >= 'a' && c <= 'f' {
c -= 32
}
strings.write_byte(sb, c)
}
}
print_decimal :: proc(sb: ^strings.Builder, value: u32) {
if value == 0 {
strings.write_byte(sb, '0')
return
}
buf: [10]u8
i := 0
v := value
for v > 0 {
buf[i] = '0' + u8(v % 10)
v /= 10
i += 1
}
for j := i - 1; j >= 0; j -= 1 {
strings.write_byte(sb, buf[j])
}
}

core/rexcode/ir/refs.odin

@@ -0,0 +1,99 @@
// rexcode · Brendan Punsky (dotbmp@github), original author
package rexcode_ir
// =============================================================================
// REFERENCES (the IR analog of isa.labels)
// =============================================================================
//
// This is the first place the IR API genuinely diverges from the ISA API.
//
// An ISA resolves control flow as *PC-relative labels*: `Label_Definition`
// maps a label id to an instruction index and `encode()` rewrites it to a byte
// offset (isa.labels.rewrite_label_defs_to_offsets). That model is wrong for an
// IR: IR operands reference *entities by id* -- SSA results, blocks, functions,
// globals, types -- which are stable indices into the module's entity tables,
// not byte offsets, and resolve *structurally* (no PC-relative pass).
//
// So the label machinery is replaced, not re-exported. What survives in spirit:
// * a small distinct-u32 id type with an "undefined" sentinel (forward refs),
// * a name<->id table for the externally-visible symbols (the Label_Map analog).
//
// Object-file *symbol* fixups (a linker patching a function/global index) are
// still real and still produce Relocations -- but that is a codec concern,
// defined per-IR (parallel to each arch's reloc.odin), not here.

// A stable id into one of the module's entity spaces (see Ref_Space).
Id :: distinct u32
ID_NONE :: Id(0xFFFFFFFF)

// Which id space a reference addresses. Drives validation, printer annotation,
// and (for EXTERNAL) relocation-type selection. This is the union of the spaces
// the modelled IRs use; a concrete IR uses only the subset it needs -- a stack
// IR (WASM) never produces a VALUE ref, an untyped IR never produces a TYPE ref.
Ref_Space :: enum u8 {
	NONE,
	VALUE, // an SSA result (or a local/stack slot)
	BLOCK, // a basic block / structured-control label (branch target)
	FUNCTION,
	GLOBAL,
	TYPE,
	CONSTANT, // a constant-pool entry
	MEMORY, // a linear memory / address space
	METADATA, // a metadata/debug node
	EXTERNAL, // an imported/exported symbol -- relocatable across object files
}

// A typed reference: which space, plus the id within it. Carried by REF operands
// and by branch targets. Packed to 8 bytes, in the same cheap-to-copy spirit as
// isa.Label_Definition.
Ref :: struct #packed {
	id: Id,
	space: Ref_Space,
	_: [3]u8,
}
#assert(size_of(Ref) == 8)

@(require_results)
ref :: #force_inline proc "contextless" (space: Ref_Space, id: Id) -> Ref {
	return Ref{id = id, space = space}
}

// -----------------------------------------------------------------------------
// Symbol table (the IR analog of isa.Label_Map: name <-> id for visible names)
// -----------------------------------------------------------------------------

Symbol_Table :: struct {
	names: map[string]Id,
	space: Ref_Space, // what these names address (usually FUNCTION/GLOBAL)
}

symbol_table_init :: #force_inline proc(st: ^Symbol_Table, space := Ref_Space.EXTERNAL, allocator := context.allocator) {
	st.names = make(map[string]Id, allocator = allocator)
	st.space = space
}

symbol_table_destroy :: #force_inline proc(st: ^Symbol_Table) {
	delete(st.names)
}

// Bind a name to an id (e.g. when a definition is emitted).
symbol_define :: #force_inline proc(st: ^Symbol_Table, name: string, id: Id) {
	st.names[name] = id
}
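
// Reserve a name for a forward reference. Returns the name's current id if it
// is already in the table (ID_NONE if it has only been reserved so far);
// otherwise records the name as ID_NONE and returns ID_NONE. Resolve it later
// with symbol_define.
@(require_results)
symbol_reserve :: #force_inline proc(st: ^Symbol_Table, name: string) -> Id {
	if existing, ok := st.names[name]; ok {
		return existing
	}
	st.names[name] = ID_NONE
	return ID_NONE
}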

@(require_results)
symbol_lookup :: #force_inline proc(st: ^Symbol_Table, name: string) -> (id: Id, ok: bool) {
	id, ok = st.names[name]
	return
}
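The reserve/define contract above leaves any name that is used but never defined bound to `ID_NONE`. Below is a minimal sketch of a post-emission check for that case. `collect_unresolved` is a hypothetical helper, not part of the package, and `Id`/`ID_NONE` are local stand-ins mirroring refs.odin so the snippet compiles on its own; a codec would presumably surface each hit as `Error_Code.UNDEFINED_REF`.

```odin
package forward_ref_sketch

import "core:fmt"
import "core:slice"

// Local stand-ins mirroring rexcode_ir.Id / ID_NONE (assumption: same values).
Id :: distinct u32
ID_NONE :: Id(0xFFFFFFFF)

// Hypothetical helper: every name still bound to ID_NONE was reserved as a
// forward reference but never defined. Sorted, since map order is unspecified.
collect_unresolved :: proc(names: map[string]Id, allocator := context.allocator) -> []string {
	out := make([dynamic]string, 0, len(names), allocator)
	for name, id in names {
		if id == ID_NONE {
			append(&out, name)
		}
	}
	slice.sort(out[:])
	return out[:]
}

main :: proc() {
	names := make(map[string]Id)
	defer delete(names)

	// Mirrors symbol_reserve: a use seen before its definition is recorded
	// as ID_NONE ...
	names["callee"] = ID_NONE
	names["missing"] = ID_NONE
	// ... and mirrors symbol_define: the definition overwrites it later.
	names["callee"] = Id(3)

	pending := collect_unresolved(names)
	defer delete(pending)
	assert(len(pending) == 1 && pending[0] == "missing")
	fmt.println("unresolved:", pending)
}
```

Because ids resolve structurally, this check needs no address information at all: it only asks whether every reserved name was eventually bound.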


@@ -0,0 +1,42 @@
// rexcode · Brendan Punsky (dotbmp@github), original author
package rexcode_ir
// =============================================================================
// ERROR / RESULT TYPES (shared by every IR codec)
// =============================================================================
//
// Parallels isa.status. The `Error` struct shape is intentionally identical to
// `isa.Error` (8 bytes: a u32 location, a 1-byte code, 3 bytes of padding) so
// a tool can surface ISA and IR diagnostics through one path. `Error_Code`
// keeps the encode/decode codes shared with the ISA side, then adds the codes
// only a *typed, structured* IR can produce. Per-IR codecs emit the subset that
// applies to them.

Error_Code :: enum u8 {
	NONE = 0,
	// Shared with the ISA side (encode/decode of the byte/word stream).
	INVALID_OPCODE,
	NO_MATCHING_ENCODING,
	OPERAND_MISMATCH,
	IMMEDIATE_OUT_OF_RANGE,
	BUFFER_OVERFLOW,
	BUFFER_TOO_SHORT,
	// IR-specific (no ISA analog -- these need a type system / SSA / a module).
	INVALID_TYPE, // malformed or out-of-range Type_Ref
	TYPE_MISMATCH, // an operand/result type disagrees with the op signature
	UNDEFINED_REF, // a Ref to an id/symbol that is never defined
	DUPLICATE_DEFINITION, // an id/symbol defined twice
	MALFORMED_MODULE, // structural violation (block without terminator, ...)
	UNSUPPORTED_FEATURE, // a capability/extension the codec does not implement
}

// `location` is the operation index on encode, or the byte offset on decode --
// mirroring isa.Error.inst_idx.
Error :: struct #packed {
	location: u32,
	code: Error_Code,
	_: [3]u8,
}
#assert(size_of(Error) == 8)
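The decode-side meaning of `location` (a byte offset) can be illustrated with a LEB128 reader, since the WebAssembly binary format encodes its integers as LEB128. This is a minimal sketch, not the package's decoder: `read_uleb_u32` is hypothetical, and `Error`/`Error_Code` are trimmed local copies (only the codes used here, so their ordinal values differ from the real enum). It reports `BUFFER_TOO_SHORT` at the offset where input runs out and `IMMEDIATE_OUT_OF_RANGE` at the byte that would overflow a u32.

```odin
package leb_error_sketch

import "core:fmt"

// Trimmed local copies of rexcode_ir.Error_Code / Error (assumption: only the
// codes this sketch needs; ordinal values differ from the real enum).
Error_Code :: enum u8 {
	NONE = 0,
	IMMEDIATE_OUT_OF_RANGE,
	BUFFER_TOO_SHORT,
}

Error :: struct #packed {
	location: u32,
	code:     Error_Code,
	_:        [3]u8,
}

// Hypothetical decoder step: read an unsigned LEB128 u32 starting at `offset`.
// On failure, `err.location` is the byte offset of the offending/missing byte.
read_uleb_u32 :: proc(buf: []u8, offset: u32) -> (value: u32, next: u32, err: Error) {
	shift: u32 = 0
	pos := offset
	for {
		if pos >= u32(len(buf)) {
			return 0, pos, Error{location = pos, code = .BUFFER_TOO_SHORT}
		}
		b := buf[pos]
		// The 5th byte may only carry the top 4 bits of a u32 and must end the value.
		if shift == 28 && b & 0xF0 != 0 {
			return 0, pos, Error{location = pos, code = .IMMEDIATE_OUT_OF_RANGE}
		}
		value |= u32(b & 0x7F) << shift
		pos += 1
		if b & 0x80 == 0 {
			return value, pos, Error{}
		}
		shift += 7
	}
}

main :: proc() {
	v, next, err := read_uleb_u32([]u8{0xE5, 0x8E, 0x26}, 0)
	assert(err.code == .NONE && v == 624485 && next == 3)

	_, _, err = read_uleb_u32([]u8{0xE5, 0x8E}, 0)
	assert(err.code == .BUFFER_TOO_SHORT && err.location == 2)

	_, _, err = read_uleb_u32([]u8{0xFF, 0xFF, 0xFF, 0xFF, 0x10}, 0)
	assert(err.code == .IMMEDIATE_OUT_OF_RANGE && err.location == 4)

	fmt.println("ok")
}
```

On encode the same struct would carry an operation index instead, which is what lets one 8-byte shape serve both directions.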


@@ -0,0 +1,66 @@
// rexcode · Brendan Punsky (dotbmp@github), original author
package rexcode_ir
// =============================================================================
// TYPE MODEL
// =============================================================================
//
// The second genuine divergence from the ISA API: IRs have a *first-class type
// system*. An ISA bakes width into the mnemonic (`ADD` vs `ADDB`); operands are
// just bit patterns. An IR carries an explicit type table and operations /
// results reference types by `Type_Ref` (an index into `Module.types`).
//
// `Type_Kind` is the common denominator across the modelled IRs:
// * WASM: i32/i64/f32/f64/v128 + funcref/externref (a *degenerate* table --
// a handful of primitives, no user structs).
// * SPIR-V: OpTypeInt / Float / Vector / Pointer / Struct / Function / ...
// * LLVM: iN / float / pointer / vector / array / struct / function / opaque.
//
// A concrete IR lowers its wire types onto this set on decode and back on
// encode. Anything a dialect needs beyond the common shape rides in `aux` (e.g.
// pointer address space) or in the concrete IR's own side tables.

Type_Ref :: distinct u32
TYPE_NONE :: Type_Ref(0xFFFFFFFF)

Type_Kind :: enum u8 {
	VOID,
	INT, // `bits` = width (1/8/16/32/64/...); signedness is op-level in most IRs
	FLOAT, // `bits` = width (16/32/64/128)
	VECTOR, // `elem` x `count` (fixed-width SIMD)
	ARRAY, // `elem` x `count`
	POINTER, // `elem`, address space in `aux`
	STRUCT, // members in `fields`
	FUNCTION, // `fields` = params ++ [result]; `count` = param count
	OPAQUE, // named / forward-declared / abstract handle (images, tokens, ...)
	REF, // funcref / externref / typed GC reference (`elem` for typed refs)
}

// One node in a module's type table. `fields` (struct members / function
// signature) is caller-owned, like the rest of the decoded module.
Type :: struct {
	fields: []Type_Ref, // STRUCT members, or FUNCTION params ++ result
	name: string, // OPAQUE / named struct
	elem: Type_Ref, // VECTOR / ARRAY / POINTER / typed REF element
	count: u32, // VECTOR / ARRAY length, or FUNCTION param count
	bits: u16, // INT / FLOAT width
	aux: u16, // POINTER address space, packed kind flags, ...
	kind: Type_Kind,
	_: [3]u8,
}

@(require_results) type_void :: #force_inline proc "contextless" () -> Type { return Type{kind = .VOID} }
@(require_results) type_int :: #force_inline proc "contextless" (bits: u16) -> Type { return Type{kind = .INT, bits = bits} }
@(require_results) type_float :: #force_inline proc "contextless" (bits: u16) -> Type { return Type{kind = .FLOAT, bits = bits} }

@(require_results)
type_vector :: #force_inline proc "contextless" (elem: Type_Ref, count: u32) -> Type {
	return Type{kind = .VECTOR, elem = elem, count = count}
}

@(require_results)
type_pointer :: #force_inline proc "contextless" (elem: Type_Ref, address_space: u16 = 0) -> Type {
	return Type{kind = .POINTER, elem = elem, aux = address_space}
}