Nim/doc/ic.md
Araq d269ccff26 IC: correct instance-name format in design doc
The content key is hashed into <disamb> (setInstanceDisamb), not a separate
.key. token; document that the cross-TU merge and DCE already key on the
module-suffix-stripped name.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-13 20:28:43 +02:00


======================================
Incremental Compilation (IC)
======================================

The ``nim ic`` command provides incremental compilation for Nim projects. It
decomposes compilation into per-module steps whose results are cached as NIF
files, and uses the external ``nifmake`` build tool to re-run only the steps
whose inputs changed.

This document describes **how `nim ic` works today**, including the edge cases
that shaped the current design, and ends with a **Plan** for the next backend
rewrite.

Overview
========

The pipeline has two halves driven by one process (`nim ic`, `commandIc` in
``compiler/deps.nim``) that constructs a dependency graph, writes a build file,
and hands it to ``nifmake``:
1. **Frontend**, per module:
   - ``nifler parse --deps`` turns ``.nim`` source into a parsed NIF
     (``.p.nif``) plus a static dependency list (``.deps.nif``).
   - ``nim m`` (the *semantic* step, `cmdM`) reads the parsed NIF and the
     precompiled NIFs of the module's imports, type-checks, and writes the
     **semmed NIF** (``.nif``) plus invalidation sidecars (see *Cookies*).
2. **Backend**: ``nim nifc`` (`cmdNifC`, ``compiler/nifbackend.nim``) reads the
   semmed NIFs, generates C, compiles and links.

``nifmake`` orders the steps by their input/output files: every `nim m` runs
before the `nim nifc` step that consumes its NIF, and a step re-fires only when
one of its inputs is newer than its outputs. The driver invokes ``nifmake run
--parallel`` by default, so independent steps at the same DAG depth fan out
across cores; pass ``-d:icNoParallel`` to serialize (readable child output when
debugging a build).
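
The re-fire rule described above (a step runs again only when an input is newer than its outputs) can be sketched as follows. This is a minimal Python illustration of an mtime-only freshness check, not `nifmake`'s actual code; the function name `needs_rerun` and the file names in the demo are hypothetical.

```python
import os
import tempfile

def needs_rerun(inputs, outputs):
    """Return True when a step must re-fire under an mtime-only rule."""
    # A missing output always forces the step to run.
    if not all(os.path.exists(o) for o in outputs):
        return True
    oldest_output = min(os.path.getmtime(o) for o in outputs)
    newest_input = max(os.path.getmtime(i) for i in inputs)
    # Re-fire only when some input is newer than some output.
    return newest_input > oldest_output

def demo():
    """Create one input and one output, then compare them before and after a touch."""
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "mod.p.nif")   # hypothetical input name
        out = os.path.join(d, "mod.nif")     # hypothetical output name
        for path in (src, out):
            with open(path, "w") as f:
                f.write("x")
        os.utime(src, (1000, 1000))          # input older than output
        os.utime(out, (2000, 2000))
        fresh = needs_rerun([src], [out])
        os.utime(src, (3000, 3000))          # input touched after the output was written
        stale = needs_rerun([src], [out])
        return fresh, stale
```

Because the check looks only at timestamps, it cannot tell whether the content or the configuration behind a file changed, which is the limitation the cross-mm hazard later in this document runs into.
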

Artifacts (the NIF zoo)
=======================

Per module ``<suffix>`` (a content hash of the path, written ``<s>`` in the
table; see *NIF symbols* below), under the nimcache directory:

| File | Producer | Purpose |
| ---- | -------- | ------- |
| ``<s>.p.nif`` | nifler | parsed AST (syntactic) |
| ``<s>.deps.nif`` | nifler | **static** import list (syntactic `import`s) |
| ``<s>.s.deps.nif`` | `nim m` | **real** post-sem imports (incl. macro-generated); see *Discovery* |
| ``<s>.nif`` | `nim m` | semmed module (symbols resolved, typed) |
| ``<s>.iface.nif`` | `nim m` | **iface cookie**: hash of the importer-visible surface |
| ``<s>.impl.nif`` | `nim m` | **impl cookie**: hash of the entire content (bodies included) |
| ``<s>.edges.nif`` | `nim m` | **NeedsImpl edges**: modules whose bodies this sem consumed |
| ``<s>.c.nif`` | `nim nifc` | the C text as a NIF, with def/ref markers for DCE & dedup |
| ``ic_config.cfg.nif`` | driver | precompiled config replayed by every child (`icconfig.nim`) |
| ``ic.version`` | driver | format stamp; a mismatch wipes the cache (`icFormatVersion`) |

NIF symbols and ownership
=========================

(See ``../nifspec/doc/nif-spec.md``.) A global symbol has the form
``<ident>.<disamb>.<moduleSuffix>``. For a **generic instantiation** the
`<disamb>` is not a counter but a *content hash* — `setInstanceDisamb`
(``modulegraphs.nim``) MD5s the generic's identity plus the `typeKey` of every
concrete type argument, masks it to 30 bits and tags it with `InstanceDisambBit`.
So the only part of the name that varies between two modules making the **same**
instantiation (`seq[Foo]`) is the `<moduleSuffix>`. Two consequences drive the
backend:
- **Instance names are content-addressed**: the same instantiation produced in
different modules yields the *same* `<ident>.<disamb>`, so a deterministic dedup
is possible by the *module-suffix-stripped* name. The cross-TU merge
(`ccgtypes.sharedInstanceCName`, keyed via `graph.icSharedSigs`) and the DCE
analysis (`dce.computeLiveSymbols`, its `uniq` set) both already key on this
stripped form.
- **The suffix names a mint-site owner.** Today the `<moduleSuffix>` is the
module *that minted the instance* (the instantiation site), so the same
instance has a different full name in each module that makes it; *which* TU
emits the single definition is then decided at codegen by the
reuse/redirect machinery — the root of the *single-writer* hazard below.
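
To make the naming scheme concrete, here is a small Python sketch of a content-hashed `<disamb>` and of dedup by the module-suffix-stripped name. It is an illustration under assumptions, not the code of `setInstanceDisamb`: the exact bytes fed to MD5, the position of `InstanceDisambBit` (bit 30 here), and the module suffixes `modA1234`/`modB5678` are all made up for the example.

```python
import hashlib

# Assumption: the text only says the hash is masked to 30 bits and tagged;
# the real bit position of InstanceDisambBit is not stated there.
INSTANCE_DISAMB_BIT = 1 << 30

def instance_disamb(generic_id, type_keys):
    """Content-hash disambiguator: MD5 over the generic identity and type keys."""
    h = hashlib.md5()
    h.update(generic_id.encode())
    for key in type_keys:
        h.update(b"\0" + key.encode())
    low30 = int.from_bytes(h.digest()[:4], "little") & ((1 << 30) - 1)
    return low30 | INSTANCE_DISAMB_BIT

def full_name(ident, disamb, module_suffix):
    """Build <ident>.<disamb>.<moduleSuffix>."""
    return f"{ident}.{disamb}.{module_suffix}"

def strip_module_suffix(symbol):
    """Drop the trailing <moduleSuffix>, leaving <ident>.<disamb>."""
    return symbol.rsplit(".", 1)[0]

# The same instantiation minted in two different modules.
d = instance_disamb("seq", ["Foo"])
in_a = full_name("seq", d, "modA1234")
in_b = full_name("seq", d, "modB5678")
```

The two full names differ only in their suffix, so a set keyed on `strip_module_suffix(...)` treats them as one instance. The sketch assumes the suffix itself contains no dot.
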

The driver: graph construction (`commandIc`)
============================================

1. Stamp/wipe the cache by ``icFormatVersion``.
2. Seed the graph with the root module and **`system.nim`**. `system`'s entire
import closure is folded into one node (one `nim m` invocation) — see
*single-writer* below.
3. ``traverseDeps`` runs ``nifler`` per module and reads ``.deps.nif`` to add
import edges.
4. **SCC grouping**: strongly-connected import cycles are collapsed (Tarjan).
A singleton compiles as ``nim m <mod>``; a cycle compiles as one
``nim m <rep> --icGroup:<member>…`` that builds every member *from source* in
one process (resolving the recursion in memory) and writes each member's NIF.
Only edges *leaving* the component become build-graph inputs.
5. **Discovery fixpoint**: write the build file, run ``nifmake``; if it fails,
re-derive the graph from every module's ``.s.deps.nif`` (adding nodes/edges
for imports the static scanner missed), and retry. See *Discovery*.
6. The backend step (`nim nifc`) depends on every module's semmed NIF, so
``nifmake`` runs it last.
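
The SCC grouping in step 4 can be illustrated with a textbook Tarjan implementation. This is a Python sketch on a toy import graph, not the code in ``compiler/deps.nim``; the module names and the helper `external_inputs` are hypothetical.

```python
def tarjan_scc(graph):
    """Return the strongly connected components of `graph` (node -> list of imports)."""
    index = {}
    low = {}
    on_stack = set()
    stack = []
    components = []
    counter = 0

    def visit(v):
        nonlocal counter
        index[v] = low[v] = counter
        counter += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, ()):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:
            # v is the root of a component: pop its members off the stack.
            comp = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.append(w)
                if w == v:
                    break
            components.append(sorted(comp))

    for v in sorted(graph):
        if v not in index:
            visit(v)
    return components

def external_inputs(component, graph):
    """Imports that leave the component; only these become build-graph inputs."""
    members = set(component)
    return sorted({w for v in component for w in graph.get(v, ()) if w not in members})

imports = {
    "main": ["a", "util"],
    "a": ["b"],
    "b": ["a", "util"],   # a <-> b form an import cycle
    "util": [],
}
groups = tarjan_scc(imports)
```

Tarjan's algorithm emits components with dependencies first, so `groups` is already a valid build order: `util`, then the `a`/`b` cycle as one unit, then `main`. The recursive form is fine for a sketch; a very deep import chain would need an iterative version.
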

Invalidation: the cookie system
================================

A dependent must re-sem only when a dependency's relevant surface changed. Two
hashes per module (``ast2nif.nim``):
- **iface cookie** (``.iface.nif``): hashes only the *importer-visible* surface —
exported declarations' **signatures** (for *all* routine kinds: plain procs,
templates, macros, generics, `inline` procs alike), full content for
consts/types, plus import/export/replay/hook records. Routine **bodies are
excluded.** It also chains in the iface cookies of its own dependencies, so a
surface change anywhere in the import closure propagates. A `nim m` rule for a
module depends on its dependencies' iface cookies, so a body-only edit moves no
iface cookie and stops the re-sem cascade.
- **impl cookie** (``.impl.nif``): hashes the *entire* serialized content (private
defs and bodies included), with the module's own iface mixed in.

**NeedsImpl edges** (``.edges.nif``): if a module *consumed another module's body*
during sem — a macro expansion, a generic instantiation, a `getImpl`, or a
compile-time call run in the VM — it records a strong edge. The dependent is then
gated on that dependency's **impl** cookie instead of its iface cookie, so e.g.
`const x = dep.foo()` re-sems when `foo`'s body changes. Recording sites:
`semExprs.semTemplateExpr` (templates), `seminst.generateInstance` (generics),
`vmgen.genProc` (VM/macros/CT procs), `vm.opcGetImpl` (`getImpl`). Inline
iterators and `inline` procs are *not* tracked — they are inlined at codegen,
where the backend's NIF-mtime invalidation re-codegens their users.
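
The gating logic can be shown with a toy model. The Python sketch below is a simplification under stated assumptions: SHA-1 is an arbitrary stand-in (the text does not say which hash ``ast2nif.nim`` uses), a module is reduced to lists of `(signature, body)` pairs, and the names `iface_cookie`, `impl_cookie`, and `gate` are hypothetical.

```python
import hashlib

def digest(*parts):
    """Hash a sequence of strings (SHA-1 chosen arbitrarily for the sketch)."""
    h = hashlib.sha1()
    for p in parts:
        h.update(p.encode() + b"\0")
    return h.hexdigest()

def iface_cookie(module, dep_iface_cookies):
    """Hash exported signatures only, chained with the dependencies' iface cookies."""
    sigs = sorted(sig for sig, _body in module["exported"])
    return digest(*sigs, *dep_iface_cookies)

def impl_cookie(module, dep_iface_cookies):
    """Hash the entire content (private defs and bodies), with the iface mixed in."""
    everything = sorted(sig + body for sig, body in module["exported"] + module["private"])
    return digest(iface_cookie(module, dep_iface_cookies), *everything)

def cookies_for(module):
    return {"iface": iface_cookie(module, []), "impl": impl_cookie(module, [])}

def gate(dep_name, cookies, needs_impl):
    """Pick which cookie of `dep_name` a dependent's sem rule is keyed on."""
    kind = "impl" if dep_name in needs_impl else "iface"
    return cookies[dep_name][kind]

before = {"exported": [("proc foo(): int", "result = 1")], "private": []}
after = {"exported": [("proc foo(): int", "result = 2")], "private": []}  # body-only edit
c_before = {"dep": cookies_for(before)}
c_after = {"dep": cookies_for(after)}
```

A dependent with no NeedsImpl edge sees the same gate value before and after the body-only edit, so it is not re-semmed. A dependent that recorded an edge (for example because it evaluated `dep.foo()` at compile time) sees a changed value and re-sems.
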

Discovery of macro-generated imports
====================================

The static scanner only sees syntactic `import`s. A macro can synthesize one
(chronicles does `parseStmt("import chronicles/textlines")` driven by the
`chronicles_sinks` define). Such an import is invisible until sem runs the macro.
Each `nim m` records the imports it *actually* resolved (via the
``semdata.addImportFileDep`` hook → ``graph.importDeps`` → ``ast2nif.writeSemDeps``)
into ``<s>.s.deps.nif``; a child that fails on a not-yet-built import flushes it
before erroring. The driver re-derives the graph from those sidecars — adding the
missing node + the importer→import edge — and reruns to a fixpoint. (This replaced
an earlier `icmissing.txt` side channel.)
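
The retry loop can be sketched as below. This is a Python model of the fixpoint only, with the build and the sidecar reader passed in as callbacks; `discovery_fixpoint`, `fake_build`, `fake_sidecar`, the `max_rounds` cap, and the module names are all assumptions made for the example.

```python
def discovery_fixpoint(static_deps, run_build, read_sem_deps, max_rounds=20):
    """Re-run the build until sidecar-recorded imports add nothing new."""
    graph = {m: set(d) for m, d in static_deps.items()}
    for _ in range(max_rounds):
        if run_build(graph):
            return graph
        grown = False
        for module in list(graph):
            for dep in read_sem_deps(module):
                if dep not in graph:
                    graph[dep] = set()       # missing node
                    grown = True
                if dep not in graph[module]:
                    graph[module].add(dep)   # importer -> import edge
                    grown = True
        if not grown:
            raise RuntimeError("build failed and discovery found nothing new")
    raise RuntimeError("no fixpoint within max_rounds")

# Toy world: `logger` imports `textlines` through a macro, so the static scan misses it.
REAL = {"main": {"logger"}, "logger": {"textlines"}, "textlines": set()}

def fake_build(graph):
    """Succeeds only when every module's real imports are scheduled as edges."""
    return all(REAL[m] <= deps for m, deps in graph.items())

def fake_sidecar(module):
    """Stands in for reading <s>.s.deps.nif."""
    return REAL[module]

result = discovery_fixpoint({"main": ["logger"], "logger": []}, fake_build, fake_sidecar)
```

In this toy run the first build fails, the sidecar of `logger` contributes the missing `textlines` node and edge, and the second build succeeds.
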

The backend today: `commandNifC`
================================

The current backend is **whole-program and demand-driven**, run as one process
(``compiler/nifbackend.nim``):
1. Load `system` then **all** modules' semmed NIFs in dependency order
(`loadModuleDependencies`), so all hooks/types are in memory.
2. **DCE**: `computeLiveSymbols` over all NIFs computes a global live set used to
filter the top-level routine listing.
3. **`computeModuleReuse`**: decide which modules' cached translation units
(``.c.nif``) can be reused — skip codegen for them and use their ``.c``/``.o``
as is. The gate mirrors the m-step's cookie gating; a coarse fallback uses the
transitive NIF-mtime closure (`-d:icCoarseReuse`).
4. **Codegen**: for each non-reused module, `generateCodeForModule` runs `cgen`.
Codegen is **demand-driven**: emitting one module can *demand* entities
(generic instances, type-bound hooks, RTTI) that belong to other modules; a
demand whose home TU is reused is **redirected** into the demanding TU
(`redirectToLiveModule`), and reused TUs' definitions become prototypes.
5. **`enforceDefRetention`**: an un-reuse cascade. If a regenerated module would
stop emitting a definition that a still-reused TU references (the demand chain
that placed it no longer arises), the referencing TUs lose their reuse and
regenerate so the symbol does not vanish under them.
6. `emitMethodDispatchers`, then `finishModule` for every module (main module
**last**, so init-proc registration is complete before `genMainProc`).
7. Emit ``.c``, then `extccomp.callCCompiler` + link.
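
The un-reuse cascade of step 5 is easiest to see in a reduced model. The Python sketch below is not `enforceDefRetention` itself: it assumes each TU is described only by the sets of symbols it defines and references, that every referenced symbol is defined by some TU, and that regenerating a module yields a known definition set (`new_defs`). All names are hypothetical.

```python
def enforce_def_retention(reused, cached_defs, cached_refs, new_defs):
    """Un-reuse TUs that reference a definition no TU would emit any more.

    reused      : modules whose cached TU is kept as is
    cached_defs : module -> symbols its cached TU defines
    cached_refs : module -> symbols its cached TU references
    new_defs    : module -> symbols the module emits if regenerated
    """
    reused = set(reused)
    changed = True
    while changed:
        changed = False
        # Every symbol some TU will define given the current reuse decisions.
        available = set()
        for m in cached_defs:
            available |= cached_defs[m] if m in reused else new_defs[m]
        for m in sorted(reused):
            if not cached_refs[m] <= available:
                reused.discard(m)   # loses its reuse and regenerates
                changed = True
    return reused

cached_defs = {"lib": {"lib.f", "inst.X"}, "app": {"app.main"}, "util": {"util.g"}}
cached_refs = {"lib": set(), "app": {"inst.X", "util.g"}, "util": set()}
# Regenerated `lib` no longer emits inst.X; a regenerated `app` would emit it itself.
new_defs = {"lib": {"lib.f"}, "app": {"app.main", "inst.X"}, "util": {"util.g"}}

still_reused = enforce_def_retention({"app", "util"}, cached_defs, cached_refs, new_defs)
```

Here `lib` is being regenerated and drops `inst.X`, so the cached `app` TU would reference a vanished symbol; `app` loses its reuse, while `util` stays reused. The outer loop repeats because un-reusing one module can change what is available to the others.
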

Edge cases (and why the machinery exists)
=========================================

- **Single-writer.** Instance type-ids are minted in process-local order, so if
two `nim m` processes both write a module's NIF (e.g. a stdlib module pulled
into `system`'s from-source closure *and* given its own rule), the second
overwrites with different ids and every module checked against the first carries
dangling refs ("symbol has no offset"). Fixed by folding `system`'s closure into
one SCC and by **forwarding the project's defines** to every child so their
`when` bodies (hence import sets and NIF contents) match the scanner's.
- **`when … else: import`.** nifler emits `else`-branch imports unguarded, so a
dead `else: import` would be scheduled. The compiler's own sources were rewritten
to explicit negated `when`s; the vendored nifler later learned to negate prior
conditions for the `else`.
- **`nil` sons of loaded ASTs.** NIF dot-tokens load as `nil` where from-source
ASTs have `nkEmpty`; several passes gained `nil` guards.
- **Sealed loaded types.** Loaded types are `Sealed`; sem/transform mutate via
`unsealForTransform`/`exactReplica(idgen)` (the latter mints a fresh `uniqueId`
so serialized replicas don't collapse).
- **Methods/RTTI ownership.** `genTypeInfoV1` already routes a type's RTTI to
`t…itemId.module` when that module is open for codegen — an existing ownership
notion the rewrite can generalize.
- **Config cost.** Each child re-parsing `nim.cfg` + re-running `config.nims` in
the VM was ~80 ms; replaced by a precompiled `ic_config.cfg.nif` replayed in
`loadConfigs` (`compiler/icconfig.nim`).
- **`koch bootic`** bootstraps the compiler through `nim ic` (a 3-iteration
fixed-point check). It writes its binary to ``bin/nim_ic`` and never clobbers
``bin/nim``.

Known residual hacks (targets for the rewrite)
----------------------------------------------

- `deps.runNifler` uses `setLastModificationTime` to mark its scan up-to-date and
deletes a stale parsed file to coordinate with the nifmake nifler rule — the
driver duplicating nifmake's freshness logic.
- `computeModuleReuse` + `enforceDefRetention` + the redirect/cached-defs
machinery is a hand-rolled mini-`nifmake` *inside* the backend process, needed
only because one process reuses some TUs while regenerating others.
These are legacy artifacts of a code generator that predates IC, not intrinsic
requirements.
- **Cross-mm reuse / `--force` (`var not init i_<hash>`).** Reusing TUs built
under one memory-management mode (e.g. an `orc` cache) while rebuilding under
another (`refc`), or forcing a full backend re-run, can abort with
``var not init``. The variable named in the message (`i_<hash>`) is a
backend-minted closure-capture local whose `loc` is unset. The root cause is that
`nifmake` is *mtime-only* and `computeModuleReuse` is not config-aware, so a
reuse decision can mix TUs
generated under different config signatures. It is **crash-safe** (it aborts,
never emits a wrong-mm binary) and does **not** affect normal same-mode warm
rebuilds. Deferred to the rewrite, which makes reuse config-aware by
construction (each codegen rule's config is a declared input).

Status and performance
======================

`nim ic` self-builds the compiler (`koch bootic`'s fixed-point check) under both
`orc` and `--mm:refc`, and passes the external-package CI set. The
macro sweep is 93/95 (two known-baseline failures: `tmacro7`, `tmacrogetimpl`).

Rough numbers on a 32-core box (`-d:release`, building the compiler):

| build | wall time | notes |
| - | ---- | ----- |
| classic `nim c` full | ~27.5s | ~7 cores busy |
| `nim ic` cold, serial | ~81s | one `nim m` at a time |
| `nim ic` cold, `--parallel` | ~53s | now the default |
| `nim ic` warm no-op | ~0.14s | ~200× vs classic |

The cold gap is a *parallelism* deficiency, not extra fundamental work
(aggregate work is ≈1.33× classic). The backend's own phases are cheap
(load 0.3s, DCE 1.0s, cgen 5.7s, write 0.8s); the C compile+link floor (~21s) is
already parallel and shared with the classic backend. The remaining cold cost is
the `nim m` sem phase, throttled by `nifmake`'s per-DAG-depth barrier across a
narrow, deep tail of heavy modules. A ready-queue scheduler (dependency-ready
dispatch instead of a depth barrier) would recover several seconds but cannot
beat the import-chain critical path; it folds into the rewrite. **Warm rebuilds,
the actual point of IC, are already ~200× faster than a classic full build.**
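
For reference, the ratios quoted in this section follow directly from the table and the phase timings. The snippet below only redoes that arithmetic in Python; the variable names are ad hoc.

```python
classic_full = 27.5      # s, classic `nim c` full build
ic_cold_serial = 81.0    # s, `nim ic` cold, one `nim m` at a time
ic_cold_parallel = 53.0  # s, `nim ic` cold with --parallel
ic_warm_noop = 0.14      # s, `nim ic` warm no-op

warm_speedup = classic_full / ic_warm_noop          # about 196, quoted as "~200x"
parallel_gain = ic_cold_serial / ic_cold_parallel   # about 1.53
backend_phases = 0.3 + 1.0 + 5.7 + 0.8              # load + DCE + cgen + write, about 7.8 s
```
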
The strategic direction (decided 2026-06-13) is to make this NIF backend
(`cmdNifC`) the **default** code generator. The clean translation-unit model it
needs — global DCE, content-keyed instance merging, deterministic RTTI — already
exists and is always-on inside `cmdNifC`; the remaining work is the per-module
backend rewrite below (which also dissolves the cross-mm reuse hazard), then
*promotion + deletion*, not new machinery.

Plan: a nifmake-driven, per-module backend
==========================================

Goal: the backend stops re-implementing `nifmake`. Each module's codegen becomes
its own build rule, so "which TUs rebuild" is just "which rules `nifmake`
re-fires from input mtimes" — exactly as the frontend already works. The reuse /
def-retention / redirect machinery then dissolves.

Target build graph (mirrors Nimony's ``src/nimony/deps.nim``):
1. **Frontend split.** Generate a *frontend* build file (nifler + `nim m` rules),
run `nifmake`, run the `.s.deps` discovery fixpoint. Then re-derive the graph
from `.s.deps` (now complete; dead `when` imports can also be **pruned** here).
2. **Per-module codegen rule.** One ``nifc <mod>`` rule per module: inputs are the
module's own ``.nif`` plus the ``.iface``/``.impl`` cookies of its
dependencies; output is its ``<s>.c.nif``. The process loads `<mod>` + its
import closure's NIFs (like `nim m`) and emits **only the entities it owns**,
referencing everything else `extern`.
3. **Static ownership** replaces runtime redirect. Every emittable entity
(generic instances, type-bound hooks, RTTI, lifted procs) gets a deterministic
owner module *by symbol suffix*. Because instance names are content-addressed
(``<ident>.<disamb>.<owner>``, with the content key hashed into `<disamb>`), the
same instance demanded by several modules has one name and one owner, so there
is exactly one writer and no link-time duplicate. (The precise owner rule,
minting module vs. root-type's module, is the open design decision; start from
`itemId.module` and adjust where it forces a downstream package to own stdlib
code.)
4. **DCE as a rule.** A single rule reads all ``.c.nif``, computes the global live
set, and drives per-module ``.c`` emission filtered to live entities (Nimony
uses ``.live.nif`` and per-module ``.dce.nif`` files for this).
5. **Link rule.** Depends on every ``.o`` (each compiled by its own rule) and the
DCE output; produces the executable.
6. **Deletions.** `computeModuleReuse`, `enforceDefRetention`,
`redirectToLiveModule` and the cached-defs/claim bookkeeping go away. The
``setLastModificationTime`` coordination in `runNifler` goes away with the
frontend split (the nifler rule owns parsed+deps; the driver's pre-scan only
reads `.deps` to build the graph).

Validation bar: `koch bootic` must still reach its byte-identical fixed point, and
binary size must not regress (DCE parity), across the external-package CI set.
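
The static-ownership idea from step 3 can be sketched as a pure function from demands to emission plans. This Python model assumes the simplest possible owner rule (owner equals the symbol's trailing suffix) and leaves out how the owner's rule learns which instances it must emit, which is part of the open design. `owner_of`, `plan_emission`, and the symbol names are hypothetical.

```python
from collections import defaultdict

def owner_of(symbol):
    """Owner module = the symbol's trailing suffix (assumed owner rule)."""
    return symbol.rsplit(".", 1)[1]

def plan_emission(demands):
    """Split each module's demanded symbols into definitions and externs.

    demands: module suffix -> symbols that module's codegen needs.
    """
    defines = defaultdict(set)
    externs = defaultdict(set)
    for module, symbols in demands.items():
        for sym in symbols:
            owner = owner_of(sym)
            defines[owner].add(sym)        # emitted once, by its owner's rule
            if owner != module:
                externs[module].add(sym)   # every other module only declares it
    return dict(defines), dict(externs)

demands = {
    "modA": {"foo.1.modA", "seqFoo.1073741825.modT"},
    "modB": {"bar.2.modB", "seqFoo.1073741825.modT"},
    "modT": set(),
}
defines, externs = plan_emission(demands)
```

Both `modA` and `modB` demand the same instance, yet it appears in exactly one definition set, that of `modT`. No runtime redirect is involved, because the decision depends only on the symbol name.
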

Code, logic & debugging
========================

Core modules:
- **`compiler/deps.nim`** — graph construction, SCC grouping, discovery fixpoint,
build-file generation; `commandIc`.
- **`compiler/ast2nif.nim`** — AST↔NIF, the cookie hashes (`cookieSd`,
`writeIfaceCookie`, `writeImplCookie`, `writeEdgesFile`, `writeSemDeps`).
- **`compiler/nifbackend.nim`** — the backend (`commandNifC`) and its reuse
machinery.
- **`compiler/icconfig.nim`** — precompiled config.
- **`compiler/pipelines.nim`** / **`modulegraphs.nim`** — pipeline integration and
the graph state (`importDeps`, `icImplDeps`, `icReusedModules`, …).

Manual workflow:
- Frontend a module: ``nim m --nimcache:nifcache path/to/mod.nim`` (writes
``.nif`` + cookies + ``.s.deps``).
- Backend: ``nim nifc --nimcache:nifcache main.nim``.
- NIF files are text — open/grep them directly; ``diff`` two successive ``.nif``
to see why a module rebuilt.
- Force a re-sem: delete the module's ``.nif`` and rerun `nim m`.
- A stale-cache crash after editing the serialization layout means bumping
``icFormatVersion`` (`compiler/options.nim`).

See also
========

- NIF format spec: [nifspec/doc/nif-spec.md](../nifspec/doc/nif-spec.md)
- NIFC (C-like target) spec: dist/nimony/doc/nifc-spec.md