Incremental Compilation (IC)
======================================
The nim ic command provides incremental compilation for Nim projects. It
decomposes compilation into per-module steps whose results are cached as NIF
files, and uses the external nifmake build tool to re-run only the steps
whose inputs changed.
This document describes how nim ic works today, including the edge cases
that shaped the current design, and ends with a Plan for the next backend
rewrite.
Overview
The pipeline has two halves driven by one process (nim ic, commandIc in
compiler/deps.nim) that constructs a dependency graph, writes a build file,
and hands it to nifmake:
- Frontend, per module:
  - `nifler parse --deps` turns `.nim` source into a parsed NIF (`.p.nif`) plus a static dependency list (`.deps.nif`).
  - `nim m` (the semantic step, `cmdM`) reads the parsed NIF plus the precompiled NIFs of the module's imports, type-checks, and writes the semmed NIF (`.nif`) plus invalidation sidecars (see Cookies).
- Backend:
  - `nim nifc` (`cmdNifC`, `compiler/nifbackend.nim`) reads the semmed NIFs, generates C, compiles and links.
nifmake orders the steps by their input/output files: every nim m runs
before the nim nifc step that consumes its NIF, and a step re-fires only when
one of its inputs is newer than its outputs.
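
The re-fire rule above can be illustrated with a small, hedged sketch. It is not nifmake's implementation; `needs_rerun` is a hypothetical helper, and treating a missing output as "stale" is an assumption that the text does not state.

```python
import os
import tempfile


def needs_rerun(inputs, outputs):
    """Return True if a step should run again.

    Assumption: a step with a missing output is always stale. Otherwise it
    is stale when some input is strictly newer than the oldest output.
    """
    if not outputs or not all(os.path.exists(p) for p in outputs):
        return True
    oldest_output = min(os.path.getmtime(p) for p in outputs)
    return any(os.path.getmtime(p) > oldest_output for p in inputs)


def demo():
    """Build one input and one output file and check both freshness states."""
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "mod.p.nif")  # stands in for a step input
        out = os.path.join(d, "mod.nif")    # stands in for a step output
        for path in (src, out):
            with open(path, "w") as f:
                f.write("x")
        os.utime(src, (1000, 1000))  # input older than output
        os.utime(out, (2000, 2000))
        fresh = needs_rerun([src], [out])
        os.utime(src, (3000, 3000))  # input touched after the output
        stale = needs_rerun([src], [out])
        return fresh, stale
```

With this rule, a step whose inputs were not touched is skipped, which is what lets a single edited module rebuild only the steps downstream of it.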
Artifacts (the NIF zoo)
Per module <suffix> (a content hash of the path; see NIF symbols below),
under the nimcache directory:
| File | Producer | Purpose |
|---|---|---|
| `<s>.p.nif` | `nifler` | parsed AST (syntactic) |
| `<s>.deps.nif` | `nifler` | static import list (syntactic imports) |
| `<s>.s.deps.nif` | `nim m` | real post-sem imports (incl. macro-generated); see Discovery |
| `<s>.nif` | `nim m` | semmed module (symbols resolved, typed) |
| `<s>.iface.nif` | `nim m` | iface cookie: hash of the importer-visible surface |
| `<s>.impl.nif` | `nim m` | impl cookie: hash of the entire content (bodies included) |
| `<s>.edges.nif` | `nim m` | NeedsImpl edges: modules whose bodies this sem consumed |
| `<s>.c.nif` | `nim nifc` | the C text as a NIF, with def/ref markers for DCE & dedup |
| `ic_config.cfg.nif` | driver | precompiled config replayed by every child (`icconfig.nim`) |
| `ic.version` | driver | format stamp; a mismatch wipes the cache (`icFormatVersion`) |
NIF symbols and ownership
(See ../nifspec/doc/nif-spec.md.) A global symbol is
<ident>.<disamb>.<moduleSuffix> or, for a generic instantiation,
<ident>.<disamb>.<key>.<moduleSuffix> where key is the instantiation
encoded by the NIF-trees-as-identifiers scheme. Two consequences drive the
backend:
- Instance names are content-addressed: the same instantiation (`seq[Foo]`) produced in different modules yields the same `key`, so a deterministic dedup is possible by name.
- The suffix names an owner module. Today an instance's owner is the module whose process minted it (`itemId.module`), which is process-local mint order: the root of the single-writer hazard below.
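
As a rough illustration of the naming scheme, the sketch below splits a global symbol into its parts and groups generic instances that differ only in their owner suffix. It assumes, purely for illustration, that no component contains a dot; the real key encoding is defined by the NIF spec and may differ. `GlobalSym`, `split_symbol`, `group_instances` and the sample names are all hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class GlobalSym(NamedTuple):
    ident: str
    disamb: str
    key: Optional[str]  # None for a non-generic symbol
    owner: str          # the module suffix


def split_symbol(name):
    """Split '<ident>.<disamb>[.<key>].<moduleSuffix>' on dots.

    Assumption: none of the components contains a dot itself.
    """
    parts = name.split(".")
    if len(parts) == 3:
        ident, disamb, owner = parts
        return GlobalSym(ident, disamb, None, owner)
    if len(parts) == 4:
        ident, disamb, key, owner = parts
        return GlobalSym(ident, disamb, key, owner)
    raise ValueError("not a global symbol: " + name)


def group_instances(names):
    """Map (ident, disamb, key) to the list of owner suffixes that minted it."""
    groups = defaultdict(list)
    for name in names:
        sym = split_symbol(name)
        if sym.key is not None:
            groups[(sym.ident, sym.disamb, sym.key)].append(sym.owner)
    return dict(groups)
```

A group with more than one owner is exactly the situation the single-writer hazard describes: one instance, several candidate writers.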
The driver: graph construction (commandIc)
- Stamp/wipe the cache by `icFormatVersion`.
- Seed the graph with the root module and `system.nim`. `system`'s entire import closure is folded into one node (one `nim m` invocation); see single-writer below.
- `traverseDeps` runs `nifler` per module and reads `.deps.nif` to add import edges.
- SCC grouping: strongly-connected import cycles are collapsed (Tarjan). A singleton compiles as `nim m <mod>`; a cycle compiles as one `nim m <rep> --icGroup:<member>…` that builds every member from source in one process (resolving the recursion in memory) and writes each member's NIF. Only edges leaving the component become build-graph inputs.
- Discovery fixpoint: write the build file, run `nifmake`; if it fails, re-derive the graph from every module's `.s.deps.nif` (adding nodes/edges for imports the static scanner missed), and retry. See Discovery.
- The backend step (`nim nifc`) depends on every module's semmed NIF, so `nifmake` runs it last.
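
The SCC grouping step can be sketched with a textbook Tarjan implementation. This is a generic illustration, not the code in `compiler/deps.nim`; the function names and the toy import graph are hypothetical.

```python
def tarjan_scc(graph):
    """Return the strongly connected components of an import graph.

    graph maps a module name to the modules it imports. Components come
    out dependencies-first, which is also a valid build order.
    """
    index, lowlink = {}, {}
    stack, on_stack = [], set()
    sccs = []
    counter = 0

    def visit(v):
        nonlocal counter
        index[v] = lowlink[v] = counter
        counter += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, ()):
            if w not in index:
                visit(w)
                lowlink[v] = min(lowlink[v], lowlink[w])
            elif w in on_stack:
                lowlink[v] = min(lowlink[v], index[w])
        if lowlink[v] == index[v]:  # v is the root of a component
            comp = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.append(w)
                if w == v:
                    break
            sccs.append(sorted(comp))

    for v in sorted(graph):
        if v not in index:
            visit(v)
    return sccs


def external_inputs(graph, comp):
    """Imports that leave the component; only these become rule inputs."""
    members = set(comp)
    return sorted({w for v in comp for w in graph.get(v, ()) if w not in members})


IMPORTS = {"main": ["a", "system"], "a": ["b"], "b": ["a", "system"], "system": []}
```

In the toy graph, `a` and `b` import each other, so they collapse into one component that would be compiled by a single grouped invocation, with `system` as its only external input.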
Invalidation: the cookie system
A dependent must re-sem only when a dependency's relevant surface changed. Two
hashes per module (ast2nif.nim):
- iface cookie (`.iface.nif`): hashes only the importer-visible surface, that is, exported declarations' signatures (for all routine kinds: plain procs, templates, macros, generics, `inline` procs alike), full content for consts/types, plus import/export/replay/hook records. Routine bodies are excluded. It also chains in the iface cookies of its own dependencies, so a surface change anywhere in the import closure propagates. A `nim m` rule for a module depends on its dependencies' iface cookies, so a body-only edit moves no iface cookie and stops the re-sem cascade.
- impl cookie (`.impl.nif`): hashes the entire serialized content (private defs and bodies included), with the module's own iface mixed in.
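
A minimal sketch of the two-cookie idea, under stated assumptions: SHA-256 and the sorted, length-prefixed input layout are illustrative choices, not what `ast2nif.nim` does, and `iface_cookie` / `impl_cookie` are hypothetical names.

```python
import hashlib


def _digest(parts):
    """Hash a list of strings, length-prefixing each to avoid ambiguity."""
    h = hashlib.sha256()
    for part in parts:
        data = part.encode("utf-8")
        h.update(len(data).to_bytes(8, "little"))
        h.update(data)
    return h.hexdigest()


def iface_cookie(exported_signatures, dep_iface_cookies):
    """Hash the importer-visible surface and chain in dependency cookies.

    Routine bodies are deliberately not part of the input.
    """
    return _digest(["iface"] + sorted(exported_signatures) + sorted(dep_iface_cookies))


def impl_cookie(full_content, own_iface):
    """Hash the whole serialized module with its own iface cookie mixed in."""
    return _digest(["impl", own_iface, full_content])
```

Editing only a body changes `full_content` but not the signature list, so the impl cookie moves while the iface cookie stays put; changing any dependency's iface cookie moves the dependent's iface cookie as well.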
NeedsImpl edges (.edges.nif): if a module consumed another module's body
during sem (a template or macro expansion, a generic instantiation, a getImpl, or a
compile-time call run in the VM), it records a strong edge. The dependent is then
gated on that dependency's impl cookie instead of its iface cookie, so e.g.
const x = dep.foo() re-sems when foo's body changes. Recording sites:
semExprs.semTemplateExpr (templates), seminst.generateInstance (generics),
vmgen.genProc (VM/macros/CT procs), vm.opcGetImpl (getImpl). Inline
iterators and inline procs are not tracked — they are inlined at codegen,
where the backend's NIF-mtime invalidation re-codegens their users.
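
The gating choice can be sketched as a function that picks, per dependency, which cookie file a module's sem rule would list as an input. `cookie_inputs` and the module names are hypothetical, and bare module names stand in for the real hashed suffixes.

```python
def cookie_inputs(module, imports, needs_impl):
    """Return the cookie files gating a module's re-sem.

    imports:    module -> iterable of imported modules
    needs_impl: module -> set of imports whose bodies its sem consumed
    """
    strong = needs_impl.get(module, set())
    inputs = []
    for dep in sorted(imports.get(module, ())):
        ext = ".impl.nif" if dep in strong else ".iface.nif"
        inputs.append(dep + ext)
    return inputs
```

For a module that evaluates `dep.foo()` at compile time but only calls `util` at run time, this yields `dep.impl.nif` and `util.iface.nif`, so a body edit in `util` does not force a re-sem while one in `dep` does.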
Discovery of macro-generated imports
The static scanner only sees syntactic imports. A macro can synthesize one
(chronicles does parseStmt("import chronicles/textlines") driven by the
chronicles_sinks define). Such an import is invisible until sem runs the macro.
Each nim m records the imports it actually resolved (via the
semdata.addImportFileDep hook → graph.importDeps → ast2nif.writeSemDeps)
into <s>.s.deps.nif; a child that fails on a not-yet-built import flushes it
before erroring. The driver re-derives the graph from those sidecars — adding the
missing node + the importer→import edge — and reruns to a fixpoint. (This replaced
an earlier icmissing.txt side channel.)
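
The retry loop can be sketched as follows. `discover`, `run_build` and `read_sem_deps` are hypothetical stand-ins for the driver, the `nifmake` run and the sidecar reader; the round limit and the "fail if nothing new was learned" check are assumptions added so the sketch terminates.

```python
def discover(static_imports, run_build, read_sem_deps, max_rounds=10):
    """Build, and on failure grow the graph from recorded imports, then retry.

    static_imports: module -> set of imports seen by the static scanner
    run_build(graph) -> bool, True when the build succeeded
    read_sem_deps() -> module -> set of imports that sem actually resolved
    """
    graph = {m: set(deps) for m, deps in static_imports.items()}
    for _ in range(max_rounds):
        if run_build(graph):
            return graph
        grew = False
        for mod, deps in read_sem_deps().items():
            known = graph.setdefault(mod, set())
            for dep in deps:
                if dep not in known:
                    known.add(dep)        # the missing importer -> import edge
                    grew = True
                graph.setdefault(dep, set())  # the missing node
        if not grew:
            raise RuntimeError("build failed and no new imports were discovered")
    raise RuntimeError("no fixpoint within max_rounds")
```

Each failed round can only add edges, so the loop either converges on a graph the build accepts or stops because the failure was not caused by a missing import.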
The backend today: commandNifC
The current backend is whole-program and demand-driven, run as one process
(compiler/nifbackend.nim):
- Load `system` then all modules' semmed NIFs in dependency order (`loadModuleDependencies`), so all hooks/types are in memory.
- DCE: `computeLiveSymbols` over all NIFs computes a global live set used to filter the top-level routine listing.
- `computeModuleReuse`: decide which modules' cached translation units (`.c.nif`) can be reused, skipping codegen for them and using their `.c`/`.o` as is. The gate mirrors the m-step's cookie gating; a coarse fallback uses the transitive NIF-mtime closure (`-d:icCoarseReuse`).
- Codegen: for each non-reused module, `generateCodeForModule` runs `cgen`. Codegen is demand-driven: emitting one module can demand entities (generic instances, type-bound hooks, RTTI) that belong to other modules; a demand whose home TU is reused is redirected into the demanding TU (`redirectToLiveModule`), and reused TUs' definitions become prototypes.
- `enforceDefRetention`: an un-reuse cascade. If a regenerated module would stop emitting a definition that a still-reused TU references (the demand chain that placed it no longer arises), the referencing TUs lose their reuse and regenerate so the symbol does not vanish under them.
- `emitMethodDispatchers`, then `finishModule` for every module (main module last, so init-proc registration is complete before `genMainProc`).
- Emit `.c`, then `extccomp.callCCompiler` + link.
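
The global live set in the DCE step is, conceptually, a reachability computation over symbol references. The sketch below shows that idea only; how `computeLiveSymbols` chooses its roots and walks the NIFs is not described here, so `compute_live` and its inputs are hypothetical.

```python
def compute_live(roots, uses):
    """Return every symbol reachable from the roots.

    uses maps a symbol to the symbols its definition references.
    """
    live = set()
    work = list(roots)
    while work:
        sym = work.pop()
        if sym in live:
            continue
        live.add(sym)
        work.extend(uses.get(sym, ()))
    return live
```

Anything outside the returned set is never referenced from a root and can be dropped from the routine listing.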
Edge cases (and why the machinery exists)
- Single-writer. Instance type-ids are minted in process-local order, so if two `nim m` processes both write a module's NIF (e.g. a stdlib module pulled into `system`'s from-source closure and given its own rule), the second overwrites with different ids and every module checked against the first carries dangling refs ("symbol has no offset"). Fixed by folding `system`'s closure into one SCC and by forwarding the project's defines to every child so their `when` bodies (hence import sets and NIF contents) match the scanner's.
- `when … else: import`. nifler emits `else`-branch imports unguarded, so a dead `else: import` would be scheduled. The compiler's own sources were rewritten to explicit negated `when`s; the vendored nifler later learned to negate prior conditions for the `else`.
- `nil` sons of loaded ASTs. NIF dot-tokens load as `nil` where from-source ASTs have `nkEmpty`; several passes gained `nil` guards.
- Sealed loaded types. Loaded types are `Sealed`; sem/transform mutate via `unsealForTransform` / `exactReplica(idgen)` (the latter mints a fresh `uniqueId` so serialized replicas don't collapse).
- Methods/RTTI ownership. `genTypeInfoV1` already routes a type's RTTI to `t…itemId.module` when that module is open for codegen, an existing ownership notion the rewrite can generalize.
- Config cost. Each child re-parsing `nim.cfg` + re-running `config.nims` in the VM was ~80 ms; replaced by a precompiled `ic_config.cfg.nif` replayed in `loadConfigs` (`compiler/icconfig.nim`).
- `koch bootic` bootstraps the compiler through `nim ic` (a 3-iteration fixed-point check). It writes its binary to `bin/nim_ic` and never clobbers `bin/nim`.
Known residual hacks (targets for the rewrite)
- `deps.runNifler` uses `setLastModificationTime` to mark its scan up-to-date and deletes a stale parsed file to coordinate with the nifmake nifler rule: the driver duplicating nifmake's freshness logic.
- `computeModuleReuse` + `enforceDefRetention` + the redirect/cached-defs machinery is a hand-rolled mini-`nifmake` inside the backend process, needed only because one process reuses some TUs while regenerating others.
These are legacy artifacts of a code generator that predates IC, not intrinsic requirements.
Plan: a nifmake-driven, per-module backend
Goal: the backend stops re-implementing nifmake. Each module's codegen becomes
its own build rule, so "which TUs rebuild" is just "which rules nifmake
re-fires from input mtimes" — exactly as the frontend already works. The reuse /
def-retention / redirect machinery then dissolves.
Target build graph (mirrors Nimony's src/nimony/deps.nim):
- Frontend split. Generate a frontend build file (nifler + `nim m` rules), run `nifmake`, run the `.s.deps` discovery fixpoint. Then re-derive the graph from `.s.deps` (now complete; dead `when` imports can also be pruned here).
- Per-module codegen rule. One `nifc <mod>` rule per module: inputs are the module's own `.nif` plus the `.iface`/`.impl` cookies of its dependencies; output is its `<s>.c.nif`. The process loads `<mod>` + its import closure's NIFs (like `nim m`) and emits only the entities it owns, referencing everything else `extern`.
- Static ownership replaces runtime redirect. Every emittable entity (generic instances, type-bound hooks, RTTI, lifted procs) gets a deterministic owner module by symbol suffix. Because instance names are content-addressed (`ident.disamb.key.owner`), the same instance demanded by several modules has one name and one owner, so there is exactly one writer and no link-time duplicate. (The precise owner rule, minting module vs. root-type's module, is the open design decision; start from `itemId.module` and adjust where it forces a downstream package to own stdlib code.)
- DCE as a rule. A single rule reads all `.c.nif`, computes the global live set, and (Nimony: `.live.nif` / per-module `.dce.nif`) drives per-module `.c` emission filtered to live entities.
- Link rule. Depends on every `.o` (each compiled by its own rule) and the DCE output; produces the executable.
- Deletions. `computeModuleReuse`, `enforceDefRetention`, `redirectToLiveModule` and the cached-defs/claim bookkeeping go away. The `setLastModificationTime` coordination in `runNifler` goes away with the frontend split (the nifler rule owns parsed+deps; the driver's pre-scan only reads `.deps` to build the graph).
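
The static-ownership idea can be sketched as a pure function of symbol names: each per-module codegen process defines what its suffix owns and declares the rest `extern`. This is an illustration only; `owner_of`, `plan_emission` and the sample symbols are hypothetical, the owner is assumed to be the text after the last dot, and the actual owner rule is still the open design decision noted above.

```python
def owner_of(symbol):
    """Assumption: the owner module suffix is the part after the last dot."""
    return symbol.rsplit(".", 1)[1]


def plan_emission(module, demanded):
    """Split the symbols a module's codegen demands into two sorted lists.

    define: symbols this module owns and must emit
    extern: symbols owned elsewhere, referenced by declaration only
    """
    define = sorted(s for s in demanded if owner_of(s) == module)
    extern = sorted(s for s in demanded if owner_of(s) != module)
    return define, extern
```

Because the split depends only on the name, two modules demanding the same instance agree on who writes it without consulting any shared runtime state.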
Validation bar: koch bootic must still reach its byte-identical fixed point, and
binary size must not regress (DCE parity), across the external-package CI set.
Code, logic & debugging
Core modules:
- `compiler/deps.nim`: graph construction, SCC grouping, discovery fixpoint, build-file generation; `commandIc`.
- `compiler/ast2nif.nim`: AST↔NIF, the cookie hashes (`cookieSd`, `writeIfaceCookie`, `writeImplCookie`, `writeEdgesFile`, `writeSemDeps`).
- `compiler/nifbackend.nim`: the backend (`commandNifC`) and its reuse machinery.
- `compiler/icconfig.nim`: precompiled config.
- `compiler/pipelines.nim` / `modulegraphs.nim`: pipeline integration and the graph state (`importDeps`, `icImplDeps`, `icReusedModules`, …).
Manual workflow:
- Frontend a module: `nim m --nimcache:nifcache path/to/mod.nim` (writes `.nif` + cookies + `.s.deps`).
- Backend: `nim nifc --nimcache:nifcache main.nim`.
- NIF files are text, so open/grep them directly; `diff` two successive `.nif` to see why a module rebuilt.
- Force a re-sem: delete the module's `.nif` and rerun `nim m`.
- A stale-cache crash after editing the serialization layout means bumping `icFormatVersion` (`compiler/options.nim`).
See also
- NIF format spec: nifspec/doc/nif-spec.md
- NIFC (C-like target) spec: dist/nimony/doc/nifc-spec.md