mirror of
https://github.com/nim-lang/Nim.git
synced 2026-01-03 03:32:32 +00:00
document how the incremental compilation scheme could work
@@ -246,10 +246,10 @@ const trackPosInvalidFileIdx* = FileIndex(-2) # special marker so that no sugges
                                   # are produced within comments and string literals

 type
-  MsgConfig* = object
+  MsgConfig* = object ## does not need to be stored in the incremental cache
     trackPos*: TLineInfo
-    trackPosAttached*: bool ## whether the tracking position was attached to some
-                            ## close token.
+    trackPosAttached*: bool ## whether the tracking position was attached to
+                            ## some close token.

     errorOutputs*: TErrorOutputs
     msgContext*: seq[TLineInfo]
@@ -47,7 +47,7 @@ type
     doStopCompile*: proc(): bool {.closure.}
     usageSym*: PSym # for nimsuggest
     owners*: seq[PSym]
-    methods*: seq[tuple[methods: TSymSeq, dispatcher: PSym]]
+    methods*: seq[tuple[methods: TSymSeq, dispatcher: PSym]] # needs serialization!
     systemModule*: PSym
     sysTypes*: array[TTypeKind, PType]
     compilerprocs*: TStrTable
@@ -156,24 +156,27 @@ type
     version*: int
   Suggestions* = seq[Suggest]

-  ConfigRef* = ref object ## eventually all global configuration should be moved here
-    target*: Target
+  ConfigRef* = ref object ## every global configuration
+                          ## fields marked with '*' are subject to
+                          ## the incremental compilation mechanisms
+                          ## (+) means "part of the dependency"
+    target*: Target # (+)
     linesCompiled*: int # all lines that have been compiled
-    options*: TOptions
-    globalOptions*: TGlobalOptions
+    options*: TOptions # (+)
+    globalOptions*: TGlobalOptions # (+)
     m*: MsgConfig
     evalTemplateCounter*: int
     evalMacroCounter*: int
     exitcode*: int8
     cmd*: TCommands # the command
-    selectedGC*: TGCMode # the selected GC
+    selectedGC*: TGCMode # the selected GC (+)
     verbosity*: int # how verbose the compiler is
     numberOfProcessors*: int # number of processors
     evalExpr*: string # expression for idetools --eval
     lastCmdTime*: float # when caas is enabled, we measure each command
     symbolFiles*: SymbolFilesOption

-    cppDefines*: HashSet[string]
+    cppDefines*: HashSet[string] # (*)
     headerFile*: string
     features*: set[Feature]
     arguments*: string ## the arguments to be passed to the program that
@@ -220,13 +223,13 @@ type
     cLinkedLibs*: seq[string] # libraries to link

     externalToLink*: seq[string] # files to link in addition to the file
-                                 # we compiled
+                                 # we compiled (*)
     linkOptionsCmd*: string
     compileOptionsCmd*: seq[string]
-    linkOptions*: string
-    compileOptions*: string
+    linkOptions*: string # (*)
+    compileOptions*: string # (*)
     ccompilerpath*: string
-    toCompile*: CfileList
+    toCompile*: CfileList # (*)
     suggestionResultHook*: proc (result: Suggest) {.closure.}
     suggestVersion*: int
     suggestMaxResults*: int
130
doc/intern.txt
@@ -38,10 +38,6 @@ Path Purpose
 Bootstrapping the compiler
 ==========================

-As of version 0.8.5 the compiler is maintained in Nim. (The first versions
-have been implemented in Object Pascal.) The Python-based build system has
-been rewritten in Nim too.
-
 Compiling the compiler is a simple matter of running::

   nim c koch.nim
@@ -202,16 +198,86 @@ Compilation cache
 =================

 The implementation of the compilation cache is tricky: There are lots
-of issues to be solved for the front- and backend. In the following
-sections *global* means *shared between modules* or *property of the whole
-program*.
+of issues to be solved for the front- and backend.
+
+
+General approach: AST replay
+----------------------------
+
+We store a module's AST of a successful semantic check in a SQLite
+database. There are plenty of features that require a sub sequence
+to be re-applied, for example:
+
+.. code-block:: nim
+  {.compile: "foo.c".} # even if the module is loaded from the DB,
+                       # "foo.c" needs to be compiled/linked.
+
+The solution is to **re-play** the module's top level statements.
+This solves the problem without having to special case the logic
+that fills the internal seqs which are affected by the pragmas.
+
+In fact, this describes how the AST should be stored in the database,
+as a "shallow" tree. Let's assume we compile module ``m`` with the
+following contents:
+
+.. code-block:: nim
+  import strutils
+
+  var x*: int = 90
+  {.compile: "foo.c".}
+  proc p = echo "p"
+  proc q = echo "q"
+  static:
+    echo "static"
+
+Conceptually this is the AST we store for the module:
+
+.. code-block:: nim
+  import strutils
+
+  var x*
+  {.compile: "foo.c".}
+  proc p
+  proc q
+  static:
+    echo "static"
+
+The symbol's ``ast`` field is loaded lazily, on demand. This is where most
+savings come from: only the shallow outer AST is reconstructed immediately.
+
+It is also important that the replay involves the ``import`` statement so
+that the dependencies are resolved properly.
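A minimal sketch of the replay idea, under simplifying assumptions:
``StoredStmt``, ``ReplayState``, ``replay`` and ``body`` are hypothetical
names invented for this illustration, and in-memory containers stand in for
the SQLite database. It is not the compiler's actual implementation; it only
shows that replaying the shallow top level statements refills state such as
the list of C files to compile, while bodies are fetched on demand.

.. code-block:: nim

  import tables

  type
    StmtKind = enum
      skImport, skCompile, skDecl
    StoredStmt = object      # hypothetical: one shallow top level statement
      kind: StmtKind
      arg: string            # module name, C file name or symbol name
    ReplayState = object     # hypothetical stand-in for the compiler's global state
      imports, toCompile, symbols: seq[string]
      loadedBodies: Table[string, string]

  proc replay(state: var ReplayState; stored: openArray[StoredStmt]) =
    ## Re-applies the side effects of the stored top level statements.
    for s in stored:
      case s.kind
      of skImport: state.imports.add s.arg
      of skCompile: state.toCompile.add s.arg
      of skDecl: state.symbols.add s.arg   # the body is not loaded here

  proc body(state: var ReplayState; db: Table[string, string];
            sym: string): string =
    ## Loads the body of ``sym`` lazily, on first use.
    if sym notin state.loadedBodies:
      state.loadedBodies[sym] = db[sym]
    result = state.loadedBodies[sym]

  # The shallow form of module ``m`` and the separately stored bodies.
  let storedModule = [
    StoredStmt(kind: skImport, arg: "strutils"),
    StoredStmt(kind: skDecl, arg: "x"),
    StoredStmt(kind: skCompile, arg: "foo.c"),
    StoredStmt(kind: skDecl, arg: "p"),
    StoredStmt(kind: skDecl, arg: "q")]
  let bodies = {"p": "echo \"p\"", "q": "echo \"q\""}.toTable

  var state: ReplayState
  state.replay(storedModule)

After the replay ``foo.c`` is registered in ``toCompile`` although no proc
body has been loaded yet; a body is only read once ``body`` is called for it.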
+
+
+Shared global compiletime state
+-------------------------------
+
+Nim allows ``.global, compiletime`` variables that can be filled by macro
+invocations across different modules. This feature breaks modularity in a
+severe way. Plenty of different solutions have been proposed:
+
+- Restrict the types of global compiletime variables to ``Set[T]`` or
+  similar unordered, only-growable collections so that we can track
+  the module's write effects to these variables and reapply the changes
+  in a different order.
+- In every module compilation, reset the variable to its default value.
+- Provide a restrictive API that can load/save the compiletime state to
+  a file.
+
+(These solutions are not mutually exclusive.)
+
+Since we adopt the "replay the top level statements" idea, the natural
+solution to this problem is to emit pseudo top level statements that
+reflect the mutations done to the global variable.
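One way to picture such pseudo top level statements is a per-module log of
mutations that is stored with the module and re-applied on replay. The
following is only a sketch under assumptions: ``Mutation``, ``recordIncl``
and ``replay`` are hypothetical names, an ordinary run-time ``HashSet``
stands in for a ``.global, compiletime`` variable, and only set insertion is
modelled.

.. code-block:: nim

  import sets

  type
    Mutation = object          # hypothetical pseudo top level statement
      variable: string         # name of the global compile-time variable
      value: string            # element that was added to it

  # Stands in for a ``.global, compiletime`` variable.
  var registry = initHashSet[string]()

  proc recordIncl(log: var seq[Mutation]; value: string) =
    ## Performs the mutation and records it so that it can be replayed later.
    registry.incl value
    log.add Mutation(variable: "registry", value: value)

  proc replay(log: seq[Mutation]) =
    ## Re-applies the recorded mutations when the module comes from the cache.
    for m in log:
      if m.variable == "registry":
        registry.incl m.value

  var logOfModuleA: seq[Mutation]
  logOfModuleA.recordIncl "fromModuleA"   # first compilation: a macro fills the set
  registry = initHashSet[string]()        # later run: module A comes from the cache
  replay(logOfModuleA)

Because insertion into a set is order-independent, replaying the logs of
several cached modules in a different order yields the same final state.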


-Frontend issues
----------------
-
 Methods and type converters
-~~~~~~~~~~~~~~~~~~~~~~~~~~~
+---------------------------
+
+In the following
+sections *global* means *shared between modules* or *property of the whole
+program*.

 Nim contains language features that are *global*. The best example for that
 are multi methods: Introducing a new method with the same name and some
@@ -238,20 +304,17 @@ If in the above example module ``B`` is re-compiled, but ``A`` is not then
 ``B`` needs to be aware of ``toBool`` even though ``toBool`` is not referenced
 in ``B`` *explicitly*.

-Both the multi method and the type converter problems are solved by storing
-them in special sections in the ROD file that are loaded *unconditionally*
-when the ROD file is read.
+Both the multi method and the type converter problems are solved by the
+AST replay implementation.


 Generics
 ~~~~~~~~

-If we generate an instance of a generic, we'd like to re-use that
-instance if possible across module boundaries. However, this is not
-possible if the compilation cache is enabled. So we give up then and use
-the caching of generics only per module, not per project. This means that
-``--symbolFiles:on`` hurts a bit for efficiency. A better solution would
-be to persist the instantiations in a global cache per project. This might be
-implemented in later versions.
+We cache generic instantiations and need to ensure this caching works
+well with the incremental compilation feature. Since the cache is
+attached to the ``PSym`` data structure, it should work without any
+special logic.


 Backend issues
@@ -259,13 +322,10 @@ Backend issues

 - Init procs must not be "forgotten" to be called.
 - Files must not be "forgotten" to be linked.
-- Anything that is contained in ``nim__dat.c`` is shared between modules
-  implicitly.
 - Method dispatchers are global.
 - DLL loading via ``dlsym`` is global.
 - Emulated thread vars are global.

-
 However the biggest problem is that dead code elimination breaks modularity!
 To see why, consider this scenario: The module ``G`` (for example the huge
 Gtk2 module...) is compiled with dead code elimination turned on. So none
@@ -274,25 +334,21 @@ of ``G``'s procs is generated at all.
 Then module ``B`` is compiled that requires ``G.P1``. Ok, no problem,
 ``G.P1`` is loaded from the symbol file and ``G.c`` now contains ``G.P1``.

-Then module ``A`` (that depends onto ``B`` and ``G``) is compiled and ``B``
+Then module ``A`` (that depends on ``B`` and ``G``) is compiled and ``B``
 and ``G`` are left unchanged. ``A`` requires ``G.P2``.

 So now ``G.c`` MUST contain both ``P1`` and ``P2``, but we haven't even
 loaded ``P1`` from the symbol file, nor do we want to because we then quickly
-would restore large parts of the whole program. But we also don't want to
-store ``P1`` in ``B.c`` because that would mean to store every symbol where
-it is referred from which ultimately means the main module and putting
-everything in a single C file.
+would restore large parts of the whole program.

-There is however another solution: The old file ``G.c`` containing ``P1`` is
-**merged** with the new file ``G.c`` containing ``P2``. This is the solution
-that is implemented in the C code generator (have a look at the ``ccgmerge``
-module). The merging may lead to *cruft* (aka dead code) in generated C code
-which can only be removed by recompiling a project with the compilation cache
-turned off. Nevertheless the merge solution is way superior to the
-cheap solution "turn off dead code elimination if the compilation cache is
-turned on".
+Solution
+~~~~~~~~
+
+The backend must have some logic so that if the currently processed module
+is from the compilation cache, the ``ast`` field is not accessed. Instead
+the generated C(++) for the symbol's body needs to be cached too and
+inserted back into the produced C file. This approach seems to deal with
+all the outlined problems above.
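The rule "do not touch ``ast`` for cached modules, reuse the cached C code
instead" can be sketched as follows. ``SymInfo``, ``genBody`` and the table of
cached C snippets are hypothetical and far simpler than the compiler's
``PSym`` and code generator; the sketch only illustrates the control flow for
the ``G.P1`` / ``G.P2`` scenario above.

.. code-block:: nim

  import tables

  type
    SymInfo = object           # hypothetical, much simpler than the compiler's PSym
      name: string
      fromCache: bool          # true if the symbol's C code is already cached
      ast: string              # stands in for the symbol's body; unused when cached

  proc genBody(s: SymInfo; cachedC: Table[string, string]): string =
    ## Returns the C code for ``s`` without touching ``ast`` for cached symbols.
    if s.fromCache:
      result = cachedC[s.name]   # reuse the C code produced by an earlier run
    else:
      result = "/* generated from AST: " & s.ast & " */"

  # ``P1`` was generated in an earlier run, ``P2`` is requested for the first time.
  let cachedC = {"P1": "void P1(void) {}"}.toTable
  let p1 = SymInfo(name: "P1", fromCache: true)
  let p2 = SymInfo(name: "P2", fromCache: false, ast: "echo 2")
  let gC = genBody(p1, cachedC) & "\n" & genBody(p2, cachedC)

The resulting ``gC`` contains both ``P1`` and ``P2`` even though the body of
``P1`` was never loaded from the symbol file.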


 Debugging Nim's memory management
@@ -317,7 +373,7 @@ Introduction

 I use the term *cell* here to refer to everything that is traced
 (sequences, refs, strings).
-This section describes how the new GC works.
+This section describes how the GC works.

 The basic algorithm is *Deferred Reference Counting* with cycle detection.
 References on the stack are not counted for better performance and easier C