mirror of
https://github.com/nim-lang/Nim.git
synced 2026-04-20 06:20:38 +00:00
cleaned up the internal documentation (#17524)
This commit is contained in:
605
doc/intern.rst
605
doc/intern.rst
@@ -32,28 +32,29 @@ Path Purpose
|
||||
`doc` the documentation; it is a bunch of
|
||||
reStructuredText files
|
||||
`lib` the Nim library
|
||||
`web` website of Nim; generated by `nimweb`
|
||||
from the `*.txt` and `*.nimf` files
|
||||
============ ===================================================
|
||||
|
||||
|
||||
Bootstrapping the compiler
|
||||
==========================
|
||||
|
||||
**Note**: Add ``.`` to your PATH so that `koch` can be used without the `./`.
|
||||
|
||||
Compiling the compiler is a simple matter of running::
|
||||
|
||||
nim c koch.nim
|
||||
./koch boot
|
||||
koch boot -d:release
|
||||
|
||||
For a release version use::
|
||||
For a debug version use::
|
||||
|
||||
nim c koch.nim
|
||||
./koch boot -d:release
|
||||
koch boot
|
||||
|
||||
|
||||
And for a debug version compatible with GDB::
|
||||
|
||||
nim c koch.nim
|
||||
./koch boot --debuginfo --linedir:on
|
||||
koch boot --debuginfo --linedir:on
|
||||
|
||||
The `koch` program is Nim's maintenance script. It is a replacement for
|
||||
make and shell scripting with the advantage that it is much more portable.
|
||||
@@ -61,12 +62,105 @@ More information about its options can be found in the `koch <koch.html>`_
|
||||
documentation.
|
||||
|
||||
|
||||
Developing the compiler
|
||||
=======================
|
||||
|
||||
To create a new compiler for each run, use `koch temp`::
|
||||
|
||||
koch temp c test.nim
|
||||
|
||||
`koch temp` creates a debug build of the compiler, which is useful
|
||||
to create stacktraces for compiler debugging.
|
||||
|
||||
You can of course use GDB or Visual Studio to debug the
|
||||
compiler (via `--debuginfo --lineDir:on`). However, there
|
||||
are also lots of procs that aid in debugging:
|
||||
|
||||
|
||||
.. code-block:: nim
|
||||
|
||||
# dealing with PNode:
|
||||
echo renderTree(someNode)
|
||||
debug(someNode) # some JSON representation
|
||||
|
||||
# dealing with PType:
|
||||
echo typeToString(someType)
|
||||
debug(someType)
|
||||
|
||||
# dealing with PSym:
|
||||
echo symbol.name.s
|
||||
debug(symbol)
|
||||
|
||||
# pretty prints the Nim ast, but annotates symbol IDs:
|
||||
echo renderTree(someNode, {renderIds})
|
||||
if `??`(conf, n.info, "temp.nim"):
|
||||
# only output when it comes from "temp.nim"
|
||||
echo renderTree(n)
|
||||
if `??`(conf, n.info, "temp.nim"):
|
||||
# why does it process temp.nim here?
|
||||
writeStackTrace()
|
||||
|
||||
These procs may not already be imported by the module you're editing.
|
||||
You can import them directly for debugging:
|
||||
|
||||
.. code-block:: nim
|
||||
from astalgo import debug
|
||||
from types import typeToString
|
||||
from renderer import renderTree
|
||||
from msgs import `??`
|
||||
|
||||
|
||||
The compiler's architecture
|
||||
===========================
|
||||
|
||||
Nim uses the classic compiler architecture: A lexer/scanner feds tokens to a
|
||||
parser. The parser builds a syntax tree that is used by the code generators.
|
||||
This syntax tree is the interface between the parser and the code generator.
|
||||
It is essential to understand most of the compiler's code.
|
||||
|
||||
Semantic analysis is separated from parsing.
|
||||
|
||||
.. include:: filelist.txt
|
||||
|
||||
|
||||
The syntax tree
|
||||
---------------
|
||||
The syntax tree consists of nodes which may have an arbitrary number of
|
||||
children. Types and symbols are represented by other nodes, because they
|
||||
may contain cycles. The AST changes its shape after semantic checking. This
|
||||
is needed to make life easier for the code generators. See the "ast" module
|
||||
for the type definitions. The `macros <macros.html>`_ module contains many
|
||||
examples how the AST represents each syntactic structure.
|
||||
|
||||
|
||||
Bisecting for regressions
|
||||
=========================
|
||||
|
||||
`koch temp` returns 125 as the exit code in case the compiler
|
||||
compilation fails. This exit code tells `git bisect` to skip the
|
||||
current commit.::
|
||||
|
||||
git bisect start bad-commit good-commit
|
||||
git bisect run ./koch temp -r c test-source.nim
|
||||
|
||||
You can also bisect using custom options to build the compiler, for example if
|
||||
you don't need a debug version of the compiler (which runs slower), you can replace
|
||||
`./koch temp` by explicit compilation command, see `Rebuilding the compiler`_.
|
||||
|
||||
|
||||
Runtimes
|
||||
========
|
||||
|
||||
Nim has two different runtimes, the "old runtime" and the "new runtime". The old
|
||||
runtime supports the old GCs (markAndSweep, refc, Boehm), the new runtime supports
|
||||
ARC/ORC. The new runtime is active `when defined(nimV2)`.
|
||||
|
||||
|
||||
Coding Guidelines
|
||||
=================
|
||||
|
||||
* Use CamelCase, not underscored_identifiers.
|
||||
* Indent with two spaces.
|
||||
* Max line length is 80 characters.
|
||||
* We follow Nim's official style guide, see `<nep1.html>`_.
|
||||
* Max line length is 100 characters.
|
||||
* Provide spaces around binary operators if that enhances readability.
|
||||
* Use a space after a colon, but not before it.
|
||||
* [deprecated] Start types with a capital `T`, unless they are
|
||||
@@ -86,9 +180,9 @@ POSIX-compliant systems on conventional hardware are usually pretty easy to
|
||||
port: Add the platform to `platform` (if it is not already listed there),
|
||||
check that the OS, System modules work and recompile Nim.
|
||||
|
||||
The only case where things aren't as easy is when the garbage
|
||||
collector needs some assembler tweaking to work. The standard
|
||||
version of the GC uses C's `setjmp` function to store all registers
|
||||
The only case where things aren't as easy is when old runtime's garbage
|
||||
collectors need some assembler tweaking to work. The default
|
||||
implementation uses C's `setjmp` function to store all registers
|
||||
on the hardware stack. It may be necessary that the new platform needs to
|
||||
replace this generic code by some assembler code.
|
||||
|
||||
@@ -96,136 +190,33 @@ replace this generic code by some assembler code.
|
||||
Runtime type information
|
||||
========================
|
||||
|
||||
**Note**: This section describes the "old runtime".
|
||||
|
||||
*Runtime type information* (RTTI) is needed for several aspects of the Nim
|
||||
programming language:
|
||||
|
||||
Garbage collection
|
||||
The most important reason for RTTI. Generating
|
||||
traversal procedures produces bigger code and is likely to be slower on
|
||||
modern hardware as dynamic procedure binding is hard to predict.
|
||||
The old GCs use the RTTI for traversing abitrary Nim types, but usually
|
||||
only the `marker` field which contains a proc that does the traversal.
|
||||
|
||||
Complex assignments
|
||||
Sequences and strings are implemented as
|
||||
pointers to resizeable buffers, but Nim requires copying for
|
||||
assignments. Apart from RTTI the compiler could generate copy procedures
|
||||
for any type that needs one. However, this would make the code bigger and
|
||||
the RTTI is likely already there for the GC.
|
||||
assignments. Apart from RTTI the compiler also generates copy procedures
|
||||
as a specialization.
|
||||
|
||||
We already know the type information as a graph in the compiler.
|
||||
Thus we need to serialize this graph as RTTI for C code generation.
|
||||
Look at the file `lib/system/hti.nim` for more information.
|
||||
|
||||
Rebuilding the compiler
|
||||
|
||||
Magics and compilerProcs
|
||||
========================
|
||||
|
||||
After an initial build via `sh build_all.sh` on posix or `build_all.bat` on windows,
|
||||
you can rebuild the compiler as follows:
|
||||
|
||||
* `nim c koch` if you need to rebuild koch
|
||||
* `./koch boot -d:release` this ensures the compiler can rebuild itself
|
||||
(use `koch` instead of `./koch` on windows), which builds the compiler 3 times.
|
||||
|
||||
A faster approach if you don't need to run the full bootstrapping implied by `koch boot`,
|
||||
is the following:
|
||||
|
||||
* `pathto/nim c --lib:lib -d:release -o:bin/nim_temp compiler/nim.nim`
|
||||
|
||||
Where `pathto/nim` is any nim binary sufficiently recent (e.g. `bin/nim_cources`
|
||||
built during bootstrap or `$HOME/.nimble/bin/nim` installed by `choosenim 1.2.0`)
|
||||
|
||||
You can pass any additional options such as `-d:leanCompiler` if you don't need
|
||||
certain features or `-d:debug --stacktrace:on --excessiveStackTrace --stackTraceMsgs`
|
||||
for debugging the compiler. See also `Debugging the compiler`_.
|
||||
|
||||
Debugging the compiler
|
||||
======================
|
||||
|
||||
You can of course use GDB or Visual Studio to debug the
|
||||
compiler (via `--debuginfo --lineDir:on`). However, there
|
||||
are also lots of procs that aid in debugging:
|
||||
|
||||
|
||||
.. code-block:: nim
|
||||
# pretty prints the Nim AST
|
||||
echo renderTree(someNode)
|
||||
# outputs some JSON representation
|
||||
debug(someNode)
|
||||
# pretty prints some type
|
||||
echo typeToString(someType)
|
||||
debug(someType)
|
||||
echo symbol.name.s
|
||||
debug(symbol)
|
||||
# pretty prints the Nim ast, but annotates symbol IDs:
|
||||
echo renderTree(someNode, {renderIds})
|
||||
if `??`(conf, n.info, "temp.nim"):
|
||||
# only output when it comes from "temp.nim"
|
||||
echo renderTree(n)
|
||||
if `??`(conf, n.info, "temp.nim"):
|
||||
# why does it process temp.nim here?
|
||||
writeStackTrace()
|
||||
|
||||
These procs may not be imported by a module. You can import them directly for debugging:
|
||||
|
||||
.. code-block:: nim
|
||||
from astalgo import debug
|
||||
from types import typeToString
|
||||
from renderer import renderTree
|
||||
from msgs import `??`
|
||||
|
||||
To create a new compiler for each run, use `koch temp`::
|
||||
|
||||
./koch temp c /tmp/test.nim
|
||||
|
||||
`koch temp` creates a debug build of the compiler, which is useful
|
||||
to create stacktraces for compiler debugging. See also
|
||||
`Rebuilding the compiler`_ if you need more control.
|
||||
|
||||
Bisecting for regressions
|
||||
=========================
|
||||
|
||||
`koch temp` returns 125 as the exit code in case the compiler
|
||||
compilation fails. This exit code tells `git bisect` to skip the
|
||||
current commit.::
|
||||
|
||||
git bisect start bad-commit good-commit
|
||||
git bisect run ./koch temp -r c test-source.nim
|
||||
|
||||
You can also bisect using custom options to build the compiler, for example if
|
||||
you don't need a debug version of the compiler (which runs slower), you can replace
|
||||
`./koch temp` by explicit compilation command, see `Rebuilding the compiler`_.
|
||||
|
||||
The compiler's architecture
|
||||
===========================
|
||||
|
||||
Nim uses the classic compiler architecture: A lexer/scanner feds tokens to a
|
||||
parser. The parser builds a syntax tree that is used by the code generator.
|
||||
This syntax tree is the interface between the parser and the code generator.
|
||||
It is essential to understand most of the compiler's code.
|
||||
|
||||
In order to compile Nim correctly, type-checking has to be separated from
|
||||
parsing. Otherwise generics cannot work.
|
||||
|
||||
.. include:: filelist.txt
|
||||
|
||||
|
||||
The syntax tree
|
||||
---------------
|
||||
The syntax tree consists of nodes which may have an arbitrary number of
|
||||
children. Types and symbols are represented by other nodes, because they
|
||||
may contain cycles. The AST changes its shape after semantic checking. This
|
||||
is needed to make life easier for the code generators. See the "ast" module
|
||||
for the type definitions. The `macros <macros.html>`_ module contains many
|
||||
examples how the AST represents each syntactic structure.
|
||||
|
||||
|
||||
How the RTL is compiled
|
||||
=======================
|
||||
|
||||
The `system` module contains the part of the RTL which needs support by
|
||||
compiler magic (and the stuff that needs to be in it because the spec
|
||||
says so). The C code generator generates the C code for it, just like any other
|
||||
compiler magic. The C code generator generates the C code for it, just like any other
|
||||
module. However, calls to some procedures like `addInt` are inserted by
|
||||
the CCG. Therefore the module `magicsys` contains a table (`compilerprocs`)
|
||||
the generator. Therefore there is a table (`compilerprocs`)
|
||||
with all symbols that are marked as `compilerproc`. `compilerprocs` are
|
||||
needed by the code generator. A `magic` proc is not the same as a
|
||||
`compilerproc`: A `magic` is a proc that needs compiler magic for its
|
||||
@@ -233,325 +224,6 @@ semantic checking, a `compilerproc` is a proc that is used by the code
|
||||
generator.
|
||||
|
||||
|
||||
Compilation cache
|
||||
=================
|
||||
|
||||
The implementation of the compilation cache is tricky: There are lots
|
||||
of issues to be solved for the front- and backend.
|
||||
|
||||
|
||||
General approach: AST replay
|
||||
----------------------------
|
||||
|
||||
We store a module's AST of a successful semantic check in a SQLite
|
||||
database. There are plenty of features that require a sub sequence
|
||||
to be re-applied, for example:
|
||||
|
||||
.. code-block:: nim
|
||||
{.compile: "foo.c".} # even if the module is loaded from the DB,
|
||||
# "foo.c" needs to be compiled/linked.
|
||||
|
||||
The solution is to **re-play** the module's top level statements.
|
||||
This solves the problem without having to special case the logic
|
||||
that fills the internal seqs which are affected by the pragmas.
|
||||
|
||||
In fact, this describes how the AST should be stored in the database,
|
||||
as a "shallow" tree. Let's assume we compile module `m` with the
|
||||
following contents:
|
||||
|
||||
.. code-block:: nim
|
||||
import std/strutils
|
||||
|
||||
var x*: int = 90
|
||||
{.compile: "foo.c".}
|
||||
proc p = echo "p"
|
||||
proc q = echo "q"
|
||||
static:
|
||||
echo "static"
|
||||
|
||||
Conceptually this is the AST we store for the module:
|
||||
|
||||
.. code-block:: nim
|
||||
import std/strutils
|
||||
|
||||
var x*
|
||||
{.compile: "foo.c".}
|
||||
proc p
|
||||
proc q
|
||||
static:
|
||||
echo "static"
|
||||
|
||||
The symbol's `ast` field is loaded lazily, on demand. This is where most
|
||||
savings come from, only the shallow outer AST is reconstructed immediately.
|
||||
|
||||
It is also important that the replay involves the `import` statement so
|
||||
that dependencies are resolved properly.
|
||||
|
||||
|
||||
Shared global compiletime state
|
||||
-------------------------------
|
||||
|
||||
Nim allows `.global, compiletime` variables that can be filled by macro
|
||||
invocations across different modules. This feature breaks modularity in a
|
||||
severe way. Plenty of different solutions have been proposed:
|
||||
|
||||
- Restrict the types of global compiletime variables to `Set[T]` or
|
||||
similar unordered, only-growable collections so that we can track
|
||||
the module's write effects to these variables and reapply the changes
|
||||
in a different order.
|
||||
- In every module compilation, reset the variable to its default value.
|
||||
- Provide a restrictive API that can load/save the compiletime state to
|
||||
a file.
|
||||
|
||||
(These solutions are not mutually exclusive.)
|
||||
|
||||
Since we adopt the "replay the top level statements" idea, the natural
|
||||
solution to this problem is to emit pseudo top level statements that
|
||||
reflect the mutations done to the global variable. However, this is
|
||||
MUCH harder than it sounds, for example `squeaknim` uses this
|
||||
snippet:
|
||||
|
||||
.. code-block:: nim
|
||||
apicall.add(") module: '" & dllName & "'>\C" &
|
||||
"\t^self externalCallFailed\C!\C\C")
|
||||
stCode.add(st & "\C\t\"Generated by NimSqueak\"\C\t" & apicall)
|
||||
|
||||
We can "replay" `stCode.add` only if the values of `st`
|
||||
and `apicall` are known. And even then a hash table's `add` with its
|
||||
hashing mechanism is too hard to replay.
|
||||
|
||||
In practice, things are worse still, consider `someGlobal[i][j].add arg`.
|
||||
We only know the root is `someGlobal` but the concrete path to the data
|
||||
is unknown as is the value that is added. We could compute a "diff" between
|
||||
the global states and use that to compute a symbol patchset, but this is
|
||||
quite some work, expensive to do at runtime (it would need to run after
|
||||
every module has been compiled) and would also break for hash tables.
|
||||
|
||||
We need an API that hides the complex aliasing problems by not relying
|
||||
on Nim's global variables. The obvious solution is to use string keys
|
||||
instead of global variables:
|
||||
|
||||
.. code-block:: nim
|
||||
|
||||
proc cachePut*(key: string; value: string)
|
||||
proc cacheGet*(key: string): string
|
||||
|
||||
However, the values being strings/json is quite problematic: Many
|
||||
lookup tables that are built at compiletime embed *proc vars* and
|
||||
types which have no obvious string representation... Seems like
|
||||
AST diffing is still the best idea as it will not require to use
|
||||
an alien API and works with some existing Nimble packages, at least.
|
||||
|
||||
On the other hand, in Nim's future I would like to replace the VM
|
||||
by native code. A diff algorithm wouldn't work for that.
|
||||
Instead the native code would work with an API like `put`, `get`:
|
||||
|
||||
.. code-block:: nim
|
||||
|
||||
proc cachePut*(key: string; value: NimNode)
|
||||
proc cacheGet*(key: string): NimNode
|
||||
|
||||
The API should embrace the AST diffing notion: See the
|
||||
module `macrocache` for the final details.
|
||||
|
||||
|
||||
|
||||
Methods and type converters
|
||||
---------------------------
|
||||
|
||||
In the following
|
||||
sections *global* means *shared between modules* or *property of the whole
|
||||
program*.
|
||||
|
||||
Nim contains language features that are *global*. The best example for that
|
||||
are multi methods: Introducing a new method with the same name and some
|
||||
compatible object parameter means that the method's dispatcher needs to take
|
||||
the new method into account. So the dispatching logic is only completely known
|
||||
after the whole program has been translated!
|
||||
|
||||
Other features that are *implicitly* triggered cause problems for modularity
|
||||
too. Type converters fall into this category:
|
||||
|
||||
.. code-block:: nim
|
||||
# module A
|
||||
converter toBool(x: int): bool =
|
||||
result = x != 0
|
||||
|
||||
.. code-block:: nim
|
||||
# module B
|
||||
import A
|
||||
|
||||
if 1:
|
||||
echo "ugly, but should work"
|
||||
|
||||
If in the above example module `B` is re-compiled, but `A` is not then
|
||||
`B` needs to be aware of `toBool` even though `toBool` is not referenced
|
||||
in `B` *explicitly*.
|
||||
|
||||
Both the multi method and the type converter problems are solved by the
|
||||
AST replay implementation.
|
||||
|
||||
|
||||
Generics
|
||||
~~~~~~~~
|
||||
|
||||
We cache generic instantiations and need to ensure this caching works
|
||||
well with the incremental compilation feature. Since the cache is
|
||||
attached to the `PSym` datastructure, it should work without any
|
||||
special logic.
|
||||
|
||||
|
||||
Backend issues
|
||||
--------------
|
||||
|
||||
- Init procs must not be "forgotten" to be called.
|
||||
- Files must not be "forgotten" to be linked.
|
||||
- Method dispatchers are global.
|
||||
- DLL loading via `dlsym` is global.
|
||||
- Emulated thread vars are global.
|
||||
|
||||
However the biggest problem is that dead code elimination breaks modularity!
|
||||
To see why, consider this scenario: The module `G` (for example the huge
|
||||
Gtk2 module...) is compiled with dead code elimination turned on. So none
|
||||
of `G`'s procs is generated at all.
|
||||
|
||||
Then module `B` is compiled that requires `G.P1`. Ok, no problem,
|
||||
`G.P1` is loaded from the symbol file and `G.c` now contains `G.P1`.
|
||||
|
||||
Then module `A` (that depends on `B` and `G`) is compiled and `B`
|
||||
and `G` are left unchanged. `A` requires `G.P2`.
|
||||
|
||||
So now `G.c` MUST contain both `P1` and `P2`, but we haven't even
|
||||
loaded `P1` from the symbol file, nor do we want to because we then quickly
|
||||
would restore large parts of the whole program.
|
||||
|
||||
|
||||
Solution
|
||||
~~~~~~~~
|
||||
|
||||
The backend must have some logic so that if the currently processed module
|
||||
is from the compilation cache, the `ast` field is not accessed. Instead
|
||||
the generated C(++) for the symbol's body needs to be cached too and
|
||||
inserted back into the produced C file. This approach seems to deal with
|
||||
all the outlined problems above.
|
||||
|
||||
|
||||
Debugging Nim's memory management
|
||||
=================================
|
||||
|
||||
The following paragraphs are mostly a reminder for myself. Things to keep
|
||||
in mind:
|
||||
|
||||
* If an assertion in Nim's memory manager or GC fails, the stack trace
|
||||
keeps allocating memory! Thus a stack overflow may happen, hiding the
|
||||
real issue.
|
||||
* What seem to be C code generation problems is often a bug resulting from
|
||||
not producing prototypes, so that some types default to `cint`. Testing
|
||||
without the `-w` option helps!
|
||||
|
||||
|
||||
The Garbage Collector
|
||||
=====================
|
||||
|
||||
Introduction
|
||||
------------
|
||||
|
||||
I use the term *cell* here to refer to everything that is traced
|
||||
(sequences, refs, strings).
|
||||
This section describes how the GC works.
|
||||
|
||||
The basic algorithm is *Deferrent Reference Counting* with cycle detection.
|
||||
References on the stack are not counted for better performance and easier C
|
||||
code generation.
|
||||
|
||||
Each cell has a header consisting of a RC and a pointer to its type
|
||||
descriptor. However the program does not know about these, so they are placed at
|
||||
negative offsets. In the GC code the type `PCell` denotes a pointer
|
||||
decremented by the right offset, so that the header can be accessed easily. It
|
||||
is extremely important that `pointer` is not confused with a `PCell`
|
||||
as this would lead to a memory corruption.
|
||||
|
||||
|
||||
The CellSet data structure
|
||||
--------------------------
|
||||
|
||||
The GC depends on an extremely efficient datastructure for storing a
|
||||
set of pointers - this is called a `TCellSet` in the source code.
|
||||
Inserting, deleting and searching are done in constant time. However,
|
||||
modifying a `TCellSet` during traversal leads to undefined behaviour.
|
||||
|
||||
.. code-block:: Nim
|
||||
type
|
||||
TCellSet # hidden
|
||||
|
||||
proc cellSetInit(s: var TCellSet) # initialize a new set
|
||||
proc cellSetDeinit(s: var TCellSet) # empty the set and free its memory
|
||||
proc incl(s: var TCellSet, elem: PCell) # include an element
|
||||
proc excl(s: var TCellSet, elem: PCell) # exclude an element
|
||||
|
||||
proc `in`(elem: PCell, s: TCellSet): bool # tests membership
|
||||
|
||||
iterator elements(s: TCellSet): (elem: PCell)
|
||||
|
||||
|
||||
All the operations have to perform efficiently. Because a Cellset can
|
||||
become huge a hash table alone is not suitable for this.
|
||||
|
||||
We use a mixture of bitset and hash table for this. The hash table maps *pages*
|
||||
to a page descriptor. The page descriptor contains a bit for any possible cell
|
||||
address within this page. So including a cell is done as follows:
|
||||
|
||||
- Find the page descriptor for the page the cell belongs to.
|
||||
- Set the appropriate bit in the page descriptor indicating that the
|
||||
cell points to the start of a memory block.
|
||||
|
||||
Removing a cell is analogous - the bit has to be set to zero.
|
||||
Single page descriptors are never deleted from the hash table. This is not
|
||||
needed as the data structures needs to be rebuilt periodically anyway.
|
||||
|
||||
Complete traversal is done in this way::
|
||||
|
||||
for each page descriptor d:
|
||||
for each bit in d:
|
||||
if bit == 1:
|
||||
traverse the pointer belonging to this bit
|
||||
|
||||
|
||||
Further complications
|
||||
---------------------
|
||||
|
||||
In Nim the compiler cannot always know if a reference
|
||||
is stored on the stack or not. This is caused by var parameters.
|
||||
Consider this example:
|
||||
|
||||
.. code-block:: Nim
|
||||
proc setRef(r: var ref TNode) =
|
||||
new(r)
|
||||
|
||||
proc usage =
|
||||
var
|
||||
r: ref TNode
|
||||
setRef(r) # here we should not update the reference counts, because
|
||||
# r is on the stack
|
||||
setRef(r.left) # here we should update the refcounts!
|
||||
|
||||
We have to decide at runtime whether the reference is on the stack or not.
|
||||
The generated code looks roughly like this:
|
||||
|
||||
.. code-block:: C
|
||||
void setref(TNode** ref) {
|
||||
unsureAsgnRef(ref, newObj(TNode_TI, sizeof(TNode)))
|
||||
}
|
||||
void usage(void) {
|
||||
setRef(&r)
|
||||
setRef(&r->left)
|
||||
}
|
||||
|
||||
Note that for systems with a continuous stack (which most systems have)
|
||||
the check whether the ref is on the stack is very cheap (only two
|
||||
comparisons).
|
||||
|
||||
|
||||
Code generation for closures
|
||||
============================
|
||||
|
||||
@@ -598,14 +270,14 @@ This should produce roughly this code:
|
||||
|
||||
.. code-block:: nim
|
||||
type
|
||||
PEnv = ref object
|
||||
Env = ref object
|
||||
x: int # data
|
||||
|
||||
proc anon(y: int, c: PEnv): int =
|
||||
proc anon(y: int, c: Env): int =
|
||||
return y + c.x
|
||||
|
||||
proc add(x: int): tuple[prc, data] =
|
||||
var env: PEnv
|
||||
var env: Env
|
||||
new env
|
||||
env.x = x
|
||||
result = (anon, env)
|
||||
@@ -630,25 +302,25 @@ This should produce roughly this code:
|
||||
|
||||
.. code-block:: nim
|
||||
type
|
||||
PEnvX = ref object
|
||||
EnvX = ref object
|
||||
x: int # data
|
||||
|
||||
PEnvY = ref object
|
||||
EnvY = ref object
|
||||
y: int
|
||||
ex: PEnvX
|
||||
ex: EnvX
|
||||
|
||||
proc lambdaZ(z: int, ey: PEnvY): int =
|
||||
proc lambdaZ(z: int, ey: EnvY): int =
|
||||
return ey.ex.x + ey.y + z
|
||||
|
||||
proc lambdaY(y: int, ex: PEnvX): tuple[prc, data: PEnvY] =
|
||||
var ey: PEnvY
|
||||
proc lambdaY(y: int, ex: EnvX): tuple[prc, data: EnvY] =
|
||||
var ey: EnvY
|
||||
new ey
|
||||
ey.y = y
|
||||
ey.ex = ex
|
||||
result = (lambdaZ, ey)
|
||||
|
||||
proc add(x: int): tuple[prc, data: PEnvX] =
|
||||
var ex: PEnvX
|
||||
proc add(x: int): tuple[prc, data: EnvX] =
|
||||
var ex: EnvX
|
||||
ex.x = x
|
||||
result = (labmdaY, ex)
|
||||
|
||||
@@ -663,17 +335,11 @@ More useful is escape analysis and stack allocation of the environment,
|
||||
however.
|
||||
|
||||
|
||||
Alternative
|
||||
-----------
|
||||
|
||||
Process the closure of all inner procs in one pass and accumulate the
|
||||
environments. This is however not always possible.
|
||||
|
||||
|
||||
Accumulator
|
||||
-----------
|
||||
|
||||
.. code-block:: nim
|
||||
|
||||
proc getAccumulator(start: int): proc (): int {.closure} =
|
||||
var i = start
|
||||
return lambda: int =
|
||||
@@ -708,20 +374,26 @@ backend somehow. We deal with this by modifying `s.ast[paramPos]` to contain
|
||||
the formal hidden parameter, but not `s.typ`!
|
||||
|
||||
|
||||
Integer literals:
|
||||
-----------------
|
||||
Notes on type and AST representation
|
||||
====================================
|
||||
|
||||
To be expanded.
|
||||
|
||||
|
||||
Integer literals
|
||||
----------------
|
||||
|
||||
In Nim, there is a redundant way to specify the type of an
|
||||
integer literal. First of all, it should be unsurprising that every
|
||||
node has a node kind. The node of an integer literal can be any of the
|
||||
following values:
|
||||
following values::
|
||||
|
||||
nkIntLit, nkInt8Lit, nkInt16Lit, nkInt32Lit, nkInt64Lit,
|
||||
nkUIntLit, nkUInt8Lit, nkUInt16Lit, nkUInt32Lit, nkUInt64Lit
|
||||
|
||||
On top of that, there is also the `typ` field for the type. It the
|
||||
kind of the `typ` field can be one of the following ones, and it
|
||||
should be matching the literal kind:
|
||||
should be matching the literal kind::
|
||||
|
||||
tyInt, tyInt8, tyInt16, tyInt32, tyInt64, tyUInt, tyUInt8,
|
||||
tyUInt16, tyUInt32, tyUInt64
|
||||
@@ -777,6 +449,7 @@ pointing back to the integer literal node in the ast containing the
|
||||
integer value. These are the properties that hold true for integer
|
||||
literal types.
|
||||
|
||||
::
|
||||
n.kind == nkIntLit
|
||||
n.typ.kind == tyInt
|
||||
n.typ.n == n
|
||||
|
||||
@@ -7,7 +7,40 @@
|
||||
# distribution, for details about the copyright.
|
||||
#
|
||||
|
||||
# Efficient set of pointers for the GC (and repr)
|
||||
|
||||
#[
|
||||
|
||||
Efficient set of pointers for the GC (and repr)
|
||||
-----------------------------------------------
|
||||
|
||||
The GC depends on an extremely efficient datastructure for storing a
|
||||
set of pointers - this is called a `CellSet` in the source code.
|
||||
Inserting, deleting and searching are done in constant time. However,
|
||||
modifying a `CellSet` during traversal leads to undefined behaviour.
|
||||
|
||||
All operations on a CellSet have to perform efficiently. Because a Cellset can
|
||||
become huge a hash table alone is not suitable for this.
|
||||
|
||||
We use a mixture of bitset and hash table for this. The hash table maps *pages*
|
||||
to a page descriptor. The page descriptor contains a bit for any possible cell
|
||||
address within this page. So including a cell is done as follows:
|
||||
|
||||
- Find the page descriptor for the page the cell belongs to.
|
||||
- Set the appropriate bit in the page descriptor indicating that the
|
||||
cell points to the start of a memory block.
|
||||
|
||||
Removing a cell is analogous - the bit has to be set to zero.
|
||||
Single page descriptors are never deleted from the hash table. This is not
|
||||
needed as the data structures needs to be rebuilt periodically anyway.
|
||||
|
||||
Complete traversal is done in this way::
|
||||
|
||||
for each page descriptor d:
|
||||
for each bit in d:
|
||||
if bit == 1:
|
||||
traverse the pointer belonging to this bit
|
||||
|
||||
]#
|
||||
|
||||
when defined(gcOrc) or defined(gcArc):
|
||||
type
|
||||
|
||||
@@ -12,6 +12,53 @@
|
||||
# Refcounting + Mark&Sweep. Complex algorithms avoided.
|
||||
# Been there, done that, didn't work.
|
||||
|
||||
#[
|
||||
|
||||
A *cell* is anything that is traced by the GC
|
||||
(sequences, refs, strings, closures).
|
||||
|
||||
The basic algorithm is *Deferrent Reference Counting* with cycle detection.
|
||||
References on the stack are not counted for better performance and easier C
|
||||
code generation.
|
||||
|
||||
Each cell has a header consisting of a RC and a pointer to its type
|
||||
descriptor. However the program does not know about these, so they are placed at
|
||||
negative offsets. In the GC code the type `PCell` denotes a pointer
|
||||
decremented by the right offset, so that the header can be accessed easily. It
|
||||
is extremely important that `pointer` is not confused with a `PCell`.
|
||||
|
||||
In Nim the compiler cannot always know if a reference
|
||||
is stored on the stack or not. This is caused by var parameters.
|
||||
Consider this example:
|
||||
|
||||
.. code-block:: Nim
|
||||
proc setRef(r: var ref TNode) =
|
||||
new(r)
|
||||
|
||||
proc usage =
|
||||
var
|
||||
r: ref TNode
|
||||
setRef(r) # here we should not update the reference counts, because
|
||||
# r is on the stack
|
||||
setRef(r.left) # here we should update the refcounts!
|
||||
|
||||
We have to decide at runtime whether the reference is on the stack or not.
|
||||
The generated code looks roughly like this:
|
||||
|
||||
.. code-block:: C
|
||||
void setref(TNode** ref) {
|
||||
unsureAsgnRef(ref, newObj(TNode_TI, sizeof(TNode)))
|
||||
}
|
||||
void usage(void) {
|
||||
setRef(&r)
|
||||
setRef(&r->left)
|
||||
}
|
||||
|
||||
Note that for systems with a continuous stack (which most systems have)
|
||||
the check whether the ref is on the stack is very cheap (only two
|
||||
comparisons).
|
||||
]#
|
||||
|
||||
{.push profiler:off.}
|
||||
|
||||
const
|
||||
|
||||
Reference in New Issue
Block a user