mirror of
https://github.com/nim-lang/Nim.git
synced 2025-12-28 17:04:41 +00:00
795 lines
26 KiB
ReStructuredText
795 lines
26 KiB
ReStructuredText
=========================================
|
|
Internals of the Nim Compiler
|
|
=========================================
|
|
|
|
|
|
:Author: Andreas Rumpf
|
|
:Version: |nimversion|
|
|
|
|
.. contents::
|
|
|
|
"Abstraction is layering ignorance on top of reality." -- Richard Gabriel
|
|
|
|
|
|
Directory structure
|
|
===================
|
|
|
|
The Nim project's directory structure is:
|
|
|
|
============ ===================================================
|
|
Path Purpose
|
|
============ ===================================================
|
|
``bin`` generated binary files
|
|
``build`` generated C code for the installation
|
|
``compiler`` the Nim compiler itself; note that this
|
|
code has been translated from a bootstrapping
|
|
version written in Pascal, so the code is **not**
|
|
a poster child of good Nim code
|
|
``config`` configuration files for Nim
|
|
``dist`` additional packages for the distribution
|
|
``doc`` the documentation; it is a bunch of
|
|
reStructuredText files
|
|
``lib`` the Nim library
|
|
``web`` website of Nim; generated by ``nimweb``
|
|
from the ``*.txt`` and ``*.nimf`` files
|
|
============ ===================================================
|
|
|
|
|
|
Bootstrapping the compiler
|
|
==========================
|
|
|
|
Compiling the compiler is a simple matter of running::
|
|
|
|
nim c koch.nim
|
|
./koch boot
|
|
|
|
For a release version use::
|
|
|
|
nim c koch.nim
|
|
./koch boot -d:release
|
|
|
|
And for a debug version compatible with GDB::
|
|
|
|
nim c koch.nim
|
|
./koch boot --debuginfo --linedir:on
|
|
|
|
The ``koch`` program is Nim's maintenance script. It is a replacement for
|
|
make and shell scripting with the advantage that it is much more portable.
|
|
More information about its options can be found in the `koch <koch.html>`_
|
|
documentation.
|
|
|
|
|
|
Coding Guidelines
|
|
=================
|
|
|
|
* Use CamelCase, not underscored_identifiers.
|
|
* Indent with two spaces.
|
|
* Max line length is 80 characters.
|
|
* Provide spaces around binary operators if that enhances readability.
|
|
* Use a space after a colon, but not before it.
|
|
* [deprecated] Start types with a capital ``T``, unless they are
|
|
pointers/references which start with ``P``.
|
|
|
|
See also the `API naming design <apis.html>`_ document.
|
|
|
|
|
|
Porting to new platforms
|
|
========================
|
|
|
|
Porting Nim to a new architecture is pretty easy, since C is the most
|
|
portable programming language (within certain limits) and Nim generates
|
|
C code, porting the code generator is not necessary.
|
|
|
|
POSIX-compliant systems on conventional hardware are usually pretty easy to
|
|
port: Add the platform to ``platform`` (if it is not already listed there),
|
|
check that the OS, System modules work and recompile Nim.
|
|
|
|
The only case where things aren't as easy is when the garbage
|
|
collector needs some assembler tweaking to work. The standard
|
|
version of the GC uses C's ``setjmp`` function to store all registers
|
|
on the hardware stack. It may be necessary that the new platform needs to
|
|
replace this generic code by some assembler code.
|
|
|
|
|
|
Runtime type information
|
|
========================
|
|
|
|
*Runtime type information* (RTTI) is needed for several aspects of the Nim
|
|
programming language:
|
|
|
|
Garbage collection
|
|
The most important reason for RTTI. Generating
|
|
traversal procedures produces bigger code and is likely to be slower on
|
|
modern hardware as dynamic procedure binding is hard to predict.
|
|
|
|
Complex assignments
|
|
Sequences and strings are implemented as
|
|
pointers to resizeable buffers, but Nim requires copying for
|
|
assignments. Apart from RTTI the compiler could generate copy procedures
|
|
for any type that needs one. However, this would make the code bigger and
|
|
the RTTI is likely already there for the GC.
|
|
|
|
We already know the type information as a graph in the compiler.
|
|
Thus we need to serialize this graph as RTTI for C code generation.
|
|
Look at the file ``lib/system/hti.nim`` for more information.
|
|
|
|
Rebuilding the compiler
|
|
========================
|
|
|
|
After an initial build via `sh build_all.sh` on posix or `build_all.bat` on windows,
|
|
you can rebuild the compiler as follows:
|
|
|
|
* `nim c koch` if you need to rebuild koch
|
|
* `./koch boot -d:release` this ensures the compiler can rebuild itself
|
|
(use `koch` instead of `./koch` on windows), which builds the compiler 3 times.
|
|
|
|
A faster approach if you don't need to run the full bootstrapping implied by `koch boot`,
|
|
is the following:
|
|
|
|
* `pathto/nim c --lib:lib -d:release -o:bin/nim_temp compiler/nim.nim`
|
|
|
|
Where `pathto/nim` is any nim binary sufficiently recent (e.g. `bin/nim_cources`
|
|
built during bootstrap or `$HOME/.nimble/bin/nim` installed by `choosenim 1.2.0`)
|
|
|
|
You can pass any additional options such as `-d:leanCompiler` if you don't need
|
|
certain features or `-d:debug --stacktrace:on --excessiveStackTrace --stackTraceMsgs`
|
|
for debugging the compiler. See also `Debugging the compiler`_.
|
|
|
|
Debugging the compiler
|
|
======================
|
|
|
|
You can of course use GDB or Visual Studio to debug the
|
|
compiler (via ``--debuginfo --lineDir:on``). However, there
|
|
are also lots of procs that aid in debugging:
|
|
|
|
|
|
.. code-block:: nim
|
|
# pretty prints the Nim AST
|
|
echo renderTree(someNode)
|
|
# outputs some JSON representation
|
|
debug(someNode)
|
|
# pretty prints some type
|
|
echo typeToString(someType)
|
|
debug(someType)
|
|
echo symbol.name.s
|
|
debug(symbol)
|
|
# pretty prints the Nim ast, but annotates symbol IDs:
|
|
echo renderTree(someNode, {renderIds})
|
|
if `??`(conf, n.info, "temp.nim"):
|
|
# only output when it comes from "temp.nim"
|
|
echo renderTree(n)
|
|
if `??`(conf, n.info, "temp.nim"):
|
|
# why does it process temp.nim here?
|
|
writeStackTrace()
|
|
|
|
These procs may not be imported by a module. You can import them directly for debugging:
|
|
|
|
.. code-block:: nim
|
|
from astalgo import debug
|
|
from types import typeToString
|
|
from renderer import renderTree
|
|
from msgs import `??`
|
|
|
|
To create a new compiler for each run, use ``koch temp``::
|
|
|
|
./koch temp c /tmp/test.nim
|
|
|
|
``koch temp`` creates a debug build of the compiler, which is useful
|
|
to create stacktraces for compiler debugging. See also
|
|
`Rebuilding the compiler`_ if you need more control.
|
|
|
|
Bisecting for regressions
|
|
=========================
|
|
|
|
``koch temp`` returns 125 as the exit code in case the compiler
|
|
compilation fails. This exit code tells ``git bisect`` to skip the
|
|
current commit.::
|
|
|
|
git bisect start bad-commit good-commit
|
|
git bisect run ./koch temp -r c test-source.nim
|
|
|
|
You can also bisect using custom options to build the compiler, for example if
|
|
you don't need a debug version of the compiler (which runs slower), you can replace
|
|
`./koch temp` by explicit compilation command, see `Rebuilding the compiler`_.
|
|
|
|
The compiler's architecture
|
|
===========================
|
|
|
|
Nim uses the classic compiler architecture: A lexer/scanner feds tokens to a
|
|
parser. The parser builds a syntax tree that is used by the code generator.
|
|
This syntax tree is the interface between the parser and the code generator.
|
|
It is essential to understand most of the compiler's code.
|
|
|
|
In order to compile Nim correctly, type-checking has to be separated from
|
|
parsing. Otherwise generics cannot work.
|
|
|
|
.. include:: filelist.txt
|
|
|
|
|
|
The syntax tree
|
|
---------------
|
|
The syntax tree consists of nodes which may have an arbitrary number of
|
|
children. Types and symbols are represented by other nodes, because they
|
|
may contain cycles. The AST changes its shape after semantic checking. This
|
|
is needed to make life easier for the code generators. See the "ast" module
|
|
for the type definitions. The `macros <macros.html>`_ module contains many
|
|
examples how the AST represents each syntactic structure.
|
|
|
|
|
|
How the RTL is compiled
|
|
=======================
|
|
|
|
The ``system`` module contains the part of the RTL which needs support by
|
|
compiler magic (and the stuff that needs to be in it because the spec
|
|
says so). The C code generator generates the C code for it, just like any other
|
|
module. However, calls to some procedures like ``addInt`` are inserted by
|
|
the CCG. Therefore the module ``magicsys`` contains a table (``compilerprocs``)
|
|
with all symbols that are marked as ``compilerproc``. ``compilerprocs`` are
|
|
needed by the code generator. A ``magic`` proc is not the same as a
|
|
``compilerproc``: A ``magic`` is a proc that needs compiler magic for its
|
|
semantic checking, a ``compilerproc`` is a proc that is used by the code
|
|
generator.
|
|
|
|
|
|
Compilation cache
|
|
=================
|
|
|
|
The implementation of the compilation cache is tricky: There are lots
|
|
of issues to be solved for the front- and backend.
|
|
|
|
|
|
General approach: AST replay
|
|
----------------------------
|
|
|
|
We store a module's AST of a successful semantic check in a SQLite
|
|
database. There are plenty of features that require a sub sequence
|
|
to be re-applied, for example:
|
|
|
|
.. code-block:: nim
|
|
{.compile: "foo.c".} # even if the module is loaded from the DB,
|
|
# "foo.c" needs to be compiled/linked.
|
|
|
|
The solution is to **re-play** the module's top level statements.
|
|
This solves the problem without having to special case the logic
|
|
that fills the internal seqs which are affected by the pragmas.
|
|
|
|
In fact, this describes how the AST should be stored in the database,
|
|
as a "shallow" tree. Let's assume we compile module ``m`` with the
|
|
following contents:
|
|
|
|
.. code-block:: nim
|
|
import std/strutils
|
|
|
|
var x*: int = 90
|
|
{.compile: "foo.c".}
|
|
proc p = echo "p"
|
|
proc q = echo "q"
|
|
static:
|
|
echo "static"
|
|
|
|
Conceptually this is the AST we store for the module:
|
|
|
|
.. code-block:: nim
|
|
import std/strutils
|
|
|
|
var x*
|
|
{.compile: "foo.c".}
|
|
proc p
|
|
proc q
|
|
static:
|
|
echo "static"
|
|
|
|
The symbol's ``ast`` field is loaded lazily, on demand. This is where most
|
|
savings come from, only the shallow outer AST is reconstructed immediately.
|
|
|
|
It is also important that the replay involves the ``import`` statement so
|
|
that dependencies are resolved properly.
|
|
|
|
|
|
Shared global compiletime state
|
|
-------------------------------
|
|
|
|
Nim allows ``.global, compiletime`` variables that can be filled by macro
|
|
invocations across different modules. This feature breaks modularity in a
|
|
severe way. Plenty of different solutions have been proposed:
|
|
|
|
- Restrict the types of global compiletime variables to ``Set[T]`` or
|
|
similar unordered, only-growable collections so that we can track
|
|
the module's write effects to these variables and reapply the changes
|
|
in a different order.
|
|
- In every module compilation, reset the variable to its default value.
|
|
- Provide a restrictive API that can load/save the compiletime state to
|
|
a file.
|
|
|
|
(These solutions are not mutually exclusive.)
|
|
|
|
Since we adopt the "replay the top level statements" idea, the natural
|
|
solution to this problem is to emit pseudo top level statements that
|
|
reflect the mutations done to the global variable. However, this is
|
|
MUCH harder than it sounds, for example ``squeaknim`` uses this
|
|
snippet:
|
|
|
|
.. code-block:: nim
|
|
apicall.add(") module: '" & dllName & "'>\C" &
|
|
"\t^self externalCallFailed\C!\C\C")
|
|
stCode.add(st & "\C\t\"Generated by NimSqueak\"\C\t" & apicall)
|
|
|
|
We can "replay" ``stCode.add`` only if the values of ``st``
|
|
and ``apicall`` are known. And even then a hash table's ``add`` with its
|
|
hashing mechanism is too hard to replay.
|
|
|
|
In practice, things are worse still, consider ``someGlobal[i][j].add arg``.
|
|
We only know the root is ``someGlobal`` but the concrete path to the data
|
|
is unknown as is the value that is added. We could compute a "diff" between
|
|
the global states and use that to compute a symbol patchset, but this is
|
|
quite some work, expensive to do at runtime (it would need to run after
|
|
every module has been compiled) and would also break for hash tables.
|
|
|
|
We need an API that hides the complex aliasing problems by not relying
|
|
on Nim's global variables. The obvious solution is to use string keys
|
|
instead of global variables:
|
|
|
|
.. code-block:: nim
|
|
|
|
proc cachePut*(key: string; value: string)
|
|
proc cacheGet*(key: string): string
|
|
|
|
However, the values being strings/json is quite problematic: Many
|
|
lookup tables that are built at compiletime embed *proc vars* and
|
|
types which have no obvious string representation... Seems like
|
|
AST diffing is still the best idea as it will not require to use
|
|
an alien API and works with some existing Nimble packages, at least.
|
|
|
|
On the other hand, in Nim's future I would like to replace the VM
|
|
by native code. A diff algorithm wouldn't work for that.
|
|
Instead the native code would work with an API like ``put``, ``get``:
|
|
|
|
.. code-block:: nim
|
|
|
|
proc cachePut*(key: string; value: NimNode)
|
|
proc cacheGet*(key: string): NimNode
|
|
|
|
The API should embrace the AST diffing notion: See the
|
|
module ``macrocache`` for the final details.
|
|
|
|
|
|
|
|
Methods and type converters
|
|
---------------------------
|
|
|
|
In the following
|
|
sections *global* means *shared between modules* or *property of the whole
|
|
program*.
|
|
|
|
Nim contains language features that are *global*. The best example for that
|
|
are multi methods: Introducing a new method with the same name and some
|
|
compatible object parameter means that the method's dispatcher needs to take
|
|
the new method into account. So the dispatching logic is only completely known
|
|
after the whole program has been translated!
|
|
|
|
Other features that are *implicitly* triggered cause problems for modularity
|
|
too. Type converters fall into this category:
|
|
|
|
.. code-block:: nim
|
|
# module A
|
|
converter toBool(x: int): bool =
|
|
result = x != 0
|
|
|
|
.. code-block:: nim
|
|
# module B
|
|
import A
|
|
|
|
if 1:
|
|
echo "ugly, but should work"
|
|
|
|
If in the above example module ``B`` is re-compiled, but ``A`` is not then
|
|
``B`` needs to be aware of ``toBool`` even though ``toBool`` is not referenced
|
|
in ``B`` *explicitly*.
|
|
|
|
Both the multi method and the type converter problems are solved by the
|
|
AST replay implementation.
|
|
|
|
|
|
Generics
|
|
~~~~~~~~
|
|
|
|
We cache generic instantiations and need to ensure this caching works
|
|
well with the incremental compilation feature. Since the cache is
|
|
attached to the ``PSym`` datastructure, it should work without any
|
|
special logic.
|
|
|
|
|
|
Backend issues
|
|
--------------
|
|
|
|
- Init procs must not be "forgotten" to be called.
|
|
- Files must not be "forgotten" to be linked.
|
|
- Method dispatchers are global.
|
|
- DLL loading via ``dlsym`` is global.
|
|
- Emulated thread vars are global.
|
|
|
|
However the biggest problem is that dead code elimination breaks modularity!
|
|
To see why, consider this scenario: The module ``G`` (for example the huge
|
|
Gtk2 module...) is compiled with dead code elimination turned on. So none
|
|
of ``G``'s procs is generated at all.
|
|
|
|
Then module ``B`` is compiled that requires ``G.P1``. Ok, no problem,
|
|
``G.P1`` is loaded from the symbol file and ``G.c`` now contains ``G.P1``.
|
|
|
|
Then module ``A`` (that depends on ``B`` and ``G``) is compiled and ``B``
|
|
and ``G`` are left unchanged. ``A`` requires ``G.P2``.
|
|
|
|
So now ``G.c`` MUST contain both ``P1`` and ``P2``, but we haven't even
|
|
loaded ``P1`` from the symbol file, nor do we want to because we then quickly
|
|
would restore large parts of the whole program.
|
|
|
|
|
|
Solution
|
|
~~~~~~~~
|
|
|
|
The backend must have some logic so that if the currently processed module
|
|
is from the compilation cache, the ``ast`` field is not accessed. Instead
|
|
the generated C(++) for the symbol's body needs to be cached too and
|
|
inserted back into the produced C file. This approach seems to deal with
|
|
all the outlined problems above.
|
|
|
|
|
|
Debugging Nim's memory management
|
|
=================================
|
|
|
|
The following paragraphs are mostly a reminder for myself. Things to keep
|
|
in mind:
|
|
|
|
* If an assertion in Nim's memory manager or GC fails, the stack trace
|
|
keeps allocating memory! Thus a stack overflow may happen, hiding the
|
|
real issue.
|
|
* What seem to be C code generation problems is often a bug resulting from
|
|
not producing prototypes, so that some types default to ``cint``. Testing
|
|
without the ``-w`` option helps!
|
|
|
|
|
|
The Garbage Collector
|
|
=====================
|
|
|
|
Introduction
|
|
------------
|
|
|
|
I use the term *cell* here to refer to everything that is traced
|
|
(sequences, refs, strings).
|
|
This section describes how the GC works.
|
|
|
|
The basic algorithm is *Deferrent Reference Counting* with cycle detection.
|
|
References on the stack are not counted for better performance and easier C
|
|
code generation.
|
|
|
|
Each cell has a header consisting of a RC and a pointer to its type
|
|
descriptor. However the program does not know about these, so they are placed at
|
|
negative offsets. In the GC code the type ``PCell`` denotes a pointer
|
|
decremented by the right offset, so that the header can be accessed easily. It
|
|
is extremely important that ``pointer`` is not confused with a ``PCell``
|
|
as this would lead to a memory corruption.
|
|
|
|
|
|
The CellSet data structure
|
|
--------------------------
|
|
|
|
The GC depends on an extremely efficient datastructure for storing a
|
|
set of pointers - this is called a ``TCellSet`` in the source code.
|
|
Inserting, deleting and searching are done in constant time. However,
|
|
modifying a ``TCellSet`` during traversal leads to undefined behaviour.
|
|
|
|
.. code-block:: Nim
|
|
type
|
|
TCellSet # hidden
|
|
|
|
proc cellSetInit(s: var TCellSet) # initialize a new set
|
|
proc cellSetDeinit(s: var TCellSet) # empty the set and free its memory
|
|
proc incl(s: var TCellSet, elem: PCell) # include an element
|
|
proc excl(s: var TCellSet, elem: PCell) # exclude an element
|
|
|
|
proc `in`(elem: PCell, s: TCellSet): bool # tests membership
|
|
|
|
iterator elements(s: TCellSet): (elem: PCell)
|
|
|
|
|
|
All the operations have to perform efficiently. Because a Cellset can
|
|
become huge a hash table alone is not suitable for this.
|
|
|
|
We use a mixture of bitset and hash table for this. The hash table maps *pages*
|
|
to a page descriptor. The page descriptor contains a bit for any possible cell
|
|
address within this page. So including a cell is done as follows:
|
|
|
|
- Find the page descriptor for the page the cell belongs to.
|
|
- Set the appropriate bit in the page descriptor indicating that the
|
|
cell points to the start of a memory block.
|
|
|
|
Removing a cell is analogous - the bit has to be set to zero.
|
|
Single page descriptors are never deleted from the hash table. This is not
|
|
needed as the data structures needs to be rebuilt periodically anyway.
|
|
|
|
Complete traversal is done in this way::
|
|
|
|
for each page descriptor d:
|
|
for each bit in d:
|
|
if bit == 1:
|
|
traverse the pointer belonging to this bit
|
|
|
|
|
|
Further complications
|
|
---------------------
|
|
|
|
In Nim the compiler cannot always know if a reference
|
|
is stored on the stack or not. This is caused by var parameters.
|
|
Consider this example:
|
|
|
|
.. code-block:: Nim
|
|
proc setRef(r: var ref TNode) =
|
|
new(r)
|
|
|
|
proc usage =
|
|
var
|
|
r: ref TNode
|
|
setRef(r) # here we should not update the reference counts, because
|
|
# r is on the stack
|
|
setRef(r.left) # here we should update the refcounts!
|
|
|
|
We have to decide at runtime whether the reference is on the stack or not.
|
|
The generated code looks roughly like this:
|
|
|
|
.. code-block:: C
|
|
void setref(TNode** ref) {
|
|
unsureAsgnRef(ref, newObj(TNode_TI, sizeof(TNode)))
|
|
}
|
|
void usage(void) {
|
|
setRef(&r)
|
|
setRef(&r->left)
|
|
}
|
|
|
|
Note that for systems with a continuous stack (which most systems have)
|
|
the check whether the ref is on the stack is very cheap (only two
|
|
comparisons).
|
|
|
|
|
|
Code generation for closures
|
|
============================
|
|
|
|
Code generation for closures is implemented by `lambda lifting`:idx:.
|
|
|
|
|
|
Design
|
|
------
|
|
|
|
A ``closure`` proc var can call ordinary procs of the default Nim calling
|
|
convention. But not the other way round! A closure is implemented as a
|
|
``tuple[prc, env]``. ``env`` can be nil implying a call without a closure.
|
|
This means that a call through a closure generates an ``if`` but the
|
|
interoperability is worth the cost of the ``if``. Thunk generation would be
|
|
possible too, but it's slightly more effort to implement.
|
|
|
|
Tests with GCC on Amd64 showed that it's really beneficial if the
|
|
'environment' pointer is passed as the last argument, not as the first argument.
|
|
|
|
Proper thunk generation is harder because the proc that is to wrap
|
|
could stem from a complex expression:
|
|
|
|
.. code-block:: nim
|
|
receivesClosure(returnsDefaultCC[i])
|
|
|
|
A thunk would need to call 'returnsDefaultCC[i]' somehow and that would require
|
|
an *additional* closure generation... Ok, not really, but it requires to pass
|
|
the function to call. So we'd end up with 2 indirect calls instead of one.
|
|
Another much more severe problem which this solution is that it's not GC-safe
|
|
to pass a proc pointer around via a generic ``ref`` type.
|
|
|
|
|
|
Example code:
|
|
|
|
.. code-block:: nim
|
|
proc add(x: int): proc (y: int): int {.closure.} =
|
|
return proc (y: int): int =
|
|
return x + y
|
|
|
|
var add2 = add(2)
|
|
echo add2(5) #OUT 7
|
|
|
|
This should produce roughly this code:
|
|
|
|
.. code-block:: nim
|
|
type
|
|
PEnv = ref object
|
|
x: int # data
|
|
|
|
proc anon(y: int, c: PEnv): int =
|
|
return y + c.x
|
|
|
|
proc add(x: int): tuple[prc, data] =
|
|
var env: PEnv
|
|
new env
|
|
env.x = x
|
|
result = (anon, env)
|
|
|
|
var add2 = add(2)
|
|
let tmp = if add2.data == nil: add2.prc(5) else: add2.prc(5, add2.data)
|
|
echo tmp
|
|
|
|
|
|
Beware of nesting:
|
|
|
|
.. code-block:: nim
|
|
proc add(x: int): proc (y: int): proc (z: int): int {.closure.} {.closure.} =
|
|
return lambda (y: int): proc (z: int): int {.closure.} =
|
|
return lambda (z: int): int =
|
|
return x + y + z
|
|
|
|
var add24 = add(2)(4)
|
|
echo add24(5) #OUT 11
|
|
|
|
This should produce roughly this code:
|
|
|
|
.. code-block:: nim
|
|
type
|
|
PEnvX = ref object
|
|
x: int # data
|
|
|
|
PEnvY = ref object
|
|
y: int
|
|
ex: PEnvX
|
|
|
|
proc lambdaZ(z: int, ey: PEnvY): int =
|
|
return ey.ex.x + ey.y + z
|
|
|
|
proc lambdaY(y: int, ex: PEnvX): tuple[prc, data: PEnvY] =
|
|
var ey: PEnvY
|
|
new ey
|
|
ey.y = y
|
|
ey.ex = ex
|
|
result = (lambdaZ, ey)
|
|
|
|
proc add(x: int): tuple[prc, data: PEnvX] =
|
|
var ex: PEnvX
|
|
ex.x = x
|
|
result = (labmdaY, ex)
|
|
|
|
var tmp = add(2)
|
|
var tmp2 = tmp.fn(4, tmp.data)
|
|
var add24 = tmp2.fn(4, tmp2.data)
|
|
echo add24(5)
|
|
|
|
|
|
We could get rid of nesting environments by always inlining inner anon procs.
|
|
More useful is escape analysis and stack allocation of the environment,
|
|
however.
|
|
|
|
|
|
Alternative
|
|
-----------
|
|
|
|
Process the closure of all inner procs in one pass and accumulate the
|
|
environments. This is however not always possible.
|
|
|
|
|
|
Accumulator
|
|
-----------
|
|
|
|
.. code-block:: nim
|
|
proc getAccumulator(start: int): proc (): int {.closure} =
|
|
var i = start
|
|
return lambda: int =
|
|
inc i
|
|
return i
|
|
|
|
proc p =
|
|
var delta = 7
|
|
proc accumulator(start: int): proc(): int =
|
|
var x = start-1
|
|
result = proc (): int =
|
|
x = x + delta
|
|
inc delta
|
|
return x
|
|
|
|
var a = accumulator(3)
|
|
var b = accumulator(4)
|
|
echo a() + b()
|
|
|
|
|
|
Internals
|
|
---------
|
|
|
|
Lambda lifting is implemented as part of the ``transf`` pass. The ``transf``
|
|
pass generates code to setup the environment and to pass it around. However,
|
|
this pass does not change the types! So we have some kind of mismatch here; on
|
|
the one hand the proc expression becomes an explicit tuple, on the other hand
|
|
the tyProc(ccClosure) type is not changed. For C code generation it's also
|
|
important the hidden formal param is ``void*`` and not something more
|
|
specialized. However the more specialized env type needs to passed to the
|
|
backend somehow. We deal with this by modifying ``s.ast[paramPos]`` to contain
|
|
the formal hidden parameter, but not ``s.typ``!
|
|
|
|
|
|
Integer literals:
|
|
-----------------
|
|
|
|
In Nim, there is a redundant way to specify the type of an
|
|
integer literal. First of all, it should be unsurprising that every
|
|
node has a node kind. The node of an integer literal can be any of the
|
|
following values:
|
|
|
|
nkIntLit, nkInt8Lit, nkInt16Lit, nkInt32Lit, nkInt64Lit,
|
|
nkUIntLit, nkUInt8Lit, nkUInt16Lit, nkUInt32Lit, nkUInt64Lit
|
|
|
|
On top of that, there is also the `typ` field for the type. It the
|
|
kind of the `typ` field can be one of the following ones, and it
|
|
should be matching the literal kind:
|
|
|
|
tyInt, tyInt8, tyInt16, tyInt32, tyInt64, tyUInt, tyUInt8,
|
|
tyUInt16, tyUInt32, tyUInt64
|
|
|
|
Then there is also the integer literal type. This is a specific type
|
|
that is implicitly convertible into the requested type if the
|
|
requested type can hold the value. For this to work, the type needs to
|
|
know the concrete value of the literal. For example an expression
|
|
`321` will be of type `int literal(321)`. This type is implicitly
|
|
convertible to all integer types and ranges that contain the value
|
|
`321`. That would be all builtin integer types except `uint8` and
|
|
`int8` where `321` would be out of range. When this literal type is
|
|
assigned to a new `var` or `let` variable, it's type will be resolved
|
|
to just `int`, not `int literal(321)` unlike constants. A constant
|
|
keeps the full `int literal(321)` type. Here is an example where that
|
|
difference matters.
|
|
|
|
|
|
.. code-block:: nim
|
|
|
|
proc foo(arg: int8) =
|
|
echo "def"
|
|
|
|
const tmp1 = 123
|
|
foo(tmp1) # OK
|
|
|
|
let tmp2 = 123
|
|
foo(tmp2) # Error
|
|
|
|
In a context with multiple overloads, the integer literal kind will
|
|
always prefer the `int` type over all other types. If none of the
|
|
overloads is of type `int`, then there will be an error because of
|
|
ambiguity.
|
|
|
|
.. code-block:: nim
|
|
|
|
proc foo(arg: int) =
|
|
echo "abc"
|
|
proc foo(arg: int8) =
|
|
echo "def"
|
|
foo(123) # output: abc
|
|
|
|
proc bar(arg: int16) =
|
|
echo "abc"
|
|
proc bar(arg: int8) =
|
|
echo "def"
|
|
|
|
bar(123) # Error ambiguous call
|
|
|
|
In the compiler these integer literal types are represented with the
|
|
node kind `nkIntLit`, type kind `tyInt` and the member `n` of the type
|
|
pointing back to the integer literal node in the ast containing the
|
|
integer value. These are the properties that hold true for integer
|
|
literal types.
|
|
|
|
n.kind == nkIntLit
|
|
n.typ.kind == tyInt
|
|
n.typ.n == n
|
|
|
|
Other literal types, such as `uint literal(123)` that would
|
|
automatically convert to other integer types, but prefers to
|
|
become a `uint` are not part of the Nim language.
|
|
|
|
In an unchecked AST, the `typ` field is nil. The type checker will set
|
|
the `typ` field accordingly to the node kind. Nodes of kind `nkIntLit`
|
|
will get the integer literal type (e.g. `int literal(123)`). Nodes of
|
|
kind `nkUIntLit` will get type `uint` (kind `tyUint`), etc.
|
|
|
|
This also means that it is not possible to write a literal in an
|
|
unchecked AST that will after sem checking just be of type `int` and
|
|
not implicitly convertible to other integer types. This only works for
|
|
all integer types that are not `int`.
|