Merge pull request #676 from gradha/pr_expands_macro_tutorial

Expands tutorial macro section with step by step guide.
Grzegorz Adam Hankiewicz
2013-12-02 15:15:09 -08:00


@@ -699,15 +699,22 @@ once.
Macros
======
Macros enable advanced compile-time code transformations, but they cannot
change Nimrod's syntax. However, this is no real restriction because Nimrod's
syntax is flexible enough anyway. Macros have to be implemented in pure Nimrod
code if `foreign function interface (FFI)
<manual.html#foreign-function-interface>`_ is not enabled in the compiler, but
other than that restriction (which at some point in the future will go away)
you can write any kind of Nimrod code and the compiler will run it at compile
time.
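
The compiler can also run ordinary procs at compile time, which is a handy way to get a feel for what "running Nimrod code at compile time" means before writing any macro. The following is a minimal sketch that is not part of the original tutorial. ``countWords`` is a hypothetical helper that only uses the `strutils module <strutils.html>`_, so no FFI is involved; assigning its result to a ``const`` makes the compiler evaluate it:

.. code-block:: nimrod

  import strutils

  # Hypothetical helper: plain Nimrod code with no FFI calls, so the
  # compiler is able to evaluate it.
  proc countWords(text: string): int =
    for word in text.split(' '):
      if word.len > 0: inc(result)

  # Assigning the result to a ``const`` makes the compiler run the proc
  # during compilation, so the value is known before the program runs.
  const wordCount = countWords("macros run at compile time")

  echo wordCount

Running the compiled program prints ``5``; the counting itself already happened during compilation.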
There are two ways to write a macro: either *generating* Nimrod source code and
letting the compiler parse it, or manually creating an abstract syntax tree
(AST) which you feed to the compiler. In order to build the AST one needs to
know how the Nimrod concrete syntax is converted to an AST. The AST is
documented in the `macros <macros.html>`_ module.
Once your macro is finished, there are two ways to invoke it:
(1) invoking a macro like a procedure call (`expression macros`:idx:)
(2) invoking a macro with the special ``macrostmt``
syntax (`statement macros`:idx:)
@@ -796,3 +803,249 @@ Term rewriting macros
Term rewriting macros can be used to enhance the compilation process
with user defined optimizations; see this `document <trmacros.html>`_ for
further information.
Building your first macro
-------------------------
To give you a head start on writing macros, we will now show how to turn
typical dynamic code into something that is evaluated statically at compile
time. For this exercise we will use the following snippet of code as the
starting point:

.. code-block:: nimrod

  import strutils, tables

  proc readCfgAtRuntime(cfgFilename: string): TTable[string, string] =
    let
      inputString = readFile(cfgFilename)

    result = initTable[string, string]()
    for line in inputString.splitLines:
      # Ignore empty lines
      if line.len < 1: continue
      var chunks = split(line, ',')
      if chunks.len != 2:
        quit("Input needs comma split values, got: " & line)
      result[chunks[0]] = chunks[1]

    if result.len < 1: quit("Input file empty!")

  let info = readCfgAtRuntime("data.cfg")

  when isMainModule:
    echo info["licenseOwner"]
    echo info["licenseKey"]
    echo info["version"]

Presumably this snippet of code could be used in commercial software, reading
a configuration file to display information about the person who bought the
software. This external file, containing the license information, would be
generated by an online shopping cart and distributed along with the program::

  version,1.1
  licenseOwner,Hyori Lee
  licenseKey,M1Tl3PjBWO2CC48m

The ``readCfgAtRuntime`` proc will open the given filename and return a
``TTable`` from the `tables module <tables.html>`_. The parsing of the file is
done (without much care for handling invalid data or corner cases) using the
``split`` proc from the `strutils module <strutils.html>`_. There are many
things that can fail; keep in mind that the purpose here is to explain how to
make this run at compile time, not how to properly implement a DRM scheme.
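
Note that the parsing itself does not care where the text comes from; only the ``readFile`` call ties ``readCfgAtRuntime`` to runtime. A minimal sketch of that separation follows. It assumes a hypothetical ``parseCfgText`` proc that receives the file contents as a string and, to keep the example short, returns a sequence of key/value tuples instead of a ``TTable``:

.. code-block:: nimrod

  import strutils

  # Hypothetical helper: same parsing rules as ``readCfgAtRuntime``, but it
  # receives the already loaded text, so it needs no file access at all.
  proc parseCfgText(input: string): seq[tuple[key, value: string]] =
    result = @[]
    for line in input.splitLines:
      # Ignore empty lines
      if line.len < 1: continue
      let chunks = split(line, ',')
      if chunks.len != 2:
        quit("Input needs comma split values, got: " & line)
      result.add((key: chunks[0], value: chunks[1]))

  let entries = parseCfgText(
    "version,1.1\nlicenseOwner,Hyori Lee\nlicenseKey,M1Tl3PjBWO2CC48m\n")

  for entry in entries:
    echo entry.key, " -> ", entry.value

Once the file access is split off, what remains only uses ``splitLines`` and ``split``, the same procs the macros later in this section call at compile time.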
Reimplementing this code as a compile time proc will allow us to get rid of
the ``data.cfg`` file we would otherwise need to distribute along with the
binary. Moreover, if the information is really constant, it makes little sense
from a logical point of view to keep it in a global variable filled in at
runtime; it would be better as a constant. Finally, and likely the most
valuable feature, we can implement some verification at compile time. You
could think of this as *better unit testing*, since it is impossible to obtain
a binary unless everything is correct. This prevents you from shipping to users
a broken program that won't start because a small critical file is missing or
its contents were changed by mistake to something invalid.
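
To make the *better unit testing* idea concrete, here is a minimal sketch, not taken from the original tutorial, that validates embedded data while compiling. As an assumption, the configuration is inlined as a string constant instead of being read from ``data.cfg``, and ``countValidLines`` is a hypothetical helper. The ``error`` pragma aborts compilation when the check fails:

.. code-block:: nimrod

  import strutils

  # Assumption: the configuration is embedded as a literal so the sketch is
  # self-contained; the tutorial itself keeps it in ``data.cfg``.
  const cfgText = "version,1.1\nlicenseOwner,Hyori Lee\nlicenseKey,M1Tl3PjBWO2CC48m"

  # Hypothetical helper: counts non-empty lines with exactly two
  # comma separated values.
  proc countValidLines(input: string): int =
    for line in input.splitLines:
      if line.len < 1: continue
      if split(line, ',').len == 2: inc(result)

  # Evaluated by the compiler, because the result is assigned to a const.
  const validLines = countValidLines(cfgText)

  # If the data is broken, compilation stops here and no binary is produced.
  when validLines != 3:
    {.error: "the embedded configuration must contain exactly 3 entries".}

  echo "configuration entries: ", validLines

Editing ``cfgText`` so that one line loses its comma makes the program fail to compile instead of failing later on a user's machine.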
Generating source code
++++++++++++++++++++++
Our first attempt will start by modifying the program to generate a compile
time string with the *generated source code*, which we then pass to the
``parseStmt`` proc from the `macros module <macros.html>`_. Here is the
modified source code implementing the macro:

.. code-block:: nimrod

  import macros, strutils

  macro readCfgAndBuildSource(cfgFilename: string): stmt =
    let
      inputString = slurp(cfgFilename.strVal)
    var
      source = ""

    for line in inputString.splitLines:
      # Ignore empty lines
      if line.len < 1: continue
      var chunks = split(line, ',')
      if chunks.len != 2:
        error("Input needs comma split values, got: " & line)
      source &= "const cfg" & chunks[0] & "= \"" & chunks[1] & "\"\n"

    if source.len < 1: error("Input file empty!")
    result = parseStmt(source)

  readCfgAndBuildSource("data.cfg")

  when isMainModule:
    echo cfglicenseOwner
    echo cfglicenseKey
    echo cfgversion

The good news is that not much has changed! First, we need to change the
handling of the input parameter. In the dynamic version the
``readCfgAtRuntime`` proc receives a string parameter. In the macro version
the parameter is also declared as a string, but this is only the *outside*
interface of the macro. When the macro is run, it actually gets a
``PNimrodNode`` object instead of a string, and we have to call the ``strVal``
proc from the `macros module <macros.html>`_ to obtain the string being passed
in to the macro.
Second, we cannot use the ``readFile`` proc from the `system module
<system.html>`_ due to the FFI restriction at compile time. If we try to use
this proc, or any other which depends on FFI, the compiler will fail with the
message ``cannot evaluate`` and a dump of the macro's source code, along with a
stack trace showing how far the compiler got before bailing out. We can get
around this limitation by using the ``slurp`` proc from the `system module
<system.html>`_, which was made precisely for use at compile time (just like
``gorge``, which executes an external program and captures its output).
The interesting thing is that our macro does not return a runtime ``TTable``
object. Instead, it builds up Nimrod source code in the ``source`` variable.
For each line of the configuration file a ``const`` definition will be
generated. To avoid conflicts we prefix these constants with ``cfg``. In
essence, the compiler replaces the line calling the macro with the following
snippet of code:

.. code-block:: nimrod

  const cfgversion= "1.1"
  const cfglicenseOwner= "Hyori Lee"
  const cfglicenseKey= "M1Tl3PjBWO2CC48m"

You can verify this yourself by adding the line ``echo source`` somewhere at
the end of the macro and compiling the program. Another difference is that
instead of calling the usual ``quit`` proc to abort (which we could still
call) this version calls the ``error`` proc. The ``error`` proc has the same
behavior as ``quit`` but will also dump the source and file line information
where the error happened, making it easier for the programmer to find where
compilation failed. In this situation it would point to the line invoking the
macro, but **not** the line of ``data.cfg`` we are processing; that is
something the macro itself would need to control.
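
Because the string building inside ``readCfgAndBuildSource`` is ordinary Nimrod code, it can also live in a regular proc and be checked like any other code. The following minimal sketch uses a hypothetical ``buildCfgSource`` proc and feeds it the sample configuration as a literal instead of calling ``slurp``:

.. code-block:: nimrod

  import strutils

  # Hypothetical helper: the same loop as in ``readCfgAndBuildSource``,
  # returning the generated Nimrod source instead of parsing it.
  proc buildCfgSource(input: string): string =
    result = ""
    for line in input.splitLines:
      # Ignore empty lines
      if line.len < 1: continue
      let chunks = split(line, ',')
      if chunks.len != 2:
        quit("Input needs comma split values, got: " & line)
      result &= "const cfg" & chunks[0] & "= \"" & chunks[1] & "\"\n"

  let source = buildCfgSource(
    "version,1.1\nlicenseOwner,Hyori Lee\nlicenseKey,M1Tl3PjBWO2CC48m\n")

  # Prints the three const definitions the macro would hand to parseStmt.
  echo source

A macro could then pass such a string to ``parseStmt`` just like ``readCfgAndBuildSource`` does; inside a macro, ``error`` remains the better choice over ``quit`` for the reasons given above.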
Generating AST by hand
++++++++++++++++++++++
To generate an AST we would need to intimately know the structures used by
the Nimrod compiler exposed in the `macros module <macros.html>`_, which at
first sight seems a daunting task. But we can use the ``dumpTree`` macro as a
helpful shortcut; it is used as a statement macro instead of an expression
macro. Since we know that we want to generate a bunch of ``const`` symbols, we
can create the following source file and compile it to see what the compiler
*expects* from us:

.. code-block:: nimrod

  import macros

  dumpTree:
    const cfgversion: string = "1.1"
    const cfglicenseOwner= "Hyori Lee"
    const cfglicenseKey= "M1Tl3PjBWO2CC48m"

During compilation of the source code we should see the following lines in
the output (again, since this is a macro, compilation is enough; you don't
have to run any binary)::

  StmtList
    ConstSection
      ConstDef
        Ident !"cfgversion"
        Ident !"string"
        StrLit 1.1
    ConstSection
      ConstDef
        Ident !"cfglicenseOwner"
        Empty
        StrLit Hyori Lee
    ConstSection
      ConstDef
        Ident !"cfglicenseKey"
        Empty
        StrLit M1Tl3PjBWO2CC48m

With this output we have a better idea of what kind of input the compiler
expects. We need to generate a list of statements. For each constant the source
code generates a ``ConstSection`` and a ``ConstDef``. If we were to move all
the constants to a single ``const`` block we would see only a single
``ConstSection`` with three children.
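
To check the claim about a single ``const`` block yourself, you can feed such a block to ``dumpTree`` too. The sketch below is not from the original tutorial and its compiler output is not reproduced here. ``dumpTree`` is only used to print the tree during compilation, so the same section is declared a second time outside it in order to actually use the constants:

.. code-block:: nimrod

  import macros

  # Printed at compile time: a single ConstSection with three ConstDef
  # children, the first one carrying an explicit ``string`` type.
  dumpTree:
    const
      cfgversion: string = "1.1"
      cfglicenseOwner = "Hyori Lee"
      cfglicenseKey = "M1Tl3PjBWO2CC48m"

  # The same section declared for real, so the constants can be used.
  const
    cfgversion: string = "1.1"
    cfglicenseOwner = "Hyori Lee"
    cfglicenseKey = "M1Tl3PjBWO2CC48m"

  echo cfglicenseOwner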
Maybe you didn't notice, but in the ``dumpTree`` example the first constant
explicitly specifies its type. That's why in the tree output the last two
constants have ``Empty`` as their second child, while the first has a
``string`` identifier. So basically a ``const`` definition is made up of an
identifier, optionally a type (which can be an *empty* node) and the value.
Armed with this knowledge, let's look at the finished version of the AST
building macro:

.. code-block:: nimrod

  import macros, strutils

  macro readCfgAndBuildAST(cfgFilename: string): stmt =
    let
      inputString = slurp(cfgFilename.strVal)

    result = newNimNode(nnkStmtList)
    for line in inputString.splitLines:
      # Ignore empty lines
      if line.len < 1: continue
      var chunks = split(line, ',')
      if chunks.len != 2:
        error("Input needs comma split values, got: " & line)
      var
        section = newNimNode(nnkConstSection)
        constDef = newNimNode(nnkConstDef)
      constDef.add(newIdentNode("cfg" & chunks[0]))
      constDef.add(newEmptyNode())
      constDef.add(newStrLitNode(chunks[1]))
      section.add(constDef)
      result.add(section)

    if result.len < 1: error("Input file empty!")

  readCfgAndBuildAST("data.cfg")

  when isMainModule:
    echo cfglicenseOwner
    echo cfglicenseKey
    echo cfgversion

Since we are building on the previous example that generates source code, we
will only mention the differences. Instead of creating a temporary ``string``
variable and writing source code into it as if it were written *by hand*, we
use the ``result`` variable directly and create a statement list node
(``nnkStmtList``) which will hold our children.
For each input line we have to create a constant definition (``nnkConstDef``)
and wrap it inside a constant section (``nnkConstSection``). Once these
variables are created, we fill them hierarchically like the previous AST dump
showed: the constant definition is a child of the section definition, and the
constant definition has an identifier node, an empty node (we let the compiler
figure out the type), and a string literal with the value.
A last tip when writing a macro: if you are not sure the AST you are building
looks correct, you may be tempted to use the ``dumpTree`` macro. But you can't
use it *inside* the macro you are writing or debugging. Instead, ``echo`` the
string generated by ``treeRepr``. If at the end of this example you add
``echo treeRepr(result)`` you should get the same output as with the
``dumpTree`` example (except that the first constant has an ``Empty`` type
node here, since the macro never specifies a type), but of course you can call
it at any point of the macro where you might be having trouble.