From 38eb67de835db986105a7340def055cad704f697 Mon Sep 17 00:00:00 2001 From: Grzegorz Adam Hankiewicz Date: Fri, 15 Nov 2013 23:04:21 +0100 Subject: [PATCH 1/9] Expands tutorial macro section with step by step guide. --- doc/tut2.txt | 265 +++++++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 258 insertions(+), 7 deletions(-) diff --git a/doc/tut2.txt b/doc/tut2.txt index e1e36bfc41..4d8d0be157 100644 --- a/doc/tut2.txt +++ b/doc/tut2.txt @@ -699,15 +699,22 @@ once. Macros ====== -Macros enable advanced compile-time code transformations, but they -cannot change Nimrod's syntax. However, this is no real restriction because -Nimrod's syntax is flexible enough anyway. +Macros enable advanced compile-time code transformations, but they cannot +change Nimrod's syntax. However, this is no real restriction because Nimrod's +syntax is flexible enough anyway. Macros have to be implemented in pure Nimrod +code if `foreign function interface (FFI) +`_ is not enabled in the compiler, but +other than that restriction (which at some point in the future will go away) +you can write any kind of Nimrod code and the compiler will run it at compile +time. -To write a macro, one needs to know how the Nimrod concrete syntax is converted -to an abstract syntax tree (AST). The AST is documented in the -`macros `_ module. +There are two ways to write a macro, either *generating* Nimrod source code and +letting the compiler parse it, or creating manually an abstract syntax tree +(AST) which you feed to the compiler. In order to build the AST one needs to +know how the Nimrod concrete syntax is converted to an abstract syntax tree +(AST). The AST is documented in the `macros `_ module. -There are two ways to invoke a macro: +Once your macro is finished, there are two ways to invoke it: (1) invoking a macro like a procedure call (`expression macros`:idx:) (2) invoking a macro with the special ``macrostmt`` syntax (`statement macros`:idx:) @@ -796,3 +803,247 @@ Term rewriting macros Term rewriting macros can be used to enhance the compilation process with user defined optimizations; see this `document `_ for further information. + + +Building your first macro +------------------------- + +To give a footstart to writing macros we will show now how to turn your typical +dynamic code into something that compiles statically. For the exercise we will +use the following snippet of code as the starting point: + +.. code-block:: nimrod + + import strutils, tables + + proc readCfgAtRuntime(cfgFilename: string): TTable[string, string] = + let + inputString = readFile(cfgFilename) + var + rawLines = split(inputString, {char(0x0a), char(0x0d)}) + source = "" + + result = initTable[string, string]() + for line in rawLines: + var chunks = split(line, ',') + if chunks.len != 2: + quit("Input needs comma split values, got: " & line) + result[chunks[0]] = chunks[1] + + if result.len < 1: quit("Input file empty!") + + let info = readCfgAtRuntime("data.cfg") + + when isMainModule: + echo info["licenseOwner"] + echo info["licenseKey"] + echo info["version"] + +Presumably this snippet of code could be used in a commercial software, reading +a configuration file to display information about the person who bought the +software. This external file would be generated by an online web shopping cart +to be included along the program containing the license information:: + + version,1.1 + licenseOwner,Hyori Lee + licenseKey,M1Tl3PjBWO2CC48m + +The ``readCfgAtRuntime`` proc will open the given filename and return a +``TTable`` from the `tables module `_. The parsing of the file is +done (without much care for handling invalid data or corner cases) using the +``split`` proc from the `strutils module `_. There are many +things which can fail; mind the purpose is explaining how to make this run at +compile time, not how to properly implement a DRM scheme. + +The reimplementation of this code as a compile time proc will allow us to get +rid of the ``data.cfg`` file we would need to distribute along the binary, plus +if the information is really constant, it doesn't make from a logical point of +view to have it *mutable* in a global variable, it would be better if it was a +constant. Finally, and likely the most valuable feature, we can implement some +verification at compile time. You could think if this as a better *unit +testing*, since it is impossible to obtain a binary unless everything is +correct, preventing you to ship to users a broken program which won't start +because a small critical file is missing or its contents changed by mistake to +something invalid. + + +Generating source code +++++++++++++++++++++++ + +Our first attempt will start by modifying the program to generate a compile +time string with the *generated source code*, which we then pass to the +``parseStmt`` proc from the `macros module `_. Here is the +modified source code implementing the macro: + +.. code-block:: nimrod + import macros, strutils + + macro readCfgAndBuildSource(cfgFilename: string): stmt = + let + inputString = slurp(cfgFilename.strVal) + var + rawLines = split(inputString, {char(0x0a), char(0x0d)}) + source = "" + + for line in rawLines: + var chunks = split(line, ',') + if chunks.len != 2: + error("Input needs comma split values, got: " & line) + source &= "const cfg" & chunks[0] & "= \"" & chunks[1] & "\"\n" + + if source.len < 1: error("Input file empty!") + result = parseStmt(source) + + readCfgAndBuildSource("data.cfg") + + when isMainModule: + echo cfglicenseOwner + echo cfglicenseKey + echo cfgversion + +The good news is not much has changed! First, we need to change the handling of +the input parameter. In the dynamic version the ``readCfgAtRuntime`` proc +receives a string parameter. However, in the macro version it is also declared +as string, but this is the *outside* interface of the macro. When the macro is +run, it actually gets a ``PNimrodNode`` object instead of a string, and we have +to call the ``strVal`` proc from the `macros module `_ to obtain +the string being passed in to the macro. + +Second, we cannot use the ``readFile`` proc from the `system module +`_ due to FFI restriction at compile time. If we try to use this +proc, or any other which depends on FFI, the compiler will error with the +message ``cannot evaluate`` and a dump of the macro's source code, along with a +stack trace where the compiler reached before bailing out. We can get around +this limitation by using the ``slurp`` proc from the `system module +`_, which was precisely made for compilation time (just like +``gorge`` which executes an external program and captures its output). + +The interesting thing is that our macro does not return a runtime ``TTable`` +object. Instead, it builds up Nimrod source code into the ``source`` variable. +For each line of the configuration file a ``const`` variable will be generated. +To avoid conflicts we prefix these variables with ``cfg``. In essence, what the +compiler is doing is replacing the line calling the macro with the following +snippet of code: + +.. code-block:: nimrod + const cfgversion= "1.1" + const cfglicenseOwner= "Hyori Lee" + const cfglicenseKey= "M1Tl3PjBWO2CC48m" + +You can verify this yourself adding the line ``echo source`` somewhere at the +end of the macro and compiling the program. Another difference is that instead +of calling the usual ``quit`` proc to abort (which we could still call) this +version calls the ``error`` proc. The ``error`` proc has the same behavior as +``quit`` but will dump also the source and file line information where the +error happened, making it easier for the programmer to find where compilation +failed. In this situation it would point to the line invoking the macro, but +**not** the line of ``data.cfg`` we are processing, that's something the macro +itself would need to control. + + +Generating AST by hand +++++++++++++++++++++++ + +To generate an AST we would need to intimately know the structures used by the +Nimrod compiler exposed in the `macros module `_, which at first +look seems a daunting task. But we can use a helper shortcut the ``dumpTree`` +macro, which is used as a statement macro instead of an expression macro. +Since we know that we want to generate a bunch of ``const`` symbols we can +create the following source file and compile it to see what the compiler +*expects* from us: + +.. code-block:: nimrod + import macros + + dumpTree: + const cfgversion: string = "1.1" + const cfglicenseOwner= "Hyori Lee" + const cfglicenseKey= "M1Tl3PjBWO2CC48m" + +During compilation of the source code we should see the following lines in the +output (again, since this is a macro, compilation is enough, you don't have to +run any binary):: + + StmtList + ConstSection + ConstDef + Ident !"cfgversion" + Ident !"string" + StrLit 1.1 + ConstSection + ConstDef + Ident !"cfglicenseOwner" + Empty + StrLit Hyori Lee + ConstSection + ConstDef + Ident !"cfglicenseKey" + Empty + StrLit M1Tl3PjBWO2CC48m + +With this output we have a better idea of what kind of input the compiler +expects. We need to generate a list of statements. For each constant the source +code generates a ``ConstSection`` and a ``ConstDef``. If we were to move all +the constants to a single ``const`` block we would see only a single +``ConstSection`` with three children. + +Maybe you didn't notice, but in the ``dumpTree`` example the first constant +explicitly specifies the type of the constant. That's why in the tree output +the two last constants have their second child ``Empty`` but the first has a +string identifier. So basically a ``const`` definition is made up from an +identifier, optionally a type (can be an *empty* node) and the value. Armed +with this knowledge, let's look at the finished version of the AST building +macro: + +.. code-block:: nimrod + import macros, strutils + + macro readCfgAndBuildAST(cfgFilename: string): stmt = + let + inputString = slurp(cfgFilename.strVal) + var + rawLines = split(inputString, {char(0x0a), char(0x0d)}) + + result = newNimNode(nnkStmtList) + for line in rawLines: + var chunks = split(line, ',') + if chunks.len != 2: + error("Input needs comma split values, got: " & line) + var + section = newNimNode(nnkConstSection) + constDef = newNimNode(nnkConstDef) + constDef.add(newIdentNode("cfg" & chunks[0])) + constDef.add(newEmptyNode()) + constDef.add(newStrLitNode(chunks[1])) + section.add(constDef) + result.add(section) + + if result.len < 1: error("Input file empty!") + + readCfgAndBuildAST("data.cfg") + + when isMainModule: + echo cfglicenseOwner + echo cfglicenseKey + echo cfgversion + +Since we are building on the previous example generating source code, we will +only mention the differences to it. Instead of creating a temporary ``string`` +variable and writing into it source code as if it were written *by hand*, we +use the ``result`` variable directly and create a statement list node +(``nnkStmtList``) which will hold our children. + +For each input line we have to create a constant definition (``nnkConstDef``) +and wrap it inside a constant section (``nnkConstSection``). Once these +variables are created, we fill them hierarchichally like the previous AST dump +tree showed: the constant definition is a child of the section definition, and +the constant definition has an identifier node, an empty node (we let the +compiler figure out the type), and a string literal with the value. + +A last tip when writing a macro: if you are not sure the AST you are building +looks ok, you may be tempted to use the ``dumpTree`` macro. But you can't use +it *inside* the macro you are writting/debugging. Instead ``echo`` the string +generated by ``treeRepr``. If at the end of the this example you add ``echo +treeRepr(result)`` you should get the same output as using the ``dumpTree`` +macro, but of course you can call that at any point of the macro where you +might be having troubles. From c3e7a970c6f331e730243cc9234b83faa9c7b611 Mon Sep 17 00:00:00 2001 From: Grzegorz Adam Hankiewicz Date: Tue, 19 Nov 2013 22:33:02 +0100 Subject: [PATCH 2/9] Modifies example to use splitLines. --- doc/tut2.txt | 16 +++++++++------- 1 file changed, 9 insertions(+), 7 deletions(-) diff --git a/doc/tut2.txt b/doc/tut2.txt index 4d8d0be157..3e10d172c6 100644 --- a/doc/tut2.txt +++ b/doc/tut2.txt @@ -820,11 +820,12 @@ use the following snippet of code as the starting point: let inputString = readFile(cfgFilename) var - rawLines = split(inputString, {char(0x0a), char(0x0d)}) source = "" result = initTable[string, string]() - for line in rawLines: + for line in inputString.splitLines: + # Ignore empty lines + if line.len < 1: continue var chunks = split(line, ',') if chunks.len != 2: quit("Input needs comma split values, got: " & line) @@ -882,10 +883,11 @@ modified source code implementing the macro: let inputString = slurp(cfgFilename.strVal) var - rawLines = split(inputString, {char(0x0a), char(0x0d)}) source = "" - for line in rawLines: + for line in inputString.splitLines: + # Ignore empty lines + if line.len < 1: continue var chunks = split(line, ',') if chunks.len != 2: error("Input needs comma split values, got: " & line) @@ -1001,11 +1003,11 @@ macro: macro readCfgAndBuildAST(cfgFilename: string): stmt = let inputString = slurp(cfgFilename.strVal) - var - rawLines = split(inputString, {char(0x0a), char(0x0d)}) result = newNimNode(nnkStmtList) - for line in rawLines: + for line in inputString.splitLines: + # Ignore empty lines + if line.len < 1: continue var chunks = split(line, ',') if chunks.len != 2: error("Input needs comma split values, got: " & line) From 3a494f5d8d28414fc8df091c53b43ca666d229a2 Mon Sep 17 00:00:00 2001 From: Grzegorz Adam Hankiewicz Date: Tue, 19 Nov 2013 22:35:05 +0100 Subject: [PATCH 3/9] Corrects grammar and span of italics in text. --- doc/tut2.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/tut2.txt b/doc/tut2.txt index 3e10d172c6..f66a5135d1 100644 --- a/doc/tut2.txt +++ b/doc/tut2.txt @@ -861,7 +861,7 @@ rid of the ``data.cfg`` file we would need to distribute along the binary, plus if the information is really constant, it doesn't make from a logical point of view to have it *mutable* in a global variable, it would be better if it was a constant. Finally, and likely the most valuable feature, we can implement some -verification at compile time. You could think if this as a better *unit +verification at compile time. You could think of this as a *better unit testing*, since it is impossible to obtain a binary unless everything is correct, preventing you to ship to users a broken program which won't start because a small critical file is missing or its contents changed by mistake to From 0a953de3a819de8d0a14943054427924f31ff05e Mon Sep 17 00:00:00 2001 From: Grzegorz Adam Hankiewicz Date: Tue, 26 Nov 2013 13:34:54 +0100 Subject: [PATCH 4/9] Adds to tutorial info about unpacking tuples. --- doc/tut1.txt | 40 +++++++++++++++++++++++++++++++++++++--- 1 file changed, 37 insertions(+), 3 deletions(-) diff --git a/doc/tut1.txt b/doc/tut1.txt index 5c1cdb52e5..b46fb2c686 100644 --- a/doc/tut1.txt +++ b/doc/tut1.txt @@ -189,9 +189,18 @@ to a storage location: var x = "abc" # introduces a new variable `x` and assigns a value to it x = "xyz" # assigns a new value to `x` -``=`` is the *assignment operator*. The assignment operator cannot -be overloaded, overwritten or forbidden, but this might change in a future -version of Nimrod. +``=`` is the *assignment operator*. The assignment operator cannot be +overloaded, overwritten or forbidden, but this might change in a future version +of Nimrod. You can declare multiple variables with a single assignment +statement and all the variables will have the same value: + +.. code-block:: + var x, y = 3 # assigns 3 to the variables `x` and `y` + echo "x ", x # outputs "x 3" + echo "y ", y # outputs "y 3" + x = 42 # changes `x` to 42 without changing `y` + echo "x ", x # outputs "x 42" + echo "y ", y # outputs "y 3" Constants @@ -1352,6 +1361,31 @@ Even though you don't need to declare a type for a tuple to use it, tuples created with different field names will be considered different objects despite having the same field types. +Tuples can be *unpacked* during variable assignment. This can be handy to +assign directly the fields of the tuples to individually named variables. An +example of this is the ``splitFile`` proc from the `os module `_ which +returns the directory, name and extension of a path at the same time. For tuple +unpacking to work you have to use parenthesis around the values you want to +assign the unpacking to, otherwise you will be assigning the same value to all +the individual variables! Example: + +.. code-block:: nimrod + + import os + + let + path = "usr/local/nimrodc.html" + (dir, name, ext) = splitFile(path) + baddir, badname, badext = splitFile(path) + echo dir # outputs `usr/local` + echo name # outputs `nimrodc` + echo ext # outputs `.html` + # All the following output the same line: + # `(dir: usr/local, name: nimrodc, ext: .html)` + echo baddir + echo badname + echo badext + Reference and pointer types --------------------------- From f91a34c9e6ab91b57627803631cf099cd11e32ab Mon Sep 17 00:00:00 2001 From: Grzegorz Adam Hankiewicz Date: Sat, 30 Nov 2013 21:03:41 +0100 Subject: [PATCH 5/9] Adds an example to htmlparser showing how to save changes. --- lib/pure/htmlparser.nim | 28 +++++++++++++++++++++++++++- 1 file changed, 27 insertions(+), 1 deletion(-) diff --git a/lib/pure/htmlparser.nim b/lib/pure/htmlparser.nim index d60d2e5830..c39f5ff97d 100644 --- a/lib/pure/htmlparser.nim +++ b/lib/pure/htmlparser.nim @@ -17,11 +17,37 @@ ## ## echo loadHtml("mydirty.html") ## -## ## Every tag in the resulting tree is in lower case. ## ## **Note:** The resulting ``PXmlNode`` already uses the ``clientData`` field, ## so it cannot be used by clients of this library. +## +## Example: Transforming hyperlinks +## ================================ +## +## This code demonstrates how you can iterate over all the tags in an HTML file +## and write back the modified version. In this case we look for hyperlinks +## ending with the extension ``.rst`` and convert them to ``.html``. +## +## .. code-block:: nimrod +## +## import htmlparser +## import xmltree # To use '$' for PXmlNode +## import strtabs # To access PXmlAttributes +## import os # To use splitFile +## import strutils # To use cmpIgnoreCase +## +## proc transformHyperlinks() = +## let html = loadHTML("input.html") +## +## for a in html.findAll("a"): +## let href = a.attrs["href"] +## if not href.isNil: +## let (dir, filename, ext) = splitFile(href) +## if cmpIgnoreCase(ext, ".rst") == 0: +## a.attrs["href"] = dir / filename & ".html" +## +## writeFile("output.html", $html) import strutils, streams, parsexml, xmltree, unicode, strtabs From d1284ff33d936ed9ab67db2cb85571d3c37e8feb Mon Sep 17 00:00:00 2001 From: Grzegorz Adam Hankiewicz Date: Sun, 1 Dec 2013 21:07:50 +0100 Subject: [PATCH 6/9] Mentions tuple unpacking only works in var/let blocks. --- doc/tut1.txt | 28 +++++++++++++++++++++------- 1 file changed, 21 insertions(+), 7 deletions(-) diff --git a/doc/tut1.txt b/doc/tut1.txt index b46fb2c686..2070c69d60 100644 --- a/doc/tut1.txt +++ b/doc/tut1.txt @@ -1361,13 +1361,13 @@ Even though you don't need to declare a type for a tuple to use it, tuples created with different field names will be considered different objects despite having the same field types. -Tuples can be *unpacked* during variable assignment. This can be handy to -assign directly the fields of the tuples to individually named variables. An -example of this is the ``splitFile`` proc from the `os module `_ which -returns the directory, name and extension of a path at the same time. For tuple -unpacking to work you have to use parenthesis around the values you want to -assign the unpacking to, otherwise you will be assigning the same value to all -the individual variables! Example: +Tuples can be *unpacked* during variable assignment (and only then!). This can +be handy to assign directly the fields of the tuples to individually named +variables. An example of this is the ``splitFile`` proc from the `os module +`_ which returns the directory, name and extension of a path at the +same time. For tuple unpacking to work you have to use parenthesis around the +values you want to assign the unpacking to, otherwise you will be assigning the +same value to all the individual variables! Example: .. code-block:: nimrod @@ -1386,6 +1386,20 @@ the individual variables! Example: echo badname echo badext +Tuple unpacking **only** works in ``var`` or ``let`` blocks. The following code +won't compile: + +.. code-block:: nimrod + + import os + + var + path = "usr/local/nimrodc.html" + dir, name, ext = "" + + (dir, name, ext) = splitFile(path) + # --> Error: '(dir, name, ext)' cannot be assigned to + Reference and pointer types --------------------------- From dc9e17503ea38ab2b5cf1f7675cae234f2c9dc3a Mon Sep 17 00:00:00 2001 From: Grzegorz Adam Hankiewicz Date: Mon, 2 Dec 2013 20:50:56 +0100 Subject: [PATCH 7/9] Makes htmlparser handle whitespace. Refs #694. Without the flag, htmlparser will ignore some significant whitespace in HTML files. A more correct fix would be to not reuse the xml parser since the rules for HTML are slightly different, but this will do for the moment. --- lib/pure/htmlparser.nim | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lib/pure/htmlparser.nim b/lib/pure/htmlparser.nim index d60d2e5830..7bcc501a7a 100644 --- a/lib/pure/htmlparser.nim +++ b/lib/pure/htmlparser.nim @@ -528,7 +528,7 @@ proc parseHtml*(s: PStream, filename: string, ## parses the XML from stream `s` and returns a ``PXmlNode``. Every ## occured parsing error is added to the `errors` sequence. var x: TXmlParser - open(x, s, filename, {reportComments}) + open(x, s, filename, {reportComments, reportWhitespace}) next(x) # skip the DOCTYPE: if x.kind == xmlSpecial: next(x) From e145231a1d24676da07731a3d57a72dbc62c42a4 Mon Sep 17 00:00:00 2001 From: Erik O'Leary Date: Sun, 1 Dec 2013 17:29:28 -0600 Subject: [PATCH 8/9] Updated cfg file processing No longer look at deprecated file.cfg, compiler will only look at file.nimrod.cfg --- compiler/nimconf.nim | 5 ----- 1 file changed, 5 deletions(-) diff --git a/compiler/nimconf.nim b/compiler/nimconf.nim index 507812d9c8..7ec566a01f 100644 --- a/compiler/nimconf.nim +++ b/compiler/nimconf.nim @@ -243,11 +243,6 @@ proc LoadConfigs*(cfg: string) = readConfigFile(pd / cfg) if gProjectName.len != 0: - var conffile = changeFileExt(gProjectFull, "cfg") - if conffile != pd / cfg and existsFile(conffile): - readConfigFile(conffile) - rawMessage(warnConfigDeprecated, conffile) - # new project wide config file: readConfigFile(changeFileExt(gProjectFull, "nimrod.cfg")) From b5ac23477157ed8bcdac854d3ac000ace7aa8942 Mon Sep 17 00:00:00 2001 From: onionhammer Date: Sun, 1 Dec 2013 17:30:09 -0600 Subject: [PATCH 9/9] Renamed nimrod.cfg to nimrod.nimrod.cfg --- compiler/{nimrod.cfg => nimrod.nimrod.cfg} | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename compiler/{nimrod.cfg => nimrod.nimrod.cfg} (100%) diff --git a/compiler/nimrod.cfg b/compiler/nimrod.nimrod.cfg similarity index 100% rename from compiler/nimrod.cfg rename to compiler/nimrod.nimrod.cfg