Flaviu Tamas ba3aac1b13 Bump version

image::web/logo.png["NRE Logo", width=auto, link="https://github.com/flaviut/nre"]

== What is NRE?

A regular expression library for Nim using PCRE to do the hard work.

== Why?

The http://nim-lang.org/re.html[re.nim] module that http://nim-lang.org/[Nim]
provides in its standard library is inadequate:

 - It provides only a limited number of captures, while the underlying library
   (PCRE) allows an unlimited number.
 - Instead of having one proc that returns both the bounds and substring, it
   has one for the bounds and another for the substring.
 - If the splitting regex is empty (`""`), then it returns the input string
   instead of following https://ideone.com/dDMjmz[Perl],
   http://jsfiddle.net/xtcbxurg/[JavaScript], and
   https://ideone.com/hYJuJ5[Java]'s precedent of returning a list of each
   character (`"123".split(re"") == @["1", "2", "3"]`).

== Documentation

=== Operations

[[proc-find]]
==== find(string, Regex, start = 0, endpos = -1): RegexMatch

Finds the first match of the pattern in the string, searching between the
`start` and `endpos` positions.

`start` :: The index at which to start matching. `|abc` is `0`; `a|bc`
   is `1`.
`endpos` :: The maximum index for a match; `-1` means the end of the string,
   otherwise it's an exclusive upper bound.
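
The `start`/`endpos` convention can be illustrated without a regex engine. The
sketch below searches for a plain substring instead of a pattern;
`literalFind` is a hypothetical helper, not part of NRE, and only mirrors the
bounds rules described above.

```nim
# Hypothetical helper, not part of NRE: searches for a literal substring
# while honouring the same `start`/`endpos` convention as `find(...)`.
proc literalFind(s, sub: string, start = 0, endpos = -1): int =
  ## Returns the index of the first occurrence of `sub` inside
  ## `s[start ..< stop]`, or -1 if there is none.
  let stop = if endpos == -1: s.len else: endpos  # exclusive upper bound
  for i in start .. stop - sub.len:
    if s[i ..< i + sub.len] == sub:
      return i
  result = -1

when isMainModule:
  echo literalFind("abcabc", "bc")              # 1
  echo literalFind("abcabc", "bc", start = 2)   # 4
  echo literalFind("abcabc", "bc", endpos = 2)  # -1: only "ab" is searched
```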

[[proc-match]]
==== match(string, Regex, start = 0, endpos = -1): RegexMatch

Like link:#proc-find[`find(...)`], but anchored to the start of the string.
This means that `"foo".match(re"f")` succeeds, but `"foo".match(re"o")` fails
and returns `nil`.
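
The anchoring idea can be sketched with plain substrings. `literalMatch` is
hypothetical, not part of NRE; anchoring at `start` (rather than always at
index 0) when `start` is non-zero is an assumption of this sketch, which this
README does not spell out.

```nim
# Hypothetical helper, not part of NRE: an anchored literal search.
# Unlike an unanchored search, the needle must begin exactly at `start`
# (anchoring at `start` rather than at index 0 is an assumption).
proc literalMatch(s, sub: string, start = 0, endpos = -1): bool =
  let stop = if endpos == -1: s.len else: endpos  # exclusive upper bound
  # `and` short-circuits, so the slice is only taken when it is in range.
  result = start + sub.len <= stop and s[start ..< start + sub.len] == sub

when isMainModule:
  echo literalMatch("foo", "f")             # true
  echo literalMatch("foo", "o")             # false: "o" is not at index 0
  echo literalMatch("foo", "o", start = 1)  # true
```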

[[iter-find]]
==== iterator findIter(string, Regex, start = 0, endpos = -1): RegexMatch

Works the same as link:#proc-find[`find(...)`], but finds every non-overlapping
match. `"2222".findAll(re"22")` is `@["22", "22"]`, not
`@["22", "22", "22"]`.

Arguments are the same as link:#proc-find[`find(...)`].

Variants:

 - `proc findAll(...)` returns a `seq[string]`
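
What "non-overlapping" means can be sketched with a literal needle: after each
hit, the search resumes at the end of that hit rather than one character
later. `literalFindIter` is a hypothetical iterator, not NRE's; it requires a
non-empty needle, which sidesteps the extra care a regex engine needs to step
past zero-width matches.

```nim
# Hypothetical helper, not part of NRE: yields the start index of every
# non-overlapping occurrence of a non-empty literal `sub` in `s`.
iterator literalFindIter(s, sub: string): int =
  assert sub.len > 0, "sketch only handles non-empty needles"
  var i = 0
  while i + sub.len <= s.len:
    if s[i ..< i + sub.len] == sub:
      yield i
      i += sub.len  # resume after the match, so matches never overlap
    else:
      inc i

when isMainModule:
  var starts: seq[int]
  for pos in literalFindIter("2222", "22"):
    starts.add pos
  echo starts  # @[0, 2]
```

For `"2222"` and `"22"` it yields `0` and `2`, which corresponds to the two
`"22"` matches described above.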

[[proc-split]]
==== split(string, Regex, maxsplit = -1): seq[string]

Splits the string with the given regex. This works according to the rules that
Perl and JavaScript use.

  - If the match is zero-width, then the string is still split:
    `"123".split(re"") == @["1", "2", "3"]`.
  - If the pattern has a capture in it, the captured text is inserted into the
    result between the two pieces it separates:
    `"12".split(re"(\d)") == @["", "1", "", "2", ""]`.
  - If `maxsplit != -1`, then the string will be split at most `maxsplit - 1`
    times, so there will be at most `maxsplit` strings in the output seq:
    `"1.2.3".split(re"\.", maxsplit = 2) == @["1", "2.3"]`.
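
The zero-width and `maxsplit` rules above can be sketched with a literal
separator in place of a regex. `literalSplit` is hypothetical and not NRE's
implementation; because a literal separator has no capture groups, the capture
rule is not modelled.

```nim
# Hypothetical helper, not part of NRE: splits on a literal separator using
# the zero-width and `maxsplit` rules listed above (no capture handling).
proc literalSplit(s, sep: string, maxsplit = -1): seq[string] =
  var pieceStart = 0
  # A zero-width separator "matches" between characters, never before the first.
  var i = if sep.len == 0: 1 else: 0
  # `max(sep.len, 1)` also stops a zero-width separator after the last character.
  while i + max(sep.len, 1) <= s.len and (maxsplit == -1 or result.len < maxsplit - 1):
    if s[i ..< i + sep.len] == sep:
      result.add s[pieceStart ..< i]
      pieceStart = i + sep.len
      i = if sep.len == 0: i + 1 else: pieceStart
    else:
      inc i
  result.add s[pieceStart .. ^1]  # whatever is left is the final piece

when isMainModule:
  doAssert literalSplit("123", "") == @["1", "2", "3"]
  doAssert literalSplit("1.2.3", ".", maxsplit = 2) == @["1", "2.3"]
```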

[[proc-replace]]
==== replace(string, Regex, sub): string

Replaces each match of Regex in the string with `sub`, which should never be
or return `nil`.

If `sub` is a `proc (RegexMatch): string`, then it is executed with each match
and the return value is the replacement value.

If `sub` is a `proc (string): string`, then it is executed with the full text
of the match and the return value is the replacement value.

If `sub` is a string, the syntax is as follows:

- `$$` - literal `$`
- `$123` - capture number `123`
- `$foo` - named capture `foo`
- `${foo}` - same as above
- `$1$#` - first and second captures
- `$#` - first capture
- `$0` - full match
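
A minimal sketch of how the template syntax above could be expanded, assuming
the captures are already available as strings. `expandSub` and its
`caps`/`names` parameters are hypothetical, not NRE's internals: `caps[0]`
holds the full match and `caps[n]` capture `n`, following the numbering of
`$0` and `$1` in templates (which differs from the 0-based `captures[]`
accessor of `RegexMatch`). Malformed templates are not validated beyond an
unterminated `${`.

```nim
import std/[strutils, tables]

# Hypothetical sketch, not NRE's implementation.
# caps[0] is the full match and caps[n] is capture n (as `$0`/`$n` expect);
# names maps a capture name to its index in caps.
proc expandSub(sub: string, caps: seq[string], names: Table[string, int]): string =
  var i = 0
  var next = 1  # the capture that the next `$#` refers to
  while i < sub.len:
    if sub[i] != '$' or i + 1 >= sub.len:
      result.add sub[i]  # ordinary character (or a trailing lone `$`)
      inc i
      continue
    let c = sub[i + 1]
    if c == '$':                     # `$$` -> literal `$`
      result.add '$'
      i += 2
    elif c == '#':                   # `$#` -> next capture in sequence
      result.add caps[next]
      inc next
      i += 2
    elif c in Digits:                # `$123` -> capture by number
      var j = i + 1
      var n = 0
      while j < sub.len and sub[j] in Digits:
        n = n * 10 + (ord(sub[j]) - ord('0'))
        inc j
      result.add caps[n]
      next = n + 1                   # so `$1$#` continues with capture 2
      i = j
    elif c == '{':                   # `${foo}` -> named capture
      let close = sub.find('}', i + 2)
      doAssert close >= 0, "unterminated ${...} in template"
      result.add caps[names[sub[i + 2 ..< close]]]
      i = close + 1
    elif c in IdentStartChars:       # `$foo` -> named capture
      var j = i + 1
      while j < sub.len and sub[j] in IdentChars:
        inc j
      result.add caps[names[sub[i + 1 ..< j]]]
      i = j
    else:                            # not a recognised escape: keep the `$`
      result.add '$'
      inc i
```

For instance, `expandSub("$1$#", @["ab", "a", "b"], initTable[string, int]())`
returns `"ab"`: `$1` picks capture 1 and the following `$#` continues with
capture 2.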

[[proc-escapere]]
==== escapeRe(string): string

Escapes the string so that none of its characters has a special meaning, i.e.
the resulting pattern matches the original string literally. Incompatible with
the Extra flag (`X`).
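
NRE's exact escaping rules aren't spelled out here, so the following is only a
sketch of one common approach, similar to Perl's `quotemeta` for ASCII: put a
backslash in front of every ASCII character that is not a letter, digit or
underscore. `quoteMeta` is a hypothetical name, and passing non-ASCII bytes
through untouched (so UTF-8 text stays intact) is a choice of this sketch.

```nim
import std/strutils

# Hypothetical sketch, not NRE's `escapeRe`: backslash-escape every ASCII
# character outside [A-Za-z0-9_]; bytes >= 128 (e.g. UTF-8) pass through.
proc quoteMeta(s: string): string =
  for c in s:
    if ord(c) < 128 and c notin IdentChars:
      result.add '\\'
    result.add c

when isMainModule:
  echo quoteMeta("1+1=2?")  # 1\+1\=2\?
```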

=== RegexMatch

Represents the result of matching a pattern against a string. On failure, it
is `nil`. The available fields are as follows:

`pattern: Regex` :: the pattern that is being matched
`str: string` :: the string that was matched against
`captures[]: string` :: the string value of whatever was captured
at that id. If the id is invalid, then the behavior is undefined. If the id is
`-1`, then the whole match is returned. If the given capture was not matched,
`nil` is returned.
 - `"abc".match(re"(\w)").captures[0] == "a"`
 - `"abc".match(re"(?<letter>\w)").captures["letter"] == "a"`
 - `"abc".match(re"(\w)\w").captures[-1] == "ab"`
`captureBounds[]: Option[Slice[int]]` :: gets the bounds of the
given capture according to the same rules as the above. If the capture is not
filled, then `None` is returned. The upper bound is exclusive, the lower bound
is inclusive.
 - `"abc".match(re"(\w)").captureBounds[0] == 0..1`
 - `"abc".match(re"").captureBounds[-1] == 0..0`
 - `"abc".match(re"abc").captureBounds[-1] == 0..3`
`match: string` :: the full text of the match.
`matchBounds: Slice[int]` :: the bounds of the match, as in `captureBounds[]`
`(captureBounds|captures).toTable` :: returns a table with each named capture
as a key.
`(captureBounds|captures).toSeq` :: returns all the captures by their number.
`$: string` :: same as `match`
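
Because the upper bound of these slices is exclusive, indexing the subject
directly with a bounds value would include one character too many, since Nim's
own `a..b` slicing is inclusive at both ends. A small hypothetical helper,
`boundsText` (not part of NRE), that turns bounds into text under the
convention stated above:

```nim
# Hypothetical helper, not part of NRE. Per this README, bounds such as
# `captureBounds[0] == 0..1` are exclusive at the top, while Nim's own
# `s[a..b]` is inclusive at both ends, so slice with `..<` instead.
proc boundsText(subject: string, bounds: Slice[int]): string =
  subject[bounds.a ..< bounds.b]

when isMainModule:
  doAssert boundsText("abc", 0..1) == "a"    # like captures[0] for re"(\w)"
  doAssert boundsText("abc", 0..0) == ""     # empty match at the start
  doAssert boundsText("abc", 0..3) == "abc"  # whole-string match
  doAssert "abc"[0..1] == "ab"               # plain slicing would be off by one
```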

=== Pattern

Represents the pattern that things are matched against, constructed with
`re(string, string)`. Examples: `re"foo"`, `re(r"foo # comment",
"x<anycrlf>")`. 

`pattern: string` :: the string that was used to create the pattern.
`captureCount: int` :: the number of captures that the pattern has.
`captureNameId: Table[string, int]` :: a table from the capture names to
   their numeric id.
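
The relationship between `captureNameId` and named capture lookup can be
sketched without calling NRE at all. The table and the capture list below are
written out by hand for the pattern `re"(?<year>\d+)-(?<month>\d+)"` matched
against `"2015-01"`, assuming 0-based capture ids as in `captures[0]` above;
the loop is roughly what a `captures.toTable` result would contain.

```nim
import std/tables

# Hand-written, hypothetical values (not produced by NRE) for
# re"(?<year>\d+)-(?<month>\d+)" matched against "2015-01".
let captureNameId = {"year": 0, "month": 1}.toTable
let numbered = @["2015", "01"]   # captures by number, 0-based

# Each name is looked up through its numeric id: one entry per named capture.
var named = initTable[string, string]()
for name, id in captureNameId:
  named[name] = numbered[id]

doAssert named["year"] == "2015"
doAssert named["month"] == "01"
```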

==== Flags
 - `8` - treat both the pattern and subject as UTF-8
 - `9` - prevents the pattern from being interpreted as UTF, no matter what
 - `A` - as if the pattern had a `^` at the beginning
 - `E` - `$` matches only at the very end of the subject, not before a final
   newline (PCRE's `DOLLAR_ENDONLY`)
 - `f` - fails if there is not a match on the first line
 - `i` - case insensitive
 - `m` - multi-line, `^` and `$` match the beginning and end of lines, not of the
   subject string
 - `N` - turn off auto-capture; only named groups such as `(?<foo>...)`
   capture.
 - `s` - `.` matches newline
 - `U` - quantifiers are not greedy by default; `?` can be added to a
   quantifier to make it greedy.
 - `u` - same as `8`
 - `W` - Unicode character properties; `\w` matches `к`.
 - `X` - "Extra": a backslash followed by a letter that has no special meaning
   is an error instead of matching that letter literally
 - `x` - extended: whitespace and `#` comments in the pattern are ignored
 - `Y` - disables PCRE's start-of-match optimizations (`NO_START_OPTIMIZE`)
 - `<cr>` - newlines are separated by `\r`
 - `<crlf>` - newlines are separated by `\r\n` (Windows default)
 - `<lf>` - newlines are separated by `\n` (UNIX default)
 - `<anycrlf>` - newlines are separated by any of the above
 - `<any>` - newlines are separated by any of the above and Unicode newlines:
[quote, , man pcre]
____
single characters VT (vertical tab, U+000B), FF (form feed, U+000C), NEL
(next line, U+0085), LS (line separator, U+2028), and PS (paragraph
separator, U+2029). For the 8-bit library, the last two are recognized
only in UTF-8 mode.
____
 - `<bsr_anycrlf>` - `\R` matches CR, LF, or CRLF
 - `<bsr_unicode>` - `\R` matches any Unicode newline
 - `<js>` - JavaScript compatibility
 - `<no_study>` - turn off studying; study is enabled by default
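
The flags argument of `re(...)` mixes single-character flags with bracketed
option names, as in `re(r"foo # comment", "x<anycrlf>")`. The sketch below only
illustrates how such a string decomposes; `splitFlags` is hypothetical, and how
NRE itself parses the string (and which combinations it rejects) is not shown.

```nim
import std/strutils

# Hypothetical sketch, not NRE's parser: each bare character is one flag,
# each `<...>` group is one named option.
proc splitFlags(flags: string): seq[string] =
  var i = 0
  while i < flags.len:
    if flags[i] == '<':
      let close = flags.find('>', i + 1)
      doAssert close >= 0, "unterminated <...> option"
      result.add flags[i + 1 ..< close]
      i = close + 1
    else:
      result.add($flags[i])
      inc i

when isMainModule:
  doAssert splitFlags("x<anycrlf>") == @["x", "anycrlf"]
  doAssert splitFlags("im<no_study>") == @["i", "m", "no_study"]
```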

== Other Notes

By default, NRE compiles its own PCRE. If this is undesirable, pass
`-d:pcreDynlib` to use whatever dynamic library is available on the system.
This may have unexpected consequences if the dynamic library doesn't have
certain features enabled (for example, UTF-8 or Unicode property support,
which the `8` and `W` flags rely on).