Update Readme

2026-07-11 19:59:32 +00:00 · 2015-01-18 11:47:05 -05:00
parent cc0d16c5ee
commit 39bc8c2bfb
1 changed files with 42 additions and 48 deletions
--- a/README.asciidoc
+++ b/README.asciidoc
@@ -1,13 +1,13 @@
-= N___ RegEx Library
+= NRE

 == What is NRE?

-A new regular expression library for Nim using PCRE to do the hard work.
+A regular expression library for Nim using PCRE to do the hard work.

 == Why?

 The http://nim-lang.org/re.html[re.nim] module that http://nim-lang.org/[Nim]
-provides in it's standard library is inadequate:
+provides in its standard library is inadequate:

 - It provides only a limited number of captures, while the underlying library
   (PCRE) allows an unlimited number.
@@ -21,50 +21,6 @@ provides in it's standard library is inadequate:

 == Documentation

-Creating a pattern is easy: `re"([0-9]+)"`. By default, the extended flag is
-passed in order to encourage readable expressions, so `[0-9]+` is equivalent to
-`[0-9] +  # foo`. If you'd like to pass your own flags, then `re(r"([0-9]+)",
-"<flags>")` will work. Here is a list of the available flags:
-
- - `8` - treat both the pattern and subject as UTF8
- - `9` - prevents the pattern from being interpreted as UTF, no matter what
- - `A` - as if the pattern had a `^` at the beginning
- - `E` - DOLLAR_ENDONLY
- - `f` - fails if there is not a match on the first line
- - `i` - case insensitive
- - `m` - multi-line, `^` and `$` match the beginning and end of lines, not of the
-   subject string
- - `N` - turn off auto-capture, `(?foo)` is necessary to capture.
- - `s` - `.` matches newline
- - `S` - study the pattern to hopefully improve performance. JIT is unspported at
-   the moment.
- - `U` - expressions are not greedy by default. `?` can be added to a qualifier
-   to make it greedy.
- - `u` - same as `8`
- - `W` - Unicode character properties
- - `X` - "Extra", character escapes without special meaning (`\w` vs. `\a`) are
-   errors
- - `x` - extended, comments (`#`) and newlines are ignored (extended)
- - `Y` - pcre.NO_START_OPTIMIZE,
- - `<cr>` - newlines are separated by `\r`
- - `<crlf>` - newlines are separated by `\r\n` (Windows default)
- - `<lf>` - newlines are separated by `\n` (UNIX default)
- - `<anycrlf>` - newlines are separated by any of the above
- - `<any>` - newlines are separated by any of the above and Unicode newlines:
-[quote, , man pcre]
-____
-single characters VT (vertical tab, U+000B), FF (form feed, U+000C), NEL
-(next line, U+0085), LS (line separator, U+2028), and PS (paragraph
-separator, U+2029). For the 8-bit library, the last two are recognized
-only in UTF-8 mode.
-____
- - `<bsr_anycrlf>` - `\R` matches CR, LF, or CRLF
- - `<bsr_unicode>` - `\R` matches any unicode newline
- - `<js>` - Javascript compatibility
-
-`Sx` is enabled by default in order to encourage use of whitespace for better
-readability.
-
 === Procedures

 [[proc-match]]
@@ -144,9 +100,47 @@ fields are as follows:
 === `Pattern`

 Represents the pattern that things are matched against, constructed with
-`initRegex(string)` or `re(string)`.
+`initRegex(string)` or `re(string)`. Examples: `re"foo"`, `re(r"foo # comment",
+"Sx<anycrlf>")`.

 `pattern: string` :: the string that was used to create the pattern.
 `captureCount: int` :: the number of captures that the pattern has.
 `captureNameId: Table[string, int]` :: a table from the capture names to
   their numeric id.
+
+==== Flags
+ - `8` - treat both the pattern and subject as UTF-8
+ - `9` - prevents the pattern from being interpreted as UTF, no matter what
+ - `A` - as if the pattern had a `^` at the beginning
+ - `E` - DOLLAR_ENDONLY
+ - `f` - fails if there is not a match on the first line
+ - `i` - case insensitive
+ - `m` - multi-line, `^` and `$` match the beginning and end of lines, not of the
+   subject string
+ - `N` - turn off auto-capture; only named groups such as `(?<foo>...)` capture.
+ - `s` - `.` matches newline
+ - `S` - study the pattern to hopefully improve performance. JIT is unsupported
+   at the moment.
+ - `U` - expressions are not greedy by default. `?` can be added to a quantifier
+   to make it greedy.
+ - `u` - same as `8`
+ - `W` - Unicode character properties
+ - `X` - "Extra", character escapes without special meaning (`\w` vs. `\a`) are
+   errors
+ - `x` - extended, whitespace and comments (`#`) in the pattern are ignored
+ - `Y` - pcre.NO_START_OPTIMIZE
+ - `<cr>` - newlines are separated by `\r`
+ - `<crlf>` - newlines are separated by `\r\n` (Windows default)
+ - `<lf>` - newlines are separated by `\n` (UNIX default)
+ - `<anycrlf>` - newlines are separated by any of the above
+ - `<any>` - newlines are separated by any of the above and Unicode newlines:
+[quote, , man pcre]
+____
+single characters VT (vertical tab, U+000B), FF (form feed, U+000C), NEL
+(next line, U+0085), LS (line separator, U+2028), and PS (paragraph
+separator, U+2029). For the 8-bit library, the last two are recognized
+only in UTF-8 mode.
+____
+ - `<bsr_anycrlf>` - `\R` matches CR, LF, or CRLF
+ - `<bsr_unicode>` - `\R` matches any unicode newline
+ - `<js>` - Javascript compatibility
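
Review note: the flag list above describes behaviour without showing it. The sketch below illustrates what `i`, `m`, `s`, `x`, and the greedy/lazy distinction behind `U` mean in practice. It uses Python's standard `re` module purely as a stand-in; it is not NRE and not PCRE, and the mapping from NRE flag letters to Python constants is an assumption made for illustration. Python has no equivalent of `U`, so the lazy quantifier is written out explicitly.

```python
import re

# Stand-in demo using Python's `re`, NOT the NRE API from this README.
subject = "Foo\nfoo bar"

# `i`: case-insensitive matching.
case_insensitive = re.findall(r"foo", subject, re.IGNORECASE)

# `m`: `^` and `$` match at line boundaries, not only at the subject's ends.
anchored_default = re.findall(r"^foo", subject)
anchored_multiline = re.findall(r"^foo", subject, re.MULTILINE)

# `s`: `.` also matches a newline.
dot_default = re.search(r"Foo.foo", subject)
dot_all = re.search(r"Foo.foo", subject, re.DOTALL)

# `x`: whitespace and `#` comments inside the pattern are ignored,
# so this pattern behaves like `[0-9]+`.
verbose_digits = re.compile(r"[0-9] +  # one or more digits", re.VERBOSE)
verbose_match = verbose_digits.fullmatch("2015")

# Greedy vs. lazy quantifiers (the distinction that `U` inverts in PCRE).
greedy = re.search(r"<.+>", "<a><b>").group()
lazy = re.search(r"<.+?>", "<a><b>").group()

print(case_insensitive, anchored_default, anchored_multiline)
print(dot_default is None, dot_all is not None, verbose_match is not None)
print(greedy, lazy)
```

The `x` case mirrors the README's own example `re(r"foo # comment", "Sx<anycrlf>")`: with extended mode on, the trailing comment and surrounding whitespace are not part of what gets matched.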