Merge remote 'nre' into add-nre

* nre-proj/master: (132 commits)
  Change to options module
  Reweave readme
  Better handle errors
  Update documentation
  Change flags to inline
  Improve performance
  Add tests for empty or non-empty match
  Fix skipping an empty match at the end
  Add longer flags
  Fix getinfo overflows
  Use docweave
  Convert readme to RST
  Fix result shadowing warning
  Throw an exception when replacing with a nil value
  Fix potential buffer overflow
  Fix zero-length matches for multibyte characters
  Make splitting an empty string give 1 empty result
  Change endpos to inclusive
  Change endpos default from -1 to int.high
  Change capture upper bounds to inclusive
  ...
Flaviu Tamas
2015-05-26 19:05:43 -04:00
47 changed files with 46616 additions and 0 deletions

9
lib/impure/nre/.gitignore vendored Normal file

@@ -0,0 +1,9 @@
# all executables
*
!*/
!*.*
*.exe
# Wildcard patterns.
*.swp
nimcache

19
lib/impure/nre/LICENCE Normal file

@@ -0,0 +1,19 @@
Copyright (c) 2015 Flaviu Tamas
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.

258
lib/impure/nre/README.rst Normal file

@@ -0,0 +1,258 @@
What is NRE?
============
A regular expression library for Nim using PCRE to do the hard work.
Why?
----
The `re.nim <http://nim-lang.org/re.html>`__ module that
`Nim <http://nim-lang.org/>`__ provides in its standard library is
inadequate:
- It provides only a limited number of captures, while the underlying
library (PCRE) allows an unlimited number.
- Instead of having one proc that returns both the bounds and
substring, it has one for the bounds and another for the substring.
- If the splitting regex is empty (``""``), then it returns the input
string instead of following `Perl <https://ideone.com/dDMjmz>`__,
`Javascript <http://jsfiddle.net/xtcbxurg/>`__, and
`Java <https://ideone.com/hYJuJ5>`__'s precedent of returning a list
of each character (``"123".split(re"") == @["1", "2", "3"]``).
Other Notes
-----------
By default, NRE compiles its own PCRE. If this is undesirable, pass
``-d:pcreDynlib`` to use whatever dynamic library is available on the
system. This may have unexpected consequences if the dynamic library
doesn't have certain features enabled.
Types
-----
``type Regex* = ref object``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Represents the pattern that things are matched against, constructed with
``re(string)``. Examples: ``re"foo"``, ``re(r"(*ANYCRLF)(?x)foo #
comment")``.
``pattern: string``
the string that was used to create the pattern.
``captureCount: int``
the number of captures that the pattern has.
``captureNameId: Table[string, int]``
a table from the capture names to their numeric id.
Options
.......
The following options may appear anywhere in the pattern, and they affect
the rest of it.
- ``(?i)`` - case insensitive
- ``(?m)`` - multi-line: ``^`` and ``$`` match the beginning and end of
lines, not of the subject string
- ``(?s)`` - ``.`` also matches newline (*dotall*)
- ``(?U)`` - expressions are not greedy by default. ``?`` can be added
to a quantifier to make it greedy
- ``(?x)`` - whitespace and comments (``#``) are ignored (*extended*)
- ``(?X)`` - character escapes without special meaning (``\w`` vs.
``\a``) are errors (*extra*)
One or a combination of these options may appear only at the beginning
of the pattern:
- ``(*UTF8)`` - treat both the pattern and subject as UTF-8
- ``(*UCP)`` - Unicode character properties; ``\w`` matches ``я``
- ``(*U)`` - a combination of the two options above
- ``(*FIRSTLINE)`` - fails if there is not a match on the first line
- ``(*NO_AUTO_CAPTURE)`` - turn off auto-capture for groups;
``(?<name>...)`` can be used to capture
- ``(*CR)`` - newlines are separated by ``\r``
- ``(*LF)`` - newlines are separated by ``\n`` (UNIX default)
- ``(*CRLF)`` - newlines are separated by ``\r\n`` (Windows default)
- ``(*ANYCRLF)`` - newlines are separated by any of the above
- ``(*ANY)`` - newlines are separated by any of the above and Unicode
newlines:
single characters VT (vertical tab, U+000B), FF (form feed, U+000C),
NEL (next line, U+0085), LS (line separator, U+2028), and PS
(paragraph separator, U+2029). For the 8-bit library, the last two
are recognized only in UTF-8 mode.
— man pcre
- ``(*JAVASCRIPT_COMPAT)`` - JavaScript compatibility
- ``(*NO_STUDY)`` - turn off studying; study is enabled by default
For more details on the leading option groups, see the `Option
Setting <http://man7.org/linux/man-pages/man3/pcresyntax.3.html#OPTION_SETTING>`__
and the `Newline
Convention <http://man7.org/linux/man-pages/man3/pcresyntax.3.html#NEWLINE_CONVENTION>`__
sections of the `PCRE syntax
manual <http://man7.org/linux/man-pages/man3/pcresyntax.3.html>`__.
``type RegexMatch* = object``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Usually seen as ``Option[RegexMatch]``, it represents the result of an
execution. On failure, it is ``None[RegexMatch]``; if you want
automated dereference, import ``optional_t.nonstrict``. The available
fields are as follows:
``pattern: Regex``
the pattern that is being matched
``str: string``
the string that was matched against
``captures[]: string``
the string value of whatever was captured at that id. If the value
is invalid, then behavior is undefined. If the id is ``-1``, then
the whole match is returned. If the given capture was not matched,
``nil`` is returned.
- ``"abc".match(re"(\w)").captures[0] == "a"``
- ``"abc".match(re"(?<letter>\w)").captures["letter"] == "a"``
- ``"abc".match(re"(\w)\w").captures[-1] == "ab"``
``captureBounds[]: Option[Slice[int]]``
gets the bounds of the given capture according to the same rules as
the above. If the capture is not filled, then ``None`` is returned.
The bounds are both inclusive.
- ``"abc".match(re"(\w)").captureBounds[0] == 0 .. 0``
- ``"abc".match(re"").captureBounds[-1] == 0 .. -1``
- ``"abc".match(re"abc").captureBounds[-1] == 0 .. 2``
``match: string``
the full text of the match.
``matchBounds: Slice[int]``
the bounds of the match, as in ``captureBounds[]``
``(captureBounds|captures).toTable``
returns a table with each named capture as a key.
``(captureBounds|captures).toSeq``
returns all the captures by their number.
``$: string``
same as ``match``
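The fields above can be sketched in use as follows. This is only an illustration: the pattern and the capture name ``first`` are made up, and ``options`` is imported explicitly on the assumption that NRE does not re-export it.

```nim
import nre, options

# `match` returns an Option[RegexMatch]; check it before unwrapping.
let m = "abc".match(re"(?<first>\w)(\w)")
if m.isSome:
  let res = m.get
  echo res.captures[0]        # first capture, by number
  echo res.captures["first"]  # the same capture, by name
  echo res.match              # full text of the match
  # matchBounds is a slice with an inclusive start and an inclusive end
  echo res.matchBounds.a, " .. ", res.matchBounds.b
```

Unwrapping only after ``isSome`` avoids touching a failed match.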
``type RegexInternalError* = ref object of RegexException``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Internal error in the module; this probably means that there is a bug
``type InvalidUnicodeError* = ref object of RegexException``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Thrown when matching fails due to invalid unicode in strings
``type SyntaxError* = ref object of RegexException``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Thrown when there is a syntax error in the
regular expression string passed in
``type StudyError* = ref object of RegexException``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Thrown when studying the regular expression fails
for whatever reason. The message contains the error
code.
Operations
----------
``proc match*(str: string, pattern: Regex, start = 0, endpos = int.high): Option[RegexMatch]``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Like ```find(...)`` <#proc-find>`__, but anchored to the start of the
string. This means that ``"foo".match(re"f")`` succeeds, but
``"foo".match(re"o")`` fails because ``o`` is not at the start position.
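A minimal sketch of the difference between the anchored ``match`` and the scanning ``find``, assuming ``options`` has to be imported separately for ``isSome``:

```nim
import nre, options

# `match` succeeds only if the pattern matches at the start position.
let anchoredHit = "foo".match(re"f").isSome
let anchoredMiss = "foo".match(re"o").isSome
# `find` scans forward through the string, so it can locate the first 'o'.
let scanned = "foo".find(re"o").isSome

echo anchoredHit, " ", anchoredMiss, " ", scanned
```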
``iterator findIter*(str: string, pattern: Regex, start = 0, endpos = int.high): RegexMatch``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Works the same as ```find(...)`` <#proc-find>`__, but finds every
non-overlapping match. ``"2222".findIter(re"22")`` yields ``"22", "22"``, not
``"22", "22", "22"``.
Arguments are the same as ```find(...)`` <#proc-find>`__.
Variants:
- ``proc findAll(...)`` returns a ``seq[string]``
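For illustration, the iterator and its ``findAll`` variant might be used like this (a sketch, not taken from the test suite):

```nim
import nre

# Matches never overlap: once a match is consumed, the search
# resumes after its last character.
for m in "2222".findIter(re"22"):
  echo m.match, " at ", m.matchBounds.a, " .. ", m.matchBounds.b

# findAll collects only the matched text into a seq[string].
let all = "2222".findAll(re"22")
echo all.len
```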
``proc find*(str: string, pattern: Regex, start = 0, endpos = int.high): Option[RegexMatch]``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Finds the given pattern in the string between the start and end
positions.
``start``
The start point at which to start matching. ``|abc`` is ``0``;
``a|bc`` is ``1``
``endpos``
The maximum index for a match; ``int.high`` means the end of the
string, otherwise it's an inclusive upper bound.
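The effect of ``start`` and ``endpos`` can be sketched as follows. The subject string and variable names are illustrative only, and ``options`` is assumed to need an explicit import.

```nim
import nre, options

let s = "abcabc"
# Default arguments: the whole string is searched.
let first = s.find(re"b")
# Scanning begins at index 2, so the "b" at index 1 is skipped.
let second = s.find(re"b", start = 2)
# An inclusive upper bound of 0 leaves only "a" visible to the matcher.
let missing = s.find(re"b", endpos = 0)

echo first.get.matchBounds.a
echo second.get.matchBounds.a
echo missing.isSome
```

Bounds are reported relative to the start of the whole string, not relative to ``start``.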
``proc split*(str: string, pattern: Regex, maxSplit = -1, start = 0): seq[string]``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Splits the string with the given regex. This works according to the
rules that Perl and Javascript use:
- If the match is zero-width, then the string is still split:
``"123".split(re"") == @["1", "2", "3"]``.
- If the pattern has a capture in it, it is added after the string
split: ``"12".split(re"(\d)") == @["", "1", "", "2", ""]``.
- If ``maxsplit != -1``, then the string will only be split
``maxsplit - 1`` times. This means that there will be ``maxsplit``
strings in the output seq.
``"1.2.3".split(re"\.", maxsplit = 2) == @["1", "2.3"]``
``start`` behaves the same as in ```find(...)`` <#proc-find>`__.
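The three rules above, gathered into one short sketch that reuses the documented examples:

```nim
import nre

# Zero-width matches still split the string, one piece per character.
let chars = "123".split(re"")
# Captures in the pattern are inserted after each split piece.
let withCaptures = "12".split(re"(\d)")
# maxSplit caps the number of strings in the result.
let limited = "1.2.3".split(re"\.", maxSplit = 2)

echo chars.len, " ", withCaptures.len, " ", limited.len
```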
``proc replace*(str: string, pattern: Regex, subproc: proc (match: RegexMatch): string): string``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Replaces each match of Regex in the string with ``sub``, which should
never be or return ``nil``.
If ``sub`` is a ``proc (RegexMatch): string``, then it is executed with
each match and the return value is the replacement value.
If ``sub`` is a ``proc (string): string``, then it is executed with the
full text of the match and the return value is the replacement
value.
If ``sub`` is a string, the syntax is as follows:
- ``$$`` - literal ``$``
- ``$123`` - capture number ``123``
- ``$foo`` - named capture ``foo``
- ``${foo}`` - same as above
- ``$1$#`` - first and second captures
- ``$#`` - first capture
- ``$0`` - full match
If a given capture is missing, a ``ValueError`` exception is thrown.
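A small sketch of the string and proc forms of ``replace``. The helper ``bracket`` is a hypothetical name introduced only for this example.

```nim
import nre

# String form: $1 and $2 refer to the numbered captures.
let swapped = "hello world".replace(re"(\w+) (\w+)", "$2 $1")

# Proc form: the proc receives the full text of each match.
# `bracket` is a made-up helper for this illustration.
proc bracket(m: string): string = "<" & m & ">"
let wrapped = "a1b22".replace(re"\d+", bracket)

echo swapped
echo wrapped
```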
``proc escapeRe*(str: string): string``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Escapes the string so it doesn't match any special characters.
Incompatible with the Extra flag (``X``).
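As a hedged illustration of why escaping matters, consider a literal that contains ``+`` (the strings here are invented for the example, and ``options`` is assumed to need an explicit import):

```nim
import nre, options

let literal = "1+1"
# Unescaped, "+" is a quantifier: the pattern asks for one or more
# "1" characters followed by another "1", which "1+1=2" lacks.
let rawHit = "1+1=2".find(re(literal)).isSome
# Escaped, the "+" is matched as an ordinary character.
let escapedHit = "1+1=2".find(re(escapeRe(literal))).isSome

echo rawHit, " ", escapedHit
```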

39
lib/impure/nre/circle.yml Normal file

@@ -0,0 +1,39 @@
dependencies:
  pre:
    - |
        if [ ! -x ~/nim/bin/nim ]; then
          sudo apt-get install gcc
          git clone -b devel --depth 1 git://github.com/araq/nim ~/nim/
          git clone -b devel --depth 1 git://github.com/nim-lang/csources ~/nim/csources/
          cd ~/nim/csources; sh build.sh; cd ..
          rm -rf csources
          bin/nim c koch
          ./koch boot -d:release
          ln -fs ~/nim/bin/nim ~/bin/nim
        else
          cd ~/nim
          git fetch origin
          if ! git merge FETCH_HEAD | grep "Already up-to-date"; then
            bin/nim c koch
            ./koch boot -d:release
          fi
        fi
    - |
        if [ ! -x ~/.nimble/bin/nimble ]; then
          git clone --depth 1 git://github.com/nim-lang/nimble ~/nimble/
          cd ~/nimble/
          nim c src/nimble.nim
          ./src/nimble install
          ln -fs ~/.nimble/bin/nimble ~/bin/nimble
        fi
    - nimble update
    - nimble build
  cache_directories:
    - "~/bin/"
    - "~/nim/"
    - "~/.nimble/"
test:
  override:
    - ./runtests.sh

11
lib/impure/nre/nre.nimble Normal file

@@ -0,0 +1,11 @@
[Package]
name = "nre"
author = "Flaviu Tamas"
version = "0.6.1"
description = "Yet another PCRE library"
license = "MIT"
srcDir = "src"
[Deps]
Requires: "nim >= 0.10.0"
Requires: "optional_t >= 1.2.0"

3
lib/impure/nre/runtests.sh Executable file

@@ -0,0 +1,3 @@
#!/bin/sh
nim c --path:src -r --verbosity:0 --hints:off --linedir:on --debuginfo \
--stacktrace:on --linetrace:on "$@" ./test/testall.nim

657
lib/impure/nre/src/nre.nim Normal file

@@ -0,0 +1,657 @@
import private.pcre as pcre
import private.util
import tables
import unsigned
from future import lc, `[]`
from strutils import toLower, `%`
from math import ceil
import options
from unicode import runeLenAt
## What is NRE?
## ============
##
## A regular expression library for Nim using PCRE to do the hard work.
##
## Why?
## ----
##
## The `re.nim <http://nim-lang.org/re.html>`__ module that
## `Nim <http://nim-lang.org/>`__ provides in its standard library is
## inadequate:
##
## - It provides only a limited number of captures, while the underlying
## library (PCRE) allows an unlimited number.
##
## - Instead of having one proc that returns both the bounds and
## substring, it has one for the bounds and another for the substring.
##
## - If the splitting regex is empty (``""``), then it returns the input
## string instead of following `Perl <https://ideone.com/dDMjmz>`__,
## `Javascript <http://jsfiddle.net/xtcbxurg/>`__, and
## `Java <https://ideone.com/hYJuJ5>`__'s precedent of returning a list
## of each character (``"123".split(re"") == @["1", "2", "3"]``).
##
##
## Other Notes
## -----------
##
## By default, NRE compiles its own PCRE. If this is undesirable, pass
## ``-d:pcreDynlib`` to use whatever dynamic library is available on the
## system. This may have unexpected consequences if the dynamic library
## doesn't have certain features enabled.
# Type definitions {{{
type
Regex* = ref object
## Represents the pattern that things are matched against, constructed with
## ``re(string)``. Examples: ``re"foo"``, ``re(r"(*ANYCRLF)(?x)foo #
## comment")``.
##
## ``pattern: string``
## the string that was used to create the pattern.
##
## ``captureCount: int``
## the number of captures that the pattern has.
##
## ``captureNameId: Table[string, int]``
## a table from the capture names to their numeric id.
##
##
## Options
## .......
##
## The following options may appear anywhere in the pattern, and they affect
## the rest of it.
##
## - ``(?i)`` - case insensitive
## - ``(?m)`` - multi-line: ``^`` and ``$`` match the beginning and end of
## lines, not of the subject string
## - ``(?s)`` - ``.`` also matches newline (*dotall*)
## - ``(?U)`` - expressions are not greedy by default. ``?`` can be added
## to a quantifier to make it greedy
## - ``(?x)`` - whitespace and comments (``#``) are ignored (*extended*)
## - ``(?X)`` - character escapes without special meaning (``\w`` vs.
## ``\a``) are errors (*extra*)
##
## One or a combination of these options may appear only at the beginning
## of the pattern:
##
## - ``(*UTF8)`` - treat both the pattern and subject as UTF-8
## - ``(*UCP)`` - Unicode character properties; ``\w`` matches ``я``
## - ``(*U)`` - a combination of the two options above
## - ``(*FIRSTLINE)`` - fails if there is not a match on the first line
## - ``(*NO_AUTO_CAPTURE)`` - turn off auto-capture for groups;
## ``(?<name>...)`` can be used to capture
## - ``(*CR)`` - newlines are separated by ``\r``
## - ``(*LF)`` - newlines are separated by ``\n`` (UNIX default)
## - ``(*CRLF)`` - newlines are separated by ``\r\n`` (Windows default)
## - ``(*ANYCRLF)`` - newlines are separated by any of the above
## - ``(*ANY)`` - newlines are separated by any of the above and Unicode
## newlines:
##
## single characters VT (vertical tab, U+000B), FF (form feed, U+000C),
## NEL (next line, U+0085), LS (line separator, U+2028), and PS
## (paragraph separator, U+2029). For the 8-bit library, the last two
## are recognized only in UTF-8 mode.
## — man pcre
##
## - ``(*JAVASCRIPT_COMPAT)`` - JavaScript compatibility
## - ``(*NO_STUDY)`` - turn off studying; study is enabled by default
##
## For more details on the leading option groups, see the `Option
## Setting <http://man7.org/linux/man-pages/man3/pcresyntax.3.html#OPTION_SETTING>`__
## and the `Newline
## Convention <http://man7.org/linux/man-pages/man3/pcresyntax.3.html#NEWLINE_CONVENTION>`__
## sections of the `PCRE syntax
## manual <http://man7.org/linux/man-pages/man3/pcresyntax.3.html>`__.
pattern*: string ## not nil
pcreObj: ptr pcre.Pcre ## not nil
pcreExtra: ptr pcre.ExtraData ## nil
captureNameToId: Table[string, int]
RegexMatch* = object
## Usually seen as Option[RegexMatch], it represents the result of an
## execution. On failure, it is none, on success, it is some.
##
## ``pattern: Regex``
## the pattern that is being matched
##
## ``str: string``
## the string that was matched against
##
## ``captures[]: string``
## the string value of whatever was captured at that id. If the value
## is invalid, then behavior is undefined. If the id is ``-1``, then
## the whole match is returned. If the given capture was not matched,
## ``nil`` is returned.
##
## - ``"abc".match(re"(\w)").captures[0] == "a"``
## - ``"abc".match(re"(?<letter>\w)").captures["letter"] == "a"``
## - ``"abc".match(re"(\w)\w").captures[-1] == "ab"``
##
## ``captureBounds[]: Option[Slice[int]]``
## gets the bounds of the given capture according to the same rules as
## the above. If the capture is not filled, then ``None`` is returned.
## The bounds are both inclusive.
##
## - ``"abc".match(re"(\w)").captureBounds[0] == 0 .. 0``
## - ``"abc".match(re"").captureBounds[-1] == 0 .. -1``
## - ``"abc".match(re"abc").captureBounds[-1] == 0 .. 2``
##
## ``match: string``
## the full text of the match.
##
## ``matchBounds: Slice[int]``
## the bounds of the match, as in ``captureBounds[]``
##
## ``(captureBounds|captures).toTable``
## returns a table with each named capture as a key.
##
## ``(captureBounds|captures).toSeq``
## returns all the captures by their number.
##
## ``$: string``
## same as ``match``
pattern*: Regex ## The regex doing the matching.
## Not nil.
str*: string ## The string that was matched against.
## Not nil.
pcreMatchBounds: seq[Slice[cint]] ## First item is the bounds of the match
## Other items are the captures
## `a` is inclusive start, `b` is exclusive end
Captures* = distinct RegexMatch
CaptureBounds* = distinct RegexMatch
RegexException* = ref object of Exception
RegexInternalError* = ref object of RegexException
## Internal error in the module; this probably means that there is a bug
InvalidUnicodeError* = ref object of RegexException
## Thrown when matching fails due to invalid unicode in strings
pos*: int ## the location of the invalid unicode in bytes
SyntaxError* = ref object of RegexException
## Thrown when there is a syntax error in the
## regular expression string passed in
pos*: int ## the location of the syntax error in bytes
pattern*: string ## the pattern that caused the problem
StudyError* = ref object of RegexException
## Thrown when studying the regular expression fails
## for whatever reason. The message contains the error
## code.
# }}}
proc getinfo[T](pattern: Regex, opt: cint): T =
let retcode = pcre.fullinfo(pattern.pcreObj, pattern.pcreExtra, opt, addr result)
if retcode < 0:
# XXX Error message that doesn't expose implementation details
raise newException(FieldError, "Invalid getinfo for $1, errno $2" % [$opt, $retcode])
# Regex accessors {{{
proc captureCount*(pattern: Regex): int =
return getinfo[cint](pattern, pcre.INFO_CAPTURECOUNT)
proc captureNameId*(pattern: Regex): Table[string, int] =
return pattern.captureNameToId
proc matchesCrLf(pattern: Regex): bool =
let flags = uint32(getinfo[culong](pattern, pcre.INFO_OPTIONS))
let newlineFlags = flags and (pcre.NEWLINE_CRLF or
pcre.NEWLINE_ANY or
pcre.NEWLINE_ANYCRLF)
if newLineFlags > 0u32:
return true
# get flags from build config
var confFlags: cint
if pcre.config(pcre.CONFIG_NEWLINE, addr confFlags) != 0:
assert(false, "CONFIG_NEWLINE apparently got screwed up")
case confFlags
of 13: return false
of 10: return false
of (13 shl 8) or 10: return true
of -2: return true
of -1: return true
else: return false
# }}}
# Capture accessors {{{
proc captureBounds*(pattern: RegexMatch): CaptureBounds = return CaptureBounds(pattern)
proc captures*(pattern: RegexMatch): Captures = return Captures(pattern)
proc `[]`*(pattern: CaptureBounds, i: int): Option[Slice[int]] =
let pattern = RegexMatch(pattern)
if pattern.pcreMatchBounds[i + 1].a != -1:
let bounds = pattern.pcreMatchBounds[i + 1]
return some(int(bounds.a) .. int(bounds.b-1))
else:
return none(Slice[int])
proc `[]`*(pattern: Captures, i: int): string =
let pattern = RegexMatch(pattern)
let bounds = pattern.captureBounds[i]
if bounds.isSome:
let bounds = bounds.get
return pattern.str.substr(bounds.a, bounds.b)
else:
return nil
proc match*(pattern: RegexMatch): string =
return pattern.captures[-1]
proc matchBounds*(pattern: RegexMatch): Slice[int] =
return pattern.captureBounds[-1].get
proc `[]`*(pattern: CaptureBounds, name: string): Option[Slice[int]] =
let pattern = RegexMatch(pattern)
return pattern.captureBounds[pattern.pattern.captureNameToId.fget(name)]
proc `[]`*(pattern: Captures, name: string): string =
let pattern = RegexMatch(pattern)
return pattern.captures[pattern.pattern.captureNameToId.fget(name)]
template toTableImpl(cond: bool): stmt {.immediate, dirty.} =
for key in RegexMatch(pattern).pattern.captureNameId.keys:
let nextVal = pattern[key]
if cond:
result[key] = default
else:
result[key] = nextVal
proc toTable*(pattern: Captures, default: string = nil): Table[string, string] =
result = initTable[string, string]()
toTableImpl(nextVal == nil)
proc toTable*(pattern: CaptureBounds, default = none(Slice[int])):
Table[string, Option[Slice[int]]] =
result = initTable[string, Option[Slice[int]]]()
toTableImpl(nextVal.isNone)
template itemsImpl(cond: bool): stmt {.immediate, dirty.} =
for i in 0 .. <RegexMatch(pattern).pattern.captureCount:
let nextVal = pattern[i]
if cond:
yield default
else:
yield nextVal
iterator items*(pattern: CaptureBounds, default = none(Slice[int])): Option[Slice[int]] =
itemsImpl(nextVal.isNone)
iterator items*(pattern: Captures, default: string = nil): string =
itemsImpl(nextVal == nil)
proc toSeq*(pattern: CaptureBounds, default = none(Slice[int])): seq[Option[Slice[int]]] =
accumulateResult(pattern.items(default))
proc toSeq*(pattern: Captures, default: string = nil): seq[string] =
accumulateResult(pattern.items(default))
proc `$`*(pattern: RegexMatch): string =
return pattern.captures[-1]
proc `==`*(a, b: Regex): bool =
if not a.isNil and not b.isNil:
return a.pattern == b.pattern and
a.pcreObj == b.pcreObj and
a.pcreExtra == b.pcreExtra
else:
return system.`==`(a, b)
proc `==`*(a, b: RegexMatch): bool =
return a.pattern == b.pattern and
a.str == b.str
# }}}
# Creation & Destruction {{{
# PCRE Options {{{
const PcreOptions = {
"NEVER_UTF": pcre.NEVER_UTF,
"ANCHORED": pcre.ANCHORED,
"DOLLAR_ENDONLY": pcre.DOLLAR_ENDONLY,
"FIRSTLINE": pcre.FIRSTLINE,
"NO_AUTO_CAPTURE": pcre.NO_AUTO_CAPTURE,
"JAVASCRIPT_COMPAT": pcre.JAVASCRIPT_COMPAT,
"U": pcre.UTF8 or pcre.UCP
}.toTable
# Options that are supported inside regular expressions themselves
const SkipOptions = [
"LIMIT_MATCH=", "LIMIT_RECURSION=", "NO_AUTO_POSSESS", "NO_START_OPT",
"UTF8", "UTF16", "UTF32", "UTF", "UCP",
"CR", "LF", "CRLF", "ANYCRLF", "ANY", "BSR_ANYCRLF", "BSR_UNICODE"
]
proc extractOptions(pattern: string): tuple[pattern: string, flags: int, study: bool] =
result = ("", 0, true)
var optionStart = 0
var equals = false
for i, c in pattern:
if optionStart == i:
if c != '(':
break
optionStart = i
elif optionStart == i-1:
if c != '*':
break
elif c == ')':
let name = pattern[optionStart+2 .. i-1]
if equals or name in SkipOptions:
result.pattern.add pattern[optionStart .. i]
elif PcreOptions.hasKey name:
result.flags = result.flags or PcreOptions[name]
elif name == "NO_STUDY":
result.study = false
else:
break
optionStart = i+1
equals = false
elif not equals:
if c == '=':
equals = true
if pattern[optionStart+2 .. i] notin SkipOptions:
break
elif c notin {'A'..'Z', '0'..'9', '_'}:
break
result.pattern.add pattern[optionStart .. pattern.high]
# }}}
type UncheckedArray {.unchecked.}[T] = array[0 .. 0, T]
proc destroyRegex(pattern: Regex) =
pcre.free_substring(cast[cstring](pattern.pcreObj))
pattern.pcreObj = nil
if pattern.pcreExtra != nil:
pcre.free_study(pattern.pcreExtra)
proc getNameToNumberTable(pattern: Regex): Table[string, int] =
let entryCount = getinfo[cint](pattern, pcre.INFO_NAMECOUNT)
let entrySize = getinfo[cint](pattern, pcre.INFO_NAMEENTRYSIZE)
let table = cast[ptr UncheckedArray[uint8]](
getinfo[int](pattern, pcre.INFO_NAMETABLE))
result = initTable[string, int]()
for i in 0 .. <entryCount:
let pos = i * entrySize
# `-` binds tighter than `or` in Nim, so combine both bytes before subtracting
let num = ((int(table[pos]) shl 8) or int(table[pos + 1])) - 1
var name = ""
var idx = 2
while table[pos + idx] != 0:
name.add(char(table[pos + idx]))
idx += 1
result[name] = num
proc initRegex(pattern: string, flags: int, study = true): Regex =
new(result, destroyRegex)
result.pattern = pattern
var errorMsg: cstring
var errOffset: cint
result.pcreObj = pcre.compile(cstring(pattern),
# better hope int is at least 4 bytes..
cint(flags), addr errorMsg,
addr errOffset, nil)
if result.pcreObj == nil:
# failed to compile
raise SyntaxError(msg: $errorMsg, pos: errOffset, pattern: pattern)
if study:
# XXX investigate JIT
result.pcreExtra = pcre.study(result.pcreObj, 0x0, addr errorMsg)
if errorMsg != nil:
raise StudyError(msg: $errorMsg)
result.captureNameToId = result.getNameToNumberTable()
proc re*(pattern: string): Regex =
let (pattern, flags, study) = extractOptions(pattern)
initRegex(pattern, flags, study)
# }}}
# Operations {{{
proc matchImpl(str: string, pattern: Regex, start, endpos: int, flags: int): Option[RegexMatch] =
var myResult = RegexMatch(pattern : pattern, str : str)
# See PCRE man pages.
# 2x capture count to make room for start-end pairs
# 1x capture count as slack space for PCRE
let vecsize = (pattern.captureCount() + 1) * 3
# div 2 because each element is 2 cints long
myResult.pcreMatchBounds = newSeq[Slice[cint]](ceil(vecsize / 2).int)
myResult.pcreMatchBounds.setLen(vecsize div 3)
let strlen = if endpos == int.high: str.len else: endpos+1
doAssert(strlen <= str.len) # don't want buffer overflows
let execRet = pcre.exec(pattern.pcreObj,
pattern.pcreExtra,
cstring(str),
cint(strlen),
cint(start),
cint(flags),
cast[ptr cint](addr myResult.pcreMatchBounds[0]),
cint(vecsize))
if execRet >= 0:
return some(myResult)
case execRet:
of pcre.ERROR_NOMATCH:
return none(RegexMatch)
of pcre.ERROR_NULL:
raise newException(AccessViolationError, "Expected non-null parameters")
of pcre.ERROR_BADOPTION:
raise RegexInternalError(msg : "Unknown pattern flag. Either a bug or " &
"outdated PCRE.")
of pcre.ERROR_BADUTF8, pcre.ERROR_SHORTUTF8, pcre.ERROR_BADUTF8_OFFSET:
raise InvalidUnicodeError(msg : "Invalid unicode byte sequence",
pos : myResult.pcreMatchBounds[0].a)
else:
raise RegexInternalError(msg : "Unknown internal error: " & $execRet)
proc match*(str: string, pattern: Regex, start = 0, endpos = int.high): Option[RegexMatch] =
## Like ```find(...)`` <#proc-find>`__, but anchored to the start of the
## string. This means that ``"foo".match(re"f")`` succeeds, but
## ``"foo".match(re"o")`` fails because ``o`` is not at the start position.
return str.matchImpl(pattern, start, endpos, pcre.ANCHORED)
iterator findIter*(str: string, pattern: Regex, start = 0, endpos = int.high): RegexMatch =
## Works the same as ```find(...)`` <#proc-find>`__, but finds every
## non-overlapping match. ``"2222".findIter(re"22")`` yields ``"22", "22"``, not
## ``"22", "22", "22"``.
##
## Arguments are the same as ```find(...)`` <#proc-find>`__.
##
## Variants:
##
## - ``proc findAll(...)`` returns a ``seq[string]``
# see pcredemo for explanation
let matchesCrLf = pattern.matchesCrLf()
let unicode = uint32(getinfo[culong](pattern, pcre.INFO_OPTIONS) and
pcre.UTF8) > 0u32
let strlen = if endpos == int.high: str.len else: endpos+1
var offset = start
var match: Option[RegexMatch]
while true:
var flags = 0
if match.isSome and
match.get.matchBounds.a > match.get.matchBounds.b:
# 0-len match
flags = pcre.NOTEMPTY_ATSTART
match = str.matchImpl(pattern, offset, endpos, flags)
if match.isNone:
# either the end of the input or the string
# cannot be split here
if offset >= strlen:
break
if matchesCrLf and offset < (str.len - 1) and
str[offset] == '\r' and str[offset + 1] == '\l':
# if PCRE treats CrLf as newline, skip both at the same time
offset += 2
elif unicode:
# XXX what about invalid unicode?
offset += str.runeLenAt(offset)
assert(offset <= strlen)
else:
offset += 1
else:
offset = match.get.matchBounds.b + 1
yield match.get
proc find*(str: string, pattern: Regex, start = 0, endpos = int.high): Option[RegexMatch] =
## Finds the given pattern in the string between the start and end
## positions.
##
## ``start``
## The start point at which to start matching. ``|abc`` is ``0``;
## ``a|bc`` is ``1``
##
## ``endpos``
## The maximum index for a match; ``int.high`` means the end of the
## string, otherwise it's an inclusive upper bound.
return str.matchImpl(pattern, start, endpos, 0)
proc findAll*(str: string, pattern: Regex, start = 0, endpos = int.high): seq[string] =
result = @[]
for match in str.findIter(pattern, start, endpos):
result.add(match.match)
proc split*(str: string, pattern: Regex, maxSplit = -1, start = 0): seq[string] =
## Splits the string with the given regex. This works according to the
## rules that Perl and Javascript use:
##
## - If the match is zero-width, then the string is still split:
## ``"123".split(re"") == @["1", "2", "3"]``.
##
## - If the pattern has a capture in it, it is added after the string
## split: ``"12".split(re"(\d)") == @["", "1", "", "2", ""]``.
##
## - If ``maxsplit != -1``, then the string will only be split
## ``maxsplit - 1`` times. This means that there will be ``maxsplit``
## strings in the output seq.
## ``"1.2.3".split(re"\.", maxsplit = 2) == @["1", "2.3"]``
##
## ``start`` behaves the same as in ```find(...)`` <#proc-find>`__.
result = @[]
var lastIdx = start
var splits = 0
var bounds = 0 .. 0
for match in str.findIter(pattern, start = start):
# bounds are inclusive:
#
# 0123456
# ^^^
# (1, 3)
bounds = match.matchBounds
# "12".split("") would be @["", "1", "2"], but
# if we skip an empty first match, it's the correct
# @["1", "2"]
if bounds.a <= bounds.b or bounds.a > start:
result.add(str.substr(lastIdx, bounds.a - 1))
splits += 1
lastIdx = bounds.b + 1
for cap in match.captures:
# if there are captures, include them in the result
result.add(cap)
if splits == maxSplit - 1:
break
# "12".split("\b") would be @["1", "2", ""], but
# if we skip an empty last match, it's the correct
# @["1", "2"]
if bounds.a <= bounds.b or bounds.b < str.high:
# last match: Each match takes the previous substring,
# but "1 2".split(/ /) needs to return @["1", "2"].
# This handles "2"
result.add(str.substr(bounds.b + 1, str.high))
template replaceImpl(str: string, pattern: Regex,
replacement: expr): stmt {.immediate, dirty.} =
# XXX seems very similar to split, maybe I can reduce code duplication
# somehow?
result = ""
var lastIdx = 0
for match {.inject.} in str.findIter(pattern):
let bounds = match.matchBounds
result.add(str.substr(lastIdx, bounds.a - 1))
let nextVal = replacement
assert(nextVal != nil)
result.add(nextVal)
lastIdx = bounds.b + 1
result.add(str.substr(lastIdx, str.len - 1))
return result

proc replace*(str: string, pattern: Regex,
              subproc: proc (match: RegexMatch): string): string =
  ## Replaces each match of Regex in the string with ``sub``, which should
  ## never be or return ``nil``.
  ##
  ## If ``sub`` is a ``proc (RegexMatch): string``, then it is executed with
  ## each match and the return value is the replacement value.
  ##
  ## If ``sub`` is a ``proc (string): string``, then it is executed with the
  ## full text of the match and the return value is the replacement
  ## value.
  ##
  ## If ``sub`` is a string, the syntax is as follows:
  ##
  ## - ``$$`` - literal ``$``
  ## - ``$123`` - capture number ``123``
  ## - ``$foo`` - named capture ``foo``
  ## - ``${foo}`` - same as above
  ## - ``$1$#`` - first and second captures
  ## - ``$#`` - first capture
  ## - ``$0`` - full match
  ##
  ## If a given capture is missing, a ``ValueError`` exception is thrown.
  replaceImpl(str, pattern, subproc(match))

proc replace*(str: string, pattern: Regex,
              subproc: proc (match: string): string): string =
  replaceImpl(str, pattern, subproc(match.match))

proc replace*(str: string, pattern: Regex, sub: string): string =
  # - 1 because the string numbers are 0-indexed
  replaceImpl(str, pattern,
    formatStr(sub, match.captures[name], match.captures[id - 1]))
# }}}
let SpecialCharMatcher = re"([\\+*?[^\]$(){}=!<>|:-])"
proc escapeRe*(str: string): string =
  ## Escapes the string so it doesn't match any special characters.
  ## Incompatible with the Extra flag (``X``).
  str.replace(SpecialCharMatcher, "\\$1")
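The escaping rule used by `escapeRe` (prefix every character in the `SpecialCharMatcher` class with a backslash) can be sketched without a regex engine. The C version below is only an illustration: `escape_re_sketch` and `kSpecialCharsSketch` are hypothetical names, and the character set is assumed to be exactly the one written in the class above. Note that `.` is not in that class, so the sketch leaves it untouched as well.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The characters listed in the SpecialCharMatcher class above. */
static const char kSpecialCharsSketch[] = "\\+*?[^]$(){}=!<>|:-";

/* Writes an escaped copy of `src` into `dst`, whose capacity `dst_cap`
 * includes room for the terminating NUL. Returns the escaped length, or
 * (size_t)-1 when `dst` is too small. */
size_t escape_re_sketch(const char *src, char *dst, size_t dst_cap)
{
    assert(src != NULL && dst != NULL);
    size_t out = 0;

    for (const char *p = src; *p != '\0'; ++p) {
        int special = strchr(kSpecialCharsSketch, *p) != NULL;
        size_t need = special ? 2 : 1;

        /* Keep one byte in reserve for the terminating NUL. */
        if (out + need + 1 > dst_cap)
            return (size_t)-1;
        if (special)
            dst[out++] = '\\';
        dst[out++] = *p;
    }

    if (out + 1 > dst_cap)
        return (size_t)-1;
    dst[out] = '\0';
    return out;
}
```

For example, the input `1+1=2` comes back as `1\+1\=2`, because both `+` and `=` appear in the class.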


@@ -0,0 +1,630 @@
#ifdef C2NIM
#def PCRE_EXP_DECL extern
#prefix PCRE_
#mangle "'pcre'{[0-9]*}_{.*}" "$2$1"
#header "pcre.h"
#cdecl
#endif
/*************************************************
* Perl-Compatible Regular Expressions *
*************************************************/
/* This is the public header file for the PCRE library, to be #included by
applications that call the PCRE functions.
Copyright (c) 1997-2014 University of Cambridge
-----------------------------------------------------------------------------
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
* Neither the name of the University of Cambridge nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.
-----------------------------------------------------------------------------
*/
/* The current PCRE version information. */
#define PCRE_MAJOR 8
#define PCRE_MINOR 36
#define PCRE_PRERELEASE
#define PCRE_DATE 2014-09-26
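`PCRE_DATE` is defined without quotes, so it is a sequence of preprocessor tokens rather than a string or a usable number (in a C expression, `09` would not even be a valid integer constant). One common way to turn such a macro into text is two-level stringification. The sketch below shows that technique in isolation; the `*_SKETCH` macros and the two functions are hypothetical names, and the `PCRE_DATE` value is copied from the listing above.

```c
#include <assert.h>
#include <string.h>

/* Value copied from the version block above. */
#define PCRE_DATE 2014-09-26

/* Two levels are needed: the outer macro expands PCRE_DATE first,
 * the inner one turns the resulting tokens into a string literal. */
#define STRINGIFY_SKETCH(x) #x
#define XSTRINGIFY_SKETCH(x) STRINGIFY_SKETCH(x)

/* Returns the release date as text. */
const char *version_date_sketch(void)
{
    return XSTRINGIFY_SKETCH(PCRE_DATE);
}

/* Convenience check: non-zero when the date text equals `expected`. */
int version_date_is_sketch(const char *expected)
{
    assert(expected != NULL);
    return strcmp(version_date_sketch(), expected) == 0;
}
```

Using `STRINGIFY_SKETCH(PCRE_DATE)` directly would yield the macro's name instead of its value, which is why the extra level of expansion is there.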
/* When an application links to a PCRE DLL in Windows, the symbols that are
imported have to be identified as such. When building PCRE, the appropriate
export setting is defined in pcre_internal.h, which includes this file. So we
don't change existing definitions of PCRE_EXP_DECL and PCRECPP_EXP_DECL. */
/* By default, we use the standard "extern" declarations. */
/* Have to include stdlib.h in order to ensure that size_t is defined;
it is needed here for malloc. */
#include <stdlib.h>
/* Allow for C++ users */
/* Public options. Some are compile-time only, some are run-time only, and some
are both. Most of the compile-time options are saved with the compiled regex so
that they can be inspected during studying (and therefore JIT compiling). Note
that pcre_study() has its own set of options. Originally, all the options
defined here used distinct bits. However, almost all the bits in a 32-bit word
are now used, so in order to conserve them, option bits that were previously
only recognized at matching time (i.e. by pcre_exec() or pcre_dfa_exec()) may
also be used for compile-time options that affect only compiling and are not
relevant for studying or JIT compiling.
Some options for pcre_compile() change its behaviour but do not affect the
behaviour of the execution functions. Other options are passed through to the
execution functions and affect their behaviour, with or without affecting the
behaviour of pcre_compile().
Options that can be passed to pcre_compile() are tagged Cx below, with these
variants:
C1 Affects compile only
C2 Does not affect compile; affects exec, dfa_exec
C3 Affects compile, exec, dfa_exec
C4 Affects compile, exec, dfa_exec, study
C5 Affects compile, exec, study
Options that can be set for pcre_exec() and/or pcre_dfa_exec() are flagged with
E and D, respectively. They take precedence over C3, C4, and C5 settings passed
from pcre_compile(). Those that are compatible with JIT execution are flagged
with J. */
#define PCRE_CASELESS 0x00000001 /* C1 */
#define PCRE_MULTILINE 0x00000002 /* C1 */
#define PCRE_DOTALL 0x00000004 /* C1 */
#define PCRE_EXTENDED 0x00000008 /* C1 */
#define PCRE_ANCHORED 0x00000010 /* C4 E D */
#define PCRE_DOLLAR_ENDONLY 0x00000020 /* C2 */
#define PCRE_EXTRA 0x00000040 /* C1 */
#define PCRE_NOTBOL 0x00000080 /* E D J */
#define PCRE_NOTEOL 0x00000100 /* E D J */
#define PCRE_UNGREEDY 0x00000200 /* C1 */
#define PCRE_NOTEMPTY 0x00000400 /* E D J */
#define PCRE_UTF8 0x00000800 /* C4 ) */
#define PCRE_UTF16 0x00000800 /* C4 ) Synonyms */
#define PCRE_UTF32 0x00000800 /* C4 ) */
#define PCRE_NO_AUTO_CAPTURE 0x00001000 /* C1 */
#define PCRE_NO_UTF8_CHECK 0x00002000 /* C1 E D J ) */
#define PCRE_NO_UTF16_CHECK 0x00002000 /* C1 E D J ) Synonyms */
#define PCRE_NO_UTF32_CHECK 0x00002000 /* C1 E D J ) */
#define PCRE_AUTO_CALLOUT 0x00004000 /* C1 */
#define PCRE_PARTIAL_SOFT 0x00008000 /* E D J ) Synonyms */
#define PCRE_PARTIAL 0x00008000 /* E D J ) */
/* This pair use the same bit. */
#define PCRE_NEVER_UTF 0x00010000 /* C1 ) Overlaid */
#define PCRE_DFA_SHORTEST 0x00010000 /* D ) Overlaid */
/* This pair use the same bit. */
#define PCRE_NO_AUTO_POSSESS 0x00020000 /* C1 ) Overlaid */
#define PCRE_DFA_RESTART 0x00020000 /* D ) Overlaid */
#define PCRE_FIRSTLINE 0x00040000 /* C3 */
#define PCRE_DUPNAMES 0x00080000 /* C1 */
#define PCRE_NEWLINE_CR 0x00100000 /* C3 E D */
#define PCRE_NEWLINE_LF 0x00200000 /* C3 E D */
#define PCRE_NEWLINE_CRLF 0x00300000 /* C3 E D */
#define PCRE_NEWLINE_ANY 0x00400000 /* C3 E D */
#define PCRE_NEWLINE_ANYCRLF 0x00500000 /* C3 E D */
#define PCRE_BSR_ANYCRLF 0x00800000 /* C3 E D */
#define PCRE_BSR_UNICODE 0x01000000 /* C3 E D */
#define PCRE_JAVASCRIPT_COMPAT 0x02000000 /* C5 */
#define PCRE_NO_START_OPTIMIZE 0x04000000 /* C2 E D ) Synonyms */
#define PCRE_NO_START_OPTIMISE 0x04000000 /* C2 E D ) */
#define PCRE_PARTIAL_HARD 0x08000000 /* E D J */
#define PCRE_NOTEMPTY_ATSTART 0x10000000 /* E D J */
#define PCRE_UCP 0x20000000 /* C3 */
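Most of the options above are single bits that can be OR-ed together, but the `PCRE_NEWLINE_*` values are not independent flags: `PCRE_NEWLINE_CRLF` has the same numeric value as `PCRE_NEWLINE_CR | PCRE_NEWLINE_LF`, and `PCRE_NEWLINE_ANYCRLF` overlaps `PCRE_NEWLINE_ANY`. The sketch below shows one way to read both kinds of option. The constants are copied from the listing; `NEWLINE_FIELD_MASK_SKETCH` and the two helper functions are hypothetical and only assume that the newline setting occupies the three bits covered by those values.

```c
#include <assert.h>

/* Values copied from the option list above. */
#define PCRE_CASELESS        0x00000001
#define PCRE_MULTILINE       0x00000002
#define PCRE_NEWLINE_CR      0x00100000
#define PCRE_NEWLINE_LF      0x00200000
#define PCRE_NEWLINE_CRLF    0x00300000
#define PCRE_NEWLINE_ANY     0x00400000
#define PCRE_NEWLINE_ANYCRLF 0x00500000

/* Hypothetical mask, not part of the listing: the three bits that the
 * PCRE_NEWLINE_* values occupy. */
#define NEWLINE_FIELD_MASK_SKETCH 0x00700000

/* True when the single-bit option `flag` is set in `options`. */
int has_option_sketch(int options, int flag)
{
    assert(flag != 0);
    return (options & flag) != 0;
}

/* Extracts the newline convention as one value, to be compared with
 * the PCRE_NEWLINE_* constants using ==. */
int newline_setting_sketch(int options)
{
    return options & NEWLINE_FIELD_MASK_SKETCH;
}
```

With `options = PCRE_CASELESS | PCRE_NEWLINE_CRLF`, a plain bit test for `PCRE_NEWLINE_CR` succeeds even though CR alone was never requested, whereas `newline_setting_sketch(options)` compares equal only to `PCRE_NEWLINE_CRLF`.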
/* Exec-time and get/set-time error codes */
#define PCRE_ERROR_NOMATCH (-1)
#define PCRE_ERROR_NULL (-2)
#define PCRE_ERROR_BADOPTION (-3)
#define PCRE_ERROR_BADMAGIC (-4)
#define PCRE_ERROR_UNKNOWN_OPCODE (-5)
#define PCRE_ERROR_UNKNOWN_NODE (-5) /* For backward compatibility */
#define PCRE_ERROR_NOMEMORY (-6)
#define PCRE_ERROR_NOSUBSTRING (-7)
#define PCRE_ERROR_MATCHLIMIT (-8)
#define PCRE_ERROR_CALLOUT (-9) /* Never used by PCRE itself */
#define PCRE_ERROR_BADUTF8 (-10) /* Same for 8/16/32 */
#define PCRE_ERROR_BADUTF16 (-10) /* Same for 8/16/32 */
#define PCRE_ERROR_BADUTF32 (-10) /* Same for 8/16/32 */
#define PCRE_ERROR_BADUTF8_OFFSET (-11) /* Same for 8/16 */
#define PCRE_ERROR_BADUTF16_OFFSET (-11) /* Same for 8/16 */
#define PCRE_ERROR_PARTIAL (-12)
#define PCRE_ERROR_BADPARTIAL (-13)
#define PCRE_ERROR_INTERNAL (-14)
#define PCRE_ERROR_BADCOUNT (-15)
#define PCRE_ERROR_DFA_UITEM (-16)
#define PCRE_ERROR_DFA_UCOND (-17)
#define PCRE_ERROR_DFA_UMLIMIT (-18)
#define PCRE_ERROR_DFA_WSSIZE (-19)
#define PCRE_ERROR_DFA_RECURSE (-20)
#define PCRE_ERROR_RECURSIONLIMIT (-21)
#define PCRE_ERROR_NULLWSLIMIT (-22) /* No longer actually used */
#define PCRE_ERROR_BADNEWLINE (-23)
#define PCRE_ERROR_BADOFFSET (-24)
#define PCRE_ERROR_SHORTUTF8 (-25)
#define PCRE_ERROR_SHORTUTF16 (-25) /* Same for 8/16 */
#define PCRE_ERROR_RECURSELOOP (-26)
#define PCRE_ERROR_JIT_STACKLIMIT (-27)
#define PCRE_ERROR_BADMODE (-28)
#define PCRE_ERROR_BADENDIANNESS (-29)
#define PCRE_ERROR_DFA_BADRESTART (-30)
#define PCRE_ERROR_JIT_BADOPTION (-31)
#define PCRE_ERROR_BADLENGTH (-32)
#define PCRE_ERROR_UNSET (-33)
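Callers usually do not need to distinguish every code in the list above. As documented for `pcre_exec()`, a non-negative return value means a match was found (zero indicates that the offset vector was too small to hold every capture), while each negative value is one of these error codes. The sketch below groups the results into four coarse outcomes. The two constants are copied from the listing; the enum and `classify_exec_result_sketch` are hypothetical names introduced for illustration.

```c
#include <assert.h>  /* only needed by callers that check results with assert() */

/* Values copied from the error-code list above. */
#define PCRE_ERROR_NOMATCH (-1)
#define PCRE_ERROR_PARTIAL (-12)

/* Hypothetical outcome categories for this sketch. */
typedef enum {
    EXEC_MATCHED_SKETCH,   /* rc >= 0 */
    EXEC_NO_MATCH_SKETCH,  /* PCRE_ERROR_NOMATCH */
    EXEC_PARTIAL_SKETCH,   /* PCRE_ERROR_PARTIAL */
    EXEC_FAILED_SKETCH     /* any other negative code */
} exec_outcome_sketch;

/* Maps a return value from an exec-style call to a coarse outcome. */
exec_outcome_sketch classify_exec_result_sketch(int rc)
{
    if (rc >= 0)
        return EXEC_MATCHED_SKETCH;
    if (rc == PCRE_ERROR_NOMATCH)
        return EXEC_NO_MATCH_SKETCH;
    if (rc == PCRE_ERROR_PARTIAL)
        return EXEC_PARTIAL_SKETCH;
    return EXEC_FAILED_SKETCH;
}
```

Treating "no match" separately from the remaining negative codes keeps an ordinary failed search from being reported as an error.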
/* Specific error codes for UTF-8 validity checks */
#define PCRE_UTF8_ERR0 0
#define PCRE_UTF8_ERR1 1
#define PCRE_UTF8_ERR2 2
#define PCRE_UTF8_ERR3 3
#define PCRE_UTF8_ERR4 4
#define PCRE_UTF8_ERR5 5
#define PCRE_UTF8_ERR6 6
#define PCRE_UTF8_ERR7 7
#define PCRE_UTF8_ERR8 8
#define PCRE_UTF8_ERR9 9
#define PCRE_UTF8_ERR10 10
#define PCRE_UTF8_ERR11 11
#define PCRE_UTF8_ERR12 12
#define PCRE_UTF8_ERR13 13
#define PCRE_UTF8_ERR14 14
#define PCRE_UTF8_ERR15 15
#define PCRE_UTF8_ERR16 16
#define PCRE_UTF8_ERR17 17
#define PCRE_UTF8_ERR18 18
#define PCRE_UTF8_ERR19 19
#define PCRE_UTF8_ERR20 20
#define PCRE_UTF8_ERR21 21
#define PCRE_UTF8_ERR22 22 /* Unused (was non-character) */
/* Specific error codes for UTF-16 validity checks */
#define PCRE_UTF16_ERR0 0
#define PCRE_UTF16_ERR1 1
#define PCRE_UTF16_ERR2 2
#define PCRE_UTF16_ERR3 3
#define PCRE_UTF16_ERR4 4 /* Unused (was non-character) */
/* Specific error codes for UTF-32 validity checks */
#define PCRE_UTF32_ERR0 0
#define PCRE_UTF32_ERR1 1
#define PCRE_UTF32_ERR2 2 /* Unused (was non-character) */
#define PCRE_UTF32_ERR3 3
/* Request types for pcre_fullinfo() */
#define PCRE_INFO_OPTIONS 0
#define PCRE_INFO_SIZE 1
#define PCRE_INFO_CAPTURECOUNT 2
#define PCRE_INFO_BACKREFMAX 3
#define PCRE_INFO_FIRSTBYTE 4
#define PCRE_INFO_FIRSTCHAR 4 /* For backwards compatibility */
#define PCRE_INFO_FIRSTTABLE 5
#define PCRE_INFO_LASTLITERAL 6
#define PCRE_INFO_NAMEENTRYSIZE 7
#define PCRE_INFO_NAMECOUNT 8
#define PCRE_INFO_NAMETABLE 9
#define PCRE_INFO_STUDYSIZE 10
#define PCRE_INFO_DEFAULT_TABLES 11
#define PCRE_INFO_OKPARTIAL 12
#define PCRE_INFO_JCHANGED 13
#define PCRE_INFO_HASCRORLF 14
#define PCRE_INFO_MINLENGTH 15
#define PCRE_INFO_JIT 16
#define PCRE_INFO_JITSIZE 17
#define PCRE_INFO_MAXLOOKBEHIND 18
#define PCRE_INFO_FIRSTCHARACTER 19
#define PCRE_INFO_FIRSTCHARACTERFLAGS 20
#define PCRE_INFO_REQUIREDCHAR 21
#define PCRE_INFO_REQUIREDCHARFLAGS 22
#define PCRE_INFO_MATCHLIMIT 23
#define PCRE_INFO_RECURSIONLIMIT 24
#define PCRE_INFO_MATCH_EMPTY 25
/* Request types for pcre_config(). Do not re-arrange, in order to remain
compatible. */
#define PCRE_CONFIG_UTF8 0
#define PCRE_CONFIG_NEWLINE 1
#define PCRE_CONFIG_LINK_SIZE 2
#define PCRE_CONFIG_POSIX_MALLOC_THRESHOLD 3
#define PCRE_CONFIG_MATCH_LIMIT 4
#define PCRE_CONFIG_STACKRECURSE 5
#define PCRE_CONFIG_UNICODE_PROPERTIES 6
#define PCRE_CONFIG_MATCH_LIMIT_RECURSION 7
#define PCRE_CONFIG_BSR 8
#define PCRE_CONFIG_JIT 9
#define PCRE_CONFIG_UTF16 10
#define PCRE_CONFIG_JITTARGET 11
#define PCRE_CONFIG_UTF32 12
#define PCRE_CONFIG_PARENS_LIMIT 13
/* Request types for pcre_study(). Do not re-arrange, in order to remain
compatible. */
#define PCRE_STUDY_JIT_COMPILE 0x0001
#define PCRE_STUDY_JIT_PARTIAL_SOFT_COMPILE 0x0002
#define PCRE_STUDY_JIT_PARTIAL_HARD_COMPILE 0x0004
#define PCRE_STUDY_EXTRA_NEEDED 0x0008
/* Bit flags for the pcre[16|32]_extra structure. Do not re-arrange or redefine
these bits, just add new ones on the end, in order to remain compatible. */
#define PCRE_EXTRA_STUDY_DATA 0x0001
#define PCRE_EXTRA_MATCH_LIMIT 0x0002
#define PCRE_EXTRA_CALLOUT_DATA 0x0004
#define PCRE_EXTRA_TABLES 0x0008
#define PCRE_EXTRA_MATCH_LIMIT_RECURSION 0x0010
#define PCRE_EXTRA_MARK 0x0020
#define PCRE_EXTRA_EXECUTABLE_JIT 0x0040
/* Types */
struct real_pcre; /* declaration; the definition is private */
typedef struct real_pcre pcre;
struct real_pcre16; /* declaration; the definition is private */
typedef struct real_pcre16 pcre16;
struct real_pcre32; /* declaration; the definition is private */
typedef struct real_pcre32 pcre32;
struct real_pcre_jit_stack; /* declaration; the definition is private */
typedef struct real_pcre_jit_stack pcre_jit_stack;
struct real_pcre16_jit_stack; /* declaration; the definition is private */
typedef struct real_pcre16_jit_stack pcre16_jit_stack;
struct real_pcre32_jit_stack; /* declaration; the definition is private */
typedef struct real_pcre32_jit_stack pcre32_jit_stack;
/* If PCRE is compiled with 16 bit character support, PCRE_UCHAR16 must contain
a 16 bit wide unsigned data type. Otherwise it can be a dummy data type since
pcre16 functions are not implemented. There is a check for this in pcre_internal.h. */
typedef unsigned short PCRE_UCHAR16;
typedef const PCRE_UCHAR16 *PCRE_SPTR16;
/* If PCRE is compiled with 32 bit character support, PCRE_UCHAR32 must contain
a 32 bit wide unsigned data type. Otherwise it can be a dummy data type since
pcre32 functions are not implemented. There is a check for this in pcre_internal.h. */
typedef unsigned int PCRE_UCHAR32;
typedef const PCRE_UCHAR32 *PCRE_SPTR32;
/* When PCRE is compiled as a C++ library, the subject pointer type can be
replaced with a custom type. For conventional use, the public interface is a
const char *. */
typedef const char *PCRE_SPTR;
/* The structure for passing additional data to pcre_exec(). This is defined in
such a way as to be extensible. Always add new fields at the end, in order to
remain compatible. */
typedef struct pcre_extra {
unsigned long int flags; /* Bits for which fields are set */
void *study_data; /* Opaque data from pcre_study() */
unsigned long int match_limit; /* Maximum number of calls to match() */
void *callout_data; /* Data passed back in callouts */
const unsigned char *tables; /* Pointer to character tables */
unsigned long int match_limit_recursion; /* Max recursive calls to match() */
unsigned char **mark; /* For passing back a mark pointer */
void *executable_jit; /* Contains a pointer to a compiled jit code */
} pcre_extra;
/* Same structure as above, but with 16 bit char pointers. */
typedef struct pcre16_extra {
unsigned long int flags; /* Bits for which fields are set */
void *study_data; /* Opaque data from pcre_study() */
unsigned long int match_limit; /* Maximum number of calls to match() */
void *callout_data; /* Data passed back in callouts */
const unsigned char *tables; /* Pointer to character tables */
unsigned long int match_limit_recursion; /* Max recursive calls to match() */
PCRE_UCHAR16 **mark; /* For passing back a mark pointer */
void *executable_jit; /* Contains a pointer to a compiled jit code */
} pcre16_extra;
/* Same structure as above, but with 32 bit char pointers. */
typedef struct pcre32_extra {
unsigned long int flags; /* Bits for which fields are set */
void *study_data; /* Opaque data from pcre_study() */
unsigned long int match_limit; /* Maximum number of calls to match() */
void *callout_data; /* Data passed back in callouts */
const unsigned char *tables; /* Pointer to character tables */
unsigned long int match_limit_recursion; /* Max recursive calls to match() */
PCRE_UCHAR32 **mark; /* For passing back a mark pointer */
void *executable_jit; /* Contains a pointer to a compiled jit code */
} pcre32_extra;
/* The structure for passing out data via the pcre_callout_function. We use a
structure so that new fields can be added on the end in future versions,
without changing the API of the function, thereby allowing old clients to work
without modification. */
typedef struct pcre_callout_block {
int version; /* Identifies version of block */
/* ------------------------ Version 0 ------------------------------- */
int callout_number; /* Number compiled into pattern */
int *offset_vector; /* The offset vector */
PCRE_SPTR subject; /* The subject being matched */
int subject_length; /* The length of the subject */
int start_match; /* Offset to start of this match attempt */
int current_position; /* Where we currently are in the subject */
int capture_top; /* Max current capture */
int capture_last; /* Most recently closed capture */
void *callout_data; /* Data passed in with the call */
/* ------------------- Added for Version 1 -------------------------- */
int pattern_position; /* Offset to next item in the pattern */
int next_item_length; /* Length of next item in the pattern */
/* ------------------- Added for Version 2 -------------------------- */
const unsigned char *mark; /* Pointer to current mark or NULL */
/* ------------------------------------------------------------------ */
} pcre_callout_block;
/* Same structure as above, but with 16 bit char pointers. */
typedef struct pcre16_callout_block {
int version; /* Identifies version of block */
/* ------------------------ Version 0 ------------------------------- */
int callout_number; /* Number compiled into pattern */
int *offset_vector; /* The offset vector */
PCRE_SPTR16 subject; /* The subject being matched */
int subject_length; /* The length of the subject */
int start_match; /* Offset to start of this match attempt */
int current_position; /* Where we currently are in the subject */
int capture_top; /* Max current capture */
int capture_last; /* Most recently closed capture */
void *callout_data; /* Data passed in with the call */
/* ------------------- Added for Version 1 -------------------------- */
int pattern_position; /* Offset to next item in the pattern */
int next_item_length; /* Length of next item in the pattern */
/* ------------------- Added for Version 2 -------------------------- */
const PCRE_UCHAR16 *mark; /* Pointer to current mark or NULL */
/* ------------------------------------------------------------------ */
} pcre16_callout_block;
/* Same structure as above, but with 32 bit char pointers. */
typedef struct pcre32_callout_block {
int version; /* Identifies version of block */
/* ------------------------ Version 0 ------------------------------- */
int callout_number; /* Number compiled into pattern */
int *offset_vector; /* The offset vector */
PCRE_SPTR32 subject; /* The subject being matched */
int subject_length; /* The length of the subject */
int start_match; /* Offset to start of this match attempt */
int current_position; /* Where we currently are in the subject */
int capture_top; /* Max current capture */
int capture_last; /* Most recently closed capture */
void *callout_data; /* Data passed in with the call */
/* ------------------- Added for Version 1 -------------------------- */
int pattern_position; /* Offset to next item in the pattern */
int next_item_length; /* Length of next item in the pattern */
/* ------------------- Added for Version 2 -------------------------- */
const PCRE_UCHAR32 *mark; /* Pointer to current mark or NULL */
/* ------------------------------------------------------------------ */
} pcre32_callout_block;
/* Indirection for store get and free functions. These can be set to
alternative malloc/free functions if required. Special ones are used in the
non-recursive case for "frames". There is also an optional callout function
that is triggered by the (?C) regex item. For Virtual Pascal, these definitions
have to take another form. */
#ifndef VPCOMPAT
PCRE_EXP_DECL void *(*pcre_malloc)(size_t);
PCRE_EXP_DECL void (*pcre_free)(void *);
PCRE_EXP_DECL void *(*pcre_stack_malloc)(size_t);
PCRE_EXP_DECL void (*pcre_stack_free)(void *);
PCRE_EXP_DECL int (*pcre_callout)(pcre_callout_block *);
PCRE_EXP_DECL int (*pcre_stack_guard)(void);
PCRE_EXP_DECL void *(*pcre16_malloc)(size_t);
PCRE_EXP_DECL void (*pcre16_free)(void *);
PCRE_EXP_DECL void *(*pcre16_stack_malloc)(size_t);
PCRE_EXP_DECL void (*pcre16_stack_free)(void *);
PCRE_EXP_DECL int (*pcre16_callout)(pcre16_callout_block *);
PCRE_EXP_DECL int (*pcre16_stack_guard)(void);
PCRE_EXP_DECL void *(*pcre32_malloc)(size_t);
PCRE_EXP_DECL void (*pcre32_free)(void *);
PCRE_EXP_DECL void *(*pcre32_stack_malloc)(size_t);
PCRE_EXP_DECL void (*pcre32_stack_free)(void *);
PCRE_EXP_DECL int (*pcre32_callout)(pcre32_callout_block *);
PCRE_EXP_DECL int (*pcre32_stack_guard)(void);
#else /* VPCOMPAT */
PCRE_EXP_DECL void *pcre_malloc(size_t);
PCRE_EXP_DECL void pcre_free(void *);
PCRE_EXP_DECL void *pcre_stack_malloc(size_t);
PCRE_EXP_DECL void pcre_stack_free(void *);
PCRE_EXP_DECL int pcre_callout(pcre_callout_block *);
PCRE_EXP_DECL int pcre_stack_guard(void);
PCRE_EXP_DECL void *pcre16_malloc(size_t);
PCRE_EXP_DECL void pcre16_free(void *);
PCRE_EXP_DECL void *pcre16_stack_malloc(size_t);
PCRE_EXP_DECL void pcre16_stack_free(void *);
PCRE_EXP_DECL int pcre16_callout(pcre16_callout_block *);
PCRE_EXP_DECL int pcre16_stack_guard(void);
PCRE_EXP_DECL void *pcre32_malloc(size_t);
PCRE_EXP_DECL void pcre32_free(void *);
PCRE_EXP_DECL void *pcre32_stack_malloc(size_t);
PCRE_EXP_DECL void pcre32_stack_free(void *);
PCRE_EXP_DECL int pcre32_callout(pcre32_callout_block *);
PCRE_EXP_DECL int pcre32_stack_guard(void);
#endif /* VPCOMPAT */
/* User defined callback which provides a stack just before the match starts. */
typedef pcre_jit_stack *(*pcre_jit_callback)(void *);
typedef pcre16_jit_stack *(*pcre16_jit_callback)(void *);
typedef pcre32_jit_stack *(*pcre32_jit_callback)(void *);
/* Exported PCRE functions */
PCRE_EXP_DECL pcre *pcre_compile(const char *, int, const char **, int *,
const unsigned char *);
PCRE_EXP_DECL pcre16 *pcre16_compile(PCRE_SPTR16, int, const char **, int *,
const unsigned char *);
PCRE_EXP_DECL pcre32 *pcre32_compile(PCRE_SPTR32, int, const char **, int *,
const unsigned char *);
PCRE_EXP_DECL pcre *pcre_compile2(const char *, int, int *, const char **,
int *, const unsigned char *);
PCRE_EXP_DECL pcre16 *pcre16_compile2(PCRE_SPTR16, int, int *, const char **,
int *, const unsigned char *);
PCRE_EXP_DECL pcre32 *pcre32_compile2(PCRE_SPTR32, int, int *, const char **,
int *, const unsigned char *);
PCRE_EXP_DECL int pcre_config(int, void *);
PCRE_EXP_DECL int pcre16_config(int, void *);
PCRE_EXP_DECL int pcre32_config(int, void *);
PCRE_EXP_DECL int pcre_copy_named_substring(const pcre *, const char *,
int *, int, const char *, char *, int);
PCRE_EXP_DECL int pcre16_copy_named_substring(const pcre16 *, PCRE_SPTR16,
int *, int, PCRE_SPTR16, PCRE_UCHAR16 *, int);
PCRE_EXP_DECL int pcre32_copy_named_substring(const pcre32 *, PCRE_SPTR32,
int *, int, PCRE_SPTR32, PCRE_UCHAR32 *, int);
PCRE_EXP_DECL int pcre_copy_substring(const char *, int *, int, int,
char *, int);
PCRE_EXP_DECL int pcre16_copy_substring(PCRE_SPTR16, int *, int, int,
PCRE_UCHAR16 *, int);
PCRE_EXP_DECL int pcre32_copy_substring(PCRE_SPTR32, int *, int, int,
PCRE_UCHAR32 *, int);
PCRE_EXP_DECL int pcre_dfa_exec(const pcre *, const pcre_extra *,
const char *, int, int, int, int *, int , int *, int);
PCRE_EXP_DECL int pcre16_dfa_exec(const pcre16 *, const pcre16_extra *,
PCRE_SPTR16, int, int, int, int *, int , int *, int);
PCRE_EXP_DECL int pcre32_dfa_exec(const pcre32 *, const pcre32_extra *,
PCRE_SPTR32, int, int, int, int *, int , int *, int);
PCRE_EXP_DECL int pcre_exec(const pcre *, const pcre_extra *, PCRE_SPTR,
int, int, int, int *, int);
PCRE_EXP_DECL int pcre16_exec(const pcre16 *, const pcre16_extra *,
PCRE_SPTR16, int, int, int, int *, int);
PCRE_EXP_DECL int pcre32_exec(const pcre32 *, const pcre32_extra *,
PCRE_SPTR32, int, int, int, int *, int);
PCRE_EXP_DECL int pcre_jit_exec(const pcre *, const pcre_extra *,
PCRE_SPTR, int, int, int, int *, int,
pcre_jit_stack *);
PCRE_EXP_DECL int pcre16_jit_exec(const pcre16 *, const pcre16_extra *,
PCRE_SPTR16, int, int, int, int *, int,
pcre16_jit_stack *);
PCRE_EXP_DECL int pcre32_jit_exec(const pcre32 *, const pcre32_extra *,
PCRE_SPTR32, int, int, int, int *, int,
pcre32_jit_stack *);
PCRE_EXP_DECL void pcre_free_substring(const char *);
PCRE_EXP_DECL void pcre16_free_substring(PCRE_SPTR16);
PCRE_EXP_DECL void pcre32_free_substring(PCRE_SPTR32);
PCRE_EXP_DECL void pcre_free_substring_list(const char **);
PCRE_EXP_DECL void pcre16_free_substring_list(PCRE_SPTR16 *);
PCRE_EXP_DECL void pcre32_free_substring_list(PCRE_SPTR32 *);
PCRE_EXP_DECL int pcre_fullinfo(const pcre *, const pcre_extra *, int,
void *);
PCRE_EXP_DECL int pcre16_fullinfo(const pcre16 *, const pcre16_extra *, int,
void *);
PCRE_EXP_DECL int pcre32_fullinfo(const pcre32 *, const pcre32_extra *, int,
void *);
PCRE_EXP_DECL int pcre_get_named_substring(const pcre *, const char *,
int *, int, const char *, const char **);
PCRE_EXP_DECL int pcre16_get_named_substring(const pcre16 *, PCRE_SPTR16,
int *, int, PCRE_SPTR16, PCRE_SPTR16 *);
PCRE_EXP_DECL int pcre32_get_named_substring(const pcre32 *, PCRE_SPTR32,
int *, int, PCRE_SPTR32, PCRE_SPTR32 *);
PCRE_EXP_DECL int pcre_get_stringnumber(const pcre *, const char *);
PCRE_EXP_DECL int pcre16_get_stringnumber(const pcre16 *, PCRE_SPTR16);
PCRE_EXP_DECL int pcre32_get_stringnumber(const pcre32 *, PCRE_SPTR32);
PCRE_EXP_DECL int pcre_get_stringtable_entries(const pcre *, const char *,
char **, char **);
PCRE_EXP_DECL int pcre16_get_stringtable_entries(const pcre16 *, PCRE_SPTR16,
PCRE_UCHAR16 **, PCRE_UCHAR16 **);
PCRE_EXP_DECL int pcre32_get_stringtable_entries(const pcre32 *, PCRE_SPTR32,
PCRE_UCHAR32 **, PCRE_UCHAR32 **);
PCRE_EXP_DECL int pcre_get_substring(const char *, int *, int, int,
const char **);
PCRE_EXP_DECL int pcre16_get_substring(PCRE_SPTR16, int *, int, int,
PCRE_SPTR16 *);
PCRE_EXP_DECL int pcre32_get_substring(PCRE_SPTR32, int *, int, int,
PCRE_SPTR32 *);
PCRE_EXP_DECL int pcre_get_substring_list(const char *, int *, int,
const char ***);
PCRE_EXP_DECL int pcre16_get_substring_list(PCRE_SPTR16, int *, int,
PCRE_SPTR16 **);
PCRE_EXP_DECL int pcre32_get_substring_list(PCRE_SPTR32, int *, int,
PCRE_SPTR32 **);
PCRE_EXP_DECL const unsigned char *pcre_maketables(void);
PCRE_EXP_DECL const unsigned char *pcre16_maketables(void);
PCRE_EXP_DECL const unsigned char *pcre32_maketables(void);
PCRE_EXP_DECL int pcre_refcount(pcre *, int);
PCRE_EXP_DECL int pcre16_refcount(pcre16 *, int);
PCRE_EXP_DECL int pcre32_refcount(pcre32 *, int);
PCRE_EXP_DECL pcre_extra *pcre_study(const pcre *, int, const char **);
PCRE_EXP_DECL pcre16_extra *pcre16_study(const pcre16 *, int, const char **);
PCRE_EXP_DECL pcre32_extra *pcre32_study(const pcre32 *, int, const char **);
PCRE_EXP_DECL void pcre_free_study(pcre_extra *);
PCRE_EXP_DECL void pcre16_free_study(pcre16_extra *);
PCRE_EXP_DECL void pcre32_free_study(pcre32_extra *);
PCRE_EXP_DECL const char *pcre_version(void);
PCRE_EXP_DECL const char *pcre16_version(void);
PCRE_EXP_DECL const char *pcre32_version(void);
/* Utility functions for byte order swaps. */
PCRE_EXP_DECL int pcre_pattern_to_host_byte_order(pcre *, pcre_extra *,
const unsigned char *);
PCRE_EXP_DECL int pcre16_pattern_to_host_byte_order(pcre16 *, pcre16_extra *,
const unsigned char *);
PCRE_EXP_DECL int pcre32_pattern_to_host_byte_order(pcre32 *, pcre32_extra *,
const unsigned char *);
PCRE_EXP_DECL int pcre16_utf16_to_host_byte_order(PCRE_UCHAR16 *,
PCRE_SPTR16, int, int *, int);
PCRE_EXP_DECL int pcre32_utf32_to_host_byte_order(PCRE_UCHAR32 *,
PCRE_SPTR32, int, int *, int);
/* JIT compiler related functions. */
PCRE_EXP_DECL pcre_jit_stack *pcre_jit_stack_alloc(int, int);
PCRE_EXP_DECL pcre16_jit_stack *pcre16_jit_stack_alloc(int, int);
PCRE_EXP_DECL pcre32_jit_stack *pcre32_jit_stack_alloc(int, int);
PCRE_EXP_DECL void pcre_jit_stack_free(pcre_jit_stack *);
PCRE_EXP_DECL void pcre16_jit_stack_free(pcre16_jit_stack *);
PCRE_EXP_DECL void pcre32_jit_stack_free(pcre32_jit_stack *);
PCRE_EXP_DECL void pcre_assign_jit_stack(pcre_extra *,
pcre_jit_callback, void *);
PCRE_EXP_DECL void pcre16_assign_jit_stack(pcre16_extra *,
pcre16_jit_callback, void *);
PCRE_EXP_DECL void pcre32_assign_jit_stack(pcre32_extra *,
pcre32_jit_callback, void *);
PCRE_EXP_DECL void pcre_jit_free_unused_memory(void);
PCRE_EXP_DECL void pcre16_jit_free_unused_memory(void);
PCRE_EXP_DECL void pcre32_jit_free_unused_memory(void);


@@ -0,0 +1,469 @@
when defined(pcreDynlib):
  const pcreHeader = "<pcre.h>"
  when not defined(pcreDll):
    when hostOS == "windows":
      const pcreDll = "pcre.dll"
    elif hostOS == "macosx":
      const pcreDll = "libpcre(.3|.1|).dylib"
    else:
      const pcreDll = "libpcre.so(.3|.1|)"
    {.pragma: pcreImport, dynlib: pcreDll.}
  else:
    {.pragma: pcreImport, header: pcreHeader.}
  {.deadCodeElim: on.}  # Don't error unless unsupported features are used
else:
  {. passC: "-DHAVE_CONFIG_H", passC: "-I private/pcre_src",
     passL: "-I private/pcre_src" .}
  {. compile: "private/pcre_src/pcre_byte_order.c" .}
  {. compile: "private/pcre_src/pcre_compile.c" .}
  {. compile: "private/pcre_src/pcre_config.c" .}
  {. compile: "private/pcre_src/pcre_dfa_exec.c" .}
  {. compile: "private/pcre_src/pcre_exec.c" .}
  {. compile: "private/pcre_src/pcre_fullinfo.c" .}
  {. compile: "private/pcre_src/pcre_get.c" .}
  {. compile: "private/pcre_src/pcre_globals.c" .}
  {. compile: "private/pcre_src/pcre_jit_compile.c" .}
  {. compile: "private/pcre_src/pcre_maketables.c" .}
  {. compile: "private/pcre_src/pcre_newline.c" .}
  {. compile: "private/pcre_src/pcre_ord2utf8.c" .}
  {. compile: "private/pcre_src/pcre_refcount.c" .}
  {. compile: "private/pcre_src/pcre_string_utils.c" .}
  {. compile: "private/pcre_src/pcre_study.c" .}
  {. compile: "private/pcre_src/pcre_tables.c" .}
  {. compile: "private/pcre_src/pcre_ucd.c" .}
  {. compile: "private/pcre_src/pcre_valid_utf8.c" .}
  {. compile: "private/pcre_src/pcre_version.c" .}
  {. compile: "private/pcre_src/pcre_xclass.c" .}
  {. compile: "private/pcre_src/pcre_chartables.c" .}
  {.pragma: pcreImport.}
#************************************************
# Perl-Compatible Regular Expressions *
#***********************************************
# This is the public header file for the Pcre library, to be #included by
#applications that call the Pcre functions.
#
# Copyright (c) 1997-2014 University of Cambridge
#
#-----------------------------------------------------------------------------
#Redistribution and use in source and binary forms, with or without
#modification, are permitted provided that the following conditions are met:
#
# Redistributions of source code must retain the above copyright notice,
# this list of conditions and the following disclaimer.
#
# Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
#
# Neither the name of the University of Cambridge nor the names of its
# contributors may be used to endorse or promote products derived from
# this software without specific prior written permission.
#
#THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
#AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
#IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
#ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
#LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
#CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
#SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
#INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
#CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
#ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
#POSSIBILITY OF SUCH DAMAGE.
#-----------------------------------------------------------------------------
#
# The current Pcre version information.
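const
  MAJOR* = 8
  MINOR* = 36
  PRERELEASE* = true
  # The C macro is the unquoted token sequence 2014-09-26; written as
  # ``2014 - 9 - 26`` it would be evaluated as integer arithmetic.
  DATE* = "2014-09-26"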
# When an application links to a Pcre DLL in Windows, the symbols that are
#imported have to be identified as such. When building PCRE, the appropriate
#export setting is defined in pcre_internal.h, which includes this file. So we
#don't change existing definitions of PCRE_EXP_DECL and PCRECPP_EXP_DECL.
# By default, we use the standard "extern" declarations.
# Have to include stdlib.h in order to ensure that size_t is defined;
#it is needed here for malloc.
# Allow for C++ users
# Public options. Some are compile-time only, some are run-time only, and some
#are both. Most of the compile-time options are saved with the compiled regex so
#that they can be inspected during studying (and therefore JIT compiling). Note
#that pcre_study() has its own set of options. Originally, all the options
#defined here used distinct bits. However, almost all the bits in a 32-bit word
#are now used, so in order to conserve them, option bits that were previously
#only recognized at matching time (i.e. by pcre_exec() or pcre_dfa_exec()) may
#also be used for compile-time options that affect only compiling and are not
#relevant for studying or JIT compiling.
#
#Some options for pcre_compile() change its behaviour but do not affect the
#behaviour of the execution functions. Other options are passed through to the
#execution functions and affect their behaviour, with or without affecting the
#behaviour of pcre_compile().
#
#Options that can be passed to pcre_compile() are tagged Cx below, with these
#variants:
#
#C1 Affects compile only
#C2 Does not affect compile; affects exec, dfa_exec
#C3 Affects compile, exec, dfa_exec
#C4 Affects compile, exec, dfa_exec, study
#C5 Affects compile, exec, study
#
#Options that can be set for pcre_exec() and/or pcre_dfa_exec() are flagged with
#E and D, respectively. They take precedence over C3, C4, and C5 settings passed
#from pcre_compile(). Those that are compatible with JIT execution are flagged
#with J.
const
  CASELESS* = 0x00000001
  MULTILINE* = 0x00000002
  DOTALL* = 0x00000004
  EXTENDED* = 0x00000008
  ANCHORED* = 0x00000010
  DOLLAR_ENDONLY* = 0x00000020
  EXTRA* = 0x00000040
  NOTBOL* = 0x00000080
  NOTEOL* = 0x00000100
  UNGREEDY* = 0x00000200
  NOTEMPTY* = 0x00000400
  UTF8* = 0x00000800
  UTF16* = 0x00000800
  UTF32* = 0x00000800
  NO_AUTO_CAPTURE* = 0x00001000
  NO_UTF8_CHECK* = 0x00002000
  NO_UTF16_CHECK* = 0x00002000
  NO_UTF32_CHECK* = 0x00002000
  AUTO_CALLOUT* = 0x00004000
  PARTIAL_SOFT* = 0x00008000
  PARTIAL* = 0x00008000
# This pair use the same bit.
const
  NEVER_UTF* = 0x00010000
  DFA_SHORTEST* = 0x00010000
# This pair use the same bit.
const
  NO_AUTO_POSSESS* = 0x00020000
  DFA_RESTART* = 0x00020000
  FIRSTLINE* = 0x00040000
  DUPNAMES* = 0x00080000
  NEWLINE_CR* = 0x00100000
  NEWLINE_LF* = 0x00200000
  NEWLINE_CRLF* = 0x00300000
  NEWLINE_ANY* = 0x00400000
  NEWLINE_ANYCRLF* = 0x00500000
  BSR_ANYCRLF* = 0x00800000
  BSR_UNICODE* = 0x01000000
  JAVASCRIPT_COMPAT* = 0x02000000
  NO_START_OPTIMIZE* = 0x04000000
  NO_START_OPTIMISE* = 0x04000000
  PARTIAL_HARD* = 0x08000000
  NOTEMPTY_ATSTART* = 0x10000000
  UCP* = 0x20000000
# Exec-time and get/set-time error codes
const
  ERROR_NOMATCH* = (- 1)
  ERROR_NULL* = (- 2)
  ERROR_BADOPTION* = (- 3)
  ERROR_BADMAGIC* = (- 4)
  ERROR_UNKNOWN_OPCODE* = (- 5)
  ERROR_UNKNOWN_NODE* = (- 5) # For backward compatibility
  ERROR_NOMEMORY* = (- 6)
  ERROR_NOSUBSTRING* = (- 7)
  ERROR_MATCHLIMIT* = (- 8)
  ERROR_CALLOUT* = (- 9) # Never used by PCRE itself
  ERROR_BADUTF8* = (- 10) # Same for 8/16/32
  ERROR_BADUTF16* = (- 10) # Same for 8/16/32
  ERROR_BADUTF32* = (- 10) # Same for 8/16/32
  ERROR_BADUTF8_OFFSET* = (- 11) # Same for 8/16
  ERROR_BADUTF16_OFFSET* = (- 11) # Same for 8/16
  ERROR_PARTIAL* = (- 12)
  ERROR_BADPARTIAL* = (- 13)
  ERROR_INTERNAL* = (- 14)
  ERROR_BADCOUNT* = (- 15)
  ERROR_DFA_UITEM* = (- 16)
  ERROR_DFA_UCOND* = (- 17)
  ERROR_DFA_UMLIMIT* = (- 18)
  ERROR_DFA_WSSIZE* = (- 19)
  ERROR_DFA_RECURSE* = (- 20)
  ERROR_RECURSIONLIMIT* = (- 21)
  ERROR_NULLWSLIMIT* = (- 22) # No longer actually used
  ERROR_BADNEWLINE* = (- 23)
  ERROR_BADOFFSET* = (- 24)
  ERROR_SHORTUTF8* = (- 25)
  ERROR_SHORTUTF16* = (- 25) # Same for 8/16
  ERROR_RECURSELOOP* = (- 26)
  ERROR_JIT_STACKLIMIT* = (- 27)
  ERROR_BADMODE* = (- 28)
  ERROR_BADENDIANNESS* = (- 29)
  ERROR_DFA_BADRESTART* = (- 30)
  ERROR_JIT_BADOPTION* = (- 31)
  ERROR_BADLENGTH* = (- 32)
  ERROR_UNSET* = (- 33)
# Specific error codes for UTF-8 validity checks
const
  UTF8_ERR0* = 0
  UTF8_ERR1* = 1
  UTF8_ERR2* = 2
  UTF8_ERR3* = 3
  UTF8_ERR4* = 4
  UTF8_ERR5* = 5
  UTF8_ERR6* = 6
  UTF8_ERR7* = 7
  UTF8_ERR8* = 8
  UTF8_ERR9* = 9
  UTF8_ERR10* = 10
  UTF8_ERR11* = 11
  UTF8_ERR12* = 12
  UTF8_ERR13* = 13
  UTF8_ERR14* = 14
  UTF8_ERR15* = 15
  UTF8_ERR16* = 16
  UTF8_ERR17* = 17
  UTF8_ERR18* = 18
  UTF8_ERR19* = 19
  UTF8_ERR20* = 20
  UTF8_ERR21* = 21
  UTF8_ERR22* = 22
# Specific error codes for UTF-16 validity checks
const
  UTF16_ERR0* = 0
  UTF16_ERR1* = 1
  UTF16_ERR2* = 2
  UTF16_ERR3* = 3
  UTF16_ERR4* = 4
# Specific error codes for UTF-32 validity checks
const
  UTF32_ERR0* = 0
  UTF32_ERR1* = 1
  UTF32_ERR2* = 2
  UTF32_ERR3* = 3
# Request types for pcre_fullinfo()
const
  INFO_OPTIONS* = 0
  INFO_SIZE* = 1
  INFO_CAPTURECOUNT* = 2
  INFO_BACKREFMAX* = 3
  INFO_FIRSTBYTE* = 4
  INFO_FIRSTCHAR* = 4
  INFO_FIRSTTABLE* = 5
  INFO_LASTLITERAL* = 6
  INFO_NAMEENTRYSIZE* = 7
  INFO_NAMECOUNT* = 8
  INFO_NAMETABLE* = 9
  INFO_STUDYSIZE* = 10
  INFO_DEFAULT_TABLES* = 11
  INFO_OKPARTIAL* = 12
  INFO_JCHANGED* = 13
  INFO_HASCRORLF* = 14
  INFO_MINLENGTH* = 15
  INFO_JIT* = 16
  INFO_JITSIZE* = 17
  INFO_MAXLOOKBEHIND* = 18
  INFO_FIRSTCHARACTER* = 19
  INFO_FIRSTCHARACTERFLAGS* = 20
  INFO_REQUIREDCHAR* = 21
  INFO_REQUIREDCHARFLAGS* = 22
  INFO_MATCHLIMIT* = 23
  INFO_RECURSIONLIMIT* = 24
  INFO_MATCH_EMPTY* = 25
# Request types for pcre_config(). Do not re-arrange, in order to remain
#compatible.
const
  CONFIG_UTF8* = 0
  CONFIG_NEWLINE* = 1
  CONFIG_LINK_SIZE* = 2
  CONFIG_POSIX_MALLOC_THRESHOLD* = 3
  CONFIG_MATCH_LIMIT* = 4
  CONFIG_STACKRECURSE* = 5
  CONFIG_UNICODE_PROPERTIES* = 6
  CONFIG_MATCH_LIMIT_RECURSION* = 7
  CONFIG_BSR* = 8
  CONFIG_JIT* = 9
  CONFIG_UTF16* = 10
  CONFIG_JITTARGET* = 11
  CONFIG_UTF32* = 12
  CONFIG_PARENS_LIMIT* = 13
# Request types for pcre_study(). Do not re-arrange, in order to remain
#compatible.
const
  STUDY_JIT_COMPILE* = 0x00000001
  STUDY_JIT_PARTIAL_SOFT_COMPILE* = 0x00000002
  STUDY_JIT_PARTIAL_HARD_COMPILE* = 0x00000004
  STUDY_EXTRA_NEEDED* = 0x00000008
# Bit flags for the pcre[16|32]_extra structure. Do not re-arrange or redefine
#these bits, just add new ones on the end, in order to remain compatible.
const
  EXTRA_STUDY_DATA* = 0x00000001
  EXTRA_MATCH_LIMIT* = 0x00000002
  EXTRA_CALLOUT_DATA* = 0x00000004
  EXTRA_TABLES* = 0x00000008
  EXTRA_MATCH_LIMIT_RECURSION* = 0x00000010
  EXTRA_MARK* = 0x00000020
  EXTRA_EXECUTABLE_JIT* = 0x00000040
# Types
type
  Pcre* = object
  Pcre16* = object
  Pcre32* = object
  jit_stack* = object
  jit_stack16* = object
  jit_stack32* = object
# The structure for passing additional data to pcre_exec(). This is defined in
#such a way as to be extensible. Always add new fields at the end, in order to
#remain compatible.
type
  ExtraData* = object
    flags*: culong # Bits for which fields are set
    study_data*: pointer # Opaque data from pcre_study()
    match_limit*: culong # Maximum number of calls to match()
    callout_data*: pointer # Data passed back in callouts
    tables*: ptr cuchar # Pointer to character tables
    match_limit_recursion*: culong # Max recursive calls to match()
    mark*: ptr ptr cuchar # For passing back a mark pointer
    executable_jit*: pointer # Contains a pointer to a compiled jit code
# The structure for passing out data via the pcre_callout_function. We use a
#structure so that new fields can be added on the end in future versions,
#without changing the API of the function, thereby allowing old clients to work
#without modification.
type
  callout_block* = object
    version*: cint # Identifies version of block
    # ------------------------ Version 0 -------------------------------
    callout_number*: cint # Number compiled into pattern
    offset_vector*: ptr cint # The offset vector
    subject*: cstring # The subject being matched
    subject_length*: cint # The length of the subject
    start_match*: cint # Offset to start of this match attempt
    current_position*: cint # Where we currently are in the subject
    capture_top*: cint # Max current capture
    capture_last*: cint # Most recently closed capture
    callout_data*: pointer # Data passed in with the call
    # ------------------- Added for Version 1 --------------------------
    pattern_position*: cint # Offset to next item in the pattern
    next_item_length*: cint # Length of next item in the pattern
    # ------------------- Added for Version 2 --------------------------
    mark*: ptr cuchar # Pointer to current mark or NULL
    # ------------------------------------------------------------------
# Indirection for store get and free functions. These can be set to
#alternative malloc/free functions if required. Special ones are used in the
#non-recursive case for "frames". There is also an optional callout function
#that is triggered by the (?C) regex item. For Virtual Pascal, these definitions
#have to take another form.
proc malloc*(a2: csize): pointer {.cdecl, importc: "pcre_malloc", pcreImport.}
proc free*(a2: pointer) {.cdecl, importc: "pcre_free", pcreImport.}
proc stack_malloc*(a2: csize): pointer {.cdecl, importc: "pcre_stack_malloc", pcreImport.}
proc stack_free*(a2: pointer) {.cdecl, importc: "pcre_stack_free", pcreImport.}
proc callout*(a2: ptr callout_block): cint {.cdecl, importc: "pcre_callout", pcreImport.}
proc stack_guard*(): cint {.cdecl, importc: "pcre_stack_guard", pcreImport.}
# User defined callback which provides a stack just before the match starts.
type
  jit_callback* = proc (a2: pointer): ptr jit_stack {.cdecl.}
# Exported PCRE functions
proc compile*(a2: cstring; a3: cint; a4: ptr cstring; a5: ptr cint;
a6: ptr cuchar): ptr Pcre {.cdecl, importc: "pcre_compile",
pcreImport.}
proc compile2*(a2: cstring; a3: cint; a4: ptr cint; a5: ptr cstring;
a6: ptr cint; a7: ptr cuchar): ptr Pcre {.cdecl,
importc: "pcre_compile2", pcreImport.}
proc config*(a2: cint; a3: pointer): cint {.cdecl, importc: "pcre_config",
pcreImport.}
proc copy_named_substring*(a2: ptr Pcre; a3: cstring; a4: ptr cint; a5: cint;
a6: cstring; a7: cstring; a8: cint): cint {.cdecl,
importc: "pcre_copy_named_substring", pcreImport.}
proc copy_substring*(a2: cstring; a3: ptr cint; a4: cint; a5: cint; a6: cstring;
a7: cint): cint {.cdecl, importc: "pcre_copy_substring",
pcreImport.}
proc dfa_exec*(a2: ptr Pcre; a3: ptr ExtraData; a4: cstring; a5: cint; a6: cint;
a7: cint; a8: ptr cint; a9: cint; a10: ptr cint; a11: cint): cint {.
cdecl, importc: "pcre_dfa_exec", pcreImport.}
proc exec*(a2: ptr Pcre; a3: ptr ExtraData; a4: cstring; a5: cint; a6: cint; a7: cint;
a8: ptr cint; a9: cint): cint {.cdecl, importc: "pcre_exec",
pcreImport.}
proc jit_exec*(a2: ptr Pcre; a3: ptr ExtraData; a4: cstring; a5: cint; a6: cint;
a7: cint; a8: ptr cint; a9: cint; a10: ptr jit_stack): cint {.
cdecl, importc: "pcre_jit_exec", pcreImport.}
proc free_substring*(a2: cstring) {.cdecl, importc: "pcre_free_substring",
pcreImport.}
proc free_substring_list*(a2: ptr cstring) {.cdecl,
importc: "pcre_free_substring_list", pcreImport.}
proc fullinfo*(a2: ptr Pcre; a3: ptr ExtraData; a4: cint; a5: pointer): cint {.
cdecl, importc: "pcre_fullinfo", pcreImport.}
proc get_named_substring*(a2: ptr Pcre; a3: cstring; a4: ptr cint; a5: cint;
a6: cstring; a7: cstringArray): cint {.cdecl,
importc: "pcre_get_named_substring", pcreImport.}
proc get_stringnumber*(a2: ptr Pcre; a3: cstring): cint {.cdecl,
importc: "pcre_get_stringnumber", pcreImport.}
proc get_stringtable_entries*(a2: ptr Pcre; a3: cstring; a4: cstringArray;
a5: cstringArray): cint {.cdecl,
importc: "pcre_get_stringtable_entries", pcreImport.}
proc get_substring*(a2: cstring; a3: ptr cint; a4: cint; a5: cint;
a6: cstringArray): cint {.cdecl,
importc: "pcre_get_substring", pcreImport.}
proc get_substring_list*(a2: cstring; a3: ptr cint; a4: cint;
a5: ptr cstringArray): cint {.cdecl,
importc: "pcre_get_substring_list", pcreImport.}
proc maketables*(): ptr cuchar {.cdecl, importc: "pcre_maketables",
pcreImport.}
proc refcount*(a2: ptr Pcre; a3: cint): cint {.cdecl, importc: "pcre_refcount",
pcreImport.}
proc study*(a2: ptr Pcre; a3: cint; a4: ptr cstring): ptr ExtraData {.cdecl,
importc: "pcre_study", pcreImport.}
proc free_study*(a2: ptr ExtraData) {.cdecl, importc: "pcre_free_study",
pcreImport.}
proc version*(): cstring {.cdecl, importc: "pcre_version", pcreImport.}
# Utility functions for byte order swaps.
proc pattern_to_host_byte_order*(a2: ptr Pcre; a3: ptr ExtraData; a4: ptr cuchar): cint {.
cdecl, importc: "pcre_pattern_to_host_byte_order", pcreImport.}
# JIT compiler related functions.
proc jit_stack_alloc*(a2: cint; a3: cint): ptr jit_stack {.cdecl,
importc: "pcre_jit_stack_alloc", pcreImport.}
proc jit_stack_free*(a2: ptr jit_stack) {.cdecl, importc: "pcre_jit_stack_free",
pcreImport.}
proc assign_jit_stack*(a2: ptr ExtraData; a3: jit_callback; a4: pointer) {.cdecl,
importc: "pcre_assign_jit_stack", pcreImport.}
proc jit_free_unused_memory*() {.cdecl, importc: "pcre_jit_free_unused_memory",
pcreImport.}
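
The option constants in this wrapper are not all independent flags. The NEWLINE_* values share one three-bit field (0x00100000 through 0x00500000), so NEWLINE_CRLF equals the bitwise OR of NEWLINE_CR and NEWLINE_LF, and a plain `&` test against NEWLINE_CR also succeeds for CRLF. A minimal C sketch of masking that field follows. The values are copied from the listing; `NEWLINE_MASK` and both helper functions are hypothetical names, not part of PCRE.

```c
#include <stdint.h>

/* Values copied from the option listing in this file. */
#define PCRE_CASELESS        0x00000001
#define PCRE_NEWLINE_CR      0x00100000
#define PCRE_NEWLINE_LF      0x00200000
#define PCRE_NEWLINE_CRLF    0x00300000
#define PCRE_NEWLINE_ANY     0x00400000
#define PCRE_NEWLINE_ANYCRLF 0x00500000

/* Hypothetical name: covers every bit used by the NEWLINE_* values. */
#define NEWLINE_MASK 0x00700000

/* Extract the newline convention from an options word. */
static uint32_t newline_setting(uint32_t options)
{
    return options & NEWLINE_MASK;
}

/* Replace the newline convention and leave all other bits intact. */
static uint32_t with_newline(uint32_t options, uint32_t newline)
{
    return (options & ~(uint32_t)NEWLINE_MASK) | (newline & NEWLINE_MASK);
}
```

Comparing `newline_setting(opts)` against a NEWLINE_* constant with `==` avoids the false positive that `opts & PCRE_NEWLINE_CR` gives when CRLF is selected.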

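
The negative ERROR_* constants are what the exec-style functions return on failure, and two of them (ERROR_NOMATCH and ERROR_PARTIAL) describe ordinary outcomes rather than faults. A small C sketch of sorting a return code into those cases is given below. The two values are copied from the listing; the `match_status` enum and `classify_exec_result` are hypothetical names introduced only for illustration.

```c
/* Values copied from the error-code listing in this file. */
#define PCRE_ERROR_NOMATCH (-1)
#define PCRE_ERROR_PARTIAL (-12)

/* Hypothetical result categories, not part of PCRE. */
typedef enum { MATCH_FOUND, MATCH_NONE, MATCH_PARTIAL, MATCH_ERROR } match_status;

/* Classify the integer returned by an exec-style call. */
static match_status classify_exec_result(int rc)
{
    /* Zero still means a match; the offset vector was just too small
       to hold every captured substring. */
    if (rc >= 0) return MATCH_FOUND;
    if (rc == PCRE_ERROR_NOMATCH) return MATCH_NONE;
    if (rc == PCRE_ERROR_PARTIAL) return MATCH_PARTIAL;
    /* Every other negative code is treated as a real error here. */
    return MATCH_ERROR;
}
```

Keeping "no match" separate from genuine errors lets a caller reserve exceptions or error reporting for codes such as ERROR_NOMEMORY or ERROR_MATCHLIMIT.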
@@ -0,0 +1,349 @@
/* config.h. Generated from config.h.in by configure. */
/* config.h.in. Generated from configure.ac by autoheader. */
/* PCRE is written in Standard C, but there are a few non-standard things it
can cope with, allowing it to run on SunOS4 and other "close to standard"
systems.
In environments that support the GNU autotools, config.h.in is converted into
config.h by the "configure" script. In environments that use CMake,
config-cmake.in is converted into config.h. If you are going to build PCRE "by
hand" without using "configure" or CMake, you should copy the distributed
config.h.generic to config.h, and edit the macro definitions to be the way you
need them. You must then add -DHAVE_CONFIG_H to all of your compile commands,
so that config.h is included at the start of every source.
Alternatively, you can avoid editing by using -D on the compiler command line
to set the macro values. In this case, you do not have to set -DHAVE_CONFIG_H,
but if you do, default values will be taken from config.h for non-boolean
macros that are not defined on the command line.
Boolean macros such as HAVE_STDLIB_H and SUPPORT_PCRE8 should either be defined
(conventionally to 1) for TRUE, and not defined at all for FALSE. All such
macros are listed as a commented #undef in config.h.generic. Macros such as
MATCH_LIMIT, whose actual value is relevant, have defaults defined, but are
surrounded by #ifndef/#endif lines so that the value can be overridden by -D.
PCRE uses memmove() if HAVE_MEMMOVE is defined; otherwise it uses bcopy() if
HAVE_BCOPY is defined. If your system has neither bcopy() nor memmove(), make
sure both macros are undefined; an emulation function will then be used. */
/* By default, the \R escape sequence matches any Unicode line ending
character or sequence of characters. If BSR_ANYCRLF is defined (to any
value), this is changed so that backslash-R matches only CR, LF, or CRLF.
The build-time default can be overridden by the user of PCRE at runtime. */
/* #undef BSR_ANYCRLF */
/* If you are compiling for a system that uses EBCDIC instead of ASCII
character codes, define this macro to any value. You must also edit the
NEWLINE macro below to set a suitable EBCDIC newline, commonly 21 (0x15).
On systems that can use "configure" or CMake to set EBCDIC, NEWLINE is
automatically adjusted. When EBCDIC is set, PCRE assumes that all input
strings are in EBCDIC. If you do not define this macro, PCRE will assume
input strings are ASCII or UTF-8/16/32 Unicode. It is not possible to build
a version of PCRE that supports both EBCDIC and UTF-8/16/32. */
/* #undef EBCDIC */
/* In an EBCDIC environment, define this macro to any value to arrange for the
NL character to be 0x25 instead of the default 0x15. NL plays the role that
LF does in an ASCII/Unicode environment. The value must also be set in the
NEWLINE macro below. On systems that can use "configure" or CMake to set
EBCDIC_NL25, the adjustment of NEWLINE is automatic. */
/* #undef EBCDIC_NL25 */
/* Define to 1 if you have the `bcopy' function. */
#define HAVE_BCOPY 1
/* Define to 1 if you have the <bits/type_traits.h> header file. */
/* #undef HAVE_BITS_TYPE_TRAITS_H */
/* Define to 1 if you have the <bzlib.h> header file. */
#define HAVE_BZLIB_H 1
/* Define to 1 if you have the <dirent.h> header file. */
#define HAVE_DIRENT_H 1
/* Define to 1 if you have the <dlfcn.h> header file. */
#define HAVE_DLFCN_H 1
/* Define to 1 if you have the <editline/readline.h> header file. */
/* #undef HAVE_EDITLINE_READLINE_H */
/* Define to 1 if you have the <edit/readline/readline.h> header file. */
/* #undef HAVE_EDIT_READLINE_READLINE_H */
/* Define to 1 if you have the <inttypes.h> header file. */
#define HAVE_INTTYPES_H 1
/* Define to 1 if you have the <limits.h> header file. */
#define HAVE_LIMITS_H 1
/* Define to 1 if the system has the type `long long'. */
/* #undef HAVE_LONG_LONG */
/* Define to 1 if you have the `memmove' function. */
#define HAVE_MEMMOVE 1
/* Define to 1 if you have the <memory.h> header file. */
#define HAVE_MEMORY_H 1
/* Define if you have POSIX threads libraries and header files. */
/* #undef HAVE_PTHREAD */
/* Have PTHREAD_PRIO_INHERIT. */
/* #undef HAVE_PTHREAD_PRIO_INHERIT */
/* Define to 1 if you have the <readline/history.h> header file. */
/* #undef HAVE_READLINE_HISTORY_H */
/* Define to 1 if you have the <readline/readline.h> header file. */
/* #undef HAVE_READLINE_READLINE_H */
/* Define to 1 if you have the <stdint.h> header file. */
#define HAVE_STDINT_H 1
/* Define to 1 if you have the <stdlib.h> header file. */
#define HAVE_STDLIB_H 1
/* Define to 1 if you have the `strerror' function. */
#define HAVE_STRERROR 1
/* Define to 1 if you have the <string> header file. */
/* #undef HAVE_STRING */
/* Define to 1 if you have the <strings.h> header file. */
#define HAVE_STRINGS_H 1
/* Define to 1 if you have the <string.h> header file. */
#define HAVE_STRING_H 1
/* Define to 1 if you have `strtoimax'. */
/* #undef HAVE_STRTOIMAX */
/* Define to 1 if you have `strtoll'. */
/* #undef HAVE_STRTOLL */
/* Define to 1 if you have `strtoq'. */
/* #undef HAVE_STRTOQ */
/* Define to 1 if you have the <sys/stat.h> header file. */
#define HAVE_SYS_STAT_H 1
/* Define to 1 if you have the <sys/types.h> header file. */
#define HAVE_SYS_TYPES_H 1
/* Define to 1 if you have the <type_traits.h> header file. */
/* #undef HAVE_TYPE_TRAITS_H */
/* Define to 1 if you have the <unistd.h> header file. */
#define HAVE_UNISTD_H 1
/* Define to 1 if the system has the type `unsigned long long'. */
/* #undef HAVE_UNSIGNED_LONG_LONG */
/* Define to 1 if the compiler supports simple visibility declarations. */
#define HAVE_VISIBILITY 1
/* Define to 1 if you have the <windows.h> header file. */
/* #undef HAVE_WINDOWS_H */
/* Define to 1 if you have the <zlib.h> header file. */
#define HAVE_ZLIB_H 1
/* Define to 1 if you have `_strtoi64'. */
/* #undef HAVE__STRTOI64 */
/* The value of LINK_SIZE determines the number of bytes used to store links
as offsets within the compiled regex. The default is 2, which allows for
compiled patterns up to 64K long. This covers the vast majority of cases.
However, PCRE can also be compiled to use 3 or 4 bytes instead. This allows
for longer patterns in extreme cases. */
#define LINK_SIZE 2
/* Define to the sub-directory in which libtool stores uninstalled libraries.
*/
#define LT_OBJDIR ".libs/"
/* The value of MATCH_LIMIT determines the default number of times the
internal match() function can be called during a single execution of
pcre_exec(). There is a runtime interface for setting a different limit.
The limit exists in order to catch runaway regular expressions that take
for ever to determine that they do not match. The default is set very large
so that it does not accidentally catch legitimate cases. */
#define MATCH_LIMIT 10000000
/* The above limit applies to all calls of match(), whether or not they
increase the recursion depth. In some environments it is desirable to limit
the depth of recursive calls of match() more strictly, in order to restrict
the maximum amount of stack (or heap, if NO_RECURSE is defined) that is
used. The value of MATCH_LIMIT_RECURSION applies only to recursive calls of
match(). To have any useful effect, it must be less than the value of
MATCH_LIMIT. The default is to use the same value as MATCH_LIMIT. There is
a runtime method for setting a different limit. */
#define MATCH_LIMIT_RECURSION MATCH_LIMIT
/* This limit is parameterized just in case anybody ever wants to change it.
Care must be taken if it is increased, because it guards against integer
overflow caused by enormously large patterns. */
#define MAX_NAME_COUNT 10000
/* This limit is parameterized just in case anybody ever wants to change it.
Care must be taken if it is increased, because it guards against integer
overflow caused by enormously large patterns. */
#define MAX_NAME_SIZE 32
/* The value of NEWLINE determines the default newline character sequence.
PCRE client programs can override this by selecting other values at run
time. In ASCII environments, the value can be 10 (LF), 13 (CR), or 3338
(CRLF); in EBCDIC environments the value can be 21 or 37 (LF), 13 (CR), or
3349 or 3365 (CRLF) because there are two alternative codepoints (0x15 and
0x25) that are used as the NL line terminator that is equivalent to ASCII
LF. In both ASCII and EBCDIC environments the value can also be -1 (ANY),
or -2 (ANYCRLF). */
#define NEWLINE -2
/* PCRE uses recursive function calls to handle backtracking while matching.
This can sometimes be a problem on systems that have stacks of limited
size. Define NO_RECURSE to any value to get a version that doesn't use
recursion in the match() function; instead it creates its own stack by
steam using pcre_recurse_malloc() to obtain memory from the heap. For more
detail, see the comments and other stuff just above the match() function.
*/
/* #undef NO_RECURSE */
/* Name of package */
#define PACKAGE "pcre"
/* Define to the address where bug reports for this package should be sent. */
#define PACKAGE_BUGREPORT ""
/* Define to the full name of this package. */
#define PACKAGE_NAME "PCRE"
/* Define to the full name and version of this package. */
#define PACKAGE_STRING "PCRE 8.36"
/* Define to the one symbol short name of this package. */
#define PACKAGE_TARNAME "pcre"
/* Define to the home page for this package. */
#define PACKAGE_URL ""
/* Define to the version of this package. */
#define PACKAGE_VERSION "8.36"
/* The value of PARENS_NEST_LIMIT specifies the maximum depth of nested
parentheses (of any kind) in a pattern. This limits the amount of system
stack that is used while compiling a pattern. */
#define PARENS_NEST_LIMIT 250
/* to make a symbol visible */
#define PCRECPP_EXP_DECL extern __attribute__ ((visibility ("default")))
/* to make a symbol visible */
#define PCRECPP_EXP_DEFN __attribute__ ((visibility ("default")))
/* The value of PCREGREP_BUFSIZE determines the size of buffer used by
pcregrep to hold parts of the file it is searching. This is also the
minimum value. The actual amount of memory used by pcregrep is three times
this number, because it allows for the buffering of "before" and "after"
lines. */
#define PCREGREP_BUFSIZE 20480
/* to make a symbol visible */
#define PCREPOSIX_EXP_DECL extern __attribute__ ((visibility ("default")))
/* to make a symbol visible */
#define PCREPOSIX_EXP_DEFN extern __attribute__ ((visibility ("default")))
/* to make a symbol visible */
#define PCRE_EXP_DATA_DEFN __attribute__ ((visibility ("default")))
/* to make a symbol visible */
#define PCRE_EXP_DECL extern __attribute__ ((visibility ("default")))
/* If you are compiling for a system other than a Unix-like system or
Win32, and it needs some magic to be inserted before the definition
of a function that is exported by the library, define this macro to
contain the relevant magic. If you do not define this macro, a suitable
__declspec value is used for Windows systems; in other environments
"extern" is used for a C compiler and "extern C" for a C++ compiler.
This macro appears at the start of every exported function that is part
of the external API. It does not appear on functions that are "external"
in the C sense, but which are internal to the library. */
#define PCRE_EXP_DEFN __attribute__ ((visibility ("default")))
/* Define to any value if linking statically (TODO: make nice with Libtool) */
/* #undef PCRE_STATIC */
/* When calling PCRE via the POSIX interface, additional working storage is
required for holding the pointers to capturing substrings because PCRE
requires three integers per substring, whereas the POSIX interface provides
only two. If the number of expected substrings is small, the wrapper
function uses space on the stack, because this is faster than using
malloc() for each call. The threshold above which the stack is no longer
used is defined by POSIX_MALLOC_THRESHOLD. */
#define POSIX_MALLOC_THRESHOLD 10
/* Define to necessary symbol if this constant uses a non-standard name on
your system. */
/* #undef PTHREAD_CREATE_JOINABLE */
/* Define to 1 if you have the ANSI C header files. */
#define STDC_HEADERS 1
/* Define to any value to enable support for Just-In-Time compiling. */
/* #undef SUPPORT_JIT */
/* Define to any value to allow pcregrep to be linked with libbz2, so that it
is able to handle .bz2 files. */
/* #undef SUPPORT_LIBBZ2 */
/* Define to any value to allow pcretest to be linked with libedit. */
/* #undef SUPPORT_LIBEDIT */
/* Define to any value to allow pcretest to be linked with libreadline. */
/* #undef SUPPORT_LIBREADLINE */
/* Define to any value to allow pcregrep to be linked with libz, so that it is
able to handle .gz files. */
/* #undef SUPPORT_LIBZ */
/* Define to any value to enable the 16 bit PCRE library. */
/* #undef SUPPORT_PCRE16 */
/* Define to any value to enable the 32 bit PCRE library. */
/* #undef SUPPORT_PCRE32 */
/* Define to any value to enable the 8 bit PCRE library. */
#define SUPPORT_PCRE8 /**/
/* Define to any value to enable JIT support in pcregrep. */
/* #undef SUPPORT_PCREGREP_JIT */
/* Define to any value to enable support for Unicode properties. */
#define SUPPORT_UCP
/* Define to any value to enable support for the UTF-8/16/32 Unicode encoding.
This will work even in an EBCDIC environment, but it is incompatible with
the EBCDIC macro. That is, PCRE can support *either* EBCDIC code *or*
ASCII/UTF-8/16/32, but not both at once. */
#define SUPPORT_UTF
/* Define to any value for valgrind support to find invalid memory reads. */
/* #undef SUPPORT_VALGRIND */
/* Version number of package */
#define VERSION "8.36"
/* Define to empty if `const' does not conform to ANSI C. */
/* #undef const */
/* Define to the type of a signed integer type of width exactly 64 bits if
such a type exists and the standard includes do not define it. */
/* #undef int64_t */
/* Define to `unsigned int' if <sys/types.h> does not define. */
/* #undef size_t */
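
The NEWLINE comment in this config.h lists values such as 3338 for CRLF. Those numbers pack a two-byte sequence into one integer: 3338 is 13 * 256 + 10, that is CR followed by LF. A minimal C sketch of unpacking such a value is shown here; `decode_newline` is a hypothetical helper, not a PCRE function.

```c
#include <stddef.h>

/* Decode a config.h NEWLINE value into its byte sequence.
   Returns the number of bytes written (1 or 2), or 0 for the special
   values -1 (ANY) and -2 (ANYCRLF), which name no single fixed sequence. */
static size_t decode_newline(int value, unsigned char out[2])
{
    if (value < 0) return 0;
    if (value > 255) {
        out[0] = (unsigned char)((value >> 8) & 0xff); /* first character */
        out[1] = (unsigned char)(value & 0xff);        /* second character */
        return 2;
    }
    out[0] = (unsigned char)value;
    return 1;
}
```

With this encoding, the EBCDIC values 3349 and 3365 from the same comment decode to 13 followed by 21 or 37, matching the two alternative NL code points it mentions.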

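
The LINK_SIZE comment in this config.h relates the link width in bytes to the largest compiled pattern: 2 bytes allow offsets up to 64K. The arithmetic can be sketched in C as below. `link_offset_range` is a hypothetical helper, and the exact limit PCRE enforces may differ slightly from this raw count.

```c
#include <stdint.h>

/* Number of distinct offsets representable in link_size bytes.
   Returns 0 for sizes other than 2, 3, or 4, the values named in config.h. */
static uint64_t link_offset_range(int link_size)
{
    if (link_size < 2 || link_size > 4) return 0;
    return (uint64_t)1 << (8 * link_size);
}
```

For the default `LINK_SIZE 2` this gives 65536, which is the 64K figure quoted in the comment; 3 and 4 bytes give 16M and 4G respectively.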
@@ -0,0 +1,677 @@
/*************************************************
* Perl-Compatible Regular Expressions *
*************************************************/
/* This is the public header file for the PCRE library, to be #included by
applications that call the PCRE functions.
Copyright (c) 1997-2014 University of Cambridge
-----------------------------------------------------------------------------
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
* Neither the name of the University of Cambridge nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.
-----------------------------------------------------------------------------
*/
#ifndef _PCRE_H
#define _PCRE_H
/* The current PCRE version information. */
#define PCRE_MAJOR 8
#define PCRE_MINOR 36
#define PCRE_PRERELEASE
#define PCRE_DATE 2014-09-26
/* When an application links to a PCRE DLL in Windows, the symbols that are
imported have to be identified as such. When building PCRE, the appropriate
export setting is defined in pcre_internal.h, which includes this file. So we
don't change existing definitions of PCRE_EXP_DECL and PCRECPP_EXP_DECL. */
#if defined(_WIN32) && !defined(PCRE_STATIC)
# ifndef PCRE_EXP_DECL
# define PCRE_EXP_DECL extern __declspec(dllimport)
# endif
# ifdef __cplusplus
# ifndef PCRECPP_EXP_DECL
# define PCRECPP_EXP_DECL extern __declspec(dllimport)
# endif
# ifndef PCRECPP_EXP_DEFN
# define PCRECPP_EXP_DEFN __declspec(dllimport)
# endif
# endif
#endif
/* By default, we use the standard "extern" declarations. */
#ifndef PCRE_EXP_DECL
# ifdef __cplusplus
# define PCRE_EXP_DECL extern "C"
# else
# define PCRE_EXP_DECL extern
# endif
#endif
#ifdef __cplusplus
# ifndef PCRECPP_EXP_DECL
# define PCRECPP_EXP_DECL extern
# endif
# ifndef PCRECPP_EXP_DEFN
# define PCRECPP_EXP_DEFN
# endif
#endif
/* Have to include stdlib.h in order to ensure that size_t is defined;
it is needed here for malloc. */
#include <stdlib.h>
/* Allow for C++ users */
#ifdef __cplusplus
extern "C" {
#endif
/* Public options. Some are compile-time only, some are run-time only, and some
are both. Most of the compile-time options are saved with the compiled regex so
that they can be inspected during studying (and therefore JIT compiling). Note
that pcre_study() has its own set of options. Originally, all the options
defined here used distinct bits. However, almost all the bits in a 32-bit word
are now used, so in order to conserve them, option bits that were previously
only recognized at matching time (i.e. by pcre_exec() or pcre_dfa_exec()) may
also be used for compile-time options that affect only compiling and are not
relevant for studying or JIT compiling.
Some options for pcre_compile() change its behaviour but do not affect the
behaviour of the execution functions. Other options are passed through to the
execution functions and affect their behaviour, with or without affecting the
behaviour of pcre_compile().
Options that can be passed to pcre_compile() are tagged Cx below, with these
variants:
C1 Affects compile only
C2 Does not affect compile; affects exec, dfa_exec
C3 Affects compile, exec, dfa_exec
C4 Affects compile, exec, dfa_exec, study
C5 Affects compile, exec, study
Options that can be set for pcre_exec() and/or pcre_dfa_exec() are flagged with
E and D, respectively. They take precedence over C3, C4, and C5 settings passed
from pcre_compile(). Those that are compatible with JIT execution are flagged
with J. */
#define PCRE_CASELESS 0x00000001 /* C1 */
#define PCRE_MULTILINE 0x00000002 /* C1 */
#define PCRE_DOTALL 0x00000004 /* C1 */
#define PCRE_EXTENDED 0x00000008 /* C1 */
#define PCRE_ANCHORED 0x00000010 /* C4 E D */
#define PCRE_DOLLAR_ENDONLY 0x00000020 /* C2 */
#define PCRE_EXTRA 0x00000040 /* C1 */
#define PCRE_NOTBOL 0x00000080 /* E D J */
#define PCRE_NOTEOL 0x00000100 /* E D J */
#define PCRE_UNGREEDY 0x00000200 /* C1 */
#define PCRE_NOTEMPTY 0x00000400 /* E D J */
#define PCRE_UTF8 0x00000800 /* C4 ) */
#define PCRE_UTF16 0x00000800 /* C4 ) Synonyms */
#define PCRE_UTF32 0x00000800 /* C4 ) */
#define PCRE_NO_AUTO_CAPTURE 0x00001000 /* C1 */
#define PCRE_NO_UTF8_CHECK 0x00002000 /* C1 E D J ) */
#define PCRE_NO_UTF16_CHECK 0x00002000 /* C1 E D J ) Synonyms */
#define PCRE_NO_UTF32_CHECK 0x00002000 /* C1 E D J ) */
#define PCRE_AUTO_CALLOUT 0x00004000 /* C1 */
#define PCRE_PARTIAL_SOFT 0x00008000 /* E D J ) Synonyms */
#define PCRE_PARTIAL 0x00008000 /* E D J ) */
/* This pair use the same bit. */
#define PCRE_NEVER_UTF 0x00010000 /* C1 ) Overlaid */
#define PCRE_DFA_SHORTEST 0x00010000 /* D ) Overlaid */
/* This pair use the same bit. */
#define PCRE_NO_AUTO_POSSESS 0x00020000 /* C1 ) Overlaid */
#define PCRE_DFA_RESTART 0x00020000 /* D ) Overlaid */
#define PCRE_FIRSTLINE 0x00040000 /* C3 */
#define PCRE_DUPNAMES 0x00080000 /* C1 */
#define PCRE_NEWLINE_CR 0x00100000 /* C3 E D */
#define PCRE_NEWLINE_LF 0x00200000 /* C3 E D */
#define PCRE_NEWLINE_CRLF 0x00300000 /* C3 E D */
#define PCRE_NEWLINE_ANY 0x00400000 /* C3 E D */
#define PCRE_NEWLINE_ANYCRLF 0x00500000 /* C3 E D */
#define PCRE_BSR_ANYCRLF 0x00800000 /* C3 E D */
#define PCRE_BSR_UNICODE 0x01000000 /* C3 E D */
#define PCRE_JAVASCRIPT_COMPAT 0x02000000 /* C5 */
#define PCRE_NO_START_OPTIMIZE 0x04000000 /* C2 E D ) Synonyms */
#define PCRE_NO_START_OPTIMISE 0x04000000 /* C2 E D ) */
#define PCRE_PARTIAL_HARD 0x08000000 /* E D J */
#define PCRE_NOTEMPTY_ATSTART 0x10000000 /* E D J */
#define PCRE_UCP 0x20000000 /* C3 */
/* Exec-time and get/set-time error codes */
#define PCRE_ERROR_NOMATCH (-1)
#define PCRE_ERROR_NULL (-2)
#define PCRE_ERROR_BADOPTION (-3)
#define PCRE_ERROR_BADMAGIC (-4)
#define PCRE_ERROR_UNKNOWN_OPCODE (-5)
#define PCRE_ERROR_UNKNOWN_NODE (-5) /* For backward compatibility */
#define PCRE_ERROR_NOMEMORY (-6)
#define PCRE_ERROR_NOSUBSTRING (-7)
#define PCRE_ERROR_MATCHLIMIT (-8)
#define PCRE_ERROR_CALLOUT (-9) /* Never used by PCRE itself */
#define PCRE_ERROR_BADUTF8 (-10) /* Same for 8/16/32 */
#define PCRE_ERROR_BADUTF16 (-10) /* Same for 8/16/32 */
#define PCRE_ERROR_BADUTF32 (-10) /* Same for 8/16/32 */
#define PCRE_ERROR_BADUTF8_OFFSET (-11) /* Same for 8/16 */
#define PCRE_ERROR_BADUTF16_OFFSET (-11) /* Same for 8/16 */
#define PCRE_ERROR_PARTIAL (-12)
#define PCRE_ERROR_BADPARTIAL (-13)
#define PCRE_ERROR_INTERNAL (-14)
#define PCRE_ERROR_BADCOUNT (-15)
#define PCRE_ERROR_DFA_UITEM (-16)
#define PCRE_ERROR_DFA_UCOND (-17)
#define PCRE_ERROR_DFA_UMLIMIT (-18)
#define PCRE_ERROR_DFA_WSSIZE (-19)
#define PCRE_ERROR_DFA_RECURSE (-20)
#define PCRE_ERROR_RECURSIONLIMIT (-21)
#define PCRE_ERROR_NULLWSLIMIT (-22) /* No longer actually used */
#define PCRE_ERROR_BADNEWLINE (-23)
#define PCRE_ERROR_BADOFFSET (-24)
#define PCRE_ERROR_SHORTUTF8 (-25)
#define PCRE_ERROR_SHORTUTF16 (-25) /* Same for 8/16 */
#define PCRE_ERROR_RECURSELOOP (-26)
#define PCRE_ERROR_JIT_STACKLIMIT (-27)
#define PCRE_ERROR_BADMODE (-28)
#define PCRE_ERROR_BADENDIANNESS (-29)
#define PCRE_ERROR_DFA_BADRESTART (-30)
#define PCRE_ERROR_JIT_BADOPTION (-31)
#define PCRE_ERROR_BADLENGTH (-32)
#define PCRE_ERROR_UNSET (-33)
/* Specific error codes for UTF-8 validity checks */
#define PCRE_UTF8_ERR0 0
#define PCRE_UTF8_ERR1 1
#define PCRE_UTF8_ERR2 2
#define PCRE_UTF8_ERR3 3
#define PCRE_UTF8_ERR4 4
#define PCRE_UTF8_ERR5 5
#define PCRE_UTF8_ERR6 6
#define PCRE_UTF8_ERR7 7
#define PCRE_UTF8_ERR8 8
#define PCRE_UTF8_ERR9 9
#define PCRE_UTF8_ERR10 10
#define PCRE_UTF8_ERR11 11
#define PCRE_UTF8_ERR12 12
#define PCRE_UTF8_ERR13 13
#define PCRE_UTF8_ERR14 14
#define PCRE_UTF8_ERR15 15
#define PCRE_UTF8_ERR16 16
#define PCRE_UTF8_ERR17 17
#define PCRE_UTF8_ERR18 18
#define PCRE_UTF8_ERR19 19
#define PCRE_UTF8_ERR20 20
#define PCRE_UTF8_ERR21 21
#define PCRE_UTF8_ERR22 22 /* Unused (was non-character) */
/* Specific error codes for UTF-16 validity checks */
#define PCRE_UTF16_ERR0 0
#define PCRE_UTF16_ERR1 1
#define PCRE_UTF16_ERR2 2
#define PCRE_UTF16_ERR3 3
#define PCRE_UTF16_ERR4 4 /* Unused (was non-character) */
/* Specific error codes for UTF-32 validity checks */
#define PCRE_UTF32_ERR0 0
#define PCRE_UTF32_ERR1 1
#define PCRE_UTF32_ERR2 2 /* Unused (was non-character) */
#define PCRE_UTF32_ERR3 3
/* Request types for pcre_fullinfo() */
#define PCRE_INFO_OPTIONS 0
#define PCRE_INFO_SIZE 1
#define PCRE_INFO_CAPTURECOUNT 2
#define PCRE_INFO_BACKREFMAX 3
#define PCRE_INFO_FIRSTBYTE 4
#define PCRE_INFO_FIRSTCHAR 4 /* For backwards compatibility */
#define PCRE_INFO_FIRSTTABLE 5
#define PCRE_INFO_LASTLITERAL 6
#define PCRE_INFO_NAMEENTRYSIZE 7
#define PCRE_INFO_NAMECOUNT 8
#define PCRE_INFO_NAMETABLE 9
#define PCRE_INFO_STUDYSIZE 10
#define PCRE_INFO_DEFAULT_TABLES 11
#define PCRE_INFO_OKPARTIAL 12
#define PCRE_INFO_JCHANGED 13
#define PCRE_INFO_HASCRORLF 14
#define PCRE_INFO_MINLENGTH 15
#define PCRE_INFO_JIT 16
#define PCRE_INFO_JITSIZE 17
#define PCRE_INFO_MAXLOOKBEHIND 18
#define PCRE_INFO_FIRSTCHARACTER 19
#define PCRE_INFO_FIRSTCHARACTERFLAGS 20
#define PCRE_INFO_REQUIREDCHAR 21
#define PCRE_INFO_REQUIREDCHARFLAGS 22
#define PCRE_INFO_MATCHLIMIT 23
#define PCRE_INFO_RECURSIONLIMIT 24
#define PCRE_INFO_MATCH_EMPTY 25
/* Request types for pcre_config(). Do not re-arrange, in order to remain
compatible. */
#define PCRE_CONFIG_UTF8 0
#define PCRE_CONFIG_NEWLINE 1
#define PCRE_CONFIG_LINK_SIZE 2
#define PCRE_CONFIG_POSIX_MALLOC_THRESHOLD 3
#define PCRE_CONFIG_MATCH_LIMIT 4
#define PCRE_CONFIG_STACKRECURSE 5
#define PCRE_CONFIG_UNICODE_PROPERTIES 6
#define PCRE_CONFIG_MATCH_LIMIT_RECURSION 7
#define PCRE_CONFIG_BSR 8
#define PCRE_CONFIG_JIT 9
#define PCRE_CONFIG_UTF16 10
#define PCRE_CONFIG_JITTARGET 11
#define PCRE_CONFIG_UTF32 12
#define PCRE_CONFIG_PARENS_LIMIT 13
/* Request types for pcre_study(). Do not re-arrange, in order to remain
compatible. */
#define PCRE_STUDY_JIT_COMPILE 0x0001
#define PCRE_STUDY_JIT_PARTIAL_SOFT_COMPILE 0x0002
#define PCRE_STUDY_JIT_PARTIAL_HARD_COMPILE 0x0004
#define PCRE_STUDY_EXTRA_NEEDED 0x0008
/* Bit flags for the pcre[16|32]_extra structure. Do not re-arrange or redefine
these bits, just add new ones on the end, in order to remain compatible. */
#define PCRE_EXTRA_STUDY_DATA 0x0001
#define PCRE_EXTRA_MATCH_LIMIT 0x0002
#define PCRE_EXTRA_CALLOUT_DATA 0x0004
#define PCRE_EXTRA_TABLES 0x0008
#define PCRE_EXTRA_MATCH_LIMIT_RECURSION 0x0010
#define PCRE_EXTRA_MARK 0x0020
#define PCRE_EXTRA_EXECUTABLE_JIT 0x0040
/* Types */
struct real_pcre; /* declaration; the definition is private */
typedef struct real_pcre pcre;
struct real_pcre16; /* declaration; the definition is private */
typedef struct real_pcre16 pcre16;
struct real_pcre32; /* declaration; the definition is private */
typedef struct real_pcre32 pcre32;
struct real_pcre_jit_stack; /* declaration; the definition is private */
typedef struct real_pcre_jit_stack pcre_jit_stack;
struct real_pcre16_jit_stack; /* declaration; the definition is private */
typedef struct real_pcre16_jit_stack pcre16_jit_stack;
struct real_pcre32_jit_stack; /* declaration; the definition is private */
typedef struct real_pcre32_jit_stack pcre32_jit_stack;
/* If PCRE is compiled with 16 bit character support, PCRE_UCHAR16 must contain
a 16 bit wide unsigned data type. Otherwise it can be a dummy data type since
pcre16 functions are not implemented. There is a check for this in pcre_internal.h. */
#ifndef PCRE_UCHAR16
#define PCRE_UCHAR16 unsigned short
#endif
#ifndef PCRE_SPTR16
#define PCRE_SPTR16 const PCRE_UCHAR16 *
#endif
/* If PCRE is compiled with 32 bit character support, PCRE_UCHAR32 must contain
a 32 bit wide unsigned data type. Otherwise it can be a dummy data type since
pcre32 functions are not implemented. There is a check for this in pcre_internal.h. */
#ifndef PCRE_UCHAR32
#define PCRE_UCHAR32 unsigned int
#endif
#ifndef PCRE_SPTR32
#define PCRE_SPTR32 const PCRE_UCHAR32 *
#endif
/* When PCRE is compiled as a C++ library, the subject pointer type can be
replaced with a custom type. For conventional use, the public interface is a
const char *. */
#ifndef PCRE_SPTR
#define PCRE_SPTR const char *
#endif
/* The structure for passing additional data to pcre_exec(). This is defined in
such a way as to be extensible. Always add new fields at the end, in order to
remain compatible. */
typedef struct pcre_extra {
unsigned long int flags; /* Bits for which fields are set */
void *study_data; /* Opaque data from pcre_study() */
unsigned long int match_limit; /* Maximum number of calls to match() */
void *callout_data; /* Data passed back in callouts */
const unsigned char *tables; /* Pointer to character tables */
unsigned long int match_limit_recursion; /* Max recursive calls to match() */
unsigned char **mark; /* For passing back a mark pointer */
void *executable_jit; /* Contains a pointer to a compiled jit code */
} pcre_extra;
/* Same structure as above, but with 16 bit char pointers. */
typedef struct pcre16_extra {
unsigned long int flags; /* Bits for which fields are set */
void *study_data; /* Opaque data from pcre_study() */
unsigned long int match_limit; /* Maximum number of calls to match() */
void *callout_data; /* Data passed back in callouts */
const unsigned char *tables; /* Pointer to character tables */
unsigned long int match_limit_recursion; /* Max recursive calls to match() */
PCRE_UCHAR16 **mark; /* For passing back a mark pointer */
void *executable_jit; /* Contains a pointer to a compiled jit code */
} pcre16_extra;
/* Same structure as above, but with 32 bit char pointers. */
typedef struct pcre32_extra {
unsigned long int flags; /* Bits for which fields are set */
void *study_data; /* Opaque data from pcre_study() */
unsigned long int match_limit; /* Maximum number of calls to match() */
void *callout_data; /* Data passed back in callouts */
const unsigned char *tables; /* Pointer to character tables */
unsigned long int match_limit_recursion; /* Max recursive calls to match() */
PCRE_UCHAR32 **mark; /* For passing back a mark pointer */
void *executable_jit; /* Contains a pointer to a compiled jit code */
} pcre32_extra;
/* The structure for passing out data via the pcre_callout_function. We use a
structure so that new fields can be added on the end in future versions,
without changing the API of the function, thereby allowing old clients to work
without modification. */
typedef struct pcre_callout_block {
int version; /* Identifies version of block */
/* ------------------------ Version 0 ------------------------------- */
int callout_number; /* Number compiled into pattern */
int *offset_vector; /* The offset vector */
PCRE_SPTR subject; /* The subject being matched */
int subject_length; /* The length of the subject */
int start_match; /* Offset to start of this match attempt */
int current_position; /* Where we currently are in the subject */
int capture_top; /* Max current capture */
int capture_last; /* Most recently closed capture */
void *callout_data; /* Data passed in with the call */
/* ------------------- Added for Version 1 -------------------------- */
int pattern_position; /* Offset to next item in the pattern */
int next_item_length; /* Length of next item in the pattern */
/* ------------------- Added for Version 2 -------------------------- */
const unsigned char *mark; /* Pointer to current mark or NULL */
/* ------------------------------------------------------------------ */
} pcre_callout_block;
/* Same structure as above, but with 16 bit char pointers. */
typedef struct pcre16_callout_block {
int version; /* Identifies version of block */
/* ------------------------ Version 0 ------------------------------- */
int callout_number; /* Number compiled into pattern */
int *offset_vector; /* The offset vector */
PCRE_SPTR16 subject; /* The subject being matched */
int subject_length; /* The length of the subject */
int start_match; /* Offset to start of this match attempt */
int current_position; /* Where we currently are in the subject */
int capture_top; /* Max current capture */
int capture_last; /* Most recently closed capture */
void *callout_data; /* Data passed in with the call */
/* ------------------- Added for Version 1 -------------------------- */
int pattern_position; /* Offset to next item in the pattern */
int next_item_length; /* Length of next item in the pattern */
/* ------------------- Added for Version 2 -------------------------- */
const PCRE_UCHAR16 *mark; /* Pointer to current mark or NULL */
/* ------------------------------------------------------------------ */
} pcre16_callout_block;
/* Same structure as above, but with 32 bit char pointers. */
typedef struct pcre32_callout_block {
int version; /* Identifies version of block */
/* ------------------------ Version 0 ------------------------------- */
int callout_number; /* Number compiled into pattern */
int *offset_vector; /* The offset vector */
PCRE_SPTR32 subject; /* The subject being matched */
int subject_length; /* The length of the subject */
int start_match; /* Offset to start of this match attempt */
int current_position; /* Where we currently are in the subject */
int capture_top; /* Max current capture */
int capture_last; /* Most recently closed capture */
void *callout_data; /* Data passed in with the call */
/* ------------------- Added for Version 1 -------------------------- */
int pattern_position; /* Offset to next item in the pattern */
int next_item_length; /* Length of next item in the pattern */
/* ------------------- Added for Version 2 -------------------------- */
const PCRE_UCHAR32 *mark; /* Pointer to current mark or NULL */
/* ------------------------------------------------------------------ */
} pcre32_callout_block;
/* Indirection for store get and free functions. These can be set to
alternative malloc/free functions if required. Special ones are used in the
non-recursive case for "frames". There is also an optional callout function
that is triggered by the (?C) regex item. For Virtual Pascal, these definitions
have to take another form. */
#ifndef VPCOMPAT
PCRE_EXP_DECL void *(*pcre_malloc)(size_t);
PCRE_EXP_DECL void (*pcre_free)(void *);
PCRE_EXP_DECL void *(*pcre_stack_malloc)(size_t);
PCRE_EXP_DECL void (*pcre_stack_free)(void *);
PCRE_EXP_DECL int (*pcre_callout)(pcre_callout_block *);
PCRE_EXP_DECL int (*pcre_stack_guard)(void);
PCRE_EXP_DECL void *(*pcre16_malloc)(size_t);
PCRE_EXP_DECL void (*pcre16_free)(void *);
PCRE_EXP_DECL void *(*pcre16_stack_malloc)(size_t);
PCRE_EXP_DECL void (*pcre16_stack_free)(void *);
PCRE_EXP_DECL int (*pcre16_callout)(pcre16_callout_block *);
PCRE_EXP_DECL int (*pcre16_stack_guard)(void);
PCRE_EXP_DECL void *(*pcre32_malloc)(size_t);
PCRE_EXP_DECL void (*pcre32_free)(void *);
PCRE_EXP_DECL void *(*pcre32_stack_malloc)(size_t);
PCRE_EXP_DECL void (*pcre32_stack_free)(void *);
PCRE_EXP_DECL int (*pcre32_callout)(pcre32_callout_block *);
PCRE_EXP_DECL int (*pcre32_stack_guard)(void);
#else /* VPCOMPAT */
PCRE_EXP_DECL void *pcre_malloc(size_t);
PCRE_EXP_DECL void pcre_free(void *);
PCRE_EXP_DECL void *pcre_stack_malloc(size_t);
PCRE_EXP_DECL void pcre_stack_free(void *);
PCRE_EXP_DECL int pcre_callout(pcre_callout_block *);
PCRE_EXP_DECL int pcre_stack_guard(void);
PCRE_EXP_DECL void *pcre16_malloc(size_t);
PCRE_EXP_DECL void pcre16_free(void *);
PCRE_EXP_DECL void *pcre16_stack_malloc(size_t);
PCRE_EXP_DECL void pcre16_stack_free(void *);
PCRE_EXP_DECL int pcre16_callout(pcre16_callout_block *);
PCRE_EXP_DECL int pcre16_stack_guard(void);
PCRE_EXP_DECL void *pcre32_malloc(size_t);
PCRE_EXP_DECL void pcre32_free(void *);
PCRE_EXP_DECL void *pcre32_stack_malloc(size_t);
PCRE_EXP_DECL void pcre32_stack_free(void *);
PCRE_EXP_DECL int pcre32_callout(pcre32_callout_block *);
PCRE_EXP_DECL int pcre32_stack_guard(void);
#endif /* VPCOMPAT */
/* User defined callback which provides a stack just before the match starts. */
typedef pcre_jit_stack *(*pcre_jit_callback)(void *);
typedef pcre16_jit_stack *(*pcre16_jit_callback)(void *);
typedef pcre32_jit_stack *(*pcre32_jit_callback)(void *);
/* Exported PCRE functions */
PCRE_EXP_DECL pcre *pcre_compile(const char *, int, const char **, int *,
const unsigned char *);
PCRE_EXP_DECL pcre16 *pcre16_compile(PCRE_SPTR16, int, const char **, int *,
const unsigned char *);
PCRE_EXP_DECL pcre32 *pcre32_compile(PCRE_SPTR32, int, const char **, int *,
const unsigned char *);
PCRE_EXP_DECL pcre *pcre_compile2(const char *, int, int *, const char **,
int *, const unsigned char *);
PCRE_EXP_DECL pcre16 *pcre16_compile2(PCRE_SPTR16, int, int *, const char **,
int *, const unsigned char *);
PCRE_EXP_DECL pcre32 *pcre32_compile2(PCRE_SPTR32, int, int *, const char **,
int *, const unsigned char *);
PCRE_EXP_DECL int pcre_config(int, void *);
PCRE_EXP_DECL int pcre16_config(int, void *);
PCRE_EXP_DECL int pcre32_config(int, void *);
PCRE_EXP_DECL int pcre_copy_named_substring(const pcre *, const char *,
int *, int, const char *, char *, int);
PCRE_EXP_DECL int pcre16_copy_named_substring(const pcre16 *, PCRE_SPTR16,
int *, int, PCRE_SPTR16, PCRE_UCHAR16 *, int);
PCRE_EXP_DECL int pcre32_copy_named_substring(const pcre32 *, PCRE_SPTR32,
int *, int, PCRE_SPTR32, PCRE_UCHAR32 *, int);
PCRE_EXP_DECL int pcre_copy_substring(const char *, int *, int, int,
char *, int);
PCRE_EXP_DECL int pcre16_copy_substring(PCRE_SPTR16, int *, int, int,
PCRE_UCHAR16 *, int);
PCRE_EXP_DECL int pcre32_copy_substring(PCRE_SPTR32, int *, int, int,
PCRE_UCHAR32 *, int);
PCRE_EXP_DECL int pcre_dfa_exec(const pcre *, const pcre_extra *,
const char *, int, int, int, int *, int , int *, int);
PCRE_EXP_DECL int pcre16_dfa_exec(const pcre16 *, const pcre16_extra *,
PCRE_SPTR16, int, int, int, int *, int , int *, int);
PCRE_EXP_DECL int pcre32_dfa_exec(const pcre32 *, const pcre32_extra *,
PCRE_SPTR32, int, int, int, int *, int , int *, int);
PCRE_EXP_DECL int pcre_exec(const pcre *, const pcre_extra *, PCRE_SPTR,
int, int, int, int *, int);
PCRE_EXP_DECL int pcre16_exec(const pcre16 *, const pcre16_extra *,
PCRE_SPTR16, int, int, int, int *, int);
PCRE_EXP_DECL int pcre32_exec(const pcre32 *, const pcre32_extra *,
PCRE_SPTR32, int, int, int, int *, int);
PCRE_EXP_DECL int pcre_jit_exec(const pcre *, const pcre_extra *,
PCRE_SPTR, int, int, int, int *, int,
pcre_jit_stack *);
PCRE_EXP_DECL int pcre16_jit_exec(const pcre16 *, const pcre16_extra *,
PCRE_SPTR16, int, int, int, int *, int,
pcre16_jit_stack *);
PCRE_EXP_DECL int pcre32_jit_exec(const pcre32 *, const pcre32_extra *,
PCRE_SPTR32, int, int, int, int *, int,
pcre32_jit_stack *);
PCRE_EXP_DECL void pcre_free_substring(const char *);
PCRE_EXP_DECL void pcre16_free_substring(PCRE_SPTR16);
PCRE_EXP_DECL void pcre32_free_substring(PCRE_SPTR32);
PCRE_EXP_DECL void pcre_free_substring_list(const char **);
PCRE_EXP_DECL void pcre16_free_substring_list(PCRE_SPTR16 *);
PCRE_EXP_DECL void pcre32_free_substring_list(PCRE_SPTR32 *);
PCRE_EXP_DECL int pcre_fullinfo(const pcre *, const pcre_extra *, int,
void *);
PCRE_EXP_DECL int pcre16_fullinfo(const pcre16 *, const pcre16_extra *, int,
void *);
PCRE_EXP_DECL int pcre32_fullinfo(const pcre32 *, const pcre32_extra *, int,
void *);
PCRE_EXP_DECL int pcre_get_named_substring(const pcre *, const char *,
int *, int, const char *, const char **);
PCRE_EXP_DECL int pcre16_get_named_substring(const pcre16 *, PCRE_SPTR16,
int *, int, PCRE_SPTR16, PCRE_SPTR16 *);
PCRE_EXP_DECL int pcre32_get_named_substring(const pcre32 *, PCRE_SPTR32,
int *, int, PCRE_SPTR32, PCRE_SPTR32 *);
PCRE_EXP_DECL int pcre_get_stringnumber(const pcre *, const char *);
PCRE_EXP_DECL int pcre16_get_stringnumber(const pcre16 *, PCRE_SPTR16);
PCRE_EXP_DECL int pcre32_get_stringnumber(const pcre32 *, PCRE_SPTR32);
PCRE_EXP_DECL int pcre_get_stringtable_entries(const pcre *, const char *,
char **, char **);
PCRE_EXP_DECL int pcre16_get_stringtable_entries(const pcre16 *, PCRE_SPTR16,
PCRE_UCHAR16 **, PCRE_UCHAR16 **);
PCRE_EXP_DECL int pcre32_get_stringtable_entries(const pcre32 *, PCRE_SPTR32,
PCRE_UCHAR32 **, PCRE_UCHAR32 **);
PCRE_EXP_DECL int pcre_get_substring(const char *, int *, int, int,
const char **);
PCRE_EXP_DECL int pcre16_get_substring(PCRE_SPTR16, int *, int, int,
PCRE_SPTR16 *);
PCRE_EXP_DECL int pcre32_get_substring(PCRE_SPTR32, int *, int, int,
PCRE_SPTR32 *);
PCRE_EXP_DECL int pcre_get_substring_list(const char *, int *, int,
const char ***);
PCRE_EXP_DECL int pcre16_get_substring_list(PCRE_SPTR16, int *, int,
PCRE_SPTR16 **);
PCRE_EXP_DECL int pcre32_get_substring_list(PCRE_SPTR32, int *, int,
PCRE_SPTR32 **);
PCRE_EXP_DECL const unsigned char *pcre_maketables(void);
PCRE_EXP_DECL const unsigned char *pcre16_maketables(void);
PCRE_EXP_DECL const unsigned char *pcre32_maketables(void);
PCRE_EXP_DECL int pcre_refcount(pcre *, int);
PCRE_EXP_DECL int pcre16_refcount(pcre16 *, int);
PCRE_EXP_DECL int pcre32_refcount(pcre32 *, int);
PCRE_EXP_DECL pcre_extra *pcre_study(const pcre *, int, const char **);
PCRE_EXP_DECL pcre16_extra *pcre16_study(const pcre16 *, int, const char **);
PCRE_EXP_DECL pcre32_extra *pcre32_study(const pcre32 *, int, const char **);
PCRE_EXP_DECL void pcre_free_study(pcre_extra *);
PCRE_EXP_DECL void pcre16_free_study(pcre16_extra *);
PCRE_EXP_DECL void pcre32_free_study(pcre32_extra *);
PCRE_EXP_DECL const char *pcre_version(void);
PCRE_EXP_DECL const char *pcre16_version(void);
PCRE_EXP_DECL const char *pcre32_version(void);
/* Utility functions for byte order swaps. */
PCRE_EXP_DECL int pcre_pattern_to_host_byte_order(pcre *, pcre_extra *,
const unsigned char *);
PCRE_EXP_DECL int pcre16_pattern_to_host_byte_order(pcre16 *, pcre16_extra *,
const unsigned char *);
PCRE_EXP_DECL int pcre32_pattern_to_host_byte_order(pcre32 *, pcre32_extra *,
const unsigned char *);
PCRE_EXP_DECL int pcre16_utf16_to_host_byte_order(PCRE_UCHAR16 *,
PCRE_SPTR16, int, int *, int);
PCRE_EXP_DECL int pcre32_utf32_to_host_byte_order(PCRE_UCHAR32 *,
PCRE_SPTR32, int, int *, int);
/* JIT compiler related functions. */
PCRE_EXP_DECL pcre_jit_stack *pcre_jit_stack_alloc(int, int);
PCRE_EXP_DECL pcre16_jit_stack *pcre16_jit_stack_alloc(int, int);
PCRE_EXP_DECL pcre32_jit_stack *pcre32_jit_stack_alloc(int, int);
PCRE_EXP_DECL void pcre_jit_stack_free(pcre_jit_stack *);
PCRE_EXP_DECL void pcre16_jit_stack_free(pcre16_jit_stack *);
PCRE_EXP_DECL void pcre32_jit_stack_free(pcre32_jit_stack *);
PCRE_EXP_DECL void pcre_assign_jit_stack(pcre_extra *,
pcre_jit_callback, void *);
PCRE_EXP_DECL void pcre16_assign_jit_stack(pcre16_extra *,
pcre16_jit_callback, void *);
PCRE_EXP_DECL void pcre32_assign_jit_stack(pcre32_extra *,
pcre32_jit_callback, void *);
PCRE_EXP_DECL void pcre_jit_free_unused_memory(void);
PCRE_EXP_DECL void pcre16_jit_free_unused_memory(void);
PCRE_EXP_DECL void pcre32_jit_free_unused_memory(void);
#ifdef __cplusplus
} /* extern "C" */
#endif
#endif /* End of pcre.h */
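The option table in pcre.h mixes single-bit flags, synonyms and overlaid pairs that share a bit, and the PCRE_NEWLINE_* settings, which form a small multi-bit field rather than independent flags (PCRE_NEWLINE_CRLF is CR|LF, and ANYCRLF is ANY|CR). The following is a minimal sketch of decoding an options word under those rules. The constants are copied from the header so that the snippet compiles without PCRE; `NEWLINE_FIELD_MASK`, `has_option`, `newline_setting_name`, and `check_overlaid_bits` are hypothetical helper names, not part of the PCRE API.

```c
#include <assert.h>
#include <string.h>

/* Values copied from the pcre.h listing. */
#define PCRE_CASELESS         0x00000001
#define PCRE_MULTILINE        0x00000002
#define PCRE_NEVER_UTF        0x00010000  /* compile-time meaning of the bit */
#define PCRE_DFA_SHORTEST     0x00010000  /* pcre_dfa_exec() meaning of the bit */
#define PCRE_NEWLINE_CR       0x00100000
#define PCRE_NEWLINE_LF       0x00200000
#define PCRE_NEWLINE_CRLF     0x00300000
#define PCRE_NEWLINE_ANY      0x00400000
#define PCRE_NEWLINE_ANYCRLF  0x00500000

/* Hypothetical helper mask: the OR of every PCRE_NEWLINE_* value. */
#define NEWLINE_FIELD_MASK \
  (PCRE_NEWLINE_CR | PCRE_NEWLINE_LF | PCRE_NEWLINE_ANY | PCRE_NEWLINE_ANYCRLF)

/* Test an ordinary single-bit option. */
int has_option(unsigned long options, unsigned long flag)
{
  return (options & flag) != 0;
}

/* Decode the newline field as a whole; testing its bits one by one would
   misread CRLF as "CR and LF" and ANYCRLF as "ANY and CR". */
const char *newline_setting_name(unsigned long options)
{
  switch (options & NEWLINE_FIELD_MASK)
    {
    case 0:                    return "default";
    case PCRE_NEWLINE_CR:      return "CR";
    case PCRE_NEWLINE_LF:      return "LF";
    case PCRE_NEWLINE_CRLF:    return "CRLF";
    case PCRE_NEWLINE_ANY:     return "ANY";
    case PCRE_NEWLINE_ANYCRLF: return "ANYCRLF";
    default:                   return "invalid";
    }
}

/* Overlaid options are told apart only by which function receives them. */
void check_overlaid_bits(void)
{
  assert(PCRE_NEVER_UTF == PCRE_DFA_SHORTEST);
}
```

For instance, `newline_setting_name(PCRE_CASELESS | PCRE_NEWLINE_CRLF)` yields "CRLF", while a word with no newline bits set yields "default", meaning the build-time default applies.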


@@ -0,0 +1,319 @@
/*************************************************
* Perl-Compatible Regular Expressions *
*************************************************/
/* PCRE is a library of functions to support regular expressions whose syntax
and semantics are as close as possible to those of the Perl 5 language.
Written by Philip Hazel
Copyright (c) 1997-2014 University of Cambridge
-----------------------------------------------------------------------------
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
* Neither the name of the University of Cambridge nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.
-----------------------------------------------------------------------------
*/
/* This module contains an internal function that tests a compiled pattern to
see if it was compiled with the opposite endianness. If so, it uses an
auxiliary local function to flip the appropriate bytes. */
#ifdef HAVE_CONFIG_H
#include "config.h"
#endif
#include "pcre_internal.h"
/*************************************************
* Swap byte functions *
*************************************************/
/* The following functions swap the bytes of a pcre_uint16
and pcre_uint32 value.
Arguments:
value any number
Returns: the byte swapped value
*/
static pcre_uint32
swap_uint32(pcre_uint32 value)
{
return ((value & 0x000000ff) << 24) |
((value & 0x0000ff00) << 8) |
((value & 0x00ff0000) >> 8) |
(value >> 24);
}
static pcre_uint16
swap_uint16(pcre_uint16 value)
{
return (value >> 8) | (value << 8);
}
/*************************************************
* Test for a byte-flipped compiled regex *
*************************************************/
/* This function swaps the bytes of a compiled pattern usually
loaded from disk. It also sets the tables pointer, which
is likely an invalid pointer after reload.
Arguments:
argument_re points to the compiled expression
extra_data points to extra data or is NULL
tables points to the character tables or NULL
Returns: 0 if the swap is successful, negative on error
*/
#if defined COMPILE_PCRE8
PCRE_EXP_DECL int pcre_pattern_to_host_byte_order(pcre *argument_re,
pcre_extra *extra_data, const unsigned char *tables)
#elif defined COMPILE_PCRE16
PCRE_EXP_DECL int pcre16_pattern_to_host_byte_order(pcre16 *argument_re,
pcre16_extra *extra_data, const unsigned char *tables)
#elif defined COMPILE_PCRE32
PCRE_EXP_DECL int pcre32_pattern_to_host_byte_order(pcre32 *argument_re,
pcre32_extra *extra_data, const unsigned char *tables)
#endif
{
REAL_PCRE *re = (REAL_PCRE *)argument_re;
pcre_study_data *study;
#ifndef COMPILE_PCRE8
pcre_uchar *ptr;
int length;
#if defined SUPPORT_UTF && defined COMPILE_PCRE16
BOOL utf;
BOOL utf16_char;
#endif /* SUPPORT_UTF && COMPILE_PCRE16 */
#endif /* !COMPILE_PCRE8 */
if (re == NULL) return PCRE_ERROR_NULL;
if (re->magic_number == MAGIC_NUMBER)
{
if ((re->flags & PCRE_MODE) == 0) return PCRE_ERROR_BADMODE;
re->tables = tables;
return 0;
}
if (re->magic_number != REVERSED_MAGIC_NUMBER) return PCRE_ERROR_BADMAGIC;
if ((swap_uint32(re->flags) & PCRE_MODE) == 0) return PCRE_ERROR_BADMODE;
re->magic_number = MAGIC_NUMBER;
re->size = swap_uint32(re->size);
re->options = swap_uint32(re->options);
re->flags = swap_uint32(re->flags);
re->limit_match = swap_uint32(re->limit_match);
re->limit_recursion = swap_uint32(re->limit_recursion);
#if defined COMPILE_PCRE8 || defined COMPILE_PCRE16
re->first_char = swap_uint16(re->first_char);
re->req_char = swap_uint16(re->req_char);
#elif defined COMPILE_PCRE32
re->first_char = swap_uint32(re->first_char);
re->req_char = swap_uint32(re->req_char);
#endif
re->max_lookbehind = swap_uint16(re->max_lookbehind);
re->top_bracket = swap_uint16(re->top_bracket);
re->top_backref = swap_uint16(re->top_backref);
re->name_table_offset = swap_uint16(re->name_table_offset);
re->name_entry_size = swap_uint16(re->name_entry_size);
re->name_count = swap_uint16(re->name_count);
re->ref_count = swap_uint16(re->ref_count);
re->tables = tables;
if (extra_data != NULL && (extra_data->flags & PCRE_EXTRA_STUDY_DATA) != 0)
{
study = (pcre_study_data *)extra_data->study_data;
study->size = swap_uint32(study->size);
study->flags = swap_uint32(study->flags);
study->minlength = swap_uint32(study->minlength);
}
#ifndef COMPILE_PCRE8
ptr = (pcre_uchar *)re + re->name_table_offset;
length = re->name_count * re->name_entry_size;
#if defined SUPPORT_UTF && defined COMPILE_PCRE16
utf = (re->options & PCRE_UTF16) != 0;
utf16_char = FALSE;
#endif /* SUPPORT_UTF && COMPILE_PCRE16 */
while(TRUE)
{
/* Swap previous characters. */
while (length-- > 0)
{
#if defined COMPILE_PCRE16
*ptr = swap_uint16(*ptr);
#elif defined COMPILE_PCRE32
*ptr = swap_uint32(*ptr);
#endif
ptr++;
}
#if defined SUPPORT_UTF && defined COMPILE_PCRE16
if (utf16_char)
{
if (HAS_EXTRALEN(ptr[-1]))
{
/* We know that there is only one extra character in UTF-16. */
*ptr = swap_uint16(*ptr);
ptr++;
}
}
utf16_char = FALSE;
#endif /* SUPPORT_UTF && COMPILE_PCRE16 */
/* Get next opcode. */
length = 0;
#if defined COMPILE_PCRE16
*ptr = swap_uint16(*ptr);
#elif defined COMPILE_PCRE32
*ptr = swap_uint32(*ptr);
#endif
switch (*ptr)
{
case OP_END:
return 0;
#if defined SUPPORT_UTF && defined COMPILE_PCRE16
case OP_CHAR:
case OP_CHARI:
case OP_NOT:
case OP_NOTI:
case OP_STAR:
case OP_MINSTAR:
case OP_PLUS:
case OP_MINPLUS:
case OP_QUERY:
case OP_MINQUERY:
case OP_UPTO:
case OP_MINUPTO:
case OP_EXACT:
case OP_POSSTAR:
case OP_POSPLUS:
case OP_POSQUERY:
case OP_POSUPTO:
case OP_STARI:
case OP_MINSTARI:
case OP_PLUSI:
case OP_MINPLUSI:
case OP_QUERYI:
case OP_MINQUERYI:
case OP_UPTOI:
case OP_MINUPTOI:
case OP_EXACTI:
case OP_POSSTARI:
case OP_POSPLUSI:
case OP_POSQUERYI:
case OP_POSUPTOI:
case OP_NOTSTAR:
case OP_NOTMINSTAR:
case OP_NOTPLUS:
case OP_NOTMINPLUS:
case OP_NOTQUERY:
case OP_NOTMINQUERY:
case OP_NOTUPTO:
case OP_NOTMINUPTO:
case OP_NOTEXACT:
case OP_NOTPOSSTAR:
case OP_NOTPOSPLUS:
case OP_NOTPOSQUERY:
case OP_NOTPOSUPTO:
case OP_NOTSTARI:
case OP_NOTMINSTARI:
case OP_NOTPLUSI:
case OP_NOTMINPLUSI:
case OP_NOTQUERYI:
case OP_NOTMINQUERYI:
case OP_NOTUPTOI:
case OP_NOTMINUPTOI:
case OP_NOTEXACTI:
case OP_NOTPOSSTARI:
case OP_NOTPOSPLUSI:
case OP_NOTPOSQUERYI:
case OP_NOTPOSUPTOI:
if (utf) utf16_char = TRUE;
#endif
/* Fall through. */
default:
length = PRIV(OP_lengths)[*ptr] - 1;
break;
case OP_CLASS:
case OP_NCLASS:
/* Skip the character bit map. */
ptr += 32/sizeof(pcre_uchar);
length = 0;
break;
case OP_XCLASS:
/* Reverse the size of the XCLASS instance. */
ptr++;
#if defined COMPILE_PCRE16
*ptr = swap_uint16(*ptr);
#elif defined COMPILE_PCRE32
*ptr = swap_uint32(*ptr);
#endif
#ifndef COMPILE_PCRE32
if (LINK_SIZE > 1)
{
/* LINK_SIZE can be 1 or 2 in 16 bit mode. */
ptr++;
*ptr = swap_uint16(*ptr);
}
#endif
ptr++;
length = (GET(ptr, -LINK_SIZE)) - (1 + LINK_SIZE + 1);
#if defined COMPILE_PCRE16
*ptr = swap_uint16(*ptr);
#elif defined COMPILE_PCRE32
*ptr = swap_uint32(*ptr);
#endif
if ((*ptr & XCL_MAP) != 0)
{
/* Skip the character bit map. */
ptr += 32/sizeof(pcre_uchar);
length -= 32/sizeof(pcre_uchar);
}
break;
}
ptr++;
}
/* Control should never reach here in 16/32 bit mode. */
#else /* In 8-bit mode, the pattern does not need to be processed. */
return 0;
#endif /* !COMPILE_PCRE8 */
}
/* End of pcre_byte_order.c */
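The detection logic in pcre_pattern_to_host_byte_order() can be illustrated in isolation: a header whose magic number reads correctly is already in host order, one whose magic number reads byte-reversed needs every multi-byte field swapped, and anything else is rejected. The sketch below is a simplified stand-in, not the library's code. `struct sketch_header`, `header_to_host_order`, and the `SKETCH_*` constants are hypothetical names chosen for the example, and the struct carries only three fields instead of the full REAL_PCRE layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative magic values: each is the byte-reversal of the other. */
#define SKETCH_MAGIC          0x50435245UL  /* reads "PCRE" */
#define SKETCH_REVERSED_MAGIC 0x45524350UL  /* reads "ERCP" */

/* Reverse the four bytes of a 32-bit value. */
uint32_t swap32(uint32_t v)
{
  return ((v & 0x000000ffu) << 24) |
         ((v & 0x0000ff00u) << 8)  |
         ((v & 0x00ff0000u) >> 8)  |
         (v >> 24);
}

/* Exchange the two bytes of a 16-bit value; the cast drops the bits
   that integer promotion pushes above bit 15. */
uint16_t swap16(uint16_t v)
{
  return (uint16_t)((v >> 8) | (v << 8));
}

/* Hypothetical, much reduced pattern header. */
struct sketch_header {
  uint32_t magic;
  uint32_t size;
  uint16_t name_count;
};

/* Returns 0 if the header is (now) in host byte order, -1 on a bad magic. */
int header_to_host_order(struct sketch_header *h)
{
  assert(h != NULL);
  if (h->magic == SKETCH_MAGIC) return 0;            /* already native */
  if (h->magic != SKETCH_REVERSED_MAGIC) return -1;  /* not a pattern  */
  h->magic = SKETCH_MAGIC;
  h->size = swap32(h->size);
  h->name_count = swap16(h->name_count);
  return 0;
}
```

Checking the magic number before touching any other field is what makes the operation safe to call unconditionally on a reloaded pattern: a native-order header passes through unchanged, so no caller has to track which machine produced the data.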


@@ -0,0 +1,198 @@
/*************************************************
* Perl-Compatible Regular Expressions *
*************************************************/
/* This file contains character tables that are used when no external tables
are passed to PCRE by the application that calls it. The tables are used only
for characters whose code values are less than 256.
This is a default version of the tables that assumes ASCII encoding. A program
called dftables (which is distributed with PCRE) can be used to build
alternative versions of this file. This is necessary if you are running in an
EBCDIC environment, or if you want to default to a different encoding, for
example ISO-8859-1. When dftables is run, it creates these tables in the
current locale. If PCRE is configured with --enable-rebuild-chartables, this
happens automatically.
The following #includes are present because without them gcc 4.x may remove the
array definition from the final binary if PCRE is built into a static library
and dead code stripping is activated. This leads to link errors. Pulling in the
header ensures that the array gets flagged as "someone outside this compilation
unit might reference this" and so it will always be supplied to the linker. */
#ifdef HAVE_CONFIG_H
#include "config.h"
#endif
#include "pcre_internal.h"
const pcre_uint8 PRIV(default_tables)[] = {
/* This table is a lower casing table. */
0, 1, 2, 3, 4, 5, 6, 7,
8, 9, 10, 11, 12, 13, 14, 15,
16, 17, 18, 19, 20, 21, 22, 23,
24, 25, 26, 27, 28, 29, 30, 31,
32, 33, 34, 35, 36, 37, 38, 39,
40, 41, 42, 43, 44, 45, 46, 47,
48, 49, 50, 51, 52, 53, 54, 55,
56, 57, 58, 59, 60, 61, 62, 63,
64, 97, 98, 99,100,101,102,103,
104,105,106,107,108,109,110,111,
112,113,114,115,116,117,118,119,
120,121,122, 91, 92, 93, 94, 95,
96, 97, 98, 99,100,101,102,103,
104,105,106,107,108,109,110,111,
112,113,114,115,116,117,118,119,
120,121,122,123,124,125,126,127,
128,129,130,131,132,133,134,135,
136,137,138,139,140,141,142,143,
144,145,146,147,148,149,150,151,
152,153,154,155,156,157,158,159,
160,161,162,163,164,165,166,167,
168,169,170,171,172,173,174,175,
176,177,178,179,180,181,182,183,
184,185,186,187,188,189,190,191,
192,193,194,195,196,197,198,199,
200,201,202,203,204,205,206,207,
208,209,210,211,212,213,214,215,
216,217,218,219,220,221,222,223,
224,225,226,227,228,229,230,231,
232,233,234,235,236,237,238,239,
240,241,242,243,244,245,246,247,
248,249,250,251,252,253,254,255,
/* This table is a case flipping table. */
0, 1, 2, 3, 4, 5, 6, 7,
8, 9, 10, 11, 12, 13, 14, 15,
16, 17, 18, 19, 20, 21, 22, 23,
24, 25, 26, 27, 28, 29, 30, 31,
32, 33, 34, 35, 36, 37, 38, 39,
40, 41, 42, 43, 44, 45, 46, 47,
48, 49, 50, 51, 52, 53, 54, 55,
56, 57, 58, 59, 60, 61, 62, 63,
64, 97, 98, 99,100,101,102,103,
104,105,106,107,108,109,110,111,
112,113,114,115,116,117,118,119,
120,121,122, 91, 92, 93, 94, 95,
96, 65, 66, 67, 68, 69, 70, 71,
72, 73, 74, 75, 76, 77, 78, 79,
80, 81, 82, 83, 84, 85, 86, 87,
88, 89, 90,123,124,125,126,127,
128,129,130,131,132,133,134,135,
136,137,138,139,140,141,142,143,
144,145,146,147,148,149,150,151,
152,153,154,155,156,157,158,159,
160,161,162,163,164,165,166,167,
168,169,170,171,172,173,174,175,
176,177,178,179,180,181,182,183,
184,185,186,187,188,189,190,191,
192,193,194,195,196,197,198,199,
200,201,202,203,204,205,206,207,
208,209,210,211,212,213,214,215,
216,217,218,219,220,221,222,223,
224,225,226,227,228,229,230,231,
232,233,234,235,236,237,238,239,
240,241,242,243,244,245,246,247,
248,249,250,251,252,253,254,255,
/* This table contains bit maps for various character classes. Each map is 32
bytes long and the bits run from the least significant end of each byte. The
classes that have their own maps are: space, xdigit, digit, upper, lower, word,
graph, print, punct, and cntrl. Other classes are built from combinations. */
0x00,0x3e,0x00,0x00,0x01,0x00,0x00,0x00,
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
0x00,0x00,0x00,0x00,0x00,0x00,0xff,0x03,
0x7e,0x00,0x00,0x00,0x7e,0x00,0x00,0x00,
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
0x00,0x00,0x00,0x00,0x00,0x00,0xff,0x03,
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
0xfe,0xff,0xff,0x07,0x00,0x00,0x00,0x00,
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
0x00,0x00,0x00,0x00,0xfe,0xff,0xff,0x07,
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
0x00,0x00,0x00,0x00,0x00,0x00,0xff,0x03,
0xfe,0xff,0xff,0x87,0xfe,0xff,0xff,0x07,
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
0x00,0x00,0x00,0x00,0xfe,0xff,0xff,0xff,
0xff,0xff,0xff,0xff,0xff,0xff,0xff,0x7f,
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
0x00,0x00,0x00,0x00,0xff,0xff,0xff,0xff,
0xff,0xff,0xff,0xff,0xff,0xff,0xff,0x7f,
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
0x00,0x00,0x00,0x00,0xfe,0xff,0x00,0xfc,
0x01,0x00,0x00,0xf8,0x01,0x00,0x00,0x78,
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
0xff,0xff,0xff,0xff,0x00,0x00,0x00,0x00,
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x80,
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
/* This table identifies various classes of character by individual bits:
0x01 white space character
0x02 letter
0x04 decimal digit
0x08 hexadecimal digit
0x10 alphanumeric or '_'
0x80 regular expression metacharacter or binary zero
*/
0x80,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 0- 7 */
0x00,0x01,0x01,0x01,0x01,0x01,0x00,0x00, /* 8- 15 */
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 16- 23 */
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 24- 31 */
0x01,0x00,0x00,0x00,0x80,0x00,0x00,0x00, /* - ' */
0x80,0x80,0x80,0x80,0x00,0x00,0x80,0x00, /* ( - / */
0x1c,0x1c,0x1c,0x1c,0x1c,0x1c,0x1c,0x1c, /* 0 - 7 */
0x1c,0x1c,0x00,0x00,0x00,0x00,0x00,0x80, /* 8 - ? */
0x00,0x1a,0x1a,0x1a,0x1a,0x1a,0x1a,0x12, /* @ - G */
0x12,0x12,0x12,0x12,0x12,0x12,0x12,0x12, /* H - O */
0x12,0x12,0x12,0x12,0x12,0x12,0x12,0x12, /* P - W */
0x12,0x12,0x12,0x80,0x80,0x00,0x80,0x10, /* X - _ */
0x00,0x1a,0x1a,0x1a,0x1a,0x1a,0x1a,0x12, /* ` - g */
0x12,0x12,0x12,0x12,0x12,0x12,0x12,0x12, /* h - o */
0x12,0x12,0x12,0x12,0x12,0x12,0x12,0x12, /* p - w */
0x12,0x12,0x12,0x80,0x80,0x00,0x00,0x00, /* x -127 */
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 128-135 */
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 136-143 */
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 144-151 */
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 152-159 */
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 160-167 */
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 168-175 */
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 176-183 */
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 184-191 */
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 192-199 */
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 200-207 */
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 208-215 */
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 216-223 */
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 224-231 */
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 232-239 */
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 240-247 */
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00};/* 248-255 */
/* End of pcre_chartables.c */
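
The class bit maps above are 32 bytes each, with bits running from the least significant end of each byte, so character code c lives at bit (c % 8) of byte (c / 8). A minimal sketch of such a lookup follows, using the digit map (the third 32-byte block in the listing, whose bytes 6 and 7 are 0xff and 0x03). The helper names demo_in_class_map and demo_make_digit_map are hypothetical and are not part of PCRE.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if character code c has its bit set in a
   32-byte class map, 0 otherwise. Bit (c % 8) of byte (c / 8) is tested,
   counting from the least significant bit. */
int demo_in_class_map(const unsigned char *map, unsigned int c)
{
  assert(map != NULL);
  if (c > 255) return 0;   /* the tables only cover codes below 256 */
  return (map[c / 8] >> (c % 8)) & 1;
}

/* Hypothetical helper: rebuilds the digit map from the listing above.
   Bytes 6 and 7 are 0xff and 0x03, which covers codes 48..57. */
void demo_make_digit_map(unsigned char map[32])
{
  size_t i;
  for (i = 0; i < 32; i++) map[i] = 0;
  map[6] = 0xff;
  map[7] = 0x03;
}
```

With the map filled in, demo_in_class_map(map, '7') reports membership, while ':' (code 58, the bit just past '9') does not.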

File diff suppressed because it is too large

@@ -0,0 +1,190 @@
/*************************************************
* Perl-Compatible Regular Expressions *
*************************************************/
/* PCRE is a library of functions to support regular expressions whose syntax
and semantics are as close as possible to those of the Perl 5 language.
Written by Philip Hazel
Copyright (c) 1997-2012 University of Cambridge
-----------------------------------------------------------------------------
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
* Neither the name of the University of Cambridge nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.
-----------------------------------------------------------------------------
*/
/* This module contains the external function pcre_config(). */
#ifdef HAVE_CONFIG_H
#include "config.h"
#endif
/* Keep the original link size. */
static int real_link_size = LINK_SIZE;
#include "pcre_internal.h"
/*************************************************
* Return info about what features are configured *
*************************************************/
/* This function has an extensible interface so that additional items can be
added compatibly.
Arguments:
what what information is required
where where to put the information
Returns: 0 if data returned, negative on error
*/
#if defined COMPILE_PCRE8
PCRE_EXP_DEFN int PCRE_CALL_CONVENTION
pcre_config(int what, void *where)
#elif defined COMPILE_PCRE16
PCRE_EXP_DEFN int PCRE_CALL_CONVENTION
pcre16_config(int what, void *where)
#elif defined COMPILE_PCRE32
PCRE_EXP_DEFN int PCRE_CALL_CONVENTION
pcre32_config(int what, void *where)
#endif
{
switch (what)
{
case PCRE_CONFIG_UTF8:
#if defined COMPILE_PCRE16 || defined COMPILE_PCRE32
*((int *)where) = 0;
return PCRE_ERROR_BADOPTION;
#else
#if defined SUPPORT_UTF
*((int *)where) = 1;
#else
*((int *)where) = 0;
#endif
break;
#endif
case PCRE_CONFIG_UTF16:
#if defined COMPILE_PCRE8 || defined COMPILE_PCRE32
*((int *)where) = 0;
return PCRE_ERROR_BADOPTION;
#else
#if defined SUPPORT_UTF
*((int *)where) = 1;
#else
*((int *)where) = 0;
#endif
break;
#endif
case PCRE_CONFIG_UTF32:
#if defined COMPILE_PCRE8 || defined COMPILE_PCRE16
*((int *)where) = 0;
return PCRE_ERROR_BADOPTION;
#else
#if defined SUPPORT_UTF
*((int *)where) = 1;
#else
*((int *)where) = 0;
#endif
break;
#endif
case PCRE_CONFIG_UNICODE_PROPERTIES:
#ifdef SUPPORT_UCP
*((int *)where) = 1;
#else
*((int *)where) = 0;
#endif
break;
case PCRE_CONFIG_JIT:
#ifdef SUPPORT_JIT
*((int *)where) = 1;
#else
*((int *)where) = 0;
#endif
break;
case PCRE_CONFIG_JITTARGET:
#ifdef SUPPORT_JIT
*((const char **)where) = PRIV(jit_get_target)();
#else
*((const char **)where) = NULL;
#endif
break;
case PCRE_CONFIG_NEWLINE:
*((int *)where) = NEWLINE;
break;
case PCRE_CONFIG_BSR:
#ifdef BSR_ANYCRLF
*((int *)where) = 1;
#else
*((int *)where) = 0;
#endif
break;
case PCRE_CONFIG_LINK_SIZE:
*((int *)where) = real_link_size;
break;
case PCRE_CONFIG_POSIX_MALLOC_THRESHOLD:
*((int *)where) = POSIX_MALLOC_THRESHOLD;
break;
case PCRE_CONFIG_PARENS_LIMIT:
*((unsigned long int *)where) = PARENS_NEST_LIMIT;
break;
case PCRE_CONFIG_MATCH_LIMIT:
*((unsigned long int *)where) = MATCH_LIMIT;
break;
case PCRE_CONFIG_MATCH_LIMIT_RECURSION:
*((unsigned long int *)where) = MATCH_LIMIT_RECURSION;
break;
case PCRE_CONFIG_STACKRECURSE:
#ifdef NO_RECURSE
*((int *)where) = 0;
#else
*((int *)where) = 1;
#endif
break;
default: return PCRE_ERROR_BADOPTION;
}
return 0;
}
/* End of pcre_config.c */
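
The function above takes a request code in `what` and writes the answer through the untyped `where` pointer, so the caller has to supply a pointer to the type that matches the request: most items write an int, the limit items write an unsigned long int, and PCRE_CONFIG_JITTARGET writes a const char *. The following is a self-contained sketch of that calling pattern. Every DEMO_ name and value in it is hypothetical, chosen for illustration, and is not PCRE's.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical request codes and error value for this sketch only. */
enum { DEMO_CONFIG_FLAG = 0, DEMO_CONFIG_LIMIT = 1, DEMO_CONFIG_TARGET = 2 };
#define DEMO_ERROR_BADOPTION (-3)

/* Writes the requested item through 'where'. The pointed-to type depends
   on 'what', exactly as in the what/where interface shown above. */
int demo_config(int what, void *where)
{
  assert(where != NULL);
  switch (what)
    {
    case DEMO_CONFIG_FLAG:     /* caller passes int * */
      *((int *)where) = 1;
      break;
    case DEMO_CONFIG_LIMIT:    /* caller passes unsigned long int * */
      *((unsigned long int *)where) = 10000000UL;
      break;
    case DEMO_CONFIG_TARGET:   /* caller passes const char ** */
      *((const char **)where) = NULL;
      break;
    default:
      return DEMO_ERROR_BADOPTION;
    }
  return 0;
}
```

Passing an int * for a request that writes an unsigned long int would overrun the variable on platforms where the two types differ in size, so the pairing of request code and pointer type is the caller's responsibility.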

File diff suppressed because it is too large

File diff suppressed because it is too large

@@ -0,0 +1,245 @@
/*************************************************
* Perl-Compatible Regular Expressions *
*************************************************/
/* PCRE is a library of functions to support regular expressions whose syntax
and semantics are as close as possible to those of the Perl 5 language.
Written by Philip Hazel
Copyright (c) 1997-2013 University of Cambridge
-----------------------------------------------------------------------------
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
* Neither the name of the University of Cambridge nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.
-----------------------------------------------------------------------------
*/
/* This module contains the external function pcre_fullinfo(), which returns
information about a compiled pattern. */
#ifdef HAVE_CONFIG_H
#include "config.h"
#endif
#include "pcre_internal.h"
/*************************************************
* Return info about compiled pattern *
*************************************************/
/* This is a newer "info" function which has an extensible interface so
that additional items can be added compatibly.
Arguments:
argument_re points to compiled code
extra_data points to extra data, or NULL
what what information is required
where where to put the information
Returns: 0 if data returned, negative on error
*/
#if defined COMPILE_PCRE8
PCRE_EXP_DEFN int PCRE_CALL_CONVENTION
pcre_fullinfo(const pcre *argument_re, const pcre_extra *extra_data,
int what, void *where)
#elif defined COMPILE_PCRE16
PCRE_EXP_DEFN int PCRE_CALL_CONVENTION
pcre16_fullinfo(const pcre16 *argument_re, const pcre16_extra *extra_data,
int what, void *where)
#elif defined COMPILE_PCRE32
PCRE_EXP_DEFN int PCRE_CALL_CONVENTION
pcre32_fullinfo(const pcre32 *argument_re, const pcre32_extra *extra_data,
int what, void *where)
#endif
{
const REAL_PCRE *re = (const REAL_PCRE *)argument_re;
const pcre_study_data *study = NULL;
if (re == NULL || where == NULL) return PCRE_ERROR_NULL;
if (extra_data != NULL && (extra_data->flags & PCRE_EXTRA_STUDY_DATA) != 0)
study = (const pcre_study_data *)extra_data->study_data;
/* Check that the first field in the block is the magic number. If it is not,
return with PCRE_ERROR_BADMAGIC. However, if the magic number is equal to
REVERSED_MAGIC_NUMBER we return with PCRE_ERROR_BADENDIANNESS, which
means that the pattern is likely compiled with different endianness. */
if (re->magic_number != MAGIC_NUMBER)
return re->magic_number == REVERSED_MAGIC_NUMBER?
PCRE_ERROR_BADENDIANNESS:PCRE_ERROR_BADMAGIC;
/* Check that this pattern was compiled in the correct bit mode */
if ((re->flags & PCRE_MODE) == 0) return PCRE_ERROR_BADMODE;
switch (what)
{
case PCRE_INFO_OPTIONS:
*((unsigned long int *)where) = re->options & PUBLIC_COMPILE_OPTIONS;
break;
case PCRE_INFO_SIZE:
*((size_t *)where) = re->size;
break;
case PCRE_INFO_STUDYSIZE:
*((size_t *)where) = (study == NULL)? 0 : study->size;
break;
case PCRE_INFO_JITSIZE:
#ifdef SUPPORT_JIT
*((size_t *)where) =
(extra_data != NULL &&
(extra_data->flags & PCRE_EXTRA_EXECUTABLE_JIT) != 0 &&
extra_data->executable_jit != NULL)?
PRIV(jit_get_size)(extra_data->executable_jit) : 0;
#else
*((size_t *)where) = 0;
#endif
break;
case PCRE_INFO_CAPTURECOUNT:
*((int *)where) = re->top_bracket;
break;
case PCRE_INFO_BACKREFMAX:
*((int *)where) = re->top_backref;
break;
case PCRE_INFO_FIRSTBYTE:
*((int *)where) =
((re->flags & PCRE_FIRSTSET) != 0)? (int)re->first_char :
((re->flags & PCRE_STARTLINE) != 0)? -1 : -2;
break;
case PCRE_INFO_FIRSTCHARACTER:
*((pcre_uint32 *)where) =
(re->flags & PCRE_FIRSTSET) != 0 ? re->first_char : 0;
break;
case PCRE_INFO_FIRSTCHARACTERFLAGS:
*((int *)where) =
((re->flags & PCRE_FIRSTSET) != 0) ? 1 :
((re->flags & PCRE_STARTLINE) != 0) ? 2 : 0;
break;
/* Make sure we pass back the pointer to the bit vector in the external
block, not the internal copy (with flipped integer fields). */
case PCRE_INFO_FIRSTTABLE:
*((const pcre_uint8 **)where) =
(study != NULL && (study->flags & PCRE_STUDY_MAPPED) != 0)?
((const pcre_study_data *)extra_data->study_data)->start_bits : NULL;
break;
case PCRE_INFO_MINLENGTH:
*((int *)where) =
(study != NULL && (study->flags & PCRE_STUDY_MINLEN) != 0)?
(int)(study->minlength) : -1;
break;
case PCRE_INFO_JIT:
*((int *)where) = extra_data != NULL &&
(extra_data->flags & PCRE_EXTRA_EXECUTABLE_JIT) != 0 &&
extra_data->executable_jit != NULL;
break;
case PCRE_INFO_LASTLITERAL:
*((int *)where) =
((re->flags & PCRE_REQCHSET) != 0)? (int)re->req_char : -1;
break;
case PCRE_INFO_REQUIREDCHAR:
*((pcre_uint32 *)where) =
((re->flags & PCRE_REQCHSET) != 0) ? re->req_char : 0;
break;
case PCRE_INFO_REQUIREDCHARFLAGS:
*((int *)where) =
((re->flags & PCRE_REQCHSET) != 0);
break;
case PCRE_INFO_NAMEENTRYSIZE:
*((int *)where) = re->name_entry_size;
break;
case PCRE_INFO_NAMECOUNT:
*((int *)where) = re->name_count;
break;
case PCRE_INFO_NAMETABLE:
*((const pcre_uchar **)where) = (const pcre_uchar *)re + re->name_table_offset;
break;
case PCRE_INFO_DEFAULT_TABLES:
*((const pcre_uint8 **)where) = (const pcre_uint8 *)(PRIV(default_tables));
break;
/* From release 8.00 this will always return TRUE because NOPARTIAL is
no longer ever set (the restrictions have been removed). */
case PCRE_INFO_OKPARTIAL:
*((int *)where) = (re->flags & PCRE_NOPARTIAL) == 0;
break;
case PCRE_INFO_JCHANGED:
*((int *)where) = (re->flags & PCRE_JCHANGED) != 0;
break;
case PCRE_INFO_HASCRORLF:
*((int *)where) = (re->flags & PCRE_HASCRORLF) != 0;
break;
case PCRE_INFO_MAXLOOKBEHIND:
*((int *)where) = re->max_lookbehind;
break;
case PCRE_INFO_MATCHLIMIT:
if ((re->flags & PCRE_MLSET) == 0) return PCRE_ERROR_UNSET;
*((pcre_uint32 *)where) = re->limit_match;
break;
case PCRE_INFO_RECURSIONLIMIT:
if ((re->flags & PCRE_RLSET) == 0) return PCRE_ERROR_UNSET;
*((pcre_uint32 *)where) = re->limit_recursion;
break;
case PCRE_INFO_MATCH_EMPTY:
*((int *)where) = (re->flags & PCRE_MATCH_EMPTY) != 0;
break;
default: return PCRE_ERROR_BADOPTION;
}
return 0;
}
/* End of pcre_fullinfo.c */
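
Before answering any request, the function above checks the magic number at the start of the compiled block and distinguishes a byte-reversed value (pattern compiled on a host of the other endianness) from an outright mismatch. A self-contained sketch of that three-way check follows. The DEMO_ constants are assumptions made for illustration: the magic value is taken to be the ASCII bytes 'PCRE', and the error values are arbitrary rather than PCRE's real codes.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_MAGIC          0x50435245UL  /* assumed value: 'PCRE' in ASCII */
#define DEMO_REVERSED_MAGIC 0x45524350UL  /* the same four bytes reversed */

/* Hypothetical result codes for this sketch only. */
enum { DEMO_OK = 0, DEMO_ERROR_BADMAGIC = -1, DEMO_ERROR_BADENDIANNESS = -2 };

/* Classifies the first field of a compiled block: correct, byte-reversed
   (likely compiled with different endianness), or not a pattern at all. */
int demo_check_magic(uint32_t magic)
{
  if (magic != DEMO_MAGIC)
    return (magic == DEMO_REVERSED_MAGIC) ?
      DEMO_ERROR_BADENDIANNESS : DEMO_ERROR_BADMAGIC;
  return DEMO_OK;
}
```

Separating the two failure cases lets a caller decide whether byte-flipping the pattern is worth attempting or whether the data is simply not a compiled pattern.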

@@ -0,0 +1,662 @@
/*************************************************
* Perl-Compatible Regular Expressions *
*************************************************/
/* PCRE is a library of functions to support regular expressions whose syntax
and semantics are as close as possible to those of the Perl 5 language.
Written by Philip Hazel
Copyright (c) 1997-2012 University of Cambridge
-----------------------------------------------------------------------------
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
* Neither the name of the University of Cambridge nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.
-----------------------------------------------------------------------------
*/
/* This module contains some convenience functions for extracting substrings
from the subject string after a regex match has succeeded. The original idea
for these functions came from Scott Wimer. */
#ifdef HAVE_CONFIG_H
#include "config.h"
#endif
#include "pcre_internal.h"
/*************************************************
* Find number for named string *
*************************************************/
/* This function is used by the get_first_set() function below, as well
as being generally available. It assumes that names are unique.
Arguments:
code the compiled regex
stringname the name whose number is required
Returns: the number of the named parentheses, or a negative number
(PCRE_ERROR_NOSUBSTRING) if not found
*/
#if defined COMPILE_PCRE8
PCRE_EXP_DEFN int PCRE_CALL_CONVENTION
pcre_get_stringnumber(const pcre *code, const char *stringname)
#elif defined COMPILE_PCRE16
PCRE_EXP_DEFN int PCRE_CALL_CONVENTION
pcre16_get_stringnumber(const pcre16 *code, PCRE_SPTR16 stringname)
#elif defined COMPILE_PCRE32
PCRE_EXP_DEFN int PCRE_CALL_CONVENTION
pcre32_get_stringnumber(const pcre32 *code, PCRE_SPTR32 stringname)
#endif
{
int rc;
int entrysize;
int top, bot;
pcre_uchar *nametable;
#ifdef COMPILE_PCRE8
if ((rc = pcre_fullinfo(code, NULL, PCRE_INFO_NAMECOUNT, &top)) != 0)
return rc;
if (top <= 0) return PCRE_ERROR_NOSUBSTRING;
if ((rc = pcre_fullinfo(code, NULL, PCRE_INFO_NAMEENTRYSIZE, &entrysize)) != 0)
return rc;
if ((rc = pcre_fullinfo(code, NULL, PCRE_INFO_NAMETABLE, &nametable)) != 0)
return rc;
#endif
#ifdef COMPILE_PCRE16
if ((rc = pcre16_fullinfo(code, NULL, PCRE_INFO_NAMECOUNT, &top)) != 0)
return rc;
if (top <= 0) return PCRE_ERROR_NOSUBSTRING;
if ((rc = pcre16_fullinfo(code, NULL, PCRE_INFO_NAMEENTRYSIZE, &entrysize)) != 0)
return rc;
if ((rc = pcre16_fullinfo(code, NULL, PCRE_INFO_NAMETABLE, &nametable)) != 0)
return rc;
#endif
#ifdef COMPILE_PCRE32
if ((rc = pcre32_fullinfo(code, NULL, PCRE_INFO_NAMECOUNT, &top)) != 0)
return rc;
if (top <= 0) return PCRE_ERROR_NOSUBSTRING;
if ((rc = pcre32_fullinfo(code, NULL, PCRE_INFO_NAMEENTRYSIZE, &entrysize)) != 0)
return rc;
if ((rc = pcre32_fullinfo(code, NULL, PCRE_INFO_NAMETABLE, &nametable)) != 0)
return rc;
#endif
bot = 0;
while (top > bot)
{
int mid = (top + bot) / 2;
pcre_uchar *entry = nametable + entrysize*mid;
int c = STRCMP_UC_UC((pcre_uchar *)stringname,
(pcre_uchar *)(entry + IMM2_SIZE));
if (c == 0) return GET2(entry, 0);
if (c > 0) bot = mid + 1; else top = mid;
}
return PCRE_ERROR_NOSUBSTRING;
}
/*************************************************
* Find (multiple) entries for named string *
*************************************************/
/* This is used by the get_first_set() function below, as well as being
generally available. It is used when duplicated names are permitted.
Arguments:
code the compiled regex
stringname the name whose entries are required
firstptr where to put the pointer to the first entry
lastptr where to put the pointer to the last entry
Returns: the length of each entry, or a negative number
(PCRE_ERROR_NOSUBSTRING) if not found
*/
#if defined COMPILE_PCRE8
PCRE_EXP_DEFN int PCRE_CALL_CONVENTION
pcre_get_stringtable_entries(const pcre *code, const char *stringname,
char **firstptr, char **lastptr)
#elif defined COMPILE_PCRE16
PCRE_EXP_DEFN int PCRE_CALL_CONVENTION
pcre16_get_stringtable_entries(const pcre16 *code, PCRE_SPTR16 stringname,
PCRE_UCHAR16 **firstptr, PCRE_UCHAR16 **lastptr)
#elif defined COMPILE_PCRE32
PCRE_EXP_DEFN int PCRE_CALL_CONVENTION
pcre32_get_stringtable_entries(const pcre32 *code, PCRE_SPTR32 stringname,
PCRE_UCHAR32 **firstptr, PCRE_UCHAR32 **lastptr)
#endif
{
int rc;
int entrysize;
int top, bot;
pcre_uchar *nametable, *lastentry;
#ifdef COMPILE_PCRE8
if ((rc = pcre_fullinfo(code, NULL, PCRE_INFO_NAMECOUNT, &top)) != 0)
return rc;
if (top <= 0) return PCRE_ERROR_NOSUBSTRING;
if ((rc = pcre_fullinfo(code, NULL, PCRE_INFO_NAMEENTRYSIZE, &entrysize)) != 0)
return rc;
if ((rc = pcre_fullinfo(code, NULL, PCRE_INFO_NAMETABLE, &nametable)) != 0)
return rc;
#endif
#ifdef COMPILE_PCRE16
if ((rc = pcre16_fullinfo(code, NULL, PCRE_INFO_NAMECOUNT, &top)) != 0)
return rc;
if (top <= 0) return PCRE_ERROR_NOSUBSTRING;
if ((rc = pcre16_fullinfo(code, NULL, PCRE_INFO_NAMEENTRYSIZE, &entrysize)) != 0)
return rc;
if ((rc = pcre16_fullinfo(code, NULL, PCRE_INFO_NAMETABLE, &nametable)) != 0)
return rc;
#endif
#ifdef COMPILE_PCRE32
if ((rc = pcre32_fullinfo(code, NULL, PCRE_INFO_NAMECOUNT, &top)) != 0)
return rc;
if (top <= 0) return PCRE_ERROR_NOSUBSTRING;
if ((rc = pcre32_fullinfo(code, NULL, PCRE_INFO_NAMEENTRYSIZE, &entrysize)) != 0)
return rc;
if ((rc = pcre32_fullinfo(code, NULL, PCRE_INFO_NAMETABLE, &nametable)) != 0)
return rc;
#endif
lastentry = nametable + entrysize * (top - 1);
bot = 0;
while (top > bot)
{
int mid = (top + bot) / 2;
pcre_uchar *entry = nametable + entrysize*mid;
int c = STRCMP_UC_UC((pcre_uchar *)stringname,
(pcre_uchar *)(entry + IMM2_SIZE));
if (c == 0)
{
pcre_uchar *first = entry;
pcre_uchar *last = entry;
while (first > nametable)
{
if (STRCMP_UC_UC((pcre_uchar *)stringname,
(pcre_uchar *)(first - entrysize + IMM2_SIZE)) != 0) break;
first -= entrysize;
}
while (last < lastentry)
{
if (STRCMP_UC_UC((pcre_uchar *)stringname,
(pcre_uchar *)(last + entrysize + IMM2_SIZE)) != 0) break;
last += entrysize;
}
#if defined COMPILE_PCRE8
*firstptr = (char *)first;
*lastptr = (char *)last;
#elif defined COMPILE_PCRE16
*firstptr = (PCRE_UCHAR16 *)first;
*lastptr = (PCRE_UCHAR16 *)last;
#elif defined COMPILE_PCRE32
*firstptr = (PCRE_UCHAR32 *)first;
*lastptr = (PCRE_UCHAR32 *)last;
#endif
return entrysize;
}
if (c > 0) bot = mid + 1; else top = mid;
}
return PCRE_ERROR_NOSUBSTRING;
}
/*************************************************
* Find first set of multiple named strings *
*************************************************/
/* This function allows for duplicate names in the table of named substrings.
It returns the number of the first one that was set in a pattern match.
Arguments:
code the compiled regex
stringname the name of the capturing substring
ovector the vector of matched substrings
Returns: the number of the first that is set,
or the number of the last one if none are set,
or a negative number on error
*/
#if defined COMPILE_PCRE8
static int
get_first_set(const pcre *code, const char *stringname, int *ovector)
#elif defined COMPILE_PCRE16
static int
get_first_set(const pcre16 *code, PCRE_SPTR16 stringname, int *ovector)
#elif defined COMPILE_PCRE32
static int
get_first_set(const pcre32 *code, PCRE_SPTR32 stringname, int *ovector)
#endif
{
const REAL_PCRE *re = (const REAL_PCRE *)code;
int entrysize;
pcre_uchar *entry;
#if defined COMPILE_PCRE8
char *first, *last;
#elif defined COMPILE_PCRE16
PCRE_UCHAR16 *first, *last;
#elif defined COMPILE_PCRE32
PCRE_UCHAR32 *first, *last;
#endif
#if defined COMPILE_PCRE8
if ((re->options & PCRE_DUPNAMES) == 0 && (re->flags & PCRE_JCHANGED) == 0)
return pcre_get_stringnumber(code, stringname);
entrysize = pcre_get_stringtable_entries(code, stringname, &first, &last);
#elif defined COMPILE_PCRE16
if ((re->options & PCRE_DUPNAMES) == 0 && (re->flags & PCRE_JCHANGED) == 0)
return pcre16_get_stringnumber(code, stringname);
entrysize = pcre16_get_stringtable_entries(code, stringname, &first, &last);
#elif defined COMPILE_PCRE32
if ((re->options & PCRE_DUPNAMES) == 0 && (re->flags & PCRE_JCHANGED) == 0)
return pcre32_get_stringnumber(code, stringname);
entrysize = pcre32_get_stringtable_entries(code, stringname, &first, &last);
#endif
if (entrysize <= 0) return entrysize;
for (entry = (pcre_uchar *)first; entry <= (pcre_uchar *)last; entry += entrysize)
{
int n = GET2(entry, 0);
if (ovector[n*2] >= 0) return n;
}
return GET2(entry, 0);
}
/*************************************************
* Copy captured string to given buffer *
*************************************************/
/* This function copies a single captured substring into a given buffer.
Note that we use memcpy() rather than strncpy() in case there are binary zeros
in the string.
Arguments:
subject the subject string that was matched
ovector pointer to the offsets table
stringcount the number of substrings that were captured
(i.e. the yield of the pcre_exec call, unless
that was zero, in which case it should be 1/3
of the offset table size)
stringnumber the number of the required substring
buffer where to put the substring
size the size of the buffer
Returns: if successful:
the length of the copied string, not including the zero
that is put on the end; can be zero
if not successful:
PCRE_ERROR_NOMEMORY (-6) buffer too small
PCRE_ERROR_NOSUBSTRING (-7) no such captured substring
*/
#if defined COMPILE_PCRE8
PCRE_EXP_DEFN int PCRE_CALL_CONVENTION
pcre_copy_substring(const char *subject, int *ovector, int stringcount,
int stringnumber, char *buffer, int size)
#elif defined COMPILE_PCRE16
PCRE_EXP_DEFN int PCRE_CALL_CONVENTION
pcre16_copy_substring(PCRE_SPTR16 subject, int *ovector, int stringcount,
int stringnumber, PCRE_UCHAR16 *buffer, int size)
#elif defined COMPILE_PCRE32
PCRE_EXP_DEFN int PCRE_CALL_CONVENTION
pcre32_copy_substring(PCRE_SPTR32 subject, int *ovector, int stringcount,
int stringnumber, PCRE_UCHAR32 *buffer, int size)
#endif
{
int yield;
if (stringnumber < 0 || stringnumber >= stringcount)
return PCRE_ERROR_NOSUBSTRING;
stringnumber *= 2;
yield = ovector[stringnumber+1] - ovector[stringnumber];
if (size < yield + 1) return PCRE_ERROR_NOMEMORY;
memcpy(buffer, subject + ovector[stringnumber], IN_UCHARS(yield));
buffer[yield] = 0;
return yield;
}
/*************************************************
* Copy named captured string to given buffer *
*************************************************/
/* This function copies a single captured substring into a given buffer,
identifying it by name. If the regex permits duplicate names, the first
substring that is set is chosen.
Arguments:
code the compiled regex
subject the subject string that was matched
ovector pointer to the offsets table
stringcount the number of substrings that were captured
(i.e. the yield of the pcre_exec call, unless
that was zero, in which case it should be 1/3
of the offset table size)
stringname the name of the required substring
buffer where to put the substring
size the size of the buffer
Returns: if successful:
the length of the copied string, not including the zero
that is put on the end; can be zero
if not successful:
PCRE_ERROR_NOMEMORY (-6) buffer too small
PCRE_ERROR_NOSUBSTRING (-7) no such captured substring
*/
#if defined COMPILE_PCRE8
PCRE_EXP_DEFN int PCRE_CALL_CONVENTION
pcre_copy_named_substring(const pcre *code, const char *subject,
int *ovector, int stringcount, const char *stringname,
char *buffer, int size)
#elif defined COMPILE_PCRE16
PCRE_EXP_DEFN int PCRE_CALL_CONVENTION
pcre16_copy_named_substring(const pcre16 *code, PCRE_SPTR16 subject,
int *ovector, int stringcount, PCRE_SPTR16 stringname,
PCRE_UCHAR16 *buffer, int size)
#elif defined COMPILE_PCRE32
PCRE_EXP_DEFN int PCRE_CALL_CONVENTION
pcre32_copy_named_substring(const pcre32 *code, PCRE_SPTR32 subject,
int *ovector, int stringcount, PCRE_SPTR32 stringname,
PCRE_UCHAR32 *buffer, int size)
#endif
{
int n = get_first_set(code, stringname, ovector);
if (n <= 0) return n;
#if defined COMPILE_PCRE8
return pcre_copy_substring(subject, ovector, stringcount, n, buffer, size);
#elif defined COMPILE_PCRE16
return pcre16_copy_substring(subject, ovector, stringcount, n, buffer, size);
#elif defined COMPILE_PCRE32
return pcre32_copy_substring(subject, ovector, stringcount, n, buffer, size);
#endif
}
/*************************************************
* Copy all captured strings to new store *
*************************************************/
/* This function gets one chunk of store and builds a list of pointers and all
of the captured substrings in it. A NULL pointer is put on the end of the list.
Arguments:
subject the subject string that was matched
ovector pointer to the offsets table
stringcount the number of substrings that were captured
(i.e. the yield of the pcre_exec call, unless
that was zero, in which case it should be 1/3
of the offset table size)
listptr set to point to the list of pointers
Returns: if successful: 0
if not successful:
PCRE_ERROR_NOMEMORY (-6) failed to get store
*/
#if defined COMPILE_PCRE8
PCRE_EXP_DEFN int PCRE_CALL_CONVENTION
pcre_get_substring_list(const char *subject, int *ovector, int stringcount,
const char ***listptr)
#elif defined COMPILE_PCRE16
PCRE_EXP_DEFN int PCRE_CALL_CONVENTION
pcre16_get_substring_list(PCRE_SPTR16 subject, int *ovector, int stringcount,
PCRE_SPTR16 **listptr)
#elif defined COMPILE_PCRE32
PCRE_EXP_DEFN int PCRE_CALL_CONVENTION
pcre32_get_substring_list(PCRE_SPTR32 subject, int *ovector, int stringcount,
PCRE_SPTR32 **listptr)
#endif
{
int i;
int size = sizeof(pcre_uchar *);
int double_count = stringcount * 2;
pcre_uchar **stringlist;
pcre_uchar *p;
for (i = 0; i < double_count; i += 2)
size += sizeof(pcre_uchar *) + IN_UCHARS(ovector[i+1] - ovector[i] + 1);
stringlist = (pcre_uchar **)(PUBL(malloc))(size);
if (stringlist == NULL) return PCRE_ERROR_NOMEMORY;
#if defined COMPILE_PCRE8
*listptr = (const char **)stringlist;
#elif defined COMPILE_PCRE16
*listptr = (PCRE_SPTR16 *)stringlist;
#elif defined COMPILE_PCRE32
*listptr = (PCRE_SPTR32 *)stringlist;
#endif
p = (pcre_uchar *)(stringlist + stringcount + 1);
for (i = 0; i < double_count; i += 2)
{
int len = ovector[i+1] - ovector[i];
memcpy(p, subject + ovector[i], IN_UCHARS(len));
*stringlist++ = p;
p += len;
*p++ = 0;
}
*stringlist = NULL;
return 0;
}
/*************************************************
* Free store obtained by get_substring_list *
*************************************************/
/* This function exists for the benefit of people calling PCRE from non-C
programs that can call its functions, but not free() or (PUBL(free))()
directly.
Argument: the result of a previous pcre_get_substring_list()
Returns: nothing
*/
#if defined COMPILE_PCRE8
PCRE_EXP_DEFN void PCRE_CALL_CONVENTION
pcre_free_substring_list(const char **pointer)
#elif defined COMPILE_PCRE16
PCRE_EXP_DEFN void PCRE_CALL_CONVENTION
pcre16_free_substring_list(PCRE_SPTR16 *pointer)
#elif defined COMPILE_PCRE32
PCRE_EXP_DEFN void PCRE_CALL_CONVENTION
pcre32_free_substring_list(PCRE_SPTR32 *pointer)
#endif
{
(PUBL(free))((void *)pointer);
}
/*************************************************
* Copy captured string to new store *
*************************************************/
/* This function copies a single captured substring into a piece of new
store.
Arguments:
subject the subject string that was matched
ovector pointer to the offsets table
stringcount the number of substrings that were captured
(i.e. the yield of the pcre_exec call, unless
that was zero, in which case it should be 1/3
of the offset table size)
stringnumber the number of the required substring
stringptr where to put a pointer to the substring
Returns: if successful:
the length of the string, not including the zero that
is put on the end; can be zero
if not successful:
PCRE_ERROR_NOMEMORY (-6) failed to get store
PCRE_ERROR_NOSUBSTRING (-7) substring not present
*/
#if defined COMPILE_PCRE8
PCRE_EXP_DEFN int PCRE_CALL_CONVENTION
pcre_get_substring(const char *subject, int *ovector, int stringcount,
int stringnumber, const char **stringptr)
#elif defined COMPILE_PCRE16
PCRE_EXP_DEFN int PCRE_CALL_CONVENTION
pcre16_get_substring(PCRE_SPTR16 subject, int *ovector, int stringcount,
int stringnumber, PCRE_SPTR16 *stringptr)
#elif defined COMPILE_PCRE32
PCRE_EXP_DEFN int PCRE_CALL_CONVENTION
pcre32_get_substring(PCRE_SPTR32 subject, int *ovector, int stringcount,
int stringnumber, PCRE_SPTR32 *stringptr)
#endif
{
int yield;
pcre_uchar *substring;
if (stringnumber < 0 || stringnumber >= stringcount)
return PCRE_ERROR_NOSUBSTRING;
stringnumber *= 2;
yield = ovector[stringnumber+1] - ovector[stringnumber];
substring = (pcre_uchar *)(PUBL(malloc))(IN_UCHARS(yield + 1));
if (substring == NULL) return PCRE_ERROR_NOMEMORY;
memcpy(substring, subject + ovector[stringnumber], IN_UCHARS(yield));
substring[yield] = 0;
#if defined COMPILE_PCRE8
*stringptr = (const char *)substring;
#elif defined COMPILE_PCRE16
*stringptr = (PCRE_SPTR16)substring;
#elif defined COMPILE_PCRE32
*stringptr = (PCRE_SPTR32)substring;
#endif
return yield;
}
/*************************************************
* Copy named captured string to new store *
*************************************************/
/* This function copies a single captured substring, identified by name, into
new store. If the regex permits duplicate names, the first substring that is
set is chosen.
Arguments:
code the compiled regex
subject the subject string that was matched
ovector pointer to the offsets table
stringcount the number of substrings that were captured
(i.e. the yield of the pcre_exec call, unless
that was zero, in which case it should be 1/3
of the offset table size)
stringname the name of the required substring
stringptr where to put the pointer
Returns: if successful:
the length of the copied string, not including the zero
that is put on the end; can be zero
if not successful:
PCRE_ERROR_NOMEMORY (-6) couldn't get memory
PCRE_ERROR_NOSUBSTRING (-7) no such captured substring
*/
#if defined COMPILE_PCRE8
PCRE_EXP_DEFN int PCRE_CALL_CONVENTION
pcre_get_named_substring(const pcre *code, const char *subject,
int *ovector, int stringcount, const char *stringname,
const char **stringptr)
#elif defined COMPILE_PCRE16
PCRE_EXP_DEFN int PCRE_CALL_CONVENTION
pcre16_get_named_substring(const pcre16 *code, PCRE_SPTR16 subject,
int *ovector, int stringcount, PCRE_SPTR16 stringname,
PCRE_SPTR16 *stringptr)
#elif defined COMPILE_PCRE32
PCRE_EXP_DEFN int PCRE_CALL_CONVENTION
pcre32_get_named_substring(const pcre32 *code, PCRE_SPTR32 subject,
int *ovector, int stringcount, PCRE_SPTR32 stringname,
PCRE_SPTR32 *stringptr)
#endif
{
int n = get_first_set(code, stringname, ovector);
if (n <= 0) return n;
#if defined COMPILE_PCRE8
return pcre_get_substring(subject, ovector, stringcount, n, stringptr);
#elif defined COMPILE_PCRE16
return pcre16_get_substring(subject, ovector, stringcount, n, stringptr);
#elif defined COMPILE_PCRE32
return pcre32_get_substring(subject, ovector, stringcount, n, stringptr);
#endif
}
/*************************************************
* Free store obtained by get_substring *
*************************************************/
/* This function exists for the benefit of people calling PCRE from non-C
programs that can call its functions, but not free() or (PUBL(free))()
directly.
Argument: the result of a previous pcre_get_substring()
Returns: nothing
*/
#if defined COMPILE_PCRE8
PCRE_EXP_DEFN void PCRE_CALL_CONVENTION
pcre_free_substring(const char *pointer)
#elif defined COMPILE_PCRE16
PCRE_EXP_DEFN void PCRE_CALL_CONVENTION
pcre16_free_substring(PCRE_SPTR16 pointer)
#elif defined COMPILE_PCRE32
PCRE_EXP_DEFN void PCRE_CALL_CONVENTION
pcre32_free_substring(PCRE_SPTR32 pointer)
#endif
{
(PUBL(free))((void *)pointer);
}
/* End of pcre_get.c */
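A minimal sketch of the offset arithmetic used by the copy functions above, reduced to the 8-bit case so it can be compiled without PCRE. The name sketch_copy_substring and the SKETCH_ERROR_* constants are hypothetical; only the numeric values (-6 and -7) come from the comments in pcre_get.c. It assumes the offsets vector holds (start, end) pairs, as the code above does.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for PCRE_ERROR_NOMEMORY and PCRE_ERROR_NOSUBSTRING. */
enum { SKETCH_ERROR_NOMEMORY = -6, SKETCH_ERROR_NOSUBSTRING = -7 };

/* Copy capture number `stringnumber` out of `subject` into `buffer`.
   ovector holds pairs: ovector[2*n] is the start offset of capture n and
   ovector[2*n+1] is the offset just past its end. */
int sketch_copy_substring(const char *subject, const int *ovector,
                          int stringcount, int stringnumber,
                          char *buffer, int size)
{
  int length;
  assert(subject != NULL && ovector != NULL && buffer != NULL);

  /* Reject capture numbers outside what the match produced. */
  if (stringnumber < 0 || stringnumber >= stringcount)
    return SKETCH_ERROR_NOSUBSTRING;

  length = ovector[2*stringnumber + 1] - ovector[2*stringnumber];

  /* One extra unit is needed for the terminating zero. */
  if (size < length + 1) return SKETCH_ERROR_NOMEMORY;

  memcpy(buffer, subject + ovector[2*stringnumber], (size_t)length);
  buffer[length] = 0;
  return length;
}
```

With the subject "hello world" and the offsets {0, 11, 0, 5, 6, 11}, capture 2 covers offsets 6 to 11, so the call copies "world" and returns 5.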


@@ -0,0 +1,86 @@
/*************************************************
* Perl-Compatible Regular Expressions *
*************************************************/
/* PCRE is a library of functions to support regular expressions whose syntax
and semantics are as close as possible to those of the Perl 5 language.
Written by Philip Hazel
Copyright (c) 1997-2014 University of Cambridge
-----------------------------------------------------------------------------
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
* Neither the name of the University of Cambridge nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.
-----------------------------------------------------------------------------
*/
/* This module contains global variables that are exported by the PCRE library.
PCRE is thread-clean and doesn't use any global variables in the normal sense.
However, it calls memory allocation and freeing functions via the four
indirections below, and it can optionally do callouts, using the fifth
indirection. These values can be changed by the caller, but are shared between
all threads.
For MS Visual Studio and Symbian OS, there are problems in initializing these
variables to non-local functions. In these cases, therefore, an indirection via
a local function is used.
Also, when compiling for Virtual Pascal, things are done differently, and
global variables are not used. */
#ifdef HAVE_CONFIG_H
#include "config.h"
#endif
#include "pcre_internal.h"
#if defined _MSC_VER || defined __SYMBIAN32__
static void* LocalPcreMalloc(size_t aSize)
{
return malloc(aSize);
}
static void LocalPcreFree(void* aPtr)
{
free(aPtr);
}
PCRE_EXP_DATA_DEFN void *(*PUBL(malloc))(size_t) = LocalPcreMalloc;
PCRE_EXP_DATA_DEFN void (*PUBL(free))(void *) = LocalPcreFree;
PCRE_EXP_DATA_DEFN void *(*PUBL(stack_malloc))(size_t) = LocalPcreMalloc;
PCRE_EXP_DATA_DEFN void (*PUBL(stack_free))(void *) = LocalPcreFree;
PCRE_EXP_DATA_DEFN int (*PUBL(callout))(PUBL(callout_block) *) = NULL;
PCRE_EXP_DATA_DEFN int (*PUBL(stack_guard))(void) = NULL;
#elif !defined VPCOMPAT
PCRE_EXP_DATA_DEFN void *(*PUBL(malloc))(size_t) = malloc;
PCRE_EXP_DATA_DEFN void (*PUBL(free))(void *) = free;
PCRE_EXP_DATA_DEFN void *(*PUBL(stack_malloc))(size_t) = malloc;
PCRE_EXP_DATA_DEFN void (*PUBL(stack_free))(void *) = free;
PCRE_EXP_DATA_DEFN int (*PUBL(callout))(PUBL(callout_block) *) = NULL;
PCRE_EXP_DATA_DEFN int (*PUBL(stack_guard))(void) = NULL;
#endif
/* End of pcre_globals.c */
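The indirection described in pcre_globals.c can be illustrated with a small stand-alone sketch. The names sketch_malloc, sketch_free, counting_malloc and sketch_count_allocations are hypothetical and are not part of PCRE; the point is only that a library which allocates through a global function pointer lets the caller substitute its own allocator. As the comment in the file notes, such pointers are shared between all threads, so this sketch assumes a single-threaded caller.

```c
#include <assert.h>
#include <stdlib.h>

/* Global indirections, initialized to the C library functions. */
void *(*sketch_malloc)(size_t) = malloc;
void (*sketch_free)(void *) = free;

static size_t sketch_alloc_calls = 0;

/* A replacement allocator that counts how often it is called. */
static void *counting_malloc(size_t size)
{
  sketch_alloc_calls++;
  return malloc(size);
}

/* Temporarily install the counting allocator, perform one allocation
   through the indirection, then restore the previous allocator. */
size_t sketch_count_allocations(void)
{
  void *(*saved)(size_t) = sketch_malloc;
  void *block;

  sketch_alloc_calls = 0;
  sketch_malloc = counting_malloc;

  block = sketch_malloc(32);   /* goes through counting_malloc */
  sketch_free(block);          /* free(NULL) is harmless if malloc failed */

  sketch_malloc = saved;
  return sketch_alloc_calls;
}
```

Code that calls sketch_malloc never needs to know which allocator is installed, which is the property the PUBL(malloc) and PUBL(free) pointers provide.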

File diff suppressed because it is too large

File diff suppressed because it is too large


@@ -0,0 +1,156 @@
/*************************************************
* Perl-Compatible Regular Expressions *
*************************************************/
/* PCRE is a library of functions to support regular expressions whose syntax
and semantics are as close as possible to those of the Perl 5 language.
Written by Philip Hazel
Copyright (c) 1997-2012 University of Cambridge
-----------------------------------------------------------------------------
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
* Neither the name of the University of Cambridge nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.
-----------------------------------------------------------------------------
*/
/* This module contains the external function pcre_maketables(), which builds
character tables for PCRE in the current locale. The file is compiled on its
own as part of the PCRE library. However, it is also included in the
compilation of dftables.c, in which case the macro DFTABLES is defined. */
#ifndef DFTABLES
# ifdef HAVE_CONFIG_H
# include "config.h"
# endif
# include "pcre_internal.h"
#endif
/*************************************************
* Create PCRE character tables *
*************************************************/
/* This function builds a set of character tables for use by PCRE and returns
a pointer to them. They are built using the ctype functions, and consequently
their contents will depend upon the current locale setting. When compiled as
part of the library, the store is obtained via PUBL(malloc)(), but when
compiled inside dftables, use malloc().
Arguments: none
Returns: pointer to the contiguous block of data
*/
#if defined COMPILE_PCRE8
const unsigned char *
pcre_maketables(void)
#elif defined COMPILE_PCRE16
const unsigned char *
pcre16_maketables(void)
#elif defined COMPILE_PCRE32
const unsigned char *
pcre32_maketables(void)
#endif
{
unsigned char *yield, *p;
int i;
#ifndef DFTABLES
yield = (unsigned char*)(PUBL(malloc))(tables_length);
#else
yield = (unsigned char*)malloc(tables_length);
#endif
if (yield == NULL) return NULL;
p = yield;
/* First comes the lower casing table */
for (i = 0; i < 256; i++) *p++ = tolower(i);
/* Next the case-flipping table */
for (i = 0; i < 256; i++) *p++ = islower(i)? toupper(i) : tolower(i);
/* Then the character class tables. Don't try to be clever and save effort on
exclusive ones - in some locales things may be different.
Note that the table for "space" includes everything "isspace" gives, including
VT in the default locale. This makes it work for the POSIX class [:space:].
From release 8.34 it is also correct for Perl space, because Perl added VT at
release 5.18.
Note also that it is possible for a character to be alnum or alpha without
being lower or upper, such as "male and female ordinals" (\xAA and \xBA) in the
fr_FR locale (at least under Debian Linux's locales as of 12/2005). So we must
test for alnum specially. */
memset(p, 0, cbit_length);
for (i = 0; i < 256; i++)
{
if (isdigit(i)) p[cbit_digit + i/8] |= 1 << (i&7);
if (isupper(i)) p[cbit_upper + i/8] |= 1 << (i&7);
if (islower(i)) p[cbit_lower + i/8] |= 1 << (i&7);
if (isalnum(i)) p[cbit_word + i/8] |= 1 << (i&7);
if (i == '_') p[cbit_word + i/8] |= 1 << (i&7);
if (isspace(i)) p[cbit_space + i/8] |= 1 << (i&7);
if (isxdigit(i))p[cbit_xdigit + i/8] |= 1 << (i&7);
if (isgraph(i)) p[cbit_graph + i/8] |= 1 << (i&7);
if (isprint(i)) p[cbit_print + i/8] |= 1 << (i&7);
if (ispunct(i)) p[cbit_punct + i/8] |= 1 << (i&7);
if (iscntrl(i)) p[cbit_cntrl + i/8] |= 1 << (i&7);
}
p += cbit_length;
/* Finally, the character type table. In this, we used to exclude VT from the
white space chars, because Perl didn't recognize it as such for \s and for
comments within regexes. However, Perl changed at release 5.18, so PCRE changed
at release 8.34. */
for (i = 0; i < 256; i++)
{
int x = 0;
if (isspace(i)) x += ctype_space;
if (isalpha(i)) x += ctype_letter;
if (isdigit(i)) x += ctype_digit;
if (isxdigit(i)) x += ctype_xdigit;
if (isalnum(i) || i == '_') x += ctype_word;
/* Note: strchr includes the terminating zero in the characters it considers.
In this instance, that is ok because we want binary zero to be flagged as a
meta-character, which in this sense is any character that terminates a run
of data characters. */
if (strchr("\\*+?{^.$|()[", i) != 0) x += ctype_meta;
*p++ = x;
}
return yield;
}
/* End of pcre_maketables.c */
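The class tables built by pcre_maketables() are bitmaps with one bit per character value, 32 bytes per class. The following is a rough sketch of that encoding for a single class, using hypothetical names (sketch_build_digit_bitmap, sketch_bitmap_has) rather than the real cbit_* offsets, which are defined in pcre_internal.h and not shown here.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Build a 256-bit (32-byte) membership bitmap for the "digit" class.
   Character i is recorded in byte i/8, at bit position i&7. */
void sketch_build_digit_bitmap(unsigned char bitmap[32])
{
  int i;
  assert(bitmap != NULL);
  memset(bitmap, 0, 32);
  for (i = 0; i < 256; i++)
    if (isdigit(i)) bitmap[i/8] |= (unsigned char)(1 << (i&7));
}

/* Test whether character c is a member of the class. */
int sketch_bitmap_has(const unsigned char bitmap[32], unsigned char c)
{
  return (bitmap[c/8] & (1 << (c&7))) != 0;
}
```

A lookup is then one index and one mask, with no call into the ctype functions at match time.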


@@ -0,0 +1,210 @@
/*************************************************
* Perl-Compatible Regular Expressions *
*************************************************/
/* PCRE is a library of functions to support regular expressions whose syntax
and semantics are as close as possible to those of the Perl 5 language.
Written by Philip Hazel
Copyright (c) 1997-2012 University of Cambridge
-----------------------------------------------------------------------------
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
* Neither the name of the University of Cambridge nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.
-----------------------------------------------------------------------------
*/
/* This module contains internal functions for testing newlines when more than
one kind of newline is to be recognized. When a newline is found, its length is
returned. In principle, we could implement several newline "types", each
referring to a different set of newline characters. At present, PCRE supports
only NLTYPE_FIXED, which gets handled without these functions, NLTYPE_ANYCRLF,
and NLTYPE_ANY. The full list of Unicode newline characters is taken from
http://unicode.org/unicode/reports/tr18/. */
#ifdef HAVE_CONFIG_H
#include "config.h"
#endif
#include "pcre_internal.h"
/*************************************************
* Check for newline at given position *
*************************************************/
/* It is guaranteed that the initial value of ptr is less than the end of the
string that is being processed.
Arguments:
ptr pointer to possible newline
type the newline type
endptr pointer to the end of the string
lenptr where to return the length
utf TRUE if in utf mode
Returns: TRUE or FALSE
*/
BOOL
PRIV(is_newline)(PCRE_PUCHAR ptr, int type, PCRE_PUCHAR endptr, int *lenptr,
BOOL utf)
{
pcre_uint32 c;
(void)utf;
#ifdef SUPPORT_UTF
if (utf)
{
GETCHAR(c, ptr);
}
else
#endif /* SUPPORT_UTF */
c = *ptr;
/* Note that this function is called only for ANY or ANYCRLF. */
if (type == NLTYPE_ANYCRLF) switch(c)
{
case CHAR_LF: *lenptr = 1; return TRUE;
case CHAR_CR: *lenptr = (ptr < endptr - 1 && ptr[1] == CHAR_LF)? 2 : 1;
return TRUE;
default: return FALSE;
}
/* NLTYPE_ANY */
else switch(c)
{
#ifdef EBCDIC
case CHAR_NEL:
#endif
case CHAR_LF:
case CHAR_VT:
case CHAR_FF: *lenptr = 1; return TRUE;
case CHAR_CR:
*lenptr = (ptr < endptr - 1 && ptr[1] == CHAR_LF)? 2 : 1;
return TRUE;
#ifndef EBCDIC
#ifdef COMPILE_PCRE8
case CHAR_NEL: *lenptr = utf? 2 : 1; return TRUE;
case 0x2028: /* LS */
case 0x2029: *lenptr = 3; return TRUE; /* PS */
#else /* COMPILE_PCRE16 || COMPILE_PCRE32 */
case CHAR_NEL:
case 0x2028: /* LS */
case 0x2029: *lenptr = 1; return TRUE; /* PS */
#endif /* COMPILE_PCRE8 */
#endif /* Not EBCDIC */
default: return FALSE;
}
}
/*************************************************
* Check for newline at previous position *
*************************************************/
/* It is guaranteed that the initial value of ptr is greater than the start of
the string that is being processed.
Arguments:
ptr pointer to possible newline
type the newline type
startptr pointer to the start of the string
lenptr where to return the length
utf TRUE if in utf mode
Returns: TRUE or FALSE
*/
BOOL
PRIV(was_newline)(PCRE_PUCHAR ptr, int type, PCRE_PUCHAR startptr, int *lenptr,
BOOL utf)
{
pcre_uint32 c;
(void)utf;
ptr--;
#ifdef SUPPORT_UTF
if (utf)
{
BACKCHAR(ptr);
GETCHAR(c, ptr);
}
else
#endif /* SUPPORT_UTF */
c = *ptr;
/* Note that this function is called only for ANY or ANYCRLF. */
if (type == NLTYPE_ANYCRLF) switch(c)
{
case CHAR_LF:
*lenptr = (ptr > startptr && ptr[-1] == CHAR_CR)? 2 : 1;
return TRUE;
case CHAR_CR: *lenptr = 1; return TRUE;
default: return FALSE;
}
/* NLTYPE_ANY */
else switch(c)
{
case CHAR_LF:
*lenptr = (ptr > startptr && ptr[-1] == CHAR_CR)? 2 : 1;
return TRUE;
#ifdef EBCDIC
case CHAR_NEL:
#endif
case CHAR_VT:
case CHAR_FF:
case CHAR_CR: *lenptr = 1; return TRUE;
#ifndef EBCDIC
#ifdef COMPILE_PCRE8
case CHAR_NEL: *lenptr = utf? 2 : 1; return TRUE;
case 0x2028: /* LS */
case 0x2029: *lenptr = 3; return TRUE; /* PS */
#else /* COMPILE_PCRE16 || COMPILE_PCRE32 */
case CHAR_NEL:
case 0x2028: /* LS */
case 0x2029: *lenptr = 1; return TRUE; /* PS */
#endif /* COMPILE_PCRE8 */
#endif /* Not EBCDIC */
default: return FALSE;
}
}
/* End of pcre_newline.c */
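As an illustration of the forward check, here is a sketch of the NLTYPE_ANYCRLF branch only, restricted to plain 8-bit, non-UTF input. The function name sketch_is_newline_anycrlf is hypothetical, and the literals '\n' and '\r' stand in for CHAR_LF and CHAR_CR on the assumption of an ASCII-based character set.

```c
#include <assert.h>

/* Return 1 and set *lenptr if a newline starts at ptr, else return 0.
   Recognizes LF, CR, and the two-character sequence CRLF.
   As in the original, ptr must be less than endptr on entry. */
int sketch_is_newline_anycrlf(const char *ptr, const char *endptr, int *lenptr)
{
  assert(ptr < endptr);
  switch (*ptr)
  {
    case '\n':
      *lenptr = 1;
      return 1;
    case '\r':
      /* Only look at ptr[1] if it is still inside the subject. */
      *lenptr = (ptr < endptr - 1 && ptr[1] == '\n')? 2 : 1;
      return 1;
    default:
      return 0;
  }
}
```

For the subject "a\r\nb\r", the CR at offset 1 yields length 2 because an LF follows it, while the CR at offset 4 yields length 1 because it is the last character.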


@@ -0,0 +1,94 @@
/*************************************************
* Perl-Compatible Regular Expressions *
*************************************************/
/* PCRE is a library of functions to support regular expressions whose syntax
and semantics are as close as possible to those of the Perl 5 language.
Written by Philip Hazel
Copyright (c) 1997-2012 University of Cambridge
-----------------------------------------------------------------------------
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
* Neither the name of the University of Cambridge nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.
-----------------------------------------------------------------------------
*/
/* This file contains a private PCRE function that converts an ordinal
character value into a UTF8 string. */
#ifdef HAVE_CONFIG_H
#include "config.h"
#endif
#define COMPILE_PCRE8
#include "pcre_internal.h"
/*************************************************
* Convert character value to UTF-8 *
*************************************************/
/* This function takes an integer value in the range 0 - 0x10ffff
and encodes it as a UTF-8 character in 1 to 4 pcre_uchars.
Arguments:
cvalue the character value
buffer pointer to buffer for result - at least 6 pcre_uchars long
Returns: number of characters placed in the buffer
*/
unsigned
int
PRIV(ord2utf)(pcre_uint32 cvalue, pcre_uchar *buffer)
{
#ifdef SUPPORT_UTF
register int i, j;
for (i = 0; i < PRIV(utf8_table1_size); i++)
if ((int)cvalue <= PRIV(utf8_table1)[i]) break;
buffer += i;
for (j = i; j > 0; j--)
{
*buffer-- = 0x80 | (cvalue & 0x3f);
cvalue >>= 6;
}
*buffer = PRIV(utf8_table2)[i] | cvalue;
return i + 1;
#else
(void)(cvalue); /* Keep compiler happy; this function won't ever be */
(void)(buffer); /* called when SUPPORT_UTF is not defined. */
return 0;
#endif
}
/* End of pcre_ord2utf8.c */
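The encoding loop above depends on two lookup tables that live in another file. The sketch below reproduces the same backwards-filling technique with its own small tables so that it can be compiled alone. The names sketch_limits, sketch_lead and sketch_ord2utf8 are hypothetical, and the table contents are an assumption based on the standard UTF-8 layout for 1 to 4 byte sequences, not a copy of PCRE's tables.

```c
#include <assert.h>

/* Largest code point that fits in a sequence of 1, 2, 3 and 4 bytes. */
static const unsigned long sketch_limits[] = { 0x7f, 0x7ff, 0xffff, 0x1fffff };

/* Marker bits for the first byte of a 1, 2, 3 and 4 byte sequence. */
static const unsigned char sketch_lead[] = { 0x00, 0xc0, 0xe0, 0xf0 };

/* Encode cvalue (0 to 0x10ffff) as UTF-8 and return the number of bytes. */
unsigned int sketch_ord2utf8(unsigned long cvalue, unsigned char *buffer)
{
  int i, j;
  assert(cvalue <= 0x10ffff && buffer != 0);

  /* Find how many continuation bytes are needed. */
  for (i = 0; i < 4; i++)
    if (cvalue <= sketch_limits[i]) break;

  /* Fill in the continuation bytes from the last one backwards,
     six payload bits at a time. */
  buffer += i;
  for (j = i; j > 0; j--)
  {
    *buffer-- = (unsigned char)(0x80 | (cvalue & 0x3f));
    cvalue >>= 6;
  }

  /* What remains of cvalue goes into the lead byte. */
  *buffer = (unsigned char)(sketch_lead[i] | cvalue);
  return (unsigned int)(i + 1);
}
```

For example, U+20AC needs three bytes and is encoded as 0xE2 0x82 0xAC.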


@@ -0,0 +1,92 @@
/*************************************************
* Perl-Compatible Regular Expressions *
*************************************************/
/* PCRE is a library of functions to support regular expressions whose syntax
and semantics are as close as possible to those of the Perl 5 language.
Written by Philip Hazel
Copyright (c) 1997-2012 University of Cambridge
-----------------------------------------------------------------------------
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
* Neither the name of the University of Cambridge nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.
-----------------------------------------------------------------------------
*/
/* This module contains the external function pcre_refcount(), which is an
auxiliary function that can be used to maintain a reference count in a compiled
pattern data block. This might be helpful in applications where the block is
shared by different users. */
#ifdef HAVE_CONFIG_H
#include "config.h"
#endif
#include "pcre_internal.h"
/*************************************************
* Maintain reference count *
*************************************************/
/* The reference count is a 16-bit field, initialized to zero. It is not
possible to transfer a non-zero count from one host to a different host that
has a different byte order - though I can't see why anyone in their right mind
would ever want to do that!
Arguments:
argument_re points to compiled code
adjust value to add to the count
Returns: the (possibly updated) count value (a non-negative number), or
a negative error number
*/
#if defined COMPILE_PCRE8
PCRE_EXP_DEFN int PCRE_CALL_CONVENTION
pcre_refcount(pcre *argument_re, int adjust)
#elif defined COMPILE_PCRE16
PCRE_EXP_DEFN int PCRE_CALL_CONVENTION
pcre16_refcount(pcre16 *argument_re, int adjust)
#elif defined COMPILE_PCRE32
PCRE_EXP_DEFN int PCRE_CALL_CONVENTION
pcre32_refcount(pcre32 *argument_re, int adjust)
#endif
{
REAL_PCRE *re = (REAL_PCRE *)argument_re;
if (re == NULL) return PCRE_ERROR_NULL;
if (re->magic_number != MAGIC_NUMBER) return PCRE_ERROR_BADMAGIC;
if ((re->flags & PCRE_MODE) == 0) return PCRE_ERROR_BADMODE;
re->ref_count = (-adjust > re->ref_count)? 0 :
(adjust + re->ref_count > 65535)? 65535 :
re->ref_count + adjust;
return re->ref_count;
}
/* End of pcre_refcount.c */
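The count update in pcre_refcount() saturates at both ends of the 16-bit range rather than wrapping. A minimal sketch of just that arithmetic, with a hypothetical name (sketch_adjust_refcount) and without the compiled-pattern checks, might look like this. It assumes adjustments of modest size so that the intermediate int arithmetic cannot overflow.

```c
#include <assert.h>

/* Add `adjust` to a 16-bit reference count, clamping to 0..65535. */
unsigned short sketch_adjust_refcount(unsigned short count, int adjust)
{
  /* Assumption for this sketch: keep adjust small enough that
     -adjust and adjust + count stay well inside the range of int. */
  assert(adjust >= -65535 && adjust <= 65535);

  return (unsigned short)(
    (-adjust > count)? 0 :               /* would go below zero */
    (adjust + count > 65535)? 65535 :    /* would exceed 16 bits */
    count + adjust);
}
```

Decrementing a count of 0 therefore leaves it at 0, and adding 10 to 65530 gives 65535.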


@@ -0,0 +1,211 @@
/*************************************************
* Perl-Compatible Regular Expressions *
*************************************************/
/* PCRE is a library of functions to support regular expressions whose syntax
and semantics are as close as possible to those of the Perl 5 language.
Written by Philip Hazel
Copyright (c) 1997-2014 University of Cambridge
-----------------------------------------------------------------------------
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
* Neither the name of the University of Cambridge nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.
-----------------------------------------------------------------------------
*/
/* This module contains internal functions for comparing and finding the length
of strings for different data item sizes. */
#ifdef HAVE_CONFIG_H
#include "config.h"
#endif
#include "pcre_internal.h"
#ifndef COMPILE_PCRE8
/*************************************************
* Compare string utilities *
*************************************************/
/* The following functions compare two strings. Basically a strcmp
for non 8 bit characters.
Arguments:
str1 first string
str2 second string
Returns: 0 if both strings are equal (like strcmp); otherwise -1 if the
first differing character in str1 is smaller than in str2, 1 if larger
*/
int
PRIV(strcmp_uc_uc)(const pcre_uchar *str1, const pcre_uchar *str2)
{
pcre_uchar c1;
pcre_uchar c2;
while (*str1 != '\0' || *str2 != '\0')
{
c1 = *str1++;
c2 = *str2++;
if (c1 != c2)
return ((c1 > c2) << 1) - 1;
}
/* Both length and characters must be equal. */
return 0;
}
#ifdef COMPILE_PCRE32
int
PRIV(strcmp_uc_uc_utf)(const pcre_uchar *str1, const pcre_uchar *str2)
{
pcre_uchar c1;
pcre_uchar c2;
while (*str1 != '\0' || *str2 != '\0')
{
c1 = UCHAR21INC(str1);
c2 = UCHAR21INC(str2);
if (c1 != c2)
return ((c1 > c2) << 1) - 1;
}
/* Both length and characters must be equal. */
return 0;
}
#endif /* COMPILE_PCRE32 */
int
PRIV(strcmp_uc_c8)(const pcre_uchar *str1, const char *str2)
{
const pcre_uint8 *ustr2 = (pcre_uint8 *)str2;
pcre_uchar c1;
pcre_uchar c2;
while (*str1 != '\0' || *ustr2 != '\0')
{
c1 = *str1++;
c2 = (pcre_uchar)*ustr2++;
if (c1 != c2)
return ((c1 > c2) << 1) - 1;
}
/* Both length and characters must be equal. */
return 0;
}
#ifdef COMPILE_PCRE32
int
PRIV(strcmp_uc_c8_utf)(const pcre_uchar *str1, const char *str2)
{
const pcre_uint8 *ustr2 = (pcre_uint8 *)str2;
pcre_uchar c1;
pcre_uchar c2;
while (*str1 != '\0' || *ustr2 != '\0')
{
c1 = UCHAR21INC(str1);
c2 = (pcre_uchar)*ustr2++;
if (c1 != c2)
return ((c1 > c2) << 1) - 1;
}
/* Both length and characters must be equal. */
return 0;
}
#endif /* COMPILE_PCRE32 */
/* The following two functions compare two fixed-length
strings. Basically a strncmp for non 8 bit characters.
Arguments:
str1 first string
str2 second string
num number of characters to compare
Returns: 0 if both strings are equal (like strncmp); otherwise -1 if the
first differing character in str1 is smaller than in str2, 1 if larger
*/
int
PRIV(strncmp_uc_uc)(const pcre_uchar *str1, const pcre_uchar *str2, unsigned int num)
{
pcre_uchar c1;
pcre_uchar c2;
while (num-- > 0)
{
c1 = *str1++;
c2 = *str2++;
if (c1 != c2)
return ((c1 > c2) << 1) - 1;
}
/* Both length and characters must be equal. */
return 0;
}
int
PRIV(strncmp_uc_c8)(const pcre_uchar *str1, const char *str2, unsigned int num)
{
const pcre_uint8 *ustr2 = (pcre_uint8 *)str2;
pcre_uchar c1;
pcre_uchar c2;
while (num-- > 0)
{
c1 = *str1++;
c2 = (pcre_uchar)*ustr2++;
if (c1 != c2)
return ((c1 > c2) << 1) - 1;
}
/* Both length and characters must be equal. */
return 0;
}
/* The following function returns the length of
a zero-terminated string. Basically a strlen for non 8 bit characters.
Arguments:
str string
Returns: length of the string
*/
unsigned int
PRIV(strlen_uc)(const pcre_uchar *str)
{
unsigned int len = 0;
while (*str++ != 0)
len++;
return len;
}
#endif /* !COMPILE_PCRE8 */
/* End of pcre_string_utils.c */
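/* Editorial sketch, not part of PCRE: the comparison functions above return
-1, 0 or 1, because (c1 > c2) evaluates to 0 or 1 and ((c1 > c2) << 1) - 1
maps that to -1 or 1. The standalone version below assumes a 16-bit character
type in place of pcre_uchar. The name cmp_u16_c8 is hypothetical. */

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for strcmp_uc_c8 in a 16-bit build: compare a
   zero-terminated 16-bit string with a zero-terminated 8-bit string. */
static int cmp_u16_c8(const uint16_t *str1, const char *str2)
{
  const unsigned char *ustr2 = (const unsigned char *)str2;
  while (*str1 != 0 || *ustr2 != 0)
    {
    uint16_t c1 = *str1++;
    uint16_t c2 = (uint16_t)*ustr2++;
    /* A shorter string differs at its terminator, so we return before
       reading past the end of either input. */
    if (c1 != c2)
      return ((c1 > c2) << 1) - 1;
    }
  /* Both length and characters are equal. */
  return 0;
}
```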

File diff suppressed because it is too large

@@ -0,0 +1,727 @@
/*************************************************
* Perl-Compatible Regular Expressions *
*************************************************/
/* PCRE is a library of functions to support regular expressions whose syntax
and semantics are as close as possible to those of the Perl 5 language.
Written by Philip Hazel
Copyright (c) 1997-2012 University of Cambridge
-----------------------------------------------------------------------------
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
* Neither the name of the University of Cambridge nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.
-----------------------------------------------------------------------------
*/
#ifndef PCRE_INCLUDED
/* This module contains some fixed tables that are used by more than one of the
PCRE code modules. The tables are also #included by the pcretest program, which
uses macros to change their names from _pcre_xxx to xxxx, thereby avoiding name
clashes with the library. */
#ifdef HAVE_CONFIG_H
#include "config.h"
#endif
#include "pcre_internal.h"
#endif /* PCRE_INCLUDED */
/* Table of sizes for the fixed-length opcodes. It's defined in a macro so that
the definition is next to the definition of the opcodes in pcre_internal.h. */
const pcre_uint8 PRIV(OP_lengths)[] = { OP_LENGTHS };
/* Tables of horizontal and vertical whitespace characters, suitable for
adding to classes. */
const pcre_uint32 PRIV(hspace_list)[] = { HSPACE_LIST };
const pcre_uint32 PRIV(vspace_list)[] = { VSPACE_LIST };
/*************************************************
* Tables for UTF-8 support *
*************************************************/
/* These are the breakpoints for different numbers of bytes in a UTF-8
character. */
#if (defined SUPPORT_UTF && defined COMPILE_PCRE8) \
|| (defined PCRE_INCLUDED && (defined SUPPORT_PCRE16 || defined SUPPORT_PCRE32))
/* These tables are also required by pcretest in 16- or 32-bit mode. */
const int PRIV(utf8_table1)[] =
{ 0x7f, 0x7ff, 0xffff, 0x1fffff, 0x3ffffff, 0x7fffffff};
const int PRIV(utf8_table1_size) = sizeof(PRIV(utf8_table1)) / sizeof(int);
/* These are the indicator bits and the mask for the data bits to set in the
first byte of a character, indexed by the number of additional bytes. */
const int PRIV(utf8_table2)[] = { 0, 0xc0, 0xe0, 0xf0, 0xf8, 0xfc};
const int PRIV(utf8_table3)[] = { 0xff, 0x1f, 0x0f, 0x07, 0x03, 0x01};
/* Table of the number of extra bytes, indexed by the first byte masked with
0x3f. The highest number for a valid UTF-8 first byte is in fact 0x3d. */
const pcre_uint8 PRIV(utf8_table4)[] = {
1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,
3,3,3,3,3,3,3,3,4,4,4,4,5,5,5,5 };
#endif /* (SUPPORT_UTF && COMPILE_PCRE8) || (PCRE_INCLUDED && SUPPORT_PCRE[16|32])*/
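/* Editorial sketch, not part of this file: one way the breakpoint table and
the first-byte indicator table can be combined to encode a code point as
UTF-8. The tables are local copies of the values above. The function name
encode_utf8 and its signature are hypothetical. Like the tables, it follows
the older 6-byte scheme and does not reject values above 0x10ffff. */

```c
#include <assert.h>
#include <stdint.h>

/* Local copies of the breakpoint and first-byte indicator tables. */
static const int utf8_table1[] =
  { 0x7f, 0x7ff, 0xffff, 0x1fffff, 0x3ffffff, 0x7fffffff };
static const int utf8_table2[] = { 0, 0xc0, 0xe0, 0xf0, 0xf8, 0xfc };

/* Hypothetical helper: encode cvalue into buffer (at least 6 bytes).
   Returns the number of bytes written, or 0 if cvalue is out of range. */
static int encode_utf8(uint32_t cvalue, uint8_t *buffer)
{
  int i, j;
  /* Find i, the number of additional bytes needed. */
  for (i = 0; i < 6; i++)
    if (cvalue <= (uint32_t)utf8_table1[i]) break;
  if (i == 6) return 0;
  /* Fill continuation bytes from the end, 6 data bits each. */
  for (j = i; j > 0; j--)
    {
    buffer[j] = (uint8_t)(0x80 | (cvalue & 0x3f));
    cvalue >>= 6;
    }
  /* The remaining high bits go into the first byte with its indicator. */
  buffer[0] = (uint8_t)(utf8_table2[i] | cvalue);
  return i + 1;
}
```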
#ifdef SUPPORT_UTF
/* Table to translate from particular type value to the general value. */
const pcre_uint32 PRIV(ucp_gentype)[] = {
ucp_C, ucp_C, ucp_C, ucp_C, ucp_C, /* Cc, Cf, Cn, Co, Cs */
ucp_L, ucp_L, ucp_L, ucp_L, ucp_L, /* Ll, Lu, Lm, Lo, Lt */
ucp_M, ucp_M, ucp_M, /* Mc, Me, Mn */
ucp_N, ucp_N, ucp_N, /* Nd, Nl, No */
ucp_P, ucp_P, ucp_P, ucp_P, ucp_P, /* Pc, Pd, Pe, Pf, Pi */
ucp_P, ucp_P, /* Ps, Po */
ucp_S, ucp_S, ucp_S, ucp_S, /* Sc, Sk, Sm, So */
ucp_Z, ucp_Z, ucp_Z /* Zl, Zp, Zs */
};
/* This table encodes the rules for finding the end of an extended grapheme
cluster. Every code point has a grapheme break property which is one of the
ucp_gbXX values defined in ucp.h. The 2-dimensional table is indexed by the
properties of two adjacent code points. The left property selects a word from
the table, and the right property selects a bit from that word like this:
ucp_gbtable[left-property] & (1 << right-property)
The value is non-zero if a grapheme break is NOT permitted between the relevant
two code points. The breaking rules are as follows:
1. Break at the start and end of text (pretty obviously).
2. Do not break between a CR and LF; otherwise, break before and after
controls.
3. Do not break Hangul syllable sequences, the rules for which are:
L may be followed by L, V, LV or LVT
LV or V may be followed by V or T
LVT or T may be followed by T
4. Do not break before extending characters.
The next two rules are only for extended grapheme clusters (but that's what we
are implementing).
5. Do not break before SpacingMarks.
6. Do not break after Prepend characters.
7. Otherwise, break everywhere.
*/
const pcre_uint32 PRIV(ucp_gbtable[]) = {
(1<<ucp_gbLF), /* 0 CR */
0, /* 1 LF */
0, /* 2 Control */
(1<<ucp_gbExtend)|(1<<ucp_gbSpacingMark), /* 3 Extend */
(1<<ucp_gbExtend)|(1<<ucp_gbPrepend)| /* 4 Prepend */
(1<<ucp_gbSpacingMark)|(1<<ucp_gbL)|
(1<<ucp_gbV)|(1<<ucp_gbT)|(1<<ucp_gbLV)|
(1<<ucp_gbLVT)|(1<<ucp_gbOther),
(1<<ucp_gbExtend)|(1<<ucp_gbSpacingMark), /* 5 SpacingMark */
(1<<ucp_gbExtend)|(1<<ucp_gbSpacingMark)|(1<<ucp_gbL)| /* 6 L */
(1<<ucp_gbV)|(1<<ucp_gbLV)|(1<<ucp_gbLVT),
(1<<ucp_gbExtend)|(1<<ucp_gbSpacingMark)|(1<<ucp_gbV)| /* 7 V */
(1<<ucp_gbT),
(1<<ucp_gbExtend)|(1<<ucp_gbSpacingMark)|(1<<ucp_gbT), /* 8 T */
(1<<ucp_gbExtend)|(1<<ucp_gbSpacingMark)|(1<<ucp_gbV)| /* 9 LV */
(1<<ucp_gbT),
(1<<ucp_gbExtend)|(1<<ucp_gbSpacingMark)|(1<<ucp_gbT), /* 10 LVT */
(1<<ucp_gbRegionalIndicator), /* 11 RegionalIndicator */
(1<<ucp_gbExtend)|(1<<ucp_gbSpacingMark) /* 12 Other */
};
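/* Editorial sketch, not part of this file: how a lookup of the form
ucp_gbtable[left-property] & (1 << right-property) can be wrapped in a
predicate. The enum below is a hypothetical stand-in for the ucp_gbXX values
in ucp.h, assuming they follow the index order given in the table comments.
The names GB_*, gbtable and no_break_between are illustrative only. */

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical property values, in the order of the table comments. */
enum { GB_CR, GB_LF, GB_Control, GB_Extend, GB_Prepend, GB_SpacingMark,
       GB_L, GB_V, GB_T, GB_LV, GB_LVT, GB_RegionalIndicator, GB_Other };

#define BIT(x) (1u << (x))

/* One word per left property; a set bit means "no break before right". */
static const uint32_t gbtable[] = {
  BIT(GB_LF),                                                /* CR */
  0,                                                         /* LF */
  0,                                                         /* Control */
  BIT(GB_Extend)|BIT(GB_SpacingMark),                        /* Extend */
  BIT(GB_Extend)|BIT(GB_Prepend)|BIT(GB_SpacingMark)|BIT(GB_L)|
    BIT(GB_V)|BIT(GB_T)|BIT(GB_LV)|BIT(GB_LVT)|BIT(GB_Other), /* Prepend */
  BIT(GB_Extend)|BIT(GB_SpacingMark),                        /* SpacingMark */
  BIT(GB_Extend)|BIT(GB_SpacingMark)|BIT(GB_L)|BIT(GB_V)|
    BIT(GB_LV)|BIT(GB_LVT),                                  /* L */
  BIT(GB_Extend)|BIT(GB_SpacingMark)|BIT(GB_V)|BIT(GB_T),    /* V */
  BIT(GB_Extend)|BIT(GB_SpacingMark)|BIT(GB_T),              /* T */
  BIT(GB_Extend)|BIT(GB_SpacingMark)|BIT(GB_V)|BIT(GB_T),    /* LV */
  BIT(GB_Extend)|BIT(GB_SpacingMark)|BIT(GB_T),              /* LVT */
  BIT(GB_RegionalIndicator),                                 /* RegionalIndicator */
  BIT(GB_Extend)|BIT(GB_SpacingMark)                         /* Other */
};

/* Returns 1 if a grapheme break is NOT permitted between the two
   adjacent code points with the given properties, 0 otherwise. */
static int no_break_between(int left, int right)
{
  return (gbtable[left] & BIT(right)) != 0;
}
```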
#ifdef SUPPORT_JIT
/* This table reverses PRIV(ucp_gentype): for each general category it gives
the first and last particular type values. Testing against the range saves the
cost of a memory load. */
const int PRIV(ucp_typerange)[] = {
ucp_Cc, ucp_Cs,
ucp_Ll, ucp_Lu,
ucp_Mc, ucp_Mn,
ucp_Nd, ucp_No,
ucp_Pc, ucp_Ps,
ucp_Sc, ucp_So,
ucp_Zl, ucp_Zs,
};
#endif /* SUPPORT_JIT */
/* The pcre_utt[] table below translates Unicode property names into type and
code values. It is searched by binary chop, so must be in collating sequence of
name. Originally, the table contained pointers to the name strings in the first
field of each entry. However, that leads to a large number of relocations when
a shared library is dynamically loaded. A significant reduction is made by
putting all the names into a single, large string and then using offsets in the
table itself. Maintenance is more error-prone, but frequent changes to this
data are unlikely.
July 2008: There is now a script called maint/GenerateUtt.py that can be used
to generate this data automatically instead of maintaining it by hand.
The script was updated in March 2009 to generate a new EBCDIC-compliant
version. Like all other character and string literals that are compared against
the regular expression pattern, we must use STR_ macros instead of literal
strings to make sure that UTF-8 support works on EBCDIC platforms. */
#define STRING_Any0 STR_A STR_n STR_y "\0"
#define STRING_Arabic0 STR_A STR_r STR_a STR_b STR_i STR_c "\0"
#define STRING_Armenian0 STR_A STR_r STR_m STR_e STR_n STR_i STR_a STR_n "\0"
#define STRING_Avestan0 STR_A STR_v STR_e STR_s STR_t STR_a STR_n "\0"
#define STRING_Balinese0 STR_B STR_a STR_l STR_i STR_n STR_e STR_s STR_e "\0"
#define STRING_Bamum0 STR_B STR_a STR_m STR_u STR_m "\0"
#define STRING_Bassa_Vah0 STR_B STR_a STR_s STR_s STR_a STR_UNDERSCORE STR_V STR_a STR_h "\0"
#define STRING_Batak0 STR_B STR_a STR_t STR_a STR_k "\0"
#define STRING_Bengali0 STR_B STR_e STR_n STR_g STR_a STR_l STR_i "\0"
#define STRING_Bopomofo0 STR_B STR_o STR_p STR_o STR_m STR_o STR_f STR_o "\0"
#define STRING_Brahmi0 STR_B STR_r STR_a STR_h STR_m STR_i "\0"
#define STRING_Braille0 STR_B STR_r STR_a STR_i STR_l STR_l STR_e "\0"
#define STRING_Buginese0 STR_B STR_u STR_g STR_i STR_n STR_e STR_s STR_e "\0"
#define STRING_Buhid0 STR_B STR_u STR_h STR_i STR_d "\0"
#define STRING_C0 STR_C "\0"
#define STRING_Canadian_Aboriginal0 STR_C STR_a STR_n STR_a STR_d STR_i STR_a STR_n STR_UNDERSCORE STR_A STR_b STR_o STR_r STR_i STR_g STR_i STR_n STR_a STR_l "\0"
#define STRING_Carian0 STR_C STR_a STR_r STR_i STR_a STR_n "\0"
#define STRING_Caucasian_Albanian0 STR_C STR_a STR_u STR_c STR_a STR_s STR_i STR_a STR_n STR_UNDERSCORE STR_A STR_l STR_b STR_a STR_n STR_i STR_a STR_n "\0"
#define STRING_Cc0 STR_C STR_c "\0"
#define STRING_Cf0 STR_C STR_f "\0"
#define STRING_Chakma0 STR_C STR_h STR_a STR_k STR_m STR_a "\0"
#define STRING_Cham0 STR_C STR_h STR_a STR_m "\0"
#define STRING_Cherokee0 STR_C STR_h STR_e STR_r STR_o STR_k STR_e STR_e "\0"
#define STRING_Cn0 STR_C STR_n "\0"
#define STRING_Co0 STR_C STR_o "\0"
#define STRING_Common0 STR_C STR_o STR_m STR_m STR_o STR_n "\0"
#define STRING_Coptic0 STR_C STR_o STR_p STR_t STR_i STR_c "\0"
#define STRING_Cs0 STR_C STR_s "\0"
#define STRING_Cuneiform0 STR_C STR_u STR_n STR_e STR_i STR_f STR_o STR_r STR_m "\0"
#define STRING_Cypriot0 STR_C STR_y STR_p STR_r STR_i STR_o STR_t "\0"
#define STRING_Cyrillic0 STR_C STR_y STR_r STR_i STR_l STR_l STR_i STR_c "\0"
#define STRING_Deseret0 STR_D STR_e STR_s STR_e STR_r STR_e STR_t "\0"
#define STRING_Devanagari0 STR_D STR_e STR_v STR_a STR_n STR_a STR_g STR_a STR_r STR_i "\0"
#define STRING_Duployan0 STR_D STR_u STR_p STR_l STR_o STR_y STR_a STR_n "\0"
#define STRING_Egyptian_Hieroglyphs0 STR_E STR_g STR_y STR_p STR_t STR_i STR_a STR_n STR_UNDERSCORE STR_H STR_i STR_e STR_r STR_o STR_g STR_l STR_y STR_p STR_h STR_s "\0"
#define STRING_Elbasan0 STR_E STR_l STR_b STR_a STR_s STR_a STR_n "\0"
#define STRING_Ethiopic0 STR_E STR_t STR_h STR_i STR_o STR_p STR_i STR_c "\0"
#define STRING_Georgian0 STR_G STR_e STR_o STR_r STR_g STR_i STR_a STR_n "\0"
#define STRING_Glagolitic0 STR_G STR_l STR_a STR_g STR_o STR_l STR_i STR_t STR_i STR_c "\0"
#define STRING_Gothic0 STR_G STR_o STR_t STR_h STR_i STR_c "\0"
#define STRING_Grantha0 STR_G STR_r STR_a STR_n STR_t STR_h STR_a "\0"
#define STRING_Greek0 STR_G STR_r STR_e STR_e STR_k "\0"
#define STRING_Gujarati0 STR_G STR_u STR_j STR_a STR_r STR_a STR_t STR_i "\0"
#define STRING_Gurmukhi0 STR_G STR_u STR_r STR_m STR_u STR_k STR_h STR_i "\0"
#define STRING_Han0 STR_H STR_a STR_n "\0"
#define STRING_Hangul0 STR_H STR_a STR_n STR_g STR_u STR_l "\0"
#define STRING_Hanunoo0 STR_H STR_a STR_n STR_u STR_n STR_o STR_o "\0"
#define STRING_Hebrew0 STR_H STR_e STR_b STR_r STR_e STR_w "\0"
#define STRING_Hiragana0 STR_H STR_i STR_r STR_a STR_g STR_a STR_n STR_a "\0"
#define STRING_Imperial_Aramaic0 STR_I STR_m STR_p STR_e STR_r STR_i STR_a STR_l STR_UNDERSCORE STR_A STR_r STR_a STR_m STR_a STR_i STR_c "\0"
#define STRING_Inherited0 STR_I STR_n STR_h STR_e STR_r STR_i STR_t STR_e STR_d "\0"
#define STRING_Inscriptional_Pahlavi0 STR_I STR_n STR_s STR_c STR_r STR_i STR_p STR_t STR_i STR_o STR_n STR_a STR_l STR_UNDERSCORE STR_P STR_a STR_h STR_l STR_a STR_v STR_i "\0"
#define STRING_Inscriptional_Parthian0 STR_I STR_n STR_s STR_c STR_r STR_i STR_p STR_t STR_i STR_o STR_n STR_a STR_l STR_UNDERSCORE STR_P STR_a STR_r STR_t STR_h STR_i STR_a STR_n "\0"
#define STRING_Javanese0 STR_J STR_a STR_v STR_a STR_n STR_e STR_s STR_e "\0"
#define STRING_Kaithi0 STR_K STR_a STR_i STR_t STR_h STR_i "\0"
#define STRING_Kannada0 STR_K STR_a STR_n STR_n STR_a STR_d STR_a "\0"
#define STRING_Katakana0 STR_K STR_a STR_t STR_a STR_k STR_a STR_n STR_a "\0"
#define STRING_Kayah_Li0 STR_K STR_a STR_y STR_a STR_h STR_UNDERSCORE STR_L STR_i "\0"
#define STRING_Kharoshthi0 STR_K STR_h STR_a STR_r STR_o STR_s STR_h STR_t STR_h STR_i "\0"
#define STRING_Khmer0 STR_K STR_h STR_m STR_e STR_r "\0"
#define STRING_Khojki0 STR_K STR_h STR_o STR_j STR_k STR_i "\0"
#define STRING_Khudawadi0 STR_K STR_h STR_u STR_d STR_a STR_w STR_a STR_d STR_i "\0"
#define STRING_L0 STR_L "\0"
#define STRING_L_AMPERSAND0 STR_L STR_AMPERSAND "\0"
#define STRING_Lao0 STR_L STR_a STR_o "\0"
#define STRING_Latin0 STR_L STR_a STR_t STR_i STR_n "\0"
#define STRING_Lepcha0 STR_L STR_e STR_p STR_c STR_h STR_a "\0"
#define STRING_Limbu0 STR_L STR_i STR_m STR_b STR_u "\0"
#define STRING_Linear_A0 STR_L STR_i STR_n STR_e STR_a STR_r STR_UNDERSCORE STR_A "\0"
#define STRING_Linear_B0 STR_L STR_i STR_n STR_e STR_a STR_r STR_UNDERSCORE STR_B "\0"
#define STRING_Lisu0 STR_L STR_i STR_s STR_u "\0"
#define STRING_Ll0 STR_L STR_l "\0"
#define STRING_Lm0 STR_L STR_m "\0"
#define STRING_Lo0 STR_L STR_o "\0"
#define STRING_Lt0 STR_L STR_t "\0"
#define STRING_Lu0 STR_L STR_u "\0"
#define STRING_Lycian0 STR_L STR_y STR_c STR_i STR_a STR_n "\0"
#define STRING_Lydian0 STR_L STR_y STR_d STR_i STR_a STR_n "\0"
#define STRING_M0 STR_M "\0"
#define STRING_Mahajani0 STR_M STR_a STR_h STR_a STR_j STR_a STR_n STR_i "\0"
#define STRING_Malayalam0 STR_M STR_a STR_l STR_a STR_y STR_a STR_l STR_a STR_m "\0"
#define STRING_Mandaic0 STR_M STR_a STR_n STR_d STR_a STR_i STR_c "\0"
#define STRING_Manichaean0 STR_M STR_a STR_n STR_i STR_c STR_h STR_a STR_e STR_a STR_n "\0"
#define STRING_Mc0 STR_M STR_c "\0"
#define STRING_Me0 STR_M STR_e "\0"
#define STRING_Meetei_Mayek0 STR_M STR_e STR_e STR_t STR_e STR_i STR_UNDERSCORE STR_M STR_a STR_y STR_e STR_k "\0"
#define STRING_Mende_Kikakui0 STR_M STR_e STR_n STR_d STR_e STR_UNDERSCORE STR_K STR_i STR_k STR_a STR_k STR_u STR_i "\0"
#define STRING_Meroitic_Cursive0 STR_M STR_e STR_r STR_o STR_i STR_t STR_i STR_c STR_UNDERSCORE STR_C STR_u STR_r STR_s STR_i STR_v STR_e "\0"
#define STRING_Meroitic_Hieroglyphs0 STR_M STR_e STR_r STR_o STR_i STR_t STR_i STR_c STR_UNDERSCORE STR_H STR_i STR_e STR_r STR_o STR_g STR_l STR_y STR_p STR_h STR_s "\0"
#define STRING_Miao0 STR_M STR_i STR_a STR_o "\0"
#define STRING_Mn0 STR_M STR_n "\0"
#define STRING_Modi0 STR_M STR_o STR_d STR_i "\0"
#define STRING_Mongolian0 STR_M STR_o STR_n STR_g STR_o STR_l STR_i STR_a STR_n "\0"
#define STRING_Mro0 STR_M STR_r STR_o "\0"
#define STRING_Myanmar0 STR_M STR_y STR_a STR_n STR_m STR_a STR_r "\0"
#define STRING_N0 STR_N "\0"
#define STRING_Nabataean0 STR_N STR_a STR_b STR_a STR_t STR_a STR_e STR_a STR_n "\0"
#define STRING_Nd0 STR_N STR_d "\0"
#define STRING_New_Tai_Lue0 STR_N STR_e STR_w STR_UNDERSCORE STR_T STR_a STR_i STR_UNDERSCORE STR_L STR_u STR_e "\0"
#define STRING_Nko0 STR_N STR_k STR_o "\0"
#define STRING_Nl0 STR_N STR_l "\0"
#define STRING_No0 STR_N STR_o "\0"
#define STRING_Ogham0 STR_O STR_g STR_h STR_a STR_m "\0"
#define STRING_Ol_Chiki0 STR_O STR_l STR_UNDERSCORE STR_C STR_h STR_i STR_k STR_i "\0"
#define STRING_Old_Italic0 STR_O STR_l STR_d STR_UNDERSCORE STR_I STR_t STR_a STR_l STR_i STR_c "\0"
#define STRING_Old_North_Arabian0 STR_O STR_l STR_d STR_UNDERSCORE STR_N STR_o STR_r STR_t STR_h STR_UNDERSCORE STR_A STR_r STR_a STR_b STR_i STR_a STR_n "\0"
#define STRING_Old_Permic0 STR_O STR_l STR_d STR_UNDERSCORE STR_P STR_e STR_r STR_m STR_i STR_c "\0"
#define STRING_Old_Persian0 STR_O STR_l STR_d STR_UNDERSCORE STR_P STR_e STR_r STR_s STR_i STR_a STR_n "\0"
#define STRING_Old_South_Arabian0 STR_O STR_l STR_d STR_UNDERSCORE STR_S STR_o STR_u STR_t STR_h STR_UNDERSCORE STR_A STR_r STR_a STR_b STR_i STR_a STR_n "\0"
#define STRING_Old_Turkic0 STR_O STR_l STR_d STR_UNDERSCORE STR_T STR_u STR_r STR_k STR_i STR_c "\0"
#define STRING_Oriya0 STR_O STR_r STR_i STR_y STR_a "\0"
#define STRING_Osmanya0 STR_O STR_s STR_m STR_a STR_n STR_y STR_a "\0"
#define STRING_P0 STR_P "\0"
#define STRING_Pahawh_Hmong0 STR_P STR_a STR_h STR_a STR_w STR_h STR_UNDERSCORE STR_H STR_m STR_o STR_n STR_g "\0"
#define STRING_Palmyrene0 STR_P STR_a STR_l STR_m STR_y STR_r STR_e STR_n STR_e "\0"
#define STRING_Pau_Cin_Hau0 STR_P STR_a STR_u STR_UNDERSCORE STR_C STR_i STR_n STR_UNDERSCORE STR_H STR_a STR_u "\0"
#define STRING_Pc0 STR_P STR_c "\0"
#define STRING_Pd0 STR_P STR_d "\0"
#define STRING_Pe0 STR_P STR_e "\0"
#define STRING_Pf0 STR_P STR_f "\0"
#define STRING_Phags_Pa0 STR_P STR_h STR_a STR_g STR_s STR_UNDERSCORE STR_P STR_a "\0"
#define STRING_Phoenician0 STR_P STR_h STR_o STR_e STR_n STR_i STR_c STR_i STR_a STR_n "\0"
#define STRING_Pi0 STR_P STR_i "\0"
#define STRING_Po0 STR_P STR_o "\0"
#define STRING_Ps0 STR_P STR_s "\0"
#define STRING_Psalter_Pahlavi0 STR_P STR_s STR_a STR_l STR_t STR_e STR_r STR_UNDERSCORE STR_P STR_a STR_h STR_l STR_a STR_v STR_i "\0"
#define STRING_Rejang0 STR_R STR_e STR_j STR_a STR_n STR_g "\0"
#define STRING_Runic0 STR_R STR_u STR_n STR_i STR_c "\0"
#define STRING_S0 STR_S "\0"
#define STRING_Samaritan0 STR_S STR_a STR_m STR_a STR_r STR_i STR_t STR_a STR_n "\0"
#define STRING_Saurashtra0 STR_S STR_a STR_u STR_r STR_a STR_s STR_h STR_t STR_r STR_a "\0"
#define STRING_Sc0 STR_S STR_c "\0"
#define STRING_Sharada0 STR_S STR_h STR_a STR_r STR_a STR_d STR_a "\0"
#define STRING_Shavian0 STR_S STR_h STR_a STR_v STR_i STR_a STR_n "\0"
#define STRING_Siddham0 STR_S STR_i STR_d STR_d STR_h STR_a STR_m "\0"
#define STRING_Sinhala0 STR_S STR_i STR_n STR_h STR_a STR_l STR_a "\0"
#define STRING_Sk0 STR_S STR_k "\0"
#define STRING_Sm0 STR_S STR_m "\0"
#define STRING_So0 STR_S STR_o "\0"
#define STRING_Sora_Sompeng0 STR_S STR_o STR_r STR_a STR_UNDERSCORE STR_S STR_o STR_m STR_p STR_e STR_n STR_g "\0"
#define STRING_Sundanese0 STR_S STR_u STR_n STR_d STR_a STR_n STR_e STR_s STR_e "\0"
#define STRING_Syloti_Nagri0 STR_S STR_y STR_l STR_o STR_t STR_i STR_UNDERSCORE STR_N STR_a STR_g STR_r STR_i "\0"
#define STRING_Syriac0 STR_S STR_y STR_r STR_i STR_a STR_c "\0"
#define STRING_Tagalog0 STR_T STR_a STR_g STR_a STR_l STR_o STR_g "\0"
#define STRING_Tagbanwa0 STR_T STR_a STR_g STR_b STR_a STR_n STR_w STR_a "\0"
#define STRING_Tai_Le0 STR_T STR_a STR_i STR_UNDERSCORE STR_L STR_e "\0"
#define STRING_Tai_Tham0 STR_T STR_a STR_i STR_UNDERSCORE STR_T STR_h STR_a STR_m "\0"
#define STRING_Tai_Viet0 STR_T STR_a STR_i STR_UNDERSCORE STR_V STR_i STR_e STR_t "\0"
#define STRING_Takri0 STR_T STR_a STR_k STR_r STR_i "\0"
#define STRING_Tamil0 STR_T STR_a STR_m STR_i STR_l "\0"
#define STRING_Telugu0 STR_T STR_e STR_l STR_u STR_g STR_u "\0"
#define STRING_Thaana0 STR_T STR_h STR_a STR_a STR_n STR_a "\0"
#define STRING_Thai0 STR_T STR_h STR_a STR_i "\0"
#define STRING_Tibetan0 STR_T STR_i STR_b STR_e STR_t STR_a STR_n "\0"
#define STRING_Tifinagh0 STR_T STR_i STR_f STR_i STR_n STR_a STR_g STR_h "\0"
#define STRING_Tirhuta0 STR_T STR_i STR_r STR_h STR_u STR_t STR_a "\0"
#define STRING_Ugaritic0 STR_U STR_g STR_a STR_r STR_i STR_t STR_i STR_c "\0"
#define STRING_Vai0 STR_V STR_a STR_i "\0"
#define STRING_Warang_Citi0 STR_W STR_a STR_r STR_a STR_n STR_g STR_UNDERSCORE STR_C STR_i STR_t STR_i "\0"
#define STRING_Xan0 STR_X STR_a STR_n "\0"
#define STRING_Xps0 STR_X STR_p STR_s "\0"
#define STRING_Xsp0 STR_X STR_s STR_p "\0"
#define STRING_Xuc0 STR_X STR_u STR_c "\0"
#define STRING_Xwd0 STR_X STR_w STR_d "\0"
#define STRING_Yi0 STR_Y STR_i "\0"
#define STRING_Z0 STR_Z "\0"
#define STRING_Zl0 STR_Z STR_l "\0"
#define STRING_Zp0 STR_Z STR_p "\0"
#define STRING_Zs0 STR_Z STR_s "\0"
const char PRIV(utt_names)[] =
STRING_Any0
STRING_Arabic0
STRING_Armenian0
STRING_Avestan0
STRING_Balinese0
STRING_Bamum0
STRING_Bassa_Vah0
STRING_Batak0
STRING_Bengali0
STRING_Bopomofo0
STRING_Brahmi0
STRING_Braille0
STRING_Buginese0
STRING_Buhid0
STRING_C0
STRING_Canadian_Aboriginal0
STRING_Carian0
STRING_Caucasian_Albanian0
STRING_Cc0
STRING_Cf0
STRING_Chakma0
STRING_Cham0
STRING_Cherokee0
STRING_Cn0
STRING_Co0
STRING_Common0
STRING_Coptic0
STRING_Cs0
STRING_Cuneiform0
STRING_Cypriot0
STRING_Cyrillic0
STRING_Deseret0
STRING_Devanagari0
STRING_Duployan0
STRING_Egyptian_Hieroglyphs0
STRING_Elbasan0
STRING_Ethiopic0
STRING_Georgian0
STRING_Glagolitic0
STRING_Gothic0
STRING_Grantha0
STRING_Greek0
STRING_Gujarati0
STRING_Gurmukhi0
STRING_Han0
STRING_Hangul0
STRING_Hanunoo0
STRING_Hebrew0
STRING_Hiragana0
STRING_Imperial_Aramaic0
STRING_Inherited0
STRING_Inscriptional_Pahlavi0
STRING_Inscriptional_Parthian0
STRING_Javanese0
STRING_Kaithi0
STRING_Kannada0
STRING_Katakana0
STRING_Kayah_Li0
STRING_Kharoshthi0
STRING_Khmer0
STRING_Khojki0
STRING_Khudawadi0
STRING_L0
STRING_L_AMPERSAND0
STRING_Lao0
STRING_Latin0
STRING_Lepcha0
STRING_Limbu0
STRING_Linear_A0
STRING_Linear_B0
STRING_Lisu0
STRING_Ll0
STRING_Lm0
STRING_Lo0
STRING_Lt0
STRING_Lu0
STRING_Lycian0
STRING_Lydian0
STRING_M0
STRING_Mahajani0
STRING_Malayalam0
STRING_Mandaic0
STRING_Manichaean0
STRING_Mc0
STRING_Me0
STRING_Meetei_Mayek0
STRING_Mende_Kikakui0
STRING_Meroitic_Cursive0
STRING_Meroitic_Hieroglyphs0
STRING_Miao0
STRING_Mn0
STRING_Modi0
STRING_Mongolian0
STRING_Mro0
STRING_Myanmar0
STRING_N0
STRING_Nabataean0
STRING_Nd0
STRING_New_Tai_Lue0
STRING_Nko0
STRING_Nl0
STRING_No0
STRING_Ogham0
STRING_Ol_Chiki0
STRING_Old_Italic0
STRING_Old_North_Arabian0
STRING_Old_Permic0
STRING_Old_Persian0
STRING_Old_South_Arabian0
STRING_Old_Turkic0
STRING_Oriya0
STRING_Osmanya0
STRING_P0
STRING_Pahawh_Hmong0
STRING_Palmyrene0
STRING_Pau_Cin_Hau0
STRING_Pc0
STRING_Pd0
STRING_Pe0
STRING_Pf0
STRING_Phags_Pa0
STRING_Phoenician0
STRING_Pi0
STRING_Po0
STRING_Ps0
STRING_Psalter_Pahlavi0
STRING_Rejang0
STRING_Runic0
STRING_S0
STRING_Samaritan0
STRING_Saurashtra0
STRING_Sc0
STRING_Sharada0
STRING_Shavian0
STRING_Siddham0
STRING_Sinhala0
STRING_Sk0
STRING_Sm0
STRING_So0
STRING_Sora_Sompeng0
STRING_Sundanese0
STRING_Syloti_Nagri0
STRING_Syriac0
STRING_Tagalog0
STRING_Tagbanwa0
STRING_Tai_Le0
STRING_Tai_Tham0
STRING_Tai_Viet0
STRING_Takri0
STRING_Tamil0
STRING_Telugu0
STRING_Thaana0
STRING_Thai0
STRING_Tibetan0
STRING_Tifinagh0
STRING_Tirhuta0
STRING_Ugaritic0
STRING_Vai0
STRING_Warang_Citi0
STRING_Xan0
STRING_Xps0
STRING_Xsp0
STRING_Xuc0
STRING_Xwd0
STRING_Yi0
STRING_Z0
STRING_Zl0
STRING_Zp0
STRING_Zs0;
const ucp_type_table PRIV(utt)[] = {
{ 0, PT_ANY, 0 },
{ 4, PT_SC, ucp_Arabic },
{ 11, PT_SC, ucp_Armenian },
{ 20, PT_SC, ucp_Avestan },
{ 28, PT_SC, ucp_Balinese },
{ 37, PT_SC, ucp_Bamum },
{ 43, PT_SC, ucp_Bassa_Vah },
{ 53, PT_SC, ucp_Batak },
{ 59, PT_SC, ucp_Bengali },
{ 67, PT_SC, ucp_Bopomofo },
{ 76, PT_SC, ucp_Brahmi },
{ 83, PT_SC, ucp_Braille },
{ 91, PT_SC, ucp_Buginese },
{ 100, PT_SC, ucp_Buhid },
{ 106, PT_GC, ucp_C },
{ 108, PT_SC, ucp_Canadian_Aboriginal },
{ 128, PT_SC, ucp_Carian },
{ 135, PT_SC, ucp_Caucasian_Albanian },
{ 154, PT_PC, ucp_Cc },
{ 157, PT_PC, ucp_Cf },
{ 160, PT_SC, ucp_Chakma },
{ 167, PT_SC, ucp_Cham },
{ 172, PT_SC, ucp_Cherokee },
{ 181, PT_PC, ucp_Cn },
{ 184, PT_PC, ucp_Co },
{ 187, PT_SC, ucp_Common },
{ 194, PT_SC, ucp_Coptic },
{ 201, PT_PC, ucp_Cs },
{ 204, PT_SC, ucp_Cuneiform },
{ 214, PT_SC, ucp_Cypriot },
{ 222, PT_SC, ucp_Cyrillic },
{ 231, PT_SC, ucp_Deseret },
{ 239, PT_SC, ucp_Devanagari },
{ 250, PT_SC, ucp_Duployan },
{ 259, PT_SC, ucp_Egyptian_Hieroglyphs },
{ 280, PT_SC, ucp_Elbasan },
{ 288, PT_SC, ucp_Ethiopic },
{ 297, PT_SC, ucp_Georgian },
{ 306, PT_SC, ucp_Glagolitic },
{ 317, PT_SC, ucp_Gothic },
{ 324, PT_SC, ucp_Grantha },
{ 332, PT_SC, ucp_Greek },
{ 338, PT_SC, ucp_Gujarati },
{ 347, PT_SC, ucp_Gurmukhi },
{ 356, PT_SC, ucp_Han },
{ 360, PT_SC, ucp_Hangul },
{ 367, PT_SC, ucp_Hanunoo },
{ 375, PT_SC, ucp_Hebrew },
{ 382, PT_SC, ucp_Hiragana },
{ 391, PT_SC, ucp_Imperial_Aramaic },
{ 408, PT_SC, ucp_Inherited },
{ 418, PT_SC, ucp_Inscriptional_Pahlavi },
{ 440, PT_SC, ucp_Inscriptional_Parthian },
{ 463, PT_SC, ucp_Javanese },
{ 472, PT_SC, ucp_Kaithi },
{ 479, PT_SC, ucp_Kannada },
{ 487, PT_SC, ucp_Katakana },
{ 496, PT_SC, ucp_Kayah_Li },
{ 505, PT_SC, ucp_Kharoshthi },
{ 516, PT_SC, ucp_Khmer },
{ 522, PT_SC, ucp_Khojki },
{ 529, PT_SC, ucp_Khudawadi },
{ 539, PT_GC, ucp_L },
{ 541, PT_LAMP, 0 },
{ 544, PT_SC, ucp_Lao },
{ 548, PT_SC, ucp_Latin },
{ 554, PT_SC, ucp_Lepcha },
{ 561, PT_SC, ucp_Limbu },
{ 567, PT_SC, ucp_Linear_A },
{ 576, PT_SC, ucp_Linear_B },
{ 585, PT_SC, ucp_Lisu },
{ 590, PT_PC, ucp_Ll },
{ 593, PT_PC, ucp_Lm },
{ 596, PT_PC, ucp_Lo },
{ 599, PT_PC, ucp_Lt },
{ 602, PT_PC, ucp_Lu },
{ 605, PT_SC, ucp_Lycian },
{ 612, PT_SC, ucp_Lydian },
{ 619, PT_GC, ucp_M },
{ 621, PT_SC, ucp_Mahajani },
{ 630, PT_SC, ucp_Malayalam },
{ 640, PT_SC, ucp_Mandaic },
{ 648, PT_SC, ucp_Manichaean },
{ 659, PT_PC, ucp_Mc },
{ 662, PT_PC, ucp_Me },
{ 665, PT_SC, ucp_Meetei_Mayek },
{ 678, PT_SC, ucp_Mende_Kikakui },
{ 692, PT_SC, ucp_Meroitic_Cursive },
{ 709, PT_SC, ucp_Meroitic_Hieroglyphs },
{ 730, PT_SC, ucp_Miao },
{ 735, PT_PC, ucp_Mn },
{ 738, PT_SC, ucp_Modi },
{ 743, PT_SC, ucp_Mongolian },
{ 753, PT_SC, ucp_Mro },
{ 757, PT_SC, ucp_Myanmar },
{ 765, PT_GC, ucp_N },
{ 767, PT_SC, ucp_Nabataean },
{ 777, PT_PC, ucp_Nd },
{ 780, PT_SC, ucp_New_Tai_Lue },
{ 792, PT_SC, ucp_Nko },
{ 796, PT_PC, ucp_Nl },
{ 799, PT_PC, ucp_No },
{ 802, PT_SC, ucp_Ogham },
{ 808, PT_SC, ucp_Ol_Chiki },
{ 817, PT_SC, ucp_Old_Italic },
{ 828, PT_SC, ucp_Old_North_Arabian },
{ 846, PT_SC, ucp_Old_Permic },
{ 857, PT_SC, ucp_Old_Persian },
{ 869, PT_SC, ucp_Old_South_Arabian },
{ 887, PT_SC, ucp_Old_Turkic },
{ 898, PT_SC, ucp_Oriya },
{ 904, PT_SC, ucp_Osmanya },
{ 912, PT_GC, ucp_P },
{ 914, PT_SC, ucp_Pahawh_Hmong },
{ 927, PT_SC, ucp_Palmyrene },
{ 937, PT_SC, ucp_Pau_Cin_Hau },
{ 949, PT_PC, ucp_Pc },
{ 952, PT_PC, ucp_Pd },
{ 955, PT_PC, ucp_Pe },
{ 958, PT_PC, ucp_Pf },
{ 961, PT_SC, ucp_Phags_Pa },
{ 970, PT_SC, ucp_Phoenician },
{ 981, PT_PC, ucp_Pi },
{ 984, PT_PC, ucp_Po },
{ 987, PT_PC, ucp_Ps },
{ 990, PT_SC, ucp_Psalter_Pahlavi },
{ 1006, PT_SC, ucp_Rejang },
{ 1013, PT_SC, ucp_Runic },
{ 1019, PT_GC, ucp_S },
{ 1021, PT_SC, ucp_Samaritan },
{ 1031, PT_SC, ucp_Saurashtra },
{ 1042, PT_PC, ucp_Sc },
{ 1045, PT_SC, ucp_Sharada },
{ 1053, PT_SC, ucp_Shavian },
{ 1061, PT_SC, ucp_Siddham },
{ 1069, PT_SC, ucp_Sinhala },
{ 1077, PT_PC, ucp_Sk },
{ 1080, PT_PC, ucp_Sm },
{ 1083, PT_PC, ucp_So },
{ 1086, PT_SC, ucp_Sora_Sompeng },
{ 1099, PT_SC, ucp_Sundanese },
{ 1109, PT_SC, ucp_Syloti_Nagri },
{ 1122, PT_SC, ucp_Syriac },
{ 1129, PT_SC, ucp_Tagalog },
{ 1137, PT_SC, ucp_Tagbanwa },
{ 1146, PT_SC, ucp_Tai_Le },
{ 1153, PT_SC, ucp_Tai_Tham },
{ 1162, PT_SC, ucp_Tai_Viet },
{ 1171, PT_SC, ucp_Takri },
{ 1177, PT_SC, ucp_Tamil },
{ 1183, PT_SC, ucp_Telugu },
{ 1190, PT_SC, ucp_Thaana },
{ 1197, PT_SC, ucp_Thai },
{ 1202, PT_SC, ucp_Tibetan },
{ 1210, PT_SC, ucp_Tifinagh },
{ 1219, PT_SC, ucp_Tirhuta },
{ 1227, PT_SC, ucp_Ugaritic },
{ 1236, PT_SC, ucp_Vai },
{ 1240, PT_SC, ucp_Warang_Citi },
{ 1252, PT_ALNUM, 0 },
{ 1256, PT_PXSPACE, 0 },
{ 1260, PT_SPACE, 0 },
{ 1264, PT_UCNC, 0 },
{ 1268, PT_WORD, 0 },
{ 1272, PT_SC, ucp_Yi },
{ 1275, PT_GC, ucp_Z },
{ 1277, PT_PC, ucp_Zl },
{ 1280, PT_PC, ucp_Zp },
{ 1283, PT_PC, ucp_Zs }
};
const int PRIV(utt_size) = sizeof(PRIV(utt)) / sizeof(ucp_type_table);
#endif /* SUPPORT_UTF */
/* End of pcre_tables.c */
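/* Editorial sketch, not part of PCRE: the layout described for pcre_utt[]
(all names in one large string, entries holding offsets into it, searched by
binary chop) reduced to four names. The offsets 0, 4, 11 and 20 match the
first four entries of the table above. The values 100..103 are placeholders,
and the names name_entry and lookup_name are hypothetical. */

```c
#include <assert.h>
#include <string.h>

/* All names in one string, each terminated by '\0', in sorted order. */
static const char names[] = "Any\0" "Arabic\0" "Armenian\0" "Avestan\0";

/* Hypothetical entry type: offset into names[] plus a placeholder value. */
typedef struct { unsigned short name_offset; int value; } name_entry;

static const name_entry table[] = {
  {  0, 100 },  /* Any */
  {  4, 101 },  /* Arabic */
  { 11, 102 },  /* Armenian */
  { 20, 103 }   /* Avestan */
};

/* Binary chop over table[]. Returns the entry's value, or -1 if the
   name is not present. Relies on table[] being sorted by name. */
static int lookup_name(const char *name)
{
  int bot = 0;
  int top = (int)(sizeof(table) / sizeof(table[0]));
  while (bot < top)
    {
    int mid = (bot + top) / 2;
    int c = strcmp(name, names + table[mid].name_offset);
    if (c == 0) return table[mid].value;
    if (c > 0) bot = mid + 1; else top = mid;
    }
  return -1;
}
```

/* Storing offsets rather than pointers keeps the table free of relocations,
which is the motivation given in the comment on pcre_utt[]. */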

File diff suppressed because it is too large

@@ -0,0 +1,301 @@
/*************************************************
* Perl-Compatible Regular Expressions *
*************************************************/
/* PCRE is a library of functions to support regular expressions whose syntax
and semantics are as close as possible to those of the Perl 5 language.
Written by Philip Hazel
Copyright (c) 1997-2013 University of Cambridge
-----------------------------------------------------------------------------
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
* Neither the name of the University of Cambridge nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.
-----------------------------------------------------------------------------
*/
/* This module contains an internal function for validating UTF-8 character
strings. */
#ifdef HAVE_CONFIG_H
#include "config.h"
#endif
#include "pcre_internal.h"
/*************************************************
* Validate a UTF-8 string *
*************************************************/
/* This function is called (optionally) at the start of compile or match, to
check that a supposed UTF-8 string is actually valid. The early check means
that subsequent code can assume it is dealing with a valid string. The check
can be turned off for maximum performance, but the consequences of supplying an
invalid string are then undefined.
Originally, this function checked according to RFC 2279, allowing for values in
the range 0 to 0x7fffffff, up to 6 bytes long, but ensuring that they were in
the canonical format. Once somebody had pointed out RFC 3629 to me (it
obsoletes 2279), additional restrictions were applied. The values are now
limited to be between 0 and 0x0010ffff, no more than 4 bytes long, and the
subrange 0xd800 to 0xdfff is excluded. However, the format of 5-byte and 6-byte
characters is still checked.
From release 8.13, more information about the details of the error is passed
back in the returned value:
PCRE_UTF8_ERR0 No error
PCRE_UTF8_ERR1 Missing 1 byte at the end of the string
PCRE_UTF8_ERR2 Missing 2 bytes at the end of the string
PCRE_UTF8_ERR3 Missing 3 bytes at the end of the string
PCRE_UTF8_ERR4 Missing 4 bytes at the end of the string
PCRE_UTF8_ERR5 Missing 5 bytes at the end of the string
PCRE_UTF8_ERR6 2nd-byte's two top bits are not 0x80
PCRE_UTF8_ERR7 3rd-byte's two top bits are not 0x80
PCRE_UTF8_ERR8 4th-byte's two top bits are not 0x80
PCRE_UTF8_ERR9 5th-byte's two top bits are not 0x80
PCRE_UTF8_ERR10 6th-byte's two top bits are not 0x80
PCRE_UTF8_ERR11 5-byte character is not permitted by RFC 3629
PCRE_UTF8_ERR12 6-byte character is not permitted by RFC 3629
PCRE_UTF8_ERR13 4-byte character with value > 0x10ffff is not permitted
PCRE_UTF8_ERR14 3-byte character with value 0xd800-0xdfff is not permitted
PCRE_UTF8_ERR15 Overlong 2-byte sequence
PCRE_UTF8_ERR16 Overlong 3-byte sequence
PCRE_UTF8_ERR17 Overlong 4-byte sequence
PCRE_UTF8_ERR18 Overlong 5-byte sequence (won't ever occur)
PCRE_UTF8_ERR19 Overlong 6-byte sequence (won't ever occur)
PCRE_UTF8_ERR20 Isolated 0x80 byte (not within UTF-8 character)
PCRE_UTF8_ERR21 Byte with the illegal value 0xfe or 0xff
PCRE_UTF8_ERR22 Unused (was non-character)
Arguments:
string points to the string
length length of string, or -1 if the string is zero-terminated
erroroffset pointer to an error position offset variable
Returns: = 0 if the string is a valid UTF-8 string
> 0 otherwise, setting the offset of the bad character
*/
int
PRIV(valid_utf)(PCRE_PUCHAR string, int length, int *erroroffset)
{
#ifdef SUPPORT_UTF
register PCRE_PUCHAR p;
if (length < 0)
{
for (p = string; *p != 0; p++);
length = (int)(p - string);
}
for (p = string; length-- > 0; p++)
{
register pcre_uchar ab, c, d;
c = *p;
if (c < 128) continue; /* ASCII character */
if (c < 0xc0) /* Isolated 10xx xxxx byte */
{
*erroroffset = (int)(p - string);
return PCRE_UTF8_ERR20;
}
if (c >= 0xfe) /* Invalid 0xfe or 0xff bytes */
{
*erroroffset = (int)(p - string);
return PCRE_UTF8_ERR21;
}
ab = PRIV(utf8_table4)[c & 0x3f]; /* Number of additional bytes */
if (length < ab)
{
*erroroffset = (int)(p - string); /* Missing bytes */
return ab - length; /* Codes ERR1 to ERR5 */
}
length -= ab; /* Length remaining */
/* Check top bits in the second byte */
if (((d = *(++p)) & 0xc0) != 0x80)
{
*erroroffset = (int)(p - string) - 1;
return PCRE_UTF8_ERR6;
}
/* For each length, check that the remaining bytes start with the 0x80 bit
set and not the 0x40 bit. Then check for an overlong sequence, and for the
excluded range 0xd800 to 0xdfff. */
switch (ab)
{
/* 2-byte character. No further bytes to check for 0x80. Check first byte
for xx00 000x (overlong sequence). */
case 1: if ((c & 0x3e) == 0)
{
*erroroffset = (int)(p - string) - 1;
return PCRE_UTF8_ERR15;
}
break;
/* 3-byte character. Check third byte for 0x80. Then check first 2 bytes
for 1110 0000, xx0x xxxx (overlong sequence) or
1110 1101, 101x xxxx (0xd800 - 0xdfff) */
case 2:
if ((*(++p) & 0xc0) != 0x80) /* Third byte */
{
*erroroffset = (int)(p - string) - 2;
return PCRE_UTF8_ERR7;
}
if (c == 0xe0 && (d & 0x20) == 0)
{
*erroroffset = (int)(p - string) - 2;
return PCRE_UTF8_ERR16;
}
if (c == 0xed && d >= 0xa0)
{
*erroroffset = (int)(p - string) - 2;
return PCRE_UTF8_ERR14;
}
break;
/* 4-byte character. Check 3rd and 4th bytes for 0x80. Then check first 2
bytes for 1111 0000, xx00 xxxx (overlong sequence), then check for a
character greater than 0x0010ffff (f4 8f bf bf) */
case 3:
if ((*(++p) & 0xc0) != 0x80) /* Third byte */
{
*erroroffset = (int)(p - string) - 2;
return PCRE_UTF8_ERR7;
}
if ((*(++p) & 0xc0) != 0x80) /* Fourth byte */
{
*erroroffset = (int)(p - string) - 3;
return PCRE_UTF8_ERR8;
}
if (c == 0xf0 && (d & 0x30) == 0)
{
*erroroffset = (int)(p - string) - 3;
return PCRE_UTF8_ERR17;
}
if (c > 0xf4 || (c == 0xf4 && d > 0x8f))
{
*erroroffset = (int)(p - string) - 3;
return PCRE_UTF8_ERR13;
}
break;
/* 5-byte and 6-byte characters are not allowed by RFC 3629, and will be
rejected by the length test below. However, we do the appropriate tests
here so that overlong sequences get diagnosed, and also in case there is
ever an option for handling these larger code points. */
/* 5-byte character. Check 3rd, 4th, and 5th bytes for 0x80. Then check for
1111 1000, xx00 0xxx */
case 4:
if ((*(++p) & 0xc0) != 0x80) /* Third byte */
{
*erroroffset = (int)(p - string) - 2;
return PCRE_UTF8_ERR7;
}
if ((*(++p) & 0xc0) != 0x80) /* Fourth byte */
{
*erroroffset = (int)(p - string) - 3;
return PCRE_UTF8_ERR8;
}
if ((*(++p) & 0xc0) != 0x80) /* Fifth byte */
{
*erroroffset = (int)(p - string) - 4;
return PCRE_UTF8_ERR9;
}
if (c == 0xf8 && (d & 0x38) == 0)
{
*erroroffset = (int)(p - string) - 4;
return PCRE_UTF8_ERR18;
}
break;
/* 6-byte character. Check 3rd-6th bytes for 0x80. Then check for
1111 1100, xx00 00xx. */
case 5:
if ((*(++p) & 0xc0) != 0x80) /* Third byte */
{
*erroroffset = (int)(p - string) - 2;
return PCRE_UTF8_ERR7;
}
if ((*(++p) & 0xc0) != 0x80) /* Fourth byte */
{
*erroroffset = (int)(p - string) - 3;
return PCRE_UTF8_ERR8;
}
if ((*(++p) & 0xc0) != 0x80) /* Fifth byte */
{
*erroroffset = (int)(p - string) - 4;
return PCRE_UTF8_ERR9;
}
if ((*(++p) & 0xc0) != 0x80) /* Sixth byte */
{
*erroroffset = (int)(p - string) - 5;
return PCRE_UTF8_ERR10;
}
if (c == 0xfc && (d & 0x3c) == 0)
{
*erroroffset = (int)(p - string) - 5;
return PCRE_UTF8_ERR19;
}
break;
}
/* Character is valid under RFC 2279, but 5-byte and 6-byte characters are
excluded by RFC 3629. The pointer p is currently at the last byte of the
character. */
if (ab > 3)
{
*erroroffset = (int)(p - string) - ab;
return (ab == 4)? PCRE_UTF8_ERR11 : PCRE_UTF8_ERR12;
}
}
#else /* Not SUPPORT_UTF */
(void)(string); /* Keep picky compilers happy */
(void)(length);
(void)(erroroffset);
#endif
return PCRE_UTF8_ERR0; /* This indicates success */
}
/* End of pcre_valid_utf8.c */
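
The checks in the validator above can be condensed into a much smaller sketch when only RFC 3629 sequences matter and no distinct error codes are needed. The function below is an illustration only: `utf8_first_error` is a hypothetical name, not a PCRE function, and it collapses all of the PCRE_UTF8_ERRn cases into a single "offset of the bad character" result.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of PCRE. Returns -1 if s[0..len) is valid
   UTF-8 under RFC 3629, otherwise the offset of the first byte of the first
   invalid character. */
static long utf8_first_error(const unsigned char *s, size_t len)
{
  size_t i = 0;
  assert(s != NULL || len == 0);
  while (i < len)
    {
    unsigned char c = s[i];
    unsigned char d;
    size_t extra, k;

    if (c < 0x80) { i++; continue; }           /* ASCII */
    if (c < 0xc0) return (long)i;              /* isolated 10xx xxxx byte */
    if (c < 0xc2) return (long)i;              /* 0xc0, 0xc1: overlong 2-byte */
    if (c < 0xe0) extra = 1;
    else if (c < 0xf0) extra = 2;
    else if (c < 0xf5) extra = 3;
    else return (long)i;                       /* above 0x10ffff, 5/6-byte, 0xfe, 0xff */

    if (len - i - 1 < extra) return (long)i;   /* missing bytes at the end */
    for (k = 1; k <= extra; k++)               /* continuation bytes: 10xx xxxx */
      if ((s[i + k] & 0xc0) != 0x80) return (long)i;

    d = s[i + 1];
    if (c == 0xe0 && d < 0xa0) return (long)i;   /* overlong 3-byte */
    if (c == 0xed && d >= 0xa0) return (long)i;  /* 0xd800 - 0xdfff */
    if (c == 0xf0 && d < 0x90) return (long)i;   /* overlong 4-byte */
    if (c == 0xf4 && d > 0x8f) return (long)i;   /* greater than 0x10ffff */

    i += extra + 1;
    }
  return -1;
}
```

Unlike the PCRE function, this sketch rejects 5-byte and 6-byte lead bytes immediately instead of checking their format first, so it cannot report overlong 5-byte or 6-byte sequences separately.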

View File

@@ -0,0 +1,98 @@
/*************************************************
* Perl-Compatible Regular Expressions *
*************************************************/
/* PCRE is a library of functions to support regular expressions whose syntax
and semantics are as close as possible to those of the Perl 5 language.
Written by Philip Hazel
Copyright (c) 1997-2012 University of Cambridge
-----------------------------------------------------------------------------
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
* Neither the name of the University of Cambridge nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.
-----------------------------------------------------------------------------
*/
/* This module contains the external function pcre_version(), which returns a
string that identifies the PCRE version that is in use. */
#ifdef HAVE_CONFIG_H
#include "config.h"
#endif
#include "pcre_internal.h"
/*************************************************
* Return version string *
*************************************************/
/* These macros are the standard way of turning unquoted text into C strings.
They allow macros like PCRE_MAJOR to be defined without quotes, which is
convenient for user programs that want to test its value. */
#define STRING(a) # a
#define XSTRING(s) STRING(s)
/* A problem turned up with PCRE_PRERELEASE, which is defined empty for
production releases. Originally, it was used naively in this code:
return XSTRING(PCRE_MAJOR)
"." XSTRING(PCRE_MINOR)
XSTRING(PCRE_PRERELEASE)
" " XSTRING(PCRE_DATE);
However, when PCRE_PRERELEASE is empty, this leads to an attempted expansion of
STRING(). The C standard states: "If (before argument substitution) any
argument consists of no preprocessing tokens, the behavior is undefined." It
turns out that gcc treats this case as a single empty string - which is what we
really want - but Visual C grumbles about the lack of an argument for the
macro. Unfortunately, both are within their rights. To cope with both ways of
handling this, I had resort to some messy hackery that does a test at run time.
I could find no way of detecting that a macro is defined as an empty string at
pre-processor time. This hack uses a standard trick for avoiding calling
the STRING macro with an empty argument when doing the test. */
#if defined COMPILE_PCRE8
PCRE_EXP_DEFN const char * PCRE_CALL_CONVENTION
pcre_version(void)
#elif defined COMPILE_PCRE16
PCRE_EXP_DEFN const char * PCRE_CALL_CONVENTION
pcre16_version(void)
#elif defined COMPILE_PCRE32
PCRE_EXP_DEFN const char * PCRE_CALL_CONVENTION
pcre32_version(void)
#endif
{
return (XSTRING(Z PCRE_PRERELEASE)[1] == 0)?
XSTRING(PCRE_MAJOR.PCRE_MINOR PCRE_DATE) :
XSTRING(PCRE_MAJOR.PCRE_MINOR) XSTRING(PCRE_PRERELEASE PCRE_DATE);
}
/* End of pcre_version.c */
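
The run-time test for an empty macro can be tried in isolation. The sketch below reuses the STRING/XSTRING pair from the file above; every `DEMO_` name and the `IS_EMPTY_MACRO` wrapper are hypothetical and chosen only for illustration. The trick is that the stringified argument always starts with `Z`, so STRING is never invoked with an empty argument, and the second character of the result is the terminating zero only when the tested macro expands to nothing.

```c
#include <assert.h>
#include <string.h>

#define STRING(a)  # a
#define XSTRING(s) STRING(s)

/* Hypothetical stand-ins for PCRE_MAJOR, PCRE_MINOR, and so on. */
#define DEMO_MAJOR 8
#define DEMO_MINOR 36
#define DEMO_PRERELEASE            /* empty, as for a production release */
#define DEMO_RC -RC1               /* a non-empty pre-release tag */
#define DEMO_DATE 2014-09-26

/* Hypothetical wrapper: "Z" alone means the macro m expanded to nothing. */
#define IS_EMPTY_MACRO(m) (XSTRING(Z m)[1] == 0)

static const char *demo_version(void)
{
  /* Same shape as pcre_version(): both branches must compile, but only one
     is selected at run time. */
  return IS_EMPTY_MACRO(DEMO_PRERELEASE)?
    XSTRING(DEMO_MAJOR.DEMO_MINOR DEMO_DATE) :
    XSTRING(DEMO_MAJOR.DEMO_MINOR) XSTRING(DEMO_PRERELEASE DEMO_DATE);
}
```

With an empty `DEMO_PRERELEASE` the first branch is taken, and the version and date end up separated by the single space that stringification preserves between tokens.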

View File

@@ -0,0 +1,268 @@
/*************************************************
* Perl-Compatible Regular Expressions *
*************************************************/
/* PCRE is a library of functions to support regular expressions whose syntax
and semantics are as close as possible to those of the Perl 5 language.
Written by Philip Hazel
Copyright (c) 1997-2013 University of Cambridge
-----------------------------------------------------------------------------
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
* Neither the name of the University of Cambridge nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.
-----------------------------------------------------------------------------
*/
/* This module contains an internal function that is used to match an extended
class. It is used by both pcre_exec() and pcre_dfa_exec(). */
#ifdef HAVE_CONFIG_H
#include "config.h"
#endif
#include "pcre_internal.h"
/*************************************************
* Match character against an XCLASS *
*************************************************/
/* This function is called to match a character against an extended class that
might contain values > 255 and/or Unicode properties.
Arguments:
c the character
data points to the flag byte of the XCLASS data
utf TRUE if in UTF mode (always TRUE in 8-bit mode)
Returns: TRUE if character matches, else FALSE
*/
BOOL
PRIV(xclass)(pcre_uint32 c, const pcre_uchar *data, BOOL utf)
{
pcre_uchar t;
BOOL negated = (*data & XCL_NOT) != 0;
(void)utf;
#ifdef COMPILE_PCRE8
/* In 8 bit mode, this must always be TRUE. Help the compiler to know that. */
utf = TRUE;
#endif
/* Character values < 256 are matched against a bitmap, if one is present. If
not, we still carry on, because there may be ranges that start below 256 in the
additional data. */
if (c < 256)
{
if ((*data & XCL_HASPROP) == 0)
{
if ((*data & XCL_MAP) == 0) return negated;
return (((pcre_uint8 *)(data + 1))[c/8] & (1 << (c&7))) != 0;
}
if ((*data & XCL_MAP) != 0 &&
(((pcre_uint8 *)(data + 1))[c/8] & (1 << (c&7))) != 0)
return !negated; /* char found */
}
/* First skip the bit map if present. Then match against the list of Unicode
properties or large chars or ranges that end with a large char. We won't ever
encounter XCL_PROP or XCL_NOTPROP when UCP support is not compiled. */
if ((*data++ & XCL_MAP) != 0) data += 32 / sizeof(pcre_uchar);
while ((t = *data++) != XCL_END)
{
pcre_uint32 x, y;
if (t == XCL_SINGLE)
{
#ifdef SUPPORT_UTF
if (utf)
{
GETCHARINC(x, data); /* macro generates multiple statements */
}
else
#endif
x = *data++;
if (c == x) return !negated;
}
else if (t == XCL_RANGE)
{
#ifdef SUPPORT_UTF
if (utf)
{
GETCHARINC(x, data); /* macro generates multiple statements */
GETCHARINC(y, data); /* macro generates multiple statements */
}
else
#endif
{
x = *data++;
y = *data++;
}
if (c >= x && c <= y) return !negated;
}
#ifdef SUPPORT_UCP
else /* XCL_PROP & XCL_NOTPROP */
{
const ucd_record *prop = GET_UCD(c);
BOOL isprop = t == XCL_PROP;
switch(*data)
{
case PT_ANY:
if (isprop) return !negated;
break;
case PT_LAMP:
if ((prop->chartype == ucp_Lu || prop->chartype == ucp_Ll ||
prop->chartype == ucp_Lt) == isprop) return !negated;
break;
case PT_GC:
if ((data[1] == PRIV(ucp_gentype)[prop->chartype]) == isprop)
return !negated;
break;
case PT_PC:
if ((data[1] == prop->chartype) == isprop) return !negated;
break;
case PT_SC:
if ((data[1] == prop->script) == isprop) return !negated;
break;
case PT_ALNUM:
if ((PRIV(ucp_gentype)[prop->chartype] == ucp_L ||
PRIV(ucp_gentype)[prop->chartype] == ucp_N) == isprop)
return !negated;
break;
/* Perl space used to exclude VT, but from Perl 5.18 it is included,
which means that Perl space and POSIX space are now identical. PCRE
was changed at release 8.34. */
case PT_SPACE: /* Perl space */
case PT_PXSPACE: /* POSIX space */
switch(c)
{
HSPACE_CASES:
VSPACE_CASES:
if (isprop) return !negated;
break;
default:
if ((PRIV(ucp_gentype)[prop->chartype] == ucp_Z) == isprop)
return !negated;
break;
}
break;
case PT_WORD:
if ((PRIV(ucp_gentype)[prop->chartype] == ucp_L ||
PRIV(ucp_gentype)[prop->chartype] == ucp_N || c == CHAR_UNDERSCORE)
== isprop)
return !negated;
break;
case PT_UCNC:
if (c < 0xa0)
{
if ((c == CHAR_DOLLAR_SIGN || c == CHAR_COMMERCIAL_AT ||
c == CHAR_GRAVE_ACCENT) == isprop)
return !negated;
}
else
{
if ((c < 0xd800 || c > 0xdfff) == isprop)
return !negated;
}
break;
/* The following three properties can occur only in an XCLASS, as there
is no \p or \P coding for them. */
/* Graphic character. Implement this as not Z (space or separator) and
not C (other), except for Cf (format) with a few exceptions. This seems
to be what Perl does. The exceptional characters are:
U+061C Arabic Letter Mark
U+180E Mongolian Vowel Separator
U+2066 - U+2069 Various "isolate"s
*/
case PT_PXGRAPH:
if ((PRIV(ucp_gentype)[prop->chartype] != ucp_Z &&
(PRIV(ucp_gentype)[prop->chartype] != ucp_C ||
(prop->chartype == ucp_Cf &&
c != 0x061c && c != 0x180e && (c < 0x2066 || c > 0x2069))
)) == isprop)
return !negated;
break;
/* Printable character: same as graphic, with the addition of Zs, i.e.
not Zl and not Zp, and U+180E. */
case PT_PXPRINT:
if ((prop->chartype != ucp_Zl &&
prop->chartype != ucp_Zp &&
(PRIV(ucp_gentype)[prop->chartype] != ucp_C ||
(prop->chartype == ucp_Cf &&
c != 0x061c && (c < 0x2066 || c > 0x2069))
)) == isprop)
return !negated;
break;
/* Punctuation: all Unicode punctuation, plus ASCII characters that
Unicode treats as symbols rather than punctuation, for Perl
compatibility (these are $+<=>^`|~). */
case PT_PXPUNCT:
if ((PRIV(ucp_gentype)[prop->chartype] == ucp_P ||
(c < 256 && PRIV(ucp_gentype)[prop->chartype] == ucp_S)) == isprop)
return !negated;
break;
/* This should never occur, but compilers may mutter if there is no
default. */
default:
return FALSE;
}
data += 2;
}
#endif /* SUPPORT_UCP */
}
return negated; /* char did not match */
}
/* End of pcre_xclass.c */
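
The bitmap lookup used for characters below 256 can be shown on its own. This is a minimal sketch under stated assumptions: `class_map`, `class_map_add_range`, `class_map_has`, and `demo_lower_class` are hypothetical names, and the sketch models only a 32-byte map plus the negation flag, with no XCL_SINGLE, XCL_RANGE, or property data.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical 256-bit class map: bit (c & 7) of byte c/8 is set when
   character c belongs to the class. */
typedef struct { unsigned char bits[32]; } class_map;

static void class_map_add_range(class_map *m, unsigned int lo, unsigned int hi)
{
  unsigned int c;
  assert(lo <= hi && hi < 256);
  for (c = lo; c <= hi; c++)
    m->bits[c / 8] |= (unsigned char)(1u << (c & 7));
}

static int class_map_has(const class_map *m, unsigned int c)
{
  /* Characters of 256 and above are never in the map. */
  return c < 256 && (m->bits[c / 8] & (1u << (c & 7))) != 0;
}

/* Demo: does c match [a-z] (negated == 0) or [^a-z] (negated != 0)?
   A hit returns !negated and a miss returns negated, as in the function
   above. */
static int demo_lower_class(unsigned int c, int negated)
{
  class_map m;
  memset(&m, 0, sizeof m);
  class_map_add_range(&m, 'a', 'z');
  return class_map_has(&m, c) ? !negated : (negated != 0);
}
```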

View File

@@ -0,0 +1,224 @@
/*************************************************
* Unicode Property Table handler *
*************************************************/
#ifndef _UCP_H
#define _UCP_H
/* This file contains definitions of the property values that are returned by
the UCD access macros. New values that are added for new releases of Unicode
should always be at the end of each enum, for backwards compatibility.
IMPORTANT: Note also that the specific numeric values of the enums have to be
the same as the values that are generated by the maint/MultiStage2.py script,
where the equivalent property descriptive names are listed in vectors.
ALSO: The specific values of the first two enums are assumed for the table
called catposstab in pcre_compile.c. */
/* These are the general character categories. */
enum {
ucp_C, /* Other */
ucp_L, /* Letter */
ucp_M, /* Mark */
ucp_N, /* Number */
ucp_P, /* Punctuation */
ucp_S, /* Symbol */
ucp_Z /* Separator */
};
/* These are the particular character categories. */
enum {
ucp_Cc, /* Control */
ucp_Cf, /* Format */
ucp_Cn, /* Unassigned */
ucp_Co, /* Private use */
ucp_Cs, /* Surrogate */
ucp_Ll, /* Lower case letter */
ucp_Lm, /* Modifier letter */
ucp_Lo, /* Other letter */
ucp_Lt, /* Title case letter */
ucp_Lu, /* Upper case letter */
ucp_Mc, /* Spacing mark */
ucp_Me, /* Enclosing mark */
ucp_Mn, /* Non-spacing mark */
ucp_Nd, /* Decimal number */
ucp_Nl, /* Letter number */
ucp_No, /* Other number */
ucp_Pc, /* Connector punctuation */
ucp_Pd, /* Dash punctuation */
ucp_Pe, /* Close punctuation */
ucp_Pf, /* Final punctuation */
ucp_Pi, /* Initial punctuation */
ucp_Po, /* Other punctuation */
ucp_Ps, /* Open punctuation */
ucp_Sc, /* Currency symbol */
ucp_Sk, /* Modifier symbol */
ucp_Sm, /* Mathematical symbol */
ucp_So, /* Other symbol */
ucp_Zl, /* Line separator */
ucp_Zp, /* Paragraph separator */
ucp_Zs /* Space separator */
};
/* These are grapheme break properties. Note that the code for processing them
assumes that the values are less than 16. If more values are added that take
the number to 16 or more, the code will have to be rewritten. */
enum {
ucp_gbCR, /* 0 */
ucp_gbLF, /* 1 */
ucp_gbControl, /* 2 */
ucp_gbExtend, /* 3 */
ucp_gbPrepend, /* 4 */
ucp_gbSpacingMark, /* 5 */
ucp_gbL, /* 6 Hangul syllable type L */
ucp_gbV, /* 7 Hangul syllable type V */
ucp_gbT, /* 8 Hangul syllable type T */
ucp_gbLV, /* 9 Hangul syllable type LV */
ucp_gbLVT, /* 10 Hangul syllable type LVT */
ucp_gbRegionalIndicator, /* 11 */
ucp_gbOther /* 12 */
};
/* These are the script identifications. */
enum {
ucp_Arabic,
ucp_Armenian,
ucp_Bengali,
ucp_Bopomofo,
ucp_Braille,
ucp_Buginese,
ucp_Buhid,
ucp_Canadian_Aboriginal,
ucp_Cherokee,
ucp_Common,
ucp_Coptic,
ucp_Cypriot,
ucp_Cyrillic,
ucp_Deseret,
ucp_Devanagari,
ucp_Ethiopic,
ucp_Georgian,
ucp_Glagolitic,
ucp_Gothic,
ucp_Greek,
ucp_Gujarati,
ucp_Gurmukhi,
ucp_Han,
ucp_Hangul,
ucp_Hanunoo,
ucp_Hebrew,
ucp_Hiragana,
ucp_Inherited,
ucp_Kannada,
ucp_Katakana,
ucp_Kharoshthi,
ucp_Khmer,
ucp_Lao,
ucp_Latin,
ucp_Limbu,
ucp_Linear_B,
ucp_Malayalam,
ucp_Mongolian,
ucp_Myanmar,
ucp_New_Tai_Lue,
ucp_Ogham,
ucp_Old_Italic,
ucp_Old_Persian,
ucp_Oriya,
ucp_Osmanya,
ucp_Runic,
ucp_Shavian,
ucp_Sinhala,
ucp_Syloti_Nagri,
ucp_Syriac,
ucp_Tagalog,
ucp_Tagbanwa,
ucp_Tai_Le,
ucp_Tamil,
ucp_Telugu,
ucp_Thaana,
ucp_Thai,
ucp_Tibetan,
ucp_Tifinagh,
ucp_Ugaritic,
ucp_Yi,
/* New for Unicode 5.0: */
ucp_Balinese,
ucp_Cuneiform,
ucp_Nko,
ucp_Phags_Pa,
ucp_Phoenician,
/* New for Unicode 5.1: */
ucp_Carian,
ucp_Cham,
ucp_Kayah_Li,
ucp_Lepcha,
ucp_Lycian,
ucp_Lydian,
ucp_Ol_Chiki,
ucp_Rejang,
ucp_Saurashtra,
ucp_Sundanese,
ucp_Vai,
/* New for Unicode 5.2: */
ucp_Avestan,
ucp_Bamum,
ucp_Egyptian_Hieroglyphs,
ucp_Imperial_Aramaic,
ucp_Inscriptional_Pahlavi,
ucp_Inscriptional_Parthian,
ucp_Javanese,
ucp_Kaithi,
ucp_Lisu,
ucp_Meetei_Mayek,
ucp_Old_South_Arabian,
ucp_Old_Turkic,
ucp_Samaritan,
ucp_Tai_Tham,
ucp_Tai_Viet,
/* New for Unicode 6.0.0: */
ucp_Batak,
ucp_Brahmi,
ucp_Mandaic,
/* New for Unicode 6.1.0: */
ucp_Chakma,
ucp_Meroitic_Cursive,
ucp_Meroitic_Hieroglyphs,
ucp_Miao,
ucp_Sharada,
ucp_Sora_Sompeng,
ucp_Takri,
/* New for Unicode 7.0.0: */
ucp_Bassa_Vah,
ucp_Caucasian_Albanian,
ucp_Duployan,
ucp_Elbasan,
ucp_Grantha,
ucp_Khojki,
ucp_Khudawadi,
ucp_Linear_A,
ucp_Mahajani,
ucp_Manichaean,
ucp_Mende_Kikakui,
ucp_Modi,
ucp_Mro,
ucp_Nabataean,
ucp_Old_North_Arabian,
ucp_Old_Permic,
ucp_Pahawh_Hmong,
ucp_Palmyrene,
ucp_Psalter_Pahlavi,
ucp_Pau_Cin_Hau,
ucp_Siddham,
ucp_Tirhuta,
ucp_Warang_Citi
};
#endif
/* End of ucp.h */

View File

@@ -0,0 +1,62 @@
import tables

proc fget*[K, V](self: Table[K, V], key: K): V =
  if self.hasKey(key):
    return self[key]
  else:
    raise newException(KeyError, "Key does not exist in table: " & $key)

const Ident = {'a'..'z', 'A'..'Z', '0'..'9', '_', '\128'..'\255'}
const StartIdent = Ident - {'0'..'9'}

proc checkNil(arg: string): string =
  if arg == nil:
    raise newException(ValueError, "Cannot use nil capture")
  else:
    return arg

template formatStr*(howExpr, namegetter, idgetter: expr): expr =
  let how = howExpr
  var val = newStringOfCap(how.len)
  var i = 0
  var lastNum = 1

  while i < how.len:
    if how[i] != '$':
      val.add(how[i])
      i += 1
    else:
      if how[i + 1] == '$':
        # "$$" is an escaped dollar sign
        val.add('$')
        i += 2
      elif how[i + 1] == '#':
        # "$#" is the capture following the last one used
        var id {.inject.} = lastNum
        val.add(checkNil(idgetter))
        lastNum += 1
        i += 2
      elif how[i + 1] in {'0'..'9'}:
        # "$123" is a numbered capture; accumulate the decimal digits
        i += 1
        var id {.inject.} = 0
        while i < how.len and how[i] in {'0'..'9'}:
          id = (id * 10) + (ord(how[i]) - ord('0'))
          i += 1
        val.add(checkNil(idgetter))
        lastNum = id + 1
      elif how[i + 1] in StartIdent:
        # "$name" is a named capture
        i += 1
        var name {.inject.} = ""
        while i < how.len and how[i] in Ident:
          name.add(how[i])
          i += 1
        val.add(checkNil(namegetter))
      elif how[i + 1] == '{':
        # "${name}" is a named capture with explicit delimiters
        i += 2
        var name {.inject.} = ""
        while i < how.len and how[i] != '}':
          name.add(how[i])
          i += 1
        i += 1
        val.add(checkNil(namegetter))
      else:
        raise newException(Exception, "Syntax error in format string at " & $i)
  val
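
The numbered-capture branch of the template reads a run of decimal digits after the `$`. As a rough illustration in C, the usual accumulation multiplies the running value by ten and then adds the next digit. `capture_id_after_dollar` is a hypothetical helper written for this note, not part of nre or PCRE, and it ignores integer overflow.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: if s starts with '$' followed by one or more ASCII
   digits, return the decimal value of those digits; otherwise return -1.
   Overflow is not handled. */
static int capture_id_after_dollar(const char *s)
{
  size_t i = 1;
  int id = 0;
  assert(s != NULL);
  if (s[0] != '$' || s[1] < '0' || s[1] > '9') return -1;
  while (s[i] >= '0' && s[i] <= '9')
    {
    id = id * 10 + (s[i] - '0');   /* shift previous digits, add the new one */
    i++;
    }
  return id;
}
```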

View File

@@ -0,0 +1,59 @@
import unittest, optional_nonstrict
include nre

suite "captures":
  test "map capture names to numbers":
    check(getNameToNumberTable(re("(?<v1>1(?<v2>2(?<v3>3))(?'v4'4))()")) ==
      { "v1" : 0, "v2" : 1, "v3" : 2, "v4" : 3 }.toTable())

  test "capture bounds are correct":
    let ex1 = re("([0-9])")
    check("1 23".find(ex1).matchBounds == 0 .. 0)
    check("1 23".find(ex1).captureBounds[0].get == 0 .. 0)
    check("1 23".find(ex1, 1).matchBounds == 2 .. 2)
    check("1 23".find(ex1, 3).matchBounds == 3 .. 3)

    let ex2 = re("()()()()()()()()()()([0-9])")
    check("824".find(ex2).captureBounds[0].get == 0 .. -1)
    check("824".find(ex2).captureBounds[10].get == 0 .. 0)

    let ex3 = re("([0-9]+)")
    check("824".find(ex3).captureBounds[0].get == 0 .. 2)

  test "named captures":
    let ex1 = "foobar".find(re("(?<foo>foo)(?<bar>bar)"))
    check(ex1.captures["foo"] == "foo")
    check(ex1.captures["bar"] == "bar")

    let ex2 = "foo".find(re("(?<foo>foo)(?<bar>bar)?"))
    check(ex2.captures["foo"] == "foo")
    check(ex2.captures["bar"] == nil)

  test "named capture bounds":
    let ex1 = "foo".find(re("(?<foo>foo)(?<bar>bar)?"))
    check(ex1.captureBounds["foo"] == some(0..2))
    check(ex1.captureBounds["bar"] == none(Slice[int]))

  test "capture count":
    let ex1 = re("(?<foo>foo)(?<bar>bar)?")
    check(ex1.captureCount == 2)
    check(ex1.captureNameId == {"foo" : 0, "bar" : 1}.toTable())

  test "named capture table":
    let ex1 = "foo".find(re("(?<foo>foo)(?<bar>bar)?"))
    check(ex1.captures.toTable == {"foo" : "foo", "bar" : nil}.toTable())
    check(ex1.captureBounds.toTable == {"foo" : some(0..2), "bar" : none(Slice[int])}.toTable())
    check(ex1.captures.toTable("") == {"foo" : "foo", "bar" : ""}.toTable())

    let ex2 = "foobar".find(re("(?<foo>foo)(?<bar>bar)?"))
    check(ex2.captures.toTable == {"foo" : "foo", "bar" : "bar"}.toTable())

  test "capture sequence":
    let ex1 = "foo".find(re("(?<foo>foo)(?<bar>bar)?"))
    check(ex1.captures.toSeq == @["foo", nil])
    check(ex1.captureBounds.toSeq == @[some(0..2), none(Slice[int])])
    check(ex1.captures.toSeq("") == @["foo", ""])

    let ex2 = "foobar".find(re("(?<foo>foo)(?<bar>bar)?"))
    check(ex2.captures.toSeq == @["foo", "bar"])

View File

@@ -0,0 +1,7 @@
import nre, unittest

suite "escape strings":
  test "escape strings":
    check("123".escapeRe() == "123")
    check("[]".escapeRe() == r"\[\]")
    check("()".escapeRe() == r"\(\)")

View File

@@ -0,0 +1,25 @@
import unittest, sequtils, nre, optional_nonstrict

suite "find":
  test "find text":
    check("3213a".find(re"[a-z]").match == "a")
    check(toSeq(findIter("1 2 3 4 5 6 7 8 ", re" ")).map(
      proc (a: RegexMatch): string = a.match
    ) == @[" ", " ", " ", " ", " ", " ", " ", " "])

  test "find bounds":
    check(toSeq(findIter("1 2 3 4 5 ", re" ")).map(
      proc (a: RegexMatch): Slice[int] = a.matchBounds
    ) == @[1..1, 3..3, 5..5, 7..7, 9..9])

  test "overlapping find":
    check("222".findAll(re"22") == @["22"])
    check("2222".findAll(re"22") == @["22", "22"])

  test "len 0 find":
    check("".findAll(re"\ ") == newSeq[string]())
    check("".findAll(re"") == @[""])
    check("abc".findAll(re"") == @["", "", "", ""])
    check("word word".findAll(re"\b") == @["", "", "", ""])
    check("word\r\lword".findAll(re"(*ANYCRLF)(?m)$") == @["", ""])
    check("слово слово".findAll(re"(*U)\b") == @["", "", "", ""])

View File

@@ -0,0 +1,36 @@
import unittest, private/pcre
include nre

suite "Test NRE initialization":
  test "correct initialization":
    check(re("[0-9]+") != nil)
    check(re("(?i)[0-9]+") != nil)

  test "options":
    check(extractOptions("(*NEVER_UTF)") ==
          ("", pcre.NEVER_UTF, true))
    check(extractOptions("(*UTF8)(*ANCHORED)(*UCP)z") ==
          ("(*UTF8)(*UCP)z", pcre.ANCHORED, true))
    check(extractOptions("(*ANCHORED)(*UTF8)(*JAVASCRIPT_COMPAT)z") ==
          ("(*UTF8)z", pcre.ANCHORED or pcre.JAVASCRIPT_COMPAT, true))

    check(extractOptions("(*NO_STUDY)(") == ("(", 0, false))

    check(extractOptions("(*LIMIT_MATCH=6)(*ANCHORED)z") ==
          ("(*LIMIT_MATCH=6)z", pcre.ANCHORED, true))

  test "incorrect options":
    for s in ["CR", "(CR", "(*CR", "(*abc)", "(*abc)CR",
              "(?i)",
              "(*LIMIT_MATCH=5", "(*NO_AUTO_POSSESS=5)"]:
      let ss = s & "(*NEVER_UTF)"
      check(extractOptions(ss) == (ss, 0, true))

  test "invalid regex":
    expect(SyntaxError): discard re("[0-9")
    try:
      discard re("[0-9")
    except SyntaxError:
      let ex = SyntaxError(getCurrentException())
      check(ex.pos == 4)
      check(ex.pattern == "[0-9")

View File

@@ -0,0 +1,18 @@
include nre, unittest, optional_nonstrict

suite "match":
  test "upper bound must be inclusive":
    check("abc".match(re"abc", endpos = -1) == none(RegexMatch))
    check("abc".match(re"abc", endpos = 1) == none(RegexMatch))
    check("abc".match(re"abc", endpos = 2) != none(RegexMatch))

  test "match examples":
    check("abc".match(re"(\w)").captures[0] == "a")
    check("abc".match(re"(?<letter>\w)").captures["letter"] == "a")
    check("abc".match(re"(\w)\w").captures[-1] == "ab")
    check("abc".match(re"(\w)").captureBounds[0].get == 0 .. 0)
    check("abc".match(re"").captureBounds[-1].get == 0 .. -1)
    check("abc".match(re"abc").captureBounds[-1].get == 0 .. 2)

  test "match test cases":
    check("123".match(re"").matchBounds == 0 .. -1)

View File

@@ -0,0 +1,16 @@
import unittest, nre, strutils, optional_nonstrict

suite "Misc tests":
  test "unicode":
    check("".find(re"(*UTF8)").match == "")
    check("перевірка".replace(re"(*U)\w", "") == "")

  test "empty or non-empty match":
    check("abc".findall(re"|.").join(":") == ":a::b::c:")
    check("abc".findall(re".|").join(":") == "a:b:c:")

    check("abc".replace(re"|.", "x") == "xxxxxxx")
    check("abc".replace(re".|", "x") == "xxxx")

    check("abc".split(re"|.").join(":") == ":::::")
    check("abc".split(re".|").join(":") == ":::")

View File

@@ -0,0 +1,3 @@
import options

converter option2val*[T](val: Option[T]): T =
  return val.get()

View File

@@ -0,0 +1,20 @@
include nre
import unittest

suite "replace":
  test "replace with 0-length strings":
    check("".replace(re"1", proc (v: RegexMatch): string = "1") == "")
    check(" ".replace(re"", proc (v: RegexMatch): string = "1") == "1 1")
    check("".replace(re"", proc (v: RegexMatch): string = "1") == "1")

  test "regular replace":
    check("123".replace(re"\d", "foo") == "foofoofoo")
    check("123".replace(re"(\d)", "$1$1") == "112233")
    check("123".replace(re"(\d)(\d)", "$1$2") == "123")
    check("123".replace(re"(\d)(\d)", "$#$#") == "123")
    check("123".replace(re"(?<foo>\d)(\d)", "$foo$#$#") == "1123")
    check("123".replace(re"(?<foo>\d)(\d)", "${foo}$#$#") == "1123")

  test "replacing missing captures should throw instead of segfaulting":
    expect ValueError: discard "ab".replace(re"(a)|(b)", "$1$2")
    expect ValueError: discard "b".replace(re"(a)?(b)", "$1$2")

View File

@@ -0,0 +1,52 @@
import unittest, strutils
include nre
suite "string splitting":
  test "splitting strings":
    check("1 2 3 4 5 6 ".split(re" ") == @["1", "2", "3", "4", "5", "6", ""])
    check("1  2  ".split(re(" ")) == @["1", "", "2", "", ""])
    check("1 2".split(re(" ")) == @["1", "2"])
    check("foo".split(re("foo")) == @["", ""])
    check("".split(re"foo") == @[""])
  test "captured patterns":
    check("12".split(re"(\d)") == @["", "1", "", "2", ""])
  test "maxsplit":
    check("123".split(re"", maxsplit = 2) == @["1", "23"])
    check("123".split(re"", maxsplit = 1) == @["123"])
    check("123".split(re"", maxsplit = -1) == @["1", "2", "3"])
  test "split with 0-length match":
    check("12345".split(re("")) == @["1", "2", "3", "4", "5"])
    check("".split(re"") == newSeq[string]())
    check("word word".split(re"\b") == @["word", " ", "word"])
    check("word\r\lword".split(re"(*ANYCRLF)(?m)$") == @["word", "\r\lword"])
    check("слово слово".split(re"(*U)(\b)") == @["", "слово", "", " ", "", "слово", ""])
  test "perl split tests":
    check("forty-two" .split(re"") .join(",") == "f,o,r,t,y,-,t,w,o")
    check("forty-two" .split(re"", 3) .join(",") == "f,o,rty-two")
    check("split this string" .split(re" ") .join(",") == "split,this,string")
    check("split this string" .split(re" ", 2) .join(",") == "split,this string")
    check("try$this$string" .split(re"\$") .join(",") == "try,this,string")
    check("try$this$string" .split(re"\$", 2) .join(",") == "try,this$string")
    check("comma, separated, values" .split(re", ") .join("|") == "comma|separated|values")
    check("comma, separated, values" .split(re", ", 2) .join("|") == "comma|separated, values")
    check("Perl6::Camelia::Test" .split(re"::") .join(",") == "Perl6,Camelia,Test")
    check("Perl6::Camelia::Test" .split(re"::", 2) .join(",") == "Perl6,Camelia::Test")
    check("split,me,please" .split(re",") .join("|") == "split|me|please")
    check("split,me,please" .split(re",", 2) .join("|") == "split|me,please")
    check("Hello World Goodbye Mars".split(re"\s+") .join(",") == "Hello,World,Goodbye,Mars")
    check("Hello World Goodbye Mars".split(re"\s+", 3).join(",") == "Hello,World,Goodbye Mars")
    check("Hello test" .split(re"(\s+)") .join(",") == "Hello, ,test")
    check("this will be split" .split(re" ") .join(",") == "this,will,be,split")
    check("this will be split" .split(re" ", 3) .join(",") == "this,will,be split")
    check("a.b" .split(re"\.") .join(",") == "a,b")
    check("" .split(re"") .len == 0)
    check(":" .split(re"") .len == 1)
  test "start position":
    check("abc".split(re"", start = 1) == @["b", "c"])
    check("abc".split(re"", start = 2) == @["c"])
    check("abc".split(re"", start = 3) == newSeq[string]())

@@ -0,0 +1,9 @@
import nre
import init
import captures
import find
import split
import match
import replace
import escape
import misc

BIN
lib/impure/nre/web/logo.png Normal file

Binary file not shown.

Size: 41 KiB

@@ -0,0 +1 @@
<svg xmlns:svg="http://www.w3.org/2000/svg" xmlns="http://www.w3.org/2000/svg" width="627" height="140" version="1.1"><style>.s0{-inkscape-font-specification:DejaVu Sans Mono;fill:#c17d11;font-family:DejaVu Sans Mono;font-size:35.7;}.s1{-inkscape-font-specification:DejaVu Sans Mono;fill:#c00;font-family:DejaVu Sans Mono;font-size:35.7;}.s2{-inkscape-font-specification:DejaVu Sans Mono;fill:#4e9a06;font-family:DejaVu Sans Mono;font-size:35.7;}</style><g transform="translate(0,-912.36217)"><g transform="matrix(3.3601391,0,0,3.3601391,-52.384026,-75.180678)" style="fill:#000;font-family:Sans;font-size:40;letter-spacing:0;line-height:125;word-spacing:0"><path d="m77.2 293.9 125 0 0 41.7-125 0z" style="fill:#e9b96e;opacity:0.5"/><path d="m91.3 296 0 9.7-3 0 0-9.7 3 0 -6.7 0 0 9.7-3 0 0-9.7 3 0" class="s0"/><path d="m93.5 298.6 7.4 0 0 2.5-4.2 0 0 26.9 4.2 0 0 2.5-7.4 0 0-31.8" class="s1"/><path d="m109.4 314.4c0-0.6 0.2-1.2 0.7-1.7 0.5-0.5 1-0.7 1.6-0.7 0.7 0 1.2 0.2 1.7 0.7 0.5 0.5 0.7 1 0.7 1.7 0 0.7-0.2 1.2-0.7 1.7-0.5 0.5-1 0.7-1.7 0.7-0.7 0-1.2-0.2-1.6-0.7-0.4-0.4-0.6-1-0.6-1.7m2.3-10.6c-1.6 0-2.9 0.9-3.7 2.7-0.8 1.8-1.2 4.5-1.2 8.1 0 3.6 0.4 6.3 1.2 8.1 0.8 1.8 2 2.7 3.7 2.7 1.7 0 2.9-0.9 3.7-2.7 0.8-1.8 1.2-4.5 1.2-8.1 0-3.6-0.4-6.3-1.2-8.1-0.8-1.8-2-2.7-3.7-2.7m0-2.8c2.8 0 4.9 1.1 6.3 3.4 1.4 2.3 2.1 5.6 2.1 10.1 0 4.4-0.7 7.8-2.1 10.1-1.4 2.3-3.5 3.4-6.3 3.4-2.8 0-4.9-1.1-6.3-3.4-1.4-2.3-2.1-5.6-2.1-10.1 0-4.5 0.7-7.8 2.1-10.1 1.4-2.3 3.5-3.4 6.3-3.4" class="s2"/><path d="m124.6 313.1 9.1 0 0 2.9-9.1 0 0-2.9" class="s1"/><path d="m146.4 316.1c1.5 0 2.7-0.5 3.5-1.6 0.9-1.1 1.3-2.6 1.3-4.5 0-1.9-0.4-3.4-1.3-4.5-0.8-1.1-2-1.6-3.5-1.6-1.6 0-2.7 0.5-3.5 1.6-0.8 1-1.2 2.5-1.2 4.6 0 2 0.4 3.5 1.2 4.6 0.8 1 2 1.5 3.5 1.5m-6.3 11 0-3.2c0.7 0.4 1.5 0.8 2.3 1 0.8 0.2 1.7 0.3 2.6 0.3 2.2 0 3.9-0.8 5.1-2.5 1.2-1.7 1.7-4.2 1.7-7.4-0.5 1.2-1.3 2.1-2.3 2.7-1 0.6-2.1 0.9-3.4 0.9-2.5 0-4.5-0.8-5.8-2.3-1.4-1.5-2.1-3.7-2.1-6.6 0-2.8 0.7-5 2.1-6.5 1.4-1.5 3.4-2.3 6-2.3 3 0 5.2 1.1 
6.6 3.3 1.4 2.2 2.1 5.6 2.1 10.3 0 4.4-0.8 7.7-2.5 10-1.7 2.3-4.1 3.5-7.4 3.5-0.8 0-1.7-0.1-2.6-0.3-0.9-0.2-1.7-0.4-2.5-0.8" class="s2"/><path d="m165.2 298.6 0 31.8-7.4 0 0-2.5 4.2 0 0-26.9-4.2 0 0-2.5 7.4 0M179.9 305.3l0 7.7 7.8 0 0 3-7.8 0 0 7.7-2.9 0 0-7.7-7.7 0 0-3 7.7 0 0-7.7 2.9 0" class="s1"/><path d="m197.7 296 0 9.7-3 0 0-9.7 3 0 -6.7 0 0 9.7-3 0 0-9.7 3 0" class="s0"/><g transform="translate(-2.0396636,0.18413477)" style="fill:#000;font-family:Sans;font-size:35.7"><path d="m38.6 313.9 0 11.8-3.2 0 0-11.7c0-1.8-0.4-3.2-1.1-4.2-0.7-0.9-1.8-1.4-3.2-1.4-1.7 0-3.1 0.6-4.1 1.7-1 1.1-1.5 2.6-1.5 4.5l0 11-3.2 0 0-19.5 3.2 0 0 3c0.8-1.2 1.7-2.1 2.7-2.6 1-0.6 2.2-0.9 3.6-0.9 2.2 0 3.9 0.7 5.1 2.1 1.2 1.4 1.7 3.4 1.7 6.1M56.4 309.1c-0.4-0.2-0.8-0.4-1.2-0.5-0.4-0.1-0.9-0.2-1.4-0.2-1.8 0-3.2 0.6-4.2 1.8-1 1.2-1.4 2.9-1.4 5.1l0 10.3-3.2 0 0-19.5 3.2 0 0 3c0.7-1.2 1.6-2.1 2.6-2.6 1.1-0.6 2.4-0.9 3.9-0.9 0.2 0 0.5 0 0.7 0.1 0.3 0 0.6 0.1 0.9 0.1l0 3.3M75.7 315.1l0 1.6-14.8 0c0.1 2.2 0.8 3.9 2 5.1 1.2 1.2 2.9 1.7 5 1.7 1.2 0 2.4-0.2 3.6-0.5 1.2-0.3 2.3-0.8 3.5-1.4l0 3c-1.2 0.5-2.3 0.9-3.5 1.1-1.2 0.3-2.4 0.4-3.7 0.4-3.1 0-5.6-0.9-7.4-2.7-1.8-1.8-2.7-4.3-2.7-7.4 0-3.2 0.9-5.7 2.6-7.6 1.7-1.9 4.1-2.8 7-2.8 2.6 0 4.7 0.8 6.2 2.5 1.5 1.7 2.3 4 2.3 6.9m-3.2-0.9c0-1.8-0.5-3.2-1.5-4.2-1-1-2.2-1.6-3.8-1.6-1.8 0-3.2 0.5-4.3 1.5-1.1 1-1.7 2.4-1.8 4.3l11.4 0"/></g></g></g></svg>

Size: 3.3 KiB