Regex: Update PCRE to v8.35.

I was über lazy at first, so took libs from SM.
But actually it's quite easy to compile, so let's update to latest version \o/.
This commit is contained in:
Arkshine
2014-07-05 13:53:30 +02:00
parent d1153b8049
commit d4de0e6f1e
241 changed files with 51074 additions and 15011 deletions

View File

@ -13,46 +13,63 @@ from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
<br>
<ul>
<li><a name="TOC1" href="#SEC1">PCRE BUILD-TIME OPTIONS</a>
<li><a name="TOC2" href="#SEC2">BUILDING 8-BIT, 16-BIT AND 32-BIT LIBRARIES</a>
<li><a name="TOC3" href="#SEC3">BUILDING SHARED AND STATIC LIBRARIES</a>
<li><a name="TOC4" href="#SEC4">C++ SUPPORT</a>
<li><a name="TOC5" href="#SEC5">UTF-8, UTF-16 AND UTF-32 SUPPORT</a>
<li><a name="TOC6" href="#SEC6">UNICODE CHARACTER PROPERTY SUPPORT</a>
<li><a name="TOC7" href="#SEC7">JUST-IN-TIME COMPILER SUPPORT</a>
<li><a name="TOC8" href="#SEC8">CODE VALUE OF NEWLINE</a>
<li><a name="TOC9" href="#SEC9">WHAT \R MATCHES</a>
<li><a name="TOC10" href="#SEC10">POSIX MALLOC USAGE</a>
<li><a name="TOC11" href="#SEC11">HANDLING VERY LARGE PATTERNS</a>
<li><a name="TOC12" href="#SEC12">AVOIDING EXCESSIVE STACK USAGE</a>
<li><a name="TOC13" href="#SEC13">LIMITING PCRE RESOURCE USAGE</a>
<li><a name="TOC14" href="#SEC14">CREATING CHARACTER TABLES AT BUILD TIME</a>
<li><a name="TOC15" href="#SEC15">USING EBCDIC CODE</a>
<li><a name="TOC16" href="#SEC16">PCREGREP OPTIONS FOR COMPRESSED FILE SUPPORT</a>
<li><a name="TOC17" href="#SEC17">PCREGREP BUFFER SIZE</a>
<li><a name="TOC18" href="#SEC18">PCRETEST OPTION FOR LIBREADLINE SUPPORT</a>
<li><a name="TOC19" href="#SEC19">DEBUGGING WITH VALGRIND SUPPORT</a>
<li><a name="TOC20" href="#SEC20">CODE COVERAGE REPORTING</a>
<li><a name="TOC21" href="#SEC21">SEE ALSO</a>
<li><a name="TOC22" href="#SEC22">AUTHOR</a>
<li><a name="TOC23" href="#SEC23">REVISION</a>
<li><a name="TOC1" href="#SEC1">BUILDING PCRE</a>
<li><a name="TOC2" href="#SEC2">PCRE BUILD-TIME OPTIONS</a>
<li><a name="TOC3" href="#SEC3">BUILDING 8-BIT, 16-BIT AND 32-BIT LIBRARIES</a>
<li><a name="TOC4" href="#SEC4">BUILDING SHARED AND STATIC LIBRARIES</a>
<li><a name="TOC5" href="#SEC5">C++ SUPPORT</a>
<li><a name="TOC6" href="#SEC6">UTF-8, UTF-16 AND UTF-32 SUPPORT</a>
<li><a name="TOC7" href="#SEC7">UNICODE CHARACTER PROPERTY SUPPORT</a>
<li><a name="TOC8" href="#SEC8">JUST-IN-TIME COMPILER SUPPORT</a>
<li><a name="TOC9" href="#SEC9">CODE VALUE OF NEWLINE</a>
<li><a name="TOC10" href="#SEC10">WHAT \R MATCHES</a>
<li><a name="TOC11" href="#SEC11">POSIX MALLOC USAGE</a>
<li><a name="TOC12" href="#SEC12">HANDLING VERY LARGE PATTERNS</a>
<li><a name="TOC13" href="#SEC13">AVOIDING EXCESSIVE STACK USAGE</a>
<li><a name="TOC14" href="#SEC14">LIMITING PCRE RESOURCE USAGE</a>
<li><a name="TOC15" href="#SEC15">CREATING CHARACTER TABLES AT BUILD TIME</a>
<li><a name="TOC16" href="#SEC16">USING EBCDIC CODE</a>
<li><a name="TOC17" href="#SEC17">PCREGREP OPTIONS FOR COMPRESSED FILE SUPPORT</a>
<li><a name="TOC18" href="#SEC18">PCREGREP BUFFER SIZE</a>
<li><a name="TOC19" href="#SEC19">PCRETEST OPTION FOR LIBREADLINE SUPPORT</a>
<li><a name="TOC20" href="#SEC20">DEBUGGING WITH VALGRIND SUPPORT</a>
<li><a name="TOC21" href="#SEC21">CODE COVERAGE REPORTING</a>
<li><a name="TOC22" href="#SEC22">SEE ALSO</a>
<li><a name="TOC23" href="#SEC23">AUTHOR</a>
<li><a name="TOC24" href="#SEC24">REVISION</a>
</ul>
<br><a name="SEC1" href="#TOC1">PCRE BUILD-TIME OPTIONS</a><br>
<br><a name="SEC1" href="#TOC1">BUILDING PCRE</a><br>
<P>
This document describes the optional features of PCRE that can be selected when
the library is compiled. It assumes use of the <b>configure</b> script, where
the optional features are selected or deselected by providing options to
<b>configure</b> before running the <b>make</b> command. However, the same
options can be selected in both Unix-like and non-Unix-like environments using
the GUI facility of <b>cmake-gui</b> if you are using <b>CMake</b> instead of
<b>configure</b> to build PCRE.
PCRE is distributed with a <b>configure</b> script that can be used to build the
library in Unix-like environments using the applications known as Autotools.
Also in the distribution are files to support building using <b>CMake</b>
instead of <b>configure</b>. The text file
<a href="README.txt"><b>README</b></a>
contains general information about building with Autotools (some of which is
repeated below), and also has some comments about building on various operating
systems. There is a lot more information about building PCRE without using
Autotools (including information about using <b>CMake</b> and building "by
hand") in the text file called
<a href="NON-AUTOTOOLS-BUILD.txt"><b>NON-AUTOTOOLS-BUILD</b>.</a>
You should consult this file as well as the
<a href="README.txt"><b>README</b></a>
file if you are building in a non-Unix-like environment.
</P>
<br><a name="SEC2" href="#TOC1">PCRE BUILD-TIME OPTIONS</a><br>
<P>
The rest of this document describes the optional features of PCRE that can be
selected when the library is compiled. It assumes use of the <b>configure</b>
script, where the optional features are selected or deselected by providing
options to <b>configure</b> before running the <b>make</b> command. However, the
same options can be selected in both Unix-like and non-Unix-like environments
using the GUI facility of <b>cmake-gui</b> if you are using <b>CMake</b> instead
of <b>configure</b> to build PCRE.
</P>
<P>
There is a lot more information about building PCRE without using
<b>configure</b> (including information about using <b>CMake</b> or building "by
hand") in the file called <i>NON-AUTOTOOLS-BUILD</i>, which is part of the PCRE
distribution. You should consult this file as well as the <i>README</i> file if
you are building in a non-Unix-like environment.
If you are not using Autotools or <b>CMake</b>, option selection can be done by
editing the <b>config.h</b> file, or by passing parameter settings to the
compiler, as described in
<a href="NON-AUTOTOOLS-BUILD.txt"><b>NON-AUTOTOOLS-BUILD</b>.</a>
</P>
<P>
The complete list of options for <b>configure</b> (which includes the standard
@ -67,7 +84,7 @@ The following sections include descriptions of options whose names begin with
--enable and --disable always come in pairs, so the complementary option always
exists as well, but as it specifies the default, it is not described.
</P>
<br><a name="SEC2" href="#TOC1">BUILDING 8-BIT, 16-BIT AND 32-BIT LIBRARIES</a><br>
<br><a name="SEC3" href="#TOC1">BUILDING 8-BIT, 16-BIT AND 32-BIT LIBRARIES</a><br>
<P>
By default, a library called <b>libpcre</b> is built, containing functions that
take string arguments contained in vectors of bytes, either as single-byte
@ -78,7 +95,7 @@ strings, by adding
<pre>
--enable-pcre16
</pre>
to the <b>configure</b> command. You can also build a separate
to the <b>configure</b> command. You can also build yet another separate
library, called <b>libpcre32</b>, in which strings are contained in vectors of
32-bit data units and interpreted either as single-unit characters or UTF-32
strings, by adding
@ -94,17 +111,17 @@ and POSIX wrappers are for the 8-bit library only, and that <b>pcregrep</b> is
an 8-bit program. None of these are built if you select only the 16-bit or
32-bit libraries.
</P>
<br><a name="SEC3" href="#TOC1">BUILDING SHARED AND STATIC LIBRARIES</a><br>
<br><a name="SEC4" href="#TOC1">BUILDING SHARED AND STATIC LIBRARIES</a><br>
<P>
The PCRE building process uses <b>libtool</b> to build both shared and static
Unix libraries by default. You can suppress one of these by adding one of
The Autotools PCRE building process uses <b>libtool</b> to build both shared and
static libraries by default. You can suppress one of these by adding one of
<pre>
--disable-shared
--disable-static
</pre>
to the <b>configure</b> command, as required.
</P>
<br><a name="SEC4" href="#TOC1">C++ SUPPORT</a><br>
<br><a name="SEC5" href="#TOC1">C++ SUPPORT</a><br>
<P>
By default, if the 8-bit library is being built, the <b>configure</b> script
will search for a C++ compiler and C++ header files. If it finds them, it
@ -115,7 +132,7 @@ strings). You can disable this by adding
</pre>
to the <b>configure</b> command.
</P>
<br><a name="SEC5" href="#TOC1">UTF-8, UTF-16 AND UTF-32 SUPPORT</a><br>
<br><a name="SEC6" href="#TOC1">UTF-8, UTF-16 AND UTF-32 SUPPORT</a><br>
<P>
To build PCRE with support for UTF Unicode character strings, add
<pre>
@ -143,7 +160,7 @@ not possible to support both EBCDIC and UTF-8 codes in the same version of the
library. Consequently, --enable-utf and --enable-ebcdic are mutually
exclusive.
</P>
<br><a name="SEC6" href="#TOC1">UNICODE CHARACTER PROPERTY SUPPORT</a><br>
<br><a name="SEC7" href="#TOC1">UNICODE CHARACTER PROPERTY SUPPORT</a><br>
<P>
UTF support allows the libraries to process character codepoints up to 0x10ffff
in the strings that they handle. On its own, however, it does not provide any
@ -163,7 +180,7 @@ supported. Details are given in the
<a href="pcrepattern.html"><b>pcrepattern</b></a>
documentation.
</P>
<br><a name="SEC7" href="#TOC1">JUST-IN-TIME COMPILER SUPPORT</a><br>
<br><a name="SEC8" href="#TOC1">JUST-IN-TIME COMPILER SUPPORT</a><br>
<P>
Just-in-time compiler support is included in the build by specifying
<pre>
@ -180,7 +197,7 @@ pcregrep automatically makes use of it, unless you add
</pre>
to the "configure" command.
</P>
<br><a name="SEC8" href="#TOC1">CODE VALUE OF NEWLINE</a><br>
<br><a name="SEC9" href="#TOC1">CODE VALUE OF NEWLINE</a><br>
<P>
By default, PCRE interprets the linefeed (LF) character as indicating the end
of a line. This is the normal newline character on Unix-like systems. You can
@ -213,7 +230,7 @@ Whatever line ending convention is selected when PCRE is built can be
overridden when the library functions are called. At build time it is
conventional to use the standard for your operating system.
</P>
<br><a name="SEC9" href="#TOC1">WHAT \R MATCHES</a><br>
<br><a name="SEC10" href="#TOC1">WHAT \R MATCHES</a><br>
<P>
By default, the sequence \R in a pattern matches any Unicode newline sequence,
whatever has been selected as the line ending sequence. If you specify
@ -224,7 +241,7 @@ the default is changed so that \R matches only CR, LF, or CRLF. Whatever is
selected when PCRE is built can be overridden when the library functions are
called.
</P>
<br><a name="SEC10" href="#TOC1">POSIX MALLOC USAGE</a><br>
<br><a name="SEC11" href="#TOC1">POSIX MALLOC USAGE</a><br>
<P>
When the 8-bit library is called through the POSIX interface (see the
<a href="pcreposix.html"><b>pcreposix</b></a>
@ -240,7 +257,7 @@ such as
</pre>
to the <b>configure</b> command.
</P>
<br><a name="SEC11" href="#TOC1">HANDLING VERY LARGE PATTERNS</a><br>
<br><a name="SEC12" href="#TOC1">HANDLING VERY LARGE PATTERNS</a><br>
<P>
Within a compiled pattern, offset values are used to point from one part to
another (for example, from an opening parenthesis to an alternation
@ -259,7 +276,7 @@ longer offsets slows down the operation of PCRE because it has to load
additional data when handling them. For the 32-bit library the value is always
4 and cannot be overridden; the value of --with-link-size is ignored.
</P>
<br><a name="SEC12" href="#TOC1">AVOIDING EXCESSIVE STACK USAGE</a><br>
<br><a name="SEC13" href="#TOC1">AVOIDING EXCESSIVE STACK USAGE</a><br>
<P>
When matching with the <b>pcre_exec()</b> function, PCRE implements backtracking
by making recursive calls to an internal function called <b>match()</b>. In
@ -290,7 +307,7 @@ perform better than <b>malloc()</b> and <b>free()</b>. PCRE runs noticeably more
slowly when built in this way. This option affects only the <b>pcre_exec()</b>
function; it is not relevant for <b>pcre_dfa_exec()</b>.
</P>
<br><a name="SEC13" href="#TOC1">LIMITING PCRE RESOURCE USAGE</a><br>
<br><a name="SEC14" href="#TOC1">LIMITING PCRE RESOURCE USAGE</a><br>
<P>
Internally, PCRE has a function called <b>match()</b>, which it calls repeatedly
(sometimes recursively) when matching a pattern with the <b>pcre_exec()</b>
@ -319,7 +336,7 @@ constraints. However, you can set a lower limit by adding, for example,
</pre>
to the <b>configure</b> command. This value can also be overridden at run time.
</P>
<br><a name="SEC14" href="#TOC1">CREATING CHARACTER TABLES AT BUILD TIME</a><br>
<br><a name="SEC15" href="#TOC1">CREATING CHARACTER TABLES AT BUILD TIME</a><br>
<P>
PCRE uses fixed tables for processing characters whose code values are less
than 256. By default, PCRE is built with a set of tables that are distributed
@ -336,7 +353,7 @@ compiling, because <b>dftables</b> is run on the local host. If you need to
create alternative tables when cross compiling, you will have to do so "by
hand".)
</P>
<br><a name="SEC15" href="#TOC1">USING EBCDIC CODE</a><br>
<br><a name="SEC16" href="#TOC1">USING EBCDIC CODE</a><br>
<P>
PCRE assumes by default that it will run in an environment where the character
code is ASCII (or Unicode, which is a superset of ASCII). This is the case for
@ -367,7 +384,7 @@ The options that select newline behaviour, such as --enable-newline-is-cr,
and equivalent run-time options, refer to these character values in an EBCDIC
environment.
</P>
<br><a name="SEC16" href="#TOC1">PCREGREP OPTIONS FOR COMPRESSED FILE SUPPORT</a><br>
<br><a name="SEC17" href="#TOC1">PCREGREP OPTIONS FOR COMPRESSED FILE SUPPORT</a><br>
<P>
By default, <b>pcregrep</b> reads all files as plain text. You can build it so
that it recognizes files whose names end in <b>.gz</b> or <b>.bz2</b>, and reads
@ -380,7 +397,7 @@ to the <b>configure</b> command. These options naturally require that the
relevant libraries are installed on your system. Configuration will fail if
they are not.
</P>
<br><a name="SEC17" href="#TOC1">PCREGREP BUFFER SIZE</a><br>
<br><a name="SEC18" href="#TOC1">PCREGREP BUFFER SIZE</a><br>
<P>
<b>pcregrep</b> uses an internal buffer to hold a "window" on the file it is
scanning, in order to be able to output "before" and "after" lines when it
@ -395,7 +412,7 @@ parameter value by adding, for example,
to the <b>configure</b> command. The caller of \fPpcregrep\fP can, however,
override this value by specifying a run-time option.
</P>
<br><a name="SEC18" href="#TOC1">PCRETEST OPTION FOR LIBREADLINE SUPPORT</a><br>
<br><a name="SEC19" href="#TOC1">PCRETEST OPTION FOR LIBREADLINE SUPPORT</a><br>
<P>
If you add
<pre>
@ -426,7 +443,7 @@ automatically included, you may need to add something like
</pre>
immediately before the <b>configure</b> command.
</P>
<br><a name="SEC19" href="#TOC1">DEBUGGING WITH VALGRIND SUPPORT</a><br>
<br><a name="SEC20" href="#TOC1">DEBUGGING WITH VALGRIND SUPPORT</a><br>
<P>
By adding the
<pre>
@ -436,7 +453,7 @@ option to to the <b>configure</b> command, PCRE will use valgrind annotations
to mark certain memory regions as unaddressable. This allows it to detect
invalid memory accesses, and is mostly useful for debugging PCRE itself.
</P>
<br><a name="SEC20" href="#TOC1">CODE COVERAGE REPORTING</a><br>
<br><a name="SEC21" href="#TOC1">CODE COVERAGE REPORTING</a><br>
<P>
If your C compiler is gcc, you can build a version of PCRE that can generate a
code coverage report for its test suite. To enable this, you must install
@ -493,11 +510,11 @@ This cleans all coverage data including the generated coverage report. For more
information about code coverage, see the <b>gcov</b> and <b>lcov</b>
documentation.
</P>
<br><a name="SEC21" href="#TOC1">SEE ALSO</a><br>
<br><a name="SEC22" href="#TOC1">SEE ALSO</a><br>
<P>
<b>pcreapi</b>(3), <b>pcre16</b>, <b>pcre32</b>, <b>pcre_config</b>(3).
</P>
<br><a name="SEC22" href="#TOC1">AUTHOR</a><br>
<br><a name="SEC23" href="#TOC1">AUTHOR</a><br>
<P>
Philip Hazel
<br>
@ -506,11 +523,11 @@ University Computing Service
Cambridge CB2 3QH, England.
<br>
</P>
<br><a name="SEC23" href="#TOC1">REVISION</a><br>
<br><a name="SEC24" href="#TOC1">REVISION</a><br>
<P>
Last updated: 30 October 2012
Last updated: 12 May 2013
<br>
Copyright &copy; 1997-2012 University of Cambridge.
Copyright &copy; 1997-2013 University of Cambridge.
<br>
<p>
Return to the <a href="index.html">PCRE index page</a>.