Regex: Update PCRE to v8.35.
I was über lazy at first, so took libs from SM. But actually it's quite easy to compile, so let's update to latest version \o/.
764
tools/pcre/doc/html/NON-AUTOTOOLS-BUILD.txt
Normal file
@ -0,0 +1,764 @@
|
||||
Building PCRE without using autotools
|
||||
-------------------------------------
|
||||
|
||||
This document contains the following sections:
|
||||
|
||||
General
|
||||
Generic instructions for the PCRE C library
|
||||
The C++ wrapper functions
|
||||
Building for virtual Pascal
|
||||
Stack size in Windows environments
|
||||
Linking programs in Windows environments
|
||||
Calling conventions in Windows environments
|
||||
Comments about Win32 builds
|
||||
Building PCRE on Windows with CMake
|
||||
Use of relative paths with CMake on Windows
|
||||
Testing with RunTest.bat
|
||||
Building under Windows CE with Visual Studio 200x
|
||||
Building under Windows with BCC5.5
|
||||
Building using Borland C++ Builder 2007 (CB2007) and higher
|
||||
Building PCRE on OpenVMS
|
||||
Building PCRE on Stratus OpenVOS
|
||||
Building PCRE on native z/OS and z/VM
|
||||
|
||||
|
||||
GENERAL
|
||||
|
||||
I (Philip Hazel) have no experience of Windows or VMS systems and how their
|
||||
libraries work. The items in the PCRE distribution and Makefile that relate to
|
||||
anything other than Linux systems are untested by me.
|
||||
|
||||
There are some other comments and files (including some documentation in CHM
|
||||
format) in the Contrib directory on the FTP site:
|
||||
|
||||
ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/Contrib
|
||||
|
||||
The basic PCRE library consists entirely of code written in Standard C, and so
|
||||
should compile successfully on any system that has a Standard C compiler and
|
||||
library. The C++ wrapper functions are a separate issue (see below).
|
||||
|
||||
The PCRE distribution includes a "configure" file for use by the configure/make
|
||||
(autotools) build system, as found in many Unix-like environments. The README
|
||||
file contains information about the options for "configure".
|
||||
|
||||
There is also support for CMake, which some users prefer, especially in Windows
|
||||
environments, though it can also be run in Unix-like environments. See the
|
||||
section entitled "Building PCRE on Windows with CMake" below.
|
||||
|
||||
Versions of config.h and pcre.h are distributed in the PCRE tarballs under the
|
||||
names config.h.generic and pcre.h.generic. These are provided for those who
|
||||
build PCRE without using "configure" or CMake. If you use "configure" or CMake,
|
||||
the .generic versions are not used.
|
||||
|
||||
|
||||
GENERIC INSTRUCTIONS FOR THE PCRE C LIBRARY
|
||||
|
||||
The following are generic instructions for building the PCRE C library "by
|
||||
hand". If you are going to use CMake, this section does not apply to you; you
|
||||
can skip ahead to the CMake section.
|
||||
|
||||
(1) Copy or rename the file config.h.generic as config.h, and edit the macro
|
||||
settings that it contains to whatever is appropriate for your environment.
|
||||
|
||||
In particular, you can alter the definition of the NEWLINE macro to
|
||||
specify what character(s) you want to be interpreted as line terminators.
|
||||
In an EBCDIC environment, you MUST change NEWLINE, because its default
|
||||
value is 10, an ASCII LF. The usual EBCDIC newline character is 21 (0x15,
|
||||
NL), though in some cases it may be 37 (0x25).
|
||||
|
||||
When you compile any of the PCRE modules, you must specify -DHAVE_CONFIG_H
|
||||
to your compiler so that config.h is included in the sources.
|
||||
|
||||
An alternative approach is not to edit config.h, but to use -D on the
|
||||
compiler command line to make any changes that you need to the
|
||||
configuration options. In this case -DHAVE_CONFIG_H must not be set.
|
||||
|
||||
NOTE: There have been occasions when the way in which certain parameters
|
||||
in config.h are used has changed between releases. (In the configure/make
|
||||
world, this is handled automatically.) When upgrading to a new release,
|
||||
you are strongly advised to review config.h.generic before re-using what
|
||||
you had previously.
|
||||
|
||||
(2) Copy or rename the file pcre.h.generic as pcre.h.
|
||||
|
||||
(3) EITHER:
|
||||
Copy or rename file pcre_chartables.c.dist as pcre_chartables.c.
|
||||
|
||||
OR:
|
||||
Compile dftables.c as a stand-alone program (using -DHAVE_CONFIG_H if
|
||||
you have set up config.h), and then run it with the single argument
|
||||
"pcre_chartables.c". This generates a set of standard character tables
|
||||
and writes them to that file. The tables are generated using the default
|
||||
C locale for your system. If you want to use a locale that is specified
|
||||
by LC_xxx environment variables, add the -L option to the dftables
|
||||
command. You must use this method if you are building on a system that
|
||||
uses EBCDIC code.
|
||||
|
||||
The tables in pcre_chartables.c are defaults. The caller of PCRE can
|
||||
specify alternative tables at run time.
|
||||
|
||||
(4) Ensure that you have the following header files:
|
||||
|
||||
pcre_internal.h
|
||||
ucp.h
|
||||
|
||||
(5) For an 8-bit library, compile the following source files, setting
|
||||
-DHAVE_CONFIG_H as a compiler option if you have set up config.h with your
|
||||
configuration, or else use other -D settings to change the configuration
|
||||
as required.
|
||||
|
||||
pcre_byte_order.c
|
||||
pcre_chartables.c
|
||||
pcre_compile.c
|
||||
pcre_config.c
|
||||
pcre_dfa_exec.c
|
||||
pcre_exec.c
|
||||
pcre_fullinfo.c
|
||||
pcre_get.c
|
||||
pcre_globals.c
|
||||
pcre_jit_compile.c
|
||||
pcre_maketables.c
|
||||
pcre_newline.c
|
||||
pcre_ord2utf8.c
|
||||
pcre_refcount.c
|
||||
pcre_string_utils.c
|
||||
pcre_study.c
|
||||
pcre_tables.c
|
||||
pcre_ucd.c
|
||||
pcre_valid_utf8.c
|
||||
pcre_version.c
|
||||
pcre_xclass.c
|
||||
|
||||
Make sure that you include -I. in the compiler command (or equivalent for
|
||||
an unusual compiler) so that all included PCRE header files are first
|
||||
sought in the current directory. Otherwise you run the risk of picking up
|
||||
a previously-installed file from somewhere else.
|
||||
|
||||
Note that you must still compile pcre_jit_compile.c, even if you have not
|
||||
defined SUPPORT_JIT in config.h, because when JIT support is not
|
||||
configured, dummy functions are compiled. When JIT support IS configured,
|
||||
pcre_jit_compile.c #includes sources from the sljit subdirectory, where
|
||||
there should be 16 files, all of whose names begin with "sljit".
|
||||
|
||||
(6) Now link all the compiled code into an object library in whichever form
|
||||
your system keeps such libraries. This is the basic PCRE C 8-bit library.
|
||||
If your system has static and shared libraries, you may have to do this
|
||||
once for each type.
|
||||
|
||||
(7) If you want to build a 16-bit library (as well as, or instead of the 8-bit
|
||||
or 32-bit libraries) repeat steps 5-6 with the following files:
|
||||
|
||||
pcre16_byte_order.c
|
||||
pcre16_chartables.c
|
||||
pcre16_compile.c
|
||||
pcre16_config.c
|
||||
pcre16_dfa_exec.c
|
||||
pcre16_exec.c
|
||||
pcre16_fullinfo.c
|
||||
pcre16_get.c
|
||||
pcre16_globals.c
|
||||
pcre16_jit_compile.c
|
||||
pcre16_maketables.c
|
||||
pcre16_newline.c
|
||||
pcre16_ord2utf16.c
|
||||
pcre16_refcount.c
|
||||
pcre16_string_utils.c
|
||||
pcre16_study.c
|
||||
pcre16_tables.c
|
||||
pcre16_ucd.c
|
||||
pcre16_utf16_utils.c
|
||||
pcre16_valid_utf16.c
|
||||
pcre16_version.c
|
||||
pcre16_xclass.c
|
||||
|
||||
(8) If you want to build a 32-bit library (as well as, or instead of the 8-bit
|
||||
or 16-bit libraries) repeat steps 5-6 with the following files:
|
||||
|
||||
pcre32_byte_order.c
|
||||
pcre32_chartables.c
|
||||
pcre32_compile.c
|
||||
pcre32_config.c
|
||||
pcre32_dfa_exec.c
|
||||
pcre32_exec.c
|
||||
pcre32_fullinfo.c
|
||||
pcre32_get.c
|
||||
pcre32_globals.c
|
||||
pcre32_jit_compile.c
|
||||
pcre32_maketables.c
|
||||
pcre32_newline.c
|
||||
pcre32_ord2utf32.c
|
||||
pcre32_refcount.c
|
||||
pcre32_string_utils.c
|
||||
pcre32_study.c
|
||||
pcre32_tables.c
|
||||
pcre32_ucd.c
|
||||
pcre32_utf32_utils.c
|
||||
pcre32_valid_utf32.c
|
||||
pcre32_version.c
|
||||
pcre32_xclass.c
|
||||
|
||||
(9) If you want to build the POSIX wrapper functions (which apply only to the
|
||||
8-bit library), ensure that you have the pcreposix.h file and then compile
|
||||
pcreposix.c (remembering -DHAVE_CONFIG_H if necessary). Link the result
|
||||
(on its own) as the pcreposix library.
|
||||
|
||||
(10) The pcretest program can be linked with any combination of the 8-bit,
|
||||
16-bit and 32-bit libraries (depending on what you selected in config.h).
|
||||
Compile pcretest.c and pcre_printint.c (again, don't forget
|
||||
-DHAVE_CONFIG_H) and link them together with the appropriate library/ies.
|
||||
If you compiled an 8-bit library, pcretest also needs the pcreposix
|
||||
wrapper library unless you compiled it with -DNOPOSIX.
|
||||
|
||||
(11) Run pcretest on the testinput files in the testdata directory, and check
|
||||
that the output matches the corresponding testoutput files. There are
|
||||
comments about what each test does in the section entitled "Testing PCRE"
|
||||
in the README file. If you compiled more than one of the 8-bit, 16-bit and
|
||||
32-bit libraries, you need to run pcretest with the -16 option to do
|
||||
16-bit tests and with the -32 option to do 32-bit tests.
|
||||
|
||||
Some tests are relevant only when certain build-time options are selected.
|
||||
For example, test 4 is for UTF-8/UTF-16/UTF-32 support, and will not run
|
||||
if you have built PCRE without it. See the comments at the start of each
|
||||
testinput file. If you have a suitable Unix-like shell, the RunTest script
|
||||
will run the appropriate tests for you. The command "RunTest list" will
|
||||
output a list of all the tests.
|
||||
|
||||
Note that the supplied files are in Unix format, with just LF characters
|
||||
as line terminators. You may need to edit them to change this if your
|
||||
system uses a different convention. If you are using Windows, you probably
|
||||
should use the wintestinput3 file instead of testinput3 (and the
|
||||
corresponding output file). This is a locale test; wintestinput3 sets the
|
||||
locale to "french" rather than "fr_FR", and there are some minor output
|
||||
differences.
|
||||
|
||||
(12) If you have built PCRE with SUPPORT_JIT, the JIT features will be tested
|
||||
by the testdata files. However, you might also like to build and run
|
||||
the freestanding JIT test program, pcre_jit_test.c.
|
||||
|
||||
(13) If you want to use the pcregrep command, compile and link pcregrep.c; it
|
||||
uses only the basic 8-bit PCRE library (it does not need the pcreposix
|
||||
library).
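
The two configuration routes described in step (1), editing config.h or
passing -D on the compiler command line, rest on the same preprocessor idiom.
The following is a minimal sketch, assuming that config.h.generic wraps each
default in an #ifndef guard (check your own copy). The demo_ function names
are hypothetical and are not part of PCRE.

```c
#include <assert.h>

/* Guarded default in the style assumed for config.h.generic: the value is
   used only when the compiler command line has not already supplied one
   (for example with -DNEWLINE=21 on an EBCDIC system). */
#ifndef NEWLINE
#define NEWLINE 10   /* ASCII LF */
#endif

/* Hypothetical helper: reports the newline code this file was built with. */
int demo_configured_newline(void)
{
  return NEWLINE;
}

/* Hypothetical helper: names the code points mentioned in step (1). */
const char *demo_newline_name(int code)
{
  switch (code)
    {
    case 10: return "ASCII LF";
    case 21: return "EBCDIC NL (0x15)";
    case 37: return "EBCDIC 0x25";
    default: return "other";
    }
}
```

Compiled without any -D option, demo_configured_newline() returns the default
of 10. Compiled with -DNEWLINE=21, the guard is skipped and it returns 21.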
|
||||
|
||||
|
||||
THE C++ WRAPPER FUNCTIONS
|
||||
|
||||
The PCRE distribution also contains some C++ wrapper functions and tests,
|
||||
applicable to the 8-bit library, which were contributed by Google Inc. On a
|
||||
system that can use "configure" and "make", the functions are automatically
|
||||
built into a library called pcrecpp. It should be straightforward to compile
|
||||
the .cc files manually on other systems. The files called xxx_unittest.cc are
|
||||
test programs for each of the corresponding xxx.cc files.
|
||||
|
||||
|
||||
BUILDING FOR VIRTUAL PASCAL
|
||||
|
||||
A script for building PCRE using Borland's C++ compiler for use with VPASCAL
|
||||
was contributed by Alexander Tokarev. Stefan Weber updated the script and added
|
||||
additional files. The following files in the distribution are for building PCRE
|
||||
for use with VP/Borland: makevp_c.txt, makevp_l.txt, makevp.bat, pcregexp.pas.
|
||||
|
||||
|
||||
STACK SIZE IN WINDOWS ENVIRONMENTS
|
||||
|
||||
The default processor stack size of 1MB in some Windows environments is too
|
||||
small for matching patterns that need much recursion. In particular, test 2 may
|
||||
fail because of this. Normally, running out of stack causes a crash, but there
|
||||
have been cases where the test program has just died silently. See your linker
|
||||
documentation for how to increase stack size if you experience problems. The
|
||||
Linux default of 8MB is a reasonable choice for the stack, though even that can
|
||||
be too small for some pattern/subject combinations.
|
||||
|
||||
PCRE has a compile configuration option to disable the use of stack for
|
||||
recursion so that heap is used instead. However, pattern matching is
|
||||
significantly slower when this is done. There is more about stack usage in the
|
||||
"pcrestack" documentation.
|
||||
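
To get a feel for why a small stack can run out, the recursion depth can be
bounded by dividing the stack size by the stack consumed per recursive call.
This is a rough sketch only. The 512-byte frame size used in the note below is
an assumed figure chosen for illustration, since the real value depends on the
compiler and the build options. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: upper bound on recursion depth for a given stack
   size, assuming every recursive call consumes frame_bytes of stack. */
size_t demo_max_recursion_depth(size_t stack_bytes, size_t frame_bytes)
{
  assert(frame_bytes > 0);
  return stack_bytes / frame_bytes;
}
```

With the assumed 512-byte frame, a 1048576-byte stack allows at most 2048
nested calls and an 8388608-byte stack at most 16384. A pattern that recurses
once per subject character can exceed either bound on a long subject.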
|
||||
|
||||
LINKING PROGRAMS IN WINDOWS ENVIRONMENTS
|
||||
|
||||
If you want to statically link a program against a PCRE library in the form of
|
||||
a non-dll .a file, you must define PCRE_STATIC before including pcre.h or
|
||||
pcrecpp.h, otherwise the pcre_malloc() and pcre_free() exported functions will
|
||||
be declared __declspec(dllimport), with unwanted results.
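
In practice this means writing "#define PCRE_STATIC" ahead of the #include of
pcre.h, or passing -DPCRE_STATIC to the compiler. The sketch below imitates
the mechanism with hypothetical DEMO_ names rather than the real pcre.h
macros, so that it compiles without PCRE installed. It assumes that the
header selects the dllimport form only on Windows and only when PCRE_STATIC
is not defined.

```c
#include <assert.h>

/* Must come before the header (here: before the imitation of it below). */
#define PCRE_STATIC

/* Imitation of the selection logic. DEMO_EXP_DECL and DEMO_USES_DLLIMPORT
   are hypothetical names used only in this sketch. */
#if defined(_WIN32) && !defined(PCRE_STATIC)
#  define DEMO_EXP_DECL extern __declspec(dllimport)
#  define DEMO_USES_DLLIMPORT 1
#else
#  define DEMO_EXP_DECL extern
#  define DEMO_USES_DLLIMPORT 0
#endif

/* Declared the way an exported library function would be. */
DEMO_EXP_DECL int demo_library_function(int value);

/* In a static build the definition is linked straight into the program. */
int demo_library_function(int value)
{
  return value + 1;
}

/* Reports which kind of declaration was selected at compile time. */
int demo_uses_dllimport(void)
{
  return DEMO_USES_DLLIMPORT;
}
```

Because PCRE_STATIC is defined first, the plain extern form is chosen on every
platform and demo_uses_dllimport() returns 0.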
|
||||
|
||||
|
||||
CALLING CONVENTIONS IN WINDOWS ENVIRONMENTS
|
||||
|
||||
It is possible to compile programs to use different calling conventions using
|
||||
MSVC. Search the web for "calling conventions" for more information. To make it
|
||||
easier to change the calling convention for the exported functions in the
|
||||
PCRE library, the macro PCRE_CALL_CONVENTION is present in all the external
|
||||
definitions. It can be set externally when compiling (e.g. in CFLAGS). If it is
|
||||
not set, it defaults to empty; the default calling convention is then used
|
||||
(which is what is wanted most of the time).
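
The idiom can be sketched as follows. The function demo_add is hypothetical,
and the -D setting shown in the comment is only an illustration of what might
be placed in CFLAGS for MSVC.

```c
#include <assert.h>

/* If nothing was passed on the command line (for example
   -DPCRE_CALL_CONVENTION=__stdcall), fall back to empty so that the
   compiler's default calling convention applies. */
#ifndef PCRE_CALL_CONVENTION
#define PCRE_CALL_CONVENTION
#endif

/* Hypothetical exported function: the macro sits between the return type
   and the function name. */
int PCRE_CALL_CONVENTION demo_add(int a, int b)
{
  return a + b;
}
```

With the macro empty, the definition reads as an ordinary C function and
behaves identically from the caller's point of view.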
|
||||
|
||||
|
||||
COMMENTS ABOUT WIN32 BUILDS (see also "BUILDING PCRE ON WINDOWS WITH CMAKE")
|
||||
|
||||
There are two ways of building PCRE using the "configure, make, make install"
|
||||
paradigm on Windows systems: using MinGW or using Cygwin. These are not at all
|
||||
the same thing; they are completely different from each other. There is also
|
||||
support for building using CMake, which some users find a more straightforward
|
||||
way of building PCRE under Windows.
|
||||
|
||||
The MinGW home page (http://www.mingw.org/) says this:
|
||||
|
||||
MinGW: A collection of freely available and freely distributable Windows
|
||||
specific header files and import libraries combined with GNU toolsets that
|
||||
allow one to produce native Windows programs that do not rely on any
|
||||
3rd-party C runtime DLLs.
|
||||
|
||||
The Cygwin home page (http://www.cygwin.com/) says this:
|
||||
|
||||
Cygwin is a Linux-like environment for Windows. It consists of two parts:
|
||||
|
||||
. A DLL (cygwin1.dll) which acts as a Linux API emulation layer providing
|
||||
substantial Linux API functionality
|
||||
|
||||
. A collection of tools which provide Linux look and feel.
|
||||
|
||||
The Cygwin DLL currently works with all recent, commercially released x86 32
|
||||
bit and 64 bit versions of Windows, with the exception of Windows CE.
|
||||
|
||||
On both MinGW and Cygwin, PCRE should build correctly using:
|
||||
|
||||
./configure && make && make install
|
||||
|
||||
This should create two libraries called libpcre and libpcreposix, and, if you
|
||||
have enabled building the C++ wrapper, a third one called libpcrecpp. These are
|
||||
independent libraries: when you link with libpcreposix or libpcrecpp you must
|
||||
also link with libpcre, which contains the basic functions. (Some earlier
|
||||
releases of PCRE included the basic libpcre functions in libpcreposix. This no
|
||||
longer happens.)
|
||||
|
||||
A user submitted a special-purpose patch that makes it easy to create
|
||||
"pcre.dll" under mingw32 using the "msys" environment. It provides "pcre.dll"
|
||||
as a special target. If you use this target, no other files are built, and in
|
||||
particular, the pcretest and pcregrep programs are not built. An example of how
|
||||
this might be used is:
|
||||
|
||||
./configure --enable-utf --disable-cpp CFLAGS="-O3 -s"; make pcre.dll
|
||||
|
||||
Using Cygwin's compiler generates libraries and executables that depend on
|
||||
cygwin1.dll. If a library that is generated this way is distributed,
|
||||
cygwin1.dll has to be distributed as well. Since cygwin1.dll is under the GPL
|
||||
licence, this forces not only PCRE to be under the GPL, but also the entire
|
||||
application. A distributor who wants to keep their own code proprietary must
|
||||
purchase an appropriate Cygwin licence.
|
||||
|
||||
MinGW has no such restrictions. The MinGW compiler generates a library or
|
||||
executable that can run standalone on Windows without any third party dll or
|
||||
licensing issues.
|
||||
|
||||
But there is more complication:
|
||||
|
||||
If a Cygwin user uses the -mno-cygwin Cygwin gcc flag, what that really does is
|
||||
to tell Cygwin's gcc to use the MinGW gcc. Cygwin's gcc is only acting as a
|
||||
front end to MinGW's gcc (if you install Cygwin's gcc, you get both Cygwin's
|
||||
gcc and MinGW's gcc). So, a user can:
|
||||
|
||||
. Build native binaries by using MinGW or by getting Cygwin and using
|
||||
-mno-cygwin.
|
||||
|
||||
. Build binaries that depend on cygwin1.dll by using Cygwin with the normal
|
||||
compiler flags.
|
||||
|
||||
The test files that are supplied with PCRE are in UNIX format, with LF
|
||||
characters as line terminators. Unless your PCRE library uses a default newline
|
||||
option that includes LF as a valid newline, it may be necessary to change the
|
||||
line terminators in the test files to get some of the tests to work.
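
If the line terminators do need changing, a small filter is enough. The
sketch below converts LF to CR LF in memory and leaves existing CR LF pairs
alone. Reading and writing the files (in binary mode) is left out, and the
function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copies src to dst, writing CR LF for every LF that
   is not already preceded by CR. Returns the number of bytes written, not
   counting the terminating NUL, or -1 if dst is too small. */
long demo_lf_to_crlf(const char *src, char *dst, size_t dst_size)
{
  size_t out = 0;
  char prev = '\0';

  assert(src != NULL && dst != NULL);

  for (; *src != '\0'; src++)
    {
    size_t needed = (*src == '\n' && prev != '\r') ? 2 : 1;
    if (out + needed >= dst_size) return -1;   /* keep room for the NUL */
    if (needed == 2) dst[out++] = '\r';
    dst[out++] = *src;
    prev = *src;
    }

  if (out >= dst_size) return -1;
  dst[out] = '\0';
  return (long)out;
}
```

Running the filter over text that is already in CR LF form leaves it
unchanged, so it is safe to apply more than once.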
|
||||
|
||||
|
||||
BUILDING PCRE ON WINDOWS WITH CMAKE
|
||||
|
||||
CMake is an alternative configuration facility that can be used instead of
|
||||
"configure". CMake creates project files (make files, solution files, etc.)
|
||||
tailored to numerous development environments, including Visual Studio,
|
||||
Borland, Msys, MinGW, NMake, and Unix. If possible, use short paths with no
|
||||
spaces in the names for your CMake installation and your PCRE source and build
|
||||
directories.
|
||||
|
||||
The following instructions were contributed by a PCRE user. If they are not
|
||||
followed exactly, errors may occur. In the event that errors do occur, it is
|
||||
recommended that you delete the CMake cache before attempting to repeat the
|
||||
CMake build process. In the CMake GUI, the cache can be deleted by selecting
|
||||
"File > Delete Cache".
|
||||
|
||||
1. Install the latest CMake version available from http://www.cmake.org/, and
|
||||
ensure that cmake\bin is on your path.
|
||||
|
||||
2. Unzip (retaining folder structure) the PCRE source tree into a source
|
||||
directory such as C:\pcre. You should ensure your local date and time
|
||||
is not earlier than the file dates in your source dir if the release is
|
||||
very new.
|
||||
|
||||
3. Create a new, empty build directory, preferably a subdirectory of the
|
||||
source dir. For example, C:\pcre\pcre-xx\build.
|
||||
|
||||
4. Run cmake-gui from the Shell environment of your build tool, for example,
|
||||
Msys for Msys/MinGW or Visual Studio Command Prompt for VC/VC++. Do not try
|
||||
to start CMake from the Windows Start menu, as this can lead to errors.
|
||||
|
||||
5. Enter C:\pcre\pcre-xx and C:\pcre\pcre-xx\build for the source and build
|
||||
directories, respectively.
|
||||
|
||||
6. Hit the "Configure" button.
|
||||
|
||||
7. Select the particular IDE / build tool that you are using (Visual
|
||||
Studio, MSYS makefiles, MinGW makefiles, etc.)
|
||||
|
||||
8. The GUI will then list several configuration options. This is where
|
||||
you can enable UTF-8 support or other PCRE optional features.
|
||||
|
||||
9. Hit "Configure" again. The adjacent "Generate" button should now be
|
||||
active.
|
||||
|
||||
10. Hit "Generate".
|
||||
|
||||
11. The build directory should now contain a usable build system, be it a
|
||||
solution file for Visual Studio, makefiles for MinGW, etc. Exit from
|
||||
cmake-gui and use the generated build system with your compiler or IDE.
|
||||
E.g., for MinGW you can run "make", or for Visual Studio, open the PCRE
|
||||
solution, select the desired configuration (Debug, or Release, etc.) and
|
||||
build the ALL_BUILD project.
|
||||
|
||||
12. If during configuration with cmake-gui you've elected to build the test
|
||||
programs, you can execute them by building the test project. E.g., for
|
||||
MinGW: "make test"; for Visual Studio build the RUN_TESTS project. The
|
||||
most recent build configuration is targeted by the tests. A summary of
|
||||
test results is presented. Complete test output is subsequently
|
||||
available for review in Testing\Temporary under your build dir.
|
||||
|
||||
|
||||
USE OF RELATIVE PATHS WITH CMAKE ON WINDOWS
|
||||
|
||||
A PCRE user comments as follows: I thought that others may want to know the
|
||||
current state of CMAKE_USE_RELATIVE_PATHS support on Windows. Here it is:
|
||||
|
||||
-- AdditionalIncludeDirectories is only partially modified (only the
|
||||
first path - see below)
|
||||
-- Only some of the contained file paths are modified - shown below for
|
||||
pcre.vcproj
|
||||
-- It properly modifies
|
||||
|
||||
I am sure CMake people can fix that if they want to. Until then one will
|
||||
need to replace existing absolute paths in project files with relative
|
||||
paths manually (e.g. from VS) - relative to project file location. I did
|
||||
just that before being told to try CMAKE_USE_RELATIVE_PATHS. Not a big
|
||||
deal.
|
||||
|
||||
AdditionalIncludeDirectories="E:\builds\pcre\build;E:\builds\pcre\pcre-7.5;"
|
||||
AdditionalIncludeDirectories=".;E:\builds\pcre\pcre-7.5;"
|
||||
|
||||
RelativePath="pcre.h"
|
||||
RelativePath="pcre_chartables.c"
|
||||
RelativePath="pcre_chartables.c.rule"
|
||||
|
||||
|
||||
TESTING WITH RUNTEST.BAT
|
||||
|
||||
If configured with CMake, building the test project ("make test" or building
|
||||
RUN_TESTS in Visual Studio) creates (and runs) pcre_test.bat (and depending
|
||||
on your configuration options, possibly other test programs) in the build
|
||||
directory. Pcre_test.bat runs RunTest.Bat with correct source and exe paths.
|
||||
|
||||
For manual testing with RunTest.bat, provided the build dir is a subdirectory
|
||||
of the source directory: Open command shell window. Chdir to the location
|
||||
of your pcretest.exe and pcregrep.exe programs. Call RunTest.bat with
|
||||
"..\RunTest.Bat" or "..\..\RunTest.bat" as appropriate.
|
||||
|
||||
To run only a particular test with RunTest.bat, provide a test number argument.
|
||||
|
||||
Otherwise:
|
||||
|
||||
1. Copy RunTest.bat into the directory where pcretest.exe and pcregrep.exe
|
||||
have been created.
|
||||
|
||||
2. Edit RunTest.bat to identify the full or relative location of
|
||||
the pcre source (in which the testdata folder resides), e.g.:
|
||||
|
||||
set srcdir=C:\pcre\pcre-8.20
|
||||
|
||||
3. In a Windows command environment, chdir to the location of your bat and
|
||||
exe programs.
|
||||
|
||||
4. Run RunTest.bat. Test outputs will automatically be compared to expected
|
||||
results, and discrepancies will be identified in the console output.
|
||||
|
||||
To independently test the just-in-time compiler, run pcre_jit_test.exe.
|
||||
To test pcrecpp, run pcrecpp_unittest.exe, pcre_stringpiece_unittest.exe and
|
||||
pcre_scanner_unittest.exe.
|
||||
|
||||
|
||||
BUILDING UNDER WINDOWS CE WITH VISUAL STUDIO 200x
|
||||
|
||||
Vincent Richomme sent a zip archive of files to help with this process. They
|
||||
can be found in the file "pcre-vsbuild.zip" in the Contrib directory of the FTP
|
||||
site.
|
||||
|
||||
|
||||
BUILDING UNDER WINDOWS WITH BCC5.5
|
||||
|
||||
Michael Roy sent these comments about building PCRE under Windows with BCC5.5:
|
||||
|
||||
Some of the core BCC libraries have a version of PCRE from 1998 built in, which
|
||||
can lead to pcre_exec() giving an erroneous PCRE_ERROR_NULL from a version
|
||||
mismatch. I'm including an easy workaround below, if you'd like to include it
|
||||
in the non-unix instructions:
|
||||
|
||||
When linking a project with BCC5.5, pcre.lib must be included before any of the
|
||||
libraries cw32.lib, cw32i.lib, cw32mt.lib, and cw32mti.lib on the command line.
|
||||
|
||||
|
||||
BUILDING USING BORLAND C++ BUILDER 2007 (CB2007) AND HIGHER
|
||||
|
||||
A PCRE user sent these comments about this environment (see also the comment
|
||||
from another user that follows them):
|
||||
|
||||
The XE versions of C++ Builder come with a RegularExpressionsCore class which
|
||||
contains a version of TPerlRegEx. However, direct use of the C PCRE library may
|
||||
be desirable.
|
||||
|
||||
The default makevp.bat, however, supplied with PCRE builds a version of PCRE
|
||||
that is not usable with any version of C++ Builder because the compiler ships
|
||||
with an embedded version of PCRE, version 2.01 from 1998! [See also the note
|
||||
about BCC5.5 above.] If you want to use PCRE you'll need to rename the
|
||||
functions (pcre_compile to pcre_compile_bcc, etc) or do as I have done and just
|
||||
use the 16 bit versions. I'm using std::wstring everywhere anyway. Since the
|
||||
embedded version of PCRE does not have the 16 bit function names, there is no
|
||||
conflict.
|
||||
|
||||
Building PCRE using a C++ Builder static library project file (recommended):
|
||||
|
||||
1. Rename or remove pcre.h, pcreposi.h, and pcreposix.h from your C++ Builder
|
||||
original include path.
|
||||
|
||||
2. Download PCRE from pcre.org and extract to a directory.
|
||||
|
||||
3. Rename pcre_chartables.c.dist to pcre_chartables.c, pcre.h.generic to
|
||||
pcre.h, and config.h.generic to config.h.
|
||||
|
||||
4. Edit pcre.h and pcre_config.c so that they include config.h.
|
||||
|
||||
5. Edit config.h like so:
|
||||
|
||||
Comment out the following lines:
|
||||
#define PACKAGE "pcre"
|
||||
#define PACKAGE_BUGREPORT ""
|
||||
#define PACKAGE_NAME "PCRE"
|
||||
#define PACKAGE_STRING "PCRE 8.32"
|
||||
#define PACKAGE_TARNAME "pcre"
|
||||
#define PACKAGE_URL ""
|
||||
#define PACKAGE_VERSION "8.32"
|
||||
|
||||
Add the following lines:
|
||||
#ifndef SUPPORT_UTF
|
||||
#define SUPPORT_UTF 100 // any value is fine
|
||||
#endif
|
||||
|
||||
#ifndef SUPPORT_UCP
|
||||
#define SUPPORT_UCP 101 // any value is fine
|
||||
#endif
|
||||
|
||||
#ifndef SUPPORT_PCRE16
|
||||
#define SUPPORT_PCRE16 102 // any value is fine
|
||||
#endif
|
||||
|
||||
#ifndef SUPPORT_UTF8
|
||||
#define SUPPORT_UTF8 103 // any value is fine
|
||||
#endif
|
||||
|
||||
6. Build a C++ Builder project using the IDE. Go to File / New / Other and
|
||||
choose Static Library. You can name it pcre.cbproj or whatever. Now set your
|
||||
paths by going to Project / Options. Set the Include path. Do this from the
|
||||
"Base" option to apply to both Release and Debug builds. Now add the following
|
||||
files to the project:
|
||||
|
||||
pcre.h
|
||||
pcre16_byte_order.c
|
||||
pcre16_chartables.c
|
||||
pcre16_compile.c
|
||||
pcre16_config.c
|
||||
pcre16_dfa_exec.c
|
||||
pcre16_exec.c
|
||||
pcre16_fullinfo.c
|
||||
pcre16_get.c
|
||||
pcre16_globals.c
|
||||
pcre16_maketables.c
|
||||
pcre16_newline.c
|
||||
pcre16_ord2utf16.c
|
||||
pcre16_printint.c
|
||||
pcre16_refcount.c
|
||||
pcre16_string_utils.c
|
||||
pcre16_study.c
|
||||
pcre16_tables.c
|
||||
pcre16_ucd.c
|
||||
pcre16_utf16_utils.c
|
||||
pcre16_valid_utf16.c
|
||||
pcre16_version.c
|
||||
pcre16_xclass.c
|
||||
|
||||
//Optional
|
||||
pcre_version.c
|
||||
|
||||
7. After compiling the .lib file, copy the .lib and header files to a project
|
||||
you want to use PCRE with. Enjoy.
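
A point worth checking when hand-editing config.h as in step 5: an #ifndef
guard only protects the macro that it names. The sketch below uses
hypothetical DEMO_ macros to show that a guard testing a different macro,
one that is already defined, silently skips the definition inside it.

```c
#include <assert.h>

/* Pretend an earlier part of config.h already enabled this option. */
#define DEMO_OPTION_A 1

/* Matching guard: tests the same macro that it defines. */
#ifndef DEMO_OPTION_B
#define DEMO_OPTION_B 1
#endif

/* Mismatched guard: tests DEMO_OPTION_A, which is already defined, so the
   definition of DEMO_OPTION_C inside it is skipped. */
#ifndef DEMO_OPTION_A
#define DEMO_OPTION_C 1
#endif

/* Returns 1 if DEMO_OPTION_B ended up defined, otherwise 0. */
int demo_option_b_enabled(void)
{
#ifdef DEMO_OPTION_B
  return 1;
#else
  return 0;
#endif
}

/* Returns 1 if DEMO_OPTION_C ended up defined, otherwise 0. */
int demo_option_c_enabled(void)
{
#ifdef DEMO_OPTION_C
  return 1;
#else
  return 0;
#endif
}
```

Here demo_option_b_enabled() returns 1 and demo_option_c_enabled() returns 0,
so each guarded pair in config.h should name the same macro on both lines.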
|
||||
|
||||
Optional ... Building PCRE using the makevp.bat file:
|
||||
|
||||
1. Edit makevp_c.txt and makevp_l.txt and change all the names to the 16 bit
|
||||
versions.
|
||||
|
||||
2. Edit makevp.bat and set the path to C++ Builder. Run makevp.bat.
|
||||
|
||||
Another PCRE user added this comment:
|
||||
|
||||
Another approach I successfully used for some years with BCB 5 and 6 was to
|
||||
make sure that include and library paths of PCRE are configured before the
|
||||
default paths of the IDE in the dialogs where one can manage those paths.
|
||||
Afterwards one can open the project files using a text editor and manually add
|
||||
the self created library for pcre itself, pcrecpp doesn't ship with the IDE, in
|
||||
the library nodes where the IDE manages its own libraries to link against in
|
||||
front of the IDE-own libraries. This way one can use the default PCRE function
|
||||
names without getting access violations on runtime.
|
||||
|
||||
<ALLLIB value="libpcre.lib $(LIBFILES) $(LIBRARIES) import32.lib cp32mt.lib"/>
|
||||
|
||||
|
||||
BUILDING PCRE ON OPENVMS
|
||||
|
||||
Stephen Hoffman sent the following, in December 2012:
|
||||
|
||||
"Here <http://labs.hoffmanlabs.com/node/1847> is a very short write-up on the
|
||||
OpenVMS port and here
|
||||
|
||||
<http://labs.hoffmanlabs.com/labsnotes/pcre-vms-8_32.zip>
|
||||
|
||||
is a zip with the OpenVMS files, and with one modified testing-related PCRE
|
||||
file." This is a port of PCRE 8.32.
|
||||
|
||||
Earlier, Dan Mooney sent the following comments about building PCRE on OpenVMS.
|
||||
They relate to an older version of PCRE that used fewer source files, so the
|
||||
exact commands will need changing. See the current list of source files above.
|
||||
|
||||
"It was quite easy to compile and link the library. I don't have a formal
|
||||
make file but the attached file [reproduced below] contains the OpenVMS DCL
|
||||
commands I used to build the library. I had to add #define
|
||||
POSIX_MALLOC_THRESHOLD 10 to pcre.h since it was not defined anywhere.
|
||||
|
||||
The library was built on:
|
||||
O/S: HP OpenVMS v7.3-1
|
||||
Compiler: Compaq C v6.5-001-48BCD
|
||||
Linker: vA13-01
|
||||
|
||||
The test results did not match 100% due to the issues you mention in your
|
||||
documentation regarding isprint(), iscntrl(), isgraph() and ispunct(). I
|
||||
modified some of the character tables temporarily and was able to get the
|
||||
results to match. Tests using the fr locale did not match since I don't have
|
||||
that locale loaded. The study size was always reported to be 3 less than the
|
||||
value in the standard test output files."
|
||||
|
||||
=========================
|
||||
$! This DCL procedure builds PCRE on OpenVMS
|
||||
$!
|
||||
$! I followed the instructions in the non-unix-use file in the distribution.
|
||||
$!
|
||||
$ COMPILE == "CC/LIST/NOMEMBER_ALIGNMENT/PREFIX_LIBRARY_ENTRIES=ALL_ENTRIES"
|
||||
$ COMPILE DFTABLES.C
|
||||
$ LINK/EXE=DFTABLES.EXE DFTABLES.OBJ
|
||||
$ RUN DFTABLES.EXE/OUTPUT=CHARTABLES.C
|
||||
$ COMPILE MAKETABLES.C
|
||||
$ COMPILE GET.C
|
||||
$ COMPILE STUDY.C
|
||||
$! I had to set POSIX_MALLOC_THRESHOLD to 10 in PCRE.H since the symbol
|
||||
$! did not seem to be defined anywhere.
|
||||
$! I edited pcre.h and added #DEFINE SUPPORT_UTF8 to enable UTF8 support.
|
||||
$ COMPILE PCRE.C
|
||||
$ LIB/CREATE PCRE MAKETABLES.OBJ, GET.OBJ, STUDY.OBJ, PCRE.OBJ
|
||||
$! I had to set POSIX_MALLOC_THRESHOLD to 10 in PCRE.H since the symbol
|
||||
$! did not seem to be defined anywhere.
|
||||
$ COMPILE PCREPOSIX.C
|
||||
$ LIB/CREATE PCREPOSIX PCREPOSIX.OBJ
|
||||
$ COMPILE PCRETEST.C
|
||||
$ LINK/EXE=PCRETEST.EXE PCRETEST.OBJ, PCRE/LIB, PCREPOSIX/LIB
|
||||
$! C programs that want access to command line arguments must be
|
||||
$! defined as a symbol
|
||||
$ PCRETEST :== "$ SYS$ROADSUSERS:[DMOONEY.REGEXP]PCRETEST.EXE"
|
||||
$! Arguments must be enclosed in quotes.
|
||||
$ PCRETEST "-C"
|
||||
$! Test results:
|
||||
$!
|
||||
$! The test results did not match 100%. The functions isprint(), iscntrl(),
|
||||
$! isgraph() and ispunct() on OpenVMS must not produce the same results
|
||||
$! as the system that built the test output files provided with the
|
||||
$! distribution.
|
||||
$!
|
||||
$! The study size did not match and was always 3 less on OpenVMS.
|
||||
$!
|
||||
$! Locale could not be set to fr
|
||||
$!
|
||||
=========================
|
||||
|
||||
|
||||
BUILDING PCRE ON STRATUS OPENVOS
|
||||
|
||||
These notes on the port of PCRE to VOS (lightly edited) were supplied by
|
||||
Ashutosh Warikoo, whose email address has the local part awarikoo and the
|
||||
domain nse.co.in. The port was for version 7.9 in August 2009.
|
||||
|
||||
1. Building PCRE
|
||||
|
||||
I built pcre on OpenVOS Release 17.0.1at using GNU Tools 3.4a without any
|
||||
problems. I used the following packages to build PCRE:
|
||||
|
||||
ftp://ftp.stratus.com/pub/vos/posix/ga/posix.save.evf.gz
|
||||
|
||||
Please read and follow the instructions that come with these packages. To start
|
||||
the build of pcre, from the root of the package type:
|
||||
|
||||
./build.sh
|
||||
|
||||
2. Installing PCRE
|
||||
|
||||
Once you have successfully built PCRE, log in to the SysAdmin group, switch to
|
||||
the root user, and type
|
||||
|
||||
[ !create_dir (master_disk)>usr --if needed ]
|
||||
[ !create_dir (master_disk)>usr>local --if needed ]
|
||||
!gmake install
|
||||
|
||||
This installs PCRE and its man pages into /usr/local. You can add
|
||||
(master_disk)>usr>local>bin to your command search paths, or if you are in
|
||||
BASH, add /usr/local/bin to the PATH environment variable.
|
||||
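As a minimal sketch of the BASH case, the following adds /usr/local/bin to
PATH only if it is not already there. The function name path_prepend is
hypothetical and is introduced here purely for illustration.

```shell
#!/bin/sh
# Hypothetical helper: prepend a directory to PATH unless it is already listed.
path_prepend() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already present, nothing to do
        *) PATH="$1:$PATH" ;;        # otherwise put it at the front
    esac
}

path_prepend /usr/local/bin
export PATH
echo "$PATH"
```

Placing the call in a shell start-up file makes the setting persistent; which
file is appropriate depends on the local environment.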
|
||||
4. Restrictions
|
||||
|
||||
This port can optionally use the readline library. However, during the build I
|
||||
encountered some as yet unexplored errors while linking with readline. As it was an
|
||||
optional component I chose to disable it.
|
||||
|
||||
5. Known Problems
|
||||
|
||||
I ran the test suite, but you will have to be your own judge of whether this
|
||||
command, and this port, suits your purposes. If you find any problems that
|
||||
appear to be related to the port itself, please let me know. Please see the
|
||||
build.log file in the root of the package also.
|
||||
|
||||
|
||||
BUILDING PCRE ON NATIVE Z/OS AND Z/VM
|
||||
|
||||
z/OS and z/VM are operating systems for mainframe computers, produced by IBM.
|
||||
The character code used is EBCDIC, not ASCII or Unicode. In z/OS, UNIX APIs and
|
||||
applications can be supported through UNIX System Services, and in such an
|
||||
environment PCRE can be built in the same way as in other systems. However, in
|
||||
native z/OS (without UNIX System Services) and in z/VM, special ports are
|
||||
required. For details, please see this web site:
|
||||
|
||||
http://www.zaconsultants.net
|
||||
|
||||
There is also a mirror here:
|
||||
|
||||
http://www.vsoft-software.com/downloads.html
|
||||
|
||||
==========================
|
||||
Last Updated: 14 May 2013
|
991
tools/pcre/doc/html/README.txt
Normal file
@ -0,0 +1,991 @@
|
||||
README file for PCRE (Perl-compatible regular expression library)
|
||||
-----------------------------------------------------------------
|
||||
|
||||
The latest release of PCRE is always available in three alternative formats
|
||||
from:
|
||||
|
||||
ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre-xxx.tar.gz
|
||||
ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre-xxx.tar.bz2
|
||||
ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre-xxx.zip
|
||||
|
||||
There is a mailing list for discussion about the development of PCRE at
|
||||
pcre-dev@exim.org. You can access the archives and subscribe or manage your
|
||||
subscription here:
|
||||
|
||||
https://lists.exim.org/mailman/listinfo/pcre-dev
|
||||
|
||||
Please read the NEWS file if you are upgrading from a previous release.
|
||||
The contents of this README file are:
|
||||
|
||||
The PCRE APIs
|
||||
Documentation for PCRE
|
||||
Contributions by users of PCRE
|
||||
Building PCRE on non-Unix-like systems
|
||||
Building PCRE without using autotools
|
||||
Building PCRE using autotools
|
||||
Retrieving configuration information
|
||||
Shared libraries
|
||||
Cross-compiling using autotools
|
||||
Using HP's ANSI C++ compiler (aCC)
|
||||
Compiling in Tru64 using native compilers
|
||||
Using Sun's compilers for Solaris
|
||||
Using PCRE from MySQL
|
||||
Making new tarballs
|
||||
Testing PCRE
|
||||
Character tables
|
||||
File manifest
|
||||
|
||||
|
||||
The PCRE APIs
|
||||
-------------
|
||||
|
||||
PCRE is written in C, and it has its own API. There are three sets of
|
||||
functions, one for the 8-bit library, which processes strings of bytes, one for
|
||||
the 16-bit library, which processes strings of 16-bit values, and one for the
|
||||
32-bit library, which processes strings of 32-bit values. The distribution also
|
||||
includes a set of C++ wrapper functions (see the pcrecpp man page for details),
|
||||
courtesy of Google Inc., which can be used to call the 8-bit PCRE library from
|
||||
C++.
|
||||
|
||||
In addition, there is a set of C wrapper functions (again, just for the 8-bit
|
||||
library) that are based on the POSIX regular expression API (see the pcreposix
|
||||
man page). These end up in the library called libpcreposix. Note that this just
|
||||
provides a POSIX calling interface to PCRE; the regular expressions themselves
|
||||
still follow Perl syntax and semantics. The POSIX API is restricted, and does
|
||||
not give full access to all of PCRE's facilities.
|
||||
|
||||
The header file for the POSIX-style functions is called pcreposix.h. The
|
||||
official POSIX name is regex.h, but I did not want to risk possible problems
|
||||
with existing files of that name by distributing it that way. To use PCRE with
|
||||
an existing program that uses the POSIX API, pcreposix.h will have to be
|
||||
renamed or pointed at by a link.
|
||||
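One way to provide such a link is sketched below. The directory names are
hypothetical, and a stand-in file is created so that the example runs without
PCRE installed; in a real build the link would point at the installed
pcreposix.h. With most compilers, a directory given with -I is searched before
the system include directories, so the application then finds this regex.h
first.

```shell
#!/bin/sh
# Sketch: expose pcreposix.h under the name regex.h in a private include
# directory, so that existing code using #include <regex.h> picks it up.
# All paths here are hypothetical stand-ins.
workdir=$(mktemp -d)
mkdir -p "$workdir/pcre-include" "$workdir/compat-include"

# Stand-in for the installed header (normally <prefix>/include/pcreposix.h).
echo '/* placeholder for pcreposix.h */' > "$workdir/pcre-include/pcreposix.h"

# Create the link; the application is then compiled with
# -I"$workdir/compat-include".
ln -s "$workdir/pcre-include/pcreposix.h" "$workdir/compat-include/regex.h"

ls -l "$workdir/compat-include/regex.h"
```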
|
||||
If you are using the POSIX interface to PCRE and there is already a POSIX regex
|
||||
library installed on your system, as well as worrying about the regex.h header
|
||||
file (as mentioned above), you must also take care when linking programs to
|
||||
ensure that they link with PCRE's libpcreposix library. Otherwise they may pick
|
||||
up the POSIX functions of the same name from the other library.
|
||||
|
||||
One way of avoiding this confusion is to compile PCRE with the addition of
|
||||
-Dregcomp=PCREregcomp (and similarly for the other POSIX functions) to the
|
||||
compiler flags (CFLAGS if you are using "configure" -- see below). This has the
|
||||
effect of renaming the functions so that the names no longer clash. Of course,
|
||||
you have to do the same thing for your applications, or write them using the
|
||||
new names.
|
||||
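The renaming flags can be generated rather than typed by hand. The sketch
below assumes that the functions to rename are the four of the standard POSIX
regex API (regcomp, regexec, regerror, regfree) and follows the
PCRE<name> pattern given above; it only prints the resulting command.

```shell
#!/bin/sh
# Build the -D renaming flags for the POSIX wrapper functions.
# The function list is an assumption based on the standard POSIX regex API.
posix_funcs="regcomp regexec regerror regfree"

rename_flags=""
for f in $posix_funcs; do
    rename_flags="$rename_flags -D$f=PCRE$f"
done
rename_flags=${rename_flags# }    # drop the leading space

# Print (not run) a configure command that uses the flags.
echo "CFLAGS='-O2 $rename_flags' ./configure"
```

The same flags have to be passed when compiling any application that calls
the renamed functions.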
|
||||
|
||||
Documentation for PCRE
|
||||
----------------------
|
||||
|
||||
If you install PCRE in the normal way on a Unix-like system, you will end up
|
||||
with a set of man pages whose names all start with "pcre". The one that is just
|
||||
called "pcre" lists all the others. In addition to these man pages, the PCRE
|
||||
documentation is supplied in two other forms:
|
||||
|
||||
1. There are files called doc/pcre.txt, doc/pcregrep.txt, and
|
||||
doc/pcretest.txt in the source distribution. The first of these is a
|
||||
concatenation of the text forms of all the section 3 man pages except
|
||||
the listing of pcredemo.c and those that summarize individual functions.
|
||||
The other two are the text forms of the section 1 man pages for the
|
||||
pcregrep and pcretest commands. These text forms are provided for ease of
|
||||
scanning with text editors or similar tools. They are installed in
|
||||
<prefix>/share/doc/pcre, where <prefix> is the installation prefix
|
||||
(defaulting to /usr/local).
|
||||
|
||||
2. A set of files containing all the documentation in HTML form, hyperlinked
|
||||
in various ways, and rooted in a file called index.html, is distributed in
|
||||
doc/html and installed in <prefix>/share/doc/pcre/html.
|
||||
|
||||
Users of PCRE have contributed files containing the documentation for various
|
||||
releases in CHM format. These can be found in the Contrib directory of the FTP
|
||||
site (see next section).
|
||||
|
||||
|
||||
Contributions by users of PCRE
|
||||
------------------------------
|
||||
|
||||
You can find contributions from PCRE users in the directory
|
||||
|
||||
ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/Contrib
|
||||
|
||||
There is a README file giving brief descriptions of what they are. Some are
|
||||
complete in themselves; others are pointers to URLs containing relevant files.
|
||||
Some of this material is likely to be well out-of-date. Several of the earlier
|
||||
contributions provided support for compiling PCRE on various flavours of
|
||||
Windows (I myself do not use Windows). Nowadays there is more Windows support
|
||||
in the standard distribution, so these contributions have been archived.
|
||||
|
||||
A PCRE user maintains downloadable Windows binaries of the pcregrep and
|
||||
pcretest programs here:
|
||||
|
||||
http://www.rexegg.com/pcregrep-pcretest.html
|
||||
|
||||
|
||||
Building PCRE on non-Unix-like systems
|
||||
--------------------------------------
|
||||
|
||||
For a non-Unix-like system, please read the comments in the file
|
||||
NON-AUTOTOOLS-BUILD, though if your system supports the use of "configure" and
|
||||
"make" you may be able to build PCRE using autotools in the same way as for
|
||||
many Unix-like systems.
|
||||
|
||||
PCRE can also be configured using the GUI facility provided by CMake's
|
||||
cmake-gui command. This creates Makefiles, solution files, etc. The file
|
||||
NON-AUTOTOOLS-BUILD has information about CMake.
|
||||
|
||||
PCRE has been compiled on many different operating systems. It should be
|
||||
straightforward to build PCRE on any system that has a Standard C compiler and
|
||||
library, because it uses only Standard C functions.
|
||||
|
||||
|
||||
Building PCRE without using autotools
|
||||
-------------------------------------
|
||||
|
||||
The use of autotools (in particular, libtool) is problematic in some
|
||||
environments, even some that are Unix or Unix-like. See the NON-AUTOTOOLS-BUILD
|
||||
file for ways of building PCRE without using autotools.
|
||||
|
||||
|
||||
Building PCRE using autotools
|
||||
-----------------------------
|
||||
|
||||
If you are using HP's ANSI C++ compiler (aCC), please see the special note
|
||||
in the section entitled "Using HP's ANSI C++ compiler (aCC)" below.
|
||||
|
||||
The following instructions assume the use of the widely used "configure; make;
|
||||
make install" (autotools) process.
|
||||
|
||||
To build PCRE on a system that supports autotools, first run the "configure"
|
||||
command from the PCRE distribution directory, with your current directory set
|
||||
to the directory where you want the files to be created. This command is a
|
||||
standard GNU "autoconf" configuration script, for which generic instructions
|
||||
are supplied in the file INSTALL.
|
||||
|
||||
Most commonly, people build PCRE within its own distribution directory, and in
|
||||
this case, on many systems, just running "./configure" is sufficient. However,
|
||||
the usual methods of changing standard defaults are available. For example:
|
||||
|
||||
CFLAGS='-O2 -Wall' ./configure --prefix=/opt/local
|
||||
|
||||
This command specifies that the C compiler should be run with the flags '-O2
|
||||
-Wall' instead of the default, and that "make install" should install PCRE
|
||||
under /opt/local instead of the default /usr/local.
|
||||
|
||||
If you want to build in a different directory, just run "configure" with that
|
||||
directory as current. For example, suppose you have unpacked the PCRE source
|
||||
into /source/pcre/pcre-xxx, but you want to build it in /build/pcre/pcre-xxx:
|
||||
|
||||
cd /build/pcre/pcre-xxx
|
||||
/source/pcre/pcre-xxx/configure
|
||||
|
||||
PCRE is written in C and is normally compiled as a C library. However, it is
|
||||
possible to build it as a C++ library, though the provided building apparatus
|
||||
does not have any features to support this.
|
||||
|
||||
There are some optional features that can be included or omitted from the PCRE
|
||||
library. They are also documented in the pcrebuild man page.
|
||||
|
||||
. By default, both shared and static libraries are built. You can change this
|
||||
by adding one of these options to the "configure" command:
|
||||
|
||||
--disable-shared
|
||||
--disable-static
|
||||
|
||||
(See also "Shared libraries" below.)
|
||||
|
||||
. By default, only the 8-bit library is built. If you add --enable-pcre16 to
|
||||
the "configure" command, the 16-bit library is also built. If you add
|
||||
--enable-pcre32 to the "configure" command, the 32-bit library is also built.
|
||||
If you want only the 16-bit or 32-bit library, use --disable-pcre8 to disable
|
||||
building the 8-bit library.
|
||||
|
||||
. If you are building the 8-bit library and want to suppress the building of
|
||||
the C++ wrapper library, you can add --disable-cpp to the "configure"
|
||||
command. Otherwise, when "configure" is run without --disable-pcre8, it will
|
||||
try to find a C++ compiler and C++ header files, and if it succeeds, it will
|
||||
try to build the C++ wrapper.
|
||||
|
||||
. If you want to include support for just-in-time compiling, which can give
|
||||
large performance improvements on certain platforms, add --enable-jit to the
|
||||
"configure" command. This support is available only for certain hardware
|
||||
architectures. If you try to enable it on an unsupported architecture, there
|
||||
will be a compile time error.
|
||||
|
||||
. When JIT support is enabled, pcregrep automatically makes use of it, unless
|
||||
you add --disable-pcregrep-jit to the "configure" command.
|
||||
|
||||
. If you want to make use of the support for UTF-8 Unicode character strings in
|
||||
the 8-bit library, or UTF-16 Unicode character strings in the 16-bit library,
|
||||
or UTF-32 Unicode character strings in the 32-bit library, you must add
|
||||
--enable-utf to the "configure" command. Without it, the code for handling
|
||||
UTF-8, UTF-16 and UTF-32 is not included in the relevant library. Even
|
||||
when --enable-utf is included, the use of a UTF encoding still has to be
|
||||
enabled by an option at run time. When PCRE is compiled with this option, its
|
||||
input can only be either ASCII or UTF-8/16/32, even when running on EBCDIC
|
||||
platforms. It is not possible to use both --enable-utf and --enable-ebcdic at
|
||||
the same time.
|
||||
|
||||
. There are no separate options for enabling UTF-8, UTF-16 and UTF-32
|
||||
independently because that would allow ridiculous settings such as requesting
|
||||
UTF-16 support while building only the 8-bit library. However, the option
|
||||
--enable-utf8 is retained for backwards compatibility with earlier releases
|
||||
that did not support 16-bit or 32-bit character strings. It is synonymous with
|
||||
--enable-utf. It is not possible to configure one library with UTF support
|
||||
and the other without in the same configuration.
|
||||
|
||||
. If, in addition to support for UTF-8/16/32 character strings, you want to
|
||||
include support for the \P, \p, and \X sequences that recognize Unicode
|
||||
character properties, you must add --enable-unicode-properties to the
|
||||
"configure" command. This adds about 30K to the size of the library (in the
|
||||
form of a property table); only the basic two-letter properties such as Lu
|
||||
are supported.
|
||||
|
||||
. You can build PCRE to recognize either CR or LF or the sequence CRLF or any
|
||||
of the preceding, or any of the Unicode newline sequences as indicating the
|
||||
end of a line. Whatever you specify at build time is the default; the caller
|
||||
of PCRE can change the selection at run time. The default newline indicator
|
||||
is a single LF character (the Unix standard). You can specify the default
|
||||
newline indicator by adding --enable-newline-is-cr or --enable-newline-is-lf
|
||||
or --enable-newline-is-crlf or --enable-newline-is-anycrlf or
|
||||
--enable-newline-is-any to the "configure" command, respectively.
|
||||
|
||||
If you specify --enable-newline-is-cr or --enable-newline-is-crlf, some of
|
||||
the standard tests will fail, because the lines in the test files end with
|
||||
LF. Even if the files are edited to change the line endings, there are likely
|
||||
to be some failures. With --enable-newline-is-anycrlf or
|
||||
--enable-newline-is-any, many tests should succeed, but there may be some
|
||||
failures.
|
||||
|
||||
. By default, the sequence \R in a pattern matches any Unicode line ending
|
||||
sequence. This is independent of the option specifying what PCRE considers to
|
||||
be the end of a line (see above). However, the caller of PCRE can restrict \R
|
||||
to match only CR, LF, or CRLF. You can make this the default by adding
|
||||
--enable-bsr-anycrlf to the "configure" command (bsr = "backslash R").
|
||||
|
||||
. When called via the POSIX interface, PCRE uses malloc() to get additional
|
||||
storage for processing capturing parentheses if there are more than 10 of
|
||||
them in a pattern. You can increase this threshold by setting, for example,
|
||||
|
||||
--with-posix-malloc-threshold=20
|
||||
|
||||
on the "configure" command.
|
||||
|
||||
. PCRE has a counter that limits the depth of nesting of parentheses in a
|
||||
pattern. This limits the amount of system stack that a pattern uses when it
|
||||
is compiled. The default is 250, but you can change it by setting, for
|
||||
example,
|
||||
|
||||
--with-parens-nest-limit=500
|
||||
|
||||
. PCRE has a counter that can be set to limit the amount of resources it uses
|
||||
when matching a pattern. If the limit is exceeded during a match, the match
|
||||
fails. The default is ten million. You can change the default by setting, for
|
||||
example,
|
||||
|
||||
--with-match-limit=500000
|
||||
|
||||
on the "configure" command. This is just the default; individual calls to
|
||||
pcre_exec() can supply their own value. There is more discussion on the
|
||||
pcreapi man page.
|
||||
|
||||
. There is a separate counter that limits the depth of recursive function calls
|
||||
during a matching process. This also has a default of ten million, which is
|
||||
essentially "unlimited". You can change the default by setting, for example,
|
||||
|
||||
--with-match-limit-recursion=500000
|
||||
|
||||
Recursive function calls use up the runtime stack; running out of stack can
|
||||
cause programs to crash in strange ways. There is a discussion about stack
|
||||
sizes in the pcrestack man page.
|
||||
|
||||
. The default maximum compiled pattern size is around 64K. You can increase
|
||||
this by adding --with-link-size=3 to the "configure" command. In the 8-bit
|
||||
library, PCRE then uses three bytes instead of two for offsets to different
|
||||
parts of the compiled pattern. In the 16-bit library, --with-link-size=3 is
|
||||
the same as --with-link-size=4, which (in both libraries) uses four-byte
|
||||
offsets. Increasing the internal link size reduces performance. In the 32-bit
|
||||
library, the only supported link size is 4.
|
||||
|
||||
. You can build PCRE so that its internal match() function that is called from
|
||||
pcre_exec() does not call itself recursively. Instead, it uses memory blocks
|
||||
obtained from the heap via the special functions pcre_stack_malloc() and
|
||||
pcre_stack_free() to save data that would otherwise be saved on the stack. To
|
||||
build PCRE like this, use
|
||||
|
||||
--disable-stack-for-recursion
|
||||
|
||||
on the "configure" command. PCRE runs more slowly in this mode, but it may be
|
||||
necessary in environments with limited stack sizes. This applies only to the
|
||||
normal execution of the pcre_exec() function; if JIT support is being
|
||||
successfully used, it is not relevant. Equally, it does not apply to
|
||||
pcre_dfa_exec(), which does not use deeply nested recursion. There is a
|
||||
discussion about stack sizes in the pcrestack man page.
|
||||
|
||||
. For speed, PCRE uses four tables for manipulating and identifying characters
|
||||
whose code point values are less than 256. By default, it uses a set of
|
||||
tables for ASCII encoding that is part of the distribution. If you specify
|
||||
|
||||
--enable-rebuild-chartables
|
||||
|
||||
a program called dftables is compiled and run in the default C locale when
|
||||
you run "make". It builds a source file called pcre_chartables.c. If you do
|
||||
not specify this option, pcre_chartables.c is created as a copy of
|
||||
pcre_chartables.c.dist. See "Character tables" below for further information.
|
||||
|
||||
. It is possible to compile PCRE for use on systems that use EBCDIC as their
|
||||
character code (as opposed to ASCII/Unicode) by specifying
|
||||
|
||||
--enable-ebcdic
|
||||
|
||||
This automatically implies --enable-rebuild-chartables (see above). However,
|
||||
when PCRE is built this way, it always operates in EBCDIC. It cannot support
|
||||
both EBCDIC and UTF-8/16/32. There is a second option, --enable-ebcdic-nl25,
|
||||
which specifies that the code value for the EBCDIC NL character is 0x25
|
||||
instead of the default 0x15.
|
||||
|
||||
. In environments where valgrind is installed, if you specify
|
||||
|
||||
--enable-valgrind
|
||||
|
||||
PCRE will use valgrind annotations to mark certain memory regions as
|
||||
unaddressable. This allows it to detect invalid memory accesses, and is
|
||||
mostly useful for debugging PCRE itself.
|
||||
|
||||
. In environments where the gcc compiler is used and lcov version 1.6 or above
|
||||
is installed, if you specify
|
||||
|
||||
--enable-coverage
|
||||
|
||||
the build process implements a code coverage report for the test suite. The
|
||||
report is generated by running "make coverage". If ccache is installed on
|
||||
your system, it must be disabled when building PCRE for coverage reporting.
|
||||
You can do this by setting the environment variable CCACHE_DISABLE=1 before
|
||||
running "make" to build PCRE. There is more information about coverage
|
||||
reporting in the "pcrebuild" documentation.
|
||||
|
||||
. The pcregrep program currently supports only 8-bit data files, and so
|
||||
requires the 8-bit PCRE library. It is possible to compile pcregrep to use
|
||||
libz and/or libbz2, in order to read .gz and .bz2 files (respectively), by
|
||||
specifying one or both of
|
||||
|
||||
--enable-pcregrep-libz
|
||||
--enable-pcregrep-libbz2
|
||||
|
||||
Of course, the relevant libraries must be installed on your system.
|
||||
|
||||
. The default size (in bytes) of the internal buffer used by pcregrep can be
|
||||
set by, for example:
|
||||
|
||||
--with-pcregrep-bufsize=51200
|
||||
|
||||
The value must be a plain integer. The default is 20480.
|
||||
|
||||
. It is possible to compile pcretest so that it links with the libreadline
|
||||
or libedit libraries, by specifying, respectively,
|
||||
|
||||
--enable-pcretest-libreadline or --enable-pcretest-libedit
|
||||
|
||||
If this is done, when pcretest's input is from a terminal, it reads it using
|
||||
the readline() function. This provides line-editing and history facilities.
|
||||
Note that libreadline is GPL-licenced, so if you distribute a binary of
|
||||
pcretest linked in this way, there may be licensing issues. These can be
|
||||
avoided by linking with libedit (which has a BSD licence) instead.
|
||||
|
||||
Enabling libreadline causes the -lreadline option to be added to the pcretest
|
||||
build. In many operating environments with a system-installed readline
|
||||
library this is sufficient. However, in some environments (e.g. if an
|
||||
unmodified distribution version of readline is in use), it may be necessary
|
||||
to specify something like LIBS="-lncurses" as well. This is because, to quote
|
||||
the readline INSTALL, "Readline uses the termcap functions, but does not link
|
||||
with the termcap or curses library itself, allowing applications which link
|
||||
with readline to choose an appropriate library." If you get error
|
||||
messages about missing functions tgetstr, tgetent, tputs, tgetflag, or tgoto,
|
||||
this is the problem, and linking with the ncurses library should fix it.
|
||||
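Several of the options above are often combined on one command line. The
sketch below assembles such a command and prints it rather than running it.
The particular selection of options is an illustrative assumption, not a
recommended configuration; note that it does not mix --enable-utf with
--enable-ebcdic, which cannot be used together.

```shell
#!/bin/sh
# Assemble a "configure" command line from options described above.
# This only prints the command; to run it, execute the printed line from
# the PCRE distribution directory.
set -- \
    --prefix=/opt/local \
    --enable-pcre16 \
    --enable-utf \
    --enable-unicode-properties \
    --enable-jit \
    --enable-newline-is-anycrlf \
    --with-match-limit=500000

configure_cmd="./configure $*"
echo "$configure_cmd"
```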
|
||||
The "configure" script builds the following files for the basic C library:
|
||||
|
||||
. Makefile the makefile that builds the library
|
||||
. config.h build-time configuration options for the library
|
||||
. pcre.h the public PCRE header file
|
||||
. pcre-config script that shows the building settings such as CFLAGS
|
||||
that were set for "configure"
|
||||
. libpcre.pc ) data for the pkg-config command
|
||||
. libpcre16.pc )
|
||||
. libpcre32.pc )
|
||||
. libpcreposix.pc )
|
||||
. libtool script that builds shared and/or static libraries
|
||||
|
||||
Versions of config.h and pcre.h are distributed in the PCRE tarballs under the
|
||||
names config.h.generic and pcre.h.generic. These are provided for those who
|
||||
have to build PCRE without using "configure" or CMake. If you use "configure"
|
||||
or CMake, the .generic versions are not used.
|
||||
|
||||
When building the 8-bit library, if a C++ compiler is found, the following
|
||||
files are also built:
|
||||
|
||||
. libpcrecpp.pc data for the pkg-config command
|
||||
. pcrecpparg.h header file for calling PCRE via the C++ wrapper
|
||||
. pcre_stringpiece.h header for the C++ "stringpiece" functions
|
||||
|
||||
The "configure" script also creates config.status, which is an executable
|
||||
script that can be run to recreate the configuration, and config.log, which
|
||||
contains compiler output from tests that "configure" runs.
|
||||
|
||||
Once "configure" has run, you can run "make". This builds the libraries
|
||||
libpcre, libpcre16 and/or libpcre32, and a test program called pcretest. If you
|
||||
enabled JIT support with --enable-jit, a test program called pcre_jit_test is
|
||||
built as well.
|
||||
|
||||
If the 8-bit library is built, libpcreposix and the pcregrep command are also
|
||||
built, and if a C++ compiler was found on your system, and you did not disable
|
||||
it with --disable-cpp, "make" builds the C++ wrapper library, which is called
|
||||
libpcrecpp, as well as some test programs called pcrecpp_unittest,
|
||||
pcre_scanner_unittest, and pcre_stringpiece_unittest.
|
||||
|
||||
The command "make check" runs all the appropriate tests. Details of the PCRE
|
||||
tests are given below in a separate section of this document.
|
||||
|
||||
You can use "make install" to install PCRE into live directories on your
|
||||
system. The following are installed (file names are all relative to the
|
||||
<prefix> that is set when "configure" is run):
|
||||
|
||||
Commands (bin):
|
||||
pcretest
|
||||
pcregrep (if 8-bit support is enabled)
|
||||
pcre-config
|
||||
|
||||
Libraries (lib):
|
||||
libpcre16 (if 16-bit support is enabled)
|
||||
libpcre32 (if 32-bit support is enabled)
|
||||
libpcre (if 8-bit support is enabled)
|
||||
libpcreposix (if 8-bit support is enabled)
|
||||
libpcrecpp (if 8-bit and C++ support is enabled)
|
||||
|
||||
Configuration information (lib/pkgconfig):
|
||||
libpcre16.pc
|
||||
libpcre32.pc
|
||||
libpcre.pc
|
||||
libpcreposix.pc
|
||||
libpcrecpp.pc (if C++ support is enabled)
|
||||
|
||||
Header files (include):
|
||||
pcre.h
|
||||
pcreposix.h
|
||||
pcre_scanner.h )
|
||||
pcre_stringpiece.h ) if C++ support is enabled
|
||||
pcrecpp.h )
|
||||
pcrecpparg.h )
|
||||
|
||||
Man pages (share/man/man{1,3}):
|
||||
pcregrep.1
|
||||
pcretest.1
|
||||
pcre-config.1
|
||||
pcre.3
|
||||
pcre*.3 (lots more pages, all starting "pcre")
|
||||
|
||||
HTML documentation (share/doc/pcre/html):
|
||||
index.html
|
||||
*.html (lots more pages, hyperlinked from index.html)
|
||||
|
||||
Text file documentation (share/doc/pcre):
|
||||
AUTHORS
|
||||
COPYING
|
||||
ChangeLog
|
||||
LICENCE
|
||||
NEWS
|
||||
README
|
||||
pcre.txt (a concatenation of the man(3) pages)
|
||||
pcretest.txt the pcretest man page
|
||||
pcregrep.txt the pcregrep man page
|
||||
pcre-config.txt the pcre-config man page
|
||||
|
||||
If you want to remove PCRE from your system, you can run "make uninstall".
|
||||
This removes all the files that "make install" installed. However, it does not
|
||||
remove any directories, because these are often shared with other programs.
|
||||
|
||||
|
||||
Retrieving configuration information
|
||||
------------------------------------
|
||||
|
||||
Running "make install" installs the command pcre-config, which can be used to
|
||||
recall information about the PCRE configuration and installation. For example:
|
||||
|
||||
pcre-config --version
|
||||
|
||||
prints the version number, and
|
||||
|
||||
pcre-config --libs
|
||||
|
||||
outputs information about where the library is installed. This command can be
|
||||
included in makefiles for programs that use PCRE, saving the programmer from
|
||||
having to remember too many details.
|
||||
|
||||
The pkg-config command is another system for saving and retrieving information
|
||||
about installed libraries. Instead of separate commands for each library, a
|
||||
single command is used. For example:
|
||||
|
||||
pkg-config --cflags libpcre
|
||||
|
||||
The data is held in *.pc files that are installed in a directory called
|
||||
<prefix>/lib/pkgconfig.
|
||||
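A build script can try both tools in turn. The sketch below is one possible
arrangement: the function name pcre_link_flags and the final "-lpcre" fallback
are assumptions made for illustration, and the pkg-config module name is taken
from the libpcre.pc file listed among the installed files.

```shell
#!/bin/sh
# Print linker flags for the 8-bit PCRE library, trying pcre-config first
# and pkg-config second. The final "-lpcre" fallback is an assumption for
# systems where neither tool knows about PCRE.
pcre_link_flags() {
    if command -v pcre-config >/dev/null 2>&1; then
        pcre-config --libs
    elif command -v pkg-config >/dev/null 2>&1 &&
         pkg-config --exists libpcre; then
        pkg-config --libs libpcre
    else
        echo "-lpcre"
    fi
}

PCRE_LIBS=$(pcre_link_flags)
echo "PCRE_LIBS = $PCRE_LIBS"
```

The output depends on what is installed on the machine where it runs, so no
particular result is shown here.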
|
||||
|
||||
Shared libraries
|
||||
----------------
|
||||
|
||||
The default distribution builds PCRE as shared libraries and static libraries,
|
||||
as long as the operating system supports shared libraries. Shared library
|
||||
support relies on the "libtool" script which is built as part of the
|
||||
"configure" process.
|
||||
|
||||
The libtool script is used to compile and link both shared and static
|
||||
libraries. They are placed in a subdirectory called .libs when they are newly
|
||||
built. The programs pcretest and pcregrep are built to use these uninstalled
|
||||
libraries (by means of wrapper scripts in the case of shared libraries). When
|
||||
you use "make install" to install shared libraries, pcregrep and pcretest are
|
||||
automatically re-built to use the newly installed shared libraries before being
|
||||
installed themselves. However, the versions left in the build directory still
|
||||
use the uninstalled libraries.
|
||||
|
||||
To build PCRE using static libraries only you must use --disable-shared when
|
||||
configuring it. For example:
|
||||
|
||||
./configure --prefix=/usr/gnu --disable-shared
|
||||
|
||||
Then run "make" in the usual way. Similarly, you can use --disable-static to
|
||||
build only shared libraries.
|
||||
|
||||
|
||||
Cross-compiling using autotools
-------------------------------

You can specify CC and CFLAGS in the normal way to the "configure" command, in
order to cross-compile PCRE for some other host. However, you should NOT
specify --enable-rebuild-chartables, because if you do, the dftables.c source
file is compiled and run on the local host, in order to generate the inbuilt
character tables (the pcre_chartables.c file). This will probably not work,
because dftables.c needs to be compiled with the local compiler, not the cross
compiler.

When --enable-rebuild-chartables is not specified, pcre_chartables.c is created
by making a copy of pcre_chartables.c.dist, which is a default set of tables
that assumes ASCII code. Cross-compiling with the default tables should not be
a problem.

If you need to modify the character tables when cross-compiling, you should
move pcre_chartables.c.dist out of the way, then compile dftables.c by hand and
run it on the local host to make a new version of pcre_chartables.c.dist.
Then when you cross-compile PCRE this new version of the tables will be used.


Using HP's ANSI C++ compiler (aCC)
----------------------------------

Unless C++ support is disabled by specifying the "--disable-cpp" option of the
"configure" script, you must include the "-AA" option in the CXXFLAGS
environment variable in order for the C++ components to compile correctly.

Also, note that the aCC compiler on PA-RISC platforms may have a defect whereby
needed libraries fail to get included when specifying the "-AA" compiler
option. If you experience unresolved symbols when linking the C++ programs,
use the workaround of specifying the following environment variable prior to
running the "configure" script:

  CXXLDFLAGS="-lstd_v2 -lCsup_v2"


Compiling in Tru64 using native compilers
-----------------------------------------

The following error may occur when compiling with native compilers in the Tru64
operating system:

  CXX libpcrecpp_la-pcrecpp.lo
  cxx: Error: /usr/lib/cmplrs/cxx/V7.1-006/include/cxx/iosfwd, line 58: #error
  directive: "cannot include iosfwd -- define __USE_STD_IOSTREAM to
  override default - see section 7.1.2 of the C++ Using Guide"
  #error "cannot include iosfwd -- define __USE_STD_IOSTREAM to override default
  - see section 7.1.2 of the C++ Using Guide"

This may be followed by other errors, complaining that 'namespace "std" has no
member'. The solution to this is to add the line

  #define __USE_STD_IOSTREAM 1

to the config.h file.


Using Sun's compilers for Solaris
---------------------------------

A user reports that the following configurations work on Solaris 9 sparcv9 and
Solaris 9 x86 (32-bit):

  Solaris 9 sparcv9: ./configure --disable-cpp CC=/bin/cc CFLAGS="-m64 -g"
  Solaris 9 x86:     ./configure --disable-cpp CC=/bin/cc CFLAGS="-g"


Using PCRE from MySQL
---------------------

On systems where both PCRE and MySQL are installed, it is possible to make use
of PCRE from within MySQL, as an alternative to the built-in pattern matching.
There is a web page that tells you how to do this:

  http://www.mysqludf.org/lib_mysqludf_preg/index.php


Making new tarballs
-------------------

The command "make dist" creates three PCRE tarballs, in tar.gz, tar.bz2, and
zip formats. The command "make distcheck" does the same, but then does a trial
build of the new distribution to ensure that it works.

If you have modified any of the man page sources in the doc directory, you
should first run the PrepareRelease script before making a distribution. This
script creates the .txt and HTML forms of the documentation from the man pages.


Testing PCRE
------------

To test the basic PCRE library on a Unix-like system, run the RunTest script.
There is another script called RunGrepTest that tests the options of the
pcregrep command. If the C++ wrapper library is built, three test programs
called pcrecpp_unittest, pcre_scanner_unittest, and pcre_stringpiece_unittest
are also built. When JIT support is enabled, another test program called
pcre_jit_test is built.

Both the scripts and all the program tests are run if you obey "make check" or
"make test". For other environments, see the instructions in
NON-AUTOTOOLS-BUILD.

The RunTest script runs the pcretest test program (which is documented in its
own man page) on each of the relevant testinput files in the testdata
directory, and compares the output with the contents of the corresponding
testoutput files. RunTest uses a file called testtry to hold the main output
from pcretest. Other files whose names begin with "test" are used as working
files in some tests.

Some tests are relevant only when certain build-time options were selected. For
example, the tests for UTF-8/16/32 support are run only if --enable-utf was
used. RunTest outputs a comment when it skips a test.

Many of the tests that are not skipped are run up to three times. The second
run forces pcre_study() to be called for all patterns except for a few in some
tests that are marked "never study" (see the pcretest program for how this is
done). If JIT support is available, the non-DFA tests are run a third time,
this time with a forced pcre_study() with the PCRE_STUDY_JIT_COMPILE option.
This testing can be suppressed by putting "nojit" on the RunTest command line.

The entire set of tests is run once for each of the 8-bit, 16-bit and 32-bit
libraries that are enabled. If you want to run just one set of tests, call
RunTest with either the -8, -16 or -32 option.

If valgrind is installed, you can run the tests under it by putting "valgrind"
on the RunTest command line. To run pcretest on just one or more specific test
files, give their numbers as arguments to RunTest, for example:

  RunTest 2 7 11

You can also specify ranges of tests such as 3-6 or 3- (meaning 3 to the
end), or a number preceded by ~ to exclude a test. For example:

  RunTest 3-15 ~10

This runs tests 3 to 15, excluding test 10, and just ~13 runs all the tests
except test 13. Whatever order the arguments are in, the tests are always run
in numerical order.
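
As an illustration of the selection rules just described, the following C
sketch computes which tests would run for a given argument list. It is not
taken from the RunTest script: the function name select_tests, the constant
MAX_TEST (set to 26, the highest test number mentioned in this file), and the
rule that ~ takes a single number are all assumptions made for the example.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_TEST 26  /* assumption: highest test number described in this file */

/*
 * Hypothetical helper (not part of PCRE): set selected[n] to 1 for each test
 * n in 1..MAX_TEST that would run, given RunTest-style selectors such as
 * "7", "3-6", "3-" or "~10". Returns 0 on success, -1 on a bad selector.
 */
int select_tests(int argc, const char *const argv[], int selected[MAX_TEST + 1])
{
    int excluded[MAX_TEST + 1] = {0};
    int any_positive = 0;

    memset(selected, 0, (MAX_TEST + 1) * sizeof selected[0]);

    for (int i = 0; i < argc; i++) {
        const char *arg = argv[i];
        int exclude = (arg[0] == '~');
        char *end;
        long first, last;

        if (exclude) arg++;
        first = strtol(arg, &end, 10);
        if (end == arg || first < 1 || first > MAX_TEST) return -1;
        last = first;
        if (*end == '-') {
            const char *rest = end + 1;
            if (*rest == '\0') {
                last = MAX_TEST;              /* "3-" means 3 to the end */
            } else {
                last = strtol(rest, &end, 10);
                if (end == rest || *end != '\0') return -1;
            }
        } else if (*end != '\0') {
            return -1;
        }
        if (last < first || last > MAX_TEST) return -1;
        if (exclude && last != first) return -1;  /* assumed: ~ takes one number */

        for (long n = first; n <= last; n++) {
            if (exclude) excluded[n] = 1; else selected[n] = 1;
        }
        if (!exclude) any_positive = 1;
    }

    /* With only exclusions (for example just "~13"), start from every test. */
    for (int n = 1; n <= MAX_TEST; n++) {
        if (!any_positive) selected[n] = 1;
        if (excluded[n]) selected[n] = 0;
    }
    return 0;
}
```

Because the result is one flag per test number, walking the array from 1
upwards gives the numerical running order regardless of the order in which the
arguments were given.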

You can also call RunTest with the single argument "list" to cause it to output
a list of tests.

The first test file can be fed directly into the perltest.pl script to check
that Perl gives the same results. The only difference you should see is in the
first few lines, where the Perl version is given instead of the PCRE version.

The second set of tests checks pcre_fullinfo(), pcre_study(),
pcre_copy_substring(), pcre_get_substring(), pcre_get_substring_list(), error
detection, and run-time flags that are specific to PCRE, as well as the POSIX
wrapper API. It also uses the debugging flags to check some of the internals of
pcre_compile().

If you build PCRE with a locale setting that is not the standard C locale, the
character tables may be different (see next paragraph). In some cases, this may
cause failures in the second set of tests. For example, in a locale where the
isprint() function yields TRUE for characters in the range 128-255, the use of
[:isascii:] inside a character class defines a different set of characters, and
this shows up in this test as a difference in the compiled code, which is being
listed for checking. Where the comparison test output contains [\x00-\x7f] the
test will contain [\x00-\xff], and similarly in some other cases. This is not a
bug in PCRE.

The third set of tests checks pcre_maketables(), the facility for building a
set of character tables for a specific locale and using them instead of the
default tables. The tests make use of the "fr_FR" (French) locale. Before
running the test, the script checks for the presence of this locale by running
the "locale" command. If that command fails, or if it doesn't include "fr_FR"
in the list of available locales, the third test cannot be run, and a comment
is output to say why. If running this test produces instances of the error

  ** Failed to set locale "fr_FR"

in the comparison output, it means that locale is not available on your system,
despite being listed by "locale". This does not mean that PCRE is broken.

[If you are trying to run this test on Windows, you may be able to get it to
work by changing "fr_FR" to "french" everywhere it occurs. Alternatively, use
RunTest.bat. The version of RunTest.bat included with PCRE 7.4 and above uses
Windows versions of test 2. More info on using RunTest.bat is included in the
document entitled NON-UNIX-USE.]

The fourth and fifth tests check the UTF-8/16/32 support and error handling and
internal UTF features of PCRE that are not relevant to Perl, respectively. The
sixth and seventh tests do the same for Unicode character properties support.

The eighth, ninth, and tenth tests check the pcre_dfa_exec() alternative
matching function, in non-UTF-8/16/32 mode, UTF-8/16/32 mode, and UTF-8/16/32
mode with Unicode property support, respectively.

The eleventh test checks some internal offsets and code size features; it is
run only when the default "link size" of 2 is set (in other cases the sizes
change) and when Unicode property support is enabled.

The twelfth test is run only when JIT support is available, and the thirteenth
test is run only when JIT support is not available. They test some JIT-specific
features such as information output from pcretest about JIT compilation.

The fourteenth, fifteenth, and sixteenth tests are run only in 8-bit mode, and
the seventeenth, eighteenth, and nineteenth tests are run only in 16/32-bit
mode. These are tests that generate different output in the two modes. They are
for general cases, UTF-8/16/32 support, and Unicode property support,
respectively.

The twentieth test is run only in 16/32-bit mode. It tests some specific
16/32-bit features of the DFA matching engine.

The twenty-first and twenty-second tests are run only in 16/32-bit mode, when
the link size is set to 2 for the 16-bit library. They test reloading
pre-compiled patterns.

The twenty-third and twenty-fourth tests are run only in 16-bit mode. They are
for general cases, and UTF-16 support, respectively.

The twenty-fifth and twenty-sixth tests are run only in 32-bit mode. They are
for general cases, and UTF-32 support, respectively.


Character tables
----------------

For speed, PCRE uses four tables for manipulating and identifying characters
whose code point values are less than 256. The final argument of the
pcre_compile() function is a pointer to a block of memory containing the
concatenated tables. A call to pcre_maketables() can be used to generate a set
of tables in the current locale. If the final argument for pcre_compile() is
passed as NULL, a set of default tables that is built into the binary is used.

The source file called pcre_chartables.c contains the default set of tables. By
default, this is created as a copy of pcre_chartables.c.dist, which contains
tables for ASCII coding. However, if --enable-rebuild-chartables is specified
for ./configure, a different version of pcre_chartables.c is built by the
program dftables (compiled from dftables.c), which uses the ANSI C character
handling functions such as isalnum(), isalpha(), isupper(), islower(), etc. to
build the table sources. This means that the default C locale which is set for
your system will control the contents of these default tables. You can change
the default tables by editing pcre_chartables.c and then re-building PCRE. If
you do this, you should take care to ensure that the file does not get
automatically re-generated. The best way to do this is to move
pcre_chartables.c.dist out of the way and replace it with your customized
tables.

When the dftables program is run as a result of --enable-rebuild-chartables,
it uses the default C locale that is set on your system. It does not pay
attention to the LC_xxx environment variables. In other words, it uses the
system's default locale rather than whatever the compiling user happens to have
set. If you really do want to build a source set of character tables in a
locale that is specified by the LC_xxx variables, you can run the dftables
program by hand with the -L option. For example:

  ./dftables -L pcre_chartables.c.special
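
The difference between the two cases can be seen with the standard C
setlocale() function. The sketch below is an illustration only, with
hypothetical helper names; it assumes (without quoting dftables.c) that -L
amounts to adopting the locale named by the environment before the tables are
generated.

```c
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: name of the locale the C library is currently using
   for character classification (isalpha(), isupper(), and so on). */
const char *ctype_locale_name(void)
{
    return setlocale(LC_CTYPE, NULL);   /* NULL queries without changing */
}

/* Hypothetical helper: adopt the locale named by the LC_xxx and LANG
   environment variables. Returns NULL if that locale is not available. */
const char *adopt_environment_locale(void)
{
    return setlocale(LC_ALL, "");
}
```

A program that never calls adopt_environment_locale() stays in the "C" locale
it started in, however LC_ALL or LANG are set, which corresponds to the
behaviour described above for a run without -L.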

The first two 256-byte tables provide lower casing and case flipping functions,
respectively. The next table consists of three 32-byte bit maps which identify
digits, "word" characters, and white space, respectively. These are used when
building 32-byte bit maps that represent character classes for code points less
than 256.
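
A minimal sketch of how tables of this shape could be filled in using the
standard <ctype.h> functions is shown below. The structure and function names
are hypothetical, and the choice of the low-order bit of each byte for the
lowest code point is an assumption made for the example, not a statement about
PCRE's internal layout.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical names; the sizes follow the description above. */
typedef struct {
    unsigned char lower[256];      /* lower-cased value of each code point */
    unsigned char flip[256];       /* opposite-case value of each code point */
    unsigned char digit_bits[32];  /* one bit per code point, 256 bits each */
    unsigned char word_bits[32];
    unsigned char space_bits[32];
} sketch_tables;

/* Mark code point c as a member of the 256-bit map. */
static void set_bit(unsigned char map[32], int c)
{
    map[c / 8] |= (unsigned char)(1u << (c % 8));
}

/* Return 1 if code point c is a member of the map, else 0. */
int test_bit(const unsigned char map[32], int c)
{
    return (map[c / 8] >> (c % 8)) & 1;
}

/* Fill every table from the current locale's classification functions. */
void build_sketch_tables(sketch_tables *t)
{
    memset(t, 0, sizeof *t);
    for (int c = 0; c < 256; c++) {
        t->lower[c] = (unsigned char)tolower(c);
        t->flip[c] = (unsigned char)(islower(c) ? toupper(c) : tolower(c));
        if (isdigit(c)) set_bit(t->digit_bits, c);
        if (isalnum(c) || c == '_') set_bit(t->word_bits, c);
        if (isspace(c)) set_bit(t->space_bits, c);
    }
}
```

Testing whether a code point below 256 belongs to a class is then a single
shift and mask, which is why bit maps of this kind suit character classes.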

The final 256-byte table has bits indicating various character types, as
follows:

    1   white space character
    2   letter
    4   decimal digit
    8   hexadecimal digit
   16   alphanumeric or '_'
  128   regular expression metacharacter or binary zero

You should not alter the set of characters that contain the 128 bit, as that
will cause PCRE to malfunction.
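
The following sketch builds a table with these bit values from the standard
<ctype.h> functions. The enum names and the metacharacter list are assumptions
chosen for illustration; the real set of metacharacters is defined by the PCRE
sources.

```c
#include <ctype.h>
#include <string.h>

/* Bit values as listed above; the identifier names are hypothetical. */
enum {
    CT_SPACE  = 1,
    CT_LETTER = 2,
    CT_DIGIT  = 4,
    CT_XDIGIT = 8,
    CT_WORD   = 16,
    CT_META   = 128
};

/* Assumption: a plausible metacharacter set, for illustration only. */
static const char META_CHARS[] = "\\*+?{^.$|()[";

/* Fill types[0..255] with the OR of the flags that apply to each value. */
void build_type_table(unsigned char types[256])
{
    for (int c = 0; c < 256; c++) {
        unsigned char bits = 0;
        if (isspace(c)) bits |= CT_SPACE;
        if (isalpha(c)) bits |= CT_LETTER;
        if (isdigit(c)) bits |= CT_DIGIT;
        if (isxdigit(c)) bits |= CT_XDIGIT;
        if (isalnum(c) || c == '_') bits |= CT_WORD;
        /* Binary zero carries the metacharacter bit as well. */
        if (c == 0 || strchr(META_CHARS, c) != NULL) bits |= CT_META;
        types[c] = bits;
    }
}
```

A character can carry several flags at once: in the C locale the letter 'a' is
a letter, a hexadecimal digit, and a word character.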


File manifest
-------------

The distribution should contain the files listed below. Where a file name is
given as pcre[16|32]_xxx it means that there are three files, one with the name
pcre_xxx, one with the name pcre16_xxx, and a third with the name pcre32_xxx.

(A) Source files of the PCRE library functions and their headers:

  dftables.c              auxiliary program for building pcre_chartables.c
                          when --enable-rebuild-chartables is specified

  pcre_chartables.c.dist  a default set of character tables that assume ASCII
                          coding; used, unless --enable-rebuild-chartables is
                          specified, by copying to pcre[16]_chartables.c

  pcreposix.c                 )
  pcre[16|32]_byte_order.c    )
  pcre[16|32]_compile.c       )
  pcre[16|32]_config.c        )
  pcre[16|32]_dfa_exec.c      )
  pcre[16|32]_exec.c          )
  pcre[16|32]_fullinfo.c      )
  pcre[16|32]_get.c           ) sources for the functions in the library,
  pcre[16|32]_globals.c       ) and some internal functions that they use
  pcre[16|32]_jit_compile.c   )
  pcre[16|32]_maketables.c    )
  pcre[16|32]_newline.c       )
  pcre[16|32]_refcount.c      )
  pcre[16|32]_string_utils.c  )
  pcre[16|32]_study.c         )
  pcre[16|32]_tables.c        )
  pcre[16|32]_ucd.c           )
  pcre[16|32]_version.c       )
  pcre[16|32]_xclass.c        )
  pcre_ord2utf8.c             )
  pcre_valid_utf8.c           )
  pcre16_ord2utf16.c          )
  pcre16_utf16_utils.c        )
  pcre16_valid_utf16.c        )
  pcre32_utf32_utils.c        )
  pcre32_valid_utf32.c        )

  pcre[16|32]_printint.c      ) debugging function that is used by pcretest,
                              ) and can also be #included in pcre_compile()

  pcre.h.in               template for pcre.h when built by "configure"
  pcreposix.h             header for the external POSIX wrapper API
  pcre_internal.h         header for internal use
  sljit/*                 16 files that make up the JIT compiler
  ucp.h                   header for Unicode property handling

  config.h.in             template for config.h, which is built by "configure"

  pcrecpp.h               public header file for the C++ wrapper
  pcrecpparg.h.in         template for another C++ header file
  pcre_scanner.h          public header file for C++ scanner functions
  pcrecpp.cc              )
  pcre_scanner.cc         ) source for the C++ wrapper library

  pcre_stringpiece.h.in   template for pcre_stringpiece.h, the header for the
                          C++ stringpiece functions
  pcre_stringpiece.cc     source for the C++ stringpiece functions

(B) Source files for programs that use PCRE:

  pcredemo.c              simple demonstration of coding calls to PCRE
  pcregrep.c              source of a grep utility that uses PCRE
  pcretest.c              comprehensive test program

(C) Auxiliary files:

  132html                 script to turn "man" pages into HTML
  AUTHORS                 information about the author of PCRE
  ChangeLog               log of changes to the code
  CleanTxt                script to clean nroff output for txt man pages
  Detrail                 script to remove trailing spaces
  HACKING                 some notes about the internals of PCRE
  INSTALL                 generic installation instructions
  LICENCE                 conditions for the use of PCRE
  COPYING                 the same, using GNU's standard name
  Makefile.in             ) template for Unix Makefile, which is built by
                          ) "configure"
  Makefile.am             ) the automake input that was used to create
                          ) Makefile.in
  NEWS                    important changes in this release
  NON-UNIX-USE            the previous name for NON-AUTOTOOLS-BUILD
  NON-AUTOTOOLS-BUILD     notes on building PCRE without using autotools
  PrepareRelease          script to make preparations for "make dist"
  README                  this file
  RunTest                 a Unix shell script for running tests
  RunGrepTest             a Unix shell script for pcregrep tests
  aclocal.m4              m4 macros (generated by "aclocal")
  config.guess            ) files used by libtool,
  config.sub              ) used only when building a shared library
  configure               a configuring shell script (built by autoconf)
  configure.ac            ) the autoconf input that was used to build
                          ) "configure" and config.h
  depcomp                 ) script to find program dependencies, generated by
                          ) automake
  doc/*.3                 man page sources for PCRE
  doc/*.1                 man page sources for pcregrep and pcretest
  doc/index.html.src      the base HTML page
  doc/html/*              HTML documentation
  doc/pcre.txt            plain text version of the man pages
  doc/pcretest.txt        plain text documentation of test program
  doc/perltest.txt        plain text documentation of Perl test program
  install-sh              a shell script for installing files
  libpcre16.pc.in         template for libpcre16.pc for pkg-config
  libpcre32.pc.in         template for libpcre32.pc for pkg-config
  libpcre.pc.in           template for libpcre.pc for pkg-config
  libpcreposix.pc.in      template for libpcreposix.pc for pkg-config
  libpcrecpp.pc.in        template for libpcrecpp.pc for pkg-config
  ltmain.sh               file used to build a libtool script
  missing                 ) common stub for a few missing GNU programs while
                          ) installing, generated by automake
  mkinstalldirs           script for making install directories
  perltest.pl             Perl test program
  pcre-config.in          source of script which retains PCRE information
  pcre_jit_test.c         test program for the JIT compiler
  pcrecpp_unittest.cc          )
  pcre_scanner_unittest.cc     ) test programs for the C++ wrapper
  pcre_stringpiece_unittest.cc )
  testdata/testinput*     test data for main library tests
  testdata/testoutput*    expected test results
  testdata/grep*          input and output for pcregrep tests
  testdata/*              other supporting test files

(D) Auxiliary files for cmake support

  cmake/COPYING-CMAKE-SCRIPTS
  cmake/FindPackageHandleStandardArgs.cmake
  cmake/FindEditline.cmake
  cmake/FindReadline.cmake
  CMakeLists.txt
  config-cmake.h.in

(E) Auxiliary files for VPASCAL

  makevp.bat
  makevp_c.txt
  makevp_l.txt
  pcregexp.pas

(F) Auxiliary files for building PCRE "by hand"

  pcre.h.generic          ) a version of the public PCRE header file
                          ) for use in non-"configure" environments
  config.h.generic        ) a version of config.h for use in non-"configure"
                          ) environments

(G) Miscellaneous

  RunTest.bat             a script for running tests under Windows

Philip Hazel
Email local part: ph10
Email domain: cam.ac.uk
Last updated: 17 January 2014
@ -11,27 +11,29 @@
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>Perl-compatible Regular Expressions (PCRE)</h1>
<p>
The HTML documentation for PCRE comprises the following pages:
The HTML documentation for PCRE consists of a number of pages that are listed
below in alphabetical order. If you are new to PCRE, please read the first one
first.
</p>

<table>
<tr><td><a href="pcre.html">pcre</a></td>
<td> Introductory page</td></tr>

<tr><td><a href="pcre-config.html">pcre-config</a></td>
<td> Information about the installation configuration</td></tr>

<tr><td><a href="pcre16.html">pcre16</a></td>
<td> Discussion of the 16-bit PCRE library</td></tr>

<tr><td><a href="pcre32.html">pcre32</a></td>
<td> Discussion of the 32-bit PCRE library</td></tr>

<tr><td><a href="pcre-config.html">pcre-config</a></td>
<td> Information about the installation configuration</td></tr>

<tr><td><a href="pcreapi.html">pcreapi</a></td>
<td> PCRE's native API</td></tr>

<tr><td><a href="pcrebuild.html">pcrebuild</a></td>
<td> Options for building PCRE</td></tr>
<td> Building PCRE</td></tr>

<tr><td><a href="pcrecallout.html">pcrecallout</a></td>
<td> The <i>callout</i> facility</td></tr>
@ -67,7 +69,7 @@ The HTML documentation for PCRE comprises the following pages:
<td> Some comments on performance</td></tr>

<tr><td><a href="pcreposix.html">pcreposix</a></td>
<td> The POSIX API to the PCRE library</td></tr>
<td> The POSIX API to the PCRE 8-bit library</td></tr>

<tr><td><a href="pcreprecompile.html">pcreprecompile</a></td>
<td> How to save and re-use compiled patterns</td></tr>
@ -118,13 +120,13 @@ functions.
<td> Match a compiled pattern to a subject string
(DFA algorithm; <i>not</i> Perl compatible)</td></tr>

<tr><td><a href="pcre_free_study.html">pcre_free_study</a></td>
<td> Free study data</td></tr>

<tr><td><a href="pcre_exec.html">pcre_exec</a></td>
<td> Match a compiled pattern to a subject string
(Perl compatible)</td></tr>

<tr><td><a href="pcre_free_study.html">pcre_free_study</a></td>
<td> Free study data</td></tr>

<tr><td><a href="pcre_free_substring.html">pcre_free_substring</a></td>
<td> Free extracted substring</td></tr>

@ -140,14 +142,17 @@ functions.
<tr><td><a href="pcre_get_stringnumber.html">pcre_get_stringnumber</a></td>
<td> Convert captured string name to number</td></tr>

<tr><td><a href="pcre_get_stringtable_entries.html">pcre_get_stringtable_entries</a></td>
<td> Find table entries for given string name</td></tr>

<tr><td><a href="pcre_get_substring.html">pcre_get_substring</a></td>
<td> Extract numbered substring into new memory</td></tr>

<tr><td><a href="pcre_get_substring_list.html">pcre_get_substring_list</a></td>
<td> Extract all substrings into new memory</td></tr>

<tr><td><a href="pcre_info.html">pcre_info</a></td>
<td> Obsolete information extraction function</td></tr>
<tr><td><a href="pcre_jit_exec.html">pcre_jit_exec</a></td>
<td> Fast path interface to JIT matching</td></tr>

<tr><td><a href="pcre_jit_stack_alloc.html">pcre_jit_stack_alloc</a></td>
<td> Create a stack for JIT matching</td></tr>
@ -23,8 +23,8 @@ man page, in case the conversion went wrong.
<br><a name="SEC1" href="#TOC1">SYNOPSIS</a><br>
<P>
<b>pcre-config [--prefix] [--exec-prefix] [--version] [--libs]</b>
<b>[--libs16] [--libs32] [--libs-cpp] [--libs-posix]</b>
<b>[--cflags] [--cflags-posix]</b>
<b> [--libs16] [--libs32] [--libs-cpp] [--libs-posix]</b>
<b> [--cflags] [--cflags-posix]</b>
</P>
<br><a name="SEC2" href="#TOC1">DESCRIPTION</a><br>
<P>
@ -38,9 +38,9 @@ Herczeg.
</P>
<P>
Starting with release 8.32 it is possible to compile a third separate PCRE
library, which supports 32-bit character strings (including
UTF-32 strings). The build process allows any set of the 8-, 16- and 32-bit
libraries. The work to make this possible was done by Christian Persch.
library that supports 32-bit character strings (including UTF-32 strings). The
build process allows any combination of the 8-, 16- and 32-bit libraries. The
work to make this possible was done by Christian Persch.
</P>
<P>
The three libraries contain identical sets of functions, except that the names
@ -62,7 +62,7 @@ The current implementation of PCRE corresponds approximately with Perl 5.12,
including support for UTF-8/16/32 encoded strings and Unicode general category
properties. However, UTF-8/16/32 and Unicode support has to be explicitly
enabled; it is not the default. The Unicode tables correspond to Unicode
release 6.2.0.
release 6.3.0.
</P>
<P>
In addition to the Perl-compatible matching function, PCRE contains an
@ -100,8 +100,11 @@ function makes it possible for a client to discover which features are
available. The features themselves are described in the
<a href="pcrebuild.html"><b>pcrebuild</b></a>
page. Documentation about building PCRE for various operating systems can be
found in the <b>README</b> and <b>NON-AUTOTOOLS_BUILD</b> files in the source
distribution.
found in the
<a href="README.txt"><b>README</b></a>
and
<a href="NON-AUTOTOOLS-BUILD.txt"><b>NON-AUTOTOOLS_BUILD</b></a>
files in the source distribution.
</P>
<P>
The libraries contain a number of undocumented internal functions and data
@ -126,8 +129,11 @@ use sufficiently many resources as to cause your application to lose
performance.
</P>
<P>
The best way of guarding against this possibility is to use the
One way of guarding against this possibility is to use the
<b>pcre_fullinfo()</b> function to check the compiled pattern's options for UTF.
Alternatively, from release 8.33, you can set the PCRE_NEVER_UTF option at
compile time. This causes a compile-time error if a pattern contains a
UTF-setting sequence.
</P>
<P>
If your application is one that supports UTF, be aware that validity checking
@ -148,15 +154,18 @@ page.
The user documentation for PCRE comprises a number of different sections. In
the "man" format, each of these is a separate "man page". In the HTML format,
each is a separate page, linked from the index page. In the plain text format,
all the sections, except the <b>pcredemo</b> section, are concatenated, for ease
of searching. The sections are as follows:
the descriptions of the <b>pcregrep</b> and <b>pcretest</b> programs are in files
called <b>pcregrep.txt</b> and <b>pcretest.txt</b>, respectively. The remaining
sections, except for the <b>pcredemo</b> section (which is a program listing),
are concatenated in <b>pcre.txt</b>, for ease of searching. The sections are as
follows:
<pre>
pcre this document
pcre-config show PCRE installation configuration information
pcre16 details of the 16-bit library
pcre32 details of the 32-bit library
pcre-config show PCRE installation configuration information
pcreapi details of PCRE's native C API
pcrebuild options for building PCRE
pcrebuild building PCRE
pcrecallout details of the callout feature
pcrecompat discussion of Perl compatibility
pcrecpp details of the C++ wrapper for the 8-bit library
@ -176,8 +185,8 @@ of searching. The sections are as follows:
pcretest description of the <b>pcretest</b> testing command
pcreunicode discussion of Unicode and UTF-8/16/32 support
</pre>
In addition, in the "man" and HTML formats, there is a short page for each
C library function, listing its arguments and results.
In the "man" and HTML formats, there is also a short page for each C library
function, listing its arguments and results.
</P>
<br><a name="SEC4" href="#TOC1">AUTHOR</a><br>
<P>
@ -195,9 +204,9 @@ two digits 10, at the domain cam.ac.uk.
</P>
<br><a name="SEC5" href="#TOC1">REVISION</a><br>
<P>
Last updated: 11 November 2012
Last updated: 08 January 2014
<br>
Copyright © 1997-2012 University of Cambridge.
Copyright © 1997-2014 University of Cambridge.
<br>
<p>
Return to the <a href="index.html">PCRE index page</a>.
@ -42,126 +42,126 @@ man page, in case the conversion went wrong.
<br><a name="SEC1" href="#TOC1">PCRE 16-BIT API BASIC FUNCTIONS</a><br>
<P>
<b>pcre16 *pcre16_compile(PCRE_SPTR16 <i>pattern</i>, int <i>options</i>,</b>
<b>const char **<i>errptr</i>, int *<i>erroffset</i>,</b>
<b>const unsigned char *<i>tableptr</i>);</b>
</P>
<P>
<b> const char **<i>errptr</i>, int *<i>erroffset</i>,</b>
<b> const unsigned char *<i>tableptr</i>);</b>
<br>
<br>
<b>pcre16 *pcre16_compile2(PCRE_SPTR16 <i>pattern</i>, int <i>options</i>,</b>
<b>int *<i>errorcodeptr</i>,</b>
<b>const char **<i>errptr</i>, int *<i>erroffset</i>,</b>
<b>const unsigned char *<i>tableptr</i>);</b>
</P>
<P>
<b> int *<i>errorcodeptr</i>,</b>
<b> const char **<i>errptr</i>, int *<i>erroffset</i>,</b>
<b> const unsigned char *<i>tableptr</i>);</b>
<br>
<br>
<b>pcre16_extra *pcre16_study(const pcre16 *<i>code</i>, int <i>options</i>,</b>
<b>const char **<i>errptr</i>);</b>
</P>
<P>
<b> const char **<i>errptr</i>);</b>
<br>
<br>
<b>void pcre16_free_study(pcre16_extra *<i>extra</i>);</b>
</P>
<P>
<br>
<br>
<b>int pcre16_exec(const pcre16 *<i>code</i>, const pcre16_extra *<i>extra</i>,</b>
<b>PCRE_SPTR16 <i>subject</i>, int <i>length</i>, int <i>startoffset</i>,</b>
<b>int <i>options</i>, int *<i>ovector</i>, int <i>ovecsize</i>);</b>
</P>
<P>
<b> PCRE_SPTR16 <i>subject</i>, int <i>length</i>, int <i>startoffset</i>,</b>
<b> int <i>options</i>, int *<i>ovector</i>, int <i>ovecsize</i>);</b>
<br>
<br>
<b>int pcre16_dfa_exec(const pcre16 *<i>code</i>, const pcre16_extra *<i>extra</i>,</b>
<b>PCRE_SPTR16 <i>subject</i>, int <i>length</i>, int <i>startoffset</i>,</b>
<b>int <i>options</i>, int *<i>ovector</i>, int <i>ovecsize</i>,</b>
<b>int *<i>workspace</i>, int <i>wscount</i>);</b>
<b> PCRE_SPTR16 <i>subject</i>, int <i>length</i>, int <i>startoffset</i>,</b>
<b> int <i>options</i>, int *<i>ovector</i>, int <i>ovecsize</i>,</b>
<b> int *<i>workspace</i>, int <i>wscount</i>);</b>
</P>
<br><a name="SEC2" href="#TOC1">PCRE 16-BIT API STRING EXTRACTION FUNCTIONS</a><br>
|
||||
<P>
|
||||
<b>int pcre16_copy_named_substring(const pcre16 *<i>code</i>,</b>
|
||||
<b>PCRE_SPTR16 <i>subject</i>, int *<i>ovector</i>,</b>
|
||||
<b>int <i>stringcount</i>, PCRE_SPTR16 <i>stringname</i>,</b>
|
||||
<b>PCRE_UCHAR16 *<i>buffer</i>, int <i>buffersize</i>);</b>
|
||||
</P>
|
||||
<P>
|
||||
<b> PCRE_SPTR16 <i>subject</i>, int *<i>ovector</i>,</b>
|
||||
<b> int <i>stringcount</i>, PCRE_SPTR16 <i>stringname</i>,</b>
|
||||
<b> PCRE_UCHAR16 *<i>buffer</i>, int <i>buffersize</i>);</b>
|
||||
<br>
|
||||
<br>
|
||||
<b>int pcre16_copy_substring(PCRE_SPTR16 <i>subject</i>, int *<i>ovector</i>,</b>
|
||||
<b>int <i>stringcount</i>, int <i>stringnumber</i>, PCRE_UCHAR16 *<i>buffer</i>,</b>
|
||||
<b>int <i>buffersize</i>);</b>
|
||||
</P>
|
||||
<P>
|
||||
<b> int <i>stringcount</i>, int <i>stringnumber</i>, PCRE_UCHAR16 *<i>buffer</i>,</b>
|
||||
<b> int <i>buffersize</i>);</b>
|
||||
<br>
|
||||
<br>
|
||||
<b>int pcre16_get_named_substring(const pcre16 *<i>code</i>,</b>
|
||||
<b>PCRE_SPTR16 <i>subject</i>, int *<i>ovector</i>,</b>
|
||||
<b>int <i>stringcount</i>, PCRE_SPTR16 <i>stringname</i>,</b>
|
||||
<b>PCRE_SPTR16 *<i>stringptr</i>);</b>
|
||||
</P>
|
||||
<P>
|
||||
<b> PCRE_SPTR16 <i>subject</i>, int *<i>ovector</i>,</b>
|
||||
<b> int <i>stringcount</i>, PCRE_SPTR16 <i>stringname</i>,</b>
|
||||
<b> PCRE_SPTR16 *<i>stringptr</i>);</b>
|
||||
<br>
|
||||
<br>
|
||||
<b>int pcre16_get_stringnumber(const pcre16 *<i>code</i>,</b>
|
||||
<b>PCRE_SPTR16 <i>name</i>);</b>
|
||||
</P>
|
||||
<P>
|
||||
<b> PCRE_SPTR16 <i>name</i>);</b>
|
||||
<br>
|
||||
<br>
|
||||
<b>int pcre16_get_stringtable_entries(const pcre16 *<i>code</i>,</b>
|
||||
<b>PCRE_SPTR16 <i>name</i>, PCRE_UCHAR16 **<i>first</i>, PCRE_UCHAR16 **<i>last</i>);</b>
|
||||
</P>
|
||||
<P>
|
||||
<b> PCRE_SPTR16 <i>name</i>, PCRE_UCHAR16 **<i>first</i>, PCRE_UCHAR16 **<i>last</i>);</b>
|
||||
<br>
|
||||
<br>
|
||||
<b>int pcre16_get_substring(PCRE_SPTR16 <i>subject</i>, int *<i>ovector</i>,</b>
|
||||
<b>int <i>stringcount</i>, int <i>stringnumber</i>,</b>
|
||||
<b>PCRE_SPTR16 *<i>stringptr</i>);</b>
|
||||
</P>
|
||||
<P>
|
||||
<b> int <i>stringcount</i>, int <i>stringnumber</i>,</b>
|
||||
<b> PCRE_SPTR16 *<i>stringptr</i>);</b>
|
||||
<br>
|
||||
<br>
|
||||
<b>int pcre16_get_substring_list(PCRE_SPTR16 <i>subject</i>,</b>
|
||||
<b>int *<i>ovector</i>, int <i>stringcount</i>, PCRE_SPTR16 **<i>listptr</i>);</b>
|
||||
</P>
|
||||
<P>
|
||||
<b> int *<i>ovector</i>, int <i>stringcount</i>, PCRE_SPTR16 **<i>listptr</i>);</b>
|
||||
<br>
|
||||
<br>
|
||||
<b>void pcre16_free_substring(PCRE_SPTR16 <i>stringptr</i>);</b>
|
||||
</P>
|
||||
<P>
|
||||
<br>
|
||||
<br>
|
||||
<b>void pcre16_free_substring_list(PCRE_SPTR16 *<i>stringptr</i>);</b>
|
||||
</P>
|
||||
<br><a name="SEC3" href="#TOC1">PCRE 16-BIT API AUXILIARY FUNCTIONS</a><br>
|
||||
<P>
|
||||
<b>pcre16_jit_stack *pcre16_jit_stack_alloc(int <i>startsize</i>, int <i>maxsize</i>);</b>
|
||||
</P>
|
||||
<P>
|
||||
<br>
|
||||
<br>
|
||||
<b>void pcre16_jit_stack_free(pcre16_jit_stack *<i>stack</i>);</b>
|
||||
</P>
|
||||
<P>
|
||||
<br>
|
||||
<br>
|
||||
<b>void pcre16_assign_jit_stack(pcre16_extra *<i>extra</i>,</b>
|
||||
<b>pcre16_jit_callback <i>callback</i>, void *<i>data</i>);</b>
|
||||
</P>
|
||||
<P>
|
||||
<b> pcre16_jit_callback <i>callback</i>, void *<i>data</i>);</b>
|
||||
<br>
|
||||
<br>
|
||||
<b>const unsigned char *pcre16_maketables(void);</b>
|
||||
</P>
|
||||
<P>
|
||||
<br>
|
||||
<br>
|
||||
<b>int pcre16_fullinfo(const pcre16 *<i>code</i>, const pcre16_extra *<i>extra</i>,</b>
|
||||
<b>int <i>what</i>, void *<i>where</i>);</b>
|
||||
</P>
|
||||
<P>
|
||||
<b> int <i>what</i>, void *<i>where</i>);</b>
|
||||
<br>
|
||||
<br>
|
||||
<b>int pcre16_refcount(pcre16 *<i>code</i>, int <i>adjust</i>);</b>
|
||||
</P>
|
||||
<P>
|
||||
<br>
|
||||
<br>
|
||||
<b>int pcre16_config(int <i>what</i>, void *<i>where</i>);</b>
|
||||
</P>
|
||||
<P>
|
||||
<br>
|
||||
<br>
|
||||
<b>const char *pcre16_version(void);</b>
|
||||
</P>
|
||||
<P>
|
||||
<br>
|
||||
<br>
|
||||
<b>int pcre16_pattern_to_host_byte_order(pcre16 *<i>code</i>,</b>
|
||||
<b>pcre16_extra *<i>extra</i>, const unsigned char *<i>tables</i>);</b>
|
||||
<b> pcre16_extra *<i>extra</i>, const unsigned char *<i>tables</i>);</b>
|
||||
</P>
|
||||
<br><a name="SEC4" href="#TOC1">PCRE 16-BIT API INDIRECTED FUNCTIONS</a><br>
|
||||
<P>
|
||||
<b>void *(*pcre16_malloc)(size_t);</b>
|
||||
</P>
|
||||
<P>
|
||||
<br>
|
||||
<br>
|
||||
<b>void (*pcre16_free)(void *);</b>
|
||||
</P>
|
||||
<P>
|
||||
<br>
|
||||
<br>
|
||||
<b>void *(*pcre16_stack_malloc)(size_t);</b>
|
||||
</P>
|
||||
<P>
|
||||
<br>
|
||||
<br>
|
||||
<b>void (*pcre16_stack_free)(void *);</b>
|
||||
</P>
|
||||
<P>
|
||||
<br>
|
||||
<br>
|
||||
<b>int (*pcre16_callout)(pcre16_callout_block *);</b>
|
||||
</P>
|
||||
<br><a name="SEC5" href="#TOC1">PCRE 16-BIT API 16-BIT-ONLY FUNCTION</a><br>
|
||||
<P>
|
||||
<b>int pcre16_utf16_to_host_byte_order(PCRE_UCHAR16 *<i>output</i>,</b>
|
||||
<b>PCRE_SPTR16 <i>input</i>, int <i>length</i>, int *<i>byte_order</i>,</b>
|
||||
<b>int <i>keep_boms</i>);</b>
|
||||
<b> PCRE_SPTR16 <i>input</i>, int <i>length</i>, int *<i>byte_order</i>,</b>
|
||||
<b> int <i>keep_boms</i>);</b>
|
||||
</P>
|
||||
<br><a name="SEC6" href="#TOC1">THE PCRE 16-BIT LIBRARY</a><br>
|
||||
<P>
|
||||
@ -259,8 +259,9 @@ buffer, including the zero terminator if the string was zero-terminated.
|
||||
</P>
|
||||
<br><a name="SEC12" href="#TOC1">SUBJECT STRING OFFSETS</a><br>
|
||||
<P>
|
||||
The offsets within subject strings that are returned by the matching functions
|
||||
are in 16-bit units rather than bytes.
|
||||
The lengths and starting offsets of subject strings must be specified in 16-bit
|
||||
data units, and the offsets within subject strings that are returned by the
|
||||
matching functions are also in 16-bit units rather than bytes.
|
||||
</P>
|
||||
<br><a name="SEC13" href="#TOC1">NAMED SUBPATTERNS</a><br>
|
||||
<P>
|
||||
@ -374,9 +375,9 @@ Cambridge CB2 3QH, England.
|
||||
</P>
|
||||
<br><a name="SEC22" href="#TOC1">REVISION</a><br>
|
||||
<P>
|
||||
Last updated: 08 November 2012
|
||||
Last updated: 12 May 2013
|
||||
<br>
|
||||
Copyright © 1997-2012 University of Cambridge.
|
||||
Copyright © 1997-2013 University of Cambridge.
|
||||
<br>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
|
382
tools/pcre/doc/html/pcre32.html
Normal file
382
tools/pcre/doc/html/pcre32.html
Normal file
@ -0,0 +1,382 @@
|
||||
<html>
|
||||
<head>
|
||||
<title>pcre32 specification</title>
|
||||
</head>
|
||||
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||
<h1>pcre32 man page</h1>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
||||
<p>
|
||||
This page is part of the PCRE HTML documentation. It was generated automatically
|
||||
from the original man page. If there is any nonsense in it, please consult the
|
||||
man page, in case the conversion went wrong.
|
||||
<br>
|
||||
<ul>
|
||||
<li><a name="TOC1" href="#SEC1">PCRE 32-BIT API BASIC FUNCTIONS</a>
|
||||
<li><a name="TOC2" href="#SEC2">PCRE 32-BIT API STRING EXTRACTION FUNCTIONS</a>
|
||||
<li><a name="TOC3" href="#SEC3">PCRE 32-BIT API AUXILIARY FUNCTIONS</a>
|
||||
<li><a name="TOC4" href="#SEC4">PCRE 32-BIT API INDIRECTED FUNCTIONS</a>
|
||||
<li><a name="TOC5" href="#SEC5">PCRE 32-BIT API 32-BIT-ONLY FUNCTION</a>
|
||||
<li><a name="TOC6" href="#SEC6">THE PCRE 32-BIT LIBRARY</a>
|
||||
<li><a name="TOC7" href="#SEC7">THE HEADER FILE</a>
|
||||
<li><a name="TOC8" href="#SEC8">THE LIBRARY NAME</a>
|
||||
<li><a name="TOC9" href="#SEC9">STRING TYPES</a>
|
||||
<li><a name="TOC10" href="#SEC10">STRUCTURE TYPES</a>
|
||||
<li><a name="TOC11" href="#SEC11">32-BIT FUNCTIONS</a>
|
||||
<li><a name="TOC12" href="#SEC12">SUBJECT STRING OFFSETS</a>
|
||||
<li><a name="TOC13" href="#SEC13">NAMED SUBPATTERNS</a>
|
||||
<li><a name="TOC14" href="#SEC14">OPTION NAMES</a>
|
||||
<li><a name="TOC15" href="#SEC15">CHARACTER CODES</a>
|
||||
<li><a name="TOC16" href="#SEC16">ERROR NAMES</a>
|
||||
<li><a name="TOC17" href="#SEC17">ERROR TEXTS</a>
|
||||
<li><a name="TOC18" href="#SEC18">CALLOUTS</a>
|
||||
<li><a name="TOC19" href="#SEC19">TESTING</a>
|
||||
<li><a name="TOC20" href="#SEC20">NOT SUPPORTED IN 32-BIT MODE</a>
|
||||
<li><a name="TOC21" href="#SEC21">AUTHOR</a>
|
||||
<li><a name="TOC22" href="#SEC22">REVISION</a>
|
||||
</ul>
|
||||
<P>
|
||||
<b>#include <pcre.h></b>
|
||||
</P>
|
||||
<br><a name="SEC1" href="#TOC1">PCRE 32-BIT API BASIC FUNCTIONS</a><br>
|
||||
<P>
|
||||
<b>pcre32 *pcre32_compile(PCRE_SPTR32 <i>pattern</i>, int <i>options</i>,</b>
|
||||
<b> const char **<i>errptr</i>, int *<i>erroffset</i>,</b>
|
||||
<b> const unsigned char *<i>tableptr</i>);</b>
|
||||
<br>
|
||||
<br>
|
||||
<b>pcre32 *pcre32_compile2(PCRE_SPTR32 <i>pattern</i>, int <i>options</i>,</b>
|
||||
<b> int *<i>errorcodeptr</i>,</b>
<b> const char **<i>errptr</i>, int *<i>erroffset</i>,</b>
|
||||
<b> const unsigned char *<i>tableptr</i>);</b>
|
||||
<br>
|
||||
<br>
|
||||
<b>pcre32_extra *pcre32_study(const pcre32 *<i>code</i>, int <i>options</i>,</b>
|
||||
<b> const char **<i>errptr</i>);</b>
|
||||
<br>
|
||||
<br>
|
||||
<b>void pcre32_free_study(pcre32_extra *<i>extra</i>);</b>
|
||||
<br>
|
||||
<br>
|
||||
<b>int pcre32_exec(const pcre32 *<i>code</i>, const pcre32_extra *<i>extra</i>,</b>
|
||||
<b> PCRE_SPTR32 <i>subject</i>, int <i>length</i>, int <i>startoffset</i>,</b>
|
||||
<b> int <i>options</i>, int *<i>ovector</i>, int <i>ovecsize</i>);</b>
|
||||
<br>
|
||||
<br>
|
||||
<b>int pcre32_dfa_exec(const pcre32 *<i>code</i>, const pcre32_extra *<i>extra</i>,</b>
|
||||
<b> PCRE_SPTR32 <i>subject</i>, int <i>length</i>, int <i>startoffset</i>,</b>
|
||||
<b> int <i>options</i>, int *<i>ovector</i>, int <i>ovecsize</i>,</b>
|
||||
<b> int *<i>workspace</i>, int <i>wscount</i>);</b>
|
||||
</P>
|
||||
<br><a name="SEC2" href="#TOC1">PCRE 32-BIT API STRING EXTRACTION FUNCTIONS</a><br>
|
||||
<P>
|
||||
<b>int pcre32_copy_named_substring(const pcre32 *<i>code</i>,</b>
|
||||
<b> PCRE_SPTR32 <i>subject</i>, int *<i>ovector</i>,</b>
|
||||
<b> int <i>stringcount</i>, PCRE_SPTR32 <i>stringname</i>,</b>
|
||||
<b> PCRE_UCHAR32 *<i>buffer</i>, int <i>buffersize</i>);</b>
|
||||
<br>
|
||||
<br>
|
||||
<b>int pcre32_copy_substring(PCRE_SPTR32 <i>subject</i>, int *<i>ovector</i>,</b>
|
||||
<b> int <i>stringcount</i>, int <i>stringnumber</i>, PCRE_UCHAR32 *<i>buffer</i>,</b>
|
||||
<b> int <i>buffersize</i>);</b>
|
||||
<br>
|
||||
<br>
|
||||
<b>int pcre32_get_named_substring(const pcre32 *<i>code</i>,</b>
|
||||
<b> PCRE_SPTR32 <i>subject</i>, int *<i>ovector</i>,</b>
|
||||
<b> int <i>stringcount</i>, PCRE_SPTR32 <i>stringname</i>,</b>
|
||||
<b> PCRE_SPTR32 *<i>stringptr</i>);</b>
|
||||
<br>
|
||||
<br>
|
||||
<b>int pcre32_get_stringnumber(const pcre32 *<i>code</i>,</b>
|
||||
<b> PCRE_SPTR32 <i>name</i>);</b>
|
||||
<br>
|
||||
<br>
|
||||
<b>int pcre32_get_stringtable_entries(const pcre32 *<i>code</i>,</b>
|
||||
<b> PCRE_SPTR32 <i>name</i>, PCRE_UCHAR32 **<i>first</i>, PCRE_UCHAR32 **<i>last</i>);</b>
|
||||
<br>
|
||||
<br>
|
||||
<b>int pcre32_get_substring(PCRE_SPTR32 <i>subject</i>, int *<i>ovector</i>,</b>
|
||||
<b> int <i>stringcount</i>, int <i>stringnumber</i>,</b>
|
||||
<b> PCRE_SPTR32 *<i>stringptr</i>);</b>
|
||||
<br>
|
||||
<br>
|
||||
<b>int pcre32_get_substring_list(PCRE_SPTR32 <i>subject</i>,</b>
|
||||
<b> int *<i>ovector</i>, int <i>stringcount</i>, PCRE_SPTR32 **<i>listptr</i>);</b>
|
||||
<br>
|
||||
<br>
|
||||
<b>void pcre32_free_substring(PCRE_SPTR32 <i>stringptr</i>);</b>
|
||||
<br>
|
||||
<br>
|
||||
<b>void pcre32_free_substring_list(PCRE_SPTR32 *<i>stringptr</i>);</b>
|
||||
</P>
|
||||
<br><a name="SEC3" href="#TOC1">PCRE 32-BIT API AUXILIARY FUNCTIONS</a><br>
|
||||
<P>
|
||||
<b>pcre32_jit_stack *pcre32_jit_stack_alloc(int <i>startsize</i>, int <i>maxsize</i>);</b>
|
||||
<br>
|
||||
<br>
|
||||
<b>void pcre32_jit_stack_free(pcre32_jit_stack *<i>stack</i>);</b>
|
||||
<br>
|
||||
<br>
|
||||
<b>void pcre32_assign_jit_stack(pcre32_extra *<i>extra</i>,</b>
|
||||
<b> pcre32_jit_callback <i>callback</i>, void *<i>data</i>);</b>
|
||||
<br>
|
||||
<br>
|
||||
<b>const unsigned char *pcre32_maketables(void);</b>
|
||||
<br>
|
||||
<br>
|
||||
<b>int pcre32_fullinfo(const pcre32 *<i>code</i>, const pcre32_extra *<i>extra</i>,</b>
|
||||
<b> int <i>what</i>, void *<i>where</i>);</b>
|
||||
<br>
|
||||
<br>
|
||||
<b>int pcre32_refcount(pcre32 *<i>code</i>, int <i>adjust</i>);</b>
|
||||
<br>
|
||||
<br>
|
||||
<b>int pcre32_config(int <i>what</i>, void *<i>where</i>);</b>
|
||||
<br>
|
||||
<br>
|
||||
<b>const char *pcre32_version(void);</b>
|
||||
<br>
|
||||
<br>
|
||||
<b>int pcre32_pattern_to_host_byte_order(pcre32 *<i>code</i>,</b>
|
||||
<b> pcre32_extra *<i>extra</i>, const unsigned char *<i>tables</i>);</b>
|
||||
</P>
|
||||
<br><a name="SEC4" href="#TOC1">PCRE 32-BIT API INDIRECTED FUNCTIONS</a><br>
|
||||
<P>
|
||||
<b>void *(*pcre32_malloc)(size_t);</b>
|
||||
<br>
|
||||
<br>
|
||||
<b>void (*pcre32_free)(void *);</b>
|
||||
<br>
|
||||
<br>
|
||||
<b>void *(*pcre32_stack_malloc)(size_t);</b>
|
||||
<br>
|
||||
<br>
|
||||
<b>void (*pcre32_stack_free)(void *);</b>
|
||||
<br>
|
||||
<br>
|
||||
<b>int (*pcre32_callout)(pcre32_callout_block *);</b>
|
||||
</P>
|
||||
<br><a name="SEC5" href="#TOC1">PCRE 32-BIT API 32-BIT-ONLY FUNCTION</a><br>
|
||||
<P>
|
||||
<b>int pcre32_utf32_to_host_byte_order(PCRE_UCHAR32 *<i>output</i>,</b>
|
||||
<b> PCRE_SPTR32 <i>input</i>, int <i>length</i>, int *<i>byte_order</i>,</b>
|
||||
<b> int <i>keep_boms</i>);</b>
|
||||
</P>
|
||||
<br><a name="SEC6" href="#TOC1">THE PCRE 32-BIT LIBRARY</a><br>
|
||||
<P>
|
||||
Starting with release 8.32, it is possible to compile a PCRE library that
|
||||
supports 32-bit character strings, including UTF-32 strings, as well as or
|
||||
instead of the original 8-bit library. This work was done by Christian Persch,
|
||||
based on the work done by Zoltan Herczeg for the 16-bit library. All three
|
||||
libraries contain identical sets of functions, used in exactly the same way.
|
||||
Only the names of the functions and the data types of their arguments and
|
||||
results are different. To avoid over-complication and reduce the documentation
|
||||
maintenance load, most of the PCRE documentation describes the 8-bit library,
|
||||
with only occasional references to the 16-bit and 32-bit libraries. This page
|
||||
describes what is different when you use the 32-bit library.
|
||||
</P>
|
||||
<P>
|
||||
WARNING: A single application can be linked with all or any of the three
|
||||
libraries, but you must take care when processing any particular pattern
|
||||
to use functions from just one library. For example, if you want to study
|
||||
a pattern that was compiled with <b>pcre32_compile()</b>, you must do so
|
||||
with <b>pcre32_study()</b>, not <b>pcre_study()</b>, and you must free the
|
||||
study data with <b>pcre32_free_study()</b>.
|
||||
</P>
|
||||
<br><a name="SEC7" href="#TOC1">THE HEADER FILE</a><br>
|
||||
<P>
|
||||
There is only one header file, <b>pcre.h</b>. It contains prototypes for all the
|
||||
functions in all libraries, as well as definitions of flags, structures, error
|
||||
codes, etc.
|
||||
</P>
|
||||
<br><a name="SEC8" href="#TOC1">THE LIBRARY NAME</a><br>
|
||||
<P>
|
||||
In Unix-like systems, the 32-bit library is called <b>libpcre32</b>, and can
|
||||
normally be accessed by adding <b>-lpcre32</b> to the command for linking an
|
||||
application that uses PCRE.
|
||||
</P>
|
||||
<br><a name="SEC9" href="#TOC1">STRING TYPES</a><br>
|
||||
<P>
|
||||
In the 8-bit library, strings are passed to PCRE library functions as vectors
|
||||
of bytes with the C type "char *". In the 32-bit library, strings are passed as
|
||||
vectors of unsigned 32-bit quantities. The macro PCRE_UCHAR32 specifies an
|
||||
appropriate data type, and PCRE_SPTR32 is defined as "const PCRE_UCHAR32 *". In
|
||||
very many environments, "unsigned int" is a 32-bit data type. When PCRE is
|
||||
built, it defines PCRE_UCHAR32 as "unsigned int", but checks that it really is
|
||||
a 32-bit data type. If it is not, the build fails with an error message telling
|
||||
the maintainer to modify the definition appropriately.
|
||||
</P>
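<P>
The width check described above can be sketched in plain C. This is a minimal illustration, not the check that the PCRE build itself performs: the names sketch_uchar32, sketch_uchar32_must_be_32_bits and sketch_uchar32_bits are hypothetical stand-ins, and the real PCRE_UCHAR32 definition comes from pcre.h.
</P>

```c
#include <limits.h>

/* Hypothetical stand-in for PCRE_UCHAR32; the real definition lives in pcre.h. */
typedef unsigned int sketch_uchar32;

/* Compile-time check: the array size becomes -1 (a compile error) unless
   the type is exactly 32 bits wide. */
typedef char sketch_uchar32_must_be_32_bits
  [(sizeof(sketch_uchar32) * CHAR_BIT == 32) ? 1 : -1];

/* Run-time view of the same fact, handy for diagnostics. */
static int sketch_uchar32_bits(void)
{
  return (int)(sizeof(sketch_uchar32) * CHAR_BIT);
}
```

<P>
On a platform where "unsigned int" is not 32 bits wide, the typedef fails to compile, which is the point at which a maintainer would pick a different underlying type.
</P>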
|
||||
<br><a name="SEC10" href="#TOC1">STRUCTURE TYPES</a><br>
|
||||
<P>
|
||||
The types of the opaque structures that are used for compiled 32-bit patterns
|
||||
and JIT stacks are <b>pcre32</b> and <b>pcre32_jit_stack</b> respectively. The
|
||||
type of the user-accessible structure that is returned by <b>pcre32_study()</b>
|
||||
is <b>pcre32_extra</b>, and the type of the structure that is used for passing
|
||||
data to a callout function is <b>pcre32_callout_block</b>. These structures
|
||||
contain the same fields, with the same names, as their 8-bit counterparts. The
|
||||
only difference is that pointers to character strings are 32-bit instead of
|
||||
8-bit types.
|
||||
</P>
|
||||
<br><a name="SEC11" href="#TOC1">32-BIT FUNCTIONS</a><br>
|
||||
<P>
|
||||
For every function in the 8-bit library there is a corresponding function in
|
||||
the 32-bit library with a name that starts with <b>pcre32_</b> instead of
|
||||
<b>pcre_</b>. The prototypes are listed above. In addition, there is one extra
|
||||
function, <b>pcre32_utf32_to_host_byte_order()</b>. This is a utility function
|
||||
that converts a UTF-32 character string to host byte order if necessary. The
|
||||
other 32-bit functions expect the strings they are passed to be in host byte
|
||||
order.
|
||||
</P>
|
||||
<P>
|
||||
The <i>input</i> and <i>output</i> arguments of
|
||||
<b>pcre32_utf32_to_host_byte_order()</b> may point to the same address, that is,
|
||||
conversion in place is supported. The output buffer must be at least as long as
|
||||
the input.
|
||||
</P>
|
||||
<P>
|
||||
The <i>length</i> argument specifies the number of 32-bit data units in the
|
||||
input string; a negative value specifies a zero-terminated string.
|
||||
</P>
|
||||
<P>
|
||||
If <i>byte_order</i> is NULL, it is assumed that the string starts off in host
|
||||
byte order. This may be changed by byte-order marks (BOMs) anywhere in the
|
||||
string (commonly as the first character).
|
||||
</P>
|
||||
<P>
|
||||
If <i>byte_order</i> is not NULL, a non-zero value of the integer to which it
|
||||
points means that the input starts off in host byte order, otherwise the
|
||||
opposite order is assumed. Again, BOMs in the string can change this. The final
|
||||
byte order is passed back at the end of processing.
|
||||
</P>
|
||||
<P>
|
||||
If <i>keep_boms</i> is not zero, byte-order mark characters (0xfeff) are copied
|
||||
into the output string. Otherwise they are discarded.
|
||||
</P>
|
||||
<P>
|
||||
The result of the function is the number of 32-bit units placed into the output
|
||||
buffer, including the zero terminator if the string was zero-terminated.
|
||||
</P>
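<P>
The behaviour described above for the byte-order conversion can be sketched as follows. This is an illustrative re-implementation under the rules stated in this section, not the library's own source: sketch_utf32_to_host and sketch_swap32 are hypothetical names, and uint32_t stands in for PCRE_UCHAR32.
</P>

```c
#include <stddef.h>
#include <stdint.h>

/* Reverse the four bytes of a 32-bit unit. */
static uint32_t sketch_swap32(uint32_t v)
{
  return ((v & 0x000000ffu) << 24) | ((v & 0x0000ff00u) << 8) |
         ((v & 0x00ff0000u) >> 8)  | ((v & 0xff000000u) >> 24);
}

/* Hypothetical sketch: copy input to output in host byte order.
   Returns the number of 32-bit units written to output. */
static int sketch_utf32_to_host(uint32_t *output, const uint32_t *input,
                                int length, int *byte_order, int keep_boms)
{
  /* A NULL byte_order means "assume the input starts in host order". */
  int host_order = (byte_order != NULL) ? (*byte_order != 0) : 1;
  const uint32_t *in = input;
  const uint32_t *end;
  uint32_t *out = output;

  /* A negative length means zero-terminated; count the terminator too. */
  if (length < 0) {
    length = 0;
    while (input[length] != 0) length++;
    length++;
  }
  end = input + length;

  while (in < end) {
    uint32_t c = *in++;
    if (c == 0x0000feffu || c == 0xfffe0000u) {
      /* A BOM, read either way round, resets the current byte order. */
      host_order = (c == 0x0000feffu);
      if (keep_boms) *out++ = 0x0000feffu;
    } else {
      *out++ = host_order ? c : sketch_swap32(c);
    }
  }

  /* Pass the final byte order back to the caller. */
  if (byte_order != NULL) *byte_order = host_order;
  return (int)(out - output);
}
```

<P>
Because the write position never overtakes the read position, passing the same pointer for input and output converts the string in place, as the text above allows.
</P>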
|
||||
<br><a name="SEC12" href="#TOC1">SUBJECT STRING OFFSETS</a><br>
|
||||
<P>
|
||||
The lengths and starting offsets of subject strings must be specified in 32-bit
|
||||
data units, and the offsets within subject strings that are returned by the
|
||||
matching functions are also in 32-bit units rather than bytes.
|
||||
</P>
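<P>
A small sketch of what counting in 32-bit units means in practice. It assumes the usual layout of the output vector (pairs of start and end offsets, as described in the pcreapi page); the helper names sketch_match_units and sketch_units_to_bytes are hypothetical and are not part of PCRE.
</P>

```c
#include <stddef.h>
#include <stdint.h>

/* Length of a match in 32-bit units, given the (start, end) offset pair
   stored at ovector[2*pair] and ovector[2*pair + 1]. */
static int sketch_match_units(const int *ovector, int pair)
{
  return ovector[2 * pair + 1] - ovector[2 * pair];
}

/* Convert an offset counted in 32-bit units into a byte offset, for
   example before calling a byte-oriented routine such as memcpy(). */
static size_t sketch_units_to_bytes(int units)
{
  return (size_t)units * sizeof(uint32_t);
}
```

<P>
The conversion is only needed when handing data to byte-oriented code; the PCRE 32-bit functions themselves take and return unit counts.
</P>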
|
||||
<br><a name="SEC13" href="#TOC1">NAMED SUBPATTERNS</a><br>
|
||||
<P>
|
||||
The name-to-number translation table that is maintained for named subpatterns
|
||||
uses 32-bit characters. The <b>pcre32_get_stringtable_entries()</b> function
|
||||
returns the length of each entry in the table as the number of 32-bit data
|
||||
units.
|
||||
</P>
|
||||
<br><a name="SEC14" href="#TOC1">OPTION NAMES</a><br>
|
||||
<P>
|
||||
There are two new general option names, PCRE_UTF32 and PCRE_NO_UTF32_CHECK,
|
||||
which correspond to PCRE_UTF8 and PCRE_NO_UTF8_CHECK in the 8-bit library. In
|
||||
fact, these new options define the same bits in the options word. There is a
|
||||
discussion about the
|
||||
<a href="pcreunicode.html#utf32strings">validity of UTF-32 strings</a>
|
||||
in the
|
||||
<a href="pcreunicode.html"><b>pcreunicode</b></a>
|
||||
page.
|
||||
</P>
|
||||
<P>
|
||||
For the <b>pcre32_config()</b> function there is an option PCRE_CONFIG_UTF32
|
||||
that returns 1 if UTF-32 support is configured, otherwise 0. If this option is
|
||||
given to <b>pcre_config()</b> or <b>pcre16_config()</b>, or if the
|
||||
PCRE_CONFIG_UTF8 or PCRE_CONFIG_UTF16 option is given to <b>pcre32_config()</b>,
|
||||
the result is the PCRE_ERROR_BADOPTION error.
|
||||
</P>
|
||||
<br><a name="SEC15" href="#TOC1">CHARACTER CODES</a><br>
|
||||
<P>
|
||||
In 32-bit mode, when PCRE_UTF32 is not set, character values are treated in the
|
||||
same way as in 8-bit, non-UTF-8 mode, except, of course, that they can range
|
||||
from 0 to 0x7fffffff instead of 0 to 0xff. Character types for characters less
|
||||
than 0xff can therefore be influenced by the locale in the same way as before.
|
||||
Characters greater than 0xff have only one case, and no "type" (such as letter
|
||||
or digit).
|
||||
</P>
|
||||
<P>
|
||||
In UTF-32 mode, the character code is Unicode, in the range 0 to 0x10ffff, with
|
||||
the exception of values in the range 0xd800 to 0xdfff because those are
|
||||
"surrogate" values that are ill-formed in UTF-32.
|
||||
</P>
|
||||
<P>
|
||||
A UTF-32 string can indicate its endianness by a special code known as a
|
||||
byte-order mark (BOM). The PCRE functions do not handle this, expecting strings
|
||||
to be in host byte order. A utility function called
|
||||
<b>pcre32_utf32_to_host_byte_order()</b> is provided to help with this (see
|
||||
above).
|
||||
</P>
|
||||
<br><a name="SEC16" href="#TOC1">ERROR NAMES</a><br>
|
||||
<P>
|
||||
The error PCRE_ERROR_BADUTF32 corresponds to its 8-bit counterpart.
|
||||
The error PCRE_ERROR_BADMODE is given when a compiled
|
||||
pattern is passed to a function that processes patterns in the other
|
||||
mode, for example, if a pattern compiled with <b>pcre_compile()</b> is passed to
|
||||
<b>pcre32_exec()</b>.
|
||||
</P>
|
||||
<P>
|
||||
There are new error codes whose names begin with PCRE_UTF32_ERR for invalid
|
||||
UTF-32 strings, corresponding to the PCRE_UTF8_ERR codes for UTF-8 strings that
|
||||
are described in the section entitled
|
||||
<a href="pcreapi.html#badutf8reasons">"Reason codes for invalid UTF-8 strings"</a>
|
||||
in the main
|
||||
<a href="pcreapi.html"><b>pcreapi</b></a>
|
||||
page. The UTF-32 errors are:
|
||||
<pre>
|
||||
PCRE_UTF32_ERR1 Surrogate character (range from 0xd800 to 0xdfff)
|
||||
PCRE_UTF32_ERR2 Non-character
|
||||
PCRE_UTF32_ERR3 Character > 0x10ffff
|
||||
</PRE>
|
||||
</P>
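<P>
The three reasons listed above can be illustrated with a per-character classifier. This is a sketch only: the enum and the function sketch_classify_utf32 are hypothetical, the return values simply mirror the numbering of the list, and the non-character test uses the Unicode definition (U+FDD0 to U+FDEF, plus the last two code points of every plane), which is an assumption about what "Non-character" covers here.
</P>

```c
#include <stdint.h>

/* Hypothetical reason codes, numbered to match the list above. */
enum sketch_utf32_reason {
  SKETCH_UTF32_OK        = 0,
  SKETCH_UTF32_SURROGATE = 1,  /* 0xd800 to 0xdfff */
  SKETCH_UTF32_NONCHAR   = 2,  /* Unicode non-character (assumed definition) */
  SKETCH_UTF32_TOO_LARGE = 3   /* above 0x10ffff */
};

/* Classify a single 32-bit unit taken from a UTF-32 string. */
static int sketch_classify_utf32(uint32_t c)
{
  if (c > 0x10ffffu) return SKETCH_UTF32_TOO_LARGE;
  if (c >= 0xd800u && c <= 0xdfffu) return SKETCH_UTF32_SURROGATE;
  if ((c >= 0xfdd0u && c <= 0xfdefu) || (c & 0xfffeu) == 0xfffeu)
    return SKETCH_UTF32_NONCHAR;
  return SKETCH_UTF32_OK;
}
```

<P>
A caller would typically scan a string with this function and report the offset, in 32-bit units, of the first unit that does not classify as valid.
</P>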
|
||||
<br><a name="SEC17" href="#TOC1">ERROR TEXTS</a><br>
|
||||
<P>
|
||||
If there is an error while compiling a pattern, the error text that is passed
|
||||
back by <b>pcre32_compile()</b> or <b>pcre32_compile2()</b> is still an 8-bit
|
||||
character string, zero-terminated.
|
||||
</P>
|
||||
<br><a name="SEC18" href="#TOC1">CALLOUTS</a><br>
|
||||
<P>
|
||||
The <i>subject</i> and <i>mark</i> fields in the callout block that is passed to
|
||||
a callout function point to 32-bit vectors.
|
||||
</P>
|
||||
<br><a name="SEC19" href="#TOC1">TESTING</a><br>
|
||||
<P>
|
||||
The <b>pcretest</b> program continues to operate with 8-bit input and output
|
||||
files, but it can be used for testing the 32-bit library. If it is run with the
|
||||
command line option <b>-32</b>, patterns and subject strings are converted from
|
||||
8-bit to 32-bit before being passed to PCRE, and the 32-bit library functions
|
||||
are used instead of the 8-bit ones. Returned 32-bit strings are converted to
|
||||
8-bit for output. If neither the 8-bit nor the 16-bit library was compiled,
|
||||
<b>pcretest</b> defaults to 32-bit and the <b>-32</b> option is ignored.
|
||||
</P>
|
||||
<P>
|
||||
When PCRE is being built, the <b>RunTest</b> script that is called by "make
|
||||
check" uses the <b>pcretest</b> <b>-C</b> option to discover which of the 8-bit,
|
||||
16-bit and 32-bit libraries has been built, and runs the tests appropriately.
|
||||
</P>
|
||||
<br><a name="SEC20" href="#TOC1">NOT SUPPORTED IN 32-BIT MODE</a><br>
|
||||
<P>
|
||||
Not all the features of the 8-bit library are available with the 32-bit
|
||||
library. The C++ and POSIX wrapper functions support only the 8-bit library,
|
||||
and the <b>pcregrep</b> program is at present 8-bit only.
|
||||
</P>
|
||||
<br><a name="SEC21" href="#TOC1">AUTHOR</a><br>
|
||||
<P>
|
||||
Philip Hazel
|
||||
<br>
|
||||
University Computing Service
|
||||
<br>
|
||||
Cambridge CB2 3QH, England.
|
||||
<br>
|
||||
</P>
|
||||
<br><a name="SEC22" href="#TOC1">REVISION</a><br>
|
||||
<P>
|
||||
Last updated: 12 May 2013
|
||||
<br>
|
||||
Copyright © 1997-2013 University of Cambridge.
|
||||
<br>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
@ -20,15 +20,15 @@ SYNOPSIS
|
||||
</P>
|
||||
<P>
|
||||
<b>void pcre_assign_jit_stack(pcre_extra *<i>extra</i>,</b>
|
||||
<b>pcre_jit_callback <i>callback</i>, void *<i>data</i>);</b>
|
||||
</P>
|
||||
<P>
|
||||
<b> pcre_jit_callback <i>callback</i>, void *<i>data</i>);</b>
|
||||
<br>
|
||||
<br>
|
||||
<b>void pcre16_assign_jit_stack(pcre16_extra *<i>extra</i>,</b>
|
||||
<b>pcre16_jit_callback <i>callback</i>, void *<i>data</i>);</b>
|
||||
</P>
|
||||
<P>
|
||||
<b> pcre16_jit_callback <i>callback</i>, void *<i>data</i>);</b>
|
||||
<br>
|
||||
<br>
|
||||
<b>void pcre32_assign_jit_stack(pcre32_extra *<i>extra</i>,</b>
|
||||
<b>pcre32_jit_callback <i>callback</i>, void *<i>data</i>);</b>
|
||||
<b> pcre32_jit_callback <i>callback</i>, void *<i>data</i>);</b>
|
||||
</P>
|
||||
<br><b>
|
||||
DESCRIPTION
|
||||
|
@ -20,18 +20,18 @@ SYNOPSIS
|
||||
</P>
|
||||
<P>
|
||||
<b>pcre *pcre_compile(const char *<i>pattern</i>, int <i>options</i>,</b>
|
||||
<b>const char **<i>errptr</i>, int *<i>erroffset</i>,</b>
|
||||
<b>const unsigned char *<i>tableptr</i>);</b>
|
||||
</P>
|
||||
<P>
|
||||
<b> const char **<i>errptr</i>, int *<i>erroffset</i>,</b>
|
||||
<b> const unsigned char *<i>tableptr</i>);</b>
|
||||
<br>
|
||||
<br>
|
||||
<b>pcre16 *pcre16_compile(PCRE_SPTR16 <i>pattern</i>, int <i>options</i>,</b>
|
||||
<b>const char **<i>errptr</i>, int *<i>erroffset</i>,</b>
|
||||
<b>const unsigned char *<i>tableptr</i>);</b>
|
||||
</P>
|
||||
<P>
|
||||
<b> const char **<i>errptr</i>, int *<i>erroffset</i>,</b>
|
||||
<b> const unsigned char *<i>tableptr</i>);</b>
|
||||
<br>
|
||||
<br>
|
||||
<b>pcre32 *pcre32_compile(PCRE_SPTR32 <i>pattern</i>, int <i>options</i>,</b>
|
||||
<b>const char **<i>errptr</i>, int *<i>erroffset</i>,</b>
|
||||
<b>const unsigned char *<i>tableptr</i>);</b>
|
||||
<b> const char **<i>errptr</i>, int *<i>erroffset</i>,</b>
|
||||
<b> const unsigned char *<i>tableptr</i>);</b>
|
||||
</P>
|
||||
<br><b>
|
||||
DESCRIPTION
|
||||
@ -65,6 +65,7 @@ The option bits are:
|
||||
PCRE_FIRSTLINE Force matching to be before newline
|
||||
PCRE_JAVASCRIPT_COMPAT JavaScript compatibility
|
||||
PCRE_MULTILINE ^ and $ match newlines within data
|
||||
PCRE_NEVER_UTF Lock out UTF, e.g. via (*UTF)
|
||||
PCRE_NEWLINE_ANY Recognize any Unicode newline sequence
|
||||
PCRE_NEWLINE_ANYCRLF Recognize CR, LF, and CRLF as newline
|
||||
sequences
|
||||
@ -73,6 +74,8 @@ The option bits are:
|
||||
PCRE_NEWLINE_LF Set LF as the newline sequence
|
||||
PCRE_NO_AUTO_CAPTURE Disable numbered capturing paren-
|
||||
theses (named ones available)
|
||||
PCRE_NO_AUTO_POSSESS Disable auto-possessification
|
||||
PCRE_NO_START_OPTIMIZE Disable match-time start optimizations
|
||||
PCRE_NO_UTF16_CHECK Do not check the pattern for UTF-16
|
||||
validity (only relevant if
|
||||
PCRE_UTF16 is set)
|
||||
|
@ -20,21 +20,21 @@ SYNOPSIS
|
||||
</P>
|
||||
<P>
|
||||
<b>pcre *pcre_compile2(const char *<i>pattern</i>, int <i>options</i>,</b>
|
||||
<b>int *<i>errorcodeptr</i>,</b>
|
||||
<b>const char **<i>errptr</i>, int *<i>erroffset</i>,</b>
|
||||
<b>const unsigned char *<i>tableptr</i>);</b>
|
||||
</P>
|
||||
<P>
|
||||
<b> int *<i>errorcodeptr</i>,</b>
|
||||
<b> const char **<i>errptr</i>, int *<i>erroffset</i>,</b>
|
||||
<b> const unsigned char *<i>tableptr</i>);</b>
|
||||
<br>
|
||||
<br>
|
||||
<b>pcre16 *pcre16_compile2(PCRE_SPTR16 <i>pattern</i>, int <i>options</i>,</b>
|
||||
<b>int *<i>errorcodeptr</i>,</b>
|
||||
<b>const char **<i>errptr</i>, int *<i>erroffset</i>,</b>
|
||||
<b>const unsigned char *<i>tableptr</i>);</b>
|
||||
</P>
|
||||
<P>
|
||||
<b> int *<i>errorcodeptr</i>,</b>
|
||||
<b> const char **<i>errptr</i>, int *<i>erroffset</i>,</b>
|
||||
<b> const unsigned char *<i>tableptr</i>);</b>
|
||||
<br>
|
||||
<br>
|
||||
<b>pcre32 *pcre32_compile2(PCRE_SPTR32 <i>pattern</i>, int <i>options</i>,</b>
|
||||
<b>int *<i>errorcodeptr</i>,</b>
|
||||
<b>const char **<i>errptr</i>, int *<i>erroffset</i>,</b>
|
||||
<b>const unsigned char *<i>tableptr</i>);</b>
|
||||
<b> int *<i>errorcodeptr</i>,</b>
|
||||
<b> const char **<i>errptr</i>, int *<i>erroffset</i>,</b>
|
||||
<b> const unsigned char *<i>tableptr</i>);</b>
|
||||
</P>
|
||||
<br><b>
|
||||
DESCRIPTION
|
||||
@ -69,6 +69,7 @@ The option bits are:
|
||||
PCRE_FIRSTLINE Force matching to be before newline
|
||||
PCRE_JAVASCRIPT_COMPAT JavaScript compatibility
|
||||
PCRE_MULTILINE ^ and $ match newlines within data
|
||||
PCRE_NEVER_UTF Lock out UTF, e.g. via (*UTF)
|
||||
PCRE_NEWLINE_ANY Recognize any Unicode newline sequence
|
||||
PCRE_NEWLINE_ANYCRLF Recognize CR, LF, and CRLF as newline
|
||||
sequences
|
||||
@ -77,6 +78,8 @@ The option bits are:
|
||||
PCRE_NEWLINE_LF Set LF as the newline sequence
|
||||
PCRE_NO_AUTO_CAPTURE Disable numbered capturing paren-
|
||||
theses (named ones available)
|
||||
PCRE_NO_AUTO_POSSESS Disable auto-possessification
|
||||
PCRE_NO_START_OPTIMIZE Disable match-time start optimizations
|
||||
PCRE_NO_UTF16_CHECK Do not check the pattern for UTF-16
|
||||
validity (only relevant if
|
||||
PCRE_UTF16 is set)
|
||||
|
@ -48,6 +48,7 @@ point to an unsigned long integer. The available codes are:
|
||||
target architecture for the JIT compiler,
|
||||
or NULL if there is no JIT support
|
||||
PCRE_CONFIG_LINK_SIZE Internal link size: 2, 3, or 4
|
||||
PCRE_CONFIG_PARENS_LIMIT Parentheses nesting limit
|
||||
PCRE_CONFIG_MATCH_LIMIT Internal resource limit
|
||||
PCRE_CONFIG_MATCH_LIMIT_RECURSION
|
||||
Internal recursion depth limit
|
||||
|
@ -20,21 +20,21 @@ SYNOPSIS
|
||||
</P>
|
||||
<P>
|
||||
<b>int pcre_copy_named_substring(const pcre *<i>code</i>,</b>
|
||||
<b>const char *<i>subject</i>, int *<i>ovector</i>,</b>
|
||||
<b>int <i>stringcount</i>, const char *<i>stringname</i>,</b>
|
||||
<b>char *<i>buffer</i>, int <i>buffersize</i>);</b>
|
||||
</P>
|
||||
<P>
|
||||
<b> const char *<i>subject</i>, int *<i>ovector</i>,</b>
|
||||
<b> int <i>stringcount</i>, const char *<i>stringname</i>,</b>
|
||||
<b> char *<i>buffer</i>, int <i>buffersize</i>);</b>
|
||||
<br>
|
||||
<br>
|
||||
<b>int pcre16_copy_named_substring(const pcre16 *<i>code</i>,</b>
|
||||
<b>PCRE_SPTR16 <i>subject</i>, int *<i>ovector</i>,</b>
|
||||
<b>int <i>stringcount</i>, PCRE_SPTR16 <i>stringname</i>,</b>
|
||||
<b>PCRE_UCHAR16 *<i>buffer</i>, int <i>buffersize</i>);</b>
|
||||
</P>
|
||||
<P>
|
||||
<b> PCRE_SPTR16 <i>subject</i>, int *<i>ovector</i>,</b>
|
||||
<b> int <i>stringcount</i>, PCRE_SPTR16 <i>stringname</i>,</b>
|
||||
<b> PCRE_UCHAR16 *<i>buffer</i>, int <i>buffersize</i>);</b>
|
||||
<br>
|
||||
<br>
|
||||
<b>int pcre32_copy_named_substring(const pcre32 *<i>code</i>,</b>
|
||||
<b>PCRE_SPTR32 <i>subject</i>, int *<i>ovector</i>,</b>
|
||||
<b>int <i>stringcount</i>, PCRE_SPTR32 <i>stringname</i>,</b>
|
||||
<b>PCRE_UCHAR32 *<i>buffer</i>, int <i>buffersize</i>);</b>
|
||||
<b> PCRE_SPTR32 <i>subject</i>, int *<i>ovector</i>,</b>
|
||||
<b> int <i>stringcount</i>, PCRE_SPTR32 <i>stringname</i>,</b>
|
||||
<b> PCRE_UCHAR32 *<i>buffer</i>, int <i>buffersize</i>);</b>
|
||||
</P>
|
||||
<br><b>
|
||||
DESCRIPTION
|
||||
|
@ -20,18 +20,18 @@ SYNOPSIS
|
||||
</P>
|
||||
<P>
|
||||
<b>int pcre_copy_substring(const char *<i>subject</i>, int *<i>ovector</i>,</b>
|
||||
<b>int <i>stringcount</i>, int <i>stringnumber</i>, char *<i>buffer</i>,</b>
|
||||
<b>int <i>buffersize</i>);</b>
|
||||
</P>
|
||||
<P>
|
||||
<b> int <i>stringcount</i>, int <i>stringnumber</i>, char *<i>buffer</i>,</b>
|
||||
<b> int <i>buffersize</i>);</b>
|
||||
<br>
|
||||
<br>
|
||||
<b>int pcre16_copy_substring(PCRE_SPTR16 <i>subject</i>, int *<i>ovector</i>,</b>
|
||||
<b>int <i>stringcount</i>, int <i>stringnumber</i>, PCRE_UCHAR16 *<i>buffer</i>,</b>
|
||||
<b>int <i>buffersize</i>);</b>
|
||||
</P>
|
||||
<P>
|
||||
<b> int <i>stringcount</i>, int <i>stringnumber</i>, PCRE_UCHAR16 *<i>buffer</i>,</b>
|
||||
<b> int <i>buffersize</i>);</b>
|
||||
<br>
|
||||
<br>
|
||||
<b>int pcre32_copy_substring(PCRE_SPTR32 <i>subject</i>, int *<i>ovector</i>,</b>
|
||||
<b>int <i>stringcount</i>, int <i>stringnumber</i>, PCRE_UCHAR32 *<i>buffer</i>,</b>
|
||||
<b>int <i>buffersize</i>);</b>
|
||||
<b> int <i>stringcount</i>, int <i>stringnumber</i>, PCRE_UCHAR32 *<i>buffer</i>,</b>
|
||||
<b> int <i>buffersize</i>);</b>
|
||||
</P>
|
||||
<br><b>
|
||||
DESCRIPTION
|
||||
|
@ -20,21 +20,21 @@ SYNOPSIS
|
||||
</P>
|
||||
<P>
|
||||
<b>int pcre_dfa_exec(const pcre *<i>code</i>, const pcre_extra *<i>extra</i>,</b>
|
||||
<b>const char *<i>subject</i>, int <i>length</i>, int <i>startoffset</i>,</b>
|
||||
<b>int <i>options</i>, int *<i>ovector</i>, int <i>ovecsize</i>,</b>
|
||||
<b>int *<i>workspace</i>, int <i>wscount</i>);</b>
|
||||
</P>
|
||||
<P>
|
||||
<b> const char *<i>subject</i>, int <i>length</i>, int <i>startoffset</i>,</b>
|
||||
<b> int <i>options</i>, int *<i>ovector</i>, int <i>ovecsize</i>,</b>
|
||||
<b> int *<i>workspace</i>, int <i>wscount</i>);</b>
|
||||
<br>
|
||||
<br>
|
||||
<b>int pcre16_dfa_exec(const pcre16 *<i>code</i>, const pcre16_extra *<i>extra</i>,</b>
|
||||
<b>PCRE_SPTR16 <i>subject</i>, int <i>length</i>, int <i>startoffset</i>,</b>
|
||||
<b>int <i>options</i>, int *<i>ovector</i>, int <i>ovecsize</i>,</b>
|
||||
<b>int *<i>workspace</i>, int <i>wscount</i>);</b>
|
||||
</P>
|
||||
<P>
|
||||
<b> PCRE_SPTR16 <i>subject</i>, int <i>length</i>, int <i>startoffset</i>,</b>
|
||||
<b> int <i>options</i>, int *<i>ovector</i>, int <i>ovecsize</i>,</b>
|
||||
<b> int *<i>workspace</i>, int <i>wscount</i>);</b>
|
||||
<br>
|
||||
<br>
|
||||
<b>int pcre32_dfa_exec(const pcre32 *<i>code</i>, const pcre32_extra *<i>extra</i>,</b>
|
||||
<b>PCRE_SPTR32 <i>subject</i>, int <i>length</i>, int <i>startoffset</i>,</b>
|
||||
<b>int <i>options</i>, int *<i>ovector</i>, int <i>ovecsize</i>,</b>
|
||||
<b>int *<i>workspace</i>, int <i>wscount</i>);</b>
|
||||
<b> PCRE_SPTR32 <i>subject</i>, int <i>length</i>, int <i>startoffset</i>,</b>
|
||||
<b> int <i>options</i>, int *<i>ovector</i>, int <i>ovecsize</i>,</b>
|
||||
<b> int *<i>workspace</i>, int <i>wscount</i>);</b>
|
||||
</P>
|
||||
<br><b>
|
||||
DESCRIPTION
|
||||
@ -50,16 +50,17 @@ are:
|
||||
<i>extra</i> Points to an associated <b>pcre[16|32]_extra</b> structure,
|
||||
or is NULL
|
||||
<i>subject</i> Points to the subject string
|
||||
<i>length</i> Length of the subject string, in bytes
|
||||
<i>startoffset</i> Offset in bytes in the subject at which to
|
||||
start matching
|
||||
<i>length</i> Length of the subject string
|
||||
<i>startoffset</i> Offset in the subject at which to start matching
|
||||
<i>options</i> Option bits
|
||||
<i>ovector</i> Points to a vector of ints for result offsets
|
||||
<i>ovecsize</i> Number of elements in the vector
|
||||
<i>workspace</i> Points to a vector of ints used as working space
|
||||
<i>wscount</i> Number of elements in the vector
|
||||
</pre>
|
||||
The options are:
|
||||
The units for <i>length</i> and <i>startoffset</i> are bytes for
|
||||
<b>pcre_exec()</b>, 16-bit data items for <b>pcre16_exec()</b>, and 32-bit items
|
||||
for <b>pcre32_exec()</b>. The options are:
|
||||
<pre>
|
||||
PCRE_ANCHORED Match only at the first position
|
||||
PCRE_BSR_ANYCRLF \R matches only CR, LF, or CRLF
|
||||
|
@ -20,18 +20,18 @@ SYNOPSIS
|
||||
</P>
|
||||
<P>
|
||||
<b>int pcre_exec(const pcre *<i>code</i>, const pcre_extra *<i>extra</i>,</b>
|
||||
<b>const char *<i>subject</i>, int <i>length</i>, int <i>startoffset</i>,</b>
|
||||
<b>int <i>options</i>, int *<i>ovector</i>, int <i>ovecsize</i>);</b>
|
||||
</P>
|
||||
<P>
|
||||
<b> const char *<i>subject</i>, int <i>length</i>, int <i>startoffset</i>,</b>
|
||||
<b> int <i>options</i>, int *<i>ovector</i>, int <i>ovecsize</i>);</b>
|
||||
<br>
|
||||
<br>
|
||||
<b>int pcre16_exec(const pcre16 *<i>code</i>, const pcre16_extra *<i>extra</i>,</b>
|
||||
<b>PCRE_SPTR16 <i>subject</i>, int <i>length</i>, int <i>startoffset</i>,</b>
|
||||
<b>int <i>options</i>, int *<i>ovector</i>, int <i>ovecsize</i>);</b>
|
||||
</P>
|
||||
<P>
|
||||
<b> PCRE_SPTR16 <i>subject</i>, int <i>length</i>, int <i>startoffset</i>,</b>
|
||||
<b> int <i>options</i>, int *<i>ovector</i>, int <i>ovecsize</i>);</b>
|
||||
<br>
|
||||
<br>
|
||||
<b>int pcre32_exec(const pcre32 *<i>code</i>, const pcre32_extra *<i>extra</i>,</b>
|
||||
<b>PCRE_SPTR32 <i>subject</i>, int <i>length</i>, int <i>startoffset</i>,</b>
|
||||
<b>int <i>options</i>, int *<i>ovector</i>, int <i>ovecsize</i>);</b>
|
||||
<b> PCRE_SPTR32 <i>subject</i>, int <i>length</i>, int <i>startoffset</i>,</b>
|
||||
<b> int <i>options</i>, int *<i>ovector</i>, int <i>ovecsize</i>);</b>
|
||||
</P>
|
||||
<br><b>
|
||||
DESCRIPTION
|
||||
@ -45,14 +45,15 @@ offsets to captured substrings. Its arguments are:
|
||||
<i>extra</i> Points to an associated <b>pcre[16|32]_extra</b> structure,
|
||||
or is NULL
|
||||
<i>subject</i> Points to the subject string
|
||||
<i>length</i> Length of the subject string, in bytes
|
||||
<i>startoffset</i> Offset in bytes in the subject at which to
|
||||
start matching
|
||||
<i>length</i> Length of the subject string
|
||||
<i>startoffset</i> Offset in the subject at which to start matching
|
||||
<i>options</i> Option bits
|
||||
<i>ovector</i> Points to a vector of ints for result offsets
|
||||
<i>ovecsize</i> Number of elements in the vector (a multiple of 3)
|
||||
</pre>
|
||||
The options are:
|
||||
The units for <i>length</i> and <i>startoffset</i> are bytes for
|
||||
<b>pcre_exec()</b>, 16-bit data items for <b>pcre16_exec()</b>, and 32-bit items
|
||||
for <b>pcre32_exec()</b>. The options are:
|
||||
<pre>
|
||||
PCRE_ANCHORED Match only at the first position
|
||||
PCRE_BSR_ANYCRLF \R matches only CR, LF, or CRLF
|
||||
|
@ -20,15 +20,15 @@ SYNOPSIS
|
||||
</P>
|
||||
<P>
|
||||
<b>int pcre_fullinfo(const pcre *<i>code</i>, const pcre_extra *<i>extra</i>,</b>
|
||||
<b>int <i>what</i>, void *<i>where</i>);</b>
|
||||
</P>
|
||||
<P>
|
||||
<b> int <i>what</i>, void *<i>where</i>);</b>
|
||||
<br>
|
||||
<br>
|
||||
<b>int pcre16_fullinfo(const pcre16 *<i>code</i>, const pcre16_extra *<i>extra</i>,</b>
|
||||
<b>int <i>what</i>, void *<i>where</i>);</b>
|
||||
</P>
|
||||
<P>
|
||||
<b> int <i>what</i>, void *<i>where</i>);</b>
|
||||
<br>
|
||||
<br>
|
||||
<b>int pcre32_fullinfo(const pcre32 *<i>code</i>, const pcre32_extra *<i>extra</i>,</b>
|
||||
<b>int <i>what</i>, void *<i>where</i>);</b>
|
||||
<b> int <i>what</i>, void *<i>where</i>);</b>
|
||||
</P>
|
||||
<br><b>
|
||||
DESCRIPTION
|
||||
|
@ -20,21 +20,21 @@ SYNOPSIS
|
||||
</P>
|
||||
<P>
|
||||
<b>int pcre_get_named_substring(const pcre *<i>code</i>,</b>
|
||||
<b>const char *<i>subject</i>, int *<i>ovector</i>,</b>
|
||||
<b>int <i>stringcount</i>, const char *<i>stringname</i>,</b>
|
||||
<b>const char **<i>stringptr</i>);</b>
|
||||
</P>
|
||||
<P>
|
||||
<b> const char *<i>subject</i>, int *<i>ovector</i>,</b>
|
||||
<b> int <i>stringcount</i>, const char *<i>stringname</i>,</b>
|
||||
<b> const char **<i>stringptr</i>);</b>
|
||||
<br>
|
||||
<br>
|
||||
<b>int pcre16_get_named_substring(const pcre16 *<i>code</i>,</b>
|
||||
<b>PCRE_SPTR16 <i>subject</i>, int *<i>ovector</i>,</b>
|
||||
<b>int <i>stringcount</i>, PCRE_SPTR16 <i>stringname</i>,</b>
|
||||
<b>PCRE_SPTR16 *<i>stringptr</i>);</b>
|
||||
</P>
|
||||
<P>
|
||||
<b> PCRE_SPTR16 <i>subject</i>, int *<i>ovector</i>,</b>
|
||||
<b> int <i>stringcount</i>, PCRE_SPTR16 <i>stringname</i>,</b>
|
||||
<b> PCRE_SPTR16 *<i>stringptr</i>);</b>
|
||||
<br>
|
||||
<br>
|
||||
<b>int pcre32_get_named_substring(const pcre32 *<i>code</i>,</b>
|
||||
<b>PCRE_SPTR32 <i>subject</i>, int *<i>ovector</i>,</b>
|
||||
<b>int <i>stringcount</i>, PCRE_SPTR32 <i>stringname</i>,</b>
|
||||
<b>PCRE_SPTR32 *<i>stringptr</i>);</b>
|
||||
<b> PCRE_SPTR32 <i>subject</i>, int *<i>ovector</i>,</b>
|
||||
<b> int <i>stringcount</i>, PCRE_SPTR32 <i>stringname</i>,</b>
|
||||
<b> PCRE_SPTR32 *<i>stringptr</i>);</b>
|
||||
</P>
|
||||
<br><b>
|
||||
DESCRIPTION
|
||||
|
@ -20,15 +20,15 @@ SYNOPSIS
|
||||
</P>
|
||||
<P>
|
||||
<b>int pcre_get_stringnumber(const pcre *<i>code</i>,</b>
|
||||
<b>const char *<i>name</i>);</b>
|
||||
</P>
|
||||
<P>
|
||||
<b> const char *<i>name</i>);</b>
|
||||
<br>
|
||||
<br>
|
||||
<b>int pcre16_get_stringnumber(const pcre16 *<i>code</i>,</b>
|
||||
<b>PCRE_SPTR16 <i>name</i>);</b>
|
||||
</P>
|
||||
<P>
|
||||
<b> PCRE_SPTR16 <i>name</i>);</b>
|
||||
<br>
|
||||
<br>
|
||||
<b>int pcre32_get_stringnumber(const pcre32 *<i>code</i>,</b>
|
||||
<b>PCRE_SPTR32 <i>name</i>);</b>
|
||||
<b> PCRE_SPTR32 <i>name</i>);</b>
|
||||
</P>
|
||||
<br><b>
|
||||
DESCRIPTION
|
||||
|
@ -20,15 +20,15 @@ SYNOPSIS
|
||||
</P>
|
||||
<P>
|
||||
<b>int pcre_get_stringtable_entries(const pcre *<i>code</i>,</b>
|
||||
<b>const char *<i>name</i>, char **<i>first</i>, char **<i>last</i>);</b>
|
||||
</P>
|
||||
<P>
|
||||
<b> const char *<i>name</i>, char **<i>first</i>, char **<i>last</i>);</b>
|
||||
<br>
|
||||
<br>
|
||||
<b>int pcre16_get_stringtable_entries(const pcre16 *<i>code</i>,</b>
|
||||
<b>PCRE_SPTR16 <i>name</i>, PCRE_UCHAR16 **<i>first</i>, PCRE_UCHAR16 **<i>last</i>);</b>
|
||||
</P>
|
||||
<P>
|
||||
<b> PCRE_SPTR16 <i>name</i>, PCRE_UCHAR16 **<i>first</i>, PCRE_UCHAR16 **<i>last</i>);</b>
|
||||
<br>
|
||||
<br>
|
||||
<b>int pcre32_get_stringtable_entries(const pcre32 *<i>code</i>,</b>
|
||||
<b>PCRE_SPTR32 <i>name</i>, PCRE_UCHAR32 **<i>first</i>, PCRE_UCHAR32 **<i>last</i>);</b>
|
||||
<b> PCRE_SPTR32 <i>name</i>, PCRE_UCHAR32 **<i>first</i>, PCRE_UCHAR32 **<i>last</i>);</b>
|
||||
</P>
|
||||
<br><b>
|
||||
DESCRIPTION
|
||||
|
@ -20,18 +20,18 @@ SYNOPSIS
|
||||
</P>
|
||||
<P>
|
||||
<b>int pcre_get_substring(const char *<i>subject</i>, int *<i>ovector</i>,</b>
|
||||
<b>int <i>stringcount</i>, int <i>stringnumber</i>,</b>
|
||||
<b>const char **<i>stringptr</i>);</b>
|
||||
</P>
|
||||
<P>
|
||||
<b> int <i>stringcount</i>, int <i>stringnumber</i>,</b>
|
||||
<b> const char **<i>stringptr</i>);</b>
|
||||
<br>
|
||||
<br>
|
||||
<b>int pcre16_get_substring(PCRE_SPTR16 <i>subject</i>, int *<i>ovector</i>,</b>
|
||||
<b>int <i>stringcount</i>, int <i>stringnumber</i>,</b>
|
||||
<b>PCRE_SPTR16 *<i>stringptr</i>);</b>
|
||||
</P>
|
||||
<P>
|
||||
<b> int <i>stringcount</i>, int <i>stringnumber</i>,</b>
|
||||
<b> PCRE_SPTR16 *<i>stringptr</i>);</b>
|
||||
<br>
|
||||
<br>
|
||||
<b>int pcre32_get_substring(PCRE_SPTR32 <i>subject</i>, int *<i>ovector</i>,</b>
|
||||
<b>int <i>stringcount</i>, int <i>stringnumber</i>,</b>
|
||||
<b>PCRE_SPTR32 *<i>stringptr</i>);</b>
|
||||
<b> int <i>stringcount</i>, int <i>stringnumber</i>,</b>
|
||||
<b> PCRE_SPTR32 *<i>stringptr</i>);</b>
|
||||
</P>
|
||||
<br><b>
|
||||
DESCRIPTION
|
||||
|
@ -20,15 +20,15 @@ SYNOPSIS
|
||||
</P>
|
||||
<P>
|
||||
<b>int pcre_get_substring_list(const char *<i>subject</i>,</b>
|
||||
<b>int *<i>ovector</i>, int <i>stringcount</i>, const char ***<i>listptr</i>);</b>
|
||||
</P>
|
||||
<P>
|
||||
<b> int *<i>ovector</i>, int <i>stringcount</i>, const char ***<i>listptr</i>);</b>
|
||||
<br>
|
||||
<br>
|
||||
<b>int pcre16_get_substring_list(PCRE_SPTR16 <i>subject</i>,</b>
|
||||
<b>int *<i>ovector</i>, int <i>stringcount</i>, PCRE_SPTR16 **<i>listptr</i>);</b>
|
||||
</P>
|
||||
<P>
|
||||
<b> int *<i>ovector</i>, int <i>stringcount</i>, PCRE_SPTR16 **<i>listptr</i>);</b>
|
||||
<br>
|
||||
<br>
|
||||
<b>int pcre32_get_substring_list(PCRE_SPTR32 <i>subject</i>,</b>
|
||||
<b>int *<i>ovector</i>, int <i>stringcount</i>, PCRE_SPTR32 **<i>listptr</i>);</b>
|
||||
<b> int *<i>ovector</i>, int <i>stringcount</i>, PCRE_SPTR32 **<i>listptr</i>);</b>
|
||||
</P>
|
||||
<br><b>
|
||||
DESCRIPTION
|
||||
|
@ -20,21 +20,21 @@ SYNOPSIS
|
||||
</P>
|
||||
<P>
|
||||
<b>int pcre_jit_exec(const pcre *<i>code</i>, const pcre_extra *<i>extra</i>,</b>
|
||||
<b>const char *<i>subject</i>, int <i>length</i>, int <i>startoffset</i>,</b>
|
||||
<b>int <i>options</i>, int *<i>ovector</i>, int <i>ovecsize</i>,</b>
|
||||
<b>pcre_jit_stack *<i>jstack</i>);</b>
|
||||
</P>
|
||||
<P>
|
||||
<b> const char *<i>subject</i>, int <i>length</i>, int <i>startoffset</i>,</b>
|
||||
<b> int <i>options</i>, int *<i>ovector</i>, int <i>ovecsize</i>,</b>
|
||||
<b> pcre_jit_stack *<i>jstack</i>);</b>
|
||||
<br>
|
||||
<br>
|
||||
<b>int pcre16_jit_exec(const pcre16 *<i>code</i>, const pcre16_extra *<i>extra</i>,</b>
|
||||
<b>PCRE_SPTR16 <i>subject</i>, int <i>length</i>, int <i>startoffset</i>,</b>
|
||||
<b>int <i>options</i>, int *<i>ovector</i>, int <i>ovecsize</i>,</b>
|
||||
<b>pcre_jit_stack *<i>jstack</i>);</b>
|
||||
</P>
|
||||
<P>
|
||||
<b> PCRE_SPTR16 <i>subject</i>, int <i>length</i>, int <i>startoffset</i>,</b>
|
||||
<b> int <i>options</i>, int *<i>ovector</i>, int <i>ovecsize</i>,</b>
|
||||
<b> pcre_jit_stack *<i>jstack</i>);</b>
|
||||
<br>
|
||||
<br>
|
||||
<b>int pcre32_jit_exec(const pcre32 *<i>code</i>, const pcre32_extra *<i>extra</i>,</b>
|
||||
<b>PCRE_SPTR32 <i>subject</i>, int <i>length</i>, int <i>startoffset</i>,</b>
|
||||
<b>int <i>options</i>, int *<i>ovector</i>, int <i>ovecsize</i>,</b>
|
||||
<b>pcre_jit_stack *<i>jstack</i>);</b>
|
||||
<b> PCRE_SPTR32 <i>subject</i>, int <i>length</i>, int <i>startoffset</i>,</b>
|
||||
<b> int <i>options</i>, int *<i>ovector</i>, int <i>ovecsize</i>,</b>
|
||||
<b> pcre_jit_stack *<i>jstack</i>);</b>
|
||||
</P>
|
||||
<br><b>
|
||||
DESCRIPTION
|
||||
|
@ -20,15 +20,15 @@ SYNOPSIS
|
||||
</P>
|
||||
<P>
|
||||
<b>pcre_jit_stack *pcre_jit_stack_alloc(int <i>startsize</i>,</b>
|
||||
<b>int <i>maxsize</i>);</b>
|
||||
</P>
|
||||
<P>
|
||||
<b> int <i>maxsize</i>);</b>
|
||||
<br>
|
||||
<br>
|
||||
<b>pcre16_jit_stack *pcre16_jit_stack_alloc(int <i>startsize</i>,</b>
|
||||
<b>int <i>maxsize</i>);</b>
|
||||
</P>
|
||||
<P>
|
||||
<b> int <i>maxsize</i>);</b>
|
||||
<br>
|
||||
<br>
|
||||
<b>pcre32_jit_stack *pcre32_jit_stack_alloc(int <i>startsize</i>,</b>
|
||||
<b>int <i>maxsize</i>);</b>
|
||||
<b> int <i>maxsize</i>);</b>
|
||||
</P>
|
||||
<br><b>
|
||||
DESCRIPTION
|
||||
|
@ -20,15 +20,15 @@ SYNOPSIS
|
||||
</P>
|
||||
<P>
|
||||
<b>int pcre_pattern_to_host_byte_order(pcre *<i>code</i>,</b>
|
||||
<b>pcre_extra *<i>extra</i>, const unsigned char *<i>tables</i>);</b>
|
||||
</P>
|
||||
<P>
|
||||
<b> pcre_extra *<i>extra</i>, const unsigned char *<i>tables</i>);</b>
|
||||
<br>
|
||||
<br>
|
||||
<b>int pcre16_pattern_to_host_byte_order(pcre16 *<i>code</i>,</b>
|
||||
<b>pcre16_extra *<i>extra</i>, const unsigned char *<i>tables</i>);</b>
|
||||
</P>
|
||||
<P>
|
||||
<b> pcre16_extra *<i>extra</i>, const unsigned char *<i>tables</i>);</b>
|
||||
<br>
|
||||
<br>
|
||||
<b>int pcre32_pattern_to_host_byte_order(pcre32 *<i>code</i>,</b>
|
||||
<b>pcre32_extra *<i>extra</i>, const unsigned char *<i>tables</i>);</b>
|
||||
<b> pcre32_extra *<i>extra</i>, const unsigned char *<i>tables</i>);</b>
|
||||
</P>
|
||||
<br><b>
|
||||
DESCRIPTION
|
||||
|
@ -20,15 +20,15 @@ SYNOPSIS
|
||||
</P>
|
||||
<P>
|
||||
<b>pcre_extra *pcre_study(const pcre *<i>code</i>, int <i>options</i>,</b>
|
||||
<b>const char **<i>errptr</i>);</b>
|
||||
</P>
|
||||
<P>
|
||||
<b> const char **<i>errptr</i>);</b>
|
||||
<br>
|
||||
<br>
|
||||
<b>pcre16_extra *pcre16_study(const pcre16 *<i>code</i>, int <i>options</i>,</b>
|
||||
<b>const char **<i>errptr</i>);</b>
|
||||
</P>
|
||||
<P>
|
||||
<b> const char **<i>errptr</i>);</b>
|
||||
<br>
|
||||
<br>
|
||||
<b>pcre32_extra *pcre32_study(const pcre32 *<i>code</i>, int <i>options</i>,</b>
|
||||
<b>const char **<i>errptr</i>);</b>
|
||||
<b> const char **<i>errptr</i>);</b>
|
||||
</P>
|
||||
<br><b>
|
||||
DESCRIPTION
|
||||
|
@ -20,8 +20,8 @@ SYNOPSIS
|
||||
</P>
|
||||
<P>
|
||||
<b>int pcre16_utf16_to_host_byte_order(PCRE_UCHAR16 *<i>output</i>,</b>
|
||||
<b>PCRE_SPTR16 <i>input</i>, int <i>length</i>, int *<i>host_byte_order</i>,</b>
|
||||
<b>int <i>keep_boms</i>);</b>
|
||||
<b> PCRE_SPTR16 <i>input</i>, int <i>length</i>, int *<i>host_byte_order</i>,</b>
|
||||
<b> int <i>keep_boms</i>);</b>
|
||||
</P>
|
||||
<br><b>
|
||||
DESCRIPTION
|
||||
|
57
tools/pcre/doc/html/pcre_utf32_to_host_byte_order.html
Normal file
@ -0,0 +1,57 @@
<html>
<head>
<title>pcre_utf32_to_host_byte_order specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcre_utf32_to_host_byte_order man page</h1>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
<p>
This page is part of the PCRE HTML documentation. It was generated automatically
from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
<br>
<br><b>
SYNOPSIS
</b><br>
<P>
<b>#include &#60;pcre.h&#62;</b>
</P>
<P>
<b>int pcre32_utf32_to_host_byte_order(PCRE_UCHAR32 *<i>output</i>,</b>
<b> PCRE_SPTR32 <i>input</i>, int <i>length</i>, int *<i>host_byte_order</i>,</b>
<b> int <i>keep_boms</i>);</b>
</P>
<br><b>
DESCRIPTION
</b><br>
<P>
This function, which exists only in the 32-bit library, converts a UTF-32
string to the correct order for the current host, taking account of any byte
order marks (BOMs) within the string. Its arguments are:
<pre>
  <i>output</i>           pointer to output buffer, may be the same as <i>input</i>
  <i>input</i>            pointer to input buffer
  <i>length</i>           number of 32-bit units in the input, or negative for
                   a zero-terminated string
  <i>host_byte_order</i>  a NULL value or a non-zero value pointed to means
                   start in host byte order
  <i>keep_boms</i>        if non-zero, BOMs are copied to the output string
</pre>
The result of the function is the number of 32-bit units placed into the output
buffer, including the zero terminator if the string was zero-terminated.
</P>
<P>
If <i>host_byte_order</i> is not NULL, it is set to indicate the byte order that
is current at the end of the string.
</P>
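The BOM handling described above can be illustrated without linking against PCRE. The following is a minimal sketch under stated assumptions, not the library's source: the names `swap32` and `utf32_to_host_order` are hypothetical, and the sketch assumes that a BOM is the unit 0x0000FEFF, which reads as 0xFFFE0000 when the data that follows is in the opposite byte order. The arguments mirror the ones listed in the man page.

```c
#include <stddef.h>
#include <stdint.h>

/* Reverse the four bytes of one 32-bit unit. */
uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* Hypothetical stand-in for the documented behaviour; NOT the PCRE source.
   A negative length means the input is zero-terminated. Returns the number
   of units written, including the terminator for zero-terminated input. */
int utf32_to_host_order(uint32_t *output, const uint32_t *input,
                        int length, int *host_byte_order, int keep_boms)
{
    /* NULL, or a non-zero value pointed to, means "start in host order". */
    int host = (host_byte_order == NULL) || (*host_byte_order != 0);
    int zero_terminated = length < 0;
    int out = 0;

    if (zero_terminated) {
        length = 0;
        while (input[length] != 0) length++;
    }

    for (int i = 0; i < length; i++) {
        uint32_t c = input[i];
        if (c == 0x0000FEFFu || c == 0xFFFE0000u) {
            /* A BOM switches the current byte order for what follows. */
            host = (c == 0x0000FEFFu);
            if (keep_boms) output[out++] = 0x0000FEFFu;
            continue;
        }
        output[out++] = host ? c : swap32(c);
    }

    if (zero_terminated) output[out++] = 0;
    if (host_byte_order != NULL) *host_byte_order = host;
    return out;
}
```

Because the write index never overtakes the read index, the sketch also works when `output` and `input` point to the same buffer, which is the in-place case the man page allows.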
<P>
There is a complete description of the PCRE native API in the
<a href="pcreapi.html"><b>pcreapi</b></a>
page and a description of the POSIX API in the
<a href="pcreposix.html"><b>pcreposix</b></a>
page.
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
File diff suppressed because it is too large
@ -13,46 +13,63 @@ from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
<br>
<ul>
<li><a name="TOC1" href="#SEC1">PCRE BUILD-TIME OPTIONS</a>
<li><a name="TOC2" href="#SEC2">BUILDING 8-BIT, 16-BIT AND 32-BIT LIBRARIES</a>
<li><a name="TOC3" href="#SEC3">BUILDING SHARED AND STATIC LIBRARIES</a>
<li><a name="TOC4" href="#SEC4">C++ SUPPORT</a>
<li><a name="TOC5" href="#SEC5">UTF-8, UTF-16 AND UTF-32 SUPPORT</a>
<li><a name="TOC6" href="#SEC6">UNICODE CHARACTER PROPERTY SUPPORT</a>
<li><a name="TOC7" href="#SEC7">JUST-IN-TIME COMPILER SUPPORT</a>
<li><a name="TOC8" href="#SEC8">CODE VALUE OF NEWLINE</a>
<li><a name="TOC9" href="#SEC9">WHAT \R MATCHES</a>
<li><a name="TOC10" href="#SEC10">POSIX MALLOC USAGE</a>
<li><a name="TOC11" href="#SEC11">HANDLING VERY LARGE PATTERNS</a>
<li><a name="TOC12" href="#SEC12">AVOIDING EXCESSIVE STACK USAGE</a>
<li><a name="TOC13" href="#SEC13">LIMITING PCRE RESOURCE USAGE</a>
<li><a name="TOC14" href="#SEC14">CREATING CHARACTER TABLES AT BUILD TIME</a>
<li><a name="TOC15" href="#SEC15">USING EBCDIC CODE</a>
<li><a name="TOC16" href="#SEC16">PCREGREP OPTIONS FOR COMPRESSED FILE SUPPORT</a>
<li><a name="TOC17" href="#SEC17">PCREGREP BUFFER SIZE</a>
<li><a name="TOC18" href="#SEC18">PCRETEST OPTION FOR LIBREADLINE SUPPORT</a>
<li><a name="TOC19" href="#SEC19">DEBUGGING WITH VALGRIND SUPPORT</a>
<li><a name="TOC20" href="#SEC20">CODE COVERAGE REPORTING</a>
<li><a name="TOC21" href="#SEC21">SEE ALSO</a>
<li><a name="TOC22" href="#SEC22">AUTHOR</a>
<li><a name="TOC23" href="#SEC23">REVISION</a>
<li><a name="TOC1" href="#SEC1">BUILDING PCRE</a>
<li><a name="TOC2" href="#SEC2">PCRE BUILD-TIME OPTIONS</a>
<li><a name="TOC3" href="#SEC3">BUILDING 8-BIT, 16-BIT AND 32-BIT LIBRARIES</a>
<li><a name="TOC4" href="#SEC4">BUILDING SHARED AND STATIC LIBRARIES</a>
<li><a name="TOC5" href="#SEC5">C++ SUPPORT</a>
<li><a name="TOC6" href="#SEC6">UTF-8, UTF-16 AND UTF-32 SUPPORT</a>
<li><a name="TOC7" href="#SEC7">UNICODE CHARACTER PROPERTY SUPPORT</a>
<li><a name="TOC8" href="#SEC8">JUST-IN-TIME COMPILER SUPPORT</a>
<li><a name="TOC9" href="#SEC9">CODE VALUE OF NEWLINE</a>
<li><a name="TOC10" href="#SEC10">WHAT \R MATCHES</a>
<li><a name="TOC11" href="#SEC11">POSIX MALLOC USAGE</a>
<li><a name="TOC12" href="#SEC12">HANDLING VERY LARGE PATTERNS</a>
<li><a name="TOC13" href="#SEC13">AVOIDING EXCESSIVE STACK USAGE</a>
<li><a name="TOC14" href="#SEC14">LIMITING PCRE RESOURCE USAGE</a>
<li><a name="TOC15" href="#SEC15">CREATING CHARACTER TABLES AT BUILD TIME</a>
<li><a name="TOC16" href="#SEC16">USING EBCDIC CODE</a>
<li><a name="TOC17" href="#SEC17">PCREGREP OPTIONS FOR COMPRESSED FILE SUPPORT</a>
<li><a name="TOC18" href="#SEC18">PCREGREP BUFFER SIZE</a>
<li><a name="TOC19" href="#SEC19">PCRETEST OPTION FOR LIBREADLINE SUPPORT</a>
<li><a name="TOC20" href="#SEC20">DEBUGGING WITH VALGRIND SUPPORT</a>
<li><a name="TOC21" href="#SEC21">CODE COVERAGE REPORTING</a>
<li><a name="TOC22" href="#SEC22">SEE ALSO</a>
<li><a name="TOC23" href="#SEC23">AUTHOR</a>
<li><a name="TOC24" href="#SEC24">REVISION</a>
</ul>
<br><a name="SEC1" href="#TOC1">PCRE BUILD-TIME OPTIONS</a><br>
<br><a name="SEC1" href="#TOC1">BUILDING PCRE</a><br>
<P>
This document describes the optional features of PCRE that can be selected when
the library is compiled. It assumes use of the <b>configure</b> script, where
the optional features are selected or deselected by providing options to
<b>configure</b> before running the <b>make</b> command. However, the same
options can be selected in both Unix-like and non-Unix-like environments using
the GUI facility of <b>cmake-gui</b> if you are using <b>CMake</b> instead of
<b>configure</b> to build PCRE.
PCRE is distributed with a <b>configure</b> script that can be used to build the
library in Unix-like environments using the applications known as Autotools.
Also in the distribution are files to support building using <b>CMake</b>
instead of <b>configure</b>. The text file
<a href="README.txt"><b>README</b></a>
contains general information about building with Autotools (some of which is
repeated below), and also has some comments about building on various operating
systems. There is a lot more information about building PCRE without using
Autotools (including information about using <b>CMake</b> and building "by
hand") in the text file called
<a href="NON-AUTOTOOLS-BUILD.txt"><b>NON-AUTOTOOLS-BUILD</b>.</a>
You should consult this file as well as the
<a href="README.txt"><b>README</b></a>
file if you are building in a non-Unix-like environment.
</P>
<br><a name="SEC2" href="#TOC1">PCRE BUILD-TIME OPTIONS</a><br>
<P>
The rest of this document describes the optional features of PCRE that can be
selected when the library is compiled. It assumes use of the <b>configure</b>
script, where the optional features are selected or deselected by providing
options to <b>configure</b> before running the <b>make</b> command. However, the
same options can be selected in both Unix-like and non-Unix-like environments
using the GUI facility of <b>cmake-gui</b> if you are using <b>CMake</b> instead
of <b>configure</b> to build PCRE.
</P>
<P>
There is a lot more information about building PCRE without using
<b>configure</b> (including information about using <b>CMake</b> or building "by
hand") in the file called <i>NON-AUTOTOOLS-BUILD</i>, which is part of the PCRE
distribution. You should consult this file as well as the <i>README</i> file if
you are building in a non-Unix-like environment.
If you are not using Autotools or <b>CMake</b>, option selection can be done by
editing the <b>config.h</b> file, or by passing parameter settings to the
compiler, as described in
<a href="NON-AUTOTOOLS-BUILD.txt"><b>NON-AUTOTOOLS-BUILD</b>.</a>
</P>
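As an illustration of selecting options by editing <b>config.h</b> or by passing settings to the compiler, here is a minimal sketch in the style of a hand-edited configuration fragment. The macro names LINK_SIZE, NEWLINE, and MATCH_LIMIT and the numeric NEWLINE codes are given as recalled from the config.h.generic file in the PCRE 8.x distribution and should be checked against the copy you are building; the two helper functions are hypothetical and exist only to make the effect visible.

```c
#include <stddef.h>

/* Each setting has a default that a compiler flag such as -DLINK_SIZE=3
   can override, because the #ifndef guard only applies when the macro
   has not been defined already. */
#ifndef LINK_SIZE
#define LINK_SIZE 2          /* internal link size: 2, 3, or 4 */
#endif

#ifndef NEWLINE
#define NEWLINE 10           /* 10 = LF (assumed code values, see lead-in) */
#endif

#ifndef MATCH_LIMIT
#define MATCH_LIMIT 10000000 /* default internal resource limit */
#endif

/* Hypothetical helper: report the link size chosen at build time. */
int configured_link_size(void)
{
    return LINK_SIZE;
}

/* Hypothetical helper: name the newline convention chosen at build time. */
const char *configured_newline(void)
{
    switch (NEWLINE) {
    case 10:   return "LF";
    case 13:   return "CR";
    case 3338: return "CRLF";
    case -1:   return "ANY";
    case -2:   return "ANYCRLF";
    default:   return "unknown";
    }
}
```

Compiling the same file with a setting given on the command line changes the result without touching the source, which is the second route the paragraph above mentions.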
|
||||
<P>
|
||||
The complete list of options for <b>configure</b> (which includes the standard
|
||||
@ -67,7 +84,7 @@ The following sections include descriptions of options whose names begin with
|
||||
--enable and --disable always come in pairs, so the complementary option always
|
||||
exists as well, but as it specifies the default, it is not described.
|
||||
</P>
|
||||
<br><a name="SEC2" href="#TOC1">BUILDING 8-BIT, 16-BIT AND 32-BIT LIBRARIES</a><br>
|
||||
<br><a name="SEC3" href="#TOC1">BUILDING 8-BIT, 16-BIT AND 32-BIT LIBRARIES</a><br>
|
||||
<P>
|
||||
By default, a library called <b>libpcre</b> is built, containing functions that
|
||||
take string arguments contained in vectors of bytes, either as single-byte
|
||||
@ -78,7 +95,7 @@ strings, by adding
|
||||
<pre>
|
||||
--enable-pcre16
|
||||
</pre>
|
||||
to the <b>configure</b> command. You can also build a separate
|
||||
to the <b>configure</b> command. You can also build yet another separate
|
||||
library, called <b>libpcre32</b>, in which strings are contained in vectors of
|
||||
32-bit data units and interpreted either as single-unit characters or UTF-32
|
||||
strings, by adding
|
||||
@ -94,17 +111,17 @@ and POSIX wrappers are for the 8-bit library only, and that <b>pcregrep</b> is
|
||||
an 8-bit program. None of these are built if you select only the 16-bit or
|
||||
32-bit libraries.
|
||||
</P>
|
||||
<br><a name="SEC3" href="#TOC1">BUILDING SHARED AND STATIC LIBRARIES</a><br>
|
||||
<br><a name="SEC4" href="#TOC1">BUILDING SHARED AND STATIC LIBRARIES</a><br>
|
||||
<P>
|
||||
The PCRE building process uses <b>libtool</b> to build both shared and static
|
||||
Unix libraries by default. You can suppress one of these by adding one of
|
||||
The Autotools PCRE building process uses <b>libtool</b> to build both shared and
|
||||
static libraries by default. You can suppress one of these by adding one of
|
||||
<pre>
|
||||
--disable-shared
|
||||
--disable-static
|
||||
</pre>
|
||||
to the <b>configure</b> command, as required.
|
||||
</P>
|
||||
<br><a name="SEC4" href="#TOC1">C++ SUPPORT</a><br>
|
||||
<br><a name="SEC5" href="#TOC1">C++ SUPPORT</a><br>
|
||||
<P>
|
||||
By default, if the 8-bit library is being built, the <b>configure</b> script
|
||||
will search for a C++ compiler and C++ header files. If it finds them, it
|
||||
@ -115,7 +132,7 @@ strings). You can disable this by adding
|
||||
</pre>
|
||||
to the <b>configure</b> command.
|
||||
</P>
|
||||
<br><a name="SEC5" href="#TOC1">UTF-8, UTF-16 AND UTF-32 SUPPORT</a><br>
|
||||
<br><a name="SEC6" href="#TOC1">UTF-8, UTF-16 AND UTF-32 SUPPORT</a><br>
|
||||
<P>
|
||||
To build PCRE with support for UTF Unicode character strings, add
|
||||
<pre>
|
||||
@ -143,7 +160,7 @@ not possible to support both EBCDIC and UTF-8 codes in the same version of the
|
||||
library. Consequently, --enable-utf and --enable-ebcdic are mutually
|
||||
exclusive.
|
||||
</P>
|
||||
<br><a name="SEC6" href="#TOC1">UNICODE CHARACTER PROPERTY SUPPORT</a><br>
|
||||
<br><a name="SEC7" href="#TOC1">UNICODE CHARACTER PROPERTY SUPPORT</a><br>
|
||||
<P>
|
||||
UTF support allows the libraries to process character codepoints up to 0x10ffff
|
||||
in the strings that they handle. On its own, however, it does not provide any
|
||||
@ -163,7 +180,7 @@ supported. Details are given in the
|
||||
<a href="pcrepattern.html"><b>pcrepattern</b></a>
|
||||
documentation.
|
||||
</P>
|
||||
<br><a name="SEC7" href="#TOC1">JUST-IN-TIME COMPILER SUPPORT</a><br>
|
||||
<br><a name="SEC8" href="#TOC1">JUST-IN-TIME COMPILER SUPPORT</a><br>
|
||||
<P>
|
||||
Just-in-time compiler support is included in the build by specifying
|
||||
<pre>
|
||||
@ -180,7 +197,7 @@ pcregrep automatically makes use of it, unless you add
|
||||
</pre>
|
||||
to the "configure" command.
|
||||
</P>
|
||||
<br><a name="SEC8" href="#TOC1">CODE VALUE OF NEWLINE</a><br>
|
||||
<br><a name="SEC9" href="#TOC1">CODE VALUE OF NEWLINE</a><br>
|
||||
<P>
|
||||
By default, PCRE interprets the linefeed (LF) character as indicating the end
|
||||
of a line. This is the normal newline character on Unix-like systems. You can
|
||||
@ -213,7 +230,7 @@ Whatever line ending convention is selected when PCRE is built can be
|
||||
overridden when the library functions are called. At build time it is
|
||||
conventional to use the standard for your operating system.
|
||||
</P>
|
||||
<br><a name="SEC9" href="#TOC1">WHAT \R MATCHES</a><br>
|
||||
<br><a name="SEC10" href="#TOC1">WHAT \R MATCHES</a><br>
|
||||
<P>
|
||||
By default, the sequence \R in a pattern matches any Unicode newline sequence,
|
||||
whatever has been selected as the line ending sequence. If you specify
|
||||
@ -224,7 +241,7 @@ the default is changed so that \R matches only CR, LF, or CRLF. Whatever is
|
||||
selected when PCRE is built can be overridden when the library functions are
|
||||
called.
|
||||
</P>
|
||||
<br><a name="SEC10" href="#TOC1">POSIX MALLOC USAGE</a><br>
|
||||
<br><a name="SEC11" href="#TOC1">POSIX MALLOC USAGE</a><br>
|
||||
<P>
|
||||
When the 8-bit library is called through the POSIX interface (see the
|
||||
<a href="pcreposix.html"><b>pcreposix</b></a>
|
||||
@ -240,7 +257,7 @@ such as
|
||||
</pre>
|
||||
to the <b>configure</b> command.
|
||||
</P>
|
||||
<br><a name="SEC11" href="#TOC1">HANDLING VERY LARGE PATTERNS</a><br>
|
||||
<br><a name="SEC12" href="#TOC1">HANDLING VERY LARGE PATTERNS</a><br>
|
||||
<P>
|
||||
Within a compiled pattern, offset values are used to point from one part to
|
||||
another (for example, from an opening parenthesis to an alternation
|
||||
@ -259,7 +276,7 @@ longer offsets slows down the operation of PCRE because it has to load
|
||||
additional data when handling them. For the 32-bit library the value is always
|
||||
4 and cannot be overridden; the value of --with-link-size is ignored.
|
||||
</P>
|
||||
<br><a name="SEC12" href="#TOC1">AVOIDING EXCESSIVE STACK USAGE</a><br>
|
||||
<br><a name="SEC13" href="#TOC1">AVOIDING EXCESSIVE STACK USAGE</a><br>
|
||||
<P>
|
||||
When matching with the <b>pcre_exec()</b> function, PCRE implements backtracking
|
||||
by making recursive calls to an internal function called <b>match()</b>. In
|
||||
@ -290,7 +307,7 @@ perform better than <b>malloc()</b> and <b>free()</b>. PCRE runs noticeably more
|
||||
slowly when built in this way. This option affects only the <b>pcre_exec()</b>
|
||||
function; it is not relevant for <b>pcre_dfa_exec()</b>.
|
||||
</P>
|
||||
<br><a name="SEC13" href="#TOC1">LIMITING PCRE RESOURCE USAGE</a><br>
|
||||
<br><a name="SEC14" href="#TOC1">LIMITING PCRE RESOURCE USAGE</a><br>
|
||||
<P>
|
||||
Internally, PCRE has a function called <b>match()</b>, which it calls repeatedly
|
||||
(sometimes recursively) when matching a pattern with the <b>pcre_exec()</b>
|
||||
@ -319,7 +336,7 @@ constraints. However, you can set a lower limit by adding, for example,
|
||||
</pre>
|
||||
to the <b>configure</b> command. This value can also be overridden at run time.
|
||||
</P>
|
||||
<br><a name="SEC14" href="#TOC1">CREATING CHARACTER TABLES AT BUILD TIME</a><br>
|
||||
<br><a name="SEC15" href="#TOC1">CREATING CHARACTER TABLES AT BUILD TIME</a><br>
|
||||
<P>
|
||||
PCRE uses fixed tables for processing characters whose code values are less
|
||||
than 256. By default, PCRE is built with a set of tables that are distributed
|
||||
@ -336,7 +353,7 @@ compiling, because <b>dftables</b> is run on the local host. If you need to
|
||||
create alternative tables when cross compiling, you will have to do so "by
|
||||
hand".)
|
||||
</P>
|
||||
<br><a name="SEC15" href="#TOC1">USING EBCDIC CODE</a><br>
|
||||
<br><a name="SEC16" href="#TOC1">USING EBCDIC CODE</a><br>
|
||||
<P>
|
||||
PCRE assumes by default that it will run in an environment where the character
|
||||
code is ASCII (or Unicode, which is a superset of ASCII). This is the case for
|
||||
@ -367,7 +384,7 @@ The options that select newline behaviour, such as --enable-newline-is-cr,
|
||||
and equivalent run-time options, refer to these character values in an EBCDIC
|
||||
environment.
|
||||
</P>
|
||||
<br><a name="SEC16" href="#TOC1">PCREGREP OPTIONS FOR COMPRESSED FILE SUPPORT</a><br>
|
||||
<br><a name="SEC17" href="#TOC1">PCREGREP OPTIONS FOR COMPRESSED FILE SUPPORT</a><br>
|
||||
<P>
|
||||
By default, <b>pcregrep</b> reads all files as plain text. You can build it so
|
||||
that it recognizes files whose names end in <b>.gz</b> or <b>.bz2</b>, and reads
|
||||
@ -380,7 +397,7 @@ to the <b>configure</b> command. These options naturally require that the
|
||||
relevant libraries are installed on your system. Configuration will fail if
|
||||
they are not.
|
||||
</P>
|
||||
<br><a name="SEC17" href="#TOC1">PCREGREP BUFFER SIZE</a><br>
|
||||
<br><a name="SEC18" href="#TOC1">PCREGREP BUFFER SIZE</a><br>
|
||||
<P>
|
||||
<b>pcregrep</b> uses an internal buffer to hold a "window" on the file it is
|
||||
scanning, in order to be able to output "before" and "after" lines when it
|
||||
@ -395,7 +412,7 @@ parameter value by adding, for example,
|
||||
to the <b>configure</b> command. The caller of <b>pcregrep</b> can, however,
|
||||
override this value by specifying a run-time option.
|
||||
</P>
|
||||
<br><a name="SEC18" href="#TOC1">PCRETEST OPTION FOR LIBREADLINE SUPPORT</a><br>
|
||||
<br><a name="SEC19" href="#TOC1">PCRETEST OPTION FOR LIBREADLINE SUPPORT</a><br>
|
||||
<P>
|
||||
If you add
|
||||
<pre>
|
||||
@ -426,7 +443,7 @@ automatically included, you may need to add something like
|
||||
</pre>
|
||||
immediately before the <b>configure</b> command.
|
||||
</P>
|
||||
<br><a name="SEC19" href="#TOC1">DEBUGGING WITH VALGRIND SUPPORT</a><br>
|
||||
<br><a name="SEC20" href="#TOC1">DEBUGGING WITH VALGRIND SUPPORT</a><br>
|
||||
<P>
|
||||
By adding the
|
||||
<pre>
|
||||
@ -436,7 +453,7 @@ option to the <b>configure</b> command, PCRE will use valgrind annotations
|
||||
to mark certain memory regions as unaddressable. This allows it to detect
|
||||
invalid memory accesses, and is mostly useful for debugging PCRE itself.
|
||||
</P>
|
||||
<br><a name="SEC20" href="#TOC1">CODE COVERAGE REPORTING</a><br>
|
||||
<br><a name="SEC21" href="#TOC1">CODE COVERAGE REPORTING</a><br>
|
||||
<P>
|
||||
If your C compiler is gcc, you can build a version of PCRE that can generate a
|
||||
code coverage report for its test suite. To enable this, you must install
|
||||
@ -493,11 +510,11 @@ This cleans all coverage data including the generated coverage report. For more
|
||||
information about code coverage, see the <b>gcov</b> and <b>lcov</b>
|
||||
documentation.
|
||||
</P>
|
||||
<br><a name="SEC21" href="#TOC1">SEE ALSO</a><br>
|
||||
<br><a name="SEC22" href="#TOC1">SEE ALSO</a><br>
|
||||
<P>
|
||||
<b>pcreapi</b>(3), <b>pcre16</b>, <b>pcre32</b>, <b>pcre_config</b>(3).
|
||||
</P>
|
||||
<br><a name="SEC22" href="#TOC1">AUTHOR</a><br>
|
||||
<br><a name="SEC23" href="#TOC1">AUTHOR</a><br>
|
||||
<P>
|
||||
Philip Hazel
|
||||
<br>
|
||||
@ -506,11 +523,11 @@ University Computing Service
|
||||
Cambridge CB2 3QH, England.
|
||||
<br>
|
||||
</P>
|
||||
<br><a name="SEC23" href="#TOC1">REVISION</a><br>
|
||||
<br><a name="SEC24" href="#TOC1">REVISION</a><br>
|
||||
<P>
|
||||
Last updated: 30 October 2012
|
||||
Last updated: 12 May 2013
|
||||
<br>
|
||||
Copyright © 1997-2012 University of Cambridge.
|
||||
Copyright © 1997-2013 University of Cambridge.
|
||||
<br>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
|
@ -64,23 +64,63 @@ it is processed as if it were
|
||||
<br>
|
||||
<br>
|
||||
Notice that there is a callout before and after each parenthesis and
|
||||
alternation bar. Automatic callouts can be used for tracking the progress of
|
||||
pattern matching. The
|
||||
<a href="pcretest.html"><b>pcretest</b></a>
|
||||
command has an option that sets automatic callouts; when it is used, the output
|
||||
indicates how the pattern is matched. This is useful information when you are
|
||||
trying to optimize the performance of a particular pattern.
|
||||
alternation bar. If the pattern contains a conditional group whose condition is
|
||||
an assertion, an automatic callout is inserted immediately before the
|
||||
condition. Such a callout may also be inserted explicitly, for example:
|
||||
<pre>
|
||||
(?(?C9)(?=a)ab|de)
|
||||
</pre>
|
||||
This applies only to assertion conditions (because they are themselves
|
||||
independent groups).
|
||||
</P>
|
||||
<P>
|
||||
The use of callouts in a pattern makes it ineligible for optimization by the
|
||||
just-in-time compiler. Studying such a pattern with the PCRE_STUDY_JIT_COMPILE
|
||||
option always fails.
|
||||
Automatic callouts can be used for tracking the progress of pattern matching.
|
||||
The
|
||||
<a href="pcretest.html"><b>pcretest</b></a>
|
||||
program has a pattern qualifier (/C) that sets automatic callouts; when it is
|
||||
used, the output indicates how the pattern is being matched. This is useful
|
||||
information when you are trying to optimize the performance of a particular
|
||||
pattern.
|
||||
</P>
|
||||
<br><a name="SEC3" href="#TOC1">MISSING CALLOUTS</a><br>
|
||||
<P>
|
||||
You should be aware that, because of optimizations in the way PCRE matches
|
||||
patterns by default, callouts sometimes do not happen. For example, if the
|
||||
pattern is
|
||||
You should be aware that, because of optimizations in the way PCRE compiles and
|
||||
matches patterns, callouts sometimes do not happen exactly as you might expect.
|
||||
</P>
|
||||
<P>
|
||||
At compile time, PCRE "auto-possessifies" repeated items when it knows that
|
||||
what follows cannot be part of the repeat. For example, a+[bc] is compiled as
|
||||
if it were a++[bc]. The <b>pcretest</b> output when this pattern is anchored and
|
||||
then applied with automatic callouts to the string "aaaa" is:
|
||||
<pre>
|
||||
--->aaaa
|
||||
+0 ^ ^
|
||||
+1 ^ a+
|
||||
+3 ^ ^ [bc]
|
||||
No match
|
||||
</pre>
|
||||
This indicates that when matching [bc] fails, there is no backtracking into a+
|
||||
and therefore the callouts that would be taken for the backtracks do not occur.
|
||||
You can disable the auto-possessify feature by passing PCRE_NO_AUTO_POSSESS
|
||||
to <b>pcre_compile()</b>, or starting the pattern with (*NO_AUTO_POSSESS). If
|
||||
this is done in <b>pcretest</b> (using the /O qualifier), the output changes to
|
||||
this:
|
||||
<pre>
|
||||
--->aaaa
|
||||
+0 ^ ^
|
||||
+1 ^ a+
|
||||
+3 ^ ^ [bc]
|
||||
+3 ^ ^ [bc]
|
||||
+3 ^ ^ [bc]
|
||||
+3 ^^ [bc]
|
||||
No match
|
||||
</pre>
|
||||
This time, when matching [bc] fails, the matcher backtracks into a+ and tries
|
||||
again, repeatedly, until a+ itself fails.
|
||||
</P>
|
||||
<P>
|
||||
Other optimizations that provide fast "no match" results also affect callouts.
|
||||
For example, if the pattern is
|
||||
<pre>
|
||||
ab(?C4)cd
|
||||
</pre>
|
||||
@ -104,11 +144,11 @@ callouts such as the example above are obeyed.
|
||||
<br><a name="SEC4" href="#TOC1">THE CALLOUT INTERFACE</a><br>
|
||||
<P>
|
||||
During matching, when PCRE reaches a callout point, the external function
|
||||
defined by <i>pcre_callout</i> or <i>pcre[16|32]_callout</i> is called
|
||||
(if it is set). This applies to both normal and DFA matching. The only
|
||||
argument to the callout function is a pointer to a <b>pcre_callout</b>
|
||||
or <b>pcre[16|32]_callout</b> block.
|
||||
These structures contain the following fields:
|
||||
defined by <i>pcre_callout</i> or <i>pcre[16|32]_callout</i> is called (if it is
|
||||
set). This applies to both normal and DFA matching. The only argument to the
|
||||
callout function is a pointer to a <b>pcre_callout</b> or
|
||||
<b>pcre[16|32]_callout</b> block. These structures contain the following
|
||||
fields:
|
||||
<pre>
|
||||
int <i>version</i>;
|
||||
int <i>callout_number</i>;
|
||||
@ -141,10 +181,10 @@ automatically generated callouts).
|
||||
<P>
|
||||
The <i>offset_vector</i> field is a pointer to the vector of offsets that was
|
||||
passed by the caller to the matching function. When <b>pcre_exec()</b> or
|
||||
<b>pcre[16|32]_exec()</b> is used, the contents can be inspected, in order to extract
|
||||
substrings that have been matched so far, in the same way as for extracting
|
||||
substrings after a match has completed. For the DFA matching functions, this
|
||||
field is not useful.
|
||||
<b>pcre[16|32]_exec()</b> is used, the contents can be inspected, in order to
|
||||
extract substrings that have been matched so far, in the same way as for
|
||||
extracting substrings after a match has completed. For the DFA matching
|
||||
functions, this field is not useful.
|
||||
</P>
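As a rough illustration of inspecting the offsets vector, the sketch below copies one captured substring out of the subject. It assumes the usual PCRE layout in which slots 2n and 2n+1 hold the start and end offsets of group n and an unset group is marked by -1. The helper name <i>copy_group</i> and its buffer handling are hypothetical and not part of the PCRE API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a PCRE function): copy capture group n out of
   subject, using an offsets vector in which ovector[2*n] and ovector[2*n+1]
   are the start and end offsets of the group, and -1 means "unset".
   Returns the number of characters copied, or -1 if the group is unset or
   the buffer is too small. Offsets are treated as byte offsets. */
int copy_group(const char *subject, const int *ovector, int n,
               char *buf, size_t bufsize)
{
    assert(subject != NULL && ovector != NULL && buf != NULL && n >= 0);

    int start = ovector[2 * n];
    int end = ovector[2 * n + 1];
    if (start < 0 || end < start)
        return -1;                      /* group not set */

    size_t len = (size_t)(end - start);
    if (len + 1 > bufsize)
        return -1;                      /* no room for text plus terminator */

    memcpy(buf, subject + start, len);
    buf[len] = '\0';
    return (int)len;
}
```

Inside a callout function, a helper of this kind would be given the <i>subject</i> and <i>offset_vector</i> fields of the callout block. Only groups that have already been captured at that point in the match will have meaningful entries.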
|
||||
<P>
|
||||
The <i>subject</i> and <i>subject_length</i> fields contain copies of the values
|
||||
@ -171,8 +211,10 @@ functions are used, because they do not support captured substrings.
|
||||
</P>
|
||||
<P>
|
||||
The <i>capture_last</i> field contains the number of the most recently captured
|
||||
substring. If no substrings have been captured, its value is -1. This is always
|
||||
the case for the DFA matching functions.
|
||||
substring. However, when a recursion exits, the value reverts to what it was
|
||||
outside the recursion, as do the values of all captured substrings. If no
|
||||
substrings have been captured, the value of <i>capture_last</i> is -1. This is
|
||||
always the case for the DFA matching functions.
|
||||
</P>
|
||||
<P>
|
||||
The <i>callout_data</i> field contains a value that is passed to a matching
|
||||
@ -203,11 +245,12 @@ same callout number. However, they are set for all callouts.
|
||||
</P>
|
||||
<P>
|
||||
The <i>mark</i> field is present from version 2 of the callout structure. In
|
||||
callouts from <b>pcre_exec()</b> or <b>pcre[16|32]_exec()</b> it contains a pointer to
|
||||
the zero-terminated name of the most recently passed (*MARK), (*PRUNE), or
|
||||
(*THEN) item in the match, or NULL if no such items have been passed. Instances
|
||||
of (*PRUNE) or (*THEN) without a name do not obliterate a previous (*MARK). In
|
||||
callouts from the DFA matching functions this field always contains NULL.
|
||||
callouts from <b>pcre_exec()</b> or <b>pcre[16|32]_exec()</b> it contains a
|
||||
pointer to the zero-terminated name of the most recently passed (*MARK),
|
||||
(*PRUNE), or (*THEN) item in the match, or NULL if no such items have been
|
||||
passed. Instances of (*PRUNE) or (*THEN) without a name do not obliterate a
|
||||
previous (*MARK). In callouts from the DFA matching functions this field always
|
||||
contains NULL.
|
||||
</P>
|
||||
<br><a name="SEC5" href="#TOC1">RETURN VALUES</a><br>
|
||||
<P>
|
||||
@ -234,9 +277,9 @@ Cambridge CB2 3QH, England.
|
||||
</P>
|
||||
<br><a name="SEC7" href="#TOC1">REVISION</a><br>
|
||||
<P>
|
||||
Last updated: 24 June 2012
|
||||
Last updated: 12 November 2013
|
||||
<br>
|
||||
Copyright © 1997-2012 University of Cambridge.
|
||||
Copyright © 1997-2013 University of Cambridge.
|
||||
<br>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
|
@ -36,10 +36,8 @@ these do not seem to have any use.
|
||||
</P>
|
||||
<P>
|
||||
3. Capturing subpatterns that occur inside negative lookahead assertions are
|
||||
counted, but their entries in the offsets vector are never set. Perl sets its
|
||||
numerical variables from any such patterns that are matched before the
|
||||
assertion fails to match something (thereby succeeding), but only if the
|
||||
negative lookahead assertion contains just one branch.
|
||||
counted, but their entries in the offsets vector are never set. Perl sometimes
|
||||
(but not always) sets its numerical variables from inside negative assertions.
|
||||
</P>
|
||||
<P>
|
||||
4. Though binary zero characters are supported in the subject string, they are
|
||||
@ -102,24 +100,32 @@ in the
|
||||
page.
|
||||
</P>
|
||||
<P>
|
||||
10. If any of the backtracking control verbs are used in an assertion or in a
|
||||
subpattern that is called as a subroutine (whether or not recursively), their
|
||||
effect is confined to that subpattern; it does not extend to the surrounding
|
||||
pattern. This is not always the case in Perl. In particular, if (*THEN) is
|
||||
present in a group that is called as a subroutine, its action is limited to
|
||||
that group, even if the group does not contain any | characters. There is one
|
||||
exception to this: the name from a (*MARK), (*PRUNE), or (*THEN) that is
|
||||
encountered in a successful positive assertion <i>is</i> passed back when a
|
||||
match succeeds (compare capturing parentheses in assertions). Note that such
|
||||
subpatterns are processed as anchored at the point where they are tested.
|
||||
10. If any of the backtracking control verbs are used in a subpattern that is
|
||||
called as a subroutine (whether or not recursively), their effect is confined
|
||||
to that subpattern; it does not extend to the surrounding pattern. This is not
|
||||
always the case in Perl. In particular, if (*THEN) is present in a group that
|
||||
is called as a subroutine, its action is limited to that group, even if the
|
||||
group does not contain any | characters. Note that such subpatterns are
|
||||
processed as anchored at the point where they are tested.
|
||||
</P>
|
||||
<P>
|
||||
11. There are some differences that are concerned with the settings of captured
|
||||
11. If a pattern contains more than one backtracking control verb, the first
|
||||
one that is backtracked onto acts. For example, in the pattern
|
||||
A(*COMMIT)B(*PRUNE)C a failure in B triggers (*COMMIT), but a failure in C
|
||||
triggers (*PRUNE). Perl's behaviour is more complex; in many cases it is the
|
||||
same as PCRE, but there are examples where it differs.
|
||||
</P>
|
||||
<P>
|
||||
12. Most backtracking verbs in assertions have their normal actions. They are
|
||||
not confined to the assertion.
|
||||
</P>
|
||||
<P>
|
||||
13. There are some differences that are concerned with the settings of captured
|
||||
strings when part of a pattern is repeated. For example, matching "aba" against
|
||||
the pattern /^(a(b)?)+$/ in Perl leaves $2 unset, but in PCRE it is set to "b".
|
||||
</P>
|
||||
<P>
|
||||
12. PCRE's handling of duplicate subpattern numbers and duplicate subpattern
|
||||
14. PCRE's handling of duplicate subpattern numbers and duplicate subpattern
|
||||
names is not as general as Perl's. This is a consequence of the fact that PCRE
|
||||
works internally just with numbers, using an external table to translate
|
||||
between numbers and names. In particular, a pattern such as (?|(?<a>A)|(?<b>B),
|
||||
@ -130,13 +136,26 @@ names map to capturing subpattern number 1. To avoid this confusing situation,
|
||||
an error is given at compile time.
|
||||
</P>
|
||||
<P>
|
||||
13. Perl recognizes comments in some places that PCRE does not, for example,
|
||||
15. Perl recognizes comments in some places that PCRE does not, for example,
|
||||
between the ( and ? at the start of a subpattern. If the /x modifier is set,
|
||||
Perl allows white space between ( and ? but PCRE never does, even if the
|
||||
PCRE_EXTENDED option is set.
|
||||
Perl allows white space between ( and ? (though current Perls warn that this is
|
||||
deprecated) but PCRE never does, even if the PCRE_EXTENDED option is set.
|
||||
</P>
|
||||
<P>
|
||||
14. PCRE provides some extensions to the Perl regular expression facilities.
|
||||
16. Perl, when in warning mode, gives warnings for character classes such as
|
||||
[A-\d] or [a-[:digit:]]. It then treats the hyphens as literals. PCRE has no
|
||||
warning features, so it gives an error in these cases because they are almost
|
||||
certainly user mistakes.
|
||||
</P>
|
||||
<P>
|
||||
17. In PCRE, the upper/lower case character properties Lu and Ll are not
|
||||
affected when case-independent matching is specified. For example, \p{Lu}
|
||||
always matches an upper case letter. I think Perl has changed in this respect;
|
||||
in the release at the time of writing (5.16), \p{Lu} and \p{Ll} match all
|
||||
letters, regardless of case, when case independence is specified.
|
||||
</P>
|
||||
<P>
|
||||
18. PCRE provides some extensions to the Perl regular expression facilities.
|
||||
Perl 5.10 includes new features that are not in earlier versions of Perl, some
|
||||
of which (such as named parentheses) have been in PCRE for some time. This list
|
||||
is with respect to Perl 5.10:
|
||||
@ -207,9 +226,9 @@ Cambridge CB2 3QH, England.
|
||||
REVISION
|
||||
</b><br>
|
||||
<P>
|
||||
Last updated: 25 August 2012
|
||||
Last updated: 10 November 2013
|
||||
<br>
|
||||
Copyright © 1997-2012 University of Cambridge.
|
||||
Copyright © 1997-2013 University of Cambridge.
|
||||
<br>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
|
@ -37,8 +37,10 @@ man page, in case the conversion went wrong.
|
||||
<b>pcregrep</b> searches files for character patterns, in the same way as other
|
||||
grep commands do, but it uses the PCRE regular expression library to support
|
||||
patterns that are compatible with the regular expressions of Perl 5. See
|
||||
<a href="pcresyntax.html"><b>pcresyntax</b>(3)</a>
|
||||
for a quick-reference summary of pattern syntax, or
|
||||
<a href="pcrepattern.html"><b>pcrepattern</b>(3)</a>
|
||||
for a full description of syntax and semantics of the regular expressions
|
||||
for a full description of the syntax and semantics of the regular expressions
|
||||
that PCRE supports.
|
||||
</P>
|
||||
<P>
|
||||
@ -748,9 +750,9 @@ Cambridge CB2 3QH, England.
|
||||
</P>
|
||||
<br><a name="SEC14" href="#TOC1">REVISION</a><br>
|
||||
<P>
|
||||
Last updated: 13 September 2012
|
||||
Last updated: 03 April 2014
|
||||
<br>
|
||||
Copyright © 1997-2012 University of Cambridge.
|
||||
Copyright © 1997-2014 University of Cambridge.
|
||||
<br>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
|
@ -172,15 +172,9 @@ PCRE_NOTEOL, PCRE_NOTEMPTY, PCRE_NOTEMPTY_ATSTART, PCRE_PARTIAL_HARD, and
|
||||
PCRE_PARTIAL_SOFT.
|
||||
</P>
|
||||
<P>
|
||||
The unsupported pattern items are:
|
||||
<pre>
|
||||
\C match a single byte; not supported in UTF-8 mode
|
||||
(?Cn) callouts
|
||||
(*PRUNE) )
|
||||
(*SKIP) ) backtracking control verbs
|
||||
(*THEN) )
|
||||
</pre>
|
||||
Support for some of these may be added in future.
|
||||
The only unsupported pattern items are \C (match a single data unit) when
|
||||
running in a UTF mode, and a callout immediately before an assertion condition
|
||||
in a conditional group.
|
||||
</P>
|
||||
<br><a name="SEC6" href="#TOC1">RETURN VALUES FROM JIT EXECUTION</a><br>
|
||||
<P>
|
||||
@ -449,9 +443,9 @@ Cambridge CB2 3QH, England.
|
||||
</P>
|
||||
<br><a name="SEC14" href="#TOC1">REVISION</a><br>
|
||||
<P>
|
||||
Last updated: 31 October 2012
|
||||
Last updated: 17 March 2013
|
||||
<br>
|
||||
Copyright © 1997-2012 University of Cambridge.
|
||||
Copyright © 1997-2013 University of Cambridge.
|
||||
<br>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
|
@ -21,9 +21,10 @@ practice be relevant.
|
||||
</P>
|
||||
<P>
|
||||
The maximum length of a compiled pattern is approximately 64K data units (bytes
|
||||
for the 8-bit library, 32-bit units for the 32-bit library, and 32-bit units for
|
||||
the 32-bit library) if PCRE is compiled with the default internal linkage size
|
||||
of 2 bytes. If you want to process regular expressions that are truly enormous,
|
||||
for the 8-bit library, 16-bit units for the 16-bit library, and 32-bit units for
|
||||
the 32-bit library) if PCRE is compiled with the default internal linkage size,
|
||||
which is 2 bytes for the 8-bit and 16-bit libraries, and 4 bytes for the 32-bit
|
||||
library. If you want to process regular expressions that are truly enormous,
|
||||
you can compile PCRE with an internal linkage size of 3 or 4 (when building the
|
||||
16-bit or 32-bit library, 3 is rounded up to 4). See the <b>README</b> file in
|
||||
the source distribution and the
|
||||
@ -36,7 +37,10 @@ All values in repeating quantifiers must be less than 65536.
|
||||
</P>
|
||||
<P>
|
||||
There is no limit to the number of parenthesized subpatterns, but there can be
|
||||
no more than 65535 capturing subpatterns.
|
||||
no more than 65535 capturing subpatterns. There is, however, a limit to the
|
||||
depth of nesting of parenthesized subpatterns of all kinds. This is imposed in
|
||||
order to limit the amount of system stack used at compile time. The limit can
|
||||
be specified when PCRE is built; the default is 250.
|
||||
</P>
|
||||
<P>
|
||||
There is a limit to the number of forward references to subsequent subpatterns
|
||||
@ -50,7 +54,7 @@ maximum number of named subpatterns is 10000.
|
||||
</P>
|
||||
<P>
|
||||
The maximum length of a name in a (*MARK), (*PRUNE), (*SKIP), or (*THEN) verb
|
||||
is 255 for the 8-bit library and 65535 for the 16-bit and 32-bit library.
|
||||
is 255 for the 8-bit library and 65535 for the 16-bit and 32-bit libraries.
|
||||
</P>
|
||||
<P>
|
||||
The maximum length of a subject string is the largest positive number that an
|
||||
@ -77,9 +81,9 @@ Cambridge CB2 3QH, England.
|
||||
REVISION
|
||||
</b><br>
|
||||
<P>
|
||||
Last updated: 04 May 2012
|
||||
Last updated: 05 November 2013
|
||||
<br>
|
||||
Copyright © 1997-2012 University of Cambridge.
|
||||
Copyright © 1997-2013 University of Cambridge.
|
||||
<br>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
|
@ -126,6 +126,15 @@ character of the subject. The algorithm does not automatically move on to find
|
||||
matches that start at later positions.
|
||||
</P>
|
||||
<P>
|
||||
PCRE's "auto-possessification" optimization usually applies to character
|
||||
repeats at the end of a pattern (as well as internally). For example, the
|
||||
pattern "a\d+" is compiled as if it were "a\d++" because there is no point
|
||||
even considering the possibility of backtracking into the repeated digits. For
|
||||
DFA matching, this means that only one possible match is found. If you really
|
||||
do want multiple matches in such cases, either use an ungreedy repeat
|
||||
("a\d+?") or set the PCRE_NO_AUTO_POSSESS option when compiling.
|
||||
</P>
|
||||
<P>
|
||||
There are a number of features of PCRE regular expressions that are not
|
||||
supported by the alternative matching algorithm. They are as follows:
|
||||
</P>
|
||||
@ -224,7 +233,7 @@ Cambridge CB2 3QH, England.
|
||||
</P>
|
||||
<br><a name="SEC8" href="#TOC1">REVISION</a><br>
|
||||
<P>
|
||||
Last updated: 08 January 2012
|
||||
Last updated: 12 November 2013
|
||||
<br>
|
||||
Copyright © 1997-2012 University of Cambridge.
|
||||
<br>
|
||||
|
@ -81,33 +81,36 @@ strings. This optimization is also disabled for partial matching.
|
||||
<br><a name="SEC2" href="#TOC1">PARTIAL MATCHING USING pcre_exec() OR pcre[16|32]_exec()</a><br>
|
||||
<P>
|
||||
A partial match occurs during a call to <b>pcre_exec()</b> or
|
||||
<b>pcre[16|32]_exec()</b> when the end of the subject string is reached successfully,
|
||||
but matching cannot continue because more characters are needed. However, at
|
||||
least one character in the subject must have been inspected. This character
|
||||
need not form part of the final matched string; lookbehind assertions and the
|
||||
\K escape sequence provide ways of inspecting characters before the start of a
|
||||
matched substring. The requirement for inspecting at least one character exists
|
||||
because an empty string can always be matched; without such a restriction there
|
||||
would always be a partial match of an empty string at the end of the subject.
|
||||
<b>pcre[16|32]_exec()</b> when the end of the subject string is reached
|
||||
successfully, but matching cannot continue because more characters are needed.
|
||||
However, at least one character in the subject must have been inspected. This
|
||||
character need not form part of the final matched string; lookbehind assertions
|
||||
and the \K escape sequence provide ways of inspecting characters before the
|
||||
start of a matched substring. The requirement for inspecting at least one
|
||||
character exists because an empty string can always be matched; without such a
|
||||
restriction there would always be a partial match of an empty string at the end
|
||||
of the subject.
|
||||
</P>
|
||||
<P>
|
||||
If there are at least two slots in the offsets vector when a partial match is
|
||||
returned, the first slot is set to the offset of the earliest character that
|
||||
was inspected. For convenience, the second offset points to the end of the
|
||||
subject so that a substring can easily be identified.
|
||||
subject so that a substring can easily be identified. If there are at least
|
||||
three slots in the offsets vector, the third slot is set to the offset of the
|
||||
character where matching started.
|
||||
</P>
|
||||
<P>
|
||||
For the majority of patterns, the first offset identifies the start of the
|
||||
partially matched string. However, for patterns that contain lookbehind
|
||||
assertions, or \K, or begin with \b or \B, earlier characters have been
|
||||
inspected while carrying out the match. For example:
|
||||
For the majority of patterns, the contents of the first and third slots will be
|
||||
the same. However, for patterns that contain lookbehind assertions, or begin
|
||||
with \b or \B, characters before the one where matching started may have been
|
||||
inspected while carrying out the match. For example, consider this pattern:
|
||||
<pre>
|
||||
/(?<=abc)123/
|
||||
</pre>
|
||||
This pattern matches "123", but only if it is preceded by "abc". If the subject
|
||||
string is "xyzabc12", the offsets after a partial match are for the substring
|
||||
"abc12", because all these characters are needed if another match is tried
|
||||
with extra characters added to the subject.
|
||||
string is "xyzabc12", the first two offsets after a partial match are for the
|
||||
substring "abc12", because all these characters were inspected. However, the
|
||||
third offset is set to 6, because that is the offset where matching began.
|
||||
</P>
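The relationship between the three slots can be sketched in a few lines of C. This is only an illustration: the type <i>partial_info</i> and the function <i>describe_partial</i> are hypothetical names, the offsets are supplied by hand rather than by a call to <b>pcre_exec()</b>, and the ordering checked by the assertion is an assumption based on the description above.

```c
#include <assert.h>

/* Hypothetical summary of the first three offsets-vector slots that are
   set after a partial match. */
typedef struct {
    int inspected_start;   /* offsets[0]: earliest inspected character */
    int inspected_len;     /* offsets[1] - offsets[0] */
    int lookbehind_len;    /* offsets[2] - offsets[0]: characters inspected
                              before the point where matching started */
} partial_info;

partial_info describe_partial(const int offsets[3])
{
    /* Assumed ordering: earliest inspected <= match start <= subject end. */
    assert(offsets[0] >= 0);
    assert(offsets[0] <= offsets[2] && offsets[2] <= offsets[1]);

    partial_info info;
    info.inspected_start = offsets[0];
    info.inspected_len = offsets[1] - offsets[0];
    info.lookbehind_len = offsets[2] - offsets[0];
    return info;
}
```

For the "xyzabc12" example, the offsets are 3, 8, and 6, so the inspected text is the five characters "abc12" and three of them precede the point where matching began.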
|
||||
<P>
|
||||
What happens when a partial match is identified depends on which of the two
|
||||
@ -303,6 +306,16 @@ not retain the previously partially-matched string. It is up to the calling
|
||||
program to do that if it needs to.
|
||||
</P>
|
||||
<P>
|
||||
That means that, for an unanchored pattern, if a continued match fails, it is
|
||||
not possible to try again at a new starting point. All this facility is capable
|
||||
of doing is continuing with the previous match attempt. In the previous
|
||||
example, if the second set of data is "ug23" the result is no match, even
|
||||
though there would be a match for "aug23" if the entire string were given at
|
||||
once. Depending on the application, this may or may not be what you want.
|
||||
The only way to allow for starting again at the next character is to retain the
|
||||
matched part of the subject and try a new complete match.
|
||||
</P>
|
||||
<P>
|
||||
You can set the PCRE_PARTIAL_SOFT or PCRE_PARTIAL_HARD options with
|
||||
PCRE_DFA_RESTART to continue partial matching over multiple segments. This
|
||||
facility can be used to pass very long subject strings to the DFA matching
|
||||
@ -334,10 +347,9 @@ processing time is needed.
|
||||
<P>
|
||||
<b>Note:</b> If the pattern contains lookbehind assertions, or \K, or starts
|
||||
with \b or \B, the string that is returned for a partial match includes
|
||||
characters that precede the partially matched string itself, because these must
|
||||
be retained when adding on more characters for a subsequent matching attempt.
|
||||
However, in some cases you may need to retain even earlier characters, as
|
||||
discussed in the next section.
|
||||
characters that precede the start of what would be returned for a complete
|
||||
match, because it contains all the characters that were inspected during the
|
||||
partial match.
|
||||
</P>
|
||||
<br><a name="SEC9" href="#TOC1">ISSUES WITH MULTI-SEGMENT MATCHING</a><br>
|
||||
<P>
|
||||
@ -356,12 +368,35 @@ includes the effect of PCRE_NOTEOL.
|
||||
offsets that are returned for a partial match. However a lookbehind assertion
|
||||
later in the pattern could require even earlier characters to be inspected. You
|
||||
can handle this case by using the PCRE_INFO_MAXLOOKBEHIND option of the
|
||||
<b>pcre_fullinfo()</b> or <b>pcre[16|32]_fullinfo()</b> functions to obtain the length
|
||||
of the largest lookbehind in the pattern. This length is given in characters,
|
||||
not bytes. If you always retain at least that many characters before the
|
||||
partially matched string, all should be well. (Of course, near the start of the
|
||||
subject, fewer characters may be present; in that case all characters should be
|
||||
retained.)
|
||||
<b>pcre_fullinfo()</b> or <b>pcre[16|32]_fullinfo()</b> functions to obtain the
|
||||
length of the longest lookbehind in the pattern. This length is given in
|
||||
characters, not bytes. If you always retain at least that many characters
|
||||
before the partially matched string, all should be well. (Of course, near the
|
||||
start of the subject, fewer characters may be present; in that case all
|
||||
characters should be retained.)
|
||||
</P>
|
||||
<P>
|
||||
From release 8.33, there is a more accurate way of deciding which characters to
|
||||
retain. Instead of subtracting the length of the longest lookbehind from the
|
||||
earliest inspected character (<i>offsets[0]</i>), the match start position
|
||||
(<i>offsets[2]</i>) should be used, and the next match attempt started at the
|
||||
<i>offsets[2]</i> character by setting the <i>startoffset</i> argument of
|
||||
<b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>.
|
||||
</P>
|
||||
<P>
|
||||
For example, if the pattern "(?<=123)abc" is partially
|
||||
matched against the string "xx123a", the three offset values returned are 2, 6,
|
||||
and 5. This indicates that the matching process that gave a partial match
|
||||
started at offset 5, but the characters "123a" were all inspected. The maximum
|
||||
lookbehind for that pattern is 3, so taking that away from 5 shows that we need
|
||||
only keep "123a", and the next match attempt can be started at offset 3 (that
|
||||
is, at "a") when further characters have been added. When the match start is
|
||||
not the earliest inspected character, <b>pcretest</b> shows it explicitly:
|
||||
<pre>
|
||||
re> "(?<=123)abc"
|
||||
data> xx123a\P\P
|
||||
Partial match at offset 5: 123a
|
||||
</PRE>
|
||||
</P>
|
||||
<P>
|
||||
3. Because a partial match must always contain at least one character, what
|
||||
@ -465,9 +500,9 @@ Cambridge CB2 3QH, England.
|
||||
</P>
|
||||
<br><a name="SEC11" href="#TOC1">REVISION</a><br>
|
||||
<P>
|
||||
Last updated: 24 June 2012
|
||||
Last updated: 02 July 2013
|
||||
<br>
|
||||
Copyright © 1997-2012 University of Cambridge.
|
||||
Copyright © 1997-2013 University of Cambridge.
|
||||
<br>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
|
File diff suppressed because it is too large
@ -13,7 +13,7 @@ from the original man page. If there is any nonsense in it, please consult the
|
||||
man page, in case the conversion went wrong.
|
||||
<br>
|
||||
<ul>
|
||||
<li><a name="TOC1" href="#SEC1">SYNOPSIS OF POSIX API</a>
|
||||
<li><a name="TOC1" href="#SEC1">SYNOPSIS</a>
|
||||
<li><a name="TOC2" href="#SEC2">DESCRIPTION</a>
|
||||
<li><a name="TOC3" href="#SEC3">COMPILING A PATTERN</a>
|
||||
<li><a name="TOC4" href="#SEC4">MATCHING NEWLINE CHARACTERS</a>
|
||||
@ -23,23 +23,21 @@ man page, in case the conversion went wrong.
|
||||
<li><a name="TOC8" href="#SEC8">AUTHOR</a>
|
||||
<li><a name="TOC9" href="#SEC9">REVISION</a>
|
||||
</ul>
|
||||
<br><a name="SEC1" href="#TOC1">SYNOPSIS OF POSIX API</a><br>
|
||||
<br><a name="SEC1" href="#TOC1">SYNOPSIS</a><br>
|
||||
<P>
|
||||
<b>#include <pcreposix.h></b>
|
||||
</P>
|
||||
<P>
|
||||
<b>int regcomp(regex_t *<i>preg</i>, const char *<i>pattern</i>,</b>
|
||||
<b>int <i>cflags</i>);</b>
|
||||
</P>
|
||||
<P>
|
||||
<b> int <i>cflags</i>);</b>
|
||||
<br>
|
||||
<br>
|
||||
<b>int regexec(regex_t *<i>preg</i>, const char *<i>string</i>,</b>
|
||||
<b>size_t <i>nmatch</i>, regmatch_t <i>pmatch</i>[], int <i>eflags</i>);</b>
|
||||
</P>
|
||||
<P>
|
||||
<b>size_t regerror(int <i>errcode</i>, const regex_t *<i>preg</i>,</b>
|
||||
<b>char *<i>errbuf</i>, size_t <i>errbuf_size</i>);</b>
|
||||
</P>
|
||||
<P>
|
||||
<b> size_t <i>nmatch</i>, regmatch_t <i>pmatch</i>[], int <i>eflags</i>);</b>
|
||||
<b> size_t regerror(int <i>errcode</i>, const regex_t *<i>preg</i>,</b>
|
||||
<b> char *<i>errbuf</i>, size_t <i>errbuf_size</i>);</b>
|
||||
<br>
|
||||
<br>
|
||||
<b>void regfree(regex_t *<i>preg</i>);</b>
|
||||
</P>
|
||||
<br><a name="SEC2" href="#TOC1">DESCRIPTION</a><br>
|
||||
|
@ -102,8 +102,8 @@ study data.
|
||||
<br><a name="SEC3" href="#TOC1">RE-USING A PRECOMPILED PATTERN</a><br>
|
||||
<P>
|
||||
Re-using a precompiled pattern is straightforward. Having reloaded it into main
|
||||
memory, called <b>pcre[16|32]_pattern_to_host_byte_order()</b> if necessary,
|
||||
you pass its pointer to <b>pcre[16|32]_exec()</b> or <b>pcre[16|32]_dfa_exec()</b> in
|
||||
memory, called <b>pcre[16|32]_pattern_to_host_byte_order()</b> if necessary, you
|
||||
pass its pointer to <b>pcre[16|32]_exec()</b> or <b>pcre[16|32]_dfa_exec()</b> in
|
||||
the usual way.
|
||||
</P>
|
||||
<P>
|
||||
@ -119,6 +119,11 @@ in the
|
||||
documentation.
|
||||
</P>
|
||||
<P>
|
||||
<b>Warning:</b> The tables that <b>pcre_exec()</b> and <b>pcre_dfa_exec()</b> use
|
||||
must be the same as those that were used when the pattern was compiled. If this
|
||||
is not the case, the behaviour is undefined.
|
||||
</P>
|
||||
<P>
|
||||
If you did not provide custom character tables when the pattern was compiled,
|
||||
the pointer in the compiled pattern is NULL, which causes the matching
|
||||
functions to use PCRE's internal tables. Thus, you do not need to take any
|
||||
@ -126,9 +131,9 @@ special action at run time in this case.
|
||||
</P>
|
||||
<P>
|
||||
If you saved study data with the compiled pattern, you need to create your own
|
||||
<b>pcre[16|32]_extra</b> data block and set the <i>study_data</i> field to point to the
|
||||
reloaded study data. You must also set the PCRE_EXTRA_STUDY_DATA bit in the
|
||||
<i>flags</i> field to indicate that study data is present. Then pass the
|
||||
<b>pcre[16|32]_extra</b> data block and set the <i>study_data</i> field to point
|
||||
to the reloaded study data. You must also set the PCRE_EXTRA_STUDY_DATA bit in
|
||||
the <i>flags</i> field to indicate that study data is present. Then pass the
|
||||
<b>pcre[16|32]_extra</b> block to the matching function in the usual way. If the
|
||||
pattern was studied for just-in-time optimization, that data cannot be saved,
|
||||
and so is lost by a save/restore cycle.
|
||||
@ -149,9 +154,9 @@ Cambridge CB2 3QH, England.
|
||||
</P>
|
||||
<br><a name="SEC6" href="#TOC1">REVISION</a><br>
|
||||
<P>
|
||||
Last updated: 24 June 2012
|
||||
Last updated: 12 November 2013
|
||||
<br>
|
||||
Copyright © 1997-2012 University of Cambridge.
|
||||
Copyright © 1997-2013 University of Cambridge.
|
||||
<br>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
|
@ -29,13 +29,13 @@ man page, in case the conversion went wrong.
|
||||
<li><a name="TOC14" href="#SEC14">ATOMIC GROUPS</a>
|
||||
<li><a name="TOC15" href="#SEC15">COMMENT</a>
|
||||
<li><a name="TOC16" href="#SEC16">OPTION SETTING</a>
|
||||
<li><a name="TOC17" href="#SEC17">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a>
|
||||
<li><a name="TOC18" href="#SEC18">BACKREFERENCES</a>
|
||||
<li><a name="TOC19" href="#SEC19">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a>
|
||||
<li><a name="TOC20" href="#SEC20">CONDITIONAL PATTERNS</a>
|
||||
<li><a name="TOC21" href="#SEC21">BACKTRACKING CONTROL</a>
|
||||
<li><a name="TOC22" href="#SEC22">NEWLINE CONVENTIONS</a>
|
||||
<li><a name="TOC23" href="#SEC23">WHAT \R MATCHES</a>
|
||||
<li><a name="TOC17" href="#SEC17">NEWLINE CONVENTION</a>
|
||||
<li><a name="TOC18" href="#SEC18">WHAT \R MATCHES</a>
|
||||
<li><a name="TOC19" href="#SEC19">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a>
|
||||
<li><a name="TOC20" href="#SEC20">BACKREFERENCES</a>
|
||||
<li><a name="TOC21" href="#SEC21">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a>
|
||||
<li><a name="TOC22" href="#SEC22">CONDITIONAL PATTERNS</a>
|
||||
<li><a name="TOC23" href="#SEC23">BACKTRACKING CONTROL</a>
|
||||
<li><a name="TOC24" href="#SEC24">CALLOUTS</a>
|
||||
<li><a name="TOC25" href="#SEC25">SEE ALSO</a>
|
||||
<li><a name="TOC26" href="#SEC26">AUTHOR</a>
|
||||
@ -65,10 +65,14 @@ documentation. This document contains a quick-reference summary of the syntax.
|
||||
\n newline (hex 0A)
|
||||
\r carriage return (hex 0D)
|
||||
\t tab (hex 09)
|
||||
\0dd character with octal code 0dd
|
||||
\ddd character with octal code ddd, or backreference
|
||||
\o{ddd..} character with octal code ddd..
|
||||
\xhh character with hex code hh
|
||||
\x{hhh..} character with hex code hhh..
|
||||
</PRE>
|
||||
</pre>
|
||||
Note that \0dd is always an octal code, and that \8 and \9 are the literal
|
||||
characters "8" and "9".
|
||||
</P>
|
||||
<br><a name="SEC4" href="#TOC1">CHARACTER TYPES</a><br>
|
||||
<P>
|
||||
@ -92,9 +96,11 @@ documentation. This document contains a quick-reference summary of the syntax.
|
||||
\W a "non-word" character
|
||||
\X a Unicode extended grapheme cluster
|
||||
</pre>
|
||||
In PCRE, by default, \d, \D, \s, \S, \w, and \W recognize only ASCII
|
||||
characters, even in a UTF mode. However, this can be changed by setting the
|
||||
PCRE_UCP option.
|
||||
By default, \d, \s, and \w match only ASCII characters, even in UTF-8 mode
|
||||
or in the 16-bit and 32-bit libraries. However, if locale-specific matching is
|
||||
happening, \s and \w may also match characters with code points in the range
|
||||
128-255. If the PCRE_UCP option is set, the behaviour of these escape sequences
|
||||
is changed to use Unicode properties and they match many more characters.
|
||||
</P>
|
||||
<br><a name="SEC5" href="#TOC1">GENERAL CATEGORY PROPERTIES FOR \p and \P</a><br>
|
||||
<P>
|
||||
@ -150,9 +156,13 @@ PCRE_UCP option.
|
||||
<pre>
|
||||
Xan Alphanumeric: union of properties L and N
|
||||
Xps POSIX space: property Z or tab, NL, VT, FF, CR
|
||||
Xsp Perl space: property Z or tab, NL, FF, CR
|
||||
Xsp Perl space: property Z or tab, NL, VT, FF, CR
|
||||
Xuc Universally-named character: one that can be
|
||||
represented by a Universal Character Name
|
||||
Xwd Perl word: property Xan or underscore
|
||||
</PRE>
|
||||
</pre>
|
||||
Perl and POSIX space are now the same. Perl added VT to its space character set
|
||||
at release 5.18 and PCRE changed at release 8.34.
|
||||
</P>
|
||||
<br><a name="SEC7" href="#TOC1">SCRIPT NAMES FOR \p AND \P</a><br>
|
||||
<P>
|
||||
@ -329,7 +339,8 @@ but some of them use Unicode properties if PCRE_UCP is set. You can use
|
||||
<P>
|
||||
<pre>
|
||||
\K reset start of match
|
||||
</PRE>
|
||||
</pre>
|
||||
\K is honoured in positive assertions, but ignored in negative ones.
|
||||
</P>
|
||||
<br><a name="SEC12" href="#TOC1">ALTERNATION</a><br>
|
||||
<P>
|
||||
@ -372,18 +383,45 @@ but some of them use Unicode properties if PCRE_UCP is set. You can use
|
||||
(?x) extended (ignore white space)
|
||||
(?-...) unset option(s)
|
||||
</pre>
|
||||
The following are recognized only at the start of a pattern or after one of the
|
||||
newline-setting options with similar syntax:
|
||||
The following are recognized only at the very start of a pattern or after one
|
||||
of the newline or \R options with similar syntax. More than one of them may
|
||||
appear.
|
||||
<pre>
|
||||
(*LIMIT_MATCH=d) set the match limit to d (decimal number)
|
||||
(*LIMIT_RECURSION=d) set the recursion limit to d (decimal number)
|
||||
(*NO_AUTO_POSSESS) no auto-possessification (PCRE_NO_AUTO_POSSESS)
|
||||
(*NO_START_OPT) no start-match optimization (PCRE_NO_START_OPTIMIZE)
|
||||
(*UTF8) set UTF-8 mode: 8-bit library (PCRE_UTF8)
|
||||
(*UTF16) set UTF-16 mode: 16-bit library (PCRE_UTF16)
|
||||
(*UTF32) set UTF-32 mode: 32-bit library (PCRE_UTF32)
|
||||
(*UTF) set appropriate UTF mode for the library in use
|
||||
(*UCP) set PCRE_UCP (use Unicode properties for \d etc)
|
||||
</pre>
|
||||
Note that LIMIT_MATCH and LIMIT_RECURSION can only reduce the value of the
|
||||
limits set by the caller of pcre_exec(), not increase them.
|
||||
</P>
|
||||
<br><a name="SEC17" href="#TOC1">NEWLINE CONVENTION</a><br>
|
||||
<P>
|
||||
These are recognized only at the very start of the pattern or after option
|
||||
settings with a similar syntax.
|
||||
<pre>
|
||||
(*CR) carriage return only
|
||||
(*LF) linefeed only
|
||||
(*CRLF) carriage return followed by linefeed
|
||||
(*ANYCRLF) all three of the above
|
||||
(*ANY) any Unicode newline sequence
|
||||
</PRE>
|
||||
</P>
|
||||
<br><a name="SEC17" href="#TOC1">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a><br>
|
||||
<br><a name="SEC18" href="#TOC1">WHAT \R MATCHES</a><br>
|
||||
<P>
|
||||
These are recognized only at the very start of the pattern or after option
|
||||
setting with a similar syntax.
|
||||
<pre>
|
||||
(*BSR_ANYCRLF) CR, LF, or CRLF
|
||||
(*BSR_UNICODE) any Unicode newline sequence
|
||||
</PRE>
|
||||
</P>
|
||||
<br><a name="SEC19" href="#TOC1">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a><br>
|
||||
<P>
|
||||
<pre>
|
||||
(?=...) positive look ahead
|
||||
@ -393,7 +431,7 @@ newline-setting options with similar syntax:
|
||||
</pre>
|
||||
Each top-level branch of a look behind must be of a fixed length.
|
||||
</P>
|
||||
<br><a name="SEC18" href="#TOC1">BACKREFERENCES</a><br>
|
||||
<br><a name="SEC20" href="#TOC1">BACKREFERENCES</a><br>
|
||||
<P>
|
||||
<pre>
|
||||
\n reference by number (can be ambiguous)
|
||||
@ -407,7 +445,7 @@ Each top-level branch of a look behind must be of a fixed length.
|
||||
(?P=name) reference by name (Python)
|
||||
</PRE>
|
||||
</P>
|
||||
<br><a name="SEC19" href="#TOC1">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a><br>
|
||||
<br><a name="SEC21" href="#TOC1">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a><br>
|
||||
<P>
|
||||
<pre>
|
||||
(?R) recurse whole pattern
|
||||
@ -426,7 +464,7 @@ Each top-level branch of a look behind must be of a fixed length.
|
||||
\g'-n' call subpattern by relative number (PCRE extension)
|
||||
</PRE>
|
||||
</P>
|
||||
<br><a name="SEC20" href="#TOC1">CONDITIONAL PATTERNS</a><br>
|
||||
<br><a name="SEC22" href="#TOC1">CONDITIONAL PATTERNS</a><br>
|
||||
<P>
|
||||
<pre>
|
||||
(?(condition)yes-pattern)
|
||||
@ -445,7 +483,7 @@ Each top-level branch of a look behind must be of a fixed length.
|
||||
(?(assert)... assertion condition
|
||||
</PRE>
|
||||
</P>
|
||||
<br><a name="SEC21" href="#TOC1">BACKTRACKING CONTROL</a><br>
|
||||
<br><a name="SEC23" href="#TOC1">BACKTRACKING CONTROL</a><br>
|
||||
<P>
|
||||
The following act immediately they are reached:
|
||||
<pre>
|
||||
@ -468,27 +506,6 @@ pattern is not anchored.
|
||||
(*THEN:NAME) equivalent to (*MARK:NAME)(*THEN)
|
||||
</PRE>
|
||||
</P>
|
||||
<br><a name="SEC22" href="#TOC1">NEWLINE CONVENTIONS</a><br>
|
||||
<P>
|
||||
These are recognized only at the very start of the pattern or after a
|
||||
(*BSR_...), (*UTF8), (*UTF16), (*UTF32) or (*UCP) option.
|
||||
<pre>
|
||||
(*CR) carriage return only
|
||||
(*LF) linefeed only
|
||||
(*CRLF) carriage return followed by linefeed
|
||||
(*ANYCRLF) all three of the above
|
||||
(*ANY) any Unicode newline sequence
|
||||
</PRE>
|
||||
</P>
|
||||
<br><a name="SEC23" href="#TOC1">WHAT \R MATCHES</a><br>
|
||||
<P>
|
||||
These are recognized only at the very start of the pattern or after a
|
||||
(*...) option that sets the newline convention or a UTF or UCP mode.
|
||||
<pre>
|
||||
(*BSR_ANYCRLF) CR, LF, or CRLF
|
||||
(*BSR_UNICODE) any Unicode newline sequence
|
||||
</PRE>
|
||||
</P>
|
||||
<br><a name="SEC24" href="#TOC1">CALLOUTS</a><br>
|
||||
<P>
|
||||
<pre>
|
||||
@ -512,9 +529,9 @@ Cambridge CB2 3QH, England.
|
||||
</P>
|
||||
<br><a name="SEC27" href="#TOC1">REVISION</a><br>
|
||||
<P>
|
||||
Last updated: 11 November 2012
|
||||
Last updated: 08 January 2014
|
||||
<br>
|
||||
Copyright © 1997-2012 University of Cambridge.
|
||||
Copyright © 1997-2014 University of Cambridge.
|
||||
<br>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
|
@ -14,21 +14,22 @@ man page, in case the conversion went wrong.
|
||||
<br>
|
||||
<ul>
|
||||
<li><a name="TOC1" href="#SEC1">SYNOPSIS</a>
|
||||
<li><a name="TOC2" href="#SEC2">PCRE's 8-BIT, 16-BIT AND 32-BIT LIBRARIES</a>
|
||||
<li><a name="TOC3" href="#SEC3">COMMAND LINE OPTIONS</a>
|
||||
<li><a name="TOC4" href="#SEC4">DESCRIPTION</a>
|
||||
<li><a name="TOC5" href="#SEC5">PATTERN MODIFIERS</a>
|
||||
<li><a name="TOC6" href="#SEC6">DATA LINES</a>
|
||||
<li><a name="TOC7" href="#SEC7">THE ALTERNATIVE MATCHING FUNCTION</a>
|
||||
<li><a name="TOC8" href="#SEC8">DEFAULT OUTPUT FROM PCRETEST</a>
|
||||
<li><a name="TOC9" href="#SEC9">OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION</a>
|
||||
<li><a name="TOC10" href="#SEC10">RESTARTING AFTER A PARTIAL MATCH</a>
|
||||
<li><a name="TOC11" href="#SEC11">CALLOUTS</a>
|
||||
<li><a name="TOC12" href="#SEC12">NON-PRINTING CHARACTERS</a>
|
||||
<li><a name="TOC13" href="#SEC13">SAVING AND RELOADING COMPILED PATTERNS</a>
|
||||
<li><a name="TOC14" href="#SEC14">SEE ALSO</a>
|
||||
<li><a name="TOC15" href="#SEC15">AUTHOR</a>
|
||||
<li><a name="TOC16" href="#SEC16">REVISION</a>
|
||||
<li><a name="TOC2" href="#SEC2">INPUT DATA FORMAT</a>
|
||||
<li><a name="TOC3" href="#SEC3">PCRE's 8-BIT, 16-BIT AND 32-BIT LIBRARIES</a>
|
||||
<li><a name="TOC4" href="#SEC4">COMMAND LINE OPTIONS</a>
|
||||
<li><a name="TOC5" href="#SEC5">DESCRIPTION</a>
|
||||
<li><a name="TOC6" href="#SEC6">PATTERN MODIFIERS</a>
|
||||
<li><a name="TOC7" href="#SEC7">DATA LINES</a>
|
||||
<li><a name="TOC8" href="#SEC8">THE ALTERNATIVE MATCHING FUNCTION</a>
|
||||
<li><a name="TOC9" href="#SEC9">DEFAULT OUTPUT FROM PCRETEST</a>
|
||||
<li><a name="TOC10" href="#SEC10">OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION</a>
|
||||
<li><a name="TOC11" href="#SEC11">RESTARTING AFTER A PARTIAL MATCH</a>
|
||||
<li><a name="TOC12" href="#SEC12">CALLOUTS</a>
|
||||
<li><a name="TOC13" href="#SEC13">NON-PRINTING CHARACTERS</a>
|
||||
<li><a name="TOC14" href="#SEC14">SAVING AND RELOADING COMPILED PATTERNS</a>
|
||||
<li><a name="TOC15" href="#SEC15">SEE ALSO</a>
|
||||
<li><a name="TOC16" href="#SEC16">AUTHOR</a>
|
||||
<li><a name="TOC17" href="#SEC17">REVISION</a>
|
||||
</ul>
|
||||
<br><a name="SEC1" href="#TOC1">SYNOPSIS</a><br>
|
||||
<P>
|
||||
@ -63,25 +64,34 @@ conjunction with the test script and data files that are distributed as part of
|
||||
PCRE, and are unlikely to be of use otherwise. They are all documented here,
|
||||
but without much justification.
|
||||
</P>
|
||||
<br><a name="SEC2" href="#TOC1">PCRE's 8-BIT, 16-BIT AND 32-BIT LIBRARIES</a><br>
|
||||
<br><a name="SEC2" href="#TOC1">INPUT DATA FORMAT</a><br>
|
||||
<P>
|
||||
Input to <b>pcretest</b> is processed line by line, either by calling the C
|
||||
library's <b>fgets()</b> function, or via the <b>libreadline</b> library (see
|
||||
below). In Unix-like environments, <b>fgets()</b> treats any bytes other than
|
||||
newline as data characters. However, in some Windows environments character 26
|
||||
(hex 1A) causes an immediate end of file, and no further data is read. For
|
||||
maximum portability, therefore, it is safest to use only ASCII characters in
|
||||
<b>pcretest</b> input files.
|
||||
</P>
|
||||
<br><a name="SEC3" href="#TOC1">PCRE's 8-BIT, 16-BIT AND 32-BIT LIBRARIES</a><br>
|
||||
<P>
|
||||
From release 8.30, two separate PCRE libraries can be built. The original one
|
||||
supports 8-bit character strings, whereas the newer 16-bit library supports
|
||||
character strings encoded in 16-bit units. From release 8.32, a third
|
||||
library can be built, supporting character strings encoded in 32-bit units.
|
||||
The <b>pcretest</b> program can be
|
||||
used to test all three libraries. However, it is itself still an 8-bit program,
|
||||
reading 8-bit input and writing 8-bit output. When testing the 16-bit or 32-bit
|
||||
library, the patterns and data strings are converted to 16- or 32-bit format
|
||||
before being passed to the PCRE library functions. Results are converted to
|
||||
8-bit for output.
|
||||
character strings encoded in 16-bit units. From release 8.32, a third library
|
||||
can be built, supporting character strings encoded in 32-bit units. The
|
||||
<b>pcretest</b> program can be used to test all three libraries. However, it is
|
||||
itself still an 8-bit program, reading 8-bit input and writing 8-bit output.
|
||||
When testing the 16-bit or 32-bit library, the patterns and data strings are
|
||||
converted to 16- or 32-bit format before being passed to the PCRE library
|
||||
functions. Results are converted to 8-bit for output.
|
||||
</P>
|
||||
<P>
|
||||
References to functions and structures of the form <b>pcre[16|32]_xx</b> below
|
||||
mean "<b>pcre_xx</b> when using the 8-bit library or <b>pcre16_xx</b> when using
|
||||
the 16-bit library".
|
||||
mean "<b>pcre_xx</b> when using the 8-bit library, <b>pcre16_xx</b> when using
|
||||
the 16-bit library, or <b>pcre32_xx</b> when using the 32-bit library".
|
||||
</P>
|
||||
<br><a name="SEC3" href="#TOC1">COMMAND LINE OPTIONS</a><br>
|
||||
<br><a name="SEC4" href="#TOC1">COMMAND LINE OPTIONS</a><br>
|
||||
<P>
|
||||
<b>-8</b>
|
||||
If the 8-bit library has been built, this option causes the 8-bit library
|
||||
@ -110,23 +120,30 @@ internal form is output after compilation.
|
||||
<P>
|
||||
<b>-C</b>
|
||||
Output the version number of the PCRE library, and all available information
|
||||
about the optional features that are included, and then exit. All other options
|
||||
are ignored.
|
||||
about the optional features that are included, and then exit with zero exit
|
||||
code. All other options are ignored.
|
||||
</P>
|
||||
<P>
|
||||
<b>-C</b> <i>option</i>
|
||||
Output information about a specific build-time option, then exit. This
|
||||
functionality is intended for use in scripts such as <b>RunTest</b>. The
|
||||
following options output the value indicated:
|
||||
following options output the value and set the exit code as indicated:
|
||||
<pre>
|
||||
ebcdic-nl the code for LF (= NL) in an EBCDIC environment:
|
||||
0x15 or 0x25
|
||||
0 if used in an ASCII environment
|
||||
linksize the internal link size (2, 3, or 4)
|
||||
exit code is always 0
|
||||
linksize the configured internal link size (2, 3, or 4)
|
||||
exit code is set to the link size
|
||||
newline the default newline setting:
|
||||
CR, LF, CRLF, ANYCRLF, or ANY
|
||||
exit code is always 0
|
||||
bsr the default setting for what \R matches:
|
||||
ANYCRLF or ANY
|
||||
exit code is always 0
|
||||
</pre>
|
||||
The following options output 1 for true or zero for false:
|
||||
The following options output 1 for true or 0 for false, and set the exit code
|
||||
to the same value:
|
||||
<pre>
|
||||
ebcdic compiled for an EBCDIC environment
|
||||
jit just-in-time support is available
|
||||
@ -134,8 +151,10 @@ The following options output 1 for true or zero for false:
|
||||
pcre32 the 32-bit library was built
|
||||
pcre8 the 8-bit library was built
|
||||
ucp Unicode property support is available
|
||||
utf UTF-8 and/or UTF-16 and/or UTF-32 support is available
|
||||
</PRE>
|
||||
utf UTF-8 and/or UTF-16 and/or UTF-32 support
|
||||
is available
|
||||
</pre>
|
||||
If an unknown option is given, an error message is output; the exit code is 0.
|
||||
</P>
|
||||
<P>
|
||||
<b>-d</b>
|
||||
@ -171,6 +190,11 @@ equivalent to adding <b>/M</b> to each regular expression. The size is given in
|
||||
bytes for both libraries.
|
||||
</P>
|
||||
<P>
|
||||
<b>-O</b>
|
||||
Behave as if each pattern has the <b>/O</b> modifier, that is, disable
|
||||
auto-possessification for all patterns.
|
||||
</P>
|
||||
<P>
|
||||
<b>-o</b> <i>osize</i>
|
||||
Set the number of elements in the output vector that is used when calling
|
||||
<b>pcre[16|32]_exec()</b> or <b>pcre[16|32]_dfa_exec()</b> to be <i>osize</i>. The
|
||||
@ -240,20 +264,25 @@ should never be studied (see the <b>/S</b> pattern modifier below).
|
||||
</P>
|
||||
<P>
|
||||
<b>-t</b>
|
||||
Run each compile, study, and match many times with a timer, and output
|
||||
resulting time per compile or match (in milliseconds). Do not set <b>-m</b> with
|
||||
<b>-t</b>, because you will then get the size output a zillion times, and the
|
||||
timing will be distorted. You can control the number of iterations that are
|
||||
used for timing by following <b>-t</b> with a number (as a separate item on the
|
||||
command line). For example, "-t 1000" would iterate 1000 times. The default is
|
||||
to iterate 500000 times.
|
||||
Run each compile, study, and match many times with a timer, and output the
|
||||
resulting times per compile, study, or match (in milliseconds). Do not set
|
||||
<b>-m</b> with <b>-t</b>, because you will then get the size output a zillion
|
||||
times, and the timing will be distorted. You can control the number of
|
||||
iterations that are used for timing by following <b>-t</b> with a number (as a
|
||||
separate item on the command line). For example, "-t 1000" iterates 1000 times.
|
||||
The default is to iterate 500000 times.
|
||||
</P>
|
||||
<P>
|
||||
<b>-tm</b>
|
||||
This is like <b>-t</b> except that it times only the matching phase, not the
|
||||
compile or study phases.
|
||||
</P>
|
||||
<br><a name="SEC4" href="#TOC1">DESCRIPTION</a><br>
|
||||
<P>
|
||||
<b>-T</b> <b>-TM</b>
|
||||
These behave like <b>-t</b> and <b>-tm</b>, but in addition, at the end of a run,
|
||||
the total times for all compiles, studies, and matches are output.
|
||||
</P>
|
||||
<br><a name="SEC5" href="#TOC1">DESCRIPTION</a><br>
|
||||
<P>
|
||||
If <b>pcretest</b> is given two filename arguments, it reads from the first and
|
||||
writes to the second. If it is given only one filename argument, it reads from
|
||||
@ -271,7 +300,7 @@ option states whether or not <b>readline()</b> will be used.
|
||||
<P>
|
||||
The program handles any number of sets of input on a single input file. Each
|
||||
set starts with a regular expression, and continues with any number of data
|
||||
lines to be matched against the pattern.
|
||||
lines to be matched against that pattern.
|
||||
</P>
|
||||
<P>
|
||||
Each data line is matched separately and independently. If you want to do
|
||||
@ -310,7 +339,7 @@ backslash, because
|
||||
is interpreted as the first line of a pattern that starts with "abc/", causing
|
||||
pcretest to read the next line as a continuation of the regular expression.
|
||||
</P>
|
||||
<br><a name="SEC5" href="#TOC1">PATTERN MODIFIERS</a><br>
|
||||
<br><a name="SEC6" href="#TOC1">PATTERN MODIFIERS</a><br>
|
||||
<P>
|
||||
A pattern may be followed by any number of modifiers, which are mostly single
|
||||
characters, though some of these can be qualified by further characters.
|
||||
@ -323,6 +352,7 @@ fall into several groups that are described in detail in the following
|
||||
sections.
|
||||
<pre>
|
||||
<b>/8</b> set UTF mode
|
||||
<b>/9</b> set PCRE_NEVER_UTF (locks out UTF mode)
|
||||
<b>/?</b> disable UTF validity check
|
||||
<b>/+</b> show remainder of subject after match
|
||||
<b>/=</b> show all captures (not just those that are set)
|
||||
@ -344,7 +374,9 @@ sections.
|
||||
<b>/M</b> show compiled memory size
|
||||
<b>/m</b> set PCRE_MULTILINE
|
||||
<b>/N</b> set PCRE_NO_AUTO_CAPTURE
|
||||
<b>/O</b> set PCRE_NO_AUTO_POSSESS
|
||||
<b>/P</b> use the POSIX wrapper
|
||||
<b>/Q</b> test external stack check function
|
||||
<b>/S</b> study the pattern after compilation
|
||||
<b>/s</b> set PCRE_DOTALL
|
||||
<b>/T</b> select character tables
|
||||
@ -395,12 +427,14 @@ options that do not correspond to anything in Perl:
|
||||
<b>/8</b> PCRE_UTF32 ) when using the 32-bit
|
||||
<b>/?</b> PCRE_NO_UTF32_CHECK ) library
|
||||
|
||||
<b>/9</b> PCRE_NEVER_UTF
|
||||
<b>/A</b> PCRE_ANCHORED
|
||||
<b>/C</b> PCRE_AUTO_CALLOUT
|
||||
<b>/E</b> PCRE_DOLLAR_ENDONLY
|
||||
<b>/f</b> PCRE_FIRSTLINE
|
||||
<b>/J</b> PCRE_DUPNAMES
|
||||
<b>/N</b> PCRE_NO_AUTO_CAPTURE
|
||||
<b>/O</b> PCRE_NO_AUTO_POSSESS
|
||||
<b>/U</b> PCRE_UNGREEDY
|
||||
<b>/W</b> PCRE_UCP
|
||||
<b>/X</b> PCRE_EXTRA
|
||||
@ -504,7 +538,10 @@ below.
|
||||
The <b>/I</b> modifier requests that <b>pcretest</b> output information about the
|
||||
compiled pattern (whether it is anchored, has a fixed first character, and
|
||||
so on). It does this by calling <b>pcre[16|32]_fullinfo()</b> after compiling a
|
||||
pattern. If the pattern is studied, the results of that are also output.
|
||||
pattern. If the pattern is studied, the results of that are also output. In
|
||||
this output, the word "char" means a non-UTF character, that is, the value of a
|
||||
single data item (8-bit, 16-bit, or 32-bit, depending on the library that is
|
||||
being tested).
|
||||
</P>
|
||||
<P>
|
||||
The <b>/K</b> modifier requests <b>pcretest</b> to show names from backtracking
|
||||
@ -538,14 +575,22 @@ successfully studied with the PCRE_STUDY_JIT_COMPILE option, the size of the
|
||||
JIT compiled code is also output.
|
||||
</P>
|
||||
<P>
|
||||
The <b>/Q</b> modifier is used to test the use of <b>pcre_stack_guard</b>. It
|
||||
must be followed by '0' or '1', specifying the return code to be given from an
|
||||
external function that is passed to PCRE and used for stack checking during
|
||||
compilation (see the
|
||||
<a href="pcreapi.html"><b>pcreapi</b></a>
|
||||
documentation for details).
|
||||
</P>
|
||||
<P>
|
||||
The <b>/S</b> modifier causes <b>pcre[16|32]_study()</b> to be called after the
|
||||
expression has been compiled, and the results used when the expression is
|
||||
matched. There are a number of qualifying characters that may follow <b>/S</b>.
|
||||
They may appear in any order.
|
||||
</P>
|
||||
<P>
|
||||
If <b>S</b> is followed by an exclamation mark, <b>pcre[16|32]_study()</b> is called
|
||||
with the PCRE_STUDY_EXTRA_NEEDED option, causing it always to return a
|
||||
If <b>/S</b> is followed by an exclamation mark, <b>pcre[16|32]_study()</b> is
|
||||
called with the PCRE_STUDY_EXTRA_NEEDED option, causing it always to return a
|
||||
<b>pcre_extra</b> block, even when studying discovers no useful information.
|
||||
</P>
|
||||
<P>
|
||||
@ -624,7 +669,38 @@ function:
|
||||
The <b>/+</b> modifier works as described above. All other modifiers are
|
||||
ignored.
|
||||
</P>
|
||||
<br><a name="SEC6" href="#TOC1">DATA LINES</a><br>
|
||||
<br><b>
|
||||
Locking out certain modifiers
|
||||
</b><br>
|
||||
<P>
|
||||
PCRE can be compiled with or without support for certain features such as
|
||||
UTF-8/16/32 or Unicode properties. Accordingly, the standard tests are split up
|
||||
into a number of different files that are selected for running depending on
|
||||
which features are available. When updating the tests, it is all too easy to
|
||||
put a new test into the wrong file by mistake; for example, to put a test that
|
||||
requires UTF support into a file that is used when it is not available. To help
|
||||
detect such mistakes as early as possible, there is a facility for locking out
|
||||
specific modifiers. If an input line for <b>pcretest</b> starts with the string
|
||||
"< forbid " the following sequence of characters is taken as a list of
|
||||
forbidden modifiers. For example, in the test files that must not use UTF or
|
||||
Unicode property support, this line appears:
|
||||
<pre>
|
||||
< forbid 8W
|
||||
</pre>
|
||||
This locks out the /8 and /W modifiers. An immediate error is given if they are
|
||||
subsequently encountered. If the character string contains < but not >, all the
|
||||
multi-character modifiers that begin with < are locked out. Otherwise, such
|
||||
modifiers must be explicitly listed, for example:
|
||||
<pre>
|
||||
< forbid <JS><cr>
|
||||
</pre>
|
||||
There must be a single space between < and "forbid" for this feature to be
|
||||
recognised. If there is not, the line is interpreted either as a request to
|
||||
re-load a pre-compiled pattern (see "SAVING AND RELOADING COMPILED PATTERNS"
|
||||
below) or, if there is another < character, as a pattern that uses < as its
|
||||
delimiter.
|
||||
</P>
|
||||
<br><a name="SEC7" href="#TOC1">DATA LINES</a><br>
|
||||
<P>
|
||||
Before each data line is passed to <b>pcre[16|32]_exec()</b>, leading and trailing
|
||||
white space is removed, and it is then scanned for \ escapes. Some of these
|
||||
@ -644,6 +720,7 @@ recognized:
|
||||
\v vertical tab (\x0b)
|
||||
\nnn octal character (up to 3 octal digits); always
|
||||
a byte unless > 255 in UTF-8 or 16-bit or 32-bit mode
|
||||
\o{dd...} octal character (any number of octal digits)
|
||||
\xhh hexadecimal byte (up to 2 hex digits)
|
||||
\x{hh...} hexadecimal character (any number of hex digits)
|
||||
\A pass the PCRE_ANCHORED option to <b>pcre[16|32]_exec()</b> or <b>pcre[16|32]_dfa_exec()</b>
|
||||
@ -748,7 +825,7 @@ API to be used, the only option-setting sequences that have any effect are \B,
|
||||
\N, and \Z, causing REG_NOTBOL, REG_NOTEMPTY, and REG_NOTEOL, respectively,
|
||||
to be passed to <b>regexec()</b>.
|
||||
</P>
|
||||
<br><a name="SEC7" href="#TOC1">THE ALTERNATIVE MATCHING FUNCTION</a><br>
|
||||
<br><a name="SEC8" href="#TOC1">THE ALTERNATIVE MATCHING FUNCTION</a><br>
|
||||
<P>
|
||||
By default, <b>pcretest</b> uses the standard PCRE matching function,
|
||||
<b>pcre[16|32]_exec()</b> to match each data line. PCRE also supports an
|
||||
@ -765,7 +842,7 @@ This function finds all possible matches at a given point. If, however, the \F
|
||||
escape sequence is present in the data line, it stops after the first match is
|
||||
found. This is always the shortest possible match.
|
||||
</P>
|
||||
<br><a name="SEC8" href="#TOC1">DEFAULT OUTPUT FROM PCRETEST</a><br>
|
||||
<br><a name="SEC9" href="#TOC1">DEFAULT OUTPUT FROM PCRETEST</a><br>
|
||||
<P>
|
||||
This section describes the output when the normal matching function,
|
||||
<b>pcre[16|32]_exec()</b>, is being used.
|
||||
@ -856,7 +933,7 @@ prompt is used for continuations), data lines may not. However newlines can be
|
||||
included in data by means of the \n escape (or \r, \r\n, etc., depending on
|
||||
the newline sequence setting).
|
||||
</P>
|
||||
<br><a name="SEC9" href="#TOC1">OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION</a><br>
|
||||
<br><a name="SEC10" href="#TOC1">OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION</a><br>
|
||||
<P>
|
||||
When the alternative matching function, <b>pcre[16|32]_dfa_exec()</b>, is used (by
|
||||
means of the \D escape sequence or the <b>-dfa</b> command line option), the
|
||||
@ -892,7 +969,7 @@ at the end of the longest match. For example:
|
||||
Since the matching function does not support substring capture, the escape
|
||||
sequences that are concerned with captured substrings are not relevant.
|
||||
</P>
|
||||
<br><a name="SEC10" href="#TOC1">RESTARTING AFTER A PARTIAL MATCH</a><br>
|
||||
<br><a name="SEC11" href="#TOC1">RESTARTING AFTER A PARTIAL MATCH</a><br>
|
||||
<P>
|
||||
When the alternative matching function has given the PCRE_ERROR_PARTIAL return,
|
||||
indicating that the subject partially matched the pattern, you can restart the
|
||||
@ -909,7 +986,7 @@ For further information about partial matching, see the
|
||||
<a href="pcrepartial.html"><b>pcrepartial</b></a>
|
||||
documentation.
|
||||
</P>
|
||||
<br><a name="SEC11" href="#TOC1">CALLOUTS</a><br>
|
||||
<br><a name="SEC12" href="#TOC1">CALLOUTS</a><br>
|
||||
<P>
|
||||
If the pattern contains any callout requests, <b>pcretest</b>'s callout function
|
||||
is called during matching. This works with both matching functions. By default,
|
||||
@ -970,7 +1047,7 @@ the
|
||||
<a href="pcrecallout.html"><b>pcrecallout</b></a>
|
||||
documentation.
|
||||
</P>
|
||||
<br><a name="SEC12" href="#TOC1">NON-PRINTING CHARACTERS</a><br>
|
||||
<br><a name="SEC13" href="#TOC1">NON-PRINTING CHARACTERS</a><br>
|
||||
<P>
|
||||
When <b>pcretest</b> is outputting text in the compiled version of a pattern,
|
||||
bytes other than 32-126 are always treated as non-printing characters and are
|
||||
@ -982,7 +1059,7 @@ string, it behaves in the same way, unless a different locale has been set for
|
||||
the pattern (using the <b>/L</b> modifier). In this case, the <b>isprint()</b>
|
||||
function is used to distinguish printing and non-printing characters.
|
||||
</P>
|
||||
<br><a name="SEC13" href="#TOC1">SAVING AND RELOADING COMPILED PATTERNS</a><br>
|
||||
<br><a name="SEC14" href="#TOC1">SAVING AND RELOADING COMPILED PATTERNS</a><br>
|
||||
<P>
|
||||
The facilities described in this section are not available when the POSIX
|
||||
interface to PCRE is being used, that is, when the <b>/P</b> pattern modifier is
|
||||
@ -1013,10 +1090,9 @@ writing the file, <b>pcretest</b> expects to read a new pattern.
|
||||
</P>
|
||||
<P>
|
||||
A saved pattern can be reloaded into <b>pcretest</b> by specifying < and a file
|
||||
name instead of a pattern. The name of the file must not contain a < character,
|
||||
as otherwise <b>pcretest</b> will interpret the line as a pattern delimited by <
|
||||
characters.
|
||||
For example:
|
||||
name instead of a pattern. There must be no space between < and the file name,
|
||||
which must not contain a < character, as otherwise <b>pcretest</b> will
|
||||
interpret the line as a pattern delimited by < characters. For example:
|
||||
<pre>
|
||||
re> </some/file
|
||||
Compiled pattern loaded from /some/file
|
||||
@ -1055,14 +1131,14 @@ string using a reloaded pattern is likely to cause <b>pcretest</b> to crash.
|
||||
Finally, if you attempt to load a file that is not in the correct format, the
|
||||
result is undefined.
|
||||
</P>
|
||||
<br><a name="SEC14" href="#TOC1">SEE ALSO</a><br>
|
||||
<br><a name="SEC15" href="#TOC1">SEE ALSO</a><br>
|
||||
<P>
|
||||
<b>pcre</b>(3), <b>pcre16</b>(3), <b>pcre32</b>(3), <b>pcreapi</b>(3),
|
||||
<b>pcrecallout</b>(3),
|
||||
<b>pcrejit</b>(3), <b>pcrematching</b>(3), <b>pcrepartial</b>(3),
|
||||
<b>pcrepattern</b>(3), <b>pcreprecompile</b>(3).
|
||||
</P>
|
||||
<br><a name="SEC15" href="#TOC1">AUTHOR</a><br>
|
||||
<br><a name="SEC16" href="#TOC1">AUTHOR</a><br>
|
||||
<P>
|
||||
Philip Hazel
|
||||
<br>
|
||||
@ -1071,11 +1147,11 @@ University Computing Service
|
||||
Cambridge CB2 3QH, England.
|
||||
<br>
|
||||
</P>
|
||||
<br><a name="SEC16" href="#TOC1">REVISION</a><br>
|
||||
<br><a name="SEC17" href="#TOC1">REVISION</a><br>
|
||||
<P>
|
||||
Last updated: 10 September 2012
|
||||
Last updated: 09 February 2014
|
||||
<br>
|
||||
Copyright © 1997-2012 University of Cambridge.
|
||||
Copyright © 1997-2014 University of Cambridge.
|
||||
<br>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
|
@ -85,7 +85,9 @@ place. From release 7.3 of PCRE, the check is according to the rules of RFC 3629,
|
||||
which are themselves derived from the Unicode specification. Earlier releases
|
||||
of PCRE followed the rules of RFC 2279, which allows the full range of 31-bit
|
||||
values (0 to 0x7FFFFFFF). The current check allows only values in the range U+0
|
||||
to U+10FFFF, excluding the surrogate area and the non-characters.
|
||||
to U+10FFFF, excluding the surrogate area. (From release 8.33 the so-called
|
||||
"non-character" code points are no longer excluded because Unicode corrigendum
|
||||
#9 makes it clear that they should not be.)
|
||||
</P>
|
||||
<P>
|
||||
Characters in the "Surrogate Area" of Unicode are reserved for use by UTF-16,
|
||||
@ -96,10 +98,6 @@ surrogate thing is a fudge for UTF-16 which unfortunately messes up UTF-8 and
|
||||
UTF-32.)
|
||||
</P>
|
||||
<P>
|
||||
Also excluded are the "Non-Character" code points, which are U+FDD0 to U+FDEF
|
||||
and the last two code points in each plane, U+??FFFE and U+??FFFF.
|
||||
</P>
|
||||
<P>
|
||||
If an invalid UTF-8 string is passed to PCRE, an error return is given. At
|
||||
compile time, the only additional information is the offset to the first byte
|
||||
of the failing character. The run-time functions <b>pcre_exec()</b> and
|
||||
@ -135,10 +133,6 @@ U+D800 to U+DFFF are independent code points. Values in the surrogate range
|
||||
must be used in pairs in the correct manner.
|
||||
</P>
|
||||
<P>
|
||||
Excluded are the "Non-Character" code points, which are U+FDD0 to U+FDEF
|
||||
and the last two code points in each plane, U+??FFFE and U+??FFFF.
|
||||
</P>
|
||||
<P>
|
||||
If an invalid UTF-16 string is passed to PCRE, an error return is given. At
|
||||
compile time, the only additional information is the offset to the first data
|
||||
unit of the failing character. The run-time functions <b>pcre16_exec()</b> and
|
||||
@ -160,9 +154,7 @@ Validity of UTF-32 strings
|
||||
When you set the PCRE_UTF32 flag, the strings of 32-bit data units that are
|
||||
passed as patterns and subjects are (by default) checked for validity on entry
|
||||
to the relevant functions. This check allows only values in the range U+0
|
||||
to U+10FFFF, excluding the surrogate area U+D800 to U+DFFF, and the
|
||||
"Non-Character" code points, which are U+FDD0 to U+FDEF and the last two
|
||||
characters in each plane, U+??FFFE and U+??FFFF.
|
||||
to U+10FFFF, excluding the surrogate area U+D800 to U+DFFF.
|
||||
</P>
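The range rule stated above (U+0 to U+10FFFF, excluding the surrogate area U+D800 to U+DFFF) can be sketched as a small C check. This is an assumption-labelled illustration of the rule only, not PCRE's internal validity code; the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: non-zero if c is acceptable under the rule
   "U+0 to U+10FFFF, excluding the surrogate area U+D800 to U+DFFF".
   Non-character code points such as U+FFFE are accepted. */
int is_valid_code_point(uint32_t c)
{
    return c <= 0x10FFFFu && (c < 0xD800u || c > 0xDFFFu);
}

/* Hypothetical helper: returns the index of the first invalid 32-bit
   data unit in s[0..len-1], or -1 if every unit is valid. */
long first_invalid_utf32(const uint32_t *s, size_t len)
{
    size_t i;
    for (i = 0; i < len; i++) {
        if (!is_valid_code_point(s[i]))
            return (long)i;
    }
    return -1;
}
```

Returning the index of the failing data unit mirrors the kind of offset information that the documentation says is reported for invalid strings.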
|
||||
<P>
|
||||
If an invalid UTF-32 string is passed to PCRE, an error return is given. At
|
||||
@ -261,9 +253,9 @@ Cambridge CB2 3QH, England.
|
||||
REVISION
|
||||
</b><br>
|
||||
<P>
|
||||
Last updated: 11 November 2012
|
||||
Last updated: 27 February 2013
|
||||
<br>
|
||||
Copyright © 1997-2012 University of Cambridge.
|
||||
Copyright © 1997-2013 University of Cambridge.
|
||||
<br>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
|
@ -11,27 +11,29 @@
|
||||
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||
<h1>Perl-compatible Regular Expressions (PCRE)</h1>
|
||||
<p>
|
||||
The HTML documentation for PCRE comprises the following pages:
|
||||
The HTML documentation for PCRE consists of a number of pages that are listed
|
||||
below in alphabetical order. If you are new to PCRE, please read the first one
|
||||
first.
|
||||
</p>
|
||||
|
||||
<table>
|
||||
<tr><td><a href="pcre.html">pcre</a></td>
|
||||
<td> Introductory page</td></tr>
|
||||
|
||||
<tr><td><a href="pcre-config.html">pcre-config</a></td>
|
||||
<td> Information about the installation configuration</td></tr>
|
||||
|
||||
<tr><td><a href="pcre16.html">pcre16</a></td>
|
||||
<td> Discussion of the 16-bit PCRE library</td></tr>
|
||||
|
||||
<tr><td><a href="pcre32.html">pcre32</a></td>
|
||||
<td> Discussion of the 32-bit PCRE library</td></tr>
|
||||
|
||||
<tr><td><a href="pcre-config.html">pcre-config</a></td>
|
||||
<td> Information about the installation configuration</td></tr>
|
||||
|
||||
<tr><td><a href="pcreapi.html">pcreapi</a></td>
|
||||
<td> PCRE's native API</td></tr>
|
||||
|
||||
<tr><td><a href="pcrebuild.html">pcrebuild</a></td>
|
||||
<td> Options for building PCRE</td></tr>
|
||||
<td> Building PCRE</td></tr>
|
||||
|
||||
<tr><td><a href="pcrecallout.html">pcrecallout</a></td>
|
||||
<td> The <i>callout</i> facility</td></tr>
|
||||
@ -67,7 +69,7 @@ The HTML documentation for PCRE comprises the following pages:
|
||||
<td> Some comments on performance</td></tr>
|
||||
|
||||
<tr><td><a href="pcreposix.html">pcreposix</a></td>
|
||||
<td> The POSIX API to the PCRE library</td></tr>
|
||||
<td> The POSIX API to the PCRE 8-bit library</td></tr>
|
||||
|
||||
<tr><td><a href="pcreprecompile.html">pcreprecompile</a></td>
|
||||
<td> How to save and re-use compiled patterns</td></tr>
|
||||
@ -118,13 +120,13 @@ functions.
|
||||
<td> Match a compiled pattern to a subject string
|
||||
(DFA algorithm; <i>not</i> Perl compatible)</td></tr>
|
||||
|
||||
<tr><td><a href="pcre_free_study.html">pcre_free_study</a></td>
|
||||
<td> Free study data</td></tr>
|
||||
|
||||
<tr><td><a href="pcre_exec.html">pcre_exec</a></td>
|
||||
<td> Match a compiled pattern to a subject string
|
||||
(Perl compatible)</td></tr>
|
||||
|
||||
<tr><td><a href="pcre_free_study.html">pcre_free_study</a></td>
|
||||
<td> Free study data</td></tr>
|
||||
|
||||
<tr><td><a href="pcre_free_substring.html">pcre_free_substring</a></td>
|
||||
<td> Free extracted substring</td></tr>
|
||||
|
||||
@ -140,14 +142,17 @@ functions.
|
||||
<tr><td><a href="pcre_get_stringnumber.html">pcre_get_stringnumber</a></td>
|
||||
<td> Convert captured string name to number</td></tr>
|
||||
|
||||
<tr><td><a href="pcre_get_stringtable_entries.html">pcre_get_stringtable_entries</a></td>
|
||||
<td> Find table entries for given string name</td></tr>
|
||||
|
||||
<tr><td><a href="pcre_get_substring.html">pcre_get_substring</a></td>
|
||||
<td> Extract numbered substring into new memory</td></tr>
|
||||
|
||||
<tr><td><a href="pcre_get_substring_list.html">pcre_get_substring_list</a></td>
|
||||
<td> Extract all substrings into new memory</td></tr>
|
||||
|
||||
<tr><td><a href="pcre_info.html">pcre_info</a></td>
|
||||
<td> Obsolete information extraction function</td></tr>
|
||||
<tr><td><a href="pcre_jit_exec.html">pcre_jit_exec</a></td>
|
||||
<td> Fast path interface to JIT matching</td></tr>
|
||||
|
||||
<tr><td><a href="pcre_jit_stack_alloc.html">pcre_jit_stack_alloc</a></td>
|
||||
<td> Create a stack for JIT matching</td></tr>
|
||||
|
@ -4,11 +4,11 @@ pcre-config - program to return PCRE configuration
|
||||
.SH SYNOPSIS
|
||||
.rs
|
||||
.sp
|
||||
.nf
|
||||
.B pcre-config [--prefix] [--exec-prefix] [--version] [--libs]
|
||||
.ti +5n
|
||||
.B [--libs16] [--libs32] [--libs-cpp] [--libs-posix]
|
||||
.ti +5n
|
||||
.B [--cflags] [--cflags-posix]
|
||||
.B " [--libs16] [--libs32] [--libs-cpp] [--libs-posix]"
|
||||
.B " [--cflags] [--cflags-posix]"
|
||||
.fi
|
||||
.
|
||||
.
|
||||
.SH DESCRIPTION
|
||||
|
@ -1,4 +1,4 @@
|
||||
PCRE-CONFIG(1) PCRE-CONFIG(1)
|
||||
PCRE-CONFIG(1) General Commands Manual PCRE-CONFIG(1)
|
||||
|
||||
|
||||
|
||||
@ -8,8 +8,8 @@ NAME
|
||||
SYNOPSIS
|
||||
|
||||
pcre-config [--prefix] [--exec-prefix] [--version] [--libs]
|
||||
[--libs16] [--libs32] [--libs-cpp] [--libs-posix]
|
||||
[--cflags] [--cflags-posix]
|
||||
[--libs16] [--libs32] [--libs-cpp] [--libs-posix]
|
||||
[--cflags] [--cflags-posix]
|
||||
|
||||
|
||||
DESCRIPTION
|
||||
|
@ -1,4 +1,4 @@
|
||||
.TH PCRE 3 "11 November 2012" "PCRE 8.32"
|
||||
.TH PCRE 3 "08 January 2014" "PCRE 8.35"
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH INTRODUCTION
|
||||
@ -19,9 +19,9 @@ built. The majority of the work to make this possible was done by Zoltan
|
||||
Herczeg.
|
||||
.P
|
||||
Starting with release 8.32 it is possible to compile a third separate PCRE
|
||||
library, which supports 32-bit character strings (including
|
||||
UTF-32 strings). The build process allows any set of the 8-, 16- and 32-bit
|
||||
libraries. The work to make this possible was done by Christian Persch.
|
||||
library that supports 32-bit character strings (including UTF-32 strings). The
|
||||
build process allows any combination of the 8-, 16- and 32-bit libraries. The
|
||||
work to make this possible was done by Christian Persch.
|
||||
.P
|
||||
The three libraries contain identical sets of functions, except that the names
|
||||
in the 16-bit library start with \fBpcre16_\fP instead of \fBpcre_\fP, and the
|
||||
@ -44,7 +44,7 @@ The current implementation of PCRE corresponds approximately with Perl 5.12,
|
||||
including support for UTF-8/16/32 encoded strings and Unicode general category
|
||||
properties. However, UTF-8/16/32 and Unicode support has to be explicitly
|
||||
enabled; it is not the default. The Unicode tables correspond to Unicode
|
||||
release 6.2.0.
|
||||
release 6.3.0.
|
||||
.P
|
||||
In addition to the Perl-compatible matching function, PCRE contains an
|
||||
alternative function that matches the same compiled patterns in a different
|
||||
@ -68,6 +68,7 @@ in the \fIContrib\fP directory at the primary FTP site, which is:
|
||||
.\" HTML <a href="ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre">
|
||||
.\" </a>
|
||||
ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre
|
||||
.\"
|
||||
.P
|
||||
Details of exactly which Perl regular expression features are and are not
|
||||
supported by PCRE are given in separate documents. See the
|
||||
@ -95,8 +96,17 @@ available. The features themselves are described in the
|
||||
\fBpcrebuild\fP
|
||||
.\"
|
||||
page. Documentation about building PCRE for various operating systems can be
|
||||
found in the \fBREADME\fP and \fBNON-AUTOTOOLS_BUILD\fP files in the source
|
||||
distribution.
|
||||
found in the
|
||||
.\" HTML <a href="README.txt">
|
||||
.\" </a>
|
||||
\fBREADME\fP
|
||||
.\"
|
||||
and
|
||||
.\" HTML <a href="NON-AUTOTOOLS-BUILD.txt">
|
||||
.\" </a>
|
||||
\fBNON-AUTOTOOLS-BUILD\fP
|
||||
.\"
|
||||
files in the source distribution.
|
||||
.P
|
||||
The libraries contain a number of undocumented internal functions and data
|
||||
tables that are used by more than one of the exported external functions, but
|
||||
@ -121,8 +131,11 @@ checked for UTF-8 validity. If the data string is very long, such a check might
|
||||
use sufficiently many resources as to cause your application to lose
|
||||
performance.
|
||||
.P
|
||||
The best way of guarding against this possibility is to use the
|
||||
One way of guarding against this possibility is to use the
|
||||
\fBpcre_fullinfo()\fP function to check the compiled pattern's options for UTF.
|
||||
Alternatively, from release 8.33, you can set the PCRE_NEVER_UTF option at
|
||||
compile time. This causes a compile time error if a pattern contains a
|
||||
UTF-setting sequence.
|
||||
.P
|
||||
If your application is one that supports UTF, be aware that validity checking
|
||||
can take time. If the same data string is to be matched many times, you can use
|
||||
@ -145,15 +158,18 @@ page.
|
||||
The user documentation for PCRE comprises a number of different sections. In
|
||||
the "man" format, each of these is a separate "man page". In the HTML format,
|
||||
each is a separate page, linked from the index page. In the plain text format,
|
||||
all the sections, except the \fBpcredemo\fP section, are concatenated, for ease
|
||||
of searching. The sections are as follows:
|
||||
the descriptions of the \fBpcregrep\fP and \fBpcretest\fP programs are in files
|
||||
called \fBpcregrep.txt\fP and \fBpcretest.txt\fP, respectively. The remaining
|
||||
sections, except for the \fBpcredemo\fP section (which is a program listing),
|
||||
are concatenated in \fBpcre.txt\fP, for ease of searching. The sections are as
|
||||
follows:
|
||||
.sp
|
||||
pcre this document
|
||||
pcre-config show PCRE installation configuration information
|
||||
pcre16 details of the 16-bit library
|
||||
pcre32 details of the 32-bit library
|
||||
pcre-config show PCRE installation configuration information
|
||||
pcreapi details of PCRE's native C API
|
||||
pcrebuild options for building PCRE
|
||||
pcrebuild building PCRE
|
||||
pcrecallout details of the callout feature
|
||||
pcrecompat discussion of Perl compatibility
|
||||
pcrecpp details of the C++ wrapper for the 8-bit library
|
||||
@ -175,8 +191,8 @@ of searching. The sections are as follows:
|
||||
pcretest description of the \fBpcretest\fP testing command
|
||||
pcreunicode discussion of Unicode and UTF-8/16/32 support
|
||||
.sp
|
||||
In addition, in the "man" and HTML formats, there is a short page for each
|
||||
C library function, listing its arguments and results.
|
||||
In the "man" and HTML formats, there is also a short page for each C library
|
||||
function, listing its arguments and results.
|
||||
.
|
||||
.
|
||||
.SH AUTHOR
|
||||
@ -197,6 +213,6 @@ two digits 10, at the domain cam.ac.uk.
|
||||
.rs
|
||||
.sp
|
||||
.nf
|
||||
Last updated: 11 November 2012
|
||||
Copyright (c) 1997-2012 University of Cambridge.
|
||||
Last updated: 08 January 2014
|
||||
Copyright (c) 1997-2014 University of Cambridge.
|
||||
.fi
|
||||
|
@ -1,4 +1,4 @@
|
||||
.TH PCRE 3 "08 November 2012" "PCRE 8.32"
|
||||
.TH PCRE 3 "12 May 2013" "PCRE 8.33"
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.sp
|
||||
@ -8,140 +8,120 @@ PCRE - Perl-compatible regular expressions
|
||||
.SH "PCRE 16-BIT API BASIC FUNCTIONS"
|
||||
.rs
|
||||
.sp
|
||||
.SM
|
||||
.nf
|
||||
.B pcre16 *pcre16_compile(PCRE_SPTR16 \fIpattern\fP, int \fIoptions\fP,
|
||||
.ti +5n
|
||||
.B const char **\fIerrptr\fP, int *\fIerroffset\fP,
|
||||
.ti +5n
|
||||
.B const unsigned char *\fItableptr\fP);
|
||||
.PP
|
||||
.B " const char **\fIerrptr\fP, int *\fIerroffset\fP,"
|
||||
.B " const unsigned char *\fItableptr\fP);"
|
||||
.sp
|
||||
.B pcre16 *pcre16_compile2(PCRE_SPTR16 \fIpattern\fP, int \fIoptions\fP,
|
||||
.ti +5n
|
||||
.B int *\fIerrorcodeptr\fP,
|
||||
.ti +5n
|
||||
.B const char **\fIerrptr\fP, int *\fIerroffset\fP,
|
||||
.ti +5n
|
||||
.B const unsigned char *\fItableptr\fP);
|
||||
.PP
|
||||
.B " int *\fIerrorcodeptr\fP,"
|
||||
.B " const char **\fIerrptr\fP, int *\fIerroffset\fP,"
|
||||
.B " const unsigned char *\fItableptr\fP);"
|
||||
.sp
|
||||
.B pcre16_extra *pcre16_study(const pcre16 *\fIcode\fP, int \fIoptions\fP,
|
||||
.ti +5n
|
||||
.B const char **\fIerrptr\fP);
|
||||
.PP
|
||||
.B " const char **\fIerrptr\fP);"
|
||||
.sp
|
||||
.B void pcre16_free_study(pcre16_extra *\fIextra\fP);
|
||||
.PP
|
||||
.sp
|
||||
.B int pcre16_exec(const pcre16 *\fIcode\fP, "const pcre16_extra *\fIextra\fP,"
|
||||
.ti +5n
|
||||
.B "PCRE_SPTR16 \fIsubject\fP," int \fIlength\fP, int \fIstartoffset\fP,
|
||||
.ti +5n
|
||||
.B int \fIoptions\fP, int *\fIovector\fP, int \fIovecsize\fP);
|
||||
.PP
|
||||
.B " PCRE_SPTR16 \fIsubject\fP, int \fIlength\fP, int \fIstartoffset\fP,"
|
||||
.B " int \fIoptions\fP, int *\fIovector\fP, int \fIovecsize\fP);"
|
||||
.sp
|
||||
.B int pcre16_dfa_exec(const pcre16 *\fIcode\fP, "const pcre16_extra *\fIextra\fP,"
|
||||
.ti +5n
|
||||
.B "PCRE_SPTR16 \fIsubject\fP," int \fIlength\fP, int \fIstartoffset\fP,
|
||||
.ti +5n
|
||||
.B int \fIoptions\fP, int *\fIovector\fP, int \fIovecsize\fP,
|
||||
.ti +5n
|
||||
.B int *\fIworkspace\fP, int \fIwscount\fP);
|
||||
.B " PCRE_SPTR16 \fIsubject\fP, int \fIlength\fP, int \fIstartoffset\fP,"
|
||||
.B " int \fIoptions\fP, int *\fIovector\fP, int \fIovecsize\fP,"
|
||||
.B " int *\fIworkspace\fP, int \fIwscount\fP);"
|
||||
.fi
|
||||
.
|
||||
.
|
||||
.SH "PCRE 16-BIT API STRING EXTRACTION FUNCTIONS"
|
||||
.rs
|
||||
.sp
|
||||
.nf
|
||||
.B int pcre16_copy_named_substring(const pcre16 *\fIcode\fP,
|
||||
.ti +5n
|
||||
.B PCRE_SPTR16 \fIsubject\fP, int *\fIovector\fP,
|
||||
.ti +5n
|
||||
.B int \fIstringcount\fP, PCRE_SPTR16 \fIstringname\fP,
|
||||
.ti +5n
|
||||
.B PCRE_UCHAR16 *\fIbuffer\fP, int \fIbuffersize\fP);
|
||||
.PP
|
||||
.B " PCRE_SPTR16 \fIsubject\fP, int *\fIovector\fP,"
|
||||
.B " int \fIstringcount\fP, PCRE_SPTR16 \fIstringname\fP,"
|
||||
.B " PCRE_UCHAR16 *\fIbuffer\fP, int \fIbuffersize\fP);"
|
||||
.sp
|
||||
.B int pcre16_copy_substring(PCRE_SPTR16 \fIsubject\fP, int *\fIovector\fP,
|
||||
.ti +5n
|
||||
.B int \fIstringcount\fP, int \fIstringnumber\fP, PCRE_UCHAR16 *\fIbuffer\fP,
|
||||
.ti +5n
|
||||
.B int \fIbuffersize\fP);
|
||||
.PP
|
||||
.B " int \fIstringcount\fP, int \fIstringnumber\fP, PCRE_UCHAR16 *\fIbuffer\fP,"
|
||||
.B " int \fIbuffersize\fP);"
|
||||
.sp
|
||||
.B int pcre16_get_named_substring(const pcre16 *\fIcode\fP,
|
||||
.ti +5n
|
||||
.B PCRE_SPTR16 \fIsubject\fP, int *\fIovector\fP,
|
||||
.ti +5n
|
||||
.B int \fIstringcount\fP, PCRE_SPTR16 \fIstringname\fP,
|
||||
.ti +5n
|
||||
.B PCRE_SPTR16 *\fIstringptr\fP);
|
||||
.PP
|
||||
.B " PCRE_SPTR16 \fIsubject\fP, int *\fIovector\fP,"
|
||||
.B " int \fIstringcount\fP, PCRE_SPTR16 \fIstringname\fP,"
|
||||
.B " PCRE_SPTR16 *\fIstringptr\fP);"
|
||||
.sp
|
||||
.B int pcre16_get_stringnumber(const pcre16 *\fIcode\fP,
|
||||
.ti +5n
|
||||
.B PCRE_SPTR16 \fIname\fP);
|
||||
.PP
|
||||
.B " PCRE_SPTR16 \fIname\fP);"
|
||||
.sp
|
||||
.B int pcre16_get_stringtable_entries(const pcre16 *\fIcode\fP,
|
||||
.ti +5n
|
||||
.B PCRE_SPTR16 \fIname\fP, PCRE_UCHAR16 **\fIfirst\fP, PCRE_UCHAR16 **\fIlast\fP);
|
||||
.PP
|
||||
.B " PCRE_SPTR16 \fIname\fP, PCRE_UCHAR16 **\fIfirst\fP, PCRE_UCHAR16 **\fIlast\fP);"
|
||||
.sp
|
||||
.B int pcre16_get_substring(PCRE_SPTR16 \fIsubject\fP, int *\fIovector\fP,
|
||||
.ti +5n
|
||||
.B int \fIstringcount\fP, int \fIstringnumber\fP,
|
||||
.ti +5n
|
||||
.B PCRE_SPTR16 *\fIstringptr\fP);
|
||||
.PP
|
||||
.B " int \fIstringcount\fP, int \fIstringnumber\fP,"
|
||||
.B " PCRE_SPTR16 *\fIstringptr\fP);"
|
||||
.sp
|
||||
.B int pcre16_get_substring_list(PCRE_SPTR16 \fIsubject\fP,
|
||||
.ti +5n
|
||||
.B int *\fIovector\fP, int \fIstringcount\fP, "PCRE_SPTR16 **\fIlistptr\fP);"
|
||||
.PP
|
||||
.B " int *\fIovector\fP, int \fIstringcount\fP, PCRE_SPTR16 **\fIlistptr\fP);"
|
||||
.sp
|
||||
.B void pcre16_free_substring(PCRE_SPTR16 \fIstringptr\fP);
|
||||
.PP
|
||||
.sp
|
||||
.B void pcre16_free_substring_list(PCRE_SPTR16 *\fIstringptr\fP);
|
||||
.fi
|
||||
.
|
||||
.
|
||||
.SH "PCRE 16-BIT API AUXILIARY FUNCTIONS"
|
||||
.rs
|
||||
.sp
|
||||
.nf
|
||||
.B pcre16_jit_stack *pcre16_jit_stack_alloc(int \fIstartsize\fP, int \fImaxsize\fP);
|
||||
.PP
|
||||
.sp
|
||||
.B void pcre16_jit_stack_free(pcre16_jit_stack *\fIstack\fP);
|
||||
.PP
|
||||
.sp
|
||||
.B void pcre16_assign_jit_stack(pcre16_extra *\fIextra\fP,
|
||||
.ti +5n
|
||||
.B pcre16_jit_callback \fIcallback\fP, void *\fIdata\fP);
|
||||
.PP
|
||||
.B " pcre16_jit_callback \fIcallback\fP, void *\fIdata\fP);"
|
||||
.sp
|
||||
.B const unsigned char *pcre16_maketables(void);
|
||||
.PP
|
||||
.sp
|
||||
.B int pcre16_fullinfo(const pcre16 *\fIcode\fP, "const pcre16_extra *\fIextra\fP,"
|
||||
.ti +5n
|
||||
.B int \fIwhat\fP, void *\fIwhere\fP);
|
||||
.PP
|
||||
.B " int \fIwhat\fP, void *\fIwhere\fP);"
|
||||
.sp
|
||||
.B int pcre16_refcount(pcre16 *\fIcode\fP, int \fIadjust\fP);
|
||||
.PP
|
||||
.sp
|
||||
.B int pcre16_config(int \fIwhat\fP, void *\fIwhere\fP);
|
||||
.PP
|
||||
.sp
|
||||
.B const char *pcre16_version(void);
|
||||
.PP
|
||||
.sp
|
||||
.B int pcre16_pattern_to_host_byte_order(pcre16 *\fIcode\fP,
|
||||
.ti +5n
|
||||
.B pcre16_extra *\fIextra\fP, const unsigned char *\fItables\fP);
|
||||
.B " pcre16_extra *\fIextra\fP, const unsigned char *\fItables\fP);"
|
||||
.fi
|
||||
.
|
||||
.
|
||||
.SH "PCRE 16-BIT API INDIRECTED FUNCTIONS"
|
||||
.rs
|
||||
.sp
|
||||
.nf
|
||||
.B void *(*pcre16_malloc)(size_t);
|
||||
.PP
|
||||
.sp
|
||||
.B void (*pcre16_free)(void *);
|
||||
.PP
|
||||
.sp
|
||||
.B void *(*pcre16_stack_malloc)(size_t);
|
||||
.PP
|
||||
.sp
|
||||
.B void (*pcre16_stack_free)(void *);
|
||||
.PP
|
||||
.sp
|
||||
.B int (*pcre16_callout)(pcre16_callout_block *);
|
||||
.fi
|
||||
.
|
||||
.
|
||||
.SH "PCRE 16-BIT API 16-BIT-ONLY FUNCTION"
|
||||
.rs
|
||||
.sp
|
||||
.nf
|
||||
.B int pcre16_utf16_to_host_byte_order(PCRE_UCHAR16 *\fIoutput\fP,
|
||||
.ti +5n
|
||||
.B PCRE_SPTR16 \fIinput\fP, int \fIlength\fP, int *\fIbyte_order\fP,
|
||||
.ti +5n
|
||||
.B int \fIkeep_boms\fP);
|
||||
.B " PCRE_SPTR16 \fIinput\fP, int \fIlength\fP, int *\fIbyte_order\fP,"
|
||||
.B " int \fIkeep_boms\fP);"
|
||||
.fi
|
||||
.
|
||||
.
|
||||
.SH "THE PCRE 16-BIT LIBRARY"
|
||||
@ -246,8 +226,9 @@ buffer, including the zero terminator if the string was zero-terminated.
|
||||
.SH "SUBJECT STRING OFFSETS"
|
||||
.rs
|
||||
.sp
|
||||
The offsets within subject strings that are returned by the matching functions
|
||||
are in 16-bit units rather than bytes.
|
||||
The lengths and starting offsets of subject strings must be specified in 16-bit
|
||||
data units, and the offsets within subject strings that are returned by the
|
||||
matching functions are also in 16-bit units rather than bytes.
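As a rough illustration of counting in 16-bit data units rather than bytes, the following C sketch measures a zero-terminated 16-bit string and converts a unit count to a byte count. The helper names are hypothetical and are not part of the PCRE API; uint16_t stands in for the library's 16-bit character type.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: length of a zero-terminated 16-bit string,
   counted in 16-bit data units (not bytes). */
size_t length_in_units16(const uint16_t *s)
{
    size_t n = 0;
    while (s[n] != 0)
        n++;
    return n;
}

/* Hypothetical helper: convert a length or offset expressed in
   16-bit data units into the corresponding number of bytes. */
size_t units16_to_bytes(size_t units)
{
    return units * sizeof(uint16_t);
}
```

A caller that keeps its own buffers in bytes would apply such a conversion to the offsets it gets back, since those are expressed in 16-bit units.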
|
||||
.
|
||||
.
|
||||
.SH "NAMED SUBPATTERNS"
|
||||
@ -385,6 +366,6 @@ Cambridge CB2 3QH, England.
|
||||
.rs
|
||||
.sp
|
||||
.nf
|
||||
Last updated: 08 November 2012
|
||||
Copyright (c) 1997-2012 University of Cambridge.
|
||||
Last updated: 12 May 2013
|
||||
Copyright (c) 1997-2013 University of Cambridge.
|
||||
.fi
|
||||
|
@ -1,4 +1,4 @@
|
||||
.TH PCRE 3 "08 November 2012" "PCRE 8.32"
|
||||
.TH PCRE 3 "12 May 2013" "PCRE 8.33"
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.sp
|
||||
@ -8,140 +8,119 @@ PCRE - Perl-compatible regular expressions
|
||||
.SH "PCRE 32-BIT API BASIC FUNCTIONS"
|
||||
.rs
|
||||
.sp
|
||||
.SM
|
||||
.nf
|
||||
.B pcre32 *pcre32_compile(PCRE_SPTR32 \fIpattern\fP, int \fIoptions\fP,
|
||||
.ti +5n
|
||||
.B const char **\fIerrptr\fP, int *\fIerroffset\fP,
|
||||
.ti +5n
|
||||
.B const unsigned char *\fItableptr\fP);
|
||||
.PP
|
||||
.B " const char **\fIerrptr\fP, int *\fIerroffset\fP,"
|
||||
.B " const unsigned char *\fItableptr\fP);"
|
||||
.sp
|
||||
.B pcre32 *pcre32_compile2(PCRE_SPTR32 \fIpattern\fP, int \fIoptions\fP,
|
||||
.ti +5n
|
||||
.B int *\fIerrorcodeptr\fP,
|
||||
.ti +5n
|
||||
.B const char **\fIerrptr\fP, int *\fIerroffset\fP,
|
||||
.ti +5n
|
||||
.B const unsigned char *\fItableptr\fP);
|
||||
.PP
|
||||
.B " int *\fIerrorcodeptr\fP,"
|
||||
.B " const char **\fIerrptr\fP, int *\fIerroffset\fP,"
|
||||
.B " const unsigned char *\fItableptr\fP);"
|
||||
.sp
|
||||
.B pcre32_extra *pcre32_study(const pcre32 *\fIcode\fP, int \fIoptions\fP,
|
||||
.ti +5n
|
||||
.B const char **\fIerrptr\fP);
|
||||
.PP
|
||||
.B " const char **\fIerrptr\fP);"
|
||||
.sp
|
||||
.B void pcre32_free_study(pcre32_extra *\fIextra\fP);
|
||||
.PP
|
||||
.sp
|
||||
.B int pcre32_exec(const pcre32 *\fIcode\fP, "const pcre32_extra *\fIextra\fP,"
|
||||
.ti +5n
|
||||
.B "PCRE_SPTR32 \fIsubject\fP," int \fIlength\fP, int \fIstartoffset\fP,
|
||||
.ti +5n
|
||||
.B int \fIoptions\fP, int *\fIovector\fP, int \fIovecsize\fP);
|
||||
.PP
|
||||
.B " PCRE_SPTR32 \fIsubject\fP, int \fIlength\fP, int \fIstartoffset\fP,"
|
||||
.B " int \fIoptions\fP, int *\fIovector\fP, int \fIovecsize\fP);"
|
||||
.sp
|
||||
.B int pcre32_dfa_exec(const pcre32 *\fIcode\fP, "const pcre32_extra *\fIextra\fP,"
|
||||
.ti +5n
|
||||
.B "PCRE_SPTR32 \fIsubject\fP," int \fIlength\fP, int \fIstartoffset\fP,
|
||||
.ti +5n
|
||||
.B int \fIoptions\fP, int *\fIovector\fP, int \fIovecsize\fP,
|
||||
.ti +5n
|
||||
.B int *\fIworkspace\fP, int \fIwscount\fP);
|
||||
.B " PCRE_SPTR32 \fIsubject\fP, int \fIlength\fP, int \fIstartoffset\fP,"
|
||||
.B " int \fIoptions\fP, int *\fIovector\fP, int \fIovecsize\fP,"
|
||||
.B " int *\fIworkspace\fP, int \fIwscount\fP);"
|
||||
.fi
|
||||
.
|
||||
.
|
||||
.SH "PCRE 32-BIT API STRING EXTRACTION FUNCTIONS"
|
||||
.rs
|
||||
.sp
|
||||
.nf
|
||||
.B int pcre32_copy_named_substring(const pcre32 *\fIcode\fP,
|
||||
.ti +5n
|
||||
.B PCRE_SPTR32 \fIsubject\fP, int *\fIovector\fP,
|
||||
.ti +5n
|
||||
.B int \fIstringcount\fP, PCRE_SPTR32 \fIstringname\fP,
|
||||
.ti +5n
|
||||
.B PCRE_UCHAR32 *\fIbuffer\fP, int \fIbuffersize\fP);
|
||||
.PP
|
||||
.B " PCRE_SPTR32 \fIsubject\fP, int *\fIovector\fP,"
|
||||
.B " int \fIstringcount\fP, PCRE_SPTR32 \fIstringname\fP,"
|
||||
.B " PCRE_UCHAR32 *\fIbuffer\fP, int \fIbuffersize\fP);"
|
||||
.sp
|
||||
.B int pcre32_copy_substring(PCRE_SPTR32 \fIsubject\fP, int *\fIovector\fP,
|
||||
.ti +5n
|
||||
.B int \fIstringcount\fP, int \fIstringnumber\fP, PCRE_UCHAR32 *\fIbuffer\fP,
|
||||
.ti +5n
|
||||
.B int \fIbuffersize\fP);
|
||||
.PP
|
||||
.B " int \fIstringcount\fP, int \fIstringnumber\fP, PCRE_UCHAR32 *\fIbuffer\fP,"
|
||||
.B " int \fIbuffersize\fP);"
|
||||
.sp
|
||||
.B int pcre32_get_named_substring(const pcre32 *\fIcode\fP,
|
||||
.ti +5n
|
||||
.B PCRE_SPTR32 \fIsubject\fP, int *\fIovector\fP,
|
||||
.ti +5n
|
||||
.B int \fIstringcount\fP, PCRE_SPTR32 \fIstringname\fP,
|
||||
.ti +5n
|
||||
.B PCRE_SPTR32 *\fIstringptr\fP);
|
||||
.PP
|
||||
.B " PCRE_SPTR32 \fIsubject\fP, int *\fIovector\fP,"
|
||||
.B " int \fIstringcount\fP, PCRE_SPTR32 \fIstringname\fP,"
|
||||
.B " PCRE_SPTR32 *\fIstringptr\fP);"
|
||||
.sp
|
||||
.B int pcre32_get_stringnumber(const pcre32 *\fIcode\fP,
|
||||
.ti +5n
|
||||
.B PCRE_SPTR32 \fIname\fP);
|
||||
.PP
|
||||
.B " PCRE_SPTR32 \fIname\fP);"
|
||||
.sp
|
||||
.B int pcre32_get_stringtable_entries(const pcre32 *\fIcode\fP,
|
||||
.ti +5n
|
||||
.B PCRE_SPTR32 \fIname\fP, PCRE_UCHAR32 **\fIfirst\fP, PCRE_UCHAR32 **\fIlast\fP);
|
||||
.PP
|
||||
.B " PCRE_SPTR32 \fIname\fP, PCRE_UCHAR32 **\fIfirst\fP, PCRE_UCHAR32 **\fIlast\fP);"
|
||||
.sp
|
||||
.B int pcre32_get_substring(PCRE_SPTR32 \fIsubject\fP, int *\fIovector\fP,
|
||||
.ti +5n
|
||||
.B int \fIstringcount\fP, int \fIstringnumber\fP,
|
||||
.ti +5n
|
||||
.B PCRE_SPTR32 *\fIstringptr\fP);
|
||||
.PP
|
||||
.B " int \fIstringcount\fP, int \fIstringnumber\fP,"
|
||||
.B " PCRE_SPTR32 *\fIstringptr\fP);"
|
||||
.sp
|
||||
.B int pcre32_get_substring_list(PCRE_SPTR32 \fIsubject\fP,
|
||||
.ti +5n
|
||||
.B int *\fIovector\fP, int \fIstringcount\fP, "PCRE_SPTR32 **\fIlistptr\fP);"
|
||||
.PP
|
||||
.B " int *\fIovector\fP, int \fIstringcount\fP, PCRE_SPTR32 **\fIlistptr\fP);"
|
||||
.sp
|
||||
.B void pcre32_free_substring(PCRE_SPTR32 \fIstringptr\fP);
|
||||
.PP
|
||||
.sp
|
||||
.B void pcre32_free_substring_list(PCRE_SPTR32 *\fIstringptr\fP);
|
||||
.fi
|
||||
.
|
||||
.
|
||||
.SH "PCRE 32-BIT API AUXILIARY FUNCTIONS"
|
||||
.rs
|
||||
.sp
|
||||
.nf
|
||||
.B pcre32_jit_stack *pcre32_jit_stack_alloc(int \fIstartsize\fP, int \fImaxsize\fP);
|
||||
.PP
|
||||
.sp
|
||||
.B void pcre32_jit_stack_free(pcre32_jit_stack *\fIstack\fP);
|
||||
.PP
|
||||
.sp
|
||||
.B void pcre32_assign_jit_stack(pcre32_extra *\fIextra\fP,
|
||||
.ti +5n
|
||||
.B pcre32_jit_callback \fIcallback\fP, void *\fIdata\fP);
|
||||
.PP
|
||||
.B " pcre32_jit_callback \fIcallback\fP, void *\fIdata\fP);"
|
||||
.sp
|
||||
.B const unsigned char *pcre32_maketables(void);
|
||||
.PP
|
||||
.sp
|
||||
.B int pcre32_fullinfo(const pcre32 *\fIcode\fP, "const pcre32_extra *\fIextra\fP,"
|
||||
.ti +5n
|
||||
.B int \fIwhat\fP, void *\fIwhere\fP);
|
||||
.PP
|
||||
.B " int \fIwhat\fP, void *\fIwhere\fP);"
|
||||
.sp
|
||||
.B int pcre32_refcount(pcre32 *\fIcode\fP, int \fIadjust\fP);
|
||||
.PP
|
||||
.sp
|
||||
.B int pcre32_config(int \fIwhat\fP, void *\fIwhere\fP);
|
||||
.PP
|
||||
.sp
|
||||
.B const char *pcre32_version(void);
|
||||
.PP
|
||||
.sp
|
||||
.B int pcre32_pattern_to_host_byte_order(pcre32 *\fIcode\fP,
|
||||
.ti +5n
|
||||
.B pcre32_extra *\fIextra\fP, const unsigned char *\fItables\fP);
|
||||
.B " pcre32_extra *\fIextra\fP, const unsigned char *\fItables\fP);"
|
||||
.fi
|
||||
.
|
||||
.
|
||||
.SH "PCRE 32-BIT API INDIRECTED FUNCTIONS"
|
||||
.rs
|
||||
.sp
|
||||
.nf
|
||||
.B void *(*pcre32_malloc)(size_t);
|
||||
.PP
|
||||
.sp
|
||||
.B void (*pcre32_free)(void *);
|
||||
.PP
|
||||
.sp
|
||||
.B void *(*pcre32_stack_malloc)(size_t);
|
||||
.PP
|
||||
.sp
|
||||
.B void (*pcre32_stack_free)(void *);
|
||||
.PP
|
||||
.sp
|
||||
.B int (*pcre32_callout)(pcre32_callout_block *);
|
||||
.fi
|
||||
.
|
||||
.
|
||||
.SH "PCRE 32-BIT API 32-BIT-ONLY FUNCTION"
|
||||
.rs
|
||||
.sp
|
||||
.nf
|
||||
.B int pcre32_utf32_to_host_byte_order(PCRE_UCHAR32 *\fIoutput\fP,
|
||||
.ti +5n
|
||||
.B PCRE_SPTR32 \fIinput\fP, int \fIlength\fP, int *\fIbyte_order\fP,
|
||||
.ti +5n
|
||||
.B int \fIkeep_boms\fP);
|
||||
.B " PCRE_SPTR32 \fIinput\fP, int \fIlength\fP, int *\fIbyte_order\fP,"
|
||||
.B " int \fIkeep_boms\fP);"
|
||||
.fi
|
||||
.
|
||||
.
|
||||
.SH "THE PCRE 32-BIT LIBRARY"
|
||||
@ -246,8 +225,9 @@ buffer, including the zero terminator if the string was zero-terminated.
|
||||
.SH "SUBJECT STRING OFFSETS"
|
||||
.rs
|
||||
.sp
|
||||
The offsets within subject strings that are returned by the matching functions
|
||||
are in 32-bit units rather than bytes.
|
||||
The lengths and starting offsets of subject strings must be specified in 32-bit
|
||||
data units, and the offsets within subject strings that are returned by the
|
||||
matching functions are also in 32-bit units rather than bytes.
|
||||
.
|
||||
.
|
||||
.SH "NAMED SUBPATTERNS"
|
||||
@ -384,6 +364,6 @@ Cambridge CB2 3QH, England.
|
||||
.rs
|
||||
.sp
|
||||
.nf
|
||||
Last updated: 08 November 2012
|
||||
Copyright (c) 1997-2012 University of Cambridge.
|
||||
Last updated: 12 May 2013
|
||||
Copyright (c) 1997-2013 University of Cambridge.
|
||||
.fi
|
||||
|
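The SUBJECT STRING OFFSETS hunk above says that lengths and offsets for the 32-bit library are counted in 32-bit data units, not bytes. The sketch below shows that arithmetic in isolation. Both helper names are hypothetical and not part of PCRE, and `uint32_t` stands in for `PCRE_UCHAR32` so the block compiles without `pcre.h`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of PCRE: convert an offset or length
   given in 32-bit data units into the equivalent number of bytes. */
size_t units32_to_bytes(int units)
{
    assert(units >= 0);   /* unset ovector entries are -1; filter them first */
    return (size_t)units * sizeof(uint32_t);
}

/* Hypothetical helper: count the 32-bit units in a zero-terminated
   buffer, i.e. the value to pass as the length argument.
   uint32_t is an assumption standing in for PCRE_UCHAR32. */
int length_in_units32(const uint32_t *s)
{
    int n = 0;
    while (s[n] != 0)
        n++;
    return n;
}
```

For a five-character subject, `length_in_units32()` yields 5, which is the value the matching function expects, while the buffer itself occupies 20 bytes. An offset pair returned in the ovector can be turned into byte positions with `units32_to_bytes()` if the caller needs to address raw memory.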
@ -6,18 +6,16 @@ PCRE - Perl-compatible regular expressions
|
||||
.sp
|
||||
.B #include <pcre.h>
|
||||
.PP
|
||||
.SM
|
||||
.nf
|
||||
.B void pcre_assign_jit_stack(pcre_extra *\fIextra\fP,
|
||||
.ti +5n
|
||||
.B pcre_jit_callback \fIcallback\fP, void *\fIdata\fP);
|
||||
.PP
|
||||
.B " pcre_jit_callback \fIcallback\fP, void *\fIdata\fP);"
|
||||
.sp
|
||||
.B void pcre16_assign_jit_stack(pcre16_extra *\fIextra\fP,
|
||||
.ti +5n
|
||||
.B pcre16_jit_callback \fIcallback\fP, void *\fIdata\fP);
|
||||
.PP
|
||||
.B " pcre16_jit_callback \fIcallback\fP, void *\fIdata\fP);"
|
||||
.sp
|
||||
.B void pcre32_assign_jit_stack(pcre32_extra *\fIextra\fP,
|
||||
.ti +5n
|
||||
.B pcre32_jit_callback \fIcallback\fP, void *\fIdata\fP);
|
||||
.B " pcre32_jit_callback \fIcallback\fP, void *\fIdata\fP);"
|
||||
.fi
|
||||
.
|
||||
.SH DESCRIPTION
|
||||
.rs
|
||||
|
@ -1,4 +1,4 @@
|
||||
.TH PCRE_COMPILE 3 "24 June 2012" "PCRE 8.30"
|
||||
.TH PCRE_COMPILE 3 "01 October 2013" "PCRE 8.34"
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH SYNOPSIS
|
||||
@ -6,24 +6,19 @@ PCRE - Perl-compatible regular expressions
|
||||
.sp
|
||||
.B #include <pcre.h>
|
||||
.PP
|
||||
.SM
|
||||
.nf
|
||||
.B pcre *pcre_compile(const char *\fIpattern\fP, int \fIoptions\fP,
|
||||
.ti +5n
|
||||
.B const char **\fIerrptr\fP, int *\fIerroffset\fP,
|
||||
.ti +5n
|
||||
.B const unsigned char *\fItableptr\fP);
|
||||
.PP
|
||||
.B " const char **\fIerrptr\fP, int *\fIerroffset\fP,"
|
||||
.B " const unsigned char *\fItableptr\fP);"
|
||||
.sp
|
||||
.B pcre16 *pcre16_compile(PCRE_SPTR16 \fIpattern\fP, int \fIoptions\fP,
|
||||
.ti +5n
|
||||
.B const char **\fIerrptr\fP, int *\fIerroffset\fP,
|
||||
.ti +5n
|
||||
.B const unsigned char *\fItableptr\fP);
|
||||
.PP
|
||||
.B " const char **\fIerrptr\fP, int *\fIerroffset\fP,"
|
||||
.B " const unsigned char *\fItableptr\fP);"
|
||||
.sp
|
||||
.B pcre32 *pcre32_compile(PCRE_SPTR32 \fIpattern\fP, int \fIoptions\fP,
|
||||
.ti +5n
|
||||
.B const char **\fIerrptr\fP, int *\fIerroffset\fP,
|
||||
.ti +5n
|
||||
.B const unsigned char *\fItableptr\fP);
|
||||
.B " const char **\fIerrptr\fP, int *\fIerroffset\fP,"
|
||||
.B " const unsigned char *\fItableptr\fP);"
|
||||
.fi
|
||||
.
|
||||
.SH DESCRIPTION
|
||||
.rs
|
||||
@ -56,6 +51,7 @@ The option bits are:
|
||||
PCRE_FIRSTLINE Force matching to be before newline
|
||||
PCRE_JAVASCRIPT_COMPAT JavaScript compatibility
|
||||
PCRE_MULTILINE ^ and $ match newlines within data
|
||||
PCRE_NEVER_UTF Lock out UTF, e.g. via (*UTF)
|
||||
PCRE_NEWLINE_ANY Recognize any Unicode newline sequence
|
||||
PCRE_NEWLINE_ANYCRLF Recognize CR, LF, and CRLF as newline
|
||||
sequences
|
||||
@ -64,6 +60,8 @@ The option bits are:
|
||||
PCRE_NEWLINE_LF Set LF as the newline sequence
|
||||
PCRE_NO_AUTO_CAPTURE Disable numbered capturing paren-
|
||||
theses (named ones available)
|
||||
PCRE_NO_AUTO_POSSESS Disable auto-possessification
|
||||
PCRE_NO_START_OPTIMIZE Disable match-time start optimizations
|
||||
PCRE_NO_UTF16_CHECK Do not check the pattern for UTF-16
|
||||
validity (only relevant if
|
||||
PCRE_UTF16 is set)
|
||||
|
@ -1,4 +1,4 @@
|
||||
.TH PCRE_COMPILE2 3 "24 June 2012" "PCRE 8.30"
|
||||
.TH PCRE_COMPILE2 3 "01 October 2013" "PCRE 8.34"
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH SYNOPSIS
|
||||
@ -6,30 +6,22 @@ PCRE - Perl-compatible regular expressions
|
||||
.sp
|
||||
.B #include <pcre.h>
|
||||
.PP
|
||||
.SM
|
||||
.nf
|
||||
.B pcre *pcre_compile2(const char *\fIpattern\fP, int \fIoptions\fP,
|
||||
.ti +5n
|
||||
.B int *\fIerrorcodeptr\fP,
|
||||
.ti +5n
|
||||
.B const char **\fIerrptr\fP, int *\fIerroffset\fP,
|
||||
.ti +5n
|
||||
.B const unsigned char *\fItableptr\fP);
|
||||
.PP
|
||||
.B " int *\fIerrorcodeptr\fP,"
|
||||
.B " const char **\fIerrptr\fP, int *\fIerroffset\fP,"
|
||||
.B " const unsigned char *\fItableptr\fP);"
|
||||
.sp
|
||||
.B pcre16 *pcre16_compile2(PCRE_SPTR16 \fIpattern\fP, int \fIoptions\fP,
|
||||
.ti +5n
|
||||
.B int *\fIerrorcodeptr\fP,
|
||||
.ti +5n
|
||||
.B const char **\fIerrptr\fP, int *\fIerroffset\fP,
|
||||
.ti +5n
|
||||
.B const unsigned char *\fItableptr\fP);
|
||||
.PP
|
||||
.B " int *\fIerrorcodeptr\fP,"
|
||||
.B " const char **\fIerrptr\fP, int *\fIerroffset\fP,"
|
||||
.B " const unsigned char *\fItableptr\fP);"
|
||||
.sp
|
||||
.B pcre32 *pcre32_compile2(PCRE_SPTR32 \fIpattern\fP, int \fIoptions\fP,
|
||||
.ti +5n
|
||||
.B int *\fIerrorcodeptr\fP,
|
||||
.ti +5n
|
||||
.B const char **\fIerrptr\fP, int *\fIerroffset\fP,
|
||||
.ti +5n
|
||||
.B const unsigned char *\fItableptr\fP);
|
||||
.B " int *\fIerrorcodeptr\fP,"
|
||||
.B " const char **\fIerrptr\fP, int *\fIerroffset\fP,"
|
||||
.B " const unsigned char *\fItableptr\fP);"
|
||||
.fi
|
||||
.
|
||||
.SH DESCRIPTION
|
||||
.rs
|
||||
@ -64,6 +56,7 @@ The option bits are:
|
||||
PCRE_FIRSTLINE Force matching to be before newline
|
||||
PCRE_JAVASCRIPT_COMPAT JavaScript compatibility
|
||||
PCRE_MULTILINE ^ and $ match newlines within data
|
||||
PCRE_NEVER_UTF Lock out UTF, e.g. via (*UTF)
|
||||
PCRE_NEWLINE_ANY Recognize any Unicode newline sequence
|
||||
PCRE_NEWLINE_ANYCRLF Recognize CR, LF, and CRLF as newline
|
||||
sequences
|
||||
@ -72,6 +65,8 @@ The option bits are:
|
||||
PCRE_NEWLINE_LF Set LF as the newline sequence
|
||||
PCRE_NO_AUTO_CAPTURE Disable numbered capturing paren-
|
||||
theses (named ones available)
|
||||
PCRE_NO_AUTO_POSSESS Disable auto-possessification
|
||||
PCRE_NO_START_OPTIMIZE Disable match-time start optimizations
|
||||
PCRE_NO_UTF16_CHECK Do not check the pattern for UTF-16
|
||||
validity (only relevant if
|
||||
PCRE_UTF16 is set)
|
||||
|
@ -1,4 +1,4 @@
|
||||
.TH PCRE_CONFIG 3 "24 June 2012" "PCRE 8.30"
|
||||
.TH PCRE_CONFIG 3 "05 November 2013" "PCRE 8.34"
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH SYNOPSIS
|
||||
@ -33,6 +33,7 @@ point to an unsigned long integer. The available codes are:
|
||||
target architecture for the JIT compiler,
|
||||
or NULL if there is no JIT support
|
||||
PCRE_CONFIG_LINK_SIZE Internal link size: 2, 3, or 4
|
||||
PCRE_CONFIG_PARENS_LIMIT Parentheses nesting limit
|
||||
PCRE_CONFIG_MATCH_LIMIT Internal resource limit
|
||||
PCRE_CONFIG_MATCH_LIMIT_RECURSION
|
||||
Internal recursion depth limit
|
||||
|
@ -6,30 +6,22 @@ PCRE - Perl-compatible regular expressions
|
||||
.sp
|
||||
.B #include <pcre.h>
|
||||
.PP
|
||||
.SM
|
||||
.nf
|
||||
.B int pcre_copy_named_substring(const pcre *\fIcode\fP,
|
||||
.ti +5n
|
||||
.B const char *\fIsubject\fP, int *\fIovector\fP,
|
||||
.ti +5n
|
||||
.B int \fIstringcount\fP, const char *\fIstringname\fP,
|
||||
.ti +5n
|
||||
.B char *\fIbuffer\fP, int \fIbuffersize\fP);
|
||||
.PP
|
||||
.B " const char *\fIsubject\fP, int *\fIovector\fP,"
|
||||
.B " int \fIstringcount\fP, const char *\fIstringname\fP,"
|
||||
.B " char *\fIbuffer\fP, int \fIbuffersize\fP);"
|
||||
.sp
|
||||
.B int pcre16_copy_named_substring(const pcre16 *\fIcode\fP,
|
||||
.ti +5n
|
||||
.B PCRE_SPTR16 \fIsubject\fP, int *\fIovector\fP,
|
||||
.ti +5n
|
||||
.B int \fIstringcount\fP, PCRE_SPTR16 \fIstringname\fP,
|
||||
.ti +5n
|
||||
.B PCRE_UCHAR16 *\fIbuffer\fP, int \fIbuffersize\fP);
|
||||
.PP
|
||||
.B " PCRE_SPTR16 \fIsubject\fP, int *\fIovector\fP,"
|
||||
.B " int \fIstringcount\fP, PCRE_SPTR16 \fIstringname\fP,"
|
||||
.B " PCRE_UCHAR16 *\fIbuffer\fP, int \fIbuffersize\fP);"
|
||||
.sp
|
||||
.B int pcre32_copy_named_substring(const pcre32 *\fIcode\fP,
|
||||
.ti +5n
|
||||
.B PCRE_SPTR32 \fIsubject\fP, int *\fIovector\fP,
|
||||
.ti +5n
|
||||
.B int \fIstringcount\fP, PCRE_SPTR32 \fIstringname\fP,
|
||||
.ti +5n
|
||||
.B PCRE_UCHAR32 *\fIbuffer\fP, int \fIbuffersize\fP);
|
||||
.B " PCRE_SPTR32 \fIsubject\fP, int *\fIovector\fP,"
|
||||
.B " int \fIstringcount\fP, PCRE_SPTR32 \fIstringname\fP,"
|
||||
.B " PCRE_UCHAR32 *\fIbuffer\fP, int \fIbuffersize\fP);"
|
||||
.fi
|
||||
.
|
||||
.SH DESCRIPTION
|
||||
.rs
|
||||
|
@ -6,24 +6,19 @@ PCRE - Perl-compatible regular expressions
|
||||
.sp
|
||||
.B #include <pcre.h>
|
||||
.PP
|
||||
.SM
|
||||
.nf
|
||||
.B int pcre_copy_substring(const char *\fIsubject\fP, int *\fIovector\fP,
|
||||
.ti +5n
|
||||
.B int \fIstringcount\fP, int \fIstringnumber\fP, char *\fIbuffer\fP,
|
||||
.ti +5n
|
||||
.B int \fIbuffersize\fP);
|
||||
.PP
|
||||
.B " int \fIstringcount\fP, int \fIstringnumber\fP, char *\fIbuffer\fP,"
|
||||
.B " int \fIbuffersize\fP);"
|
||||
.sp
|
||||
.B int pcre16_copy_substring(PCRE_SPTR16 \fIsubject\fP, int *\fIovector\fP,
|
||||
.ti +5n
|
||||
.B int \fIstringcount\fP, int \fIstringnumber\fP, PCRE_UCHAR16 *\fIbuffer\fP,
|
||||
.ti +5n
|
||||
.B int \fIbuffersize\fP);
|
||||
.PP
|
||||
.B " int \fIstringcount\fP, int \fIstringnumber\fP, PCRE_UCHAR16 *\fIbuffer\fP,"
|
||||
.B " int \fIbuffersize\fP);"
|
||||
.sp
|
||||
.B int pcre32_copy_substring(PCRE_SPTR32 \fIsubject\fP, int *\fIovector\fP,
|
||||
.ti +5n
|
||||
.B int \fIstringcount\fP, int \fIstringnumber\fP, PCRE_UCHAR32 *\fIbuffer\fP,
|
||||
.ti +5n
|
||||
.B int \fIbuffersize\fP);
|
||||
.B " int \fIstringcount\fP, int \fIstringnumber\fP, PCRE_UCHAR32 *\fIbuffer\fP,"
|
||||
.B " int \fIbuffersize\fP);"
|
||||
.fi
|
||||
.
|
||||
.SH DESCRIPTION
|
||||
.rs
|
||||
|
@ -1,4 +1,4 @@
|
||||
.TH PCRE_DFA_EXEC 3 "24 June 2012" "PCRE 8.30"
|
||||
.TH PCRE_DFA_EXEC 3 "12 May 2013" "PCRE 8.33"
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH SYNOPSIS
|
||||
@ -6,30 +6,22 @@ PCRE - Perl-compatible regular expressions
|
||||
.sp
|
||||
.B #include <pcre.h>
|
||||
.PP
|
||||
.SM
|
||||
.nf
|
||||
.B int pcre_dfa_exec(const pcre *\fIcode\fP, "const pcre_extra *\fIextra\fP,"
|
||||
.ti +5n
|
||||
.B "const char *\fIsubject\fP," int \fIlength\fP, int \fIstartoffset\fP,
|
||||
.ti +5n
|
||||
.B int \fIoptions\fP, int *\fIovector\fP, int \fIovecsize\fP,
|
||||
.ti +5n
|
||||
.B int *\fIworkspace\fP, int \fIwscount\fP);
|
||||
.PP
|
||||
.B " const char *\fIsubject\fP, int \fIlength\fP, int \fIstartoffset\fP,"
|
||||
.B " int \fIoptions\fP, int *\fIovector\fP, int \fIovecsize\fP,"
|
||||
.B " int *\fIworkspace\fP, int \fIwscount\fP);"
|
||||
.sp
|
||||
.B int pcre16_dfa_exec(const pcre16 *\fIcode\fP, "const pcre16_extra *\fIextra\fP,"
|
||||
.ti +5n
|
||||
.B "PCRE_SPTR16 \fIsubject\fP," int \fIlength\fP, int \fIstartoffset\fP,
|
||||
.ti +5n
|
||||
.B int \fIoptions\fP, int *\fIovector\fP, int \fIovecsize\fP,
|
||||
.ti +5n
|
||||
.B int *\fIworkspace\fP, int \fIwscount\fP);
|
||||
.PP
|
||||
.B " PCRE_SPTR16 \fIsubject\fP, int \fIlength\fP, int \fIstartoffset\fP,"
|
||||
.B " int \fIoptions\fP, int *\fIovector\fP, int \fIovecsize\fP,"
|
||||
.B " int *\fIworkspace\fP, int \fIwscount\fP);"
|
||||
.sp
|
||||
.B int pcre32_dfa_exec(const pcre32 *\fIcode\fP, "const pcre32_extra *\fIextra\fP,"
|
||||
.ti +5n
|
||||
.B "PCRE_SPTR32 \fIsubject\fP," int \fIlength\fP, int \fIstartoffset\fP,
|
||||
.ti +5n
|
||||
.B int \fIoptions\fP, int *\fIovector\fP, int \fIovecsize\fP,
|
||||
.ti +5n
|
||||
.B int *\fIworkspace\fP, int \fIwscount\fP);
|
||||
.B " PCRE_SPTR32 \fIsubject\fP, int \fIlength\fP, int \fIstartoffset\fP,"
|
||||
.B " int \fIoptions\fP, int *\fIovector\fP, int \fIovecsize\fP,"
|
||||
.B " int *\fIworkspace\fP, int \fIwscount\fP);"
|
||||
.fi
|
||||
.
|
||||
.SH DESCRIPTION
|
||||
.rs
|
||||
@ -44,16 +36,17 @@ are:
|
||||
\fIextra\fP Points to an associated \fBpcre[16|32]_extra\fP structure,
|
||||
or is NULL
|
||||
\fIsubject\fP Points to the subject string
|
||||
\fIlength\fP Length of the subject string, in bytes
|
||||
\fIstartoffset\fP Offset in bytes in the subject at which to
|
||||
start matching
|
||||
\fIlength\fP Length of the subject string
|
||||
\fIstartoffset\fP Offset in the subject at which to start matching
|
||||
\fIoptions\fP Option bits
|
||||
\fIovector\fP Points to a vector of ints for result offsets
|
||||
\fIovecsize\fP Number of elements in the vector
|
||||
\fIworkspace\fP Points to a vector of ints used as working space
|
||||
\fIwscount\fP Number of elements in the vector
|
||||
.sp
|
||||
The options are:
|
||||
The units for \fIlength\fP and \fIstartoffset\fP are bytes for
|
||||
\fBpcre_dfa_exec()\fP, 16-bit data items for \fBpcre16_dfa_exec()\fP, and 32-bit
|
||||
items for \fBpcre32_dfa_exec()\fP. The options are:
|
||||
.sp
|
||||
PCRE_ANCHORED Match only at the first position
|
||||
PCRE_BSR_ANYCRLF \eR matches only CR, LF, or CRLF
|
||||
|
@ -1,4 +1,4 @@
|
||||
.TH PCRE_EXEC 3 "24 June 2012" "PCRE 8.30"
|
||||
.TH PCRE_EXEC 3 "12 May 2013" "PCRE 8.33"
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH SYNOPSIS
|
||||
@ -6,24 +6,19 @@ PCRE - Perl-compatible regular expressions
|
||||
.sp
|
||||
.B #include <pcre.h>
|
||||
.PP
|
||||
.SM
|
||||
.nf
|
||||
.B int pcre_exec(const pcre *\fIcode\fP, "const pcre_extra *\fIextra\fP,"
|
||||
.ti +5n
|
||||
.B "const char *\fIsubject\fP," int \fIlength\fP, int \fIstartoffset\fP,
|
||||
.ti +5n
|
||||
.B int \fIoptions\fP, int *\fIovector\fP, int \fIovecsize\fP);
|
||||
.PP
|
||||
.B " const char *\fIsubject\fP, int \fIlength\fP, int \fIstartoffset\fP,"
|
||||
.B " int \fIoptions\fP, int *\fIovector\fP, int \fIovecsize\fP);"
|
||||
.sp
|
||||
.B int pcre16_exec(const pcre16 *\fIcode\fP, "const pcre16_extra *\fIextra\fP,"
|
||||
.ti +5n
|
||||
.B "PCRE_SPTR16 \fIsubject\fP," int \fIlength\fP, int \fIstartoffset\fP,
|
||||
.ti +5n
|
||||
.B int \fIoptions\fP, int *\fIovector\fP, int \fIovecsize\fP);
|
||||
.PP
|
||||
.B " PCRE_SPTR16 \fIsubject\fP, int \fIlength\fP, int \fIstartoffset\fP,"
|
||||
.B " int \fIoptions\fP, int *\fIovector\fP, int \fIovecsize\fP);"
|
||||
.sp
|
||||
.B int pcre32_exec(const pcre32 *\fIcode\fP, "const pcre32_extra *\fIextra\fP,"
|
||||
.ti +5n
|
||||
.B "PCRE_SPTR32 \fIsubject\fP," int \fIlength\fP, int \fIstartoffset\fP,
|
||||
.ti +5n
|
||||
.B int \fIoptions\fP, int *\fIovector\fP, int \fIovecsize\fP);
|
||||
.B " PCRE_SPTR32 \fIsubject\fP, int \fIlength\fP, int \fIstartoffset\fP,"
|
||||
.B " int \fIoptions\fP, int *\fIovector\fP, int \fIovecsize\fP);"
|
||||
.fi
|
||||
.
|
||||
.SH DESCRIPTION
|
||||
.rs
|
||||
@ -36,14 +31,15 @@ offsets to captured substrings. Its arguments are:
|
||||
\fIextra\fP Points to an associated \fBpcre[16|32]_extra\fP structure,
|
||||
or is NULL
|
||||
\fIsubject\fP Points to the subject string
|
||||
\fIlength\fP Length of the subject string, in bytes
|
||||
\fIstartoffset\fP Offset in bytes in the subject at which to
|
||||
start matching
|
||||
\fIlength\fP Length of the subject string
|
||||
\fIstartoffset\fP Offset in the subject at which to start matching
|
||||
\fIoptions\fP Option bits
|
||||
\fIovector\fP Points to a vector of ints for result offsets
|
||||
\fIovecsize\fP Number of elements in the vector (a multiple of 3)
|
||||
.sp
|
||||
The options are:
|
||||
The units for \fIlength\fP and \fIstartoffset\fP are bytes for
|
||||
\fBpcre_exec()\fP, 16-bit data items for \fBpcre16_exec()\fP, and 32-bit items
|
||||
for \fBpcre32_exec()\fP. The options are:
|
||||
.sp
|
||||
PCRE_ANCHORED Match only at the first position
|
||||
PCRE_BSR_ANYCRLF \eR matches only CR, LF, or CRLF
|
||||
|
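The PCRE_EXEC hunk above states two sizing rules: \fIlength\fP and \fIstartoffset\fP are counted in the data units of the library in use (bytes, 16-bit items, or 32-bit items), and \fIovecsize\fP is a multiple of 3. The following is a minimal sketch of both calculations. The helper names are hypothetical and not part of PCRE, and the `3 * (captures + 1)` rule assumes the usual convention from the wider PCRE API documentation that each substring takes a pair of offsets and the final third of the vector is workspace.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of PCRE: express a subject size given in
   bytes in the data units of the chosen library. unit_bytes is 1 for
   pcre_exec(), 2 for pcre16_exec() and 4 for pcre32_exec(). */
int length_in_units(size_t byte_count, size_t unit_bytes)
{
    assert(unit_bytes == 1 || unit_bytes == 2 || unit_bytes == 4);
    assert(byte_count % unit_bytes == 0);   /* no partial data unit */
    return (int)(byte_count / unit_bytes);
}

/* Hypothetical helper: smallest ovecsize that is a multiple of 3 and has
   room for the whole match plus capture_count captured substrings
   (assumed layout: two offsets per substring, last third is workspace). */
int ovecsize_for(int capture_count)
{
    assert(capture_count >= 0);
    return 3 * (capture_count + 1);
}
```

An 8-byte buffer is therefore a length of 8 for the 8-bit library, 4 for the 16-bit library and 2 for the 32-bit library, and a pattern with two capturing groups would use an ovector of 9 ints under the stated assumption.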
@ -6,18 +6,16 @@ PCRE - Perl-compatible regular expressions
|
||||
.sp
|
||||
.B #include <pcre.h>
|
||||
.PP
|
||||
.SM
|
||||
.nf
|
||||
.B int pcre_fullinfo(const pcre *\fIcode\fP, "const pcre_extra *\fIextra\fP,"
|
||||
.ti +5n
|
||||
.B int \fIwhat\fP, void *\fIwhere\fP);
|
||||
.PP
|
||||
.B " int \fIwhat\fP, void *\fIwhere\fP);"
|
||||
.sp
|
||||
.B int pcre16_fullinfo(const pcre16 *\fIcode\fP, "const pcre16_extra *\fIextra\fP,"
|
||||
.ti +5n
|
||||
.B int \fIwhat\fP, void *\fIwhere\fP);
|
||||
.PP
|
||||
.B " int \fIwhat\fP, void *\fIwhere\fP);"
|
||||
.sp
|
||||
.B int pcre32_fullinfo(const pcre32 *\fIcode\fP, "const pcre32_extra *\fIextra\fP,"
|
||||
.ti +5n
|
||||
.B int \fIwhat\fP, void *\fIwhere\fP);
|
||||
.B " int \fIwhat\fP, void *\fIwhere\fP);"
|
||||
.fi
|
||||
.
|
||||
.SH DESCRIPTION
|
||||
.rs
|
||||
|
@ -6,30 +6,22 @@ PCRE - Perl-compatible regular expressions
|
||||
.sp
|
||||
.B #include <pcre.h>
|
||||
.PP
|
||||
.SM
|
||||
.nf
|
||||
.B int pcre_get_named_substring(const pcre *\fIcode\fP,
|
||||
.ti +5n
|
||||
.B const char *\fIsubject\fP, int *\fIovector\fP,
|
||||
.ti +5n
|
||||
.B int \fIstringcount\fP, const char *\fIstringname\fP,
|
||||
.ti +5n
|
||||
.B const char **\fIstringptr\fP);
|
||||
.PP
|
||||
.B " const char *\fIsubject\fP, int *\fIovector\fP,"
|
||||
.B " int \fIstringcount\fP, const char *\fIstringname\fP,"
|
||||
.B " const char **\fIstringptr\fP);"
|
||||
.sp
|
||||
.B int pcre16_get_named_substring(const pcre16 *\fIcode\fP,
|
||||
.ti +5n
|
||||
.B PCRE_SPTR16 \fIsubject\fP, int *\fIovector\fP,
|
||||
.ti +5n
|
||||
.B int \fIstringcount\fP, PCRE_SPTR16 \fIstringname\fP,
|
||||
.ti +5n
|
||||
.B PCRE_SPTR16 *\fIstringptr\fP);
|
||||
.PP
|
||||
.B " PCRE_SPTR16 \fIsubject\fP, int *\fIovector\fP,"
|
||||
.B " int \fIstringcount\fP, PCRE_SPTR16 \fIstringname\fP,"
|
||||
.B " PCRE_SPTR16 *\fIstringptr\fP);"
|
||||
.sp
|
||||
.B int pcre32_get_named_substring(const pcre32 *\fIcode\fP,
|
||||
.ti +5n
|
||||
.B PCRE_SPTR32 \fIsubject\fP, int *\fIovector\fP,
|
||||
.ti +5n
|
||||
.B int \fIstringcount\fP, PCRE_SPTR32 \fIstringname\fP,
|
||||
.ti +5n
|
||||
.B PCRE_SPTR32 *\fIstringptr\fP);
|
||||
.B " PCRE_SPTR32 \fIsubject\fP, int *\fIovector\fP,"
|
||||
.B " int \fIstringcount\fP, PCRE_SPTR32 \fIstringname\fP,"
|
||||
.B " PCRE_SPTR32 *\fIstringptr\fP);"
|
||||
.fi
|
||||
.
|
||||
.SH DESCRIPTION
|
||||
.rs
|
||||
|
@ -6,18 +6,16 @@ PCRE - Perl-compatible regular expressions
|
||||
.sp
|
||||
.B #include <pcre.h>
|
||||
.PP
|
||||
.SM
|
||||
.nf
|
||||
.B int pcre_get_stringnumber(const pcre *\fIcode\fP,
|
||||
.ti +5n
|
||||
.B const char *\fIname\fP);
|
||||
.PP
|
||||
.B " const char *\fIname\fP);"
|
||||
.sp
|
||||
.B int pcre16_get_stringnumber(const pcre16 *\fIcode\fP,
|
||||
.ti +5n
|
||||
.B PCRE_SPTR16 \fIname\fP);
|
||||
.PP
|
||||
.B " PCRE_SPTR16 \fIname\fP);"
|
||||
.sp
|
||||
.B int pcre32_get_stringnumber(const pcre32 *\fIcode\fP,
|
||||
.ti +5n
|
||||
.B PCRE_SPTR32 \fIname\fP);
|
||||
.B " PCRE_SPTR32 \fIname\fP);"
|
||||
.fi
|
||||
.
|
||||
.SH DESCRIPTION
|
||||
.rs
|
||||
|
@ -6,18 +6,16 @@ PCRE - Perl-compatible regular expressions
|
||||
.sp
|
||||
.B #include <pcre.h>
|
||||
.PP
|
||||
.SM
|
||||
.nf
|
||||
.B int pcre_get_stringtable_entries(const pcre *\fIcode\fP,
|
||||
.ti +5n
|
||||
.B const char *\fIname\fP, char **\fIfirst\fP, char **\fIlast\fP);
|
||||
.PP
|
||||
.B " const char *\fIname\fP, char **\fIfirst\fP, char **\fIlast\fP);"
|
||||
.sp
|
||||
.B int pcre16_get_stringtable_entries(const pcre16 *\fIcode\fP,
|
||||
.ti +5n
|
||||
.B PCRE_SPTR16 \fIname\fP, PCRE_UCHAR16 **\fIfirst\fP, PCRE_UCHAR16 **\fIlast\fP);
|
||||
.PP
|
||||
.B " PCRE_SPTR16 \fIname\fP, PCRE_UCHAR16 **\fIfirst\fP, PCRE_UCHAR16 **\fIlast\fP);"
|
||||
.sp
|
||||
.B int pcre32_get_stringtable_entries(const pcre32 *\fIcode\fP,
|
||||
.ti +5n
|
||||
.B PCRE_SPTR32 \fIname\fP, PCRE_UCHAR32 **\fIfirst\fP, PCRE_UCHAR32 **\fIlast\fP);
|
||||
.B " PCRE_SPTR32 \fIname\fP, PCRE_UCHAR32 **\fIfirst\fP, PCRE_UCHAR32 **\fIlast\fP);"
|
||||
.fi
|
||||
.
|
||||
.SH DESCRIPTION
|
||||
.rs
|
||||
|
@ -6,24 +6,19 @@ PCRE - Perl-compatible regular expressions
|
||||
.sp
|
||||
.B #include <pcre.h>
|
||||
.PP
|
||||
.SM
|
||||
.nf
|
||||
.B int pcre_get_substring(const char *\fIsubject\fP, int *\fIovector\fP,
|
||||
.ti +5n
|
||||
.B int \fIstringcount\fP, int \fIstringnumber\fP,
|
||||
.ti +5n
|
||||
.B const char **\fIstringptr\fP);
|
||||
.PP
|
||||
.B " int \fIstringcount\fP, int \fIstringnumber\fP,"
|
||||
.B " const char **\fIstringptr\fP);"
|
||||
.sp
|
||||
.B int pcre16_get_substring(PCRE_SPTR16 \fIsubject\fP, int *\fIovector\fP,
|
||||
.ti +5n
|
||||
.B int \fIstringcount\fP, int \fIstringnumber\fP,
|
||||
.ti +5n
|
||||
.B PCRE_SPTR16 *\fIstringptr\fP);
|
||||
.PP
|
||||
.B " int \fIstringcount\fP, int \fIstringnumber\fP,"
|
||||
.B " PCRE_SPTR16 *\fIstringptr\fP);"
|
||||
.sp
|
||||
.B int pcre32_get_substring(PCRE_SPTR32 \fIsubject\fP, int *\fIovector\fP,
|
||||
.ti +5n
|
||||
.B int \fIstringcount\fP, int \fIstringnumber\fP,
|
||||
.ti +5n
|
||||
.B PCRE_SPTR32 *\fIstringptr\fP);
|
||||
.B " int \fIstringcount\fP, int \fIstringnumber\fP,"
|
||||
.B " PCRE_SPTR32 *\fIstringptr\fP);"
|
||||
.fi
|
||||
.
|
||||
.SH DESCRIPTION
|
||||
.rs
|
||||
|
@ -6,18 +6,16 @@ PCRE - Perl-compatible regular expressions
|
||||
.sp
|
||||
.B #include <pcre.h>
|
||||
.PP
|
||||
.SM
|
||||
.nf
|
||||
.B int pcre_get_substring_list(const char *\fIsubject\fP,
|
||||
.ti +5n
|
||||
.B int *\fIovector\fP, int \fIstringcount\fP, "const char ***\fIlistptr\fP);"
|
||||
.PP
|
||||
.B " int *\fIovector\fP, int \fIstringcount\fP, const char ***\fIlistptr\fP);"
|
||||
.sp
|
||||
.B int pcre16_get_substring_list(PCRE_SPTR16 \fIsubject\fP,
|
||||
.ti +5n
|
||||
.B int *\fIovector\fP, int \fIstringcount\fP, "PCRE_SPTR16 **\fIlistptr\fP);"
|
||||
.PP
|
||||
.B " int *\fIovector\fP, int \fIstringcount\fP, PCRE_SPTR16 **\fIlistptr\fP);"
|
||||
.sp
|
||||
.B int pcre32_get_substring_list(PCRE_SPTR32 \fIsubject\fP,
|
||||
.ti +5n
|
||||
.B int *\fIovector\fP, int \fIstringcount\fP, "PCRE_SPTR32 **\fIlistptr\fP);"
|
||||
.B " int *\fIovector\fP, int \fIstringcount\fP, PCRE_SPTR32 **\fIlistptr\fP);"
|
||||
.fi
|
||||
.
|
||||
.SH DESCRIPTION
|
||||
.rs
|
||||
|
@ -6,30 +6,22 @@ PCRE - Perl-compatible regular expressions
|
||||
.sp
|
||||
.B #include <pcre.h>
|
||||
.PP
|
||||
.SM
|
||||
.nf
|
||||
.B int pcre_jit_exec(const pcre *\fIcode\fP, "const pcre_extra *\fIextra\fP,"
|
||||
.ti +5n
|
||||
.B "const char *\fIsubject\fP," int \fIlength\fP, int \fIstartoffset\fP,
|
||||
.ti +5n
|
||||
.B int \fIoptions\fP, int *\fIovector\fP, int \fIovecsize\fP,
|
||||
.ti +5n
|
||||
.B pcre_jit_stack *\fIjstack\fP);
|
||||
.PP
|
||||
.B " const char *\fIsubject\fP, int \fIlength\fP, int \fIstartoffset\fP,"
|
||||
.B " int \fIoptions\fP, int *\fIovector\fP, int \fIovecsize\fP,"
|
||||
.B " pcre_jit_stack *\fIjstack\fP);"
|
||||
.sp
|
||||
.B int pcre16_jit_exec(const pcre16 *\fIcode\fP, "const pcre16_extra *\fIextra\fP,"
|
||||
.ti +5n
|
||||
.B "PCRE_SPTR16 \fIsubject\fP," int \fIlength\fP, int \fIstartoffset\fP,
|
||||
.ti +5n
|
||||
.B int \fIoptions\fP, int *\fIovector\fP, int \fIovecsize\fP,
|
||||
.ti +5n
|
||||
.B pcre16_jit_stack *\fIjstack\fP);
|
||||
.PP
|
||||
.B " PCRE_SPTR16 \fIsubject\fP, int \fIlength\fP, int \fIstartoffset\fP,"
|
||||
.B " int \fIoptions\fP, int *\fIovector\fP, int \fIovecsize\fP,"
|
||||
.B " pcre16_jit_stack *\fIjstack\fP);"
|
||||
.sp
|
||||
.B int pcre32_jit_exec(const pcre32 *\fIcode\fP, "const pcre32_extra *\fIextra\fP,"
|
||||
.ti +5n
|
||||
.B "PCRE_SPTR32 \fIsubject\fP," int \fIlength\fP, int \fIstartoffset\fP,
|
||||
.ti +5n
|
||||
.B int \fIoptions\fP, int *\fIovector\fP, int \fIovecsize\fP,
|
||||
.ti +5n
|
||||
.B pcre32_jit_stack *\fIjstack\fP);
|
||||
.B " PCRE_SPTR32 \fIsubject\fP, int \fIlength\fP, int \fIstartoffset\fP,"
|
||||
.B " int \fIoptions\fP, int *\fIovector\fP, int \fIovecsize\fP,"
|
||||
.B " pcre32_jit_stack *\fIjstack\fP);"
|
||||
.fi
|
||||
.
|
||||
.SH DESCRIPTION
|
||||
.rs
|
||||
|
@ -6,18 +6,16 @@ PCRE - Perl-compatible regular expressions
|
||||
.sp
|
||||
.B #include <pcre.h>
|
||||
.PP
|
||||
.SM
|
||||
.nf
|
||||
.B pcre_jit_stack *pcre_jit_stack_alloc(int \fIstartsize\fP,
|
||||
.ti +5n
|
||||
.B int \fImaxsize\fP);
|
||||
.PP
|
||||
.B " int \fImaxsize\fP);"
|
||||
.sp
|
||||
.B pcre16_jit_stack *pcre16_jit_stack_alloc(int \fIstartsize\fP,
|
||||
.ti +5n
|
||||
.B int \fImaxsize\fP);
|
||||
.PP
|
||||
.B " int \fImaxsize\fP);"
|
||||
.sp
|
||||
.B pcre32_jit_stack *pcre32_jit_stack_alloc(int \fIstartsize\fP,
|
||||
.ti +5n
|
||||
.B int \fImaxsize\fP);
|
||||
.B " int \fImaxsize\fP);"
|
||||
.fi
|
||||
.
|
||||
.SH DESCRIPTION
|
||||
.rs
|
||||
|
@ -6,18 +6,16 @@ PCRE - Perl-compatible regular expressions
|
||||
.sp
|
||||
.B #include <pcre.h>
|
||||
.PP
|
||||
.SM
|
||||
.nf
|
||||
.B int pcre_pattern_to_host_byte_order(pcre *\fIcode\fP,
|
||||
.ti +5n
|
||||
.B pcre_extra *\fIextra\fP, const unsigned char *\fItables\fP);
|
||||
.PP
|
||||
.B " pcre_extra *\fIextra\fP, const unsigned char *\fItables\fP);"
|
||||
.sp
|
||||
.B int pcre16_pattern_to_host_byte_order(pcre16 *\fIcode\fP,
|
||||
.ti +5n
|
||||
.B pcre16_extra *\fIextra\fP, const unsigned char *\fItables\fP);
|
||||
.PP
|
||||
.B " pcre16_extra *\fIextra\fP, const unsigned char *\fItables\fP);"
|
||||
.sp
|
||||
.B int pcre32_pattern_to_host_byte_order(pcre32 *\fIcode\fP,
|
||||
.ti +5n
|
||||
.B pcre32_extra *\fIextra\fP, const unsigned char *\fItables\fP);
|
||||
.B " pcre32_extra *\fIextra\fP, const unsigned char *\fItables\fP);"
|
||||
.fi
|
||||
.
|
||||
.SH DESCRIPTION
|
||||
.rs
|
||||
|
@ -6,18 +6,16 @@ PCRE - Perl-compatible regular expressions
|
||||
.sp
|
||||
.B #include <pcre.h>
|
||||
.PP
|
||||
.SM
|
||||
.nf
|
||||
.B pcre_extra *pcre_study(const pcre *\fIcode\fP, int \fIoptions\fP,
|
||||
.ti +5n
|
||||
.B const char **\fIerrptr\fP);
|
||||
.PP
|
||||
.B " const char **\fIerrptr\fP);"
|
||||
.sp
|
||||
.B pcre16_extra *pcre16_study(const pcre16 *\fIcode\fP, int \fIoptions\fP,
|
||||
.ti +5n
|
||||
.B const char **\fIerrptr\fP);
|
||||
.PP
|
||||
.B " const char **\fIerrptr\fP);"
|
||||
.sp
|
||||
.B pcre32_extra *pcre32_study(const pcre32 *\fIcode\fP, int \fIoptions\fP,
|
||||
.ti +5n
|
||||
.B const char **\fIerrptr\fP);
|
||||
.B " const char **\fIerrptr\fP);"
|
||||
.fi
|
||||
.
|
||||
.SH DESCRIPTION
|
||||
.rs
|
||||
|
@ -6,12 +6,11 @@ PCRE - Perl-compatible regular expressions
|
||||
.sp
|
||||
.B #include <pcre.h>
|
||||
.PP
|
||||
.SM
|
||||
.nf
|
||||
.B int pcre16_utf16_to_host_byte_order(PCRE_UCHAR16 *\fIoutput\fP,
|
||||
.ti +5n
|
||||
.B PCRE_SPTR16 \fIinput\fP, int \fIlength\fP, int *\fIhost_byte_order\fP,
|
||||
.ti +5n
|
||||
.B int \fIkeep_boms\fP);
|
||||
.B " PCRE_SPTR16 \fIinput\fP, int \fIlength\fP, int *\fIhost_byte_order\fP,"
|
||||
.B " int \fIkeep_boms\fP);"
|
||||
.fi
|
||||
.
|
||||
.
|
||||
.SH DESCRIPTION
|
||||
|
@ -6,12 +6,11 @@ PCRE - Perl-compatible regular expressions
|
||||
.sp
|
||||
.B #include <pcre.h>
|
||||
.PP
|
||||
.SM
|
||||
.nf
|
||||
.B int pcre32_utf32_to_host_byte_order(PCRE_UCHAR32 *\fIoutput\fP,
|
||||
.ti +5n
|
||||
.B PCRE_SPTR32 \fIinput\fP, int \fIlength\fP, int *\fIhost_byte_order\fP,
|
||||
.ti +5n
|
||||
.B int \fIkeep_boms\fP);
|
||||
.B " PCRE_SPTR32 \fIinput\fP, int \fIlength\fP, int *\fIhost_byte_order\fP,"
|
||||
.B " int \fIkeep_boms\fP);"
|
||||
.fi
|
||||
.
|
||||
.
|
||||
.SH DESCRIPTION
|
||||
|
@ -1,24 +1,54 @@
|
||||
.TH PCREBUILD 3 "30 October 2012" "PCRE 8.32"
|
||||
.TH PCREBUILD 3 "12 May 2013" "PCRE 8.33"
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.
|
||||
.
|
||||
.SH "BUILDING PCRE"
|
||||
.rs
|
||||
.sp
|
||||
PCRE is distributed with a \fBconfigure\fP script that can be used to build the
|
||||
library in Unix-like environments using the applications known as Autotools.
|
||||
Also in the distribution are files to support building using \fBCMake\fP
|
||||
instead of \fBconfigure\fP. The text file
|
||||
.\" HTML <a href="README.txt">
|
||||
.\" </a>
|
||||
\fBREADME\fP
|
||||
.\"
|
||||
contains general information about building with Autotools (some of which is
|
||||
repeated below), and also has some comments about building on various operating
|
||||
systems. There is a lot more information about building PCRE without using
|
||||
Autotools (including information about using \fBCMake\fP and building "by
|
||||
hand") in the text file called
|
||||
.\" HTML <a href="NON-AUTOTOOLS-BUILD.txt">
|
||||
.\" </a>
|
||||
\fBNON-AUTOTOOLS-BUILD\fP.
|
||||
.\"
|
||||
You should consult this file as well as the
|
||||
.\" HTML <a href="README.txt">
|
||||
.\" </a>
|
||||
\fBREADME\fP
|
||||
.\"
|
||||
file if you are building in a non-Unix-like environment.
|
||||
.
|
||||
.
|
||||
.SH "PCRE BUILD-TIME OPTIONS"
|
||||
.rs
|
||||
.sp
|
||||
This document describes the optional features of PCRE that can be selected when
|
||||
the library is compiled. It assumes use of the \fBconfigure\fP script, where
|
||||
the optional features are selected or deselected by providing options to
|
||||
\fBconfigure\fP before running the \fBmake\fP command. However, the same
|
||||
options can be selected in both Unix-like and non-Unix-like environments using
|
||||
the GUI facility of \fBcmake-gui\fP if you are using \fBCMake\fP instead of
|
||||
\fBconfigure\fP to build PCRE.
|
||||
The rest of this document describes the optional features of PCRE that can be
|
||||
selected when the library is compiled. It assumes use of the \fBconfigure\fP
|
||||
script, where the optional features are selected or deselected by providing
|
||||
options to \fBconfigure\fP before running the \fBmake\fP command. However, the
|
||||
same options can be selected in both Unix-like and non-Unix-like environments
|
||||
using the GUI facility of \fBcmake-gui\fP if you are using \fBCMake\fP instead
|
||||
of \fBconfigure\fP to build PCRE.
|
||||
.P
|
||||
There is a lot more information about building PCRE without using
|
||||
\fBconfigure\fP (including information about using \fBCMake\fP or building "by
|
||||
hand") in the file called \fINON-AUTOTOOLS-BUILD\fP, which is part of the PCRE
|
||||
distribution. You should consult this file as well as the \fIREADME\fP file if
|
||||
you are building in a non-Unix-like environment.
|
||||
If you are not using Autotools or \fBCMake\fP, option selection can be done by
|
||||
editing the \fBconfig.h\fP file, or by passing parameter settings to the
|
||||
compiler, as described in
|
||||
.\" HTML <a href="NON-AUTOTOOLS-BUILD.txt">
|
||||
.\" </a>
|
||||
\fBNON-AUTOTOOLS-BUILD\fP.
|
||||
.\"
|
||||
.P
|
||||
The complete list of options for \fBconfigure\fP (which includes the standard
|
||||
ones such as the selection of the installation directory) can be obtained by
|
||||
@ -45,7 +75,7 @@ strings, by adding
|
||||
.sp
|
||||
--enable-pcre16
|
||||
.sp
|
||||
to the \fBconfigure\fP command. You can also build a separate
|
||||
to the \fBconfigure\fP command. You can also build yet another separate
|
||||
library, called \fBlibpcre32\fP, in which strings are contained in vectors of
|
||||
32-bit data units and interpreted either as single-unit characters or UTF-32
|
||||
strings, by adding
|
||||
@ -65,8 +95,8 @@ an 8-bit program. None of these are built if you select only the 16-bit or
|
||||
.SH "BUILDING SHARED AND STATIC LIBRARIES"
|
||||
.rs
|
||||
.sp
|
||||
The PCRE building process uses \fBlibtool\fP to build both shared and static
|
||||
Unix libraries by default. You can suppress one of these by adding one of
|
||||
The Autotools PCRE building process uses \fBlibtool\fP to build both shared and
|
||||
static libraries by default. You can suppress one of these by adding one of
|
||||
.sp
|
||||
--disable-shared
|
||||
--disable-static
|
||||
@ -515,6 +545,6 @@ Cambridge CB2 3QH, England.
|
||||
.rs
|
||||
.sp
|
||||
.nf
|
||||
Last updated: 30 October 2012
|
||||
Copyright (c) 1997-2012 University of Cambridge.
|
||||
Last updated: 12 May 2013
|
||||
Copyright (c) 1997-2013 University of Cambridge.
|
||||
.fi
|
||||
|
@ -1,4 +1,4 @@
|
||||
.TH PCRECALLOUT 3 "24 June 2012" "PCRE 8.30"
|
||||
.TH PCRECALLOUT 3 "12 November 2013" "PCRE 8.34"
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH SYNOPSIS
|
||||
@ -41,26 +41,64 @@ it is processed as if it were
|
||||
(?C255)A(?C255)((?C255)\ed{2}(?C255)|(?C255)-(?C255)-(?C255))(?C255)
|
||||
.sp
|
||||
Notice that there is a callout before and after each parenthesis and
|
||||
alternation bar. Automatic callouts can be used for tracking the progress of
|
||||
pattern matching. The
|
||||
alternation bar. If the pattern contains a conditional group whose condition is
|
||||
an assertion, an automatic callout is inserted immediately before the
|
||||
condition. Such a callout may also be inserted explicitly, for example:
|
||||
.sp
|
||||
(?(?C9)(?=a)ab|de)
|
||||
.sp
|
||||
This applies only to assertion conditions (because they are themselves
|
||||
independent groups).
|
||||
.P
|
||||
Automatic callouts can be used for tracking the progress of pattern matching.
|
||||
The
|
||||
.\" HREF
|
||||
\fBpcretest\fP
|
||||
.\"
|
||||
command has an option that sets automatic callouts; when it is used, the output
|
||||
indicates how the pattern is matched. This is useful information when you are
|
||||
trying to optimize the performance of a particular pattern.
|
||||
.P
|
||||
The use of callouts in a pattern makes it ineligible for optimization by the
|
||||
just-in-time compiler. Studying such a pattern with the PCRE_STUDY_JIT_COMPILE
|
||||
option always fails.
|
||||
program has a pattern qualifier (/C) that sets automatic callouts; when it is
|
||||
used, the output indicates how the pattern is being matched. This is useful
|
||||
information when you are trying to optimize the performance of a particular
|
||||
pattern.
|
||||
.
|
||||
.
|
||||
.SH "MISSING CALLOUTS"
|
||||
.rs
|
||||
.sp
|
||||
You should be aware that, because of optimizations in the way PCRE matches
|
||||
patterns by default, callouts sometimes do not happen. For example, if the
|
||||
pattern is
|
||||
You should be aware that, because of optimizations in the way PCRE compiles and
|
||||
matches patterns, callouts sometimes do not happen exactly as you might expect.
|
||||
.P
|
||||
At compile time, PCRE "auto-possessifies" repeated items when it knows that
|
||||
what follows cannot be part of the repeat. For example, a+[bc] is compiled as
|
||||
if it were a++[bc]. The \fBpcretest\fP output when this pattern is anchored and
|
||||
then applied with automatic callouts to the string "aaaa" is:
|
||||
.sp
|
||||
--->aaaa
|
||||
+0 ^ ^
|
||||
+1 ^ a+
|
||||
+3 ^ ^ [bc]
|
||||
No match
|
||||
.sp
|
||||
This indicates that when matching [bc] fails, there is no backtracking into a+
|
||||
and therefore the callouts that would be taken for the backtracks do not occur.
|
||||
You can disable the auto-possessify feature by passing PCRE_NO_AUTO_POSSESS
|
||||
to \fBpcre_compile()\fP, or starting the pattern with (*NO_AUTO_POSSESS). If
|
||||
this is done in \fBpcretest\fP (using the /O qualifier), the output changes to
|
||||
this:
|
||||
.sp
|
||||
--->aaaa
|
||||
+0 ^ ^
|
||||
+1 ^ a+
|
||||
+3 ^ ^ [bc]
|
||||
+3 ^ ^ [bc]
|
||||
+3 ^ ^ [bc]
|
||||
+3 ^^ [bc]
|
||||
No match
|
||||
.sp
|
||||
This time, when matching [bc] fails, the matcher backtracks into a+ and tries
|
||||
again, repeatedly, until a+ itself fails.
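
The difference between the two callout traces above can be mimicked with a small hand-written matcher. This is only an illustrative sketch, not PCRE code: the function name count_class_attempts and its possessive flag are hypothetical, and it hard-codes the anchored pattern a+[bc]. It counts how many times the [bc] test is reached, which corresponds to the number of "+3" callout lines in the traces.

```c
#include <stddef.h>

/* Hypothetical helper: count how often the [bc] test is attempted when the
   anchored pattern a+[bc] is applied to subject. If possessive is non-zero,
   a+ never gives back characters (the a++ behaviour). */
int count_class_attempts(const char *subject, int possessive)
{
    size_t n = 0;
    int attempts = 0;

    while (subject[n] == 'a') n++;          /* greedy a+ takes every 'a' */
    if (n == 0) return 0;                   /* a+ needs at least one 'a' */

    for (;;)
    {
        attempts++;                         /* a callout would fire here */
        if (subject[n] == 'b' || subject[n] == 'c')
            return attempts;                /* [bc] matched */
        if (possessive || n == 1)
            return attempts;                /* cannot or may not backtrack */
        n--;                                /* give back one 'a' and retry */
    }
}
```

With the subject "aaaa" the possessive variant reaches [bc] once and the backtracking variant reaches it four times, in line with the single "+3" line in the first trace and the four "+3" lines in the second.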
|
||||
.P
|
||||
Other optimizations that provide fast "no match" results also affect callouts.
|
||||
For example, if the pattern is
|
||||
.sp
|
||||
ab(?C4)cd
|
||||
.sp
|
||||
@ -84,11 +122,11 @@ callouts such as the example above are obeyed.
|
||||
.rs
|
||||
.sp
|
||||
During matching, when PCRE reaches a callout point, the external function
|
||||
defined by \fIpcre_callout\fP or \fIpcre[16|32]_callout\fP is called
|
||||
(if it is set). This applies to both normal and DFA matching. The only
|
||||
argument to the callout function is a pointer to a \fBpcre_callout\fP
|
||||
or \fBpcre[16|32]_callout\fP block.
|
||||
These structures contain the following fields:
|
||||
defined by \fIpcre_callout\fP or \fIpcre[16|32]_callout\fP is called (if it is
|
||||
set). This applies to both normal and DFA matching. The only argument to the
|
||||
callout function is a pointer to a \fBpcre_callout\fP or
|
||||
\fBpcre[16|32]_callout\fP block. These structures contain the following
|
||||
fields:
|
||||
.sp
|
||||
int \fIversion\fP;
|
||||
int \fIcallout_number\fP;
|
||||
@ -119,10 +157,10 @@ automatically generated callouts).
|
||||
.P
|
||||
The \fIoffset_vector\fP field is a pointer to the vector of offsets that was
|
||||
passed by the caller to the matching function. When \fBpcre_exec()\fP or
|
||||
\fBpcre[16|32]_exec()\fP is used, the contents can be inspected, in order to extract
|
||||
substrings that have been matched so far, in the same way as for extracting
|
||||
substrings after a match has completed. For the DFA matching functions, this
|
||||
field is not useful.
|
||||
\fBpcre[16|32]_exec()\fP is used, the contents can be inspected, in order to
|
||||
extract substrings that have been matched so far, in the same way as for
|
||||
extracting substrings after a match has completed. For the DFA matching
|
||||
functions, this field is not useful.
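
As a rough sketch of how a callout function might read a substring captured so far, the helper below copies group n out of the subject using the offset pairs. The name copy_capture_so_far and its buffer parameters are hypothetical. It assumes the usual pcre_exec() conventions (offsets stored in pairs, -1 for an unset group) and, as a further assumption of this sketch, treats pair 0 as unavailable because the overall match is not yet complete during a callout.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy captured substring n (n >= 1) into buf as a
   zero-terminated string. Returns its length, or -1 if the group is out of
   range, not set, or does not fit in buf. */
int copy_capture_so_far(const char *subject, const int *offset_vector,
                        int capture_top, int n, char *buf, size_t buflen)
{
    int start, end;
    size_t len;

    if (n < 1 || n >= capture_top) return -1;   /* not captured so far */
    start = offset_vector[2*n];
    end = offset_vector[2*n + 1];
    if (start < 0 || end < start) return -1;    /* group is unset */

    len = (size_t)(end - start);
    if (len + 1 > buflen) return -1;            /* no room for the copy */
    memcpy(buf, subject + start, len);
    buf[len] = '\0';
    return (int)len;
}
```

A callout function could pass the subject, offset_vector and capture_top fields of its callout block straight through to such a helper.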
|
||||
.P
|
||||
The \fIsubject\fP and \fIsubject_length\fP fields contain copies of the values
|
||||
that were passed to the matching function.
|
||||
@ -144,8 +182,10 @@ value of \fIcapture_top\fP is one. This is always the case when the DFA
|
||||
functions are used, because they do not support captured substrings.
|
||||
.P
|
||||
The \fIcapture_last\fP field contains the number of the most recently captured
|
||||
substring. If no substrings have been captured, its value is -1. This is always
|
||||
the case for the DFA matching functions.
|
||||
substring. However, when a recursion exits, the value reverts to what it was
|
||||
outside the recursion, as do the values of all captured substrings. If no
|
||||
substrings have been captured, the value of \fIcapture_last\fP is -1. This is
|
||||
always the case for the DFA matching functions.
|
||||
.P
|
||||
The \fIcallout_data\fP field contains a value that is passed to a matching
|
||||
function specifically so that it can be passed back in callouts. It is passed
|
||||
@ -173,11 +213,12 @@ help in distinguishing between different automatic callouts, which all have the
|
||||
same callout number. However, they are set for all callouts.
|
||||
.P
|
||||
The \fImark\fP field is present from version 2 of the callout structure. In
|
||||
callouts from \fBpcre_exec()\fP or \fBpcre[16|32]_exec()\fP it contains a pointer to
|
||||
the zero-terminated name of the most recently passed (*MARK), (*PRUNE), or
|
||||
(*THEN) item in the match, or NULL if no such items have been passed. Instances
|
||||
of (*PRUNE) or (*THEN) without a name do not obliterate a previous (*MARK). In
|
||||
callouts from the DFA matching functions this field always contains NULL.
|
||||
callouts from \fBpcre_exec()\fP or \fBpcre[16|32]_exec()\fP it contains a
|
||||
pointer to the zero-terminated name of the most recently passed (*MARK),
|
||||
(*PRUNE), or (*THEN) item in the match, or NULL if no such items have been
|
||||
passed. Instances of (*PRUNE) or (*THEN) without a name do not obliterate a
|
||||
previous (*MARK). In callouts from the DFA matching functions this field always
|
||||
contains NULL.
|
||||
.
|
||||
.
|
||||
.SH "RETURN VALUES"
|
||||
@ -209,6 +250,6 @@ Cambridge CB2 3QH, England.
|
||||
.rs
|
||||
.sp
|
||||
.nf
|
||||
Last updated: 24 June 2012
|
||||
Copyright (c) 1997-2012 University of Cambridge.
|
||||
Last updated: 12 November 2013
|
||||
Copyright (c) 1997-2013 University of Cambridge.
|
||||
.fi
|
||||
|
@ -1,4 +1,4 @@
|
||||
.TH PCRECOMPAT 3 "24 June 2012" "PCRE 8.30"
|
||||
.TH PCRECOMPAT 3 "10 November 2013" "PCRE 8.34"
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH "DIFFERENCES BETWEEN PCRE AND PERL"
|
||||
@ -23,10 +23,8 @@ just once). Perl allows repeat quantifiers on other assertions such as \eb, but
|
||||
these do not seem to have any use.
|
||||
.P
|
||||
3. Capturing subpatterns that occur inside negative lookahead assertions are
|
||||
counted, but their entries in the offsets vector are never set. Perl sets its
|
||||
numerical variables from any such patterns that are matched before the
|
||||
assertion fails to match something (thereby succeeding), but only if the
|
||||
negative lookahead assertion contains just one branch.
|
||||
counted, but their entries in the offsets vector are never set. Perl sometimes
|
||||
(but not always) sets its numerical variables from inside negative assertions.
|
||||
.P
|
||||
4. Though binary zero characters are supported in the subject string, they are
|
||||
not allowed in a pattern string because it is passed as a normal C string,
|
||||
@ -91,22 +89,28 @@ in the
|
||||
.\"
|
||||
page.
|
||||
.P
|
||||
10. If any of the backtracking control verbs are used in an assertion or in a
|
||||
subpattern that is called as a subroutine (whether or not recursively), their
|
||||
effect is confined to that subpattern; it does not extend to the surrounding
|
||||
pattern. This is not always the case in Perl. In particular, if (*THEN) is
|
||||
present in a group that is called as a subroutine, its action is limited to
|
||||
that group, even if the group does not contain any | characters. There is one
|
||||
exception to this: the name from a (*MARK), (*PRUNE), or (*THEN) that is
|
||||
encountered in a successful positive assertion \fIis\fP passed back when a
|
||||
match succeeds (compare capturing parentheses in assertions). Note that such
|
||||
subpatterns are processed as anchored at the point where they are tested.
|
||||
10. If any of the backtracking control verbs are used in a subpattern that is
|
||||
called as a subroutine (whether or not recursively), their effect is confined
|
||||
to that subpattern; it does not extend to the surrounding pattern. This is not
|
||||
always the case in Perl. In particular, if (*THEN) is present in a group that
|
||||
is called as a subroutine, its action is limited to that group, even if the
|
||||
group does not contain any | characters. Note that such subpatterns are
|
||||
processed as anchored at the point where they are tested.
|
||||
.P
|
||||
11. There are some differences that are concerned with the settings of captured
|
||||
11. If a pattern contains more than one backtracking control verb, the first
|
||||
one that is backtracked onto acts. For example, in the pattern
|
||||
A(*COMMIT)B(*PRUNE)C a failure in B triggers (*COMMIT), but a failure in C
|
||||
triggers (*PRUNE). Perl's behaviour is more complex; in many cases it is the
|
||||
same as PCRE, but there are examples where it differs.
|
||||
.P
|
||||
12. Most backtracking verbs in assertions have their normal actions. They are
|
||||
not confined to the assertion.
|
||||
.P
|
||||
13. There are some differences that are concerned with the settings of captured
|
||||
strings when part of a pattern is repeated. For example, matching "aba" against
|
||||
the pattern /^(a(b)?)+$/ in Perl leaves $2 unset, but in PCRE it is set to "b".
|
||||
.P
|
||||
12. PCRE's handling of duplicate subpattern numbers and duplicate subpattern
|
||||
14. PCRE's handling of duplicate subpattern numbers and duplicate subpattern
|
||||
names is not as general as Perl's. This is a consequence of the fact that PCRE
|
||||
works internally just with numbers, using an external table to translate
|
||||
between numbers and names. In particular, a pattern such as (?|(?<a>A)|(?<b>B)),
|
||||
@ -116,12 +120,23 @@ would not be possible to distinguish which parentheses matched, because both
|
||||
names map to capturing subpattern number 1. To avoid this confusing situation,
|
||||
an error is given at compile time.
|
||||
.P
|
||||
13. Perl recognizes comments in some places that PCRE does not, for example,
|
||||
15. Perl recognizes comments in some places that PCRE does not, for example,
|
||||
between the ( and ? at the start of a subpattern. If the /x modifier is set,
|
||||
Perl allows white space between ( and ? but PCRE never does, even if the
|
||||
PCRE_EXTENDED option is set.
|
||||
Perl allows white space between ( and ? (though current Perls warn that this is
|
||||
deprecated) but PCRE never does, even if the PCRE_EXTENDED option is set.
|
||||
.P
|
||||
14. PCRE provides some extensions to the Perl regular expression facilities.
|
||||
16. Perl, when in warning mode, gives warnings for character classes such as
|
||||
[A-\ed] or [a-[:digit:]]. It then treats the hyphens as literals. PCRE has no
|
||||
warning features, so it gives an error in these cases because they are almost
|
||||
certainly user mistakes.
|
||||
.P
|
||||
17. In PCRE, the upper/lower case character properties Lu and Ll are not
|
||||
affected when case-independent matching is specified. For example, \ep{Lu}
|
||||
always matches an upper case letter. I think Perl has changed in this respect;
|
||||
in the release at the time of writing (5.16), \ep{Lu} and \ep{Ll} match all
|
||||
letters, regardless of case, when case independence is specified.
|
||||
.P
|
||||
18. PCRE provides some extensions to the Perl regular expression facilities.
|
||||
Perl 5.10 includes new features that are not in earlier versions of Perl, some
|
||||
of which (such as named parentheses) have been in PCRE for some time. This list
|
||||
is with respect to Perl 5.10:
|
||||
@ -180,6 +195,6 @@ Cambridge CB2 3QH, England.
|
||||
.rs
|
||||
.sp
|
||||
.nf
|
||||
Last updated: 25 August 2012
|
||||
Copyright (c) 1997-2012 University of Cambridge.
|
||||
Last updated: 10 November 2013
|
||||
Copyright (c) 1997-2013 University of Cambridge.
|
||||
.fi
|
||||
|
424
tools/pcre/doc/pcredemo.3
Normal file
@ -0,0 +1,424 @@
|
||||
.\" Start example.
|
||||
.de EX
|
||||
. nr mE \\n(.f
|
||||
. nf
|
||||
. nh
|
||||
. ft CW
|
||||
..
|
||||
.
|
||||
.
|
||||
.\" End example.
|
||||
.de EE
|
||||
. ft \\n(mE
|
||||
. fi
|
||||
. hy \\n(HY
|
||||
..
|
||||
.
|
||||
.EX
|
||||
/*************************************************
|
||||
* PCRE DEMONSTRATION PROGRAM *
|
||||
*************************************************/
|
||||
|
||||
/* This is a demonstration program to illustrate the most straightforward ways
|
||||
of calling the PCRE regular expression library from a C program. See the
|
||||
pcresample documentation for a short discussion ("man pcresample" if you have
|
||||
the PCRE man pages installed).
|
||||
|
||||
In Unix-like environments, if PCRE is installed in your standard system
|
||||
libraries, you should be able to compile this program using this command:
|
||||
|
||||
gcc -Wall pcredemo.c -lpcre -o pcredemo
|
||||
|
||||
If PCRE is not installed in a standard place, it is likely to be installed with
|
||||
support for the pkg-config mechanism. If you have pkg-config, you can compile
|
||||
this program using this command:
|
||||
|
||||
gcc -Wall pcredemo.c `pkg-config --cflags --libs libpcre` -o pcredemo
|
||||
|
||||
If you do not have pkg-config, you may have to use this:
|
||||
|
||||
gcc -Wall pcredemo.c -I/usr/local/include -L/usr/local/lib \e
|
||||
-R/usr/local/lib -lpcre -o pcredemo
|
||||
|
||||
Replace "/usr/local/include" and "/usr/local/lib" with wherever the include and
|
||||
library files for PCRE are installed on your system. Only some operating
|
||||
systems (e.g. Solaris) use the -R option.
|
||||
|
||||
Building under Windows:
|
||||
|
||||
If you want to statically link this program against a non-dll .a file, you must
|
||||
define PCRE_STATIC before including pcre.h, otherwise the pcre_malloc() and
|
||||
pcre_free() exported functions will be declared __declspec(dllimport), with
|
||||
unwanted results. So in this environment, uncomment the following line. */
|
||||
|
||||
/* #define PCRE_STATIC */
|
||||
|
||||
#include <stdio.h>
|
||||
#include <string.h>
|
||||
#include <pcre.h>
|
||||
|
||||
#define OVECCOUNT 30 /* should be a multiple of 3 */
|
||||
|
||||
|
||||
int main(int argc, char **argv)
|
||||
{
|
||||
pcre *re;
|
||||
const char *error;
|
||||
char *pattern;
|
||||
char *subject;
|
||||
unsigned char *name_table;
|
||||
unsigned long int option_bits; /* PCRE_INFO_OPTIONS fills an unsigned long int */
|
||||
int erroffset;
|
||||
int find_all;
|
||||
int crlf_is_newline;
|
||||
int namecount;
|
||||
int name_entry_size;
|
||||
int ovector[OVECCOUNT];
|
||||
int subject_length;
|
||||
int rc, i;
|
||||
int utf8;
|
||||
|
||||
|
||||
/**************************************************************************
|
||||
* First, sort out the command line. There is only one possible option at *
|
||||
* the moment, "-g" to request repeated matching to find all occurrences, *
|
||||
* like Perl's /g option. We set the variable find_all to a non-zero value *
|
||||
* if the -g option is present. Apart from that, there must be exactly two *
|
||||
* arguments. *
|
||||
**************************************************************************/
|
||||
|
||||
find_all = 0;
|
||||
for (i = 1; i < argc; i++)
|
||||
{
|
||||
if (strcmp(argv[i], "-g") == 0) find_all = 1;
|
||||
else break;
|
||||
}
|
||||
|
||||
/* After the options, we require exactly two arguments, which are the pattern,
|
||||
and the subject string. */
|
||||
|
||||
if (argc - i != 2)
|
||||
{
|
||||
printf("Two arguments required: a regex and a subject string\en");
|
||||
return 1;
|
||||
}
|
||||
|
||||
pattern = argv[i];
|
||||
subject = argv[i+1];
|
||||
subject_length = (int)strlen(subject);
|
||||
|
||||
|
||||
/*************************************************************************
|
||||
* Now we are going to compile the regular expression pattern, and handle *
|
||||
* any errors that are detected. *
|
||||
*************************************************************************/
|
||||
|
||||
re = pcre_compile(
|
||||
pattern, /* the pattern */
|
||||
0, /* default options */
|
||||
&error, /* for error message */
|
||||
&erroffset, /* for error offset */
|
||||
NULL); /* use default character tables */
|
||||
|
||||
/* Compilation failed: print the error message and exit */
|
||||
|
||||
if (re == NULL)
|
||||
{
|
||||
printf("PCRE compilation failed at offset %d: %s\en", erroffset, error);
|
||||
return 1;
|
||||
}
|
||||
|
||||
|
||||
/*************************************************************************
|
||||
* If the compilation succeeded, we call PCRE again, in order to do a *
|
||||
* pattern match against the subject string. This does just ONE match. If *
|
||||
* further matching is needed, it will be done below. *
|
||||
*************************************************************************/
|
||||
|
||||
rc = pcre_exec(
|
||||
re, /* the compiled pattern */
|
||||
NULL, /* no extra data - we didn't study the pattern */
|
||||
subject, /* the subject string */
|
||||
subject_length, /* the length of the subject */
|
||||
0, /* start at offset 0 in the subject */
|
||||
0, /* default options */
|
||||
ovector, /* output vector for substring information */
|
||||
OVECCOUNT); /* number of elements in the output vector */
|
||||
|
||||
/* Matching failed: handle error cases */
|
||||
|
||||
if (rc < 0)
|
||||
{
|
||||
switch(rc)
|
||||
{
|
||||
case PCRE_ERROR_NOMATCH: printf("No match\en"); break;
|
||||
/*
|
||||
Handle other special cases if you like
|
||||
*/
|
||||
default: printf("Matching error %d\en", rc); break;
|
||||
}
|
||||
pcre_free(re); /* Release memory used for the compiled pattern */
|
||||
return 1;
|
||||
}
|
||||
|
||||
/* Match succeeded */
|
||||
|
||||
printf("\enMatch succeeded at offset %d\en", ovector[0]);
|
||||
|
||||
|
||||
/*************************************************************************
|
||||
* We have found the first match within the subject string. If the output *
|
||||
* vector wasn't big enough, say so. Then output any substrings that were *
|
||||
* captured. *
|
||||
*************************************************************************/
|
||||
|
||||
/* The output vector wasn't big enough */
|
||||
|
||||
if (rc == 0)
|
||||
{
|
||||
rc = OVECCOUNT/3;
|
||||
printf("ovector only has room for %d captured substrings\en", rc - 1);
|
||||
}
|
||||
|
||||
/* Show substrings stored in the output vector by number. Obviously, in a real
|
||||
application you might want to do things other than print them. */
|
||||
|
||||
for (i = 0; i < rc; i++)
|
||||
{
|
||||
char *substring_start = subject + ovector[2*i];
|
||||
int substring_length = ovector[2*i+1] - ovector[2*i];
|
||||
printf("%2d: %.*s\en", i, substring_length, substring_start);
|
||||
}
|
||||
|
||||
|
||||
/**************************************************************************
|
||||
* That concludes the basic part of this demonstration program. We have *
|
||||
* compiled a pattern, and performed a single match. The code that follows *
|
||||
* shows first how to access named substrings, and then how to code for *
|
||||
* repeated matches on the same subject. *
|
||||
**************************************************************************/
|
||||
|
||||
/* See if there are any named substrings, and if so, show them by name. First
|
||||
we have to extract the count of named parentheses from the pattern. */
|
||||
|
||||
(void)pcre_fullinfo(
|
||||
re, /* the compiled pattern */
|
||||
NULL, /* no extra data - we didn't study the pattern */
|
||||
PCRE_INFO_NAMECOUNT, /* number of named substrings */
|
||||
&namecount); /* where to put the answer */
|
||||
|
||||
if (namecount <= 0) printf("No named substrings\en"); else
|
||||
{
|
||||
unsigned char *tabptr;
|
||||
printf("Named substrings\en");
|
||||
|
||||
/* Before we can access the substrings, we must extract the table for
|
||||
translating names to numbers, and the size of each entry in the table. */
|
||||
|
||||
(void)pcre_fullinfo(
|
||||
re, /* the compiled pattern */
|
||||
NULL, /* no extra data - we didn't study the pattern */
|
||||
PCRE_INFO_NAMETABLE, /* address of the table */
|
||||
&name_table); /* where to put the answer */
|
||||
|
||||
(void)pcre_fullinfo(
|
||||
re, /* the compiled pattern */
|
||||
NULL, /* no extra data - we didn't study the pattern */
|
||||
PCRE_INFO_NAMEENTRYSIZE, /* size of each entry in the table */
|
||||
&name_entry_size); /* where to put the answer */
|
||||
|
||||
/* Now we can scan the table and, for each entry, print the number, the name,
|
||||
and the substring itself. */
|
||||
|
||||
tabptr = name_table;
|
||||
for (i = 0; i < namecount; i++)
|
||||
{
|
||||
int n = (tabptr[0] << 8) | tabptr[1];
|
||||
printf("(%d) %*s: %.*s\en", n, name_entry_size - 3, tabptr + 2,
|
||||
ovector[2*n+1] - ovector[2*n], subject + ovector[2*n]);
|
||||
tabptr += name_entry_size;
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
/*************************************************************************
|
||||
* If the "-g" option was given on the command line, we want to continue *
|
||||
* to search for additional matches in the subject string, in a similar *
|
||||
* way to the /g option in Perl. This turns out to be trickier than you *
|
||||
* might think because of the possibility of matching an empty string. *
|
||||
* What happens is as follows: *
|
||||
* *
|
||||
* If the previous match was NOT for an empty string, we can just start *
|
||||
* the next match at the end of the previous one. *
|
||||
* *
|
||||
* If the previous match WAS for an empty string, we can't do that, as it *
|
||||
* would lead to an infinite loop. Instead, a special call of pcre_exec() *
|
||||
* is made with the PCRE_NOTEMPTY_ATSTART and PCRE_ANCHORED flags set. *
|
||||
* The first of these tells PCRE that an empty string at the start of the *
|
||||
* subject is not a valid match; other possibilities must be tried. The *
|
||||
* second flag restricts PCRE to one match attempt at the initial string *
|
||||
* position. If this match succeeds, an alternative to the empty string *
|
||||
* match has been found, and we can print it and proceed round the loop, *
|
||||
* advancing by the length of whatever was found. If this match does not *
|
||||
* succeed, we still stay in the loop, advancing by just one character. *
|
||||
* In UTF-8 mode, which can be set by (*UTF8) in the pattern, this may be *
|
||||
* more than one byte. *
|
||||
* *
|
||||
* However, there is a complication concerned with newlines. When the *
|
||||
* newline convention is such that CRLF is a valid newline, we must *
|
||||
* advance by two characters rather than one. The newline convention can *
|
||||
* be set in the regex by (*CR), etc.; if not, we must find the default. *
|
||||
*************************************************************************/
|
||||
|
||||
if (!find_all) /* Check for -g */
|
||||
{
|
||||
pcre_free(re); /* Release the memory used for the compiled pattern */
|
||||
return 0; /* Finish unless -g was given */
|
||||
}
|
||||
|
||||
/* Before running the loop, check for UTF-8 and whether CRLF is a valid newline
|
||||
sequence. First, find the options with which the regex was compiled; extract
|
||||
the UTF-8 state, and mask off all but the newline options. */
|
||||
|
||||
(void)pcre_fullinfo(re, NULL, PCRE_INFO_OPTIONS, &option_bits);
|
||||
utf8 = option_bits & PCRE_UTF8;
|
||||
option_bits &= PCRE_NEWLINE_CR|PCRE_NEWLINE_LF|PCRE_NEWLINE_CRLF|
|
||||
PCRE_NEWLINE_ANY|PCRE_NEWLINE_ANYCRLF;
|
||||
|
||||
/* If no newline options were set, find the default newline convention from the
|
||||
build configuration. */
|
||||
|
||||
if (option_bits == 0)
|
||||
{
|
||||
int d;
|
||||
(void)pcre_config(PCRE_CONFIG_NEWLINE, &d);
|
||||
/* Note that these values are always the ASCII ones, even in
|
||||
EBCDIC environments. CR = 13, NL = 10. */
|
||||
option_bits = (d == 13)? PCRE_NEWLINE_CR :
|
||||
(d == 10)? PCRE_NEWLINE_LF :
|
||||
(d == (13<<8 | 10))? PCRE_NEWLINE_CRLF :
|
||||
(d == -2)? PCRE_NEWLINE_ANYCRLF :
|
||||
(d == -1)? PCRE_NEWLINE_ANY : 0;
|
||||
}
|
||||
|
||||
/* See if CRLF is a valid newline sequence. */
|
||||
|
||||
crlf_is_newline =
|
||||
option_bits == PCRE_NEWLINE_ANY ||
|
||||
option_bits == PCRE_NEWLINE_CRLF ||
|
||||
option_bits == PCRE_NEWLINE_ANYCRLF;
|
||||
|
||||
/* Loop for second and subsequent matches */
|
||||
|
||||
for (;;)
|
||||
{
|
||||
int options = 0; /* Normally no options */
|
||||
int start_offset = ovector[1]; /* Start at end of previous match */
|
||||
|
||||
/* If the previous match was for an empty string, we are finished if we are
|
||||
at the end of the subject. Otherwise, arrange to run another match at the
|
||||
same point to see if a non-empty match can be found. */
|
||||
|
||||
if (ovector[0] == ovector[1])
|
||||
{
|
||||
if (ovector[0] == subject_length) break;
|
||||
options = PCRE_NOTEMPTY_ATSTART | PCRE_ANCHORED;
|
||||
}
|
||||
|
||||
/* Run the next matching operation */
|
||||
|
||||
rc = pcre_exec(
|
||||
re, /* the compiled pattern */
|
||||
NULL, /* no extra data - we didn't study the pattern */
|
||||
subject, /* the subject string */
|
||||
subject_length, /* the length of the subject */
|
||||
start_offset, /* starting offset in the subject */
|
||||
options, /* options */
|
||||
ovector, /* output vector for substring information */
|
||||
OVECCOUNT); /* number of elements in the output vector */
|
||||
|
||||
/* This time, a result of NOMATCH isn't an error. If the value in "options"
|
||||
is zero, it just means we have found all possible matches, so the loop ends.
|
||||
Otherwise, it means we have failed to find a non-empty-string match at a
|
||||
point where there was a previous empty-string match. In this case, we do what
|
||||
Perl does: advance the matching position by one character, and continue. We
|
||||
do this by setting the "end of previous match" offset, because that is picked
|
||||
up at the top of the loop as the point at which to start again.
|
||||
|
||||
There are two complications: (a) When CRLF is a valid newline sequence, and
|
||||
the current position is just before it, advance by an extra byte. (b)
|
||||
Otherwise we must ensure that we skip an entire UTF-8 character if we are in
|
||||
UTF-8 mode. */
|
||||
|
||||
if (rc == PCRE_ERROR_NOMATCH)
|
||||
{
|
||||
if (options == 0) break; /* All matches found */
|
||||
ovector[1] = start_offset + 1; /* Advance one byte */
|
||||
if (crlf_is_newline && /* If CRLF is newline & */
|
||||
start_offset < subject_length - 1 && /* we are at CRLF, */
|
||||
subject[start_offset] == '\er' &&
|
||||
subject[start_offset + 1] == '\en')
|
||||
ovector[1] += 1; /* Advance by one more. */
|
||||
else if (utf8) /* Otherwise, ensure we */
|
||||
{ /* advance a whole UTF-8 */
|
||||
while (ovector[1] < subject_length) /* character. */
|
||||
{
|
||||
if ((subject[ovector[1]] & 0xc0) != 0x80) break;
|
||||
ovector[1] += 1;
|
||||
}
|
||||
}
|
||||
continue; /* Go round the loop again */
|
||||
}
|
||||
|
||||
/* Other matching errors are not recoverable. */
|
||||
|
||||
if (rc < 0)
|
||||
{
|
||||
printf("Matching error %d\en", rc);
|
||||
pcre_free(re); /* Release memory used for the compiled pattern */
|
||||
return 1;
|
||||
}
|
||||
|
||||
/* Match succeeded */
|
||||
|
||||
printf("\enMatch succeeded again at offset %d\en", ovector[0]);
|
||||
|
||||
/* The match succeeded, but the output vector wasn't big enough. */
|
||||
|
||||
if (rc == 0)
|
||||
{
|
||||
rc = OVECCOUNT/3;
|
||||
printf("ovector only has room for %d captured substrings\en", rc - 1);
|
||||
}
|
||||
|
||||
/* As before, show substrings stored in the output vector by number, and then
|
||||
also any named substrings. */
|
||||
|
||||
for (i = 0; i < rc; i++)
|
||||
{
|
||||
char *substring_start = subject + ovector[2*i];
|
||||
int substring_length = ovector[2*i+1] - ovector[2*i];
|
||||
printf("%2d: %.*s\en", i, substring_length, substring_start);
|
||||
}
|
||||
|
||||
if (namecount <= 0) printf("No named substrings\en"); else
|
||||
{
|
||||
unsigned char *tabptr = name_table;
|
||||
printf("Named substrings\en");
|
||||
for (i = 0; i < namecount; i++)
|
||||
{
|
||||
int n = (tabptr[0] << 8) | tabptr[1];
|
||||
printf("(%d) %*s: %.*s\en", n, name_entry_size - 3, tabptr + 2,
|
||||
ovector[2*n+1] - ovector[2*n], subject + ovector[2*n]);
|
||||
tabptr += name_entry_size;
|
||||
}
|
||||
}
|
||||
} /* End of loop to find second and subsequent matches */
|
||||
|
||||
printf("\en");
|
||||
pcre_free(re); /* Release memory used for the compiled pattern */
|
||||
return 0;
|
||||
}
|
||||
|
||||
/* End of pcredemo.c */
|
||||
.EE
|
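The NOMATCH branch of the pcredemo loop above decides how far to advance after an empty-string match. As a minimal sketch, that decision can be isolated in a helper that needs no PCRE calls. The function name `next_start_after_empty_match` and its parameter list are hypothetical, chosen for illustration; only the logic is taken from the listing.

```c
#include <assert.h>

/* Hypothetical helper (not part of PCRE): given the offset at which an
   empty-string match was found, return the offset at which the next
   match attempt should start. Mirrors the NOMATCH branch of pcredemo. */
int next_start_after_empty_match(const unsigned char *subject,
                                 int subject_length,
                                 int start_offset,
                                 int crlf_is_newline,
                                 int utf8)
{
  int next;

  /* pcredemo leaves its loop before this point when the empty match is
     at the very end of the subject, so the offset is always in range. */
  assert(start_offset >= 0 && start_offset < subject_length);

  next = start_offset + 1;                    /* Advance one byte */

  if (crlf_is_newline &&
      start_offset < subject_length - 1 &&
      subject[start_offset] == '\r' &&
      subject[start_offset + 1] == '\n')
    next += 1;                                /* Step over the LF as well */
  else if (utf8)
    {
    /* Skip UTF-8 continuation bytes, which have the form 10xxxxxx */
    while (next < subject_length && (subject[next] & 0xc0) == 0x80)
      next += 1;
    }

  return next;
}
```

In pcredemo the returned value corresponds to what is stored in ovector[1] before the loop goes round again. Keeping the rule in one function makes it easy to check the CRLF and UTF-8 cases separately from the matching code.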
@ -1,4 +1,4 @@
|
||||
.TH PCREGREP 1 "13 September 2012" "PCRE 8.32"
|
||||
.TH PCREGREP 1 "03 April 2014" "PCRE 8.35"
|
||||
.SH NAME
|
||||
pcregrep - a grep with Perl-compatible regular expressions.
|
||||
.SH SYNOPSIS
|
||||
@ -11,9 +11,13 @@ pcregrep - a grep with Perl-compatible regular expressions.
|
||||
grep commands do, but it uses the PCRE regular expression library to support
|
||||
patterns that are compatible with the regular expressions of Perl 5. See
|
||||
.\" HREF
|
||||
\fBpcresyntax\fP(3)
|
||||
.\"
|
||||
for a quick-reference summary of pattern syntax, or
|
||||
.\" HREF
|
||||
\fBpcrepattern\fP(3)
|
||||
.\"
|
||||
for a full description of syntax and semantics of the regular expressions
|
||||
for a full description of the syntax and semantics of the regular expressions
|
||||
that PCRE supports.
|
||||
.P
|
||||
Patterns, whether supplied on the command line or in a separate file, are given
|
||||
@ -674,6 +678,6 @@ Cambridge CB2 3QH, England.
|
||||
.rs
|
||||
.sp
|
||||
.nf
|
||||
Last updated: 13 September 2012
|
||||
Copyright (c) 1997-2012 University of Cambridge.
|
||||
Last updated: 03 April 2014
|
||||
Copyright (c) 1997-2014 University of Cambridge.
|
||||
.fi
|
||||
|
File diff suppressed because it is too large
@ -1,4 +1,4 @@
|
||||
.TH PCREJIT 3 "31 October 2012" "PCRE 8.32"
|
||||
.TH PCREJIT 3 "17 March 2013" "PCRE 8.33"
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH "PCRE JUST-IN-TIME COMPILER SUPPORT"
|
||||
@ -151,15 +151,9 @@ PCRE_NO_UTF8_CHECK, PCRE_NO_UTF16_CHECK, PCRE_NO_UTF32_CHECK, PCRE_NOTBOL,
|
||||
PCRE_NOTEOL, PCRE_NOTEMPTY, PCRE_NOTEMPTY_ATSTART, PCRE_PARTIAL_HARD, and
|
||||
PCRE_PARTIAL_SOFT.
|
||||
.P
|
||||
The unsupported pattern items are:
|
||||
.sp
|
||||
\eC match a single byte; not supported in UTF-8 mode
|
||||
(?Cn) callouts
|
||||
(*PRUNE) )
|
||||
(*SKIP) ) backtracking control verbs
|
||||
(*THEN) )
|
||||
.sp
|
||||
Support for some of these may be added in future.
|
||||
The only unsupported pattern items are \eC (match a single data unit) when
|
||||
running in a UTF mode, and a callout immediately before an assertion condition
|
||||
in a conditional group.
|
||||
.
|
||||
.
|
||||
.SH "RETURN VALUES FROM JIT EXECUTION"
|
||||
@ -432,6 +426,6 @@ Cambridge CB2 3QH, England.
|
||||
.rs
|
||||
.sp
|
||||
.nf
|
||||
Last updated: 31 October 2012
|
||||
Copyright (c) 1997-2012 University of Cambridge.
|
||||
Last updated: 17 March 2013
|
||||
Copyright (c) 1997-2013 University of Cambridge.
|
||||
.fi
|
||||
|
@ -1,4 +1,4 @@
|
||||
.TH PCRELIMITS 3 "24 June 2012" "PCRE 8.30"
|
||||
.TH PCRELIMITS 3 "05 November 2013" "PCRE 8.34"
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH "SIZE AND OTHER LIMITATIONS"
|
||||
@ -8,9 +8,10 @@ There are some size limitations in PCRE but it is hoped that they will never in
|
||||
practice be relevant.
|
||||
.P
|
||||
The maximum length of a compiled pattern is approximately 64K data units (bytes
|
||||
for the 8-bit library, 32-bit units for the 32-bit library, and 32-bit units for
|
||||
the 32-bit library) if PCRE is compiled with the default internal linkage size
|
||||
of 2 bytes. If you want to process regular expressions that are truly enormous,
|
||||
for the 8-bit library, 16-bit units for the 16-bit library, and 32-bit units for
|
||||
the 32-bit library) if PCRE is compiled with the default internal linkage size,
|
||||
which is 2 bytes for the 8-bit and 16-bit libraries, and 4 bytes for the 32-bit
|
||||
library. If you want to process regular expressions that are truly enormous,
|
||||
you can compile PCRE with an internal linkage size of 3 or 4 (when building the
|
||||
16-bit or 32-bit library, 3 is rounded up to 4). See the \fBREADME\fP file in
|
||||
the source distribution and the
|
||||
@ -23,7 +24,10 @@ However, the speed of execution is slower.
|
||||
All values in repeating quantifiers must be less than 65536.
|
||||
.P
|
||||
There is no limit to the number of parenthesized subpatterns, but there can be
|
||||
no more than 65535 capturing subpatterns.
|
||||
no more than 65535 capturing subpatterns. There is, however, a limit to the
|
||||
depth of nesting of parenthesized subpatterns of all kinds. This is imposed in
|
||||
order to limit the amount of system stack used at compile time. The limit can
|
||||
be specified when PCRE is built; the default is 250.
|
||||
.P
|
||||
There is a limit to the number of forward references to subsequent subpatterns
|
||||
of around 200,000. Repeated forward references with fixed upper limits, for
|
||||
@ -34,7 +38,7 @@ The maximum length of name for a named subpattern is 32 characters, and the
|
||||
maximum number of named subpatterns is 10000.
|
||||
.P
|
||||
The maximum length of a name in a (*MARK), (*PRUNE), (*SKIP), or (*THEN) verb
|
||||
is 255 for the 8-bit library and 65535 for the 16-bit and 32-bit library.
|
||||
is 255 for the 8-bit library and 65535 for the 16-bit and 32-bit libraries.
|
||||
.P
|
||||
The maximum length of a subject string is the largest positive number that an
|
||||
integer variable can hold. However, when using the traditional matching
|
||||
@ -62,6 +66,6 @@ Cambridge CB2 3QH, England.
|
||||
.rs
|
||||
.sp
|
||||
.nf
|
||||
Last updated: 04 May 2012
|
||||
Copyright (c) 1997-2012 University of Cambridge.
|
||||
Last updated: 05 November 2013
|
||||
Copyright (c) 1997-2013 University of Cambridge.
|
||||
.fi
|
||||
|
@ -1,4 +1,4 @@
|
||||
.TH PCREMATCHING 3 "08 January 2012" "PCRE 8.30"
|
||||
.TH PCREMATCHING 3 "12 November 2013" "PCRE 8.34"
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH "PCRE MATCHING ALGORITHMS"
|
||||
@ -106,6 +106,14 @@ the three strings "caterpillar", "cater", and "cat" that start at the fifth
|
||||
character of the subject. The algorithm does not automatically move on to find
|
||||
matches that start at later positions.
|
||||
.P
|
||||
PCRE's "auto-possessification" optimization usually applies to character
|
||||
repeats at the end of a pattern (as well as internally). For example, the
|
||||
pattern "a\ed+" is compiled as if it were "a\ed++" because there is no point
|
||||
even considering the possibility of backtracking into the repeated digits. For
|
||||
DFA matching, this means that only one possible match is found. If you really
|
||||
do want multiple matches in such cases, either use an ungreedy repeat
|
||||
("a\ed+?") or set the PCRE_NO_AUTO_POSSESS option when compiling.
|
||||
.P
|
||||
There are a number of features of PCRE regular expressions that are not
|
||||
supported by the alternative matching algorithm. They are as follows:
|
||||
.P
|
||||
@ -201,6 +209,6 @@ Cambridge CB2 3QH, England.
|
||||
.rs
|
||||
.sp
|
||||
.nf
|
||||
Last updated: 08 January 2012
|
||||
Last updated: 12 November 2013
|
||||
Copyright (c) 1997-2012 University of Cambridge.
|
||||
.fi
|
||||
|
@ -1,4 +1,4 @@
|
||||
.TH PCREPARTIAL 3 "24 June 2012" "PCRE 8.31"
|
||||
.TH PCREPARTIAL 3 "02 July 2013" "PCRE 8.34"
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH "PARTIAL MATCHING IN PCRE"
|
||||
@ -56,31 +56,34 @@ strings. This optimization is also disabled for partial matching.
|
||||
.rs
|
||||
.sp
|
||||
A partial match occurs during a call to \fBpcre_exec()\fP or
|
||||
\fBpcre[16|32]_exec()\fP when the end of the subject string is reached successfully,
|
||||
but matching cannot continue because more characters are needed. However, at
|
||||
least one character in the subject must have been inspected. This character
|
||||
need not form part of the final matched string; lookbehind assertions and the
|
||||
\eK escape sequence provide ways of inspecting characters before the start of a
|
||||
matched substring. The requirement for inspecting at least one character exists
|
||||
because an empty string can always be matched; without such a restriction there
|
||||
would always be a partial match of an empty string at the end of the subject.
|
||||
\fBpcre[16|32]_exec()\fP when the end of the subject string is reached
|
||||
successfully, but matching cannot continue because more characters are needed.
|
||||
However, at least one character in the subject must have been inspected. This
|
||||
character need not form part of the final matched string; lookbehind assertions
|
||||
and the \eK escape sequence provide ways of inspecting characters before the
|
||||
start of a matched substring. The requirement for inspecting at least one
|
||||
character exists because an empty string can always be matched; without such a
|
||||
restriction there would always be a partial match of an empty string at the end
|
||||
of the subject.
|
||||
.P
|
||||
If there are at least two slots in the offsets vector when a partial match is
|
||||
returned, the first slot is set to the offset of the earliest character that
|
||||
was inspected. For convenience, the second offset points to the end of the
|
||||
subject so that a substring can easily be identified.
|
||||
subject so that a substring can easily be identified. If there are at least
|
||||
three slots in the offsets vector, the third slot is set to the offset of the
|
||||
character where matching started.
|
||||
.P
|
||||
For the majority of patterns, the first offset identifies the start of the
|
||||
partially matched string. However, for patterns that contain lookbehind
|
||||
assertions, or \eK, or begin with \eb or \eB, earlier characters have been
|
||||
inspected while carrying out the match. For example:
|
||||
For the majority of patterns, the contents of the first and third slots will be
|
||||
the same. However, for patterns that contain lookbehind assertions, or begin
|
||||
with \eb or \eB, characters before the one where matching started may have been
|
||||
inspected while carrying out the match. For example, consider this pattern:
|
||||
.sp
|
||||
/(?<=abc)123/
|
||||
.sp
|
||||
This pattern matches "123", but only if it is preceded by "abc". If the subject
|
||||
string is "xyzabc12", the offsets after a partial match are for the substring
|
||||
"abc12", because all these characters are needed if another match is tried
|
||||
with extra characters added to the subject.
|
||||
string is "xyzabc12", the first two offsets after a partial match are for the
|
||||
substring "abc12", because all these characters were inspected. However, the
|
||||
third offset is set to 6, because that is the offset where matching began.
|
||||
.P
|
||||
What happens when a partial match is identified depends on which of the two
|
||||
partial matching options are set.
|
||||
@ -277,6 +280,15 @@ Notice that when the match is complete, only the last part is shown; PCRE does
|
||||
not retain the previously partially-matched string. It is up to the calling
|
||||
program to do that if it needs to.
|
||||
.P
|
||||
That means that, for an unanchored pattern, if a continued match fails, it is
|
||||
not possible to try again at a new starting point. All this facility is capable
|
||||
of doing is continuing with the previous match attempt. In the previous
|
||||
example, if the second set of data is "ug23" the result is no match, even
|
||||
though there would be a match for "aug23" if the entire string were given at
|
||||
once. Depending on the application, this may or may not be what you want.
|
||||
The only way to allow for starting again at the next character is to retain the
|
||||
matched part of the subject and try a new complete match.
|
||||
.P
|
||||
You can set the PCRE_PARTIAL_SOFT or PCRE_PARTIAL_HARD options with
|
||||
PCRE_DFA_RESTART to continue partial matching over multiple segments. This
|
||||
facility can be used to pass very long subject strings to the DFA matching
|
||||
@ -308,10 +320,9 @@ processing time is needed.
|
||||
.P
|
||||
\fBNote:\fP If the pattern contains lookbehind assertions, or \eK, or starts
|
||||
with \eb or \eB, the string that is returned for a partial match includes
|
||||
characters that precede the partially matched string itself, because these must
|
||||
be retained when adding on more characters for a subsequent matching attempt.
|
||||
However, in some cases you may need to retain even earlier characters, as
|
||||
discussed in the next section.
|
||||
characters that precede the start of what would be returned for a complete
|
||||
match, because it contains all the characters that were inspected during the
|
||||
partial match.
|
||||
.
|
||||
.
|
||||
.SH "ISSUES WITH MULTI-SEGMENT MATCHING"
|
||||
@ -330,12 +341,32 @@ includes the effect of PCRE_NOTEOL.
|
||||
offsets that are returned for a partial match. However a lookbehind assertion
|
||||
later in the pattern could require even earlier characters to be inspected. You
|
||||
can handle this case by using the PCRE_INFO_MAXLOOKBEHIND option of the
|
||||
\fBpcre_fullinfo()\fP or \fBpcre[16|32]_fullinfo()\fP functions to obtain the length
|
||||
of the largest lookbehind in the pattern. This length is given in characters,
|
||||
not bytes. If you always retain at least that many characters before the
|
||||
partially matched string, all should be well. (Of course, near the start of the
|
||||
subject, fewer characters may be present; in that case all characters should be
|
||||
retained.)
|
||||
\fBpcre_fullinfo()\fP or \fBpcre[16|32]_fullinfo()\fP functions to obtain the
|
||||
length of the longest lookbehind in the pattern. This length is given in
|
||||
characters, not bytes. If you always retain at least that many characters
|
||||
before the partially matched string, all should be well. (Of course, near the
|
||||
start of the subject, fewer characters may be present; in that case all
|
||||
characters should be retained.)
|
||||
.P
|
||||
From release 8.33, there is a more accurate way of deciding which characters to
|
||||
retain. Instead of subtracting the length of the longest lookbehind from the
|
||||
earliest inspected character (\fIoffsets[0]\fP), the match start position
|
||||
(\fIoffsets[2]\fP) should be used, and the next match attempt started at the
|
||||
\fIoffsets[2]\fP character by setting the \fIstartoffset\fP argument of
|
||||
\fBpcre_exec()\fP or \fBpcre_dfa_exec()\fP.
|
||||
.P
|
||||
For example, if the pattern "(?<=123)abc" is partially
|
||||
matched against the string "xx123a", the three offset values returned are 2, 6,
|
||||
and 5. This indicates that the matching process that gave a partial match
|
||||
started at offset 5, but the characters "123a" were all inspected. The maximum
|
||||
lookbehind for that pattern is 3, so taking that away from 5 shows that we need
|
||||
only keep "123a", and the next match attempt can be started at offset 3 (that
|
||||
is, at "a") when further characters have been added. When the match start is
|
||||
not the earliest inspected character, \fBpcretest\fP shows it explicitly:
|
||||
.sp
|
||||
re> "(?<=123)abc"
|
||||
data> xx123a\eP\eP
|
||||
Partial match at offset 5: 123a
|
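The arithmetic in the "(?<=123)abc" example can be written out as a small sketch. It assumes a non-UTF subject, so that the maximum lookbehind (given in characters) is also a byte count. The type `retain_plan` and the function `plan_retention` are hypothetical names, not part of the PCRE API. The inputs stand for offsets[2] from the partial match and the value reported by PCRE_INFO_MAXLOOKBEHIND.

```c
#include <assert.h>

/* Hypothetical result type (not part of PCRE). */
typedef struct {
  int keep_from;   /* first subject offset that must be retained */
  int restart_at;  /* startoffset for the next attempt, counted from
                      the beginning of the retained text */
} retain_plan;

/* match_start is offsets[2] from a partial match; max_lookbehind is the
   value reported for PCRE_INFO_MAXLOOKBEHIND. Assumes one byte per
   character. */
retain_plan plan_retention(int match_start, int max_lookbehind)
{
  retain_plan plan;

  assert(match_start >= 0 && max_lookbehind >= 0);

  plan.keep_from = match_start - max_lookbehind;
  if (plan.keep_from < 0)
    plan.keep_from = 0;      /* near the start, keep all characters */

  plan.restart_at = match_start - plan.keep_from;
  return plan;
}
```

For the subject "xx123a", calling plan_retention(5, 3) gives keep_from 2 and restart_at 3: the retained text is "123a", and the next attempt starts at the "a" once more characters have been appended.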
||||
.P
|
||||
3. Because a partial match must always contain at least one character, what
|
||||
might be considered a partial match of an empty string actually gives a "no
|
||||
@ -440,6 +471,6 @@ Cambridge CB2 3QH, England.
|
||||
.rs
|
||||
.sp
|
||||
.nf
|
||||
Last updated: 24 June 2012
|
||||
Copyright (c) 1997-2012 University of Cambridge.
|
||||
Last updated: 02 July 2013
|
||||
Copyright (c) 1997-2013 University of Cambridge.
|
||||
.fi
|
||||
|
File diff suppressed because it is too large
@ -1,25 +1,22 @@
|
||||
.TH PCREPOSIX 3 "09 January 2012" "PCRE 8.30"
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions.
|
||||
.SH "SYNOPSIS OF POSIX API"
|
||||
.SH "SYNOPSIS"
|
||||
.rs
|
||||
.sp
|
||||
.B #include <pcreposix.h>
|
||||
.PP
|
||||
.SM
|
||||
.nf
|
||||
.B int regcomp(regex_t *\fIpreg\fP, const char *\fIpattern\fP,
|
||||
.ti +5n
|
||||
.B int \fIcflags\fP);
|
||||
.PP
|
||||
.B " int \fIcflags\fP);"
|
||||
.sp
|
||||
.B int regexec(regex_t *\fIpreg\fP, const char *\fIstring\fP,
|
||||
.ti +5n
|
||||
.B size_t \fInmatch\fP, regmatch_t \fIpmatch\fP[], int \fIeflags\fP);
|
||||
.PP
|
||||
.B size_t regerror(int \fIerrcode\fP, const regex_t *\fIpreg\fP,
|
||||
.ti +5n
|
||||
.B char *\fIerrbuf\fP, size_t \fIerrbuf_size\fP);
|
||||
.PP
|
||||
.B " size_t \fInmatch\fP, regmatch_t \fIpmatch\fP[], int \fIeflags\fP);"
|
||||
.B " size_t regerror(int \fIerrcode\fP, const regex_t *\fIpreg\fP,"
|
||||
.B " char *\fIerrbuf\fP, size_t \fIerrbuf_size\fP);"
|
||||
.sp
|
||||
.B void regfree(regex_t *\fIpreg\fP);
|
||||
.fi
|
||||
.
|
||||
.SH DESCRIPTION
|
||||
.rs
|
||||
|
@ -1,4 +1,4 @@
|
||||
.TH PCREPRECOMPILE 3 "24 June 2012" "PCRE 8.30"
|
||||
.TH PCREPRECOMPILE 3 "12 November 2013" "PCRE 8.34"
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH "SAVING AND RE-USING PRECOMPILED PCRE PATTERNS"
|
||||
@ -90,8 +90,8 @@ study data.
|
||||
.rs
|
||||
.sp
|
||||
Re-using a precompiled pattern is straightforward. Having reloaded it into main
|
||||
memory, called \fBpcre[16|32]_pattern_to_host_byte_order()\fP if necessary,
|
||||
you pass its pointer to \fBpcre[16|32]_exec()\fP or \fBpcre[16|32]_dfa_exec()\fP in
|
||||
memory, called \fBpcre[16|32]_pattern_to_host_byte_order()\fP if necessary, you
|
||||
pass its pointer to \fBpcre[16|32]_exec()\fP or \fBpcre[16|32]_dfa_exec()\fP in
|
||||
the usual way.
|
||||
.P
|
||||
However, if you passed a pointer to custom character tables when the pattern
|
||||
@ -110,15 +110,19 @@ in the
|
||||
.\"
|
||||
documentation.
|
||||
.P
|
||||
\fBWarning:\fP The tables that \fBpcre_exec()\fP and \fBpcre_dfa_exec()\fP use
|
||||
must be the same as those that were used when the pattern was compiled. If this
|
||||
is not the case, the behaviour is undefined.
|
||||
.P
|
||||
If you did not provide custom character tables when the pattern was compiled,
|
||||
the pointer in the compiled pattern is NULL, which causes the matching
|
||||
functions to use PCRE's internal tables. Thus, you do not need to take any
|
||||
special action at run time in this case.
|
||||
.P
|
||||
If you saved study data with the compiled pattern, you need to create your own
|
||||
\fBpcre[16|32]_extra\fP data block and set the \fIstudy_data\fP field to point to the
|
||||
reloaded study data. You must also set the PCRE_EXTRA_STUDY_DATA bit in the
|
||||
\fIflags\fP field to indicate that study data is present. Then pass the
|
||||
\fBpcre[16|32]_extra\fP data block and set the \fIstudy_data\fP field to point
|
||||
to the reloaded study data. You must also set the PCRE_EXTRA_STUDY_DATA bit in
|
||||
the \fIflags\fP field to indicate that study data is present. Then pass the
|
||||
\fBpcre[16|32]_extra\fP block to the matching function in the usual way. If the
|
||||
pattern was studied for just-in-time optimization, that data cannot be saved,
|
||||
and so is lost by a save/restore cycle.
|
||||
@ -146,6 +150,6 @@ Cambridge CB2 3QH, England.
|
||||
.rs
|
||||
.sp
|
||||
.nf
|
||||
Last updated: 24 June 2012
|
||||
Copyright (c) 1997-2012 University of Cambridge.
|
||||
Last updated: 12 November 2013
|
||||
Copyright (c) 1997-2013 University of Cambridge.
|
||||
.fi
|
||||
|
@ -1,4 +1,4 @@
|
||||
.TH PCRESYNTAX 3 "11 November 2012" "PCRE 8.32"
|
||||
.TH PCRESYNTAX 3 "08 January 2014" "PCRE 8.35"
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH "PCRE REGULAR EXPRESSION SYNTAX SUMMARY"
|
||||
@ -29,9 +29,14 @@ documentation. This document contains a quick-reference summary of the syntax.
|
||||
\en newline (hex 0A)
|
||||
\er carriage return (hex 0D)
|
||||
\et tab (hex 09)
|
||||
\e0dd character with octal code 0dd
|
||||
\eddd character with octal code ddd, or backreference
|
||||
\eo{ddd..} character with octal code ddd..
|
||||
\exhh character with hex code hh
|
||||
\ex{hhh..} character with hex code hhh..
|
||||
.sp
|
||||
Note that \e0dd is always an octal code, and that \e8 and \e9 are the literal
|
||||
characters "8" and "9".
|
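To illustrate the \o{ddd..} form, here is a minimal sketch of a parser for the braced part. It is not PCRE's own code. The name `parse_braced_octal`, the requirement of at least one digit, and the 31-bit upper limit are assumptions made for this example.

```c
#include <assert.h>

/* Hypothetical helper, not PCRE's parser. s points just after "\o" and
   is expected to hold "{" + octal digits + "}". Returns the number of
   bytes consumed and stores the code in *value, or returns 0 if the
   input is malformed. */
int parse_braced_octal(const char *s, unsigned long *value)
{
  const char *p = s;
  unsigned long v = 0;

  assert(s != 0 && value != 0);

  if (*p++ != '{') return 0;
  if (*p == '}') return 0;          /* assumed: at least one digit */

  while (*p >= '0' && *p <= '7')
    {
    /* Reject values that would not fit in 31 bits (arbitrary limit
       for this sketch). */
    if (v > 0x7fffffffUL / 8) return 0;
    v = v * 8 + (unsigned long)(*p++ - '0');
    }

  if (*p != '}') return 0;          /* non-octal character or no brace */
  *value = v;
  return (int)(p - s) + 1;          /* include the closing brace */
}
```

Given "{101}" the function consumes five bytes and yields 65 (the letter "A"). Given "{8}" it reports failure, because 8 is not an octal digit.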
||||
.
|
||||
.
|
||||
.SH "CHARACTER TYPES"
|
||||
@ -56,9 +61,11 @@ documentation. This document contains a quick-reference summary of the syntax.
|
||||
\eW a "non-word" character
|
||||
\eX a Unicode extended grapheme cluster
|
||||
.sp
|
||||
In PCRE, by default, \ed, \eD, \es, \eS, \ew, and \eW recognize only ASCII
|
||||
characters, even in a UTF mode. However, this can be changed by setting the
|
||||
PCRE_UCP option.
|
||||
By default, \ed, \es, and \ew match only ASCII characters, even in UTF-8 mode
|
||||
or in the 16-bit and 32-bit libraries. However, if locale-specific matching is
|
||||
happening, \es and \ew may also match characters with code points in the range
|
||||
128-255. If the PCRE_UCP option is set, the behaviour of these escape sequences
|
||||
is changed to use Unicode properties and they match many more characters.
|
||||
.
|
||||
.
|
||||
.SH "GENERAL CATEGORY PROPERTIES FOR \ep and \eP"
|
||||
@ -115,8 +122,13 @@ PCRE_UCP option.
|
||||
.sp
|
||||
Xan Alphanumeric: union of properties L and N
|
||||
Xps POSIX space: property Z or tab, NL, VT, FF, CR
|
||||
Xsp Perl space: property Z or tab, NL, FF, CR
|
||||
Xsp Perl space: property Z or tab, NL, VT, FF, CR
|
||||
Xuc Universally-named character: one that can be
|
||||
represented by a Universal Character Name
|
||||
Xwd Perl word: property Xan or underscore
|
||||
.sp
|
||||
Perl and POSIX space are now the same. Perl added VT to its space character set
|
||||
at release 5.18 and PCRE changed at release 8.34.
|
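The effect of the VT change can be shown with a short sketch restricted to ASCII. It covers only the ASCII members of the set (the space character from property Z, plus tab, NL, VT, FF, and CR) and ignores the non-ASCII characters with property Z. The function name `is_space_ascii` and its `include_vt` flag are hypothetical.

```c
/* Hypothetical helper covering only the ASCII part of Xsp/Xps.
   Pass include_vt = 1 for the set used from PCRE 8.34 (and Perl 5.18),
   or 0 for the older Perl space set that left out VT. */
int is_space_ascii(int c, int include_vt)
{
  switch (c)
    {
    case ' ':     /* the ASCII member of property Z */
    case '\t':
    case '\n':
    case '\f':
    case '\r':
      return 1;
    case '\v':    /* VT: the character added to Perl space */
      return include_vt ? 1 : 0;
    default:
      return 0;
    }
}
```

With include_vt set to 1 the Perl and POSIX sets agree on every ASCII character. With 0 they differ only on VT.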
||||
.
|
||||
.
|
||||
.SH "SCRIPT NAMES FOR \ep AND \eP"
|
||||
@ -297,6 +309,8 @@ but some of them use Unicode properties if PCRE_UCP is set. You can use
|
||||
.rs
|
||||
.sp
|
||||
\eK reset start of match
|
||||
.sp
|
||||
\eK is honoured in positive assertions, but ignored in negative ones.
|
||||
.
|
||||
.
|
||||
.SH "ALTERNATION"
|
||||
@ -342,15 +356,45 @@ but some of them use Unicode properties if PCRE_UCP is set. You can use
|
||||
(?x) extended (ignore white space)
|
||||
(?-...) unset option(s)
|
||||
.sp
|
||||
The following are recognized only at the start of a pattern or after one of the
|
||||
newline-setting options with similar syntax:
|
||||
The following are recognized only at the very start of a pattern or after one
|
||||
of the newline or \eR options with similar syntax. More than one of them may
|
||||
appear.
|
||||
.sp
|
||||
(*LIMIT_MATCH=d) set the match limit to d (decimal number)
|
||||
(*LIMIT_RECURSION=d) set the recursion limit to d (decimal number)
|
||||
(*NO_AUTO_POSSESS) no auto-possessification (PCRE_NO_AUTO_POSSESS)
|
||||
(*NO_START_OPT) no start-match optimization (PCRE_NO_START_OPTIMIZE)
|
||||
(*UTF8) set UTF-8 mode: 8-bit library (PCRE_UTF8)
|
||||
(*UTF16) set UTF-16 mode: 16-bit library (PCRE_UTF16)
|
||||
(*UTF32) set UTF-32 mode: 32-bit library (PCRE_UTF32)
|
||||
(*UTF) set appropriate UTF mode for the library in use
|
||||
(*UCP) set PCRE_UCP (use Unicode properties for \ed etc)
|
||||
.sp
|
||||
Note that LIMIT_MATCH and LIMIT_RECURSION can only reduce the value of the
|
||||
limits set by the caller of pcre_exec(), not increase them.
|
||||
.
|
||||
.
|
||||
.SH "NEWLINE CONVENTION"
|
||||
.rs
|
||||
.sp
|
||||
These are recognized only at the very start of the pattern or after option
|
||||
settings with a similar syntax.
|
||||
.sp
|
||||
(*CR) carriage return only
|
||||
(*LF) linefeed only
|
||||
(*CRLF) carriage return followed by linefeed
|
||||
(*ANYCRLF) all three of the above
|
||||
(*ANY) any Unicode newline sequence
|
||||
.
|
||||
.
|
||||
.SH "WHAT \eR MATCHES"
|
||||
.rs
|
||||
.sp
|
||||
These are recognized only at the very start of the pattern or after option
|
||||
settings with a similar syntax.
|
||||
.sp
|
||||
(*BSR_ANYCRLF) CR, LF, or CRLF
|
||||
(*BSR_UNICODE) any Unicode newline sequence
|
||||
.
|
||||
.
|
||||
.SH "LOOKAHEAD AND LOOKBEHIND ASSERTIONS"
|
||||
@ -440,29 +484,6 @@ pattern is not anchored.
|
||||
(*THEN:NAME) equivalent to (*MARK:NAME)(*THEN)
|
||||
.
|
||||
.
|
||||
.SH "NEWLINE CONVENTIONS"
|
||||
.rs
|
||||
.sp
|
||||
These are recognized only at the very start of the pattern or after a
|
||||
(*BSR_...), (*UTF8), (*UTF16), (*UTF32) or (*UCP) option.
|
||||
.sp
|
||||
(*CR) carriage return only
|
||||
(*LF) linefeed only
|
||||
(*CRLF) carriage return followed by linefeed
|
||||
(*ANYCRLF) all three of the above
|
||||
(*ANY) any Unicode newline sequence
|
||||
.
|
||||
.
|
||||
.SH "WHAT \eR MATCHES"
|
||||
.rs
|
||||
.sp
|
||||
These are recognized only at the very start of the pattern or after a
|
||||
(*...) option that sets the newline convention or a UTF or UCP mode.
|
||||
.sp
|
||||
(*BSR_ANYCRLF) CR, LF, or CRLF
|
||||
(*BSR_UNICODE) any Unicode newline sequence
|
||||
.
|
||||
.
|
||||
.SH "CALLOUTS"
|
||||
.rs
|
||||
.sp
|
||||
@ -491,6 +512,6 @@ Cambridge CB2 3QH, England.
|
||||
.rs
|
||||
.sp
|
||||
.nf
|
||||
Last updated: 11 November 2012
|
||||
Copyright (c) 1997-2012 University of Cambridge.
|
||||
Last updated: 08 January 2014
|
||||
Copyright (c) 1997-2014 University of Cambridge.
|
||||
.fi
|
||||
|
@ -1,4 +1,4 @@
|
||||
.TH PCRETEST 1 "10 September 2012" "PCRE 8.32"
|
||||
.TH PCRETEST 1 "09 February 2014" "PCRE 8.35"
|
||||
.SH NAME
|
||||
pcretest - a program for testing Perl-compatible regular expressions.
|
||||
.SH SYNOPSIS
|
||||
@ -40,23 +40,34 @@ PCRE, and are unlikely to be of use otherwise. They are all documented here,
|
||||
but without much justification.
|
||||
.
|
||||
.
|
||||
.SH "INPUT DATA FORMAT"
|
||||
.rs
|
||||
.sp
|
||||
Input to \fBpcretest\fP is processed line by line, either by calling the C
|
||||
library's \fBfgets()\fP function, or via the \fBlibreadline\fP library (see
|
||||
below). In Unix-like environments, \fBfgets()\fP treats any bytes other than
|
||||
newline as data characters. However, in some Windows environments character 26
|
||||
(hex 1A) causes an immediate end of file, and no further data is read. For
|
||||
maximum portability, therefore, it is safest to use only ASCII characters in
|
||||
\fBpcretest\fP input files.
|
||||
.
|
||||
.
|
||||
.SH "PCRE's 8-BIT, 16-BIT AND 32-BIT LIBRARIES"
|
||||
.rs
|
||||
.sp
|
||||
From release 8.30, two separate PCRE libraries can be built. The original one
|
||||
supports 8-bit character strings, whereas the newer 16-bit library supports
|
||||
character strings encoded in 16-bit units. From release 8.32, a third
|
||||
library can be built, supporting character strings encoded in 32-bit units.
|
||||
The \fBpcretest\fP program can be
|
||||
used to test all three libraries. However, it is itself still an 8-bit program,
|
||||
reading 8-bit input and writing 8-bit output. When testing the 16-bit or 32-bit
|
||||
library, the patterns and data strings are converted to 16- or 32-bit format
|
||||
before being passed to the PCRE library functions. Results are converted to
|
||||
8-bit for output.
|
||||
character strings encoded in 16-bit units. From release 8.32, a third library
|
||||
can be built, supporting character strings encoded in 32-bit units. The
|
||||
\fBpcretest\fP program can be used to test all three libraries. However, it is
|
||||
itself still an 8-bit program, reading 8-bit input and writing 8-bit output.
|
||||
When testing the 16-bit or 32-bit library, the patterns and data strings are
|
||||
converted to 16- or 32-bit format before being passed to the PCRE library
|
||||
functions. Results are converted to 8-bit for output.
|
||||
.P
|
||||
References to functions and structures of the form \fBpcre[16|32]_xx\fP below
|
||||
mean "\fBpcre_xx\fP when using the 8-bit library or \fBpcre16_xx\fP when using
|
||||
the 16-bit library".
|
||||
mean "\fBpcre_xx\fP when using the 8-bit library, \fBpcre16_xx\fP when using
|
||||
the 16-bit library, or \fBpcre32_xx\fP when using the 32-bit library".
|
||||
.
|
||||
.
|
||||
.SH "COMMAND LINE OPTIONS"
|
||||
@ -85,22 +96,29 @@ internal form is output after compilation.
|
||||
.TP 10
|
||||
\fB-C\fP
|
||||
Output the version number of the PCRE library, and all available information
|
||||
about the optional features that are included, and then exit. All other options
|
||||
are ignored.
|
||||
about the optional features that are included, and then exit with zero exit
|
||||
code. All other options are ignored.
|
||||
.TP 10
|
||||
\fB-C\fP \fIoption\fP
|
||||
Output information about a specific build-time option, then exit. This
|
||||
functionality is intended for use in scripts such as \fBRunTest\fP. The
|
||||
following options output the value indicated:
|
||||
following options output the value and set the exit code as indicated:
|
||||
.sp
|
||||
ebcdic-nl the code for LF (= NL) in an EBCDIC environment:
|
||||
0x15 or 0x25
|
||||
0 if used in an ASCII environment
|
||||
linksize the internal link size (2, 3, or 4)
|
||||
exit code is always 0
|
||||
linksize the configured internal link size (2, 3, or 4)
|
||||
exit code is set to the link size
|
||||
newline the default newline setting:
|
||||
CR, LF, CRLF, ANYCRLF, or ANY
|
||||
exit code is always 0
|
||||
bsr the default setting for what \eR matches:
|
||||
ANYCRLF or ANY
|
||||
exit code is always 0
|
||||
.sp
|
||||
The following options output 1 for true or zero for false:
|
||||
The following options output 1 for true or 0 for false, and set the exit code
|
||||
to the same value:
|
||||
.sp
|
||||
ebcdic compiled for an EBCDIC environment
|
||||
jit just-in-time support is available
|
||||
@ -108,7 +126,10 @@ The following options output 1 for true or zero for false:
|
||||
pcre32 the 32-bit library was built
|
||||
pcre8 the 8-bit library was built
|
||||
ucp Unicode property support is available
|
||||
utf UTF-8 and/or UTF-16 and/or UTF-32 support is available
|
||||
utf UTF-8 and/or UTF-16 and/or UTF-32 support
|
||||
is available
|
||||
.sp
|
||||
If an unknown option is given, an error message is output; the exit code is 0.
|
||||
.TP 10
|
||||
\fB-d\fP
|
||||
Behave as if each pattern has the \fB/D\fP (debug) modifier; the internal
|
||||
@ -137,6 +158,10 @@ Output the size of each compiled pattern after it has been compiled. This is
|
||||
equivalent to adding \fB/M\fP to each regular expression. The size is given in
|
||||
bytes for both libraries.
|
||||
.TP 10
|
||||
\fB-O\fP
|
||||
Behave as if each pattern has the \fB/O\fP modifier, that is, disable
|
||||
auto-possessification for all patterns.
|
||||
.TP 10
|
||||
\fB-o\fP \fIosize\fP
|
||||
Set the number of elements in the output vector that is used when calling
|
||||
\fBpcre[16|32]_exec()\fP or \fBpcre[16|32]_dfa_exec()\fP to be \fIosize\fP. The
|
||||
@ -198,17 +223,21 @@ contains (*MARK) items there may also be differences, for the same reason. The
|
||||
should never be studied (see the \fB/S\fP pattern modifier below).
|
||||
.TP 10
|
||||
\fB-t\fP
|
||||
Run each compile, study, and match many times with a timer, and output
|
||||
resulting time per compile or match (in milliseconds). Do not set \fB-m\fP with
|
||||
\fB-t\fP, because you will then get the size output a zillion times, and the
|
||||
timing will be distorted. You can control the number of iterations that are
|
||||
used for timing by following \fB-t\fP with a number (as a separate item on the
|
||||
command line). For example, "-t 1000" would iterate 1000 times. The default is
|
||||
to iterate 500000 times.
|
||||
Run each compile, study, and match many times with a timer, and output the
|
||||
resulting times per compile, study, or match (in milliseconds). Do not set
|
||||
\fB-m\fP with \fB-t\fP, because you will then get the size output a zillion
|
||||
times, and the timing will be distorted. You can control the number of
|
||||
iterations that are used for timing by following \fB-t\fP with a number (as a
|
||||
separate item on the command line). For example, "-t 1000" iterates 1000 times.
|
||||
The default is to iterate 500000 times.
|
||||
.TP 10
|
||||
\fB-tm\fP
|
||||
This is like \fB-t\fP except that it times only the matching phase, not the
|
||||
compile or study phases.
|
||||
.TP 10
|
||||
\fB-T\fP \fB-TM\fP
|
||||
These behave like \fB-t\fP and \fB-tm\fP, but in addition, at the end of a run,
|
||||
the total times for all compiles, studies, and matches are output.
|
||||
.
|
||||
.
|
||||
.SH DESCRIPTION
|
||||
@ -228,7 +257,7 @@ option states whether or not \fBreadline()\fP will be used.
|
||||
.P
|
||||
The program handles any number of sets of input on a single input file. Each
|
||||
set starts with a regular expression, and continues with any number of data
|
||||
lines to be matched against the pattern.
|
||||
lines to be matched against that pattern.
|
||||
.P
|
||||
Each data line is matched separately and independently. If you want to do
|
||||
multi-line matches, you have to use the \en escape sequence (or \er or \er\en,
|
||||
@ -280,6 +309,7 @@ fall into several groups that are described in detail in the following
|
||||
sections.
|
||||
.sp
|
||||
\fB/8\fP set UTF mode
|
||||
\fB/9\fP set PCRE_NEVER_UTF (locks out UTF mode)
|
||||
\fB/?\fP disable UTF validity check
|
||||
\fB/+\fP show remainder of subject after match
|
||||
\fB/=\fP show all captures (not just those that are set)
|
||||
@ -301,7 +331,9 @@ sections.
|
||||
\fB/M\fP show compiled memory size
|
||||
\fB/m\fP set PCRE_MULTILINE
|
||||
\fB/N\fP set PCRE_NO_AUTO_CAPTURE
|
||||
\fB/O\fP set PCRE_NO_AUTO_POSSESS
|
||||
\fB/P\fP use the POSIX wrapper
|
||||
\fB/Q\fP test external stack check function
|
||||
\fB/S\fP study the pattern after compilation
|
||||
\fB/s\fP set PCRE_DOTALL
|
||||
\fB/T\fP select character tables
|
||||
@ -350,12 +382,14 @@ options that do not correspond to anything in Perl:
|
||||
\fB/8\fP PCRE_UTF32 ) when using the 32-bit
|
||||
\fB/?\fP PCRE_NO_UTF32_CHECK ) library
|
||||
.sp
|
||||
\fB/9\fP PCRE_NEVER_UTF
|
||||
\fB/A\fP PCRE_ANCHORED
|
||||
\fB/C\fP PCRE_AUTO_CALLOUT
|
||||
\fB/E\fP PCRE_DOLLAR_ENDONLY
|
||||
\fB/f\fP PCRE_FIRSTLINE
|
||||
\fB/J\fP PCRE_DUPNAMES
|
||||
\fB/N\fP PCRE_NO_AUTO_CAPTURE
|
||||
\fB/O\fP PCRE_NO_AUTO_POSSESS
|
||||
\fB/U\fP PCRE_UNGREEDY
|
||||
\fB/W\fP PCRE_UCP
|
||||
\fB/X\fP PCRE_EXTRA
|
||||
@ -453,7 +487,10 @@ below.
|
||||
The \fB/I\fP modifier requests that \fBpcretest\fP output information about the
|
||||
compiled pattern (whether it is anchored, has a fixed first character, and
|
||||
so on). It does this by calling \fBpcre[16|32]_fullinfo()\fP after compiling a
|
||||
pattern. If the pattern is studied, the results of that are also output.
|
||||
pattern. If the pattern is studied, the results of that are also output. In
|
||||
this output, the word "char" means a non-UTF character, that is, the value of a
|
||||
single data item (8-bit, 16-bit, or 32-bit, depending on the library that is
|
||||
being tested).
|
||||
.P
|
||||
The \fB/K\fP modifier requests \fBpcretest\fP to show names from backtracking
|
||||
control verbs that are returned from calls to \fBpcre[16|32]_exec()\fP. It causes
|
||||
@ -483,13 +520,22 @@ the compiled pattern to be output. This does not include the size of the
|
||||
successfully studied with the PCRE_STUDY_JIT_COMPILE option, the size of the
|
||||
JIT compiled code is also output.
|
||||
.P
|
||||
The \fB/Q\fP modifier is used to test the use of \fBpcre_stack_guard\fP. It
|
||||
must be followed by '0' or '1', specifying the return code to be given from an
|
||||
external function that is passed to PCRE and used for stack checking during
|
||||
compilation (see the
|
||||
.\" HREF
|
||||
\fBpcreapi\fP
|
||||
.\"
|
||||
documentation for details).
|
||||
.P
|
||||
The \fB/S\fP modifier causes \fBpcre[16|32]_study()\fP to be called after the
|
||||
expression has been compiled, and the results used when the expression is
|
||||
matched. There are a number of qualifying characters that may follow \fB/S\fP.
|
||||
They may appear in any order.
|
||||
.P
|
||||
If \fBS\fP is followed by an exclamation mark, \fBpcre[16|32]_study()\fP is called
|
||||
with the PCRE_STUDY_EXTRA_NEEDED option, causing it always to return a
|
||||
If \fB/S\fP is followed by an exclamation mark, \fBpcre[16|32]_study()\fP is
|
||||
called with the PCRE_STUDY_EXTRA_NEEDED option, causing it always to return a
|
||||
\fBpcre_extra\fP block, even when studying discovers no useful information.
|
||||
.P
|
||||
If \fB/S\fP is followed by a second S character, it suppresses studying, even
|
||||
@ -565,6 +611,37 @@ The \fB/+\fP modifier works as described above. All other modifiers are
|
||||
ignored.
|
||||
.
|
||||
.
|
||||
.SS "Locking out certain modifiers"
|
||||
.rs
|
||||
.sp
|
||||
PCRE can be compiled with or without support for certain features such as
|
||||
UTF-8/16/32 or Unicode properties. Accordingly, the standard tests are split up
|
||||
into a number of different files that are selected for running depending on
|
||||
which features are available. When updating the tests, it is all too easy to
|
||||
put a new test into the wrong file by mistake; for example, to put a test that
|
||||
requires UTF support into a file that is used when it is not available. To help
|
||||
detect such mistakes as early as possible, there is a facility for locking out
|
||||
specific modifiers. If an input line for \fBpcretest\fP starts with the string
|
||||
"< forbid " the following sequence of characters is taken as a list of
|
||||
forbidden modifiers. For example, in the test files that must not use UTF or
|
||||
Unicode property support, this line appears:
|
||||
.sp
|
||||
< forbid 8W
|
||||
.sp
|
||||
This locks out the /8 and /W modifiers. An immediate error is given if they are
|
||||
subsequently encountered. If the character string contains < but not >, all the
|
||||
multi-character modifiers that begin with < are locked out. Otherwise, such
|
||||
modifiers must be explicitly listed, for example:
|
||||
.sp
|
||||
< forbid <JS><cr>
|
||||
.sp
|
||||
There must be a single space between < and "forbid" for this feature to be
|
||||
recognised. If there is not, the line is interpreted either as a request to
|
||||
re-load a pre-compiled pattern (see "SAVING AND RELOADING COMPILED PATTERNS"
|
||||
below) or, if there is another < character, as a pattern that uses < as its
|
||||
delimiter.
|
||||
.
|
||||
.
|
||||
.SH "DATA LINES"
|
||||
.rs
|
||||
.sp
|
||||
@ -588,6 +665,7 @@ recognized:
|
||||
\ev vertical tab (\ex0b)
|
||||
\ennn octal character (up to 3 octal digits); always
|
||||
a byte unless > 255 in UTF-8 or 16-bit or 32-bit mode
|
||||
\eo{dd...} octal character (any number of octal digits)
|
||||
\exhh hexadecimal byte (up to 2 hex digits)
|
||||
\ex{hh...} hexadecimal character (any number of hex digits)
|
||||
.\" JOIN
|
||||
@ -1011,10 +1089,9 @@ exact copy of the compiled pattern. If there is additional study data, this
|
||||
writing the file, \fBpcretest\fP expects to read a new pattern.
|
||||
.P
|
||||
A saved pattern can be reloaded into \fBpcretest\fP by specifying < and a file
|
||||
name instead of a pattern. The name of the file must not contain a < character,
|
||||
as otherwise \fBpcretest\fP will interpret the line as a pattern delimited by <
|
||||
characters.
|
||||
For example:
|
||||
name instead of a pattern. There must be no space between < and the file name,
|
||||
which must not contain a < character, as otherwise \fBpcretest\fP will
|
||||
interpret the line as a pattern delimited by < characters. For example:
|
||||
.sp
|
||||
re> </some/file
|
||||
Compiled pattern loaded from /some/file
|
||||
@ -1074,6 +1151,6 @@ Cambridge CB2 3QH, England.
|
||||
.rs
|
||||
.sp
|
||||
.nf
|
||||
Last updated: 10 September 2012
|
||||
Copyright (c) 1997-2012 University of Cambridge.
|
||||
Last updated: 09 February 2014
|
||||
Copyright (c) 1997-2014 University of Cambridge.
|
||||
.fi
|
||||
|
@ -1,10 +1,10 @@
|
||||
PCRETEST(1) PCRETEST(1)
|
||||
PCRETEST(1) General Commands Manual PCRETEST(1)
|
||||
|
||||
|
||||
|
||||
NAME
|
||||
pcretest - a program for testing Perl-compatible regular expressions.
|
||||
|
||||
|
||||
SYNOPSIS
|
||||
|
||||
pcretest [options] [input file [output file]]
|
||||
@ -29,22 +29,33 @@ SYNOPSIS
|
||||
They are all documented here, but without much justification.
|
||||
|
||||
|
||||
INPUT DATA FORMAT
|
||||
|
||||
Input to pcretest is processed line by line, either by calling the C
|
||||
library's fgets() function, or via the libreadline library (see below).
|
||||
In Unix-like environments, fgets() treats any bytes other than newline
|
||||
as data characters. However, in some Windows environments character 26
|
||||
(hex 1A) causes an immediate end of file, and no further data is read.
|
||||
For maximum portability, therefore, it is safest to use only ASCII
|
||||
characters in pcretest input files.
|
||||
|
||||
|
||||
PCRE's 8-BIT, 16-BIT AND 32-BIT LIBRARIES
|
||||
|
||||
From release 8.30, two separate PCRE libraries can be built. The origi-
|
||||
nal one supports 8-bit character strings, whereas the newer 16-bit
|
||||
library supports character strings encoded in 16-bit units. From
|
||||
release 8.32, a third library can be built, supporting character
|
||||
strings encoded in 32-bit units. The pcretest program can be used to
|
||||
strings encoded in 32-bit units. The pcretest program can be used to
|
||||
test all three libraries. However, it is itself still an 8-bit program,
|
||||
reading 8-bit input and writing 8-bit output. When testing the 16-bit
|
||||
reading 8-bit input and writing 8-bit output. When testing the 16-bit
|
||||
or 32-bit library, the patterns and data strings are converted to 16-
|
||||
or 32-bit format before being passed to the PCRE library functions.
|
||||
Results are converted to 8-bit for output.
|
||||
|
||||
References to functions and structures of the form pcre[16|32]_xx below
|
||||
mean "pcre_xx when using the 8-bit library or pcre16_xx when using the
|
||||
16-bit library".
|
||||
mean "pcre_xx when using the 8-bit library, pcre16_xx when using the
|
||||
16-bit library, or pcre32_xx when using the 32-bit library".
|
||||
|
||||
|
||||
COMMAND LINE OPTIONS
|
||||
@ -71,20 +82,29 @@ COMMAND LINE OPTIONS
|
||||
|
||||
-C Output the version number of the PCRE library, and all avail-
|
||||
able information about the optional features that are
|
||||
included, and then exit. All other options are ignored.
|
||||
included, and then exit with zero exit code. All other
|
||||
options are ignored.
|
||||
|
||||
-C option Output information about a specific build-time option, then
|
||||
exit. This functionality is intended for use in scripts such
|
||||
as RunTest. The following options output the value indicated:
|
||||
-C option Output information about a specific build-time option, then
|
||||
exit. This functionality is intended for use in scripts such
|
||||
as RunTest. The following options output the value and set
|
||||
the exit code as indicated:
|
||||
|
||||
ebcdic-nl the code for LF (= NL) in an EBCDIC environment:
|
||||
0x15 or 0x25
|
||||
0 if used in an ASCII environment
|
||||
linksize the internal link size (2, 3, or 4)
|
||||
exit code is always 0
|
||||
linksize the configured internal link size (2, 3, or 4)
|
||||
exit code is set to the link size
|
||||
newline the default newline setting:
|
||||
CR, LF, CRLF, ANYCRLF, or ANY
|
||||
exit code is always 0
|
||||
bsr the default setting for what \R matches:
|
||||
ANYCRLF or ANY
|
||||
exit code is always 0
|
||||
|
||||
The following options output 1 for true or zero for false:
|
||||
The following options output 1 for true or 0 for false, and
|
||||
set the exit code to the same value:
|
||||
|
||||
ebcdic compiled for an EBCDIC environment
|
||||
jit just-in-time support is available
|
||||
@ -92,32 +112,38 @@ COMMAND LINE OPTIONS
|
||||
pcre32 the 32-bit library was built
|
||||
pcre8 the 8-bit library was built
|
||||
ucp Unicode property support is available
|
||||
utf UTF-8 and/or UTF-16 and/or UTF-32 support is
|
||||
available
|
||||
utf UTF-8 and/or UTF-16 and/or UTF-32 support
|
||||
is available
|
||||
|
||||
-d Behave as if each pattern has the /D (debug) modifier; the
|
||||
internal form and information about the compiled pattern is
|
||||
If an unknown option is given, an error message is output;
|
||||
the exit code is 0.
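For scripts that consume these values, the exit-code rules above can be summarised as a small lookup. The following is a minimal Python sketch and not part of PCRE: the helper names expected_exit_code and query_build_option are hypothetical, the option sets list only the options visible in this text, and a pcretest binary on the PATH is assumed only if query_build_option is actually called.

```python
import subprocess

# Options documented above as "exit code is always 0".
ALWAYS_ZERO = {"ebcdic-nl", "newline", "bsr"}
# Options documented above as printing 1 or 0 and exiting with the same value.
# Only the options visible in this text are listed here.
BOOLEAN = {"ebcdic", "jit", "pcre32", "pcre8", "ucp", "utf"}


def expected_exit_code(option, output):
    """Return the exit code the text above describes for `-C option`."""
    if option in ALWAYS_ZERO:
        return 0
    if option == "linksize":
        # The exit code is set to the link size (2, 3, or 4).
        return int(output.strip())
    if option in BOOLEAN:
        # The exit code mirrors the printed 1 or 0.
        return int(output.strip())
    # Unknown option: an error message is output and the exit code is 0.
    return 0


def query_build_option(option, pcretest="pcretest"):
    """Run `pcretest -C option`; return (printed value, exit code)."""
    result = subprocess.run([pcretest, "-C", option],
                            capture_output=True, text=True)
    return result.stdout.strip(), result.returncode
```

Because an unknown option also exits with 0, a script that tests a boolean option by exit code alone cannot tell "false" from "not recognised"; comparing against the printed value is the safer check.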
|
||||
|
||||
-d Behave as if each pattern has the /D (debug) modifier; the
|
||||
internal form and information about the compiled pattern is
|
||||
output after compilation; -d is equivalent to -b -i.
|
||||
|
||||
-dfa Behave as if each data line contains the \D escape sequence;
|
||||
-dfa Behave as if each data line contains the \D escape sequence;
|
||||
this causes the alternative matching function,
|
||||
pcre[16|32]_dfa_exec(), to be used instead of the standard
|
||||
pcre[16|32]_dfa_exec(), to be used instead of the standard
|
||||
pcre[16|32]_exec() function (more detail is given below).
|
||||
|
||||
-help Output a brief summary of these options and then exit.
|
||||
|
||||
-i Behave as if each pattern has the /I modifier; information
|
||||
-i Behave as if each pattern has the /I modifier; information
|
||||
about the compiled pattern is given after compilation.
|
||||
|
||||
-M Behave as if each data line contains the \M escape sequence;
|
||||
this causes PCRE to discover the minimum MATCH_LIMIT and
|
||||
MATCH_LIMIT_RECURSION settings by calling pcre[16|32]_exec()
|
||||
-M Behave as if each data line contains the \M escape sequence;
|
||||
this causes PCRE to discover the minimum MATCH_LIMIT and
|
||||
MATCH_LIMIT_RECURSION settings by calling pcre[16|32]_exec()
|
||||
repeatedly with different limits.
|
||||
|
||||
-m Output the size of each compiled pattern after it has been
|
||||
compiled. This is equivalent to adding /M to each regular
|
||||
-m Output the size of each compiled pattern after it has been
|
||||
compiled. This is equivalent to adding /M to each regular
|
||||
expression. The size is given in bytes for both libraries.
|
||||
|
||||
-O Behave as if each pattern has the /O modifier, that is, dis-
|
||||
able auto-possessification for all patterns.
|
||||
|
||||
-o osize Set the number of elements in the output vector that is used
|
||||
when calling pcre[16|32]_exec() or pcre[16|32]_dfa_exec() to
|
||||
be osize. The default value is 45, which is enough for 14
|
||||
@ -183,17 +209,21 @@ COMMAND LINE OPTIONS
|
||||
tern modifier below).
|
||||
|
||||
-t Run each compile, study, and match many times with a timer,
|
||||
and output resulting time per compile or match (in millisec-
|
||||
onds). Do not set -m with -t, because you will then get the
|
||||
size output a zillion times, and the timing will be dis-
|
||||
torted. You can control the number of iterations that are
|
||||
used for timing by following -t with a number (as a separate
|
||||
item on the command line). For example, "-t 1000" would iter-
|
||||
ate 1000 times. The default is to iterate 500000 times.
|
||||
and output the resulting times per compile, study, or match
|
||||
(in milliseconds). Do not set -m with -t, because you will
|
||||
then get the size output a zillion times, and the timing will
|
||||
be distorted. You can control the number of iterations that
|
||||
are used for timing by following -t with a number (as a sepa-
|
||||
rate item on the command line). For example, "-t 1000" iter-
|
||||
ates 1000 times. The default is to iterate 500000 times.
|
||||
|
||||
-tm This is like -t except that it times only the matching phase,
|
||||
not the compile or study phases.
|
||||
|
||||
-T -TM These behave like -t and -tm, but in addition, at the end of
|
||||
a run, the total times for all compiles, studies, and matches
|
||||
are output.
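The per-call figure that a timing run reports is simply the elapsed time divided by the iteration count. As a rough illustration of that arithmetic, here is a minimal Python sketch. It is not how pcretest is implemented: it uses Python's own re module as a stand-in for PCRE, the helper name time_per_call_ms is hypothetical, and it times only the matching phase (comparable in spirit to -tm).

```python
import re
import time


def time_per_call_ms(func, iterations=500000):
    """Call func() `iterations` times; return the mean time per call in ms."""
    start = time.perf_counter()
    for _ in range(iterations):
        func()
    elapsed = time.perf_counter() - start
    return elapsed * 1000.0 / iterations


if __name__ == "__main__":
    # Compile once, outside the timed loop, so only matching is measured.
    compiled = re.compile(r"(\d+)-(\d+)")
    subject = "range 10-20"
    per_match = time_per_call_ms(lambda: compiled.search(subject),
                                 iterations=1000)
    print("match time per call (ms):", per_match)
```

A small iteration count, like "-t 1000", makes the run quick but the figure noisier; the large default trades run time for a steadier average.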
|
||||
|
||||
|
||||
DESCRIPTION
|
||||
|
||||
@ -212,7 +242,7 @@ DESCRIPTION
|
||||
|
||||
The program handles any number of sets of input on a single input file.
|
||||
Each set starts with a regular expression, and continues with any num-
|
||||
ber of data lines to be matched against the pattern.
|
||||
ber of data lines to be matched against that pattern.
|
||||
|
||||
Each data line is matched separately and independently. If you want to
|
||||
do multi-line matches, you have to use the \n escape sequence (or \r or
|
||||
@ -265,6 +295,7 @@ PATTERN MODIFIERS
|
||||
groups that are described in detail in the following sections.
|
||||
|
||||
/8 set UTF mode
|
||||
/9 set PCRE_NEVER_UTF (locks out UTF mode)
|
||||
/? disable UTF validity check
|
||||
/+ show remainder of subject after match
|
||||
/= show all captures (not just those that are set)
|
||||
@ -286,7 +317,9 @@ PATTERN MODIFIERS
|
||||
/M show compiled memory size
|
||||
/m set PCRE_MULTILINE
|
||||
/N set PCRE_NO_AUTO_CAPTURE
|
||||
/O set PCRE_NO_AUTO_POSSESS
|
||||
/P use the POSIX wrapper
|
||||
/Q test external stack check function
|
||||
/S study the pattern after compilation
|
||||
/s set PCRE_DOTALL
|
||||
/T select character tables
|
||||
@ -331,12 +364,14 @@ PATTERN MODIFIERS
|
||||
/8 PCRE_UTF32 ) when using the 32-bit
|
||||
/? PCRE_NO_UTF32_CHECK ) library
|
||||
|
||||
/9 PCRE_NEVER_UTF
|
||||
/A PCRE_ANCHORED
|
||||
/C PCRE_AUTO_CALLOUT
|
||||
/E PCRE_DOLLAR_ENDONLY
|
||||
/f PCRE_FIRSTLINE
|
||||
/J PCRE_DUPNAMES
|
||||
/N PCRE_NO_AUTO_CAPTURE
|
||||
/O PCRE_NO_AUTO_POSSESS
|
||||
/U PCRE_UNGREEDY
|
||||
/W PCRE_UCP
|
||||
/X PCRE_EXTRA
|
||||
@ -431,7 +466,9 @@ PATTERN MODIFIERS
|
||||
compiled pattern (whether it is anchored, has a fixed first character,
|
||||
and so on). It does this by calling pcre[16|32]_fullinfo() after com-
|
||||
piling a pattern. If the pattern is studied, the results of that are
|
||||
also output.
|
||||
also output. In this output, the word "char" means a non-UTF character,
|
||||
that is, the value of a single data item (8-bit, 16-bit, or 32-bit,
|
||||
depending on the library that is being tested).
|
||||
|
||||
The /K modifier requests pcretest to show names from backtracking con-
|
||||
trol verbs that are returned from calls to pcre[16|32]_exec(). It
|
||||
@ -462,26 +499,31 @@ PATTERN MODIFIERS
|
||||
pattern is successfully studied with the PCRE_STUDY_JIT_COMPILE option,
|
||||
the size of the JIT compiled code is also output.
|
||||
|
||||
The /S modifier causes pcre[16|32]_study() to be called after the
|
||||
expression has been compiled, and the results used when the expression
|
||||
The /Q modifier is used to test the use of pcre_stack_guard. It must be
|
||||
followed by '0' or '1', specifying the return code to be given from an
|
||||
external function that is passed to PCRE and used for stack checking
|
||||
during compilation (see the pcreapi documentation for details).
|
||||
|
||||
The /S modifier causes pcre[16|32]_study() to be called after the
|
||||
expression has been compiled, and the results used when the expression
|
||||
is matched. There are a number of qualifying characters that may follow
|
||||
/S. They may appear in any order.
|
||||
|
||||
If S is followed by an exclamation mark, pcre[16|32]_study() is called
|
||||
with the PCRE_STUDY_EXTRA_NEEDED option, causing it always to return a
|
||||
If /S is followed by an exclamation mark, pcre[16|32]_study() is called
|
||||
with the PCRE_STUDY_EXTRA_NEEDED option, causing it always to return a
|
||||
pcre_extra block, even when studying discovers no useful information.
|
||||
|
||||
If /S is followed by a second S character, it suppresses studying, even
|
||||
if it was requested externally by the -s command line option. This
|
||||
makes it possible to specify that certain patterns are always studied,
|
||||
if it was requested externally by the -s command line option. This
|
||||
makes it possible to specify that certain patterns are always studied,
|
||||
and others are never studied, independently of -s. This feature is used
|
||||
in the test files in a few cases where the output is different when the
|
||||
pattern is studied.
|
||||
|
||||
If the /S modifier is followed by a + character, the call to
|
||||
pcre[16|32]_study() is made with all the JIT study options, requesting
|
||||
just-in-time optimization support if it is available, for both normal
|
||||
and partial matching. If you want to restrict the JIT compiling modes,
|
||||
If the /S modifier is followed by a + character, the call to
|
||||
pcre[16|32]_study() is made with all the JIT study options, requesting
|
||||
just-in-time optimization support if it is available, for both normal
|
||||
and partial matching. If you want to restrict the JIT compiling modes,
|
||||
you can follow /S+ with a digit in the range 1 to 7:
|
||||
|
||||
1 normal match only
|
||||
@ -492,40 +534,40 @@ PATTERN MODIFIERS
|
||||
7 all three modes (default)
|
||||
|
||||
If /S++ is used instead of /S+ (with or without a following digit), the
|
||||
text "(JIT)" is added to the first output line after a match or no
|
||||
text "(JIT)" is added to the first output line after a match or no
|
||||
match when JIT-compiled code was actually used.
|
||||
|
||||
Note that there is also an independent /+ modifier; it must not be
|
||||
Note that there is also an independent /+ modifier; it must not be
|
||||
given immediately after /S or /S+ because this will be misinterpreted.
|
||||
|
||||
If JIT studying is successful, the compiled JIT code will automatically
|
||||
be used when pcre[16|32]_exec() is run, except when incompatible run-
|
||||
time options are specified. For more details, see the pcrejit documen-
|
||||
tation. See also the \J escape sequence below for a way of setting the
|
||||
be used when pcre[16|32]_exec() is run, except when incompatible run-
|
||||
time options are specified. For more details, see the pcrejit documen-
|
||||
tation. See also the \J escape sequence below for a way of setting the
|
||||
size of the JIT stack.
|
||||
|
||||
Finally, if /S is followed by a minus character, JIT compilation is
|
||||
suppressed, even if it was requested externally by the -s command line
|
||||
option. This makes it possible to specify that JIT is never to be used
|
||||
Finally, if /S is followed by a minus character, JIT compilation is
|
||||
suppressed, even if it was requested externally by the -s command line
|
||||
option. This makes it possible to specify that JIT is never to be used
|
||||
for certain patterns.
|
||||
|
||||
The /T modifier must be followed by a single digit. It causes a spe-
|
||||
The /T modifier must be followed by a single digit. It causes a spe-
|
||||
cific set of built-in character tables to be passed to pcre[16|32]_com-
|
||||
pile(). It is used in the standard PCRE tests to check behaviour with
|
||||
pile(). It is used in the standard PCRE tests to check behaviour with
|
||||
different character tables. The digit specifies the tables as follows:
|
||||
|
||||
0 the default ASCII tables, as distributed in
|
||||
pcre_chartables.c.dist
|
||||
1 a set of tables defining ISO 8859 characters
|
||||
|
||||
In table 1, some characters whose codes are greater than 128 are iden-
|
||||
In table 1, some characters whose codes are greater than 128 are iden-
|
||||
tified as letters, digits, spaces, etc.
|
||||
|
||||
Using the POSIX wrapper API
|
||||
|
||||
The /P modifier causes pcretest to call PCRE via the POSIX wrapper API
|
||||
rather than its native API. This supports only the 8-bit library. When
|
||||
/P is set, the following modifiers set options for the regcomp() func-
|
||||
The /P modifier causes pcretest to call PCRE via the POSIX wrapper API
|
||||
rather than its native API. This supports only the 8-bit library. When
|
||||
/P is set, the following modifiers set options for the regcomp() func-
|
||||
tion:
|
||||
|
||||
/i REG_ICASE
|
||||
@ -536,9 +578,40 @@ PATTERN MODIFIERS
|
||||
/W REG_UCP ) the POSIX standard
|
||||
/8 REG_UTF8 )
|
||||
|
||||
The /+ modifier works as described above. All other modifiers are
|
||||
The /+ modifier works as described above. All other modifiers are
|
||||
ignored.
|
||||
|
||||
Locking out certain modifiers
|
||||
|
||||
PCRE can be compiled with or without support for certain features such
|
||||
as UTF-8/16/32 or Unicode properties. Accordingly, the standard tests
|
||||
are split up into a number of different files that are selected for
|
||||
running depending on which features are available. When updating the
|
||||
tests, it is all too easy to put a new test into the wrong file by mis-
|
||||
take; for example, to put a test that requires UTF support into a file
|
||||
that is used when it is not available. To help detect such mistakes as
|
||||
early as possible, there is a facility for locking out specific modi-
|
||||
fiers. If an input line for pcretest starts with the string "< forbid "
|
||||
the following sequence of characters is taken as a list of forbidden
|
||||
modifiers. For example, in the test files that must not use UTF or Uni-
|
||||
code property support, this line appears:
|
||||
|
||||
< forbid 8W
|
||||
|
||||
This locks out the /8 and /W modifiers. An immediate error is given if
|
||||
they are subsequently encountered. If the character string contains <
|
||||
but not >, all the multi-character modifiers that begin with < are
|
||||
locked out. Otherwise, such modifiers must be explicitly listed, for
|
||||
example:
|
||||
|
||||
< forbid <JS><cr>
|
||||
|
||||
There must be a single space between < and "forbid" for this feature to
|
||||
be recognised. If there is not, the line is interpreted either as a
|
||||
request to re-load a pre-compiled pattern (see "SAVING AND RELOADING
|
||||
COMPILED PATTERNS" below) or, if there is another < character, as a
|
||||
pattern that uses < as its delimiter.
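The rules in this section for lines that begin with < can be sketched as a small classifier. This is a minimal Python illustration of the text above, not pcretest's actual code: the function names classify_line and parse_forbid are hypothetical, and the modifier list is assumed to contain no embedded spaces.

```python
FORBID_PREFIX = "< forbid "


def classify_line(line):
    """Classify an input line as 'forbid', 'reload', 'pattern', or 'other'."""
    if line.startswith(FORBID_PREFIX):
        return "forbid"
    if line.startswith("<"):
        # A second '<' means '<' is being used as the pattern delimiter.
        if "<" in line[1:]:
            return "pattern"
        return "reload"
    return "other"


def parse_forbid(line):
    """Return (single_char_modifiers, multi_char_modifiers, all_multi_locked)."""
    spec = line[len(FORBID_PREFIX):].rstrip("\n")
    if "<" in spec and ">" not in spec:
        # '<' without '>' locks out every multi-character modifier.
        return {c for c in spec if c != "<"}, set(), True
    singles, multis = set(), set()
    i = 0
    while i < len(spec):
        if spec[i] == "<":
            end = spec.find(">", i)
            if end == -1:
                raise ValueError("unterminated multi-character modifier")
            multis.add(spec[i:end + 1])
            i = end + 1
        else:
            singles.add(spec[i])
            i += 1
    return singles, multis, False
```

The order of the checks matters: the forbid prefix is tested first, because a line such as "< forbid <JS><cr>" also contains further < characters and would otherwise be taken for a pattern.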
|
||||
|
||||
|
||||
DATA LINES
|
||||
|
||||
@ -561,6 +634,7 @@ DATA LINES
|
||||
\v vertical tab (\x0b)
|
||||
\nnn octal character (up to 3 octal digits); always
|
||||
a byte unless > 255 in UTF-8 or 16-bit or 32-bit mode
|
||||
\o{dd...} octal character (any number of octal digits)
|
||||
\xhh hexadecimal byte (up to 2 hex digits)
|
||||
\x{hh...} hexadecimal character (any number of hex digits)
|
||||
\A pass the PCRE_ANCHORED option to pcre[16|32]_exec()
|
||||
@ -952,50 +1026,51 @@ SAVING AND RELOADING COMPILED PATTERNS
|
||||
writing the file, pcretest expects to read a new pattern.
|
||||
|
||||
A saved pattern can be reloaded into pcretest by specifying < and a
|
||||
file name instead of a pattern. The name of the file must not contain a
|
||||
< character, as otherwise pcretest will interpret the line as a pattern
|
||||
delimited by < characters. For example:
|
||||
file name instead of a pattern. There must be no space between < and
|
||||
the file name, which must not contain a < character, as otherwise
|
||||
pcretest will interpret the line as a pattern delimited by < charac-
|
||||
ters. For example:
|
||||
|
||||
re> </some/file
|
||||
Compiled pattern loaded from /some/file
|
||||
No study data
|
||||
|
||||
If the pattern was previously studied with the JIT optimization, the
|
||||
JIT information cannot be saved and restored, and so is lost. When the
|
||||
pattern has been loaded, pcretest proceeds to read data lines in the
|
||||
If the pattern was previously studied with the JIT optimization, the
|
||||
JIT information cannot be saved and restored, and so is lost. When the
|
||||
pattern has been loaded, pcretest proceeds to read data lines in the
|
||||
usual way.
|
||||
|
||||
You can copy a file written by pcretest to a different host and reload
|
||||
it there, even if the new host has opposite endianness to the one on
|
||||
which the pattern was compiled. For example, you can compile on an i86
|
||||
machine and run on a SPARC machine. When a pattern is reloaded on a
|
||||
You can copy a file written by pcretest to a different host and reload
|
||||
it there, even if the new host has opposite endianness to the one on
|
||||
which the pattern was compiled. For example, you can compile on an i86
|
||||
machine and run on a SPARC machine. When a pattern is reloaded on a
|
||||
host with different endianness, the confirmation message is changed to:
|
||||
|
||||
Compiled pattern (byte-inverted) loaded from /some/file
|
||||
|
||||
The test suite contains some saved pre-compiled patterns with different
|
||||
endianness. These are reloaded using "<!" instead of just "<". This
|
||||
endianness. These are reloaded using "<!" instead of just "<". This
|
||||
suppresses the "(byte-inverted)" text so that the output is the same on
|
||||
all hosts. It also forces debugging output once the pattern has been
|
||||
all hosts. It also forces debugging output once the pattern has been
|
||||
reloaded.
|
||||
|
||||
File names for saving and reloading can be absolute or relative, but
|
||||
note that the shell facility of expanding a file name that starts with
|
||||
File names for saving and reloading can be absolute or relative, but
|
||||
note that the shell facility of expanding a file name that starts with
|
||||
a tilde (~) is not available.
|
||||
|
||||
The ability to save and reload files in pcretest is intended for test-
|
||||
ing and experimentation. It is not intended for production use because
|
||||
only a single pattern can be written to a file. Furthermore, there is
|
||||
no facility for supplying custom character tables for use with a
|
||||
reloaded pattern. If the original pattern was compiled with custom
|
||||
tables, an attempt to match a subject string using a reloaded pattern
|
||||
is likely to cause pcretest to crash. Finally, if you attempt to load
|
||||
The ability to save and reload files in pcretest is intended for test-
|
||||
ing and experimentation. It is not intended for production use because
|
||||
only a single pattern can be written to a file. Furthermore, there is
|
||||
no facility for supplying custom character tables for use with a
|
||||
reloaded pattern. If the original pattern was compiled with custom
|
||||
tables, an attempt to match a subject string using a reloaded pattern
|
||||
is likely to cause pcretest to crash. Finally, if you attempt to load
|
||||
a file that is not in the correct format, the result is undefined.
|
||||
|
||||
|
||||
SEE ALSO
|
||||
|
||||
pcre(3), pcre16(3), pcre32(3), pcreapi(3), pcrecallout(3), pcrejit,
|
||||
pcre(3), pcre16(3), pcre32(3), pcreapi(3), pcrecallout(3), pcrejit,
|
||||
pcrematching(3), pcrepartial(3), pcrepattern(3), pcreprecompile(3).
|
||||
|
||||
|
||||
@ -1008,5 +1083,5 @@ AUTHOR
|
||||
|
||||
REVISION
|
||||
|
||||
Last updated: 10 September 2012
|
||||
Copyright (c) 1997-2012 University of Cambridge.
|
||||
Last updated: 09 February 2014
|
||||
Copyright (c) 1997-2014 University of Cambridge.
|
||||
|
@ -1,4 +1,4 @@
|
||||
.TH PCREUNICODE 3 "11 November 2012" "PCRE 8.32"
|
||||
.TH PCREUNICODE 3 "27 February 2013" "PCRE 8.33"
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH "UTF-8, UTF-16, UTF-32, AND UNICODE PROPERTY SUPPORT"
|
||||
@ -84,7 +84,9 @@ place. From release 7.3 of PCRE, the check is according the rules of RFC 3629,
|
||||
which are themselves derived from the Unicode specification. Earlier releases
|
||||
of PCRE followed the rules of RFC 2279, which allows the full range of 31-bit
|
||||
values (0 to 0x7FFFFFFF). The current check allows only values in the range U+0
|
||||
to U+10FFFF, excluding the surrogate area and the non-characters.
|
||||
to U+10FFFF, excluding the surrogate area. (From release 8.33 the so-called
|
||||
"non-character" code points are no longer excluded because Unicode corrigendum
|
||||
#9 makes it clear that they should not be.)
|
||||
.P
|
||||
Characters in the "Surrogate Area" of Unicode are reserved for use by UTF-16,
|
||||
where they are used in pairs to encode codepoints with values greater than
|
||||
@ -93,9 +95,6 @@ independently in the UTF-8 and UTF-32 encodings. (In other words, the whole
|
||||
surrogate thing is a fudge for UTF-16 which unfortunately messes up UTF-8 and
|
||||
UTF-32.)
|
||||
.P
|
||||
Also excluded are the "Non-Character" code points, which are U+FDD0 to U+FDEF
|
||||
and the last two code points in each plane, U+??FFFE and U+??FFFF.
|
||||
.P
|
||||
If an invalid UTF-8 string is passed to PCRE, an error return is given. At
|
||||
compile time, the only additional information is the offset to the first byte
|
||||
of the failing character. The run-time functions \fBpcre_exec()\fP and
|
||||
@ -128,9 +127,6 @@ to the relevant functions. Values other than those in the surrogate range
|
||||
U+D800 to U+DFFF are independent code points. Values in the surrogate range
|
||||
must be used in pairs in the correct manner.
|
||||
.P
|
||||
Excluded are the "Non-Character" code points, which are U+FDD0 to U+FDEF
|
||||
and the last two code points in each plane, U+??FFFE and U+??FFFF.
|
||||
.P
|
||||
If an invalid UTF-16 string is passed to PCRE, an error return is given. At
|
||||
compile time, the only additional information is the offset to the first data
|
||||
unit of the failing character. The run-time functions \fBpcre16_exec()\fP and
|
||||
@ -152,9 +148,7 @@ However, if an invalid string is passed, the result is undefined.
|
||||
When you set the PCRE_UTF32 flag, the strings of 32-bit data units that are
|
||||
passed as patterns and subjects are (by default) checked for validity on entry
|
||||
to the relevant functions. This check allows only values in the range U+0
|
||||
to U+10FFFF, excluding the surrogate area U+D800 to U+DFFF, and the
|
||||
"Non-Character" code points, which are U+FDD0 to U+FDEF and the last two
|
||||
characters in each plane, U+??FFFE and U+??FFFF.
|
||||
to U+10FFFF, excluding the surrogate area U+D800 to U+DFFF.
|
||||
.P
|
||||
If an invalid UTF-32 string is passed to PCRE, an error return is given. At
|
||||
compile time, the only additional information is the offset to the first data
|
||||
@ -250,6 +244,6 @@ Cambridge CB2 3QH, England.
|
||||
.rs
|
||||
.sp
|
||||
.nf
|
||||
Last updated: 11 November 2012
|
||||
Copyright (c) 1997-2012 University of Cambridge.
|
||||
Last updated: 27 February 2013
|
||||
Copyright (c) 1997-2013 University of Cambridge.
|
||||
.fi
|
||||
|