Regex: Update PCRE to v8.35.
I was über lazy at first, so took libs from SM. But actually it's quite easy to compile, so let's update to latest version \o/.
This commit is contained in:
@ -85,7 +85,9 @@ place. From release 7.3 of PCRE, the check is according the rules of RFC 3629,
|
||||
which are themselves derived from the Unicode specification. Earlier releases
|
||||
of PCRE followed the rules of RFC 2279, which allows the full range of 31-bit
|
||||
values (0 to 0x7FFFFFFF). The current check allows only values in the range U+0
|
||||
to U+10FFFF, excluding the surrogate area and the non-characters.
|
||||
to U+10FFFF, excluding the surrogate area. (From release 8.33 the so-called
|
||||
"non-character" code points are no longer excluded because Unicode corrigendum
|
||||
#9 makes it clear that they should not be.)
|
||||
</P>
|
||||
<P>
|
||||
Characters in the "Surrogate Area" of Unicode are reserved for use by UTF-16,
|
||||
@ -96,10 +98,6 @@ surrogate thing is a fudge for UTF-16 which unfortunately messes up UTF-8 and
|
||||
UTF-32.)
|
||||
</P>
|
||||
<P>
|
||||
Also excluded are the "Non-Character" code points, which are U+FDD0 to U+FDEF
|
||||
and the last two code points in each plane, U+??FFFE and U+??FFFF.
|
||||
</P>
|
||||
<P>
|
||||
If an invalid UTF-8 string is passed to PCRE, an error return is given. At
|
||||
compile time, the only additional information is the offset to the first byte
|
||||
of the failing character. The run-time functions <b>pcre_exec()</b> and
|
||||
@ -135,10 +133,6 @@ U+D800 to U+DFFF are independent code points. Values in the surrogate range
|
||||
must be used in pairs in the correct manner.
|
||||
</P>
|
||||
<P>
|
||||
Excluded are the "Non-Character" code points, which are U+FDD0 to U+FDEF
|
||||
and the last two code points in each plane, U+??FFFE and U+??FFFF.
|
||||
</P>
|
||||
<P>
|
||||
If an invalid UTF-16 string is passed to PCRE, an error return is given. At
|
||||
compile time, the only additional information is the offset to the first data
|
||||
unit of the failing character. The run-time functions <b>pcre16_exec()</b> and
|
||||
@ -160,9 +154,7 @@ Validity of UTF-32 strings
|
||||
When you set the PCRE_UTF32 flag, the strings of 32-bit data units that are
|
||||
passed as patterns and subjects are (by default) checked for validity on entry
|
||||
to the relevant functions. This check allows only values in the range U+0
|
||||
to U+10FFFF, excluding the surrogate area U+D800 to U+DFFF, and the
|
||||
"Non-Character" code points, which are U+FDD0 to U+FDEF and the last two
|
||||
characters in each plane, U+??FFFE and U+??FFFF.
|
||||
to U+10FFFF, excluding the surrogate area U+D800 to U+DFFF.
|
||||
</P>
|
||||
<P>
|
||||
If an invalid UTF-32 string is passed to PCRE, an error return is given. At
|
||||
@ -261,9 +253,9 @@ Cambridge CB2 3QH, England.
|
||||
REVISION
|
||||
</b><br>
|
||||
<P>
|
||||
Last updated: 11 November 2012
|
||||
Last updated: 27 February 2013
|
||||
<br>
|
||||
Copyright © 1997-2012 University of Cambridge.
|
||||
Copyright © 1997-2013 University of Cambridge.
|
||||
<br>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
|
Reference in New Issue
Block a user