Introduction to Library Functions                      PCREAPI(3)



NAME
     PCRE - Perl-compatible regular expressions

PCRE NATIVE API

     #include <pcre.h>

     pcre *pcre_compile(const char *pattern, int options,
          const char **errptr, int *erroffset,
          const unsigned char *tableptr);

     pcre *pcre_compile2(const char *pattern, int options,
          int *errorcodeptr,
          const char **errptr, int *erroffset,
          const unsigned char *tableptr);

     pcre_extra *pcre_study(const pcre *code, int options,
          const char **errptr);

     int pcre_exec(const pcre *code, const pcre_extra *extra,
          const char *subject, int length, int startoffset,
          int options, int *ovector, int ovecsize);

     int pcre_dfa_exec(const pcre *code, const pcre_extra *extra,
          const char *subject, int length, int startoffset,
          int options, int *ovector, int ovecsize,
          int *workspace, int wscount);

     int pcre_copy_named_substring(const pcre *code,
          const char *subject, int *ovector,
          int stringcount, const char *stringname,
          char *buffer, int buffersize);

     int pcre_copy_substring(const char *subject, int *ovector,
          int stringcount, int stringnumber, char *buffer,
          int buffersize);

     int pcre_get_named_substring(const pcre *code,
          const char *subject, int *ovector,
          int stringcount, const char *stringname,
          const char **stringptr);

     int pcre_get_stringnumber(const pcre *code,
          const char *name);

     int pcre_get_stringtable_entries(const pcre *code,
          const char *name, char **first, char **last);

     int pcre_get_substring(const char *subject, int *ovector,
          int stringcount, int stringnumber,
          const char **stringptr);




SunOS 5.10                Last change:                          1

     int pcre_get_substring_list(const char *subject,
          int *ovector, int stringcount, const char ***listptr);

     void pcre_free_substring(const char *stringptr);

     void pcre_free_substring_list(const char **stringptr);

     const unsigned char *pcre_maketables(void);

     int pcre_fullinfo(const pcre *code, const pcre_extra *extra,
          int what, void *where);

     int pcre_info(const pcre *code, int *optptr, int *firstcharptr);

     int pcre_refcount(pcre *code, int adjust);

     int pcre_config(int what, void *where);

     char *pcre_version(void);

     void *(*pcre_malloc)(size_t);

     void (*pcre_free)(void *);

     void *(*pcre_stack_malloc)(size_t);

     void (*pcre_stack_free)(void *);

     int (*pcre_callout)(pcre_callout_block *);

PCRE API OVERVIEW

     PCRE has its own native API,  which  is  described  in  this
     document.   There  are  also  some  wrapper  functions  that
     correspond to the POSIX regular expression  API.  These  are
     described in the pcreposix documentation. Both of these APIs
     define a set of C function calls. A C++ wrapper  is  distri-
     buted with PCRE. It is documented in the pcrecpp page.

     The native API C function  prototypes  are  defined  in  the
     header  file  pcre.h, and on Unix systems the library itself
     is called libpcre.  It can normally be  accessed  by  adding
     -lpcre  to  the command for linking an application that uses
     PCRE. The header file  defines  the  macros  PCRE_MAJOR  and
     PCRE_MINOR  to  contain  the major and minor release numbers
     for the library.  Applications can use these to include sup-
     port for different releases of PCRE.

     The functions pcre_compile(), pcre_compile2(), pcre_study(),
     and  pcre_exec() are used for compiling and matching regular
     expressions in a Perl-compatible manner.  A  sample  program
     that demonstrates the simplest way of using them is provided



SunOS 5.10                Last change:                          2






Introduction to Library Functions                      PCREAPI(3)



     in the file called pcredemo.c in  the  source  distribution.
     The  pcresample  documentation  describes how to compile and
     run it.

     A second matching function, pcre_dfa_exec(),  which  is  not
     Perl-compatible,  is  also  provided.  This uses a different
     algorithm for the matching. The alternative algorithm  finds
     all  possible matches (at a given point in the subject), and
     scans the subject just once. However,  this  algorithm  does
     not  return  captured  substrings.  A description of the two
     matching algorithms and their advantages  and  disadvantages
     is given in the pcrematching documentation.

     In addition to the main compiling  and  matching  functions,
     there are convenience functions for extracting captured sub-
     strings  from  a  subject  string   that   is   matched   by
     pcre_exec(). They are:

       pcre_copy_substring()
       pcre_copy_named_substring()
       pcre_get_substring()
       pcre_get_named_substring()
       pcre_get_substring_list()
       pcre_get_stringnumber()
       pcre_get_stringtable_entries()

     pcre_free_substring()  and  pcre_free_substring_list()   are
     also  provided,  to  free  the  memory  used  for  extracted
     strings.

     The function pcre_maketables() is used to  build  a  set  of
     character  tables  in  the  current  locale  for  passing to
     pcre_compile(), pcre_exec(), or pcre_dfa_exec(). This is  an
     optional  facility that is provided for specialist use. Most
     commonly, no special tables are passed, in which case inter-
     nal tables that are generated when PCRE is built are used.

     The function pcre_fullinfo() is used to find out information
     about a compiled pattern; pcre_info() is an obsolete version
     that returns only some of the available information, but  is
     retained   for   backwards   compatibility.    The  function
     pcre_version() returns a pointer to a string containing  the
     version of PCRE and its date of release.

     The function pcre_refcount() maintains a reference count  in
     a data block containing a compiled pattern. This is provided
     for the benefit of object-oriented applications.

     The global variables  pcre_malloc  and  pcre_free  initially
     contain the entry points of the standard malloc() and free()
     functions, respectively. PCRE calls  the  memory  management
     functions  via  these  variables,  so  a calling program can
     replace them if it  wishes  to  intercept  the  calls.  This
     should be done before calling any PCRE functions.

     The global variables pcre_stack_malloc  and  pcre_stack_free
     are  also indirections to memory management functions. These
     special functions are used only when PCRE is compiled to use
     the heap for remembering data, instead of recursive function
     calls, when running the pcre_exec() function. See the  pcre-
     build  documentation  for details of how to do this. It is a
     non-standard way of building PCRE, for use  in  environments
     that  have  limited  stacks.  Because  of the greater use of
     memory management, it runs more slowly.  Separate  functions
     are  provided  so  that special-purpose external code can be
     used for this case. When used, these  functions  are  always
     called  in a stack-like manner (last obtained, first freed),
     and always for memory blocks of the same size.  There  is  a
     discussion  about  PCRE's stack usage in the pcrestack docu-
     mentation.
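
     Because the blocks are all the same size and are released in
     last-obtained, first-freed order, special-purpose code can
     keep freed blocks for reuse instead of returning them to the
     system. The sketch below is one such allocator; its names are
     hypothetical, it is not thread-safe, and it relies on the
     same-size property described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical pool: released blocks are kept on a singly linked list
   and handed back to the next request. */
typedef struct free_block { struct free_block *next; } free_block;

static free_block *free_list = NULL;  /* released blocks kept for reuse */
static size_t block_size = 0;         /* size seen on the first request */

void *pool_stack_malloc(size_t size)
{
  if (size < sizeof(free_block)) size = sizeof(free_block);
  if (block_size == 0) block_size = size;
  assert(size == block_size);         /* same-size assumption */
  if (free_list != NULL) {
    free_block *b = free_list;        /* reuse the most recent release */
    free_list = b->next;
    return b;
  }
  return malloc(size);
}

void pool_stack_free(void *p)
{
  free_block *b = p;
  if (b == NULL) return;
  b->next = free_list;                /* keep the block for later reuse */
  free_list = b;
}
```

     A program built against a heap-using PCRE could install these
     by assigning them to pcre_stack_malloc and pcre_stack_free
     before calling any PCRE function.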

     The global variable pcre_callout initially contains NULL. It
     can be set by the caller to a "callout" function, which PCRE
     will then call at specified points during a matching  opera-
     tion. Details are given in the pcrecallout documentation.

NEWLINES

     PCRE supports five different conventions for indicating line
     breaks  in strings: a single CR (carriage return) character,
     a single LF (linefeed) character, the two-character sequence
     CRLF,  any  of  the  three preceding, or any Unicode newline
     sequence. The Unicode newline sequences are the  three  just
     mentioned,  plus  the  single  characters  VT (vertical tab,
     U+000B), FF (formfeed, U+000C), NEL (next line, U+0085),  LS
     (line  separator,  U+2028),  and  PS  (paragraph  separator,
     U+2029).

     Each of the first three conventions is used by at least  one
     operating system as its standard newline sequence. When PCRE
     is built, a default can be specified; if none is given,  the
     default is LF, which is the Unix standard. When PCRE is run,
     the default can be overridden, either when a pattern is com-
     piled, or when it is matched.

     At compile time, the newline convention can be specified  by
     the  options argument of pcre_compile(), or it can be speci-
     fied by special text at the start  of  the  pattern  itself;
     this  overrides any other settings. See the pcrepattern page
     for details of the special character sequences.

     In the PCRE documentation the word "newline" is used to mean
     "the  character  or  pair of characters that indicate a line
     break".  The  choice  of  newline  convention  affects   the
     handling  of the dot, circumflex, and dollar metacharacters,
     the handling of #-comments in /x mode, and, when CRLF  is  a
     recognized line ending sequence, the match position advance-
     ment for a non-anchored pattern. There is more detail  about
     this in the section on pcre_exec() options below.

     The  choice  of  newline  convention  does  not  affect  the
     interpretation of the \n or \r escape sequences, nor does it
     affect what \R matches, which is  controlled  in  a  similar
     way, but by separate options.

MULTITHREADING

     The PCRE functions can be used in  multi-threading  applica-
     tions, with the proviso that the memory management functions
     pointed to by pcre_malloc, pcre_free, pcre_stack_malloc, and
     pcre_stack_free,  and  the  callout  function  pointed to by
     pcre_callout, are shared by all threads.

     The compiled form of a regular  expression  is  not  altered
     during  matching, so the same compiled pattern can safely be
     used by several threads at once.

SAVING PRECOMPILED PATTERNS FOR LATER USE

     The compiled form of a regular expression can be  saved  and
     re-used  at  a  later time, possibly by a different program,
     and even on a host other than the one on which it  was  com-
     piled.  Details  are  given in the pcreprecompile documenta-
     tion. However, compiling a regular expression with one  ver-
     sion  of  PCRE  for  use  with  a  different  version is not
     guaranteed to work and may cause crashes.

CHECKING BUILD-TIME OPTIONS

     int pcre_config(int what, void *where);

     The function pcre_config() makes  it  possible  for  a  PCRE
     client  to  discover  which optional features have been com-
     piled into the PCRE library. The pcrebuild documentation has
     more details about these optional features.

     The first argument for pcre_config() is an integer, specify-
     ing  which information is required; the second argument is a
     pointer to a variable into which the information is  placed.
     The following information is available:

       PCRE_CONFIG_UTF8

     The output is an integer that is set to one if UTF-8 support
     is available; otherwise it is set to zero.

       PCRE_CONFIG_UNICODE_PROPERTIES

     The output is an integer that is set to one if  support  for
     Unicode  character  properties is available; otherwise it is
     set to zero.

       PCRE_CONFIG_NEWLINE

     The output is an integer whose value specifies  the  default
     character  sequence that is recognized as meaning "newline".
     The five values that are supported are: 10 for  LF,  13  for
     CR,  3338  for  CRLF,  -2  for  ANYCRLF, and -1 for ANY. The
     default should normally be the standard  sequence  for  your
     operating system.

       PCRE_CONFIG_BSR

     The output is an integer whose value indicates what  charac-
     ter  sequences  the \R escape sequence matches by default. A
     value of 0 means that \R matches  any  Unicode  line  ending
     sequence; a value of 1 means that \R matches only CR, LF, or
     CRLF. The default can be overridden when a pattern  is  com-
     piled or matched.

       PCRE_CONFIG_LINK_SIZE

     The output is an integer that contains the number  of  bytes
     used  for  internal linkage in compiled regular expressions.
     The value is 2, 3, or 4. Larger values allow larger  regular
     expressions  to be compiled, at the expense of slower match-
     ing. The default value of 2 is sufficient for  all  but  the
     most  massive patterns, since it allows the compiled pattern
     to be up to 64K in size.

       PCRE_CONFIG_POSIX_MALLOC_THRESHOLD

     The output is an integer that contains the  threshold  above
     which  the POSIX interface uses malloc() for output vectors.
     Further details are given in the pcreposix documentation.

       PCRE_CONFIG_MATCH_LIMIT

     The output is an integer that gives the  default  limit  for
     the   number  of  internal  matching  function  calls  in  a
     pcre_exec()  execution.  Further  details  are  given   with
     pcre_exec() below.

       PCRE_CONFIG_MATCH_LIMIT_RECURSION

     The output is an integer that gives the  default  limit  for
     the  depth  of  recursion when calling the internal matching
     function in a pcre_exec()  execution.  Further  details  are
     given with pcre_exec() below.

       PCRE_CONFIG_STACKRECURSE

     The output is an integer that is  set  to  one  if  internal
     recursion  when running pcre_exec() is implemented by recur-
     sive function calls that use the  stack  to  remember  their
     state. This is the usual way that PCRE is compiled. The out-
     put is zero if PCRE was compiled to use blocks  of  data  on
     the  heap instead of recursive function calls. In this case,
     pcre_stack_malloc and pcre_stack_free are called  to  manage
     memory  blocks  on  the  heap,  thus avoiding the use of the
     stack.

COMPILING A PATTERN

     pcre *pcre_compile(const char *pattern, int options,
          const char **errptr, int *erroffset,
          const unsigned char *tableptr);

     pcre *pcre_compile2(const char *pattern, int options,
          int *errorcodeptr,
          const char **errptr, int *erroffset,
          const unsigned char *tableptr);

     Either of the functions  pcre_compile()  or  pcre_compile2()
     can  be  called  to compile a pattern into an internal form.
     The only difference  between  the  two  interfaces  is  that
     pcre_compile2()  has  an  additional argument, errorcodeptr,
     via which a numerical error code can be returned.

     The pattern is a C string terminated by a binary  zero,  and
     is  passed  in  the  pattern argument. A pointer to a single
     block  of  memory  that  is  obtained  via  pcre_malloc   is
     returned.  This contains the compiled code and related data.
     The pcre type is defined for the returned block; this  is  a
     typedef  for  a  structure whose contents are not externally
     defined. It is up to the caller  to  free  the  memory  (via
     pcre_free) when it is no longer required.

     Although the compiled code of a PCRE regex  is  relocatable,
     that is, it does not depend on memory location, the complete
     pcre data block is not fully  relocatable,  because  it  may
     contain a copy of the tableptr argument, which is an address
     (see below).

     The options argument  contains  various  bit  settings  that
     affect  the compilation. It should be zero if no options are
     required. The available options are described below. Some of
     them,  in  particular,  those that are compatible with Perl,
     can also be set and unset from within the pattern  (see  the
     detailed  description in the pcrepattern documentation). For
     these options, the contents of the options  argument  speci-
     fies  their initial settings at the start of compilation and
     execution. The PCRE_ANCHORED  and  PCRE_NEWLINE_xxx  options
     can  be  set  at  the time of matching as well as at compile
     time.

     If errptr is NULL, pcre_compile() returns NULL  immediately.
     Otherwise, if compilation of a pattern fails, pcre_compile()
     returns NULL, and sets the variable pointed to by errptr  to
     point  to  a  textual error message. This is a static string
     that is part of the library. You must not try  to  free  it.
     The  offset  from  the start of the pattern to the character
     where the error was discovered is  placed  in  the  variable
     pointed  to  by erroffset, which must not be NULL. If it is,
     an immediate error is given.

     If pcre_compile2() is used instead  of  pcre_compile(),  and
     the errorcodeptr argument is not NULL, a non-zero error code
     number is returned via this argument  in  the  event  of  an
     error.  This  is  in  addition to the textual error message.
     Error codes and messages are listed below.

     If the final  argument,  tableptr,  is  NULL,  PCRE  uses  a
     default  set of character tables that are built when PCRE is
     compiled, using the default C  locale.  Otherwise,  tableptr
     must  be  an  address  that  is  the  result  of  a  call to
     pcre_maketables(). This value is stored  with  the  compiled
     pattern, and used again by pcre_exec(), unless another table
     pointer is passed to it. For more discussion, see  the  sec-
     tion on locale support below.

     This code fragment shows a typical straightforward  call  to
     pcre_compile():

       pcre *re;
       const char *error;
       int erroffset;
       re = pcre_compile(
         "^A.*Z",          /* the pattern */
         0,                /* default options */
         &error,           /* for error message */
         &erroffset,       /* for error offset */
         NULL);            /* use default character tables */

     The following names for  option  bits  are  defined  in  the
     pcre.h header file:

       PCRE_ANCHORED

     If this bit is set, the pattern is forced to be  "anchored",
     that is, it is constrained to match only at the first match-
     ing point in the string that is being searched (the "subject
     string").  This  effect  can also be achieved by appropriate
     constructs in the pattern itself, which is the only  way  to
     do it in Perl.

       PCRE_AUTO_CALLOUT

     If this bit is  set,  pcre_compile()  automatically  inserts
     callout  items,  all  with  number  255, before each pattern
     item. For discussion of the callout facility, see the  pcre-
     callout documentation.

       PCRE_BSR_ANYCRLF
       PCRE_BSR_UNICODE

     These options (which are mutually  exclusive)  control  what
     the  \R  escape  sequence  matches.  The choice is either to
     match only CR, LF, or CRLF, or to match any Unicode  newline
     sequence.  The  default  is specified when PCRE is built. It
     can be overridden from within the pattern, or by setting  an
     option when a compiled pattern is matched.

       PCRE_CASELESS

     If this bit is set, letters in the pattern match both  upper
     and  lower  case  letters.  It  is  equivalent  to Perl's /i
     option, and it can be changed within a  pattern  by  a  (?i)
     option  setting.  In UTF-8 mode, PCRE always understands the
     concept of case for characters whose values  are  less  than
     128, so caseless matching is always possible. For characters
     with higher values, the concept of case is supported if PCRE
     is  compiled  with  Unicode property support, but not other-
     wise. If you want to use caseless  matching  for  characters
     128  and  above,  you must ensure that PCRE is compiled with
     Unicode property support as well as with UTF-8 support.

       PCRE_DOLLAR_ENDONLY

     If this bit is set, a dollar metacharacter  in  the  pattern
     matches  only at the end of the subject string. Without this
     option, a dollar also matches immediately before  a  newline
     at  the  end  of  the  string (but not before any other new-
     lines).  The  PCRE_DOLLAR_ENDONLY  option  is   ignored   if
     PCRE_MULTILINE  is  set.   There  is  no  equivalent to this
     option in Perl, and no way to set it within a pattern.

       PCRE_DOTALL

     If this bit is  set,  a  dot  metacharacter in  the  pattern
     matches  all  characters, including those that indicate new-
     line. Without it, a dot does  not  match  when  the  current
     position  is  at  a  newline.  This  option is equivalent to
     Perl's /s option, and it can be changed within a pattern  by
     a  (?s) option setting. A negative class such as [^a] always
     matches newline characters, independent of  the  setting  of
     this option.

       PCRE_DUPNAMES

     If this bit is set, names used to identify capturing subpat-
     terns  need  not  be unique. This can be helpful for certain
     types of pattern when it is known that only one instance  of
     the  named  subpattern  can  ever be matched. There are more
     details of named subpatterns below; see also the pcrepattern
     documentation.

       PCRE_EXTENDED

     If this bit is set, whitespace data characters in  the  pat-
     tern  are  totally  ignored  except when escaped or inside a
     character class. Whitespace does not include the VT  charac-
     ter  (code 11). In addition, characters between an unescaped
     # outside a character class and the next newline, inclusive,
     are  also  ignored.  This is equivalent to Perl's /x option,
     and it can be changed within a pattern by a (?x) option set-
     ting.

     This option makes it possible  to  include  comments  inside
     complicated patterns.  Note, however, that this applies only
     to data characters. Whitespace characters may  never  appear
     within special character sequences in a pattern, for example
     within the sequence (?( which introduces a conditional  sub-
     pattern.

       PCRE_EXTRA

     This option was invented in  order  to  turn  on  additional
     functionality of PCRE that is incompatible with Perl, but it
     is currently of very little use. When set, any backslash  in
     a  pattern  that is followed by a letter that has no special
     meaning causes an error, thus reserving  these  combinations
     for  future  expansion.  By default, as in Perl, a backslash
     followed by a letter with no special meaning is treated as a
     literal.  (Perl can, however, be persuaded to give a warning
     for this.) There are at present no other features controlled
     by  this option. It can also be set by a (?X) option setting
     within a pattern.

       PCRE_FIRSTLINE

     If this option is set, an unanchored pattern is required  to
     match  before or at the first newline in the subject string,
     though the matched text may continue over the newline.

       PCRE_JAVASCRIPT_COMPAT

     If this option is set, PCRE's behaviour is changed  in  some
     ways  so  that  it is compatible with JavaScript rather than
     Perl. The changes are as follows:

     (1) A lone closing square bracket  in  a  pattern  causes  a
     compile-time  error,  because  this is illegal in JavaScript
     (by default it is treated as a data  character).  Thus,  the
     pattern AB]CD becomes illegal when this option is set.

     (2) At run time, a back reference  to  an  unset  subpattern
     group  matches  an  empty string (by default this causes the
     current matching alternative to fail).  A  pattern  such  as
     (\1)(a)  succeeds  when  this option is set (assuming it can
     find an "a" in the subject), whereas it  fails  by  default,
     for Perl compatibility.

       PCRE_MULTILINE

     By default, PCRE treats the subject string as consisting  of
     a  single  line  of characters (even if it actually contains
     newlines). The "start of  line"  metacharacter  (^)  matches
     only  at  the  start  of the string, while the "end of line"
     metacharacter ($) matches only at the end of the string,  or
     before  a terminating newline (unless PCRE_DOLLAR_ENDONLY is
     set). This is the same as Perl.

     When PCRE_MULTILINE is set, the "start of line"  and   "end
     of  line"  constructs match immediately following or immedi-
     ately  before  internal  newlines  in  the  subject  string,
     respectively,  as well as at the very start and end. This is
     equivalent to Perl's /m option, and it can be changed within
     a pattern by a (?m) option setting. If there are no newlines
     in a subject string, or no occurrences of ^ or $ in  a  pat-
     tern, setting PCRE_MULTILINE has no effect.

       PCRE_NEWLINE_CR
       PCRE_NEWLINE_LF
       PCRE_NEWLINE_CRLF
       PCRE_NEWLINE_ANYCRLF
       PCRE_NEWLINE_ANY

     These options override the default newline  definition  that
     was  chosen  when  PCRE  was built. Setting the first or the
     second specifies that a newline is  indicated  by  a  single
     character     (CR     or    LF,    respectively).    Setting
     PCRE_NEWLINE_CRLF specifies that a newline is  indicated  by
     the      two-character      CRLF      sequence.      Setting
     PCRE_NEWLINE_ANYCRLF specifies that any of the three preced-
     ing sequences should be recognized. Setting PCRE_NEWLINE_ANY
     specifies that any Unicode newline sequence should be recog-
     nized. The Unicode newline sequences are the three just men-
     tioned,  plus  the  single  characters  VT  (vertical   tab,
     U+000B),  FF (formfeed, U+000C), NEL (next line, U+0085), LS
     (line  separator,  U+2028),  and  PS  (paragraph  separator,
     U+2029). The last two are recognized only in UTF-8 mode.

     The newline setting in the options word uses three bits that
     are   treated  as  a  number,  giving  eight  possibilities.
     Currently only six are used (default plus  the  five  values
     above).  This  means  that  if you set more than one newline
     option, the combination may or  may  not  be  sensible.  For
     example,  PCRE_NEWLINE_CR with PCRE_NEWLINE_LF is equivalent
     to  PCRE_NEWLINE_CRLF,  but  other  combinations  may  yield
     unused numbers and cause an error.

     The only time that a line break is specially recognized when
     compiling a pattern is if PCRE_EXTENDED is set, and an unes-
     caped # outside a character class is encountered. This indi-
     cates  a  comment that lasts until after the next line break
     sequence. In other circumstances, line break  sequences  are
     treated  as literal data, except that in PCRE_EXTENDED mode,
     both CR and LF are treated as whitespace characters and  are
     therefore ignored.

     The newline option that is set at compile time  becomes  the
     default  that  is  used for pcre_exec() and pcre_dfa_exec(),
     but it can be overridden.

       PCRE_NO_AUTO_CAPTURE

     If this option is set, it disables the use of numbered  cap-
     turing  parentheses  in the pattern. Any opening parenthesis
     that is not followed by ? behaves as if it were followed  by
     ?:  but  named  parentheses  can still be used for capturing
     (and they acquire numbers in the usual  way).  There  is  no
     equivalent of this option in Perl.

       PCRE_UNGREEDY

     This option inverts the "greediness" of the  quantifiers  so
     that  they  are  not greedy by default, but become greedy if
     followed by "?". It is not compatible with Perl. It can also
     be set by a (?U) option setting within the pattern.

       PCRE_UTF8

     This option causes PCRE to regard both the pattern  and  the
     subject  as  strings  of UTF-8 characters instead of single-
     byte character strings. However, it is available  only  when
     PCRE  is  built to include UTF-8 support. If not, the use of
     this option provokes an error. Details of  how  this  option
     changes  the  behaviour  of PCRE are given in the section on
     UTF-8 support in the main pcre page.

       PCRE_NO_UTF8_CHECK

     When PCRE_UTF8 is set, the validity  of  the  pattern  as  a
     UTF-8 string is automatically checked. There is a discussion
     about the validity of UTF-8 strings in the main  pcre  page.
     If   an   invalid   UTF-8   sequence   of  bytes  is  found,
     pcre_compile() returns an error. If you  already  know  that
     your  pattern  is valid, and you want to skip this check for
     performance reasons,  you  can  set  the  PCRE_NO_UTF8_CHECK
     option.  When  it  is  set, the effect of passing an invalid
     UTF-8 string as a pattern is undefined. It  may  cause  your
     program  to  crash. Note that this option can also be passed
     to pcre_exec() and pcre_dfa_exec(), to  suppress  the  UTF-8
     validity checking of subject strings.

COMPILATION ERROR CODES

     The following table  lists  the  error  codes  that  may  be
     returned  by  pcre_compile2(), along with the error messages
     that may be returned by both compiling  functions.  As  PCRE
     has  developed,  some error codes have fallen out of use. To
     avoid confusion, they have not been re-used.

        0  no error
        1  \ at end of pattern
        2  \c at end of pattern
        3  unrecognized character follows \
        4  numbers out of order in {} quantifier
        5  number too big in {} quantifier
        6  missing terminating ] for character class
        7  invalid escape sequence in character class
        8  range out of order in character class
        9  nothing to repeat
       10  [this code is not in use]
       11  internal error: unexpected repeat
       12  unrecognized character after (? or (?-
       13  POSIX named classes are supported only within a class
       14  missing )
       15  reference to non-existent subpattern
       16  erroffset passed as NULL
       17  unknown option bit(s) set
       18  missing ) after comment
       19  [this code is not in use]
       20  regular expression is too large
       21  failed to get memory
       22  unmatched parentheses
       23  internal error: code overflow
       24  unrecognized character after (?<
       25  lookbehind assertion is not fixed length
       26  malformed number or name after (?(
       27  conditional group contains more than two branches
       28  assertion expected after (?(
       29  (?R or (?[+-]digits must be followed by )
       30  unknown POSIX class name
       31  POSIX collating elements are not supported
       32  this version of PCRE is not compiled with PCRE_UTF8 support
       33  [this code is not in use]
       34  character value in \x{...} sequence is too large
       35  invalid condition (?(0)
       36  \C not allowed in lookbehind assertion
       37  PCRE does not support \L, \l, \N, \U, or \u
       38  number after (?C is > 255
       39  closing ) for (?C expected
       40  recursive call could loop indefinitely
       41  unrecognized character after (?P
       42  syntax error in subpattern name (missing terminator)
       43  two named subpatterns have the same name
       44  invalid UTF-8 string
       45  support for \P, \p, and \X has not been compiled
       46  malformed \P or \p sequence
       47  unknown property name after \P or \p
       48  subpattern name is too long (maximum 32 characters)
       49  too many named subpatterns (maximum 10000)
       50  [this code is not in use]
       51  octal value is greater than \377 (not in UTF-8 mode)
       52  internal error: overran compiling workspace
       53  internal error: previously-checked referenced subpattern not found
       54  DEFINE group contains more than one branch
       55  repeating a DEFINE group is not allowed
       56  inconsistent NEWLINE options
       57  \g is not followed by a braced, angle-bracketed, or quoted
             name/number or by a plain number
       58  a numbered reference must not be zero
       59  (*VERB) with an argument is not supported
       60  (*VERB) not recognized
       61  number is too big
       62  subpattern name expected
       63  digit expected after (?+
       64  ] is an invalid data character in JavaScript compatibility mode

     The numbers 32 and 10000 in errors 48 and 49  are  defaults;
     different values may be used if the limits were changed when
     PCRE was built.

STUDYING A PATTERN

     pcre_extra *pcre_study(const pcre *code, int options,
          const char **errptr);

     If a compiled pattern is going to be used several times,  it
     is  worth  spending more time analyzing it in order to speed
     up the time taken for matching.  The  function  pcre_study()
     takes a pointer to a compiled pattern as its first argument.
     If studying the pattern produces additional information that
     will  help speed up matching, pcre_study() returns a pointer
     to a pcre_extra block, in which the study_data field  points
     to the results of the study.

     The returned value from pcre_study() can be passed  directly
     to  pcre_exec().  However,  a pcre_extra block also contains
     other fields that can be set by the caller before the  block
     is  passed;  these  are  described  below  in the section on
     matching a pattern.

     If studying the pattern  does  not  produce  any  additional
     information, pcre_study() returns NULL. In that circumstance,
     if the calling program wants to pass any of the other fields
     to pcre_exec(), it must set up its own pcre_extra block.

     The second argument of pcre_study() contains option bits. At
     present,  no  options  are defined, and this argument should
     always be zero.

     The third argument for pcre_study()  is  a  pointer  for  an
     error  message.  If  studying  succeeds  (even if no data is
     returned), the variable it points to is set to NULL.  Other-
     wise  it is set to point to a textual error message. This is
     a static string that is part of the library.  You  must  not
     try  to  free it. You should test the error pointer for NULL
     after calling pcre_study(), to be sure that it has run  suc-
     cessfully.

     This is a typical call to pcre_study():

       pcre_extra *pe;
       pe = pcre_study(
         re,             /* result of pcre_compile() */
         0,              /* no options exist */
         &error);        /* set to NULL or points to a message */

     At present, studying a  pattern  is  useful  only  for  non-
     anchored  patterns  that do not have a single fixed starting
     character. A bitmap of possible starting bytes is created.

LOCALE SUPPORT

     PCRE handles caseless matching, and determines whether char-
     acters  are  letters, digits, or whatever, by reference to a
     set of tables, indexed by character value. When  running  in
     UTF-8  mode, this applies only to characters with codes less
     than 128. Higher-valued codes never match escapes such as \w
     or  \d,  but  can  be  tested  with \p if PCRE is built with
     Unicode character property support. The use of locales  with
     Unicode  is discouraged. If you are handling characters with
     codes greater than 128, you  should  either  use  UTF-8  and
     Unicode, or use locales, but not try to mix the two.

     PCRE contains an internal set of tables that are  used  when
     the final argument of pcre_compile() is NULL. These are suf-
     ficient  for  many  applications.   Normally,  the  internal
     tables  recognize  only ASCII characters. However, when PCRE
     is built, it is possible to cause the internal tables to  be
     rebuilt in the default "C" locale of the local system, which
     may cause them to be different.

     The internal tables can always be overridden by tables  sup-
     plied  by  the  application  that  calls  PCRE. These may be
     created in a different locale from the default. As more  and
     more applications change to using Unicode, the need for this
     locale support is expected to die away.

     External tables are built by calling  the  pcre_maketables()
     function,  which  has  no arguments, in the relevant locale.
     The  result  can  then  be  passed  to   pcre_compile()   or
     pcre_exec() as often as necessary. For example, to build and
     use tables that are appropriate for the French locale (where
     accented characters with values greater than 128 are treated
     as letters), the following code could be used:

       setlocale(LC_CTYPE, "fr_FR");
       tables = pcre_maketables();
       re = pcre_compile(..., tables);

     The locale name "fr_FR" is used on Linux and other Unix-like
     systems;  if  you are using Windows, the name for the French
     locale is "french".

     When pcre_maketables() runs, the tables are built in  memory
     that is obtained via pcre_malloc. It is the caller's respon-
     sibility to ensure that the  memory  containing  the  tables
     remains available for as long as it is needed.

     The pointer that is passed to pcre_compile() is  saved  with
     the  compiled pattern, and the same tables are used via this
     pointer by pcre_study() and normally  also  by  pcre_exec().
     Thus,  by  default,  for  any  single  pattern, compilation,
     studying and matching all happen in  the  same  locale,  but
     different patterns can be compiled in different locales.

     It is possible to pass a table pointer or  NULL  (indicating
     the use of the internal tables) to pcre_exec(). Although not
     intended for this purpose, this facility could  be  used  to
     match  a pattern in a different locale from the one in which
     it was compiled. Passing table pointers at run time is  dis-
     cussed below in the section on matching a pattern.

INFORMATION ABOUT A PATTERN

     int pcre_fullinfo(const pcre *code, const pcre_extra *extra,
          int what, void *where);

     The pcre_fullinfo() function  returns  information  about  a
     compiled pattern. It replaces the obsolete pcre_info() function,
     which is nevertheless retained for backwards compatibility
     (and is documented below).

     The first argument for pcre_fullinfo() is a pointer  to  the
     compiled  pattern.  The  second  argument  is  the result of
     pcre_study(), or NULL if the pattern was  not  studied.  The
     third  argument  specifies  which  piece  of  information is
     required, and the fourth argument is a pointer to a variable
     to  receive  the data. The yield of the function is zero for
     success, or one of the following negative numbers:

       PCRE_ERROR_NULL       the argument code was NULL
                             the argument where was NULL
       PCRE_ERROR_BADMAGIC   the "magic number" was not found
       PCRE_ERROR_BADOPTION  the value of what was invalid

     The "magic number" is placed at the start of  each  compiled
     pattern  as  a  simple  check  against passing an arbitrary
     memory pointer. Here is a typical call  of  pcre_fullinfo(),
     to obtain the length of the compiled pattern:

       int rc;
       size_t length;
       rc = pcre_fullinfo(
         re,               /* result of pcre_compile() */
         pe,               /* result of pcre_study(), or NULL */
         PCRE_INFO_SIZE,   /* what is required */
         &length);         /* where to put the data */

     The possible values for the third argument  are  defined  in
     pcre.h, and are as follows:

       PCRE_INFO_BACKREFMAX

     Return the number of the highest back reference in the  pat-
     tern.  The  fourth argument should point to an int variable.
     Zero is returned if there are no back references.

       PCRE_INFO_CAPTURECOUNT

     Return the number of capturing subpatterns in  the  pattern.
     The fourth argument should point to an int variable.
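
     A common use of the capture count is to size the output vector
     for pcre_exec(). The sketch below relies on the layout described
     with pcre_exec() later in this page: two offsets for the whole
     match and for each capturing subpattern, with the final third of
     the vector used as workspace. The helper name is hypothetical.

```c
/* Hypothetical helper (not part of PCRE): number of int elements an
   output vector needs so that pcre_exec() can return every captured
   substring of a pattern whose PCRE_INFO_CAPTURECOUNT is capture_count. */
int ovector_elements(int capture_count)
{
    /* (capture_count + 1) offset pairs occupy two thirds of the vector;
       the remaining third is workspace, hence the factor of three. */
    return (capture_count + 1) * 3;
}
```

     For a pattern with nine capturing subpatterns this gives 30,
     the vector size used in the pcre_exec() example in this page.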

       PCRE_INFO_DEFAULT_TABLES

     Return a pointer to the internal  default  character  tables
     within PCRE. The fourth argument should point to an unsigned
     char * variable.  This  information  call  is  provided  for
     internal  use by the pcre_study() function. External callers
     can cause PCRE to use its internal tables by passing a  NULL
     table pointer.

       PCRE_INFO_FIRSTBYTE

     Return information about  the  first  byte  of  any  matched
     string,  for  a  non-anchored  pattern.  The fourth argument
     should point to an int variable. (This  option  used  to  be
     called PCRE_INFO_FIRSTCHAR; the old name is still recognized
     for backwards compatibility.)

     If there is a fixed first byte, for example, from a  pattern
     such  as (cat|cow|coyote), its value is returned. Otherwise,
     if either

     (a) the pattern was compiled with the PCRE_MULTILINE option,
     and every branch starts with "^", or

     (b) every  branch  of  the  pattern  starts  with  ".*"  and
     PCRE_DOTALL is not set (if it were set, the pattern would be
     anchored),

     -1 is returned, indicating that the pattern matches only  at
     the  start  of  a subject string or after any newline within
     the string. Otherwise -2 is returned. For anchored patterns,
     -2 is returned.
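
     Because one int carries three different meanings, callers may
     find it clearer to decode it once. A minimal sketch follows;
     the enum and function names are hypothetical.

```c
/* Hypothetical classification of a PCRE_INFO_FIRSTBYTE result. */
enum first_byte_kind {
    FIRST_BYTE_FIXED,       /* value is the byte every match starts with */
    FIRST_BYTE_LINE_START,  /* matches only at subject start or after a newline */
    FIRST_BYTE_NONE         /* no first-byte information (includes anchored) */
};

enum first_byte_kind classify_first_byte(int value)
{
    if (value >= 0)
        return FIRST_BYTE_FIXED;
    if (value == -1)
        return FIRST_BYTE_LINE_START;
    return FIRST_BYTE_NONE;   /* -2 */
}
```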

       PCRE_INFO_FIRSTTABLE

     If the pattern was studied, and this resulted  in  the  con-
     struction of a 256-bit table indicating a fixed set of bytes
     for the first byte in any matching string, a pointer to  the
     table  is  returned.  Otherwise NULL is returned. The fourth
     argument should point to an unsigned char * variable.

       PCRE_INFO_HASCRORLF

     Return 1 if the pattern contains any explicit matches for CR
     or  LF  characters,  otherwise 0. The fourth argument should
     point to an int variable. An  explicit  match  is  either  a
     literal CR or LF character, or \r or \n.

       PCRE_INFO_JCHANGED

     Return 1 if the (?J) or (?-J) option setting is used in  the
     pattern, otherwise 0. The fourth argument should point to an
     int variable.  (?J)  and  (?-J)  set  and  unset  the  local
     PCRE_DUPNAMES option, respectively.

       PCRE_INFO_LASTLITERAL

     Return the value of the rightmost  literal  byte  that  must
     exist  in  any  matched  string, other than at its start, if
     such a byte has been recorded. The  fourth  argument  should
     point  to  an  int variable. If there is no such byte, -1 is
     returned. For anchored patterns,  a  last  literal  byte  is
     recorded  only  if  it follows something of variable length.
     For example, for the pattern /^a\d+z\d+/ the returned  value
     is "z", but for /^a\dz\d/ the returned value is -1.

       PCRE_INFO_NAMECOUNT
       PCRE_INFO_NAMEENTRYSIZE
       PCRE_INFO_NAMETABLE

     PCRE supports the use of named as well as numbered capturing
     parentheses. The names are just an additional way of identi-
     fying the parentheses, which still acquire numbers.  Several
     convenience functions such as pcre_get_named_substring() are
     provided for extracting captured substrings by name.  It  is
     also  possible  to  extract the data directly, by first con-
     verting the name to a number in order to access the  correct
     pointers  in  the  output vector (described with pcre_exec()
     below). To do the conversion, you need to use  the  name-to-
     number map, which is described by these three values.

     The  map  consists  of  a  number  of  fixed-size   entries.
     PCRE_INFO_NAMECOUNT   gives   the  number  of  entries,  and
     PCRE_INFO_NAMEENTRYSIZE gives the size of each  entry;  both
     of  these return an int value. The entry size depends on the
     length of the longest name.  PCRE_INFO_NAMETABLE  returns  a
     pointer to the first entry of the table (a pointer to char).
     The first two bytes of each entry are the number of the cap-
     turing parenthesis, most significant byte first. The rest of
     the entry is the corresponding name,  zero  terminated.  The
     names  are in alphabetical order. When PCRE_DUPNAMES is set,
     duplicate names are in order of their  parentheses  numbers.
     For   example,   consider   the  following  pattern  (assume
     PCRE_EXTENDED is set, so white space - including newlines  -
     is ignored):

       (?<date> (?<year>(\d\d)?\d\d) -
       (?<month>\d\d) - (?<day>\d\d) )

     There are four named subpatterns,  so  the  table  has  four
     entries,  and  each  entry in the table is eight bytes long.
     The table is as follows, with non-printing  bytes  shown  in
     hexadecimal, and undefined bytes shown as ??:

       00 01 d  a  t  e  00 ??
       00 05 d  a  y  00 ?? ??
       00 04 m  o  n  t  h  00
       00 02 y  e  a  r  00 ??

     When writing code to extract  data  from  named  subpatterns
     using  the  name-to-number  map, remember that the length of
     the entries is likely to be different for each compiled pat-
     tern.
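
     The entry layout described above can be walked with a few lines
     of C. The sketch below is a plain linear search over the three
     values returned by pcre_fullinfo(); the function name
     lookup_name() is hypothetical, and in practice
     pcre_get_stringnumber() performs this conversion for you.

```c
#include <string.h>

/* Hypothetical helper (not part of PCRE): look up a subpattern name.
   table      - value returned for PCRE_INFO_NAMETABLE
   count      - value returned for PCRE_INFO_NAMECOUNT
   entry_size - value returned for PCRE_INFO_NAMEENTRYSIZE
   Returns the parenthesis number, or -1 if the name is not present. */
int lookup_name(const char *table, int count, int entry_size,
                const char *name)
{
    int i;

    for (i = 0; i < count; i++) {
        const unsigned char *entry =
            (const unsigned char *)table + (size_t)i * (size_t)entry_size;

        /* The name follows the two-byte number and is zero terminated. */
        if (strcmp((const char *)entry + 2, name) == 0)
            return (entry[0] << 8) | entry[1];  /* most significant byte first */
    }
    return -1;
}
```

     With the table shown above (four entries of eight bytes),
     looking up "month" yields 4 and "day" yields 5. Since the
     names are stored in alphabetical order, a binary search would
     also work when PCRE_DUPNAMES is not in use.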

       PCRE_INFO_OKPARTIAL

     Return 1 if the pattern can be used  for  partial  matching,
     otherwise  0.  The  fourth  argument  should point to an int
     variable. The pcrepartial documentation lists  the  restric-
     tions that apply to patterns when partial matching is used.

       PCRE_INFO_OPTIONS

     Return a copy of the options with which the pattern was com-
     piled.  The fourth argument should point to an unsigned long
     int variable. These option bits are those specified  in  the
     call  to  pcre_compile(),  modified  by any top-level option
     settings at the start of the pattern itself. In other words,
     they  are  the  options  that will be in force when matching
     starts. For example, if the pattern /(?im)abc(?-i)d/ is com-
     piled   with   the   PCRE_EXTENDED  option,  the  result  is
     PCRE_CASELESS, PCRE_MULTILINE, and PCRE_EXTENDED.

     A pattern is automatically anchored by PCRE if  all  of  its
     top-level alternatives begin with one of the following:

       ^     unless PCRE_MULTILINE is set
       \A    always
       \G    always
       .*    if PCRE_DOTALL is set and there are no back
               references to the subpattern in which .* appears

     For such patterns, the  PCRE_ANCHORED  bit  is  set  in  the
     options returned by pcre_fullinfo().

       PCRE_INFO_SIZE

     Return the size of the compiled pattern, that is, the  value
     that  was  passed as the argument to pcre_malloc() when PCRE
     was getting memory in which to place the compiled data.  The
     fourth argument should point to a size_t variable.

       PCRE_INFO_STUDYSIZE

     Return the  size  of  the  data  block  pointed  to  by  the
     study_data  field  in a pcre_extra block. That is, it is the
     value that was passed to pcre_malloc() when PCRE was getting
     memory into which to place the data created by pcre_study().
     The fourth argument should point to a size_t variable.

OBSOLETE INFO FUNCTION

     int pcre_info(const pcre *code, int *optptr,
          int *firstcharptr);

     The pcre_info() function is now obsolete because its  inter-
     face  is  too  restrictive  to return all the available data
     about  a  compiled  pattern.   New   programs   should   use
     pcre_fullinfo()  instead.  The  yield  of pcre_info() is the
     number of capturing subpatterns, or  one  of  the  following
     negative numbers:

       PCRE_ERROR_NULL       the argument code was NULL
       PCRE_ERROR_BADMAGIC   the "magic number" was not found

     If the optptr argument is not NULL, a copy  of  the  options
     with which the pattern was compiled is placed in the integer
     it points to (see PCRE_INFO_OPTIONS above).

     If the pattern is not anchored and the firstcharptr argument
     is  not  NULL, it is used to pass back information about the
     first    character    of    any    matched    string    (see
     PCRE_INFO_FIRSTBYTE above).

REFERENCE COUNTS

     int pcre_refcount(pcre *code, int adjust);

     The pcre_refcount() function is used to maintain a reference
     count in the data block that contains a compiled pattern. It
     is provided for the benefit of applications that operate  in
     an  object-oriented  manner,  where  different  parts of the
     application may be using the same compiled pattern, but  you
     want to free the block when they are all done.

     When a pattern is compiled, the  reference  count  field  is
     initialized  to  zero.   It  is changed only by calling this
     function, whose action is to add the adjust value (which may
     be positive or negative) to it. The yield of the function is
     the new value. However, the  value  of  the  count  is  con-
     strained  to  lie between 0 and 65535, inclusive. If the new
     value is outside these limits, it is forced to the appropri-
     ate limit value.
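
     The saturating arithmetic described above can be modelled in a
     few lines. This is only an illustration of the documented
     behaviour, not PCRE's own code, and the function name is
     hypothetical.

```c
/* Model of the documented pcre_refcount() arithmetic: add adjust to the
   current count and force the result into the range 0..65535. */
int adjusted_refcount(int current, int adjust)
{
    long long value = (long long)current + adjust;  /* avoids int overflow */

    if (value < 0)
        value = 0;
    if (value > 65535)
        value = 65535;
    return (int)value;
}
```

     One consequence is that an unbalanced decrement does not take
     the count below zero, so a count of zero cannot by itself
     distinguish "never referenced" from "released too often".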

     Except when it is zero, the reference count is not correctly
     preserved  if  a  pattern  is  compiled on one host and then
     transferred to a host whose byte-order is  different.  (This
     seems a highly unlikely scenario.)

MATCHING A PATTERN: THE TRADITIONAL FUNCTION

     int pcre_exec(const pcre *code, const pcre_extra *extra,
          const char *subject, int length, int startoffset,
          int options, int *ovector, int ovecsize);

     The function pcre_exec() is called to match a subject string
     against  a  compiled  pattern,  which  is passed in the code
     argument. If the pattern has been studied, the result of the
     study  should be passed in the extra argument. This function
     is the  main  matching  facility  of  the  library,  and  it
     operates  in a Perl-like manner. For specialist use there is
     also an alternative matching function,  which  is  described
     below in the section about the pcre_dfa_exec() function.

     In most applications, the pattern will  have  been  compiled
     (and  optionally  studied)  in  the  same process that calls
     pcre_exec(). However, it is possible to save  compiled  pat-
     terns  and  study data, and then use them later in different
     processes, possibly even on different hosts. For  a  discus-
     sion about this, see the pcreprecompile documentation.

     Here is an example of a simple call to pcre_exec():

       int rc;
       int ovector[30];
       rc = pcre_exec(
         re,             /* result of pcre_compile() */
         NULL,           /* we didn't study the pattern */
         "some string",  /* the subject string */
         11,             /* the length of the subject string */
         0,              /* start at offset 0 in the subject */
         0,              /* default options */
         ovector,        /* vector of integers for substring information */
         30);            /* number of elements (NOT size in bytes) */

  Extra data for pcre_exec()

     If the extra argument is  not  NULL,  it  must  point  to  a
     pcre_extra  data  block.  The  pcre_study() function returns
     such a block (when it doesn't return NULL), but you can also
     create  one for yourself, and pass additional information in
     it. The pcre_extra block contains the following fields  (not
     necessarily in this order):

       unsigned long int flags;
       void *study_data;
       unsigned long int match_limit;
       unsigned long int match_limit_recursion;
       void *callout_data;
       const unsigned char *tables;

     The flags field is a bitmap  that  specifies  which  of  the
     other fields are set. The flag bits are:

       PCRE_EXTRA_STUDY_DATA
       PCRE_EXTRA_MATCH_LIMIT
       PCRE_EXTRA_MATCH_LIMIT_RECURSION
       PCRE_EXTRA_CALLOUT_DATA
       PCRE_EXTRA_TABLES

     Other flag bits should be set to zero. The study_data  field
     is   set  in  the  pcre_extra  block  that  is  returned  by
     pcre_study(), together with the appropriate  flag  bit.  You
     should  not  set this yourself, but you may add to the block
     by setting the other fields  and  their  corresponding  flag
     bits.

     The match_limit field provides a means  of  preventing  PCRE
     from  using  up a vast amount of resources when running pat-
     terns that are not going to match, but  which  have  a  very
     large  number  of  possibilities  in their search trees. The
     classic example is the use of nested unlimited repeats.

     Internally, PCRE uses a function  called  match()  which  it
     calls  repeatedly  (sometimes recursively). The limit set by
     match_limit is imposed on the number of times this  function
     is  called  during a match, which has the effect of limiting
     the amount of backtracking that can take place. For patterns
     that are not anchored, the count restarts from zero for each
     position in the subject string.

     The default value for the limit can  be  set  when  PCRE  is
     built; if it is not set then, the default is 10 million, which
     handles all but the most extreme cases. You can override the
     default by supplying pcre_exec() with a pcre_extra block in
     which match_limit is set, and PCRE_EXTRA_MATCH_LIMIT is set in
     the flags field. If the limit is exceeded, pcre_exec() returns
     PCRE_ERROR_MATCHLIMIT.

     The match_limit_recursion field is similar  to  match_limit,
     but  instead  of  limiting  the  total  number of times that
     match() is called, it limits the  depth  of  recursion.  The
     recursion depth is a smaller number than the total number of
     calls, because not all calls to match() are recursive.  This
     limit is of use only if it is set smaller than match_limit.

     Limiting the recursion depth limits the amount of stack that
     can  be  used, or, when PCRE has been compiled to use memory
     on the heap instead of the stack, the amount of heap  memory
     that can be used.

     The default value for match_limit_recursion can be set  when
     PCRE is built; if it is not set then, the default is the same
     value as the default for match_limit. You can override the
     default by supplying pcre_exec() with a pcre_extra block in
     which match_limit_recursion is set, and
     PCRE_EXTRA_MATCH_LIMIT_RECURSION is set in the flags field.
     If the limit is exceeded, pcre_exec() returns
     PCRE_ERROR_RECURSIONLIMIT.

     The callout_data field is used in conjunction with the
     "callout" feature, which is described in the pcrecallout
     documentation.

     The tables field is used to pass a character tables  pointer
     to pcre_exec(); this overrides the value that is stored with
     the compiled pattern. A non-NULL value is  stored  with  the
     compiled  pattern  only  if  custom  tables were supplied to
     pcre_compile() via its tableptr argument.  If NULL is passed
     to pcre_exec() using this mechanism, it forces PCRE's inter-
     nal tables to be used. This facility  is  helpful  when  re-
     using  patterns that have been saved after compiling with an
     external set of tables, because the external tables might be
     at  a  different address when pcre_exec() is called. See the
     pcreprecompile documentation for a discussion of saving com-
     piled patterns for later use.

  Option bits for pcre_exec()

     The unused bits of the options argument for pcre_exec() must
     be  zero.  The  only bits that may be set are PCRE_ANCHORED,
     PCRE_NEWLINE_xxx, PCRE_NOTBOL,  PCRE_NOTEOL,  PCRE_NOTEMPTY,
     PCRE_NO_UTF8_CHECK and PCRE_PARTIAL.

       PCRE_ANCHORED

     The PCRE_ANCHORED option limits pcre_exec() to  matching  at
     the  first matching position. If a pattern was compiled with
     PCRE_ANCHORED, or turned out to be anchored by virtue of its
     contents, it cannot be made unanchored at matching time.

       PCRE_BSR_ANYCRLF
       PCRE_BSR_UNICODE

     These options (which are mutually  exclusive)  control  what
     the  \R  escape  sequence  matches.  The choice is either to
     match only CR, LF, or CRLF, or to match any Unicode  newline
     sequence. These options override the choice that was made or
     defaulted when the pattern was compiled.

       PCRE_NEWLINE_CR
       PCRE_NEWLINE_LF
       PCRE_NEWLINE_CRLF
       PCRE_NEWLINE_ANYCRLF
       PCRE_NEWLINE_ANY

     These options  override  the  newline  definition  that  was
     chosen  or  defaulted  when  the  pattern  was compiled. For
     details, see the description of pcre_compile() above. During
     matching,  the  newline  choice affects the behaviour of the
     dot, circumflex, and  dollar  metacharacters.  It  may  also
     alter  the  way the match position is advanced after a match
     failure for an unanchored pattern.

     When     PCRE_NEWLINE_CRLF,     PCRE_NEWLINE_ANYCRLF,     or
     PCRE_NEWLINE_ANY  is  set,  and a match attempt for an unan-
     chored pattern fails when the current position is at a  CRLF
     sequence,  and  the pattern contains no explicit matches for
     CR or LF characters, the match position is advanced  by  two
     characters  instead  of  one,  in  other words, to after the
     CRLF.

     The above rule is a compromise that makes  the  most  common
     cases  work  as expected. For example, if the pattern is .+A
     (and the PCRE_DOTALL option is not set), it does  not  match
     the  string  "\r\nA" because, after failing at the start, it
     skips both the CR and the LF before retrying.  However,  the
     pattern  [\r\n]A does match that string, because it contains
     an explicit CR or LF reference, and so advances only by  one
     character after the first failure.

     An explicit match for CR or LF is either a literal appearance
     of one of those characters, or one of the \r or \n escape
     sequences. Implicit matches such as [^X] do not count, nor
     does \s (which includes CR and LF in the characters that it
     matches).

     Notwithstanding the above, anomalous effects may still occur
     when  CRLF is a valid newline sequence and explicit \r or \n
     escapes appear in the pattern.
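
     The advance rule for CRLF can be modelled as a small function.
     This is an illustration of the rule as stated above, not PCRE's
     own code; it assumes a non-UTF-8 subject, so that one character
     is one byte, and the function name is hypothetical.

```c
#include <stddef.h>

/* Model of the documented advance rule.
   crlf_is_newline   - nonzero for PCRE_NEWLINE_CRLF, _ANYCRLF or _ANY
   has_explicit_crlf - nonzero if the pattern contains explicit CR or LF
                       matches (what PCRE_INFO_HASCRORLF reports)
   Returns how far the match position moves after a failed attempt at pos
   for an unanchored pattern. */
size_t advance_after_failure(const char *subject, size_t length, size_t pos,
                             int crlf_is_newline, int has_explicit_crlf)
{
    if (crlf_is_newline && !has_explicit_crlf &&
        pos + 1 < length &&
        subject[pos] == '\r' && subject[pos + 1] == '\n')
        return 2;   /* step over the whole CRLF */
    return 1;
}
```

     For the subject "\r\nA" this returns 2 at position 0 for the
     pattern .+A (no explicit CR or LF), and 1 for [\r\n]A.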

       PCRE_NOTBOL

     This option specifies that the first character of the subject
     string  is  not  the  beginning of a line, so the circumflex
     metacharacter should  not  match  before  it.  Setting  this
     without  PCRE_MULTILINE  (at compile time) causes circumflex
     never to match. This option affects only  the  behaviour  of
     the circumflex metacharacter. It does not affect \A.

       PCRE_NOTEOL

     This option specifies that the end of the subject string  is
     not  the  end  of a line, so the dollar metacharacter should
     not match it  nor  (except  in  multiline  mode)  a  newline
     immediately  before  it. Setting this without PCRE_MULTILINE
     (at compile time) causes dollar never to match. This  option
     affects  only  the behaviour of the dollar metacharacter. It
     does not affect \Z or \z.

       PCRE_NOTEMPTY

     An empty string is not considered to be  a  valid  match  if
     this  option  is  set. If there are alternatives in the pat-
     tern, they are tried. If  all  the  alternatives  match  the
     empty  string,  the  entire match fails. For example, if the
     pattern

       a?b?

     is applied to a string not beginning with  "a"  or  "b",  it
     matches  the  empty string at the start of the subject. With
     PCRE_NOTEMPTY set, this match is not valid, so PCRE searches
     further into the string for occurrences of "a" or "b".

     Perl has no direct equivalent of PCRE_NOTEMPTY, but it  does
     make  a  special case of a pattern match of the empty string
     within its split() function, and when using the /g modifier.
     It  is possible to emulate Perl's behaviour after matching a
     null string by first trying the  match  again  at  the  same
     offset  with  PCRE_NOTEMPTY  and  PCRE_ANCHORED, and then if
     that fails by advancing the starting offset (see below)  and
     trying  an  ordinary  match  again.  There is some code that
     demonstrates how to do this in the  pcredemo.c  sample  pro-
     gram.

       PCRE_NO_UTF8_CHECK

     When PCRE_UTF8 is set at compile time, the validity of the
     subject as a UTF-8 string is automatically checked when
     pcre_exec() is subsequently called. The value of startoffset
     is also checked to ensure that it points to the start of a
     UTF-8 character. There is a discussion about the validity of
     UTF-8 strings in the section on UTF-8 support in the main
     pcre page. If an invalid UTF-8 sequence of bytes is found,
     pcre_exec() returns the error PCRE_ERROR_BADUTF8. If
     startoffset contains an invalid value,
     PCRE_ERROR_BADUTF8_OFFSET is returned.

     If you already know that your subject is valid, and you want
     to  skip  these  checks for performance reasons, you can set
     the PCRE_NO_UTF8_CHECK option when calling pcre_exec().  You
     might want to do this for the second and subsequent calls to
     pcre_exec() if you are making repeated calls to find all the
     matches  in  a single subject string. However, you should be
     sure that the value of startoffset points to the start of  a
     UTF-8  character. When PCRE_NO_UTF8_CHECK is set, the effect
     of passing an invalid UTF-8 string as a subject, or a  value
     of  startoffset  that does not point to the start of a UTF-8
     character, is undefined. Your program may crash.
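
     A caller that skips the built-in checks may  still  want  a
     cheap guard on startoffset. The following is a minimal
     sketch, assuming a hypothetical helper
     offset_is_char_start() that is not part of PCRE. It tests
     only that the byte at the offset is not a UTF-8 continuation
     byte; it does not validate the subject as a whole.

```c
/* Hypothetical helper, not part of PCRE. Returns 1 if offset lies on a
   plausible UTF-8 character boundary in subject, else 0. The end of the
   subject (offset == length) is treated as a boundary here. Only the
   single byte at offset is inspected: continuation bytes have the bit
   pattern 10xxxxxx and can never start a character. */
int offset_is_char_start(const char *subject, int length, int offset)
{
    if (offset < 0 || offset > length)
        return 0;
    if (offset == length)
        return 1;
    return (((unsigned char)subject[offset]) & 0xC0) != 0x80;
}
```

     For the six-byte subject "h\xC3\xA9llo", offsets 0, 1 and 3
     pass the test, while offset 2 (the second byte of the
     two-byte character) does not.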

       PCRE_PARTIAL

     This option turns on the partial matching  feature.  If  the
     subject string fails to match the pattern, but at some point
     during the matching process  the  end  of  the  subject  was
     reached  (that is, the subject partially matches the pattern
     and the failure to match occurred only  because  there  were
     not   enough   subject   characters),   pcre_exec()  returns
     PCRE_ERROR_PARTIAL  instead  of   PCRE_ERROR_NOMATCH.   When
     PCRE_PARTIAL  is  used,  there  are restrictions on what may
     appear in the pattern. These are discussed in the
     pcrepartial documentation.

  The string to be matched by pcre_exec()

     The subject string is passed to pcre_exec() as a pointer  in
     subject,  a length (in bytes) in length, and a starting byte
     offset in startoffset. In UTF-8 mode, the byte  offset  must
     point  to the start of a UTF-8 character. Unlike the pattern
     string, the subject may contain binary zero bytes. When  the
     starting  offset  is  zero, the search for a match starts at
     the beginning of the subject, and this is by  far  the  most
     common case.

     A non-zero starting offset  is  useful  when  searching  for
     another  match  in  the  same subject by calling pcre_exec()
     again after a previous success.  Setting startoffset differs
     from  just  passing  over  a  shortened  string  and setting
     PCRE_NOTBOL in the case of a pattern that  begins  with  any
     kind of lookbehind. For example, consider the pattern

       \Biss\B

     which finds occurrences of "iss" in the middle of words. (\B
     matches only if the current position in the subject is not a
     word boundary.) When applied to the string "Mississippi", the
     first call to pcre_exec() finds the first occurrence. If
     pcre_exec() is called again with just the remainder of the
     subject, namely "issippi", it does not match, because \B is
     always false at the start of the subject, which is deemed to
     be  a  word  boundary. However, if pcre_exec() is passed the
     entire string again, but with startoffset set to 4, it finds
     the  second  occurrence  of "iss" because it is able to look
     behind the starting point to discover that it is preceded by
     a letter.

     If a non-zero starting offset is passed when the pattern  is
     anchored,  one attempt to match at the given offset is made.
     This can only succeed if the pattern does  not  require  the
     match to be at the start of the subject.

  How pcre_exec() returns captured substrings

     In general, a pattern matches a certain portion of the  sub-
     ject,  and  in addition, further substrings from the subject
     may be picked out by parts of  the  pattern.  Following  the
     usage  in  Jeffrey Friedl's book, this is called "capturing"
     in what follows, and the phrase  "capturing  subpattern"  is
     used for a fragment of a pattern that picks out a substring.
     PCRE supports several other kinds of  parenthesized  subpat-
     tern that do not cause substrings to be captured.

     Captured substrings are returned to the caller via a  vector
     of  integers  whose address is passed in ovector. The number
     of elements in the vector is passed in ovecsize, which  must
     be  a  non-negative  number.  Note: this argument is NOT the
     size of ovector in bytes.

     The first two-thirds of the vector is used to pass back cap-
     tured  substrings,  each substring using a pair of integers.
     The remaining third of the vector is used  as  workspace  by
     pcre_exec() while matching capturing subpatterns, and is not
     available for passing back information. The number passed in
     ovecsize should always be a multiple of three. If it is not,
     it is rounded down.

     When a match is successful, information about captured  sub-
     strings  is  returned  in pairs of integers, starting at the
     beginning of ovector, and continuing up to two-thirds of its
     length at the most. The first element of each pair is set to
     the byte offset of the first character in a  substring,  and
     the  second is set to the byte offset of the first character
     after the end of a substring. Note: these values are  always
     byte  offsets,  even  in  UTF-8 mode. They are not character
     counts.

     The first pair of integers, ovector[0] and ovector[1], iden-
     tify the portion of the subject string matched by the entire
     pattern. The next pair is used for the first capturing  sub-
     pattern, and so on. The value returned by pcre_exec() is one
     more than the highest numbered pair that has been set.   For
     example,  if two substrings have been captured, the returned
     value is 3. If  there  are  no  capturing  subpatterns,  the
     return  value  from a successful match is 1, indicating that
     just the first pair of offsets has been set.
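
     To make the offset pairs concrete, here is a minimal  sketch
     that copies one captured substring out of the subject using
     only the pair stored in ovector. The helper copy_pair() and
     its return codes (-1 for an unset pair, -2 for a buffer that
     is too small) are assumptions of this example; they are not
     PCRE functions or PCRE error codes.

```c
#include <string.h>

/* Hypothetical helper: copy captured substring number n out of subject,
   using the pair ovector[2*n] (start) and ovector[2*n+1] (end, exclusive).
   Returns the substring length in bytes, -1 if the pair is unset, or -2
   if buffer cannot hold the substring plus a terminating zero. */
int copy_pair(const char *subject, const int *ovector, int n,
              char *buffer, int buffersize)
{
    int start = ovector[2 * n];
    int end = ovector[2 * n + 1];
    int len;

    if (start < 0 || end < start)
        return -1;
    len = end - start;
    if (len + 1 > buffersize)
        return -2;
    memcpy(buffer, subject + start, (size_t)len);
    buffer[len] = '\0';
    return len;
}
```

     With the subject "key=value" and an ovector holding
     {0, 9, 0, 3, 4, 9}, pair 0 yields the whole match, pair 1
     yields "key", and pair 2 yields "value".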

     If a capturing subpattern is matched repeatedly, it  is  the
     last portion of the string that it matched that is returned.

     If the vector is too small to hold  all  the  captured  sub-
     string  offsets,  it  is used as far as possible (up to two-
     thirds of its length), and the function returns a  value  of
     zero.   If  the  substring  offsets  are  not  of  interest,
     pcre_exec() may be called with ovector passed  as  NULL  and
     ovecsize  as  zero.  However,  if  the pattern contains back
     references and the ovector is not big enough to remember the
     related  substrings,  PCRE  has to get additional memory for
     use during matching. Thus it is usually advisable to  supply
     an ovector.

     The pcre_info() function can be used to find  out  how  many
     capturing  subpatterns  there are in a compiled pattern. The
     smallest size for ovector that will  allow  for  n  captured
     substrings,  in  addition  to  the  offsets of the substring
     matched by the whole pattern, is (n+1)*3.
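
     The sizing arithmetic can be written down  directly.  The
     two helper names below are assumptions for illustration and
     are not part of the PCRE API.

```c
/* Smallest ovector element count that leaves room for n_captures captured
   substrings plus the pair for the whole match: (n + 1) * 3. */
int ovector_elements_for(int n_captures)
{
    return (n_captures + 1) * 3;
}

/* Number of offset pairs that can be reported for a given ovecsize.
   ovecsize is rounded down to a multiple of three, two-thirds of which
   hold pairs of two integers, so the result is ovecsize / 3. */
int reportable_pairs(int ovecsize)
{
    return ovecsize / 3;
}
```

     For a pattern with two capturing subpatterns this gives nine
     elements; an ovecsize of 10 still reports only three pairs.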

     It is possible for capturing subpattern number n+1 to  match
     some part of the subject when subpattern n has not been used
     at all. For example, if the string "abc" is matched  against
     the  pattern  (a|(z))(bc) the return from the function is 4,
     and subpatterns 1 and 3 are matched, but 2 is not. When this
     happens,  both  values  in the offset pairs corresponding to
     unused subpatterns are set to -1.

     Offset values that correspond to unused subpatterns  at  the
     end  of  the  expression are also set to -1. For example, if
     the  string   "abc"   is   matched   against   the   pattern
     (abc)(x(yz)?)?  subpatterns  2  and  3  are not matched. The
     return from the function is 2, because the highest used cap-
     turing subpattern number is 1. However, you can refer to the
     offsets for the second and third  capturing  subpatterns  if
     you wish (assuming the vector is large enough, of course).
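
     A caller can test whether a given subpattern took  part  in
     the match by combining the return value with the -1 markers.
     The sketch below assumes a hypothetical helper
     group_is_set(); it is not a PCRE function.

```c
/* Hypothetical helper: returns 1 if capturing subpattern n took part in
   the match, else 0. rc is the positive value returned by the matching
   call. Pairs numbered rc or higher, and pairs whose offsets are -1,
   count as unset. */
int group_is_set(const int *ovector, int rc, int n)
{
    if (n < 0 || n >= rc)
        return 0;
    return ovector[2 * n] >= 0 && ovector[2 * n + 1] >= 0;
}
```

     For "abc" matched against (a|(z))(bc), the return value is 4
     and the vector begins {0, 3, 0, 1, -1, -1, 1, 3}, so
     subpatterns 1 and 3 are reported as set and 2 as unset.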

     Some convenience functions are provided for  extracting  the
     captured substrings as separate strings. These are described
     below.

  Error return values from pcre_exec()

     If pcre_exec() fails, it returns a negative number. The fol-
     lowing are defined in the header file:

       PCRE_ERROR_NOMATCH        (-1)

     The subject string did not match the pattern.

       PCRE_ERROR_NULL           (-2)

     Either code or subject was passed as NULL,  or  ovector  was
     NULL and ovecsize was not zero.

       PCRE_ERROR_BADOPTION      (-3)

     An unrecognized bit was set in the options argument.

       PCRE_ERROR_BADMAGIC       (-4)

     PCRE stores a 4-byte "magic number" at the start of the com-
     piled  code,  to  catch  the  case  when it is passed a junk
     pointer and to detect when a pattern that was compiled in an
     environment  of one endianness is run in an environment with
     the other endianness. This is the error that PCRE gives when
     the magic number is not present.

       PCRE_ERROR_UNKNOWN_OPCODE (-5)

     While running the pattern match, an unknown item was encoun-
     tered in the compiled pattern. This error could be caused by
     a bug in PCRE or by overwriting of the compiled pattern.

       PCRE_ERROR_NOMEMORY       (-6)

     If a pattern contains back references, but the ovector  that
     is  passed  to pcre_exec() is not big enough to remember the
     referenced substrings, PCRE gets a block of  memory  at  the
     start  of  matching to use for this purpose. If the call via
     pcre_malloc() fails, this error  is  given.  The  memory  is
     automatically freed at the end of matching.

       PCRE_ERROR_NOSUBSTRING    (-7)

     This   error   is   used   by   the   pcre_copy_substring(),
     pcre_get_substring(),  and  pcre_get_substring_list()  func-
     tions (see below). It is never returned by pcre_exec().

       PCRE_ERROR_MATCHLIMIT     (-8)

     The backtracking limit,  as  specified  by  the  match_limit
     field  in a pcre_extra structure (or defaulted) was reached.
     See the description above.

       PCRE_ERROR_CALLOUT        (-9)

     This error is never generated by pcre_exec() itself.  It  is
     provided  for  use by callout functions that want to yield a
     distinctive error code. See  the  pcrecallout  documentation
     for details.

       PCRE_ERROR_BADUTF8        (-10)

     A string that contains an invalid UTF-8  byte  sequence  was
     passed as a subject.

       PCRE_ERROR_BADUTF8_OFFSET (-11)

     The UTF-8 byte sequence that was passed  as  a  subject  was
     valid,  but  the  value  of startoffset did not point to the
     beginning of a UTF-8 character.

       PCRE_ERROR_PARTIAL        (-12)

     The subject string did not match,  but  it  did  match  par-
     tially.  See  the  pcrepartial  documentation for details of
     partial matching.

       PCRE_ERROR_BADPARTIAL     (-13)

     The PCRE_PARTIAL option was used  with  a  compiled  pattern
     containing  items  that are not supported for partial match-
     ing. See the pcrepartial documentation for details  of  par-
     tial matching.

       PCRE_ERROR_INTERNAL       (-14)

     An unexpected internal error has occurred. This error  could
     be caused by a bug in PCRE or by overwriting of the compiled
     pattern.

       PCRE_ERROR_BADCOUNT       (-15)

     This error is given if the value of the ovecsize argument is
     negative.

       PCRE_ERROR_RECURSIONLIMIT (-21)

     The  internal  recursion  limit,   as   specified   by   the
     match_limit_recursion  field  in  a pcre_extra structure (or
     defaulted) was reached. See the description above.

       PCRE_ERROR_BADNEWLINE     (-23)

     An  invalid  combination  of  PCRE_NEWLINE_xxx  options  was
     given.

     Error  numbers  -16  to  -20  and  -22  are  not   used   by
     pcre_exec().
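
     For diagnostics it can help to turn a negative return  value
     into the name listed above. The sketch below hard-codes the
     numbers from this list so that it stands alone; the function
     names are assumptions of this example, and real code would
     include the PCRE header and compare against the macros.

```c
/* Hypothetical helper: map a negative return value to the name given in
   the list above. The numeric values are copied from that list. */
const char *exec_error_name(int rc)
{
    switch (rc) {
    case -1:  return "PCRE_ERROR_NOMATCH";
    case -2:  return "PCRE_ERROR_NULL";
    case -3:  return "PCRE_ERROR_BADOPTION";
    case -4:  return "PCRE_ERROR_BADMAGIC";
    case -5:  return "PCRE_ERROR_UNKNOWN_OPCODE";
    case -6:  return "PCRE_ERROR_NOMEMORY";
    case -7:  return "PCRE_ERROR_NOSUBSTRING";
    case -8:  return "PCRE_ERROR_MATCHLIMIT";
    case -9:  return "PCRE_ERROR_CALLOUT";
    case -10: return "PCRE_ERROR_BADUTF8";
    case -11: return "PCRE_ERROR_BADUTF8_OFFSET";
    case -12: return "PCRE_ERROR_PARTIAL";
    case -13: return "PCRE_ERROR_BADPARTIAL";
    case -14: return "PCRE_ERROR_INTERNAL";
    case -15: return "PCRE_ERROR_BADCOUNT";
    case -21: return "PCRE_ERROR_RECURSIONLIMIT";
    case -23: return "PCRE_ERROR_BADNEWLINE";
    default:  return "(not listed for pcre_exec())";
    }
}

/* Returns 1 for failures other than "no match" and "partial match",
   which many callers treat as ordinary outcomes rather than errors. */
int exec_failed_hard(int rc)
{
    return rc < 0 && rc != -1 && rc != -12;
}
```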

EXTRACTING CAPTURED SUBSTRINGS BY NUMBER

     int pcre_copy_substring(const char *subject, int *ovector,
          int stringcount, int stringnumber, char *buffer,
          int buffersize);

     int pcre_get_substring(const char *subject, int *ovector,
          int stringcount, int stringnumber,
          const char **stringptr);

     int pcre_get_substring_list(const char *subject,
          int *ovector, int stringcount, const char ***listptr);

     Captured substrings can be accessed directly  by  using  the
     offsets returned by pcre_exec() in ovector. For convenience,
     the functions  pcre_copy_substring(),  pcre_get_substring(),
     and  pcre_get_substring_list()  are  provided for extracting
     captured  substrings  as  new,   separate,   zero-terminated
     strings.  These functions identify substrings by number. The
     next section describes functions for extracting  named  sub-
     strings.

     A  substring  that  contains  a  binary  zero  is  correctly
     extracted  and  has a further zero added on the end, but the
     result is not, of course, a C string.  However, you can pro-
     cess  such  a  string  by  referring  to  the length that is
     returned by pcre_copy_substring() and  pcre_get_substring().
     Unfortunately, the interface to pcre_get_substring_list() is
     not adequate for handling strings containing  binary  zeros,
     because  the  end  of  the final string is not independently
     indicated.

     The first three arguments are the  same  for  all  three  of
     these  functions:   subject  is  the subject string that has
     just been successfully matched, ovector is a pointer to  the
     vector  of  integer  offsets that was passed to pcre_exec(),
     and stringcount is the number of substrings that  were  cap-
     tured by the match, including the substring that matched the
     entire regular expression. This is  the  value  returned  by
     pcre_exec()  if  it  is  greater  than  zero. If pcre_exec()
     returned zero, indicating that it ran out of space in
     ovector, the value passed as stringcount should be the number of
     elements in the vector divided by three.
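
     The rule for choosing stringcount  fits  in  one  expression.
     The helper name effective_stringcount() is an assumption for
     this sketch, which expects rc to be the zero or positive
     value returned by a successful pcre_exec() call.

```c
/* Hypothetical helper: value to pass as stringcount, given the return
   value rc of a successful matching call and the ovecsize passed to it.
   A return of zero means the vector was filled, so every available pair
   (ovecsize / 3 of them) holds a substring. */
int effective_stringcount(int rc, int ovecsize)
{
    return (rc > 0) ? rc : ovecsize / 3;
}
```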

     The functions pcre_copy_substring() and pcre_get_substring()
     extract a single substring, whose number is given as
     stringnumber. A value of zero extracts the substring that matched
     the  entire  pattern, whereas higher values extract the cap-
     tured substrings. For pcre_copy_substring(), the  string  is
     placed in buffer, whose length is given by buffersize, while
     for pcre_get_substring() a new block of memory  is  obtained
     via  pcre_malloc, and its address is returned via stringptr.
     The yield of the function is the length of the  string,  not
     including the terminating zero, or one of these error codes:

       PCRE_ERROR_NOMEMORY       (-6)

     The buffer was too small for pcre_copy_substring(),  or  the
     attempt to get memory failed for pcre_get_substring().

       PCRE_ERROR_NOSUBSTRING    (-7)

     There is no substring whose number is stringnumber.

     The pcre_get_substring_list() function extracts  all  avail-
     able  substrings  and builds a list of pointers to them. All
     this is done in a single block of memory  that  is  obtained
     via pcre_malloc. The address of the memory block is returned
     via listptr, which is also the start of the list  of  string
     pointers.  The  end of the list is marked by a NULL pointer.
     The yield of the function is zero if all went well,  or  the
     error code

       PCRE_ERROR_NOMEMORY       (-6)

     if the attempt to get the memory block failed.

     When any of these functions encounter a  substring  that  is
     unset, which can happen when capturing subpattern number n+1
     matches some part of the subject, but subpattern n  has  not
     been  used  at all, they return an empty string. This can be
     distinguished  from  a  genuine  zero-length  substring   by
     inspecting the appropriate offset in ovector, which is nega-
     tive for unset substrings.

     The  two  convenience  functions  pcre_free_substring()  and
     pcre_free_substring_list()  can  be  used to free the memory
     returned by  a  previous  call  of  pcre_get_substring()  or
     pcre_get_substring_list(),  respectively.  They  do  nothing
     more than call the function pointed to by  pcre_free,  which
     of  course  could  be called directly from a C program. How-
     ever, PCRE is used in some situations where it is linked via
     a  special  interface  to  another programming language that
     cannot use pcre_free directly; it is for  these  cases  that
     the functions are provided.

EXTRACTING CAPTURED SUBSTRINGS BY NAME

     int pcre_get_stringnumber(const pcre *code,
          const char *name);

     int pcre_copy_named_substring(const pcre *code,
          const char *subject, int *ovector,
          int stringcount, const char *stringname,
          char *buffer, int buffersize);

     int pcre_get_named_substring(const pcre *code,
          const char *subject, int *ovector,
          int stringcount, const char *stringname,
          const char **stringptr);

     To extract a substring by name, you first have to find the
     associated number. For example, for this pattern

       (a+)b(?<xxx>\d+)...

     the number of the subpattern called "xxx" is 2. If the  name
     is  known  to be unique (PCRE_DUPNAMES was not set), you can
     find   the    number    from    the    name    by    calling
     pcre_get_stringnumber().  The first argument is the compiled
     pattern, and the second is the name. The yield of the
     function is the subpattern number, or PCRE_ERROR_NOSUBSTRING
     (-7) if there is no subpattern of that name.

     Given the number, you can extract the substring directly, or
     use  one of the functions described in the previous section.
     For convenience, there are also two functions  that  do  the
     whole job.

     Most of the  arguments  of  pcre_copy_named_substring()  and
     pcre_get_named_substring()  are  the  same  as those for the
     similarly named functions that extract by number.  As  these
     are  described  in  the  previous  section, they are not re-
     described here. There are just two differences:

     First, instead of a substring number, a  substring  name  is
     given.  Second,  there  is  an  extra argument, given at the
     start, which is a pointer to the compiled pattern.  This  is
     needed  in order to gain access to the name-to-number trans-
     lation table.

     These functions  call  pcre_get_stringnumber(),  and  if  it
     succeeds,    they   then   call   pcre_copy_substring()   or
     pcre_get_substring(), as appropriate. NOTE: If PCRE_DUPNAMES
     is  set and there are duplicate names, the behaviour may not
     be what you want (see the next section).

DUPLICATE SUBPATTERN NAMES

     int pcre_get_stringtable_entries(const pcre *code,
          const char *name, char **first, char **last);

     When a pattern is compiled with  the  PCRE_DUPNAMES  option,
     names  for  subpatterns  are not required to be unique. Nor-
     mally, patterns with duplicate names are such  that  in  any
     one  match,  only one of the named subpatterns participates.
     An example is shown in the pcrepattern documentation.

     When duplicates are present, pcre_copy_named_substring() and
     pcre_get_named_substring()   return   the   first  substring
     corresponding to the given name that is  set.  If  none  are
     set,  PCRE_ERROR_NOSUBSTRING  (-7)  is  returned; no data is
     returned. The pcre_get_stringnumber() function  returns  one
     of  the numbers that are associated with the name, but it is
     not defined which it is.

     If you want to get full details of all  captured  substrings
     for     a     given     name,     you     must    use    the
     pcre_get_stringtable_entries() function. The first  argument
     is  the  compiled  pattern,  and the second is the name. The
     third and fourth are pointers to variables which are updated
     by  the  function. After it has run, they point to the first
     and last entries in the name-to-number table for  the  given
     name.  The function itself returns the length of each entry,
     or PCRE_ERROR_NOSUBSTRING (-7) if there are none. The format
     of  the  table  is  described  above in the section entitled
     Information about a pattern.  Given all the relevant entries
     for  the  name,  you  can extract each of their numbers, and
     hence the captured data, if any.

FINDING ALL POSSIBLE MATCHES

     The traditional matching function uses a  similar  algorithm
     to Perl, which stops when it finds the first match, starting
     at a given point in the subject. If you  want  to  find  all
     possible  matches,  or  the longest possible match, consider
     using the alternative matching function (see below) instead.
     If  you  cannot use the alternative function, but still need
     to find all possible matches, you can kludge it up by making
     use of the callout facility, which is described in the
     pcrecallout documentation.

     What you have to do is to insert a callout right at the  end
     of  the  pattern.   When  your  callout  function is called,
     extract and save the current matched substring. Then  return
     1,  which  forces  pcre_exec()  to  backtrack  and try other
     alternatives. Ultimately,  when  it  runs  out  of  matches,
     pcre_exec() will yield PCRE_ERROR_NOMATCH.

MATCHING A PATTERN: THE ALTERNATIVE FUNCTION

     int pcre_dfa_exec(const pcre *code, const pcre_extra *extra,
          const char *subject, int length, int startoffset,
          int options, int *ovector, int ovecsize,
          int *workspace, int wscount);

     The function pcre_dfa_exec() is called to  match  a  subject
     string  against  a  compiled pattern, using a matching algo-
     rithm that scans the subject string just once, and does  not
     backtrack.  This has different characteristics to the normal
     algorithm, and is not compatible  with  Perl.  Some  of  the
     features  of  PCRE patterns are not supported. Nevertheless,
     there are times when this kind of matching  can  be  useful.
     For  a  discussion  of  the two matching algorithms, see the
     pcrematching documentation.

     The arguments for the pcre_dfa_exec() function are the  same
     as for pcre_exec(), plus two extras. The ovector argument is
     used in a different way, and this is  described  below.  The
     other  common  arguments  are  used  in  the same way as for
     pcre_exec(), so their description is not repeated here.

     The two additional arguments provide workspace for the func-
     tion.  The  workspace vector should contain at least 20 ele-
     ments. It is  used  for  keeping  track  of  multiple  paths
     through  the pattern tree. More workspace will be needed for
     patterns and subjects where there are  a  lot  of  potential
     matches.

     Here is an example of a simple call to pcre_dfa_exec():

       int rc;
       int ovector[10];
       int wspace[20];
       rc = pcre_dfa_exec(
         re,             /* result of pcre_compile() */
         NULL,           /* we didn't study the pattern */
         "some string",  /* the subject string */
         11,             /* the length of the subject string */
         0,              /* start at offset 0 in the subject */
         0,              /* default options */
         ovector,        /* vector of integers for substring information */
         10,             /* number of elements (NOT size in bytes) */
         wspace,         /* working space vector */
         20);            /* number of elements (NOT size in bytes) */

  Option bits for pcre_dfa_exec()

     The unused bits of the options argument for  pcre_dfa_exec()
     must   be   zero.   The  only  bits  that  may  be  set  are
     PCRE_ANCHORED, PCRE_NEWLINE_xxx,  PCRE_NOTBOL,  PCRE_NOTEOL,
     PCRE_NOTEMPTY,       PCRE_NO_UTF8_CHECK,       PCRE_PARTIAL,
     PCRE_DFA_SHORTEST, and PCRE_DFA_RESTART. All  but  the  last
     three  of  these  are  the same as for pcre_exec(), so their
     description is not repeated here.

       PCRE_PARTIAL

     This has the same general effect as it does for pcre_exec(),
     but the details are slightly different. When PCRE_PARTIAL is
     set for pcre_dfa_exec(), the return code  PCRE_ERROR_NOMATCH
     is  converted into PCRE_ERROR_PARTIAL if the end of the sub-
     ject is reached, there have been no  complete  matches,  but
     there  is  still at least one matching possibility. The por-
     tion of the string that provided the partial match is set as
     the first matching string.

       PCRE_DFA_SHORTEST

     Setting the PCRE_DFA_SHORTEST  option  causes  the  matching
     algorithm to stop as soon as it has found one match. Because
     of the way the alternative algorithm works, this  is  neces-
     sarily  the  shortest  possible  match at the first possible
     matching point in the subject string.

       PCRE_DFA_RESTART

     When pcre_dfa_exec() is called with the PCRE_PARTIAL option,
     and  returns  a  partial  match,  it  is possible to call it
     again, with additional subject characters, and have it  con-
     tinue  with  the  same  match.  The  PCRE_DFA_RESTART option
     requests this action; when it is set, the workspace and
     wscount arguments must reference the same vector as before
     because data about the match so far is left in them after  a
     partial  match. There is more discussion of this facility in
     the pcrepartial documentation.

  Successful returns from pcre_dfa_exec()

     When pcre_dfa_exec() succeeds, it may have matched more than
     one  substring  in  the subject. Note, however, that all the
     matches from one run of the function start at the same point
     in  the  subject.  The  shorter matches are all initial sub-
     strings of the longer matches. For example, if the pattern

       <.*>

     is matched against the string

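       This is <something> <something else> <something further> no more

     the three matched strings are

       <something>
       <something> <something else>
       <something> <something else> <something further>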

     On success, the yield of the function is  a  number  greater
     than  zero,  which  is the number of matched substrings. The
     substrings themselves are returned in ovector.  Each  string
     uses two elements; the first is the offset to the start, and
     the second is the offset  to  the  end.  In  fact,  all  the
     strings  have  the same start offset. (Space could have been
     saved by giving this only once, but it was decided to retain
     some  compatibility  with  the way pcre_exec() returns data,
     even though the meaning of the strings is different.)

     The strings are returned in reverse order  of  length;  that
     is,  the  longest  matching  string is given first. If there
     were too many matches to fit into ovector, the yield of  the
     function  is zero, and the vector is filled with the longest
     matches.
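
     Reading the results back is a matter of walking the  pairs
     in order. The sketch below assumes a hypothetical helper
     dfa_match_length(), not a PCRE function, and takes rc to be
     the positive value returned by pcre_dfa_exec().

```c
/* Hypothetical helper: length in bytes of the i-th match reported by the
   alternative matching function. Index 0 is the longest match and index
   rc - 1 the shortest; all matches share the same start offset.
   Returns -1 if i is out of range. */
int dfa_match_length(const int *ovector, int rc, int i)
{
    if (i < 0 || i >= rc)
        return -1;
    return ovector[2 * i + 1] - ovector[2 * i];
}
```

     For the subject "<a> <b> <c>" and a pattern that matches
     each bracketed prefix, a vector of {0, 11, 0, 7, 0, 3} with
     a return value of 3 gives lengths 11, 7 and 3.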

  Error returns from pcre_dfa_exec()

     The pcre_dfa_exec() function returns a negative number  when
     it   fails.   Many  of  the  errors  are  the  same  as  for
     pcre_exec(), and these are described above.   There  are  in
     addition   the   following   errors  that  are  specific  to
     pcre_dfa_exec():

       PCRE_ERROR_DFA_UITEM      (-16)

     This return is given if pcre_dfa_exec() encounters  an  item
     in  the  pattern that it does not support, for instance, the
     use of \C or a back reference.

       PCRE_ERROR_DFA_UCOND      (-17)

     This return is given if pcre_dfa_exec() encounters a  condi-
     tion item that uses a back reference for the condition, or a
     test for recursion in a specific group. These are  not  sup-
     ported.

       PCRE_ERROR_DFA_UMLIMIT    (-18)

     This return is given if pcre_dfa_exec() is  called  with  an
     extra  block  that  contains  a  setting  of the match_limit
     field. This is not supported (it is meaningless).

       PCRE_ERROR_DFA_WSSIZE     (-19)

     This return is given if pcre_dfa_exec() runs out of space in
     the workspace vector.

       PCRE_ERROR_DFA_RECURSE    (-20)

     When a recursive subpattern is processed, the matching func-
     tion  calls  itself  recursively,  using private vectors for
     ovector and workspace. This error is  given  if  the  output
     vector  is  not large enough. This should be extremely rare,
     as a vector of size 1000 is used.

SEE ALSO

     pcrebuild(3), pcrecallout(3), pcrecpp(3), pcrematching(3),
     pcrepartial(3), pcreposix(3), pcreprecompile(3),
     pcresample(3), pcrestack(3).

AUTHOR

     Philip Hazel
     University Computing Service
     Cambridge CB2 3QH, England.

REVISION

     Last updated: 24 August 2008
     Copyright (c) 1997-2008 University of Cambridge.

ATTRIBUTES
     See attributes(5) for descriptions of the  following  attri-
     butes:

     ___________________________________________
    |   ATTRIBUTE TYPE    |   ATTRIBUTE VALUE   |
    |_____________________|_____________________|
    | Availability        |   library/pcre      |
    |_____________________|_____________________|
    | Interface Stability |   Uncommitted       |
    |_____________________|_____________________|

NOTES
     Source for PCRE is available on http://opensolaris.org.