removed more unneeded files, see patch 890642
git-svn-id: https://svn.wxwidgets.org/svn/wx/wxWidgets/trunk@26134 c3d73ce0-8a6f-49c7-b76d-6d57e0e08775
@@ -1,28 +0,0 @@
#-------------------------------------------------------------------------
#
# Makefile--
# Makefile for backend/regex
#
# IDENTIFICATION
# $Header: /projects/cvsroot/pgsql-server/src/backend/regex/Makefile,v 1.20 2003/02/05 17:41:32 tgl Exp $
#
#-------------------------------------------------------------------------

subdir = src/backend/regex
top_builddir = ../../..
include $(top_builddir)/src/Makefile.global

OBJS = regcomp.o regerror.o regexec.o regfree.o

all: SUBSYS.o

SUBSYS.o: $(OBJS)
	$(LD) $(LDREL) $(LDOUT) SUBSYS.o $(OBJS)

# mark inclusion dependencies between .c files explicitly
regcomp.o: regcomp.c regc_lex.c regc_color.c regc_nfa.c regc_cvec.c regc_locale.c

regexec.o: regexec.c rege_dfa.c

clean:
	rm -f SUBSYS.o $(OBJS)
@@ -1,108 +0,0 @@
New in alpha3.8: Bug fix for signed/unsigned mixup, found and fixed
by the FreeBSD folks.

New in alpha3.7: A bit of cleanup aimed at maximizing portability,
possibly at slight cost in efficiency. "ul" suffixes and "unsigned long"
no longer appear, in particular.

New in alpha3.6: A couple more portability glitches fixed.

New in alpha3.5: Active development of this code has been stopped --
I'm working on a complete reimplementation -- but folks have found some
minor portability glitches and the like, hence this release to fix them.
One penalty: slightly reduced compatibility with old compilers, because
the ANSI C `unsigned long' type and `ul' constant suffix are used in a
few places (I could avoid this but it would be considerably more work).

New in alpha3.4: The complex bug alluded to below has been fixed (in a
slightly kludgey temporary way that may hurt efficiency a bit; this is
another "get it out the door for 4.4" release). The tests at the end of
the tests file have accordingly been uncommented. The primary sign of
the bug was that something like a?b matching ab matched b rather than ab.
(The bug was essentially specific to this exact situation, else it would
have shown up earlier.)

New in alpha3.3: The definition of word boundaries has been altered
slightly, to more closely match the usual programming notion that "_"
is an alphabetic. Stuff used for pre-ANSI systems is now in a subdir,
and the makefile no longer alludes to it in mysterious ways. The
makefile has generally been cleaned up some. Fixes have been made
(again!) so that the regression test will run without -DREDEBUG, at
the cost of weaker checking. A workaround for a bug in some folks'
<assert.h> has been added. And some more things have been added to
tests, including a couple right at the end which are commented out
because the code currently flunks them (complex bug; fix coming).
Plus the usual minor cleanup.

New in alpha3.2: Assorted bits of cleanup and portability improvement
(the development base is now a BSDI system using GCC instead of an ancient
Sun system, and the newer compiler exposed some glitches). Fix for a
serious bug that affected REs using many [] (including REG_ICASE REs
because of the way they are implemented), *sometimes*, depending on
memory-allocation patterns. The header-file prototypes no longer name
the parameters, avoiding possible name conflicts. The possibility that
some clot has defined CHAR_MIN as (say) `-128' instead of `(-128)' is
now handled gracefully. "uchar" is no longer used as an internal type
name (too many people have the same idea). Still the same old lousy
performance, alas.

New in alpha3.1: Basically nothing, this release is just a bookkeeping
convenience. Stay tuned.

New in alpha3.0: Performance is no better, alas, but some fixes have been
made and some functionality has been added. (This is basically the "get
it out the door in time for 4.4" release.) One bug fix: regfree() didn't
free the main internal structure (how embarrassing). It is now possible
to put NULs in either the RE or the target string, using (resp.) a new
REG_PEND flag and the old REG_STARTEND flag. The REG_NOSPEC flag to
regcomp() makes all characters ordinary, so you can match a literal
string easily (this will become more useful when performance improves!).
There are now primitives to match beginnings and ends of words, although
the syntax is disgusting and so is the implementation. The REG_ATOI
debugging interface has changed a bit. And there has been considerable
internal cleanup of various kinds.

New in alpha2.3: Split change list out of README, and moved flags notes
into Makefile. Macro-ized the name of regex(7) in regex(3), since it has
to change for 4.4BSD. Cleanup work in engine.c, and some new regression
tests to catch tricky cases thereof.

New in alpha2.2: Out-of-date manpages updated. Regerror() acquires two
small extensions -- REG_ITOA and REG_ATOI -- which avoid debugging kludges
in my own test program and might be useful to others for similar purposes.
The regression test will now compile (and run) without REDEBUG. The
BRE \$ bug is fixed. Most uses of "uchar" are gone; it's all chars now.
Char/uchar parameters are now written int/unsigned, to avoid possible
portability problems with unpromoted parameters. Some unsigned casts have
been introduced to minimize portability problems with shifting into sign
bits.

New in alpha2.1: Lots of little stuff, cleanup and fixes. The one big
thing is that regex.h is now generated, using mkh, rather than being
supplied in the distribution; due to circularities in dependencies,
you have to build regex.h explicitly by "make h". The two known bugs
have been fixed (and the regression test now checks for them), as has a
problem with assertions not being suppressed in the absence of REDEBUG.
No performance work yet.

New in alpha2: Backslash-anything is an ordinary character, not an
error (except, of course, for the handful of backslashed metacharacters
in BREs), which should reduce script breakage. The regression test
checks *where* null strings are supposed to match, and has generally
been tightened up somewhat. Small bug fixes in parameter passing (not
harmful, but technically errors) and some other areas. Debugging
invoked by defining REDEBUG rather than not defining NDEBUG.

New in alpha+3: full prototyping for internal routines, using a little
helper program, mkh, which extracts prototypes given in stylized comments.
More minor cleanup. Buglet fix: it's CHAR_BIT, not CHAR_BITS. Simple
pre-screening of input when a literal string is known to be part of the
RE; this does wonders for performance.

New in alpha+2: minor bits of cleanup. Notably, the number "32" for the
word width isn't hardwired into regexec.c any more, the public header
file prototypes the functions if __STDC__ is defined, and some small typos
in the manpages have been fixed.

New in alpha+1: improvements to the manual pages, and an important
extension, the REG_STARTEND option to regexec().
@@ -1,35 +0,0 @@
/* ========= begin header generated by ./mkh ========= */
#ifdef __cplusplus
extern "C" {
#endif

/* === engine.c === */
static int matcher(register struct re_guts *g, char *string, size_t nmatch, regmatch_t pmatch[], int eflags);
static char *dissect(register struct match *m, char *start, char *stop, sopno startst, sopno stopst);
static char *backref(register struct match *m, char *start, char *stop, sopno startst, sopno stopst, sopno lev);
static char *fast(register struct match *m, char *start, char *stop, sopno startst, sopno stopst);
static char *slow(register struct match *m, char *start, char *stop, sopno startst, sopno stopst);
static states step(register struct re_guts *g, sopno start, sopno stop, register states bef, int ch, register states aft);
#define BOL (OUT+1)
#define EOL (BOL+1)
#define BOLEOL (BOL+2)
#define NOTHING (BOL+3)
#define BOW (BOL+4)
#define EOW (BOL+5)
#define CODEMAX (BOL+5) /* highest code used */
#define NONCHAR(c) ((c) > CHAR_MAX)
#define NNONCHAR (CODEMAX-CHAR_MAX)
#ifdef REDEBUG
static void print(struct match *m, char *caption, states st, int ch, FILE *d);
#endif
#ifdef REDEBUG
static void at(struct match *m, char *title, char *start, char *stop, sopno startst, sopno stopst);
#endif
#ifdef REDEBUG
static char *pchar(int ch);
#endif

#ifdef __cplusplus
}
#endif
/* ========= end header generated by ./mkh ========= */
@@ -1,76 +0,0 @@
#! /bin/sh
# mkh - pull headers out of C source
PATH=/bin:/usr/bin ; export PATH

# egrep pattern to pick out marked lines
egrep='^ =([ ]|$)'

# Sed program to process marked lines into lines for the header file.
# The markers have already been removed. Two things are done here: removal
# of backslashed newlines, and some fudging of comments. The first is done
# because -o needs to have prototypes on one line to strip them down.
# Getting comments into the output is tricky; we turn C++-style // comments
# into /* */ comments, after altering any existing */'s to avoid trouble.
peel=' /\\$/N
/\\\n[ ]*/s///g
/\/\//s;\*/;* /;g
/\/\//s;//\(.*\);/*\1 */;'

for a
do
case "$a" in
-o) # old (pre-function-prototype) compiler
# add code to comment out argument lists
peel="$peel
"'/^\([^#\/][^\/]*[a-zA-Z0-9_)]\)(\(.*\))/s;;\1(/*\2*/);'
shift
;;
-b) # funny Berkeley __P macro
peel="$peel
"'/^\([^#\/][^\/]*[a-zA-Z0-9_)]\)(\(.*\))/s;;\1 __P((\2));'
shift
;;
-s) # compiler doesn't like `static foo();'
# add code to get rid of the `static'
peel="$peel
"'/^static[ ][^\/]*[a-zA-Z0-9_)](.*)/s;static.;;'
shift
;;
-p) # private declarations
egrep='^ ==([ ]|$)'
shift
;;
-i) # wrap in #ifndef, argument is name
ifndef="$2"
shift ; shift
;;
*) break
;;
esac
done

if test " $ifndef" != " "
then
echo "#ifndef $ifndef"
echo "#define $ifndef /* never again */"
fi
echo "/* ========= begin header generated by $0 ========= */"
echo '#ifdef __cplusplus'
echo 'extern "C" {'
echo '#endif'
for f
do
echo
echo "/* === $f === */"
egrep "$egrep" $f | sed 's/^ ==*[ ]//;s/^ ==*$//' | sed "$peel"
echo
done
echo '#ifdef __cplusplus'
echo '}'
echo '#endif'
echo "/* ========= end header generated by $0 ========= */"
if test " $ifndef" != " "
then
echo "#endif"
fi
exit 0
@@ -1,53 +0,0 @@
/* ========= begin header generated by ./mkh ========= */
#ifdef __cplusplus
extern "C" {
#endif

/* === regcomp.c === */
static void p_ere(register struct parse *p, int stop);
static void p_ere_exp(register struct parse *p);
static void p_str(register struct parse *p);
static void p_bre(register struct parse *p, register int end1, register int end2);
static int p_simp_re(register struct parse *p, int starordinary);
static int p_count(register struct parse *p);
static void p_bracket(register struct parse *p);
static void p_b_term(register struct parse *p, register cset *cs);
static void p_b_cclass(register struct parse *p, register cset *cs);
static void p_b_eclass(register struct parse *p, register cset *cs);
static char p_b_symbol(register struct parse *p);
static char p_b_coll_elem(register struct parse *p, int endc);
static char othercase(int ch);
static void bothcases(register struct parse *p, int ch);
static void ordinary(register struct parse *p, register int ch);
static void nonnewline(register struct parse *p);
static void repeat(register struct parse *p, sopno start, int from, int to);
static int seterr(register struct parse *p, int e);
static cset *allocset(register struct parse *p);
static void freeset(register struct parse *p, register cset *cs);
static int freezeset(register struct parse *p, register cset *cs);
static int firstch(register struct parse *p, register cset *cs);
static int nch(register struct parse *p, register cset *cs);
static void mcadd(register struct parse *p, register cset *cs, register char *cp);
#if 0
static void mcsub(register cset *cs, register char *cp);
static int mcin(register cset *cs, register char *cp);
static char *mcfind(register cset *cs, register char *cp);
#endif
static void mcinvert(register struct parse *p, register cset *cs);
static void mccase(register struct parse *p, register cset *cs);
static int isinsets(register struct re_guts *g, int c);
static int samesets(register struct re_guts *g, int c1, int c2);
static void categorize(struct parse *p, register struct re_guts *g);
static sopno dupl(register struct parse *p, sopno start, sopno finish);
static void doemit(register struct parse *p, sop op, size_t opnd);
static void doinsert(register struct parse *p, sop op, size_t opnd, sopno pos);
static void dofwd(register struct parse *p, sopno pos, sop value);
static void enlarge(register struct parse *p, sopno size);
static void stripsnug(register struct parse *p, register struct re_guts *g);
static void findmust(register struct parse *p, register struct re_guts *g);
static sopno pluscount(register struct parse *p, register struct re_guts *g);

#ifdef __cplusplus
}
#endif
/* ========= end header generated by ./mkh ========= */
@@ -1,12 +0,0 @@
/* ========= begin header generated by ./mkh ========= */
#ifdef __cplusplus
extern "C" {
#endif

/* === regerror.c === */
static char *regatoi(const regex_t *preg, char *localbuf);

#ifdef __cplusplus
}
#endif
/* ========= end header generated by ./mkh ========= */
@@ -1,509 +0,0 @@
.TH REGEX 3 "25 Sept 1997"
.BY "Henry Spencer"
.de ZR
.\" one other place knows this name: the SEE ALSO section
.IR regex (7) \\$1
..
.SH NAME
regcomp, regexec, regerror, regfree \- regular-expression library
.SH SYNOPSIS
.ft B
.\".na
#include <sys/types.h>
.br
#include <regex.h>
.HP 10
int regcomp(regex_t\ *preg, const\ char\ *pattern, int\ cflags);
.HP
int\ regexec(const\ regex_t\ *preg, const\ char\ *string,
size_t\ nmatch, regmatch_t\ pmatch[], int\ eflags);
.HP
size_t\ regerror(int\ errcode, const\ regex_t\ *preg,
char\ *errbuf, size_t\ errbuf_size);
.HP
void\ regfree(regex_t\ *preg);
.\".ad
.ft
.SH DESCRIPTION
These routines implement POSIX 1003.2 regular expressions (``RE''s);
see
.ZR .
.I Regcomp
compiles an RE written as a string into an internal form,
.I regexec
matches that internal form against a string and reports results,
.I regerror
transforms error codes from either into human-readable messages,
and
.I regfree
frees any dynamically-allocated storage used by the internal form
of an RE.
.PP
The header
.I <regex.h>
declares two structure types,
.I regex_t
and
.IR regmatch_t ,
the former for compiled internal forms and the latter for match reporting.
It also declares the four functions,
a type
.IR regoff_t ,
and a number of constants with names starting with ``REG_''.
.PP
.I Regcomp
compiles the regular expression contained in the
.I pattern
string,
subject to the flags in
.IR cflags ,
and places the results in the
.I regex_t
structure pointed to by
.IR preg .
.I Cflags
is the bitwise OR of zero or more of the following flags:
.IP REG_EXTENDED \w'REG_EXTENDED'u+2n
Compile modern (``extended'') REs,
rather than the obsolete (``basic'') REs that
are the default.
.IP REG_BASIC
This is a synonym for 0,
provided as a counterpart to REG_EXTENDED to improve readability.
This is an extension,
compatible with but not specified by POSIX 1003.2,
and should be used with
caution in software intended to be portable to other systems.
.IP REG_NOSPEC
Compile with recognition of all special characters turned off.
All characters are thus considered ordinary,
so the ``RE'' is a literal string.
This is an extension,
compatible with but not specified by POSIX 1003.2,
and should be used with
caution in software intended to be portable to other systems.
REG_EXTENDED and REG_NOSPEC may not be used
in the same call to
.IR regcomp .
.IP REG_ICASE
Compile for matching that ignores upper/lower case distinctions.
See
.ZR .
.IP REG_NOSUB
Compile for matching that need only report success or failure,
not what was matched.
.IP REG_NEWLINE
Compile for newline-sensitive matching.
By default, newline is a completely ordinary character with no special
meaning in either REs or strings.
With this flag,
`[^' bracket expressions and `.' never match newline,
a `^' anchor matches the null string after any newline in the string
in addition to its normal function,
and the `$' anchor matches the null string before any newline in the
string in addition to its normal function.
.IP REG_PEND
The regular expression ends,
not at the first NUL,
but just before the character pointed to by the
.I re_endp
member of the structure pointed to by
.IR preg .
The
.I re_endp
member is of type
.IR const\ char\ * .
This flag permits inclusion of NULs in the RE;
they are considered ordinary characters.
This is an extension,
compatible with but not specified by POSIX 1003.2,
and should be used with
caution in software intended to be portable to other systems.
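To make the REG_NEWLINE rules above concrete, here is a minimal sketch written against the standard POSIX `<regex.h>` interface (any conforming C library, not necessarily this implementation). The helper name `ere_matches` is hypothetical. It compiles with REG_NOSUB because only success or failure is needed, so `regexec` is given an `nmatch` of 0 and a null `pmatch`.

```c
#include <sys/types.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if the extended RE `pattern` matches
 * somewhere in `text`, 0 otherwise, and -1 if regcomp() rejects the
 * pattern.  `extra_cflags` is ORed in (e.g. REG_NEWLINE or REG_ICASE). */
static int ere_matches(const char *pattern, const char *text, int extra_cflags)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB | extra_cflags) != 0)
        return -1;              /* nothing to free after a failed regcomp */
    /* With REG_NOSUB, regexec() ignores nmatch/pmatch, so pass 0/NULL. */
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

With the text `"hello\nworld"`, the RE `^world` matches only when REG_NEWLINE is passed. `hello.world` behaves the opposite way: it matches only when REG_NEWLINE is absent, because with the flag set `.` no longer matches the newline.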
.PP
When successful,
.I regcomp
returns 0 and fills in the structure pointed to by
.IR preg .
One member of that structure
(other than
.IR re_endp )
is publicized:
.IR re_nsub ,
of type
.IR size_t ,
contains the number of parenthesized subexpressions within the RE
(except that the value of this member is undefined if the
REG_NOSUB flag was used).
If
.I regcomp
fails, it returns a non-zero error code;
see DIAGNOSTICS.
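Here is a minimal sketch of the compile-and-check contract described above, assuming a POSIX `<regex.h>`. The helper `subexpression_count` is hypothetical. It reads `re_nsub` only after a successful `regcomp`, and it never passes REG_NOSUB, since the member is undefined in that case.

```c
#include <sys/types.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: compile the extended RE `pattern`.
 * On success, store re_nsub in *nsub and return 0.
 * On failure, return regcomp()'s non-zero error code unchanged. */
static int subexpression_count(const char *pattern, size_t *nsub)
{
    regex_t re;
    /* No REG_NOSUB here: re_nsub would be undefined with it. */
    int err = regcomp(&re, pattern, REG_EXTENDED);

    if (err != 0)
        return err;             /* preg contents are not valid; do not regfree */
    *nsub = re.re_nsub;
    regfree(&re);
    return 0;
}
```

For example, `(a)(b(c))` reports 3 subexpressions. By contrast, `a(b` fails to compile, and its non-zero code can be handed on to `regerror`.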
.PP
.I Regexec
matches the compiled RE pointed to by
.I preg
against the
.IR string ,
subject to the flags in
.IR eflags ,
and reports results using
.IR nmatch ,
.IR pmatch ,
and the returned value.
The RE must have been compiled by a previous invocation of
.IR regcomp .
The compiled form is not altered during execution of
.IR regexec ,
so a single compiled RE can be used simultaneously by multiple threads.
.PP
By default,
the NUL-terminated string pointed to by
.I string
is considered to be the text of an entire line,
with the NUL indicating the end of the line.
(That is,
any other end-of-line marker is considered to have been removed
and replaced by the NUL.)
The
.I eflags
argument is the bitwise OR of zero or more of the following flags:
.IP REG_NOTBOL \w'REG_STARTEND'u+2n
The first character of
the string
is not the beginning of a line, so the `^' anchor should not match before it.
This does not affect the behavior of newlines under REG_NEWLINE.
.IP REG_NOTEOL
The NUL terminating
the string
does not end a line, so the `$' anchor should not match before it.
This does not affect the behavior of newlines under REG_NEWLINE.
.IP REG_STARTEND
The string is considered to start at
\fIstring\fR\ + \fIpmatch\fR[0].\fIrm_so\fR
and to have a terminating NUL located at
\fIstring\fR\ + \fIpmatch\fR[0].\fIrm_eo\fR
(there need not actually be a NUL at that location),
regardless of the value of
.IR nmatch .
See below for the definition of
.IR pmatch
and
.IR nmatch .
This is an extension,
compatible with but not specified by POSIX 1003.2,
and should be used with
caution in software intended to be portable to other systems.
Note that a non-zero \fIrm_so\fR does not imply REG_NOTBOL;
REG_STARTEND affects only the location of the string,
not how it is matched.
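The REG_NOTBOL rule matters when a string is scanned for successive matches. The sketch below uses a hypothetical helper, `count_matches`, and assumes a POSIX `<regex.h>`. It restarts `regexec` just past each match and passes REG_NOTBOL on every call after the first, so `^` cannot re-anchor in the middle of the line. It avoids REG_STARTEND on purpose, because that flag is not specified by POSIX.

```c
#include <sys/types.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: count non-overlapping matches of the extended RE
 * `pattern` in `text`.  Returns -1 if the pattern does not compile. */
static int count_matches(const char *pattern, const char *text)
{
    regex_t re;
    regmatch_t m[1];
    const char *p = text;
    int eflags = 0;             /* only the first search starts a line */
    int count = 0;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    while (regexec(&re, p, 1, m, eflags) == 0) {
        count++;
        if (m[0].rm_eo > m[0].rm_so) {
            p += m[0].rm_eo;                /* resume right after the match */
        } else if (p[m[0].rm_eo] != '\0') {
            p += m[0].rm_eo + 1;            /* empty match: skip one character */
        } else {
            break;                          /* empty match at end of string */
        }
        /* Later searches start mid-line, so `^' must not match there. */
        eflags = REG_NOTBOL;
    }
    regfree(&re);
    return count;
}
```

With this approach, `^ab` finds one match in `abab` rather than two. The empty-match branch is just one simple way to guarantee forward progress; other policies are possible.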
.PP
See
.ZR
for a discussion of what is matched in situations where an RE or a
portion thereof could match any of several substrings of
.IR string .
.PP
Normally,
.I regexec
returns 0 for success and the non-zero code REG_NOMATCH for failure.
Other non-zero error codes may be returned in exceptional situations;
see DIAGNOSTICS.
.PP
If REG_NOSUB was specified in the compilation of the RE,
or if
.I nmatch
is 0,
.I regexec
ignores the
.I pmatch
argument (but see below for the case where REG_STARTEND is specified).
Otherwise,
.I pmatch
points to an array of
.I nmatch
structures of type
.IR regmatch_t .
Such a structure has at least the members
.I rm_so
and
.IR rm_eo ,
both of type
.I regoff_t
(a signed arithmetic type at least as large as an
.I off_t
and a
.IR ssize_t ),
containing respectively the offset of the first character of a substring
and the offset of the first character after the end of the substring.
Offsets are measured from the beginning of the
.I string
argument given to
.IR regexec .
An empty substring is denoted by equal offsets,
both indicating the character following the empty substring.
.PP
The 0th member of the
.I pmatch
array is filled in to indicate what substring of
.I string
was matched by the entire RE.
Remaining members report what substring was matched by parenthesized
subexpressions within the RE;
member
.I i
reports subexpression
.IR i ,
with subexpressions counted (starting at 1) by the order of their opening
parentheses in the RE, left to right.
Unused entries in the array\(emcorresponding either to subexpressions that
did not participate in the match at all, or to subexpressions that do not
exist in the RE (that is, \fIi\fR\ > \fIpreg\fR\->\fIre_nsub\fR)\(emhave both
.I rm_so
and
.I rm_eo
set to \-1.
If a subexpression participated in the match several times,
the reported substring is the last one it matched.
(Note, as an example in particular, that when the RE `(b*)+' matches `bbb',
the parenthesized subexpression matches the three `b's and then
an infinite number of empty strings following the last `b',
so the reported substring is one of the empties.)
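The following sketch exercises the `pmatch` conventions above: entry 0 holds the whole match, entry i holds subexpression i, and offsets of -1 mark subexpressions that did not participate. It assumes a POSIX `<regex.h>`. The helper `extract_group` and the fixed limit `MAX_GROUPS` are hypothetical choices made for illustration.

```c
#include <sys/types.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

#define MAX_GROUPS 10   /* hypothetical limit: entry 0 plus subexpressions 1..9 */

/* Hypothetical helper: copy what subexpression `group` of the extended RE
 * `pattern` matched in `text` into `out` (capacity `outsz` >= 1),
 * truncating if necessary.  Returns 1 if the subexpression took part in
 * the match, 0 if the RE matched but that subexpression did not
 * (rm_so == -1), and -1 on a compile error, no match, or a group number
 * the RE does not have. */
static int extract_group(const char *pattern, const char *text,
                         size_t group, char *out, size_t outsz)
{
    regex_t re;
    regmatch_t m[MAX_GROUPS];
    size_t len;
    int rc;

    if (group >= MAX_GROUPS || regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    if (group > re.re_nsub) {   /* subexpression does not exist in the RE */
        regfree(&re);
        return -1;
    }
    rc = regexec(&re, text, MAX_GROUPS, m, 0);
    regfree(&re);               /* offsets in m refer to text, not to re */
    if (rc != 0)
        return -1;
    if (m[group].rm_so == -1) { /* subexpression did not participate */
        out[0] = '\0';
        return 0;
    }
    len = (size_t)(m[group].rm_eo - m[group].rm_so);
    if (len >= outsz)
        len = outsz - 1;
    memcpy(out, text + m[group].rm_so, len);
    out[len] = '\0';
    return 1;
}
```

When `(a)|(b)` is matched against `b`, subexpression 1 is left unused: its offsets are -1 and the helper reports 0. Subexpression 2, on the other hand, yields `b`.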
.PP
If REG_STARTEND is specified,
.I pmatch
must point to at least one
.I regmatch_t
(even if
.I nmatch
is 0 or REG_NOSUB was specified),
to hold the input offsets for REG_STARTEND.
Use for output is still entirely controlled by
.IR nmatch ;
if
.I nmatch
is 0 or REG_NOSUB was specified,
the value of
.IR pmatch [0]
will not be changed by a successful
.IR regexec .
.PP
.I Regerror
maps a non-zero
.I errcode
from either
.I regcomp
or
.I regexec
to a human-readable, printable message.
If
.I preg
is non-NULL,
the error code should have arisen from use of
the
.I regex_t
pointed to by
.IR preg ,
and if the error code came from
.IR regcomp ,
it should have been the result from the most recent
.I regcomp
using that
.IR regex_t .
.RI ( Regerror
may be able to supply a more detailed message using information
from the
.IR regex_t .)
.I Regerror
places the NUL-terminated message into the buffer pointed to by
.IR errbuf ,
limiting the length (including the NUL) to at most
.I errbuf_size
bytes.
If the whole message won't fit,
as much of it as will fit before the terminating NUL is supplied.
In any case,
the returned value is the size of buffer needed to hold the whole
message (including terminating NUL).
If
.I errbuf_size
is 0,
.I errbuf
is ignored but the return value is still correct.
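Because `regerror` returns the required buffer size even when `errbuf_size` is 0, a caller can size the message buffer exactly by calling it twice. Here is a minimal sketch, assuming a POSIX `<regex.h>`. `compile_or_message` is a hypothetical helper that only validates a pattern and then discards the compiled RE.

```c
#include <sys/types.h>
#include <regex.h>
#include <stdlib.h>

/* Hypothetical helper: try to compile `pattern` with `cflags`.
 * Returns regcomp()'s result.  On failure, *msg receives a malloc'd,
 * NUL-terminated message (caller frees; NULL if malloc failed).
 * On success, *msg is set to NULL and the compiled RE is discarded. */
static int compile_or_message(const char *pattern, int cflags, char **msg)
{
    regex_t re;
    int err = regcomp(&re, pattern, cflags);
    size_t need;

    *msg = NULL;
    if (err == 0) {
        regfree(&re);
        return 0;
    }
    /* First call: errbuf_size is 0, so errbuf is ignored and only the
     * size needed for the whole message (including the NUL) is returned. */
    need = regerror(err, &re, NULL, 0);
    *msg = malloc(need);
    /* Second call: the buffer is exactly large enough, so no truncation. */
    if (*msg != NULL)
        regerror(err, &re, *msg, need);
    return err;
}
```

Passing the `regex_t` from the failed `regcomp` follows the rule above: the error code came from the most recent `regcomp` that used that `regex_t`. `regfree` is not called on it, because compilation did not succeed.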
|
|
||||||
.PP
|
|
||||||
If the
|
|
||||||
.I errcode
|
|
||||||
given to
|
|
||||||
.I regerror
|
|
||||||
is first ORed with REG_ITOA,
|
|
||||||
the ``message'' that results is the printable name of the error code,
|
|
||||||
e.g. ``REG_NOMATCH'',
|
|
||||||
rather than an explanation thereof.
|
|
||||||
If
|
|
||||||
.I errcode
|
|
||||||
is REG_ATOI,
|
|
||||||
then
|
|
||||||
.I preg
|
|
||||||
shall be non-NULL and the
|
|
||||||
.I re_endp
|
|
||||||
member of the structure it points to
|
|
||||||
must point to the printable name of an error code;
|
|
||||||
in this case, the result in
|
|
||||||
.I errbuf
|
|
||||||
is the decimal digits of
|
|
||||||
the numeric value of the error code
|
|
||||||
(0 if the name is not recognized).
|
|
||||||
REG_ITOA and REG_ATOI are intended primarily as debugging facilities;
|
|
||||||
they are extensions,
|
|
||||||
compatible with but not specified by POSIX 1003.2,
|
|
||||||
and should be used with
|
|
||||||
caution in software intended to be portable to other systems.
|
|
||||||
Be warned also that they are considered experimental and changes are possible.
|
|
||||||
.PP
|
|
||||||
.I Regfree
|
|
||||||
frees any dynamically-allocated storage associated with the compiled RE
|
|
||||||
pointed to by
|
|
||||||
.IR preg .
|
|
||||||
The remaining
|
|
||||||
.I regex_t
|
|
||||||
is no longer a valid compiled RE
|
|
||||||
and the effect of supplying it to
|
|
||||||
.I regexec
|
|
||||||
or
|
|
||||||
.I regerror
|
|
||||||
is undefined.
|
|
||||||
.PP
|
|
||||||
None of these functions references global variables except for tables
|
|
||||||
of constants;
|
|
||||||
all are safe for use from multiple threads if the arguments are safe.
|
|
||||||
.SH IMPLEMENTATION CHOICES
|
|
||||||
There are a number of decisions that 1003.2 leaves up to the implementor,
|
|
||||||
either by explicitly saying ``undefined'' or by virtue of them being
|
|
||||||
forbidden by the RE grammar.
|
|
||||||
This implementation treats them as follows.
|
|
||||||
.PP
|
|
||||||
See
|
|
||||||
.ZR
|
|
||||||
for a discussion of the definition of case-independent matching.
|
|
||||||
.PP
|
|
||||||
There is no particular limit on the length of REs,
|
|
||||||
except insofar as memory is limited.
|
|
||||||
Memory usage is approximately linear in RE size, and largely insensitive
|
|
||||||
to RE complexity, except for bounded repetitions.
|
|
||||||
See BUGS for one short RE using them
|
|
||||||
that will run almost any system out of memory.
|
|
||||||
.PP
A backslashed character other than one specifically given a magic meaning
by 1003.2 (such magic meanings occur only in obsolete [``basic''] REs)
is taken as an ordinary character.
.PP
Any unmatched [ is a REG_EBRACK error.
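.PP
This rule can be observed with the sketch below, which assumes the system's POSIX <regex.h>. The hypothetical helper ere_compile_code returns whatever regcomp reports. This implementation documents REG_EBRACK for an unmatched [, but other libraries may report a different nonzero code. Portable code should therefore rely only on the result being nonzero.

```c
#define _POSIX_C_SOURCE 200809L
#include <regex.h>

/* Hypothetical helper: return the regcomp() result for PATTERN compiled
 * as an extended RE, freeing the compiled RE if compilation succeeded. */
int ere_compile_code(const char *pattern)
{
    regex_t re;
    int rc = regcomp(&re, pattern, REG_EXTENDED);

    if (rc == 0)
        regfree(&re);   /* only a successfully compiled RE is freed */
    return rc;
}
```

For instance, ere_compile_code("a[bc") is nonzero, while ere_compile_code("a[bc]") is 0.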
.PP
Equivalence classes cannot begin or end bracket-expression ranges.
The endpoint of one range cannot begin another.
.PP
RE_DUP_MAX, the limit on repetition counts in bounded repetitions, is 255.
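.PP
POSIX requires RE_DUP_MAX to be at least 255 (_POSIX2_RE_DUP_MAX), and some libraries allow more. The hypothetical helper below builds the bounded repetition a{1,N} and reports the regcomp result, assuming the system's POSIX <regex.h>. A bound of 255 should compile on any conforming system. A bound far above the library's limit should be rejected, although the exact error code varies between libraries.

```c
#define _POSIX_C_SOURCE 200809L
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper: compile "a{1,N}" as an extended RE and return the
 * regcomp() result.  A nonzero result for a large N indicates that the
 * bound exceeded the library's repetition limit (RE_DUP_MAX). */
int bounded_rep_code(unsigned long n)
{
    char pattern[64];
    regex_t re;
    int rc;

    snprintf(pattern, sizeof pattern, "a{1,%lu}", n);
    rc = regcomp(&re, pattern, REG_EXTENDED);
    if (rc == 0)
        regfree(&re);
    return rc;
}
```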
.PP
A repetition operator (?, *, +, or bounds) cannot follow another
repetition operator.
A repetition operator cannot begin an expression or subexpression
or follow `^' or `|'.
.PP
`|' cannot appear first or last in a (sub)expression or after another `|',
i.e. an operand of `|' cannot be an empty subexpression.
An empty parenthesized subexpression, `()', is legal and matches an
empty (sub)string.
An empty string is not a legal RE.
.PP
A `{' followed by a digit is considered the beginning of bounds for a
bounded repetition, which must then follow the syntax for bounds.
A `{' \fInot\fR followed by a digit is considered an ordinary character.
.PP
`^' and `$' beginning and ending subexpressions in obsolete (``basic'')
REs are anchors, not ordinary characters.
.SH SEE ALSO
grep(1), regex(7)
.PP
POSIX 1003.2, sections 2.8 (Regular Expression Notation)
and
B.5 (C Binding for Regular Expression Matching).
.SH DIAGNOSTICS
Non-zero error codes from
.I regcomp
and
.I regexec
include the following:
.PP
.nf
.ta \w'REG_ECOLLATE'u+3n
REG_NOMATCH regexec() failed to match
REG_BADPAT invalid regular expression
REG_ECOLLATE invalid collating element
REG_ECTYPE invalid character class
REG_EESCAPE \e applied to unescapable character
REG_ESUBREG invalid backreference number
REG_EBRACK brackets [ ] not balanced
REG_EPAREN parentheses ( ) not balanced
REG_EBRACE braces { } not balanced
REG_BADBR invalid repetition count(s) in { }
REG_ERANGE invalid character range in [ ]
REG_ESPACE ran out of memory
REG_BADRPT ?, *, or + operand invalid
REG_EMPTY empty (sub)expression
REG_ASSERT ``can't happen''\(emyou found a bug
REG_INVARG invalid argument, e.g. negative-length string
.fi
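.PP
These codes can be turned into readable messages with regerror. The following is a minimal sketch assuming the system's POSIX <regex.h>. The helper compile_status is hypothetical, and the exact message text depends on the library.

```c
#define _POSIX_C_SOURCE 200809L
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: try to compile PATTERN as an extended RE.
 * On failure, store regerror()'s message for the error code in BUF
 * (truncated to BUFSIZE bytes, including the terminating NUL) and
 * return the code.  On success, free the RE, leave BUF empty, and
 * return 0. */
int compile_status(const char *pattern, char *buf, size_t bufsize)
{
    regex_t re;
    int rc = regcomp(&re, pattern, REG_EXTENDED);

    if (bufsize > 0)
        buf[0] = '\0';
    if (rc == 0) {
        regfree(&re);
        return 0;
    }
    /* regerror() returns the size needed for the whole message, so a
     * caller could compare it with bufsize to detect truncation. */
    regerror(rc, &re, buf, bufsize);
    return rc;
}
```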
.SH HISTORY
Written by Henry Spencer,
henry@zoo.toronto.edu.
.SH BUGS
This is an alpha release with known defects.
Please report problems.
.PP
There is one known functionality bug.
The implementation of internationalization is incomplete:
the locale is always assumed to be the default one of 1003.2,
and only the collating elements etc. of that locale are available.
.PP
The back-reference code is subtle and doubts linger about its correctness
in complex cases.
.PP
.I Regexec
performance is poor.
This will improve with later releases.
.I Nmatch
exceeding 0 is expensive;
.I nmatch
exceeding 1 is worse.
.I Regexec
is largely insensitive to RE complexity \fIexcept\fR that back
references are massively expensive.
RE length does matter; in particular, there is a strong speed bonus
for keeping RE length under about 30 characters,
with most special characters counting roughly double.
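.PP
When only a yes/no answer is needed, compiling with the standard REG_NOSUB flag and calling regexec with an nmatch of 0 avoids requesting match offsets, the expensive case noted above. The sketch below assumes the system's POSIX <regex.h>. The helper has_match is hypothetical, and any actual speed gain depends on the library.

```c
#define _POSIX_C_SOURCE 200809L
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: return 1 if TEXT contains a match for the
 * compiled RE, 0 if it does not, and -1 on any other regexec() error.
 * nmatch == 0 with a null pmatch requests no offsets, only success
 * or failure. */
int has_match(const regex_t *re, const char *text)
{
    int rc = regexec(re, text, 0, NULL, 0);

    if (rc == 0)
        return 1;
    return rc == REG_NOMATCH ? 0 : -1;
}
```

A typical caller compiles the RE once, for example with regcomp(&re, "[0-9]+", REG_EXTENDED | REG_NOSUB), reuses it for many has_match calls, and calls regfree when done.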
.PP
.I Regcomp
implements bounded repetitions by macro expansion,
which is costly in time and space if counts are large
or bounded repetitions are nested.
An RE like, say,
`((((a{1,100}){1,100}){1,100}){1,100}){1,100}'
will (eventually) run almost any existing machine out of swap space.
.PP
There are suspected problems with response to obscure error conditions.
Notably,
certain kinds of internal overflow,
produced only by truly enormous REs or by multiply nested bounded repetitions,
are probably not handled well.
.PP
Due to a mistake in 1003.2, things like `a)b' are legal REs because `)' is
a special character only in the presence of a previous unmatched `('.
This can't be fixed until the spec is fixed.
.PP
The standard's definition of back references is vague.
For example, does
`a\e(\e(b\e)*\e2\e)*d' match `abbbd'?
Until the standard is clarified,
behavior in such cases should not be relied on.
.PP
The implementation of word-boundary matching is a bit of a kludge,
and bugs may lurk in combinations of word-boundary matching and anchoring.