regexp(5)							   regexp(5)




 NAME
      regexp - regular expression and pattern matching notation definitions

 DESCRIPTION
      A regular expression is a mechanism supported by many utilities for
      locating and manipulating patterns in text.  Pattern matching notation
      is used by shells and other utilities for file name expansion.  This
      manual entry defines two forms of regular expressions, Basic Regular
      Expressions and Extended Regular Expressions, and the one form of
      Pattern Matching Notation.

 BASIC REGULAR EXPRESSIONS
      Basic regular expression (RE) notation and construction rules apply to
      utilities defined as using basic REs.  Any exceptions to the following
      rules are noted in the descriptions of the specific utilities that use
      REs.

    REs Matching a Single Character
      The following REs match a single character or a single collating
      element:

      Ordinary Characters
      An ordinary character is an RE that matches itself.  An ordinary
      character is any character in the supported character set except
      <newline> and the regular expression special characters listed in
      Special Characters below.	 An ordinary character preceded by a
      backslash (\) is treated as the ordinary character itself, except when
      the character is (, ), {, or }, or the digits 1 through 9 (see REs
      Matching Multiple Characters).  Matching is based on the bit pattern
      used for encoding the character, not on the graphic representation of
      the character.

      Special Characters
      A regular expression special character preceded by a backslash is a
      regular expression that matches the special character itself.  When
      not preceded by a backslash, such characters have special meaning in
      the specification of REs.	 Regular expression special characters and
      the contexts in which they have special meaning are:

	   . [ \	  The period, left square bracket, and backslash are
			  special except when used in a bracket expression
			  (see RE Bracket Expression).

	   *		  The asterisk is special except when used in a
			  bracket expression, as the first character of a
			  regular expression, or as the first character
			  following the character pair \( (see REs Matching
			  Multiple Characters).

	   ^		  The circumflex is special when used as the first
			  character of an entire RE (see Expression
			  Anchoring) or as the first character of a bracket
			  expression.

	   $		  The dollar sign is special when used as the last
			  character of an entire RE (see Expression
			  Anchoring).

	   delimiter	  Any character used to bound (i.e., delimit) an
			  entire RE is special for that RE.

      Period

      A period (.), when used outside of a bracket expression, is an RE that
      matches any printable or nonprintable character except <newline>.

    RE Bracket Expression
      A bracket expression enclosed in square brackets ([ ]) is an RE that
      matches a single collating element contained in the nonempty set of
      collating elements represented by the bracket expression.

      The following rules apply to bracket expressions:

	   bracket expression
			  A bracket expression is either a matching list
			  expression or a non-matching list expression, and
			  consists of one or more expressions in any order.
			  Expressions can be: collating elements, collating
			  symbols, noncollating characters, equivalence
			  classes, range expressions, or character classes.
			  The right bracket (]) loses its special meaning
			  and represents itself in a bracket expression if
			  it occurs first in the list (after an initial ^,
			  if any).  Otherwise, it terminates the bracket
			  expression (unless it is the ending right bracket
			  for a valid collating symbol, equivalence class,
			  or character class, or it is the collating element
			  within a collating symbol or equivalence class
			  expression).	The special characters

				    . * [ \

			  (period, asterisk, left bracket, and backslash)
			  lose their special meaning within a bracket
			  expression.

			  The character sequences:

				    [.	 [=   [:

			  (left-bracket followed by a period, equal-sign or
			  colon) are special inside a bracket expression and
			  are used to delimit collating symbols, equivalence
			  class expressions and character class expressions.
			  These symbols must be followed by a valid
			  expression and the matching terminating .], =], or
			  :].

	   matching list  A matching list expression specifies a list that
			  matches any one of the characters represented in
			  the list.  The first character in the list cannot
			  be the circumflex.  For example, [abc] is an RE
			  that matches any of a, b, or c.

	   non-matching list
			  A non-matching list expression begins with a
			  circumflex (^), and specifies a list that matches
			  any character or collating element except
			  <newline> and the characters represented in the
			  list.	 For example, [^abc] is an RE that matches
			  any character except <newline> or a, b, or c.	 The
			  circumflex has this special meaning only when it
			  occurs first in the list, immediately following
			  the left square bracket.

	   collating element
			  A collating element is a sequence of one or more
			  characters that represents a single element in the
			  collating sequence as identified via the most
			  current setting of the locale category LC_COLLATE
			  (see setlocale(3C)).

	   collating symbol
			  A collating symbol is a collating element enclosed
			  within bracket-period ([. .]) delimiters.
			  Multi-character collating elements must be
			  represented as collating symbols to distinguish
			  them from single-character collating elements.
			  For example, if the string ch is a valid collating
			  element, then [[.ch.]] is treated as an element
			  matching the same string of characters, while [ch]
			  is treated as a simple list of the characters c
			  and h.  If the string within the bracket-period
			  delimiters is not a valid collating element in the
			  current collating sequence definition, the symbol
			  is treated as an invalid expression.

	   noncollating character
			  A noncollating character is a character that is
			  ignored for collating purposes.  By definition,
			  such characters cannot participate in equivalence
			  classes or range expressions.

	   equivalence class
			  An equivalence class expression represents the set
			  of collating elements belonging to an equivalence
			  class.  It is expressed by enclosing any one of
			  the collating elements in the equivalence class
			  within bracket-equal ([=...=]) delimiters.  For
			  example, if a and A belong to the same
			  equivalence class, then [[=a=]b] and [[=A=]b] are
			  each equivalent to [aAb].

	   range expression
			  A range expression represents the set of collating
			  elements that fall between two elements in the
			  current collation sequence as defined via the most
			  current setting of the locale category LC_COLLATE
			  (see setlocale(3C)).	It is expressed as the
			  starting point and the ending point separated by a
			  hyphen (-).

			  The starting range point and the ending range
			  point must be a collating element, collating
			  symbol, or equivalence class expression.  An
			  equivalence class expression used as an end point
			  of a range expression is interpreted such that all
			  collating elements within the equivalence class
			  are included in the range.  For example, if the
			  collating order is A, a, B, b, C, c, ch, D, and d
			  and the characters A and a belong to the same
			  equivalence class, then the expression [[=a=]-D]
			  is treated as [AaBbCc[.ch.]D].

			  Both starting and ending range points must be
			  valid collating elements, collating symbols, or
			  equivalence class expressions, and the ending
			  range point must collate equal to or higher than
			  the starting range point; otherwise the expression
			  is invalid.  For example, with the above collating
			  order and assuming that E is a noncollating
			  character, then both the expressions [[=A=]-E] and
			  [d-a] are invalid.

			  An ending range point can also be the starting
			  range point in a subsequent range expression.
			  Each such range expression is evaluated
			  separately.  For example, the bracket expression
			  [a-m-o] is treated as [a-mm-o].

			  The hyphen character is treated as itself if it
			  occurs first (after an initial ^, if any) or last
			  in the list, or as the rightmost symbol in a range
			  expression.  As examples, the expressions [-ac]
			  and [ac-] are equivalent and match any of the
			  characters a, c, or -; the expressions [^-ac] and
			  [^ac-] are equivalent and match any characters
			  except <newline>, a, c, or -; the expression [%--]
			  matches any of the characters in the defined
			  collating sequence between % and - inclusive; the
			  expression [--@] matches any of the characters in
			  the defined collating sequence between - and @
			  inclusive; and the expression [a--@] is invalid,
			  assuming - precedes a in the collating sequence.

			  If a bracket expression must specify both - and ],
			  the ] must be placed first (after the ^, if any)
			  and the - last within the bracket expression.

	   character class
			  A character class expression represents the set of
			  characters belonging to a character class, as
			  defined via the most current setting of the locale
			  category LC_CTYPE.  It is expressed as a character
			  class name enclosed within bracket-colon ([: :])
			  delimiters.

			  Standard character class expressions supported in
			  all locales are:

			       [:alpha:]      letters

			       [:upper:]      upper-case letters

			       [:lower:]      lower-case letters

			       [:digit:]      decimal digits

			       [:xdigit:]     hexadecimal digits

			       [:alnum:]      letters or decimal digits

			       [:space:]      characters producing white-
					      space in displayed text

			       [:print:]      printing characters

			       [:punct:]      punctuation characters

			       [:graph:]      characters with a visible
					      representation

			       [:cntrl:]      control characters

			       [:blank:]      blank characters
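
      The list and hyphen rules above can be tried out with the following
      sketch.  It assumes Python's re module, which is not an implementation
      of the notation defined here: re has no collating symbols, equivalence
      classes, or [: :] character classes, and its non-matching lists also
      match <newline>.  The helper name matching_chars is hypothetical.

```python
import re

def matching_chars(bracket, candidates):
    """Return the characters of candidates matched by a bracket expression."""
    pattern = re.compile(bracket)
    return "".join(ch for ch in candidates if pattern.fullmatch(ch))

candidates = "abc-]x"

# Matching list: any one of a, b, or c.
in_list = matching_chars(r"[abc]", candidates)

# Non-matching list: any candidate not represented in the list.
not_in_list = matching_chars(r"[^abc]", candidates)

# A hyphen that occurs first or last in the list stands for itself.
hyphen_first = matching_chars(r"[-ac]", candidates)
hyphen_last = matching_chars(r"[ac-]", candidates)

# To specify both ] and -, place ] first and - last.
both = matching_chars(r"[]a-]", candidates)

print(in_list, not_in_list, hyphen_first, hyphen_last, both)
```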

    REs Matching Multiple Characters
      The following rules may be used to construct REs matching multiple
      characters from REs matching a single character:

	   RERE		  The concatenation of REs is an RE that matches the
			  first encountered concatenation of the strings
			  matched by each component of the RE.	For example,
			  the RE bc matches the second and third characters
			  of the string abcdefabcdef.

	   RE*		  An RE matching a single character followed by an
			  asterisk (*) is an RE that matches zero or more
			  occurrences of the RE preceding the asterisk.	 The
			  first encountered string that permits a match is
			  chosen, and the matched string will encompass the
			  maximum number of characters permitted by the RE.
			  For example, in the string abbbcdeabbbbbbcde, both
			  the RE b*c and the RE bbb*c are matched by the
			  substring bbbc in the second through fifth
			  positions.  An asterisk as the first character of
			  an RE loses this special meaning and is treated as
			  itself.

	   \(RE\)	  A subexpression can be defined within an RE by
			  enclosing it between the character pairs \( and
			  \).  Such a subexpression matches whatever it
			  would have matched without the \( and \).
			  Subexpressions can be arbitrarily nested.  An
			  asterisk immediately following the \( loses its
			  special meaning and is treated as itself.  An
			  asterisk immediately following the \) is treated
			  as an invalid character.

	   \n		  The expression \n matches the same string of
			  characters as was matched by a subexpression
			  enclosed between \( and \) preceding the \n.	The
			  character n must be a digit from 1 through 9,
			  specifying the n-th subexpression (the one that
			  begins with the n-th \( and ends with the
			  corresponding paired \)).  For example, the
			  expression ^\(.*\)\1$ matches a line consisting of
			  two adjacent appearances of the same string.

			  If the \n is followed by an asterisk, it matches
			  zero or more occurrences of the subexpression
			  referred to.	For example, the expression
			  \(ab\(cd\)ef\)Z\2*Z\1 matches the string
			  abcdefZcdcdZabcdef.

	   RE\{m,n\}	  An RE matching a single character followed by
			  \{m\}, \{m,\}, or \{m,n\} is an RE that matches
			  repeated occurrences of the RE.  The values of m
			  and n must be decimal integers in the range 0
			  through 255, with m specifying the exact or
			  minimum number of occurrences and n specifying the
			  maximum number of occurrences.  \{m\} matches
			  exactly m occurrences of the preceding RE, \{m,\}
			  matches at least m occurrences, and \{m,n\}
			  matches any number of occurrences between m and n,
			  inclusive.

			  The first encountered string that matches the
			  expression is chosen; it will contain as many
			  occurrences of the RE as possible.  For example,
			  in the string abbbbbbbc the RE b\{3\} is matched
			  by characters two through four, the RE b\{3,\} is
			  matched by characters two through eight, and the
			  RE b\{3,5\}c is matched by characters four through
			  nine.
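
      The position examples above can be checked with a short sketch.  It
      assumes Python's re module rather than a basic RE implementation: in
      re the subexpression delimiters \( and \) are written ( and ), and
      the interval delimiters \{ and \} are written { and }.  The helper
      name span is hypothetical, and positions are reported 1-based to
      agree with the text.

```python
import re

def span(pattern, text):
    """Return 1-based (first, last) positions of the leftmost match, or None."""
    m = re.search(pattern, text)
    return (m.start() + 1, m.end()) if m else None

# RE*: b*c and bbb*c both select bbbc in the same positions.
star_spans = [span(p, "abbbcdeabbbbbbcde") for p in (r"b*c", r"bbb*c")]

# \(RE\) and \n: a line made of two adjacent copies of the same string.
doubled = re.compile(r"^(.*)\1$")

# \n followed by an asterisk, with nested subexpressions.
nested = re.compile(r"(ab(cd)ef)Z\2*Z\1")

# RE\{m\}, RE\{m,\}, and RE\{m,n\} on the string abbbbbbbc.
text = "a" + "b" * 7 + "c"
interval_spans = [span(p, text) for p in (r"b{3}", r"b{3,}", r"b{3,5}c")]

print(star_spans, interval_spans)
print(bool(doubled.search("abcabc")), bool(doubled.search("abcabd")))
print(bool(nested.fullmatch("abcdefZcdcdZabcdef")))
```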

    Expression Anchoring
      An RE can be limited to matching strings that begin or end a line
      (i.e., anchored) according to the following rules:

	   o  A circumflex (^) as the first character of an RE anchors the
	      expression to the beginning of a line; only strings starting
	      at the first character of a line are matched by the RE.  For
	      example, the RE ^ab matches the string ab in the line abcdef,
	      but not the same string in the line cdefab.

	   o  A dollar sign ($) as the last character of an RE anchors the
	      expression to the end of a line; only strings ending at the
	      last character of a line are matched by the RE.  For example,
	      the RE ab$ matches the string ab in the line cdefab, but not
	      the same string in the line abcdef.

	   o  An RE anchored by both ^ and $ matches only strings that are
	      lines.  For example, the RE ^abcdef$ matches only lines
	      consisting of the string abcdef.
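
      The three anchoring rules can be sketched as follows, again assuming
      Python's re module as a stand-in.  Unlike a basic RE, re treats ^ and
      $ as anchors wherever they appear in the pattern, so only the
      first-character and last-character cases are shown.

```python
import re

lines = ["abcdef", "cdefab"]

# ^ as the first character: only strings starting the line are matched.
at_start = [ln for ln in lines if re.search(r"^ab", ln)]

# $ as the last character: only strings ending the line are matched.
at_end = [ln for ln in lines if re.search(r"ab$", ln)]

# Both anchors: the RE must match the whole line.
whole_line = [ln for ln in lines + ["xabcdefx"] if re.search(r"^abcdef$", ln)]

print(at_start, at_end, whole_line)
```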

 EXTENDED REGULAR EXPRESSIONS
      The extended regular expression (ERE) notation and construction rules
      apply to utilities defined as using extended REs.	 Any exceptions to
      the following rules are noted in the descriptions of the specific
      utilities using EREs.

    EREs Matching a Single Character
      The following EREs match a single character or a single collating
      element:

      Ordinary Characters
      An ordinary character is an ERE that matches itself.  An ordinary
      character is any character in the supported character set except
      <newline> and the regular expression special characters listed in
      Special Characters below.	 An ordinary character preceded by a
      backslash (\) is treated as the ordinary character itself.  Matching
      is based on the bit pattern used for encoding the character, not on
      the graphic representation of the character.

      Special Characters
      A regular expression special character preceded by a backslash is a
      regular expression that matches the special character itself.  When
      not preceded by a backslash, such characters have special meaning in
      the specification of EREs.  The extended regular expression special
      characters and the contexts in which they have their special meaning
      are:

	   . [ \ ( ) * + ? $ |
			    The period, left square bracket, backslash, left
			    parenthesis, right parenthesis, asterisk, plus
			    sign, question mark, dollar sign, and vertical
			    bar are special except when used in a bracket
			    expression (see ERE Bracket Expression).

	   ^		    The circumflex is special except when used in a
			    bracket expression in a non-leading position.

	   delimiter	    Any character used to bound (i.e., delimit) an
			    entire ERE is special for that ERE.

      Period

      A period (.), when used outside of a bracket expression, is an ERE
      that matches any printable or nonprintable character except <newline>.

    ERE Bracket Expression
      The syntax and rules for ERE bracket expressions are the same as for
      RE bracket expressions found above.

    EREs Matching Multiple Characters
      The following rules may be used to construct EREs matching multiple
      characters from EREs matching a single character:

	   RERE		  A concatenation of EREs matches the first
			  encountered concatenation of the strings matched
			  by each component of the ERE.	 Such a
			  concatenation of EREs enclosed in parentheses
			  matches whatever the concatenation without the
			  parentheses matches.	For example, both the ERE bc
			  and the ERE (bc) match the second and third
			  characters of the string abcdefabcdef.  The
			  longest overall string is matched.

	   RE+		  The special character plus (+), when following an
			  ERE matching a single character, or a
			  concatenation of EREs enclosed in parentheses, is
			  an ERE that matches one or more occurrences of the
			  ERE preceding the plus sign.	The string matched
			  will contain as many occurrences as possible.	 For
			  example, the ERE b+c matches the fourth through
			  seventh characters in the string acabbbcde.

	   RE*		  The special character asterisk (*), when following
			  an ERE matching a single character, or a
			  concatenation of EREs enclosed in parentheses, is
			  an ERE that matches zero or more occurrences of
			  the ERE preceding the asterisk.  For example, the
			  ERE b*c matches the first character in the string
			  cabbbcde.  If there is any choice, the longest
			  left-most string that permits a match is chosen.
			  For example, the ERE b*cd matches the third
			  through seventh characters in the string
			  cabbbcdebbbbbbcdbc.

	   RE?		  The special character question mark (?), when
			  following an ERE matching a single character, or a
			  concatenation of EREs enclosed in parentheses, is
			  an ERE that matches zero or one occurrences of the
			  ERE preceding the question mark.  The string
			  matched will contain as many occurrences as
			  possible.  For example, the ERE b?c matches the
			  second character in the string acabbbcde.

	   RE{m,n}	  An interval expression.  It functions the same way
			  as the basic regular expression form RE\{m,n\}.
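
      The examples for +, *, and ? can be reproduced with the sketch below.
      It assumes Python's re module, whose syntax for these operators agrees
      with the ERE notation; note that re selects the leftmost match by
      trying alternatives in order rather than by the longest-match rule
      stated above, although the results coincide for these patterns.  The
      helper name span and the string xababcx are hypothetical.

```python
import re

def span(pattern, text):
    """Return 1-based (first, last) positions of the leftmost match, or None."""
    m = re.search(pattern, text)
    return (m.start() + 1, m.end()) if m else None

# RE+: one or more occurrences.
plus = span(r"b+c", "acabbbcde")

# RE*: zero or more occurrences; zero b's are enough for the first c.
star_first = span(r"b*c", "cabbbcde")
star_longer = span(r"b*cd", "cabbbcdebbbbbbcdbc")

# RE?: zero or one occurrence.
optional = span(r"b?c", "acabbbcde")

# An operator applied to a concatenation enclosed in parentheses.
grouped = span(r"(ab)+c", "xababcx")

print(plus, star_first, star_longer, optional, grouped)
```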

    Alternation
      Two EREs separated by the special character vertical bar (|) match a
      string that is matched by either ERE.  For example, the ERE ((ab)|c)d
      matches the string abd and the string cd.

    Precedence
      The order of precedence is as follows, from high to low:

	   [ ]		  square brackets

	   * + ?	  asterisk, plus sign, question mark

	   ^ $		  anchoring

			  concatenation

	   |		  alternation

      For example, the ERE abba|cde is interpreted as "match either abba or
      cde."  It does not mean "match abb followed by a or c followed in turn
      by de" (because concatenation has a higher order of precedence than
      alternation).
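
      A small sketch of alternation and its precedence, assuming Python's
      re module (which uses the same | and parenthesis syntax for these
      patterns).  The helper name whole and the pattern abb(a|c)de, which
      spells out the reading that the text rules out, are hypothetical.

```python
import re

def whole(pattern, text):
    """Return True if the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None

# ((ab)|c)d matches abd and cd, but not ad.
grouped = [whole(r"((ab)|c)d", s) for s in ("abd", "cd", "ad")]

samples = ("abba", "cde", "abbade", "abbcde")

# Concatenation binds tighter than |, so this means "abba" or "cde".
as_written = [whole(r"abba|cde", s) for s in samples]

# Parentheses are needed to get "abb, then a or c, then de".
regrouped = [whole(r"abb(a|c)de", s) for s in samples]

print(grouped, as_written, regrouped)
```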

    Expression Anchoring
      An ERE can be limited to matching strings that begin or end a line
      (i.e., anchored) according to the following rules:

	   o  A circumflex (^) matches the beginning of a line (anchors the
	      expression to the beginning of a line).  For example, the ERE
	      ^ab matches the string ab in the line abcdef, but not the same
	      string in the line cdefab.

	   o  A dollar sign ($) matches the end of a line (anchors the
	      expression to the end of a line).	 For example, the ERE ab$
	      matches the string ab in the line cdefab, but not the same
	      string in the line abcdef.

	   o  An ERE anchored by both ^ and $ matches only strings that are
	      lines.  For example, the ERE ^abcdef$ matches only lines
	      consisting of the string abcdef.	Only empty lines match the
	      ERE ^$.

 PATTERN MATCHING NOTATION
      The following rules apply to pattern matching notation except as noted
      in the descriptions of the specific utilities using pattern matching.

    Patterns Matching a Single Character
      The following patterns match a single character or a single collating
      element:

      Ordinary Characters
      An ordinary character is a pattern that matches itself.  An ordinary
      character is any character in the supported character set except
      <newline> and the pattern matching special characters listed in
      Special Characters below.	 Matching is based on the bit pattern used
      for encoding the character, not on the graphic representation of the
      character.

      Special Characters
      A pattern matching special character preceded by a backslash (\) is a
      pattern that matches the special character itself.  When not preceded
      by a backslash, such characters have special meaning in the
      specification of patterns.  The pattern matching special characters
      and the contexts in which they have their special meaning are:

	   ? * [	  The question mark, asterisk, and left square
			  bracket are special except when used in a bracket
			  expression (see Pattern Bracket Expression).

      Question Mark

      A question mark (?), when used outside of a bracket expression, is a
      pattern that matches any printable or nonprintable character except
      <newline>.

    Pattern Bracket Expression
      The syntax and rules for pattern bracket expressions are the same as
      for RE bracket expressions found above with the following exceptions:

	   The exclamation point character (!) replaces the circumflex
	   character (^) in its role in a non-matching list in the regular
	   expression notation.

	   The backslash is used as an escape character within bracket
	   expressions.
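
      The first exception can be illustrated with Python's fnmatch module,
      used here as an assumed stand-in for the pattern matching notation.
      In fnmatch a leading ! negates the list and a leading ^ is an
      ordinary list member.  The second exception is not shown because
      fnmatch treats the backslash as an ordinary character, unlike the
      notation described here.

```python
from fnmatch import fnmatchcase

# [!abc] is a non-matching list: it rejects a and accepts x.
negated = [fnmatchcase(ch, "[!abc]") for ch in ("a", "x")]

# [^abc] is a matching list whose members are ^, a, b, and c.
circumflex = [fnmatchcase(ch, "[^abc]") for ch in ("a", "x", "^")]

print(negated, circumflex)
```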

    Patterns Matching Multiple Characters
      The following rules may be used to construct patterns matching
      multiple characters from patterns matching a single character:

	   *		  The asterisk (*) is a pattern that matches any
			  string, including the null string.

	   RERE		  The concatenation of patterns matching a single
			  character is a valid pattern that matches the
			  concatenation of the single characters or
			  collating elements matched by each of the
			  concatenated patterns.  For example, the pattern
			  a[bc] matches the strings ab and ac.

			  The concatenation of one or more patterns matching
			  a single character with one or more asterisks is a
			  valid pattern.  In such patterns, each asterisk
			  matches a string of zero or more characters, up to
			  the first character that matches the character
			  following the asterisk in the pattern.

			  For example, the pattern a*d matches the strings
			  ad, abd, and abcd; but not the string abc.  When
			  an asterisk is the first or last character in a
			  pattern, it matches zero or more characters that
			  precede or follow the characters matched by the
			  remainder of the pattern.  For example, the
			  pattern a*d* matches the strings ad, abcd, abcdef,
			  aaaad, and adddd; the pattern *a*d matches the
			  strings ad, abcd, efabcd, aaaad, and adddd.
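
      The pattern examples above can be checked with the following sketch,
      which assumes Python's fnmatch module as an approximation of the
      notation.  The helper name matches and the candidate strings ad (for
      a[bc]), abbc, and ac (for a?c) are hypothetical additions.

```python
from fnmatch import fnmatchcase

def matches(pattern, names):
    """Return the names that the whole pattern matches, case-sensitively."""
    return [n for n in names if fnmatchcase(n, pattern)]

# Concatenation of single-character patterns.
single = matches("a[bc]", ["ab", "ac", "ad"])

# ? matches exactly one character.
any_one = matches("a?c", ["abc", "ac", "abbc"])

# An asterisk between ordinary characters.
middle = matches("a*d", ["ad", "abd", "abcd", "abc"])

# An asterisk as the last or first character of the pattern.
trailing = matches("a*d*", ["ad", "abcd", "abcdef", "aaaad", "adddd"])
leading = matches("*a*d", ["ad", "abcd", "efabcd", "aaaad", "adddd"])

print(single, any_one, middle, trailing, leading)
```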





    Rule Qualification for Patterns Used for Filename Expansion
      The rules described above for pattern matching are qualified by the
      following rules when the pattern matching notation is used for
      filename expansion by sh(1), csh(1), ksh(1), and make(1).

	   If a filename (including the component of a pathname that follows
	   the slash (/) character) begins with a period (.), the period
	   must be explicitly matched by using a period as the first
	   character of the pattern; it cannot be matched by either the
	   asterisk special character, the question mark special character,
	   or a bracket expression.  This rule does not apply to make(1).

	   The slash character in a pathname must be explicitly matched by
	   using a slash in the pattern; it cannot be matched by either the
	   asterisk special character, the question mark special character,
	   or a bracket expression.  For make(1) only the part of the
	   pathname following the last slash character can be matched by a
	   special character.  That is, all special characters preceding the
	   last slash character lose their special meaning.

	   Specified patterns are matched against existing filenames and
	   pathnames, as appropriate.  If the pattern matches any existing
	   filenames or pathnames, the pattern is replaced with those
	   filenames and pathnames, sorted according to the collating
	   sequence in effect.	If the pattern does not match any existing
	   filenames or pathnames, the pattern string is left unchanged.

	   If the pattern begins with a tilde (~) character, all of the
	   ordinary characters preceding the first slash (or all characters
	   if there is no slash) are treated as a possible login name.	If
	   the login name is null (i.e., the pattern contains only the tilde
	   or the tilde is immediately followed by a slash), the tilde is
	   replaced by a pathname of the process's home directory, followed
	   by a slash.	Otherwise, the combination of tilde and login name
	   is replaced by a pathname of the home directory associated with
	   the login name, followed by a slash.	 If the system cannot
	   identify the login name, the result is implementation-defined.
	   This rule does not apply to sh(1) or make(1).

	   If the pattern contains a $ character, variable substitution can
	   take place.  Environment variables can be embedded within
	   patterns as:

			  $name

	   or:

			  ${name}

	   Braces are used to guarantee that characters following name are
	   not interpreted as belonging to name.  Substitution occurs in the
	   order specified only once; that is, the resulting string is not
	   examined again for new names that occurred because of the
	   substitution.
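
      The leading-period rule and the use of braces can be sketched with
      Python's glob and os.path modules, which are assumptions of this
      example and not the expansion code of sh(1), csh(1), ksh(1), or
      make(1).  Two differences are worth noting: glob returns an empty
      list for a pattern that matches nothing instead of leaving the
      pattern unchanged, and os.path.expandvars leaves a reference to an
      undefined variable in place.  The file names, the helper name
      expand, and the variable name REGEXP_DEMO are hypothetical.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    for name in (".profile", "a.txt", "b.txt"):
        open(os.path.join(d, name), "w").close()
    base = glob.escape(d)  # keep special characters in the path literal

    def expand(pattern):
        """Expand pattern inside d; return the sorted matching file names."""
        paths = glob.glob(os.path.join(base, pattern))
        return sorted(os.path.basename(p) for p in paths)

    star = expand("*")         # the leading period is not matched by *
    bracket = expand("[.a]*")  # nor by a bracket expression
    dot_star = expand(".*")    # it must be matched explicitly
    unmatched = expand("*.c")  # nothing matches

# Braces keep the characters after the name from joining the name.
os.environ["REGEXP_DEMO"] = "src"
os.environ.pop("REGEXP_DEMOS", None)
braced = os.path.expandvars("${REGEXP_DEMO}S/*.c")
unbraced = os.path.expandvars("$REGEXP_DEMOS/*.c")

print(star, bracket, dot_star, unmatched, braced, unbraced)
```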

    Rule Qualification for Patterns Used in the case Command
      The rules described above for pattern matching are qualified by the
      following rule when the pattern matching notation is used in the case
      command of sh(1) and ksh(1).

	   Multiple alternative patterns in a single clause can be specified
	   by separating individual patterns with the vertical bar character
	   (|); strings matching any of the patterns separated this way will
	   cause the corresponding command list to be selected.
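
      As a rough model of this rule, the sketch below selects the first
      clause with an alternative that matches the word.  It assumes
      Python's fnmatch module for the individual patterns and splits
      alternatives on every vertical bar, so it does not handle a quoted or
      escaped |.  The function name select and the clause contents are
      hypothetical.

```python
from fnmatch import fnmatchcase

def select(word, clauses):
    """Return the command list of the first clause whose patterns match word."""
    for patterns, commands in clauses:
        if any(fnmatchcase(word, p) for p in patterns.split("|")):
            return commands
    return None

clauses = [
    ("*.c|*.h", "compile"),  # two alternative patterns in one clause
    ("*.txt", "print"),
    ("*", "ignore"),         # default clause
]

chosen = [select(w, clauses) for w in ("main.c", "defs.h", "notes.txt", "Makefile")]
print(chosen)
```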

 SEE ALSO
      ksh(1), sh(1), fnmatch(3C), glob(3C), regcomp(3C), setlocale(3C),
      cdf(4), environ(5).

 STANDARDS CONFORMANCE
      <regexp.h>: AES, SVID2, SVID3, XPG2, XPG3, XPG4

 Hewlett-Packard Company	   - 13 -    HP-UX Release 10.20:  July 1996