[EMAIL PROTECTED] wrote:

Experiments using bash indicate that either ^ or ! is accepted
as the negation of a character set.  Hence,

    ls -d [^tu]*
    ls -d [!tu]*

both return the same thing - a list of all files and directories
in the current directory whose names do not begin with "t" or "u".

SQLite only supports ^, not !.  I wonder if this is something I
should change?  It would not be much trouble to get GLOB to support
both, much like the globber in bash.
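
For anyone who wants to check the current behaviour, here is a minimal sketch that runs the GLOB operator through Python's standard sqlite3 module. The helper name sqlite_glob is made up for illustration. The "!" column is printed without any claim about what it should be, since that depends on the SQLite version in use.

```python
import sqlite3


def sqlite_glob(pattern, text):
    """Return True if SQLite's GLOB operator matches text against pattern.

    sqlite_glob is a hypothetical helper name, not part of any library.
    """
    conn = sqlite3.connect(":memory:")
    try:
        # "X GLOB Y" treats Y as the pattern and X as the string to test.
        row = conn.execute("SELECT ? GLOB ?", (text, pattern)).fetchone()
        return bool(row[0])
    finally:
        conn.close()


if __name__ == "__main__":
    for name in ("apple", "tree", "update"):
        # The second column uses ^ (supported per the message above); the
        # third uses !, whose result depends on the SQLite build.
        print(name, sqlite_glob("[^tu]*", name), sqlite_glob("[!tu]*", name))
```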

Anybody have an old Bourne shell around?  An authentic C-shell?
What do they do?

Richard,

I found the following info in a jEdit appendix.

   * ? matches any one character

   * * matches any number of characters

   * {!glob} matches anything that does not match glob

   * {a,b,c} matches any one of a, b or c

   * [abc] matches any character in the set a, b or c

   * [^abc] matches any character not in the set a, b or c

   * [a-z] matches any character in the range a to z, inclusive.
     A leading or trailing dash will be interpreted literally


I noticed that SQLite doesn't implement any of the curly brace grouping of globs. The appendix also shows ^ used for inversion within a character set, and ! for inversion of a complete glob.
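
To illustrate what the curly brace grouping amounts to, here is a rough sketch that expands a brace pattern into plain globs and then matches each one. It assumes non-nested braces only, uses Python's fnmatch for the plain matching, and the names expand_braces and brace_match are hypothetical. It is not how jEdit or any shell implements it.

```python
import fnmatch
import re


def expand_braces(pattern):
    """Expand {a,b,c} groups into a list of plain glob patterns.

    Simplification: nested braces are not handled correctly.
    """
    m = re.search(r"\{([^{}]*)\}", pattern)
    if m is None:
        return [pattern]
    head, tail = pattern[:m.start()], pattern[m.end():]
    expanded = []
    for alternative in m.group(1).split(","):
        # Recurse so that later groups in the tail are expanded too.
        expanded.extend(expand_braces(head + alternative + tail))
    return expanded


def brace_match(pattern, name):
    """Return True if name matches any expansion of the brace pattern."""
    return any(fnmatch.fnmatchcase(name, p) for p in expand_braces(pattern))


if __name__ == "__main__":
    print(expand_braces("*.{c,h}"))
    print(brace_match("*.{c,h}", "main.h"), brace_match("*.{c,h}", "main.o"))
```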


The following is from the TCL documentation:

The /pattern/ arguments may contain any of the following special characters:

?
    Matches any single character.
*
    Matches any sequence of zero or more characters.
[chars]
    Matches any single character in chars. If chars contains a
    sequence of the form a-b then any character between a and
    b (inclusive) will match.
\x
    Matches the character x.
{a,b,...}
    Matches any of the strings a, b, etc.

This doesn't mention inversion at all, but it does say a backslash can be used to escape a character.

And the following is from the documentation of a glob compiler class.

    * * - Matches zero or more instances of any character. If the
      STAR_CANNOT_MATCH_NULL_MASK option is used, * matches one or
      more instances of any character.
    * ? - Matches one instance of any character. If the
      QUESTION_MATCHES_ZERO_OR_ONE_MASK option is used, ? matches
      zero or one instances of any character.
    * [...] - Matches any of the characters enclosed by the brackets.
      * and ? lose their special meanings within a character
      class. Additionally, if the first character following the opening
      bracket is a ! or a ^, then any character not in the
      character class is matched. A - between two characters can be
      used to denote a range. A - at the beginning or end of the
      character class matches itself rather than referring to a range.
      A ] immediately following the opening [ matches itself
      rather than indicating the end of the character class, otherwise
      it must be escaped with a backslash to refer to itself.
    * \ - A backslash matches itself in most situations. But when a
      special character such as a * follows it, a backslash
      escapes the character, indicating that the special character
      should be interpreted as a normal character instead of its
      special meaning.
    * All other characters match themselves.

This class explicitly mentions using either ^ or ! to invert a character set. It also allows backslash escapes for special characters. It says * and ? lose their special status in a character set, so that isn't really an escape.
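
The rules described for this class can be sketched as a small glob-to-regex translator. This is only an illustration under my own assumptions, not the class's actual code: the names glob_to_regex and glob_match are hypothetical, a backslash inside a character set is treated as a literal character, and none of the option masks are modelled.

```python
import re


def glob_to_regex(pattern):
    """Translate a glob into a regex string, accepting [^...] and [!...]."""
    out = []
    i, n = 0, len(pattern)
    while i < n:
        c = pattern[i]
        i += 1
        if c == "*":
            out.append(".*")
        elif c == "?":
            out.append(".")
        elif c == "\\" and i < n:
            # A backslash makes the next character literal.
            out.append(re.escape(pattern[i]))
            i += 1
        elif c == "[":
            j = i
            if j < n and pattern[j] in "!^":
                j += 1
            if j < n and pattern[j] == "]":
                j += 1  # a ']' right after the opening is literal
            while j < n and pattern[j] != "]":
                j += 1
            if j >= n:
                out.append(re.escape("["))  # unterminated set: literal '['
            else:
                body = pattern[i:j]
                negate = body[0] in "!^"
                if negate:
                    body = body[1:]
                # Escape characters that are special inside a regex set;
                # '-' is left alone so that ranges such as a-z still work.
                for ch in ("\\", "]", "[", "^"):
                    body = body.replace(ch, "\\" + ch)
                out.append("[" + ("^" if negate else "") + body + "]")
                i = j + 1
        else:
            out.append(re.escape(c))
    return "".join(out)


def glob_match(pattern, text):
    """Return True if the whole of text matches the glob pattern."""
    return re.fullmatch(glob_to_regex(pattern), text, re.DOTALL) is not None
```

With this sketch, glob_match("[^tu]*", name) and glob_match("[!tu]*", name) give the same answer for any name, and glob_match(r"\*x", "*x") matches only a literal star followed by x.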

The following is from Apple's documentation:

       ?           Matches any single character.

       *           Matches any sequence of zero or more characters.

       [chars]     Matches any single character in chars.  If chars
                   contains a sequence of the form a-b then any
                   character between a and b (inclusive) will match.

       \x          Matches the character x.

       {a,b,...}   Matches any of the strings a, b, etc.


And finally, from the GNU bash documentation:


          3.5.8.1 Pattern Matching

Any character that appears in a pattern, other than the special pattern characters described below, matches itself. The nul character may not occur in a pattern. A backslash escapes the following character; the escaping backslash is discarded when matching. The special pattern characters must be quoted if they are to be matched literally.

The special pattern characters have the following meanings:

*
    Matches any string, including the null string.
?
    Matches any single character.
[...]
    Matches any one of the enclosed characters. A pair of characters
    separated by a hyphen denotes a range expression; any character
    that sorts between those two characters, inclusive, using the
    current locale's collating sequence and character set, is matched.
    If the first character following the ‘[’ is a ‘!’ or a ‘^’ then
    any character not enclosed is matched. A ‘-’ may be matched by
    including it as the first or last character in the set. A ‘]’ may
    be matched by including it as the first character in the set. The
    sorting order of characters in range expressions is determined by
    the current locale and the value of the LC_COLLATE shell variable,
    if set.

    For example, in the default C locale, ‘[a-dx-z]’ is equivalent to
    ‘[abcdxyz]’. Many locales sort characters in dictionary order, and
    in these locales ‘[a-dx-z]’ is typically not equivalent to
    ‘[abcdxyz]’; it might be equivalent to ‘[aBbCcDdxXyYz]’, for
    example. To obtain the traditional interpretation of ranges in
    bracket expressions, you can force the use of the C locale by
    setting the LC_COLLATE or LC_ALL environment variable to the value
    ‘C’.

    Within ‘[’ and ‘]’, character classes can be specified using the
    syntax [:class:], where class is one of the following classes
    defined in the POSIX standard:

              alnum   alpha   ascii   blank   cntrl   digit   graph   lower
              print   punct   space   upper   word    xdigit

    A character class matches any character belonging to that class.
    The word character class matches letters, digits, and the
    character ‘_’.

    Within ‘[’ and ‘]’, an equivalence class can be specified using
    the syntax [=c=], which matches all characters with the same
    collation weight (as defined by the current locale) as the
    character c.

    Within ‘[’ and ‘]’, the syntax [.symbol.] matches the
    collating symbol symbol.

If the extglob shell option is enabled using the shopt builtin, several extended pattern matching operators are recognized. In the following description, a pattern-list is a list of one or more patterns separated by a ‘|’. Composite patterns may be formed using one or more of the following sub-patterns:

?(pattern-list)
    Matches zero or one occurrence of the given patterns.
*(pattern-list)
    Matches zero or more occurrences of the given patterns.
+(pattern-list)
    Matches one or more occurrences of the given patterns.
@(pattern-list)
    Matches one of the given patterns.
!(pattern-list)
    Matches anything except one of the given patterns.
------------------------------------------------------------------------
This does say a backslash should be used to escape special characters, and that ^ and ! are equivalent at the beginning of a character set.

It seems like there is some variation in GLOB syntax. :-)
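
One more data point on the variation, as a small sketch: Python's standard fnmatch module documents only ! for negating a character set, and as far as I can tell it treats a leading ^ as an ordinary member of the set. The helper name matching and the sample names are made up for illustration.

```python
import fnmatch


def matching(pattern, names):
    """Return the names that match pattern under Python's fnmatch rules."""
    return [n for n in names if fnmatch.fnmatchcase(n, pattern)]


if __name__ == "__main__":
    names = ["apple", "tree", "update"]
    # "!" negates the set, so names starting with t or u are excluded.
    print("[!tu]*", matching("[!tu]*", names))
    # "^" is not a negation here; it is just another character in the set.
    print("[^tu]*", matching("[^tu]*", names))
```

So the same pattern can select opposite sets of names depending on which globber reads it.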

Perhaps you should add support for a backslash escape and then simply document what SQLite does (instead of saying it supports standard Unix glob syntax, since there isn't a standard).

HTH
Dennis Cote



