On Friday,  6 Jan 2017  3:48 PM -0500, Keith Medcalf wrote:
>
> On Friday, 6 January, 2017 12:49, James K. Lowden <jklow...@schemamania.org> 
> wrote:
>
>> On Fri, 6 Jan 2017 10:23:06 +1100
>> "dandl" <da...@andl.org> wrote:
>> 
>> > Unix globbing for Linux is defined here:
>> > http://man7.org/linux/man-pages/man7/glob.7.html. AFAICT Sqlite does
>> > not implement this behaviour.
>> 
>> A quick scan of SQLite sources shows only references to the glob
>> function, no implementation.  In func.c, we find
>> 
>>     LIKEFUNC(glob, 2, &globInfo, SQLITE_FUNC_LIKE|SQLITE_FUNC_CASE),
>> 
>> It looks to me like SQLite imports glob(3) as its default
>> implementation.  Have you an example for which a glob pattern behaves
>> differently in SQLite versus C?
>> 
>> (For those following along at home, bear in mind that glob(3) need not
>> necessarily be what your favorite shell uses.)
>> 
>> If indeed SQLite is using the glob function from libc, ISTM it's
>> perfectly sufficient to refer to glob(7) for syntax, since that's the
>> documentation for the controlling implementation.
>
> SQLite does not use the glob function from the standard library -- the 
> function is defined in func.c
>
> Both "glob" and "like" call the same function, likeFunc with different sets 
> of user_data.  likeFunc does a bunch of validation then calls patternCompare 
> which actually implements the like and glob functionality.  How like and glob 
> work are documented in the preface to patternCompare.
>
> like implements the standard sql like using % (0 or more) and _ (exactly 1 
> char) as wildcard matches.
>
> glob implements unix globbing using * (0 or more) and ? (exactly 1) as 
> wildcard matches. "sets" of characters are indicated by  squockets (square 
> brackets -- []).  Different from the standard unix glob however, it uses ^ to 
> invert the sense of a set rather than an !.  Since it is unicode, a character 
> is [\u0000-\u10FFFF].  [^1-7] is equivalent to a match of any of the 
> remaining unicode characters.
>
> thus in unix/linux one may pronounce "match anything where one character is 
> not the digits 1 through 7" as *[!1-7]*
> one would pronounce the same request to SQLite as *[^1-7]*
>
> This of course would match any string that was not composed entirely of only 
> the characters 1 through 7 (not that there are no characters 1 through 7 in 
> the string) -- and must be at least 1 character long.
>
> If one wanted to match strings that contained a 1 through 7 anywhere within, 
> then one would pronounce *[1-7]* on both unix/linux and to SQLite
>
> Were one to want a glob that excluded all strings that contained the digits 1 
> through 7 anywhere within, then one would pronounce, in SQLite, WHERE NOT x 
> GLOB '*[1-7]*' -- though this would also now match 0 length strings.
>
> There is no way to "invert" the match-sense of a glob pattern within the 
> pattern itself.  That is, one cannot use '^*[1-7]*' as an equivalent to the 
> above inversion of the results of a positive match.  GLOB patterns only 
> search for a positive match, not an exclusion.  The [^stuf] excludes the 
> characters or range provided from the characters matched by a ? -- [^stuf] is 
> not an exclusion of the characters stuf but rather a match for any of the 
> other unicode characters except stuf -- in other words a "somewhat limited ?".
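
The behaviour described in the quoted text can be checked with a
short script. This is only a sketch: it assumes Python's standard
sqlite3 module, and the helper name glob_match and the sample
strings are made up for illustration. The results depend on the
SQLite library your Python is linked against.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def glob_match(pattern, text):
    """Hypothetical helper: True if SQLite's GLOB matches text against pattern."""
    # In SQL the pattern is the right-hand operand: text GLOB pattern
    row = conn.execute("SELECT ? GLOB ?", (text, pattern)).fetchone()
    return bool(row[0])

samples = ["", "1234567", "12a34", "abc", "89"]

# *[^1-7]* : the string has at least one character outside 1-7
has_other = [s for s in samples if glob_match("*[^1-7]*", s)]

# *[1-7]* : the string has at least one character inside 1-7
has_digit = [s for s in samples if glob_match("*[1-7]*", s)]

# NOT x GLOB '*[1-7]*' : no character inside 1-7, empty string included
no_digit = [s for s in samples if not glob_match("*[1-7]*", s)]

print("has_other:", has_other)
print("has_digit:", has_digit)
print("no_digit: ", no_digit)
```

Note that the empty string only shows up in the NOT ... GLOB list,
since both positive patterns need at least one character to match.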


I dug up some old references to investigate this further:

a) The UNIX C Shell Field Guide (1986): Ranges in the pattern
   [lower-upper] mentioned, no mention of negation of pattern.
   (Presumably even a pattern like "[1-9xyz]" wouldn't be valid
   either, though this is not explicit.)

b) UNIX in a Nutshell (1992): Negation operator mentioned for
   Bourne shell in the form [!abc...], but not for C shell.

c) Learning the bash Shell (1995): Negations of the form [!abc...]
   mentioned.

What I take away from this is that relying on [^1-9] to mean the same
thing as it would in a regular expression is non-portable.  If this
gets documented, then I think there should be a warning to this
effect.
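
To make the portability point concrete, here is a small sketch that
runs the same bracket expressions through two matchers. It assumes
Python's fnmatch module as a stand-in for shell-style globbing
(it documents [!seq] as the negated form) and the standard sqlite3
module for SQLite's GLOB. The helper sqlite_glob is a made-up name,
and the claim that SQLite treats "!" as an ordinary set member is my
reading of patternCompare, not something the documentation states.

```python
import fnmatch
import sqlite3

conn = sqlite3.connect(":memory:")

def sqlite_glob(pattern, text):
    """Hypothetical helper: evaluate text GLOB pattern inside SQLite."""
    row = conn.execute("SELECT ? GLOB ?", (text, pattern)).fetchone()
    return bool(row[0])

# Shell-style negation with "!": fnmatch treats it as "not 1-9".
shell_bang = fnmatch.fnmatchcase("a", "[!1-9]")

# SQLite appears to treat "!" as just another member of the set,
# so "a" does not match but a literal "!" does.
sqlite_bang_letter = sqlite_glob("[!1-9]", "a")
sqlite_bang_literal = sqlite_glob("[!1-9]", "!")

# SQLite negates with "^" instead.
sqlite_caret = sqlite_glob("[^1-9]", "a")

print("fnmatch [!1-9] vs 'a':", shell_bang)
print("SQLite  [!1-9] vs 'a':", sqlite_bang_letter)
print("SQLite  [!1-9] vs '!':", sqlite_bang_literal)
print("SQLite  [^1-9] vs 'a':", sqlite_caret)
```

The same pattern text therefore gives different answers depending on
which matcher reads it, which is the hazard a documentation warning
would need to cover.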

As this thread has shown, supporting it can even be seen as a
misfeature, as it encourages confusion between glob patterns and
regular expressions.

-- 
Will

_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
