On Friday, 6 Jan 2017 3:48 PM -0500, Keith Medcalf wrote:
>
> On Friday, 6 January, 2017 12:49, James K. Lowden <jklow...@schemamania.org>
> wrote:
>
>> On Fri, 6 Jan 2017 10:23:06 +1100
>> "dandl" <da...@andl.org> wrote:
>>
>> > Unix globbing for Linux is defined here:
>> > http://man7.org/linux/man-pages/man7/glob.7.html. AFAICT Sqlite does
>> > not implement this behaviour.
>>
>> A quick scan of SQLite sources shows only references to the glob
>> function, no implementation. In func.c, we find
>>
>> LIKEFUNC(glob, 2, &globInfo, SQLITE_FUNC_LIKE|SQLITE_FUNC_CASE),
>>
>> It looks to me like SQLite imports glob(3) as its default
>> implementation. Have you an example for which a glob pattern behaves
>> differently in SQLite versus C?
>>
>> (For those following along at home, bear in mind that glob(3) need not
>> necessarily be what your favorite shell uses.)
>>
>> If indeed SQLite is using the glob function from libc, ISTM it's
>> perfectly sufficient to refer to glob(7) for syntax, since that's the
>> documentation for the controlling implementation.
>
> SQLite does not use the glob function from the standard library -- the
> function is defined in func.c
>
> Both "glob" and "like" call the same function, likeFunc with different sets
> of user_data. likeFunc does a bunch of validation then calls patternCompare
> which actually implements the like and glob functionality. How like and glob
> work are documented in the preface to patternCompare.
>
> like implements the standard sql like using % (0 or more) and _ (exactly 1
> char) as wildcard matches.
>
> glob implements unix globbing using * (0 or more) and ? (exactly 1) as
> wildcard matches. "sets" of characters are indicated by squockets (square
> brackets -- []). Different from the standard unix glob however, it uses ^ to
> invert the sense of a set rather than an !. Since it is unicode, a character
> is [\u0000-\u10FFFF]. [^1-7] is equivalent to a match of any of the
> remaining unicode characters.
>
> thus in unix/linux one may pronounce "match anything where one character is
> not the digits 1 through 7" as *[!1-7]*
> one would pronounce the same request to SQLite as *[^1-7]*
>
> This of course would match any string that was not composed entirely of only
> the characters 1 through 7 (not that there are no characters 1 through 7 in
> the string) -- and must be at least 1 character long.
>
> If one wanted to match strings that contained a 1 through 7 anywhere within,
> then one would pronounce *[1-7]* on both unix/linux and to SQLite
>
> Were one to want a glob that excluded all strings that contained the digits 1
> though 7 anywhere within, then one would pronounce, in SQLite, WHERE NOT x
> GLOB '*[1-7]*' -- though this would also now match 0 length strings.
>
> There is no way to "invert" the match-sense of a glob pattern within the
> pattern itself. That is, one cannot use '^*[1-7]*' as an equivalent to the
> above inversion of the results of a positive match. GLOB patterns only
> search for a positive match, not an exclusion. The [^stuf] excludes the
> characters or range provided from the characters matched by a ? -- [^stuf] is
> not an exclusion of the characters stuf but rather a match for any of the
> other unicode characters except stuf -- in other words a "somewhat limited ?".
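
The three cases described above are easy to check against a real SQLite build. Here is a minimal sketch, assuming Python's standard sqlite3 module and an in-memory database. The helper names glob_matches and glob_rejects and the sample strings are made up for illustration; they are not anything from the SQLite sources.

```python
import sqlite3

# Hypothetical sample strings chosen for illustration only.
SAMPLES = ["", "123", "128", "abc", "8"]


def glob_matches(pattern, values=SAMPLES):
    """Return the values for which `value GLOB pattern` is true in SQLite."""
    conn = sqlite3.connect(":memory:")
    try:
        return [
            v for v in values
            if conn.execute("SELECT ? GLOB ?", (v, pattern)).fetchone()[0]
        ]
    finally:
        conn.close()


def glob_rejects(pattern, values=SAMPLES):
    """Return the values for which `NOT value GLOB pattern` is true in SQLite."""
    conn = sqlite3.connect(":memory:")
    try:
        return [
            v for v in values
            if conn.execute("SELECT NOT ? GLOB ?", (v, pattern)).fetchone()[0]
        ]
    finally:
        conn.close()


if __name__ == "__main__":
    # At least one character outside 1-7; the empty string cannot match.
    print(glob_matches("*[^1-7]*"))
    # At least one character inside 1-7.
    print(glob_matches("*[1-7]*"))
    # No character inside 1-7; this also admits the empty string.
    print(glob_rejects("*[1-7]*"))
```

Note that "123" appears only in the second list, which is the distinction between "contains a character that is not 1 through 7" and "contains no character 1 through 7" drawn above.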

I dug up some old references to investigate this further:

a) The UNIX C Shell Field Guide (1986): Ranges of the form [lower-upper]
   are mentioned, with no mention of negating a pattern. (Presumably even
   a pattern like "[1-9xyz]" wouldn't be valid either, though this is not
   explicit.)

b) UNIX in a Nutshell (1992): A negation operator of the form [!abc...] is
   mentioned for the Bourne shell, but not for the C shell.

c) Learning the bash Shell (1995): Negations of the form [!abc...] are
   mentioned.

What I take away from this is that relying on [^1-9] to mean the same
thing as it would in a regular expression is non-portable. If this gets
documented, then I think there should be a warning to this effect. As
this thread has shown, supporting it can even be seen as a misfeature,
since it encourages confusion between glob patterns and regular
expressions.

-- Will