On 02/22/2016 10:27 AM, Ephraim Stevens wrote: > Greetings All, > > I use the Full Text Search facilities extensively. The feature is intuitive > and works beautifully and super fast even with 100GB+ databases. I must > applaud the SQLITE dev team on such amazing work. > > I have two questions with regards to FTS. > > 1) I'm going out on a limb here: > > As some of you may know, there is a undocumented, and not officially > supported feature/bug where you can specify your own delimiters using the > "simple" tokenizer: > > > http://article.gmane.org/gmane.comp.db.sqlite.general/74199 > > > > for example to create an table where tokens are split by an exclamation > point and carat, one would execute: > > create virtual table TEST using FTS4(title, body, tokenize=simple "" "!^"); > > Note the first argument is ignored, the second argument is where the > delimiters should be specified in quotes. > > This is very, very convenient and I've used it extensively without issue (I > hope the feature never gets removed). > > The question here is, how can one specify non printable characters within > the sqlite3 command shell. Often, delimiters are non-printable characters > such as NUL (0x00) or SOH (0x01) or even the newline character. If I > wanted to specify either one of those in my delimiter string parameter > above, how could it successfully be done? > >
0x00 is likely impossible, but for other ASCII range characters you can probably just insert the value into the SQL string, printable or not. > > > > 2) The colon ':' is used in the FTS engine as a column specifier: > > select * from FTS_TABLE where COL:foo MATCH "test123". This means show me > any rows that have test123 in the COL column only (and not any other > column). > > Is there a way to adjust/change this column delimiter during table setup or > is this a hard-coded value? I ask because sometimes, I need to be able to > perform searches with the colon character as part of a string. For > example, when searching on NETWORK MAC addresses. At the same time, I don't > want to lose the ability to utilize column specifiers. I think quoted strings are passed to the tokenizer without interpretation. So if your tokenizer considers ":" to be a token character, you might be able to do: ... MATCH '"01:23:45:67:89:AB"' ... or similar. Dan.