Hello all,
Another question: my application has an FTS4 virtual table for which I am
writing a custom tokenizer. I can't find any documentation on which symbols
are "safe" to include in the index and which are not because they have
special meaning in an FTS query.
For example: if my tokenizer does not do case-folding, it's not safe for me
to store "AND" or "OR" directly as tokens, because they will become
un-queryable. Instead I'll need to escape them somehow ("\AND"?), both when
indexing and when querying, so that they get past the query parser.
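
A minimal sketch of that escaping idea, to make the question concrete. The
helper names, the backslash prefix and the RESERVED set are all hypothetical
and not part of SQLite; the real set of keywords depends on how FTS was
compiled, and whether the query parser passes the escaped form through
untouched is exactly what would need to be verified.

```python
# Hypothetical reserved words; adjust to the keywords your FTS build treats
# as operators. This set is an assumption, not taken from SQLite itself.
RESERVED = {"AND", "OR", "NOT", "NEAR"}
ESCAPE = "\\"


def escape_token(token: str) -> str:
    """Prefix reserved words so they no longer look like operators.

    Tokens that already start with the escape character are escaped too,
    so that unescape_token() can reverse the mapping unambiguously.
    """
    if token in RESERVED or token.startswith(ESCAPE):
        return ESCAPE + token
    return token


def unescape_token(token: str) -> str:
    """Reverse escape_token(), e.g. for displaying indexed terms."""
    if token.startswith(ESCAPE):
        return token[len(ESCAPE):]
    return token
```

The same escape_token() would have to be applied to every term at indexing
time and again to every user-supplied term at query time, otherwise the two
sides stop matching.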
Similarly I believe that the symbols "-", "!", "*", "(", ")" and the
double-quote itself are not safe for me to return from my custom tokenizer
without some form of escaping.
Is there a best practice here? My best-case scenario would be some way to
do a "raw" query on an FTS table in which all symbols are interpreted as
literal text, bypassing the entire FTS "mini-language". I'm tempted to add
a new custom operator, e.g. RAWMATCH, to make this possible, but that seems
like a pretty heavy-weight solution.
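
One lighter-weight approximation of a "raw" query, short of a new operator,
might be to wrap every term in double quotes so it is parsed as a phrase
rather than as an operator. The sketch below is only that: raw_match_expr is
a hypothetical helper, and it assumes that the contents of a quoted phrase
are handed to the tokenizer without keyword parsing, which should be checked
against the FTS version in use. Some characters (a trailing "*", for
instance) may still be treated specially inside a phrase, and the sketch
simply rejects terms containing a double-quote because it makes no
assumption about how to escape one.

```python
def raw_match_expr(terms):
    """Build a MATCH expression in which every term is a quoted phrase.

    Hypothetical helper: it relies on the assumption that quoted phrases
    are not scanned for query operators such as AND or OR.
    """
    quoted = []
    for term in terms:
        if '"' in term:
            # No escaping scheme for an embedded double-quote is assumed.
            raise ValueError("term contains a double-quote: %r" % (term,))
        quoted.append('"' + term + '"')
    # Phrases separated by spaces are combined by the default operator.
    return " ".join(quoted)
```

The resulting string would then be bound as an ordinary parameter, e.g.
"SELECT ... WHERE docs MATCH ?", where the table name docs is made up for
illustration.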
Thanks,
Xavier Snelgrove
Cofounder & CTO, Whirlscape Inc.
http://whirlscape.com
xavier at whirlscape.com