Thanks for the reply, Dan!

> On Dec 12, 2018, at 7:08 AM, Dan Kennedy <danielk1...@gmail.com> wrote:
> 
> Leaving stop words in while parsing queries won't quite work anyway. If your 
> tokenizer returns "the" when parsing a query, FTS3/4 will search for "the" in 
> the index. And it won't be there if the tokenizer used for parsing documents 
> stripped it out.

I was only talking about leaving them in when they are immediately followed 
by a “*”, so the tokenizer would preserve “the*” but not “the”. FTS4 would 
then interpret “the*” as a prefix match rather than as the word “the”.
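
Roughly, the rule I have in mind looks like this. It is only a Python sketch 
of the query-side filtering, not a real FTS tokenizer; the stop-word list and 
the function name are made up for illustration:

```python
import re

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "an", "of"}

# A word, optionally followed immediately by "*" (the FTS prefix marker).
_TOKEN = re.compile(r"(\w+)(\*?)")


def query_tokens(query):
    """Return the tokens to keep from a query string.

    Stop words are dropped unless a "*" follows them directly,
    in which case they are kept as prefix terms.
    """
    kept = []
    for word, star in _TOKEN.findall(query):
        word = word.lower()
        if word in STOP_WORDS and not star:
            continue  # plain stop word: it is not in the index, so drop it
        kept.append(word + star)
    return kept
```

With this rule, a query of “the cat” keeps only “cat”, while “the* cat” keeps 
both “the*” and “cat”.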

> I think your best options might be to switch to FTS5

I haven’t looked into how hard it would be to switch to FTS5. I recall that 
when I started writing this code a few years ago, FTS5 had some issues or 
limitations that led me to use FTS4 instead.

Also, there are by now many databases out in the field that have FTS4 
tables/indexes in them. If I switch to FTS5, will those tables be upgraded 
automatically, or do I need to convert them manually?

>  or to write a tokenizer smart enough to remove the AND or other syntax 
> tokens when required.

I’m not sure what you mean by this. The “when required” part is the sticking 
point, and it is the reason I posted.

—Jens
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
