With FTS4 you could search for matching terms in an fts4aux table, then use those to construct a query against the original table. You'd have a full scan of the FTS index, but you wouldn't have to do a full table scan of the primary data. Unfortunately, if there were a large number of hits in the index scan, it would be cheaper to just do the full table scan and skip the index scan. I don't know if there's a similar thing for FTS5 at this time.
This wouldn't be as efficient as something more suited to substring matches (an N-gram index, maybe?), but I haven't heard anyone talking about writing a virtual table to do that.

-scott

On Fri, Jan 8, 2016 at 11:54 AM, Charles Leifer <coleifer at gmail.com> wrote:

> You can create a custom tokenizer as well, then use the standard search
> APIs. I imagine that functionality would work well in this case:
> https://sqlite.org/fts5.html#section_7
>
> On Thu, Jan 7, 2016 at 3:59 PM, Stadin, Benjamin <
> Benjamin.Stadin at heidelberg-mobil.com> wrote:
>
> > One such algorithm would be a (generalized) Ukkonen suffix tree (
> > https://en.m.wikipedia.org/wiki/Ukkonen%27s_algorithm).
> > It allows you to search efficiently for substrings.
> > It would be possible to do some match weighting based on match distance
> > within words. But a general solution for a database is probably not
> > trivial to implement.
> >
> > Ben
> >
> > Sent from my iPad
> >
> > > On 07.01.2016 at 21:46, Matthias-Christian Ott <ott at mirix.org> wrote:
> > >
> > >> On 2016-01-07 19:31, Mario M. Westphal wrote:
> > >> I hence wonder if this problem has been tackled already and if there
> > >> is a "standard" solution.
> > >
> > > If I understand you correctly, it seems that you are looking for a
> > > compound splitting or decompounding algorithm. Unfortunately there is
> > > not a "standard solution" for this. There are many languages in the
> > > world, and for some of them usable compound splitting algorithms
> > > exist. There are also attempts to create statistical universal
> > > algorithms.
> > >
> > > As you said, for English a simple sub-string search might suffice, but
> > > for other languages it is more complex. I assume that you speak German.
> > > If you have a document that contains the term "Verkehrsleitsystem" and
> > > your search query is "Verkehr leiten", it's reasonable to assume that
> > > the document is relevant to the search query. Unfortunately a
> > > sub-string search could not find the document. Other languages are
> > > even more difficult (a textbook on linguistics will explain this
> > > better than I can).
> > >
> > > Even if you have such an algorithm, it's not trivial to score the
> > > results, and there are more aspects to consider to create a simple
> > > search algorithm. For example, in English you will also have to do
> > > some analysis of the phrase structure to identify open compounds.
> > >
> > > Perhaps it helps to mention the languages you are interested in and
> > > the application you have in mind to evaluate whether SQLite FTS5 could
> > > meet your requirements.
> > > _______________________________________________
> > > sqlite-users mailing list
> > > sqlite-users at mailinglists.sqlite.org
> > > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users