On 2016-01-07 19:31, Mario M. Westphal wrote:
> I hence wonder if this problem has been tackled already and if there is a
> "standard" solution.
If I understand you correctly, you are looking for a compound splitting (decompounding) algorithm. Unfortunately, there is no "standard solution" for this. There are many languages in the world. Usable compound splitting algorithms exist for some of them, and there are also attempts to create universal statistical algorithms.

As you said, for English a simple sub-string search might suffice, but for other languages it is more complex. I assume that you speak German. If a document contains the term "Verkehrsleitsystem" and your search query is "Verkehr leiten", it is reasonable to assume that the document is relevant to the query. Unfortunately, a sub-string search would not find the document. "Verkehr" is a sub-string of the compound, but "leiten" is not, because the compound contains only the stem "leit". Other languages are even more difficult (a textbook on linguistics will explain this better than I can).

Even if you have such an algorithm, it is not trivial to score the results, and there are more aspects to consider when creating even a simple search algorithm. For example, in English you will also have to analyse the phrase structure to identify open compounds (compounds written as separate words, such as "ice cream").

Perhaps it helps to mention the languages you are interested in and the application you have in mind, so that we can evaluate whether SQLite FTS5 could meet your requirements.
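To illustrate the "Verkehrsleitsystem" example, here is a minimal sketch of dictionary-based decompounding. It is not part of SQLite FTS5, and it rests on several assumptions. The lexicon `LEXICON`, the linking elements in `LINKERS`, and the stemmer `naive_stem` are hypothetical toy stand-ins. A real decompounder needs a large lexicon (or corpus frequencies) and a proper stemmer, and has to choose between ambiguous splits.

```python
from __future__ import annotations

from functools import lru_cache

# Hypothetical toy lexicon of stems; a real system needs a large dictionary.
LEXICON = {"verkehr", "leit", "system"}
# Assumed German linking elements ("Fugenelemente") that may join parts.
LINKERS = ("", "s", "es")


def split_compound(word: str) -> list[str] | None:
    """Return one segmentation of `word` into lexicon stems, or None."""
    word = word.lower()

    @lru_cache(maxsize=None)
    def helper(start: int) -> tuple[str, ...] | None:
        if start == len(word):
            return ()
        # Try the longest candidate stem first.
        for end in range(len(word), start, -1):
            part = word[start:end]
            if part not in LEXICON:
                continue
            # Allow an optional linking element after the stem.
            for link in LINKERS:
                if word.startswith(link, end):
                    rest = helper(end + len(link))
                    if rest is not None:
                        return (part,) + rest
        return None

    result = helper(0)
    return list(result) if result is not None else None


def naive_stem(term: str) -> str:
    """Hypothetical, deliberately crude stemmer: strip one common ending."""
    term = term.lower()
    for suffix in ("en", "e", "s"):
        if term.endswith(suffix) and len(term) - len(suffix) >= 3:
            return term[: -len(suffix)]
    return term


def matches(query: str, compound: str) -> bool:
    """True if every stemmed query term is one of the compound's parts."""
    parts = split_compound(compound) or [compound.lower()]
    return all(naive_stem(t) in parts for t in query.split())


doc, query = "Verkehrsleitsystem", "Verkehr leiten"
print(split_compound(doc))  # ['verkehr', 'leit', 'system']
print(all(t.lower() in doc.lower() for t in query.split()))  # False
print(matches(query, doc))  # True
```

The plain sub-string check fails because "leiten" does not occur in the compound. Splitting the compound and stemming the query terms succeeds. Scoring the results, which I mentioned above, is a separate problem that this sketch does not address.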