On 09/14/2015 01:45 AM, Abilio Marques wrote:
> Hi,
>
> I've been on this mailing list for a month now, and I think I've heard
> FTS5 mentioned a couple of times. Back when I first saw it, I remember it
> being labeled with something close to beta or preliminary.
>
> Long story short, I've previously worked with a dedicated search engine
> called Sphinx Search. One of the things people love about it is its
> ability to be linked to Snowball (http://snowball.tartarus.org), which is a
> project created by Dr. Martin Porter. This code includes stemmers in
> several other languages (Spanish, French, Portuguese, Italian, German,
> Dutch, Swedish, Norwegian, Danish, Russian, Finnish and even an improved
> English version), which would be an upgrade over the present FTS5 condition:
>
> "The porter stemmer algorithm is designed for use with English language
> terms only - using it with other languages may or may not improve search
> utility."
>
> I'm thinking about a possible approach to get Snowball working with SQLite.
> I believe an extension is the way to go, as Snowball is published under the
> BSD license (and so I guess it cannot be mixed with public domain code).
>
> But I have no experience mixing BSD and public domain, so can anyone with
> more information shed some light on that matter?
>
> Second, and the most important question for me: can I consider FTS5
> stable enough to start working on the extension?

I think so. The custom tokenizer API changed just recently in order to
support synonyms:

  http://www.sqlite.org/src/info/0b7e4ab8abde3ae3

but I don't expect it to change again. The updated API is described here:

  http://sqlite.org/draft/fts5.html#section_7_1

For example code, see the built-in tokenizers:

  http://www.sqlite.org/src/artifact/f380f46f341af9c9

Dan.
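
For reference, the English-only limitation quoted above can be seen with the built-in porter tokenizer before writing any extension code. This is only a minimal sketch: it assumes Python's standard sqlite3 module is linked against an SQLite build with FTS5 enabled, and the table name, sample text, and the helper porter_matches are made up for illustration.

```python
import sqlite3


def porter_matches(docs, query):
    """Index docs in an FTS5 table that uses the built-in porter tokenizer
    and return the rowids matching query.

    Returns None when the linked SQLite library has no FTS5 support.
    """
    conn = sqlite3.connect(":memory:")
    try:
        # 'porter' wraps the default tokenizer and stems each token
        # with the English Porter algorithm.
        conn.execute(
            "CREATE VIRTUAL TABLE docs USING fts5(body, tokenize='porter')"
        )
    except sqlite3.OperationalError:
        # Typically "no such module: fts5".
        conn.close()
        return None
    conn.executemany(
        "INSERT INTO docs(body) VALUES (?)", [(d,) for d in docs]
    )
    rows = conn.execute(
        "SELECT rowid FROM docs WHERE docs MATCH ? ORDER BY rowid", (query,)
    ).fetchall()
    conn.close()
    return [r[0] for r in rows]


if __name__ == "__main__":
    sample = ["running shoes", "corriendo rapido"]
    # English: "running" and "run" should reduce to the same stem.
    print("run    ->", porter_matches(sample, "run"))
    # Spanish: the English rules are not designed for "corriendo"/"correr",
    # so do not expect this query to find the second row.
    print("correr ->", porter_matches(sample, "correr"))
```

A Snowball-based tokenizer registered through the custom tokenizer API would aim to make the second query behave like the first for the languages it covers.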