On 09/14/2015 01:45 AM, Abilio Marques wrote:
> Hi,
>
>
>
> I've been on this mailing list for a month now, and I think I've heard
> FTS5 mentioned a couple of times. When I first saw it, I remember it
> being labeled as something close to beta or preliminary.
>
>
>
> Long story short, I've previously worked with a dedicated search engine
> called Sphinx Search. One of the things people love about it is its
> ability to be linked to Snowball (http://snowball.tartarus.org), which is a
> project created by Dr. Martin Porter. This code includes stemmers in
> several other languages (Spanish, French, Portuguese, Italian, German,
> Dutch, Swedish, Norwegian, Danish, Russian, Finnish and even an improved
> English version), which would be an upgrade over the present FTS5 condition:
>
>
>
> "The porter stemmer algorithm is designed for use with English language
> terms only - using it with other languages may or may not improve search
> utility."
>
>
>
> I'm thinking about a possible approach to get Snowball working with SQLite.
> I believe an extension is the way to go, as Snowball is published under the
> BSD license (and so I guess it cannot be mixed with public domain code).
>
>
>
> But I have no experience mixing BSD and public domain code, so can anyone
> with more information shed some light on that matter?
>
>
>
> Second, and most important for me: can I consider FTS5 stable enough to
> start working on the extension?

I think so.

The custom tokenizer API changed just recently in order to support synonyms:

   http://www.sqlite.org/src/info/0b7e4ab8abde3ae3

but I don't expect it to change again. The updated API is described here:

   http://sqlite.org/draft/fts5.html#section_7_1

For example code, see the built-in tokenizers:

   http://www.sqlite.org/src/artifact/f380f46f341af9c9
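
To give a feel for the callback shape a Snowball wrapper would have to fit, here is a minimal, self-contained sketch in C. It does not include sqlite3.h and does not register anything with FTS5; it only mirrors the arguments of the xToken callback described in the draft documentation above. The names token_cb, sketch_tokenize, toy_stem, TokenList and collect_token are made up for illustration, and toy_stem is a placeholder where a real Snowball stemmer call would go, not a real stemmer.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define MAX_TOKEN 64   /* illustrative limit, not an FTS5 constant */
#define MAX_LIST  8    /* capacity of the demo collector below */

/* Same argument list as the xToken callback in the FTS5 tokenizer docs. */
typedef int (*token_cb)(void *pCtx, int tflags, const char *pToken,
                        int nToken, int iStart, int iEnd);

/* HYPOTHETICAL stand-in for a Snowball stemmer: drops one trailing 's'
   from tokens longer than three bytes. Returns the new length. */
static int toy_stem(const char *z, int n) {
    if (n > 3 && z[n - 1] == 's') n--;
    return n;
}

/* Split ASCII text on non-alphanumeric bytes, fold to lower case, "stem",
   and report each token with its byte offsets in the original text.
   Tokens longer than MAX_TOKEN bytes are truncated (assumption of this
   sketch only). Returns 0 on success or the callback's non-zero code. */
static int sketch_tokenize(void *pCtx, const char *pText, int nText,
                           token_cb xToken) {
    int i = 0;
    assert(pText != NULL || nText == 0);
    while (i < nText) {
        char buf[MAX_TOKEN];
        int n = 0, iStart, rc;

        /* Skip separator bytes. */
        while (i < nText && !isalnum((unsigned char)pText[i])) i++;
        if (i >= nText) break;

        /* Copy one token, folded to lower case. */
        iStart = i;
        while (i < nText && isalnum((unsigned char)pText[i])) {
            if (n < MAX_TOKEN) buf[n++] = (char)tolower((unsigned char)pText[i]);
            i++;
        }

        n = toy_stem(buf, n);
        /* The token is passed by pointer and length, not NUL-terminated. */
        rc = xToken(pCtx, 0, buf, n, iStart, i);
        if (rc != 0) return rc;
    }
    return 0;
}

/* Demo consumer: copies each reported token into a small fixed list. */
typedef struct TokenList {
    int n;
    char tok[MAX_LIST][MAX_TOKEN + 1];
} TokenList;

static int collect_token(void *pCtx, int tflags, const char *pToken,
                         int nToken, int iStart, int iEnd) {
    TokenList *p = (TokenList *)pCtx;
    (void)tflags; (void)iStart; (void)iEnd;
    if (p->n >= MAX_LIST) return 1;       /* non-zero stops tokenization */
    memcpy(p->tok[p->n], pToken, (size_t)nToken);
    p->tok[p->n][nToken] = '\0';
    p->n++;
    return 0;
}
```

Calling sketch_tokenize(&list, text, (int)strlen(text), collect_token) with a zero-initialized TokenList fills it with the folded, stemmed tokens. In a real extension the splitting loop would presumably live inside the xTokenize method of an fts5_tokenizer, with the token callback supplied by FTS5 rather than by a collector like this one.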

Dan.


