Alexey Pechnikov wrote:
> Hello!
>
> In a message of Saturday 26 July 2008 21:37:19, Stephen Woodbridge wrote:
>> I have thought a lot about these issues and would appreciate any
>> thoughts or ideas on how to implement any of these concepts or others
>> for fuzzy searching and matching.
>
> I know that ispell, myspell, hunspell and trigrams are used in PostgreSQL
> FTS, and a lot of languages are supported this way. The soundex function is
> also useful for morphology search if the word is written in the Latin
> alphabet (transliterated by replacing each symbol of the national alphabet
> with one or more Latin letters):
>
> sqlite> select soundex('Moskva');
> M210
> sqlite> select soundex('Moscva');
> M210
> sqlite> select soundex('Mouscva');
> M210
> sqlite> select soundex('Mouskva');
> M210
> sqlite> select soundex('moskva');
> M210
>
> Note: compile SQLite with -DSQLITE_SOUNDEX=1
>
> There is stemming in Apache Lucene, Sphinx (which includes soundex-based
> morphology) and Xapian too.
>
> Are these features planned for SQLite FTS?
Well, I will leave the question of plans to Scott Hess, the FTS developer,
to answer.
I just read a bunch of the FTS overview documents for PostgreSQL, which
I use a lot for other projects, and I like the way they have things
broken down and integrated with the database. I haven't tried 8.3 yet,
but it is nice to see that FTS is now part of the main distribution.
http://www.sai.msu.su/~megera/postgres/fts/doc/fts-history.html
http://www.sai.msu.su/~megera/postgres/fts/doc/fts-basic.html
http://www.sai.msu.su/~megera/postgres/fts/doc/fts-dict.html
I think you can add dictionaries as stemmers the same way you would add
a stemmer to SQLite. Look at the code in the SQLite source tree:
ext/fts3/fts3_porter.c
ext/fts3/fts3_tokenizer.[ch]
ext/fts3/fts3_tokenizer1.c
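On the user-facing side, picking a stemmer is just a matter of naming the
tokenizer when the FTS table is created. Here is a minimal sketch of that,
assuming an SQLite build with FTS3 and its porter tokenizer compiled in;
the "docs" table is made up for the example, and Python's sqlite3 module is
just convenient glue:

import sqlite3

conn = sqlite3.connect(":memory:")

# FTS3 table whose text is run through the built-in porter stemmer instead
# of the default "simple" tokenizer. A custom dictionary/stemmer written
# against the interface in ext/fts3/fts3_tokenizer.h would be selected by
# name here in the same way once it is registered.
conn.execute("CREATE VIRTUAL TABLE docs USING fts3(body, tokenize=porter)")
conn.execute("INSERT INTO docs(body) VALUES ('fuzzy searching and matching')")

# 'search' matches 'searching' because both stem to the same term.
for (body,) in conn.execute("SELECT body FROM docs WHERE body MATCH 'search'"):
    print(body)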
As for other lexemes, there is nothing stopping you from creating your
FTS table with additional lexeme columns that you populate with the
appropriate lexemes derived from the full-text column. Of course, you have
to generate the lexemes yourself and supply them as the text for that column.
For example, if you wanted a soundex column, you could run your document
through the simple or porter tokenizer, generate the soundex key for each
token, concatenate the keys with a separating space, and use the result as
the contents of the soundex lexeme column. To query, you would tokenize the
incoming words, generate their soundex keys, and do an FTS search on that
column. It would obviously be nicer if this were built into the existing
FTS engine, but you could do it today with some additional programming.
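To make the idea concrete, here is a rough, untested sketch. The table and
column names are made up, a regex stands in for the simple tokenizer, and
the soundex() here is a simplified stand-in for SQLite's built-in one (the
-DSQLITE_SOUNDEX function mentioned above):

import re
import sqlite3

def soundex(word):
    # Simplified soundex: keep the first letter, map consonants to digits,
    # drop adjacent duplicates, pad to 4 characters.
    codes = {"b": "1", "f": "1", "p": "1", "v": "1",
             "c": "2", "g": "2", "j": "2", "k": "2",
             "q": "2", "s": "2", "x": "2", "z": "2",
             "d": "3", "t": "3", "l": "4",
             "m": "5", "n": "5", "r": "6"}
    word = word.lower()
    out = word[0].upper()
    prev = codes.get(word[0], "")
    for ch in word[1:]:
        code = codes.get(ch, "")
        if code and code != prev:
            out += code
        prev = code
    return (out + "000")[:4]

def to_soundex_text(text):
    # Tokenize roughly the way the "simple" tokenizer would, then replace
    # each token with its soundex key, space separated.
    return " ".join(soundex(tok) for tok in re.findall(r"[A-Za-z]+", text))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE docs USING fts3(body, body_soundex)")

body = "Welcome to Moskva"
conn.execute("INSERT INTO docs(body, body_soundex) VALUES (?, ?)",
             (body, to_soundex_text(body)))

# Query: soundex-encode the incoming words and search the soundex column.
query = to_soundex_text("Mouscva")   # -> 'M210'
for (hit,) in conn.execute(
        "SELECT body FROM docs WHERE body_soundex MATCH ?", (query,)):
    print(hit)

A real implementation would reuse SQLite's own soundex() and the actual FTS
tokenizer so that documents and queries go through exactly the same code path.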
As I said before, I will leave questions of planning for FTS up to
Scott. I have read through his fts3 code, and I confess I do not
understand how it all works, but for a relatively small amount of code it
works impressively well.
All the best,
-Steve