Hello all,
I would like to know how diacritics are handled in FTS, specifically whether I
can index text with diacritics and search for terms without them.

For example, given the statements

CREATE VIRTUAL TABLE fts_pages USING fts4(tokenize=snowball ro_RO);
INSERT INTO fts_pages (docid, content) VALUES (1, 'România este o ţară frumoasă');

the search
SELECT COUNT(1) FROM fts_pages WHERE content MATCH 'este'
returns 1,

but the next search
SELECT COUNT(1) FROM fts_pages WHERE content MATCH 'Romania'
returns 0.
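
To make the behaviour I am after concrete, here is a minimal sketch in Python of the kind of folding I have in mind. It is only an illustration, not something FTS or the snowball tokenizer does: the helper name fold_diacritics is hypothetical, and it assumes that decomposing the text to Unicode NFD and dropping the combining marks is an acceptable way to remove diacritics for Romanian.

```python
import unicodedata


def fold_diacritics(text: str) -> str:
    """Hypothetical helper: remove diacritics from text.

    Decomposes each character into base letter + combining marks (NFD),
    then drops every combining mark.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Fold both the indexed text and the search term the same way,
# so that a term typed without diacritics can match.
indexed = fold_diacritics("România este o ţară frumoasă")
query = fold_diacritics("Romania")

# Plain whitespace split, standing in for a real tokenizer.
matches = query in indexed.split()
```

If the same folding were applied both when the text is indexed and when the search term is parsed, I would expect the search for 'Romania' to find the row.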

The tokenizer I'm using is based on snowball and can be found at
https://bitbucket.org/sevkin/snowball_fts3

Thank you,
George.

PS: Other FTS engines (e.g. DTSearch/Sphinx) handle this: you can index
text with diacritics and search with or without them.
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users