Hello Dan,

Yes, I thought of that. But wouldn't this break the snippet() function? If the tokenizer returns text without diacritics, wouldn't snippet() return the text without diacritics as well?

Thanks,
George.

2012/2/8 Dan Kennedy <danielk1...@gmail.com>

> On 02/08/2012 11:34 PM, George Ionescu wrote:
>
>> Hello all,
>> I would like to know how are diacritics handled in FTS, specifically if I
>> can index text with diacritics and search for terms without them.
>>
>> For example, given the queries
>>
>> CREATE VIRTUAL TABLE fts_pages USING fts4(tokenize=snowball ro_RO);
>> INSERT INTO fts_pages (docid,content) VALUES (1, 'România este o ţară
>> frumoasă');
>>
>> the search
>> SELECT COUNT(1) FROM fts_pages WHERE content MATCH 'este'
>> returns 1,
>>
>> but the next search
>> SELECT COUNT(1) FROM fts_pages WHERE content MATCH 'Romania'
>> returns 0.
>>
>> The tokenizer I'm using is based on snowball and can be found at
>> https://bitbucket.org/sevkin/snowball_fts3
>>
>
> The custom tokenizer needs to normalize the tokens. So when it
> parses "România" it should return "romania" (with no diacritic)
> to FTS. Then when you query for "romania", it will match.
>
> Note that the custom tokenizer is also used to tokenize queries
> as well as documents. So if I query for "România", the tokenizer
> will normalize the query term to "romania" as well - which will
> match the normalized entry in the index.

_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
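
To make the normalization Dan describes concrete, and to show why it need not affect snippet(), here is a minimal Python sketch. It is not the snowball tokenizer from the thread and it does no stemming; the names normalize, tokenize and highlight are hypothetical, and the text is assumed to use precomposed (NFC) characters. The idea it illustrates, as far as I understand the FTS tokenizer interface, is that a tokenizer hands back a normalized token together with the start and end offsets of that token in the original input, so the matched span can be cut from the stored original text rather than from the normalized token. (FTS uses byte offsets; this sketch uses character offsets for simplicity.)

```python
import re
import unicodedata


def normalize(token: str) -> str:
    """Strip diacritics and fold case, e.g. 'România' becomes 'romania'."""
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", token)
    # Drop the combining marks, keep the base letters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def tokenize(text: str):
    """Yield (normalized_token, start, end); offsets index the ORIGINAL text."""
    for match in re.finditer(r"\w+", text):
        yield normalize(match.group()), match.start(), match.end()


def highlight(text: str, query: str) -> str:
    """Hypothetical stand-in for snippet(): mark hits in the original text."""
    # The query goes through the same tokenizer as the document.
    wanted = {tok for tok, _, _ in tokenize(query)}
    pieces, pos = [], 0
    for tok, start, end in tokenize(text):
        if tok in wanted:
            pieces.append(text[pos:start])
            # Slice the original text, so the diacritics are preserved.
            pieces.append("<b>" + text[start:end] + "</b>")
            pos = end
    pieces.append(text[pos:])
    return "".join(pieces)
```

With this sketch, highlight() applied to the sample sentence and the query 'Romania' marks the word "România" with its diacritic intact, and the query 'România' marks the same word, because both query forms normalize to the same token.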