Hello Dan,

Yes, I thought of that. But wouldn't this break the snippet function?
If the tokenizer returns text without diacritics, wouldn't the snippet
come back without them as well?

Thanks,
George.

2012/2/8 Dan Kennedy <danielk1...@gmail.com>

> On 02/08/2012 11:34 PM, George Ionescu wrote:
>
>> Hello all,
>> I would like to know how diacritics are handled in FTS, specifically whether I
>> can index text with diacritics and search for terms without them.
>>
>> For example, given the queries
>>
>> CREATE VIRTUAL TABLE fts_pages USING fts4(tokenize=snowball ro_RO);
>> INSERT INTO fts_pages (docid,content) VALUES (1, 'România este o ţară frumoasă');
>>
>> the search
>> SELECT COUNT(1) FROM fts_pages WHERE content MATCH 'este'
>> returns 1,
>>
>> but the next search
>> SELECT COUNT(1) FROM fts_pages WHERE content MATCH 'Romania'
>> returns 0.
>>
>> The tokenizer I'm using is based on snowball and can be found at
>> https://bitbucket.org/sevkin/snowball_fts3
>>
>
> The custom tokenizer needs to normalize the tokens. So when it
> parses "România" it should return "romania" (with no diacritic)
> to FTS. Then when you query for "romania", it will match.
>
> Note that the custom tokenizer is used to tokenize queries
> as well as documents. So if I query for "România", the tokenizer
> will normalize the query term to "romania" as well - which will
> match the normalized entry in the index.
>
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
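
To make the normalization idea concrete, here is a minimal sketch in
Python. It is not the snowball_fts3 code and not the FTS3 tokenizer API;
the names normalize, tokenize, build_index and search are hypothetical,
and the offsets here are character offsets (an FTS3 tokenizer reports byte
offsets instead). The tokenizer emits a diacritic-free, lowercased token
together with the position of the original word, so the index matches on
the normalized form while any displayed text can still be cut from the
original document.

```python
import re
import unicodedata


def normalize(token):
    """Lowercase a token and drop combining marks (diacritics)."""
    decomposed = unicodedata.normalize("NFD", token)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()


def tokenize(text):
    """Yield (normalized_token, start, end) for each word in text.

    start/end are character offsets into the ORIGINAL text.
    """
    for match in re.finditer(r"\w+", text):
        yield normalize(match.group()), match.start(), match.end()


def build_index(text):
    """Map each normalized term to the offsets where it occurs."""
    index = {}
    for term, start, end in tokenize(text):
        index.setdefault(term, []).append((start, end))
    return index


def search(index, text, query):
    """Return the original-text spans matching each query term.

    The query goes through the same tokenizer as the document.
    """
    hits = []
    for term, _, _ in tokenize(query):
        hits.extend(text[start:end] for start, end in index.get(term, []))
    return hits


doc = "România este o ţară frumoasă"
index = build_index(doc)

# Both spellings normalize to the same term and find the same span,
# and the span returned is taken from the original text.
print(search(index, doc, "Romania"))
print(search(index, doc, "România"))
```

Because the query is run through the same normalization as the document,
both spellings hit the same index entry, and the text returned comes from
the stored original rather than from the normalized token.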