Re: [sqlite] Disambiguation of Latin accent characters for FTS3

Dami Laurent (PJ) Thu, 30 Sep 2010 09:30:16 -0700

Hi Travis,

FTS3 does not consult collating sequences, so you need to define a "tokenizer"
for FTS3 to use; the mechanism is somewhat similar to user-defined collating
sequences.
See http://www.sqlite.org/fts3.html#section_5_1


The ICU library has language-specific functions for ignoring accents
while tokenizing.
 
The Perl binding for SQLite has a general-purpose "unaccent" tokenizer, see
http://search.cpan.org/dist/DBD-SQLite/lib/DBD/SQLite.pm#FULLTEXT_SEARCH 
and
http://search.cpan.org/~dami/Search-Tokenizer-1.00/lib/Search/Tokenizer.pm 

Or you can write your own tokenizer in C ...
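
Whichever route you take, the core of an "unaccent" tokenizer is folding each
token to an unaccented form before it is indexed. Below is a minimal sketch of
that folding step in Python. It is not the FTS3 tokenizer interface itself, and
the names unaccent and tokenize are hypothetical helpers invented for this
illustration; the word-splitting and lowercasing rules are assumptions, not
what any of the tokenizers mentioned above necessarily do.

```python
import unicodedata


def unaccent(text):
    """Fold accented Latin letters to their base letters (e.g. É -> E)."""
    # NFD splits a precomposed character into base letter + combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks (Unicode category Mn); keep everything else.
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def tokenize(text):
    """Hypothetical tokenizer: unaccent, split on non-alphanumerics, lowercase."""
    tokens = []
    current = []
    for ch in unaccent(text):
        if ch.isalnum():
            current.append(ch.lower())
        elif current:
            # A separator ends the token collected so far.
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    return tokens


if __name__ == "__main__":
    print(unaccent("É") == "E")
    print(tokenize("Été à Genève"))
```

FTS3 runs the registered tokenizer over both the indexed text and the query
terms, so once E and É fold to the same token they match each other. Note that
this decomposition approach only handles letters that Unicode decomposes into a
base letter plus combining marks; characters such as ø or ß are left unchanged
and would need an explicit mapping.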

Best regards,
Laurent Dami


>-----Original Message-----
>From: [email protected] [mailto:sqlite-users-
>[email protected]] On Behalf Of Travis Orr
>Sent: Thursday, 30 September 2010 17:36
>To: [email protected]
>Subject: [sqlite] Disambiguation of Latin accent characters for FTS3
>
>I know it is possible, but I can't figure out what needs to be done to
>make FTS3 treat E as equal to É, and likewise for other similar cases.
>
>
>
>I have a custom collation sequence that does this disambiguation for
>sorting query results, but it doesn't appear to be functioning when
>performing FTS3 queries.
>
>
>
>Thanks
>
>
>
>_______________________________________________
>sqlite-users mailing list
>[email protected]
>http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users