Hello!

I have been looking through the documentation and it doesn't seem to mention any option for specifying a minimum number of characters for a token to be indexed. Looking at some fts5 tables, an option to require at least 2 or 3 characters per token would go a long way as a substitute for stopwords. Another interesting option would be a regex-based blacklist/whitelist of character sequences to be indexed.

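Something like this (min_word_size, word_black_list and word_white_list are made-up option names, they do not exist today):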

    create virtual table if not exists pdfs_fts using fts5(pdf_name UNINDEXED, data,
        tokenize = 'unicode61 remove_diacritics 1 min_word_size 3 word_black_list [\d\.\d\d\w \a\d\d\d] word_white_list [\(\d+\) \d\d\.\d\d\d\.\d\d\a]');

The idea is to allow/disallow some specific domain sequences to be included/excluded from indexing.
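
To make the intended semantics concrete, here is a rough Python sketch of such a filter, applied to text before it is inserted. Everything in it is an assumption for illustration: the function name filter_tokens, splitting on whitespace (unicode61 splits differently), reading \a as "a lowercase letter", and the rule that the whitelist wins over both the blacklist and the minimum size. None of this is an existing FTS5 feature.

```python
import re

# Hypothetical settings mirroring the proposed (non-existent) tokenizer options.
MIN_WORD_SIZE = 3

# Assumed patterns, loosely based on the ones in the example above;
# "\a" is interpreted here as a single lowercase ASCII letter.
WORD_BLACK_LIST = [re.compile(r"\d\.\d\d\w"), re.compile(r"[a-z]\d\d\d")]
WORD_WHITE_LIST = [re.compile(r"\(\d+\)"), re.compile(r"\d\d\.\d\d\d\.\d\d[a-z]")]


def filter_tokens(text):
    """Return the whitespace-separated tokens that would be kept for indexing."""
    kept = []
    for token in text.split():
        # Whitelisted sequences are always kept, even if short.
        if any(p.fullmatch(token) for p in WORD_WHITE_LIST):
            kept.append(token)
            continue
        # Blacklisted sequences are always dropped.
        if any(p.fullmatch(token) for p in WORD_BLACK_LIST):
            continue
        # Everything else must reach the minimum length.
        if len(token) < MIN_WORD_SIZE:
            continue
        kept.append(token)
    return kept


print(filter_tokens("an (12) invoice 1.23x a123 12.345.67b of pdf"))
```

Joining the kept tokens and inserting them into the data column only approximates the idea, since unicode61 would still split the inserted text on punctuation; doing it properly would presumably need the filtering inside the tokenizer itself.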

Any idea how to achieve that?

Cheers!

_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
