Hi,

It seems that the built-in tokenizers (or at least the unicode61 one) have no
lower limit on the number of characters in a token. For instance, searching for
records containing `t` returns the ones with sentences containing "don't".
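
A minimal repro of the behavior described above, assuming an SQLite build with FTS5 compiled in (table and column names here are made up for illustration). With the default unicode61 tokenizer, the apostrophe is a separator, so "don't" is indexed as the two tokens "don" and "t":

```python
import sqlite3

con = sqlite3.connect(":memory:")
# unicode61 treats the apostrophe as a separator by default
con.execute("CREATE VIRTUAL TABLE docs USING fts5(body, tokenize='unicode61')")
con.execute("INSERT INTO docs(body) VALUES ('I don''t know')")
# The single character 't' matches, because "don't" produced a 't' token
rows = con.execute("SELECT body FROM docs WHERE docs MATCH 't'").fetchall()
print(rows)
```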

Does this mean FTS is indexing all the "I" and "a" occurrences in English
sentences, as well as all single-digit occurrences, or is there some
higher-level exclusion heuristic?
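
One way to check this directly (a sketch, again assuming FTS5 and illustrative names) is the fts5vocab virtual table, which lists the terms actually stored in the index:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE VIRTUAL TABLE docs USING fts5(body, tokenize='unicode61')")
con.execute("INSERT INTO docs(body) VALUES ('I don''t know')")
# fts5vocab in 'row' mode exposes one row per distinct indexed term
con.execute("CREATE VIRTUAL TABLE docs_v USING fts5vocab('docs', 'row')")
terms = [row[0] for row in con.execute("SELECT term FROM docs_v")]
print(terms)
```

If single-character terms like "i" and "t" show up in the output, the tokenizer applied no minimum length.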

Is there any way to configure the tokenizer to ignore tokens shorter than 2
characters?


-Pol

________________________________
Pol-Online
info at pol-online.net
