Hi,

It seems that the built-in tokenizers (or at least the unicode61 one) have no
lower limit on the number of characters in a token. For instance, searching for
records containing `t` returns the ones with sentences containing "don't".
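
A minimal repro of the behavior described above, assuming an SQLite build with FTS5 compiled in (table and column names here are made up for illustration). With the default unicode61 tokenizer, the apostrophe is a separator, so "don't" is indexed as the two tokens "don" and "t":

```python
import sqlite3

con = sqlite3.connect(":memory:")
# unicode61 treats the apostrophe as a separator by default
con.execute("CREATE VIRTUAL TABLE docs USING fts5(body, tokenize='unicode61')")
con.execute("INSERT INTO docs(body) VALUES ('I don''t know')")
# The single character 't' matches, because "don't" produced a 't' token
rows = con.execute("SELECT body FROM docs WHERE docs MATCH 't'").fetchall()
print(rows)
```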

Does this mean FTS is indexing all the "I" and "a" occurrences in English
sentences, as well as all single-digit occurrences, or is there some
higher-level exclusion heuristic?
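
One way to check this directly (a sketch, again assuming FTS5 and illustrative names) is the fts5vocab virtual table, which lists the terms actually stored in the index:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE VIRTUAL TABLE docs USING fts5(body, tokenize='unicode61')")
con.execute("INSERT INTO docs(body) VALUES ('I don''t know')")
# fts5vocab in 'row' mode exposes one row per distinct indexed term
con.execute("CREATE VIRTUAL TABLE docs_v USING fts5vocab('docs', 'row')")
terms = [row[0] for row in con.execute("SELECT term FROM docs_v")]
print(terms)
```

If single-character terms like "i" and "t" show up in the output, the tokenizer applied no minimum length.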

Is there any way to configure the tokenizer to ignore tokens shorter than 2
characters?


-Pol

________________________________
Pol-Online
info at pol-online.net
