On 10/2/2014 4:33 AM, waynemailinglist wrote:
> Something that is still not clear in my mind is how this tokenising works.
> For example with the filters I have when I run the analyser I get:
> Field: Hello You
> 
> Hello|You
> Hello|You
> Hello|You
> hello|you
> hello|you
> 
> 
> Does this mean that the index is stored as 'hello|you' (the final one) and
> that when I run a query and it goes through the filters whatever the end
> result of that is must match the 'hello|you' in order to return a result?

If that is the whole input, the index has two terms for this field,
hello and you, and each one can be searched for individually.  A query
does not have to match the whole 'hello|you' line; a query for just
hello will match.  The tokenizer does the initial job of separating the
input into tokens (terms).  The filters then work on those tokens, and
some filters can create additional terms, depending on exactly what's
left when the tokenizer is done.
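To make the tokenizer/filter split concrete, here is a toy model in Python. It is only a sketch, not the Lucene/Solr API. The names whitespace_tokenizer, hyphen_split_filter, lowercase_filter, analyze, build_index and search are hypothetical. hyphen_split_filter is an invented example of a filter that adds extra terms. Treating a multi-term query as "match any term" (OR) is also an assumption; the real behaviour depends on the query parser and default operator.

```python
# Toy model of an analysis chain (hypothetical names, NOT the Lucene/Solr API).
from typing import Callable, Dict, Iterable, List, Set

Filter = Callable[[List[str]], List[str]]

def whitespace_tokenizer(text: str) -> List[str]:
    """Split the raw field value into tokens on whitespace."""
    return text.split()

def hyphen_split_filter(tokens: List[str]) -> List[str]:
    """Hypothetical filter that keeps each token and adds its hyphen-separated parts."""
    out: List[str] = []
    for t in tokens:
        out.append(t)
        if "-" in t:
            out.extend(p for p in t.split("-") if p)
    return out

def lowercase_filter(tokens: List[str]) -> List[str]:
    """Lowercase each token: one token in, one token out."""
    return [t.lower() for t in tokens]

def analyze(text: str, filters: Iterable[Filter]) -> List[str]:
    """Tokenizer first, then each filter in order; the final output is what gets indexed."""
    tokens = whitespace_tokenizer(text)
    for f in filters:
        tokens = f(tokens)
    return tokens

def build_index(docs: Dict[str, str], filters: List[Filter]) -> Dict[str, Set[str]]:
    """Inverted index mapping each individual term to the ids of docs containing it."""
    index: Dict[str, Set[str]] = {}
    for doc_id, text in docs.items():
        for term in analyze(text, filters):
            index.setdefault(term, set()).add(doc_id)
    return index

def search(index: Dict[str, Set[str]], query: str, filters: List[Filter]) -> Set[str]:
    """Analyze the query with the same chain, then match each term on its own (assumed OR)."""
    hits: Set[str] = set()
    for term in analyze(query, filters):
        hits |= index.get(term, set())
    return hits

chain = [hyphen_split_filter, lowercase_filter]
index = build_index({"doc1": "Hello You", "doc2": "E-mail You"}, chain)
print(sorted(index))                  # ['e', 'e-mail', 'hello', 'mail', 'you']
print(search(index, "HELLO", chain))  # {'doc1'}
```

"Hello You" is stored as the two separate terms hello and you, so a query for just HELLO still finds doc1 once it has been lowercased by the same chain. The invented hyphen filter shows how one token ("E-mail") can turn into several indexed terms.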

Thanks,
Shawn
