Right, prior to 3.6 the standard way to handle wildcards was to,
essentially, pre-analyze the terms that had wildcards. This works
fine for simple filters, lowercasing for instance, but it doesn't
work so well for things like stemming.
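A toy sketch of why that is, assuming a made-up pre-analysis helper and a deliberately naive stemmer (none of this is Solr's actual API): lowercasing is a character-by-character transform, so applying it to the literal part of a wildcard term is safe, but a stemmer expects whole words, so stemming the fragment in front of the `*` produces a prefix that no longer lines up with the stems in the index.

```python
# Illustrative only -- not Solr/Lucene code. Shows why pre-analyzing
# wildcard terms is fine for lowercasing but breaks for stemming.

def lowercase_filter(term: str) -> str:
    return term.lower()

def naive_stemmer(term: str) -> str:
    # Toy stemmer: strip "ing", then un-double the final consonant,
    # so "running" -> "run". Real stemmers are smarter, but the
    # failure mode below is the same.
    if term.endswith("ing"):
        term = term[:-3]
        if len(term) > 1 and term[-1] == term[-2]:
            term = term[:-1]
    return term

def preanalyze_wildcard(term: str, filt) -> str:
    # Apply the filter only to the literal prefix, keeping the "*".
    prefix, star, rest = term.partition("*")
    return filt(prefix) + star + rest

# Lowercasing preserves the prefix's shape, so the wildcard still
# lines up with the (lowercased) indexed terms:
print(preanalyze_wildcard("Runni*", lowercase_filter))  # runni*

# At index time, "running" would be stored under its stem:
print(naive_stemmer("running"))  # run

# But stemming the fragment "runni" does nothing useful, so the
# query "runni*" can never match the indexed stem "run":
print(preanalyze_wildcard("runni*", naive_stemmer))  # runni*
```

The mismatch is the whole problem: "run" does not start with "runni", so the pre-analyzed wildcard silently finds nothing.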

So you're doing what can be done at this point, but moving to 4.x (or
even 3.6) would solve it better.

Best,
Erick

On Thu, Oct 2, 2014 at 6:29 AM, Shawn Heisey <apa...@elyograg.org> wrote:
> On 10/2/2014 4:33 AM, waynemailinglist wrote:
>> Something that is still not clear in my mind is how this tokenising works.
>> For example, with the filters I have, when I run the analyser I get:
>> Field: Hello You
>>
>> Hello|You
>> Hello|You
>> Hello|You
>> hello|you
>> hello|you
>>
>>
>> Does this mean that the index is stored as 'hello|you' (the final one) and
>> that when I run a query and it goes through the filters whatever the end
>> result of that is must match the 'hello|you' in order to return a result?
>
> The index has two terms for this field if this is the whole input --
> hello and you -- which can be searched for individually.  The tokenizer
> does the initial job of separating the input into tokens (terms) ...
> some filters can create additional terms, depending on exactly what's
> left when the tokenizer is done.
>
> Thanks,
> Shawn
>
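What Shawn describes can be sketched with a toy analysis chain -- a whitespace tokenizer followed by a lowercase filter, mirroring the rows the analysis screen shows (these are plain illustrative functions, not Solr's real classes):

```python
# Toy analysis chain: tokenizer splits the input into tokens, then
# each filter transforms the token stream in turn.

def whitespace_tokenizer(text):
    # Initial job: break the input into tokens (terms).
    return text.split()

def lowercase_filter(tokens):
    return [t.lower() for t in tokens]

tokens = whitespace_tokenizer("Hello You")  # ['Hello', 'You']
terms = lowercase_filter(tokens)            # ['hello', 'you']
print(terms)
```

The point being that the index ends up with two separate terms, "hello" and "you", each individually searchable; the "hello|you" in the analysis screen is just its display separator, not how the terms are stored.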
