You may want to use UAX29URLEmailTokenizerFactory tokenizer into your
analysis chain.

Thanks,
Susheel


On Thu, Sep 14, 2017 at 8:46 AM, Shawn Heisey <apa...@elyograg.org> wrote:

> On 9/14/2017 5:06 AM, Mannott, Birgit wrote:
> > I have a problem when searching on email addresses.
> > @ seems to be handled as a special character but I don't find anything
> about it in the documentation.
> >
> > This is my test data
> > t...@one.com
> > t...@two.com
>
> Chances are that have analysis defined on this field, and that the
> analysis includes a tokenizer or tokenizer/filter combination that
> splits on punctuation.  This means that for the both entries, you have
> three terms.  For the first one, those terms are test, one, and com.
> For the second one, they are test,  two, and com.  The rest of what I'm
> writing assumes that this is the case.
>
> > searching for test* results both, ok.
>
> This matches the term "test" in both entries.
>
> > searching for t...@one.com results the correct one, ok.
>
> Query analysis probably splits the same way index analysis does, so the
> actual search is for all three terms.
>
> > searching for test results both, what I didn't expect but it's ok.
>
> In this case, it matches the simple term "test" that's in the index on
> both documents.
>
> > searching for test@one* results none and that's the problem.
>
> When you include wildcards in a query, most query analysis is skipped,
> so it's looking for the literal text "test@one" followed by any
> characters.  Because the index analysis removed the @ character and
> split the things around it into separate terms, this will not match any
> of the terms in the index.
>
> Wildcards, while they do work in many cases, are often not the correct
> way to do queries.
>
> Thanks,
> Shawn
>
>

Reply via email to