Hi Alexandre,

CombiningFilter sounds close, although it has no option to put spaces between the original terms, and it hasn't been committed yet: <https://issues.apache.org/jira/browse/LUCENE-3413>.
Steve

On Jan 8, 2013, at 4:55 PM, Alexandre Rafalovitch <arafa...@gmail.com> wrote:

> Hello,
>
> I want to take a composite email address such as "John Doe
> <john...@example.com>" and keep "John Doe" as a facet field.
>
> So far, I have the UAX29.... Tokenizer combined with TypeTokenFilterFactory
> to filter out the email type.
>
> But that leaves me with "John" and "Doe" as separate tokens, and I cannot
> figure out how to combine them, with a space in between, back into "John Doe".
>
> I thought about using a regexp instead to just strip <....>, but that feels
> even less robust.
>
> Do we have anything ready to use for that, or do I need to write custom code?
>
> Regards,
> Alex.
>
> Personal blog: http://blog.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all at
> once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)
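
For reference, the regexp fallback mentioned in the quoted message could look roughly like the sketch below. It is not Solr code and does not come from this thread. It is a standalone Python illustration of stripping a trailing <...> part from the raw value before indexing, so that the whole display name survives as one facet value. The helper name `display_name`, the pattern, and the example address are assumptions made up for illustration. Inside Solr, a similar pattern could probably be applied with a pattern-replace char filter in front of a keyword tokenizer, but that is untested here.

```python
import re

# Assumed pattern: a trailing "<...>" part plus any whitespace around it.
_ANGLE_ADDR = re.compile(r"\s*<[^>]*>\s*$")


def display_name(raw: str) -> str:
    """Return the text before a trailing <address>, without outer quotes or spaces.

    Hypothetical helper for illustration only; it does not validate the address.
    """
    name = _ANGLE_ADDR.sub("", raw)
    # Drop surrounding whitespace and optional double quotes around the name.
    return name.strip().strip('"').strip()


if __name__ == "__main__":
    # Made-up address, used only to exercise the helper.
    print(display_name("John Doe <jdoe@example.org>"))
```

A value with no <...> part (a bare address, for example) comes back unchanged, so such values would need separate handling if they should not appear as facets.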