Re: Good example of multiple tokenizers for a single field

Shawn Heisey Mon, 29 Nov 2010 23:07:52 -0800

On 11/29/2010 3:15 PM, Jacob Elder wrote:

I am looking for a clear example of using more than one tokenizer for a
single source field. My application has a single "body" field which until
recently was all Latin characters, but we're now encountering both English
and Japanese words in a single message. Obviously, we need to be using CJK
in addition to WhitespaceTokenizerFactory.

What I'd like to see is a CJK filter that runs after tokenization (whitespace in my case) and doesn't do anything but handle the CJK characters. If there are no CJK characters in the token, it should do nothing at all. The CJK tokenizer does a whole host of other things that I want to handle myself.


Shawn
