Re: Unicode processing - Issue with CharStreamAwareWhitespaceTokenizerFactory

Koji Sekiguchi Mon, 05 Jul 2010 07:31:03 -0700

In the same wiki, they say that CharStreamAwareWhitespaceTokenizerFactory must 
be used with MappingCharFilterFactory. But when I use this tokenizer and 
filter together, I get a severe error saying that the field type containing 
them is unknown. However, it works when I use this filter with 
StandardTokenizerFactory or WhitespaceTokenizerFactory!

The wiki is out of date. Before Lucene 2.9 (and Solr 1.4),
Tokenizers took a Reader argument in their constructors. Since then,
they can take a CharStream argument in the constructor, so the
*CharStreamAware* Tokenizers are no longer needed (all Tokenizers
are aware of CharStream). I'll update the wiki.
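
As a rough sketch, assuming Solr 1.4 or later, a field type along these
lines pairs MappingCharFilterFactory with an ordinary tokenizer. The
field type name "text_mapped" is made up for illustration, and the
mapping file is assumed to be the one shipped in the Solr example
configuration; adjust both to your schema.

```xml
<!-- Hypothetical field type: char filter runs before a regular tokenizer -->
<fieldType name="text_mapped" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Rewrites characters (e.g. strips accents) before tokenization -->
    <charFilter class="solr.MappingCharFilterFactory"
                mapping="mapping-ISOLatin1Accent.txt"/>
    <!-- No CharStreamAware* variant is needed here -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>
```

StandardTokenizerFactory could be swapped in for the tokenizer in the
same way.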

Koji

--
http://www.rondhuit.com/en/
