The same wiki says that CharStreamAwareWhitespaceTokenizerFactory must be used with MappingCharFilterFactory. But when I use this tokenizer and filter together, I get a severe error saying that the field type containing them is unknown. However, when I use this filter with StandardTokenizerFactory or WhitespaceTokenizerFactory, it works!
The wiki is out of date. Before Lucene 2.9 (and Solr 1.4), Tokenizers took a Reader argument in their constructor, so separate *CharStreamAware* Tokenizers were provided for use with a CharFilter. Since then, Tokenizers can take a CharStream argument in their constructor, so the *CharStreamAware* Tokenizers are no longer needed (all Tokenizers are aware of CharStream). I'll update the wiki.

Koji -- http://www.rondhuit.com/en/
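
For reference, a field type along the following lines should load on Solr 1.4, pairing MappingCharFilterFactory with a regular tokenizer instead of a CharStreamAware one. This is only a sketch: the field type name "text_mapped" is made up for illustration, and the mapping file is assumed to be the mapping-ISOLatin1Accent.txt from the example conf directory, so substitute your own.

```xml
<!-- Hypothetical field type; the name "text_mapped" is illustrative only. -->
<fieldType name="text_mapped" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- The char filter runs first and rewrites characters before tokenization. -->
    <!-- Assumes mapping-ISOLatin1Accent.txt is present in the conf directory. -->
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    <!-- A regular tokenizer; no CharStreamAware variant is needed. -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>
```

StandardTokenizerFactory can be swapped in for WhitespaceTokenizerFactory in the same way.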