Char filters MUST come before the Tokenizer, because they process the character stream rather than the tokens.
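
To illustrate the ordering, here is a minimal schema.xml sketch. The field type name "text_accent_folded" is made up for the example, and it assumes the mapping-ISOLatin1Accent.txt file from the Solr example configuration is present in your conf directory. Treat it as a starting point, not a tested configuration.

```xml
<!-- Hypothetical field type; the name "text_accent_folded" is an assumption. -->
<fieldType name="text_accent_folded" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Char filters see the raw character stream, so they run before tokenization. -->
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    <!-- Exactly one tokenizer; StandardTokenizerFactory is used here only as an example. -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Token filters run after the tokenizer, in the order listed. -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Moving the charFilter element below the tokenizer in the XML should not change the processing order, since char filters are applied to the input before the tokenizer sees it.
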
If you need to apply the accent normalization later in the analysis chain, either use ISOLatin1AccentFilterFactory or help with the implementation of SOLR-1978.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Training in Europe - www.solrtraining.com

On 5 July 2010, at 17:32, Saïd Radhouani wrote:

> Thanks Koji for the reply and for updating the wiki. As it's written now in the wiki,
> it sounds (at least to me) like MappingCharFilterFactory works only with
> WhitespaceTokenizerFactory.
>
> Did you really mean that? Because this filter also works with other
> tokenizers. For instance, in my text type, I'm using StandardTokenizerFactory
> for document processing, and WhitespaceTokenizerFactory for query processing.
>
> I also noticed that, in whatever order you put this filter in the definition
> of a field type, it's always applied (during text processing) before the
> tokenizer and all the other filters. Is there a reason for that? Is there a
> possibility to force the filter to be applied at a certain order among the
> other filters?
>
> Thanks,
> -S
>
> On Jul 5, 2010, at 4:28 PM, Koji Sekiguchi wrote:
>
>>> In the same wiki, they say that CharStreamAwareWhitespaceTokenizerFactory
>>> must be used with MappingCharFilterFactory. But, when I use this tokenizer
>>> and filter together, I get a severe error saying that the field type
>>> containing this filter and tokenizer is unknown. However, when I use this
>>> filter with StandardTokenizerFactory or WhitespaceTokenizerFactory!
>>>
>> The wiki is not correct today. Before Lucene 2.9 (and Solr 1.4),
>> Tokenizers could take a Reader argument in their constructor. But after that,
>> because they can take a CharStream argument in their constructor,
>> *CharStreamAware* Tokenizers are no longer needed (all Tokenizers
>> are aware of CharStream). I'll update the wiki.
>>
>> Koji
>>
>> --
>> http://www.rondhuit.com/en/