Alright, thanks for all your help. I finally fixed this problem using PatternReplaceFilterFactory + WordDelimiterFilterFactory.
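A sketch of the analysis chain described here (the field type name, pattern, and attribute values are my assumptions, not the exact configuration used):

```xml
<!-- Sketch only: strips "_" from tokens, then lets WordDelimiterFilter
     generate the word and number parts for higher recall. -->
<fieldType name="text_underscore" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Remove every underscore inside a token, e.g. "2DA012_ISO" -> "2DA012ISO" -->
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="_" replacement="" replace="all"/>
    <!-- Split on letter/digit transitions so both word and number parts
         are indexed; preserveOriginal keeps the unsplit token as well -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateAll="1" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Generating many small parts is what trades a little precision for recall, as noted below.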
I first replace the _ (underscore) using PatternReplaceFilterFactory, and then use WordDelimiterFilterFactory to generate the word and number parts to increase user search hits. Although this decreases search precision a little, our users need a higher recall rate more than precision. Thank you all.

Floyd

2013/8/22 Floyd Wu <floyd...@gmail.com>

> After trying some search cases and different parameter combinations of
> WordDelimiterFilterFactory, I wonder what the best strategy is to index the
> string "2DA012_ISO MARK 2" so that it can be found by the term "2DA012"?
>
> What if I just want the _ to be removed at both query and index time — what
> should I configure, and how?
>
> Floyd
>
>
> 2013/8/22 Floyd Wu <floyd...@gmail.com>
>
>> Thank you all.
>> By the way, Jack, I'm going to buy your book. Where can I buy it?
>> Floyd
>>
>>
>> 2013/8/22 Jack Krupansky <j...@basetechnology.com>
>>
>>> "I thought that the StandardTokenizer always split on punctuation"
>>>
>>> Proving that you haven't read my book! The section on the standard
>>> tokenizer details the rules that the tokenizer uses (in addition to
>>> extensive examples). That's what I mean by "deep dive."
>>>
>>> -- Jack Krupansky
>>>
>>> -----Original Message----- From: Shawn Heisey
>>> Sent: Wednesday, August 21, 2013 10:41 PM
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: How to avoid underscore sign indexing problem?
>>>
>>>
>>> On 8/21/2013 7:54 PM, Floyd Wu wrote:
>>>
>>>> Using StandardAnalyzer to tokenize the string "Pacific_Rim" yields a
>>>> single token:
>>>>
>>>> text: pacific_rim | raw_bytes: [70 61 63 69 66 69 63 5f 72 69 6d] |
>>>> start: 0 | end: 11 | type: <ALPHANUM> | position: 1
>>>>
>>>> How can this string be tokenized into the two tokens "Pacific" and
>>>> "Rim"? Should I set _ as a stopword?
>>>> Please kindly help with this.
>>>> Many thanks.
>>>>
>>>
>>> Interesting. I thought that the StandardTokenizer always split on
>>> punctuation, but apparently that's not the case for the underscore
>>> character.
>>>
>>> You can always use the WordDelimiterFilter after the StandardTokenizer.
>>>
>>> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory
>>>
>>> Thanks,
>>> Shawn
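Shawn's suggestion above — WordDelimiterFilter after the StandardTokenizer — might look like this in schema.xml (a minimal sketch; the field type name and attribute values are assumptions):

```xml
<fieldType name="text_split" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- WordDelimiterFilter treats "_" as an intra-word delimiter, so the
         single token "Pacific_Rim" is split into "Pacific" and "Rim" -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Using the same chain for both index and query analysis removes the underscore at both times, which is what the quoted question asks for.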