stockii, Solr's Analysis page will tell you what's happening. I can't tell just by looking, but I would first try removing the CommonGramsFilterFactory and see whether the repetition still happens.
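
To illustrate why the common-grams step is a reasonable first suspect, here is a rough Python sketch of part of the index chain. It is not Solr code and it ignores token positions, so treat it as a guess to check against the Analysis page, not as a diagnosis. It assumes that "mit" is listed in stopwords.txt, and the function names are made up for this example. The one Lucene detail it relies on is that the common-grams filter joins a stopword and its neighbour with an underscore; that happens after the PatternReplaceFilterFactory has already run, so the pattern never sees it.

```python
def common_grams(tokens, stopwords):
    """Keep every token and add a 'left_right' gram when either neighbour is a stopword."""
    out = []
    for i, tok in enumerate(tokens):
        out.append(tok)
        if i + 1 < len(tokens):
            nxt = tokens[i + 1]
            if tok.lower() in stopwords or nxt.lower() in stopwords:
                out.append(tok + "_" + nxt)
    return out


def split_and_lowercase(tokens):
    """Crude stand-in for the word-delimiter and lowercase steps: split on '_' only."""
    return [part.lower() for tok in tokens for part in tok.split("_") if part]


def shingles(tokens, max_size=3):
    """Emit every run of 1..max_size consecutive tokens, joined by a space."""
    out = []
    for start in range(len(tokens)):
        for size in range(1, max_size + 1):
            if start + size <= len(tokens):
                out.append(" ".join(tokens[start:start + size]))
    return out


def index_terms(text, stopwords):
    """Whitespace-tokenize, then apply the three simplified steps in order."""
    return shingles(split_and_lowercase(common_grams(text.split(), stopwords)))


# ASSUMPTION: "mit" is in stopwords.txt; the real file is not shown in the thread.
terms = index_terms("Eiseimer COOL mit Greifer", {"mit"})
print([t for t in terms if t.startswith("cool")])
```

In this simplified model the grams "COOL_mit" and "mit_Greifer" get split back into words, so the shingle step sees "cool" and "mit" more than once and produces terms such as "cool mit mit" and "cool cool mit". If the real chain behaves the same way, removing the common-grams filter from the index analyzer should make the repetition disappear.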
Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/

----- Original Message ----
> From: stockii <st...@shopgate.com>
> To: solr-user@lucene.apache.org
> Sent: Thu, June 24, 2010 10:04:55 AM
> Subject: underscore, comma in terms.prefix
>
> Hello.
>
> This is my filter chain for suggestions with the TermsComponent:
>
> <fieldType name="textgen" class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.PatternReplaceFilterFactory" pattern="([,_])" replacement=" " replace="all"/>
>     <filter class="solr.CommonGramsFilterFactory" words="stopwords.txt" ignoreCase="true"/>
>     <filter class="solr.StandardFilterFactory"/>
>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="0" catenateWords="0" splitOnCaseChange="1" splitOnNumerics="0"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.ShingleFilterFactory" maxShingleSize="3" outputUnigrams="true"/>
>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <!-- Singular and plural, ü == ue and ue == ü -->
>     <filter class="solr.SnowballPorterFilterFactory" language="German2"/>
>     <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
>     <filter class="solr.CommonGramsFilterFactory" words="stopwords.txt" ignoreCase="true"/>
>     <filter class="solr.StandardFilterFactory"/>
>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateAll="1" splitOnCaseChange="1"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <!-- <filter class="solr.ShingleFilterFactory" maxShingleSize="2" outputUnigrams="false"/> -->
>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>   </analyzer>
> </fieldType>
>
> So my questions/problems are:
>
> - When I index with these settings, I get an underscore ("_") in my index. Is the comma being replaced with an underscore?
> - Solr turns the string "Eiseimer COOL mit Greifer" into "cool mit mit" when I search with terms.prefix=cool. Why is "mit" there twice? Sometimes "cool" also appears twice in my suggestions.
>
> Any ideas? =) Thanks
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/underscore-comma-in-terms-prefix-tp919565p919565.html
> Sent from the Solr - User mailing list archive at Nabble.com.