Re: underscore, comma in terms.prefix

Otis Gospodnetic Thu, 24 Jun 2010 07:27:46 -0700

stockii,

Solr's Analysis page will show you what each filter in the chain does to 
your input.  I can't tell just by looking at the config, but I would first 
try removing the CommonGramsFilterFactory and see whether the repetition 
still happens.
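
A minimal sketch of that test, assuming the index analyzer from the quoted 
message is otherwise left unchanged and that the data is reindexed after the 
change. Only the CommonGrams filter is commented out. As far as I know, 
CommonGramsFilter joins a common word and its neighbour with an underscore 
(for example "cool_mit"), so it is a plausible source of both the "_" and 
the repeated term, but that is an assumption to check on the Analysis page 
(in a default Solr 1.4 install, usually under /admin/analysis.jsp), not a 
confirmed diagnosis.

```xml
<!-- Sketch only: index-time analyzer of the "textgen" field type from the
     quoted message, with CommonGramsFilterFactory disabled for testing. -->
<analyzer type="index">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.PatternReplaceFilterFactory"
          pattern="([,_])" replacement=" " replace="all"/>
  <!-- Disabled for the test; re-enable if the repetition persists.
  <filter class="solr.CommonGramsFilterFactory"
          words="stopwords.txt" ignoreCase="true"/>
  -->
  <filter class="solr.StandardFilterFactory"/>
  <filter class="solr.WordDelimiterFilterFactory"
          generateWordParts="1" generateNumberParts="0" catenateWords="0"
          splitOnCaseChange="1" splitOnNumerics="0"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.ShingleFilterFactory"
          maxShingleSize="3" outputUnigrams="true"/>
  <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
```

If "cool mit mit" still shows up with this chain, the next candidates to 
check the same way would be the WordDelimiter and Shingle filters.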


 

Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



----- Original Message ----
> From: stockii <st...@shopgate.com>
> To: solr-user@lucene.apache.org
> Sent: Thu, June 24, 2010 10:04:55 AM
> Subject: underscore, comma in terms.prefix
> 
> 
> Hello.
> 
> This is my filter chain for suggestions with the TermsComponent:

> <fieldType name="textgen" class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.PatternReplaceFilterFactory"
>             pattern="([,_])" replacement=" " replace="all"/>
>     <filter class="solr.CommonGramsFilterFactory"
>             words="stopwords.txt" ignoreCase="true"/>
>     <filter class="solr.StandardFilterFactory"/>
>     <filter class="solr.WordDelimiterFilterFactory"
>             generateWordParts="1" generateNumberParts="0" catenateWords="0"
>             splitOnCaseChange="1" splitOnNumerics="0"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.ShingleFilterFactory"
>             maxShingleSize="3" outputUnigrams="true"/>
>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <!-- singular and plural, ü == ue and ue == ü -->
>     <filter class="solr.SnowballPorterFilterFactory" language="German2"/>
>     <charFilter class="solr.MappingCharFilterFactory"
>                 mapping="mapping-ISOLatin1Accent.txt"/>
>     <filter class="solr.CommonGramsFilterFactory"
>             words="stopwords.txt" ignoreCase="true"/>
>     <filter class="solr.StandardFilterFactory"/>
>     <filter class="solr.WordDelimiterFilterFactory"
>             generateWordParts="1" generateNumberParts="1" catenateAll="1"
>             splitOnCaseChange="1"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <!-- <filter class="solr.ShingleFilterFactory"
>                  maxShingleSize="2" outputUnigrams="false"/> -->
>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>   </analyzer>
> </fieldType>


> So my question/problem is:
> 
> - When I index with these settings, I get an underscore ("_") in my index.
>   Is the comma being replaced with an underscore?
> - Solr imports this string: "Eiseimer COOL mit Greifer" as "cool mit mit"
>   when I search with terms.prefix=cool. Why is "mit" there twice? Sometimes
>   "cool" appears twice in my suggestions as well.
> 
> Any ideas? Thanks! =)



> -- 
> View this message in context: http://lucene.472066.n3.nabble.com/underscore-comma-in-terms-prefix-tp919565p919565.html
> Sent from the Solr - User mailing list archive at Nabble.com.
