Re: ngramfilter minGramSize problem

Furkan KAMACI Sun, 06 Apr 2014 15:37:52 -0700

Correction: My patch is at SOLR-5152
On 7 Apr 2014 at 01:05, "Andreas Owen" <ao...@swissonline.ch> wrote:


> I thought I could use <filter class="solr.LengthFilterFactory" min="1"
> max="2"/> to index and search words that are only 1 or 2 chars long. It
> seems to work, but I have to test it some more.
>
>
> On Sun, 06 Apr 2014 22:24:20 +0200, Andreas Owen <ao...@swissonline.ch>
> wrote:
>
>> I have a fieldType that uses NGramFilterFactory while indexing. Is there a
>> setting that forces the n-gram filter to index words shorter than
>> minGramSize? Mine is set to 3, and searches won't find words that are only 1
>> or 2 chars long. I would rather not set minGramSize=1, because the results
>> would be too diverse.
>>
>> fieldtype:
>>
>> <fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
>>   <analyzer type="index">
>>     <tokenizer class="solr.StandardTokenizerFactory"/>
>>     <filter class="solr.LowerCaseFilterFactory"/>
>>     <!-- <filter class="solr.WordDelimiterFilterFactory" types="at-under-alpha.txt"/> -->
>>     <!-- remove common words -->
>>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>>             words="lang/stopwords_de.txt" format="snowball"
>>             enablePositionIncrements="true"/>
>>     <filter class="solr.GermanNormalizationFilterFactory"/>
>>     <!-- remove noun/adjective inflections like plural endings -->
>>     <filter class="solr.SnowballPorterFilterFactory" language="German"/>
>>     <filter class="solr.WordDelimiterFilterFactory"
>>             generateWordParts="1" generateNumberParts="1" catenateWords="1"
>>             catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>>     <filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="50"/>
>>   </analyzer>
>>   <analyzer type="query">
>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>     <filter class="solr.LowerCaseFilterFactory"/>
>>     <!-- remove common words -->
>>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>>             words="lang/stopwords_de.txt" format="snowball"
>>             enablePositionIncrements="true"/>
>>     <filter class="solr.GermanNormalizationFilterFactory"/>
>>     <filter class="solr.SnowballPorterFilterFactory" language="German"/>
>>     <filter class="solr.WordDelimiterFilterFactory"
>>             generateWordParts="1" generateNumberParts="1" catenateWords="1"
>>             catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>>   </analyzer>
>> </fieldType>
>>
>
>
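
For reference, a rough sketch of how the LengthFilterFactory idea quoted above might be wired up. A length filter with min="1" max="2" drops every token longer than two characters, so it would presumably have to live in a field type of its own rather than in the text_de chain. The names text_de_short, content and content_short are made up for illustration and are not taken from Andreas's schema; the analyzer chain is only an assumption about what he tested.

```xml
<!-- Hypothetical companion field type: keeps only tokens of 1 or 2 characters,
     which the n-gram chain (minGramSize="3") does not index. -->
<fieldType name="text_de_short" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.LengthFilterFactory" min="1" max="2"/>
  </analyzer>
</fieldType>

<!-- Hypothetical field names: "content" uses the n-gram type from the thread,
     "content_short" receives a copy and indexes only the short tokens. -->
<field name="content" type="text_de" indexed="true" stored="true"/>
<field name="content_short" type="text_de_short" indexed="true" stored="false"/>
<copyField source="content" dest="content_short"/>
```

A query would then need to cover both fields, for example with edismax and qf=content content_short, so that short terms match in content_short while longer terms still match the n-grams in content.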
