Re: ngramfilter minGramSize problem

Andreas Owen Mon, 07 Apr 2014 05:46:34 -0700

it works well. now why does the search only find something when the field name is added to a query that contains stopwords?


"cug" -> 9 hits
"mit cug" -> 0 hits
"plain_text:mit cug" -> 9 hits

why is this so? could it be a problem that stopwords aren't removed from the query because not all fields that are searched have the stopword filter?
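
A minimal sketch of the hypothesis above, under stated assumptions: if the unqualified query is spread over several fields (for example through a dismax/edismax `qf` list, which the thread does not confirm), a field whose analyzer keeps "mit" can still require a match for it, while the stop-filtered fields drop the term. Giving every searched field the same stopword filter keeps the analysis consistent. The field type name `text_plain` is hypothetical; the stopword file is the one used in the text_de definition quoted later in this thread.

```xml
<!-- Hypothetical field type for the other searched fields (name is illustrative) -->
<fieldType name="text_plain" class="solr.TextField" positionIncrementGap="100">
  <!-- One analyzer without a type attribute is used for both indexing and querying -->
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Same stopword list as text_de, so "mit" is removed here as well -->
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="lang/stopwords_de.txt" format="snowball"/>
  </analyzer>
</fieldType>
```

Comparing the output of `debugQuery=true` for "mit cug" and "plain_text:mit cug" should show whether a clause for "mit" survives on some field.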

On Mon, 07 Apr 2014 00:37:15 +0200, Furkan KAMACI <furkankam...@gmail.com> wrote:

Correction: My patch is at SOLR-5152
On 7 Apr 2014 at 01:05, "Andreas Owen" <ao...@swissonline.ch> wrote:

i thought i could use <filter class="solr.LengthFilterFactory" min="1"
max="2"/> to index and search words that are only 1 or 2 chars long. it
seems to work but i have to test it some more
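
One way this idea could be wired up, as a sketch only: LengthFilterFactory drops every token outside the min/max range, so it cannot share an analyzer chain with the n-gram filter without discarding the longer words. A separate field for short tokens avoids that. The names `text_de_short` and `plain_text_short` are hypothetical, and the sketch assumes `plain_text` is the field that holds the text.

```xml
<!-- Hypothetical field type that indexes only 1- and 2-character tokens -->
<fieldType name="text_de_short" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Keep tokens of length 1 or 2; everything longer is dropped -->
    <filter class="solr.LengthFilterFactory" min="1" max="2"/>
  </analyzer>
</fieldType>

<!-- Hypothetical companion field, filled from the assumed source field plain_text -->
<field name="plain_text_short" type="text_de_short" indexed="true" stored="false"/>
<copyField source="plain_text" dest="plain_text_short"/>
```

The short-token field would then have to be included in the query alongside the n-gram field.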


On Sun, 06 Apr 2014 22:24:20 +0200, Andreas Owen <ao...@swissonline.ch>
wrote:

i have a fieldtype that uses ngramfilter while indexing. is there a
setting that can force the ngramfilter to index words smaller than the
minGramSize? mine is set to 3 and the search won't find words that are
only 1 or 2 chars long. i would like to not set minGramSize=1 because the
results would be too diverse.

fieldtype:

<fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- <filter class="solr.WordDelimiterFilterFactory" types="at-under-alpha.txt"/> -->
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="lang/stopwords_de.txt" format="snowball"
            enablePositionIncrements="true"/> <!-- remove common words -->
    <filter class="solr.GermanNormalizationFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory"
            language="German"/> <!-- remove noun/adjective inflections like plural endings -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="50"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhiteSpaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="lang/stopwords_de.txt" format="snowball"
            enablePositionIncrements="true"/> <!-- remove common words -->
    <filter class="solr.GermanNormalizationFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="German"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
  </analyzer>
</fieldType>



--
Using Opera's mail client: http://www.opera.com/mail/



