Re: WordDelimiterFilter and the dot character

Shawn Heisey Wed, 17 Oct 2012 07:01:54 -0700

On 10/17/2012 7:24 AM, dirk wrote:

Hi,


I had a very similar problem while searching in a bibliographic field called
"signatur". I solved it with the help of additional filter classes. At the
moment I use the following filters, and this works for me:

...
        <charFilter class="solr.MappingCharFilterFactory"
                    mapping="mapping-FoldToASCII.txt"/>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true"
                words="stopwords.txt" enablePositionIncrements="true"/>
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1" catenateWords="1"
                catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory" language="German"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
        <filter class="solr.ReversedWildcardFilterFactory" withOriginal="true"
                maxPosAsterisk="2" maxPosQuestion="1" minTrailing="2"
                maxFractionAsterisk="0"/>
...
I added the MappingCharFilterFactory for better support of German umlauts.
Concerning wildcards: it is important to use the ReversedWildcardFilterFactory
only at index time. I use all the other filters at query time as well.
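
The index-time/query-time split described above could be sketched roughly as follows. The field type name "text_signatur" and the positionIncrementGap value are assumptions for illustration only; the filters and their attributes are copied from the list above.

```xml
<!-- Hypothetical field type name; adjust to your schema.xml. -->
<fieldType name="text_signatur" class="solr.TextField" positionIncrementGap="100">
  <!-- Index-time chain: includes ReversedWildcardFilterFactory as the last step. -->
  <analyzer type="index">
    <charFilter class="solr.MappingCharFilterFactory"
                mapping="mapping-FoldToASCII.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="German"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.ReversedWildcardFilterFactory" withOriginal="true"
            maxPosAsterisk="2" maxPosQuestion="1" minTrailing="2"
            maxFractionAsterisk="0"/>
  </analyzer>
  <!-- Query-time chain: the same filters, without the reversed-wildcard filter. -->
  <analyzer type="query">
    <charFilter class="solr.MappingCharFilterFactory"
                mapping="mapping-FoldToASCII.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="German"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
```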

This is unrelated to the original question, but I hope it's helpful. You can replace both MappingCharFilterFactory and LowerCaseFilterFactory with the following, at the current position of the lowercase filter. It might produce better results, but you would have to reindex:


        <filter class="solr.ICUFoldingFilterFactory"/>
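
Applied to the chain quoted above, that substitution might look something like the sketch below. It assumes every other filter stays exactly as posted, and it leaves out the ReversedWildcardFilterFactory step, so treat it as an illustration of the ordering rather than a tested configuration.

```xml
<!-- MappingCharFilterFactory removed; ICU folding takes over accent folding. -->
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true"
        words="stopwords.txt" enablePositionIncrements="true"/>
<filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="1" catenateWords="1"
        catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
<!-- Replaces LowerCaseFilterFactory at the same position in the chain. -->
<filter class="solr.ICUFoldingFilterFactory"/>
<filter class="solr.SnowballPorterFilterFactory" language="German"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
```

Keeping the folding filter after WordDelimiterFilterFactory means splitOnCaseChange still sees the original case.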

To use this filter, you must place the icu4j and lucene-icu (lucene-analyzers-icu in 4.x) jars in a lib folder accessed by Solr. If you're using solr.xml (multicore), the best place is typically the sharedLib defined there. If you're not using multicore, the lib folder would need to be defined in your solrconfig.xml.
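
As a rough sketch of the two options, the fragments below show where each setting lives. The directory names ("lib", "./lib") and the core name are assumptions for illustration; the paths depend on your installation.

```xml
<!-- Option 1, solr.xml (multicore): sharedLib is resolved relative to the Solr home.
     Put the icu4j and lucene-icu jars in that directory. -->
<solr persistent="true" sharedLib="lib">
  <cores adminPath="/admin/cores">
    <!-- Hypothetical core name and instanceDir. -->
    <core name="core0" instanceDir="core0"/>
  </cores>
</solr>

<!-- Option 2, solrconfig.xml (no multicore): a lib directive inside <config>,
     with dir resolved relative to the core's instance directory. -->
<lib dir="./lib"/>
```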


http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ICUFoldingFilterFactory

Thanks,
Shawn
