On 10/17/2012 7:24 AM, dirk wrote:
Hi,
I had a very similar problem while searching in a bibliographic field called
"signatur". I was able to solve it with the help of additional filter classes.
At the moment I use the following filters, and this works for me:
...
<charFilter class="solr.MappingCharFilterFactory"
mapping="mapping-FoldToASCII.txt"/>
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true"
words="stopwords.txt" enablePositionIncrements="true" />
<filter class="solr.WordDelimiterFilterFactory"
generateWordParts="1" generateNumberParts="1" catenateWords="1"
catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.SnowballPorterFilterFactory" language="German" />
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
<filter class="solr.ReversedWildcardFilterFactory" withOriginal="true"
maxPosAsterisk="2" maxPosQuestion="1" minTrailing="2"
maxFractionAsterisk="0"/>
...
I added the MappingCharFilterFactory to get better support for German
umlauts. Concerning the wildcards:
It is important that you use the ReversedWildcardFilterFactory only at index
time. I use all the other filters at query time as well.
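
A minimal sketch of how that index/query split might look in schema.xml. The
field type name "text_signatur" and the positionIncrementGap value are
assumptions for illustration; the filters and their attributes are copied
from the list above:

```xml
<!-- Hypothetical field type; the name and positionIncrementGap are assumed -->
<fieldType name="text_signatur" class="solr.TextField" positionIncrementGap="100">
  <!-- Index-time chain: includes ReversedWildcardFilterFactory as the last filter -->
  <analyzer type="index">
    <charFilter class="solr.MappingCharFilterFactory"
                mapping="mapping-FoldToASCII.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="German"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.ReversedWildcardFilterFactory" withOriginal="true"
            maxPosAsterisk="2" maxPosQuestion="1" minTrailing="2"
            maxFractionAsterisk="0"/>
  </analyzer>
  <!-- Query-time chain: same filters, but without ReversedWildcardFilterFactory -->
  <analyzer type="query">
    <charFilter class="solr.MappingCharFilterFactory"
                mapping="mapping-FoldToASCII.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="German"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
```
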
This is unrelated to the original question, but I hope it's helpful.
You can replace both MappingCharFilterFactory and LowerCaseFilterFactory
with the following, at the current position of the lowercase filter. It
might produce better results, but you would have to reindex:
<filter class="solr.ICUFoldingFilterFactory"/>
In order to do this, you must place the icu4j and lucene-icu
(lucene-analyzers-icu in 4.x) jars in a lib folder accessed by Solr. If
you're using solr.xml (multicore) the best place is typically the
sharedLib defined there. If you're not using multicore, the lib folder
would need to be defined in your solrconfig.xml.
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ICUFoldingFilterFactory
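
As a rough sketch of the two options for making the jars visible, the
fragments below show a sharedLib attribute in a legacy multicore solr.xml and
a <lib> directive in solrconfig.xml. The directory names, the core entry, and
the path are placeholders, not values from this thread; adjust them to
wherever the icu4j and lucene-icu jars actually live:

```xml
<!-- Option 1: solr.xml (legacy multicore format).
     "lib" and the core entry are placeholder values. -->
<solr persistent="true" sharedLib="lib">
  <cores adminPath="/admin/cores">
    <core name="core0" instanceDir="core0"/>
  </cores>
</solr>

<!-- Option 2: solrconfig.xml (no multicore), placed inside <config>.
     The directory path is a placeholder. -->
<lib dir="/path/to/icu-jars" regex=".*\.jar"/>
```

Only one of the two is needed. After adding the jars and switching the
analyzer chain, Solr has to be restarted and the data reindexed.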
Thanks,
Shawn