Markus, thanks again. OK, one by one:
StemmerOverride wants \t separated fields, that is probably the cause of the AIooBE you get. Regarding schema definitions, each factory JavaDoc [1] has a proper example listed. I recommend putting a decompounder before a stemmer, and having an accent (or ICU) folder as one of the last filters.

PVK COMMENT: I looked for decompounders and found a few links. By the way, a lot of the pages these link to don't work.

https://earlydance.org/news/9189-apachesolr-issues-german-and-other-germanic-languages
http://lucene.apache.org/core/2_4_0/api/org/apache/lucene/analysis/compound/DictionaryCompoundWordTokenFilter.html
https://wiki.apache.org/solr/LanguageAnalysis#Decompounding
https://wiki.apache.org/solr/DictionaryCompoundWordTokenFilterFactory

My stemdict_nl.txt now contains (words separated by a single tab):

    aachen	aach
    aachener	aachener
    aalmoezen	aalmoes
    beveel	bevool
    dierenzaken	dierenzaak

The problem before was indeed, as @Shawn indicated, that I had words in there containing a space, like so:

    dieren zaken	dierenzaak

About the diff, it looks like KP output: it has the same issues with whether a word needs double or single vowels in the root. It also shows issues with strong verbs/nouns (beveel/bevool). Having this list is much like having KP configured, so you should drop it and only list exceptions to the KP rules in the dict file. This is not easy, so I recommend staying within your domain's vocabulary.

PVK COMMENT: That's what I have now done above, right?

Also, unless you have a very specific need for it, drop the StopFilter. Nobody these days should want a StopFilter unless they can justify it. We use them too, but only for very specific reasons, and never for text search. You might also want to have a WordDelimiterFilter as your first filter; look it up, you probably want to have it.

PVK COMMENT: But without a StopFilter, won't stopwords be included in searches? I thought that Google, for example, excluded these words in its algorithms?
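For the tab-separated stemdict_nl.txt format described above, a quick sanity check could be sketched like this. It is only an illustration, not part of Solr: the function name `find_bad_lines` is made up, and treating `#` lines as comments and flagging spaces inside a field are assumptions on my side.

```python
def find_bad_lines(lines):
    """Return (line_number, reason) pairs for entries that are not 'word<TAB>stem'.

    Assumptions (not taken from the Solr docs quoted in this thread):
    blank lines and lines starting with '#' are skipped, and a space inside
    a field is reported because a whitespace-tokenized term can never match it.
    """
    bad = []
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 2 or not all(f.strip() for f in fields):
            bad.append((number, "expected exactly two tab-separated fields"))
        elif any(" " in f.strip() for f in fields):
            bad.append((number, "field contains a space"))
    return bad


if __name__ == "__main__":
    sample = [
        "aachen\taach\n",
        "dieren zaken dierenzaak\n",   # spaces only, no tab
        "beveel\tbevool\n",
        "dieren zaken\tdierenzaak\n",  # tab present, but a space in the word
    ]
    for number, reason in find_bad_lines(sample):
        print(f"line {number}: {reason}")
```

To run it against the real file, pass `open("stemdict_nl.txt", encoding="utf-8")` instead of the sample list.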
This is what I have now:

    <fieldType name="searchtext_nl" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.DictionaryCompoundWordTokenFilterFactory" dictionary="compounds_nl.txt" minWordSize="5" minSubwordSize="2" maxSubwordSize="15" onlyLongestMatch="true"/>
        <filter class="solr.StemmerOverrideFilterFactory" dictionary="stemdict_nl.txt"/>
        <filter class="solr.ASCIIFoldingFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.DictionaryCompoundWordTokenFilterFactory" dictionary="compounds_nl.txt" minWordSize="5" minSubwordSize="2" maxSubwordSize="15" onlyLongestMatch="true"/>
        <filter class="solr.StemmerOverrideFilterFactory" dictionary="stemdict_nl.txt"/>
        <filter class="solr.ASCIIFoldingFilterFactory"/>
      </analyzer>
    </fieldType>

Now for both this query:

http://localhost:8983/solr/tt-search-global/select?q=title_search_global%3A(dieren+zaak)&fl=id%2Ctitle&wt=xml&indent=true&defType=edismax&stopwords=true&lowercaseOperators=true

and this one:

http://localhost:8983/solr/tt-search-global/select?q=title_search_global%3A(dieren+zaak)&fl=id%2Ctitle&wt=xml&indent=true&defType=edismax&qf=title_search_global&stopwords=true&lowercaseOperators=true

This result is found:

    "Hi there dieren zaak something else"

And these are NOT:

    "Hi there dier something else"
    "Hi there dierenzaak something else"
    "Hi there dierzaak something else"

What else do you recommend I try?
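To see which tokens each filter in the chain above emits for the titles that do not match, the field analysis handler could be queried. This is a rough sketch that only builds the request URL. It assumes the core exposes the handler at /analysis/field, and that title_search_global actually uses the searchtext_nl type, which the config above does not show. The helper name `analysis_url` is made up.

```python
from urllib.parse import urlencode


def analysis_url(core_url, field_type, index_text, query_text):
    """Build a field-analysis URL (assumes the /analysis/field handler is available)."""
    params = {
        "analysis.fieldtype": field_type,   # analyze with this fieldType
        "analysis.fieldvalue": index_text,  # text run through the index analyzer
        "analysis.query": query_text,       # text run through the query analyzer
        "wt": "json",
    }
    return core_url.rstrip("/") + "/analysis/field?" + urlencode(params)


if __name__ == "__main__":
    print(analysis_url(
        "http://localhost:8983/solr/tt-search-global",
        "searchtext_nl",
        "Hi there dierenzaak something else",
        "dieren zaak",
    ))
```

Opening the printed URL in a browser (or the Analysis screen in the admin UI) should show, stage by stage, whether "dierenzaak" is split by the decompounder and whether the index and query sides end up with a token in common.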
-- Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html