Stick to the Javadoc, where the examples are good, or to the reference guide:
https://lucene.apache.org/core/6_5_0/analyzers-common/org/apache/lucene/analysis/compound/HyphenationCompoundWordTokenFilterFactory.html
https://lucene.apache.org/solr/guide/6_6/filter-descriptions.html#filter-descriptions

PVK COMMENT 1: This seems to be for Solr 6.5+? I'm using 4.3.1, and an upgrade is not on the radar any time soon. Will using DictionaryCompoundWordTokenFilterFactory, as I'm doing now, severely degrade my result quality compared to HyphenationCompoundWordTokenFilterFactory?

Almost: zaken -> zaak is already KP output, so there is no need to input what the stemmer will do for you.

PVK COMMENT 2: How do you know zaken -> zaak is already KP output? Is there a list somewhere?

PVK COMMENT 3: I now have:

<fieldType name="searchtext_nl" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.DictionaryCompoundWordTokenFilterFactory" dictionary="compounds_nl.txt" minWordSize="5" minSubwordSize="2" maxSubwordSize="15" onlyLongestMatch="true"/>
    <filter class="solr.StemmerOverrideFilterFactory" dictionary="stemdict_nl.txt"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Kp" protected="protwords_nl.txt"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.DictionaryCompoundWordTokenFilterFactory" dictionary="compounds_nl.txt" minWordSize="5" minSubwordSize="2" maxSubwordSize="15" onlyLongestMatch="true"/>
    <filter class="solr.StemmerOverrideFilterFactory" dictionary="stemdict_nl.txt"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Kp" protected="protwords_nl.txt"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
</fieldType>

I tested in the admin UI (and yes, I restart Solr and reindex every time I make a change):

http://localhost:8983/solr/tt-search-global/select?q=title_search_global%3A(dieren+zaak)&fl=id%2Ctitle&wt=xml&indent=true

returns:

"hi there dieren zaak something else"
"hi there dier something else"

http://localhost:8983/solr/tt-search-global/select?q=title_search_global%3A(dierenzaak)&fl=id%2Ctitle&wt=xml&indent=true&defType=edismax&qf=title_search_global&stopwords=true&lowercaseOperators=true

returns:

"hi there dierenzaak something else"

So I added "dieren" to compounds_nl.txt. Now "title_search_global:(dieren zaak)" returns:

<doc>
  <str name="title">hi there dieren zaak something else</str>
  <str name="id">115_3699638</str>
</doc>
<doc>
  <str name="title">hi there dier something else</str>
  <str name="id">115_3699637</str>
</doc>
<doc>
  <str name="title">hi there dierenzaak something else</str>
  <str name="id">115_3699639</str>
</doc>

So it's starting to look good! :-)

What I want to know: how can I have Solr consider "dierenzaak" to be of higher importance than just "dier" in the above results?

Also, I'm still not 100% sure what my addition of "dieren" to compounds_nl.txt actually does. I assume DictionaryCompoundWordTokenFilterFactory just looks for that exact string inside each token and, if it finds it, treats it as a separate word. Correct?

Thanks again!
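
To make the assumption about DictionaryCompoundWordTokenFilterFactory concrete, here is a rough Python sketch of dictionary-based decompounding using the same parameter values as the field type above. It is a simplified model and not the Lucene source: the function name `decompound` and the sample dictionary entries are hypothetical, and token positions and offsets are left out.

```python
def decompound(token, dictionary, min_word_size=5, min_subword_size=2,
               max_subword_size=15, only_longest_match=True):
    """Return the original token followed by any dictionary subwords found in it.

    Simplified model of dictionary-based decompounding (an assumption about
    how the filter behaves, not a copy of the Lucene implementation).
    """
    tokens = [token]  # the original token is always kept
    if len(token) < min_word_size:
        return tokens  # short tokens are passed through untouched
    for start in range(len(token) - min_subword_size + 1):
        longest = None
        for size in range(min_subword_size, max_subword_size + 1):
            if start + size > len(token):
                break
            candidate = token[start:start + size]
            if candidate in dictionary:
                if only_longest_match:
                    # sizes only grow, so the last hit at this start is the longest
                    longest = candidate
                else:
                    tokens.append(candidate)
        if longest is not None:
            tokens.append(longest)
    return tokens


# Hypothetical dictionary contents; the real compounds_nl.txt may differ.
sample_dictionary = {"dier", "dieren", "zaak"}

print(decompound("dierenzaak", sample_dictionary))
# ['dierenzaak', 'dieren', 'zaak']
print(decompound("dierenzaak", sample_dictionary, only_longest_match=False))
# ['dierenzaak', 'dier', 'dieren', 'zaak']
```

Under this model, adding "dieren" to the dictionary makes "dierenzaak" produce an extra "dieren" token alongside the original, which would be consistent with the third document now matching the "dieren zaak" query. Whether "zaak" is also emitted depends on whether it is present in compounds_nl.txt.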