Hi,

We are using Solr to match query strings against a keyword database in which some keywords consist of more than one word. For example, a keyword might be "apple pie", and we want it to match only queries containing that word pair, not queries containing only "apple".

Here is the relevant part of schema.xml, defining the index and query analyzers:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.PatternTokenizerFactory" pattern=";"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory" />
    <filter class="solr.ShingleFilterFactory" />
  </analyzer>
</fieldType>

In the analysis tool this schema appears to work correctly. Our multi-word keywords are indexed as single terms, and when a search phrase contains one of these multi-word keywords, it is shingled and matched.

Unfortunately, when we run the same queries against the actual index, we get zero matches. I can see in the index histogram that the terms from our MySQL datasource containing the keywords are indexed correctly, but somehow the shingling doesn't appear to work on this live data.

Does anyone with shingling experience have tips for us, or other advice for debugging the issue?

Thanks,
Jeff
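
P.S. To make the intended behaviour concrete, here is a rough Python sketch of what we expect the two analyzer chains above to produce. It is only an illustration, not Solr code: the function names are made up, it assumes ShingleFilterFactory is running with its defaults (shingles of up to two words, joined by a single space, with the single words also emitted), and it models only the analyzers, not the query parser or the live index.

```python
def index_terms(field_value):
    """Index side: split on ';', then lowercase and trim each piece."""
    # Empty entries are dropped here for simplicity.
    return [t.strip().lower() for t in field_value.split(";") if t.strip()]


def query_terms(query, max_shingle_size=2):
    """Query side: split on whitespace, lowercase, then add word shingles."""
    words = query.lower().split()
    terms = list(words)  # single words are kept alongside the shingles
    for n in range(2, max_shingle_size + 1):
        for i in range(len(words) - n + 1):
            terms.append(" ".join(words[i:i + n]))
    # Real token order and positions differ; only the set of terms matters here.
    return terms


def matches(query, field_value):
    """Return the query terms that equal an indexed keyword exactly."""
    indexed = set(index_terms(field_value))
    return [t for t in query_terms(query) if t in indexed]


if __name__ == "__main__":
    keywords = "Apple Pie; Cherry"
    print(matches("fresh apple pie recipe", keywords))
    print(matches("apple", keywords))
```

With the keyword list "Apple Pie; Cherry", the first query should match on the shingle "apple pie" and the second should match nothing, which is what the analysis tool shows us but not what the live index returns.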