Hi all,

Interesting and, by the looks of things, very solid project you have here with SOLR, however ..
I have an index that contains a large number of "phrases" that I need to search over. Each of these phrases is fairly small, being on average about 4 words long. The search terms that I am given to search these phrases are very long and quite arbitrary; sometimes they are up to 25 words long. As a result, the performance of my index when built naively is sporadic: sometimes searches are very fast, but on average they are somewhat slower.

I have attempted to improve this situation by using shingling for the phrases and the related search queries. In my schema I have the following:

    <fieldType name="bigramed_phrase" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.ShingleFilterFactory" outputUnigrams="true" outputUnigramIfNoNgram="true" />
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.ShingleFilterFactory" outputUnigrams="false" outputUnigramIfNoNgram="true" />
      </analyzer>
    </fieldType>

In the index, as seen with Luke, I do indeed have a large range of shingled terms. When I run the analyser for either query or index terms I also see the breakdown with the shingled terms correctly displayed.

However, when I attempt to use this in a query I do not see the terms applied in the debug output. For example, with the term "short red evil fox" I would expect to see the shingles 'short_red', 'red_evil' and 'evil_fox', but instead I get the following:

    "debug":{
      "rawquerystring":"short red evil fox",
      "querystring":"short red evil fox",
      "parsedquery":"+() ()",
      "parsedquery_toString":"+() ()",
      "explain":{},
      "QParser":"DisMaxQParser",
      "altquerystring":null,
      "boostfuncs":null,
      "filter_queries":["atomId:(8235 100000914 100000911 )"],
      "parsed_filter_queries":["atomId:8235 atomId:100000914 atomId:100000911"],
      "timing":{ ......
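To make the expectation concrete, here is a rough Python sketch of the bigram shingling I expect the query analyser to perform. It is only an illustration, not Solr's implementation: the `shingle` function and its parameters are hypothetical names, it assumes a shingle size of 2, it uses "_" as the separator purely to match the notation above, and the order of its output is not meant to mirror the real filter.

```python
def shingle(text, size=2, output_unigrams=False, unigram_if_no_ngram=True, sep="_"):
    """Illustrative word shingling; a sketch, not Solr's ShingleFilter."""
    # Whitespace tokenisation plus lowercasing, as in the analyser chain.
    tokens = text.lower().split()
    # Join every run of `size` adjacent tokens into one shingle.
    ngrams = [sep.join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]
    if output_unigrams:
        # Index-time setting: single tokens as well as shingles.
        return tokens + ngrams
    if not ngrams and unigram_if_no_ngram:
        # Too few tokens to form a shingle: fall back to the single tokens.
        return tokens
    return ngrams


if __name__ == "__main__":
    print(shingle("short red evil fox"))
    print(shingle("short red evil fox", output_unigrams=True))
```

With the query-side settings (no unigrams) this yields the three shingles listed above, which is what I was hoping to find in "parsedquery".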
Does anyone know what I could be doing wrong here? Is it a bug in the debug output, a stupid mistake, misconception or piece of idiocy on my part, or something else?

Many thanks
-- Greg Bowyer