Hey guys, I've been noticing for quite a while that using the mm (minimum match) parameter with a value below 100% alongside the dismax query parser seriously degrades performance. My particular use case involves running dismax over a set of 4-6 textual fields, about half of which do *not* filter stop words (so yes, some of these queries do end up iterating over a large portion of my index).
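For context, the requests look roughly like this (field names, boosts and the exact mm value here are just placeholders, not my real config):

```
http://localhost:8983/solr/select?defType=dismax
    &q=some+user+query
    &qf=title^2+body+comments
    &mm=75%25
    &fl=id,score
```

With mm below 100%, Lucene has to evaluate the "minimum should match" disjunction instead of a pure conjunction, which is where I assume the extra cost comes from.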
This is somewhat understandable, since constructing the result set is no longer a simple intersection. Still, I wonder what work-arounds or standard solutions exist for this problem, and which of them are applicable in the Solr/Lucene environment (e.g. dividing the index into 'primary'/'secondary' sections, using n-gram indices, tuning the caching configuration, sharding... might any of these help?).

My corpus is not that large (~20 million documents), yet query processing time is far too long to my mind: my goal is a 90th-percentile QTime of around 200ms, and currently it's more than double that.

Can anyone share some of their knowledge? What is practiced at, e.g., Google or Yahoo? Are there any plans to address this issue in Solr/Lucene, or am I just using it incorrectly?

Any feedback appreciated.

Thanks,
-Chak