: Date: Thu, 4 Jun 2009 19:30:10 -0700
: From: Kaktu Chakarabati
: Subject: MM Parameter and Performance in Solr
Kaktu: It doesn't look like you ever got a reply to your question (possibly because you sent it to solr-dev, but it's more appropriate for solr-user). I haven't done any specific performance comparisons of dismax with mm=100% vs mm=X%, but in truth that would be an apples-to-oranges comparison: the lower the percentage, the more permutations of input terms can produce matches, and the more documents will match -- in which case Solr is by definition doing more work.

Asking for a workaround or best practice for dealing with something like this is akin to asking for workarounds for queries that are slow because they contain lots of terms and match lots of documents -- there aren't really a lot of options, other than preventing your users from executing those queries. The question I would ask in your shoes is whether the partial matching of mm=X% is worth the added search time, or whether you'd be happier with more exact matching (mm=100%) and faster searches.

: Hey guys,
: I've been noticing for quite a long time that using the minmatch parameter
: with a value less than 100% alongside the dismax qparser seriously degrades
: performance. My particular use case involves using dismax over a set of 4-6
: textual fields, about half of which do *not* filter stop words (so yes,
: these do involve iterating over a large portion of my index in some cases).
:
: This is somewhat understandable, as the task of constructing result sets is
: no longer simply intersection based; however, I do wonder what work-arounds /
: standard solutions exist for this problem and which are applicable in the
: solr/lucene environment (i.e. dividing the index into 'primary' /
: 'secondary' sections, using n-gram indices, caching configuration,
: sharding might help..?)
: I'm working with a not-so-large corpus (~20 million documents) and the
: query processing time is way too long to my mind (my goal is for the 90th
: percentile QTime to hit around 200ms; I can say that currently it's more
: than double that..)
: Can anyone please share some of his knowledge? What is practiced, e.g., in
: google, yahoo..? Any plans to address these issues in solr/lucene, or am
: I just using it wrongly?

-Hoss
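[Editor's note: the point about lower mm values allowing more "permutations of input terms" can be made concrete with a small back-of-the-envelope sketch. This is not Solr internals, just counting: for a query of n terms, mm determines the minimum number of term clauses a document must match, so lowering mm grows the number of qualifying term combinations combinatorially. The round-down behavior for percentage values mirrors what Solr's mm documentation describes, but treat the exact rounding here as an assumption.]

```python
from math import comb, floor

def matching_term_subsets(num_terms: int, mm_percent: float) -> int:
    """Count the distinct subsets of query terms that would satisfy an
    mm (minimum-should-match) threshold given as a percentage.

    Assumes the percentage is rounded down to a whole clause count,
    as Solr's mm documentation describes for percentage specs.
    """
    required = max(floor(num_terms * mm_percent / 100), 1)
    # Every subset of size >= required is a distinct way to match.
    return sum(comb(num_terms, k) for k in range(required, num_terms + 1))

# A 5-term query:
print(matching_term_subsets(5, 100))  # 1  -> only the full conjunction matches
print(matching_term_subsets(5, 50))   # 26 -> far more ways to produce a hit
```

With mm=100% there is exactly one way for a document to qualify (it must contain all five terms), while mm=50% admits 26 distinct term combinations, each of which pulls in its own posting lists -- which is why the result-set construction is no longer a simple intersection and the query does more work.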