Have a look at https://issues.apache.org/jira/browse/SOLR-5027 and
https://wiki.apache.org/solr/CollapsingQParserPlugin

Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


On Wed, Nov 13, 2013 at 2:46 PM, David Anthony Troiano <dtroi...@basistech.com> wrote:

> Hello,
>
> I'm hitting a performance issue when using field collapsing in a
> distributed Solr setup and I'm wondering if others have seen it and if
> anyone has an idea for a workaround.
>
> I'm using field collapsing to deduplicate documents that have the same near
> duplicate hash value, and deduplicating at query time (as opposed to
> filtering at index time) is a requirement. I have a sharded setup with 10
> cores (not SolrCloud), each having ~1000 documents. Of the 10k docs,
> most have a unique near duplicate hash value, so there are about 10k unique
> values for the field that I'm grouping on. The grouping parameters that
> I'm using are:
>
> group=true
> group.field=<near dupe hash field>
> group.main=true
>
> I'm attempting distributed queries (&shards=s1,s2,...,s10) where the only
> difference is the absence or presence of these three grouping parameters
> and I'm consistently seeing a marked difference in performance (as a
> representative data point, 200ms latency without grouping and 1600ms with
> grouping). Interestingly, if I put all 10k docs on the same core and query
> that core independently with and without grouping, I don't see much of a
> latency difference, so the performance degradation seems to exist only in
> the sharded setup.
>
> Is there a known performance issue when field collapsing in a sharded setup
> (perhaps one that only manifests when the grouping field has many unique
> values), or have other people observed this? Any ideas for a workaround?
> Note that docs in my sharded setup can only have the same signature if
> they're in the same shard, so perhaps that can be used to boost perf,
> though I don't see an exposed way to do so.
>
> A follow-on question is whether we're likely to see the same issue if /
> when we move to SolrCloud.
>
> Thanks,
> Dave
>
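
For anyone comparing the two approaches in this thread, here is a minimal
sketch of the request parameters involved. It only builds query strings and
does not contact Solr. The field name "signature", the shard addresses, and
the helper function names are hypothetical placeholders, not taken from the
original setup. The {!collapse field=...} filter query is the syntax described
on the CollapsingQParserPlugin wiki page and assumes a Solr version that
includes SOLR-5027.

```python
from urllib.parse import urlencode, parse_qs


def grouping_params(field):
    """The three result-grouping parameters quoted in the original message."""
    return {"group": "true", "group.field": field, "group.main": "true"}


def collapse_params(field):
    """Collapse via a filter query (CollapsingQParserPlugin) instead of grouping."""
    return {"fq": "{!collapse field=%s}" % field}


def build_query_string(q, shards, extra=None):
    """Build a distributed-query string; `shards` is a list of shard addresses."""
    params = {"q": q, "shards": ",".join(shards), "wt": "json"}
    params.update(extra or {})
    return urlencode(params)


if __name__ == "__main__":
    # Hypothetical shard addresses standing in for s1,...,s10.
    shards = ["host%d:8983/solr/core%d" % (i, i) for i in range(1, 11)]
    field = "signature"  # placeholder for the near-duplicate hash field

    variants = [
        ("no grouping", None),
        ("group=true", grouping_params(field)),
        ("collapse fq", collapse_params(field)),
    ]
    for label, extra in variants:
        qs = build_query_string("*:*", shards, extra)
        # Decode again to show which parameters differ between the variants.
        decoded = parse_qs(qs)
        shown = {k: v for k, v in decoded.items() if k not in ("shards", "wt")}
        print(label, "->", shown)
```

Appending each query string to a core's /select URL and timing the three
variants against the same sharded setup would give a like-for-like latency
comparison. As I understand the plugin's documentation, distributed collapsing
expects documents with the same collapse value to live on the same shard,
which matches the constraint Dave describes.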