Hi! As far as I know, there currently isn't another way. Unfortunately, performance degrades badly when there are many unique groups. I think an issue should be opened to investigate how we can improve this...
Question: Does Solr have a decent chunk of heap space (-Xmx)? Grouping requires quite a bit of heap space (even without group.ngroups=true).

Martijn

On 9 December 2011 23:08, Michael Jakl <jakl.mich...@gmail.com> wrote:
> Hi!
>
> On Fri, Dec 9, 2011 at 17:41, Martijn v Groningen
> <martijn.v.gronin...@gmail.com> wrote:
>> On what field type are you grouping and what version of Solr are you
>> using? Grouping by string field is faster.
>
> The field is defined as follows:
> <field name="signature" type="string" indexed="true" stored="true" />
>
> Grouping itself is quite fast; only computing the number of groups
> seems to grow linearly with the number of documents.
>
> I was hoping for a faster way to compute the total number of
> distinct documents (in other words, the number of distinct values
> in the signature field). Facets came to mind, but as far as I could
> see, they don't offer a total facet count either.
>
> I'm using Solr 3.5 (upgraded from Solr 3.4 without reindexing).
>
> Thanks,
> Michael
>
>> On 9 December 2011 12:46, Michael Jakl <jakl.mich...@gmail.com> wrote:
>>> Hi, I'm using the grouping feature of Solr to return a list of unique
>>> documents together with a count of the duplicates.
>>>
>>> Essentially I use Solr's signature algorithm to create the "signature"
>>> field and use grouping on it.
>>>
>>> To provide good numbers for paging through my result list, I'd like to
>>> compute the total number of documents found (= matches) and the number
>>> of unique documents (= ngroups). Unfortunately, enabling
>>> "group.ngroups" considerably slows down the query (from 500 ms to
>>> 23000 ms for a result list of roughly 300000 documents).
>>>
>>> Is there a faster way to compute the number of groups (or unique
>>> values in the signature field) in the search result? My Solr instance
>>> currently contains about 50 million documents and around 10% of them
>>> are duplicates.
>>>
>>> Thank you,
>>> Michael
>>
>> --
>> Kind regards,
>>
>> Martijn van Groningen

--
Kind regards,

Martijn van Groningen
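For reference, the kind of query discussed in this thread could be sketched as follows (a sketch only: the host, port, core path, and q value are assumptions; the grouping parameters are the ones named above):

```
http://localhost:8983/solr/select?q=*:*
    &group=true
    &group.field=signature
    &group.ngroups=true
```

With group.ngroups=true the grouped response reports both the total hit count (matches) and the number of distinct signature values (ngroups); omitting group.ngroups=true avoids the expensive distinct-group count, but then ngroups is not returned, which is the trade-off under discussion.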