Hi Lance, I don't understand enough of how the field collapsing is implemented, but I thought it worked with distributed search. Are you saying it only works if everything that needs collapsing is on the same shard?
Tom

On Wed, Aug 22, 2012 at 2:41 AM, Lance Norskog <goks...@gmail.com> wrote:

> How do you separate the documents among the shards? Can you set up the
> shards such that one "collapse group" is only on a single shard? That
> you never have to do distributed grouping?
>
> On Tue, Aug 21, 2012 at 4:10 PM, Tirthankar Chatterjee
> <tchatter...@commvault.com> wrote:
> > This won't work, see my thread on Solr3.6 Field collapsing
> > Thanks,
> > Tirthankar
> >
> > -----Original Message-----
> > From: Tom Burton-West <tburt...@umich.edu>
> > Date: Tue, 21 Aug 2012 18:39:25
> > To: solr-user@lucene.apache.org<solr-user@lucene.apache.org>
> > Reply-To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
> > Cc: William Dueber<dueb...@umich.edu>; Phillip Farber<pfar...@umich.edu>
> > Subject: Scalability of Solr Result Grouping/Field Collapsing:
> > Millions/Billions of documents?
> >
> > Hello all,
> >
> > We are thinking about using Solr Field Collapsing on a rather large scale
> > and wonder if anyone has experience with performance when doing Field
> > Collapsing on millions or billions of documents (details below). Are
> > there performance issues with grouping large result sets?
> >
> > Details:
> > We have a collection of the full text of 10 million books/journals. This
> > is spread across 12 shards with each shard holding about 800,000
> > documents. When a query matches a journal article, we would like to group
> > all the matching articles from the same journal together (there is a
> > unique id field identifying the journal). Similarly, when there is a match
> > in multiple copies of the same book, we would like to group all results for
> > the same book together (again, we have a unique id field we can group on).
> > Sometimes a short query against the OCR field will result in over one
> > million hits. Are there known performance issues when field collapsing
> > result sets containing a million hits?
> >
> > We currently index the entire book as one Solr document. We would like to
> > investigate the feasibility of indexing each page as a Solr document with a
> > field indicating the book id. We could then offer our users the choice of
> > a list of the most relevant pages, or a list of the books containing the
> > most relevant pages. We have approximately 3 billion pages. Does anyone
> > have experience using field collapsing on this sort of scale?
> >
> > Tom
> >
> > Tom Burton-West
> > Information Retrieval Programmer
> > Digital Library Production Service
> > University of Michigan Library
> > http://www.hathitrust.org/blogs/large-scale-search
> > ******************Legal Disclaimer***************************
> > "This communication may contain confidential and privileged
> > material for the sole use of the intended recipient. Any
> > unauthorized review, use or distribution by others is strictly
> > prohibited. If you have received the message in error, please
> > advise the sender by reply email and delete the message. Thank
> > you."
> > *********************************************************
>
> --
> Lance Norskog
> goks...@gmail.com
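
Lance's suggestion of keeping each "collapse group" on a single shard can be sketched roughly as below. This is only an illustration of the routing idea, not Solr code: the function names shard_for and partition, the journal_id field name, and the choice of an MD5-based hash are all assumptions made for the example. Only the shard count of 12 comes from the thread.

```python
import hashlib
from collections import defaultdict

NUM_SHARDS = 12  # shard count mentioned in the thread


def shard_for(group_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a group key (hypothetical journal/book id) to a shard index.

    A stable hash is used so the same key always maps to the same shard,
    regardless of process or machine.
    """
    digest = hashlib.md5(group_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def partition(docs, group_field="journal_id", num_shards=NUM_SHARDS):
    """Assign each document to a shard based on its group field.

    All documents sharing a group key end up in the same shard, so
    grouping on that field never has to merge one group across shards.
    """
    shards = defaultdict(list)
    for doc in docs:
        shards[shard_for(doc[group_field], num_shards)].append(doc)
    return shards


if __name__ == "__main__":
    docs = [
        {"id": "a1", "journal_id": "j1"},
        {"id": "a2", "journal_id": "j1"},
        {"id": "b1", "journal_id": "j2"},
    ]
    for shard, members in sorted(partition(docs).items()):
        print(shard, [d["id"] for d in members])
```

One trade-off of routing this way is that shard sizes depend on how evenly the group keys hash and on how large individual groups are, so a few very large journals could unbalance the shards.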