This won't work; see my thread on Solr 3.6 field collapsing.
Thanks,
Tirthankar

-----Original Message-----
From: Tom Burton-West <tburt...@umich.edu>
Date: Tue, 21 Aug 2012 18:39:25 
To: solr-user@lucene.apache.org<solr-user@lucene.apache.org>
Reply-To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
Cc: William Dueber<dueb...@umich.edu>; Phillip Farber<pfar...@umich.edu>
Subject: Scalability of Solr Result Grouping/Field Collapsing:
 Millions/Billions of documents?

Hello all,

We are thinking about using Solr Field Collapsing on a rather large scale
and wonder if anyone has experience with performance when doing Field
Collapsing on millions or billions of documents (details below).  Are
there performance issues with grouping large result sets?

Details:
We have a collection of the full text of 10 million books/journals.  This
is spread across 12 shards with each shard holding about 800,000
documents.  When a query matches a journal article, we would like to group
all the matching articles from the same journal together (there is a
unique id field identifying the journal).  Similarly, when there is a match
in multiple copies of the same book, we would like to group all results for
the same book together (again, we have a unique id field we can group on).
Sometimes a short query against the OCR field will result in over one
million hits.  Are there known performance issues when field collapsing
result sets containing a million hits?
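For concreteness, a grouped request of the kind described above might be built roughly as follows. This is a minimal sketch, not our actual setup: the field name "journal_id" and the query "ocr:whale" are hypothetical placeholders, and only standard Solr result-grouping parameters (group, group.field, group.limit, group.ngroups) are used.

```python
from urllib.parse import urlencode

def grouped_query_params(q, group_field, group_limit=3, rows=10):
    """Build a Solr result-grouping query string.

    group_field is a hypothetical field name (e.g. "journal_id");
    substitute whatever unique id field the schema actually uses.
    """
    params = [
        ("q", q),
        ("group", "true"),                  # turn on result grouping
        ("group.field", group_field),       # collapse hits sharing this value
        ("group.limit", str(group_limit)),  # hits returned inside each group
        ("group.ngroups", "true"),          # also report the total group count
        ("rows", str(rows)),                # with grouping on, this counts groups
    ]
    return urlencode(params)

print(grouped_query_params("ocr:whale", "journal_id"))
```

Note that group.ngroups asks Solr to count every distinct group in the full result set, which is exactly the part we would expect to get expensive when a short query matches a million documents.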

We currently index the entire book as one Solr document.  We would like to
investigate the feasibility of indexing each page as a Solr document with a
field indicating the book id.  We could then offer our users the choice of
a list of the most relevant pages, or a list of the books containing the
most relevant pages.  We have approximately 3 billion pages.   Does anyone
have experience using field collapsing on this sort of scale?
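To illustrate the page-level idea, here is a rough sketch under assumed names: the fields "id", "book_id" and "ocr", and the "book.page" id format, are hypothetical and do not come from our current schema. Each page becomes its own document, and the "most relevant pages" and "books containing the most relevant pages" views differ only in whether grouping on book_id is switched on. (At roughly 3 billion pages for 10 million volumes, that averages about 300 page documents per current book document.)

```python
from urllib.parse import urlencode

def page_doc(book_id, page_number, text):
    # Hypothetical schema: one Solr document per page; book_id links
    # the page back to the book it came from.
    return {"id": f"{book_id}.{page_number}", "book_id": book_id, "ocr": text}

def search_params(q, by_book, rows=10):
    # Plain query: a ranked list of individual pages.
    params = [("q", q), ("rows", str(rows))]
    if by_book:
        # Grouped query: a ranked list of books, each with its top pages.
        params += [
            ("group", "true"),
            ("group.field", "book_id"),
            ("group.limit", "3"),
        ]
    return urlencode(params)

doc = page_doc("b123", 7, "call me ishmael")
print(search_params("whale", by_book=False))
print(search_params("whale", by_book=True))
```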

Tom

Tom Burton-West
Information Retrieval Programmer
Digital Library Production Service
University of Michigan Library
http://www.hathitrust.org/blogs/large-scale-search
