I have a Solr index that is going to grow 3x in the near future.  I'm
considering using distributed search and was contemplating what would
be the best approach to splitting the index.  Since most of the
searches performed on the index are sorted by date descending, I'm
considering splitting the index based on the created date of the
documents.
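
To show what I mean by splitting on created date, here is a minimal
Python sketch.  The function name, the shard labels, and the
one-shard-per-year cutoff are placeholders I made up for illustration;
nothing here is Solr API.

```python
from datetime import date

def shard_for(created: date, current_year: int) -> str:
    """Pick a shard label for a document from its created date.

    Hypothetical rule: documents from the current year go to one shard,
    everything older goes to the other.
    """
    return "current" if created.year >= current_year else "previous"
```

The indexing code would call something like this once per document to
decide which shard receives it.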

From Yonik Seeley's blog post,
http://yonik.wordpress.com/2008/02/27/distributed-search-for-solr/,
I've read that a distributed query runs in two phases.  The first
phase collects the ids of matching documents from each shard.  The
second phase then fetches the stored fields for those documents.  I'm
assuming that the second phase only runs for documents that make it
into the final result set, so its cost is bounded by the number of
rows requested and the number of matches.

So let's say I have 2 shards.  The first shard has docs with creation
dates from this year.  The second shard contains documents from the
previous year.  I run a Solr query requesting 10 rows sorted by date
descending and get 11 matches from the first shard and 3 from the
second.  Will the query execute only the first phase on the second
shard and skip the second phase there?  If so, that should result in
better performance, right?
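
To make the question concrete, here is a rough Python sketch of how I
picture the two phases.  It is a toy model of my assumption, not
Solr's actual code: the shard names, doc ids, and day numbers are
invented, and each shard is just a dict of doc id to created day
(larger means newer).

```python
import heapq
from collections import defaultdict

def phase_one(shards, rows):
    """Ask every shard for its top `rows` ids and sort values, then merge."""
    candidates = []
    for name, docs in shards.items():
        # Each shard returns only (id, created) pairs, newest first.
        top = heapq.nlargest(rows, docs.items(), key=lambda kv: kv[1])
        candidates.extend((created, doc_id, name) for doc_id, created in top)
    # Keep the overall newest `rows` candidates across all shards.
    return heapq.nlargest(rows, candidates)

def phase_two_targets(merged):
    """Group the merged ids by shard; only these shards need a second request."""
    targets = defaultdict(list)
    for _created, doc_id, name in merged:
        targets[name].append(doc_id)
    return dict(targets)

# The scenario from this mail: 11 matches this year, 3 last year.
shards = {
    "this_year": {f"a{i}": 400 + i for i in range(11)},
    "last_year": {f"b{i}": 100 + i for i in range(3)},
}
merged = phase_one(shards, rows=10)
targets = phase_two_targets(merged)
```

In this model both shards still answer the first phase, but `targets`
ends up containing only the first shard, so the second shard would
never be asked for stored fields.  Whether Solr really behaves this
way is what I'm asking.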


Thanks,
-Tim
