Re: Sharding strategy

Otis Gospodnetic Tue, 09 Jun 2009 22:51:53 -0700

Aleksander,

In a sense you are lucky to have time-ordered data.  That makes it very easy 
to shard and cheaper to search, because you know exactly which shards you need 
to query.  The beginning-of-the-year situation should also be easy: start with 
the latest shard for the current year, and go to the next shard only if you 
have to (e.g. if you don't get enough results from the first shard).
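
To make that concrete, here is a rough sketch in Python, not a definitive 
implementation.  Everything named in it is an assumption for illustration: the 
one-shard-per-year layout, the example.com hosts, and the search_shard 
callable, which stands in for whatever client code you use to query a single 
shard.  It also assumes results are sorted by date, newest first, so hits from 
a newer shard always rank ahead of hits from an older one.

```python
from datetime import date
from typing import Callable, Dict, List

# Hypothetical layout: one shard per year (hosts are placeholders).
SHARDS: Dict[int, str] = {
    2007: "solr-2007.example.com:8983/solr",
    2008: "solr-2008.example.com:8983/solr",
    2009: "solr-2009.example.com:8983/solr",
}


def shards_for_range(start: date, end: date,
                     shards: Dict[int, str] = SHARDS) -> List[str]:
    """Return only the shards whose year overlaps [start, end], newest first."""
    years = range(end.year, start.year - 1, -1)
    return [shards[y] for y in years if y in shards]


def search_newest_first(query: str, wanted: int, candidates: List[str],
                        search_shard: Callable[[str, str, int], List[str]]
                        ) -> List[str]:
    """Query shards newest first; move to an older shard only if still short.

    search_shard(shard, query, limit) is a hypothetical callable returning up
    to `limit` hits from one shard, already sorted newest first.
    """
    hits: List[str] = []
    for shard in candidates:
        hits.extend(search_shard(shard, query, wanted - len(hits)))
        if len(hits) >= wanted:
            break  # enough results, older shards are never touched
    return hits[:wanted]
```

In early January a request for 10 results would hit the 2009 shard, come back 
short, and then top up from the 2008 shard; later in the year the 2009 shard 
alone should usually be enough.  If you would rather query several shards in 
one request, Solr's distributed search accepts a comma-separated list in the 
"shards" request parameter.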


 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



----- Original Message ----
> From: Aleksander M. Stensby <aleksander.sten...@integrasco.no>
> To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
> Sent: Tuesday, June 9, 2009 7:07:47 AM
> Subject: Sharding strategy
> 
> Hi all,
> I'm trying to figure out how to shard our index as it is growing rapidly and 
> we want to make our solution scalable.
> So, we have documents that are most commonly sorted by their date. My initial 
> thought is to shard the index by date, but I wonder if you have any input on 
> this and how to best solve this...
> 
> I know that the most frequent queries will be executed against the "latest" 
> shard, but then let's say we shard by year, how do we best solve the 
> situation that will occur in the beginning of a new year? (Some of the data 
> will be in the last shard, but most of it will be on the second last shard.)
> 
> Would it be stupid to have a "latest" shard with duplicate data (always 
> consisting of the last 6 months or something like that) and maintain that 
> index in addition to the regular yearly shards? Anyone else facing a similar 
> situation with a good solution?
> 
> Any input would be greatly appreciated :)
> 
> Cheers,
> Aleksander
> 
> 
> 
> --Aleksander M. Stensby
> Lead software developer and system architect
> Integrasco A/S
> www.integrasco.no
> http://twitter.com/Integrasco
> 
> Please consider the environment before printing all or any of this e-mail
