Re: huge shards (300GB each) and load balancing

Upayavira Wed, 08 Jun 2011 02:32:56 -0700


On Wed, 08 Jun 2011 10:42 +0300, "Dmitry Kan" <dmitry....@gmail.com>
wrote:
> Hello list,
> 
> Thanks for attending to my previous questions so far, have learnt a lot.
> Here is another one, I hope it will be interesting to answer.
> 
> 
> 
> We run our SOLR shards and front end SOLR on the Amazon high-end
> machines. Currently we have 6 shards with around 200GB in each.
> Currently we have only one front end SOLR which, given a client query,
> redirects it to all the shards. Our shards are constantly growing, data
> is at times reindexed (in batches, which is done by removing a decent
> chunk before replacing it with updated data), constant stream of new
> data is coming every hour (usually hits the latest shard in time, but
> can also hit other shards, which have older data). Since the front end
> SOLR has started to be a SPOF, we are thinking about setting up some
> sort of load balancer.
> 
> 1) do you think ELB from Amazon is a good solution for starters? We don't
> need to maintain sessions between SOLR and client.
> 2) What other load balancers have been used specifically with SOLR?
> 
> 
> Overall: does SOLR scale to such size (200GB in an index) and what can be
> recommended as next step -- resharding (cutting existing shards to
> smaller chunks), replication?


Really, it is going to be up to you to work out what works in your
situation. You may be reaching the limit of what a Lucene index can
handle; I don't know. If your query traffic is low, you might find that
two 100GB cores in a single instance perform better. But then, maybe
not! Or two 100GB shards on smaller Amazon hosts. But then, maybe not!
:-)
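
To make the comparison concrete: with Solr's distributed search, the
front end is told which shards to query through the `shards` request
parameter, so trying two smaller cores instead of one big shard mostly
means changing that list. A minimal sketch, assuming hypothetical host
and core names (none of them come from this thread):

```python
from urllib.parse import urlencode

def distributed_query_url(front_end, shards, query, rows=10):
    """Build a Solr select URL that fans the query out to the given shards.

    `front_end` and every entry in `shards` are hypothetical addresses.
    Shard entries are written without the http:// scheme, which is the
    form Solr's `shards` parameter takes.
    """
    params = {"q": query, "rows": rows, "shards": ",".join(shards)}
    return "http://%s/select?%s" % (front_end, urlencode(params))

# One large shard on a host (hypothetical layout).
big = ["shard1:8983/solr"]
# The same data split into two smaller cores on the same instance.
split = ["shard1:8983/solr/core0", "shard1:8983/solr/core1"]
```

Timing the same queries against both layouts would be one way to find
out which performs better for your traffic.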

The principal issue with Amazon's load balancers (at least when I was
using them last year) is that the ports that they balance need to be
public. You can't use an Amazon load balancer as an internal service
within a security group. For a service such as Solr, that can be a bit
of a killer.

If they've fixed that issue, then they'd work fine (I used them quite
happily in another scenario).

When looking at resolving single points of failure, handling search is
pretty easy (as you say, a stateless load balancer will do). You will
need to give more attention, though, to how you handle indexing.
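
Because searches are stateless, any front end can answer any query, so
the balancing logic on the search side can be as simple as round-robin
with failover. A rough client-side sketch, assuming hypothetical
front-end URLs and a caller-supplied `fetch` function (neither is from
this thread; in practice a proxy or load balancer would more likely do
this job):

```python
import itertools

class RoundRobinSearcher:
    """Rotate queries across front ends, skipping ones that fail."""

    def __init__(self, front_ends, fetch):
        self.front_ends = list(front_ends)
        self.fetch = fetch  # hypothetical callable: fetch(url) -> response
        self._cycle = itertools.cycle(range(len(self.front_ends)))

    def search(self, path_and_query):
        last_error = None
        # Try each front end at most once per query.
        for _ in range(len(self.front_ends)):
            base = self.front_ends[next(self._cycle)]
            try:
                return self.fetch(base + path_and_query)
            except OSError as err:  # connection refused, timeout, etc.
                last_error = err
        raise RuntimeError("all front ends failed") from last_error
```

Nothing here carries over to indexing: an update has to reach the
right shard exactly once, so it cannot simply be retried against
whichever host happens to be up.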

Hope that helps a bit!

Upayavira





--- 
Enterprise Search Consultant at Sourcesense UK, 
Making Sense of Open Source
