Note that the number of ZooKeeper nodes is independent of the number of shards.
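
To make that concrete, here is a minimal sketch of the majority-quorum arithmetic that determines how many ZooKeeper nodes an ensemble can lose. The function names are mine, purely for illustration; the only input is the ensemble size, and shard count does not appear anywhere.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of a ZooKeeper ensemble."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """Number of ZooKeeper nodes that can fail while a majority remains."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    # Illustrative ensemble sizes only.
    for n in (1, 3, 4, 5):
        print(f"ensemble={n} quorum={quorum_size(n)} "
              f"tolerated_failures={tolerated_failures(n)}")
```

A 3-node ensemble survives one failure, and a 4-node ensemble still survives only one, which is why odd ensemble sizes are the usual choice.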

Otis
--
SOLR Performance Monitoring - http://sematext.com/spm
On Nov 22, 2012 4:19 AM, "Luis Cappa Banda" <luisca...@gmail.com> wrote:

> Hello,
>
> I've been dealing with the same question these days. In architecture terms,
> it's always better to separate services (Solr and Zookeeper, in this case)
> rather than keep them on a single instance. However, when we have to deal
> with cost constraints, all of us are quite limited and must choose the
> best trade-off between architecture, scalability and single points of
> failure. As I see it, the options are:
>
>
> *1. *Solr servers with Zookeeper embedded.
> *2. *Solr servers with external Zookeeper.
> *3.* Solr servers with external Zookeeper ensemble.
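
For the third option, a rough sketch of the two pieces of configuration that tie things together may help: the `server.N` lines that every ZooKeeper node lists in its `zoo.cfg`, and the `zkHost` string passed to each Solr node (for example via `-DzkHost=...`). The hostnames below are hypothetical, and the ports are the conventional defaults (2181 for clients, 2888 and 3888 for peer traffic and leader election); adjust both to your setup.

```python
# Hypothetical hostnames, for illustration only.
ZK_HOSTS = ["zk1.example.com", "zk2.example.com", "zk3.example.com"]
CLIENT_PORT = 2181  # conventional ZooKeeper client port


def zoo_cfg_server_lines(hosts):
    """Build the server.N entries that each ensemble member lists in zoo.cfg."""
    return [f"server.{i}={host}:2888:3888" for i, host in enumerate(hosts, start=1)]


def zk_host_string(hosts, port=CLIENT_PORT):
    """Build the comma-separated connection string given to Solr as zkHost."""
    return ",".join(f"{host}:{port}" for host in hosts)


if __name__ == "__main__":
    print("\n".join(zoo_cfg_server_lines(ZK_HOSTS)))
    print("-DzkHost=" + zk_host_string(ZK_HOSTS))
```

Each ZooKeeper node also needs a `myid` file in its data directory whose number matches its own `server.N` entry.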
>
> *Note*: as far as I know, the recommended number of Zookeeper services to
> avoid single points of failure is:* ZkNum = 2 * Numshards - 1*. If you have
>
>
> The best option is the third one. Reasons:
>
> *1. *If one of your Solr servers goes down, the Zookeeper services are still up.
> *2.* If one of your Zookeeper services goes down, the Solr servers and the rest
> of the Zookeeper services are still up.
>
> Considering that option, we have two ways to implement it in production:
>
> *1. *Each service (Solr and Zookeeper) on separate machines. Let's imagine
> that we have 2 shards for a given collection, so we need at least 4 Solr
> servers to complete the leader-replica configuration. The best option is to
> deploy them on four Amazon instances, one per server. We need at least
> 3 Zookeeper services in a Zookeeper ensemble configuration. The optimal way
> to install them is on separate machines (a micro instance will be fine for
> Zookeeper), so we will have 7 Amazon instances. The reason is that if one
> machine goes down (a Solr or a Zookeeper one) the other services will still
> be up and your production environment will be safe. *For me this is the
> best case, but it's also the most expensive one*, so in my case it is
> impossible to implement.
>
> *2. *As we need at least 4 Solr servers and 3 Zookeeper services up, I
> would set up three Amazon instances with both Solr and Zookeeper, and one
> with Solr only. So we'll have: 3 combined Amazon instances (Solr +
> Zookeeper) and 1 Solr-only Amazon instance. If any one of them goes
> down, the production environment will be safe. This architecture is not the
> best one, as I said, but I think it is the optimal trade-off between
> robustness, single points of failure and cost.
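
The claim that this layout survives the loss of any one instance can be checked with a small sketch. The instance names and the shard-to-instance assignment below are assumptions of mine (the mail above does not say which Solr node holds which shard); the check only asks whether a ZooKeeper majority and at least one replica of every shard remain.

```python
from itertools import combinations

# Hypothetical layout for the 3 combined + 1 Solr-only setup.
# Which shard each Solr node serves is an assumption made for illustration.
LAYOUT = {
    "i-1": {"zk": True, "shard": "shard1"},
    "i-2": {"zk": True, "shard": "shard1"},
    "i-3": {"zk": True, "shard": "shard2"},
    "i-4": {"zk": False, "shard": "shard2"},
}


def survives(layout, failed):
    """True if a ZooKeeper majority and every shard remain after `failed` instances go down."""
    alive = {name: node for name, node in layout.items() if name not in failed}
    zk_total = sum(1 for node in layout.values() if node["zk"])
    zk_alive = sum(1 for node in alive.values() if node["zk"])
    has_quorum = zk_alive > zk_total // 2
    all_shards = {node["shard"] for node in layout.values()}
    live_shards = {node["shard"] for node in alive.values()}
    return has_quorum and live_shards == all_shards


if __name__ == "__main__":
    for count in (1, 2):
        for failed in combinations(LAYOUT, count):
            print(failed, "OK" if survives(LAYOUT, set(failed)) else "DOWN")
```

Under these assumptions every single-instance failure is survivable, while most two-instance failures are not, which matches the worst case mentioned further down the thread.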
>
>
> It would be a pleasure to hear suggestions from other people who have
> dealt with this kind of issue.
>
> Regards,
>
>
> - Luis Cappa.
>
>
> 2012/11/21 Marcin Rzewucki <mrzewu...@gmail.com>
>
> > Yes, I meant the same (not -zkRun). However, I was asking whether it is
> > safe to have zookeeper and solr processes running on the same node, or
> > whether it is better to run them on different machines.
> >
> > On 21 November 2012 21:18, Rafał Kuć <r....@solr.pl> wrote:
> >
> > > Hello!
> > >
> > > As I said, I wouldn't use the Zookeeper that is embedded in Solr, but
> > > would rather set up a standalone one.
> > >
> > > --
> > > Regards,
> > >  Rafał Kuć
> > >  Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch
> > >
> > > > First of all: thank you for your answers. Yes, I meant a side-by-side
> > > > configuration. I think the worst case for the ZKs here is to lose two
> > > > of them. However, I'm going to use 4 availability zones in the same
> > > > region, so at least this will reduce the risk of losing both of them
> > > > at the same time.
> > > > Regards.
> > >
> > > > On 21 November 2012 17:06, Rafał Kuć <r....@solr.pl> wrote:
> > >
> > > >> Hello!
> > > >>
> > > >> Zookeeper by itself is not demanding, but if something happens to
> > > >> your nodes that have Solr on them, you'll lose ZooKeeper too if you
> > > >> have them installed side by side. However, if you have 4 Solr nodes
> > > >> and 3 ZK instances, you can run them side by side.
> > > >>
> > > >> --
> > > >> Regards,
> > > >>  Rafał Kuć
> > > >>  Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch
> > > >>
> > > >> > Separate is generally nice because then you can restart Solr nodes
> > > >> > without consideration for ZooKeeper.
> > > >>
> > > >> > Performance-wise, I doubt it's a big deal either way.
> > > >>
> > > >> > - Mark
> > > >>
> > > >> > On Nov 21, 2012, at 8:54 AM, Marcin Rzewucki <mrzewu...@gmail.com> wrote:
> > > >>
> > > >> >> Hi,
> > > >> >>
> > > >> >> I have 4 Solr collections, 2-3 million documents per collection, up to
> > > >> >> 100K updates per collection daily (roughly). I'm going to create a
> > > >> >> SolrCloud4x cluster on Amazon's m1.large instances (7GB mem, 2x2.4GHz
> > > >> >> cpu each). The question is: what about zookeeper? It's going to be an
> > > >> >> external ensemble, but is it better to use the same nodes as solr or
> > > >> >> dedicated micro instances? Zookeeper does not seem to be a
> > > >> >> resource-demanding process, but what would be better in this case:
> > > >> >> keeping it inside the solrcloud nodes or separately (micro instances
> > > >> >> seem to be enough here)?
> > > >> >>
> > > >> >> Thanks in advance.
> > > >> >> Regards.
> > > >>
> > > >>
> > >
> > >
> >
>
>
>
> --
>
> - Luis Cappa
>
