In short, I'm afraid I have to agree with your IT guy.

I like SolrCloud, it's waaaay cool. But in your situation I really
can't say it's compelling.

The places SolrCloud shines: automatically routing docs to shards...
You're not sharding.
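For anyone curious what "automatically routing docs to shards" means, the idea is sketched below. This is illustrative only: real SolrCloud hashes a compositeId with murmur3 and assigns hash *ranges* to shards, but the principle is the same -- the cluster, not your client, decides where each doc lives.

```python
import hashlib

def route_to_shard(doc_id: str, num_shards: int) -> int:
    """Map a document id to a shard by hashing the id.

    Illustrative sketch, not Solr's actual implementation: SolrCloud
    uses murmur3 over a compositeId and hash ranges per shard, but
    either way the same id always lands on the same shard.
    """
    digest = hashlib.md5(doc_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

# The same id always routes to the same shard:
assert route_to_shard("doc-42", 4) == route_to_shard("doc-42", 4)
```

With a single unsharded index, none of this machinery buys you anything.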

Automatically electing a new leader (analogous to master)... You
don't care, since the pain of reindexing is so small.

Not losing data when a leader/master goes down during indexing... You
don't care since you can reindex quickly and you're indexing so
rarely.

In fact, I'd also optimize the index, something I rarely recommend.

Even the argument that you get to use all your nodes for searching
doesn't really pertain, since you can index on one node and then just
copy the index to all your nodes; you could get by without even
configuring master/slave. Or, as you say, just index to all your Solr
nodes simultaneously.
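The "index to all your nodes simultaneously" option is about as simple as it sounds: post the same update batch to each standalone node's /update endpoint. A minimal sketch (hostnames and the `post` helper are placeholders -- in practice you'd post via urllib/requests, SolrJ, or the bin/post tool):

```python
def update_urls(nodes, collection):
    """Build the /update endpoint URL for each standalone Solr node."""
    return ["http://%s/solr/%s/update" % (node, collection) for node in nodes]

def index_everywhere(nodes, collection, docs, post):
    """Send the same batch of docs to every node.

    `post` is whatever HTTP helper you use (urllib, requests, ...);
    it's injected here so the sketch stays self-contained.
    """
    for url in update_urls(nodes, collection):
        post(url, docs)

# Example: record which endpoints would be hit for a two-node setup.
hit = []
index_everywhere(["solr1:8983", "solr2:8983"], "statics",
                 [{"id": "1"}], lambda url, docs: hit.append(url))
```

Since the data changes 2-3 times a year, there's little to coordinate: every node ends up with an identical index.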

About the only downside is that you've got to create your Solr nodes
independently, making sure the proper configurations are on each one,
etc., but even if those change 2-3 times a year it's hardly onerous.

You _are_ getting all the latest and greatest indexing and search
improvements, all the SolrCloud stuff is built on top of exactly the
Solr you'd get without using SolrCloud.

And finally, there is certainly a learning curve to SolrCloud,
particularly in this case the care and feeding of Zookeeper.

The instant you need shards, the argument changes quite dramatically.
The argument changes somewhat under significant indexing loads. The
argument totally changes if you need low latency. It doesn't sound
like your situation is sensitive to any of these though....

Best,
Erick

On Apr 18, 2016 10:41 AM, "John Bickerstaff" <j...@johnbickerstaff.com> wrote:
>
> Nice - thanks Daniel.
>
> On Mon, Apr 18, 2016 at 11:38 AM, Davis, Daniel (NIH/NLM) [C] <
> daniel.da...@nih.gov> wrote:
>
> > One thing I like about SolrCloud is that I don't have to configure
> > Master/Slave replication in each "core" the same way to get them to
> > replicate.
> >
> > The other thing I like about SolrCloud, which is largely theoretical at
> > this point, is that I don't need to test changes to a collection's
> > configuration by bringing up a whole new solr on a whole new server -
> > SolrCloud already virtualizes this, and so I can make up a random
> > collection name that doesn't conflict, and create the thing, and smoke test
> > with it.   I know that standard practice is to bring up all new nodes, but
> > I don't see why this is needed.
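For reference, the per-core master/slave replication Daniel is glad to avoid looks roughly like this in each core's solrconfig.xml (hostname, core name, and conf files below are placeholders):

```xml
<!-- On the master (indexing) node -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>

<!-- On each slave (search) node -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/corename</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```

Multiply that by every core on every slave and it's easy to see why having SolrCloud handle replication for you is appealing -- though in John's static-data case it's also easy to skip entirely.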
> >
> > -----Original Message-----
> > From: John Bickerstaff [mailto:j...@johnbickerstaff.com]
> > Sent: Monday, April 18, 2016 1:23 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Verifying - SOLR Cloud replaces load balancer?
> >
> > So - my IT guy makes the case that we don't really need Zookeeper / Solr
> > Cloud...
> >
> > He may be right - we're serving static data (changes to the collection
> > occur only 2 or 3 times a year and are minor)
> >
> > We probably could have 3 or 4 Solr nodes running in non-Cloud mode -- each
> > configured the same way, behind a load balancer and do fine.
> >
> > I've got a Kafka server set up with the solr docs as topics.  It takes
> > about 10 minutes to reload a "blank" Solr Server from the Kafka topic...
> > If I target 3-4 SOLR servers from my microservice instead of one, it
> > wouldn't take much longer than 10 minutes to concurrently reload all 3 or 4
> > Solr servers from scratch...
> >
> > I'm biased in terms of using the most recent functionality, but I'm aware
> > that bias is not necessarily based on facts and want to do my due
> > diligence...
> >
> > Aside from the obvious benefits of spreading work across nodes (which may
> > not be a big deal in our application and which my IT guy proposes is more
> > transparently handled with a load balancer he understands) are there any
> > other considerations that would drive a choice for Solr Cloud (zookeeper
> > etc)?
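The non-Cloud alternative John describes -- three or four identically configured standalone nodes behind a load balancer -- can be sketched with, e.g., an nginx upstream block (hostnames are placeholders; any load balancer the IT team already knows works the same way):

```nginx
upstream solr_nodes {
    # The 3-4 identically configured standalone Solr nodes
    server solr1.example.com:8983;
    server solr2.example.com:8983;
    server solr3.example.com:8983;
}

server {
    listen 80;
    location /solr/ {
        proxy_pass http://solr_nodes;
    }
}
```

Health checks on the balancer then handle node failure, which is most of what SolrCloud's query-side fault tolerance would buy here.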
> >
> >
> >
> > On Mon, Apr 18, 2016 at 9:26 AM, Tom Evans <tevans...@googlemail.com>
> > wrote:
> >
> > > On Mon, Apr 18, 2016 at 3:52 PM, John Bickerstaff
> > > <j...@johnbickerstaff.com> wrote:
> > > > Thanks all - very helpful.
> > > >
> > > > @Shawn - your reply implies that even if I'm hitting the URL for a
> > > > single endpoint via HTTP - the "balancing" will still occur across
> > > > the Solr Cloud (I understand the caveat about that single endpoint
> > > > being a potential point of failure).  I just want to verify that
> > > > I'm interpreting your response correctly...
> > > >
> > > > (I have been asked to provide IT with a comprehensive list of
> > > > options prior to a design discussion - which is why I'm trying to
> > > > get clear about the various options)
> > > >
> > > > In a nutshell, I think I understand the following:
> > > >
> > > > a. Even if hitting a single URL, the Solr Cloud will "balance"
> > > > across all available nodes for searching
> > > >           Caveat: That single URL represents a potential single
> > > > point of failure and this should be taken into account
> > > >
> > > > b. SolrJ's CloudSolrClient API provides the ability to distribute
> > > > load -- based on Zookeeper's "knowledge" of all available Solr
> > > > instances.
> > > >           Note: This is more robust than "a" due to the fact that
> > > > it eliminates the "single point of failure"
> > > >
> > > > c.  Use of a load balancer hitting all known Solr instances will
> > > > be fine - although the search requests may not run on the Solr
> > > > instance the load balancer targeted - due to "a" above.
> > > >
> > > > Corrections or refinements welcomed...
> > >
> > > With option a), although queries will be distributed across the
> > > cluster, all queries will be going through that single node. Not only
> > > is that a single point of failure, but you risk saturating the
> > > inter-node network traffic, possibly resulting in lower QPS and higher
> > > latency on your queries.
> > >
> > > With option b), as well as SolrJ, recent versions of pysolr have a
> > > ZK-aware SolrCloud client that behaves in a similar way.
> > >
> > > With option c), you can use the preferLocalShards parameter so that
> > > shards local to the queried node are used in preference to
> > > distributed shards. Depending on your shard/cluster topology, this
> > > can increase performance if you are returning large amounts of data
> > > - many or large fields, or many documents.
> > >
> > > Cheers
> > >
> > > Tom
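Tom's option c) tweak is just a query parameter. A request against any node of the collection would look something like this (host and collection names are placeholders; preferLocalShards was the name of this knob in Solr at the time of this thread):

```
http://solr1.example.com:8983/solr/mycollection/select?q=*:*&preferLocalShards=true
```

It only matters when the collection is actually sharded across nodes, which -- per the discussion above -- this deployment wouldn't be.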
> > >
> >