Nice - thanks Daniel.

On Mon, Apr 18, 2016 at 11:38 AM, Davis, Daniel (NIH/NLM) [C]
<daniel.da...@nih.gov> wrote:
> One thing I like about SolrCloud is that I don't have to configure
> Master/Slave replication in each "core" the same way to get them to
> replicate.
>
> The other thing I like about SolrCloud, which is largely theoretical at
> this point, is that I don't need to test changes to a collection's
> configuration by bringing up a whole new Solr on a whole new server -
> SolrCloud already virtualizes this, and so I can make up a random
> collection name that doesn't conflict, create the collection, and smoke
> test with it. I know that standard practice is to bring up all new
> nodes, but I don't see why this is needed.
>
> -----Original Message-----
> From: John Bickerstaff [mailto:j...@johnbickerstaff.com]
> Sent: Monday, April 18, 2016 1:23 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Verifying - SOLR Cloud replaces load balancer?
>
> So - my IT guy makes the case that we don't really need ZooKeeper /
> SolrCloud...
>
> He may be right - we're serving static data (changes to the collection
> occur only 2 or 3 times a year and are minor).
>
> We could probably have 3 or 4 Solr nodes running in non-Cloud mode --
> each configured the same way, behind a load balancer -- and do fine.
>
> I've got a Kafka server set up with the Solr docs as topics. It takes
> about 10 minutes to reload a "blank" Solr server from the Kafka topic...
> If I target 3-4 Solr servers from my microservice instead of one, it
> wouldn't take much longer than 10 minutes to concurrently reload all 3
> or 4 Solr servers from scratch...
>
> I'm biased in terms of using the most recent functionality, but I'm
> aware that bias is not necessarily based on facts, and I want to do my
> due diligence...
> Aside from the obvious benefits of spreading work across nodes (which
> may not be a big deal in our application, and which my IT guy proposes
> is more transparently handled with a load balancer he understands), are
> there any other considerations that would drive a choice for SolrCloud
> (ZooKeeper, etc.)?
>
>
> On Mon, Apr 18, 2016 at 9:26 AM, Tom Evans <tevans...@googlemail.com>
> wrote:
>
> > On Mon, Apr 18, 2016 at 3:52 PM, John Bickerstaff
> > <j...@johnbickerstaff.com> wrote:
> > > Thanks all - very helpful.
> > >
> > > @Shawn - your reply implies that even if I'm hitting the URL for a
> > > single endpoint via HTTP, the "balancing" will still occur across
> > > the SolrCloud (I understand the caveat about that single endpoint
> > > being a potential point of failure). I just want to verify that I'm
> > > interpreting your response correctly...
> > >
> > > (I have been asked to provide IT with a comprehensive list of
> > > options prior to a design discussion - which is why I'm trying to
> > > get clear about the various options.)
> > >
> > > In a nutshell, I think I understand the following:
> > >
> > > a. Even if hitting a single URL, SolrCloud will "balance" across
> > > all available nodes for searching.
> > >    Caveat: that single URL represents a potential single point of
> > > failure, and this should be taken into account.
> > >
> > > b. SolrJ's CloudSolrClient API provides the ability to distribute
> > > load, based on ZooKeeper's "knowledge" of all available Solr
> > > instances.
> > >    Note: this is more robust than "a" because it eliminates the
> > > single point of failure.
> > >
> > > c. Use of a load balancer hitting all known Solr instances will be
> > > fine - although the search requests may not run on the Solr
> > > instance the load balancer targeted, due to "a" above.
> > >
> > > Corrections or refinements welcomed...
> > With option a), although queries will be distributed across the
> > cluster, all queries will be going through that single node. Not only
> > is that a single point of failure, but you risk saturating the
> > inter-node network traffic, possibly resulting in lower QPS and
> > higher latency on your queries.
> >
> > With option b), as well as SolrJ, recent versions of pysolr have a
> > ZK-aware SolrCloud client that behaves in a similar way.
> >
> > With option c), you can use the preferLocalShards parameter so that
> > shards that are local to the queried node are used in preference to
> > distributed shards. Depending on your shard/cluster topology, this
> > can increase performance if you are returning large amounts of data -
> > many or large fields, or many documents.
> >
> > Cheers
> >
> > Tom
> >
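Tom's option b) can be sketched with SolrJ's CloudSolrClient. This is a
minimal sketch only: the ZooKeeper hosts and the collection name
"mycollection" are placeholders, and the exact builder signature varies
across SolrJ versions (older releases used `new CloudSolrClient(zkHost)`
directly).

```java
import java.util.Arrays;
import java.util.Optional;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class ZkAwareQuery {
    public static void main(String[] args) throws Exception {
        // Placeholder ZooKeeper ensemble; substitute your own hosts.
        try (CloudSolrClient client = new CloudSolrClient.Builder(
                Arrays.asList("zk1:2181", "zk2:2181", "zk3:2181"),
                Optional.empty()).build()) {
            client.setDefaultCollection("mycollection"); // placeholder name

            // The client watches cluster state in ZooKeeper and routes
            // each request to a live node, so there is no single URL to
            // fail and no external load balancer is required.
            QueryResponse rsp = client.query(new SolrQuery("*:*"));
            System.out.println("numFound: " + rsp.getResults().getNumFound());
        }
    }
}
```

Because the client pulls live-node state from ZooKeeper, nodes can be
added or removed without reconfiguring the application, which is the core
of the "SolrCloud replaces the load balancer" argument.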
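Tom's option c) amounts to setting one extra request parameter. A sketch
with SolrJ follows (over plain HTTP the same flag is just
&preferLocalShards=true on the query URL; note that later Solr releases
deprecate preferLocalShards in favor of the shards.preference parameter):

```java
import org.apache.solr.client.solrj.SolrQuery;

public class PreferLocalShards {
    public static void main(String[] args) {
        SolrQuery q = new SolrQuery("*:*");
        q.setRows(100);

        // Ask the node that received the query to favor its own shard
        // replicas, reducing inter-node traffic for large responses.
        q.set("preferLocalShards", true);

        System.out.println(q); // prints the assembled parameter string
    }
}
```

Whether this helps depends on topology: it only pays off when the queried
node actually hosts replicas of the shards being searched and the
responses are large (many documents, or many/large fields).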
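Daniel's smoke-test idea (create a throwaway collection under a
non-conflicting name, test against it, then drop it) can be sketched with
the Collections API via SolrJ. The ZooKeeper host and the configset name
"myconfig" are placeholders, and the static factory methods shown here
belong to later SolrJ releases (older versions used
`new CollectionAdminRequest.Create()` with setters).

```java
import java.util.Arrays;
import java.util.Optional;

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.request.CollectionAdminRequest;

public class SmokeTestCollection {
    public static void main(String[] args) throws Exception {
        // Placeholder ZooKeeper host; substitute your own ensemble.
        try (CloudSolrClient client = new CloudSolrClient.Builder(
                Arrays.asList("zk1:2181"), Optional.empty()).build()) {

            // A name unlikely to collide with live collections.
            String name = "smoketest_" + System.currentTimeMillis();

            // Create 1 shard x 1 replica from an existing configset.
            CollectionAdminRequest
                    .createCollection(name, "myconfig", 1, 1)
                    .process(client);

            // ... index sample docs and run sanity queries here ...

            // Drop the throwaway collection when the smoke test is done.
            CollectionAdminRequest.deleteCollection(name).process(client);
        }
    }
}
```

This is what "SolrCloud already virtualizes this" buys: the test
collection shares the existing nodes, so no new servers are needed.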