We send batches of updates to a load balancer. The cluster gets the updates to the right leader with very little overhead. When we get an error, we resend the update batch. The load balancer will find a healthy node to receive it. This is simple, robust, and fast.
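Roughly, the pattern looks like this. This is just a Java sketch, not our actual code: the load balancer URL, class names, and the use of JSON batches are made up for illustration, and it includes the one-document-at-a-time fallback described in the tip below.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;

public class BatchUpdater {

    // Hypothetical load balancer endpoint in front of the SolrCloud nodes.
    private static final String UPDATE_URL =
        "http://solr-lb.example.com/solr/my_collection/update";

    private static final HttpClient HTTP = HttpClient.newHttpClient();

    // POST one JSON batch to the update handler and return the HTTP status code.
    static int post(String jsonBatch) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(URI.create(UPDATE_URL))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(jsonBatch))
            .build();
        return HTTP.send(request, HttpResponse.BodyHandlers.ofString()).statusCode();
    }

    // Send a batch. On a 400, resend the documents one at a time so the bad
    // document identifies itself. On any other error, back off and resend the
    // whole batch; the load balancer will pick a healthy node.
    static void sendBatch(List<String> docsAsJson) throws Exception {
        String batch = "[" + String.join(",", docsAsJson) + "]";
        int status = post(batch);
        if (status == 200) {
            return;
        }
        if (status == 400) {
            for (String doc : docsAsJson) {
                if (post("[" + doc + "]") == 400) {
                    System.err.println("Bad document: " + doc);
                }
            }
            return;
        }
        Thread.sleep(5000);
        post(batch);
    }
}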
One handy tip: if a batch fails with a 400, we back off and resend it in
batches of one document each so we can identify the bad one. This saves a ton
of time over manually hunting for the bad document.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Feb 25, 2019, at 1:31 PM, Ganesh Sethuraman <ganeshmail...@gmail.com> wrote:
>
> Thanks for the details and updates. We are looking at load balancers not
> because of the small improvement in performance, but more for high
> availability. The other alternative is: if the update fails on one server
> using curl, we have to call another Solr server on error. I was looking to
> see if there is any way to get the current leader from ZooKeeper before the
> update. Is there a way to query ZooKeeper for that? I understand there is no
> guarantee that the leader won't change during a large CSV file update, but at
> least some protection during planned server restarts can be managed.
>
> Regarding the SolrJ option, it certainly seems to be the best option. Is
> there a Python Solr client that can be leader-aware, like the SolrJ (Java)
> client?
>
> Regards,
> Ganesh
>
> On Mon, Feb 25, 2019 at 3:00 PM Shawn Heisey <apa...@elyograg.org> wrote:
>
>> On 2/25/2019 11:15 AM, Ganesh Sethuraman wrote:
>>> We are using Solr Cloud 7.2.1. We are using the Solr CSV update handler
>>> to do bulk updates (several millions of docs) into multiple collections.
>>> When we make a call to the CSV update handler using curl on the command
>>> line (as below), we are pointing to a single server. When one of the Solr
>>> servers goes down, this approach could fail. Is there any way to send the
>>> write to the leader, like SolrJ does, using simple curl command(s)?
>>
>> The SolrJ client named CloudSolrClient is able to do this because it is
>> a full ZooKeeper client that has instant access to the clusterstate
>> maintained by your Solr servers.
>>
>> To get that capability in any other client would require that the client
>> is aware of the ZooKeeper ensemble in the same way. Curl cannot do this.
>>
>>> In the request below, if for some reason SOLR1-SERVER is down, the
>>> request will fail, even though the new leader, say SOLR2-SERVER, is up.
>>>
>>> curl 'http://<<SOLR1-SERVER>>:8983/solr/my_collection/update?commit=true'
>>> --data-binary @example/exampledocs/books.csv -H
>>> 'Content-type:application/csv'
>>>
>>> 1. I can create a load balancer / ALB in front of Solr, but that still
>>> may not identify the leader for efficiency.
>>
>> A load balancer won't be able to identify the leader unless it is
>> capable of talking to ZooKeeper and knows how Solr represents data in
>> ZK. Have you measured the efficiency improvement that comes from
>> sending to the leader? If that improvement is small, it's probably not
>> worth implementing something that talks to ZooKeeper. I know there are
>> people who don't try to send to leaders and still achieve very fast
>> indexing rates, so I suspect that the improvement obtained by sending to
>> leaders is relatively small.
>>
>>> 2. I can write a SolrJ client to update, but I am not sure if I will get
>>> the efficiency of a bulk update, or the simplicity of curl.
>>
>> SolrJ is probably more efficient than something like curl, because it
>> uses a compact binary format, called javabin, for data transfer in both
>> directions.
>> With curl, you would most likely be using a text format like json, xml,
>> or csv.
>>
>> SolrJ clients are fully thread-safe, which means you can use a single
>> instance to send updates in parallel from multiple threads. That is the
>> best way to achieve good indexing performance with Solr.
>>
>> Thanks,
>> Shawn
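To illustrate Shawn's last point, here is a minimal SolrJ sketch of one shared CloudSolrClient feeding batches from several threads. The ZooKeeper address, collection name, and field names are placeholders, and the builder call assumes a 7.x-era SolrJ; treat it as an outline rather than production code.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class ParallelIndexer {
    public static void main(String[] args) throws Exception {
        // One ZooKeeper-aware, thread-safe client shared by every indexing thread.
        CloudSolrClient client = new CloudSolrClient.Builder()
            .withZkHost("zk1.example.com:2181")
            .build();
        client.setDefaultCollection("my_collection");

        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int t = 0; t < 4; t++) {
            final int threadId = t;
            pool.submit(() -> {
                List<SolrInputDocument> batch = new ArrayList<>();
                for (int i = 0; i < 1000; i++) {
                    SolrInputDocument doc = new SolrInputDocument();
                    doc.addField("id", "doc-" + threadId + "-" + i);
                    doc.addField("title_s", "example " + i);
                    batch.add(doc);
                }
                client.add(batch);  // routed to the correct leader via ZooKeeper state
                return null;        // Callable, so checked exceptions are allowed
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.MINUTES);

        client.commit();
        client.close();
    }
}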