Right, it's a little arcane. But the lockup happens because the various leaders send documents to each other and wait for the responses. If there are a _lot_ of incoming packets to various leaders, this can produce a distributed deadlock. So the shuffling you refer to is the root of the issue.
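To make the forwarding concrete, here is a rough sketch (a toy model, not the SOLR-4816 code). It counts how many documents a leader would have to pass on to another leader when a client sprays documents at arbitrary nodes, versus when the client works out the owning shard first. Everything here is an assumption for illustration: the shard count, the function names, and especially the CRC32 hash, which only stands in for Solr's real hash-range routing.

```python
import zlib
from collections import defaultdict

NUM_SHARDS = 4  # hypothetical cluster size, chosen only for the example


def owning_shard(doc_id, num_shards=NUM_SHARDS):
    """Return the shard that owns doc_id.

    CRC32 modulo the shard count is a stand-in; it is NOT Solr's routing hash.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def count_forwards(doc_ids, pick_receiver):
    """Count documents whose receiving leader is not the owning leader.

    Each such document is one leader-to-leader send that the receiver
    has to wait on, which is the traffic that can pile up into a deadlock.
    """
    forwards = 0
    for i, doc_id in enumerate(doc_ids):
        if pick_receiver(i, doc_id) != owning_shard(doc_id):
            forwards += 1
    return forwards


def round_robin(i, doc_id):
    """Client that ignores ownership and cycles through the nodes."""
    return i % NUM_SHARDS


def route_by_owner(i, doc_id):
    """Client that computes the owning shard and sends straight to its leader."""
    return owning_shard(doc_id)


def batch_by_leader(doc_ids):
    """Group documents into one batch per owning shard, as a routing client might."""
    batches = defaultdict(list)
    for doc_id in doc_ids:
        batches[owning_shard(doc_id)].append(doc_id)
    return batches


doc_ids = ["doc-%d" % n for n in range(1000)]
naive_forwards = count_forwards(doc_ids, round_robin)
routed_forwards = count_forwards(doc_ids, route_by_owner)
batches = batch_by_leader(doc_ids)

print("forwards with round-robin client:", naive_forwards)
print("forwards with routing client:", routed_forwards)
print("batch sizes per shard:", {s: len(b) for s, b in sorted(batches.items())})
```

With the routing client the forward count is zero by construction, so no leader ever waits on another leader for these updates. Clients that don't do this routing still produce forwards, so the server-side condition can still occur for them.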
If the leaders only receive documents for the shard they lead, they won't have to send updates to other leaders and shouldn't hit this condition. But you're right, this situation was first encountered with SolrJ clients sending lots and lots of parallel requests; I don't remember whether it was just one client with lots of threads or many clients.

If you're not using SolrJ, then the patch won't do you much good since it's client-side only.

As far as being a true fix or not, you can look at it as kicking the can down the road. This patch has several advantages:
1> It should pave the way for, and move towards, linear scalability when scaling up to many, many nodes while indexing from SolrJ.
2> It should improve throughput in the normal case as well.
3> Along the way it _should_ significantly lower (perhaps remove entirely) the chance that this deadlock will occur, again when indexing from SolrJ.

If you had a bunch of clients, say, posting CSV files to SolrCloud, I'd guess you'd find this happening again. So it's an improvement, not a perfect cure. But if you think it'd help....

Best,
Erick

On Thu, Aug 22, 2013 at 3:23 PM, allrightname <allrightn...@gmail.com> wrote:

> Erick,
>
> I've read over SOLR-4816 after finding your comment about the server-side
> stack traces showing threads locked up over semaphores and I'm curious how
> that issue cures the problem on the server-side as the patch only includes
> client-side changes. Do the servers get so tied up shuffling documents
> around when they're not sent to the master that they get blocked as
> described? If they do get blocked due to shuffling documents around is a
> client-side fix for this not more of a workaround than a true fix?
>
> I'm entirely willing to apply this patch to all of the code I've got that
> talks to my solr servers and try it out but I'm reluctant to because this
> looks like a client-side fix to a server-side issue.
>
> Thanks,
> Greg
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/updating-docs-in-solr-cloud-hangs-tp4067388p4086160.html
> Sent from the Solr - User mailing list archive at Nabble.com.