Update! This also happens with replicationFactor=1.

Just for kicks, I created a collection with 24 shards and replicationFactor=1 on my existing benchmark environment. Same behaviour: SOLR Cloud just hangs. Nothing in the logs, and top, heap, and CPU metrics mostly look fine. The only indication seems to be netstat showing incoming requests not being read.

Yago,
I saw your previous post (http://lucene.472066.n3.nabble.com/updating-docs-in-solr-cloud-hangs-td4067388.html#a4067631). Following it, I upgraded to SOLR 4.3 last week to see if the issue was fixed, but no luck. This looks like a dominant and easily reproducible issue on SOLR Cloud.

Thanks,
Rishi.

-----Original Message-----
From: Yago Riveiro <yago.rive...@gmail.com>
To: solr-user <solr-user@lucene.apache.org>
Sent: Mon, Jun 17, 2013 5:15 pm
Subject: Re: Solr Cloud Hangs consistently .

I can confirm that the deadlock happens with only 2 replicas per shard. I need to shut down one node that hosts a replica of the shard to recover indexing capability.

--
Yago Riveiro
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)

On Monday, June 17, 2013 at 6:44 PM, Rishi Easwaran wrote:

> Hi All,
>
> I am trying to benchmark SOLR Cloud and it consistently hangs.
> Nothing in the logs, no stack trace, no errors, no warnings; it just seems stuck.
>
> A little bit about my setup.
> I have 3 benchmark hosts, each with 96GB RAM, 24 CPUs and a 1TB SSD. Each host is configured to run 8 SOLR Cloud nodes at 4GB each.
> JVM configs: http://apaste.info/57Ai
>
> My cluster has 12 shards with replication factor 2 - http://apaste.info/09sA
>
> I originally started with SOLR 4.2, Tomcat 5 and JDK 6, as we are already running this configuration in production in non-cloud form.
> It got stuck repeatedly.
>
> I decided to upgrade to the latest and greatest of everything: SOLR 4.3, JDK 7 and Tomcat 7.
> It still shows the same behaviour and hangs during the test.
>
> My test schema and config.
> Schema.xml - http://apaste.info/imah
> SolrConfig.xml - http://apaste.info/ku4F
>
> The test is pretty simple. It's a JMeter test with an update command via SOAP rpc (round-robin requests across every node), adding 5 fields from a CSV file - id, guid, subject, body, compositeID (guid!id).
> Number of JMeter threads = 150.
> Loop count = 20, number of messages to add per guid = 3; total 150*3*20 = 9000 documents.
>
> When the cloud gets stuck, I don't get anything in the logs, but when I run netstat I see the following.
> Sample netstat on a stuck run: http://apaste.info/hr0O
> hycl-d20 is my JMeter host. ssd-d01/2/3 are my cloud hosts.
>
> At the moment my benchmarking efforts are at a standstill.
>
> Any help from the community would be great. I got some heap dumps and stack dumps, but haven't found a smoking gun yet.
> If I can provide anything else to help diagnose this issue, just let me know.
>
> Thanks,
>
> Rishi.
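
For anyone trying to reproduce the netstat symptom described above (incoming requests sitting unread), here is a minimal sketch that filters `netstat -tan` output for TCP sockets with a non-zero Recv-Q. It assumes the usual Linux net-tools column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State). The function name, the `StuckSocket` type and the threshold parameter are made up for illustration and are not part of Solr or of the original test setup.

```python
from typing import List, NamedTuple


class StuckSocket(NamedTuple):
    """One TCP socket with unread bytes queued in the kernel (hypothetical helper type)."""
    local: str
    remote: str
    recv_q: int
    state: str


def find_unread_sockets(netstat_output: str, min_recv_q: int = 1) -> List[StuckSocket]:
    """Return TCP sockets whose Recv-Q is at least min_recv_q bytes.

    Assumes Linux `netstat -tan` rows of the form:
    Proto Recv-Q Send-Q Local-Address Foreign-Address State
    """
    stuck = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip banner and header lines, and anything that is not a TCP row.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        try:
            recv_q = int(fields[1])
        except ValueError:
            continue  # Unexpected layout; ignore the row rather than guess.
        if recv_q >= min_recv_q:
            stuck.append(StuckSocket(fields[3], fields[4], recv_q, fields[5]))
    return stuck
```

Feeding it the text captured from `netstat -tan` on each cloud host during a hang should list the connections whose data the application is not reading. A Recv-Q that stays non-zero on the servlet container's port across several samples would be consistent with request threads being blocked, though on its own it does not say why.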