Mark,

I got a few stack dumps of the instance that was stuck, ssdtest-d03:8011:
http://apaste.info/cofK
http://apaste.info/sv4M
http://apaste.info/cxUf

I can get dumps of others if needed.

Thanks,
Rishi.

-----Original Message-----
From: Mark Miller <markrmil...@gmail.com>
To: solr-user <solr-user@lucene.apache.org>
Sent: Mon, Jun 17, 2013 1:57 pm
Subject: Re: Solr Cloud Hangs consistently .

Could you give a simple stack trace dump as well?

It's likely the distributed update deadlock that has been reported a few times now - I think usually with a replication factor greater than 2, but I can't be sure. The deadlock involves sending docs concurrently to replicas and I wouldn't have expected it to be so easily hit with only 2 replicas per shard. I should be able to tell from a stack trace though.

If it is that, it's on my short list to investigate (been there a long time now though - but I still hope to look at it soon).

- Mark

On Jun 17, 2013, at 1:44 PM, Rishi Easwaran <rishi.easwa...@aol.com> wrote:

> Hi All,
>
> I am trying to benchmark SOLR Cloud and it consistently hangs.
> Nothing in the logs, no stack trace, no errors, no warnings, just seems stuck.
>
> A little bit about my setup.
> I have 3 benchmark hosts, each with 96GB RAM, 24 CPUs and 1TB SSD. Each host is configured to have 8 SOLR cloud nodes running at 4GB each.
> JVM configs: http://apaste.info/57Ai
>
> My cluster has 12 shards with replication factor 2 - http://apaste.info/09sA
>
> I originally started with SOLR 4.2, Tomcat 5 and JDK 6, as we are already running this configuration in production in non-cloud form.
> It got stuck repeatedly.
>
> I decided to upgrade to the latest and greatest of everything: SOLR 4.3, JDK 7 and Tomcat 7.
> It still shows the same behaviour and hangs through the test.
>
> My test schema and config.
> Schema.xml - http://apaste.info/imah
> SolrConfig.xml - http://apaste.info/ku4F
>
> The test is pretty simple.
> It's a JMeter test with update command via SOAP rpc (round robin request across every node), adding in 5 fields from a csv file - id, guid, subject, body, compositeID (guid!id).
> Number of JMeter threads = 150, loop count = 20, number of messages to add per guid = 3; total 150*3*20 = 9000 documents.
>
> When the cloud gets stuck, I don't get anything in the logs, but when I run netstat I see the following.
> Sample netstat on a stuck run: http://apaste.info/hr0O
> hycl-d20 is my JMeter host. ssd-d01/2/3 are my cloud hosts.
>
> At the moment my benchmarking efforts are at a standstill.
>
> Any help from the community would be great. I got some heap dumps and stack dumps, but haven't found a smoking gun yet.
> If I can provide anything else to diagnose this issue, just let me know.
>
> Thanks,
>
> Rishi.
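
For anyone trying to reproduce the load outside JMeter, here is a minimal sketch of the input data described in the original post. The counts (150 threads, 20 loops, 3 messages per guid) and the five field names come from the post; everything else is an assumption for illustration: the function names `generate_docs` and `to_csv` are hypothetical, a fresh guid is assumed for each thread iteration, and the guid format and field contents are placeholders.

```python
import csv
import io
import uuid

THREADS = 150       # JMeter threads (from the post)
LOOPS = 20          # loop count per thread (from the post)
DOCS_PER_GUID = 3   # messages added per guid (from the post)

FIELDS = ["id", "guid", "subject", "body", "compositeID"]


def generate_docs(threads=THREADS, loops=LOOPS, docs_per_guid=DOCS_PER_GUID):
    """Yield one dict per document.

    Assumption: each (thread, loop) iteration uses one new guid and adds
    docs_per_guid documents under it.
    """
    doc_id = 0
    for _thread in range(threads):
        for _loop in range(loops):
            guid = uuid.uuid4().hex  # placeholder guid format
            for _ in range(docs_per_guid):
                doc_id += 1
                yield {
                    "id": str(doc_id),
                    "guid": guid,
                    "subject": f"subject {doc_id}",
                    "body": f"body {doc_id}",
                    # Composite key in the guid!id form given in the post
                    "compositeID": f"{guid}!{doc_id}",
                }


def to_csv(docs):
    """Render the documents as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(docs)
    return buf.getvalue()


if __name__ == "__main__":
    docs = list(generate_docs())
    print(len(docs), "documents")  # 150 * 20 * 3
    print(to_csv(docs[:3]))
```

With the default parameters this yields the 9000 documents mentioned in the post, spread over 3000 guids. It only builds the data; sending it to the cluster is left to whatever client is used for the benchmark.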