I do all the indexing through HTTP POST. With replicationFactor=1 there is no problem; if it is higher, deadlock problems can appear.
A stack trace like the one at http://lucene.472066.n3.nabble.com/updating-docs-in-solr-cloud-hangs-td4067388.html#a4067862 is what I get.

--
Yago Riveiro
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)

On Monday, June 17, 2013 at 11:03 PM, Mark Miller wrote:

> If it actually happens with replicationFactor=1, it likely doesn't have
> anything to do with the update handler issue I'm referring to. In some cases
> like these, people have better luck with Jetty than Tomcat - we test it much
> more. For instance, it's set up to help avoid search-side distributed
> deadlocks.
>
> In any case, there is something special about it - I do and have seen a lot
> of heavy indexing to SolrCloud by me and others without running into this.
> Both with replicationFactor=1 and greater. So there is something specific in
> how the load is being done or what features/methods are being used that
> likely causes it or makes it easier to cause.
>
> But again, the issue I know about involves threads that are not even created
> in the replicationFactor = 1 case, so that could be a first report afaik.
>
> - Mark
>
> On Jun 17, 2013, at 5:52 PM, Rishi Easwaran <rishi.easwa...@aol.com> wrote:
>
> > Update!!
> >
> > This happens with replicationFactor=1
> > Just for kicks I created a collection with a 24 shards, replicationFactor=1
> > cluster on my existing benchmark env.
> > Same behaviour, SOLR cloud just hangs. Nothing in the logs, top/heap/cpu
> > most metrics look fine.
> > Only indication seems to be netstat showing incoming requests not being
> > read in.
> >
> > Yago,
> >
> > I saw your previous post
> > (http://lucene.472066.n3.nabble.com/updating-docs-in-solr-cloud-hangs-td4067388.html#a4067631)
> > Following it, last week I upgraded to SOLR 4.3 to see if the issue gets
> > fixed, but no luck.
> > Looks like this is a dominant and easily reproducible issue on SOLR cloud.
> >
> > Thanks,
> >
> > Rishi.
> >
> > -----Original Message-----
> > From: Yago Riveiro <yago.rive...@gmail.com>
> > To: solr-user <solr-user@lucene.apache.org>
> > Sent: Mon, Jun 17, 2013 5:15 pm
> > Subject: Re: Solr Cloud Hangs consistently .
> >
> > I can confirm that the deadlock happens with only 2 replicas per shard. I
> > need to shut down one node that hosts a replica of the shard to recover
> > the indexing capability.
> >
> > --
> > Yago Riveiro
> > Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
> >
> > On Monday, June 17, 2013 at 6:44 PM, Rishi Easwaran wrote:
> >
> > > Hi All,
> > >
> > > I am trying to benchmark SOLR Cloud and it consistently hangs.
> > > Nothing in the logs, no stack trace, no errors, no warnings, just seems
> > > stuck.
> > >
> > > A little bit about my setup.
> > > I have 3 benchmark hosts, each with 96GB RAM, 24 CPUs and 1TB SSD. Each
> > > host is configured to have 8 SOLR cloud nodes running at 4GB each.
> > > JVM configs: http://apaste.info/57Ai
> > >
> > > My cluster has 12 shards with replication factor 2 -
> > > http://apaste.info/09sA
> > >
> > > I originally started with SOLR 4.2, tomcat 5 and jdk 6, as we are already
> > > running this configuration in production in non-cloud form.
> > > It got stuck repeatedly.
> > >
> > > I decided to upgrade to the latest and greatest of everything: SOLR 4.3,
> > > JDK7 and tomcat7.
> > > It still shows the same behaviour and hangs through the test.
> > >
> > > My test schema and config.
> > > Schema.xml - http://apaste.info/imah
> > > SolrConfig.xml - http://apaste.info/ku4F
> > >
> > > The test is pretty simple. It's a jmeter test with update command via SOAP
> > > rpc (round robin request across every node), adding in 5 fields from a csv
> > > file - id, guid, subject, body, compositeID (guid!id).
> > > number of jmeter threads = 150. loop count = 20, num of messages to
> > > add/per guid = 3; total 150*3*20 = 9000 documents.
> > >
> > > When cloud gets stuck, I don't get anything in the logs, but when I run
> > > netstat I see the following.
> > > Sample netstat on a stuck run: http://apaste.info/hr0O
> > > hycl-d20 is my jmeter host. ssd-d01/2/3 are my cloud hosts.
> > >
> > > At the moment my benchmarking efforts are at a standstill.
> > >
> > > Any help from the community would be great. I got some heap dumps and
> > > stack dumps, but haven't found a smoking gun yet.
> > > If I can provide anything else to diagnose this issue, just let me know.
> > >
> > > Thanks,
> > >
> > > Rishi.
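
For anyone trying to reproduce the netstat symptom described above (incoming requests not being read in), here is a rough sketch of how the output could be filtered. It assumes the usual Linux `netstat -tn` column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State), where a non-zero Recv-Q on an established socket means bytes have arrived that the application has not read yet. The function name `stalled_sockets` and the sample addresses are made up for illustration; they are not taken from the pastes linked in this thread.

```python
def stalled_sockets(netstat_output, min_recv_q=1):
    """Return (recv_q, local, remote, state) for TCP sockets with unread bytes.

    Assumes Linux `netstat -tn` data rows of the form:
    Proto Recv-Q Send-Q Local-Address Foreign-Address State
    """
    stalled = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip banner/header lines and anything that is not a TCP row.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        try:
            recv_q = int(fields[1])
        except ValueError:
            continue
        if recv_q >= min_recv_q:
            stalled.append((recv_q, fields[3], fields[4], fields[5]))
    # Largest backlog first.
    return sorted(stalled, reverse=True)


# Hypothetical sample output, NOT the data from the stuck run in this thread.
SAMPLE = """\
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 10.0.0.1:8080           10.0.0.9:51000          ESTABLISHED
tcp     4380      0 10.0.0.1:8080           10.0.0.9:51001          ESTABLISHED
tcp      512      0 10.0.0.1:8081           10.0.0.9:51002          CLOSE_WAIT
"""

if __name__ == "__main__":
    for recv_q, local, remote, state in stalled_sockets(SAMPLE):
        print(f"{recv_q:>8} bytes unread  {local} <- {remote}  [{state}]")
```

Feeding it the real output of `netstat -tn` from each cloud host during a hang would show which node ports have requests queued but unread, which may help decide where to take the next thread dump.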