I would remove the load balancer from the equation. Compactions do not stop the world; they may degrade performance for a while, but that's about it.
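
As a rough way to test the nodes without the load balancer in between, something like the sketch below could be run from the web app host. It does the same thing as the telnet check suggested further down the thread: it just tries to open a TCP connection to each node. The node addresses are hypothetical placeholders, and the port assumes the default Thrift rpc_port (9160) has not been changed in cassandra.yaml.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.Arrays;
import java.util.List;

class NodeCheck {
    // Returns true if a TCP connection to host:port opens within timeoutMs.
    // Any I/O failure (refused, timed out, unknown host) counts as unreachable.
    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical node addresses: replace with the real ring members,
        // not the load balancer IP.
        List<String> nodes = Arrays.asList("10.0.0.1", "10.0.0.2", "10.0.0.3");
        int thriftPort = 9160; // assumed default rpc_port

        for (String node : nodes) {
            boolean up = canConnect(node, thriftPort, 2000);
            System.out.println(node + ":" + thriftPort + " reachable=" + up);
        }
    }
}
```

If every node is reachable directly but the client still marks all hosts down when it goes through the load balancer IP, that points at the load balancer rather than at Cassandra. Note this only shows that the port accepts connections, not that the node is serving requests in a healthy way.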
Look in the logs on the servers, are the nodes logging that other nodes are going DOWN?

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 30/05/2012, at 2:25 AM, cem wrote:

> It should retry but it doesn't. It is also clear that it delegates the retry
> to the client " Retry burden pushed out to client " you can also check Hector
> code. I wrote a separate service that retries when this exception occurs.
>
> I think you have a problem with your load balancer. Try to connect with
> telnet.
>
> Cem.
>
> On Tue, May 29, 2012 at 3:06 PM, Shubham Srivastava
> <shubham.srivast...@makemytrip.com> wrote:
> My webapp connects to the LoadBalancer IP which has the actual nodes in its
> pool.
>
> If there is by any chance a connection break then will hector not retry to
> re-establish connection I guess it should retry every XX seconds based on
> retryDownedHostsDelayInSeconds.
>
> Regards,
> Shubham
>
> From: cem [cayiro...@gmail.com]
> Sent: Tuesday, May 29, 2012 6:13 PM
> To: user@cassandra.apache.org
> Subject: Re: All host pools Marked Down
>
> Since all hosts are seem to be down, Hector will not do retry. There should
> be at least one node up in a cluster. Make sure that you have a proper
> connection from your webapps to your cluster.
>
> Cem.
>
> On Tue, May 29, 2012 at 1:46 PM, Shubham Srivastava
> <shubham.srivast...@makemytrip.com> wrote:
> Any takers on this. Hitting us badly right now.
>
> Regards,
> Shubham
>
> From: Shubham Srivastava
> Sent: Tuesday, May 29, 2012 12:55 PM
> To: user@cassandra.apache.org
> Subject: All host pools Marked Down
>
> I am getting this exception lot of times
>
> me.prettyprint.hector.api.exceptions.HectorException: All host pools marked
> down. Retry burden pushed out to client.
>
> What this causes is no data read/write from the ring from my WebApp.
>
> I have retries as 3 and can see that max retries 3 getting exhausted with the
> same error as above.
>
> Checked cfstats and tpstats nothing seem to be a problem.
>
> However through the logs I see lot of time taken in compactions like the below
>
> INFO [CompactionExecutor:73] 2012-05-29 11:03:01,605 CompactionManager.java
> (line 608) Compacted to
> /opt/cassandra-data/data/LH/UserPrefrences-tmp-g-8906-Data.db. 36,986,932 to
> 36,961,554 (~99% of original) bytes for 132,743 keys. Time: 112,910ms.
>
> The time taken here seems pretty high. Will this cause a pause or read
> timeout etc.
>
> I have the connection from my web app through a hardware loadbalancer.
> Cassandra version is 0.8.6 with multi-DC ring on 6 nodes each in one DC.
>
> CL:1 and RF:3.
>
> Memeory:8Gb heap -> 14Gb Server memory with 8Core CPU.
>
> How do I move ahead in this.
>
> Shubham Srivastava | Technical Lead - Technology Development
> +91 124 4910 548 | MakeMyTrip.com, 243 SP Infocity, Udyog Vihar Phase 1,
> Gurgaon, Haryana - 122 016, India