Re: problems on EC2?

2009-04-16 Thread Patrick Hunt
Take a look at this section to start: http://hadoop.apache.org/zookeeper/docs/current/zookeeperAdmin.html#sc_commonProblems What type of monitoring are you doing on your cluster? You could monitor at both the host level and the Java (JMX) level. That will give you some insight into where to look;

Re: problems on EC2?

2009-04-16 Thread Ted Dunning
Patrick, Thanks enormously. This hasn't helped yet, but that is just because it was a very large bite of the apple. Once I digest it, I can tell that it will be very helpful. I did have a chance to look at the stat output and maximum latency was 300ms. How that connects with what you are

Re: problems on EC2?

2009-04-16 Thread Patrick Hunt
Well, that's good - 300ms max latency means that the server can round-trip any request pretty quickly. It would lead me to look at the client VMs or (intermittent) network problems... Keep in mind, though, that's one of your servers (unless you are saying you checked all X of the servers in the
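To check every server rather than just one, the `stat` four-letter command can be polled across the ensemble. The following is a minimal sketch, not something from the thread: it assumes each server answers `stat` on its client port (2181 here) and that the reply contains a line of the form `Latency min/avg/max: a/b/c`. The function names and the host list are hypothetical.

```python
import re
import socket

# Assumed shape of the latency line in the "stat" reply; decimals are
# tolerated in case a server reports a fractional average.
_LATENCY = re.compile(
    r"^Latency min/avg/max:\s*(\d+(?:\.\d+)?)/(\d+(?:\.\d+)?)/(\d+(?:\.\d+)?)\s*$",
    re.MULTILINE,
)


def fetch_stat(host, port=2181, timeout=5.0):
    """Send the 'stat' four-letter command and return the raw text reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"stat")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_latency(stat_text):
    """Return (min, avg, max) latency as floats, or None if no line matches."""
    match = _LATENCY.search(stat_text)
    if match is None:
        return None
    return tuple(float(group) for group in match.groups())


def max_latency_per_server(servers):
    """Map each (host, port) pair to its reported max latency, or None."""
    result = {}
    for host, port in servers:
        parsed = parse_latency(fetch_stat(host, port))
        result[(host, port)] = parsed[2] if parsed else None
    return result
```

Calling `max_latency_per_server([("zk1", 2181), ("zk2", 2181), ("zk3", 2181)])` (hypothetical host names) would give one max-latency figure per server, which makes it easier to see whether a single slow server is hiding behind a healthy one.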

Re: problems on EC2?

2009-04-14 Thread Nitay
Hi Ted, Fellow user coming from HBase. We were recently seeing lots of SessionExpired events as well. Check out this mail thread: http://markmail.org/search/?q=SessionExpired#query:SessionExpired+page:1+mid:gt4c2kn4n4f5s5kw+state:results Perhaps this might have something to do with what you're

Re: problems on EC2?

2009-04-14 Thread Ted Dunning
Very good pointer. Thanks. Are you still having your problems?
On Tue, Apr 14, 2009 at 6:09 PM, Nitay nit...@gmail.com wrote:
> Hi Ted, Fellow user coming from HBase. We were recently seeing lots of SessionExpired events as well. Check out this mail thread:

Re: problems on EC2?

2009-04-14 Thread Nitay
Yes, we are. We currently don't handle SessionExpired very well at all in HBase. There are two things going on in parallel to fix it:
1) Reinitialize the ZooKeeper handler (and everything else that depends on it) on the node in question when a SessionExpired event occurs.
2) Reduce the number of
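The first item above can be sketched roughly as follows. This is an illustration only, not the HBase change itself: once a session has expired, the old handle is unusable and its ephemeral nodes and watches are gone, so the sketch closes it, creates a fresh handle, and lets every dependent component rebuild its state. The class name, the `connect` factory, the `dependents` callbacks, and the `"EXPIRED"` state string are all hypothetical stand-ins for whatever the real client library provides.

```python
class ReinitializingClient:
    """Holds a ZooKeeper-like handle and replaces it when the session expires."""

    def __init__(self, connect, dependents=()):
        self._connect = connect              # hypothetical factory: returns a fresh handle
        self._dependents = list(dependents)  # callbacks that rebuild state on a new handle
        self.handle = None
        self.generation = 0                  # how many handles have been created so far
        self._reinitialize()

    def _reinitialize(self):
        old = self.handle
        if old is not None:
            try:
                old.close()                  # the expired handle cannot be reused
            except Exception:
                pass                         # best effort; the session is gone anyway
        self.handle = self._connect()
        self.generation += 1
        # Ephemeral nodes and watches do not survive expiry, so every
        # dependent component has to set itself up again on the new handle.
        for rebuild in self._dependents:
            rebuild(self.handle)

    def on_event(self, state):
        """Handle a session state change; return True if a reinit happened."""
        if state == "EXPIRED":               # hypothetical name for the expiry event
            self._reinitialize()
            return True
        return False
```

The dependents are kept as plain callbacks so that the "everything else that depends on it" part stays explicit: anything that cached the old handle or registered watches on it gets a chance to start over.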