Re: Bit of help debugging a TIMED OUT session please

2010-02-23 Thread Patrick Hunt
Hard to say based on the bits and pieces of the log we have access to. I'd have to see the full log, preferably from both the server and the client, to gain more insight. Re the low numbers: this is the received count for the server, and it should always increase, never decrease. The fact that it is so low e

Re: Bit of help debugging a TIMED OUT session please

2010-02-23 Thread Stack
Thanks Patrick. See below.
On Tue, Feb 23, 2010 at 1:19 PM, Patrick Hunt wrote:
> Stack you might look at the following:
>
> 1) why does server 14 have such a low recv count?
>
>        Received: 194
>
> while the other servers are at 3.7k + received. Did server 14 fail at some
> point? Or it's

Re: Bit of help debugging a TIMED OUT session please

2010-02-23 Thread Patrick Hunt
Stack you might look at the following: 1) why does server 14 have such a low recv count? Received: 194 while the other servers are at 3.7k + received. Did server 14 fail at some point? Or it's network? This may have caused the timeout seen by the client: --snippet- 2010-02-2
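The received-count comparison Patrick describes can be scripted once the `stat` output from each ensemble member has been collected. The following is a minimal sketch, not something from the thread: it assumes the output contains `Received:` and `Latency min/avg/max:` lines, and the function names, server names, sample values, and the 10% threshold are all hypothetical.

```python
import re
from statistics import median


def parse_stat(text):
    """Extract the Received count and max latency from one server's `stat` text."""
    received = re.search(r"^\s*Received:\s*(\d+)", text, re.MULTILINE)
    latency = re.search(
        r"^\s*Latency min/avg/max:\s*(\d+)/([\d.]+)/(\d+)", text, re.MULTILINE
    )
    return {
        "received": int(received.group(1)) if received else None,
        "latency_max_ms": int(latency.group(3)) if latency else None,
    }


def low_receivers(stats_by_server, ratio=0.1):
    """Return servers whose Received count is below `ratio` times the median."""
    counts = {
        name: s["received"]
        for name, s in stats_by_server.items()
        if s["received"] is not None
    }
    if not counts:
        return []
    typical = median(counts.values())
    return sorted(name for name, count in counts.items() if count < ratio * typical)


# Hypothetical sample data, loosely shaped like the numbers quoted in the thread.
SAMPLES = {
    "server13": "Latency min/avg/max: 0/0/12\nReceived: 3712\n",
    "server14": "Latency min/avg/max: 0/0/345\nReceived: 194\n",
    "server15": "Latency min/avg/max: 0/0/9\nReceived: 3750\n",
}

if __name__ == "__main__":
    parsed = {name: parse_stat(text) for name, text in SAMPLES.items()}
    print(low_receivers(parsed))
```

A low count by itself does not say why a member received so few packets (a restart resets the counter, for instance), so this only narrows down which server's full log to read first.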

Re: Bit of help debugging a TIMED OUT session please

2010-02-22 Thread Stack
Dang. Didn't save the log. Pardon me. I pasted exceptions only and thought it all about 0x26ed968d880001 session but now I see that what I posted above has TIMED_OUT on another session altogether. Above I skipped pasting exceptions thinking them on the same session but now it seems they probabl

Re: Bit of help debugging a TIMED OUT session please

2010-02-22 Thread Mahadev Konar
Hi Stack, the other interesting part is with the session 0x26ed968d880001. Looks like it gets disconnected from one of the servers (TIMEOUT). Do you see any of these messages: "Attempting connection to server" in the logs before you see all the consecutive org.apache.zookeeper.ClientCnxn: Except

Re: Bit of help debugging a TIMED OUT session please

2010-02-22 Thread Stack
The thing that seems odd to me is that the connectivity complaints are out of the zk client, right? Why is it failing getting to member 14, and why not move to another ensemble member if the issue is w/ 14? And if there were a general connectivity issue, I'd think that the running hbase cluster would be

Re: Bit of help debugging a TIMED OUT session please

2010-02-22 Thread Mahadev Konar
I also looked at the logs. Ted might have a point. It does look like the ZooKeeper servers are doing fine (though, as Ted mentions, the skew is a little concerning; that might be due to very few packets served by the first server). Other than that the latencies of 300 ms at max should not ca

Re: Bit of help debugging a TIMED OUT session please

2010-02-22 Thread Ted Dunning
Not sure this helps at all, but these times are remarkably asymmetrical. I would expect members of a ZK cluster to have very comparable times. Additionally, 345 ms is nowhere near large enough to cause a session to expire. My take is that ZK doesn't think it caused the timeout. On Mon, Feb 22,

Bit of help debugging a TIMED OUT session please

2010-02-22 Thread Stack
Hey Lads: Any chance of some pointers debugging a session TIMED OUT? Client is hosted inside an hbase regionserver. Usually a session timeout is because of some fat GC pause that is longer than the session timeout, but that's not the case here. It seems to be a connectivity problem. Let me post a few