Hard to say based on the bits/pieces of the log we have access to. I'd
have to see the full log, preferably from both the server and client, to
gain more insight.
Re the low numbers: this is the received count for the server, and it should
always increase, never decrease. The fact that it is so low e
Thanks Patrick. See below.
On Tue, Feb 23, 2010 at 1:19 PM, Patrick Hunt wrote:
> Stack you might look at the following:
>
> 1) why does server 14 have such a low recv count?
>
> Received: 194
>
> while the other servers are at 3.7k + received. Did server 14 fail at some
> point? Or it's
Stack you might look at the following:
1) why does server 14 have such a low recv count?
Received: 194
while the other servers are at 3.7k + received. Did server 14 fail at
some point? Or it's network? This may have caused the timeout seen by
the client:
--snippet-
2010-02-2
Dang. Didn't save the log. Pardon me.
I pasted exceptions only and thought it was all about the 0x26ed968d880001
session, but now I see that what I posted above has TIMED_OUT on
another session altogether. Above I skipped pasting exceptions,
thinking them on the same session, but now it seems they probabl
Hi Stack,
the other interesting part is with the session:
0x26ed968d880001
Looks like it gets disconnected from one of the servers (TIMEOUT). Do you
see any of these messages: "Attempting connection to server" in the logs
before you see all the consecutive
org.apache.zookeeper.ClientCnxn: Except
The thing that seems odd to me is that the connectivity complaints are
out of the zk client, right? Why is it failing to get to member 14,
and why not move to another ensemble member if the issue is w/ 14? And if
there were a general connectivity issue, I'd think that the running
hbase cluster would be
I also looked at the logs. Ted might have a point. It does look like the
ZooKeeper servers are doing fine (though, as Ted mentions, the skew is a
little concerning; that might be due to very few packets served by
the first server). Other than that, the latencies of 300 ms at max should not
ca
Not sure this helps at all, but these times are remarkably asymmetrical. I
would expect members of a ZK cluster to have very comparable times.
Additionally, 345 ms is nowhere near large enough to cause a session to
expire. My take is that ZK doesn't think it caused the timeout.
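
For anyone wanting to run the same comparison across ensemble members, here is a rough Python sketch. It assumes you have already captured the text output of the `stat` four-letter command from each server; the helper names (`parse_stat`, `low_receivers`, `latency_could_expire`), the 10% skew threshold, and the 30000 ms session timeout in the usage note are all assumptions made for illustration, not values taken from this cluster.

```python
import re


def parse_stat(text):
    """Pull the received count and latency triple out of `stat` output text."""
    stats = {}
    m = re.search(r"^Received:\s*(\d+)", text, re.MULTILINE)
    if m:
        stats["received"] = int(m.group(1))
    m = re.search(r"^Latency min/avg/max:\s*(\d+)/(\d+)/(\d+)", text, re.MULTILINE)
    if m:
        lat_min, lat_avg, lat_max = (int(g) for g in m.groups())
        stats["latency_min"] = lat_min
        stats["latency_avg"] = lat_avg
        stats["latency_max"] = lat_max
    return stats


def low_receivers(received_by_server, ratio=0.1):
    """Return servers whose received count is below ratio * the largest count.

    The ratio is an arbitrary illustrative threshold, not a ZooKeeper setting.
    """
    top = max(received_by_server.values())
    return sorted(s for s, n in received_by_server.items() if n < ratio * top)


def latency_could_expire(latency_max_ms, session_timeout_ms):
    """True only if the worst observed latency reaches the session timeout."""
    return latency_max_ms >= session_timeout_ms
```

With the numbers quoted in this thread, `low_receivers({"14": 194, "other": 3700})` singles out server 14, and `latency_could_expire(345, 30000)` is false for a hypothetical 30 s session timeout, which matches the reading that server-side latency alone does not explain the expiry.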
On Mon, Feb 22,
Hey Lads:
Any chance of some pointers debugging a session TIMED OUT?
Client is hosted inside an hbase regionserver. Usually session
timeout is because of some fat GC pause that is longer than the session
timeout, but that's not the case here. It seems to be a connectivity
problem. Let me post a few