Well that's good - 300ms max latency means that the server can round trip any requests pretty quickly. It would lead me to look at the client VMs or (intermittent) network problems...

Keep in mind, though, that's just one of your servers (unless you are saying you checked all X servers in the cluster and that was the overall max?). You may discover that one server has issues while the others are fine, in which case only clients connected to the "bad" server(s) will experience problems. (And since clients can jump between servers, that might be contributing to the randomness in observed occurrence.)
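
If it helps, here is a rough sketch of how one might compare the "stat" max latency across every server rather than eyeballing a single one. It assumes each server answers the "stat" four-letter command on its client port (2181 is assumed here) and that the output contains a "Latency min/avg/max: a/b/c" line. The host names in the usage comment and the helper names (parse_latency, fetch_stat, worst_server) are made up for illustration; they are not part of ZooKeeper.

```python
import re
import socket

# Matches the latency line of "stat" output, e.g. "Latency min/avg/max: 0/2/287".
LATENCY_RE = re.compile(r"Latency min/avg/max:\s*(\d+)/([\d.]+)/(\d+)")


def parse_latency(stat_text):
    """Return (min, avg, max) latency in ms, or None if the line is missing."""
    m = LATENCY_RE.search(stat_text)
    if m is None:
        return None
    return int(m.group(1)), float(m.group(2)), int(m.group(3))


def fetch_stat(host, port=2181, timeout=5.0):
    """Send "stat" to one server and return its raw text reply.

    The server closes the connection after replying, so read until EOF.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"stat")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def worst_server(stats_by_host):
    """Given {host: stat_text}, return (host, max_latency_ms) for the worst
    server, ignoring hosts whose output has no latency line."""
    worst = None
    for host, text in stats_by_host.items():
        latency = parse_latency(text)
        if latency is None:
            continue
        if worst is None or latency[2] > worst[1]:
            worst = (host, latency[2])
    return worst


# Example usage (hypothetical host names, adjust to your cluster):
#   hosts = ["zk1.example.com", "zk2.example.com", "zk3.example.com"]
#   print(worst_server({h: fetch_stat(h) for h in hosts}))
```

Running something like this periodically against all servers would show whether one of them stands out, which is the case where only the clients attached to that server see trouble.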

Good luck and keep us posted. EC2 is very interesting; I'd like to learn more about the operating environment and, in particular, the issues involved with running ZK there.

Patrick

Ted Dunning wrote:
Patrick,

Thanks enormously.

This hasn't helped yet, but that is just because it was a very large bite of
the apple.  Once I digest it, I can tell that it will be very helpful.

I did have a chance to look at the "stat" output and maximum latency was
<300ms.  How that connects with what you are saying isn't clear yet, but I
can see how that might not be diagnostic of whether the server side timeout
is sufficiently long.

Thanks again.

On Thu, Apr 16, 2009 at 10:57 AM, Patrick Hunt <ph...@apache.org> wrote:

lots of stuff about monitoring ... jmx ... packet loss ... vm latencies ...
timeout details.
... Hope this helps.

Patrick



