GC and disk I/O (the transaction log in particular) can cause significant latency in some cases. See this page for details on the kinds of things you should look at:

http://wiki.apache.org/hadoop/ZooKeeper/Troubleshooting

I've seen cases where the JVM pauses for 2+ minutes for GC; in some cases I've seen 4+ minutes, and I've heard of worse. Tuning GC (in particular, using the incremental/CMS collector) is critical for consistently low latencies.
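
To get a first look at how much time a JVM spends in GC, the standard java.lang.management beans expose per-collector counts and accumulated times. The following is only a minimal sketch: the class name GcPauseReport is made up for illustration, it reports on the JVM it runs in (to inspect a running ZooKeeper server you would read the same beans over a JMX connection), and the accumulated time is not the same thing as the longest single pause.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper, not part of ZooKeeper.
public class GcPauseReport {

    /** Sums the accumulated collection time (ms) over all collectors in this JVM. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // One line per collector: name, number of collections, accumulated time.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName()
                    + ": count=" + gc.getCollectionCount()
                    + " timeMs=" + gc.getCollectionTime());
        }
        System.out.println("total GC ms: " + totalGcMillis());
    }
}
```

Sampling these counters periodically and looking at the difference between samples gives a rough idea of whether GC activity lines up with the latency spikes.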

Patrick

Josh Scheid wrote:
On Fri, Jan 22, 2010 at 17:48, Mahadev Konar <maha...@yahoo-inc.com> wrote:
 The server latency does seem huge. What OS and hardware are you running it on?

RHEL4 2.8GHz 2-core AMD.  8GB RAM.
I will check with my admin for evidence of swap activity, but I don't
anticipate any.

This is bursty.  I'm currently seeing ~200 connections, but maxlat in
the last hour has been 185ms with 4ms avg.

What is the usage model of ZooKeeper?

Distributed lock service.  Using the lock recipe.  Hosts hold a lock
for 5s to a couple of minutes with zero to dozens of waiters.

How much memory are you allocating to the server?

That's a good question.  I'm not an expert at java deployment.  I just
use zkServer.sh defaults.
sun-jre 1.6.0_14.  It's taking 1188MB of virtual memory right now,
100MB resident.

The debug will exacerbate the problem.

OK.

A dedicated disk means the following:
ZooKeeper has snapshots and transaction logs. The datadir is the directory
that stores the transaction logs. It's highly recommended that this directory
be on a separate disk that isn't being used by any other process. The
snapshots can sit on a disk that is being used by the OS and can be shared.
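
One rough way to judge whether a disk is suitable for the transaction log is to time small synchronous writes in the candidate directory. The sketch below assumes that forced (fsync-style) writes are the dominant cost. The class FsyncProbe and its method are hypothetical names, and it does not reproduce how ZooKeeper actually writes its log.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Hypothetical probe, not part of ZooKeeper.
public class FsyncProbe {

    /**
     * Writes `rounds` 512-byte records to a temp file in `dir`, forcing each
     * one to disk, and returns the slowest write+force time in nanoseconds.
     */
    static long maxForceNanos(Path dir, int rounds) {
        try {
            Path file = Files.createTempFile(dir, "fsync-probe", ".tmp");
            long worst = 0;
            try (FileChannel ch = FileChannel.open(file, StandardOpenOption.WRITE)) {
                ByteBuffer buf = ByteBuffer.allocate(512);
                for (int i = 0; i < rounds; i++) {
                    buf.clear();                 // rewind so the full buffer is written again
                    long start = System.nanoTime();
                    ch.write(buf);
                    ch.force(false);             // flush file data (not metadata) to the device
                    worst = Math.max(worst, System.nanoTime() - start);
                }
            } finally {
                Files.deleteIfExists(file);
            }
            return worst;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Directory to probe; defaults to the JVM temp dir if none is given.
        Path dir = Paths.get(args.length > 0 ? args[0] : System.getProperty("java.io.tmpdir"));
        System.out.println("worst write+force: " + maxForceNanos(dir, 100) / 1_000_000.0 + " ms");
    }
}
```

Running it once while the machine is idle and once while the other processes that share the disk are busy shows how much contention adds to the worst case.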

Yeah, I understand.  I just need to get that set up and was hoping
that my load wouldn't warrant it.

Also, Pat ran some tests for server latencies at:

http://wiki.apache.org/hadoop/ZooKeeper/ServiceLatencyOverview

You can take a look at that as well and see what the expected performance
should be for your workload.

I will take a look at that.  Thank you for your time.

-Josh
