GC and disk IO (the transaction log in particular) can cause significant
latency in some cases. See this page for details on the types of things
you should look at:
http://wiki.apache.org/hadoop/ZooKeeper/Troubleshooting
I've seen cases where the JVM pauses for 2+ minutes for GC; in some
cases I've seen 4+ minutes, and I've heard of worse than that. Tuning GC
(in particular using the incremental/CMS collector) is critical for
consistently low latencies.
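To get a rough idea of how much time a JVM is spending in GC, the standard
java.lang.management beans can be read. This is only a minimal sketch: the
class name GcPauseReport is made up for illustration, it reports on the JVM
it runs in (for the ZooKeeper server you would read the same beans remotely
over JMX), and the figures are cumulative collection times, not individual
pause lengths.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper, not part of ZooKeeper.
public class GcPauseReport {
    /** Total milliseconds this JVM reports having spent in GC, over all collectors. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report a time
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // One line per collector: how often it ran and its cumulative time.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName()
                    + ": count=" + gc.getCollectionCount()
                    + " timeMs=" + gc.getCollectionTime());
        }
        System.out.println("total GC ms: " + totalGcMillis());
    }
}
```

On HotSpot JVMs of this generation the incremental/CMS collector is selected
with -XX:+UseConcMarkSweepGC and -XX:+CMSIncrementalMode; GC logging
(-verbose:gc) is the more direct way to see individual pauses.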
Patrick
Josh Scheid wrote:
On Fri, Jan 22, 2010 at 17:48, Mahadev Konar <maha...@yahoo-inc.com> wrote:
The server latency does seem huge. What os and hardware are you running it
on?
RHEL4 2.8GHz 2-core AMD. 8GB RAM.
I will check with my admin for evidence of swap activity, but I don't
anticipate any.
This is bursty. I'm currently seeing ~200 connections, but maxlat in
the last hour has been 185ms with 4ms avg.
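The max/avg latency figures come from the server's "stat" four-letter
command, which prints a line of the form "Latency min/avg/max: ...". A
minimal sketch of reading it programmatically follows; the class name
StatLatency and the host/port passed to fetchStat are assumptions, the
sample line in the usage note is illustrative (only the 185 ms max and 4 ms
avg come from the numbers above), and some later releases restrict which
four-letter commands are enabled.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of ZooKeeper.
public class StatLatency {
    private static final Pattern LATENCY =
            Pattern.compile("Latency min/avg/max:\\s*([\\d.]+)/([\\d.]+)/([\\d.]+)");

    /** Returns {min, avg, max} in milliseconds, or null if no latency line is present. */
    static double[] parse(String statOutput) {
        Matcher m = LATENCY.matcher(statOutput);
        if (!m.find()) {
            return null;
        }
        return new double[] {
            Double.parseDouble(m.group(1)),
            Double.parseDouble(m.group(2)),
            Double.parseDouble(m.group(3))
        };
    }

    /** Sends "stat" to the client port and returns everything the server writes back. */
    static String fetchStat(String host, int port) throws IOException {
        try (Socket s = new Socket(host, port)) {
            s.getOutputStream().write("stat".getBytes(StandardCharsets.US_ASCII));
            s.getOutputStream().flush();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            InputStream in = s.getInputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) { // server closes the socket when done
                out.write(buf, 0, n);
            }
            return out.toString("US-ASCII");
        }
    }
}
```

For example, StatLatency.parse("Latency min/avg/max: 0/4/185") yields the
three values 0, 4 and 185; polling this periodically shows whether the max
is a one-off spike or recurs.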
What is the usage model of ZooKeeper?
Distributed lock service. Using the lock recipe. Hosts hold a lock
for 5s to a couple of minutes with zero to dozens of waiters.
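For reference, the core decision in the lock recipe is: each waiter creates
an ephemeral sequential child, the child with the lowest sequence number
holds the lock, and every other waiter watches only the child just ahead of
it. A minimal sketch of that ordering logic, without any ZooKeeper client
calls, is below. The class name LockOrder and the "lock-" node prefix are
made up for illustration; the code relies on the server appending a
10-digit zero-padded sequence number to sequential node names.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of ZooKeeper.
public class LockOrder {
    /** Sequence number from the last 10 characters of a sequential znode name. */
    static long seq(String name) {
        return Long.parseLong(name.substring(name.length() - 10));
    }

    /**
     * Given the children of the lock node and our own child's name, returns
     * null if we hold the lock (no child has a lower sequence number), or
     * otherwise the name of the child to watch: the one with the highest
     * sequence number below ours.
     */
    static String nodeToWatch(List<String> children, String mine) {
        long mySeq = seq(mine);
        String predecessor = null;
        long predecessorSeq = -1;
        for (String child : children) {
            long s = seq(child);
            if (s < mySeq && s > predecessorSeq) {
                predecessor = child;
                predecessorSeq = s;
            }
        }
        return predecessor;
    }

    public static void main(String[] args) {
        List<String> children = Arrays.asList(
                "lock-0000000007", "lock-0000000003", "lock-0000000005");
        System.out.println(nodeToWatch(children, "lock-0000000005"));
    }
}
```

Watching only the immediate predecessor means a lock release wakes one
waiter rather than all of them, which matters with dozens of waiters.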
How much memory are you allocating to the server?
That's a good question. I'm not an expert at java deployment. I just
use zkServer.sh defaults.
sun-jre 1.6.0_14. It's taking 1188MB of virtual memory right now,
100MB resident.
The debug will exacerbate the problem.
OK.
A dedicated disk means the following:
ZooKeeper has snapshots and transaction logs. The datadir is the directory
that stores the transaction logs. It's highly recommended that this directory
be on a separate disk that isn't being used by any other process. The
snapshots can sit on a disk that is being used by the OS and can be shared.
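In configuration terms, the split is made with the dataLogDir setting in
zoo.cfg: when it is set, transaction logs go there and snapshots stay in
dataDir. A minimal sketch that checks a config for this is below. The class
name TxnLogDirCheck and the paths in the example are made up for
illustration, and the check only compares the two paths; it cannot tell
whether they are actually on different physical disks.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical helper, not part of ZooKeeper.
public class TxnLogDirCheck {
    /** True if the config text sets dataLogDir to a path different from dataDir. */
    static boolean hasSeparateLogDir(String cfgText) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(cfgText)); // zoo.cfg uses key=value properties syntax
        } catch (IOException e) {
            throw new IllegalStateException("could not read config text", e);
        }
        String dataDir = p.getProperty("dataDir", "").trim();
        String logDir = p.getProperty("dataLogDir", "").trim();
        return !logDir.isEmpty() && !logDir.equals(dataDir);
    }

    public static void main(String[] args) {
        // Example paths are assumptions, not taken from the thread.
        String cfg = "tickTime=2000\n"
                + "clientPort=2181\n"
                + "dataDir=/var/lib/zookeeper\n"
                + "dataLogDir=/zk-txnlog\n";
        System.out.println("separate txn log dir: " + hasSeparateLogDir(cfg));
    }
}
```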
Yeah, I understand. I just need to get that set up and was hoping
that my load wouldn't warrant it.
Also, Pat ran some tests for server latencies at:
http://wiki.apache.org/hadoop/ZooKeeper/ServiceLatencyOverview
You can take a look at that as well and see what the expected performance
should be for your workload.
I will take a look at that. Thank you for your time.
-Josh