Thanks,
Manosiz.
On Thu, Jan 19, 2012 at 11:31 AM, Patrick Hunt wrote:
> See "preAllocSize"
>
> http://zookeeper.apache.org/doc/r3.4.2/zookeeperAdmin.html#sc_advancedConfiguration
>
> On Thu, Jan 19, 2012 at 10:49 AM, Manosiz Bhattacharyya
> wrote:
> > Thanks a lot for this info. A pointer in the code to where you do this
> > preallocation or a flag to disable this would be very beneficial.
See "preAllocSize"
http://zookeeper.apache.org/doc/r3.4.2/zookeeperAdmin.html#sc_advancedConfiguration
On Thu, Jan 19, 2012 at 10:49 AM, Manosiz Bhattacharyya
wrote:
> Thanks a lot for this info. A pointer in the code to where you do this
> preallocation or a flag to disable this would be very beneficial.
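For reference, preAllocSize is exposed as a Java system property, with the value in
kilobytes and a default of 65536 (64 MB). A minimal sketch of turning it down on the
server, assuming the usual conf/java.env / zkEnv.sh setup (the 4096 is illustrative,
not a recommendation):

# conf/java.env on each server -- preAllocSize is in KB; the default is 65536 (64 MB)
export JVMFLAGS="$JVMFLAGS -Dzookeeper.preAllocSize=4096"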
Thanks a lot for this info. A pointer in the code to where you do this
preallocation or a flag to disable this would be very beneficial.
On Thu, Jan 19, 2012 at 10:18 AM, Ted Dunning wrote:
> ZK does pretty much entirely sequential I/O.
>
> One thing that it does which might be very, very bad for SSD is that it
> pre-allocates disk extents in the log by writing a bunch of zeros. ...
We are using the ZooKeeper C client, version 3.3.4, the same as the server.
We use libpthread-2.10.1.so, and no special time slicing in user code. Will
let you know what we find.
Thanks,
Manosiz.
On Thu, Jan 19, 2012 at 10:09 AM, Patrick Hunt wrote:
> On Thu, Jan 19, 2012 at 9:31 AM, Manosiz Bhattacharyya wrote: ...
ZK does pretty much entirely sequential I/O.
One thing that it does which might be very, very bad for SSD is that it
pre-allocates disk extents in the log by writing a bunch of zeros. This is
to avoid directory updates as the log is written, but it doubles the load
on the SSD.
On Thu, Jan 19, 2012 ...
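One quick way to see that extra write load on the log device is extended iostat on
the host (assuming a Linux box with sysstat installed; the fioa device name below is
only a placeholder for whatever the Fusion I/O card shows up as):

# per-device extended stats every 5 seconds -- watch the write throughput and await columns
iostat -dxm /dev/fioa 5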
On Thu, Jan 19, 2012 at 9:31 AM, Manosiz Bhattacharyya
wrote:
> I do not think that there is a problem with the queue size. I guess the
> problem is more with latency when the Fusion I/O goes in for a GC. We are
> enabling stats on the Zookeeper and the Fusion I/O to be more precise. Does
> Zookeeper typically do only sequential I/O, or does it do some random I/O too? ...
I do not think that there is a problem with the queue size. I guess the
problem is more with latency when the Fusion I/O goes in for a GC. We are
enabling stats on the Zookeeper and the Fusion I/O to be more precise. Does
Zookeeper typically do only sequential I/O, or does it do some random I/O too?
We ...
If you aren't pushing much data through ZK, there is almost no way that the
request queue can fill up without the log or snapshot disks being slow.
See what happens if you put the log into a real disk or (heaven help us)
onto a tmpfs partition.
On Thu, Jan 19, 2012 at 2:18 AM, Manosiz Bhattacharyya wrote: ...
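A sketch of that tmpfs experiment, assuming Linux and a hypothetical /mnt/zk-txnlog
mount point (test only: a tmpfs log vanishes on reboot, so never run a production
ensemble this way):

mkdir -p /mnt/zk-txnlog
mount -t tmpfs -o size=512m tmpfs /mnt/zk-txnlog

# zoo.cfg -- point only the transaction log at it; snapshots stay on dataDir
dataLogDir=/mnt/zk-txnlog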
I will do as you mention.
We are using the async APIs throughout. Also, we do not write too much data
into Zookeeper. We just use it for leadership elections and health
monitoring, which is why we see the timeouts typically on idle zookeeper
connections.
The reason why we want the sessions to be ...
On Wed, Jan 18, 2012 at 4:47 PM, Manosiz Bhattacharyya
wrote:
> Thanks Patrick for your answer,
No problem.
> Actually we are in a virtualized environment; we have a FIO disk for the
> transactional logs. It does have some latency sometimes during FIO garbage
> collection. We know this could be the potential issue, but we were trying to
> work around it. ...
I was not suggesting that we should stop detecting a stuck server.
A watchdog of some sort keeping track of queue changes could also suffice.
Thanks for your input. I guess we will try increasing the timeout.
-- Manosiz.
On Wed, Jan 18, 2012 at 4:54 PM, Ted Dunning wrote: ...
Yes.
On Wed, Jan 18, 2012 at 5:15 PM, Ted Dunning wrote:
> Does FIO stand for Fusion I/O?
>
> On Thu, Jan 19, 2012 at 12:47 AM, Manosiz Bhattacharyya
> wrote:
>
> > ... we have a FIO disk
>
On Wed, Jan 18, 2012 at 3:21 PM, Camille Fournier
wrote:
> Duh, I knew there was something I was forgetting. You can't process the
> session timeout faster than the server can process the full pipeline, so
> making pings come back faster just means you will have a false sense of
> liveness for your services. ...
Does FIO stand for Fusion I/O?
On Thu, Jan 19, 2012 at 12:47 AM, Manosiz Bhattacharyya
wrote:
> ... we have a FIO disk
That really depends on whether you think that a stuck server is a problem.
The primary indication of that is a full queue and you are suggesting that
we not detect this situation. It isn't a matter of keeping the session
alive ... it is a matter of whether or not we can guarantee that things are ...
Thanks Patrick for your answer,
Actually we are in a virtualized environment; we have a FIO disk for the
transactional logs. It does have some latency sometimes during FIO garbage
collection. We know this could be the potential issue, but we were trying to
work around it.
We were trying to qualify the r...
Duh, I knew there was something I was forgetting. You can't process the
session timeout faster than the server can process the full pipeline, so
making pings come back faster just means you will have a false sense of
liveness for your services.
The question about why the leaders and followers hand...
Next up is disk. (I'm assuming you're not running in a virtualized
environment, correct?) You have a dedicated log device for the
transactional logs? Check your disk latency and make sure that's not
holding up the writes.
What does "stat" show you wrt latency in general and at the time you
see the ...
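If the log device is suspect, a rough fsync-latency probe on that filesystem
(independent of ZooKeeper) is sometimes enough to expose the stalls; a sketch, with
the path below as a placeholder for your dataLogDir:

# writes 1000 x 8 KB with a synchronized write per block; slow or erratic throughput
# here usually means the txn log fsyncs are stalling too (remove the file afterwards)
dd if=/dev/zero of=/path/to/datalog/fsync-test bs=8k count=1000 oflag=dsync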
Thanks a lot for your response. We are running the C client, as all our
components are C++ applications. We are tracing GC on the server side, but
did not see much activity there. We did tune GC. Our GC flags include the
following:
JVMFLAGS="$JVMFLAGS -XX:+UseParNewGC"
JVMFLAGS="$JVMFLAGS -XX:+UseC
Monitor GC on *both* the ZK server and the client. Either side can easily cause a
1-2 second delay if misconfigured.
On Wed, Jan 18, 2012 at 10:34 PM, Patrick Hunt wrote:
> I suspect that you are being affected by GC pauses. Have you tuned the
> GC at all, or just the defaults? Monitor the GC in the VM ...
On Wed, Jan 18, 2012 at 2:03 PM, Camille Fournier wrote:
> I think it can be done. Looking through the code, it seems like it should
> be safe modulo some stats that are set in the FinalRequestProcessor that
> may be less useful.
>
Turning around HBs at the head end of the server is a bad idea. I ...
Forgot to mention: use "stat" and some of the other four-letter words to
get an idea of what your request latency looks like across servers. In
particular you can see the "max latency" and correlate that with what
you're seeing on the clients and the GC activity (etc.).
Patrick
On Wed, Jan 18, 2012 at 2:34 PM ...
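For anyone following along, the four-letter words are issued over the client port; a
sketch against a placeholder host (the latency figures in "stat" are min/avg/max in
milliseconds since the last reset):

echo stat | nc zk-host 2181
echo srvr | nc zk-host 2181   # same summary without the per-connection list (available from 3.3)
echo srst | nc zk-host 2181   # reset the counters so a latency spike can be tied to a time window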
5 seconds is fairly low. HBs are sent by the client every 1/3 of the
timeout, with the expectation that a response will come back within another
1/3 of the timeout; if not, the client session will time out.
As a result, any blip of 1.5 sec or more between the client and server
could cause this to happen. Network latency ...
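To put numbers on that: with a 5000 ms session timeout the client pings roughly every
1666 ms, so a stall of a second or two anywhere in the path can push a ping past its
window. If the timeout is raised, note that the server clamps whatever the client
requests; a sketch of the relevant zoo.cfg settings (values illustrative; the defaults
are 2x and 20x tickTime):

# zoo.cfg
tickTime=2000
minSessionTimeout=4000
maxSessionTimeout=40000

The requested timeout itself is set by the client at connect time (for the C client,
the recv_timeout argument to zookeeper_init()), so both sides have to allow the
larger value.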
I think it can be done. Looking through the code, it seems like it should
be safe modulo some stats that are set in the FinalRequestProcessor that
may be less useful.
A question for the other zookeeper devs out there: is there a reason that
we handle read-only operations in the first processor differently ...