No, I don't have that data. I'll try to get it next time.

On Jul 31, 2012, at 5:13 PM, Patrick Hunt <[email protected]> wrote:
> Any monitoring of mem, gc, disk, etc... that might give some
> additional insight? Perhaps the disks were loaded and that was slowing
> things? Or swapping/gc of the jvm? You might be able to tune to
> resolve some of that. (One way to watch the jvm's gc is sketched
> after the thread below.)
>
> One thing you can try is copying the snapshot file to an empty
> datadir on a separate machine and trying to start a 2 node cluster
> (where the second node starts with an empty datadir).
>
> Patrick
>
> On Tue, Jul 31, 2012 at 3:34 PM, Jordan Zimmerman
> <[email protected]> wrote:
>>> Seems you are down to 4gb now. That still seems way too high for
>>> "coordination" operations... ?
>>
>> A big problem currently is detritus nodes. People use lock recipes
>> for various movie IDs and leave garbage parent nodes around in the
>> thousands. I've written some gc tasks to clean them up (a sketch of
>> such a task follows below), but it's been a slow process to get
>> everyone to use them. I know there is a Jira to help with this but
>> I don't know its status.
>>
>> -JZ
>>
>> On Jul 31, 2012, at 3:17 PM, Patrick Hunt <[email protected]> wrote:
>>
>>> On Tue, Jul 31, 2012 at 3:14 PM, Jordan Zimmerman
>>> <[email protected]> wrote:
>>>> There were a lot of creations, but I removed those nodes last
>>>> night. How long does it take to clear out of the snapshot?
>>>
>>> The snapshot is a copy of whatever is in the znode tree at the time
>>> the snapshot is taken, so the removed nodes will be gone from the
>>> next snapshot that is taken. You can look at the dates and the epoch
>>> number if that gives you any insight (the epoch is the upper 32 bits
>>> of the zxid in the filename; see the sketch below).
>>>
>>> Seems you are down to 4gb now. That still seems way too high for
>>> "coordination" operations... ?
>>>
>>> Patrick
>>>
>>>>
>>>> On Jul 31, 2012, at 2:52 PM, Patrick Hunt <[email protected]> wrote:
>>>>
>>>>> You have an 11gig snapshot file. That's very large. Did someone
>>>>> unexpectedly overload the server with znode creations?
>>>>>
>>>>> When a follower comes up, the leader needs to serialize the znodes
>>>>> to the snapshot file and stream it to the follower, which saves it
>>>>> locally and then deserializes it. (11g in 15min averages about
>>>>> 12meg/second for this process.)
>>>>>
>>>>> Oftentimes this is exacerbated by the max heap and GC interactions.
>>>>>
>>>>> Patrick
>>>>>
>>>>> On Tue, Jul 31, 2012 at 2:23 PM, Jordan Zimmerman
>>>>> <[email protected]> wrote:
>>>>>> BTW - this is 3.3.5
>>>>>>
>>>>>> On Jul 31, 2012, at 2:22 PM, Jordan Zimmerman
>>>>>> <[email protected]> wrote:
>>>>>>
>>>>>>> We've had a few outages of our ZK cluster recently. When trying
>>>>>>> to bring the cluster back up, it's been taking 10-15 minutes for
>>>>>>> the followers to sync with the leader. Any idea what might cause
>>>>>>> this? Here's an ls of the data dir:
>>>>>>>
>>>>>>> -rw-r--r-- 1 zookeeperserverprod nac  67108880    Jul 31 20:39 log.3900a4bc75
>>>>>>> -rw-r--r-- 1 zookeeperserverprod nac  67108880    Jul 31 20:40 log.3900a634ee
>>>>>>> -rw-r--r-- 1 zookeeperserverprod nac  67108880    Jul 31 21:21 log.3a00000001
>>>>>>> -rw-r--r-- 1 zookeeperserverprod nac  67108880    Jul 31 21:22 log.3a000139a2
>>>>>>> -rw-r--r-- 1 zookeeperserverprod nac  9279729723  Jul 31 20:42 snapshot.3900a634ec
>>>>>>> -rw-r--r-- 1 zookeeperserverprod nac  11126306780 Jul 31 21:09 snapshot.3900a6b149
>>>>>>> -rw-r--r-- 1 zookeeperserverprod nac  4153727423  Jul 31 21:22 snapshot.3a000139a0
>>>>>>>
>>>>>>
>>>>
>>
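For reference, a minimal sketch of the kind of gc monitoring Patrick
suggests, using the standard GarbageCollectorMXBean API. This only sees
the JVM it runs in, so it would have to be loaded alongside the server;
watching the server externally with jstat -gcutil <pid> gives similar
numbers. The 5-second poll interval is an arbitrary choice.

    import java.lang.management.GarbageCollectorMXBean;
    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryMXBean;

    public class GcWatch {
        public static void main(String[] args) throws InterruptedException {
            MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
            while (true) {
                // Cumulative collection counts/times; a jump in time
                // between polls means a long pause just happened.
                for (GarbageCollectorMXBean gc :
                        ManagementFactory.getGarbageCollectorMXBeans()) {
                    System.out.printf("%s: count=%d time=%dms%n",
                            gc.getName(), gc.getCollectionCount(),
                            gc.getCollectionTime());
                }
                System.out.println("heap used: "
                        + mem.getHeapMemoryUsage().getUsed());
                Thread.sleep(5000);
            }
        }
    }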
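And a small sketch of pulling the epoch out of the filenames in the
listing above, per Patrick's note: the suffix after the dot is the zxid
in hex, and the epoch is its upper 32 bits (the lower 32 are the
per-epoch counter).

    public class ZxidFromFilename {
        public static void main(String[] args) {
            String name = "snapshot.3a000139a0";  // from the ls above
            long zxid = Long.parseLong(name.substring(name.indexOf('.') + 1), 16);
            long epoch = zxid >>> 32;             // 0x3a for this file
            long counter = zxid & 0xffffffffL;    // 0x139a0
            System.out.printf("zxid=0x%x epoch=0x%x counter=0x%x%n",
                    zxid, epoch, counter);
        }
    }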

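Finally, a hedged sketch of what a detritus "gc" task like the ones
Jordan mentions might look like; this is not his actual code, and the
"/locks" root path is an assumed layout. It deletes lock-parent znodes
that currently have no children. A real task would probably add a grace
period (e.g. only delete parents still empty on a second pass) so that
briefly idle locks aren't churned.

    import java.util.List;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooKeeper;
    import org.apache.zookeeper.data.Stat;

    public class LockParentGc {
        // Returns the number of empty lock parents removed under root.
        public static int sweep(ZooKeeper zk, String root)
                throws KeeperException, InterruptedException {
            int deleted = 0;
            List<String> parents = zk.getChildren(root, false);
            for (String child : parents) {
                String path = root + "/" + child;
                Stat stat = zk.exists(path, false);
                if (stat == null || stat.getNumChildren() > 0) {
                    continue;  // already gone, or the lock is in use
                }
                try {
                    // -1 skips the data-version check; a lock acquired
                    // between exists() and delete() makes this fail
                    // safely with NotEmptyException instead.
                    zk.delete(path, -1);
                    deleted++;
                } catch (KeeperException.NotEmptyException
                        | KeeperException.NoNodeException e) {
                    // raced with a client; leave the node alone
                }
            }
            return deleted;
        }
    }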