No, I don't have that data. I'll try to get it next time.

On Jul 31, 2012, at 5:13 PM, Patrick Hunt <[email protected]> wrote:
> Any monitoring of mem, gc, disk, etc... that might give some
> additional insight? Perhaps the disks were loaded and that was slowing
> things? Or swapping/gc of the jvm? You might be able to tune to
> resolve some of that. (One way to watch the jvm's gc is sketched
> after the thread below.)
>
> One thing you can try is copying the snapshot file to an empty
> datadir on a separate machine and trying to start a 2 node cluster
> (where the second node starts with an empty datadir).
>
> Patrick
>
> On Tue, Jul 31, 2012 at 3:34 PM, Jordan Zimmerman
> <[email protected]> wrote:
>>> Seems you are down to 4gb now. That still seems way too high for
>>> "coordination" operations... ?
>>
>> A big problem currently is detritus nodes. People use lock recipes
>> for various movie IDs and leave garbage parent nodes around in the
>> thousands. I've written some gc tasks to clean them up (a sketch of
>> such a task follows below), but it's been a slow process to get
>> everyone to use them. I know there is a Jira to help with this but
>> I don't know its status.
>>
>> -JZ
>>
>> On Jul 31, 2012, at 3:17 PM, Patrick Hunt <[email protected]> wrote:
>>
>>> On Tue, Jul 31, 2012 at 3:14 PM, Jordan Zimmerman
>>> <[email protected]> wrote:
>>>> There were a lot of creations, but I removed those nodes last
>>>> night. How long does it take to clear out of the snapshot?
>>>
>>> The snapshot is a copy of whatever is in the znode tree at the time
>>> the snapshot is taken, so the removed nodes will be gone from the
>>> next snapshot that is taken. You can look at the dates and the epoch
>>> number if that gives you any insight (the epoch is the upper 32 bits
>>> of the zxid in the filename; see the sketch below).
>>>
>>> Seems you are down to 4gb now. That still seems way too high for
>>> "coordination" operations... ?
>>>
>>> Patrick
>>>
>>>>
>>>> On Jul 31, 2012, at 2:52 PM, Patrick Hunt <[email protected]> wrote:
>>>>
>>>>> You have an 11gig snapshot file. That's very large. Did someone
>>>>> unexpectedly overload the server with znode creations?
>>>>>
>>>>> When a follower comes up, the leader needs to serialize the znodes
>>>>> to the snapshot file and stream it to the follower, which saves it
>>>>> locally and then deserializes it. (11g in 15min averages about
>>>>> 12meg/second for this process.)
>>>>>
>>>>> Oftentimes this is exacerbated by the max heap and GC interactions.
>>>>>
>>>>> Patrick
>>>>>
>>>>> On Tue, Jul 31, 2012 at 2:23 PM, Jordan Zimmerman
>>>>> <[email protected]> wrote:
>>>>>> BTW - this is 3.3.5
>>>>>>
>>>>>> On Jul 31, 2012, at 2:22 PM, Jordan Zimmerman
>>>>>> <[email protected]> wrote:
>>>>>>
>>>>>>> We've had a few outages of our ZK cluster recently. When trying
>>>>>>> to bring the cluster back up, it's been taking 10-15 minutes for
>>>>>>> the followers to sync with the leader. Any idea what might cause
>>>>>>> this? Here's an ls of the data dir:
>>>>>>>
>>>>>>> -rw-r--r-- 1 zookeeperserverprod nac  67108880    Jul 31 20:39 log.3900a4bc75
>>>>>>> -rw-r--r-- 1 zookeeperserverprod nac  67108880    Jul 31 20:40 log.3900a634ee
>>>>>>> -rw-r--r-- 1 zookeeperserverprod nac  67108880    Jul 31 21:21 log.3a00000001
>>>>>>> -rw-r--r-- 1 zookeeperserverprod nac  67108880    Jul 31 21:22 log.3a000139a2
>>>>>>> -rw-r--r-- 1 zookeeperserverprod nac  9279729723  Jul 31 20:42 snapshot.3900a634ec
>>>>>>> -rw-r--r-- 1 zookeeperserverprod nac  11126306780 Jul 31 21:09 snapshot.3900a6b149
>>>>>>> -rw-r--r-- 1 zookeeperserverprod nac  4153727423  Jul 31 21:22 snapshot.3a000139a0
>>>>>>>
>>>>>>
>>>>
>>
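For reference, a minimal sketch of the kind of gc monitoring Patrick
suggests, using the standard GarbageCollectorMXBean API. This only sees
the JVM it runs in, so it would have to be loaded alongside the server;
watching the server externally with jstat -gcutil <pid> gives similar
numbers. The 5-second poll interval is an arbitrary choice.

    import java.lang.management.GarbageCollectorMXBean;
    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryMXBean;

    public class GcWatch {
        public static void main(String[] args) throws InterruptedException {
            MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
            while (true) {
                // Cumulative collection counts/times; a jump in time
                // between polls means a long pause just happened.
                for (GarbageCollectorMXBean gc :
                        ManagementFactory.getGarbageCollectorMXBeans()) {
                    System.out.printf("%s: count=%d time=%dms%n",
                            gc.getName(), gc.getCollectionCount(),
                            gc.getCollectionTime());
                }
                System.out.println("heap used: "
                        + mem.getHeapMemoryUsage().getUsed());
                Thread.sleep(5000);
            }
        }
    }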
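And a small sketch of pulling the epoch out of the filenames in the
listing above, per Patrick's note: the suffix after the dot is the zxid
in hex, and the epoch is its upper 32 bits (the lower 32 are the
per-epoch counter).

    public class ZxidFromFilename {
        public static void main(String[] args) {
            String name = "snapshot.3a000139a0";  // from the ls above
            long zxid = Long.parseLong(name.substring(name.indexOf('.') + 1), 16);
            long epoch = zxid >>> 32;             // 0x3a for this file
            long counter = zxid & 0xffffffffL;    // 0x139a0
            System.out.printf("zxid=0x%x epoch=0x%x counter=0x%x%n",
                    zxid, epoch, counter);
        }
    }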

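Finally, a hedged sketch of what a detritus "gc" task like the ones
Jordan mentions might look like; this is not his actual code, and the
"/locks" root path is an assumed layout. It deletes lock-parent znodes
that currently have no children. A real task would probably add a grace
period (e.g. only delete parents still empty on a second pass) so that
briefly idle locks aren't churned.

    import java.util.List;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooKeeper;
    import org.apache.zookeeper.data.Stat;

    public class LockParentGc {
        // Returns the number of empty lock parents removed under root.
        public static int sweep(ZooKeeper zk, String root)
                throws KeeperException, InterruptedException {
            int deleted = 0;
            List<String> parents = zk.getChildren(root, false);
            for (String child : parents) {
                String path = root + "/" + child;
                Stat stat = zk.exists(path, false);
                if (stat == null || stat.getNumChildren() > 0) {
                    continue;  // already gone, or the lock is in use
                }
                try {
                    // -1 skips the data-version check; a lock acquired
                    // between exists() and delete() makes this fail
                    // safely with NotEmptyException instead.
                    zk.delete(path, -1);
                    deleted++;
                } catch (KeeperException.NotEmptyException
                        | KeeperException.NoNodeException e) {
                    // raced with a client; leave the node alone
                }
            }
            return deleted;
        }
    }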