I understand that no strategy will work perfectly in all circumstances; we just need better documentation so developers can make correct assumptions. Previously I assumed that delivery of the session expiration event and the disappearance of the ephemeral nodes would occur together - not at the exact same time, but within some definite time frame...
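To make the corrected assumption concrete, here is the conservative client-side view I think the docs should spell out - plain Java, with all class and method names made up for illustration, not part of the ZooKeeper API. The point is that Disconnected only means "unknown", and even Expired does not mean the ephemerals are already gone:

```java
// Illustrative sketch (NOT ZooKeeper API): the only state in which a
// client may safely act as the owner of its ephemeral nodes is when it
// is connected and its session is known to be alive. On Disconnected
// the state becomes UNKNOWN: the session may still be alive, and the
// ephemerals may or may not still be visible to other clients.
public class ConservativeSession {
    enum View { ALIVE, UNKNOWN, EXPIRED }

    private View view = View.ALIVE;

    // Disconnected: we can no longer tell; do NOT assume expiry yet.
    public void onDisconnected()  { if (view == View.ALIVE) view = View.UNKNOWN; }

    // Reconnected with the same session: definitely alive again.
    public void onSyncConnected() { if (view != View.EXPIRED) view = View.ALIVE; }

    // Expired: the session is gone. Note: other clients may still see
    // our ephemerals for a while; deletion of the ephemerals is not
    // synchronized with delivery of this event to us.
    public void onExpired()       { view = View.EXPIRED; }

    // Safe to act as if we hold our ephemeral nodes (e.g. a lock)?
    public boolean mayActAsOwner() { return view == View.ALIVE; }

    public static void main(String[] args) {
        ConservativeSession s = new ConservativeSession();
        s.onDisconnected();
        System.out.println(s.mayActAsOwner()); // false while UNKNOWN
        s.onSyncConnected();
        System.out.println(s.mayActAsOwner()); // true again
    }
}
```

If that rule (treat Disconnected as "ownership suspended") were stated in the docs, zombies like the one below would at least know to stop acting.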
BTW, here is the thread dump of the zombie client:

Full thread dump Java HotSpot(TM) 64-Bit Server VM (11.3-b02 mixed mode):

"Attach Listener" daemon prio=10 tid=0x0000000054aad800 nid=0x6813 waiting on condition [0x0000000000000000..0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"IPC Client (47) connection to hdpnn/10.249.54.101:9000 from taobao" daemon prio=10 tid=0x00002aaadc31c800 nid=0x67f8 in Object.wait() [0x00000000427fa000..0x00000000427faa90]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x00002aaaaf9360e0> (a org.apache.hadoop.ipc.Client$Connection)
        at org.apache.hadoop.ipc.Client$Connection.waitForWork(Client.java:396)
        - locked <0x00002aaaaf9360e0> (a org.apache.hadoop.ipc.Client$Connection)
        at org.apache.hadoop.ipc.Client$Connection.run(Client.java:438)

"ResponseProcessor for block blk_-7997360194615811589_639843163" daemon prio=10 tid=0x0000000054aae000 nid=0x67ec runnable [0x00000000429fc000..0x00000000429fcd10]
   java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
        at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
        at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
        at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
        - locked <0x00002aaaaf9b1de0> (a sun.nio.ch.Util$1)
        - locked <0x00002aaaaf9b1dc8> (a java.util.Collections$UnmodifiableSet)
        - locked <0x00002aaaaf9818a0> (a sun.nio.ch.EPollSelectorImpl)
        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
        at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:260)
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:155)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:150)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:123)
        at java.io.DataInputStream.readFully(DataInputStream.java:178)
        at java.io.DataInputStream.readLong(DataInputStream.java:399)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2318)

"DataStreamer for file /group/tbads/TimeTunnel2/merge_pv/20100815/02/35/tt2yunti2.sds.cnz.alimama.com/43_040500a8-5aa3-4816-9ab5-31ffd70bf899.log.tmp block blk_-7997360194615811589_639843163" daemon prio=10 tid=0x00000000549cc400 nid=0x67c9 in Object.wait() [0x00000000423f6000..0x00000000423f6c90]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x00002aaaaf6faf80> (a java.util.LinkedList)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2166)
        - locked <0x00002aaaaf6faf80> (a java.util.LinkedList)

"LeaseChecker" daemon prio=10 tid=0x0000000054692800 nid=0x5882 waiting on condition [0x00000000428fb000..0x00000000428fbb90]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at org.apache.hadoop.hdfs.DFSClient$LeaseChecker.run(DFSClient.java:978)
        at java.lang.Thread.run(Thread.java:619)

"DestroyJavaVM" prio=10 tid=0x00002aaac022d000 nid=0x585f waiting on condition [0x0000000000000000..0x00000000415c9d00]
   java.lang.Thread.State: RUNNABLE

"Thread-5" prio=10 tid=0x00002aaac022b800 nid=0x5880 waiting on condition [0x00000000426f9000..0x00000000426f9a90]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for <0x00002aaaaf6e8460> (a java.util.concurrent.Semaphore$NonfairSync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:747)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:905)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1217)
        at java.util.concurrent.Semaphore.acquire(Semaphore.java:286)
        at com.taobao.timetunnel2.cluster.zookeeper.operation.PathDataWatcher$WatchDataOperation.execute(PathDataWatcher.java:37)
        at com.taobao.timetunnel2.cluster.zookeeper.operation.PathDataWatcher$WatchDataOperation.execute(PathDataWatcher.java:28)
        at org.apache.zookeeper.recipes.lock.ProtocolSupport.retryOperation(ProtocolSupport.java:120)
        at com.taobao.timetunnel2.cluster.zookeeper.operation.PathDataWatcher.watch(PathDataWatcher.java:45)
        at com.taobao.timetunnel2.cluster.zookeeper.ZooKeeperClient$2.run(ZooKeeperClient.java:82)

"Thread-4" prio=10 tid=0x00002aaac02a0c00 nid=0x587f waiting on condition [0x00000000425f8000..0x00000000425f8d10]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for <0x00002aaaaf6ec150> (a java.util.concurrent.Semaphore$NonfairSync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:747)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:905)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1217)
        at java.util.concurrent.Semaphore.acquire(Semaphore.java:286)
        at com.taobao.timetunnel2.cluster.zookeeper.operation.PathDataWatcher$WatchDataOperation.execute(PathDataWatcher.java:37)
        at com.taobao.timetunnel2.cluster.zookeeper.operation.PathDataWatcher$WatchDataOperation.execute(PathDataWatcher.java:28)
        at org.apache.zookeeper.recipes.lock.ProtocolSupport.retryOperation(ProtocolSupport.java:120)
        at com.taobao.timetunnel2.cluster.zookeeper.operation.PathDataWatcher.watch(PathDataWatcher.java:45)
        at com.taobao.timetunnel2.cluster.zookeeper.ZooKeeperClient$2.run(ZooKeeperClient.java:82)

"Thread-3" prio=10 tid=0x00002aaac0264800 nid=0x587e runnable [0x00000000424f7000..0x00000000424f7d90]
   java.lang.Thread.State: RUNNABLE
        at java.util.zip.Deflater.deflateBytes(Native Method)
        at java.util.zip.Deflater.deflate(Deflater.java:290)
        - locked <0x00002aaaaf8ce280> (a org.apache.hadoop.io.compress.zlib.BuiltInZlibDeflater)
        at org.apache.hadoop.io.compress.zlib.BuiltInZlibDeflater.compress(BuiltInZlibDeflater.java:47)
        - locked <0x00002aaaaf8ce280> (a org.apache.hadoop.io.compress.zlib.BuiltInZlibDeflater)
        at org.apache.hadoop.io.compress.CompressorStream.compress(CompressorStream.java:76)
        at org.apache.hadoop.io.compress.CompressorStream.write(CompressorStream.java:71)
        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
        - locked <0x00002aaaaf73b1b0> (a java.io.BufferedOutputStream)
        at java.io.DataOutputStream.write(DataOutputStream.java:90)
        - locked <0x00002aaaaf90e210> (a java.io.DataOutputStream)
        at org.apache.hadoop.io.SequenceFile$BlockCompressWriter.writeBuffer(SequenceFile.java:1247)
        - locked <0x00002aaaaf6eeac8> (a org.apache.hadoop.io.SequenceFile$BlockCompressWriter)
        at org.apache.hadoop.io.SequenceFile$BlockCompressWriter.sync(SequenceFile.java:1270)
        - locked <0x00002aaaaf6eeac8> (a org.apache.hadoop.io.SequenceFile$BlockCompressWriter)
        at org.apache.hadoop.io.SequenceFile$BlockCompressWriter.append(SequenceFile.java:1321)
        - locked <0x00002aaaaf6eeac8> (a org.apache.hadoop.io.SequenceFile$BlockCompressWriter)
        at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:977)
        - locked <0x00002aaaaf6eeac8> (a org.apache.hadoop.io.SequenceFile$BlockCompressWriter)
        at com.taobao.timetunnel2.savefile.util.HDFSWriter.write(HDFSWriter.java:42)
        at com.taobao.timetunnel2.savefile.reader.HDFSHandler.handleRecord(HDFSHandler.java:46)
        at com.taobao.timetunnel2.savefile.reader.FileReader.processFile(FileReader.java:151)
        at com.taobao.timetunnel2.savefile.reader.FileReader.doProcessFile(FileReader.java:130)
        at com.taobao.timetunnel2.savefile.reader.FileReader.execute(FileReader.java:82)
        at com.taobao.timetunnel2.savefile.app.StoppableService.run(StoppableService.java:37)
        at java.lang.Thread.run(Thread.java:619)

"main-EventThread" daemon prio=10 tid=0x00002aaac01c7400 nid=0x587c waiting on condition [0x00000000422f5000..0x00000000422f5c90]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for <0x00002aaaaf6e1538> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:414)

"main-SendThread" daemon prio=10 tid=0x00002aaac01ad000 nid=0x587b runnable [0x00000000421f4000..0x00000000421f4b10]
   java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
        at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
        at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
        at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
        - locked <0x00002aaaaf6efb68> (a sun.nio.ch.Util$1)
        - locked <0x00002aaaaf6efb80> (a java.util.Collections$UnmodifiableSet)
        - locked <0x00002aaaaf6f7ff0> (a sun.nio.ch.EPollSelectorImpl)
        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:921)

"Low Memory Detector" daemon prio=10 tid=0x00002aaac0026800 nid=0x5879 runnable [0x0000000000000000..0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"CompilerThread1" daemon prio=10 tid=0x00002aaac0024400 nid=0x5878 waiting on condition [0x0000000000000000..0x00000000418cb320]
   java.lang.Thread.State: RUNNABLE

"CompilerThread0" daemon prio=10 tid=0x00002aaac0022400 nid=0x5877 waiting on condition [0x0000000000000000..0x00000000417ca5b0]
   java.lang.Thread.State: RUNNABLE

"Signal Dispatcher" daemon prio=10 tid=0x00002aaac0020800 nid=0x5876 runnable [0x0000000000000000..0x00000000416caa20]
   java.lang.Thread.State: RUNNABLE

"Surrogate Locker Thread (CMS)" daemon prio=10 tid=0x00002aaac001ec00 nid=0x5875 waiting on condition [0x0000000000000000..0x0000000041472ec8]
   java.lang.Thread.State: RUNNABLE

"Finalizer" daemon prio=10 tid=0x00002aaac0000c00 nid=0x5874 in Object.wait() [0x0000000041ff2000..0x0000000041ff2c90]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x00002aaaaf6efb98> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:116)
        - locked <0x00002aaaaf6efb98> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:132)
        at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)

"Reference Handler" daemon prio=10 tid=0x0000000054301000 nid=0x5873 in Object.wait() [0x0000000041ef1000..0x0000000041ef1b10]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x00002aaaaf6ef670> (a java.lang.ref.Reference$Lock)
        at java.lang.Object.wait(Object.java:485)
        at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
        - locked <0x00002aaaaf6ef670> (a java.lang.ref.Reference$Lock)

"VM Thread" prio=10 tid=0x00000000542fb800 nid=0x5872 runnable

"Gang worker#0 (Parallel GC Threads)" prio=10 tid=0x0000000053ff7c00 nid=0x5860 runnable
"Gang worker#1 (Parallel GC Threads)" prio=10 tid=0x0000000053ff9400 nid=0x5861 runnable
"Gang worker#2 (Parallel GC Threads)" prio=10 tid=0x0000000053ffb000 nid=0x5862 runnable
"Gang worker#3 (Parallel GC Threads)" prio=10 tid=0x0000000053ffc800 nid=0x5863 runnable
"Gang worker#4 (Parallel GC Threads)" prio=10 tid=0x0000000053ffe000 nid=0x5864 runnable
"Gang worker#5 (Parallel GC Threads)" prio=10 tid=0x0000000053fffc00 nid=0x5865 runnable
"Gang worker#6 (Parallel GC Threads)" prio=10 tid=0x0000000054001400 nid=0x5866 runnable
"Gang worker#7 (Parallel GC Threads)" prio=10 tid=0x0000000054002c00 nid=0x5867 runnable
"Gang worker#8 (Parallel GC Threads)" prio=10 tid=0x0000000054004800 nid=0x5868 runnable
"Gang worker#9 (Parallel GC Threads)" prio=10 tid=0x0000000054006000 nid=0x5869 runnable
"Gang worker#10 (Parallel GC Threads)" prio=10 tid=0x0000000054007800 nid=0x586a runnable
"Gang worker#11 (Parallel GC Threads)" prio=10 tid=0x0000000054009400 nid=0x586b runnable
"Gang worker#12 (Parallel GC Threads)" prio=10 tid=0x000000005400ac00 nid=0x586c runnable
"Concurrent Mark-Sweep GC Thread" prio=10 tid=0x0000000054144400 nid=0x5871 runnable
"Gang worker#0 (Parallel CMS Threads)" prio=10 tid=0x000000005413d800 nid=0x586d runnable
"Gang worker#1 (Parallel CMS Threads)" prio=10 tid=0x000000005413f000 nid=0x586e runnable
"Gang worker#2 (Parallel CMS Threads)" prio=10 tid=0x0000000054140c00 nid=0x586f runnable
"Gang worker#3 (Parallel CMS Threads)" prio=10 tid=0x0000000054142400 nid=0x5870 runnable
"VM Periodic Task Thread" prio=10 tid=0x00002aaac0029000 nid=0x587a waiting on condition

JNI global references: 637

On Tue, Aug 17, 2010 at 12:03 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
> Ben or somebody else will have to repeat some of the detailed logic for this,
> but it has to do with the fact that you can't be sure what has happened
> during the network partition. One possibility is the one you describe, but
> another is that the partition happened because a majority of the ZK cluster
> lost power and you can't see the remaining nodes. Those nodes will continue
> to serve any files in a read-only fashion. If the partition involves you
> losing contact with the entire cluster at the same time a partition of the
> cluster into a quorum and a minority happens, then your ephemeral files
> could continue to exist at least until the breach in the cluster itself is
> healed.
>
> Suffice it to say that there are only a few strategies that leave you with a
> coherent picture of the universe.
> Importantly, you shouldn't assume that the ephemerals will disappear at the
> same time as the session expiration event is delivered.
>
> On Mon, Aug 16, 2010 at 8:31 PM, Qing Yan <qing...@gmail.com> wrote:
>> Ouch, is this the current ZK behavior? This is unexpected. If the client
>> gets partitioned from the ZK cluster, it should get notified and take some
>> action (e.g. commit suicide); otherwise, how can you tell whether an
>> ephemeral node is really up or down? Zombies can create synchronization
>> nightmares...
>>
>> On Mon, Aug 16, 2010 at 7:22 PM, Dave Wright <wrig...@gmail.com> wrote:
>>> Another possible cause for this that I ran into recently with the C
>>> client - you don't get the session expired notification until you are
>>> reconnected to the quorum and it informs you the session is lost. If you
>>> get disconnected and can't reconnect, you won't get the notification.
>>> Personally, I think the client API should track the session expiration
>>> time locally and inform you once it's expired.
>>>
>>> On Aug 16, 2010 2:09 AM, "Qing Yan" <qing...@gmail.com> wrote:
>>>
>>> Hi Ted,
>>>
>>> Do you mean a GC problem can prevent delivery of the SESSION EXPIRED
>>> event? Hmm... so you have met this problem before? I didn't see any OOM
>>> though; I will look into it more.
>>>
>>> On Mon, Aug 16, 2010 at 12:46 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
>>>> I am assuming that y...
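Dave's suggestion of tracking the session expiration locally could look something like this - an illustrative sketch in plain Java, all names mine rather than any real ZooKeeper client API; a real version would hook into the client's ping/response path:

```java
// Sketch of client-side session expiry tracking (names are made up,
// NOT a real ZooKeeper API): record the last time we heard from the
// quorum, and once we have been silent for longer than the negotiated
// session timeout, presume the server may have expired the session --
// even though no Expired event has arrived -- and stop acting as the
// owner of any ephemeral nodes. Clock drift makes this approximate,
// so a real implementation would subtract a safety margin.
public class LocalSessionTimer {
    private final long sessionTimeoutMs;
    private long lastContactMs;

    public LocalSessionTimer(long sessionTimeoutMs, long nowMs) {
        this.sessionTimeoutMs = sessionTimeoutMs;
        this.lastContactMs = nowMs;
    }

    // Call on every successful exchange with the server (e.g. ping reply).
    public void onContact(long nowMs) { lastContactMs = nowMs; }

    // True once the session may already have been expired server-side.
    public boolean presumedExpired(long nowMs) {
        return nowMs - lastContactMs > sessionTimeoutMs;
    }

    public static void main(String[] args) {
        LocalSessionTimer t = new LocalSessionTimer(10_000, 0);
        t.onContact(4_000);
        System.out.println(t.presumedExpired(9_000));  // false: only 5s silent
        System.out.println(t.presumedExpired(15_000)); // true: 11s > 10s timeout
    }
}
```

This would let a partitioned client "commit suicide" on its own schedule instead of lingering as a zombie waiting for an Expired event that can only arrive after reconnection.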