Hi,

I get exactly the same deadlock when using a dedicated (non-distributed) ZK instance. I tried ZK 3.3.6 and 3.4.5.
I haven't used Giraph for a while, so I can't say whether this has worked recently...

Best,
Sebastian

On 23.01.2013 05:14, Eli Reisman wrote:
> Hi Sebastian,
>
> This seems to be a new issue related to our recent upgrade to
> multithreading. I have not seen this before. All my other emails were related
> to the array index out of bounds error you found over the weekend.
>
> However, I have had trouble with the local ZK instance for some time now on
> a number of Giraph profiles and pretty much exclusively use a separate ZK
> instance of my own. Last summer I was running a lot of jobs on a 1.0.x
> Hadoop cluster with Giraph, and I was told to use the on-cluster dedicated
> ZK quorum due to "problems" with Giraph's local ZK instantiation. No one
> got more specific with me than that. I also can't get the local ZK
> instances to come up on Hadoop-2.0.x -- perhaps this feature of Giraph has
> had problems for a while. Was it working for you recently?
>
> If you notice any other clues as to the cause, please post them. I'm hoping
> to do some work around this soon.
>
> On Tue, Jan 22, 2013 at 11:05 AM, Claudio Martella <
> claudio.marte...@gmail.com> wrote:
>
>> Hi Sebastian,
>>
>> I do not know what is happening; I am also having problems with jobs
>> blocking while waiting to set up the ZooKeeper instance.
>> We should look into this.
>>
>> Best,
>> Claudio
>>
>>
>> On Mon, Jan 21, 2013 at 1:59 PM, Sebastian Schelter <s...@apache.org> wrote:
>>
>>> Hi,
>>>
>>> I'm testing a custom PageRank implementation using trunk on Hadoop
>>> 1.0.4. I seem to run into a deadlock after the input superstep.
>>>
>>> The workers report "finishSuperstep: (all workers done) WORKER_ONLY -
>>> Attempt=0, Superstep=0" and the master reports that all workers are done
>>> with superstep -1.
>>>
>>> I reconstructed this using a local setup, and it seems to me that the
>>> BspServiceMaster hangs indefinitely in the cleanUpZooKeeper method.
>>>
>>> I'm not using a dedicated ZK instance; I just have Giraph start one. Any
>>> ideas what can be done to fix my problem?
>>>
>>> Best,
>>> Sebastian
>>>
>>>
>>> Excerpt from jstack:
>>>
>>> "org.apache.giraph.master.MasterThread" prio=10 tid=0x00007f29fc385000 nid=0x29d1 waiting on condition [0x00007f2a09a5f000]
>>>    java.lang.Thread.State: TIMED_WAITING (parking)
>>>     at sun.misc.Unsafe.park(Native Method)
>>>     - parking to wait for <0x00000000f38967d8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>>>     at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:198)
>>>     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2116)
>>>     at org.apache.giraph.zk.PredicateLock.waitMsecs(PredicateLock.java:112)
>>>     at org.apache.giraph.zk.PredicateLock.waitForever(PredicateLock.java:138)
>>>     at org.apache.giraph.master.BspServiceMaster.cleanUpZooKeeper(BspServiceMaster.java:1602)
>>>     at org.apache.giraph.master.BspServiceMaster.cleanup(BspServiceMaster.java:1692)
>>>     at org.apache.giraph.master.MasterThread.run(MasterThread.java:144)
>>
>>
>> --
>> Claudio Martella
>> claudio.marte...@gmail.com
>>
>
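For anyone digging into this: the trace shows the master parked in PredicateLock.waitForever, which calls PredicateLock.waitMsecs, which in turn does a timed await on a lock condition. The sketch below is a hypothetical SimplePredicateLock written only to illustrate that pattern; it is not Giraph's actual PredicateLock, and it assumes (from the call chain alone) that waitForever just repeats a bounded wait. Under that assumption the wait can only end when some other thread signals the event, so if the signal never arrives the master loops on timed waits forever, which would match the TIMED_WAITING (parking) state in the jstack output.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

/**
 * Hypothetical, simplified predicate lock for illustration only:
 * a boolean flag guarded by a ReentrantLock and a Condition.
 * This is NOT Giraph's org.apache.giraph.zk.PredicateLock.
 */
class SimplePredicateLock {
  private final Lock lock = new ReentrantLock();
  private final Condition cond = lock.newCondition();
  private boolean eventOccurred = false;

  /** Marks the event as occurred and wakes every waiting thread. */
  public void signal() {
    lock.lock();
    try {
      eventOccurred = true;
      cond.signalAll();
    } finally {
      lock.unlock();
    }
  }

  /**
   * Waits up to msecs for the event. Returns true if it occurred,
   * false on timeout. Loops to tolerate spurious wakeups.
   */
  public boolean waitMsecs(long msecs) throws InterruptedException {
    long remainingNanos = TimeUnit.MILLISECONDS.toNanos(msecs);
    lock.lock();
    try {
      while (!eventOccurred) {
        if (remainingNanos <= 0) {
          return false;
        }
        remainingNanos = cond.awaitNanos(remainingNanos);
      }
      return true;
    } finally {
      lock.unlock();
    }
  }

  /** Repeats bounded waits; never returns if nobody calls signal(). */
  public void waitForever() throws InterruptedException {
    while (!waitMsecs(60_000)) {
      // Still waiting; a thread stuck here shows up as TIMED_WAITING.
    }
  }
}

public class PredicateLockDemo {
  public static void main(String[] args) throws Exception {
    // Nobody signals this one, so the bounded wait times out.
    SimplePredicateLock neverSignaled = new SimplePredicateLock();
    System.out.println(neverSignaled.waitMsecs(100)); // prints false

    // Another thread signals, so the wait succeeds.
    SimplePredicateLock signaled = new SimplePredicateLock();
    Thread signaler = new Thread(signaled::signal);
    signaler.start();
    System.out.println(signaled.waitMsecs(5_000)); // prints true
    signaler.join();
  }
}
```

If the master in the trace is in the equivalent of waitForever, the useful question is which ZooKeeper event is supposed to trigger the signal in cleanUpZooKeeper and why it never fires.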