Re: Deadlock when running on Hadoop 1.0.4

Sebastian Schelter Fri, 25 Jan 2013 03:07:59 -0800

Hi,

I get exactly the same deadlock when using a dedicated (non-distributed)
ZK instance. I tried ZooKeeper 3.3.6 and 3.4.5.


I haven't used Giraph for a while, so I can't say whether this has
worked recently...

Best,
Sebastian



On 23.01.2013 05:14, Eli Reisman wrote:
> Hi Sebastian,
> 
> This seems to be a new issue related to our recent upgrade to
> multithreading. I have not seen this before. All my other emails were related
> to the array index out of bounds error you found over the weekend.
> 
> However, I have had trouble with the local ZK instance for some time now on
> a number of Giraph profiles and pretty much exclusively use a separate ZK
> instance of my own. Last summer I was running a lot of jobs on a 1.0.x
> Hadoop cluster with Giraph, and I was told to use the on-cluster dedicated
> ZK quorum due to "problems" with Giraph's local ZK instantiation. No one
> got more specific with me than that. I also can't get the local ZK
> instances to come up on Hadoop 2.0.x, so perhaps this feature of Giraph has
> had problems for a while. Was it working for you recently?
> 
> If you notice any other clues as to the cause, please post them; I'm hoping
> to do some work around this soon.
> 
> On Tue, Jan 22, 2013 at 11:05 AM, Claudio Martella <
> claudio.marte...@gmail.com> wrote:
> 
>> Hi Sebastian,
>>
>> I do not know what is happening. I am also having problems with jobs
>> blocking while waiting to set up the ZooKeeper instance.
>> We should look into this.
>>
>> Best,
>> Claudio
>>
>>
>> On Mon, Jan 21, 2013 at 1:59 PM, Sebastian Schelter <s...@apache.org> wrote:
>>
>>> Hi,
>>>
>>> I'm testing a custom PageRank implementation using trunk on Hadoop
>>> 1.0.4. I seem to run into a deadlock after the input superstep.
>>>
>>> The workers report "finishSuperstep: (all workers done) WORKER_ONLY -
>>> Attempt=0, Superstep=0" and the master reports that all workers are done
>>> with superstep -1.
>>>
>>> I reconstructed this using a local setup, and it seems to me that the
>>> BspServiceMaster hangs indefinitely in the cleanUpZooKeeper method.
>>>
>>> I'm not using a dedicated zk instance, I just have Giraph start one. Any
>>> ideas what can be done to fix my problem?
>>>
>>> Best,
>>> Sebastian
>>>
>>>
>>> Excerpt from jstack:
>>>
>>> "org.apache.giraph.master.MasterThread" prio=10 tid=0x00007f29fc385000 nid=0x29d1 waiting on condition [0x00007f2a09a5f000]
>>>    java.lang.Thread.State: TIMED_WAITING (parking)
>>>         at sun.misc.Unsafe.park(Native Method)
>>>         - parking to wait for  <0x00000000f38967d8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>>>         at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:198)
>>>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2116)
>>>         at org.apache.giraph.zk.PredicateLock.waitMsecs(PredicateLock.java:112)
>>>         at org.apache.giraph.zk.PredicateLock.waitForever(PredicateLock.java:138)
>>>         at org.apache.giraph.master.BspServiceMaster.cleanUpZooKeeper(BspServiceMaster.java:1602)
>>>         at org.apache.giraph.master.BspServiceMaster.cleanup(BspServiceMaster.java:1692)
>>>         at org.apache.giraph.master.MasterThread.run(MasterThread.java:144)
>>>
>>>
>>>
>>
>>
>> --
>>    Claudio Martella
>>    claudio.marte...@gmail.com
>>
>
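The jstack excerpt in this thread shows the master thread parked in PredicateLock.waitForever, which in turn calls the timed PredicateLock.waitMsecs, underneath BspServiceMaster.cleanUpZooKeeper. That pattern hangs whenever the event being waited on is never signaled. The sketch below illustrates the general "predicate lock" idea with a lock, a condition and a boolean flag. It is a hypothetical, simplified class (`SimplePredicateLock`) written for illustration, not Giraph's actual PredicateLock source; the loop structure of `waitForever` is an assumption inferred only from the stack frames.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical, simplified predicate lock (NOT Giraph's PredicateLock). */
public class SimplePredicateLock {
  private final Lock lock = new ReentrantLock();
  private final Condition cond = lock.newCondition();
  private boolean eventOccurred = false; // guarded by lock

  /** Marks the event as having happened and wakes all waiters. */
  public void signal() {
    lock.lock();
    try {
      eventOccurred = true;
      cond.signalAll();
    } finally {
      lock.unlock();
    }
  }

  /** Waits up to msecs; returns true if the event occurred, false on timeout. */
  public boolean waitMsecs(long msecs) {
    long remainingNanos = TimeUnit.MILLISECONDS.toNanos(msecs);
    lock.lock();
    try {
      // Loop to tolerate spurious wakeups; awaitNanos returns the time left.
      while (!eventOccurred) {
        if (remainingNanos <= 0) {
          return false;
        }
        remainingNanos = cond.awaitNanos(remainingNanos);
      }
      return true;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt status
      return eventOccurred;
    } finally {
      lock.unlock();
    }
  }

  /**
   * Waits in bounded slices until the event occurs (assumed structure).
   * If nobody ever calls signal(), this never returns: the kind of hang
   * visible in the jstack excerpt.
   */
  public void waitForever(long sliceMsecs) {
    while (!waitMsecs(sliceMsecs)) {
      if (Thread.currentThread().isInterrupted()) {
        return; // give up only if the thread was interrupted
      }
      System.err.println("still waiting on predicate...");
    }
  }

  public static void main(String[] args) throws InterruptedException {
    SimplePredicateLock predicate = new SimplePredicateLock();
    // Nobody signals: a bounded wait gives up instead of hanging.
    System.out.println("signaled=" + predicate.waitMsecs(100)); // prints signaled=false
    // Another thread signals after ~50 ms, so the wait succeeds.
    Thread signaller = new Thread(() -> {
      try {
        Thread.sleep(50);
      } catch (InterruptedException ignored) {
        Thread.currentThread().interrupt();
      }
      predicate.signal();
    });
    signaller.start();
    System.out.println("signaled=" + predicate.waitMsecs(5000)); // prints signaled=true
    signaller.join();
  }
}
```

In this sketch, a thread stuck in `waitForever` points at whoever was supposed to call `signal()` rather than at the lock itself, so the more useful question for the deadlock above is which event cleanUpZooKeeper is waiting for and why it never arrives.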
