Re: Giraph/Netty issues on a cluster

Alessandro Presta Wed, 13 Feb 2013 11:36:01 -0800

Hi Zachary,

Are you running one of the examples or your own code?
It seems to me that a call to edge.getValue() is returning null, which should 
never happen.
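
To illustrate the failure mode, here is a minimal, self-contained sketch. It does not use the real Giraph or Hadoop classes: SimpleWritable, DoubleValue, Edge, and serialize are hypothetical stand-ins invented for this example. It only shows how a null edge value would surface as a NullPointerException at the moment a vertex is serialized, which matches the top frame of the trace (EdgeListVertexBase.write). One possible explanation for the single-worker run succeeding, offered as an assumption rather than a confirmed diagnosis, is that with one worker the vertices may never need to be serialized for transfer to another worker during input loading.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

public class NullEdgeValueSketch {

    /** Hypothetical stand-in for a Writable type; only the write half is needed. */
    interface SimpleWritable {
        void write(DataOutput out) throws IOException;
    }

    /** Hypothetical edge value wrapping a double. */
    static final class DoubleValue implements SimpleWritable {
        private final double value;

        DoubleValue(double value) {
            this.value = value;
        }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeDouble(value);
        }
    }

    /** Hypothetical edge: a target vertex id plus a value that may wrongly be null. */
    static final class Edge {
        private final long targetId;
        private final SimpleWritable value;

        Edge(long targetId, SimpleWritable value) {
            this.targetId = targetId;
            this.value = value;
        }

        long getTargetId() {
            return targetId;
        }

        SimpleWritable getValue() {
            return value;
        }
    }

    /** Serializes one edge: the target id first, then the edge value. */
    static byte[] serialize(Edge edge) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeLong(edge.getTargetId());
        // Throws NullPointerException when getValue() returns null.
        edge.getValue().write(out);
        out.flush();
        return bytes.toByteArray();
    }

    /** Returns true if serializing the edge fails with a NullPointerException. */
    static boolean failsWithNpe(Edge edge) throws IOException {
        try {
            serialize(edge);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        Edge good = new Edge(2L, new DoubleValue(1.0));
        Edge bad = new Edge(3L, null); // what a faulty input format might produce

        System.out.println("good edge fails: " + failsWithNpe(good));
        System.out.println("bad edge fails: " + failsWithNpe(bad));
    }
}
```

If your own input format or vertex code builds the edges, it may be worth checking that every edge is created with a non-null value before it is handed to the framework.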

Alessandro

From: Zachary Hanif <zh4...@gmail.com>
Reply-To: "user@giraph.apache.org" <user@giraph.apache.org>
Date: Wednesday, February 13, 2013 11:29 AM
To: "user@giraph.apache.org" <user@giraph.apache.org>
Subject: Giraph/Netty issues on a cluster

(How embarrassing! I forgot a subject header in a previous attempt to post 
this. Please reply to this thread, not the other.)

Hi everyone,

I am having some odd issues when trying to run a Giraph 0.2 job across my CDH 
3u3 cluster. After building the jar and deploying it across the cluster, I 
see a handful of my nodes reporting the following error:

2013-02-13 17:47:43,341 WARN org.apache.giraph.comm.netty.handler.ResponseClientHandler: exceptionCaught: Channel failed with remote address <EDITED_INTERNAL_DNS>/10.2.0.16:30001
java.lang.NullPointerException
    at org.apache.giraph.vertex.EdgeListVertexBase.write(EdgeListVertexBase.java:106)
    at org.apache.giraph.partition.SimplePartition.write(SimplePartition.java:169)
    at org.apache.giraph.comm.requests.SendVertexRequest.writeRequest(SendVertexRequest.java:71)
    at org.apache.giraph.comm.requests.WritableRequest.write(WritableRequest.java:127)
    at org.apache.giraph.comm.netty.handler.RequestEncoder.encode(RequestEncoder.java:96)
    at org.jboss.netty.handler.codec.oneone.OneToOneEncoder.handleDownstream(OneToOneEncoder.java:61)
    at org.jboss.netty.handler.execution.ExecutionHandler.handleDownstream(ExecutionHandler.java:185)
    at org.jboss.netty.channel.Channels.write(Channels.java:712)
    at org.jboss.netty.channel.Channels.write(Channels.java:679)
    at org.jboss.netty.channel.AbstractChannel.write(AbstractChannel.java:246)
    at org.apache.giraph.comm.netty.NettyClient.sendWritableRequest(NettyClient.java:655)
    at org.apache.giraph.comm.netty.NettyWorkerClient.sendWritableRequest(NettyWorkerClient.java:144)
    at org.apache.giraph.comm.netty.NettyWorkerClientRequestProcessor.doRequest(NettyWorkerClientRequestProcessor.java:425)
    at org.apache.giraph.comm.netty.NettyWorkerClientRequestProcessor.sendPartitionRequest(NettyWorkerClientRequestProcessor.java:195)
    at org.apache.giraph.comm.netty.NettyWorkerClientRequestProcessor.flush(NettyWorkerClientRequestProcessor.java:365)
    at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:190)
    at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:58)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:722)

What would be causing this? All other Hadoop jobs run well on the cluster, and 
when the Giraph job is run with only one worker, it completes without any 
issues. When run with more than one worker, the above error occurs. I have 
referenced this post 
<http://mail-archives.apache.org/mod_mbox/giraph-user/201209.mbox/%3ccaeq6y7shc4in-l73nr7abizspmrrfw9sfa8tmi3myqml8vk...@mail.gmail.com%3E>, 
where superficially similar issues were discussed, but the root cause appears 
to be different, and the suggested resolutions are not panning out.

As extra background, the 'remote address' changes as the error cycles through 
my available cluster nodes, and the failing workers do not seem to favor one 
physical machine over another. Not all nodes present this issue, only a handful 
per job. Is there something simple that I am missing?
