I try to use Giraph on fully distributed hadoop eco system. But when I executed SimpleShortestPathsComputation example, but master is waiting infinitely at superstep -1.
This is my logs at master. 2014-01-21 04:19:45,255 DEBUG org.apache.giraph.bsp.BspService: process: Got a new event, path = /_hadoopBsp/job_201401131006_0043/_vertexInputSplitDoneDir, type = NodeChild 2014-01-21 04:19:45,255 DEBUG org.apache.giraph.bsp.BspService: process: vertexInputSplitsDoneStateChanged (worker finished sending) 2014-01-21 04:19:45,255 DEBUG org.apache.giraph.zk.PredicateLock: waitMsecs: Got timed signaled of true 2014-01-21 04:19:45,256 DEBUG org.apache.zookeeper.ClientCnxn: Reading reply sessionid:0x3434162634304d1, packet:: clientPath:null serverPath:null finished:false header:: 69 2014-01-21 04:19:45,257 DEBUG org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: Got finished worker list = [CCLab33_3, CCLab35_1, CCLab39_2], size = 3, worker 2014-01-21 04:19:45,257 DEBUG org.apache.giraph.zk.ZooKeeperExt: createExt: Creating path /_hadoopBsp/job_201401131006_0043/_vertexInputSplitsAllDone 2014-01-21 04:19:45,268 DEBUG org.apache.zookeeper.ClientCnxn: Reading reply sessionid:0x3434162634304d1, packet:: clientPath:null serverPath:null finished:false header:: 70 2014-01-21 04:19:45,268 DEBUG org.apache.giraph.zk.ZooKeeperExt: createExt: Creating path /_hadoopBsp/job_201401131006_0043/_applicationAttemptsDir/0/_superstepDir/-1/_worke 2014-01-21 04:19:45,276 DEBUG org.apache.zookeeper.ClientCnxn: Reading reply sessionid:0x3434162634304d1, packet:: clientPath:null serverPath:null finished:false header:: 71 2014-01-21 04:19:45,278 DEBUG org.apache.zookeeper.ClientCnxn: Reading reply sessionid:0x3434162634304d1, packet:: clientPath:null serverPath:null finished:false header:: 72 2014-01-21 04:19:45,278 DEBUG org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: Got finished worker list = [], size = 0, worker list = [Worker(hostname=CCLab33 2014-01-21 04:19:45,278 INFO org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: 0 out of 3 workers finished on superstep -1 on path /_hadoopBsp/job_201401131006 2014-01-21 04:19:45,278 INFO org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: Waiting on [CCLab35_1, CCLab39_2, CCLab33_3] 2014-01-21 04:19:45,278 DEBUG org.apache.giraph.zk.PredicateLock: waitMsecs: Wait for 10000 2014-01-21 04:19:45,281 DEBUG org.apache.giraph.comm.netty.WrappedAdaptiveReceiveBufferSizePredictorFactory$WrappedAdaptiveReceiveBufferSizePredictor: nextReceiveBufferSize: 2014-01-21 04:19:45,281 DEBUG org.apache.giraph.comm.netty.WrappedAdaptiveReceiveBufferSizePredictorFactory$WrappedAdaptiveReceiveBufferSizePredictor: previousReceiveBufferS 2014-01-21 04:19:45,281 DEBUG org.apache.giraph.comm.netty.ByteCounter: handleUpstream: serverByteCounter buffer size = 25, total bytes = 25 2014-01-21 04:19:45,283 INFO org.apache.giraph.comm.netty.handler.RequestDecoder: decode: Server window metrics MBytes/sec sent = 0, MBytes/sec received = 0, MBytesSent = 0, 2014-01-21 04:19:45,283 DEBUG org.apache.giraph.comm.netty.handler.RequestDecoder: decode: Client 2, requestId 0, SEND_AGGREGATORS_TO_MASTER_REQUEST, with size 21 took 23833 2014-01-21 04:19:45,283 TRACE org.apache.giraph.comm.netty.handler.RequestServerHandler: messageReceived: Got class org.apache.giraph.comm.requests.SendAggregatorsToMasterRe 2014-01-21 04:19:45,284 DEBUG org.apache.giraph.master.MasterAggregatorHandler: acceptAggregatedValues: Accepted one set with 0 aggregated values 2014-01-21 04:19:45,284 DEBUG org.apache.giraph.comm.netty.handler.RequestServerHandler: messageReceived: Processing client 2, requestId 0, SEND_AGGREGATORS_TO_MASTER_REQUES 2014-01-21 04:19:45,286 DEBUG org.apache.giraph.comm.netty.ByteCounter: handleDownstream: serverByteCounter buffer size = 13, total bytes = 13 2014-01-21 04:19:45,291 DEBUG org.apache.giraph.comm.netty.ByteCounter: handleUpstream: serverByteCounter buffer size = 25, total bytes = 25 2014-01-21 04:19:45,291 DEBUG org.apache.giraph.comm.netty.handler.RequestDecoder: decode: Client 1, requestId 0, SEND_AGGREGATORS_TO_MASTER_REQUEST, with size 21 took 53916 2014-01-21 04:19:45,291 DEBUG org.apache.giraph.comm.netty.ByteCounter: handleUpstream: serverByteCounter buffer size = 25, total bytes = 50 2014-01-21 04:19:45,291 TRACE org.apache.giraph.comm.netty.handler.RequestServerHandler: messageReceived: Got class org.apache.giraph.comm.requests.SendAggregatorsToMasterRe 2014-01-21 04:19:45,291 DEBUG org.apache.giraph.master.MasterAggregatorHandler: acceptAggregatedValues: Accepted one set with 0 aggregated values 2014-01-21 04:19:45,291 DEBUG org.apache.giraph.comm.netty.handler.RequestDecoder: decode: Client 3, requestId 0, SEND_AGGREGATORS_TO_MASTER_REQUEST, with size 21 took 52829 2014-01-21 04:19:45,292 DEBUG org.apache.giraph.comm.netty.handler.RequestServerHandler: messageReceived: Processing client 1, requestId 0, SEND_AGGREGATORS_TO_MASTER_REQUES 2014-01-21 04:19:45,292 TRACE org.apache.giraph.comm.netty.handler.RequestServerHandler: messageReceived: Got class org.apache.giraph.comm.requests.SendAggregatorsToMasterRe 2014-01-21 04:19:45,292 DEBUG org.apache.giraph.comm.netty.ByteCounter: handleDownstream: serverByteCounter buffer size = 13, total bytes = 26 2014-01-21 04:19:45,292 DEBUG org.apache.giraph.master.MasterAggregatorHandler: acceptAggregatedValues: Accepted one set with 0 aggregated values 2014-01-21 04:19:45,292 DEBUG org.apache.giraph.comm.netty.handler.RequestServerHandler: messageReceived: Processing client 3, requestId 0, SEND_AGGREGATORS_TO_MASTER_REQUES 2014-01-21 04:19:45,292 DEBUG org.apache.giraph.comm.netty.ByteCounter: handleDownstream: serverByteCounter buffer size = 13, total bytes = 39 2014-01-21 04:19:45,309 DEBUG org.apache.zookeeper.ClientCnxn: Got notification sessionid:0x3434162634304d1 2014-01-21 04:19:45,309 DEBUG org.apache.zookeeper.ClientCnxn: Got WatchedEvent state:SyncConnected type:NodeChildrenChanged path:/_hadoopBsp/job_201401131006_0043/_applicat 2014-01-21 04:19:45,309 DEBUG org.apache.giraph.bsp.BspService: process: Got a new event, path = /_hadoopBsp/job_201401131006_0043/_applicationAttemptsDir/0/_superstepDir/-1 2014-01-21 04:19:45,309 DEBUG org.apache.giraph.master.BspServiceMaster: processEvent: Worker finished (node change) event - superstepStateChanged signaled 2014-01-21 04:19:45,310 DEBUG org.apache.giraph.zk.PredicateLock: waitMsecs: Got timed signaled of true 2014-01-21 04:19:45,318 DEBUG org.apache.zookeeper.ClientCnxn: Reading reply sessionid:0x3434162634304d1, packet:: clientPath:null serverPath:null finished:false header:: 73 2014-01-21 04:19:45,319 DEBUG org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: Got finished worker list = [cclab33_3, cclab35_1, cclab39_2], size = 3, worker 2014-01-21 04:19:45,319 DEBUG org.apache.giraph.zk.PredicateLock: waitMsecs: Wait for 10000 2014-01-21 04:19:45,680 DEBUG org.apache.hadoop.mapred.Counters: Creating group FileSystemCounters with nothing 2014-01-21 04:19:45,680 DEBUG org.apache.hadoop.mapred.Counters: Adding HDFS_BYTES_READ 2014-01-21 04:19:45,680 DEBUG org.apache.hadoop.mapred.Counters: Adding FILE_BYTES_WRITTEN 2014-01-21 04:19:45,680 DEBUG org.apache.hadoop.mapred.Counters: Adding COMMITTED_HEAP_BYTES 2014-01-21 04:19:45,714 DEBUG org.apache.hadoop.util.ProcfsBasedProcessTree: [ 18302 ] 2014-01-21 04:19:45,714 DEBUG org.apache.hadoop.mapred.Counters: Adding CPU_MILLISECONDS 2014-01-21 04:19:45,714 DEBUG org.apache.hadoop.mapred.Counters: Adding PHYSICAL_MEMORY_BYTES 2014-01-21 04:19:45,714 DEBUG org.apache.hadoop.mapred.Counters: Adding VIRTUAL_MEMORY_BYTES 2014-01-21 04:19:45,714 DEBUG org.apache.hadoop.ipc.Client: IPC Client (47) connection to /127.0.0.1:54142 from job_201401131006_0043 sending #7 2014-01-21 04:19:45,719 DEBUG org.apache.hadoop.ipc.Client: IPC Client (47) connection to /127.0.0.1:54142 from job_201401131006_0043 got value #7 2014-01-21 04:19:45,719 DEBUG org.apache.hadoop.ipc.RPC: Call: statusUpdate 5 2014-01-21 04:19:48,751 DEBUG org.apache.hadoop.util.ProcfsBasedProcessTree: [ 18302 ] 2014-01-21 04:19:48,751 DEBUG org.apache.hadoop.ipc.Client: IPC Client (47) connection to /127.0.0.1:54142 from job_201401131006_0043 sending #8 2014-01-21 04:19:48,753 DEBUG org.apache.hadoop.ipc.Client: IPC Client (47) connection to /127.0.0.1:54142 from job_201401131006_0043 got value #8 2014-01-21 04:19:48,753 DEBUG org.apache.hadoop.ipc.RPC: Call: statusUpdate 2 2014-01-21 04:19:51,753 DEBUG org.apache.hadoop.ipc.Client: IPC Client (47) connection to /127.0.0.1:54142 from job_201401131006_0043 sending #9 2014-01-21 04:19:51,754 DEBUG org.apache.hadoop.ipc.Client: IPC Client (47) connection to /127.0.0.1:54142 from job_201401131006_0043 got value #9 2014-01-21 04:19:51,755 DEBUG org.apache.hadoop.ipc.RPC: Call: ping 2 2014-01-21 04:19:54,755 DEBUG org.apache.hadoop.ipc.Client: IPC Client (47) connection to /127.0.0.1:54142 from job_201401131006_0043 sending #10 2014-01-21 04:19:54,756 DEBUG org.apache.hadoop.ipc.Client: IPC Client (47) connection to /127.0.0.1:54142 from job_201401131006_0043 got value #10 2014-01-21 04:19:54,756 DEBUG org.apache.hadoop.ipc.RPC: Call: ping 1 2014-01-21 04:19:54,905 DEBUG org.apache.hadoop.ipc.Client: IPC Client (47) connection to cclab48/10.0.0.48:8020 from root: closed 2014-01-21 04:19:54,905 DEBUG org.apache.hadoop.ipc.Client: IPC Client (47) connection to cclab48/10.0.0.48:8020 from root: stopped, remaining connections 1 2014-01-21 04:19:55,319 DEBUG org.apache.giraph.zk.PredicateLock: waitMsecs: Got timed signaled of false 2014-01-21 04:19:55,319 DEBUG org.apache.giraph.zk.PredicateLock: waitMsecs: Wait for 10000 And this is my worker’s log. 2014-01-21 04:19:45,270 DEBUG org.apache.giraph.comm.netty.ByteCounter: handleUpstream: clientByteCounter buffer size = 13, total bytes = 13 2014-01-21 04:19:45,270 DEBUG org.apache.giraph.comm.netty.ByteCounter: handleUpstream: serverByteCounter buffer size = 63, total bytes = 483 2014-01-21 04:19:45,270 DEBUG org.apache.giraph.comm.netty.ByteCounter: handleUpstream: clientByteCounter buffer size = 13, total bytes = 26 2014-01-21 04:19:45,271 DEBUG org.apache.giraph.comm.netty.handler.ResponseClientHandler: messageReceived: Completed (reqId=1,destAddr=CCLab33:30003,elapsedNanos=2325954,sta 2014-01-21 04:19:45,271 DEBUG org.apache.giraph.comm.netty.handler.RequestDecoder: decode: Client 2, requestId 1, SEND_WORKER_AGGREGATORS_REQUEST, with size 59 took 56302 ns 2014-01-21 04:19:45,271 TRACE org.apache.giraph.comm.netty.handler.RequestServerHandler: messageReceived: Got class org.apache.giraph.comm.requests.SendWorkerAggregatorsRequ 2014-01-21 04:19:45,271 DEBUG org.apache.giraph.comm.netty.handler.ResponseClientHandler: messageReceived: Completed (reqId=1,destAddr=CCLab39:30002,elapsedNanos=2636704,sta 2014-01-21 04:19:45,279 DEBUG org.apache.giraph.comm.netty.handler.RequestServerHandler: messageReceived: Processing client 2, requestId 1, SEND_WORKER_AGGREGATORS_REQUEST t 2014-01-21 04:19:45,279 DEBUG org.apache.giraph.comm.aggregators.OwnerAggregatorServerData: getMyAggregatorValuesWhenReady: Values ready 2014-01-21 04:19:45,279 DEBUG org.apache.giraph.comm.netty.ByteCounter: handleDownstream: serverByteCounter buffer size = 13, total bytes = 78 2014-01-21 04:19:45,283 DEBUG org.apache.giraph.comm.netty.handler.RequestEncoder: encode: Client 1, requestId 0, size = 25, SEND_AGGREGATORS_TO_MASTER_REQUEST took 27840 ns 2014-01-21 04:19:45,283 DEBUG org.apache.giraph.comm.netty.ByteCounter: handleDownstream: clientByteCounter buffer size = 25, total bytes = 25 2014-01-21 04:19:45,285 DEBUG org.apache.giraph.comm.netty.ByteCounter: handleUpstream: clientByteCounter buffer size = 13, total bytes = 13 2014-01-21 04:19:45,285 DEBUG org.apache.giraph.comm.netty.handler.ResponseClientHandler: messageReceived: Completed (reqId=0,destAddr=CCLab07:30000,elapsedNanos=2577026,sta 2014-01-21 04:19:45,286 INFO org.apache.giraph.comm.netty.NettyClient: waitAllRequests: Finished all requests. MBytes/sec sent = 0.0048, MBytes/sec received = 0.0025, MBytes 2014-01-21 04:19:45,286 DEBUG org.apache.giraph.comm.aggregators.OwnerAggregatorServerData: reset: Ready for next superstep 2014-01-21 04:19:45,286 DEBUG org.apache.giraph.worker.WorkerAggregatorHandler: finishSuperstep: Aggregators finished 2014-01-21 04:19:45,287 INFO org.apache.giraph.worker.BspServiceWorker: finishSuperstep: Superstep -1, messages = 0 , message bytes = 0 , Memory (free/total/max) = 222.42M / 2014-01-21 04:19:45,298 DEBUG org.apache.giraph.zk.ZooKeeperExt: createExt: Creating path /_hadoopBsp/job_201401131006_0043/_applicationAttemptsDir/0/_superstepDir/-1/_worke 2014-01-21 04:19:45,312 DEBUG org.apache.zookeeper.ClientCnxn: Reading reply sessionid:0x3434162634304d2, packet:: clientPath:null serverPath:null finished:false header:: 19 2014-01-21 04:19:45,312 INFO org.apache.giraph.worker.BspServiceWorker: finishSuperstep: (waiting for rest of workers) WORKER_ONLY - Attempt=0, Superstep=-1 2014-01-21 04:19:45,314 DEBUG org.apache.zookeeper.ClientCnxn: Reading reply sessionid:0x3434162634304d2, packet:: clientPath:null serverPath:null finished:false header:: 20 2014-01-21 04:19:45,314 DEBUG org.apache.giraph.zk.PredicateLock: waitMsecs: Wait for 10000 2014-01-21 04:19:47,966 DEBUG org.apache.hadoop.mapred.Counters: Creating group FileSystemCounters with nothing 2014-01-21 04:19:47,966 DEBUG org.apache.hadoop.mapred.Counters: Adding HDFS_BYTES_READ 2014-01-21 04:19:47,966 DEBUG org.apache.hadoop.mapred.Counters: Adding FILE_BYTES_WRITTEN 2014-01-21 04:19:47,967 DEBUG org.apache.hadoop.mapred.Counters: Adding COMMITTED_HEAP_BYTES 2014-01-21 04:19:48,002 DEBUG org.apache.hadoop.util.ProcfsBasedProcessTree: [ 16391 ] 2014-01-21 04:19:48,002 DEBUG org.apache.hadoop.mapred.Counters: Adding CPU_MILLISECONDS 2014-01-21 04:19:48,003 DEBUG org.apache.hadoop.mapred.Counters: Adding PHYSICAL_MEMORY_BYTES 2014-01-21 04:19:48,003 DEBUG org.apache.hadoop.mapred.Counters: Adding VIRTUAL_MEMORY_BYTES 2014-01-21 04:19:48,003 DEBUG org.apache.hadoop.ipc.Client: IPC Client (47) connection to /127.0.0.1:55945 from job_201401131006_0043 sending #5 2014-01-21 04:19:48,006 DEBUG org.apache.hadoop.ipc.Client: IPC Client (47) connection to /127.0.0.1:55945 from job_201401131006_0043 got value #5 2014-01-21 04:19:48,007 DEBUG org.apache.hadoop.ipc.RPC: Call: statusUpdate 4 2014-01-21 04:19:51,037 DEBUG org.apache.hadoop.util.ProcfsBasedProcessTree: [ 16391 ] 2014-01-21 04:19:51,037 DEBUG org.apache.hadoop.ipc.Client: IPC Client (47) connection to /127.0.0.1:55945 from job_201401131006_0043 sending #6 2014-01-21 04:19:51,038 DEBUG org.apache.hadoop.ipc.Client: IPC Client (47) connection to /127.0.0.1:55945 from job_201401131006_0043 got value #6 2014-01-21 04:19:51,039 DEBUG org.apache.hadoop.ipc.RPC: Call: statusUpdate 2 I printed log in DEBUG level, but I don’t know why. I use Hadoop 1.0.4 Zookeeper 3.4.5 And this is my execution. hadoop jar $GIRAPH_HOME/giraph-examples/target/giraph-examples-1.1.0-SNAPSHOT-for-hadoop-1.0.2-jar-with-dependencies.jar org.apache.giraph.GiraphRunner -D giraph.zkList="cclab48:2181,cclab47:2181,cclab46:2181” -D giraph.logLevel=“all" org.apache.giraph.examples.SimpleShortestPathsComputation -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat -vip giraphtest/input/tiny_graph.txt -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op giraphtest/output/shortestpaths2 -w 3 Lee Taekyu Computation & Communication Lab School of Integrated Technology Yonsei University