Thanks, Claudio and Gustavo, for your answers. I have another question. I run my algorithm on a cluster that has 20 nodes. When I set the number of workers to 10 or more, the algorithm works well and produces the expected output. But if the number of workers is less than 10, I get the following exception in ZooKeeper:

2013-09-06 10:39:04,313 INFO org.apache.giraph.comm.netty.NettyClient: connectAllAddresses: Successfully added 0 connections, (0 total connected) 0 failed, 0 failures total.
2013-09-06 10:39:04,313 INFO org.apache.giraph.partition.PartitionBalancer: balancePartitionsAcrossWorkers: Using algorithm static
2013-09-06 10:39:04,314 INFO org.apache.giraph.partition.PartitionUtils: analyzePartitionStats: Vertices - Mean: 200000, Min: Worker(hostname= node1.cluster.net, MRtaskID=5, port=30005) - 200000, Max: Worker(hostname= node7.cluster.net, MRtaskID=1, port=30001) - 200000
2013-09-06 10:39:04,314 INFO org.apache.giraph.partition.PartitionUtils: analyzePartitionStats: Edges - Mean: 10019985, Min: Worker(hostname= node9.cluster.net, MRtaskID=4, port=30004) - 10000354, Max: Worker(hostname= node5.cluster.net, MRtaskID=2, port=30002) - 10088901
2013-09-06 10:39:04,339 INFO org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: 0 out of 5 workers finished on superstep 2 on path /_hadoopBsp/job_201309060934_0013/_applicationAttemptsDir/0/_superstepDir/2/_workerFinishedDir
2013-09-06 10:39:04,340 INFO org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: Waiting on [node8.cluster.net_3, node1.cluster.net_5, node9.cluster.net_4, node5.cluster.net_2, node7.cluster.net_1]
2013-09-06 10:40:15,255 INFO org.apache.giraph.comm.netty.handler.RequestDecoder: decode: Server window metrics MBytes/sec sent = 0, MBytes/sec received = 0, MBytesSent = 0, MBytesReceived = 0, ave sent req MBytes = 0, ave received req MBytes = 0, secs waited = 71.241
2013-09-06 10:40:15,291 INFO org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: 3 out of 5 workers finished on superstep 2 on path /_hadoopBsp/job_201309060934_0013/_applicationAttemptsDir/0/_superstepDir/2/_workerFinishedDir
2013-09-06 10:40:15,291 INFO org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: Waiting on [node1.cluster.net_5, node7.cluster.net_1]
2013-09-06 10:40:15,388 INFO org.apache.giraph.master.BspServiceMaster: aggregateWorkerStats: Aggregation found (vtx=1000000,finVtx=0,edges=50099927,msgCount=0,msgBytesCount=0,haltComputation=false) on superstep = 2
2013-09-06 10:40:15,394 INFO org.apache.giraph.master.BspServiceMaster: coordinateSuperstep: Cleaning up old Superstep /_hadoopBsp/job_201309060934_0013/_applicationAttemptsDir/0/_superstepDir/1
2013-09-06 10:40:15,531 INFO org.apache.giraph.master.MasterThread: masterThread: Coordination of superstep 2 took 71.313 seconds ended with state THIS_SUPERSTEP_DONE and is now on superstep 3
2013-09-06 10:40:15,563 INFO org.apache.giraph.comm.netty.NettyClient: connectAllAddresses: Successfully added 0 connections, (0 total connected) 0 failed, 0 failures total.
2013-09-06 10:40:15,563 INFO org.apache.giraph.partition.PartitionBalancer: balancePartitionsAcrossWorkers: Using algorithm static
2013-09-06 10:40:15,564 INFO org.apache.giraph.partition.PartitionUtils: analyzePartitionStats: Vertices - Mean: 200000, Min: Worker(hostname= node1.cluster.net, MRtaskID=5, port=30005) - 200000, Max: Worker(hostname= node7.cluster.net, MRtaskID=1, port=30001) - 200000
2013-09-06 10:40:15,564 INFO org.apache.giraph.partition.PartitionUtils: analyzePartitionStats: Edges - Mean: 10019985, Min: Worker(hostname= node9.cluster.net, MRtaskID=4, port=30004) - 10000354, Max: Worker(hostname= node5.cluster.net, MRtaskID=2, port=30002) - 10088901
2013-09-06 10:40:15,587 INFO org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: 0 out of 5 workers finished on superstep 3 on path /_hadoopBsp/job_201309060934_0013/_applicationAttemptsDir/0/_superstepDir/3/_workerFinishedDir
2013-09-06 10:40:15,587 INFO org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: Waiting on [node8.cluster.net_3, node1.cluster.net_5, node9.cluster.net_4, node5.cluster.net_2, node7.cluster.net_1]
2013-09-06 10:50:18,111 ERROR org.apache.giraph.master.BspServiceMaster: superstepChosenWorkerAlive: Missing chosen worker Worker(hostname= node7.cluster.net, MRtaskID=1, port=30001) on superstep 3
2013-09-06 10:50:18,111 ERROR org.apache.giraph.master.BspServiceMaster: superstepChosenWorkerAlive: Missing chosen worker Worker(hostname= node9.cluster.net, MRtaskID=4, port=30004) on superstep 3
2013-09-06 10:50:18,111 INFO org.apache.giraph.master.MasterThread: masterThread: Coordination of superstep 3 took 602.58 seconds ended with state WORKER_FAILURE and is now on superstep 3
2013-09-06 10:50:18,118 ERROR org.apache.giraph.master.MasterThread: masterThread: Master algorithm failed with ArrayIndexOutOfBoundsException
java.lang.ArrayIndexOutOfBoundsException: -1
    at org.apache.giraph.master.BspServiceMaster.getLastGoodCheckpoint(BspServiceMaster.java:1272)
    at org.apache.giraph.master.MasterThread.run(MasterThread.java:139)
2013-09-06 10:50:18,119 FATAL org.apache.giraph.graph.GraphMapper: uncaughtException: OverrideExceptionHandler on thread org.apache.giraph.master.MasterThread, msg = java.lang.ArrayIndexOutOfBoundsException: -1, exiting...
java.lang.IllegalStateException: java.lang.ArrayIndexOutOfBoundsException: -1
    at org.apache.giraph.master.MasterThread.run(MasterThread.java:185)
Caused by: java.lang.ArrayIndexOutOfBoundsException: -1
    at org.apache.giraph.master.BspServiceMaster.getLastGoodCheckpoint(BspServiceMaster.java:1272)
    at org.apache.giraph.master.MasterThread.run(MasterThread.java:139)
2013-09-06 10:50:18,122 INFO org.apache.giraph.zk.ZooKeeperManager: run: Shutdown hook started.
2013-09-06 10:50:18,122 WARN org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Forced a shutdown hook kill of the ZooKeeper process.
2013-09-06 10:50:18,495 INFO org.apache.zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x140f459adcd0000, likely server has closed socket, closing socket connection and attempting reconnect
2013-09-06 10:50:18,496 INFO org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: ZooKeeper process exited with 143 (note that 143 typically means killed).

Thank you.

On Fri, Sep 6, 2013 at 3:51 AM, Gustavo Enrique Salazar Torres <gsala...@ime.usp.br> wrote:

> Hi Bu:
> Until the interface with Gora is available you could use Apache Sqoop to
> import your mysql table into HDFS and then run your Giraph job.
>
> Cheers
> Gustavo
>
> On 06/09/2013 04:43, "Claudio Martella" <claudio.marte...@gmail.com> wrote:
>
>> Hi Bu,
>>
>> no, currently we do not have a DBInputFormat. We have an open issue with
>> a google summer of code student working on a GoraInputFormat, which
>> supports also reading from RDBMs through Gora. However, if/when it will get
>> it, it will not provide a rich semantic as DBInputFormat, e.g. you'll be
>> able to only provide scan-like/range queries, instead of ANY query like
>> DBInputFormat.
>>
>> I think that creating an DB[Vertex|Edge]InputFormat starting from the
>> hadoop DBInputFormat should not be too hard and could prove to be a very
>> useful contribution. If you think about providing an implementation, I can
>> provide guidance.
>>
>> Best,
>> Claudio
>>
>> On Fri, Sep 6, 2013 at 1:45 AM, Bu Xiao <buxia...@gmail.com> wrote:
>>
>>> Hi Girapher,
>>>
>>> I am currently working on algorithm that requires reading the
>>> vertices from MySQL table and not from HDFS. I thought that there has to be
>>> a way of reading data from SQL table since Giraph is built on top of
>>> Hadoop. But I do not seem to figure this part out. Do you have a class
>>> similar to the DBInputFormat in Hadoop? Thank you very much for your help.
>>
>> --
>> Claudio Martella
>> claudio.marte...@gmail.com