Hi, I'm trying to run the connected components example in Giraph with the following command:
hadoop jar $GIRAPH_HOME/giraph-examples/target/giraph-examples-1.0.0-for-hadoop-1.0.2-jar-with-dependencies.jar \
    org.apache.giraph.GiraphRunner org.apache.giraph.examples.ConnectedComponentsVertex \
    -vif org.apache.giraph.io.formats.IntIntNullTextInputFormat \
    -vip /user/ubuntu/giraph-input/${inputgraph} \
    -of org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
    -op /user/ubuntu/giraph-input/wcc \
    -w 8

where ${inputgraph} is a text file whose rows are "vertex-id edge-dst-id edge-dst-id ...". E.g.,

1 2 3 4 -> (1 with out-edges to 2, 3, 4)
2 3 1 -> (2 with out-edges to 3, 1)

What happens is that Giraph runs superstep 0 and then completes without sending any messages. The graphs I tested with are web-Google, cit-Patents, and amazon0505, all from Stanford SNAP (https://snap.stanford.edu/).

I don't think this behaviour is correct, as I'm testing other Pregel-like systems and they do send messages and take >1 superstep.

Giraph works correctly for PageRank and SSSP using the "tinygraph" input format ([vertex-id, vertex-val, [[edge-id, edge-val], ...]]).

I've also tried running connected components with an extra combiner option:

-c org.apache.giraph.combiner.MinimumIntCombiner

but the same behaviour occurs.

Here's a user-log snippet showing superstep 0 finishing:

2013-11-30 20:24:01,889 INFO org.apache.giraph.comm.netty.NettyClient: waitAllRequests: Finished all requests.
MBytes/sec sent = 0.0079, MBytes/sec received = 0.0041, MBytesSent = 0, MBytesReceived = 0, ave sent req MBytes = 0, ave received req MBytes = 0, secs waited = 0.0030
2013-11-30 20:24:01,889 INFO org.apache.giraph.worker.BspServiceWorker: finishSuperstep: Superstep 0, messages = 0 Memory (free/total/max) = 33.61M / 56.69M / 989.88M
2013-11-30 20:24:01,920 INFO org.apache.giraph.worker.BspServiceWorker: finishSuperstep: (waiting for rest of workers) WORKER_ONLY - Attempt=0, Superstep=0
2013-11-30 20:24:01,947 INFO org.apache.giraph.bsp.BspService: process: superstepFinished signaled
2013-11-30 20:24:01,968 INFO org.apache.giraph.worker.BspServiceWorker: processEvent: Job state changed, checking to see if it needs to restart
2013-11-30 20:24:01,981 INFO org.apache.giraph.worker.BspServiceWorker: finishSuperstep: Completed superstep 0 with global stats (vtx=739454,finVtx=739454,edges=739454,msgCount=0,haltComputation=true)
2013-11-30 20:24:01,981 INFO org.apache.giraph.graph.GraphTaskManager: execute: BSP application done (global vertices marked done)
2013-11-30 20:24:01,981 INFO org.apache.giraph.graph.GraphTaskManager: cleanup: Starting for WORKER_ONLY
2013-11-30 20:24:01,991 INFO org.apache.giraph.bsp.BspService: getJobState: Job state already exists (/_hadoopBsp/job_201311302021_0001/_masterJobState)
2013-11-30 20:24:02,018 INFO org.apache.giraph.comm.netty.NettyClient: stop: reached wait threshold, 8 connections closed, releasing NettyClient.bootstrap resources now.
2013-11-30 20:24:02,024 INFO org.apache.giraph.worker.BspServiceWorker: saveVertices: Starting to save 92571 vertices using 1 threads
2013-11-30 20:24:02,697 INFO org.apache.giraph.worker.InputSplitsHandler: process: Input split /_hadoopBsp/job_201311302021_0001/_vertexInputSplitDir/0 lost reservation
2013-11-30 20:24:02,703 INFO org.apache.giraph.worker.InputSplitsHandler: process: Input split /_hadoopBsp/job_201311302021_0001/_vertexInputSplitDir/1 lost reservation
2013-11-30 20:24:02,841 INFO org.apache.giraph.worker.BspServiceWorker: saveVertices: Done saving vertices.
2013-11-30 20:24:02,846 INFO org.apache.giraph.worker.BspServiceWorker: cleanup: Notifying master its okay to cleanup with /_hadoopBsp/job_201311302021_0001/_cleanedUpDir/7_worker
2013-11-30 20:24:02,859 INFO org.apache.zookeeper.ZooKeeper: Session: 0x142abbff3b40007 closed
2013-11-30 20:24:02,859 INFO org.apache.giraph.comm.netty.NettyServer: stop: Halting netty server

Is my input format wrong? Is there a bug?

Thanks,
Young
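
For anyone checking the input side: the row format described above ("vertex-id edge-dst-id edge-dst-id ...") can be sanity-checked with a short script before the file is handed to Giraph. The sketch below is in Python and assumes only whitespace-separated integer ids with one vertex per row. The function name count_adjacency_list is made up for illustration, and skipping '#' lines is an assumption for files that still carry SNAP-style comment headers. It does not use or reproduce any Giraph code.

```python
import io
from typing import Iterable, Tuple


def count_adjacency_list(lines: Iterable[str]) -> Tuple[int, int]:
    """Count (vertices, out-edges) in rows of 'vertex-id edge-dst-id ...'.

    Hypothetical helper for illustration; not part of Giraph.
    """
    num_vertices = 0
    num_edges = 0
    for line in lines:
        tokens = line.split()  # split on any run of whitespace
        if not tokens or tokens[0].startswith("#"):
            continue  # assumption: blank lines and '#' comment lines are ignored
        # Every token must be an integer id; int() raises ValueError otherwise.
        ids = [int(token) for token in tokens]
        num_vertices += 1           # the first id is the vertex itself
        num_edges += len(ids) - 1   # the remaining ids are out-edge destinations
    return num_vertices, num_edges


if __name__ == "__main__":
    # The two example rows from the post above.
    sample = io.StringIO("1 2 3 4\n2 3 1\n")
    print(count_adjacency_list(sample))  # prints (2, 5)
```

The resulting counts can be compared with the vtx= and edges= values in the "Completed superstep 0 with global stats" log line. A large mismatch may suggest the file is not being read the way the row format above describes.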