Hi,

I'm trying to run the connected components example in Giraph with the
following command:

hadoop jar
$GIRAPH_HOME/giraph-examples/target/giraph-examples-1.0.0-for-hadoop-1.0.2-jar-with-dependencies.jar
org.apache.giraph.GiraphRunner
org.apache.giraph.examples.ConnectedComponentsVertex -vif
org.apache.giraph.io.formats.IntIntNullTextInputFormat -vip
/user/ubuntu/giraph-input/${inputgraph} -of
org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op
/user/ubuntu/giraph-input/wcc -w 8

where ${inputgraph} is a text file with rows being "vertex-id edge-dst-id
edge-dst-id ...". E.g.,

1 2 3 4   -> (1 with out-edges to 2, 3, 4)
2 3 1      -> (2 with out-edges to 3, 1)
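
To sanity-check the input, here is a minimal sketch that parses rows in this
layout and counts rows and out-edges, so the totals can be compared with the
vtx/edges figures Giraph reports. It is only an assumption about how the
rows should be read (whitespace-separated integers), not Giraph's own
parser; parse_adjacency_line and count_rows_and_edges are hypothetical
helper names.

```python
def parse_adjacency_line(line):
    """Split one row into (vertex_id, [destination ids]); None if blank."""
    tokens = line.split()  # splits on any run of spaces or tabs
    if not tokens:
        return None
    return int(tokens[0]), [int(t) for t in tokens[1:]]


def count_rows_and_edges(lines):
    """Count non-blank rows and the total number of out-edges they list."""
    rows = 0
    edges = 0
    for line in lines:
        parsed = parse_adjacency_line(line)
        if parsed is None:
            continue
        rows += 1
        edges += len(parsed[1])
    return rows, edges


if __name__ == "__main__":
    sample = ["1 2 3 4", "2 3 1"]
    print(count_rows_and_edges(sample))  # (2, 5)
```

Note that this counts rows, not distinct vertices: vertices that appear only
as destinations (3 and 4 in the sample) are not counted.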

What happens is that Giraph runs Superstep 0 and then completes without
sending any messages. The graphs I tested with are web-Google,
cit-Patents, and amazon0505, all from SNAP Stanford (
https://snap.stanford.edu/). I don't think this behaviour is correct, as
I'm testing other Pregel-like systems and they do send messages and take >1
superstep.
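
For reference, this is roughly what I would expect, as a minimal sketch of
generic min-label propagation in the Pregel style. It is not Giraph's
ConnectedComponentsVertex code, which may decide differently which messages
to send; min_label_propagation is a hypothetical name and the graph is the
two-row example above.

```python
def min_label_propagation(adjacency):
    """Simulate supersteps; return (labels, messages sent per superstep)."""
    # Every vertex starts labelled with its own id, including vertices
    # that appear only as edge destinations.
    labels = {v: v for v in adjacency}
    for destinations in adjacency.values():
        for d in destinations:
            labels.setdefault(d, d)

    active = set(labels)
    messages_per_superstep = []
    while active:
        inbox = {}
        sent = 0
        # Active vertices send their current label along each out-edge.
        for v in active:
            for d in adjacency.get(v, []):
                inbox.setdefault(d, []).append(labels[v])
                sent += 1
        messages_per_superstep.append(sent)
        # A vertex stays active only if a message lowered its label.
        active = set()
        for v, received in inbox.items():
            smallest = min(received)
            if smallest < labels[v]:
                labels[v] = smallest
                active.add(v)
    return labels, messages_per_superstep


if __name__ == "__main__":
    graph = {1: [2, 3, 4], 2: [3, 1]}
    labels, messages = min_label_propagation(graph)
    print(labels)    # every vertex ends with label 1
    print(messages)  # [5, 2]
```

Under these assumptions even the tiny example sends messages in the first
superstep and needs more than one superstep to finish.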

Giraph works correctly for PageRank and SSSP using the "tinygraph" input
format ([vertex-id, vertex-val, [[edge-id, edge-val], ...]]). I've also
tried running connected components with an extra combiner option:

 -c org.apache.giraph.combiner.MinimumIntCombiner

but the same behaviour occurs.


Here's a user-log snippet showing superstep 0 finishing:

2013-11-30 20:24:01,889 INFO org.apache.giraph.comm.netty.NettyClient:
waitAllRequests: Finished all requests. MBytes/sec sent = 0.0079,
MBytes/sec received = 0.0041, MBytesSent = 0, MBytesReceived = 0, ave sent
req MBytes = 0, ave received req MBytes = 0, secs waited = 0.0030
2013-11-30 20:24:01,889 INFO org.apache.giraph.worker.BspServiceWorker:
finishSuperstep: Superstep 0, messages = 0 Memory (free/total/max) = 33.61M
/ 56.69M / 989.88M
2013-11-30 20:24:01,920 INFO org.apache.giraph.worker.BspServiceWorker:
finishSuperstep: (waiting for rest of workers) WORKER_ONLY - Attempt=0,
Superstep=0
2013-11-30 20:24:01,947 INFO org.apache.giraph.bsp.BspService: process:
superstepFinished signaled
2013-11-30 20:24:01,968 INFO org.apache.giraph.worker.BspServiceWorker:
processEvent: Job state changed, checking to see if it needs to restart
2013-11-30 20:24:01,981 INFO org.apache.giraph.worker.BspServiceWorker:
finishSuperstep: Completed superstep 0 with global stats
(vtx=739454,finVtx=739454,edges=739454,msgCount=0,haltComputation=true)
2013-11-30 20:24:01,981 INFO org.apache.giraph.graph.GraphTaskManager:
execute: BSP application done (global vertices marked done)
2013-11-30 20:24:01,981 INFO org.apache.giraph.graph.GraphTaskManager:
cleanup: Starting for WORKER_ONLY
2013-11-30 20:24:01,991 INFO org.apache.giraph.bsp.BspService: getJobState:
Job state already exists (/_hadoopBsp/job_201311302021_0001/_masterJobState)
2013-11-30 20:24:02,018 INFO org.apache.giraph.comm.netty.NettyClient:
stop: reached wait threshold, 8 connections closed, releasing
NettyClient.bootstrap resources now.
2013-11-30 20:24:02,024 INFO org.apache.giraph.worker.BspServiceWorker:
saveVertices: Starting to save 92571 vertices using 1 threads
2013-11-30 20:24:02,697 INFO org.apache.giraph.worker.InputSplitsHandler:
process: Input split
/_hadoopBsp/job_201311302021_0001/_vertexInputSplitDir/0 lost reservation
2013-11-30 20:24:02,703 INFO org.apache.giraph.worker.InputSplitsHandler:
process: Input split
/_hadoopBsp/job_201311302021_0001/_vertexInputSplitDir/1 lost reservation
2013-11-30 20:24:02,841 INFO org.apache.giraph.worker.BspServiceWorker:
saveVertices: Done saving vertices.
2013-11-30 20:24:02,846 INFO org.apache.giraph.worker.BspServiceWorker:
cleanup: Notifying master its okay to cleanup with
/_hadoopBsp/job_201311302021_0001/_cleanedUpDir/7_worker
2013-11-30 20:24:02,859 INFO org.apache.zookeeper.ZooKeeper: Session:
0x142abbff3b40007 closed
2013-11-30 20:24:02,859 INFO org.apache.giraph.comm.netty.NettyServer:
stop: Halting netty server

Is my input format wrong? Is there a bug?


Thanks,
Young
