I'm seeing a strange behavior that I can't explain.

hadoop jar giraph-0.1-jar-with-dependencies.jar org.apache.giraph.GiraphRunner org.apache.giraph.examples.ConnectedComponentsVertex --inputFormat org.apache.giraph.examples.IntIntNullIntTextInputFormat --inputPath /user/vpatel/graph_in/elist.txt --outputFormat org.apache.giraph.examples.VertexWithComponentTextOutputFormat --outputPath hdfs:///user/vpatel/giraph_out/1 --workers 4 --combiner org.apache.giraph.examples.MinimumIntCombiner

Warning: $HADOOP_HOME is deprecated.
12/08/06 16:16:40 INFO mapred.JobClient: Running job: job_201208031459_0591
12/08/06 16:16:41 INFO mapred.JobClient: map 0% reduce 0%
12/08/06 16:16:59 INFO mapred.JobClient: map 20% reduce 0%
12/08/06 16:17:05 INFO mapred.JobClient: map 40% reduce 0%
12/08/06 16:17:08 INFO mapred.JobClient: map 100% reduce 0%
12/08/06 16:17:11 INFO mapred.JobClient: map 80% reduce 0%
12/08/06 16:17:16 INFO mapred.JobClient: Task Id : attempt_201208031459_0591_m_000000_0, Status : FAILED
java.lang.Throwable: Child Error
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271)
Caused by: java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:258)

I sometimes get the above error, which I can avoid by decreasing the number of workers (based on a previous post on the mailing list). However, when I specify fewer workers (say 2), or on runs where the error does not occur, the result is missing for one part in HDFS. For example, with workers=2 I got two parts: one of them had 5,000 of the 10k nodes and the other part was blank. This also happens with workers=4, 5, etc. There are no errors in the log.

Just to be clear, the input format is an adjacency list, i.e. if a -> b, a -> c and b -> d, then the input is

a b c
b a d
c a
d b

since the graph is undirected. Any idea what could be wrong?
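
For anyone starting from a plain edge list (one "u v" pair per line), the adjacency-list layout described above can be produced with a small preprocessing step. The following is only a sketch in Python and is not part of Giraph: the function name edges_to_adjacency is made up for illustration, and it assumes whitespace-separated integer vertex IDs and an undirected graph, so every edge is recorded in both directions.

```python
from collections import defaultdict


def edges_to_adjacency(edge_lines):
    """Turn 'u v' edge lines into undirected adjacency-list lines.

    Assumes whitespace-separated integer vertex IDs (an assumption of this
    sketch, matching the int-only example file in this thread).
    """
    neighbors = defaultdict(set)
    for line in edge_lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        u, v = int(parts[0]), int(parts[1])
        # Undirected graph: record the edge in both directions.
        neighbors[u].add(v)
        neighbors[v].add(u)
    # One output line per vertex: "vertex neighbor1 neighbor2 ..."
    return [
        " ".join(str(x) for x in [vertex] + sorted(neighbors[vertex]))
        for vertex in sorted(neighbors)
    ]


if __name__ == "__main__":
    # Edges 0-1, 0-2, 1-3 mirror the a->b, a->c, b->d example above.
    for out_line in edges_to_adjacency(["0 1", "0 2", "1 3"]):
        print(out_line)
```

Note that a vertex with no edges never appears in an edge list, so this sketch would not emit a line for it; such vertices would have to be added separately.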

Here is the log when I run with workers=1:

Finally loaded a total of *(v=10000, e=19996)*
2012-08-06 16:39:13,902 INFO org.apache.giraph.graph.BspService: process: inputSplitsAllDoneChanged (all vertices sent from input splits)
2012-08-06 16:39:13,904 INFO org.apache.giraph.comm.BasicRPCCommunications: flush: starting for superstep -1 totalMem = 191.6875M, maxMem = 191.6875M, freeMem = 164.6044M
2012-08-06 16:39:13,906 INFO org.apache.giraph.comm.BasicRPCCommunications: flush: ended for superstep -1 totalMem = 191.6875M, maxMem = 191.6875M, freeMem = 164.60431M
2012-08-06 16:39:13,906 INFO org.apache.giraph.graph.BspServiceWorker: finishSuperstep: Superstep -1 totalMem = 191.6875M, maxMem = 191.6875M, freeMem = 164.60431M
2012-08-06 16:39:13,922 INFO org.apache.giraph.graph.BspService: process: superstepFinished signaled
2012-08-06 16:39:13,924 INFO org.apache.giraph.graph.BspServiceWorker: finishSuperstep: Completed superstep -1 with global stats (vtx=0,finVtx=0,edges=0,msgCount=0)
2012-08-06 16:39:13,924 INFO org.apache.giraph.graph.GraphMapper: cleanup: Starting for WORKER_ONLY
2012-08-06 16:39:13,925 INFO org.apache.giraph.graph.BspServiceWorker: processEvent: Job state changed, checking to see if it needs to restart
2012-08-06 16:39:13,926 INFO org.apache.giraph.graph.BspService: getJobState: Job state already exists (/_hadoopBsp/job_201208031459_0621/_masterJobState)
2012-08-06 16:39:13,929 INFO org.apache.giraph.graph.BspServiceWorker: cleanup: Notifying master its okay to cleanup with /_hadoopBsp/job_201208031459_0621/_cleanedUpDir/1_worker
2012-08-06 16:39:13,930 INFO org.apache.zookeeper.ZooKeeper: Session: 0x138fe1c4699003d closed
2012-08-06 16:39:13,930 INFO org.apache.giraph.comm.BasicRPCCommunications: close: shutting down RPC server
2012-08-06 16:39:13,930 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
2012-08-06 16:39:13,930 INFO org.apache.hadoop.ipc.Server: Stopping server on 30003
2012-08-06 16:39:13,930 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 30003: exiting
2012-08-06 16:39:13,930 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 30003
2012-08-06 16:39:13,930 INFO org.apache.hadoop.ipc.metrics.RpcInstrumentation: shut down
2012-08-06 16:39:13,931 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
2012-08-06 16:39:13,931 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 30003: exiting
2012-08-06 16:39:13,931 INFO org.apache.giraph.zk.ZooKeeperManager: createZooKeeperClosedStamp: Creating my filestamp _bsp/_defaultZkManagerDir/job_201208031459_0621/_task/1.COMPUTATION_DONE
2012-08-06 16:39:13,934 INFO org.apache.hadoop.mapred.Task: Task:attempt_201208031459_0621_m_000001_0 is done. And is in the process of commiting
2012-08-06 16:39:15,026 INFO org.apache.hadoop.mapred.Task: Task attempt_201208031459_0621_m_000001_0 is allowed to commit now
2012-08-06 16:39:15,036 INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: Saved output of task 'attempt_201208031459_0621_m_000001_0' to hdfs:/user/vpatel/giraph_out/one
2012-08-06 16:39:16,068 INFO org.apache.hadoop.mapred.Task: Task 'attempt_201208031459_0621_m_000001_0' done.
2012-08-06 16:39:16,087 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
2012-08-06 16:39:16,117 INFO org.apache.hadoop.io.nativeio.NativeIO: Initialized cache for UID to User mapping with a cache timeout of 14400 seconds.
2012-08-06 16:39:16,118 INFO org.apache.hadoop.io.nativeio.NativeIO: Got UserName vpatel for UID 10020 from the native implementation

On Mon, Aug 6, 2012 at 3:05 PM, Sebastian Schelter <s...@apache.org> wrote:

> The job expects the input data in adjacency list format, each line
> should look like:
>
> vertex neighbor1 neighbor2 ....
>
> --sebastian
>
>
> On 07.08.2012 00:02, Vishal Patel wrote:
> > Thanks Sebastian, it runs fine now.
> > However, the output comes back as
> >
> > 0 0
> > 1 1
> > 2 2
> > 3 3
> > 4 4
> > 5 5
> > 6 6
> > ..
> >
> > I have an unsorted edge file with just int values.
> > http://www.ics.uci.edu/~vishalrp/public/testg.txt
> >
> > My test graph (head below) has 10,000 nodes (from 0 to 9999) and 9998
> > edges. There are 4 connected components in the graph.
> >
> > 0 5800
> > 0 5981
> > 1 1239
> > 1 2989
> > 1 3961
> > 2 5417
> > 2 7350
> >
> > What am I doing wrong? Also, in general does the graph have to have int
> > values for nodes? Or can I have strings?
> >
> > Appreciate your help!
> >
> > Vishal
> >
> >
> > On Mon, Aug 6, 2012 at 2:22 PM, Sebastian Schelter <s...@apache.org> wrote:
> >
> >> You cannot run the vertex class directly. Instead you can use
> >> GiraphRunner, e.g.
> >>
> >> hadoop jar giraph-jar-with-dependencies.jar
> >> org.apache.giraph.GiraphRunner
> >> org.apache.giraph.examples.ConnectedComponentsVertex --inputFormat
> >> org.apache.giraph.examples.IntIntNullIntTextInputFormat --inputPath
> >> hdfs:///path/to/input --outputFormat
> >> org.apache.giraph.examples.VertexWithComponentTextOutputFormat
> >> --outputPath hdfs:///path/to/output --workers numWorkers --combiner
> >> org.apache.giraph.examples.MinimumIntCombiner
> >>
> >> --sebastian
> >>
> >>
> >> 2012/8/6 Vishal Patel <write2vis...@gmail.com>:
> >>> Hi, I am trying to run the connected-components example.
> >>> I have giraph installed, and all the tests pass on a 3 node cluster
> >>> running hadoop-1.0.3.
> >>>
> >>> The main method is missing in the ConnectedComponentsVertex class
> >>>
> >>> cd target/classes
> >>> hadoop jar ../giraph-0.1-jar-with-dependencies.jar
> >>> org.apache.giraph.examples.ConnectedComponentsVertex
> >>>
> >>> Exception in thread "main" java.lang.NoSuchMethodException: org.apache.giraph.examples.ConnectedComponentsVertex.main([Ljava.lang.String;)
> >>> at java.lang.Class.getMethod(Class.java:1622)
> >>> at org.apache.hadoop.util.RunJar.main(RunJar.java:150)
> >>>
> >>> Can someone please help me with running this example?
> >>>
> >>> Vishal