Can you check what the mappers were doing via the web interface of Hadoop? Can you run 4 mappers at once?
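(For context: on Hadoop 1.x the number of map tasks a TaskTracker can run concurrently is capped per node in mapred-site.xml. If the cluster has fewer free map slots than the Giraph job needs — the workers, plus possibly an extra task for the master depending on configuration — some tasks cannot be scheduled at once. The snippet below is only an illustrative sketch with a placeholder value, not a recommendation:)

```xml
<!-- mapred-site.xml (Hadoop 1.x): maximum number of map tasks that may
     run simultaneously on each TaskTracker. The value 4 is illustrative. -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>4</value>
</property>
```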
On 07.08.2012 01:46, Vishal Patel wrote:
> I'm seeing a strange behavior that I can't explain.
>
> hadoop jar giraph-0.1-jar-with-dependencies.jar
> org.apache.giraph.GiraphRunner
> org.apache.giraph.examples.ConnectedComponentsVertex --inputFormat
> org.apache.giraph.examples.IntIntNullIntTextInputFormat --inputPath
> /user/vpatel/graph_in/elist.txt --outputFormat
> org.apache.giraph.examples.VertexWithComponentTextOutputFormat --outputPath
> hdfs:///user/vpatel/giraph_out/1 --workers 4 --combiner
> org.apache.giraph.examples.MinimumIntCombiner
>
> Warning: $HADOOP_HOME is deprecated.
>
> 12/08/06 16:16:40 INFO mapred.JobClient: Running job: job_201208031459_0591
> 12/08/06 16:16:41 INFO mapred.JobClient:  map 0% reduce 0%
> 12/08/06 16:16:59 INFO mapred.JobClient:  map 20% reduce 0%
> 12/08/06 16:17:05 INFO mapred.JobClient:  map 40% reduce 0%
> 12/08/06 16:17:08 INFO mapred.JobClient:  map 100% reduce 0%
> 12/08/06 16:17:11 INFO mapred.JobClient:  map 80% reduce 0%
> 12/08/06 16:17:16 INFO mapred.JobClient: Task Id :
> attempt_201208031459_0591_m_000000_0, Status : FAILED
> java.lang.Throwable: Child Error
>         at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271)
> Caused by: java.io.IOException: Task process exit with nonzero status of 1.
>         at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:258)
>
> I either get the above error, which I can avoid if I decrease my number of
> workers (based on a previous post on the mailing list).
>
> However, when I specify fewer workers (say 2), I sometimes don't get the
> above error, but then the result is missing for one part in HDFS.
> I.e., when I ran with workers=2, I got two parts. One of them had 5,000 out
> of the 10k nodes and the other part was blank. This happens with workers=4, 5
> etc. as well.
>
> There are no errors in the log.
>
> Just to be clear, the input format is an adjacency list,
> i.e. if a -> b, a -> c and b -> d, then:
>
> a b c
> b a d
> c a
> d b
>
> since the graph is undirected.
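(Editor's note: a raw edge list like the elist.txt above can be turned into this undirected adjacency-list format with a short helper script. The sketch below is hypothetical — the function name and sample data are placeholders — and assumes one whitespace-separated edge per input line:)

```python
# edgelist_to_adjlist.py -- turn "a b" edge lines into undirected
# adjacency-list lines "a b c ..." (one line per vertex).
from collections import defaultdict

def edge_list_to_adjacency(lines):
    adj = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        a, b = parts
        adj[a].append(b)   # store the edge in both directions,
        adj[b].append(a)   # since the graph is undirected
    # one output line per vertex: the vertex followed by all its neighbors
    # (ids are kept as strings here, so ordering is lexicographic)
    return ["%s %s" % (v, " ".join(ns)) for v, ns in sorted(adj.items())]

if __name__ == "__main__":
    # reproduces the a/b/c/d example from the mail above
    for out_line in edge_list_to_adjacency(["a b", "a c", "b d"]):
        print(out_line)
```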
> Any idea what could be wrong?
>
> Here is the log when I do workers=1:
>
> Finally loaded a total of (v=10000, e=19996)
> 2012-08-06 16:39:13,902 INFO org.apache.giraph.graph.BspService:
> process: inputSplitsAllDoneChanged (all vertices sent from input splits)
> 2012-08-06 16:39:13,904 INFO org.apache.giraph.comm.BasicRPCCommunications:
> flush: starting for superstep -1 totalMem = 191.6875M, maxMem = 191.6875M,
> freeMem = 164.6044M
> 2012-08-06 16:39:13,906 INFO org.apache.giraph.comm.BasicRPCCommunications:
> flush: ended for superstep -1 totalMem = 191.6875M, maxMem = 191.6875M,
> freeMem = 164.60431M
> 2012-08-06 16:39:13,906 INFO org.apache.giraph.graph.BspServiceWorker:
> finishSuperstep: Superstep -1 totalMem = 191.6875M, maxMem = 191.6875M,
> freeMem = 164.60431M
> 2012-08-06 16:39:13,922 INFO org.apache.giraph.graph.BspService:
> process: superstepFinished signaled
> 2012-08-06 16:39:13,924 INFO org.apache.giraph.graph.BspServiceWorker:
> finishSuperstep: Completed superstep -1 with global stats
> (vtx=0,finVtx=0,edges=0,msgCount=0)
> 2012-08-06 16:39:13,924 INFO org.apache.giraph.graph.GraphMapper:
> cleanup: Starting for WORKER_ONLY
> 2012-08-06 16:39:13,925 INFO org.apache.giraph.graph.BspServiceWorker:
> processEvent: Job state changed, checking to see if it needs to restart
> 2012-08-06 16:39:13,926 INFO org.apache.giraph.graph.BspService:
> getJobState: Job state already exists
> (/_hadoopBsp/job_201208031459_0621/_masterJobState)
> 2012-08-06 16:39:13,929 INFO org.apache.giraph.graph.BspServiceWorker:
> cleanup: Notifying master its okay to cleanup with
> /_hadoopBsp/job_201208031459_0621/_cleanedUpDir/1_worker
> 2012-08-06 16:39:13,930 INFO org.apache.zookeeper.ZooKeeper: Session:
> 0x138fe1c4699003d closed
> 2012-08-06 16:39:13,930 INFO org.apache.giraph.comm.BasicRPCCommunications:
> close: shutting down RPC server
> 2012-08-06 16:39:13,930 INFO org.apache.zookeeper.ClientCnxn:
> EventThread shut down
> 2012-08-06 16:39:13,930 INFO org.apache.hadoop.ipc.Server: Stopping
> server on 30003
> 2012-08-06 16:39:13,930 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 0 on 30003: exiting
> 2012-08-06 16:39:13,930 INFO org.apache.hadoop.ipc.Server: Stopping
> IPC Server listener on 30003
> 2012-08-06 16:39:13,930 INFO org.apache.hadoop.ipc.metrics.RpcInstrumentation:
> shut down
> 2012-08-06 16:39:13,931 INFO org.apache.hadoop.ipc.Server: Stopping
> IPC Server Responder
> 2012-08-06 16:39:13,931 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 1 on 30003: exiting
> 2012-08-06 16:39:13,931 INFO org.apache.giraph.zk.ZooKeeperManager:
> createZooKeeperClosedStamp: Creating my filestamp
> _bsp/_defaultZkManagerDir/job_201208031459_0621/_task/1.COMPUTATION_DONE
> 2012-08-06 16:39:13,934 INFO org.apache.hadoop.mapred.Task:
> Task:attempt_201208031459_0621_m_000001_0 is done. And is in the
> process of commiting
> 2012-08-06 16:39:15,026 INFO org.apache.hadoop.mapred.Task: Task
> attempt_201208031459_0621_m_000001_0 is allowed to commit now
> 2012-08-06 16:39:15,036 INFO
> org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: Saved
> output of task 'attempt_201208031459_0621_m_000001_0' to
> hdfs:/user/vpatel/giraph_out/one
> 2012-08-06 16:39:16,068 INFO org.apache.hadoop.mapred.Task: Task
> 'attempt_201208031459_0621_m_000001_0' done.
> 2012-08-06 16:39:16,087 INFO org.apache.hadoop.mapred.TaskLogsTruncater:
> Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
> 2012-08-06 16:39:16,117 INFO org.apache.hadoop.io.nativeio.NativeIO:
> Initialized cache for UID to User mapping with a cache timeout of
> 14400 seconds.
> 2012-08-06 16:39:16,118 INFO org.apache.hadoop.io.nativeio.NativeIO:
> Got UserName vpatel for UID 10020 from the native implementation
>
> On Mon, Aug 6, 2012 at 3:05 PM, Sebastian Schelter <s...@apache.org> wrote:
>
>> The job expects the input data in adjacency list format, each line
>> should look like:
>>
>> vertex neighbor1 neighbor2 ...
>>
>> --sebastian
>>
>> On 07.08.2012 00:02, Vishal Patel wrote:
>>> Thanks Sebastian, it runs fine now. However, the output comes back as
>>>
>>> 0 0
>>> 1 1
>>> 2 2
>>> 3 3
>>> 4 4
>>> 5 5
>>> 6 6
>>> ..
>>>
>>> I have an unsorted edge file with just int values.
>>> http://www.ics.uci.edu/~vishalrp/public/testg.txt
>>>
>>> My test graph (head below) has 10,000 nodes (from 0 to 9999) and 9998
>>> edges. There are 4 connected components in the graph.
>>>
>>> 0 5800
>>> 0 5981
>>> 1 1239
>>> 1 2989
>>> 1 3961
>>> 2 5417
>>> 2 7350
>>>
>>> What am I doing wrong? Also, in general, does the graph have to have int
>>> values for nodes? Or can I have strings?
>>>
>>> Appreciate your help!
>>>
>>> Vishal
>>>
>>> On Mon, Aug 6, 2012 at 2:22 PM, Sebastian Schelter <s...@apache.org> wrote:
>>>
>>>> You cannot run the vertex class directly. Instead you can use
>>>> GiraphRunner, e.g.
>>>>
>>>> hadoop jar giraph-jar-with-dependencies.jar
>>>> org.apache.giraph.GiraphRunner
>>>> org.apache.giraph.examples.ConnectedComponentsVertex --inputFormat
>>>> org.apache.giraph.examples.IntIntNullIntTextInputFormat --inputPath
>>>> hdfs:///path/to/input --outputFormat
>>>> org.apache.giraph.examples.VertexWithComponentTextOutputFormat
>>>> --outputPath hdfs:///path/to/output --workers numWorkers --combiner
>>>> org.apache.giraph.examples.MinimumIntCombiner
>>>>
>>>> --sebastian
>>>>
>>>> 2012/8/6 Vishal Patel <write2vis...@gmail.com>:
>>>>> Hi, I am trying to run the connected-components example.
>>>>> I have giraph installed; all the tests pass on a 3-node cluster
>>>>> running hadoop-1.0.3.
>>>>>
>>>>> The main method is missing in the ConnectedComponentsVertex class:
>>>>>
>>>>> cd target/classes
>>>>> hadoop jar ../giraph-0.1-jar-with-dependencies.jar
>>>>> org.apache.giraph.examples.ConnectedComponentsVertex
>>>>>
>>>>> Exception in thread "main" java.lang.NoSuchMethodException:
>>>>> org.apache.giraph.examples.ConnectedComponentsVertex.main([Ljava.lang.String;)
>>>>>         at java.lang.Class.getMethod(Class.java:1622)
>>>>>         at org.apache.hadoop.util.RunJar.main(RunJar.java:150)
>>>>>
>>>>> Can someone please help me with running this example?
>>>>>
>>>>> Vishal
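(Editor's note: as a sanity check independent of Giraph, the expected connected-component labels for an integer edge list can be computed with a small union-find script. The sketch below is a hypothetical helper, assuming the same whitespace-separated edge format as the test graph quoted earlier; each vertex ends up labeled with the smallest vertex id in its component, which is what the example with MinimumIntCombiner should also produce. Counting the distinct labels then gives the number of components:)

```python
# components.py -- label each vertex with the minimum vertex id in its
# connected component, using union-find with path compression.
def min_component_labels(edges, num_vertices):
    parent = list(range(num_vertices))  # each vertex starts as its own root

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            # always attach the larger root under the smaller one, so the
            # root of every tree stays the minimum id of its component
            parent[max(ra, rb)] = min(ra, rb)

    return {v: find(v) for v in range(num_vertices)}

if __name__ == "__main__":
    # tiny illustrative graph: components {0, 5}, {1, 2, 3}, {4}
    labels = min_component_labels([(0, 5), (1, 2), (2, 3)], 6)
    print(len(set(labels.values())))  # number of connected components
```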