Hi, I am new to Giraph. I have recently been trying to write a very simple PageRank program using Giraph, shown below:

    package org.apache.giraph.examples;

    import org.apache.giraph.graph.BasicComputation;
    import org.apache.giraph.conf.LongConfOption;
    import org.apache.giraph.edge.Edge;
    import org.apache.giraph.graph.Vertex;
    import org.apache.hadoop.io.DoubleWritable;
    import org.apache.hadoop.io.FloatWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.log4j.Logger;

    import java.io.IOException;

    /**
     * My simplified Google page rank example.
     */
    @Algorithm(
        name = "Page Rank",
        description = "My simplified page rank"
    )
    public class MyPageRankComputation extends BasicComputation<
        LongWritable, DoubleWritable, FloatWritable, DoubleWritable> {
      public static final int MAX_SUPERSTEPS = 2;

      @Override
      public void compute(Vertex<LongWritable, DoubleWritable, FloatWritable> vertex,
          Iterable<DoubleWritable> messages) throws IOException {
        // From superstep 1 on, the new value is the sum of the incoming messages.
        if (getSuperstep() >= 1) {
          double sum = 0;
          for (DoubleWritable message : messages) {
            sum += message.get();
          }
          vertex.setValue(new DoubleWritable(sum));
        }

        // Spread the current value evenly over the out-edges, or halt at the end.
        if (getSuperstep() < MAX_SUPERSTEPS) {
          int numEdges = vertex.getNumEdges();
          DoubleWritable message =
              new DoubleWritable(vertex.getValue().get() / numEdges);
          sendMessageToAllEdges(vertex, message);
        } else {
          vertex.voteToHalt();
        }
      }
    }

I did not use an Aggregator, to keep the program simple. I put the program under the path of the Giraph examples:

    /home/hduser/my-giraph/giraph-examples/src/main/java/org/apache/giraph/examples

where I extracted just the giraph-examples folder from the Giraph repo and put it into another folder called my-giraph. The program compiles fine.
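
To make the expected result concrete, here is a minimal plain-Java sketch (no Giraph or Hadoop involved) that applies the same update rule superstep by superstep. The class name `PageRankSimulation`, the helper `sampleGraph()`, and the hard-coded edge lists are illustrative assumptions: the edges are my reading of the five-vertex test graph used as input for this job, with every vertex starting at 0.2. It is only meant as a reference for what the job should compute, not a description of how Giraph executes it.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-alone simulation of the compute() logic above.
class PageRankSimulation {
  static final int MAX_SUPERSTEPS = 2;

  // Assumed out-edges of the five-vertex test graph (edge weights are unused).
  static Map<Long, long[]> sampleGraph() {
    Map<Long, long[]> edges = new LinkedHashMap<>();
    edges.put(1L, new long[] {2, 4});
    edges.put(2L, new long[] {3, 5});
    edges.put(3L, new long[] {4});
    edges.put(4L, new long[] {5});
    edges.put(5L, new long[] {1, 2, 3});
    return edges;
  }

  // Runs supersteps 0..MAX_SUPERSTEPS and returns the final vertex values.
  static Map<Long, Double> run(Map<Long, long[]> edges, double initialValue) {
    Map<Long, Double> values = new LinkedHashMap<>();
    for (Long id : edges.keySet()) {
      values.put(id, initialValue);
    }

    // Messages delivered at the start of the current superstep, summed per target.
    Map<Long, Double> inbox = new HashMap<>();
    for (int superstep = 0; superstep <= MAX_SUPERSTEPS; superstep++) {
      final Map<Long, Double> received = inbox;
      if (superstep >= 1) {
        // New value = sum of incoming messages (0 if nothing arrived).
        values.replaceAll((id, old) -> received.getOrDefault(id, 0.0));
      }

      Map<Long, Double> outbox = new HashMap<>();
      if (superstep < MAX_SUPERSTEPS) {
        // Each vertex sends value / numEdges along every out-edge.
        for (Map.Entry<Long, long[]> entry : edges.entrySet()) {
          long[] targets = entry.getValue();
          double share = values.get(entry.getKey()) / targets.length;
          for (long target : targets) {
            outbox.merge(target, share, Double::sum);
          }
        }
      }
      // In the last superstep nothing is sent, which corresponds to voteToHalt().
      inbox = outbox;
    }
    return values;
  }

  public static void main(String[] args) {
    run(sampleGraph(), 0.2)
        .forEach((id, value) -> System.out.println(id + "\t" + value));
  }
}
```

Because every vertex in this graph has at least one out-edge, no rank is lost between supersteps, so the printed values should still add up to 1. Messages addressed to a vertex that is not in the map are simply dropped in this sketch.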

I also set HADOOP_CLASSPATH and LIBJARS as follows:

    export HADOOP_CLASSPATH=/home/hduser/my-giraph/giraph-examples/target/giraph-examples-1.2.0-SNAPSHOT-for-hadoop-1.2.1-jar-with-dependencies.jar:$HADOOP_PATH
    export LIBJARS=/home/hduser/my-giraph/giraph-examples/target/giraph-examples-1.2.0-SNAPSHOT-for-hadoop-1.2.1-jar-with-dependencies.jar:/usr/local/giraph/giraph-core.jar

To run the program, I use a command line modeled on the "Running a Giraph Job" section of the Giraph Quick Start Guide, http://giraph.apache.org/quick_start.html

    $HADOOP_HOME/bin/hadoop jar $GIRAPH_HOME/giraph-examples/target/giraph-examples-1.2.0-SNAPSHOT-for-hadoop-1.2.1-jar-with-dependencies.jar \
        org.apache.giraph.GiraphRunner org.apache.giraph.examples.MyPageRankComputation \
        -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
        -vip /user/hduser/page_rank/input/tiny_input.txt \
        -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
        -op /user/hduser/page_rank/output \
        -w 1

The input is very similar to the one used for SSSP:

    [1,0.2,[[2,0],[4,0]]]
    [2,0.2,[3,0],[5,0]]
    [3,0.2,[4,0]]
    [4,0.2,[5,0]]
    [5,0.2,[1,0],[2,0],[3,0]]

So far so good!

---------------

Now the problem: when I run the job, it hangs at the reduce phase, as shown below:

////////////////////////////////////////////////

    hduser@cwang ~/my-giraph/giraph-examples/target $ $HADOOP_HOME/bin/hadoop jar $GIRAPH_HOME/giraph-examples/target/giraph-examples-1.2.0-SNAPSHOT-for-hadoop-1.2.1-jar-with-dependencies.jar org.apache.giraph.GiraphRunner org.apache.giraph.examples.MyPageRankComputation -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat -vip /user/hduser/page_rank/input/tiny_input.txt -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op /user/hduser/page_rank/output -w 1
    15/04/29 16:14:59 INFO utils.ConfigurationUtils: No edge input format specified. Ensure your InputFormat does not require one.
    15/04/29 16:14:59 INFO utils.ConfigurationUtils: No edge output format specified. Ensure your OutputFormat does not require one.
    15/04/29 16:15:00 INFO job.GiraphJob: run: Since checkpointing is disabled (default), do not allow any task retries (setting mapred.map.max.attempts = 1, old value = 4)
    15/04/29 16:15:02 INFO job.GiraphJob: Tracking URL: http://hdnode01:50030/jobdetails.jsp?jobid=job_201504291528_0005
    15/04/29 16:15:02 INFO job.GiraphJob: Waiting for resources... Job will start only when it gets all 2 mappers
    15/04/29 16:15:39 INFO job.HaltApplicationUtils$DefaultHaltInstructionsWriter: writeHaltInstructions: To halt after next superstep execute: 'bin/halt-application --zkServer cwang:22181 --zkNode /_hadoopBsp/job_201504291528_0005/_haltComputation'
    15/04/29 16:15:39 INFO mapred.JobClient: Running job: job_201504291528_0005
    15/04/29 16:15:40 INFO mapred.JobClient: map 100% reduce 0%
    15/04/29 16:20:28 INFO mapred.JobClient: Job complete: job_201504291528_0005
    15/04/29 16:20:28 INFO mapred.JobClient: Counters: 5
    15/04/29 16:20:28 INFO mapred.JobClient: Job Counters
    15/04/29 16:20:28 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=625803
    15/04/29 16:20:28 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
    15/04/29 16:20:28 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
    15/04/29 16:20:28 INFO mapred.JobClient: Launched map tasks=2
    15/04/29 16:20:28 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0

//////////////////////////////////////////////////

The job completes, but the expected output is not generated. Can someone tell me where the problem is?

Thanks,
Cheng
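
For reference, one thing that can be checked independently of the cluster is the shape of the input lines. As I understand the Quick Start guide, each line for this input format is expected to look like `[source_id,source_value,[[dest_id,edge_value],...]]`, that is, the edge list is itself wrapped in one outer pair of brackets. The sketch below is a rough, regex-based check of that shape using only the JDK; the class name `VertexLineCheck` and method `looksValid` are hypothetical, it is not how Giraph parses the file, and it only accepts plain decimal numbers. Whether a mismatch here is related to the behaviour above is not something this check can establish.

```java
import java.util.regex.Pattern;

// Hypothetical helper: rough shape check for lines of the form
// [id,value,[[target,weight],...]]. Not Giraph's own parser.
class VertexLineCheck {
  // Plain decimal number such as 5, -3 or 0.2 (no exponent notation).
  private static final String NUM = "-?\\d+(?:\\.\\d+)?";
  // One edge: [target,weight]
  private static final String EDGE =
      "\\[\\s*" + NUM + "\\s*,\\s*" + NUM + "\\s*\\]";
  // Whole line: [id,value,[edge,edge,...]] with a possibly empty edge list.
  private static final Pattern LINE = Pattern.compile(
      "\\s*\\[\\s*" + NUM + "\\s*,\\s*" + NUM + "\\s*,\\s*"
          + "\\[\\s*(?:" + EDGE + "(?:\\s*,\\s*" + EDGE + ")*)?\\s*\\]"
          + "\\s*\\]\\s*");

  static boolean looksValid(String line) {
    return LINE.matcher(line).matches();
  }

  public static void main(String[] args) {
    String[] lines = {
      "[1,0.2,[[2,0],[4,0]]]",
      "[2,0.2,[3,0],[5,0]]",
      "[3,0.2,[4,0]]",
      "[4,0.2,[5,0]]",
      "[5,0.2,[1,0],[2,0],[3,0]]"
    };
    for (String line : lines) {
      System.out.println(line + " -> " + (looksValid(line) ? "ok" : "unexpected shape"));
    }
  }
}
```

Running `main` on the five lines from the post reports which of them have the bracketed edge list; a line such as `[3,0.2,[[4,0]]]` passes the check, while `[3,0.2,[4,0]]` does not.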