Hi Young,

First, thank you for your help; it's much appreciated!

I did the sanity check and everything seems fine: I see the correct results. Yes, I hadn't noticed that before. That is strange; I don't know how it happened, since the quick start guide (https://giraph.apache.org/quick_start.html#qs_section_2) says Hadoop 0.20.203 was the assumed default. I have both Giraph 1.1.0 and Giraph 1.0.0, and my Giraph 1.0.0 is compiled for Hadoop 0.20.203.

I edited the code as you said for Giraph 1.1.0 but still received the same error as before, so I thought it might be due to the Hadoop version it was compiled for. I therefore decided to try modifying the code in Giraph 1.0.0 instead. However, I do not have the correct input format class, and the vertex object is not instantiated in the ConnectedComponents class of Giraph 1.0.0. Could you send me the full classes for both ConnectedComponents and the InputFormat, so that I know everything should be correct code-wise?

I will be trying to implement the InputFormat class and ConnectedComponents in the meantime, and if I get it working before you respond I'll update this post.

Thanks,
Ghufran

On Sun, Mar 30, 2014 at 5:41 PM, Young Han <young....@uwaterloo.ca> wrote:

> Hey,
>
> As a sanity check, is the graph really loaded on HDFS? Do you see the correct results if you do "hadoop dfs -cat /user/ghufran/in/my_graph.txt"? (Where hadoop is your hadoop binary).
>
> Also, I noticed that your Giraph has been compiled for Hadoop 1.x, while the logs show Hadoop 0.20.203.0. Maybe that could be the cause too?
>
> Finally, this may be completely irrelevant, but I had issues running connected components on Giraph 1.0.0 and I fixed it by changing the algorithm and the input format. The input format you're using on 1.1.0 looks correct to me.
> The algorithm change I did was to the first "if" block in ConnectedComponentsComputation:
>
>   if (getSuperstep() == 0) {
>     currentComponent = vertex.getId().get();
>     vertex.setValue(new IntWritable(currentComponent));
>     sendMessageToAllEdges(vertex, vertex.getValue());
>     vertex.voteToHalt();
>     return;
>   }
>
> I forget what error this change solved, so it may not help in your case.
>
> Young
>
> On Sun, Mar 30, 2014 at 6:13 AM, ghufran malik <ghufran1ma...@gmail.com> wrote:
>
>> Hello,
>>
>> I am a final year BSc Computer Science student who is using Apache Giraph for my final year project and dissertation, and I would appreciate it very much if someone could help me with the following issue.
>>
>> I am using Apache Giraph 1.1.0 Snapshot with Hadoop 0.20.203.0 and am having trouble running the ConnectedComponents example. I use the following command:
>>
>>   hadoop jar /home/ghufran/Downloads/Giraph2/giraph/giraph-examples/target/giraph-examples-1.1.0-SNAPSHOT-for-hadoop-1.2.1-jar-with-dependencies.jar \
>>     org.apache.giraph.GiraphRunner \
>>     org.apache.giraph.examples.ConnectedComponentsComputation \
>>     -vif org.apache.giraph.io.formats.IntIntNullTextVertexInputFormat \
>>     -vip /user/ghufran/in/my_graph.txt \
>>     -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
>>     -op /user/ghufran/outCC \
>>     -w 1
>>
>> I believe it gets stuck in the InputSuperstep, as the following is displayed in the terminal while the command is running:
>>
>> 14/03/30 10:48:46 INFO mapred.JobClient: map 100% reduce 0%
>> 14/03/30 10:48:50 INFO job.JobProgressTracker: Data from 1 workers - Loading data: 0 vertices loaded, 0 vertex input splits loaded; 0 edges loaded, 0 edge input splits loaded; min free memory on worker 1 - 109.01MB, average 109.01MB
>> 14/03/30 10:48:55 INFO job.JobProgressTracker: Data from 1 workers - Loading data: 0 vertices loaded, 0 vertex input splits loaded; 0 edges loaded, 0 edge input splits loaded; min free memory on worker 1 - 109.01MB, average 109.01MB
>> 14/03/30 10:49:00 INFO job.JobProgressTracker: Data from 1 workers - Loading data: 0 vertices loaded, 0 vertex input splits loaded; 0 edges loaded, 0 edge input splits loaded; min free memory on worker 1 - 108.78MB, average 108.78MB
>> ....
>>
>> which I traced back to the following if statement in the toString() method of core.org.apache.job.CombinedWorkerProgress:
>>
>>   if (isInputSuperstep()) {
>>     sb.append("Loading data: ");
>>     sb.append(verticesLoaded).append(" vertices loaded, ");
>>     sb.append(vertexInputSplitsLoaded).append(
>>         " vertex input splits loaded; ");
>>     sb.append(edgesLoaded).append(" edges loaded, ");
>>     sb.append(edgeInputSplitsLoaded).append(" edge input splits loaded");
>>
>>     sb.append("; min free memory on worker ").append(
>>         workerWithMinFreeMemory).append(" - ").append(
>>         DECIMAL_FORMAT.format(minFreeMemoryMB)).append("MB, average ").append(
>>         DECIMAL_FORMAT.format(freeMemoryMB)).append("MB");
>>
>> So it seems to me that it's not loading the input correctly. I am assuming there's something wrong with my input format class or, probably more likely, something wrong with the graph I passed in.
>>
>> I pass in a small graph in the format vertex id, vertex value, neighbours, separated by tabs. My graph is shown below:
>>
>> 1 0 2
>> 2 1 1 3 4
>> 3 2 2
>> 4 3 2
>>
>> The full output after I ran my command is shown below. If anyone could explain to me why I am not getting the expected output I would greatly appreciate it.
>>
>> Many thanks,
>>
>> Ghufran
>>
>>
>> FULL OUTPUT:
>>
>> 14/03/30 10:48:06 INFO utils.ConfigurationUtils: No edge input format specified. Ensure your InputFormat does not require one.
>> 14/03/30 10:48:06 INFO utils.ConfigurationUtils: No edge output format specified. Ensure your OutputFormat does not require one.
>> 14/03/30 10:48:06 INFO job.GiraphJob: run: Since checkpointing is disabled (default), do not allow any task retries (setting mapred.map.max.attempts = 0, old value = 4)
>> 14/03/30 10:48:07 INFO job.GiraphJob: run: Tracking URL: http://ghufran:50030/jobdetails.jsp?jobid=job_201403301044_0001
>> 14/03/30 10:48:45 INFO job.HaltApplicationUtils$DefaultHaltInstructionsWriter: writeHaltInstructions: To halt after next superstep execute: 'bin/halt-application --zkServer ghufran:22181 --zkNode /_hadoopBsp/job_201403301044_0001/_haltComputation'
>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.5-1392090, built on 09/30/2012 17:52 GMT
>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client environment:host.name=ghufran
>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client environment:java.version=1.7.0_51
>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation
>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client environment:java.home=/usr/lib/jvm/java-7-oracle/jre
>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client environment:java.class.path=/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../conf:/usr/lib/jvm/java-7-oracle/lib/tools.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/..:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../hadoop-core-0.20.203.0.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/aspectjrt-1.6.5.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/aspectjtools-1.6.5.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-beanutils-1.7.0.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-beanutils-core-1.8.0.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-cli-1.2.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-codec-1.4.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-collections-3.2.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-configuration-1.6.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-daemon-1.0.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-digester-1.8.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-el-1.0.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-httpclient-3.0.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-lang-2.4.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-logging-1.1.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-logging-api-1.0.4.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-math-2.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-net-1.4.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/core-3.1.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/hsqldb-1.8.0.10.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jackson-core-asl-1.0.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jackson-mapper-asl-1.0.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jasper-compiler-5.5.12.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jasper-runtime-5.5.12.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jets3t-0.6.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jetty-6.1.26.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jetty-util-6.1.26.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jsch-0.1.42.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/junit-4.5.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/kfs-0.2.2.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/log4j-1.2.15.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/mockito-all-1.8.5.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/oro-2.0.8.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/servlet-api-2.5-20081211.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/slf4j-api-1.4.3.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/slf4j-log4j12-1.4.3.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/xmlenc-0.52.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jsp-2.1/jsp-2.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jsp-2.1/jsp-api-2.1.jar
>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client environment:java.library.path=/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/native/Linux-amd64-64
>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux
>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client environment:os.arch=amd64
>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client environment:os.version=3.8.0-35-generic
>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client environment:user.name=ghufran
>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client environment:user.home=/home/ghufran
>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client environment:user.dir=/home/ghufran/Downloads/hadoop-0.20.203.0/bin
>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=ghufran:22181 sessionTimeout=60000 watcher=org.apache.giraph.job.JobProgressTracker@209fa588
>> 14/03/30 10:48:45 INFO mapred.JobClient: Running job: job_201403301044_0001
>> 14/03/30 10:48:45 INFO zookeeper.ClientCnxn: Opening socket connection to server ghufran/127.0.1.1:22181. Will not attempt to authenticate using SASL (unknown error)
>> 14/03/30 10:48:45 INFO zookeeper.ClientCnxn: Socket connection established to ghufran/127.0.1.1:22181, initiating session
>> 14/03/30 10:48:45 INFO zookeeper.ClientCnxn: Session establishment complete on server ghufran/127.0.1.1:22181, sessionid = 0x1451263c44c0002, negotiated timeout = 600000
>> 14/03/30 10:48:45 INFO job.JobProgressTracker: Data from 1 workers - Loading data: 0 vertices loaded, 0 vertex input splits loaded; 0 edges loaded, 0 edge input splits loaded; min free memory on worker 1 - 109.01MB, average 109.01MB
>> 14/03/30 10:48:46 INFO mapred.JobClient: map 100% reduce 0%
>> 14/03/30 10:48:50 INFO job.JobProgressTracker: Data from 1 workers - Loading data: 0 vertices loaded, 0 vertex input splits loaded; 0 edges loaded, 0 edge input splits loaded; min free memory on worker 1 - 109.01MB, average 109.01MB
>> 14/03/30 10:48:55 INFO job.JobProgressTracker: Data from 1 workers - Loading data: 0 vertices loaded, 0 vertex input splits loaded; 0 edges loaded, 0 edge input splits loaded; min free memory on worker 1 - 109.01MB, average 109.01MB
>> 14/03/30 10:49:00 INFO job.JobProgressTracker: Data from 1 workers - Loading data: 0 vertices loaded, 0 vertex input splits loaded; 0 edges loaded, 0 edge input splits loaded; min free memory on worker 1 - 108.78MB, average 108.78MB
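
For anyone following this thread: since the question is partly whether the graph file itself is well formed, a small standalone check may help separate a file problem from a Giraph or Hadoop setup problem. The sketch below is only an illustration and uses no Giraph or Hadoop classes; the class and method names (LocalComponents, parse, components) are made up here. It assumes the layout described in the original message (vertex id, vertex value, then neighbour ids, separated by tabs), treats every listed edge as undirected, and labels each vertex with the smallest vertex id in its component, which is roughly the result a min-label connected components run would be expected to give.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Standalone sketch: no Giraph or Hadoop classes are used. All names are hypothetical.
class LocalComponents {

  // Parses lines of the form "id<TAB>value<TAB>neighbour<TAB>neighbour...".
  // Returns id -> neighbour ids. The value column is validated but not used.
  static Map<Integer, List<Integer>> parse(List<String> lines) {
    Map<Integer, List<Integer>> adjacency = new TreeMap<>();
    for (String line : lines) {
      if (line.trim().isEmpty()) {
        continue; // skip blank lines
      }
      String[] tokens = line.trim().split("\t");
      if (tokens.length < 2) {
        // Typically means the line is separated by spaces rather than tabs.
        throw new IllegalArgumentException(
            "Expected tab-separated id and value: '" + line + "'");
      }
      int id = Integer.parseInt(tokens[0]);
      Integer.parseInt(tokens[1]); // check that the value column is an integer
      List<Integer> neighbours = new ArrayList<>();
      for (int i = 2; i < tokens.length; i++) {
        neighbours.add(Integer.parseInt(tokens[i]));
      }
      adjacency.put(id, neighbours);
    }
    return adjacency;
  }

  // Repeatedly pushes the smaller label across every listed edge, in both
  // directions, until no label changes. Each vertex ends up with the
  // smallest vertex id in its component.
  static Map<Integer, Integer> components(Map<Integer, List<Integer>> adjacency) {
    Map<Integer, Integer> label = new TreeMap<>();
    for (Integer id : adjacency.keySet()) {
      label.put(id, id);
    }
    boolean changed = true;
    while (changed) {
      changed = false;
      for (Map.Entry<Integer, List<Integer>> entry : adjacency.entrySet()) {
        int current = label.get(entry.getKey());
        for (Integer neighbour : entry.getValue()) {
          Integer other = label.get(neighbour);
          if (other == null) {
            continue; // neighbour has no line of its own in the file
          }
          int min = Math.min(current, other);
          if (min < other) {
            label.put(neighbour, min);
            changed = true;
          }
          if (min < current) {
            current = min;
            changed = true;
          }
        }
        label.put(entry.getKey(), current);
      }
    }
    return label;
  }

  public static void main(String[] args) {
    // The four-vertex graph from the original message, written with explicit tabs.
    List<String> lines = Arrays.asList("1\t0\t2", "2\t1\t1\t3\t4", "3\t2\t2", "4\t3\t2");
    System.out.println(components(parse(lines)));
  }
}
```

For the four-vertex graph in the original message all vertices are connected, so every vertex should end up with label 1. A line that uses spaces instead of tabs makes parse throw, which is one quick way to spot a separator problem in a local copy of the file.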