Hello, I am a final-year BSc Computer Science student using Apache Giraph for my final-year project and dissertation, and I would very much appreciate it if someone could help me with the following issue.
I am using Apache Giraph 1.1.0-SNAPSHOT with Hadoop 0.20.203.0 and am having trouble running the ConnectedComponents example. I use the following command:

hadoop jar /home/ghufran/Downloads/Giraph2/giraph/giraph-examples/target/giraph-examples-1.1.0-SNAPSHOT-for-hadoop-1.2.1-jar-with-dependencies.jar org.apache.giraph.GiraphRunner org.apache.giraph.examples.ConnectedComponentsComputation -vif org.apache.giraph.io.formats.IntIntNullTextVertexInputFormat -vip /user/ghufran/in/my_graph.txt -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op /user/ghufran/outCC -w 1

I believe it gets stuck in the input superstep, as the following is displayed in the terminal while the command is running:

14/03/30 10:48:46 INFO mapred.JobClient: map 100% reduce 0%
14/03/30 10:48:50 INFO job.JobProgressTracker: Data from 1 workers - Loading data: 0 vertices loaded, 0 vertex input splits loaded; 0 edges loaded, 0 edge input splits loaded; min free memory on worker 1 - 109.01MB, average 109.01MB
14/03/30 10:48:55 INFO job.JobProgressTracker: Data from 1 workers - Loading data: 0 vertices loaded, 0 vertex input splits loaded; 0 edges loaded, 0 edge input splits loaded; min free memory on worker 1 - 109.01MB, average 109.01MB
14/03/30 10:49:00 INFO job.JobProgressTracker: Data from 1 workers - Loading data: 0 vertices loaded, 0 vertex input splits loaded; 0 edges loaded, 0 edge input splits loaded; min free memory on worker 1 - 108.78MB, average 108.78MB
....
which I traced back to the following if statement in the toString() method of org.apache.giraph.job.CombinedWorkerProgress (in giraph-core):

    if (isInputSuperstep()) {
      sb.append("Loading data: ");
      sb.append(verticesLoaded).append(" vertices loaded, ");
      sb.append(vertexInputSplitsLoaded).append(
          " vertex input splits loaded; ");
      sb.append(edgesLoaded).append(" edges loaded, ");
      sb.append(edgeInputSplitsLoaded).append(" edge input splits loaded");
      sb.append("; min free memory on worker ").append(
          workerWithMinFreeMemory).append(" - ").append(
          DECIMAL_FORMAT.format(minFreeMemoryMB)).append("MB, average ").append(
          DECIMAL_FORMAT.format(freeMemoryMB)).append("MB");

So it seems to me that the input is not being loaded correctly. I am assuming there is something wrong with my input format class or, probably more likely, with the graph I passed in.

I pass in a small graph in which each line has the format vertex id, vertex value, neighbours, separated by tabs. My graph is shown below:

1 0 2
2 1 1 3 4
3 2 2
4 3 2

The full output from running my command is shown below. If anyone could explain to me why I am not getting the expected output I would greatly appreciate it.

Many thanks,
Ghufran

FULL OUTPUT:

14/03/30 10:48:06 INFO utils.ConfigurationUtils: No edge input format specified. Ensure your InputFormat does not require one.
14/03/30 10:48:06 INFO utils.ConfigurationUtils: No edge output format specified. Ensure your OutputFormat does not require one.
14/03/30 10:48:06 INFO job.GiraphJob: run: Since checkpointing is disabled (default), do not allow any task retries (setting mapred.map.max.attempts = 0, old value = 4)
14/03/30 10:48:07 INFO job.GiraphJob: run: Tracking URL: http://ghufran:50030/jobdetails.jsp?jobid=job_201403301044_0001
14/03/30 10:48:45 INFO job.HaltApplicationUtils$DefaultHaltInstructionsWriter: writeHaltInstructions: To halt after next superstep execute: 'bin/halt-application --zkServer ghufran:22181 --zkNode /_hadoopBsp/job_201403301044_0001/_haltComputation'
14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.5-1392090, built on 09/30/2012 17:52 GMT
14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client environment:host.name=ghufran
14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client environment:java.version=1.7.0_51
14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation
14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client environment:java.home=/usr/lib/jvm/java-7-oracle/jre
14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client environment:java.class.path=/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../conf:/usr/lib/jvm/java-7-oracle/lib/tools.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/..:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../hadoop-core-0.20.203.0.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/aspectjrt-1.6.5.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/aspectjtools-1.6.5.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-beanutils-1.7.0.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-beanutils-core-1.8.0.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-cli-1.2.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-codec-1.4.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-collections-3.2.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-configuration-1.6.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-daemon-1.0.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-digester-1.8.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-el-1.0.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-httpclient-3.0.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-lang-2.4.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-logging-1.1.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-logging-api-1.0.4.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-math-2.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-net-1.4.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/core-3.1.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/hsqldb-1.8.0.10.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jackson-core-asl-1.0.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jackson-mapper-asl-1.0.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jasper-compiler-5.5.12.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jasper-runtime-5.5.12.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jets3t-0.6.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jetty-6.1.26.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jetty-util-6.1.26.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jsch-0.1.42.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/junit-4.5.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/kfs-0.2.2.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/log4j-1.2.15.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/mockito-all-1.8.5.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/oro-2.0.8.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/servlet-api-2.5-20081211.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/slf4j-api-1.4.3.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/slf4j-log4j12-1.4.3.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/xmlenc-0.52.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jsp-2.1/jsp-2.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jsp-2.1/jsp-api-2.1.jar
14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client environment:java.library.path=/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/native/Linux-amd64-64
14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux
14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client environment:os.arch=amd64
14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client environment:os.version=3.8.0-35-generic
14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client environment:user.name=ghufran
14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client environment:user.home=/home/ghufran
14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client environment:user.dir=/home/ghufran/Downloads/hadoop-0.20.203.0/bin
14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=ghufran:22181 sessionTimeout=60000 watcher=org.apache.giraph.job.JobProgressTracker@209fa588
14/03/30 10:48:45 INFO mapred.JobClient: Running job: job_201403301044_0001
14/03/30 10:48:45 INFO zookeeper.ClientCnxn: Opening socket connection to server ghufran/127.0.1.1:22181. Will not attempt to authenticate using SASL (unknown error)
14/03/30 10:48:45 INFO zookeeper.ClientCnxn: Socket connection established to ghufran/127.0.1.1:22181, initiating session
14/03/30 10:48:45 INFO zookeeper.ClientCnxn: Session establishment complete on server ghufran/127.0.1.1:22181, sessionid = 0x1451263c44c0002, negotiated timeout = 600000
14/03/30 10:48:45 INFO job.JobProgressTracker: Data from 1 workers - Loading data: 0 vertices loaded, 0 vertex input splits loaded; 0 edges loaded, 0 edge input splits loaded; min free memory on worker 1 - 109.01MB, average 109.01MB
14/03/30 10:48:46 INFO mapred.JobClient: map 100% reduce 0%
14/03/30 10:48:50 INFO job.JobProgressTracker: Data from 1 workers - Loading data: 0 vertices loaded, 0 vertex input splits loaded; 0 edges loaded, 0 edge input splits loaded; min free memory on worker 1 - 109.01MB, average 109.01MB
14/03/30 10:48:55 INFO job.JobProgressTracker: Data from 1 workers - Loading data: 0 vertices loaded, 0 vertex input splits loaded; 0 edges loaded, 0 edge input splits loaded; min free memory on worker 1 - 109.01MB, average 109.01MB
14/03/30 10:49:00 INFO job.JobProgressTracker: Data from 1 workers - Loading data: 0 vertices loaded, 0 vertex input splits loaded; 0 edges loaded, 0 edge input splits loaded; min free memory on worker 1 - 108.78MB, average 108.78MB
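
As a sanity check on the graph file itself, the line format described earlier (vertex id, vertex value, then neighbour ids) can be parsed outside Giraph. The following is only a minimal sketch and not Giraph's actual reader: the class name GraphFileCheck and the method parseLine are hypothetical, and splitting on tabs or spaces is an assumption about how the input format tokenises a line.

```java
// Hypothetical standalone check; it does not use or reproduce any Giraph class.
class GraphFileCheck {

    /**
     * Parses one line of the form "id value neighbour neighbour ..." into ints.
     * Throws IllegalArgumentException if fewer than two fields are present or
     * if any field is not an integer (NumberFormatException is a subclass).
     */
    static int[] parseLine(String line) {
        // Assumption: fields are separated by one or more tabs or spaces.
        String[] tokens = line.trim().split("[\t ]+");
        if (tokens.length < 2) {
            throw new IllegalArgumentException(
                "Expected at least a vertex id and a value: '" + line + "'");
        }
        int[] parsed = new int[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            parsed[i] = Integer.parseInt(tokens[i]);
        }
        return parsed;
    }

    public static void main(String[] args) {
        // The four lines of the graph from this post, written with tab separators.
        String[] graph = {"1\t0\t2", "2\t1\t1\t3\t4", "3\t2\t2", "4\t3\t2"};
        for (String line : graph) {
            int[] p = parseLine(line);
            System.out.println("vertex " + p[0] + ", value " + p[1]
                + ", " + (p.length - 2) + " neighbour(s)");
        }
    }
}
```

If every line of my_graph.txt parses this way, the file at least matches the format I described, although that alone would not show that Giraph reads it the same way.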