Here is my understanding of Giraph; Giraph experts, please correct me if this is wrong. Giraph loads Hadoop configuration information from the files in /etc/hadoop/conf. One of the properties Giraph looks for in mapred-site.xml is mapred.job.tracker. If this property is not defined, or if it is set to "local", Giraph assumes that the Hadoop local job runner is in use. When the class org.apache.giraph.job.GiraphJob runs, it checks whether mapred.job.tracker is set to "local". If it is, GiraphJob requires that the number of workers is one and that the split master/worker property is false; otherwise it throws an exception saying that the arguments are not valid for the local job runner.
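The check described above can be sketched roughly as follows. This is only an illustration of my understanding, not Giraph's actual source: the class LocalRunnerCheck, its method names, the shortened exception messages, and the use of a plain Map in place of a Hadoop Configuration object are all made up for the example. Only the property name mapred.job.tracker and the value "local" come from the description above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the validation described above; NOT Giraph's code.
final class LocalRunnerCheck {
    // Property name as it appears in mapred-site.xml.
    static final String JOB_TRACKER = "mapred.job.tracker";

    // True when the property is missing or explicitly set to "local".
    static boolean usesLocalJobRunner(Map<String, String> conf) {
        return "local".equals(conf.getOrDefault(JOB_TRACKER, "local"));
    }

    // Throws if the settings cannot work with the local job runner.
    static void check(Map<String, String> conf, int workers, boolean splitMasterWorker) {
        if (!usesLocalJobRunner(conf)) {
            return; // a real job tracker is configured, nothing to check
        }
        if (workers != 1) {
            throw new IllegalArgumentException(
                "local job runner: must have only one worker");
        }
        if (splitMasterWorker) {
            throw new IllegalArgumentException(
                "local job runner: cannot split master and worker");
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        check(conf, 1, false); // property missing, one worker: accepted

        // Same effect as passing -Dmapred.job.tracker=<host> on the command line.
        conf.put(JOB_TRACKER, "el01cn16.unx.sas.com");
        check(conf, 10, true); // job tracker set: any worker count is accepted
        System.out.println("configuration accepted");
    }
}
```

With the property missing, any call with more than one worker is rejected, which matches the IllegalArgumentException in the original report. Once the property points at a real job tracker the check is skipped.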
I have access to three Cloudera clusters (CDH4.2, CDH4.5 and CDH5.0), and on each of them /etc/hadoop/conf/mapred-site.xml does not contain a property called mapred.job.tracker. The property is, however, defined in the Cloudera Manager console. To tell Giraph what the value is, I simply added another command-line parameter, -Dmapred.job.tracker, to my Giraph command, and that solved the problem. Here is the full command:

hadoop jar /users/stbesk/snapshot_from_git/jars/giraph-ex.jar org.apache.giraph.GiraphRunner -Dmapred.job.tracker=el01cn16.unx.sas.com org.apache.giraph.examples.SimpleShortestPathsComputation -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat -vip /user/stbesk/input/tiny-graph.txt -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op /user/stbesk/output/shortestPath -w 10 -ca giraph.SplitMasterWorker=true -ca giraph.zkList=el01cn16.unx.sas.com:2181

Cheers.
Stefan

From: Stefan Beskow
Sent: Monday, February 24, 2014 12:36 AM
To: 'user@giraph.apache.org'
Subject: Run SimpleShortestPathsVertex sample application using multiple workers

Hi.

I'm trying to run Giraph on Hadoop 2.0.0-cdh4.2.0 on a cluster with 60 nodes. When I run the sample application org.apache.giraph.examples.SimpleShortestPathsVertex with just 1 worker it works fine, but when I specify more than 1 worker it throws java.lang.IllegalArgumentException: checkLocalJobRunnerConfiguration, as shown below. Is there a way to pass a command-line parameter to Giraph so that it doesn't use the local job runner, or do I need to update any of the Hadoop configuration files for this to work?
Here is the command I use to run the sample application with 2 workers:

hadoop jar giraph-examples-1.0.0-for-hadoop-2.0.0-cdh4.2.0-jar-with-dependencies.jar org.apache.giraph.GiraphRunner -Dgiraph.zkList=rdcgrd001.unx.sas.com:2181 -libjars giraph-examples-1.0.0-for-hadoop-2.0.0-cdh4.2.0-jar-with-dependencies.jar org.apache.giraph.examples.SimpleShortestPathsVertex -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat -vip /user/stbesk/input/tiny_graph.txt -of org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op /user/stbesk/output/shortestpathsC2 -ca SimpleShortestPathsVertex.source=2 -w 2 -ca giraph.SplitMasterWorker=true

Here is the exception:

14/02/24 00:20:23 INFO utils.ConfigurationUtils: No edge input format specified. Ensure your InputFormat does not require one.
14/02/24 00:20:23 INFO utils.ConfigurationUtils: Setting custom argument [SimpleShortestPathsVertex.source] to [2] in GiraphConfiguration
14/02/24 00:20:23 INFO utils.ConfigurationUtils: Setting custom argument [giraph.SplitMasterWorker] to [true] in GiraphConfiguration
14/02/24 00:20:23 WARN job.GiraphConfigurationValidator: Output format vertex index type is not known
14/02/24 00:20:23 WARN job.GiraphConfigurationValidator: Output format vertex value type is not known
14/02/24 00:20:23 WARN job.GiraphConfigurationValidator: Output format edge value type is not known
14/02/24 00:20:23 INFO job.GiraphJob: run: Since checkpointing is disabled (default), do not allow any task retries (setting mapred.map.max.attempts = 0, old value = 4)
Exception in thread "main" java.lang.IllegalArgumentException: checkLocalJobRunnerConfiguration: When using LocalJobRunner, must have only one worker since only 1 task at a time!
    at org.apache.giraph.job.GiraphJob.checkLocalJobRunnerConfiguration(GiraphJob.java:151)
    at org.apache.giraph.job.GiraphJob.run(GiraphJob.java:225)
    at org.apache.giraph.GiraphRunner.run(GiraphRunner.java:94)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
    at org.apache.giraph.GiraphRunner.main(GiraphRunner.java:124)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:208)

Appreciate any help.

Thanks.
Stefan