Here is my understanding of Giraph, but Giraph experts, please correct me if 
this is wrong. Giraph loads Hadoop configuration information from files in the 
folder /etc/hadoop/conf. One of the properties Giraph looks for in 
mapred-site.xml is called mapred.job.tracker. If this property is not defined, 
or if it's set to "local", Giraph assumes that the Hadoop local job runner is 
used. When class org.apache.giraph.job.GiraphJob executes, it checks whether 
the mapred.job.tracker property is set to "local". If it is, it makes sure that 
the number of workers property is set to one and that the split master worker 
property is set to false; if either condition fails, it throws an exception 
indicating that the arguments are not valid for the local job runner.
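
To make the check described above concrete, here is a minimal sketch of the 
logic as I understand it. It is not Giraph's actual source: a plain Map stands 
in for the Hadoop configuration, and the class name, the method signatures and 
the wording of the second error message are made up for illustration. Only the 
property name mapred.job.tracker, the value "local" and the first error message 
come from the output quoted further down.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative stand-in for the validation described above; NOT Giraph code.
class LocalJobRunnerCheckSketch {
    static final String JOB_TRACKER_KEY = "mapred.job.tracker";

    // An unset property is treated the same as the value "local".
    static boolean usesLocalJobRunner(Map<String, String> conf) {
        return "local".equals(conf.getOrDefault(JOB_TRACKER_KEY, "local"));
    }

    // Rejects argument combinations the local job runner cannot handle.
    static void checkLocalJobRunnerConfiguration(
            Map<String, String> conf, int workers, boolean splitMasterWorker) {
        if (!usesLocalJobRunner(conf)) {
            return; // a real job tracker is configured, nothing to validate
        }
        if (workers != 1) {
            throw new IllegalArgumentException(
                "checkLocalJobRunnerConfiguration: When using LocalJobRunner, "
                + "must have only one worker since only 1 task at a time!");
        }
        if (splitMasterWorker) {
            // Hypothetical wording; the real message may differ.
            throw new IllegalArgumentException(
                "checkLocalJobRunnerConfiguration: split master/worker mode "
                + "is not valid with LocalJobRunner");
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        try {
            // Property missing: treated as local, so 2 workers are rejected.
            checkLocalJobRunnerConfiguration(conf, 2, true);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        // Same effect as passing -Dmapred.job.tracker=<host> on the command line.
        conf.put(JOB_TRACKER_KEY, "el01cn16.unx.sas.com");
        checkLocalJobRunnerConfiguration(conf, 10, true);
        System.out.println("accepted once mapred.job.tracker is set");
    }
}
```

With the property missing, the two-worker call is rejected; once the property 
holds anything other than "local", the same kind of arguments pass the check.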

I have access to three Cloudera clusters (CDH4.2, CDH4.5 and CDH5.0), and on 
each cluster /etc/hadoop/conf/mapred-site.xml doesn't contain a property called 
mapred.job.tracker. However, that property is defined in the Cloudera Manager 
console. So, to tell Giraph what that value is, I added another command line 
parameter, -Dmapred.job.tracker, to my Giraph command, and that solved the 
problem.

Here is the full command:
hadoop jar /users/stbesk/snapshot_from_git/jars/giraph-ex.jar 
org.apache.giraph.GiraphRunner -Dmapred.job.tracker=el01cn16.unx.sas.com 
org.apache.giraph.examples.SimpleShortestPathsComputation -vif 
org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat -vip 
/user/stbesk/input/tiny-graph.txt -vof 
org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op 
/user/stbesk/output/shortestPath -w 10 -ca giraph.SplitMasterWorker=true -ca 
giraph.zkList=el01cn16.unx.sas.com:2181

Cheers.
Stefan

From: Stefan Beskow
Sent: Monday, February 24, 2014 12:36 AM
To: 'user@giraph.apache.org'
Subject: Run SimpleShortestPathsVertex sample application using multiple workers

Hi.

I'm trying to run Giraph on Hadoop 2.0.0-cdh4.2.0 using a cluster with 60 
nodes. When I run the sample application 
org.apache.giraph.examples.SimpleShortestPathsVertex with just 1 worker it 
works fine, but when I specify more than 1 worker it throws exception 
java.lang.IllegalArgumentException: checkLocalJobRunnerConfiguration as shown 
below. Is there a way to pass a command line parameter to Giraph so that it 
doesn't use the local job runner or do I need to update any of the Hadoop 
configuration files for this to work?

Here is the command I use to run the sample application with 2 workers:
hadoop jar 
giraph-examples-1.0.0-for-hadoop-2.0.0-cdh4.2.0-jar-with-dependencies.jar 
org.apache.giraph.GiraphRunner -Dgiraph.zkList=rdcgrd001.unx.sas.com:2181 
-libjars 
giraph-examples-1.0.0-for-hadoop-2.0.0-cdh4.2.0-jar-with-dependencies.jar 
org.apache.giraph.examples.SimpleShortestPathsVertex -vif 
org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat -vip 
/user/stbesk/input/tiny_graph.txt -of 
org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op 
/user/stbesk/output/shortestpathsC2 -ca SimpleShortestPathsVertex.source=2 -w 2 
-ca giraph.SplitMasterWorker=true

Here is the exception:
14/02/24 00:20:23 INFO utils.ConfigurationUtils: No edge input format 
specified. Ensure your InputFormat does not require one.
14/02/24 00:20:23 INFO utils.ConfigurationUtils: Setting custom argument 
[SimpleShortestPathsVertex.source] to [2] in GiraphConfiguration
14/02/24 00:20:23 INFO utils.ConfigurationUtils: Setting custom argument 
[giraph.SplitMasterWorker] to [true] in GiraphConfiguration
14/02/24 00:20:23 WARN job.GiraphConfigurationValidator: Output format vertex 
index type is not known
14/02/24 00:20:23 WARN job.GiraphConfigurationValidator: Output format vertex 
value type is not known
14/02/24 00:20:23 WARN job.GiraphConfigurationValidator: Output format edge 
value type is not known
14/02/24 00:20:23 INFO job.GiraphJob: run: Since checkpointing is disabled 
(default), do not allow any task retries (setting mapred.map.max.attempts = 0, 
old value = 4)
Exception in thread "main" java.lang.IllegalArgumentException: 
checkLocalJobRunnerConfiguration: When using LocalJobRunner, must have only one 
worker since only 1 task at a time!
        at 
org.apache.giraph.job.GiraphJob.checkLocalJobRunnerConfiguration(GiraphJob.java:151)
        at org.apache.giraph.job.GiraphJob.run(GiraphJob.java:225)
        at org.apache.giraph.GiraphRunner.run(GiraphRunner.java:94)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
        at org.apache.giraph.GiraphRunner.main(GiraphRunner.java:124)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:208)

Appreciate any help.

Thanks.
Stefan

