Hi Puneet,

A pseudo-distributed, single-node Hadoop cluster means that you are running Hadoop on a single node acting as both a master and a slave. Giraph's code assumes that you can run at least 4 mappers at once, but unfortunately the default configuration allows only 2. Therefore, you need to update "mapred-site.xml" so that you can run at least 4 mappers. If you have at least one more data node, which means you are running a distributed, multi-node Hadoop cluster, this requirement is already met and no change is required.
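For reference, the change amounts to adding the two properties below to mapred-site.xml (a minimal sketch for a Hadoop 1.x-era pseudo-distributed setup, matching the snippet quoted later in this thread; the enclosing <configuration> element is the file's usual root):

```xml
<!-- mapred-site.xml: raise the per-TaskTracker map-slot limit so Giraph
     can run its master and worker tasks concurrently (at least 4). -->
<configuration>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>4</value>
  </property>
  <property>
    <name>mapred.map.tasks</name>
    <value>4</value>
  </property>
</configuration>
```

After editing the file, restart the TaskTracker so the new slot count takes effect.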
I hope this clarifies the difference. It's time to run some examples!

Cheers,
Yazan

On Thu, Jun 6, 2013 at 9:35 PM, Puneet Agarwal <puagar...@yahoo.com> wrote:
> Dear Yazan,
>
> The problem was that I was using Hadoop in Standalone mode.
>
> Then I configured Hadoop in Pseudo-Distributed mode / Single Node Set-up, and
> it worked without any problem.
>
> Regards
> Puneet
>
>
> ----- Original Message -----
> From: Yazan Boshmaf <bosh...@ece.ubc.ca>
> To: user@giraph.apache.org; Puneet Agarwal <puagar...@yahoo.com>
> Cc:
> Sent: Sunday, June 2, 2013 8:24 AM
> Subject: Re: GiraphJob Vs InternalVertexRunner
>
> Puneet, concerning the examples, recently there was a new diff to
> decouple vertex data and computation, which was a significant change.
> Check issue GIRAPH-667 for more info
> (https://issues.apache.org/jira/browse/GIRAPH-667).
>
> On Sat, Jun 1, 2013 at 7:50 PM, Yazan Boshmaf <bosh...@ece.ubc.ca> wrote:
>> I think the SimpleShortestPathsVertex.java class got renamed to
>> SimpleShortestPathsComputation.java, as can be seen in trunk (check
>> here:
>> https://github.com/apache/giraph/tree/trunk/giraph-examples/src/main/java/org/apache/giraph/examples).
>>
>> Concerning the issue you stated, you seem to have run Giraph using a
>> local configuration, and as far as I know, the LocalJobRunner has some
>> restrictions (only one task at a time, and thus no split master/worker
>> mode). What's your Hadoop setup? Are you running a local
>> pseudo-distributed Hadoop instance? If so, have you updated
>> /directory-to-hadoop/conf/mapred-site.xml with:
>>
>> <property>
>>   <name>mapred.tasktracker.map.tasks.maximum</name>
>>   <value>4</value>
>> </property>
>>
>> <property>
>>   <name>mapred.map.tasks</name>
>>   <value>4</value>
>> </property>
>>
>> Please give it a shot and let me know.
>>
>> Best,
>> Yazan
>>
>> On Sat, Jun 1, 2013 at 6:03 AM, Puneet Agarwal <puagar...@yahoo.com> wrote:
>>> Hi Yazan,
>>>
>>> This is indeed of great help, especially the help command:
>>>
>>> "/directory-to-hadoop/bin/hadoop jar
>>> /directory-to-giraph/giraph-examples/target/giraph-examples-1.1.0-SNAPSHOT-for-hadoop-0.20.203.0-jar-with-dependencies.jar
>>> org.apache.giraph.GiraphRunner -h"
>>>
>>> But there seems to be some problem in your command, because there is no
>>> class named "org.apache.giraph.examples.SimpleShortestPathsComputation".
>>>
>>> I then looked in the path "org/apache/giraph/examples/" and found that
>>> there is a class named
>>> "org.apache.giraph.examples.SimpleShortestPathsVertex", so I tried this
>>> instead of yours. But this does not work.
>>>
>>> It gives the following error. Any guidance will be helpful:
>>>
>>> 13/06/01 18:28:22 INFO utils.ConfigurationUtils: No edge input format
>>> specified. Ensure your InputFormat does not require one.
>>> 13/06/01 18:28:22 WARN job.GiraphConfigurationValidator: Output format
>>> vertex index type is not known
>>> 13/06/01 18:28:22 WARN job.GiraphConfigurationValidator: Output format
>>> vertex value type is not known
>>> 13/06/01 18:28:22 WARN job.GiraphConfigurationValidator: Output format
>>> edge value type is not known
>>> 13/06/01 18:28:22 INFO job.GiraphJob: run: Since checkpointing is disabled
>>> (default), do not allow any task retries (setting mapred.map.max.attempts =
>>> 0, old value = 4)
>>> Exception in thread "main" java.lang.IllegalArgumentException:
>>> checkLocalJobRunnerConfiguration: When using LocalJobRunner, you cannot run
>>> in split master / worker mode since there is only 1 task at a time!
>>>     at org.apache.giraph.job.GiraphJob.checkLocalJobRunnerConfiguration(GiraphJob.java:157)
>>>     at org.apache.giraph.job.GiraphJob.run(GiraphJob.java:225)
>>>     at org.apache.giraph.GiraphRunner.run(GiraphRunner.java:94)
>>>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>>>     at org.apache.giraph.GiraphRunner.main(GiraphRunner.java:124)
>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>     at java.lang.reflect.Method.invoke(Method.java:616)
>>>     at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>>>
>>> Regards
>>> Puneet
>>>
>>>
>>> ----- Original Message -----
>>> From: Yazan Boshmaf <bosh...@ece.ubc.ca>
>>> To: user@giraph.apache.org; Puneet Agarwal <puagar...@yahoo.com>
>>> Cc:
>>> Sent: Saturday, June 1, 2013 7:59 AM
>>> Subject: Re: GiraphJob Vs InternalVertexRunner
>>>
>>> After packaging Giraph (i.e., you can locate the JAR files under the
>>> "target" folder in each module), you can run one of the included
>>> examples under /directory-to-giraph/giraph-examples as follows:
>>>
>>> /directory-to-hadoop/bin/hadoop jar
>>> /directory-to-giraph/giraph-examples/target/giraph-examples-1.1.0-SNAPSHOT-for-hadoop-0.20.203.0-jar-with-dependencies.jar
>>> org.apache.giraph.GiraphRunner
>>> org.apache.giraph.examples.SimpleShortestPathsComputation -vif
>>> org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat
>>> -vip /dfs-user-directory/some-input-json -of
>>> org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op
>>> /dfs-user-directory/some-output-text -w 1
>>>
>>> This runs the SimpleShortestPathsComputation using the input file
>>> /dfs-user-directory/some-input-json, which has the format
>>> [source_id, source_value, [ [dest_id, edge_weight], ...] ]. It computes the
>>> shortest paths to all nodes from a given source, which is the first
>>> source_id in the input file. The output file is
>>> /dfs-user-directory/some-output-text, and its format is "source_id
>>> distance_value". The computation is done using one worker.
>>>
>>> You can run the following for more info:
>>>
>>> /directory-to-hadoop/bin/hadoop jar
>>> /directory-to-giraph/giraph-examples/target/giraph-examples-1.1.0-SNAPSHOT-for-hadoop-0.20.203.0-jar-with-dependencies.jar
>>> org.apache.giraph.GiraphRunner -h
>>>
>>> Hope this helps.
>>>
>>> Cheers,
>>> Yazan
>>>
>>> On Fri, May 31, 2013 at 6:07 PM, Puneet Agarwal <puagar...@yahoo.com> wrote:
>>>> It seems there are two ways to run a Giraph job:
>>>>
>>>> a) using the class InternalVertexRunner
>>>> b) using the class GiraphJob
>>>>
>>>> Which one should be used where?
>>>>
>>>> Thanks
>>>> Puneet
>>>
>
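P.S. To make the input format from the quoted instructions concrete: each line is JSON of the form [source_id, source_value, [[dest_id, edge_weight], ...]]. The sketch below (plain Python, not Giraph code; the graph data is illustrative) parses lines in that format and computes single-source shortest paths superstep-style, with vertices that improved "sending" updated distances to their neighbors until nothing changes:

```python
import json

# Each input line: [vertex_id, initial_value, [[dest_id, edge_weight], ...]]
# These example lines are illustrative, not from the Giraph distribution.
lines = [
    '[0, 0, [[1, 1.0], [3, 3.0]]]',
    '[1, 0, [[0, 1.0], [2, 2.0], [3, 1.0]]]',
    '[2, 0, [[1, 2.0], [4, 4.0]]]',
    '[3, 0, [[0, 3.0], [1, 1.0], [4, 4.0]]]',
    '[4, 0, [[3, 4.0], [2, 4.0]]]',
]

graph = {}
for line in lines:
    vid, _value, edges = json.loads(line)
    graph[vid] = [(dest, weight) for dest, weight in edges]

def shortest_paths(graph, source):
    """Superstep-style SSSP: every vertex starts at infinity except the
    source; each round, vertices whose distance improved propagate
    distance + edge_weight to their neighbors, until no change occurs."""
    dist = {v: float('inf') for v in graph}
    dist[source] = 0.0
    active = {source}
    while active:                      # one "superstep" per iteration
        next_active = set()
        for v in active:
            for dest, weight in graph[v]:
                candidate = dist[v] + weight
                if candidate < dist[dest]:
                    dist[dest] = candidate
                    next_active.add(dest)
        active = next_active
    return dist

# Output shaped like "vertex_id<TAB>distance", one vertex per line.
for vid, d in sorted(shortest_paths(graph, source=0).items()):
    print(f"{vid}\t{d}")
```

The while-loop mirrors Giraph's superstep model: `active` plays the role of vertices that received messages, and the job halts when no vertex has anything left to send.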