Hi Puneet,

A pseudo-distributed, single-node Hadoop cluster means that you are running Hadoop on a single node acting as both a master and a slave. Giraph's code assumes that you can run at least 4 mappers at once, but unfortunately the default configuration allows only 2. Therefore, you need to update "mapred-site.xml" so that you can run at least 4 mappers. If you have at least one more data node, which means you are running a distributed, multi-node Hadoop cluster, this requirement is already met and no change is required.
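For reference, the change amounts to adding the two properties below to mapred-site.xml (a minimal sketch for a Hadoop 1.x-era pseudo-distributed setup, matching the snippet quoted later in this thread; the enclosing <configuration> element is the file's usual root):

```xml
<!-- mapred-site.xml: raise the per-TaskTracker map-slot limit so Giraph
     can run its master and worker tasks concurrently (at least 4). -->
<configuration>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>4</value>
  </property>
  <property>
    <name>mapred.map.tasks</name>
    <value>4</value>
  </property>
</configuration>
```

After editing the file, restart the TaskTracker so the new slot count takes effect.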
I hope this clarifies the difference. It's time to run some examples!

Cheers,
Yazan

On Thu, Jun 6, 2013 at 9:35 PM, Puneet Agarwal <puagar...@yahoo.com> wrote:
> Dear Yazan,
>
> The problem was that I was using Hadoop in Standalone mode.
>
> Then I configured Hadoop in Pseudo-Distributed mode / Single Node Set-up, and
> it worked without any problem.
>
> Regards
> Puneet
>
>
> ----- Original Message -----
> From: Yazan Boshmaf <bosh...@ece.ubc.ca>
> To: user@giraph.apache.org; Puneet Agarwal <puagar...@yahoo.com>
> Cc:
> Sent: Sunday, June 2, 2013 8:24 AM
> Subject: Re: GiraphJob Vs InternalVertexRunner
>
> Puneet, concerning the examples, recently there was a new diff to
> decouple vertex data and computation, which was a significant change.
> Check issue GIRAPH-667 for more info
> (https://issues.apache.org/jira/browse/GIRAPH-667).
>
> On Sat, Jun 1, 2013 at 7:50 PM, Yazan Boshmaf <bosh...@ece.ubc.ca> wrote:
>> I think the SimpleShortestPathsVertex.java class got renamed to
>> SimpleShortestPathsComputation.java, as can be seen in trunk (check
>> here:
>> https://github.com/apache/giraph/tree/trunk/giraph-examples/src/main/java/org/apache/giraph/examples).
>>
>> Concerning the issue you stated, you seem to have run Giraph using a
>> local configuration, and as far as I know, the LocalJobRunner has some
>> restrictions (only one task at a time, and thus no split master/worker
>> mode). What's your Hadoop setup? Are you running a local
>> pseudo-distributed Hadoop instance? If so, have you updated
>> /directory-to-hadoop/conf/mapred-site.xml with:
>>
>> <property>
>>   <name>mapred.tasktracker.map.tasks.maximum</name>
>>   <value>4</value>
>> </property>
>>
>> <property>
>>   <name>mapred.map.tasks</name>
>>   <value>4</value>
>> </property>
>>
>> Please give it a shot and let me know.
>>
>> Best,
>> Yazan
>>
>> On Sat, Jun 1, 2013 at 6:03 AM, Puneet Agarwal <puagar...@yahoo.com> wrote:
>>> Hi Yazan,
>>>
>>> This is indeed of great help, especially the help command:
>>>
>>> "/directory-to-hadoop/bin/hadoop jar
>>> /directory-to-giraph/giraph-examples/target/giraph-examples-1.1.0-SNAPSHOT-for-hadoop-0.20.203.0-jar-with-dependencies.jar
>>> org.apache.giraph.GiraphRunner -h"
>>>
>>> But there seems to be some problem in your command, because there is no
>>> class named "org.apache.giraph.examples.SimpleShortestPathsComputation".
>>>
>>> I then looked in the path "org/apache/giraph/examples/" and found that
>>> there is a class named
>>> "org.apache.giraph.examples.SimpleShortestPathsVertex", so I tried this
>>> instead of yours. But this does not work.
>>>
>>> It gives the following error. Any guidance will be helpful:
>>>
>>> 13/06/01 18:28:22 INFO utils.ConfigurationUtils: No edge input format
>>> specified. Ensure your InputFormat does not require one.
>>> 13/06/01 18:28:22 WARN job.GiraphConfigurationValidator: Output format
>>> vertex index type is not known
>>> 13/06/01 18:28:22 WARN job.GiraphConfigurationValidator: Output format
>>> vertex value type is not known
>>> 13/06/01 18:28:22 WARN job.GiraphConfigurationValidator: Output format
>>> edge value type is not known
>>> 13/06/01 18:28:22 INFO job.GiraphJob: run: Since checkpointing is disabled
>>> (default), do not allow any task retries (setting mapred.map.max.attempts =
>>> 0, old value = 4)
>>> Exception in thread "main" java.lang.IllegalArgumentException:
>>> checkLocalJobRunnerConfiguration: When using LocalJobRunner, you cannot run
>>> in split master / worker mode since there is only 1 task at a time!
>>>     at org.apache.giraph.job.GiraphJob.checkLocalJobRunnerConfiguration(GiraphJob.java:157)
>>>     at org.apache.giraph.job.GiraphJob.run(GiraphJob.java:225)
>>>     at org.apache.giraph.GiraphRunner.run(GiraphRunner.java:94)
>>>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>>>     at org.apache.giraph.GiraphRunner.main(GiraphRunner.java:124)
>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>     at java.lang.reflect.Method.invoke(Method.java:616)
>>>     at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>>>
>>> Regards
>>> Puneet
>>>
>>>
>>> ----- Original Message -----
>>> From: Yazan Boshmaf <bosh...@ece.ubc.ca>
>>> To: user@giraph.apache.org; Puneet Agarwal <puagar...@yahoo.com>
>>> Cc:
>>> Sent: Saturday, June 1, 2013 7:59 AM
>>> Subject: Re: GiraphJob Vs InternalVertexRunner
>>>
>>> After packaging Giraph (i.e., you can locate the JAR files under the
>>> "target" folder in each module), you can run one of the included
>>> examples under /directory-to-giraph/giraph-examples as follows:
>>>
>>> /directory-to-hadoop/bin/hadoop jar
>>> /directory-to-giraph/giraph-examples/target/giraph-examples-1.1.0-SNAPSHOT-for-hadoop-0.20.203.0-jar-with-dependencies.jar
>>> org.apache.giraph.GiraphRunner
>>> org.apache.giraph.examples.SimpleShortestPathsComputation -vif
>>> org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat
>>> -vip /dfs-user-directory/some-input-json -of
>>> org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op
>>> /dfs-user-directory/some-output-text -w 1
>>>
>>> This runs the SimpleShortestPathsComputation using the input file
>>> /dfs-user-directory/some-input-json, which has the format
>>> [source_id, source_value, [ [dest_id, edge_weight], ...] ]. It computes the
>>> shortest paths to all nodes from a given source, which is the first
>>> source_id in the input file. The output file is
>>> /dfs-user-directory/some-output-text, and its format is "source_id
>>> distance_value". The computation is done using one worker.
>>>
>>> You can run the following for more info:
>>>
>>> /directory-to-hadoop/bin/hadoop jar
>>> /directory-to-giraph/giraph-examples/target/giraph-examples-1.1.0-SNAPSHOT-for-hadoop-0.20.203.0-jar-with-dependencies.jar
>>> org.apache.giraph.GiraphRunner -h
>>>
>>> Hope this helps.
>>>
>>> Cheers,
>>> Yazan
>>>
>>> On Fri, May 31, 2013 at 6:07 PM, Puneet Agarwal <puagar...@yahoo.com> wrote:
>>>> It seems there are two ways to run a Giraph job:
>>>>
>>>> a) using the class InternalVertexRunner
>>>> b) using the class GiraphJob
>>>>
>>>> Which one should be used where?
>>>>
>>>> Thanks
>>>> Puneet
>>>
>
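P.S. To make the input format from the quoted instructions concrete: each line is JSON of the form [source_id, source_value, [[dest_id, edge_weight], ...]]. The sketch below (plain Python, not Giraph code; the graph data is illustrative) parses lines in that format and computes single-source shortest paths superstep-style, with vertices that improved "sending" updated distances to their neighbors until nothing changes:

```python
import json

# Each input line: [vertex_id, initial_value, [[dest_id, edge_weight], ...]]
# These example lines are illustrative, not from the Giraph distribution.
lines = [
    '[0, 0, [[1, 1.0], [3, 3.0]]]',
    '[1, 0, [[0, 1.0], [2, 2.0], [3, 1.0]]]',
    '[2, 0, [[1, 2.0], [4, 4.0]]]',
    '[3, 0, [[0, 3.0], [1, 1.0], [4, 4.0]]]',
    '[4, 0, [[3, 4.0], [2, 4.0]]]',
]

graph = {}
for line in lines:
    vid, _value, edges = json.loads(line)
    graph[vid] = [(dest, weight) for dest, weight in edges]

def shortest_paths(graph, source):
    """Superstep-style SSSP: every vertex starts at infinity except the
    source; each round, vertices whose distance improved propagate
    distance + edge_weight to their neighbors, until no change occurs."""
    dist = {v: float('inf') for v in graph}
    dist[source] = 0.0
    active = {source}
    while active:                      # one "superstep" per iteration
        next_active = set()
        for v in active:
            for dest, weight in graph[v]:
                candidate = dist[v] + weight
                if candidate < dist[dest]:
                    dist[dest] = candidate
                    next_active.add(dest)
        active = next_active
    return dist

# Output shaped like "vertex_id<TAB>distance", one vertex per line.
for vid, d in sorted(shortest_paths(graph, source=0).items()):
    print(f"{vid}\t{d}")
```

The while-loop mirrors Giraph's superstep model: `active` plays the role of vertices that received messages, and the job halts when no vertex has anything left to send.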