Hi Claudio,

The Hadoop version should be 0.20.203.0, but I am not sure about the exact Giraph version. I got it from:
git clone https://github.com/apache/giraph.git

The command I used is something like the one below. I might also have used the giraph.maxPartitionsInMemory=1 option at that time, but it did not work with or without this option.

$HADOOP_HOME/bin/hadoop jar \
  $GIRAPH_HOME/giraph-examples/target/giraph-examples-1.1.0-SNAPSHOT-for-hadoop-0.20.203.0-jar-with-dependencies.jar \
  org.apache.giraph.GiraphRunner \
  -Dgiraph.useOutOfCoreMessages=true -Dgiraph.useOutOfCoreGraph=true \
  org.apache.giraph.examples.SimplePageRankComputation \
  -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
  -vip /user/andy/input/tiny_graph.txt \
  -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
  -op /user/andy/output/page6 -w 3 \
  -mc org.apache.giraph.examples.SimplePageRankComputation\$SimplePageRankMasterCompute

Thanks,
Jian

On Sat, Oct 19, 2013 at 11:21 AM, Claudio Martella <claudio.marte...@gmail.com> wrote:

> Looking at your logs, there's a null pointer exception. Looks like a bug
> to me. What version are you running? What command are you using to run the
> job?
>
>
> On Fri, Oct 18, 2013 at 9:03 AM, Jianqiang Ou <oujianqiang...@gmail.com> wrote:
>
>> Thanks, I just tried another dataset, which my cluster can handle
>> successfully within memory. However, exceptions still occurred with the
>> -Dgiraph.useOutOfCoreGraph=true option, while it works fine with only the
>> -Dgiraph.useOutOfCoreMessages=true
>> option. So do you still think it is the dir permission issue?
>>
>> By the way, the dir path you mentioned should be the dir that stores the
>> out-of-core partitions and messages in the local file system, right? But
>> how do I know where it is? It should be determined by Giraph instead of the
>> applications, right?
>>
>> Thanks for your time and patience again,
>> Jian
>>
>>
>> On Thu, Oct 17, 2013 at 5:32 PM, Jyotirmoy Sundi <sundi...@gmail.com> wrote:
>>
>>> Apart from these, you might also want to check the permissions of the dir
>>> path where the offloading of vertices and messages happens.
>>> Ideally, Giraph is not meant for out-of-core use: if your graph is much
>>> bigger than the cluster can handle in memory, using Giraph defeats the
>>> purpose.
>>>
>>>
>>> On Thu, Oct 17, 2013 at 8:13 AM, Jianqiang Ou
>>> <oujianqiang...@gmail.com> wrote:
>>>
>>>> Thanks very much. So are you saying that if I set
>>>> -Dgiraph.maxPartitionsInMemory and -Dgiraph.maxMessagesInMemory to
>>>> smaller numbers, then it might work?
>>>>
>>>> Thanks again,
>>>> Jian
>>>>
>>>>
>>>> On Thu, Oct 17, 2013 at 12:56 AM, Jyotirmoy Sundi
>>>> <sundi...@gmail.com> wrote:
>>>>
>>>>> You need to tune it per your cluster. This is what is mentioned in the
>>>>> docs:
>>>>> *"It is difficult to decide a general policy to use out-of-core
>>>>> capabilities*, as it depends on the behavior of the algorithm and the
>>>>> input graph. The exact number of partitions and messages to keep in memory
>>>>> depends on the cluster capabilities, the number of messages produced per
>>>>> superstep, and number of active vertices per superstep. Moreover, it
>>>>> depends on the type and size of vertex values and messages. For example,
>>>>> algorithms such as Belief Propagation tend to keep large vertex values,
>>>>> while algorithms such as clique computations tend to send large messages
>>>>> along. Hence, it depends on your algorithm what feature to rely on more."
>>>>>
>>>>> Thanks
>>>>> Sundi
>>>>>
>>>>>
>>>>> On Wed, Oct 16, 2013 at 9:41 PM, Jianqiang Ou
>>>>> <oujianqiang...@gmail.com> wrote:
>>>>>
>>>>>> Hi Sundi,
>>>>>>
>>>>>> I just tried your method, but somehow the job failed; attached is
>>>>>> the history of the job. It ran fine without the out-of-core options.
>>>>>> Do you have any clue why that is?
>>>>>>
>>>>>> The command I used to run the program is below:
>>>>>>
>>>>>> $HADOOP_HOME/bin/hadoop jar
>>>>>> $GIRAPH_HOME/giraph-examples/target/giraph-examples-1.1.0-SNAPSHOT-for-hadoop-0.20.203.0-jar-with-dependencies.jar
>>>>>> org.apache.giraph.GiraphRunner
>>>>>> -Dgiraph.useOutOfCoreMessages=true -Dgiraph.useOutOfCoreGraph=true
>>>>>> org.apache.giraph.examples.SimplePageRankComputation -vif
>>>>>> org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat
>>>>>> -vip /user/andy/input/tiny_graph.txt -vof
>>>>>> org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op
>>>>>> /user/andy/output/page3 -w 3 -mc
>>>>>> org.apache.giraph.examples.SimplePageRankComputation\$SimplePageRankMasterCompute
>>>>>>
>>>>>> Many thanks,
>>>>>>
>>>>>> Jianqiang
>>>>>>
>>>>>> On Wed, Oct 16, 2013 at 12:11 PM, Jianqiang Ou
>>>>>> <oujianqiang...@gmail.com> wrote:
>>>>>>
>>>>>>> Got it, thank you very much!
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Oct 16, 2013 at 10:43 AM, Jyotirmoy Sundi
>>>>>>> <sundi...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Put it as -Dgiraph.useOutOfCoreMessages=true
>>>>>>>> -Dgiraph.useOutOfCoreGraph=true after GiraphRunner,
>>>>>>>> like
>>>>>>>> hadoop jar giraph.jar org.apache.giraph.GiraphRunner
>>>>>>>> -Dgiraph.useOutOfCoreMessages=true
>>>>>>>> -Dgiraph.useOutOfCoreGraph=true ...
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Oct 16, 2013 at 7:29 AM, Jianqiang Ou
>>>>>>>> <oujianqiang...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi, I have a question about out-of-core Giraph. It is said
>>>>>>>>> that, in order to use disk to store the partitions, we need to use
>>>>>>>>> "giraph.useOutOfCoreGraph=true", but where should I put this
>>>>>>>>> statement?
>>>>>>>>>
>>>>>>>>> BTW, I am just trying to use the pagerank or shortestpath example
>>>>>>>>> to test the out-of-core performance of my cluster.
>>>>>>>>>
>>>>>>>>> Thanks very much,
>>>>>>>>> Jian
>>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Best Regards,
>>>>>>>> Jyotirmoy Sundi
>>>>>>>> Data Engineer,
>>>>>>>> Admobius
>>>>>>>>
>>>>>>>> San Francisco, CA 94158
>>>>>>>>
>
> --
> Claudio Martella
> claudio.marte...@gmail.com
>
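
For reference, the advice in this thread (put the -D options directly after the GiraphRunner class name, then tune giraph.maxPartitionsInMemory per cluster) can be sketched as a small wrapper script. This is only an illustration under assumptions: the jar name, the output path, the build_giraph_cmd function and the DRY_RUN, MAX_PARTITIONS, INPUT and OUTPUT variables are hypothetical names that do not come from the thread, and the value 1 for giraph.maxPartitionsInMemory is just the one tried above, not a recommendation. By default the script only prints the command, so it can be checked without a Hadoop installation.

```shell
#!/bin/sh
# Dry-run sketch: assemble a GiraphRunner command with the out-of-core
# options placed right after the GiraphRunner class name.
# All variable names and defaults below are illustrative assumptions.
set -eu

GIRAPH_JAR="${GIRAPH_JAR:-giraph-examples-jar-with-dependencies.jar}"  # hypothetical jar name
MAX_PARTITIONS="${MAX_PARTITIONS:-1}"                                  # tune per cluster
INPUT="${INPUT:-/user/andy/input/tiny_graph.txt}"
OUTPUT="${OUTPUT:-/user/andy/output/page_ooc}"                         # hypothetical output dir

build_giraph_cmd() {
  # The -D options go before the computation class and before the
  # -vif/-vip/-vof/-op/-w/-mc arguments, as suggested in the thread.
  set -- hadoop jar "$GIRAPH_JAR" org.apache.giraph.GiraphRunner \
    -Dgiraph.useOutOfCoreGraph=true \
    -Dgiraph.useOutOfCoreMessages=true \
    "-Dgiraph.maxPartitionsInMemory=$MAX_PARTITIONS" \
    org.apache.giraph.examples.SimplePageRankComputation \
    -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
    -vip "$INPUT" \
    -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
    -op "$OUTPUT" \
    -w 3 \
    -mc 'org.apache.giraph.examples.SimplePageRankComputation$SimplePageRankMasterCompute'

  if [ "${DRY_RUN:-1}" = "1" ]; then
    # Print for inspection only; the printed line is not shell-quoted.
    printf '%s\n' "$*"
  else
    # Run the command with every argument passed through unchanged.
    "$@"
  fi
}

build_giraph_cmd
```

Running it with DRY_RUN=0 would execute the job instead of printing it. Keeping the arguments as positional parameters rather than one long string means the `$` in the -mc class name needs no backslash escaping when the command is actually run.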