Okay, I have observed this problem as well, with a 10 GB adjacency text file. I was running on a 75 GB EC2 instance with a 70 GB heap, which should be no problem, but it fails after several steps. I'm profiling it now in more detail.
It can't be that 10 GB of text takes more than 20 GB of heap as a graph with messages.

2012/9/14 Thomas Jungblut <[email protected]>

> I would trim the spaces in the key and value.
> If it still crashes afterwards, I have no idea anymore and would recommend
> you take a heap dump with hprof and look at what is consuming all that memory.
>
> 2012/9/14 庄克琛 <[email protected]>
>
>> Hi, I set the property in hama-site.xml.
>> <property>
>> <name> hama.messenger.queue.class </name>
>> <value> org.apache.hama.bsp.message.DiskQueue </value>
>> </property>
>> Did I set it right?
>> I restarted Hama (stop-bspd.sh and start-bspd.sh), tried the test job
>> again, and watched the memory slowly climb to 70%, 80%, 90%, then crash... >_<
>>
>> 2012/9/14 Thomas Jungblut <[email protected]>
>>
>> > Yes, I wanted to have direct memory in Hama months ago, but hadn't managed
>> > to find enough time.
>> > That is a very good idea.
>> >
>> > 2012/9/14 Tommaso Teofili <[email protected]>
>> >
>> > > I think we may also create an Apache DirectMemory based DiskQueue which
>> > > caches things on disk but hides most of the complexity.
>> > > My 2 cents,
>> > > Tommaso
>> > >
>> > > 2012/9/14 Thomas Jungblut <[email protected]>
>> > >
>> > > > I have created an issue for that:
>> > > > HAMA-642 <https://issues.apache.org/jira/browse/HAMA-642>
>> > > >
>> > > > 2012/9/14 Thomas Jungblut <[email protected]>
>> > > >
>> > > > > Basically, I think the graph should fit into the memory of your task,
>> > > > > so the messages could be causing the overflow.
>> > > > >
>> > > > > You can try out the DiskQueue, which can be configured by setting the
>> > > > > property "hama.messenger.queue.class" to
>> > > > > "org.apache.hama.bsp.message.DiskQueue".
>> > > > >
>> > > > > This will immediately flush the messages to disk. However, this is
>> > > > > currently experimental, so if you try it out please tell us if it helped.
>> > > > >
>> > > > > Thanks.
>> > > > >
>> > > > > To scale this further, we should write vertices that don't fit in memory
>> > > > > to disk. I will add another JIRA for that soon.
>> > > > >
>> > > > > 2012/9/14 庄克琛 <[email protected]>
>> > > > >
>> > > > >> Oh, the HDFS block size is 128 MB, not 64 MB, so the 73 MB graph will not
>> > > > >> be split on HDFS.
>> > > > >>
>> > > > >> 2012/9/14 庄克琛 <[email protected]>
>> > > > >>
>> > > > >> > Hmm... I have tried your configuration advice and restarted Hama.
>> > > > >> > I use the Google web graph (
>> > > > >> > http://wiki.apache.org/hama/WriteHamaGraphFile ),
>> > > > >> > Nodes: 875713 Edges: 5105039, which is about 73 MB, uploaded to a small
>> > > > >> > HDFS cluster (block size is 64 MB), and tested PageRank as described in (
>> > > > >> > http://wiki.apache.org/hama/WriteHamaGraphFile ). The result was:
>> > > > >> > ################
>> > > > >> > function@624-PC:~/hadoop-1.0.3/hama-0.6.0$ hama jar hama-6-P*
>> > > > >> > input-google ouput-google
>> > > > >> > 12/09/14 14:27:50 INFO bsp.FileInputFormat: Total input paths to process : 1
>> > > > >> > 12/09/14 14:27:50 INFO bsp.FileInputFormat: Total # of splits: 3
>> > > > >> > 12/09/14 14:27:50 INFO bsp.BSPJobClient: Running job: job_201008141420_0004
>> > > > >> > 12/09/14 14:27:53 INFO bsp.BSPJobClient: Current supersteps number: 0
>> > > > >> > Java HotSpot(TM) Server VM warning: Attempt to allocate stack guard pages failed.
>> > > > >> > ###################
>> > > > >> >
>> > > > >> > Last time the job reached superstep 1 or 2, then gave the same result.
>> > > > >> > The task attempt****.err files are empty.
>> > > > >> > Is the graph too large?
>> > > > >> > I tested on a small graph and got the right rank results.
>> > > > >> >
>> > > > >> > 2012/9/14 Edward J. Yoon <[email protected]>
>> > > > >> >
>> > > > >> >> I've added a multi-step partitioning method to save memory [1].
>> > > > >> >>
>> > > > >> >> Please try configuring the property below in hama-site.xml.
>> > > > >> >>
>> > > > >> >> <property>
>> > > > >> >> <name>hama.graph.multi.step.partitioning.interval</name>
>> > > > >> >> <value>10000000</value>
>> > > > >> >> </property>
>> > > > >> >>
>> > > > >> >> 1. https://issues.apache.org/jira/browse/HAMA-599
>> > > > >> >>
>> > > > >> >> On Fri, Sep 14, 2012 at 3:13 PM, 庄克琛 <[email protected]> wrote:
>> > > > >> >> > Hi, actually I used this (
>> > > > >> >> > https://builds.apache.org/job/Hama-Nightly/672/artifact/.repository/org/apache/hama/hama-dist/0.6.0-SNAPSHOT/
>> > > > >> >> > )
>> > > > >> >> > to test again. I mean I replaced everything with this 0.6.0-SNAPSHOT
>> > > > >> >> > version and got the same out-of-memory results. I just don't know what
>> > > > >> >> > causes the out-of-memory failures; only some small graph computations
>> > > > >> >> > can finish. Does this version include
>> > > > >> >> > "[HAMA-596 <https://issues.apache.org/jira/browse/HAMA-596>]: Optimize
>> > > > >> >> > memory usage of graph job"?
>> > > > >> >> > Thanks
>> > > > >> >> >
>> > > > >> >> > 2012/9/14 Thomas Jungblut <[email protected]>
>> > > > >> >> >
>> > > > >> >> >> Hey, which jar exactly did you replace?
>> > > > >> >> >> On 14.09.2012 07:49, "庄克琛" <[email protected]> wrote:
>> > > > >> >> >>
>> > > > >> >> >> > Hi everyone,
>> > > > >> >> >> > I use hama-0.5.0 with hadoop-1.0.3 to try some large graph
>> > > > >> >> >> > analysis.
>> > > > >> >> >> > When I test the PageRank example as shown at (
>> > > > >> >> >> > http://wiki.apache.org/hama/WriteHamaGraphFile), I download the
>> > > > >> >> >> > graph data and run the PageRank job on a small distributed cluster.
>> > > > >> >> >> > I only get an out-of-memory failure: supersteps 0, 1, 2 work well,
>> > > > >> >> >> > then the job runs out of memory. (Each computer has 2 GB of memory.)
>> > > > >> >> >> > But when I test a small graph, everything goes well.
>> > > > >> >> >> > I also tried the trunk version (
>> > > > >> >> >> > https://builds.apache.org/job/Hama-Nightly/672/changes#detail3 ),
>> > > > >> >> >> > replacing my hama-0.5.0 with the hama-0.6.0-snapshot, and only got
>> > > > >> >> >> > the same results.
>> > > > >> >> >> > Does anyone have better ideas?
>> > > > >> >> >> >
>> > > > >> >> >> > Thanks!
>> > > > >> >> >> >
>> > > > >> >> >> > --
>> > > > >> >> >> > *Zhuang Kechen*
>> > > > >> >> >
>> > > > >> >> > --
>> > > > >> >> > *Zhuang Kechen*
>> > > > >> >> > School of Computer Science & Technology
>> > > > >> >> > Nanjing University of Science & Technology
>> > > > >> >> > Lab.623, School of Computer Sci. & Tech.
>> > > > >> >> > No.200, Xiaolingwei Street
>> > > > >> >> > Nanjing, Jiangsu, 210094
>> > > > >> >> > P.R. China
>> > > > >> >> > Tel: 025-84315982
>> > > > >> >> > Email: [email protected]
>> > > > >> >>
>> > > > >> >> --
>> > > > >> >> Best Regards, Edward J. Yoon
>> > > > >> >> @eddieyoon
>> > > > >> >
>> > > > >> > --
>> > > > >> > *Zhuang Kechen*
>> > > > >>
>> > > > >> --
>> > > > >> *Zhuang Kechen*
>>
>> --
>> *Zhuang Kechen*
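
On the "trim the spaces in the key and value" advice quoted above: here is a minimal Python sketch of why the padding inside <name> and <value> could matter. It only shows what a plain XML parser hands back for the quoted property block. It makes no claim about how Hama's own configuration loader treats whitespace, and read_properties is a hypothetical helper written for this illustration.

```python
import xml.etree.ElementTree as ET

# The property block as quoted in the thread, padding spaces included.
# The <configuration> wrapper is an assumption so the snippet parses alone.
SNIPPET = """
<configuration>
  <property>
    <name> hama.messenger.queue.class </name>
    <value> org.apache.hama.bsp.message.DiskQueue </value>
  </property>
</configuration>
"""


def read_properties(xml_text, trim):
    """Hypothetical helper: return {name: value} for every <property>."""
    props = {}
    for prop in ET.fromstring(xml_text).iter("property"):
        name = prop.findtext("name", default="")
        value = prop.findtext("value", default="")
        if trim:
            # Drop leading and trailing whitespace from key and value.
            name, value = name.strip(), value.strip()
        props[name] = value
    return props


KEY = "hama.messenger.queue.class"
raw = read_properties(SNIPPET, trim=False)
trimmed = read_properties(SNIPPET, trim=True)

# Without trimming, the stored key still carries its padding spaces,
# so an exact lookup by the intended name misses it.
print(KEY in raw)
print(KEY in trimmed)
print(repr(trimmed[KEY]))
```

If the loader does an exact-match lookup on the key, the untrimmed variant would silently fall back to the default queue, which would fit the memory still climbing after the restart. That is a guess to check, not a confirmed diagnosis.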
