There is no way to solve your problem until a disk-based sort queue is implemented. For now, you should increase the max heap size and the number of machines.
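If you raise the heap size, it may help to confirm that the new limit really reaches the JVM that runs your tasks. The following is only a sketch under that assumption: `HeapCheck`, `parseXmxMiB` and `effectiveMaxHeapMiB` are hypothetical helper names, not part of Hama, and the only library call used is the standard `Runtime.getRuntime().maxMemory()`. It compares an options string such as the `-Xmx2048m` from the quoted configuration with what the running JVM reports.

```java
final class HeapCheck {
    private static final long MIB = 1024L * 1024L;

    /** Returns the -Xmx value (in MiB) found in a JVM options string, or -1 if none is present. */
    static long parseXmxMiB(String opts) {
        for (String token : opts.trim().split("\\s+")) {
            if (!token.startsWith("-Xmx") || token.length() <= 4) {
                continue; // not a heap limit option
            }
            String value = token.substring(4).toLowerCase();
            char unit = value.charAt(value.length() - 1);
            if (Character.isDigit(unit)) {
                return Long.parseLong(value) / MIB; // no suffix: plain byte count
            }
            long number = Long.parseLong(value.substring(0, value.length() - 1));
            switch (unit) {
                case 'k': return number / 1024L;
                case 'm': return number;
                case 'g': return number * 1024L;
                default: throw new IllegalArgumentException("Unknown -Xmx unit in: " + token);
            }
        }
        return -1L;
    }

    /** Maximum heap the current JVM will attempt to use, in MiB. */
    static long effectiveMaxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / MIB;
    }

    public static void main(String[] args) {
        // Assumption: this string mirrors the bsp.child.java.opts value in the quoted configuration.
        String opts = "-Xmx2048m";
        System.out.println("Configured -Xmx: " + parseXmxMiB(opts) + " MiB");
        System.out.println("Effective max heap of this JVM: " + effectiveMaxHeapMiB() + " MiB");
    }
}
```

If the effective value printed from inside the job is much lower than the configured one, the option is probably not being applied to that JVM, and increasing it in the configuration alone will not help.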

On Tue, Jan 21, 2014 at 11:58 PM, Ammar Sahib <[email protected]> wrote:
> Hi Edward
>
> I tried to run my program with the option of DiskVerticesInfo using a cluster
> of 5 "virtual" machines each with 4 GB of RAM. I configured the heap memory
> to 2048 MB (-Xmx2048m).
>
> I am working with graph consists of 10 million vertices. After around 3
> hours I get the error of Java heap space. Do you think that using a virtual
> machines instead of real physical machines might have something to do with
> this problem?
>
> The problem that I get:
> 14/01/21 15:22:23 ERROR bsp.LocalBSPRunner: Exception during BSP execution!
> java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: Java heap space
>     at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:252)
>     at java.util.concurrent.FutureTask.get(FutureTask.java:111)
>     at org.apache.hama.bsp.LocalBSPRunner$ThreadObserver.run(LocalBSPRunner.java:313)
>     at java.lang.Thread.run(Thread.java:724)
> Caused by: java.lang.OutOfMemoryError: Java heap space
>
> My configuration file content:
>
> <configuration>
>   <property>
>     <name>bsp.master.address</name>
>     <value>master</value>
>   </property>
>   <property>
>     <name>bsp.system.dir</name>
>     <value>/tmp/hama-hadoop/bsp/system</value>
>   </property>
>   <property>
>     <name>bsp.local.dir</name>
>     <value>/tmp/hama-hadoop/bsp/local</value>
>   </property>
>   <property>
>     <name>hama.tmp.dir</name>
>     <value>/tmp/hama-hadoop</value>
>   </property>
>   <property>
>     <name>fs.default.name</name>
>     <value>hdfs://master:54310</value>
>   </property>
>   <property>
>     <name>hama.zookeeper.quorum</name>
>     <value>master,slave1,slave2,slave3,slave4</value>
>   </property>
>   <property>
>     <name>bsp.child.java.opts</name>
>     <value>-Xmx2048m</value>
>   </property>
> </configuration>
>
> On Tuesday, January 21, 2014 2:52 AM, Edward J. Yoon <[email protected]> wrote:
>
> To use OffHeapVerticesInfo, you need to add Apache DirectMemory
> libraries to lib folder.
>
> or, Try with DiskVerticesInfo.
>
> With trunk version, I was able to run 30 thousand vertices graph on
> single machine, and 1B vertices on a full rack cluster (child opt:
> -Xmx2048m).
>
> On Tue, Jan 21, 2014 at 1:57 AM, Ammar Sahib <[email protected]> wrote:
>> Hi
>>
>> Thanks for the reply. I am using the HAMA version from the TRUNK and I am
>> running my own developed algorithm. I am trying to work with a graph
>> consists of 10 million vertices. Did someone experienced working with big
>> graphs (millions of vertices) using HAMA? can you please share your
>> experience?
>>
>> I am trying now to use:
>>
>> Conf.setClass("hama.graph.vertices.info", org.apache.hama.graph.OffHeapVerticesInfo.class, org.apache.hama.graph.VerticesInfo.class);
>>
>> I get the error:
>>
>> 14/01/20 17:42:16 ERROR bsp.LocalBSPRunner: Exception during BSP execution!
>> java.util.concurrent.ExecutionException: java.lang.NoClassDefFoundError: org/apache/directmemory/utils/CacheValuesIterable
>>     at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:252)
>>     at java.util.concurrent.FutureTask.get(FutureTask.java:111)
>>     at org.apache.hama.bsp.LocalBSPRunner$ThreadObserver.run(LocalBSPRunner.java:313)
>>     at java.lang.Thread.run(Thread.java:724)
>> Caused by: java.lang.NoClassDefFoundError: org/apache/directmemory/utils/CacheValuesIterable
>>     at org.apache.hama.graph.OffHeapVerticesInfo.skippingIterator(OffHeapVerticesInfo.java:112)
>>     at org.apache.hama.graph.GraphJobRunner.cleanup(GraphJobRunner.java:163)
>>     at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.run(LocalBSPRunner.java:262)
>>     at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:286)
>>     at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:211)
>>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>>     at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>>     at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>
>> The log of my master is as following:
>>
>> /************************************************************
>> STARTUP_MSG: Starting BSPMaster
>> STARTUP_MSG: host = c3-large1-master/10.255.255.2
>> STARTUP_MSG: args = []
>> STARTUP_MSG: version = 1.2.0
>> STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1479473; compiled by 'hortonfo' on Mon May 6 18:29:07 UTC 2013
>> STARTUP_MSG: java = 1.7.0_25
>> ************************************************************/
>> 2014-01-14 21:27:35,808 INFO org.apache.hama.bsp.BSPMaster: RPC BSPMaster: host master port 40000
>> 2014-01-14 21:27:37,200 INFO org.apache.hama.ipc.Server: Starting Socket Reader #1 for port 40000
>> 2014-01-14 21:27:37,732 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
>> 2014-01-14 21:27:38,147 INFO org.apache.hama.http.HttpServer: Port returned by webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening the listener on 40013
>> 2014-01-14 21:27:38,168 INFO org.apache.hama.http.HttpServer: listener.getLocalPort() returned 40013 webServer.getConnectors()[0].getLocalPort() returned 40013
>> 2014-01-14 21:27:38,168 INFO org.apache.hama.http.HttpServer: Jetty bound to port 40013
>> 2014-01-14 21:27:38,168 INFO org.mortbay.log: jetty-6.1.14
>> 2014-01-14 21:27:38,446 INFO org.mortbay.log: Extract jar:file:/usr/local/hama-0.6.3/hama-core-0.6.3.jar!/webapp/bspmaster/ to /tmp/Jetty_master_40013_bspmaster____ge2lxf/webapp
>> 2014-01-14 21:27:40,162 INFO org.mortbay.log: Started SelectChannelConnector@master:40013
>> 2014-01-14 21:27:40,734 INFO org.apache.hama.bsp.BSPMaster: Cleaning up the system directory
>> 2014-01-14 21:27:40,734 INFO org.apache.hama.bsp.BSPMaster: hdfs://master:54310/tmp/hama-hadoop/bsp/system
>> 2014-01-14 21:27:40,991 INFO org.apache.hama.bsp.sync.ZKSyncBSPMasterClient: Initialized ZK false
>> 2014-01-14 21:27:40,991 INFO org.apache.hama.bsp.sync.ZKSyncClient: Initializing ZK Sync Client
>> 2014-01-14 21:27:41,073 INFO org.apache.hama.ipc.Server: IPC Server Responder: starting
>> 2014-01-14 21:27:41,077 INFO org.apache.hama.ipc.Server: IPC Server listener on 40000: starting
>> 2014-01-14 21:27:41,085 INFO org.apache.hama.ipc.Server: IPC Server handler 0 on 40000: starting
>> 2014-01-14 21:27:41,088 INFO org.apache.hama.bsp.BSPMaster: Starting RUNNING
>> 2014-01-14 21:27:41,168 INFO org.apache.hama.bsp.BSPMaster: groomd_slave2_50000 is added.
>> 2014-01-14 21:27:49,634 INFO org.apache.hama.bsp.BSPMaster: groomd_slave1_50000 is added.
>> 2014-01-14 21:28:15,943 INFO org.apache.hama.bsp.BSPMaster: groomd_master_50000 is added.
>>
>> On Sunday, January 19, 2014 7:58 AM, Tommaso Teofili <[email protected]> wrote:
>>
>> yes, the correct way of setting OffHeapVI is:
>> conf.setClass("hama.graph.vertices.info", org.apache.hama.graph.OffHeapVerticesInfo.class, org.apache.hama.graph.VerticesInfo.class);
>>
>> Apart from that, what Hama version are you running on?
>> Looking at the code in trunk it shouldn't be possible to have a NPE on the
>> currentVertex if the iterator is consumed correctly, instead if one doesn't
>> call hasNext before next and / or calls next even if hasNext returns false
>> then it's possible to have that NPE.
>> Also what algorithm / example are you running? Any useful information (like
>> environment, execution mode, logs, version, etc.) would be useful to help
>> you.
>>
>> Tommaso
>>
>> 2014/1/19 步青云 <[email protected]>
>>
>>> I got the same problem about loading vertices into RAM. And I try to use
>>> OffHeapVerticesInfo.
>>> You may use the method setClass like this:
>>> conf.setClass("hama.graph.vertices.info", org.apache.hama.graph.OffHeapVerticesInfo.class, org.apache.hama.graph.VerticesInfo.class);
>>> However, I got the NullPointerException using OffHeapVerticesInfo. The errors are
>>> as follows:
>>>
>>> 14/01/18 20:54:23 ERROR bsp.LocalBSPRunner: Exception during BSP execution!
>>> java.lang.NullPointerException
>>>     at org.apache.hama.graph.OffHeapVerticesInfo$1.next(OffHeapVerticesInfo.java:139)
>>>     at org.apache.hama.graph.GraphJobRunner.doSuperstep(GraphJobRunner.java:251)
>>>     at org.apache.hama.graph.GraphJobRunner.bsp(GraphJobRunner.java:145)
>>>     at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.run(LocalBSPRunner.java:256)
>>>     at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:286)
>>>
>>> Anyone could help me to solve this problem? Thanks a lot.
>>>
>>> ------------------ Original ------------------
>>> From: "Ammar Sahib"<[email protected]>;
>>> Date: Jan 17, 2014
>>> To: "[email protected]"<[email protected]>;
>>> Subject: Re: loading vertices into RAM
>>>
>>> I think we are getting close now, However now I have runtime exception:
>>>
>>> Exception in thread "main" java.lang.RuntimeException: interface org.apache.hama.graph.VerticesInfo not org.apache.hama.graph.ListVerticesInfo
>>>     at org.apache.hadoop.conf.Configuration.setClass(Configuration.java:858)
>>>
>>> On Friday, January 17, 2014 2:30 PM, Tommaso Teofili <[email protected]> wrote:
>>>
>>> ah yes, sorry, you also have to specify the interface, I don't have the
>>> code in front of me but it should be :
>>>
>>> conf.setClass("hama.graph.vertices.info", org.apache.hama.graph.VerticesInfo.class, org.apache.hama.graph.ListVerticesInfo.class);
>>>
>>> Tommaso
>>>
>>> 2014/1/17 Ammar Sahib <[email protected]>
>>>
>>> > Hi
>>> >
>>> > Thanks for your reply. I used now:
>>> >
>>> > conf.setClass("hama.graph.vertices.info", org.apache.hama.graph.ListVerticesInfo.class);
>>> >
>>> > Now I get this error:
>>> > The method setClass(String, Class<?>, Class<?>) in the type Configuration
>>> > is not applicable for the arguments (String, Class<ListVerticesInfo>)
>>> >
>>> > I am using HAMA 0.6.3
>>> >
>>> > On Friday, January 17, 2014 12:59 PM, Tommaso Teofili <[email protected]> wrote:
>>> >
>>> > you're passing the fully qualified name of the Class as a String to a
>>> > method setClass(String, Class) while you should pass the Class itself,
>>> > e.g.:
>>> > HamaConfiguration conf = new HamaConfiguration();
>>> > conf.setClass("hama.graph.vertices.info", org.apache.hama.graph.ListVerticesInfo.class);
>>> >
>>> > Hope this helps,
>>> > Tommaso
>>> >
>>> > 2014/1/17 Ammar Sahib <[email protected]>
>>> >
>>> > > Hi
>>> > >
>>> > > I am trying to evaluate the different implementation below:
>>> > >
>>> > > - ListVerticesInfo: loads vertices into array list.
>>> > > - MapVerticesInfo: loads vertices into tree map.
>>> > > - DiskVerticesInfo: loads vertices into a local file.
>>> > >
>>> > > When using the conf.setClass method I got an error. Below is sample of my
>>> > > code:
>>> > > HamaConfiguration conf = new HamaConfiguration();
>>> > > conf.setClass("hama.graph.vertices.info", "org.apache.hama.graph.ListVerticesInfo");
>>> > >
>>> > > The error I am getting is:
>>> > > The method setClass(String, Class<?>, Class<?>) in the type Configuration
>>> > > is not applicable for the arguments (String, String).
>>> > >
>>> > > However I found that I can use conf.set method.
>>> > >
>>> > > Can someone tell me what I am doing wrong?
>>> > >
>>> > > On Wednesday, January 15, 2014 8:01 AM, Tommaso Teofili <[email protected]> wrote:
>>> > >
>>> > > and OffHeapVerticesInfo for loading vertices off heap, which is available
>>> > > with 0.6.3 as well if I recall correctly.
>>> > > Tommaso
>>> > >
>>> > > 2014/1/15 Edward J. Yoon <[email protected]>
>>> > >
>>> > > > There are few implementations.
>>> > > >
>>> > > > - ListVerticesInfo: loads vertices into array list.
>>> > > > - MapVerticesInfo: loads vertices into tree map.
>>> > > > - DiskVerticesInfo: loads vertices into a local file.
>>> > > >
>>> > > > You can choose one of them by setting the "hama.graph.vertices.info"
>>> > > > in job configuration.
>>> > > >
>>> > > > conf.setClass("hama.graph.vertices.info",
>>> > > > "org.apache.hama.graph.ListVerticesInfo".
>>> > > >
>>> > > > With the latest 0.6.3 version, you can use only ListVerticesInfo.
>>> > > > Please use the TRUNK.
>>> > > >
>>> > > > On Tue, Jan 14, 2014 at 11:18 PM, Ammar Sahib <[email protected]> wrote:
>>> > > > > Hi
>>> > > > >
>>> > > > > According to the BSP model, the data is processed in the RAM and that is
>>> > > > > the reason why Pregel model is faster than the MapReduce (MapReduce
>>> > > > > write down to disk). Can someone explains to me how to be sure that all the
>>> > > > > graph vertices are actually been loaded in RAM?
>>> > > > >
>>> > > > > How would HAMA behave if the vertices values are so big such that the
>>> > > > > available RAM memory is not enough to contains all of the vertices?
>>> > > > >
>>> > > > > Regards
>>> > > >
>>> > > > --
>>> > > > Best Regards, Edward J. Yoon
>>> > > > @eddieyoon
>
> --
> Best Regards, Edward J. Yoon
> @eddieyoon

--
Best Regards, Edward J. Yoon
@eddieyoon
