More info: if I add -Dgiraph.useOutOfCoreGraph=true, the job runs successfully, but superstep -1 is extremely slow. If I do not add -Dgiraph.useOutOfCoreGraph=true, loading is much faster, but the job fails while waiting for roughly the last 10 workers to finish superstep -1. The error is:
org.apache.giraph.master.BspServiceMaster: *barrierOnWorkerList: Missing chosen workers* [Worker(hostname=trantor17.umiacs.umd.edu, MRtaskID=124, port=30124), Worker(hostname=trantor17.umiacs.umd.edu, MRtaskID=126, port=30126), Worker(hostname=trantor17.umiacs.umd.edu, MRtaskID=128, port=30128), Worker(hostname=trantor17.umiacs.umd.edu, MRtaskID=130, port=30130)] on superstep -1
2016-10-23 10:40:16,358 ERROR [org.apache.giraph.master.MasterThread] org.apache.giraph.master.MasterThread: masterThread: Master algorithm failed with IllegalStateException
java.lang.IllegalStateException: coordinateVertexInputSplits: Worker failed during input split (currently not supported)

This error looks just like https://issues.apache.org/jira/browse/GIRAPH-904, but there are no upper-case characters in my hostnames. Any ideas about this?

Many thanks,
Hai

On Sun, Oct 23, 2016 at 8:36 AM, Hai Lan <[email protected]> wrote:

> Thanks, Agrta.
>
> Thanks for your response. How exactly can I increase the min and max RAM?
> (In which conf file, or with which command/arguments? My giraph-site.xml
> is empty by default.)
>
> From what I found online about increasing the heap size (not sure it is
> the same thing as the min/max RAM size you mentioned), many people suggest
> increasing mapred.child.java.opts or HADOOP_DATANODE_OPTS, but those did
> not help. My problem happens during "VertexInputSplitsCallable:
> readVertexInputSplit:", so I tried increasing mapreduce.map.memory.mb and
> decreasing the number of containers/workers. Currently I'm using 248
> workers with mapreduce.map.memory.mb=12000, ratio=0.7. This helps, but I
> face new problems:
>
> 1. Superstep -1 is extremely slow: it takes 7-8 hours to load a 150G
> graph, e.g.:
>
> org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: 106 out
> of 248 workers finished on superstep -1 on path
> /_hadoopBsp/job_1477020594559_0012/_vertexInputSplitDoneDir
>
> I see log entries like:
>
> INFO [main] org.apache.giraph.comm.netty.NettyClient:
> logInfoAboutOpenRequests: Waiting interval of 15000 msecs, 2499 open
> requests, waiting for it to be <= 0, MBytes/sec received = 0.0001,
> MBytesReceived = 0.0058, ave received req MBytes = 0, secs waited = 92.12
> MBytes/sec sent = 10.4373, MBytesSent = 961.4983, ave sent req MBytes =
> 0.3244, secs waited = 92.12
>
> Finishing those 2499 open requests will take a very long time. *I'm not
> sure whether this is normal.*
>
> 2. I tried the out-of-core graph option, but I'm not sure I'm using it
> correctly. I added -Dgiraph.useOutOfCoreGraph=true -ca
> isStaticGraph=true,giraph.maxPartitionsInMemory=10, but how do I know
> whether it is working?
>
> I suspect that when I try the 15T graph, the problem will be worse. What
> should I do?
>
> Thanks for your help.
>
> Best,
> Hai
>
>
> On Sun, Oct 23, 2016 at 7:11 AM, Agrta Rawat <[email protected]> wrote:
>
>> Hi Hai,
>>
>> Please check your Giraph configuration and try increasing the min and
>> max RAM size there. This should help.
>>
>> Regards,
>> Agrta Rawat
>>
>>
>> On Sat, Oct 22, 2016 at 7:46 PM, Hai Lan <[email protected]> wrote:
>>
>>> Can anyone help with this?
>>>
>>> Thanks a lot!
>>>
>>>
>>> On Thu, Oct 20, 2016 at 9:48 PM, Hai Lan <[email protected]> wrote:
>>>
>>>> Dear all,
>>>>
>>>> I'm facing a problem when I run a large graph job (currently 1.6T,
>>>> will be 16T later): it always fails with java.lang.OutOfMemoryError:
>>>> Java heap space after loading a specific number of vertices (near
>>>> 59000000).
>>>> I tried adding:
>>>>
>>>> -Dgiraph.useOutOfCoreGraph=true
>>>> -Dmapred.child.java.opts="-XX:-UseGCOverheadLimit" OR
>>>> -Dmapred.child.java.opts="-Xmx16384"
>>>> -Dgiraph.yarn.task.heap.mb=36570
>>>>
>>>> but the problem remains, even though I can see those values in the
>>>> metadata.
>>>>
>>>> I'm not sure whether the max memory value in this
>>>> VertexInputSplitsCallable info is related to the Java heap size:
>>>>
>>>> INFO [load-0] org.apache.giraph.worker.VertexInputSplitsCallable:
>>>> readVertexInputSplit: Loaded 46975802 vertices at 68977.49310291892
>>>> vertices/sec 0 edges at 0.0 edges/sec Memory (free/total/max) =
>>>> 475.08M / 2759.00M / 2759.00M
>>>>
>>>> But I noticed that the main log *always* shows:
>>>>
>>>> INFO [AsyncDispatcher event handler] org.apache.hadoop.mapred.JobConf:
>>>> Task java-opts do not specify heap size. Setting task attempt jvm max
>>>> heap size to -Xmx2868m
>>>>
>>>> *no matter what arguments I add*, even when I run normal Hadoop jobs.
>>>>
>>>> Any ideas about this? The log follows.
>>>> 2016-10-20 21:25:49,008 ERROR [netty-client-worker-2] org.apache.giraph.comm.netty.NettyClient: Request failed
>>>> java.lang.OutOfMemoryError: Java heap space
>>>>     at io.netty.buffer.UnpooledHeapByteBuf.<init>(UnpooledHeapByteBuf.java:45)
>>>>     at io.netty.buffer.UnpooledByteBufAllocator.newHeapBuffer(UnpooledByteBufAllocator.java:43)
>>>>     at io.netty.buffer.AbstractByteBufAllocator.heapBuffer(AbstractByteBufAllocator.java:136)
>>>>     at io.netty.buffer.AbstractByteBufAllocator.heapBuffer(AbstractByteBufAllocator.java:127)
>>>>     at io.netty.buffer.AbstractByteBufAllocator.buffer(AbstractByteBufAllocator.java:85)
>>>>     at org.apache.giraph.comm.netty.handler.RequestEncoder.write(RequestEncoder.java:81)
>>>>     at io.netty.channel.DefaultChannelHandlerContext.invokeWrite(DefaultChannelHandlerContext.java:645)
>>>>     at io.netty.channel.DefaultChannelHandlerContext.access$2000(DefaultChannelHandlerContext.java:29)
>>>>     at io.netty.channel.DefaultChannelHandlerContext$WriteTask.run(DefaultChannelHandlerContext.java:906)
>>>>     at io.netty.util.concurrent.DefaultEventExecutor.run(DefaultEventExecutor.java:36)
>>>>     at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:101)
>>>>     at java.lang.Thread.run(Thread.java:745)
>>>> 2016-10-20 21:25:55,299 ERROR [netty-client-worker-1] org.apache.giraph.comm.netty.NettyClient: Request failed
>>>> java.lang.OutOfMemoryError: Java heap space
>>>>     at io.netty.buffer.UnpooledHeapByteBuf.<init>(UnpooledHeapByteBuf.java:45)
>>>>     at io.netty.buffer.UnpooledByteBufAllocator.newHeapBuffer(UnpooledByteBufAllocator.java:43)
>>>>     at io.netty.buffer.AbstractByteBufAllocator.heapBuffer(AbstractByteBufAllocator.java:136)
>>>>     at io.netty.buffer.AbstractByteBufAllocator.heapBuffer(AbstractByteBufAllocator.java:127)
>>>>     at io.netty.buffer.AbstractByteBufAllocator.buffer(AbstractByteBufAllocator.java:85)
>>>>     at org.apache.giraph.comm.netty.handler.RequestEncoder.write(RequestEncoder.java:81)
>>>>     at io.netty.channel.DefaultChannelHandlerContext.invokeWrite(DefaultChannelHandlerContext.java:645)
>>>>     at io.netty.channel.DefaultChannelHandlerContext.access$2000(DefaultChannelHandlerContext.java:29)
>>>>     at io.netty.channel.DefaultChannelHandlerContext$WriteTask.run(DefaultChannelHandlerContext.java:906)
>>>>     at io.netty.util.concurrent.DefaultEventExecutor.run(DefaultEventExecutor.java:36)
>>>>     at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:101)
>>>>     at java.lang.Thread.run(Thread.java:745)
>>>> 2016-10-20 21:26:06,731 ERROR [main] org.apache.giraph.graph.GraphMapper: Caught an unrecoverable exception waitFor: ExecutionException occurred while waiting for org.apache.giraph.utils.ProgressableUtils$FutureWaitable@6737a445
>>>> java.lang.IllegalStateException: waitFor: ExecutionException occurred while waiting for org.apache.giraph.utils.ProgressableUtils$FutureWaitable@6737a445
>>>>     at org.apache.giraph.utils.ProgressableUtils.waitFor(ProgressableUtils.java:193)
>>>>     at org.apache.giraph.utils.ProgressableUtils.waitForever(ProgressableUtils.java:151)
>>>>     at org.apache.giraph.utils.ProgressableUtils.waitForever(ProgressableUtils.java:136)
>>>>     at org.apache.giraph.utils.ProgressableUtils.getFutureResult(ProgressableUtils.java:99)
>>>>     at org.apache.giraph.utils.ProgressableUtils.getResultsWithNCallables(ProgressableUtils.java:233)
>>>>     at org.apache.giraph.worker.BspServiceWorker.loadInputSplits(BspServiceWorker.java:316)
>>>>     at org.apache.giraph.worker.BspServiceWorker.loadVertices(BspServiceWorker.java:409)
>>>>     at org.apache.giraph.worker.BspServiceWorker.setup(BspServiceWorker.java:629)
>>>>     at org.apache.giraph.graph.GraphTaskManager.execute(GraphTaskManager.java:284)
>>>>     at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:93)
>>>>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
>>>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
>>>>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
>>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>>     at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
>>>>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
>>>> Caused by: java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: Java heap space
>>>>     at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>>>>     at java.util.concurrent.FutureTask.get(FutureTask.java:202)
>>>>     at org.apache.giraph.utils.ProgressableUtils$FutureWaitable.waitFor(ProgressableUtils.java:312)
>>>>     at org.apache.giraph.utils.ProgressableUtils.waitFor(ProgressableUtils.java:185)
>>>>     ... 16 more
>>>> Caused by: java.lang.OutOfMemoryError: Java heap space
>>>>     at org.apache.giraph.utils.UnsafeByteArrayOutputStream.<init>(UnsafeByteArrayOutputStream.java:81)
>>>>     at org.apache.giraph.conf.ImmutableClassesGiraphConfiguration.createExtendedDataOutput(ImmutableClassesGiraphConfiguration.java:1161)
>>>>     at org.apache.giraph.comm.SendPartitionCache.addVertex(SendPartitionCache.java:77)
>>>>     at org.apache.giraph.comm.netty.NettyWorkerClientRequestProcessor.sendVertexRequest(NettyWorkerClientRequestProcessor.java:248)
>>>>     at org.apache.giraph.worker.VertexInputSplitsCallable.readInputSplit(VertexInputSplitsCallable.java:231)
>>>>     at org.apache.giraph.worker.InputSplitsCallable.loadInputSplit(InputSplitsCallable.java:267)
>>>>     at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:211)
>>>>     at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:60)
>>>>     at org.apache.giraph.utils.LogStacktraceCallable.call(LogStacktraceCallable.java:51)
>>>>     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>     at java.lang.Thread.run(Thread.java:745)
>>>> 2016-10-20 21:26:06,737 ERROR [main] org.apache.giraph.worker.BspServiceWorker: unregisterHealth: Got failure, unregistering health on /_hadoopBsp/job_1476386340018_0175/_applicationAttemptsDir/0/_superstepDir/-1/_workerHealthyDir/hadoop18.umd.com_23 on superstep -1
>>>> 2016-10-20 21:26:06,746 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.lang.IllegalStateException: run: Caught an unrecoverable exception waitFor: ExecutionException occurred while waiting for org.apache.giraph.utils.ProgressableUtils$FutureWaitable@6737a445
>>>>     at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:104)
>>>>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
>>>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
>>>>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
>>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>>     at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
>>>>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
>>>> Caused by: java.lang.IllegalStateException: waitFor: ExecutionException occurred while waiting for org.apache.giraph.utils.ProgressableUtils$FutureWaitable@6737a445
>>>>     at org.apache.giraph.utils.ProgressableUtils.waitFor(ProgressableUtils.java:193)
>>>>     at org.apache.giraph.utils.ProgressableUtils.waitForever(ProgressableUtils.java:151)
>>>>     at org.apache.giraph.utils.ProgressableUtils.waitForever(ProgressableUtils.java:136)
>>>>     at org.apache.giraph.utils.ProgressableUtils.getFutureResult(ProgressableUtils.java:99)
>>>>     at org.apache.giraph.utils.ProgressableUtils.getResultsWithNCallables(ProgressableUtils.java:233)
>>>>     at org.apache.giraph.worker.BspServiceWorker.loadInputSplits(BspServiceWorker.java:316)
>>>>     at org.apache.giraph.worker.BspServiceWorker.loadVertices(BspServiceWorker.java:409)
>>>>     at org.apache.giraph.worker.BspServiceWorker.setup(BspServiceWorker.java:629)
>>>>     at org.apache.giraph.graph.GraphTaskManager.execute(GraphTaskManager.java:284)
>>>>     at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:93)
>>>>     ... 7 more
>>>> Caused by: java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: Java heap space
>>>>     at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>>>>     at java.util.concurrent.FutureTask.get(FutureTask.java:202)
>>>>     at org.apache.giraph.utils.ProgressableUtils$FutureWaitable.waitFor(ProgressableUtils.java:312)
>>>>     at org.apache.giraph.utils.ProgressableUtils.waitFor(ProgressableUtils.java:185)
>>>>     ... 16 more
>>>> Caused by: java.lang.OutOfMemoryError: Java heap space
>>>>     at org.apache.giraph.utils.UnsafeByteArrayOutputStream.<init>(UnsafeByteArrayOutputStream.java:81)
>>>>     at org.apache.giraph.conf.ImmutableClassesGiraphConfiguration.createExtendedDataOutput(ImmutableClassesGiraphConfiguration.java:1161)
>>>>     at org.apache.giraph.comm.SendPartitionCache.addVertex(SendPartitionCache.java:77)
>>>>     at org.apache.giraph.comm.netty.NettyWorkerClientRequestProcessor.sendVertexRequest(NettyWorkerClientRequestProcessor.java:248)
>>>>     at org.apache.giraph.worker.VertexInputSplitsCallable.readInputSplit(VertexInputSplitsCallable.java:231)
>>>>     at org.apache.giraph.worker.InputSplitsCallable.loadInputSplit(InputSplitsCallable.java:267)
>>>>     at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:211)
>>>>     at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:60)
>>>>     at org.apache.giraph.utils.LogStacktraceCallable.call(LogStacktraceCallable.java:51)
>>>>     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>     at java.lang.Thread.run(Thread.java:745)
>>>>
>>>> Thank you so much!
>>>>
>>>> Best,
>>>>
>>>> Hai
>>>
>>>
>>
>
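[For readers landing on this thread: the settings discussed above combine on the giraph command line roughly as follows. This is a sketch only; the jar, computation class, input/output formats and paths are placeholders, not the poster's actual job. Note that on YARN the map-task heap is set via mapreduce.map.java.opts (mapred.child.java.opts is the older, deprecated name), the -Xmx value must fit inside mapreduce.map.memory.mb, and giraph.isStaticGraph is the fully qualified form of the isStaticGraph option used earlier in the thread.]

```shell
hadoop jar giraph-examples.jar org.apache.giraph.GiraphRunner \
  -Dmapreduce.map.memory.mb=12000 \
  -Dmapreduce.map.java.opts="-Xmx9000m" \
  -Dgiraph.useOutOfCoreGraph=true \
  org.apache.giraph.examples.SimpleShortestPathsComputation \
  -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
  -vip /user/hai/input \
  -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
  -op /user/hai/output \
  -w 248 \
  -ca giraph.maxPartitionsInMemory=10,giraph.isStaticGraph=true
```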

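[The container-size/heap relationship behind the thread's "mapreduce.map.memory.mb=12000, ratio=0.7" can be sketched as below. The 70% figure mirrors the poster's ratio; it is a rule of thumb for leaving non-heap headroom inside the YARN container, not a Giraph default.]

```shell
# Derive an -Xmx value for mapreduce.map.java.opts from the YARN
# container size, leaving ~30% headroom for non-heap memory.
container_mb=12000   # mapreduce.map.memory.mb
ratio_pct=70         # ratio=0.7, as integer percent for shell arithmetic
heap_mb=$(( container_mb * ratio_pct / 100 ))
echo "-Xmx${heap_mb}m"   # prints -Xmx8400m
```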