Hi, I am running Spark 1.6 on EMR. I have a workflow that does the following:

1. Reads two flat files, creates a DataFrame from each, and joins them.
2. Reads a particular partition from a Hive table and joins it with the DataFrame from step 1.
3. Finally, does an insert overwrite into a Hive table that is partitioned on two fields.

The stdout log in the terminal when I submit the job shows the message below:

#
# java.lang.OutOfMemoryError: Java heap space
# -XX:OnOutOfMemoryError="kill -9 %p"
#   Executing /bin/sh -c "kill -9 30149"...
Killed

The Spark UI doesn't show any failed stages or tasks, but the job gets stuck in the middle without completing all the stages. Did anyone come across similar issues? What could be the reason behind it, and how could I troubleshoot it?

When I check the YARN logs, they show the error below:

16/04/11 00:19:38 ERROR client.TransportResponseHandler: Still have 1 requests outstanding when connection from ip-10-184-195-29.ec2.internal/10.184.195.29:43162 is closed
16/04/11 00:19:38 WARN executor.CoarseGrainedExecutorBackend: An unknown (ip-10-184-195-29.ec2.internal:43162) driver disconnected.
16/04/11 00:19:38 ERROR executor.CoarseGrainedExecutorBackend: Driver 10.184.195.29:43162 disassociated! Shutting down.
16/04/11 00:19:38 WARN netty.NettyRpcEndpointRef: Error sending message [message = Heartbeat(12,[Lscala.Tuple2;@6545df9a,BlockManagerId(12, ip-10-184-194-43.ec2.internal, 43867))] in 1 attempts
java.io.IOException: Connection from ip-10-184-195-29.ec2.internal/10.184.195.29:43162 closed
    at org.apache.spark.network.client.TransportResponseHandler.channelUnregistered(TransportResponseHandler.java:124)
    at org.apache.spark.network.server.TransportChannelHandler.channelUnregistered(TransportChannelHandler.java:94)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:158)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelUnregistered(AbstractChannelHandlerContext.java:144)
    at io.netty.channel.ChannelInboundHandlerAdapter.channelUnregistered(ChannelInboundHandlerAdapter.java:53)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:158)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelUnregistered(AbstractChannelHandlerContext.java:144)
    at io.netty.channel.ChannelInboundHandlerAdapter.channelUnregistered(ChannelInboundHandlerAdapter.java:53)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:158)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelUnregistered(AbstractChannelHandlerContext.java:144)
    at io.netty.channel.ChannelInboundHandlerAdapter.channelUnregistered(ChannelInboundHandlerAdapter.java:53)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:158)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelUnregistered(AbstractChannelHandlerContext.java:144)
    at io.netty.channel.DefaultChannelPipeline.fireChannelUnregistered(DefaultChannelPipeline.java:739)
    at io.netty.channel.AbstractChannel$AbstractUnsafe$8.run(AbstractChannel.java:659)
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
    at java.lang.Thread.run(Thread.java:745)
16/04/11 00:19:38 INFO storage.DiskBlockManager: Shutdown hook called
16/04/11 00:19:38 INFO util.ShutdownHookManager: Shutdown hook called