ok solved. Looks like breathing the spark-summit SFO air for 3 days helped a lot! Piping the 7 million records to local disk still runs out of memory, so I piped the results into another Hive table instead. I can live with that :-)

/opt/cloudera/parcels/CDH/lib/spark/bin/spark-sql -e "use aers; create table unique_aers_demo as select distinct isr,event_dt,age,age_cod,sex,year,quarter from aers.aers_demo_view" --driver-memory 4G --total-executor-cores 12 --executor-memory 4G
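(For anyone hitting the same "flags have no effect" issue: in some Spark 1.x setups, submit-time options placed after the -e argument may not reach spark-submit at all, and --driver-memory in particular has to take effect before the driver JVM launches. A workaround sketch, assuming a standalone cluster and the 4G / 12-core sizing used above, is to put the equivalent settings in spark-defaults.conf, which the spark-sql CLI reads at startup; the property names below are real Spark configuration keys, the values are illustrative.)

```
# $SPARK_HOME/conf/spark-defaults.conf -- illustrative values matching the command above
spark.driver.memory    4g
spark.executor.memory  4g
spark.cores.max        12
```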
thanks

From: Sanjay Subramanian <sanjaysubraman...@yahoo.com.INVALID>
To: "user@spark.apache.org" <user@spark.apache.org>
Sent: Thursday, June 11, 2015 8:43 AM
Subject: spark-sql from CLI ---> EXCEPTION: java.lang.OutOfMemoryError: Java heap space

hey guys

I use Hive and Impala daily and intensively, and want to transition to spark-sql in CLI mode. Currently in my sandbox I am using Spark (standalone mode) in the CDH distribution (starving developer version 5.3.3):

3 datanode hadoop cluster
32GB RAM per node
8 cores per node
spark 1.2.0+cdh5.3.3+371

I am testing some stuff on one view and getting memory errors. A possible reason is that the default memory per executor shown on 18080 is 512M. These options, when used to start the spark-sql CLI, do not seem to have any effect:

--total-executor-cores 12 --executor-memory 4G

/opt/cloudera/parcels/CDH/lib/spark/bin/spark-sql -e "select distinct isr,event_dt,age,age_cod,sex,year,quarter from aers.aers_demo_view"

aers.aers_demo_view (7 million+ records)
========================================
isr       bigint   case id
event_dt  bigint   event date
age       double   age of patient
age_cod   string   days, months, years
sex       string   M or F
year      int
quarter   int

VIEW DEFINITION
===============
CREATE VIEW `aers.aers_demo_view` AS
SELECT `isr` AS `isr`, `event_dt` AS `event_dt`, `age` AS `age`, `age_cod` AS `age_cod`, `gndr_cod` AS `sex`, `year` AS `year`, `quarter` AS `quarter`
FROM (
  SELECT `aers_demo_v1`.`isr`, `aers_demo_v1`.`event_dt`, `aers_demo_v1`.`age`, `aers_demo_v1`.`age_cod`, `aers_demo_v1`.`gndr_cod`, `aers_demo_v1`.`year`, `aers_demo_v1`.`quarter` FROM `aers`.`aers_demo_v1`
  UNION ALL
  SELECT `aers_demo_v2`.`isr`, `aers_demo_v2`.`event_dt`, `aers_demo_v2`.`age`, `aers_demo_v2`.`age_cod`, `aers_demo_v2`.`gndr_cod`, `aers_demo_v2`.`year`, `aers_demo_v2`.`quarter` FROM `aers`.`aers_demo_v2`
  UNION ALL
  SELECT `aers_demo_v3`.`isr`, `aers_demo_v3`.`event_dt`, `aers_demo_v3`.`age`, `aers_demo_v3`.`age_cod`, `aers_demo_v3`.`gndr_cod`, `aers_demo_v3`.`year`, `aers_demo_v3`.`quarter` FROM
`aers`.`aers_demo_v3`
  UNION ALL
  SELECT `aers_demo_v4`.`isr`, `aers_demo_v4`.`event_dt`, `aers_demo_v4`.`age`, `aers_demo_v4`.`age_cod`, `aers_demo_v4`.`gndr_cod`, `aers_demo_v4`.`year`, `aers_demo_v4`.`quarter` FROM `aers`.`aers_demo_v4`
  UNION ALL
  SELECT `aers_demo_v5`.`primaryid` AS `ISR`, `aers_demo_v5`.`event_dt`, `aers_demo_v5`.`age`, `aers_demo_v5`.`age_cod`, `aers_demo_v5`.`gndr_cod`, `aers_demo_v5`.`year`, `aers_demo_v5`.`quarter` FROM `aers`.`aers_demo_v5`
  UNION ALL
  SELECT `aers_demo_v6`.`primaryid` AS `ISR`, `aers_demo_v6`.`event_dt`, `aers_demo_v6`.`age`, `aers_demo_v6`.`age_cod`, `aers_demo_v6`.`sex` AS `GNDR_COD`, `aers_demo_v6`.`year`, `aers_demo_v6`.`quarter` FROM `aers`.`aers_demo_v6`
) `aers_demo_view`

15/06/11 08:36:36 WARN DefaultChannelPipeline: An exception was thrown by a user handler while handling an exception event ([id: 0x01b99855, /10.0.0.19:58117 => /10.0.0.19:52016] EXCEPTION: java.lang.OutOfMemoryError: Java heap space)
java.lang.OutOfMemoryError: Java heap space
    at org.jboss.netty.buffer.HeapChannelBuffer.<init>(HeapChannelBuffer.java:42)
    at org.jboss.netty.buffer.BigEndianHeapChannelBuffer.<init>(BigEndianHeapChannelBuffer.java:34)
    at org.jboss.netty.buffer.ChannelBuffers.buffer(ChannelBuffers.java:134)
    at org.jboss.netty.buffer.HeapChannelBufferFactory.getBuffer(HeapChannelBufferFactory.java:68)
    at org.jboss.netty.buffer.AbstractChannelBufferFactory.getBuffer(AbstractChannelBufferFactory.java:48)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.newCumulationBuffer(FrameDecoder.java:507)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.updateCumulation(FrameDecoder.java:345)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:312)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
    at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:109)
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
    at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
15/06/11 08:36:40 ERROR Utils: Uncaught exception in thread task-result-getter-0
java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.lang.Long.valueOf(Long.java:577)
    at com.esotericsoftware.kryo.serializers.DefaultSerializers$LongSerializer.read(DefaultSerializers.java:113)
    at com.esotericsoftware.kryo.serializers.DefaultSerializers$LongSerializer.read(DefaultSerializers.java:103)
    at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
    at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:338)
    at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:293)
    at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:651)
    at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:605)
    at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
    at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
    at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:338)
    at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:293)
    at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
    at org.apache.spark.serializer.KryoSerializerInstance.deserialize(KryoSerializer.scala:171)
    at org.apache.spark.scheduler.DirectTaskResult.value(TaskResult.scala:79)
    at org.apache.spark.scheduler.TaskSetManager.handleSuccessfulTask(TaskSetManager.scala:558)
    at org.apache.spark.scheduler.TaskSchedulerImpl.handleSuccessfulTask(TaskSchedulerImpl.scala:352)
    at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:80)
    at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:49)
    at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:49)
    at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1468)
    at org.apache.spark.scheduler.TaskResultGetter$$anon$2.run(TaskResultGetter.scala:48)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
15/06/11 08:36:38 ERROR ActorSystemImpl: exception on LARS’ timer thread
java.lang.OutOfMemoryError: GC overhead limit exceeded
    at akka.dispatch.AbstractNodeQueue.<init>(AbstractNodeQueue.java:19)
    at akka.actor.LightArrayRevolverScheduler$TaskQueue.<init>(Scheduler.scala:431)
    at akka.actor.LightArrayRevolverScheduler$$anon$12.nextTick(Scheduler.scala:397)
    at akka.actor.LightArrayRevolverScheduler$$anon$12.run(Scheduler.scala:363)
    at java.lang.Thread.run(Thread.java:745)
Exception in thread "task-result-getter-0" java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.lang.Long.valueOf(Long.java:577)
    at com.esotericsoftware.kryo.serializers.DefaultSerializers$LongSerializer.read(DefaultSerializers.java:113)
    at com.esotericsoftware.kryo.serializers.DefaultSerializers$LongSerializer.read(DefaultSerializers.java:103)
    at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
    at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:338)
    at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:293)
    at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:651)
    at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:605)
    at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
    at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
    at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:338)
    at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:293)
    at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
    at org.apache.spark.serializer.KryoSerializerInstance.deserialize(KryoSerializer.scala:171)
    at org.apache.spark.scheduler.DirectTaskResult.value(TaskResult.scala:79)
    at org.apache.spark.scheduler.TaskSetManager.handleSuccessfulTask(TaskSetManager.scala:558)
    at org.apache.spark.scheduler.TaskSchedulerImpl.handleSuccessfulTask(TaskSchedulerImpl.scala:352)
    at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:80)
    at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:49)
    at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:49)
    at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1468)
    at org.apache.spark.scheduler.TaskResultGetter$$anon$2.run(TaskResultGetter.scala:48)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
15/06/11 08:36:41 ERROR ActorSystemImpl: Uncaught fatal error from thread [sparkDriver-scheduler-1] shutting down ActorSystem [sparkDriver]
java.lang.OutOfMemoryError: GC overhead limit exceeded
    at akka.dispatch.AbstractNodeQueue.<init>(AbstractNodeQueue.java:19)
    at akka.actor.LightArrayRevolverScheduler$TaskQueue.<init>(Scheduler.scala:431)
    at akka.actor.LightArrayRevolverScheduler$$anon$12.nextTick(Scheduler.scala:397)
    at akka.actor.LightArrayRevolverScheduler$$anon$12.run(Scheduler.scala:363)
    at java.lang.Thread.run(Thread.java:745)
15/06/11 08:36:46 ERROR ActorSystemImpl: Uncaught fatal error from thread [sparkDriver-akka.actor.default-dispatcher-4] shutting down ActorSystem [sparkDriver]
java.lang.OutOfMemoryError: GC overhead limit exceeded
15/06/11 08:36:46 ERROR SparkSQLDriver: Failed in [select distinct isr,event_dt,age,age_cod,sex,year,quarter from aers.aers_demo_view]
org.apache.spark.SparkException: Job cancelled because SparkContext was shut down
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:702)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:701)
    at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
    at org.apache.spark.scheduler.DAGScheduler.cleanUpAfterSchedulerStop(DAGScheduler.scala:701)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessActor.postStop(DAGScheduler.scala:1428)
    at akka.actor.dungeon.FaultHandling$class.akka$actor$dungeon$FaultHandling$$finishTerminate(FaultHandling.scala:201)
    at akka.actor.dungeon.FaultHandling$class.terminate(FaultHandling.scala:163)
    at akka.actor.ActorCell.terminate(ActorCell.scala:338)
    at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:431)
    at akka.actor.ActorCell.systemInvoke(ActorCell.scala:447)
    at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:262)
    at akka.dispatch.Mailbox.run(Mailbox.scala:218)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
15/06/11 08:36:51 WARN DefaultChannelPipeline: An exception was thrown by a user handler while handling an exception event ([id: 0x79935a9b, /10.0.0.35:54028 => /10.0.0.19:52016] EXCEPTION: java.lang.OutOfMemoryError: Java heap space)
java.lang.OutOfMemoryError: Java heap space
15/06/11 08:36:52 ERROR ActorSystemImpl: Uncaught fatal error from thread [sparkDriver-akka.actor.default-dispatcher-5] shutting down ActorSystem [sparkDriver]
java.lang.OutOfMemoryError: Java heap space
15/06/11 08:36:53 WARN DefaultChannelPipeline: An exception was thrown by a user handler while handling an exception event ([id: 0xcb8c4b5d, /10.0.0.18:46744 => /10.0.0.19:52016] EXCEPTION: java.lang.OutOfMemoryError: Java heap space)
java.lang.OutOfMemoryError: Java heap space
15/06/11 08:36:56 WARN NioEventLoop: Unexpected exception in the selector loop.
java.lang.OutOfMemoryError: GC overhead limit exceeded
15/06/11 08:36:57 ERROR ActorSystemImpl: Uncaught fatal error from thread [sparkDriver-akka.actor.default-dispatcher-18] shutting down ActorSystem [sparkDriver]
java.lang.OutOfMemoryError: GC overhead limit exceeded
15/06/11 08:36:58 ERROR Utils: Uncaught exception in thread task-result-getter-3
java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "task-result-getter-3" java.lang.OutOfMemoryError: GC overhead limit exceeded
15/06/11 08:37:01 ERROR ActorSystemImpl: Uncaught fatal error from thread [sparkDriver-akka.actor.default-dispatcher-4] shutting down ActorSystem [sparkDriver]
java.lang.OutOfMemoryError: Java heap space
Time taken: 70.982 seconds
15/06/11 08:37:06 WARN QueuedThreadPool: 4 threads could not be stopped
15/06/11 08:37:11 ERROR MapOutputTrackerMaster: Error communicating with MapOutputTracker
akka.pattern.AskTimeoutException: Recipient[Actor[akka://sparkDriver/user/MapOutputTracker#-2109395547]] had already been terminated.
    at akka.pattern.AskableActorRef$.ask$extension(AskSupport.scala:134)
    at org.apache.spark.MapOutputTracker.askTracker(MapOutputTracker.scala:111)
    at org.apache.spark.MapOutputTracker.sendTracker(MapOutputTracker.scala:122)
    at org.apache.spark.MapOutputTrackerMaster.stop(MapOutputTracker.scala:330)
    at org.apache.spark.SparkEnv.stop(SparkEnv.scala:83)
    at org.apache.spark.SparkContext.stop(SparkContext.scala:1210)
    at org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.stop(SparkSQLEnv.scala:66)
    at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$$anon$1.run(SparkSQLCLIDriver.scala:107)
Exception in thread "Thread-3" org.apache.spark.SparkException: Error communicating with MapOutputTracker
    at org.apache.spark.MapOutputTracker.askTracker(MapOutputTracker.scala:116)
    at org.apache.spark.MapOutputTracker.sendTracker(MapOutputTracker.scala:122)
    at org.apache.spark.MapOutputTrackerMaster.stop(MapOutputTracker.scala:330)
    at org.apache.spark.SparkEnv.stop(SparkEnv.scala:83)
    at org.apache.spark.SparkContext.stop(SparkContext.scala:1210)
    at org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.stop(SparkSQLEnv.scala:66)
    at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$$anon$1.run(SparkSQLCLIDriver.scala:107)
Caused by: akka.pattern.AskTimeoutException: Recipient[Actor[akka://sparkDriver/user/MapOutputTracker#-2109395547]] had already been terminated.
    at akka.pattern.AskableActorRef$.ask$extension(AskSupport.scala:134)
    at org.apache.spark.MapOutputTracker.askTracker(MapOutputTracker.scala:111)