The log shows stack traces that seem to match the assert in the JIRA, so it seems I am hitting the issue. Thanks for the heads up ...
15/03/23 20:29:50 ERROR actor.OneForOneStrategy: assertion failed: Allocator killed more executors than are allocated!
java.lang.AssertionError: assertion failed: Allocator killed more executors than are allocated!
        at scala.Predef$.assert(Predef.scala:179)
        at org.apache.spark.deploy.yarn.YarnAllocator.killExecutor(YarnAllocator.scala:152)
        at org.apache.spark.deploy.yarn.ApplicationMaster$AMActor$$anonfun$receive$1$$anonfun$applyOrElse$6.apply(ApplicationMaster.scala:547)
        at org.apache.spark.deploy.yarn.ApplicationMaster$AMActor$$anonfun$receive$1$$anonfun$applyOrElse$6.apply(ApplicationMaster.scala:547)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at org.apache.spark.deploy.yarn.ApplicationMaster$AMActor$$anonfun$receive$1.applyOrElse(ApplicationMaster.scala:547)
        at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
        at org.apache.spark.deploy.yarn.ApplicationMaster$AMActor.aroundReceive(ApplicationMaster.scala:506)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
        at akka.actor.ActorCell.invoke(ActorCell.scala:487)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
        at akka.dispatch.Mailbox.run(Mailbox.scala:220)
        at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)

On Mon, Mar 23, 2015 at 2:25 PM, Marcelo Vanzin <van...@cloudera.com> wrote:
> On Mon, Mar 23, 2015 at 2:15 PM, Manoj Samel <manojsamelt...@gmail.com> wrote:
> > Found the issue with the above error - the setting for spark_shuffle was
> > incomplete.
> >
> > Now it is able to ask and get additional executors. The issue is once they
> > are released, it is not able to proceed with the next query.
>
> That looks like SPARK-6325, which unfortunately was not fixed in time
> for 1.3.0...
>
> --
> Marcelo
>
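For reference, the incomplete spark_shuffle setting mentioned above is the YARN auxiliary shuffle service that dynamic allocation requires. A minimal sketch of the configuration (property names as documented for Spark 1.x; keeping mapreduce_shuffle alongside spark_shuffle is an assumption about the cluster also running MapReduce):

```xml
<!-- yarn-site.xml on every NodeManager; restart NodeManagers after changing.
     The Spark YARN shuffle jar must also be on the NodeManager classpath. -->
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle,spark_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
  <value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>
```

On the Spark side, the application then enables dynamic allocation against that service, e.g. in spark-defaults.conf:

```
spark.dynamicAllocation.enabled  true
spark.shuffle.service.enabled    true
```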