Re: Worker dies while submitting a job

Luis Ángel Vicente Sánchez Tue, 17 Jun 2014 10:05:54 -0700

I have been able to submit a job successfully, but I had to configure my Spark
job this way:


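  import org.apache.spark.SparkConf

  // Point the driver at the standalone master and list the application jar
  // to distribute to the cluster.
  val sparkConf: SparkConf =
    new SparkConf()
      .setAppName("TwitterPopularTags")
      .setMaster("spark://int-spark-master:7077")
      .setSparkHome("/opt/spark")
      .setJars(Seq("/tmp/spark-test-0.1-SNAPSHOT.jar"))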

Now I'm getting this error on my worker:

14/06/17 17:03:40 WARN TaskSchedulerImpl: Initial job has not accepted any
resources; check your cluster UI to ensure that workers are registered and
have sufficient memory
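
For reference, the sealed-trait suggestion quoted below can be sketched
roughly as follows. This is only an illustration under stated assumptions:
the names MatchDemo, EnumState, SealedState, describeEnum and describeSealed
are made up and are not Spark's actual DriverState or Worker code. The point
is that a match over scala.Enumeration values gets no exhaustiveness check,
while a match over a sealed trait does.

```scala
object MatchDemo {
  // Hypothetical stand-in for an Enumeration-based state such as DriverState.
  object EnumState extends Enumeration {
    val Running, Finished, Failed = Value
  }

  // Compiles without any warning even though Failed is not handled;
  // describeEnum(EnumState.Failed) throws scala.MatchError at runtime.
  def describeEnum(state: EnumState.Value): String = state match {
    case EnumState.Running  => "running"
    case EnumState.Finished => "finished"
  }

  // The same states modelled as a sealed trait with case objects.
  sealed trait SealedState
  case object Running  extends SealedState
  case object Finished extends SealedState
  case object Failed   extends SealedState

  // If the Failed case were removed here, scalac would warn that the
  // match may not be exhaustive (an error with -Xfatal-warnings).
  def describeSealed(state: SealedState): String = state match {
    case Running  => "running"
    case Finished => "finished"
    case Failed   => "failed"
  }
}
```

Calling MatchDemo.describeEnum(MatchDemo.EnumState.Failed) fails with the
same kind of scala.MatchError that appears in the worker log, whereas
describeSealed handles every state and the compiler flags any that is left
out.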




2014-06-17 17:36 GMT+01:00 Luis Ángel Vicente Sánchez <
langel.gro...@gmail.com>:

> Ok... I was checking the wrong version of that file yesterday. My worker
> is sending a DriverStateChanged(_, DriverState.FAILED, _) but there is no
> case branch for that state and the worker is crashing. I still don't know
> why I'm getting a FAILED state but I'm sure that should kill the actor due
> to a scala.MatchError.
>
> In Scala it is usually best practice to use a sealed trait with case
> classes/objects in a match statement instead of an enumeration (the
> compiler will then complain about missing cases); I think that code should
> be refactored to catch this kind of error at compile time.
>
> Now I need to find out why that state-changed message is sent... I will
> keep updating this thread until I find the problem :D
>
>
> 2014-06-16 18:25 GMT+01:00 Luis Ángel Vicente Sánchez <
> langel.gro...@gmail.com>:
>
>> I'm playing with a modified version of the TwitterPopularTags example, and
>> when I try to submit the job to my cluster, the workers keep dying with this
>> message:
>>
>> 14/06/16 17:11:16 INFO DriverRunner: Launch Command: "java" "-cp"
>> "/opt/spark-1.0.0-bin-hadoop1/work/driver-20140616171115-0014/spark-test-0.1-SNAPSHOT.jar:::/opt/spark-1.0.0-bin-hadoop1/conf:/opt/spark-1.0.0-bin-hadoop1/lib/spark-assembly-1.0.0-hadoop1.0.4.jar"
>> "-XX:MaxPermSize=128m" "-Xms512M" "-Xmx512M"
>> "org.apache.spark.deploy.worker.DriverWrapper"
>> "akka.tcp://sparkWorker@int-spark-worker:51676/user/Worker"
>> "org.apache.spark.examples.streaming.TwitterPopularTags"
>> 14/06/16 17:11:17 ERROR OneForOneStrategy: FAILED (of class scala.Enumeration$Val)
>> scala.MatchError: FAILED (of class scala.Enumeration$Val)
>>   at org.apache.spark.deploy.worker.Worker$$anonfun$receive$1.applyOrElse(Worker.scala:317)
>>   at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>>   at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>>   at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>>   at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>>   at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>>   at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>   at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>   at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>   at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>> 14/06/16 17:11:17 INFO Worker: Starting Spark worker
>> int-spark-app-ie005d6a3.mclabs.io:51676 with 2 cores, 6.5 GB RAM
>> 14/06/16 17:11:17 INFO Worker: Spark home: /opt/spark-1.0.0-bin-hadoop1
>> 14/06/16 17:11:17 INFO WorkerWebUI: Started WorkerWebUI at
>> http://int-spark-app-ie005d6a3.mclabs.io:8081
>> 14/06/16 17:11:17 INFO Worker: Connecting to master
>> spark://int-spark-app-ie005d6a3.mclabs.io:7077...
>> 14/06/16 17:11:17 ERROR Worker: Worker registration failed: Attempted to
>> re-register worker at same address: akka.tcp://
>> sparkwor...@int-spark-app-ie005d6a3.mclabs.io:51676
>>
>> This happens when the worker receives a DriverStateChanged(driverId,
>> state, exception) message.
>>
>> To deploy the job, I copied the jar file to the temporary folder of the
>> master node and executed the following command:
>>
>> ./spark-submit \
>> --class org.apache.spark.examples.streaming.TwitterPopularTags \
>> --master spark://int-spark-master:7077 \
>> --deploy-mode cluster \
>> file:///tmp/spark-test-0.1-SNAPSHOT.jar
>>
>> I don't really know what the problem could be as there is a 'case _' that
>> should avoid that problem :S
>>
>
>
