Well, though randomly chosen, SPARK_CLASSPATH is a recognized env variable that spark-submit picks up. It is what was used pre-Spark-1.0, but it was deprecated after that. Mind renaming that variable and trying it out again? At least it will rule out one possible source of problems.
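For example, a minimal sketch of the renamed script (METRICS_JARS is just an example name, anything other than SPARK_CLASSPATH will do; the file list and the spark-submit line mirror the ones in your mails below):

export HADOOP_CONF_DIR=/etc/hadoop/conf

# Any name other than SPARK_CLASSPATH avoids the deprecated variable that
# spark-submit reads on its own.
METRICS_JARS="file:/usr/metrics/conf/elasticSearch.properties,file:/usr/metrics/conf/redis.properties,/etc/spark/conf.cloudera.spark_on_yarn/yarn-conf/"

# Append every jar in /usr/metrics/lib to the comma-separated list.
for lib in /usr/metrics/lib/*.jar
do
  METRICS_JARS=$METRICS_JARS,$lib
done

spark-submit --name "PoC Logs" --master yarn-client \
  --class com.metrics.MetricsSpark \
  --jars "$METRICS_JARS" \
  --executor-memory 1g \
  /usr/metrics/ex/metrics-spark.jar "$1" "$2" "$3"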
TD

On Sat, Jun 27, 2015 at 2:32 AM, Guillermo Ortiz <konstt2...@gmail.com> wrote:

> I'm checking the logs in YARN and I found this error as well:
>
> Application application_1434976209271_15614 failed 2 times due to AM
> Container for appattempt_1434976209271_15614_000002 exited with exitCode: 255
>
> Diagnostics: Exception from container-launch.
> Container id: container_1434976209271_15614_02_000001
> Exit code: 255
> Stack trace: ExitCodeException exitCode=255:
>         at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
>         at org.apache.hadoop.util.Shell.run(Shell.java:455)
>         at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
>         at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:293)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> Shell output: Requested user hdfs is not whitelisted and has id 496,which
> is below the minimum allowed 1000
> Container exited with a non-zero exit code 255
> Failing this attempt. Failing the application.
>
> 2015-06-27 11:25 GMT+02:00 Guillermo Ortiz <konstt2...@gmail.com>:
>
>> Well, SPARK_CLASSPATH is just a random name; the complete script is this:
>>
>> export HADOOP_CONF_DIR=/etc/hadoop/conf
>>
>> SPARK_CLASSPATH="file:/usr/metrics/conf/elasticSearch.properties,file:/usr/metrics/conf/redis.properties,/etc/spark/conf.cloudera.spark_on_yarn/yarn-conf/"
>> for lib in `ls /usr/metrics/lib/*.jar`
>> do
>>   if [ -z "$SPARK_CLASSPATH" ]; then
>>     SPARK_CLASSPATH=$lib
>>   else
>>     SPARK_CLASSPATH=$SPARK_CLASSPATH,$lib
>>   fi
>> done
>> spark-submit --name "Metrics"....
>>
>> I need to add all the jars, as you know; maybe SPARK_CLASSPATH was a bad name.
>>
>> The code doesn't have any stateful operation, so I guess it's okay that it
>> doesn't have a checkpoint. I have executed this code hundreds of times in a
>> VM from Cloudera and never got this error.
>>
>> 2015-06-27 11:21 GMT+02:00 Tathagata Das <t...@databricks.com>:
>>
>>> 1. You need checkpointing mostly for recovering from driver failures,
>>> and in some cases also for some stateful operations.
>>>
>>> 2. Could you try not using the SPARK_CLASSPATH environment variable?
>>>
>>> TD
>>>
>>> On Sat, Jun 27, 2015 at 1:00 AM, Guillermo Ortiz <konstt2...@gmail.com> wrote:
>>>
>>>> I don't have any checkpoint in my code. Really, I don't have to save
>>>> any state; it's just log processing for a PoC.
>>>> I have been testing the code in a VM from Cloudera and I never got that
>>>> error. Now it's a real cluster.
>>>>
>>>> The command to execute Spark:
>>>>
>>>> spark-submit --name "PoC Logs" --master yarn-client --class
>>>> com.metrics.MetricsSpark --jars $SPARK_CLASSPATH --executor-memory 1g
>>>> /usr/metrics/ex/metrics-spark.jar $1 $2 $3
>>>>
>>>> val sparkConf = new SparkConf()
>>>> val ssc = new StreamingContext(sparkConf, Seconds(5))
>>>> val kafkaParams = Map[String, String]("metadata.broker.list" -> args(0))
>>>> val topics = args(1).split("\\,")
>>>> val directKafkaStream = KafkaUtils.createDirectStream[String,
>>>>   String, StringDecoder, StringDecoder](ssc, kafkaParams, topics.toSet)
>>>>
>>>> directKafkaStream.foreachRDD { rdd =>
>>>>   val offsets = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
>>>>   val documents = rdd.mapPartitionsWithIndex { (i, kafkaEvent) =>
>>>>     .....
>>>> }
>>>>
>>>> I understand that I just need a checkpoint if I need to recover the
>>>> task if something goes wrong, right?
>>>>
>>>> 2015-06-27 9:39 GMT+02:00 Tathagata Das <t...@databricks.com>:
>>>>
>>>>> How are you trying to execute the code again? From checkpoints, or
>>>>> otherwise?
>>>>> Also cc'ed Hari, who may have a better idea of YARN-related issues.
>>>>>
>>>>> On Sat, Jun 27, 2015 at 12:35 AM, Guillermo Ortiz <konstt2...@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I'm executing a Spark Streaming job with Kafka. The code was working,
>>>>>> but today I tried to execute it again and I got an exception; I don't
>>>>>> know what is happening. Right now there are no job executions on YARN.
>>>>>> How could I fix it?
>>>>>>
>>>>>> Exception in thread "main" org.apache.spark.SparkException: Yarn
>>>>>> application has already ended! It might have been killed or unable to
>>>>>> launch application master.
>>>>>>         at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:113)
>>>>>>         at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:59)
>>>>>>         at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:141)
>>>>>>         at org.apache.spark.SparkContext.<init>(SparkContext.scala:379)
>>>>>>         at org.apache.spark.streaming.StreamingContext$.createNewSparkContext(StreamingContext.scala:642)
>>>>>>         at org.apache.spark.streaming.StreamingContext.<init>(StreamingContext.scala:75)
>>>>>>         at com.produban.metrics.MetricsTransfInternationalSpark$.main(MetricsTransfInternationalSpark.scala:66)
>>>>>>         at com.produban.metrics.MetricsTransfInternationalSpark.main(MetricsTransfInternationalSpark.scala)
>>>>>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>>>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>>>         at java.lang.reflect.Method.invoke(Method.java:606)
>>>>>>         at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:569)
>>>>>>         at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
>>>>>>         at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)
>>>>>>         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
>>>>>>         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>>>>> 15/06/27 09:27:09 ERROR Utils: Uncaught exception in thread delete Spark local dirs
>>>>>> java.lang.NullPointerException
>>>>>>         at org.apache.spark.storage.DiskBlockManager.org$apache$spark$storage$DiskBlockManager$$doStop(DiskBlockManager.scala:161)
>>>>>>         at org.apache.spark.storage.DiskBlockManager$$anon$1$$anonfun$run$1.apply$mcV$sp(DiskBlockManager.scala:141)
>>>>>>         at org.apache.spark.storage.DiskBlockManager$$anon$1$$anonfun$run$1.apply(DiskBlockManager.scala:139)
>>>>>>         at org.apache.spark.storage.DiskBlockManager$$anon$1$$anonfun$run$1.apply(DiskBlockManager.scala:139)
>>>>>>         at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1617)
>>>>>>         at org.apache.spark.storage.DiskBlockManager$$anon$1.run(DiskBlockManager.scala:139)
>>>>>> Exception in thread "delete Spark local dirs" java.lang.NullPointerException
>>>>>>         at org.apache.spark.storage.DiskBlockManager.org$apache$spark$storage$DiskBlockManager$$doStop(DiskBlockManager.scala:161)
>>>>>>         at org.apache.spark.storage.DiskBlockManager$$anon$1$$anonfun$run$1.apply$mcV$sp(DiskBlockManager.scala:141)
>>>>>>         at org.apache.spark.storage.DiskBlockManager$$anon$1$$anonfun$run$1.apply(DiskBlockManager.scala:139)
>>>>>>         at org.apache.spark.storage.DiskBlockManager$$anon$1$$anonfun$run$1.apply(DiskBlockManager.scala:139)
>>>>>>         at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1617)
>>>>>>         at org.apache.spark.storage.DiskBlockManager$$anon$1.run(DiskBlockManager.scala:139)