SPARK_CLASSPATH is nice; with spark.jars you need to list all the jars one by one when submitting to YARN, because spark.driver.extraClassPath and spark.executor.extraClassPath are not usable for this in yarn mode. Can someone remove the warning from the code, or make it possible to upload the jars given in spark.driver.extraClassPath and spark.executor.extraClassPath?
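[Editor's note: if listing every jar by hand is the pain point, one workaround is to build the comma-separated list for spark.jars programmatically. This is only a rough sketch, not code from this thread: the object name JarListSketch is mine, and the jar directory is taken from the script quoted later in the thread.]

import java.io.File
import org.apache.spark.SparkConf

object JarListSketch {
  def main(args: Array[String]): Unit = {
    // Jar directory borrowed from the script later in this thread; adjust as needed.
    val jars = new File("/usr/metrics/lib")
      .listFiles()
      .filter(_.getName.endsWith(".jar"))
      .map(_.getAbsolutePath)

    val conf = new SparkConf()
      .setAppName("Metrics")
      // spark.jars takes a comma-separated list; it is the property that
      // spark-submit --jars populates. In yarn mode these jars are shipped
      // to the cluster, unlike *.extraClassPath entries, which must already
      // exist on every node.
      .set("spark.jars", jars.mkString(","))

    // ... create the StreamingContext from this conf as usual ...
  }
}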
------------------ Original Message ------------------
From: "Tathagata Das" <t...@databricks.com>
Sent: Saturday, June 27, 2015, 5:36 PM
To: "Guillermo Ortiz" <konstt2...@gmail.com>
Cc: "Hari Shreedharan" <hshreedha...@cloudera.com>; "user" <user@spark.apache.org>
Subject: Re: Uncaught exception in thread delete Spark local dirs

Well, though randomly chosen, SPARK_CLASSPATH is a recognized env variable that is picked up by spark-submit. That is what was used pre-Spark-1.0, but it got deprecated after that. Mind renaming that variable and trying it out again? At least it will remove one possible source of the problem.

TD

On Sat, Jun 27, 2015 at 2:32 AM, Guillermo Ortiz <konstt2...@gmail.com> wrote:

I'm checking the logs in YARN and I found this error as well:

Application application_1434976209271_15614 failed 2 times due to AM Container for appattempt_1434976209271_15614_000002 exited with exitCode: 255
Diagnostics: Exception from container-launch.
Container id: container_1434976209271_15614_02_000001
Exit code: 255
Stack trace: ExitCodeException exitCode=255:
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
    at org.apache.hadoop.util.Shell.run(Shell.java:455)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
    at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:293)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Shell output: Requested user hdfs is not whitelisted and has id 496, which is below the minimum allowed 1000
Container exited with a non-zero exit code 255
Failing this attempt. Failing the application.
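[Editor's note: the shell output above ("Requested user hdfs is not whitelisted and has id 496, which is below the minimum allowed 1000") is a YARN LinuxContainerExecutor check rather than a Spark problem: containers are refused for system accounts whose UID is below min.user.id unless the account is whitelisted. A hedged sketch of the relevant container-executor.cfg keys on the NodeManager hosts follows; the values are illustrative, on a Cloudera cluster these settings are normally managed through Cloudera Manager, and the simplest fix is usually just to submit the job as a regular user with UID >= 1000.]

# container-executor.cfg (NodeManager hosts) -- illustrative, not from this thread
# Option 1: explicitly whitelist the system account used for submission.
allowed.system.users=hdfs
# Option 2: lower the minimum-UID check below the account's UID (496 here).
min.user.id=450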
2015-06-27 11:25 GMT+02:00 Guillermo Ortiz <konstt2...@gmail.com>:

Well, SPARK_CLASSPATH is just a random name; the complete script is this:

export HADOOP_CONF_DIR=/etc/hadoop/conf
SPARK_CLASSPATH="file:/usr/metrics/conf/elasticSearch.properties,file:/usr/metrics/conf/redis.properties,/etc/spark/conf.cloudera.spark_on_yarn/yarn-conf/"
for lib in `ls /usr/metrics/lib/*.jar`
do
  if [ -z "$SPARK_CLASSPATH" ]; then
    SPARK_CLASSPATH=$lib
  else
    SPARK_CLASSPATH=$SPARK_CLASSPATH,$lib
  fi
done
spark-submit --name "Metrics"....

I need to add all the jars, as you know; maybe SPARK_CLASSPATH was a bad name. The code doesn't have any stateful operation, so I guess it's okay that it doesn't have a checkpoint. I have executed this code hundreds of times in a VM from Cloudera and never got this error.

2015-06-27 11:21 GMT+02:00 Tathagata Das <t...@databricks.com>:

1. You need checkpointing mostly for recovering from driver failures, and in some cases also for some stateful operations.
2. Could you try not using the SPARK_CLASSPATH environment variable?

TD

On Sat, Jun 27, 2015 at 1:00 AM, Guillermo Ortiz <konstt2...@gmail.com> wrote:

I don't have any checkpoint in my code. Really, I don't have to save any state; it's just log processing for a PoC. I have been testing the code in a VM from Cloudera and I never got that error. Now it's a real cluster.

The command to execute Spark:

spark-submit --name "PoC Logs" --master yarn-client --class com.metrics.MetricsSpark --jars $SPARK_CLASSPATH --executor-memory 1g /usr/metrics/ex/metrics-spark.jar $1 $2 $3

val sparkConf = new SparkConf()
val ssc = new StreamingContext(sparkConf, Seconds(5))
val kafkaParams = Map[String, String]("metadata.broker.list" -> args(0))
val topics = args(1).split("\\,")
val directKafkaStream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](ssc, kafkaParams, topics.toSet)

directKafkaStream.foreachRDD { rdd =>
  val offsets = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  val documents = rdd.mapPartitionsWithIndex { (i, kafkaEvent) =>
    .....
  }
}

I understand that I just need a checkpoint if I need to recover the task if something goes wrong, right?
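[Editor's note: to make the checkpointing point above concrete, here is a minimal sketch of wiring the driver for recovery with StreamingContext.getOrCreate. This is not the original application: the object name and the checkpoint directory are assumptions, and it is only needed if the job has to survive driver restarts.]

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object CheckpointedStreamSketch {
  // Hypothetical checkpoint location; any HDFS path the job can write to.
  val checkpointDir = "hdfs:///tmp/metrics-checkpoint"

  def createContext(): StreamingContext = {
    val sparkConf = new SparkConf().setAppName("PoC Logs")
    val ssc = new StreamingContext(sparkConf, Seconds(5))
    ssc.checkpoint(checkpointDir)
    // ... build the Kafka direct stream and the foreachRDD logic here,
    // exactly as in the code quoted above ...
    ssc
  }

  def main(args: Array[String]): Unit = {
    // Recovers the context from the checkpoint after a driver restart,
    // or calls createContext() on the first run.
    val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
    ssc.start()
    ssc.awaitTermination()
  }
}

[If the job really has no state and losing in-flight batches on a restart is acceptable, skipping checkpointing as in the quoted code is fine; either way it is unrelated to the YARN startup failure discussed in this thread.]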
2015-06-27 9:39 GMT+02:00 Tathagata Das <t...@databricks.com>:

How are you trying to execute the code again? From checkpoints, or otherwise? Also cc'ed Hari, who may have a better idea of YARN-related issues.

On Sat, Jun 27, 2015 at 12:35 AM, Guillermo Ortiz <konstt2...@gmail.com> wrote:

Hi,

I'm executing a Spark Streaming job with Kafka. The code was working, but today I tried to execute it again and I got an exception; I don't know what is happening. Right now there are no job executions on YARN. How could I fix it?

Exception in thread "main" org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:113)
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:59)
    at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:141)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:379)
    at org.apache.spark.streaming.StreamingContext$.createNewSparkContext(StreamingContext.scala:642)
    at org.apache.spark.streaming.StreamingContext.<init>(StreamingContext.scala:75)
    at com.produban.metrics.MetricsTransfInternationalSpark$.main(MetricsTransfInternationalSpark.scala:66)
    at com.produban.metrics.MetricsTransfInternationalSpark.main(MetricsTransfInternationalSpark.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:569)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

15/06/27 09:27:09 ERROR Utils: Uncaught exception in thread delete Spark local dirs
java.lang.NullPointerException
    at org.apache.spark.storage.DiskBlockManager.org$apache$spark$storage$DiskBlockManager$$doStop(DiskBlockManager.scala:161)
    at org.apache.spark.storage.DiskBlockManager$$anon$1$$anonfun$run$1.apply$mcV$sp(DiskBlockManager.scala:141)
    at org.apache.spark.storage.DiskBlockManager$$anon$1$$anonfun$run$1.apply(DiskBlockManager.scala:139)
    at org.apache.spark.storage.DiskBlockManager$$anon$1$$anonfun$run$1.apply(DiskBlockManager.scala:139)
    at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1617)
    at org.apache.spark.storage.DiskBlockManager$$anon$1.run(DiskBlockManager.scala:139)
Exception in thread "delete Spark local dirs" java.lang.NullPointerException
    at org.apache.spark.storage.DiskBlockManager.org$apache$spark$storage$DiskBlockManager$$doStop(DiskBlockManager.scala:161)
    at org.apache.spark.storage.DiskBlockManager$$anon$1$$anonfun$run$1.apply$mcV$sp(DiskBlockManager.scala:141)
    at org.apache.spark.storage.DiskBlockManager$$anon$1$$anonfun$run$1.apply(DiskBlockManager.scala:139)
    at org.apache.spark.storage.DiskBlockManager$$anon$1$$anonfun$run$1.apply(DiskBlockManager.scala:139)
    at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1617)
    at org.apache.spark.storage.DiskBlockManager$$anon$1.run(DiskBlockManager.scala:139)