Hi All,

The error you are seeing looks very similar to SPARK-13514 to me. I could be wrong, though:
https://issues.apache.org/jira/browse/SPARK-13514

Can you check yarn.nodemanager.local-dirs in your YARN configuration for "file://"?

Cheers!

David Newberger

-----Original Message-----
From: Cui, Weifeng [mailto:weife...@a9.com]
Sent: Friday, May 20, 2016 4:26 PM
To: Marcelo Vanzin
Cc: Ted Yu; Rodrick Brown; user; Zhao, Jun; Aulakh, Sahib; Song, Yiwei
Subject: Re: Can not set spark dynamic resource allocation

Sorry, here is the node-manager log. application_1463692924309_0002 is my test. Hope this will help.

http://pastebin.com/0BPEcgcW

On 5/20/16, 2:09 PM, "Marcelo Vanzin" <van...@cloudera.com> wrote:

>Hi Weifeng,
>
>That's the Spark event log, not the YARN application log. You get the
>latter using the "yarn logs" command.
>
>On Fri, May 20, 2016 at 1:14 PM, Cui, Weifeng <weife...@a9.com> wrote:
>> Here is the application log for this spark job.
>>
>> http://pastebin.com/2UJS9L4e
>>
>> Thanks,
>> Weifeng
>>
>> From: "Aulakh, Sahib" <aula...@a9.com>
>> Date: Friday, May 20, 2016 at 12:43 PM
>> To: Ted Yu <yuzhih...@gmail.com>
>> Cc: Rodrick Brown <rodr...@orchardplatform.com>, Cui Weifeng
>> <weife...@a9.com>, user <user@spark.apache.org>, "Zhao, Jun"
>> <junz...@a9.com>
>> Subject: Re: Can not set spark dynamic resource allocation
>>
>> Yes, it is YARN. We have configured the Spark shuffle service with the
>> YARN node manager, but something must be off.
>>
>> We will send you the app log on Pastebin.
>>
>> Sent from my iPhone
>>
>> On May 20, 2016, at 12:35 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>
>> Since yarn-site.xml was cited, I assume the cluster runs YARN.
>>
>> On Fri, May 20, 2016 at 12:30 PM, Rodrick Brown
>> <rodr...@orchardplatform.com> wrote:
>>
>> Is this YARN or Mesos? For the latter you need to start an external
>> shuffle service.
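As Marcelo notes above, the Spark event log is not the YARN application log; the latter comes from the "yarn logs" CLI. A minimal sketch of fetching it, using the application id from this thread (substitute your own; this assumes log aggregation is enabled on the cluster):

```shell
# Fetch the aggregated YARN container logs for a finished application.
# APP_ID is the one reported in this thread; replace it with your own.
APP_ID="application_1463692924309_0002"

if command -v yarn >/dev/null 2>&1; then
  # Dumps stdout/stderr of every container launched for the application.
  yarn logs -applicationId "$APP_ID" | tail -n 100
else
  echo "yarn CLI not found; run this from a cluster gateway/client node"
fi
```

This is where container-launch failures like the one below usually show their real cause, since the exit-code-1 stack trace in the NodeManager log only says the launch script failed.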
>>
>> Get Outlook for iOS
>>
>> On Fri, May 20, 2016 at 11:48 AM -0700, "Cui, Weifeng"
>> <weife...@a9.com> wrote:
>>
>> Hi guys,
>>
>> Our team has a Hadoop 2.6.0 cluster with Spark 1.6.1. We want to enable
>> dynamic resource allocation for Spark, and we followed the link below.
>> After the changes, all Spark jobs failed.
>>
>> https://spark.apache.org/docs/latest/job-scheduling.html#dynamic-resource-allocation
>>
>> This test was on a test cluster with 1 master machine (running the
>> namenode, resourcemanager, and Hive server), 1 worker machine (running
>> the datanode and nodemanager), and 1 client machine (running the Spark
>> shell).
>>
>> What I updated in the config:
>>
>> 1. Update spark-defaults.conf:
>>
>>    spark.dynamicAllocation.enabled true
>>    spark.shuffle.service.enabled   true
>>
>> 2. Update yarn-site.xml:
>>
>>    <property>
>>      <name>yarn.nodemanager.aux-services</name>
>>      <value>mapreduce_shuffle,spark_shuffle</value>
>>    </property>
>>    <property>
>>      <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
>>      <value>org.apache.spark.network.yarn.YarnShuffleService</value>
>>    </property>
>>    <property>
>>      <name>spark.shuffle.service.enabled</name>
>>      <value>true</value>
>>    </property>
>>
>> 3. Copy spark-1.6.1-yarn-shuffle.jar to yarn.application.classpath
>>    ($HADOOP_HOME/share/hadoop/yarn/*) in python code.
>>
>> 4. Restart the namenode, datanode, resourcemanager, nodemanager...
>>    restart everything.
>>
>> 5. The config is updated on all machines (resourcemanager and
>>    nodemanager); we update the config in one place and copy it to all
>>    machines.
>>
>> What I tested:
>>
>> 1. I started a Scala spark-shell and checked its environment variables;
>>    spark.dynamicAllocation.enabled is true.
>>
>> 2.
>> I used the following code:
>>
>>    scala> val line = sc.textFile("/spark-events/application_1463681113470_0006")
>>    line: org.apache.spark.rdd.RDD[String] =
>>    /spark-events/application_1463681113470_0006 MapPartitionsRDD[1] at
>>    textFile at <console>:27
>>
>>    scala> line.count    # this command just hung here
>>
>> 3. In the beginning there was only 1 executor (this is for the driver);
>>    after line.count, I could see 3 executors, which then dropped to 1.
>>
>> 4. Several jobs were launched and all of them failed. Tasks (for all
>>    stages): Succeeded/Total: 0/2 (4 failed)
>>
>> Error messages:
>>
>> I found the following messages in the Spark web UI. I found this in
>> spark.log on the nodemanager machine as well.
>>
>> ExecutorLostFailure (executor 1 exited caused by one of the running tasks)
>> Reason: Container marked as failed: container_1463692924309_0002_01_000002
>> on host: xxxxxxxxxxxxxxx.com. Exit status: 1. Diagnostics: Exception
>> from container-launch.
>> Container id: container_1463692924309_0002_01_000002
>> Exit code: 1
>> Stack trace: ExitCodeException exitCode=1:
>>     at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
>>     at org.apache.hadoop.util.Shell.run(Shell.java:455)
>>     at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
>>     at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
>>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
>>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
>>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>     at java.lang.Thread.run(Thread.java:745)
>>
>> Container exited with a non-zero exit code 1
>>
>> Thanks a lot for the help. We can provide more information if needed.
>>
>> Thanks,
>> Weifeng
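The setup steps above can be spot-checked with a short script. This is a sketch only; the paths (HADOOP_CONF_DIR, HADOOP_HOME, the conf and jar locations) are assumptions about a typical Hadoop 2.6 layout, so adjust them for your install:

```shell
# Hedged sanity checks for the dynamic-allocation setup steps above.
# All paths are assumptions; adjust HADOOP_CONF_DIR / HADOOP_HOME as needed.
HADOOP_CONF_DIR="${HADOOP_CONF_DIR:-/etc/hadoop/conf}"
HADOOP_HOME="${HADOOP_HOME:-/usr/lib/hadoop}"
YARN_SITE="$HADOOP_CONF_DIR/yarn-site.xml"

# 1. SPARK-13514 (David's tip): yarn.nodemanager.local-dirs must not carry
#    a file:// URI scheme, or executor containers fail at launch.
if grep -A 1 'yarn.nodemanager.local-dirs' "$YARN_SITE" 2>/dev/null | grep -q 'file://'; then
  DIRS_CHECK="FAIL: local-dirs uses a file:// URI (see SPARK-13514)"
else
  DIRS_CHECK="ok: no file:// scheme in local-dirs (or property unset)"
fi

# 2. Step 3 above: the shuffle service jar must be on the NodeManager classpath.
SHUFFLE_JAR=$(ls "$HADOOP_HOME"/share/hadoop/yarn/spark-*-yarn-shuffle.jar 2>/dev/null | head -n 1)
JAR_CHECK="${SHUFFLE_JAR:-FAIL: no spark-*-yarn-shuffle.jar under share/hadoop/yarn}"

# 3. Step 2 above: spark_shuffle must be registered as an aux service.
if grep -q 'spark_shuffle' "$YARN_SITE" 2>/dev/null; then
  AUX_CHECK="ok: spark_shuffle appears in yarn-site.xml"
else
  AUX_CHECK="FAIL: spark_shuffle not found in yarn-site.xml"
fi

printf '%s\n%s\n%s\n' "$DIRS_CHECK" "$JAR_CHECK" "$AUX_CHECK"
```

Run it on the worker (NodeManager) machine: the jar and aux-service checks matter there, since that is the process that must load YarnShuffleService.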
>>
>
>--
>Marcelo

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org