Hi Weifeng,

That's the Spark event log, not the YARN application log. You get the latter using the "yarn logs" command.
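(For reference, a minimal sketch of that invocation. The application ID is the one from the shell session quoted below; the output filename and the `command -v` guard are just illustrative.)

```shell
# Application ID taken from the spark-shell session later in this thread;
# substitute your own (it is shown in the ResourceManager UI).
APP_ID="application_1463681113470_0006"

# "yarn logs" fetches the aggregated container logs. This only works after
# the application has finished, and only if log aggregation is enabled
# (yarn.log-aggregation-enable=true) on the cluster.
if command -v yarn >/dev/null 2>&1; then
  yarn logs -applicationId "$APP_ID" > "${APP_ID}.log"
fi
```

If log aggregation is disabled, the command will report that logs are unavailable and you have to read the container logs directly on the NodeManager host instead.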
On Fri, May 20, 2016 at 1:14 PM, Cui, Weifeng <weife...@a9.com> wrote:
> Here is the application log for this spark job.
>
> http://pastebin.com/2UJS9L4e
>
> Thanks,
> Weifeng
>
> From: "Aulakh, Sahib" <aula...@a9.com>
> Date: Friday, May 20, 2016 at 12:43 PM
> To: Ted Yu <yuzhih...@gmail.com>
> Cc: Rodrick Brown <rodr...@orchardplatform.com>, Cui Weifeng <weife...@a9.com>, user <user@spark.apache.org>, "Zhao, Jun" <junz...@a9.com>
> Subject: Re: Can not set spark dynamic resource allocation
>
> Yes, it is YARN. We have configured the Spark shuffle service with the YARN NodeManager, but something must be off.
>
> We will send you the app log on Pastebin.
>
> Sent from my iPhone
>
> On May 20, 2016, at 12:35 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>
> Since yarn-site.xml was cited, I assume the cluster runs YARN.
>
> On Fri, May 20, 2016 at 12:30 PM, Rodrick Brown <rodr...@orchardplatform.com> wrote:
>
> Is this YARN or Mesos? For the latter you need to start an external shuffle service.
>
> Get Outlook for iOS
>
> On Fri, May 20, 2016 at 11:48 AM -0700, "Cui, Weifeng" <weife...@a9.com> wrote:
>
> Hi guys,
>
> Our team has a Hadoop 2.6.0 cluster with Spark 1.6.1. We wanted to enable dynamic resource allocation for Spark, so we followed the link below. After the changes, all Spark jobs failed.
>
> https://spark.apache.org/docs/latest/job-scheduling.html#dynamic-resource-allocation
>
> The test was on a test cluster with one master machine (running the NameNode, ResourceManager, and Hive server), one worker machine (running the DataNode and NodeManager), and one client machine (running the Spark shell).
>
> What I updated in the config:
>
> 1. Updated spark-defaults.conf:
>
>     spark.dynamicAllocation.enabled true
>     spark.shuffle.service.enabled   true
>
> 2. Updated yarn-site.xml:
>
>     <property>
>       <name>yarn.nodemanager.aux-services</name>
>       <value>mapreduce_shuffle,spark_shuffle</value>
>     </property>
>     <property>
>       <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
>       <value>org.apache.spark.network.yarn.YarnShuffleService</value>
>     </property>
>     <property>
>       <name>spark.shuffle.service.enabled</name>
>       <value>true</value>
>     </property>
>
> 3. Copied spark-1.6.1-yarn-shuffle.jar into yarn.application.classpath ($HADOOP_HOME/share/hadoop/yarn/*), via Python code.
>
> 4. Restarted everything: NameNode, DataNode, ResourceManager, NodeManager...
>
> 5. The config is updated on all machines (ResourceManager and NodeManagers): we update it in one place and copy it to every machine.
>
> What I tested:
>
> 1. I started a Scala Spark shell and checked its environment variables; spark.dynamicAllocation.enabled is true.
>
> 2. I ran the following:
>
>     scala> val line = sc.textFile("/spark-events/application_1463681113470_0006")
>     line: org.apache.spark.rdd.RDD[String] = /spark-events/application_1463681113470_0006 MapPartitionsRDD[1] at textFile at <console>:27
>     scala> line.count   // this command just got stuck here
>
> 3. In the beginning there was only 1 executor (this is for the driver); after line.count I could see 3 executors, which then dropped back to 1.
>
> 4. Several jobs were launched and all of them failed. Tasks (for all stages): Succeeded/Total: 0/2 (4 failed)
>
> Error messages:
>
> I found the following message in the Spark web UI, and in spark.log on the NodeManager machine as well:
>
>     ExecutorLostFailure (executor 1 exited caused by one of the running tasks)
>     Reason: Container marked as failed: container_1463692924309_0002_01_000002
>     on host: xxxxxxxxxxxxxxx.com. Exit status: 1. Diagnostics: Exception from
>     container-launch.
>     Container id: container_1463692924309_0002_01_000002
>     Exit code: 1
>     Stack trace: ExitCodeException exitCode=1:
>       at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
>       at org.apache.hadoop.util.Shell.run(Shell.java:455)
>       at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
>       at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
>       at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
>       at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>       at java.lang.Thread.run(Thread.java:745)
>
>     Container exited with a non-zero exit code 1
>
> Thanks a lot for the help. We can provide more information if needed.
>
> Thanks,
> Weifeng
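(A common cause of containers dying with exit status 1 right after launch under dynamic allocation is executors failing to reach the external shuffle service. One hedged check, assuming a default-style install — the log directory path and the exact log wording are assumptions, so adjust for your layout — is to confirm the NodeManager actually loaded the spark_shuffle aux service at startup:)

```shell
# Where the NodeManager writes its logs; this path is an assumption,
# adjust for your Hadoop install.
NM_LOG_DIR="${HADOOP_HOME:-/usr/lib/hadoop}/logs"

# If the aux-service registered, the NodeManager startup log should
# mention the shuffle service (by default it listens on port 7337,
# configurable via spark.shuffle.service.port).
if ls "$NM_LOG_DIR"/*nodemanager*.log >/dev/null 2>&1; then
  grep -i "shuffle" "$NM_LOG_DIR"/*nodemanager*.log \
    || echo "no shuffle-service lines found - check aux-services config"
fi
```

If nothing shuffle-related appears, the spark-*-yarn-shuffle.jar is likely not on the NodeManager's classpath, and executors will keep failing at registration.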
--
Marcelo

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org