Hi All,

The error you are seeing looks very similar to SPARK-13514 to me. I could be wrong, though:
https://issues.apache.org/jira/browse/SPARK-13514

Can you check yarn.nodemanager.local-dirs in your YARN configuration for "file://"?

Cheers!

David Newberger

-----Original Message-----
From: Cui, Weifeng [mailto:weife...@a9.com]
Sent: Friday, May 20, 2016 4:26 PM
To: Marcelo Vanzin
Cc: Ted Yu; Rodrick Brown; user; Zhao, Jun; Aulakh, Sahib; Song, Yiwei
Subject: Re: Can not set spark dynamic resource allocation

Sorry, here is the node-manager log. application_1463692924309_0002 is my test. Hope this will help.

http://pastebin.com/0BPEcgcW

On 5/20/16, 2:09 PM, "Marcelo Vanzin" <van...@cloudera.com> wrote:

>Hi Weifeng,
>
>That's the Spark event log, not the YARN application log. You get the
>latter using the "yarn logs" command.
>
>On Fri, May 20, 2016 at 1:14 PM, Cui, Weifeng <weife...@a9.com> wrote:
>> Here is the application log for this spark job.
>>
>> http://pastebin.com/2UJS9L4e
>>
>> Thanks,
>> Weifeng
>>
>> From: "Aulakh, Sahib" <aula...@a9.com>
>> Date: Friday, May 20, 2016 at 12:43 PM
>> To: Ted Yu <yuzhih...@gmail.com>
>> Cc: Rodrick Brown <rodr...@orchardplatform.com>, Cui Weifeng
>> <weife...@a9.com>, user <user@spark.apache.org>, "Zhao, Jun"
>> <junz...@a9.com>
>> Subject: Re: Can not set spark dynamic resource allocation
>>
>> Yes, it is YARN. We have configured the Spark shuffle service with the
>> YARN node manager, but something must be off.
>>
>> We will send you the app log on Pastebin.
>>
>> Sent from my iPhone
>>
>> On May 20, 2016, at 12:35 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>
>> Since yarn-site.xml was cited, I assume the cluster runs YARN.
>>
>> On Fri, May 20, 2016 at 12:30 PM, Rodrick Brown
>> <rodr...@orchardplatform.com> wrote:
>>
>> Is this YARN or Mesos? For the latter you need to start an external
>> shuffle service.
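As Marcelo notes above, the Spark event log is not the YARN application log; the latter comes from the "yarn logs" CLI. A minimal sketch of fetching it, using the application id from this thread (substitute your own; this assumes log aggregation is enabled on the cluster):

```shell
# Fetch the aggregated YARN container logs for a finished application.
# APP_ID is the one reported in this thread; replace it with your own.
APP_ID="application_1463692924309_0002"

if command -v yarn >/dev/null 2>&1; then
  # Dumps stdout/stderr of every container launched for the application.
  yarn logs -applicationId "$APP_ID" | tail -n 100
else
  echo "yarn CLI not found; run this from a cluster gateway/client node"
fi
```

This is where container-launch failures like the one below usually show their real cause, since the exit-code-1 stack trace in the NodeManager log only says the launch script failed.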
>>
>> Get Outlook for iOS
>>
>> On Fri, May 20, 2016 at 11:48 AM -0700, "Cui, Weifeng"
>> <weife...@a9.com> wrote:
>>
>> Hi guys,
>>
>> Our team has a Hadoop 2.6.0 cluster with Spark 1.6.1. We want to enable
>> dynamic resource allocation for Spark, and we followed the link below.
>> After the changes, all Spark jobs failed.
>>
>> https://spark.apache.org/docs/latest/job-scheduling.html#dynamic-resource-allocation
>>
>> This test was on a test cluster with 1 master machine (running the
>> namenode, resourcemanager, and Hive server), 1 worker machine (running
>> the datanode and nodemanager), and 1 client machine (running the Spark
>> shell).
>>
>> What I updated in the config:
>>
>> 1. Update spark-defaults.conf:
>>
>>    spark.dynamicAllocation.enabled true
>>    spark.shuffle.service.enabled   true
>>
>> 2. Update yarn-site.xml:
>>
>>    <property>
>>      <name>yarn.nodemanager.aux-services</name>
>>      <value>mapreduce_shuffle,spark_shuffle</value>
>>    </property>
>>    <property>
>>      <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
>>      <value>org.apache.spark.network.yarn.YarnShuffleService</value>
>>    </property>
>>    <property>
>>      <name>spark.shuffle.service.enabled</name>
>>      <value>true</value>
>>    </property>
>>
>> 3. Copy spark-1.6.1-yarn-shuffle.jar to yarn.application.classpath
>>    ($HADOOP_HOME/share/hadoop/yarn/*) in python code.
>>
>> 4. Restart the namenode, datanode, resourcemanager, nodemanager...
>>    restart everything.
>>
>> 5. The config is updated on all machines (resourcemanager and
>>    nodemanager); we update the config in one place and copy it to all
>>    machines.
>>
>> What I tested:
>>
>> 1. I started a Scala spark-shell and checked its environment variables;
>>    spark.dynamicAllocation.enabled is true.
>>
>> 2.
>> I used the following code:
>>
>>    scala> val line = sc.textFile("/spark-events/application_1463681113470_0006")
>>    line: org.apache.spark.rdd.RDD[String] =
>>    /spark-events/application_1463681113470_0006 MapPartitionsRDD[1] at
>>    textFile at <console>:27
>>
>>    scala> line.count    # this command just hung here
>>
>> 3. In the beginning there was only 1 executor (this is for the driver);
>>    after line.count, I could see 3 executors, which then dropped to 1.
>>
>> 4. Several jobs were launched and all of them failed. Tasks (for all
>>    stages): Succeeded/Total: 0/2 (4 failed)
>>
>> Error messages:
>>
>> I found the following messages in the Spark web UI. I found this in
>> spark.log on the nodemanager machine as well.
>>
>> ExecutorLostFailure (executor 1 exited caused by one of the running tasks)
>> Reason: Container marked as failed: container_1463692924309_0002_01_000002
>> on host: xxxxxxxxxxxxxxx.com. Exit status: 1. Diagnostics: Exception
>> from container-launch.
>> Container id: container_1463692924309_0002_01_000002
>> Exit code: 1
>> Stack trace: ExitCodeException exitCode=1:
>>     at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
>>     at org.apache.hadoop.util.Shell.run(Shell.java:455)
>>     at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
>>     at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
>>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
>>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
>>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>     at java.lang.Thread.run(Thread.java:745)
>>
>> Container exited with a non-zero exit code 1
>>
>> Thanks a lot for the help. We can provide more information if needed.
>>
>> Thanks,
>> Weifeng
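The setup steps above can be spot-checked with a short script. This is a sketch only; the paths (HADOOP_CONF_DIR, HADOOP_HOME, the conf and jar locations) are assumptions about a typical Hadoop 2.6 layout, so adjust them for your install:

```shell
# Hedged sanity checks for the dynamic-allocation setup steps above.
# All paths are assumptions; adjust HADOOP_CONF_DIR / HADOOP_HOME as needed.
HADOOP_CONF_DIR="${HADOOP_CONF_DIR:-/etc/hadoop/conf}"
HADOOP_HOME="${HADOOP_HOME:-/usr/lib/hadoop}"
YARN_SITE="$HADOOP_CONF_DIR/yarn-site.xml"

# 1. SPARK-13514 (David's tip): yarn.nodemanager.local-dirs must not carry
#    a file:// URI scheme, or executor containers fail at launch.
if grep -A 1 'yarn.nodemanager.local-dirs' "$YARN_SITE" 2>/dev/null | grep -q 'file://'; then
  DIRS_CHECK="FAIL: local-dirs uses a file:// URI (see SPARK-13514)"
else
  DIRS_CHECK="ok: no file:// scheme in local-dirs (or property unset)"
fi

# 2. Step 3 above: the shuffle service jar must be on the NodeManager classpath.
SHUFFLE_JAR=$(ls "$HADOOP_HOME"/share/hadoop/yarn/spark-*-yarn-shuffle.jar 2>/dev/null | head -n 1)
JAR_CHECK="${SHUFFLE_JAR:-FAIL: no spark-*-yarn-shuffle.jar under share/hadoop/yarn}"

# 3. Step 2 above: spark_shuffle must be registered as an aux service.
if grep -q 'spark_shuffle' "$YARN_SITE" 2>/dev/null; then
  AUX_CHECK="ok: spark_shuffle appears in yarn-site.xml"
else
  AUX_CHECK="FAIL: spark_shuffle not found in yarn-site.xml"
fi

printf '%s\n%s\n%s\n' "$DIRS_CHECK" "$JAR_CHECK" "$AUX_CHECK"
```

Run it on the worker (NodeManager) machine: the jar and aux-service checks matter there, since that is the process that must load YarnShuffleService.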
>>
>
>--
>Marcelo

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org