Sorry for the late reply.

<property>
  <name>yarn.nodemanager.log-dirs</name>
  <value>/local/output/logs/nm-log-dir</value>
</property>

We do not use file:// in the settings, so that should not be the problem. Any other guesses?

Weifeng

On 5/20/16, 2:40 PM, "David Newberger" <david.newber...@wandcorp.com> wrote:

>Hi All,
>
>The error you are seeing looks really similar to SPARK-13514 to me. I could be
>wrong though.
>
>https://issues.apache.org/jira/browse/SPARK-13514
>
>Can you check yarn.nodemanager.local-dirs in your YARN configuration for
>"file://"?
>
>Cheers!
>David Newberger
>
>-----Original Message-----
>From: Cui, Weifeng [mailto:weife...@a9.com]
>Sent: Friday, May 20, 2016 4:26 PM
>To: Marcelo Vanzin
>Cc: Ted Yu; Rodrick Brown; user; Zhao, Jun; Aulakh, Sahib; Song, Yiwei
>Subject: Re: Can not set spark dynamic resource allocation
>
>Sorry, here is the node-manager log; application_1463692924309_0002 is my
>test. Hope this will help.
>http://pastebin.com/0BPEcgcW
>
>On 5/20/16, 2:09 PM, "Marcelo Vanzin" <van...@cloudera.com> wrote:
>
>>Hi Weifeng,
>>
>>That's the Spark event log, not the YARN application log. You get the
>>latter using the "yarn logs" command.
>>
>>On Fri, May 20, 2016 at 1:14 PM, Cui, Weifeng <weife...@a9.com> wrote:
>>> Here is the application log for this spark job.
>>>
>>> http://pastebin.com/2UJS9L4e
>>>
>>> Thanks,
>>> Weifeng
>>>
>>> From: "Aulakh, Sahib" <aula...@a9.com>
>>> Date: Friday, May 20, 2016 at 12:43 PM
>>> To: Ted Yu <yuzhih...@gmail.com>
>>> Cc: Rodrick Brown <rodr...@orchardplatform.com>, Cui Weifeng <weife...@a9.com>,
>>> user <user@spark.apache.org>, "Zhao, Jun" <junz...@a9.com>
>>> Subject: Re: Can not set spark dynamic resource allocation
>>>
>>> Yes, it is YARN. We have configured the Spark shuffle service with the
>>> YARN node manager, but something must be off.
>>>
>>> We will send you the app log on Pastebin.
>>>
>>> On May 20, 2016, at 12:35 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>
>>> Since yarn-site.xml was cited, I assume the cluster runs YARN.
>>>
>>> On Fri, May 20, 2016 at 12:30 PM, Rodrick Brown
>>> <rodr...@orchardplatform.com> wrote:
>>>
>>> Is this YARN or Mesos? For the latter you need to start an external
>>> shuffle service.
>>>
>>> On Fri, May 20, 2016 at 11:48 AM -0700, "Cui, Weifeng" <weife...@a9.com>
>>> wrote:
>>>
>>> Hi guys,
>>>
>>> Our team has a Hadoop 2.6.0 cluster with Spark 1.6.1. We want to set up
>>> dynamic resource allocation for Spark, and we followed the link below.
>>> After the changes, all Spark jobs failed.
>>>
>>> https://spark.apache.org/docs/latest/job-scheduling.html#dynamic-resource-allocation
>>>
>>> This test was on a test cluster which has 1 master machine (running
>>> namenode, resourcemanager and hive server), 1 worker machine (running
>>> datanode and nodemanager) and 1 machine as client (running spark shell).
>>>
>>> What I updated in the config:
>>>
>>> 1. Update spark-defaults.conf
>>>
>>> spark.dynamicAllocation.enabled true
>>> spark.shuffle.service.enabled true
>>>
>>> 2. Update yarn-site.xml
>>>
>>> <property>
>>>   <name>yarn.nodemanager.aux-services</name>
>>>   <value>mapreduce_shuffle,spark_shuffle</value>
>>> </property>
>>>
>>> <property>
>>>   <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
>>>   <value>org.apache.spark.network.yarn.YarnShuffleService</value>
>>> </property>
>>>
>>> <property>
>>>   <name>spark.shuffle.service.enabled</name>
>>>   <value>true</value>
>>> </property>
>>>
>>> 3. Copy spark-1.6.1-yarn-shuffle.jar into yarn.application.classpath
>>> ($HADOOP_HOME/share/hadoop/yarn/*); this step is done in our Python code.
>>>
>>> 4. Restart namenode, datanode, resourcemanager, nodemanager... restart
>>> everything.
>>>
>>> 5. The config is updated on all machines, resourcemanager and nodemanager.
>>> We update the config in one place and copy it to all machines.
>>>
>>> What I tested:
>>>
>>> 1. I started a Scala spark shell and checked its environment variables;
>>> spark.dynamicAllocation.enabled is true.
>>>
>>> 2. I used the following code:
>>>
>>> scala> val line = sc.textFile("/spark-events/application_1463681113470_0006")
>>> line: org.apache.spark.rdd.RDD[String] =
>>> /spark-events/application_1463681113470_0006 MapPartitionsRDD[1] at
>>> textFile at <console>:27
>>>
>>> scala> line.count   // this command just got stuck here
>>>
>>> 3. In the beginning there is only 1 executor (this is for the driver), and
>>> after line.count I could see 3 executors, which then dropped back to 1.
>>>
>>> 4. Several jobs were launched and all of them failed. Tasks (for all
>>> stages): Succeeded/Total: 0/2 (4 failed)
>>>
>>> Error messages:
>>>
>>> I found the following messages in the Spark web UI. I found the same thing
>>> in spark.log on the nodemanager machine as well.
>>>
>>> ExecutorLostFailure (executor 1 exited caused by one of the running tasks)
>>> Reason: Container marked as failed: container_1463692924309_0002_01_000002
>>> on host: xxxxxxxxxxxxxxx.com. Exit status: 1. Diagnostics: Exception
>>> from container-launch.
>>> Container id: container_1463692924309_0002_01_000002
>>> Exit code: 1
>>> Stack trace: ExitCodeException exitCode=1:
>>>     at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
>>>     at org.apache.hadoop.util.Shell.run(Shell.java:455)
>>>     at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
>>>     at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
>>>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
>>>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
>>>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>>     at java.lang.Thread.run(Thread.java:745)
>>>
>>> Container exited with a non-zero exit code 1
>>>
>>> Thanks a lot for the help. We can provide more information if needed.
>>>
>>> Thanks,
>>> Weifeng
>>
>>
>>--
>>Marcelo
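
For anyone reproducing the setup above: the two spark-defaults.conf entries
from the thread can also be set programmatically on a SparkConf. The sketch
below is illustrative only; the application name and the min/max executor
bounds are assumptions added for the example, and only the two "enabled"
flags are taken from the thread.

    import org.apache.spark.{SparkConf, SparkContext}

    // Minimal sketch of a driver that enables dynamic allocation.
    // Only the two "enabled" flags come from the thread; the executor
    // bounds and app name below are illustrative, not Weifeng's values.
    object DynamicAllocationSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("dynamic-allocation-sketch")            // illustrative name
          .set("spark.dynamicAllocation.enabled", "true")     // as in spark-defaults.conf above
          .set("spark.shuffle.service.enabled", "true")       // requires the YARN shuffle service
          .set("spark.dynamicAllocation.minExecutors", "1")   // illustrative bound
          .set("spark.dynamicAllocation.maxExecutors", "10")  // illustrative bound

        val sc = new SparkContext(conf)
        // Same kind of test as in the thread: read a file and count its lines.
        val lines = sc.textFile("/spark-events/application_1463681113470_0006")
        println(s"line count: ${lines.count()}")
        sc.stop()
      }
    }

Submitted with spark-submit against YARN, this should show the same executor
scale-up and scale-down that Weifeng describes, provided the NodeManager-side
spark_shuffle service is registered correctly.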
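
The settings can also be confirmed from inside the spark-shell before running
anything heavy. A small sketch, assuming the shell's usual sc; the expected
values in the comments reflect the configuration reported in the thread.

    // Check that the dynamic-allocation flags were actually picked up.
    sc.getConf.getOption("spark.dynamicAllocation.enabled")  // expected: Some("true")
    sc.getConf.getOption("spark.shuffle.service.enabled")    // expected: Some("true")

    // Rough count of executors currently registered with the driver
    // (the driver itself also appears in this map).
    sc.getExecutorMemoryStatus.size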