Sorry, here is the node-manager log; application_1463692924309_0002 is my test application.
Hope this helps.
http://pastebin.com/0BPEcgcW



On 5/20/16, 2:09 PM, "Marcelo Vanzin" <van...@cloudera.com> wrote:

>Hi Weifeng,
>
>That's the Spark event log, not the YARN application log. You get the
>latter using the "yarn logs" command.
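>
>For example, using the application ID that appears in the container name in the
>error message below, something like this should print the aggregated container
>logs (assuming log aggregation is enabled on the cluster):
>
>    yarn logs -applicationId application_1463692924309_0002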
>
>On Fri, May 20, 2016 at 1:14 PM, Cui, Weifeng <weife...@a9.com> wrote:
>> Here is the application log for this spark job.
>>
>> http://pastebin.com/2UJS9L4e
>>
>>
>>
>> Thanks,
>> Weifeng
>>
>>
>>
>>
>>
>> From: "Aulakh, Sahib" <aula...@a9.com>
>> Date: Friday, May 20, 2016 at 12:43 PM
>> To: Ted Yu <yuzhih...@gmail.com>
>> Cc: Rodrick Brown <rodr...@orchardplatform.com>, Cui Weifeng
>> <weife...@a9.com>, user <user@spark.apache.org>, "Zhao, Jun"
>> <junz...@a9.com>
>> Subject: Re: Can not set spark dynamic resource allocation
>>
>>
>>
>> Yes, it is YARN. We have configured the Spark shuffle service with the YARN node
>> manager, but something must be off.
>>
>>
>>
>> We will send you the application log via Pastebin.
>>
>>
>>
>> On May 20, 2016, at 12:35 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>
>> Since yarn-site.xml was cited, I assume the cluster runs YARN.
>>
>>
>>
>> On Fri, May 20, 2016 at 12:30 PM, Rodrick Brown
>> <rodr...@orchardplatform.com> wrote:
>>
>> Is this YARN or Mesos? For the latter you need to start an external shuffle
>> service.
>>
>>
>>
>>
>>
>>
>> On Fri, May 20, 2016 at 11:48 AM -0700, "Cui, Weifeng" <weife...@a9.com>
>> wrote:
>>
>> Hi guys,
>>
>>
>>
>> Our team has a Hadoop 2.6.0 cluster with Spark 1.6.1. We want to enable dynamic
>> resource allocation for Spark and we followed the guide at the link below. After
>> the changes, all Spark jobs failed.
>>
>> https://spark.apache.org/docs/latest/job-scheduling.html#dynamic-resource-allocation
>>
>> This test was on a test cluster with 1 master machine (running the namenode,
>> resourcemanager and hive server), 1 worker machine (running the datanode and
>> nodemanager) and 1 client machine (running the spark shell).
>>
>>
>>
>> What I updated in the config:
>>
>>
>>
>> 1. Update spark-defaults.conf
>>
>>         spark.dynamicAllocation.enabled     true
>>
>>         spark.shuffle.service.enabled            true
>>
>>
>>
>> 2. Update yarn-site.xml
>>
>>         <property>
>>             <name>yarn.nodemanager.aux-services</name>
>>             <value>mapreduce_shuffle,spark_shuffle</value>
>>         </property>
>>
>>         <property>
>>             <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
>>             <value>org.apache.spark.network.yarn.YarnShuffleService</value>
>>         </property>
>>
>>         <property>
>>             <name>spark.shuffle.service.enabled</name>
>>             <value>true</value>
>>         </property>
>>
>> 3. Copy spark-1.6.1-yarn-shuffle.jar into a directory on yarn.application.classpath
>> ($HADOOP_HOME/share/hadoop/yarn/*); this step is done by our Python code (see the
>> sketch after step 5).
>>
>> 4. Restart the namenode, datanode, resourcemanager, nodemanager... restart
>> everything.
>>
>> 5. The config is the same on all machines (resourcemanager and nodemanager): we
>> update it in one place and copy it to all machines.
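>>
>> A minimal sketch of step 3, assuming the shuffle jar is taken from the lib/
>> directory of the Spark 1.6.1 distribution and that SPARK_HOME and HADOOP_HOME are
>> set (the exact paths may differ on your cluster):
>>
>>         cp $SPARK_HOME/lib/spark-1.6.1-yarn-shuffle.jar $HADOOP_HOME/share/hadoop/yarn/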
>>
>>
>>
>> What I tested:
>>
>>
>>
>> 1. I started a Scala spark-shell and checked its environment;
>> spark.dynamicAllocation.enabled is true (see the snippet after this list).
>>
>> 2. I ran the following code:
>>
>>         scala> val line = sc.textFile("/spark-events/application_1463681113470_0006")
>>         line: org.apache.spark.rdd.RDD[String] = /spark-events/application_1463681113470_0006 MapPartitionsRDD[1] at textFile at <console>:27
>>
>>         scala> line.count   // this command just gets stuck here
>>
>>
>>
>> 3. In the beginning there was only 1 executor (this one is for the driver), and
>> after line.count I could see 3 executors, which then dropped back to 1.
>>
>> 4. Several jobs were launched and all of them failed. Tasks (for all stages):
>> Succeeded/Total: 0/2 (4 failed)
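>>
>> As a quick check for item 1 above, the setting can also be read back from the
>> shell's SparkConf; a small sketch, assuming the property was set in
>> spark-defaults.conf as shown earlier:
>>
>>         scala> sc.getConf.get("spark.dynamicAllocation.enabled")
>>         res0: String = true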
>>
>>
>>
>> Error messages:
>>
>>
>>
>> I found the following message in the Spark web UI, and the same message appears in
>> spark.log on the nodemanager machine as well.
>>
>>
>>
>> ExecutorLostFailure (executor 1 exited caused by one of the running tasks)
>> Reason: Container marked as failed: container_1463692924309_0002_01_000002 on host: xxxxxxxxxxxxxxx.com. Exit status: 1. Diagnostics: Exception from container-launch.
>> Container id: container_1463692924309_0002_01_000002
>> Exit code: 1
>> Stack trace: ExitCodeException exitCode=1:
>>         at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
>>         at org.apache.hadoop.util.Shell.run(Shell.java:455)
>>         at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
>>         at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
>>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
>>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
>>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>         at java.lang.Thread.run(Thread.java:745)
>>
>> Container exited with a non-zero exit code 1
>>
>>
>>
>> Thanks a lot for the help. We can provide more information if needed.
>>
>>
>>
>> Thanks,
>> Weifeng
>>
>>
>
>
>
>-- 
>Marcelo

