Were all NodeManager services restarted after the change in yarn-site.xml?

On Thu, Mar 3, 2016 at 6:00 AM, Jeff Zhang <zjf...@gmail.com> wrote:
> The executor may fail to start. You need to check the executor logs; if
> there is no executor log, then you need to check the node manager log.
>
> On Wed, Mar 2, 2016 at 4:26 PM, Xiaoye Sun <sunxiaoy...@gmail.com> wrote:
>
>> Hi all,
>>
>> I am very new to Spark and YARN.
>>
>> I am running the BroadcastTest example application using Spark 1.6.0 and
>> Hadoop/YARN 2.7.1 on a 5-node cluster.
>>
>> I wrote my configuration files according to
>> https://spark.apache.org/docs/latest/job-scheduling.html#dynamic-resource-allocation
>>
>> 1. copy
>> ./spark-1.6.0/network/yarn/target/scala-2.10/spark-1.6.0-yarn-shuffle.jar
>> to /hadoop-2.7.1/share/hadoop/yarn/lib/
>> 2. yarn-site.xml is like this:
>> http://www.owlnet.rice.edu/~xs6/yarn-site.xml
>> 3. spark-defaults.conf is like this:
>> http://www.owlnet.rice.edu/~xs6/spark-defaults.conf
>> 4. spark-env.sh is like this: http://www.owlnet.rice.edu/~xs6/spark-env.sh
>> 5. the command I use to submit the Spark application is:
>> ./bin/spark-submit --class org.apache.spark.examples.BroadcastTest
>> --master yarn --deploy-mode cluster
>> ./examples/target/spark-examples_2.10-1.6.0.jar 1 10000000 Http
>>
>> However, the job is stuck in RUNNING status, and by looking at the log I
>> found that the executor fails and is relaunched frequently.
>> Here is the log output: http://www.owlnet.rice.edu/~xs6/stderr
>> It shows something like:
>>
>> 16/03/02 02:07:35 WARN yarn.YarnAllocator: Container marked as failed:
>> container_1456905762620_0002_01_000002 on host: bold-x.rice.edu. Exit
>> status: 1. Diagnostics: Exception from container-launch.
>>
>> Does anybody know what the problem is here?
>> Best,
>> Xiaoye
>
> --
> Best Regards
>
> Jeff Zhang
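For reference, setting up the Spark external shuffle service for dynamic allocation (the configuration the linked yarn-site.xml at www.owlnet.rice.edu should contain, per the Spark 1.6 docs) means registering spark_shuffle as an auxiliary service in yarn-site.xml on every NodeManager. A sketch of the relevant properties, assuming the default mapreduce_shuffle service is also kept:

```xml
<!-- yarn-site.xml fragment (on every NodeManager host) -->
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle,spark_shuffle</value>
</property>
<property>
  <!-- class provided by spark-1.6.0-yarn-shuffle.jar, which must be on the
       NodeManager classpath (e.g. hadoop-2.7.1/share/hadoop/yarn/lib/) -->
  <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
  <value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>
```

The matching spark-defaults.conf entries are `spark.shuffle.service.enabled true` and `spark.dynamicAllocation.enabled true`. As the reply above notes, the change only takes effect after each NodeManager is restarted (e.g. `sbin/yarn-daemon.sh stop nodemanager` then `sbin/yarn-daemon.sh start nodemanager` on every node); a NodeManager that cannot load YarnShuffleService will fail executor containers with exit status 1, as seen in the quoted log.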