Hi Marcelo,

Thanks. I think something more subtle is happening.

I'm running a single-node cluster, so there's only 1 NM. When I executed the
exact same job the 4th time, the cluster was idle and nothing else was
running. The RM currently reports that I have 6.5 GB of memory and 4 CPUs
available. However, the job is still stuck in the "ACCEPTED" state a day
later. As I mentioned earlier, I'm able to execute Hadoop jobs fine even
now - this problem is specific to Spark.

Thanks,
-Matt

On Tue, Jun 9, 2015 at 12:32 PM, Marcelo Vanzin <van...@cloudera.com> wrote:

> If your application is stuck in that state, it generally means your
> cluster doesn't have enough resources to start it.
>
> In the RM logs you can see how many vcores / memory the application is
> asking for, and then you can check your RM configuration to see whether
> that's currently available on any single NM.
>
> On Tue, Jun 9, 2015 at 7:56 AM, Matt Kapilevich <matve...@gmail.com> wrote:
>
>> Hi all,
>>
>> I'm building Spark from source against the 1.4 branch and submitting the
>> job to YARN, and I'm seeing very strange behavior. The first 2 or 3 times
>> I submit the job, it runs fine, computes Pi, and exits. The next time I
>> run it, it gets stuck in the "ACCEPTED" state.
>>
>> I'm kicking off the job using "yarn-client" mode like this:
>>
>> ./bin/spark-submit --class org.apache.spark.examples.SparkPi \
>>   --master yarn-client --num-executors 3 --driver-memory 4g \
>>   --executor-memory 2g --executor-cores 1 --queue thequeue \
>>   examples/target/scala-2.10/spark-examples*.jar 10
>>
>> Here's what the ResourceManager shows: [image: Yarn ResourceManager UI]
>>
>> In the YARN ResourceManager logs, all I'm seeing is this:
>>
>> 2015-06-08 14:49:57,166 INFO
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler:
>> Added Application Attempt appattempt_1433789077942_0004_000001 to scheduler
>> from user: root
>> 2015-06-08 14:49:57,166 INFO
>> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
>> appattempt_1433789077942_0004_000001 State change from SUBMITTED to
>> SCHEDULED
>>
>> There's nothing in the NodeManager logs (though it's up and running); the
>> job isn't getting that far.
>>
>> It seems to me that there's an issue somewhere in the Spark 1.4 / YARN
>> integration. Hadoop itself runs without any issues - I've run the job
>> below multiple times:
>>
>> yarn jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.4.2.jar pi 16 100
>>
>> For reference, I'm compiling the source against the 1.4 branch and
>> running it on a single-node cluster with CDH 5.4 and Hadoop 2.6,
>> distributed mode. I'm using the following to compile:
>>
>> mvn -Phadoop-2.6 -Dhadoop.version=2.6.0 -Pyarn -Phive -Phive-thriftserver -DskipTests clean package
>>
>> Any help appreciated.
>>
>> Thanks,
>> -Matt
>>
>
>
> --
> Marcelo
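[Editor's note] Marcelo's suggestion above boils down to comparing the job's container requests against the per-node limits in the RM configuration. A sketch of the yarn-site.xml properties worth checking on the single NM - the values shown here are illustrative assumptions, not recommendations or the poster's actual settings:

```xml
<!-- yarn-site.xml: illustrative values only -->
<property>
  <!-- Total memory one NodeManager offers to containers -->
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>6656</value>
</property>
<property>
  <!-- Largest single container the scheduler will grant -->
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>6656</value>
</property>
<property>
  <!-- Vcores one NodeManager offers to containers -->
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>4</value>
</property>
```

If any single requested container exceeds the maximum allocation, or the queue's share of these totals is already spoken for, the application sits in ACCEPTED/SCHEDULED exactly as the log lines above show.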
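[Editor's note] The memory Spark asks YARN for per container is larger than the `--executor-memory`/`--driver-memory` heap: an overhead is added, and YARN rounds the request up to its minimum-allocation increment. A minimal sketch of that arithmetic, assuming Spark 1.4's defaults (384 MB overhead floor, 10% overhead factor) and YARN's default 1024 MB allocation increment:

```python
import math

def yarn_container_mb(heap_mb, min_alloc_mb=1024,
                      overhead_floor_mb=384, overhead_factor=0.10):
    """Approximate the container size Spark requests from YARN for one
    executor (or driver/AM): heap plus overhead, rounded up to a multiple
    of yarn.scheduler.minimum-allocation-mb."""
    overhead = max(overhead_floor_mb, int(heap_mb * overhead_factor))
    requested = heap_mb + overhead
    return int(math.ceil(requested / float(min_alloc_mb)) * min_alloc_mb)

# --executor-memory 2g: 2048 + 384 overhead = 2432, rounded up to 3072
print(yarn_container_mb(2048))  # 3072
# --driver-memory 4g (note: in yarn-client mode the driver runs locally,
# so only the small AM container and the executors land on YARN)
print(yarn_container_mb(4096))  # 5120
```

Under these assumptions, three 2 GB executors alone consume 3 x 3072 MB = 9 GB of YARN memory, which is more than the 6.5 GB the RM reports free - one plausible reason the job never leaves ACCEPTED.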