Still the same. I increased the memory of the node hosting the ResourceManager to 5 GB. I also spotted an HDFS alert about the replication factor being 3, so I dropped it to match the number of data nodes (one way to do that is sketched below). I also shut down all services not in use. Still the issue remains.
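A minimal sketch of lowering the replication factor on an existing cluster; the "hdfs" superuser, the target path "/", and the count 2 are assumptions here, not necessarily the exact values used:

# Re-replicate existing files down to 2 copies; -w waits for completion.
sudo -u hdfs hdfs dfs -setrep -R -w 2 /
# Files created later follow dfs.replication in hdfs-site.xml (or the
# equivalent Cloudera Manager setting), so lower that as well.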
I have noticed the following two events that fire when I start the Spark run:

Zookeeper: caught end of stream exception
Yarn: The specific max attempts: 0 for application: 1 is invalid, because it is out of the range [1, 2]. Use the global max attempts

-jan

On 20 May 2014, at 11:14, Jan Holmberg <jan.holmb...@perigeum.fi<mailto:jan.holmb...@perigeum.fi>> wrote:

Hi,
each node has 4 GB of memory. After a total reboot and a re-run of SparkPi, the ResourceManager shows no running containers and 1 pending container.

-jan

On 20 May 2014, at 10:24, <sandy.r...@cloudera.com<mailto:sandy.r...@cloudera.com>> wrote:

Hi Jan,

How much memory capacity is configured for each node? If you go to the ResourceManager web UI, does it indicate any containers are running?

-Sandy

On May 19, 2014, at 11:43 PM, Jan Holmberg <jan.holmb...@perigeum.fi<mailto:jan.holmb...@perigeum.fi>> wrote:

Hi,
I'm new to Spark and trying to test my first Spark program. I'm running SparkPi successfully in yarn-client mode, but when running the same in yarn-standalone mode, the app gets stuck in the ACCEPTED state. I've spent hours hunting down the reason, but the outcome is always the same. Any hints on what to look for next?

cheers,
-jan

vagrant@vm-cluster-node1:~$ ./run_pi.sh
14/05/20 06:24:04 INFO RMProxy: Connecting to ResourceManager at vm-cluster-node2/10.211.55.101:8032
14/05/20 06:24:05 INFO Client: Got Cluster metric info from ApplicationsManager (ASM), number of NodeManagers: 2
14/05/20 06:24:05 INFO Client: Queue info ... queueName: root.default, queueCurrentCapacity: 0.0, queueMaxCapacity: -1.0, queueApplicationCount = 0, queueChildQueueCount = 0
14/05/20 06:24:05 INFO Client: Max mem capabililty of a single resource in this cluster 2048
14/05/20 06:24:05 INFO Client: Preparing Local resources
14/05/20 06:24:05 INFO Client: Uploading file:/opt/cloudera/parcels/CDH-5.0.0-1.cdh5.0.0.p0.47/lib/spark/assembly/lib/spark-assembly_2.10-0.9.0-cdh5.0.0-hadoop2.3.0-cdh5.0.0.jar to hdfs://vm-cluster-node2:8020/user/vagrant/.sparkStaging/application_1400563733088_0012/spark-assembly_2.10-0.9.0-cdh5.0.0-hadoop2.3.0-cdh5.0.0.jar
14/05/20 06:24:07 INFO Client: Setting up the launch environment
14/05/20 06:24:07 INFO Client: Setting up container launch context
14/05/20 06:24:07 INFO Client: Command for starting the Spark ApplicationMaster: java -server -Xmx1024m -Djava.io.tmpdir=$PWD/tmp org.apache.spark.deploy.yarn.ApplicationMaster --class org.apache.spark.examples.SparkPi --jar /opt/cloudera/parcels/CDH-5.0.0-1.cdh5.0.0.p0.47/lib/spark/assembly/lib/spark-assembly_2.10-0.9.0-cdh5.0.0-hadoop2.3.0-cdh5.0.0.jar --args 'yarn-standalone' --args '10' --worker-memory 500 --worker-cores 1 --num-workers 1 1> <LOG_DIR>/stdout 2> <LOG_DIR>/stderr
14/05/20 06:24:07 INFO Client: Submitting application to ASM
14/05/20 06:24:07 INFO YarnClientImpl: Submitted application application_1400563733088_0012
14/05/20 06:24:08 INFO Client: Application report from ASM: <THIS PART KEEPS REPEATING FOREVER>
    application identifier: application_1400563733088_0012
    appId: 12
    clientToAMToken: null
    appDiagnostics:
    appMasterHost: N/A
    appQueue: root.vagrant
    appMasterRpcPort: -1
    appStartTime: 1400567047343
    yarnAppState: ACCEPTED
    distributedFinalState: UNDEFINED
    appTrackingUrl: http://vm-cluster-node2:8088/proxy/application_1400563733088_0012/
    appUser: vagrant

Log files give me no additional help.
The latest log entry just acknowledges the status change:

hadoop-yarn/hadoop-cmf-yarn-RESOURCEMANAGER-vm-cluster-node2.log.out:2014-05-20 06:24:07,347 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1400563733088_0012 State change from SUBMITTED to ACCEPTED

I'm running the example in a local test environment with three virtual nodes on Cloudera (CDH5). Below is run_pi.sh:

#!/bin/bash

export SPARK_HOME=/opt/cloudera/parcels/CDH-5.0.0-1.cdh5.0.0.p0.47/lib/spark
export STANDALONE_SPARK_MASTER_HOST=vm-cluster-node2
export SPARK_MASTER_PORT=7077
export DEFAULT_HADOOP_HOME=/opt/cloudera/parcels/CDH-5.0.0-1.cdh5.0.0.p0.47/lib/hadoop
export SPARK_JAR_HDFS_PATH=/user/spark/share/lib/spark-assembly.jar
export SPARK_LAUNCH_WITH_SCALA=0
export SPARK_LIBRARY_PATH=${SPARK_HOME}/lib
export SCALA_LIBRARY_PATH=${SPARK_HOME}/lib
export SPARK_MASTER_IP=$STANDALONE_SPARK_MASTER_HOST
export HADOOP_HOME=${HADOOP_HOME:-$DEFAULT_HADOOP_HOME}

if [ -n "$HADOOP_HOME" ]; then
  export SPARK_LIBRARY_PATH=$SPARK_LIBRARY_PATH:${HADOOP_HOME}/lib/native
fi

export SPARK_JAR=hdfs://vm-cluster-node2:8020/user/spark/share/lib/spark-assembly.jar
APP_JAR=/opt/cloudera/parcels/CDH-5.0.0-1.cdh5.0.0.p0.47/lib/spark/assembly/lib/spark-assembly_2.10-0.9.0-cdh5.0.0-hadoop2.3.0-cdh5.0.0.jar

$SPARK_HOME/bin/spark-class org.apache.spark.deploy.yarn.Client \
  --jar $APP_JAR \
  --class org.apache.spark.examples.SparkPi \
  --args yarn-standalone \
  --args 10 \
  --num-workers 1 \
  --master-memory 1g \
  --worker-memory 500m \
  --worker-cores 1
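A note for anyone else debugging this: an application that stays in ACCEPTED usually means YARN cannot allocate a container for the ApplicationMaster. Here the AM command uses -Xmx1024m while the client log reports a 2048 MB maximum container size, so a single undersized or fully occupied NodeManager could be enough to block it. A minimal sketch for checking actual capacity, assuming the standard Hadoop 2.x ResourceManager REST API and the hostname/port from the logs above:

# Cluster totals: compare availableMB against the memory the AM requests
curl http://vm-cluster-node2:8088/ws/v1/cluster/metrics
# Per-NodeManager view: usedMemoryMB and availMemoryMB for each node
curl http://vm-cluster-node2:8088/ws/v1/cluster/nodes

If availableMB turns out to be too small, raising yarn.nodemanager.resource.memory-mb (and, if needed, yarn.scheduler.maximum-allocation-mb) in yarn-site.xml and restarting YARN is the usual remedy. The "Use the global max attempts" event quoted earlier, for what it is worth, appears to be YARN merely falling back to the global yarn.resourcemanager.am.max-attempts default of 2 rather than the cause of the hang.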