I found a warning in the NodeManager log (below). Does it mean the container exceeded its virtual memory limit? How should I configure YARN to solve this problem?

2016-10-21 10:41:12,588 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 20299 for container-id container_1477017445921_0001_02_000001: 335.1 MB of 1 GB physical memory used; 2.2 GB of 2.1 GB virtual memory used
2016-10-21 10:41:12,589 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Process tree for container: container_1477017445921_0001_02_000001 has processes older than 1 iteration running over the configured limit. Limit=2254857728, current usage = 2338873344
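The limit of 2254857728 bytes is about 2.1 GB, i.e. roughly 2.1 times the container's 1 GB of physical memory, so my guess is that the default yarn.nodemanager.vmem-pmem-ratio of 2.1 is what is being exceeded. Would raising that ratio, or disabling the virtual-memory check, in yarn-site.xml on the NodeManagers be the right fix? A rough sketch of what I have in mind (the values below are only examples, not tested):

  <!-- yarn-site.xml on each NodeManager; restart the NodeManagers afterwards -->
  <property>
    <!-- allow more virtual memory per unit of physical memory (default is 2.1) -->
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>4</value>
  </property>
  <property>
    <!-- or turn the virtual-memory check off entirely -->
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
  </property>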
On Fri, Oct 21, 2016 at 8:49 AM, Saisai Shao <sai.sai.s...@gmail.com> wrote:
> It is not that Spark has difficulty communicating with YARN; it simply means the AM exited with the FINISHED state.
>
> I'm guessing it might be related to memory constraints for the container; please check the YARN RM and NM logs to find out more details.
>
> Thanks
> Saisai
>
> On Fri, Oct 21, 2016 at 8:14 AM, Xi Shen <davidshe...@gmail.com> wrote:
>>
>> 16/10/20 18:12:14 ERROR cluster.YarnClientSchedulerBackend: Yarn application has already exited with state FINISHED!
>>
>> From this, I think Spark is having difficulty communicating with YARN. You should check your Spark log.
>>
>> On Fri, Oct 21, 2016 at 8:06 AM Li Li <fancye...@gmail.com> wrote:
>>>
>>> Which log file should I check?
>>>
>>> On Thu, Oct 20, 2016 at 10:02 PM, Saisai Shao <sai.sai.s...@gmail.com> wrote:
>>> > Looks like the ApplicationMaster was killed by SIGTERM:
>>> >
>>> > 16/10/20 18:12:04 ERROR yarn.ApplicationMaster: RECEIVED SIGNAL TERM
>>> > 16/10/20 18:12:04 INFO yarn.ApplicationMaster: Final app status:
>>> >
>>> > This container may have been killed by the YARN NodeManager or another process; you'd better check the YARN logs to dig out more details.
>>> >
>>> > Thanks
>>> > Saisai
>>> >
>>> > On Thu, Oct 20, 2016 at 6:51 PM, Li Li <fancye...@gmail.com> wrote:
>>> >>
>>> >> I am setting up a small YARN/Spark cluster. The Hadoop/YARN version is 2.7.3 and I can run the wordcount MapReduce job correctly on YARN.
>>> >> I am using spark-2.0.1-bin-hadoop2.7 with this command:
>>> >>
>>> >> ~/spark-2.0.1-bin-hadoop2.7$ ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-client examples/jars/spark-examples_2.11-2.0.1.jar 10000
>>> >>
>>> >> It fails, and the first error is:
>>> >>
>>> >> 16/10/20 18:12:03 INFO storage.BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 10.161.219.189, 39161)
>>> >> 16/10/20 18:12:03 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@76ad6715{/metrics/json,null,AVAILABLE}
>>> >> 16/10/20 18:12:12 INFO cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster registered as NettyRpcEndpointRef(null)
>>> >> 16/10/20 18:12:12 INFO cluster.YarnClientSchedulerBackend: Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> ai-hz1-spark1, PROXY_URI_BASES -> http://ai-hz1-spark1:8088/proxy/application_1476957324184_0002), /proxy/application_1476957324184_0002
>>> >> 16/10/20 18:12:12 INFO ui.JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
>>> >> 16/10/20 18:12:12 INFO cluster.YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after waiting maxRegisteredResourcesWaitingTime: 30000(ms)
>>> >> 16/10/20 18:12:12 WARN spark.SparkContext: Use an existing SparkContext, some configuration may not take effect.
>>> >> 16/10/20 18:12:12 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@489091bd{/SQL,null,AVAILABLE}
>>> >> 16/10/20 18:12:12 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1de9b505{/SQL/json,null,AVAILABLE}
>>> >> 16/10/20 18:12:12 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@378f002a{/SQL/execution,null,AVAILABLE}
>>> >> 16/10/20 18:12:12 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2cc75074{/SQL/execution/json,null,AVAILABLE}
>>> >> 16/10/20 18:12:12 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2d64160c{/static/sql,null,AVAILABLE}
>>> >> 16/10/20 18:12:12 INFO internal.SharedState: Warehouse path is '/home/hadoop/spark-2.0.1-bin-hadoop2.7/spark-warehouse'.
>>> >> 16/10/20 18:12:13 INFO spark.SparkContext: Starting job: reduce at SparkPi.scala:38
>>> >> 16/10/20 18:12:13 INFO scheduler.DAGScheduler: Got job 0 (reduce at SparkPi.scala:38) with 10000 output partitions
>>> >> 16/10/20 18:12:13 INFO scheduler.DAGScheduler: Final stage: ResultStage 0 (reduce at SparkPi.scala:38)
>>> >> 16/10/20 18:12:13 INFO scheduler.DAGScheduler: Parents of final stage: List()
>>> >> 16/10/20 18:12:13 INFO scheduler.DAGScheduler: Missing parents: List()
>>> >> 16/10/20 18:12:13 INFO scheduler.DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34), which has no missing parents
>>> >> 16/10/20 18:12:13 INFO memory.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1832.0 B, free 366.3 MB)
>>> >> 16/10/20 18:12:13 INFO memory.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1169.0 B, free 366.3 MB)
>>> >> 16/10/20 18:12:13 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on 10.161.219.189:39161 (size: 1169.0 B, free: 366.3 MB)
>>> >> 16/10/20 18:12:13 INFO spark.SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1012
>>> >> 16/10/20 18:12:13 INFO scheduler.DAGScheduler: Submitting 10000 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34)
>>> >> 16/10/20 18:12:13 INFO cluster.YarnScheduler: Adding task set 0.0 with 10000 tasks
>>> >> 16/10/20 18:12:14 ERROR cluster.YarnClientSchedulerBackend: Yarn application has already exited with state FINISHED!
>>> >> 16/10/20 18:12:14 INFO server.ServerConnector: Stopped ServerConnector@389adf1d{HTTP/1.1}{0.0.0.0:4040}
>>> >> 16/10/20 18:12:14 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@841e575{/stages/stage/kill,null,UNAVAILABLE}
>>> >> 16/10/20 18:12:14 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@66629f63{/api,null,UNAVAILABLE}
>>> >> 16/10/20 18:12:14 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@2b62442c{/,null,UNAVAILABLE}
>>> >>
>>> >> I also used yarn log to get the logs from YARN (the full log is very lengthy; see the attachment):
>>> >>
>>> >> 16/10/20 18:12:03 INFO yarn.ExecutorRunnable:
>>> >> ===============================================================================
>>> >> YARN executor launch context:
>>> >>   env:
>>> >>     CLASSPATH -> {{PWD}}<CPS>{{PWD}}/__spark_conf__<CPS>{{PWD}}/__spark_libs__/*<CPS>$HADOOP_CONF_DIR<CPS>$HADOOP_COMMON_HOME/share/hadoop/common/*<CPS>$HADOOP_COMMON_HOME/share/hadoop/common/lib/*<CPS>$HADOOP_HDFS_HOME/share/hadoop/hdfs/*<CPS>$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*<CPS>$HADOOP_YARN_HOME/share/hadoop/yarn/*<CPS>$HADOOP_YARN_HOME/share/hadoop/yarn/lib/*<CPS>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*<CPS>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*
>>> >>     SPARK_LOG_URL_STDERR -> http://ai-hz1-spark3:8042/node/containerlogs/container_1476957324184_0002_01_000003/hadoop/stderr?start=-4096
>>> >>     SPARK_YARN_STAGING_DIR -> hdfs://ai-hz1-spark1/user/hadoop/.sparkStaging/application_1476957324184_0002
>>> >>     SPARK_USER -> hadoop
>>> >>     SPARK_YARN_MODE -> true
>>> >>     SPARK_LOG_URL_STDOUT -> http://ai-hz1-spark3:8042/node/containerlogs/container_1476957324184_0002_01_000003/hadoop/stdout?start=-4096
>>> >>
>>> >>   command:
>>> >>     {{JAVA_HOME}}/bin/java -server -Xmx1024m -Djava.io.tmpdir={{PWD}}/tmp '-Dspark.driver.port=60657' -Dspark.yarn.app.container.log.dir=<LOG_DIR> -XX:OnOutOfMemoryError='kill %p' org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://CoarseGrainedScheduler@10.161.219.189:60657 --executor-id 2 --hostname ai-hz1-spark3 --cores 1 --app-id application_1476957324184_0002 --user-class-path file:$PWD/__app__.jar 1> <LOG_DIR>/stdout 2> <LOG_DIR>/stderr
>>> >> ===============================================================================
>>> >>
>>> >> 16/10/20 18:12:03 INFO impl.ContainerManagementProtocolProxy: Opening proxy : ai-hz1-spark5:55857
>>> >> 16/10/20 18:12:03 INFO impl.ContainerManagementProtocolProxy: Opening proxy : ai-hz1-spark3:51061
>>> >> 16/10/20 18:12:04 ERROR yarn.ApplicationMaster: RECEIVED SIGNAL TERM
>>> >> 16/10/20 18:12:04 INFO yarn.ApplicationMaster: Final app status: UNDEFINED, exitCode: 16, (reason: Shutdown hook called before final status was reported.)
>>> >> 16/10/20 18:12:04 INFO util.ShutdownHookManager: Shutdown hook called
>>> >
>>>
>> --
>> Thanks,
>> David S.
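
Following up on Saisai's point above about container memory constraints: the killed container in the NodeManager warning, container_1477017445921_0001_02_000001, looks like the ApplicationMaster container, so another option besides changing the NodeManager settings would be to ask YARN for a bigger AM container, since the virtual-memory limit is computed as the ratio times the container's physical memory. A rough sketch for yarn-client mode (spark.yarn.am.memory and spark.yarn.am.memoryOverhead apply to the client-mode AM; the values below are only examples, not tested on this cluster):

  ~/spark-2.0.1-bin-hadoop2.7$ ./bin/spark-submit \
      --class org.apache.spark.examples.SparkPi \
      --master yarn-client \
      --conf spark.yarn.am.memory=1g \
      --conf spark.yarn.am.memoryOverhead=512 \
      examples/jars/spark-examples_2.11-2.0.1.jar 10000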