Are you using the capacity scheduler or the FIFO scheduler without multi-resource scheduling, by any chance?
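In case it helps to check, here is a minimal sketch for reading the relevant settings out of the Hadoop config XML. It assumes the scheduler is selected by `yarn.resourcemanager.scheduler.class` (normally in yarn-site.xml) and that multi-resource scheduling corresponds to `yarn.scheduler.capacity.resource-calculator` being set to the `DominantResourceCalculator` (normally in capacity-scheduler.xml). The helper names and the sample XML are made up for illustration, so adjust them to your cluster.

```python
import xml.etree.ElementTree as ET

# Property names as I understand them for YARN; treat them as assumptions
# and verify against the configuration reference for your Hadoop version.
SCHEDULER_KEY = "yarn.resourcemanager.scheduler.class"
CALCULATOR_KEY = "yarn.scheduler.capacity.resource-calculator"


def read_hadoop_conf(xml_text):
    """Parse Hadoop-style <configuration><property>... XML into a dict."""
    root = ET.fromstring(xml_text)
    conf = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            conf[name.strip()] = (value or "").strip()
    return conf


def summarize(conf):
    """Report the scheduler class name and whether multi-resource scheduling looks enabled."""
    scheduler = conf.get(SCHEDULER_KEY, "")
    calculator = conf.get(CALCULATOR_KEY, "")
    return {
        # Keep only the short class name; fall back to a note when the key is unset.
        "scheduler": scheduler.rsplit(".", 1)[-1] or "(not set, cluster default applies)",
        "multi_resource": calculator.endswith("DominantResourceCalculator"),
    }


# Hypothetical sample; in practice read the text of yarn-site.xml and
# capacity-scheduler.xml from the Hadoop conf directory and merge the dicts.
SAMPLE = """<configuration>
  <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
  </property>
</configuration>"""

print(summarize(read_hadoop_conf(SAMPLE)))
```

If `multi_resource` comes back false with the capacity scheduler, only memory is considered when sizing containers, which is the setup I am asking about.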

On Thu, Feb 12, 2015 at 1:51 PM, Anders Arpteg <arp...@spotify.com> wrote:

> The NM logs only seem to contain entries similar to the following. Nothing
> else in the same time range. Any help?
>
> 2015-02-12 20:47:31,245 WARN
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
> Event EventType: KILL_CONTAINER sent to absent container
> container_1422406067005_0053_01_000002
> 2015-02-12 20:47:31,246 WARN
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
> Event EventType: KILL_CONTAINER sent to absent container
> container_1422406067005_0053_01_000012
> 2015-02-12 20:47:31,246 WARN
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
> Event EventType: KILL_CONTAINER sent to absent container
> container_1422406067005_0053_01_000022
> 2015-02-12 20:47:31,246 WARN
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
> Event EventType: KILL_CONTAINER sent to absent container
> container_1422406067005_0053_01_000032
> 2015-02-12 20:47:31,246 WARN
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
> Event EventType: KILL_CONTAINER sent to absent container
> container_1422406067005_0053_01_000042
> 2015-02-12 21:24:30,515 WARN
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
> Event EventType: FINISH_APPLICATION sent to absent application
> application_1422406067005_0053
>
> On Thu, Feb 12, 2015 at 10:38 PM, Sandy Ryza <sandy.r...@cloudera.com>
> wrote:
>
>> It seems unlikely to me that it would be a 2.2 issue, though not entirely
>> impossible. Are you able to find any of the container logs? Is the
>> NodeManager launching containers and reporting some exit code?
>>
>> -Sandy
>>
>> On Thu, Feb 12, 2015 at 1:21 PM, Anders Arpteg <arp...@spotify.com>
>> wrote:
>>
>>> No, not submitting from Windows, from a Debian distribution. Had a quick
>>> look at the RM logs, and it seems some containers are allocated but then
>>> released again for some reason. Not easy to make sense of the logs, but
>>> here is a snippet from the logs (from a test in our small test cluster) if
>>> you'd like to have a closer look: http://pastebin.com/8WU9ivqC
>>>
>>> Sandy, sounds like it could possibly be a 2.2 issue then, or what do you
>>> think?
>>>
>>> Thanks,
>>> Anders
>>>
>>> On Thu, Feb 12, 2015 at 3:11 PM, Aniket Bhatnagar <
>>> aniket.bhatna...@gmail.com> wrote:
>>>
>>>> This is tricky to debug. Check the YARN NodeManager and ResourceManager
>>>> logs to see if you can trace the error. In the past I have had to look
>>>> closely at the arguments passed to the YARN container (they get logged
>>>> before the container launch is attempted). When that still gave no clue,
>>>> I had to check the script YARN generates to execute the container, and
>>>> even run it manually to find the line where the error occurred.
>>>>
>>>> BTW are you submitting the job from Windows?
>>>>
>>>> On Thu, Feb 12, 2015, 3:34 PM Anders Arpteg <arp...@spotify.com> wrote:
>>>>
>>>>> Interesting to hear that it works for you. Are you using YARN 2.2 as
>>>>> well? No strange log messages during startup, and I can't see any other
>>>>> log messages since no executor gets launched. It does not seem to work
>>>>> in yarn-client mode either, failing with the exception below.
>>>>>
>>>>> Exception in thread "main" org.apache.spark.SparkException: Yarn
>>>>> application has already ended! It might have been killed or unable to
>>>>> launch application master.
>>>>>         at
>>>>> org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:119)
>>>>>         at
>>>>> org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:59)
>>>>>         at
>>>>> org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:141)
>>>>>         at org.apache.spark.SparkContext.<init>(SparkContext.scala:370)
>>>>>         at
>>>>> com.spotify.analytics.AnalyticsSparkContext.<init>(AnalyticsSparkContext.scala:8)
>>>>>         at
>>>>> com.spotify.analytics.DataSampler$.main(DataSampler.scala:42)
>>>>>         at com.spotify.analytics.DataSampler.main(DataSampler.scala)
>>>>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>         at
>>>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>>>         at
>>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>>         at java.lang.reflect.Method.invoke(Method.java:597)
>>>>>         at
>>>>> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:551)
>>>>>         at
>>>>> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:155)
>>>>>         at
>>>>> org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:178)
>>>>>         at
>>>>> org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:99)
>>>>>         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>>>>
>>>>> /Anders
>>>>>
>>>>> On Thu, Feb 12, 2015 at 1:33 AM, Sandy Ryza <sandy.r...@cloudera.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Anders,
>>>>>>
>>>>>> I just tried this out and was able to successfully acquire
>>>>>> executors. Any strange log messages or additional color you can
>>>>>> provide on your setup? Does yarn-client mode work?
>>>>>>
>>>>>> -Sandy
>>>>>>
>>>>>> On Wed, Feb 11, 2015 at 1:28 PM, Anders Arpteg <arp...@spotify.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Compiled the latest master of Spark yesterday (2015-02-10) for
>>>>>>> Hadoop 2.2 and failed executing jobs in yarn-cluster mode for that
>>>>>>> build. Works successfully with Spark 1.2 (and also master from
>>>>>>> 2015-01-16), so something has changed since then that prevents the
>>>>>>> job from receiving any executors on the cluster.
>>>>>>>
>>>>>>> Basic symptoms are that the job fires up the AM, but after
>>>>>>> examining the "executors" page in the web UI, only the driver is
>>>>>>> listed, no executors are ever received, and the driver keeps waiting
>>>>>>> forever. Has anyone seen similar problems?
>>>>>>>
>>>>>>> Thanks for any insights,
>>>>>>> Anders