Here is the log from the YARN application, run on another cluster (this time CDH 5.7.0, but with a similar configuration). Check the hostnames: the configuration uses aliases, and judging by the log (exception at line 87), the difference from the FQDNs may be the cause:
http://pastebin.com/iimPVbXB

Thanks,
Mira

Maximilian Michels wrote on Fri, 19 Aug 2016 at 09:12 +0200:
> Hi Mira,
>
> If I understood correctly, the log output should be for Flink 1.1.1.
> However, there are classes present in the log which don't exist in
> Flink 1.1.1, e.g. FlinkYarnClient. Could you please check if you
> posted the correct log?
>
> Also, it would be good to have not only the client log but also the
> log of the Flink Yarn application.
>
> Thanks,
> Max
>
> On Thu, Aug 18, 2016 at 3:20 PM, Miroslav Gajdoš
> <miroslav.gaj...@firma.seznam.cz> wrote:
> >
> > Tried to build it from source as well as use prebuilt binary release
> > (v1.1.1), the last one produced this log output:
> > http://pastebin.com/3L5Yhs9x
> >
> > Application in yarn still fails on "Fatal error in AM: The
> > ContainerLaunchContext was not set".
> >
> > Mira
> >
> > Miroslav Gajdoš wrote on Thu, 18 Aug 2016 at 10:36 +0200:
> > >
> > > Hi Max,
> > >
> > > we are building it from sources and package it for debian. I can
> > > try to use the binary release for hadoop 2.6.0.
> > >
> > > Regarding zookeeper, we do not share instances between dev and
> > > production.
> > >
> > > Thanks,
> > > Miroslav
> > >
> > > Maximilian Michels wrote on Thu, 18 Aug 2016 at 10:17 +0200:
> > > >
> > > > Hi Miroslav,
> > > >
> > > > From the logs it looks like you're using Flink version 1.0.x.
> > > > The ContainerLaunchContext is always set by Flink. I'm wondering
> > > > why this error can still occur. Are you using the default Hadoop
> > > > version that comes with Flink (2.3.0)? You could try the Hadoop
> > > > 2.6.0 build of Flink.
> > > >
> > > > Does your Dev cluster share the Zookeeper installation with the
> > > > production cluster? I'm wondering because it receives incorrect
> > > > leadership information although the leading JobManager seems to
> > > > be attempting to register at the ApplicationMaster.
> > > >
> > > > Best,
> > > > Max
> > > >
> > > > On Tue, Aug 16, 2016 at 1:28 PM, Miroslav Gajdoš
> > > > <miroslav.gaj...@firma.seznam.cz> wrote:
> > > > >
> > > > > Log from yarn session runner is here:
> > > > > http://pastebin.com/xW1W4HNP
> > > > >
> > > > > Our hadoop distribution is from cloudera, resourcemanager
> > > > > version: 2.6.0-cdh5.4.5, it runs in HA mode (there could be
> > > > > some redirecting on accessing resourcemanager and/or namenode
> > > > > to active one).
> > > > >
> > > > > Ufuk Celebi wrote on Tue, 16 Aug 2016 at 12:18 +0200:
> > > > > >
> > > > > > This could be a bug in Flink. Can you share the complete
> > > > > > logs of the run? CC'ing Max who worked on the YARN client
> > > > > > recently who might have an idea in which cases Flink would
> > > > > > not set the context.
> > > > > >
> > > > > > On Tue, Aug 16, 2016 at 11:00 AM, Miroslav Gajdoš
> > > > > > <miroslav.gaj...@firma.seznam.cz> wrote:
> > > > > > >
> > > > > > > Hi guys,
> > > > > > >
> > > > > > > i've run into some problems with flink/yarn. I try to
> > > > > > > deploy flink to our cluster using
> > > > > > > /usr/lib/flink-scala2.10/bin/yarn-session.sh, but the
> > > > > > > yarn application does not even start, it goes from
> > > > > > > accepted to finished/failed.
> > > > > > > Yarn info on resourcemanager looks like this:
> > > > > > >
> > > > > > > User: wa-flink
> > > > > > > Name: Flink session with 3 TaskManagers
> > > > > > > Application Type: Apache Flink
> > > > > > > Application Tags:
> > > > > > > State: FINISHED
> > > > > > > FinalStatus: FAILED
> > > > > > > Started: Mon Aug 15 18:02:42 +0200 2016
> > > > > > > Elapsed: 16sec
> > > > > > > Tracking URL: History
> > > > > > > Diagnostics: Fatal error in AM: The
> > > > > > > ContainerLaunchContext was not set.
> > > > > > >
> > > > > > > On dev cluster, applications deploy without problem,
> > > > > > > this happens only in production.
> > > > > > >
> > > > > > > What could be wrong?
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > --
> > > > > > > Miroslav Gajdoš
> > > > > > > development - web analytics (Brno)
> > > > > > > https://reporter.seznam.cz
> > > > > > > miroslav.gaj...@firma.seznam.cz
> > > > >
> > > > > --
> > > > > Miroslav Gajdoš
> > > > > development - web analytics (Brno)
> > > > > https://reporter.seznam.cz
> > > > > miroslav.gaj...@firma.seznam.cz
> >
> > --
> > Miroslav Gajdoš
> > development - web analytics (Brno)
> > https://reporter.seznam.cz
> > miroslav.gaj...@firma.seznam.cz

--
Miroslav Gajdoš
development - web analytics (Brno)
https://reporter.seznam.cz
miroslav.gaj...@firma.seznam.cz
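
P.S. One rough way to test the alias-versus-FQDN theory from the top of this message is to compare each configured hostname with the fully qualified name the resolver returns for it. The sketch below is only an illustration, not part of Flink or Hadoop: the function name `find_alias_mismatches` and the `resolve_fqdn` parameter are made up for this example, and the list of hostnames has to be taken by hand from the cluster configuration.

```python
import socket
from typing import Callable, Dict, Iterable


def find_alias_mismatches(
    hostnames: Iterable[str],
    resolve_fqdn: Callable[[str], str] = socket.getfqdn,
) -> Dict[str, str]:
    """Return {configured_name: fqdn} for every name that is not already its FQDN.

    `resolve_fqdn` defaults to socket.getfqdn, which returns the name
    unchanged when no fully qualified name can be found, so unresolvable
    names are NOT reported by this check.
    """
    mismatches: Dict[str, str] = {}
    for name in hostnames:
        fqdn = resolve_fqdn(name)
        # DNS names are case-insensitive; also ignore a trailing root dot.
        if fqdn.rstrip(".").lower() != name.rstrip(".").lower():
            mismatches[name] = fqdn
    return mismatches
```

Calling `find_alias_mismatches([...])` with the hostnames used in the YARN and HDFS configuration, on the machine that runs `yarn-session.sh`, lists the names that resolve to something different from what is configured. An empty result does not rule the theory out, since the nodes themselves may resolve names differently from the client.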