Yarn Application Crashed?

2021-06-27 Thread Thomas Wang
Hi,

I recently experienced a job crash due to the underlying Yarn application failing for some reason. Here is the only error message I saw. It seems I can no longer see any of the Flink job logs.

Application application_1623861596410_0010 failed 1 times (global limit =2; local limit is =1) due t
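For readers trying to interpret the counters in that message, here is a minimal sketch that pulls them out of the visible part of the diagnostics line. The function names are hypothetical. It also rests on an assumption: that the "global limit" is the cluster-wide cap on application master attempts (yarn.resourcemanager.am.max-attempts), that the "local limit" is the per-application cap (which Flink sets through yarn.application-attempts), and that the smaller of the two applies.

```python
import re

# Only the part of the diagnostics message that is visible in the post above;
# the rest of the line is cut off in the archive and is not reconstructed here.
DIAGNOSTICS = (
    "Application application_1623861596410_0010 failed 1 times "
    "(global limit =2; local limit is =1)"
)

PATTERN = re.compile(
    r"Application (?P<app_id>application_\d+_\d+) failed (?P<failed>\d+) times "
    r"\(global limit =(?P<global_limit>\d+); local limit is =(?P<local_limit>\d+)\)"
)


def parse_attempt_limits(message):
    """Return the application id and attempt counters, or None if no match."""
    match = PATTERN.search(message)
    if match is None:
        return None
    return {
        "app_id": match.group("app_id"),
        "failed": int(match.group("failed")),
        "global_limit": int(match.group("global_limit")),
        "local_limit": int(match.group("local_limit")),
    }


def attempts_exhausted(info):
    """Assumption: the effective limit is the smaller of the two limits."""
    return info["failed"] >= min(info["global_limit"], info["local_limit"])
```

Under that assumption, a local limit of 1 would mean a single failed attempt is enough for Yarn to mark the whole application as failed, which matches what the message reports.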

Re: Yarn Application Crashed?

2021-06-27 Thread Thomas Wang
Just found some additional info. It looks like one of the EC2 instances was terminated at the time of the crash, and this job had 7 Task Managers running on that instance. I now suspect that when Yarn tried to migrate the Task Managers, there were no idle containers as this j

Re: Yarn Application Crashed?

2021-06-28 Thread Piotr Nowojski
Hi,

You should still be able to get the Flink logs via:

> yarn logs -applicationId application_1623861596410_0010

And it should give you more answers about what has happened.

About the Flink and YARN behaviour, have you seen the documentation? [1] Especially this part:

> Failed containers (inc
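If the aggregated logs need to be saved for later inspection, the command above can be wrapped in a small script. This is only a sketch: the helper names and the output path are hypothetical, and it assumes the `yarn` CLI is on the PATH and that log aggregation is enabled on the cluster.

```python
import re
import subprocess

# Application ids look like application_<cluster timestamp>_<sequence number>.
APP_ID_PATTERN = re.compile(r"application_\d+_\d+")


def yarn_logs_command(app_id):
    """Build the argument list for `yarn logs -applicationId <app_id>`."""
    if not APP_ID_PATTERN.fullmatch(app_id):
        raise ValueError(f"not a Yarn application id: {app_id!r}")
    return ["yarn", "logs", "-applicationId", app_id]


def save_yarn_logs(app_id, out_path):
    """Run the command and write its output to out_path (hypothetical helper).

    Raises subprocess.CalledProcessError if the yarn CLI exits with an error.
    """
    with open(out_path, "w") as out:
        subprocess.run(yarn_logs_command(app_id), stdout=out, check=True)
```

For example, `save_yarn_logs("application_1623861596410_0010", "flink-app.log")` would write the combined container logs to a local file that can then be searched for the Job Manager and Task Manager output.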

Re: Yarn Application Crashed?

2021-06-29 Thread Thomas Wang
Thanks Piotr. This is helpful. Thomas

Re: Yarn Application Crashed?

2021-06-30 Thread Piotr Nowojski
You are welcome :) Piotrek