[DISCUSS] Proposal for Flink job execution/availability metrics improvement

2019-05-10 Thread Kim, Hwanju
Hi, I am Hwanju at AWS Kinesis Analytics. We would like to start a discussion thread about a project we are considering to improve Flink operations in production. We would like to start the conversation early, before detailed design, so any high-level feedback would be welcome. For service providers wh

Re: [DISCUSS] Proposal for Flink job execution/availability metrics improvement

2019-05-16 Thread Piotr Nowojski
Hi Hwanju, Thanks for starting the discussion. Any improvement in this area would definitely be very helpful and valuable. Generally speaking, +1 from my side, as long as we make sure that such changes either do not add performance overhead (which I think they shouldn’t) or are optional.

Re: [DISCUSS] Proposal for Flink job execution/availability metrics improvement

2019-05-16 Thread Chesnay Schepler
On 16/05/2019 11:34, Piotr Nowojski wrote: Luckily it seems like those four issues/proposals could be implemented/discussed independently or in stages. I fully agree, and believe we should split this thread. We will end up discussing too many issues at once. Nevertheless, On 16/05/2019 11:34

Re: [DISCUSS] Proposal for Flink job execution/availability metrics improvement

2019-05-17 Thread Kim, Hwanju
Hi Piotrek, Thanks for the insightful feedback; indeed, you identified the trickiest parts and concerns. > 1. Do we currently account state restore as “RUNNING”? If yes, this might be > incorrect from your perspective. As Chesnay said, initializeState is called in StreamTask.invoke after transitioning t

Re: [DISCUSS] Proposal for Flink job execution/availability metrics improvement

2019-05-17 Thread Piotr Nowojski
Hi Hwanju & Chesnay, Regarding the various things that both of you mentioned, like accounting for state restoration separately or batch scheduling, we can always acknowledge some limitations of the initial approach and maybe address them later if we judge it worth the effort. Generally sp

Re: [DISCUSS] Proposal for Flink job execution/availability metrics improvement

2019-05-24 Thread Kim, Hwanju
Hi, As suggested by Piotrek, the first part, execution state tracking, is now split into a separate doc: https://docs.google.com/document/d/1oLF3w1wYyr8vqoFoQZhw1QxTofmAtlD8IF694oPLjNI/edit?usp=sharing We'd appreciate any feedback. I am still using the same email thread to provide full context

Re: [DISCUSS] Proposal for Flink job execution/availability metrics improvement

2019-05-24 Thread Piotr Nowojski
Hi Hwanju, I looked through the document; however, I’m not the best person to review, judge, or discuss the implementation details here. I hope that Chesnay will be able to help in this regard. Piotrek > On 24 May 2019, at 09:09, Kim, Hwanju wrote: > > Hi, > > As suggested by Piotrek, the first

Re: [DISCUSS] Proposal for Flink job execution/availability metrics improvement

2019-05-28 Thread Hwanju Kim
(Somehow my email has failed to be sent multiple times, so I am using my personal email account) Hi, Piotrek - Thanks for the feedback! I revised the doc as commented. Here's the second part about exception classification - https://docs.google.com/document/d/1pcHg9F3GoDDeVD5GIIo2wO67Hmjgy0-hRDeu

Re: [DISCUSS] Proposal for Flink job execution/availability metrics improvement

2019-06-16 Thread Becket Qin
Hi Hwanju, Thanks for the proposal. The enhancement will improve the operability of Flink and make it more service provider friendly. So in general I am +1 on the proposal. A few questions / thoughts: 1. From what I understand, the current availability metrics

Re: [DISCUSS] Proposal for Flink job execution/availability metrics improvement

2019-07-03 Thread Hwanju Kim
Hi Becket, Sorry for the late response; I might have missed this during my vacation. Thanks for the feedback! Please find my replies inline below. On Sun, 16 Jun 2019 at 6:46 PM, Becket Qin wrote: > Hi Hwanju, > > Thanks for the proposal. The enhancement will improve the operability of > Flink and make it

Re: [DISCUSS] Proposal for Flink job execution/availability metrics improvement

2019-07-03 Thread Hwanju Kim
Hi, I am sharing the last doc, which is about the progress monitor (shared a little late because of my vacation): https://docs.google.com/document/d/1Ov9A7V2tMs4uVimcSeHL5eftRJ3MCJBiVSFNdz8rmjU/edit?usp=sharing This last one seems pretty independent from the first two (execution tracking and except

Re: [DISCUSS] Proposal for Flink job execution/availability metrics improvement

2019-07-08 Thread Becket Qin
Hi Hwanju, thanks for the reply. Regarding the first two proposals, my main concern is whether it is necessary to have something deeply coupled with the Flink runtime. To some extent, the SLA metrics are a kind of custom metric. It would be good if we can support custom metrics in general, instead of o