Re: [ DISCUSS ] FLIP-XXX : [Plugin] Enhancing Flink Failure Management in Kubernetes with Dynamic Termination Log Integration

2024-04-25 Thread Swathi C
>> missing
>> anything and if it is related to pluggable enricher FLIP or any way we can
>> use pluggable enricher FLIP here for CRUD failures.
>>
>> Regards,
>> Swathi C
>>
>> -- Forwarded message -
>> From: Martijn Visser
>> Date: T

Re: [ DISCUSS ] FLIP-XXX : [Plugin] Enhancing Flink Failure Management in Kubernetes with Dynamic Termination Log Integration

2024-04-25 Thread Martijn Visser
> if it is related to pluggable enricher FLIP or any way we can
> use pluggable enricher FLIP here for CRUD failures.
>
> Regards,
> Swathi C
>
> -- Forwarded message -
> From: Martijn Visser
> Date: Thu, Apr 25, 2024 at 2:46 AM
> Subject: Re: [ DISCUSS ]

Re: [ DISCUSS ] FLIP-XXX : [Plugin] Enhancing Flink Failure Management in Kubernetes with Dynamic Termination Log Integration

2024-04-24 Thread Martijn Visser
I would prefer a separate FLIP
On Wed, Apr 24, 2024 at 3:25 PM Swathi C wrote:
> Sure Ahmed and Martijn.
> Fetching the flink particular job related failure and adding this logic to
> termination-log is definitely a sub-task of pluggable enricher as we can
> leverage pluggable enricher to

Re: [ DISCUSS ] FLIP-XXX : [Plugin] Enhancing Flink Failure Management in Kubernetes with Dynamic Termination Log Integration

2024-04-24 Thread Swathi C
Sure, Ahmed and Martijn. Fetching the Flink job-specific failure and adding this logic to the termination-log is definitely a sub-task of the pluggable enricher, as we can leverage the pluggable enricher to achieve this. But for CRUD-level failures, which are mainly used to notify if the job manager

Re: [ DISCUSS ] FLIP-XXX : [Plugin] Enhancing Flink Failure Management in Kubernetes with Dynamic Termination Log Integration

2024-04-24 Thread Ahmed Hamdy
Hi, I agree with Martijn. We can reformulate the FLIP to introduce the termination log as a supported pluggable enricher. If you believe the scope of work is a subset (further implementation), we can just add a Jira ticket for it. IMO this will also help with implementation taking the existing

Re: [ DISCUSS ] FLIP-XXX : [Plugin] Enhancing Flink Failure Management in Kubernetes with Dynamic Termination Log Integration

2024-04-23 Thread Martijn Visser
From a procedural point of view, we shouldn't make FLIPs sub-tasks of existing FLIPs that have been voted on or released. That will only cause confusion down the line. A new FLIP should take existing functionality (like FLIP-304) into account, and propose how to improve on what that original FLIP

Re: [ DISCUSS ] FLIP-XXX : [Plugin] Enhancing Flink Failure Management in Kubernetes with Dynamic Termination Log Integration

2024-04-23 Thread ramkrishna vasudevan
Hi Gyula and Ahmed, I totally agree that there is an overlap in the final goal that both FLIPs are achieving here, and in fact FLIP-304 is more comprehensive for job failures. But as a proposal to move forward, can we make Swathi's FLIP/JIRA a sub-task of FLIP-304 and continue with the PR

Re: [ DISCUSS ] FLIP-XXX : [Plugin] Enhancing Flink Failure Management in Kubernetes with Dynamic Termination Log Integration

2024-04-18 Thread Swathi C
Hi Gyula and Ahmed, Thanks for reviewing this. @gyula.f...@gmail.com, our aim as part of this FLIP was only to fail the cluster when the job manager/Flink has issues such that the cluster would no longer be usable, so we scoped the proposal to that. You're right that it

Re: [ DISCUSS ] FLIP-XXX : [Plugin] Enhancing Flink Failure Management in Kubernetes with Dynamic Termination Log Integration

2024-04-18 Thread Ahmed Hamdy
Hi Swathi! Thanks for the proposal. Could you please elaborate on what this FLIP offers beyond FLIP-304[1]? FLIP-304 proposes a pluggable mechanism for enriching job failures. If I am not mistaken, this proposal looks like a subset of it. 1-

Re: [ DISCUSS ] FLIP-XXX : [Plugin] Enhancing Flink Failure Management in Kubernetes with Dynamic Termination Log Integration

2024-04-18 Thread Gyula Fóra
Hi Swathi! Thank you for creating this proposal. I really like the general idea of increasing the K8s-native observability of Flink job errors. I took a quick look at your reference PR; the termination-log-related logic is contained completely in the ClusterEntrypoint. What type of errors will