Re: Is there an HA solution to run flink job with multiple source

2022-06-02 Thread Bariša Obradović
Hi, our use case is that the data sources are independent: we are using Flink to ingest data from Kafka sources, do a bit of filtering, and then write it to S3. Since we ingest from multiple Kafka sources and they are independent, we consider them all optional. Even if just 1 Kafka is up and running,

Re: Is there an HA solution to run flink job with multiple source

2022-06-01 Thread Alexander Fedulov
Hi Bariša, The way I see it, you either:
- need data from all sources because you are doing some conjoint processing. In that case, stopping the pipeline is usually the right thing to do; or
- the streams consumed from multiple servers are not combined and hence could be processed in independent
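Alexander's second case (streams that are never combined) can be illustrated with a plain-Java analogy. This is not Flink code: each stream is modeled as its own task with its own failure scope, so one broken source does not take the others down. `IndependentPipelines`, `runAll`, and the cluster names are hypothetical; in Flink the corresponding idea would be running the streams as separate pipelines or jobs, which this sketch does not show.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical analogy, not a Flink API: every "pipeline" (one per Kafka cluster)
// runs in its own task, so an exception in one does not stop the others.
final class IndependentPipelines {
    private IndependentPipelines() {}

    // Runs all pipelines concurrently and reports, per pipeline name,
    // either its result or the message of the exception that ended it.
    static Map<String, String> runAll(Map<String, Callable<String>> pipelines) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, pipelines.size()));
        Map<String, Future<String>> futures = new LinkedHashMap<>();
        try {
            for (Map.Entry<String, Callable<String>> e : pipelines.entrySet()) {
                futures.put(e.getKey(), pool.submit(e.getValue()));
            }
            Map<String, String> results = new LinkedHashMap<>();
            for (Map.Entry<String, Future<String>> e : futures.entrySet()) {
                try {
                    results.put(e.getKey(), e.getValue().get());
                } catch (ExecutionException ex) {
                    // The failure is recorded for this pipeline only; others keep their results.
                    results.put(e.getKey(), "failed: " + ex.getCause().getMessage());
                } catch (InterruptedException ex) {
                    Thread.currentThread().interrupt();
                    results.put(e.getKey(), "interrupted");
                }
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

With one healthy and one failing entry, the healthy pipeline still reports its result while the failing one is recorded separately, which is the isolation property that only makes sense when the streams are not combined downstream.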

Re: Is there an HA solution to run flink job with multiple source

2022-06-01 Thread Jing Ge
Hi Bariša, Could you share the reason why your data processing pipeline should keep running when one Kafka source is down? It seems like every one of the multiple Kafka sources is optional for the data processing logic, because any Kafka source could be the one that is down. Best regards, Jing

Re: Is there an HA solution to run flink job with multiple source

2022-06-01 Thread Xuyang
I think you could try a custom source for that: even when one of the Kafka sources is down, the operator keeps running (it just does nothing). The only trouble is that you need to manage checkpoints and some other things yourself. But the good news is that you can copy the implementation of
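A rough, framework-free sketch of the behaviour Xuyang describes, assuming a hypothetical per-cluster fetch function rather than any real Flink or Kafka client API: when the fetch throws, the wrapper returns no records and backs off exponentially instead of propagating the failure, so the surrounding operator stays alive. `TolerantSourceReader`, the `fetch` supplier, and the back-off policy are illustrative assumptions.

```java
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch, not a Flink API: a failing cluster yields no records
// instead of throwing, i.e. "the operator keeps running, it just does nothing".
final class TolerantSourceReader {
    private final Supplier<List<String>> fetch; // hypothetical: polls one Kafka cluster
    private final long baseBackoffMs;
    private long nextAttemptAtMs = 0;
    private int consecutiveFailures = 0;

    TolerantSourceReader(Supplier<List<String>> fetch, long baseBackoffMs) {
        this.fetch = fetch;
        this.baseBackoffMs = baseBackoffMs;
    }

    // Returns fetched records, or an empty list while backing off or after a failed fetch.
    List<String> poll(long nowMs) {
        if (nowMs < nextAttemptAtMs) {
            return Collections.emptyList(); // still backing off: do not touch the cluster
        }
        try {
            List<String> records = fetch.get();
            consecutiveFailures = 0;
            return records;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            // Exponential back-off, capped at 64x the base delay.
            long delay = baseBackoffMs << Math.min(consecutiveFailures - 1, 6);
            nextAttemptAtMs = nowMs + delay;
            return Collections.emptyList();
        }
    }

    int consecutiveFailures() {
        return consecutiveFailures;
    }
}
```

This deliberately ignores checkpointing; as the reply notes, a real custom source would also have to track and restore per-cluster offsets itself so that a recovered cluster resumes where it left off.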

Is there an HA solution to run flink job with multiple source

2022-06-01 Thread Bariša Obradović
Hi, we are running a Flink job with multiple Kafka sources connected to different Kafka servers. The problem we are facing is that when one of the Kafka servers is down, the Flink job starts restarting. Is there any way for Flink to pause processing of the Kafka source that is down, and yet continue processing