Hi Guowei and Arvid,

Thanks for the suggestion. I wonder if it makes sense and possible that the
operator will produce a side output message telling the source to 'pause',
and the same side output as the side input to the source, based on which,
the source would pause and resume?

Thanks a lot!
Eleanore

On Sun, Nov 29, 2020 at 11:33 PM Arvid Heise <ar...@ververica.com> wrote:

> Hi Eleanore,
>
> if the external system is down, you could simply fail the job after a
> given timeout (for example, using asyncIO). Then the job would restart
> using the restarting policies.
>
> If your state is rather small (and thus recovery time okay), you would
> pretty much get your desired behavior. The job would stop to make progress
> until eventually the external system is responding again.
>
> On Mon, Nov 30, 2020 at 7:39 AM Guowei Ma <guowei....@gmail.com> wrote:
>
>> Hi, Eleanore
>>
>> 1. AFAIK I think only the job could "pause" itself.  For example the
>> "query" external system could pause when the external system is down.
>> 2. Maybe you could try the "iterate" and send the failed message back to
>> retry if you use the DataStream api.
>>
>> Best,
>> Guowei
>>
>>
>> On Mon, Nov 30, 2020 at 1:01 PM Eleanore Jin <eleanore....@gmail.com>
>> wrote:
>>
>>> Hi experts,
>>>
>>> Here is my use case, it's a flink stateless streaming job for message
>>> validation.
>>> 1. read from a kafka topic
>>> 2. perform validation of message, which requires query external system
>>>        2a. the metadata from the external system will be cached in
>>> memory for 15minutes
>>>        2b. there is another stream that will send updates to update the
>>> cache if metadata changed                     within 15 minutes
>>> 3. if message is valid, publish to valid topic
>>> 4. if message is invalid, publish to error topic
>>> 5. if the external system is down, the message is marked as invalid with
>>> different error code, and published to the same error topic.
>>>
>>> Ask:
>>> For those messages that failed due to external system failures, it
>>> requires manual replay of those messages.
>>>
>>> Is there a way to pause the job if there is an external system failure,
>>> and resume once the external system is online?
>>>
>>> Or are there any other suggestions to allow auto retry such error?
>>>
>>> Thanks a lot!
>>> Eleanore
>>>
>>
>
> --
>
> Arvid Heise | Senior Java Developer
>
> <https://www.ververica.com/>
>
> Follow us @VervericaData
>
> --
>
> Join Flink Forward <https://flink-forward.org/> - The Apache Flink
> Conference
>
> Stream Processing | Event Driven | Real Time
>
> --
>
> Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany
>
> --
> Ververica GmbH
> Registered at Amtsgericht Charlottenburg: HRB 158244 B
> Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji
> (Toni) Cheng
>

Reply via email to