Hi David,

  Thanks for the quick response. Say our app writes to disk every 10
minutes: in one case a single window is emitted and thus a single file is
written, while if we drain and restart the pipeline we will end up with two
files (because two windows are emitted for the 10-minute period in which
the drain happened).

   In our particular case this is OK - everything consuming the files
downstream will use the two "wrong" (i.e. partial) files transparently.
So if this is the extent of the drain side effects, I think it's safe for
us to do this for pipeline upgrades. Or am I missing something?
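
For reference, the "drain and restart" above boils down to Flink's
stop-with-savepoint REST call (POST /jobs/<jobid>/stop). A sketch of the
request we send - the host, job ID, and savepoint path below are
placeholders, substitute your own:

```python
import json
from urllib import request

# Placeholders - substitute your JobManager host, job ID, and savepoint path.
FLINK_HOST = "http://localhost:8081"
JOB_ID = "00000000000000000000000000000000"

payload = {
    "drain": True,  # advance watermarks to MAX and fire remaining windows
    "targetDirectory": "s3://my-bucket/savepoints",  # hypothetical path
}

req = request.Request(
    f"{FLINK_HOST}/jobs/{JOB_ID}/stop",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# request.urlopen(req) returns a trigger id for polling the savepoint
# status; left commented out here since it needs a live cluster.
```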

Thanks,

    Pedro


On Mon, 8 Nov 2021 at 15:58, David Morávek <d...@apache.org> wrote:

> Hi Pedro,
>
> draining basically means that all of the sources will finish and advance
> their watermarks to the end of the global window, which fires all of the
> triggers as a result. In other words, it will trigger the _ON_TIME_ results
> from all of the unfinished windows, even though they might not have seen a
> complete input.
>
> This behavior doesn't play well with pipeline upgrades, because
> you've already emitted "wrong" results from the pipeline.
>
> Best,
> D.
>
>
> On Mon, Nov 8, 2021 at 3:19 PM Pedro Facal <ped...@empathy.co> wrote:
>
>> Hello,
>>
>>    We have an Apache Beam streaming application running on Flink's
>> native Kubernetes deployment. It consolidates AWS Kinesis records into
>> Parquet files every few minutes.
>>
>>   To manage the lifecycle of this app, we use the REST API to stop the
>> job with a savepoint and then restart the cluster/job from said
>> savepoint. This normally works as expected, but we run into problems
>> when the data schema changes. Even then, though, stopping the job with
>> "drain": true results in a proper upgrade without issues.
>>
>>    To avoid overcomplicating our release workflows, we are evaluating
>> the possibility of doing a "drain" restart every time we make a new
>> release. However, we have come across the following:
>>
>> > Use the --drain flag if you want to terminate the job permanently. If
>> you want to resume the job at a later point in time, then do not drain the
>> pipeline because it could lead to incorrect results when the job is resumed
>> [1].
>>
>> It's not clear what kind of "incorrect results" we could face here - can
>> anybody elaborate? Our own tests show that we do not lose events from the
>> Kinesis stream after the restart.
>>
>> Thanks,
>>
>>     Pedro
>>
>> [1](
>> https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/deployment/cli/#terminating-a-job
>> )
>>
>> --
>> *Pedro Facal San Luis* <ped...@empathy.co>
>> Data Team Lead
>> Privacy Policy <https://www.empathy.co/privacy-policy/>
>>
>

-- 
*Pedro Facal San Luis* <ped...@empathy.co>
Data Team Lead
Privacy Policy <https://www.empathy.co/privacy-policy/>
