Hello,

   We have an Apache Beam streaming application running on Flink native
Kubernetes. It consolidates AWS Kinesis records into Parquet files every
few minutes.

  To manage the lifecycle of this app, we use the REST API to stop the job
with a savepoint and then restart the cluster/job from that savepoint. This
normally works as expected, but restoring from the savepoint fails when the
data schema changes. Stopping the job with "drain": true, on the other
hand, results in a proper upgrade without issues even when the schema
changes.
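For reference, this is roughly how we trigger the stop through the REST API
(a minimal sketch; the JobManager address, job ID, and savepoint directory
below are placeholders, not our real values):

```python
import json
from urllib import request

# Hypothetical JobManager REST address; adjust for your deployment.
FLINK_REST = "http://localhost:8081"

def stop_with_savepoint(job_id: str, target_dir: str,
                        drain: bool = False) -> request.Request:
    """Build the POST /jobs/:jobid/stop request.

    With "drain": true, Flink advances the watermark to MAX_WATERMARK and
    flushes all in-flight data before taking the savepoint; with false, it
    simply takes a savepoint and stops.
    """
    body = json.dumps({"drain": drain, "targetDirectory": target_dir})
    return request.Request(
        f"{FLINK_REST}/jobs/{job_id}/stop",
        data=body.encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Actually sending the request requires a running cluster:
# resp = request.urlopen(
#     stop_with_savepoint("a1b2c3", "s3://bucket/savepoints", drain=True))
```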

   To avoid overcomplicating our release workflows, we are evaluating the
possibility of doing a "drain" restart every time we do a new release.
However, we have come across the following in the documentation:

> Use the --drain flag if you want to terminate the job permanently. If you
want to resume the job at a later point in time, then do not drain the
pipeline because it could lead to incorrect results when the job is resumed
[1].

It's not clear what kind of "incorrect results" we could face here - can
anybody elaborate? Our own tests show that we do not lose events from the
Kinesis stream after the restart.

Thanks,

    Pedro

[1](
https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/deployment/cli/#terminating-a-job
)

-- 
*Pedro Facal San Luis* <ped...@empathy.co>
Data Team Lead
Privacy Policy <https://www.empathy.co/privacy-policy/>
