Sounds good.
Is someone working on this automation today?
If not, although my time is tight, I may be able to work on a PR
to get us started down the path of Kubernetes-native cluster mode.
On 12/4/18 5:35 AM, Till Rohrmann wrote:
Hi Derek,
what I would recommend is to trigger the cancel with savepoint
command [1]. This will create a savepoint and terminate the job
execution. Next you simply respawn the job cluster, providing it
with the savepoint to resume from.
Cheers,
Till
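A rough sketch of Till's flow as a script (the job ID, savepoint target, deployment names, and manifest files below are all hypothetical placeholders, and exact CLI flags can vary by Flink version):

```shell
#!/usr/bin/env bash
set -euo pipefail

JOB_ID="<your-job-id>"                     # hypothetical; look it up with `flink list`
SAVEPOINT_DIR="s3://my-bucket/savepoints"  # hypothetical savepoint target directory

# 1. Cancel the job with a savepoint; the CLI prints the savepoint path on success.
SAVEPOINT_PATH=$(flink cancel -s "$SAVEPOINT_DIR" "$JOB_ID" \
  | grep -o "$SAVEPOINT_DIR[^ ]*" | tail -n 1)

# 2. Tear down the old job cluster and taskmanagers so no old-version
#    taskmanager can attach to the new leader (deployment names are assumptions).
kubectl delete deployment flink-job-cluster flink-taskmanager

# 3. Respawn the job cluster, pointing it at the savepoint to resume from.
#    Assumes the job-cluster manifest is templated to pass the savepoint path
#    (e.g. via a --fromSavepoint argument) to the entrypoint.
sed "s|SAVEPOINT_PATH|$SAVEPOINT_PATH|" flink-job-cluster.yaml | kubectl apply -f -
kubectl apply -f flink-taskmanager.yaml
```

This deliberately deletes everything before redeploying, matching the sequencing concern discussed below.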
Hi Derek,
I think your automation steps look good.
Recreating deployments should not take long, and as you mention,
this way you can avoid unpredictable old/new version collisions.
Best,
Andrey
> On 4 Dec 2018, at 10:22, Dawid Wysakowicz <dwysakow...@apache.org> wrote:
>
> Hi Derek,
>
> I am not an expert in Kubernetes, so I will cc Till, who should
> be able to help you more.
>
> As for automating a similar process, I would recommend having a
> look at dA platform [1], which is built on top of Kubernetes.
>
> Best,
>
> Dawid
>
> [1] https://data-artisans.com/platform-overview
>
> On 30/11/2018 02:10, Derek VerLee wrote:
>>
>> I'm looking at the job cluster mode; it looks great, and I am
>> considering migrating our jobs off our "legacy" session cluster
>> and into Kubernetes.
>>
>> I do need to ask some questions, because I haven't found a lot
>> of details in the documentation about how it works yet, and I
>> gave up following the DI around in the code after a while.
>>
>> Let's say I have a deployment for the job "leader" in HA with
>> ZK, and another deployment for the taskmanagers.
>>
>> I want to upgrade the code or configuration and start from a
>> savepoint, in an automated way.
>>
>> Best I can figure, I cannot just update the deployment
>> resources in Kubernetes and allow the containers to restart in
>> an arbitrary order.
>>
>> Instead, I expect sequencing is important, something along the
>> lines of this:
>>
>> 1. issue savepoint command on leader
>> 2. wait for savepoint
>> 3. destroy all leader and taskmanager containers
>> 4. deploy new leader, with savepoint url
>> 5. deploy new taskmanagers
>>
>>
>> For example, I imagine old taskmanagers (with an old version of
>> my job) attaching to the new leader and causing a problem.
>>
>> Does that sound right, or am I overthinking it?
>>
>> If not, has anyone tried implementing any automation for this
>> yet?
>>
>