My team and I are keen to help out with testing and review as soon as there is a pull request.
-H
> On Feb 11, 2019, at 00:26, Till Rohrmann <trohrm...@apache.org> wrote:
>
> Hi Heath,
>
> I just learned that people from Alibaba already made some good progress with
> FLINK-9953. I'm currently talking to them in order to see how we can merge
> this contribution into Flink as fast as possible. Since I'm quite busy due to
> the upcoming release I hope that other community members will help out with
> the reviewing once the PRs are opened.
>
> Cheers,
> Till
>
>> On Fri, Feb 8, 2019 at 8:50 PM Heath Albritton <halbr...@harm.org> wrote:
>> Has any progress been made on this? There are a number of folks in
>> the community looking to help out.
>>
>>
>> -H
>>
>> On Wed, Dec 5, 2018 at 10:00 AM Till Rohrmann <trohrm...@apache.org> wrote:
>> >
>> > Hi Derek,
>> >
>> > there is this issue [1] which tracks the active Kubernetes integration.
>> > Jin Sun already started implementing some parts of it. There should also
>> > be some PRs open for it. Please check them out.
>> >
>> > [1] https://issues.apache.org/jira/browse/FLINK-9953
>> >
>> > Cheers,
>> > Till
>> >
>> > On Wed, Dec 5, 2018 at 6:39 PM Derek VerLee <derekver...@gmail.com> wrote:
>> >>
>> >> Sounds good.
>> >>
>> >> Is someone working on this automation today?
>> >>
>> >> If not, although my time is tight, I may be able to work on a PR for
>> >> getting us started down the path to Kubernetes native cluster mode.
>> >>
>> >>
>> >> On 12/4/18 5:35 AM, Till Rohrmann wrote:
>> >>
>> >> Hi Derek,
>> >>
>> >> what I would recommend to use is to trigger the cancel with savepoint
>> >> command [1]. This will create a savepoint and terminate the job
>> >> execution. Next you simply need to respawn the job cluster which you
>> >> provide with the savepoint to resume from.
>> >>
>> >> [1]
>> >> https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/savepoints.html#cancel-job-with-savepoint
>> >>
>> >> Cheers,
>> >> Till
>> >>
>> >> On Tue, Dec 4, 2018 at 10:30 AM Andrey Zagrebin
>> >> <and...@data-artisans.com> wrote:
>> >>>
>> >>> Hi Derek,
>> >>>
>> >>> I think your automation steps look good.
>> >>> Recreating deployments should not take long
>> >>> and as you mention, this way you can avoid unpredictable old/new version
>> >>> collisions.
>> >>>
>> >>> Best,
>> >>> Andrey
>> >>>
>> >>> > On 4 Dec 2018, at 10:22, Dawid Wysakowicz <dwysakow...@apache.org>
>> >>> > wrote:
>> >>> >
>> >>> > Hi Derek,
>> >>> >
>> >>> > I am not an expert in kubernetes, so I will cc Till, who should be able
>> >>> > to help you more.
>> >>> >
>> >>> > As for the automation for a similar process I would recommend having a
>> >>> > look at dA platform[1] which is built on top of kubernetes.
>> >>> >
>> >>> > Best,
>> >>> >
>> >>> > Dawid
>> >>> >
>> >>> > [1] https://data-artisans.com/platform-overview
>> >>> >
>> >>> > On 30/11/2018 02:10, Derek VerLee wrote:
>> >>> >>
>> >>> >> I'm looking at the job cluster mode, it looks great and I am
>> >>> >> considering migrating our jobs off our "legacy" session cluster and
>> >>> >> into Kubernetes.
>> >>> >>
>> >>> >> I do need to ask some questions because I haven't found a lot of
>> >>> >> details in the documentation about how it works yet, and I gave up
>> >>> >> following the DI around in the code after a while.
>> >>> >>
>> >>> >> Let's say I have a deployment for the job "leader" in HA with ZK, and
>> >>> >> another deployment for the taskmanagers.
>> >>> >>
>> >>> >> I want to upgrade the code or configuration and start from a
>> >>> >> savepoint, in an automated way.
>> >>> >>
>> >>> >> Best I can figure, I can not just update the deployment resources in
>> >>> >> kubernetes and allow the containers to restart in an arbitrary order.
>> >>> >>
>> >>> >> Instead, I expect sequencing is important, something along the lines
>> >>> >> of this:
>> >>> >>
>> >>> >> 1. issue savepoint command on leader
>> >>> >> 2. wait for savepoint
>> >>> >> 3. destroy all leader and taskmanager containers
>> >>> >> 4. deploy new leader, with savepoint url
>> >>> >> 5. deploy new taskmanagers
>> >>> >>
>> >>> >>
>> >>> >> For example, I imagine old taskmanagers (with an old version of my
>> >>> >> job) attaching to the new leader and causing a problem.
>> >>> >>
>> >>> >> Does that sound right, or am I overthinking it?
>> >>> >>
>> >>> >> If not, has anyone tried implementing any automation for this yet?
>> >>> >>
>> >>> >
>> >>>
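
For anyone who wants to script the five-step sequence from Derek's original message, here is a rough Python sketch of how the ordering could look. It only builds the command lines and, by default, prints them. Everything specific is an assumption on my part and not something from this thread: the deployment names, the pod label selector, the manifest file names, the `${SAVEPOINT_PATH}` placeholder and the 120s timeout are all hypothetical. It assumes the `flink cancel -s <targetDirectory> <jobID>` CLI form from the savepoint docs Till linked, and that `kubectl wait --for=delete` is available in your kubectl version. How the savepoint path is read back from the CLI output is left to the caller.

```python
import subprocess
from typing import List

# Hypothetical placeholder used in the leader manifest template.
SAVEPOINT_PLACEHOLDER = "${SAVEPOINT_PATH}"


def render_manifest(template: str, savepoint_path: str) -> str:
    """Step 4 helper: put the savepoint path into the leader manifest text."""
    return template.replace(SAVEPOINT_PLACEHOLDER, savepoint_path)


def upgrade_plan(
    jobmanager: str,
    job_id: str,
    savepoint_dir: str,
    leader: str,
    taskmanagers: str,
    pod_selector: str,
    leader_manifest: str,
    taskmanager_manifest: str,
) -> List[List[str]]:
    """Return the commands in the order described in the thread."""
    return [
        # Steps 1-2: cancel with savepoint; the CLI call is expected to
        # return once the savepoint has been taken.
        ["flink", "cancel", "-m", jobmanager, "-s", savepoint_dir, job_id],
        # Step 3: remove both deployments, then wait until the old pods are
        # gone so that no old taskmanager can attach to the new leader.
        ["kubectl", "delete", "deployment", leader, taskmanagers],
        ["kubectl", "wait", "--for=delete", "pod", "-l", pod_selector,
         "--timeout=120s"],
        # Step 4: deploy the new leader (manifest already rendered with the
        # savepoint path, see render_manifest).
        ["kubectl", "apply", "-f", leader_manifest],
        # Step 5: deploy the new taskmanagers.
        ["kubectl", "apply", "-f", taskmanager_manifest],
    ]


def run(plan: List[List[str]], dry_run: bool = True) -> None:
    """Print each command, and execute it unless dry_run is set."""
    for cmd in plan:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the sequence at the first failing step.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(upgrade_plan(
        jobmanager="flink-leader:8081",       # hypothetical address
        job_id="<job-id>",
        savepoint_dir="s3://bucket/savepoints",
        leader="flink-leader",                # hypothetical names below
        taskmanagers="flink-taskmanager",
        pod_selector="app=flink",
        leader_manifest="leader.yaml",
        taskmanager_manifest="taskmanagers.yaml",
    ))
```

In a real run the leader manifest has to be rendered between steps 2 and 4, once the savepoint path is known, so the plan would probably be executed in two halves. Stopping at the first failure matters here: if the savepoint step fails, nothing should be deleted.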