I would be happy to answer more questions later but it would be best if you could first try the operator or at least read the documentation: https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/
You will find answers to most of your questions, and running it locally on minikube to try some test scenarios will be even more beneficial.

Thanks
Gyula

On Fri, Jan 13, 2023 at 5:50 PM Tamir Sagi <tamir.s...@niceactimize.com> wrote:

> Hey Gyula,
> Thank you for the fast response.
>
> I understand it completely. I believe the operator has similar
> functionalities to the custom service we have developed regarding deploy,
> update and delete clusters.
> The difference from our perspective is that we have added several more
> capabilities and configurations for the deployment phase.
>
> Assume there is an application cluster (Native k8s) with 3 Job managers
> and 2 Task managers. The cluster is running for several hours. Let's say at
> a given point in time, the operator decides to scale the cluster up (based
> on pre-defined configurations).
>
> You wrote:
>
> *The operator also now contains an autoscaler module that runs within the
> operator and monitors Flink clusters and determines whether a jobvertex
> should be scaled up or down. It will then scale the job accordingly.*
>
> It sounds great. Does that mean that it simply creates a new TM pod
> which then becomes part of the cluster? (I'm asking because the graph is
> created while deploying the cluster in the first place.)
> If yes, can this module be used outside that operator?
>
> If not, would you please elaborate on whether this scale (up/down) operation
> leads to downtime?
>
> Best,
> Tamir.
>
> ------------------------------
> *From:* Gyula Fóra <gyula.f...@gmail.com>
> *Sent:* Friday, January 13, 2023 4:42 PM
> *To:* Tamir Sagi <tamir.s...@niceactimize.com>
> *Cc:* Chesnay Schepler <ches...@apache.org>; user@flink.apache.org
> <user@flink.apache.org>
> *Subject:* Re: [EXTERNAL] Re: Flink reactive mode for application
> clusters on AWS EKS
>
> *EXTERNAL EMAIL*
>
> Hi Tamir!
>
> Let me try to clarify a few points here.
>
> The operator works based on FlinkDeployment Custom Resources (Yaml
> definition) and the operator creates the required clusters / taskmanagers
> based on that. If you change the parallelism of your FlinkDeployment Yaml,
> the operator will adjust the cluster size (scale up or down).
>
> The operator also now contains an autoscaler module that runs within the
> operator and monitors Flink clusters and determines whether a jobvertex
> should be scaled up or down. It will then scale the job accordingly.
> The autoscaler currently only works with the default Native Deployment
> mode.
>
> The operator does not use Flink reactive mode to perform autoscaling.
>
> I highly recommend trying to migrate to the operator (or at least testing
> it locally so you fully understand the functionality), you will save
> yourself a tremendous amount of work especially if you are looking to build
> an autoscaler.
>
> Cheers,
> Gyula
>
> On Fri, Jan 13, 2023 at 3:37 PM Tamir Sagi <tamir.s...@niceactimize.com>
> wrote:
>
> Hey Gyula,
>
> Thanks for clarifying that.
>
> We created a custom service before an official Flink k8s operator was
> released. That service deploys/upgrades/deletes clusters (no Yamls are
> needed). It handles failures including retries and cleanups based on our
> needs. Hence, moving to the official Flink operator might take a while.
>
> Does the operator also perform scale down?
>
> Regarding HPA, Task managers are created by Flink based on parallelism &
> number of slots. Then the cluster has fixed size of X JMs and Y TMs.
> I was thinking about adding HPA but wondered whether or not Flink will
> handle the new TMs properly (I have not tested it).
>
> We are probably left with the option to implement the auto scaling
> mechanism ourselves on top of Flink clusters.
>
> Best,
> Tamir.
> ------------------------------
> *From:* Gyula Fóra <gyula.f...@gmail.com>
> *Sent:* Friday, January 13, 2023 8:39 AM
> *To:* Swathi Chandrashekar <cswa...@microsoft.com>
> *Cc:* Chesnay Schepler <ches...@apache.org>; Tamir Sagi
> <tamir.s...@niceactimize.com>; user@flink.apache.org <user@flink.apache.org>
> *Subject:* Re: [EXTERNAL] Re: Flink reactive mode for application
> clusters on AWS EKS
>
> What I am trying to say is use the Kubernetes operator with Native
> (default) mode and forget about reactive.
>
> The operator does everything you want plus has an actual autoscaler.
>
> Gyula
>
> On Fri, 13 Jan 2023 at 07:24, Swathi Chandrashekar <cswa...@microsoft.com>
> wrote:
>
> Got it, so this means we should have a standalone app mode cluster which is
> managed by a Flink Kubernetes operator, and the operator would update the
> replicas based on the metrics (autoscale), which in turn changes the
> parallelism as reactive mode is enabled.
>
> Regards,
> Swathi C
>
> *From:* Gyula Fóra <gyula.f...@gmail.com>
> *Sent:* Friday, January 13, 2023 11:31 AM
> *To:* Swathi Chandrashekar <cswa...@microsoft.com>
> *Cc:* Chesnay Schepler <ches...@apache.org>; Tamir Sagi
> <tamir.s...@niceactimize.com>; user@flink.apache.org
> *Subject:* Re: [EXTERNAL] Re: Flink reactive mode for application
> clusters on AWS EKS
>
> No, but the Kubernetes operator itself already provides a similar feature set.
>
> Not sure why you want the reactive mode in the first place. If it's
> because you want to implement auto scaling on top of it, then I think the
> operator is a better alternative.
>
> I think you should try to understand what exactly the reactive mode
> provides vs what the operator does. Reactive mode alone doesn’t do too much.
>
> Gyula
>
> On Fri, 13 Jan 2023 at 06:33, Swathi Chandrashekar <cswa...@microsoft.com>
> wrote:
>
> Hi @Gyula Fóra <gyula.f...@gmail.com>,
>
> Does this mean that, with the Kubernetes operator, we can have reactive mode
> in native Flink which is in app mode? [Not just standalone app mode]
>
> Regards,
> Swathi C
>
> *From:* Gyula Fóra <gyula.f...@gmail.com>
> *Sent:* Thursday, January 12, 2023 11:14 PM
> *To:* Tamir Sagi <tamir.s...@niceactimize.com>
> *Cc:* Chesnay Schepler <ches...@apache.org>; user@flink.apache.org
> *Subject:* [EXTERNAL] Re: Flink reactive mode for application clusters on
> AWS EKS
>
> Hey!
>
> I think the reactive scaling is a somewhat misunderstood feature. It only
> works in standalone deployments (not in Kubernetes native, for instance) and
> it doesn't actually provide any autoscaling functionality on its own.
> You would have to implement your scaling logic yourself somehow
> (Kubernetes HPA or something similar).
>
> I suggest looking at the Flink Kubernetes Operator
> (https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/)
> that will provide actual autoscaler capability for native Kubernetes
> deployments.
>
> Cheers,
> Gyula
>
> On Thu, Jan 12, 2023 at 6:23 PM Tamir Sagi <tamir.s...@niceactimize.com>
> wrote:
>
> Hey Chesnay,
>
> Just to be more clear,
>
> I'm talking about plans to support reactive mode for application clusters
> in Native Kubernetes.
>
> https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/native_kubernetes/#application-mode
>
> Thanks,
> Tamir.
>
> ------------------------------
> *From:* Tamir Sagi <tamir.s...@niceactimize.com>
> *Sent:* Thursday, January 12, 2023 6:17 PM
> *To:* Chesnay Schepler <ches...@apache.org>; user@flink.apache.org
> <user@flink.apache.org>
> *Subject:* Re: Flink reactive mode for application clusters on AWS EKS
>
> Hey Chesnay,
>
> Thank you for your response.
>
> Since we are running our Flink jobs on EKS (Elastic Kubernetes Service), I
> was asking regarding Application cluster on Kubernetes.
>
> The documentation I referred to clearly states that it is not
> supported, the same as shown on the Flink website.
>
> Is there any plan to support that anytime soon?
>
> Thanks
> Tamir.
>
> ------------------------------
> *From:* Chesnay Schepler <ches...@apache.org>
> *Sent:* Thursday, January 12, 2023 4:30 PM
> *To:* Tamir Sagi <tamir.s...@niceactimize.com>; user@flink.apache.org
> <user@flink.apache.org>
> *Subject:* Re: Flink reactive mode for application clusters on AWS EKS
>
> The adaptive scheduler and reactive mode both already support application
> clusters since 1.13.
>
> https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/deployment/elastic_scaling/
>
> On 19/12/2022 10:17, Tamir Sagi wrote:
>
> Hey,
>
> We are running stream jobs on application clusters (v1.15.2) on AWS EKS.
>
> I was reviewing the following pages on the Flink Confluence:
>
> - Reactive mode [1]
> - Adaptive Scheduler [2]
>
> I also encountered the following POC conducted by Robert Metzger
> (@rmetzger_ <https://twitter.com/rmetzger_>) on 06 May 2021. [3]
>
> My question is whether that feature will be supported in the future for
> application clusters or not.
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-159%3A+Reactive+Mode
>
> [2]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-160%3A+Adaptive+Scheduler
>
> [3] https://flink.apache.org/2021/05/06/reactive-mode.html
>
> Thanks,
> Tamir.
>
> Confidentiality: This communication and any attachments are intended for
> the above-named persons only and may be confidential and/or legally
> privileged. Any opinions expressed in this communication are not
> necessarily those of NICE Actimize.
> If this communication has come to you in error you must take no action
> based on it, nor must you copy or show it to anyone; please delete/destroy
> and inform the sender by e-mail immediately.
> Monitoring: NICE Actimize may monitor incoming and outgoing e-mails.
> Viruses: Although we have taken steps toward ensuring that this e-mail and
> attachments are free from any virus, we advise that in keeping with good
> computing practice the recipient should ensure they are actually virus free.