[DISCUSS] FLIP-159: Reactive Mode

2021-01-22 Thread Robert Metzger
Hi all, Till started a discussion about FLIP-160: Declarative scheduler [1] earlier today. The first major feature based on that effort will be FLIP-159: Reactive Mode. It allows users to operate Flink so that it reactively scales the job up or down depending on the provided resources: addin

Re: [DISCUSS] FLIP-159: Reactive Mode

2021-01-22 Thread Steven Wu
Thanks a lot for the proposal, Robert and Till.
> No fixed parallelism for any of the operators
Regarding this limitation, can the scheduler adjust only the default parallelism? If some operators set their parallelism explicitly (like always 1), just leave them unchanged.
On Fri, Jan 22, 2021 at 8:42

Re: [DISCUSS] FLIP-159: Reactive Mode

2021-01-24 Thread Xintong Song
Thanks for preparing the FLIP and starting the discussion, Robert.
## Cluster vs. Job configuration
As I have commented on the FLIP-160 discussion thread [1], I'm a bit unsure about activating the reactive execution mode via a cluster level configuration option. I'm aware that in the first step th

Re: [DISCUSS] FLIP-159: Reactive Mode

2021-01-24 Thread Yang Wang
Thanks, Robert, for creating this FLIP and starting the discussion. This is a great starting point for making Flink work with an auto scaling service. The reactive mode is very useful in containerized environments (e.g. Docker, Kubernetes). For example, combined with Kubernetes "Horizontal Pod Autoscaler" [1],

Re: [DISCUSS] FLIP-159: Reactive Mode

2021-01-25 Thread Robert Metzger
Thank you very much for the comments so far.
@Steven:
> > No fixed parallelism for any of the operators
> Regarding this limitation, can the scheduler only adjust the default parallelism? if some operators set parallelism explicitly (like always 1), just leave them unchanged.
We will respect

Re: [DISCUSS] FLIP-159: Reactive Mode

2021-01-25 Thread Xintong Song
## configuration option
I see your point that autoscaling mode might be more suitable for session clusters. It doesn't change that `execution-mode` could be a job-level configuration. But I'm good with keeping it cluster-level and marking it experimental at the moment, so we can change it later if

Re: [DISCUSS] FLIP-159: Reactive Mode

2021-01-26 Thread Robert Metzger
Thanks for your thoughts, Xintong! What you are writing is very valuable feedback for me, as I have limited experience with real-world deployments. It seems that autoscaling support is a really important follow-up.
## active resource managers
I guess you can consider reactive mode a special case o

Re: [DISCUSS] FLIP-159: Reactive Mode

2021-01-26 Thread Xintong Song
Thanks for the explanation, Robert. Now I see how these things are expected to be supported in steps. I think you are right. Demanding a fixed, finite amount of resources can be considered a special case of `ScalingPolicy`. I'm now good with the current scope of reactive mode as a first step, a

Re: [DISCUSS] FLIP-159: Reactive Mode

2021-01-26 Thread Till Rohrmann
Thanks a lot for all the feedback, Steven, Yang Wang and Xintong. I have a few more comments to add.
# Keep it simple and stupid
As Robert said, we would like to keep the new feature as simple as possible initially so that we can implement it quickly. Once we have a basic implementation, we want to re

Re: [DISCUSS] FLIP-159: Reactive Mode

2021-01-26 Thread Yang Wang
Thanks, Robert and Till, for the thorough explanation. Now I understand the key difference between reactive mode and auto scaling mode. For the latter, we could dynamically adjust the desired value based on monitoring the metrics (e.g. CPU, memory, latency, delay, etc.). Since the reactive is simpler and

Re: [DISCUSS] FLIP-159: Reactive Mode

2021-01-27 Thread Robert Metzger
The discussion has been open for 6 days, and it seems that all questions and concerns raised so far have been addressed. I will start a VOTE thread for FLIP-159 now. On Tue, Jan 26, 2021 at 3:45 PM Yang Wang wrote: > Thanks Robert and Till for the thorough explanation. > > Now I understand key