Hi Sidhant,

If you are starting fresh with Flink, I strongly recommend skipping ECS and EMR and going directly to a Kubernetes-based solution. Scaling is much easier on K8s, some kind of autoscaling is coming in the next release, and best of all, you keep the option of moving to a different cloud provider if needed.

The easiest option for you is to use EKS on AWS together with Ververica Community Edition [1] or with one of the many Kubernetes operators.

[1] https://www.ververica.com/getting-started

On Tue, Aug 11, 2020 at 3:23 PM Till Rohrmann <trohrm...@apache.org> wrote:

> Hi Sidhant,
>
> see the inline comments for answers
>
> On Tue, Aug 11, 2020 at 3:10 PM sidhant gupta <sidhan...@gmail.com> wrote:
>
>> Hi Till,
>>
>> Thanks for your response.
>> I have a few queries though, as mentioned below:
>> (1) Can Flink be used in map-reduce fashion with the data streaming API?
>>
>
> What do you understand as map-reduce fashion? You can use Flink's DataSet
> API for processing batch workloads (consisting not only of map and reduce
> operations but also other operations such as groupReduce, flatMap, etc.).
> Flink's DataStream API can be used to process bounded and unbounded
> streaming data.
>
>> (2) Does it make sense to use AWS EMR if we are not using Flink in
>> map-reduce fashion with the streaming API?
>>
>
> I think I don't fully understand what you mean with map-reduce fashion. Do
> you mean multiple stages of map and reduce operations?
>
>
>> (3) Can a Flink cluster be auto scaled using EMR Managed Scaling when used
>> with YARN as per this link
>> https://aws.amazon.com/blogs/big-data/introducing-amazon-emr-managed-scaling-automatically-resize-clusters-to-lower-cost/
>> ?
>>
>
> I am no expert on EMR managed scaling but I believe that it would need
> some custom tooling to scale a Flink job down (by taking a savepoint and
> resuming from it with a lower parallelism) before downsizing the EMR
> cluster.
>
>
>> (4) If we set an explicit max parallelism, set the current parallelism
>> (which might be less than the max parallelism) equal to the maximum number
>> of slots, and set the slots per task manager while starting the YARN
>> session, then if auto scaling adds task managers, would the parallelism
>> increase (up to the max parallelism) and the load be distributed across
>> the newly spun up task managers? Refer:
>> https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/production_ready.html#set-an-explicit-max-parallelism
>>
>
> At the moment, Flink does not support this out of the box but the
> community is working on this feature.
>
>>
>> Regards
>> Sidhant Gupta
>>
>> On Tue, 11 Aug, 2020, 5:19 PM Till Rohrmann, <trohrm...@apache.org>
>> wrote:
>>
>>> Hi Sidhant,
>>>
>>> I am not an expert on AWS services but I believe that EMR might be a bit
>>> easier to start with since AWS EMR comes with Flink support out of the box
>>> [1]. On ECS I believe that you would have to set up the containers
>>> yourself. Another interesting deployment option could be to use Flink's
>>> native Kubernetes integration [2] which would work on AWS EKS.
>>>
>>> [1]
>>> https://docs.aws.amazon.com/emr/latest/ReleaseGuide/flink-create-cluster.html
>>> [2]
>>> https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/deployment/native_kubernetes.html
>>>
>>> Cheers,
>>> Till
>>>
>>> On Tue, Aug 11, 2020 at 9:16 AM sidhant gupta <sidhan...@gmail.com>
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I'm kind of new to Flink cluster deployment. I wanted to know which
>>>> Flink cluster deployment and which job mode in AWS is better in terms
>>>> of ease of deployment, maintenance, HA, cost, etc. As of now I am
>>>> considering AWS EMR vs ECS (Docker containers). We have a use case of
>>>> setting up a data streaming API which reads records from a Kafka topic,
>>>> processes them and then writes to another Kafka topic.
>>>> Please let me know your thoughts on this.
>>>>
>>>> Thanks
>>>> Sidhant Gupta
>>>>
>>>

--
Arvid Heise | Senior Java Developer
<https://www.ververica.com/>
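
A note on question (4) in the thread, offered as a sketch rather than Flink's actual code: Flink divides keyed state into a fixed number of key groups equal to the max parallelism, and each parallel subtask owns a contiguous range of key groups. Rescaling (restarting from a savepoint with a different parallelism) moves whole key groups between subtasks, which is why the parallelism can never exceed the max parallelism. The class and method names below are hypothetical, and plain hashCode() stands in for the additional murmur hash that Flink applies to the key hash.

```java
import java.util.Arrays;

/** Simplified illustration of key-group assignment; names are hypothetical. */
public class KeyGroupSketch {

    /**
     * Maps a key to one of maxParallelism key groups.
     * Assumption: Flink additionally murmur-hashes key.hashCode(); omitted here.
     */
    public static int keyGroupFor(Object key, int maxParallelism) {
        return Math.floorMod(key.hashCode(), maxParallelism);
    }

    /** Maps a key group to the index of the parallel subtask that owns it. */
    public static int subtaskFor(int keyGroup, int parallelism, int maxParallelism) {
        return keyGroup * parallelism / maxParallelism;
    }

    /** Counts how many key groups each subtask owns at the given parallelism. */
    public static int[] keyGroupsPerSubtask(int parallelism, int maxParallelism) {
        int[] counts = new int[parallelism];
        for (int keyGroup = 0; keyGroup < maxParallelism; keyGroup++) {
            counts[subtaskFor(keyGroup, parallelism, maxParallelism)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int maxParallelism = 128; // fixed for the lifetime of the job's state
        for (int parallelism : new int[] {4, 8}) {
            System.out.println("parallelism " + parallelism + " -> "
                    + Arrays.toString(keyGroupsPerSubtask(parallelism, maxParallelism)));
        }
    }
}
```

With a max parallelism of 128, going from 4 to 8 subtasks halves each subtask's share from 32 to 16 key groups. As Till notes above, Flink 1.11 does not do this automatically when task managers are added; the job has to be restarted from a savepoint with the new parallelism.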