Thanks, I will check it out.

On Thu, 13 Aug, 2020, 7:55 PM Arvid Heise, <ar...@ververica.com> wrote:
> Hi Sidhant,
>
> If you are starting fresh with Flink, I strongly recommend skipping ECS and
> EMR and going directly to a Kubernetes-based solution. Scaling is much easier
> on K8s, there will be some kind of autoscaling coming in the next release,
> and best of all: you even have the option to go to a different cloud
> provider if needed.
>
> The easiest option for you is to use EKS on AWS together with Ververica
> community edition [1] or with one of the many Kubernetes operators.
>
> [1] https://www.ververica.com/getting-started
>
> On Tue, Aug 11, 2020 at 3:23 PM Till Rohrmann <trohrm...@apache.org>
> wrote:
>
>> Hi Sidhant,
>>
>> see the inline comments for answers
>>
>> On Tue, Aug 11, 2020 at 3:10 PM sidhant gupta <sidhan...@gmail.com>
>> wrote:
>>
>>> Hi Till,
>>>
>>> Thanks for your response.
>>> I have a few queries though as mentioned below:
>>> (1) Can flink be used in map-reduce fashion with data streaming api ?
>>>
>>
>> What do you understand as map-reduce fashion? You can use Flink's DataSet
>> API for processing batch workloads (consisting not only of map and reduce
>> operations but also other operations such as groupReduce, flatMap, etc.).
>> Flink's DataStream API can be used to process bounded and unbounded
>> streaming data.
>>
>>> (2) Does it make sense to use aws EMR if we are not using flink in
>>> map-reduce fashion with streaming api ?
>>>
>>
>> I think I don't fully understand what you mean with map-reduce fashion.
>> Do you mean multiple stages of map and reduce operations?
>>
>>
>>> (3) Can flink cluster be auto scaled using EMR Managed Scaling when used
>>> with yarn as per this link
>>> https://aws.amazon.com/blogs/big-data/introducing-amazon-emr-managed-scaling-automatically-resize-clusters-to-lower-cost/
>>> ?
>>>
>>
>> I am no expert on EMR managed scaling but I believe that it would need
>> some custom tooling to scale a Flink job down (by taking a savepoint and
>> resuming from it with a lower parallelism) before downsizing the EMR
>> cluster.
>>
>>
>>> (4) If we set an explicit max parallelism, and set current parallelism
>>> (which might be less than the max parallelism) equal to the maximum number
>>> of slots and set slots per task manager while starting the yarn session,
>>> then if we increase the task managers as per auto scaling, would the
>>> parallelism increase (up to the max parallelism) and the load be
>>> distributed across the newly spun up task managers? Refer:
>>> https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/production_ready.html#set-an-explicit-max-parallelism
>>>
>>>
>>
>> At the moment, Flink does not support this out of the box but the
>> community is working on this feature.
>>
>>>
>>> Regards
>>> Sidhant Gupta
>>>
>>> On Tue, 11 Aug, 2020, 5:19 PM Till Rohrmann, <trohrm...@apache.org>
>>> wrote:
>>>
>>>> Hi Sidhant,
>>>>
>>>> I am not an expert on AWS services but I believe that EMR might be a
>>>> bit easier to start with since AWS EMR comes with Flink support out of the
>>>> box [1]. On ECS I believe that you would have to set up the containers
>>>> yourself. Another interesting deployment option could be to use Flink's
>>>> native Kubernetes integration [2] which would work on AWS EKS.
>>>>
>>>> [1]
>>>> https://docs.aws.amazon.com/emr/latest/ReleaseGuide/flink-create-cluster.html
>>>> [2]
>>>> https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/deployment/native_kubernetes.html
>>>>
>>>> Cheers,
>>>> Till
>>>>
>>>> On Tue, Aug 11, 2020 at 9:16 AM sidhant gupta <sidhan...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I'm kind of new to flink cluster deployment.
>>>>> I wanted to know which flink cluster deployment and which job mode in
>>>>> aws is better in terms of ease of deployment, maintenance, HA, cost,
>>>>> etc. As of now I am considering aws EMR vs ECS (docker containers). We
>>>>> have a use case of setting up a data streaming api which reads records
>>>>> from a Kafka topic, processes them, and then writes to another Kafka
>>>>> topic. Please let me know your thoughts on this.
>>>>>
>>>>> Thanks
>>>>> Sidhant Gupta
>>>>>
>>>>
>
> --
>
> Arvid Heise | Senior Java Developer
>
> <https://www.ververica.com/>
>
> Follow us @VervericaData
>
> --
>
> Join Flink Forward <https://flink-forward.org/> - The Apache Flink
> Conference
>
> Stream Processing | Event Driven | Real Time
>
> --
>
> Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany
>
> --
> Ververica GmbH
> Registered at Amtsgericht Charlottenburg: HRB 158244 B
> Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji
> (Toni) Cheng
>