Re: Flink cluster deployment strategy

sidhant gupta Thu, 13 Aug 2020 07:40:33 -0700

Thanks, I will check it out.

On Thu, 13 Aug, 2020, 7:55 PM Arvid Heise, <ar...@ververica.com> wrote:


> Hi Sidhant,
>
> If you are starting fresh with Flink, I strongly recommend skipping ECS and
> EMR and going directly to a Kubernetes-based solution. Scaling is much
> easier on K8s, some form of autoscaling is coming in the next release, and,
> best of all, you even have the option to move to a different cloud provider
> if needed.
>
> The easiest option for you is to use EKS on AWS together with the Ververica
> community edition [1] or with one of the many Kubernetes operators.
>
> [1] https://www.ververica.com/getting-started
>
> On Tue, Aug 11, 2020 at 3:23 PM Till Rohrmann <trohrm...@apache.org>
> wrote:
>
>> Hi Sidhant,
>>
>> see the inline comments for answers
>>
>> On Tue, Aug 11, 2020 at 3:10 PM sidhant gupta <sidhan...@gmail.com>
>> wrote:
>>
>>> Hi Till,
>>>
>>> Thanks for your response.
>>> I have a few queries, though, as listed below:
>>> (1) Can Flink be used in a map-reduce fashion with the data streaming API?
>>>
>>
>> What do you mean by map-reduce fashion? You can use Flink's DataSet
>> API for processing batch workloads (consisting not only of map and reduce
>> operations but also other operations such as groupReduce, flatMap, etc.).
>> Flink's DataStream API can be used to process bounded and unbounded
>> streaming data.
>>
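To make the idea of "map and reduce stages" concrete, here is a minimal sketch in plain Java (java.util.stream, not a Flink API) of a word count: a flatMap-style map stage that splits lines into words, followed by a keyed reduce stage that sums the counts per word. The class and method names are hypothetical. In Flink, the same shape would be expressed with the DataSet or DataStream operators mentioned above, for example flatMap followed by groupBy/keyBy and reduce.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical illustration only; this is plain Java, not Flink code.
public class MapReduceSketch {

    static Map<String, Integer> wordCount(List<String> lines) {
        return lines.stream()
            // "Map" stage: split each line into lowercase words (like a flatMap).
            .flatMap(line -> Arrays.stream(line.toLowerCase().split("\\W+")))
            // Drop empty tokens produced by leading punctuation or whitespace.
            .filter(word -> !word.isEmpty())
            // "Reduce" stage: group by word and sum the counts (like a keyed reduce).
            .collect(Collectors.toMap(word -> word, word -> 1, Integer::sum, TreeMap::new));
    }

    public static void main(String[] args) {
        List<String> lines = List.of("to be or not", "to be");
        System.out.println(wordCount(lines)); // {be=2, not=1, or=1, to=2}
    }
}
```

A TreeMap is used only so that the printed result has a stable, sorted order.
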
>>> (2) Does it make sense to use AWS EMR if we are not using Flink in a
>>> map-reduce fashion with the streaming API?
>>>
>>
>> I'm not sure I fully understand what you mean by map-reduce fashion.
>> Do you mean multiple stages of map and reduce operations?
>>
>>
>>> (3) Can a Flink cluster running on YARN be auto-scaled using EMR Managed
>>> Scaling, as described in the following post?
>>> https://aws.amazon.com/blogs/big-data/introducing-amazon-emr-managed-scaling-automatically-resize-clusters-to-lower-cost/
>>>
>>
>> I am no expert on EMR managed scaling, but I believe that it would need
>> some custom tooling to scale a Flink job down (by taking a savepoint and
>> resuming from it with a lower parallelism) before downsizing the EMR
>> cluster.
>>
>>
>>> (4) If we set an explicit max parallelism, set the current parallelism
>>> (which might be less than the max parallelism) equal to the maximum number
>>> of slots, and set the slots per task manager while starting the YARN
>>> session, would the parallelism then increase (up to the max parallelism)
>>> and the load be distributed across the newly spun-up task managers when
>>> auto scaling adds more task managers? Refer:
>>> https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/production_ready.html#set-an-explicit-max-parallelism
>>>
>>>
>>
>> At the moment, Flink does not support this out of the box, but the
>> community is working on this feature.
>>
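As background for (4) and for the savepoint-based rescaling mentioned under (3): as far as I understand Flink's internals, keyed state is partitioned into maxParallelism key groups, and each key group is assigned to a subtask with the range formula keyGroup * parallelism / maxParallelism. The plain-Java sketch below (hypothetical class name, no Flink dependency) shows that a key's group depends only on the max parallelism, so restoring with a different parallelism only moves whole key groups between subtasks, and the parallelism can never exceed the max parallelism. The key hashing is simplified: Flink additionally applies a murmur hash to hashCode(), which this sketch omits.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of key-group assignment; not Flink's actual classes.
public class KeyGroupSketch {

    // Simplified hashing: Flink also scrambles hashCode() with a murmur hash.
    static int keyGroupFor(Object key, int maxParallelism) {
        return Math.floorMod(key.hashCode(), maxParallelism);
    }

    // Contiguous range mapping: key group g is owned by subtask g * p / maxP.
    static int subtaskFor(int keyGroup, int parallelism, int maxParallelism) {
        if (parallelism < 1 || parallelism > maxParallelism) {
            throw new IllegalArgumentException("parallelism must be in [1, maxParallelism]");
        }
        return keyGroup * parallelism / maxParallelism;
    }

    // Lists the key groups owned by each subtask for a given parallelism.
    static List<List<Integer>> ownership(int parallelism, int maxParallelism) {
        List<List<Integer>> owners = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            owners.add(new ArrayList<>());
        }
        for (int g = 0; g < maxParallelism; g++) {
            owners.get(subtaskFor(g, parallelism, maxParallelism)).add(g);
        }
        return owners;
    }

    public static void main(String[] args) {
        int maxParallelism = 8;
        System.out.println(ownership(2, maxParallelism)); // [[0, 1, 2, 3], [4, 5, 6, 7]]
        System.out.println(ownership(4, maxParallelism)); // [[0, 1], [2, 3], [4, 5], [6, 7]]
        // The key's group is fixed by maxParallelism; only its owning subtask changes.
        int g = keyGroupFor("user-42", maxParallelism);
        System.out.println("key group " + g + ": subtask " + subtaskFor(g, 2, maxParallelism)
            + " at p=2, subtask " + subtaskFor(g, 4, maxParallelism) + " at p=4");
    }
}
```

With maxParallelism = 8, going from parallelism 2 to 4 splits each subtask's four key groups into two pairs. In the Flink version discussed here, this redistribution happens when a job is restored from a savepoint with a new parallelism, not automatically when task managers are added.
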
>>>
>>> Regards
>>> Sidhant Gupta
>>>
>>> On Tue, 11 Aug, 2020, 5:19 PM Till Rohrmann, <trohrm...@apache.org>
>>> wrote:
>>>
>>>> Hi Sidhant,
>>>>
>>>> I am not an expert on AWS services, but I believe that EMR might be a
>>>> bit easier to start with, since AWS EMR comes with Flink support out of
>>>> the box [1]. On ECS, I believe that you would have to set up the
>>>> containers yourself. Another interesting deployment option could be to
>>>> use Flink's native Kubernetes integration [2], which would work on AWS EKS.
>>>>
>>>> [1]
>>>> https://docs.aws.amazon.com/emr/latest/ReleaseGuide/flink-create-cluster.html
>>>> [2]
>>>> https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/deployment/native_kubernetes.html
>>>>
>>>> Cheers,
>>>> Till
>>>>
>>>> On Tue, Aug 11, 2020 at 9:16 AM sidhant gupta <sidhan...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I'm fairly new to Flink cluster deployment. I wanted to know which
>>>>> Flink cluster deployment and which job mode on AWS is better in terms
>>>>> of ease of deployment, maintenance, HA, cost, etc. As of now, I am
>>>>> considering AWS EMR vs. ECS (Docker containers). Our use case is a data
>>>>> streaming API that reads records from a Kafka topic, processes them, and
>>>>> then writes them to another Kafka topic. Please let me know your
>>>>> thoughts on this.
>>>>>
>>>>> Thanks
>>>>> Sidhant Gupta
>>>>>
>>>>
>
> --
>
> Arvid Heise | Senior Java Developer
>
