Re: Flink cluster deployment strategy

Arvid Heise Thu, 13 Aug 2020 07:26:32 -0700

Hi Sidhant,

If you are starting fresh with Flink, I strongly recommend skipping ECS and
EMR and going directly to a Kubernetes-based solution. Scaling is much easier
on K8s, some form of autoscaling is coming in the next release, and best of
all, you keep the option of moving to a different cloud provider if needed.


The easiest option for you is to use EKS on AWS together with the Ververica
Community Edition [1] or with one of the many Kubernetes operators.

[1] https://www.ververica.com/getting-started

On Tue, Aug 11, 2020 at 3:23 PM Till Rohrmann <trohrm...@apache.org> wrote:

> Hi Sidhant,
>
> see the inline comments for answers
>
> On Tue, Aug 11, 2020 at 3:10 PM sidhant gupta <sidhan...@gmail.com> wrote:
>
>> Hi Till,
>>
>> Thanks for your response.
>> I have a few queries, though, as listed below:
>> (1) Can Flink be used in map-reduce fashion with the data streaming API?
>>
>
> What do you understand as map-reduce fashion? You can use Flink's DataSet
> API for processing batch workloads (consisting not only of map and reduce
> operations but also other operations such as groupReduce, flatMap, etc.).
> Flink's DataStream API can be used to process bounded and unbounded
> streaming data.
>
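
To make the "map-reduce fashion" question concrete: on a keyed stream, a
reduce does not wait for the end of the input. It emits an updated aggregate
for every incoming record. The sketch below is plain Python, not Flink API;
the function name keyed_rolling_reduce and the sample words are made up for
illustration only.

```python
from typing import Callable, Dict, Hashable, Iterable, Iterator, Tuple

Record = Tuple[str, int]


def keyed_rolling_reduce(
    records: Iterable[Record],
    key_fn: Callable[[Record], Hashable],
    reduce_fn: Callable[[Record, Record], Record],
) -> Iterator[Record]:
    """Yield the updated aggregate for a record's key after every record."""
    state: Dict[Hashable, Record] = {}  # one running aggregate per key
    for record in records:
        key = key_fn(record)
        if key in state:
            state[key] = reduce_fn(state[key], record)
        else:
            state[key] = record  # first record for this key is emitted as-is
        yield state[key]


words = ["a", "b", "a"]
mapped = ((w, 1) for w in words)  # "map" stage: word -> (word, 1)
updates = list(
    keyed_rolling_reduce(
        mapped,
        key_fn=lambda r: r[0],                      # key by the word
        reduce_fn=lambda x, y: (x[0], x[1] + y[1]),  # sum the counts
    )
)
```

A batch-style reduce would only produce the final count per key, whereas the
rolling variant above produces one output per input record.
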
>> (2) Does it make sense to use AWS EMR if we are not using Flink in
>> map-reduce fashion with the streaming API?
>>
>
> I think I don't fully understand what you mean by map-reduce fashion. Do
> you mean multiple stages of map and reduce operations?
>
>
>> (3) Can a Flink cluster be auto-scaled using EMR Managed Scaling when used
>> with YARN, as per this link?
>> https://aws.amazon.com/blogs/big-data/introducing-amazon-emr-managed-scaling-automatically-resize-clusters-to-lower-cost/
>>
>
> I am no expert on EMR managed scaling, but I believe it would need
> some custom tooling to scale a Flink job down (by taking a savepoint and
> resuming from it with a lower parallelism) before downsizing the EMR
> cluster.
>
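
A rough sketch of what such custom tooling might build, assuming the Flink
CLI is on the PATH and that its flags are as I understand them for Flink 1.11
(`flink stop --savepointPath` and `flink run -s ... -p ...`; please check
`flink --help` for your version). The function names, job ID, bucket, and jar
name are hypothetical. The sketch only assembles the command lines; running
them and capturing the savepoint path that `flink stop` reports is left out.

```python
from typing import List


def stop_with_savepoint_cmd(job_id: str, savepoint_dir: str) -> List[str]:
    """Command that takes a savepoint and then stops the running job."""
    return ["flink", "stop", "--savepointPath", savepoint_dir, job_id]


def resume_cmd(savepoint_path: str, parallelism: int, jar: str) -> List[str]:
    """Command that resubmits the job from a savepoint with a new parallelism."""
    if parallelism < 1:
        raise ValueError("parallelism must be at least 1")
    return ["flink", "run", "-s", savepoint_path, "-p", str(parallelism), jar]


# Hypothetical values, for illustration only.
stop_cmd = stop_with_savepoint_cmd("my-job-id", "s3://my-bucket/savepoints")
run_cmd = resume_cmd("s3://my-bucket/savepoints/savepoint-1", 2, "my-job.jar")
```

The order matters: stop with a savepoint first, resume with the lower
parallelism, and only then shrink the cluster. The new parallelism also must
not exceed the job's max parallelism.
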
>
>> (4) Suppose we set an explicit max parallelism, set the current parallelism
>> (which might be less than the max parallelism) equal to the maximum number
>> of slots, and set the slots per task manager when starting the YARN session.
>> If auto scaling then adds task managers, would the parallelism increase
>> (up to the max parallelism) and the load be distributed across the newly
>> spun-up task managers? Refer:
>> https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/production_ready.html#set-an-explicit-max-parallelism
>>
>>
>
> At the moment, Flink does not support this out of the box but the
> community is working on this feature.
>
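
Since the job is not rescaled automatically, the parallelism has to be chosen
when the job is resubmitted. The arithmetic from the question can be sketched
as below, assuming default slot sharing (so the job needs as many slots as its
highest operator parallelism). The function name and the numbers are
illustrative only.

```python
def target_parallelism(task_managers: int, slots_per_tm: int, max_parallelism: int) -> int:
    """Largest parallelism that fits the available slots and the max parallelism."""
    if task_managers < 1 or slots_per_tm < 1 or max_parallelism < 1:
        raise ValueError("all arguments must be at least 1")
    total_slots = task_managers * slots_per_tm
    # Parallelism is capped both by free slots and by the job's max parallelism.
    return min(total_slots, max_parallelism)


small_cluster = target_parallelism(task_managers=4, slots_per_tm=2, max_parallelism=128)
large_cluster = target_parallelism(task_managers=100, slots_per_tm=2, max_parallelism=128)
```

With 4 task managers of 2 slots each the cap is the 8 available slots; with
100 task managers the max parallelism of 128 becomes the limit, and extra
slots beyond that stay unused by this job.
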
>>
>> Regards
>> Sidhant Gupta
>>
>> On Tue, 11 Aug, 2020, 5:19 PM Till Rohrmann, <trohrm...@apache.org>
>> wrote:
>>
>>> Hi Sidhant,
>>>
>>> I am not an expert on AWS services but I believe that EMR might be a bit
>>> easier to start with since AWS EMR comes with Flink support out of the box
>>> [1]. On ECS I believe that you would have to set up the containers
>>> yourself. Another interesting deployment option could be to use Flink's
>>> native Kubernetes integration [2] which would work on AWS EKS.
>>>
>>> [1]
>>> https://docs.aws.amazon.com/emr/latest/ReleaseGuide/flink-create-cluster.html
>>> [2]
>>> https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/deployment/native_kubernetes.html
>>>
>>> Cheers,
>>> Till
>>>
>>> On Tue, Aug 11, 2020 at 9:16 AM sidhant gupta <sidhan...@gmail.com>
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I'm fairly new to Flink cluster deployment. I wanted to know which Flink
>>>> cluster deployment and which job mode on AWS is better in terms of ease of
>>>> deployment, maintenance, HA, cost, etc. As of now I am considering AWS EMR
>>>> vs ECS (Docker containers). We have a use case of setting up a data
>>>> streaming API which reads records from a Kafka topic, processes them, and
>>>> then writes to another Kafka topic. Please let me know your thoughts on
>>>> this.
>>>>
>>>> Thanks
>>>> Sidhant Gupta
>>>>
>>>

-- 

Arvid Heise | Senior Java Developer

