We use EMR rather than ECS, but if EMR is an option for your team, you can 
configure auto scaling rules in your CloudFormation template so that your 
task/job load dynamically controls cluster sizing.
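
To give a rough idea of what such a rule does, here is a minimal Python sketch of a threshold-based sizing decision. It is not the EMR or CloudFormation API; every name and number in it (thresholds, slots per node, node limits) is a hypothetical placeholder you would replace with values that suit your workload.

```python
import math

# All constants below are hypothetical and for illustration only.
SCALE_OUT_ABOVE = 0.80    # grow the cluster above 80% slot utilization
SCALE_IN_BELOW = 0.30     # shrink the cluster below 30% slot utilization
TARGET_UTILIZATION = 0.60 # utilization to aim for after resizing
SLOTS_PER_NODE = 4        # task slots each node provides
MIN_NODES, MAX_NODES = 2, 20


def desired_nodes(current_nodes: int, busy_slots: int) -> int:
    """Return the node count a simple threshold rule would request."""
    utilization = busy_slots / (current_nodes * SLOTS_PER_NODE)
    if SCALE_IN_BELOW <= utilization <= SCALE_OUT_ABOVE:
        return current_nodes  # load is inside the band, leave the cluster alone
    # Outside the band: resize so utilization lands near the target,
    # then clamp to the allowed cluster size.
    needed = math.ceil(busy_slots / (SLOTS_PER_NODE * TARGET_UTILIZATION))
    return max(MIN_NODES, min(MAX_NODES, needed))
```

In a managed setup the same idea is expressed declaratively as scale-out and scale-in rules tied to a load metric, with minimum and maximum capacity limits, rather than as code you run yourself.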

> On Nov 8, 2019, at 1:40 AM, Navneeth Krishnan <reachnavnee...@gmail.com> 
> wrote:
> 
> Hello All,
> 
> I have a streaming job running in production that processes over 2
> billion events per day and does some heavy processing on each event. We
> have been facing some challenges in managing Flink in production, such as
> scaling in and out and restarting the job from a savepoint. Flink provides
> a lot of features, which made it seem an obvious choice at the time, but
> with all the operational overhead we are now wondering whether we should
> keep using Flink for our stream processing requirements or switch to
> Kafka Streams.
> 
> We currently deploy Flink on ECS. Bringing up a new cluster for another
> stream job is too expensive, but on the flip side running it on the same
> cluster is difficult because there is no way to say that one job has to
> run on a dedicated server while another can run on a shared instance.
> Also, taking a savepoint, cancelling, and submitting a new job results in
> some downtime. Most critically, there is no state shared among all tasks,
> that is, no global state. We work around this today with an external
> Redis cache, but that incurs cost as well.
> 
> If we move to Kafka Streams, our deployment life becomes much easier:
> each new stream job will be a microservice that can scale independently.
> With global state it's much easier to share state without using an
> external cache. The disadvantage is that we have to rely on the
> partitions for parallelism. Although this might initially sound easier,
> it will become a bottleneck when we need to scale much higher.
> 
> Do you have any suggestions on this? We need to decide which way to
> move forward, and any suggestions would be a great help.
> 
> Thanks
