Best configurations for bounded and unbounded streams pipelines

Cam Mach Tue, 09 Jul 2019 21:39:28 -0700

Hello Flink experts,
 
I believe this question has already been asked, but since I couldn't find an 
answer on the internet, I'd like to reach out to the community for help.


We want to find the best configuration for Flink running on Kubernetes to 
achieve the best performance. Which parameters should we tune, e.g. the number 
of TaskManagers, the number of task slots per TaskManager, and the parallelism?
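For context, here is a rough sketch of how these three knobs relate. It assumes Flink's default slot sharing, under which a job needs as many slots as its highest operator parallelism. Given that assumption, the number of TaskManagers follows from the target parallelism and the slots per TaskManager. The helper name `task_managers_needed` is hypothetical, for illustration only.

```python
import math


def task_managers_needed(parallelism: int, slots_per_tm: int) -> int:
    """Return how many TaskManagers provide at least `parallelism` slots.

    Assumes default slot sharing, so the job needs as many slots as its
    highest operator parallelism (hypothetical helper, not a Flink API).
    """
    if parallelism < 1 or slots_per_tm < 1:
        raise ValueError("parallelism and slots_per_tm must be positive")
    # Round up: a partially used TaskManager still has to be deployed.
    return math.ceil(parallelism / slots_per_tm)


# e.g. parallelism 32 with 4 slots per TaskManager -> 8 TaskManagers
print(task_managers_needed(32, 4))
```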

Our use case:
We have around one terabyte of data from legacy systems and want to stream it 
to the cloud. Our pipeline has two sources (one from Kinesis, the other from 
SQL), one operator that joins the two sources by key, and a sink.
We'd like to enable RocksDB and checkpointing to S3. We're also looking for the 
best windowing strategy for this scenario.

Assume resources are not a constraint, since we can easily scale out on 
Kubernetes in AWS.
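For reference, here is a minimal sketch of the `flink-conf.yaml` entries that correspond to the settings mentioned above: the RocksDB state backend with incremental checkpoints, an S3 checkpoint directory, slots per TaskManager, and default parallelism. The bucket path `s3://my-bucket/flink/checkpoints` and the helper `render_flink_conf` are hypothetical placeholders, and the numbers are not tuned recommendations. Writing checkpoints to S3 also requires one of Flink's S3 filesystem jars (e.g. `flink-s3-fs-hadoop` or `flink-s3-fs-presto`) to be available.

```python
def render_flink_conf(checkpoint_dir: str, slots_per_tm: int, parallelism: int) -> str:
    """Render flink-conf.yaml lines for RocksDB state with S3 checkpoints.

    Hypothetical helper: it only formats key/value pairs; values are
    placeholders, not tuned recommendations.
    """
    settings = {
        "state.backend": "rocksdb",                 # keep operator state in RocksDB
        "state.backend.incremental": "true",        # upload only changed RocksDB files
        "state.checkpoints.dir": checkpoint_dir,    # where checkpoint data is written
        "taskmanager.numberOfTaskSlots": str(slots_per_tm),
        "parallelism.default": str(parallelism),
    }
    return "\n".join(f"{key}: {value}" for key, value in settings.items())


print(render_flink_conf("s3://my-bucket/flink/checkpoints", 4, 32))
```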

We'd appreciate any help or pointers.

Thanks,
