Re: How to scale a streaming Flink pipeline without abusing parallelism for long computation tasks?

Theo Diefenthal Thu, 16 Apr 2020 06:44:04 -0700

Hi, 

I think you could utilize AsyncIO in your case with just using a local thread 
pool [1].


Best regards 
Theo 

[1] 
https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/operators/asyncio.html
 


Von: "Elkhan Dadashov" <elkhan.dadas...@gmail.com> 
An: "user" <user@flink.apache.org> 
Gesendet: Donnerstag, 16. April 2020 10:37:55 
Betreff: How to scale a streaming Flink pipeline without abusing parallelism 
for long computation tasks? 

Hi Flink users, 
I have a basic Flnk pipeline, doing flatmap. 

inside flatmap, I get the input, path it to the client library to compute some 
result. 

That library execution takes around 30 seconds to 2 minutes (depending on the 
input ) for producing the output from the given input ( it is time-series based 
long-running computation). 

As it takes the library long time to compute, the input payloads keep buffered, 
and if not given enough parallelism, the job will crash/restart. 
(java.lang.RuntimeException: Buffer pool is destroyed.) 

Wanted to check what are other options for scaling Flink streaming pipeline 
without abusing parallelism for long-running computations in Flink operator? 

Is multi-threading inside the operator recommended? ( even though the single 
input computation takes a long time, but I can definitely run 4-8 of them in 
parallel threads, instead of one by one, inside the same FlatMap operator. 

1 core for each yarn slot ( which will hold 1 flatmap operator) seems too 
expensive. If we could launch more link operators with only 1 core, it could 
have been easier. 

If anyone faced a similar issue please share your experience. I'm using Flink 
1..6.3 version. 

Thanks.

Re: How to scale a streaming Flink pipeline without abusing parallelism for long computation tasks?

Reply via email to