I know this may seem like a silly question, but I am trying to figure out the
optimal setup for our Flink jobs.
We are using a standalone cluster with 5 jobs. Each job has 3 async operators,
each backed by an Executor, with thread counts of 20, 20, and 100. The source
is Kafka, and there are Cassandra and REST sinks.
Currently we are using parallelism = 1. So, at max load, a single job spawns
at least 140 threads (20 + 20 + 100). We are also using Netty-based libraries
for the Cassandra and REST calls. (As I can see in the thread dump, Flink also
uses a Netty server.)

What we see is that the total thread count adds up to ~500 for a single job.
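
To break that ~500 down by origin (our executors, Netty, Flink itself), something like the following could be run inside the TaskManager JVM, or the same grouping could be applied to the thread names from a thread dump. This is only a rough sketch: the class and method names are made up for illustration, and it assumes thread names end in a numeric suffix, as the default pool names such as "pool-1-thread-7" do.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ThreadCensus {

    // Drop a trailing numeric suffix (and any separator before it) so that
    // "pool-1-thread-7" and "pool-1-thread-8" fall into the same group.
    static String prefixOf(String threadName) {
        return threadName.replaceAll("[-_ #]*\\d+$", "");
    }

    // Count how many thread names share each prefix.
    static Map<String, Integer> countByPrefix(Iterable<String> threadNames) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String name : threadNames) {
            counts.merge(prefixOf(name), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Snapshot of the names of all live threads in this JVM.
        List<String> names = new ArrayList<>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            names.add(t.getName());
        }
        countByPrefix(names).forEach(
                (prefix, n) -> System.out.println(n + "\t" + prefix));
    }
}
```

Giving each executor a ThreadFactory with a descriptive name prefix would make the groups easier to attribute.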

The issue we faced is that, all of a sudden, all jobs began to fail in
production, and we saw that it was mainly due to the ulimit on max user
processes. All jobs had been started on one server in the cluster (I do not
know why, as it is a cluster with 3 members).

The limit was set to around 1500 on that server. We then set a higher value
and the problems seem to have gone away.
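
To make the arithmetic concrete, here is a rough sketch of the worst case, assuming all 5 jobs land on one host and each uses ~500 threads. On Linux, every thread counts against the max user processes limit, and so do any other processes owned by the same user, so the real headroom is even smaller. The class and method names are made up for illustration; the figures are the approximate ones from our setup.

```java
public class ThreadBudget {

    // Worst case: every job runs on the same host under the same user.
    static int worstCaseThreads(int jobsOnHost, int threadsPerJob) {
        return jobsOnHost * threadsPerJob;
    }

    // True when the worst case does not fit under the per-user limit.
    static boolean exceedsLimit(int totalThreads, int maxUserProcesses) {
        return totalThreads > maxUserProcesses;
    }

    public static void main(String[] args) {
        int jobs = 5;            // jobs in the cluster
        int threadsPerJob = 500; // approximate observed total per job
        int limit = 1500;        // approximate max user processes on that server

        int total = worstCaseThreads(jobs, threadsPerJob);
        System.out.println("worst case threads: " + total);
        System.out.println("limit: " + limit);
        System.out.println("exceeds limit: " + exceedsLimit(total, limit));
    }
}
```

By the same estimate, three jobs on one host (3 x 500 = 1500) would already sit right at the old limit.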

Can you recommend an optimal prod setting for a standalone cluster? Or should
there be a max limit on the threads spawned by a single job?

Regards


