Hello!

I'm trying to understand the following behavior. My program reads from
multiple socketTextStream sources, and each text stream feeds into its own
data flow (these data streams never connect to each other in my job). It
looks something like this:

for (int i = 0; i < hosts.length; i++) {
    // one socket source per host/port pair
    DataStream<String> rawMessageStream = env.socketTextStream(hosts[i], ports[i]);
    DataStream<Tuple2<String, String>> joinedAdImpressions =
        rawMessageStream.rebalance() ...

However, when I run the job on a cluster, I find that all source tasks have
been scheduled onto one machine, so that machine becomes a severe
performance bottleneck. Any ideas why this happens?

Thanks!
