Don’t we need to set the number of slots to 24 (4 sources + 16 mappers + 4 sinks)?

Or is there a way to set the number of slots per TaskManager instead of globally, so that the tasks are at least evenly dispatched among the nodes?
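
Something like this in each node’s flink-conf.yaml is what I have in mind (the key is the standard slots setting; the value is just illustrative):

# flink-conf.yaml on each of the 4 TaskManager nodes
taskmanager.numberOfTaskSlots: 4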

As for the sink deployment: that’s not good news; it means we will have a non-negligible overhead: all the data generated by 3 of the 4 nodes will be sent to another node instead of going to the “local” sink. Network I/O has a price.

Do you have some sort of “topology” feature coming on the roadmap? Maybe a listener on the JobManager / env that would be triggered, asking us on which node we would prefer each task to be deployed. That way you keep the standard behavior, don’t have to write a complicated generic optimization algorithm, and let the user make their own choices. Should I create a JIRA?
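
To illustrate the idea, a purely hypothetical sketch (none of these types exist in Flink; all names are invented):

import java.util.List;

// Hypothetical placement callback -- invented for illustration only,
// nothing like this exists in Flink today.
public interface TaskPlacementListener {
    // Called before a task is deployed; returns the preferred host,
    // or null to fall back to the default scheduling behavior.
    String preferredHostFor(String taskName, List<String> availableHosts);
}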

For the time being we could start the application 4 times, one per node, but that’s not pretty at all ☺

B.R.

From: Till Rohrmann [mailto:trohrm...@apache.org]
Sent: Wednesday, February 3, 2016 17:58
To: user@flink.apache.org
Subject: Re: Distribution of sinks among the nodes


Hi Gwenhäel,

if you set the number of slots for each TaskManager to 4, then all of your 
mappers will be evenly spread out. The sources should also be evenly spread out. 
However, since the sinks depend on all mappers, it is most likely random where 
they are deployed. So you might end up with 4 sink tasks on one machine.

Cheers,
Till

On Wed, Feb 3, 2016 at 4:31 PM, Gwenhael Pasquiers 
<gwenhael.pasqui...@ericsson.com> wrote:
It is one type of mapper, with a parallelism of 16. It’s the same for the sinks 
and sources (parallelism of 4).

The settings are:
env.setParallelism(4)
mapper.setParallelism(env.getParallelism() * 4)
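
In full, our setup looks roughly like this (a sketch; OurKafkaSource, HeavyMapper and OurKafkaSink are placeholders for our actual classes):

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

// Sketch of our job setup; source/mapper/sink class names are placeholders.
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setParallelism(4);           // default parallelism: 4 sources, 4 sinks
env.disableOperatorChaining();   // we disabled chaining

env.addSource(new OurKafkaSource())              // parallelism 4 (default)
   .map(new HeavyMapper())
   .setParallelism(env.getParallelism() * 4)     // 16 mapper subtasks
   .addSink(new OurKafkaSink());                 // parallelism 4 (default)

env.execute();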

We mean to have X mapper tasks per source / sink

The mapper is doing some heavy computation and we have only 4 Kafka partitions; 
that’s why we need more mappers than sources / sinks.


-----Original Message-----
From: Aljoscha Krettek [mailto:aljos...@apache.org]
Sent: Wednesday, February 3, 2016 16:26
To: user@flink.apache.org
Subject: Re: Distribution of sinks among the nodes

Hi Gwenhäel,
when you say 16 maps, are we talking about one mapper with parallelism 16 or 16 
unique map operators?

Regards,
Aljoscha
> On 03 Feb 2016, at 15:48, Gwenhael Pasquiers 
> <gwenhael.pasqui...@ericsson.com> wrote:
>
> Hi,
>
> We try to deploy an application with the following “architecture”:
>
> 4 Kafka sources => 16 maps => 4 Kafka sinks, on 4 nodes, with 24 slots (we 
> disabled operator chaining).
>
> So we’d like on each node:
> 1x source => 4x map => 1x sink
>
> That way there would be no exchanges between different instances of Flink and 
> performance would be optimal.
>
> But we get (according to the Flink GUI and the Host column when looking at 
> the details of each task):
>
> Node 1: 1 source => 2 maps
> Node 2: 1 source => 1 map
> Node 3: 1 source => 1 map
> Node 4: 1 source => 12 maps => 4 sinks
>
> (I think no comments are needed ☺)
>
> The Web UI says that there are 24 slots and they are all used, but they 
> don’t seem evenly dispatched…
>
> How could we make Flink deploy the tasks the way we want?
>
> B.R.
>
> Gwen’
