Re: Broadcast data sent increases with # slots per TM

2016-07-22 Thread Till Rohrmann
As far as I know, the reason why the broadcast variables are implemented that way is that the senders would have to know which sub-tasks are deployed to which TMs. As the broadcast var
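To make that trade-off concrete, here is a hypothetical sketch (not Flink code) of the information a TM-aware sender would need: a mapping from each receiving sub-task to the TaskManager it runs on. With it, the sender could ship one copy per TaskManager instead of one per sub-task. The names `deployment`, `targets_per_subtask` and `targets_per_taskmanager` are illustrative assumptions, and the 2-TM, 4-slot layout is invented.

```python
from collections import defaultdict


def targets_per_subtask(deployment):
    """Behaviour described in the thread: one copy per receiving sub-task (slot)."""
    return sorted(deployment)


def targets_per_taskmanager(deployment):
    """Hypothetical TM-aware variant: group sub-tasks by TaskManager so that
    one copy per TaskManager suffices and co-located sub-tasks share it."""
    groups = defaultdict(list)
    for subtask, tm in deployment.items():
        groups[tm].append(subtask)
    return {tm: sorted(subtasks) for tm, subtasks in groups.items()}


# Hypothetical deployment: 2 TaskManagers with 4 slots each, one sub-task per slot.
deployment = {i: f"tm-{i // 4}" for i in range(8)}

print(len(targets_per_subtask(deployment)))      # copies sent per sub-task
print(len(targets_per_taskmanager(deployment)))  # copies sent per TaskManager
```

The catch, as the thread points out, is that the sender would need the `deployment` mapping in the first place, which the current design avoids requiring.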

Re: Broadcast data sent increases with # slots per TM

2016-07-07 Thread Felix Neutatz
uming that the same behavior will apply for broadcast joins as well. Is this the case? Regards, Alexander

[jira] [Created] (FLINK-4175) Broadcast data sent increases with # slots per TM

2016-07-07 Thread Felix Neutatz (JIRA)
Felix Neutatz created FLINK-4175: Summary: Broadcast data sent increases with # slots per TM Key: FLINK-4175 URL: https://issues.apache.org/jira/browse/FLINK-4175 Project: Flink Issue Type

Re: Broadcast data sent increases with # slots per TM

2016-06-09 Thread Felix Neutatz
Regards, Alexander 2016-06-08 17:13 GMT+02:00 Kunft, Andreas <andreas.ku...@tu-berlin.de>: Hi Till, thanks for the

Re: Broadcast data sent increases with # slots per TM

2016-06-09 Thread Stephan Ewen
thanks for the fast answer. I'll think about a concrete way of implementing and open a JIRA. Best, Andreas From: Till Rohrmann <trohrm...@apache.org>

Re: Broadcast data sent increases with # slots per TM

2016-06-09 Thread Till Rohrmann
I'll think about a concrete way of implementing and open a JIRA. Best, Andreas From: Till Rohrmann <trohrm...@apache.org> Sent: Wednesday, 8 June 2016 15:53 To: dev@flink.apache.org

Re: Broadcast data sent increases with # slots per TM

2016-06-08 Thread Alexander Alexandrov
pen a JIRA. Best, Andreas From: Till Rohrmann <trohrm...@apache.org> Sent: Wednesday, 8 June 2016 15:53 To: dev@flink.apache.org Subject: Re: Broadcast data sent increases with # slots per TM Hi Andreas, your obs

Re: Broadcast data sent increases with # slots per TM

2016-06-08 Thread Kunft, Andreas
t data sent increases with # slots per TM Hi Andreas, your observation is correct. The data is sent to each slot and the receiving TM only materializes one copy of the data. The rest of the data is discarded. As far as I know, the reason why the broadcast variables are implemented th

Re: Broadcast data sent increases with # slots per TM

2016-06-08 Thread Till Rohrmann
Hi Andreas, your observation is correct. The data is sent to each slot and the receiving TM only materializes one copy of the data. The rest of the data is discarded. As far as I know, the reason why the broadcast variables are implemented that way is that the senders would have to know which
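Till's description implies a simple back-of-the-envelope model. If a broadcast set of size S goes to T TaskManagers with P slots each, and every slot runs a receiving sub-task, then S·T·P bytes are sent, only S·T bytes are materialized, and S·T·(P−1) bytes are discarded. The sketch below only computes that model; the function name and the 100 MB / 4-TM figures are hypothetical, not taken from the benchmark. It also ignores that copies to sub-tasks co-located with the sender may not cross the network at all.

```python
def broadcast_volume(size_bytes, num_taskmanagers, slots_per_tm):
    """Return (sent, materialized, discarded) bytes for one broadcast set,
    assuming one receiving sub-task per slot and one copy sent per sub-task."""
    sent = size_bytes * num_taskmanagers * slots_per_tm
    materialized = size_bytes * num_taskmanagers  # one copy kept per TaskManager
    discarded = sent - materialized
    return sent, materialized, discarded


# Hypothetical example: a 100 MB broadcast set on 4 TaskManagers.
MB = 1024 * 1024
for slots in (1, 2, 4, 8):
    sent, kept, dropped = broadcast_volume(100 * MB, 4, slots)
    print(f"slots/TM={slots}: sent={sent // MB} MB, "
          f"kept={kept // MB} MB, discarded={dropped // MB} MB")
```

Under this model the sent volume grows linearly with the number of slots per TM while the materialized volume stays constant, which is consistent with the trend reported at the start of the thread.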

Broadcast data sent increases with # slots per TM

2016-06-08 Thread Kunft, Andreas
Hi, we are experiencing an unexpected increase in data sent over the network for broadcasts with an increasing number of slots per TaskManager. We provided a benchmark [1]. It not only increases the size of data sent over the network but also hurts performance, as seen in the preliminary results