If you call "rebalance()", it will redistribute the elements in a round-robin fashion, which should give you very even partition sizes.
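
To make the difference to a random partitioner concrete, here is a minimal plain-Java sketch. It is not Flink code: the class and method names are made up for illustration, and it only simulates a single sender that assigns elements to partitions. Under that assumption, round-robin sizes differ by at most one element, while uniform random assignment is only even on average.

```java
import java.util.Arrays;
import java.util.Random;

// Plain-Java simulation (hypothetical names, not the Flink API): counts how
// many elements each partition receives under two assignment strategies.
class PartitionSizes {

    // Round-robin: element i goes to partition i % numPartitions,
    // so partition sizes differ by at most one.
    static int[] roundRobinSizes(int numElements, int numPartitions) {
        int[] sizes = new int[numPartitions];
        for (int i = 0; i < numElements; i++) {
            sizes[i % numPartitions]++;
        }
        return sizes;
    }

    // Uniform random: each element picks a partition independently,
    // so sizes are only approximately equal.
    static int[] randomSizes(int numElements, int numPartitions, long seed) {
        Random random = new Random(seed);
        int[] sizes = new int[numPartitions];
        for (int i = 0; i < numElements; i++) {
            sizes[random.nextInt(numPartitions)]++;
        }
        return sizes;
    }

    public static void main(String[] args) {
        // Round-robin over 1000 elements and 3 partitions: [334, 333, 333]
        System.out.println(Arrays.toString(roundRobinSizes(1000, 3)));
        // Random assignment: sizes sum to 1000 but vary with the seed
        System.out.println(Arrays.toString(randomSizes(1000, 3, 42L)));
    }
}
```

With several parallel senders that each do their own round-robin, the sizes may be off by a few more elements, so this should be read as "very even" and not as an exact-size guarantee.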

On Tue, Jun 23, 2015 at 11:51 AM, Maximilian Alber <[email protected]> wrote:

> Thank you!
>
> Still I cannot guarantee the size of each partition, or can I?
> Something like randomSplit in Spark.
>
> Cheers,
> Max
>
> On Mon, Jun 15, 2015 at 5:46 PM, Matthias J. Sax <
> [email protected]> wrote:
>
>> Hi,
>>
>> using partitionCustom, the data distribution depends only on your
>> probability distribution. If it is uniform, you should be fine (i.e.,
>> choosing the channel like
>>
>> > private final Random random = new Random(System.currentTimeMillis());
>> > int partition(K key, int numPartitions) {
>> >     return random.nextInt(numPartitions);
>> > }
>>
>> should do the trick).
>>
>> -Matthias
>>
>> On 06/15/2015 05:41 PM, Maximilian Alber wrote:
>> > Thanks!
>> >
>> > Ok, so for a random shuffle I need partitionCustom. But in that case the
>> > data might be out of balance then?
>> >
>> > For the splitting: is there no way to have exact sizes?
>> >
>> > Cheers,
>> > Max
>> >
>> > On Mon, Jun 15, 2015 at 2:26 PM, Till Rohrmann <[email protected]
>> > <mailto:[email protected]>> wrote:
>> >
>> >     Hi Max,
>> >
>> >     you can always shuffle your elements using the |rebalance| method.
>> >     What Flink does here is to distribute the elements of each partition
>> >     among all available TaskManagers. This happens in a round-robin
>> >     fashion and is thus not completely random.
>> >
>> >     A different means is the |partitionCustom| method, which allows you to
>> >     specify for each element to which partition it shall be sent. You
>> >     would have to specify a |Partitioner| to do this.
>> >
>> >     For the splitting there is at the moment no syntactic sugar. What you
>> >     can do, though, is to assign each item a split ID and then use a
>> >     |filter| operation to filter the individual splits. Depending on your
>> >     split ID distribution you will have differently sized splits.
>> >
>> >     Cheers,
>> >     Till
>> >
>> >     On Mon, Jun 15, 2015 at 1:50 PM Maximilian Alber
>> >     [email protected]
>> >     <http://mailto:[email protected]> wrote:
>> >
>> >         Hi Flinksters,
>> >
>> >         I would like to shuffle my elements in the data set and then
>> >         split it in two according to some ratio. Each element in the
>> >         data set has a unique id. Is there a nice way to do it with the
>> >         flink api?
>> >         (It would be nice to have guaranteed random shuffling.)
>> >         Thanks!
>> >
>> >         Cheers,
>> >         Max
