If you use "rebalance()", it will redistribute the elements in a round-robin
fashion, which should give you very even partition sizes.
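
To illustrate the difference between round-robin and uniformly random
assignment, here is a small plain-Java simulation. It is only a sketch: it
does not use Flink and is not Flink's implementation, and the class and
method names are made up for this example. As far as I know, each parallel
sender in Flink does its own round-robin, so with several senders the sizes
are very even but not guaranteed to be identical.

```java
import java.util.Arrays;
import java.util.Random;

// Plain-Java simulation (no Flink dependency); all names are hypothetical.
public class PartitionSizes {

    // Round-robin: element i goes to partition i % numPartitions.
    static int[] roundRobinSizes(int numElements, int numPartitions) {
        int[] sizes = new int[numPartitions];
        for (int i = 0; i < numElements; i++) {
            sizes[i % numPartitions]++;
        }
        return sizes;
    }

    // Uniform random: each element independently picks a partition.
    static int[] randomSizes(int numElements, int numPartitions, long seed) {
        Random random = new Random(seed);
        int[] sizes = new int[numPartitions];
        for (int i = 0; i < numElements; i++) {
            sizes[random.nextInt(numPartitions)]++;
        }
        return sizes;
    }

    public static void main(String[] args) {
        // Print the partition sizes produced by each strategy.
        System.out.println("round-robin: "
                + Arrays.toString(roundRobinSizes(1003, 4)));
        System.out.println("random:      "
                + Arrays.toString(randomSizes(1003, 4, 42L)));
    }
}
```

With a single sender, the round-robin sizes differ by at most one element,
while the random sizes only match in expectation.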


On Tue, Jun 23, 2015 at 11:51 AM, Maximilian Alber <
alber.maximil...@gmail.com> wrote:

> Thank you!
>
> Still, I cannot guarantee the size of each partition, or can I?
> I am thinking of something like randomSplit in Spark.
>
> Cheers,
> Max
>
> On Mon, Jun 15, 2015 at 5:46 PM, Matthias J. Sax <
> mj...@informatik.hu-berlin.de> wrote:
>
>> Hi,
>>
>> using partitionCustom, the data distribution depends only on your
>> probability distribution. If it is uniform, you should be fine, i.e.,
>> choosing the channel like
>>
>> > private final Random random = new Random(System.currentTimeMillis());
>> >
>> > // Ignore the key and pick a target partition uniformly at random.
>> > @Override
>> > public int partition(K key, int numPartitions) {
>> >   return random.nextInt(numPartitions);
>> > }
>>
>> should do the trick.
>>
>> -Matthias
>>
>> On 06/15/2015 05:41 PM, Maximilian Alber wrote:
>> > Thanks!
>> >
>> > Ok, so for a random shuffle I need partitionCustom. But in that case the
>> > data might be out of balance?
>> >
>> > As for the splitting: is there no way to have exact sizes?
>> >
>> > Cheers,
>> > Max
>> >
>> > On Mon, Jun 15, 2015 at 2:26 PM, Till Rohrmann <trohrm...@apache.org>
>> > wrote:
>> >
>> >     Hi Max,
>> >
>> >     you can always shuffle your elements using the |rebalance| method.
>> >     What Flink does here is distribute the elements of each partition
>> >     among all available TaskManagers. This happens in a round-robin
>> >     fashion and is thus not completely random.
>> >
>> >     Another option is the |partitionCustom| method, which allows you to
>> >     specify for each element the partition it is sent to. You
>> >     would have to provide a |Partitioner| to do this.
>> >
>> >     For the splitting there is at the moment no syntactic sugar. What you
>> >     can do, though, is assign each item a split ID and then use a
>> >     |filter| operation to select the individual splits. Depending on your
>> >     split ID distribution, you will get differently sized splits.
>> >
>> >     Cheers,
>> >     Till
>> >
>> >     On Mon, Jun 15, 2015 at 1:50 PM Maximilian Alber
>> >     <alber.maximil...@gmail.com> wrote:
>> >
>> >         Hi Flinksters,
>> >
>> >         I would like to shuffle the elements in my data set and then
>> >         split it in two according to some ratio. Each element in the
>> >         data set has a unique ID. Is there a nice way to do this with the
>> >         Flink API?
>> >         (It would be nice to have guaranteed random shuffling.)
>> >         Thanks!
>> >
>> >         Cheers,
>> >         Max
>> >
>> >
>> >
>> >
>>
>>
>
