Hi,

with partitionCustom, the data distribution depends only on the probability distribution your Partitioner draws from. If it is uniform, the partitions should come out roughly balanced, so you should be fine. Choosing the channel like this should do the trick:

> private final Random random = new Random(System.currentTimeMillis());
> public int partition(K key, int numPartitions) {
>     return random.nextInt(numPartitions);
> }
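
As a fuller, self-contained sketch of the same idea: the Partitioner interface below is only a local stand-in with the same method shape as Flink's org.apache.flink.api.common.functions.Partitioner, so that the snippet compiles without Flink on the classpath. RandomPartitioner, RandomPartitionerDemo and the fixed seed are names and values I made up for illustration.

```java
import java.util.Arrays;
import java.util.Random;

class RandomPartitionerDemo {
    // Counts how many of n keys land in each of numPartitions partitions.
    static int[] histogram(int n, int numPartitions, long seed) {
        Partitioner<Integer> p = new RandomPartitioner<>(seed);
        int[] counts = new int[numPartitions];
        for (int i = 0; i < n; i++) {
            counts[p.partition(i, numPartitions)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // With a uniform draw, the four counts should be close to each other.
        System.out.println(Arrays.toString(histogram(100_000, 4, 42L)));
    }
}

// Assumption: local stand-in mirroring the method of Flink's Partitioner.
interface Partitioner<K> {
    int partition(K key, int numPartitions);
}

// Ignores the key and picks the target partition uniformly at random.
class RandomPartitioner<K> implements Partitioner<K> {
    private final Random random;

    // The seed is a parameter only to make the demo repeatable.
    RandomPartitioner(long seed) {
        this.random = new Random(seed);
    }

    @Override
    public int partition(K key, int numPartitions) {
        return random.nextInt(numPartitions);
    }
}
```

In an actual job you would implement Flink's own Partitioner instead of the stand-in and hand an instance to partitionCustom together with the key to partition on.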
-Matthias
On 06/15/2015 05:41 PM, Maximilian Alber wrote:
> Thanks!
>
> Ok, so for a random shuffle I need partitionCustom. But in that case the
> data might end up unbalanced?
>
> As for the splitting: is there no way to get exact sizes?
>
> Cheers,
> Max
>
> On Mon, Jun 15, 2015 at 2:26 PM, Till Rohrmann <[email protected]> wrote:
>
> Hi Max,
>
> you can always shuffle your elements using the |rebalance| method.
> What Flink does here is distribute the elements of each partition
> among all available TaskManagers. This happens in a round-robin
> fashion and is thus not completely random.
>
> Another option is the |partitionCustom| method, which allows you to
> specify for each element the partition it is sent to. You
> would have to provide a |Partitioner| to do this.
>
> For the splitting there is at the moment no syntactic sugar. What you
> can do, though, is to assign each item a split ID and then use a
> |filter| operation to extract the individual splits. Depending on your
> split ID distribution you will have differently sized splits.
>
> Cheers,
> Till
>
> On Mon, Jun 15, 2015 at 1:50 PM, Maximilian Alber <[email protected]> wrote:
>
> Hi Flinksters,
>
> I would like to shuffle my elements in the data set and then
> split it in two according to some ratio. Each element in the
> data set has a unique ID. Is there a nice way to do this with the
> Flink API?
> (It would be nice to have guaranteed random shuffling.)
> Thanks!
>
> Cheers,
> Max
>
>
>
>
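
For the split Till describes above (assign each item a split ID, then filter per ID), here is a rough sketch in plain Java rather than Flink code, so that it runs on its own. SplitByIdDemo, assignSplitIds, filterSplit and the 0.8 ratio are hypothetical names and values chosen for illustration. Because the IDs are drawn at random, the split sizes only approximate the ratio and are not exact.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class SplitByIdDemo {
    // Tags each of n elements with split ID 0 (with probability ratio) or 1.
    static int[] assignSplitIds(int n, double ratio, long seed) {
        Random random = new Random(seed);
        int[] ids = new int[n];
        for (int i = 0; i < n; i++) {
            ids[i] = random.nextDouble() < ratio ? 0 : 1;
        }
        return ids;
    }

    // Keeps only the elements tagged with the wanted split ID (the filter step).
    static <T> List<T> filterSplit(List<T> elements, int[] ids, int wanted) {
        List<T> out = new ArrayList<>();
        for (int i = 0; i < elements.size(); i++) {
            if (ids[i] == wanted) {
                out.add(elements.get(i));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            data.add(i);
        }
        int[] ids = assignSplitIds(data.size(), 0.8, 42L);
        List<Integer> first = filterSplit(data, ids, 0);
        List<Integer> second = filterSplit(data, ids, 1);
        // The two sizes add up to 1000; the first is near 800 but not exactly 800.
        System.out.println(first.size() + " / " + second.size());
    }
}
```

In Flink the tagging would be a map and each split a separate filter on the tagged data set.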
