Re: Partitioning key range

Fabian Hueske Mon, 08 Apr 2019 01:03:08 -0700

Hi Davood,

Flink uses hash partitioning to assign keys to key groups. Each key group
is then assigned to a task for processing (a task might process multiple
key groups).
There is no way to directly assign a key to a particular key group or task.
All you can do is to experiment with different custom KeySelectors which
return keys that are hashed into different key groups.


Best, Fabian

Am Sa., 6. Apr. 2019 um 11:43 Uhr schrieb Congxian Qiu <
qcx978132...@gmail.com>:

> Hi Davood
> Maybe a custom KeySelector can be helpful, you can define the key used to
> partition the stream. You can ref the code[1] for detail.
>
> [1]
> https://github.com/apache/flink/blob/8d05e91945c6c8d83f9924c00890ccf350f1f36f/flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/partitioner/KeyGroupStreamPartitioner.java#L58
>
> Best, Congxian
> On Apr 5, 2019, 06:35 +0800, Davood Rafiei <rafieidavo...@gmail.com>,
> wrote:
>
> Hi all,
>
> I partition DataStream (say dsA) with parallelism 2 and get KeyedStream
> (say ksA) with parallelism 2.
> Depending on my keys in dsA, one partition remains empty in ksA.
> For example when my keys are 10 and 20 in dsA, then both partitions in ksA
> are full.
> However, with keys 1000 and 1001, only one partition receives all of the
> upstream data in ksA.
> Is there any way to get information about key ranges for each downstream
> partitions?
> Or is there any way to overcome this issue?
> We can assume that I know all possible keys (in this case 2 different
> keys) in dsA and therefore I want all partitions in ksA to be fully
> utilized.
>
> Thanks,
> Davood
>
>

Re: Partitioning key range

Reply via email to