Re: kafka partitions, data locality

2019-04-30 Thread Fabian Hueske
efan Richter [mailto:s.rich...@ververica.com] > *Sent:* Friday, April 26, 2019 11:15 AM > *To:* Smirnov Sergey Vladimirovich (39833) > *Cc:* Dawid Wysakowicz ; Ken Krugler < > kkrugler_li...@transpac.com>; user@flink.apache.org; d...@flink.apache.org > *Subject:* Re: kafk

RE: kafka partitions, data locality

2019-04-29 Thread Smirnov Sergey Vladimirovich (39833)
rugler mailto:kkrugler_li...@transpac.com>> Cc: user@flink.apache.org<mailto:user@flink.apache.org>; d...@flink.apache.org<mailto:d...@flink.apache.org> Subject: Re: kafka partitions, data locality Hi Smirnov, Actually there is a way to tell Flink that data is already partitioned. You can

Re: kafka partitions, data locality

2019-04-26 Thread Stefan Richter
> Cc: user@flink.apache.org; d...@flink.apache.org > Subject: Re: kafka partitions, data locality > > Hi Smirnov, > > Actually there is a way to tell Flink that data is already partitioned. You > can try the reinterpretAsKeyedStream[1] method. I must warn you though this &g

RE: kafka partitions, data locality

2019-04-26 Thread Smirnov Sergey Vladimirovich (39833)
[mailto:kkrugler_li...@transpac.com] Sent: Wednesday, April 17, 2019 9:23 PM To: Smirnov Sergey Vladimirovich (39833) <mailto:s.smirn...@tinkoff.ru> Subject: Re: kafka partitions, data locality Hi Sergey, As you surmised, once you do a keyBy/max on the Kafka topic, to group by clientId and find the max

Re: kafka partitions, data locality

2019-04-25 Thread Dawid Wysakowicz
; With best regards, > > Sergey > > *From:*Ken Krugler [mailto:kkrugler_li...@transpac.com] > *Sent:* Wednesday, April 17, 2019 9:23 PM > *To:* Smirnov Sergey Vladimirovich (39833) > *Subject:* Re: kafka partitions, data locality > >   > > Hi Sergey, > >   >

RE: kafka partitions, data locality

2019-04-19 Thread Smirnov Sergey Vladimirovich (39833)
: kafka partitions, data locality Hi Sergey, As you surmised, once you do a keyBy/max on the Kafka topic, to group by clientId and find the max, then the topology will have a partition/shuffle to it. This is because Flink doesn’t know that client ids don’t span Kafka partitions. I don’t know of