Re: KeyBy/Rebalance overhead?

2019-12-10 Thread Komal Mariam
Thank you so much for the detailed reply. I understand the use of keyBy a lot better now. You are correct about the time variation too. We will apply different network settings and extend our datasets to check performance on different use cases. On Mon, 9 Dec 2019 at 20:45, Arvid Heise wrote:

Re: KeyBy/Rebalance overhead?

2019-12-09 Thread Arvid Heise
Hi Komal, as a general rule of thumb, you want to avoid network shuffles as much as possible. As vino pointed out, you need to reshuffle if you need to group by key. Another frequent use case is rebalancing data when it is heavily skewed. Since neither applies to you, removing the keyBy is
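To make the skew case mentioned in this reply concrete, here is a minimal plain-Java sketch. It does not use Flink. The class `SkewSketch`, its methods, and the synthetic data (900 of 1000 records sharing one key) are assumptions made for illustration only. It compares the per-subtask load of hash partitioning with a round-robin distribution, which is how rebalance spreads records.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only: simulates how records spread over subtasks.
// This is not Flink code and does not reproduce Flink's runtime.
class SkewSketch {

    // Hash partitioning: every record with the same key goes to the same subtask.
    static int[] loadByHash(List<String> keys, int parallelism) {
        int[] load = new int[parallelism];
        for (String key : keys) {
            load[Math.floorMod(key.hashCode(), parallelism)]++;
        }
        return load;
    }

    // Round-robin: records are dealt out in turn, regardless of key.
    static int[] loadRoundRobin(List<String> keys, int parallelism) {
        int[] load = new int[parallelism];
        for (int i = 0; i < keys.size(); i++) {
            load[i % parallelism]++;
        }
        return load;
    }

    // Hypothetical skewed input: hotCount records share the key "hot",
    // the remaining records each get a distinct key.
    static List<String> skewedKeys(int total, int hotCount) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < total; i++) {
            keys.add(i < hotCount ? "hot" : "key-" + i);
        }
        return keys;
    }

    static int max(int[] values) {
        return Arrays.stream(values).max().orElse(0);
    }

    public static void main(String[] args) {
        List<String> keys = skewedKeys(1000, 900);
        System.out.println("hash:        " + Arrays.toString(loadByHash(keys, 4)));
        System.out.println("round-robin: " + Arrays.toString(loadRoundRobin(keys, 4)));
    }
}
```

With this input, one subtask receives at least the 900 "hot" records under hash partitioning, while round-robin gives each of the four subtasks 250 records. Evening out the load this way still costs a network shuffle, which is why it only pays off when the skew is real.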

Re: KeyBy/Rebalance overhead?

2019-12-09 Thread vino yang
Hi Komal, Actually, the choice of partitioning type depends mainly on your business logic. If you want to aggregate per group, you must use KeyBy to guarantee correct semantics. Best, Vino On Mon, 9 Dec 2019 at 5:07 PM, Komal Mariam wrote: > Thank you @vin
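The correctness point in this reply can be sketched in plain Java, without Flink. The class `KeyedCountSketch` and its helpers are hypothetical names introduced for illustration. The simple `hashCode` modulo used here is a stand-in: Flink's real key assignment goes through key groups and is not reproduced. The sketch counts records per key inside each simulated subtask, once with key-based routing and once with round-robin routing.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustration only: plain Java, not Flink's API or runtime.
class KeyedCountSketch {

    // Key-based routing: the same key always maps to the same subtask index.
    static int[] byHash(List<String> keys, int parallelism) {
        int[] targets = new int[keys.size()];
        for (int i = 0; i < targets.length; i++) {
            targets[i] = Math.floorMod(keys.get(i).hashCode(), parallelism);
        }
        return targets;
    }

    // Round-robin routing: the target depends on arrival order, not on the key.
    static int[] roundRobin(List<String> keys, int parallelism) {
        int[] targets = new int[keys.size()];
        for (int i = 0; i < targets.length; i++) {
            targets[i] = i % parallelism;
        }
        return targets;
    }

    // Each subtask counts only the records routed to it.
    static List<Map<String, Integer>> countPerSubtask(
            List<String> keys, int[] targets, int parallelism) {
        List<Map<String, Integer>> counts = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            counts.add(new HashMap<>());
        }
        for (int i = 0; i < keys.size(); i++) {
            counts.get(targets[i]).merge(keys.get(i), 1, Integer::sum);
        }
        return counts;
    }

    // Number of subtasks that hold a count for the given key.
    static int subtasksHolding(List<Map<String, Integer>> counts, String key) {
        int n = 0;
        for (Map<String, Integer> perSubtask : counts) {
            if (perSubtask.containsKey(key)) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("a", "b", "a", "a", "b", "a");
        int parallelism = 2;
        System.out.println(countPerSubtask(keys, byHash(keys, parallelism), parallelism));
        System.out.println(countPerSubtask(keys, roundRobin(keys, parallelism), parallelism));
    }
}
```

With key-based routing, each key's full count lives in exactly one subtask. With round-robin routing, the count for "a" is split into partial counts on both subtasks, so no single subtask can emit the correct total. That is the sense in which a grouped aggregation needs key-based partitioning.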

Re: KeyBy/Rebalance overhead?

2019-12-09 Thread Komal Mariam
Thank you @vino yang for the reply. I suspect keyBy will be beneficial in cases where my subsequent operators are computationally intensive, with their computation time exceeding the network reshuffling cost. Regards, Komal On Mon, 9 Dec 2019 at 15:23, vino yang wrote: > Hi Komal, > > KeyBy(Hash

Re: KeyBy/Rebalance overhead?

2019-12-08 Thread vino yang
Hi Komal, KeyBy (hash partitioning, a logical partitioning) and rebalance (a physical partitioning) are both partitioning schemes supported by Flink.[1] Generally speaking, partitioning may incur network communication (network shuffle) costs, which can increase the overall run time. The example provided by you ma

Re: KeyBy/Rebalance overhead?

2019-12-08 Thread Komal Mariam
Anyone? On Fri, 6 Dec 2019 at 19:07, Komal Mariam wrote: > Hello everyone, > > I want to get some insights on the KeyBy (and Rebalance) operations as > according to my understanding they partition our tasks over the defined > parallelism and thus should make our pipeline faster. > > I am reading

KeyBy/Rebalance overhead?

2019-12-06 Thread Komal Mariam
Hello everyone, I want to get some insights on the KeyBy (and Rebalance) operations as according to my understanding they partition our tasks over the defined parallelism and thus should make our pipeline faster. I am reading a topic which contains 170,000,000 pre-stored records with 11 Kafka par