Re: Unbalanced distribution of keyed stream to downstream parallel operators

2022-04-07 Thread Isidoros Ioannou
Hello Arvid , thank you for your reply. Actually using a window to aggregate the events for a time period is not applicable to my case since I need the records to be processed immediately. Even if I could I still can not understand how I could forward the aggregated events to lets say 2 parallel

Re: Unbalanced distribution of keyed stream to downstream parallel operators

2022-04-04 Thread Arvid Heise
You should create a histogram over the keys of the records. If you see a skew, one way to go about it is to refine the key or split aggregations. For example, consider you want to count events per users and 2 users are actually bots spamming lots of events accounting for 50% of all events. Then,

Re: Unbalanced distribution of keyed stream to downstream parallel operators

2022-04-04 Thread Isidoros Ioannou
Hello Qingsheng, thank you a lot for your answer. I will try to modify the key as you mentioned in your first assumption. In case the second assumption is valid also, what would you propose to remedy the situation? Try to experiment with different values of max parallelism? Στις Σάβ 2 Απρ 2022

Re: Unbalanced distribution of keyed stream to downstream parallel operators

2022-04-01 Thread Qingsheng Ren
Hi Isidoros, Two assumptions in my mind: 1. Records are not evenly distributed across different keys, e.g. some accountId just has more events than others. If the record distribution is predicable, you can try to combine other fields or include more information into the key field to help

Unbalanced distribution of keyed stream to downstream parallel operators

2022-04-01 Thread Isidoros Ioannou
Hello, we ran a flink application version 1.13.2 that consists of a kafka source with one partition so far then we filter the data based on some conditions, mapped to POJOS and we transform to a KeyedStream based on an accountId long property from the POJO. The downstream operators are 10 CEP