Hi All,

We are also looking for a custom partitioning strategy; it would be helpful for us too.
Thanks and regards,
Gowtham S

On Mon, 18 Sept 2023 at 12:13, Karthick <ibmkarthickma...@gmail.com> wrote:

> Thanks Liu Ron for the suggestion.
>
> Can you please give any pointers/references for the custom partitioning
> strategy? We are currently using murmur hashing with the device unique id.
> It would be helpful if you could point us to any other strategies.
>
> Thanks and regards,
> Karthick.
>
> On Mon, Sep 18, 2023 at 9:18 AM liu ron <ron9....@gmail.com> wrote:
>
>> Hi, Karthick
>>
>> It looks like a data-skew problem. I think one of the easiest and most
>> efficient ways to address it is to increase the number of partitions and
>> see how it works first, e.g. try expanding to 100 first.
>>
>> Best,
>> Ron
>>
>> Karthick <ibmkarthickma...@gmail.com> wrote on Sun, Sep 17, 2023 at 17:03:
>>
>>> Thanks Wei Chen and Giannis for your time.
>>>
>>>> For starters, you need to better size and estimate the required number
>>>> of partitions you will need on the Kafka side in order to process 1000+
>>>> messages/second. The number of partitions should also define the
>>>> maximum parallelism for the Flink job reading from Kafka.
>>>
>>> Thanks for the pointer. Can you please guide us on which factors we
>>> need to consider here?
>>>
>>>> use a custom partitioner that spreads those devices to somewhat
>>>> separate partitions.
>>>
>>> Please suggest a working solution for the custom partitioner to
>>> distribute the load. It would be helpful.
>>>
>>>> What we were doing at that time was to define multiple topics, each
>>>> with a different # of partitions
>>>
>>> Thanks for the suggestion. Is there any calculation for choosing the
>>> topic count? Are there any formulae/factors to determine this number?
>>> If so, please let us know; it would help us choose it.
>>>
>>> Thanks and regards,
>>> Karthick.
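The "which factors to consider" question about sizing the partition count is commonly answered with a back-of-the-envelope rule: take the larger of the producer-side and consumer-side partition requirements and add headroom. The sketch below is one illustrative version of that rule, not an official formula, and all of the rates in it are hypothetical placeholders.

```python
import math

def estimate_partitions(target_msgs_per_sec: float,
                        per_partition_produce_rate: float,
                        per_consumer_consume_rate: float,
                        headroom: float = 1.5) -> int:
    """Rule-of-thumb partition count: enough partitions that neither the
    producers nor the (slower) consumers become the bottleneck, times a
    headroom factor for growth and skew. All rates are messages/second."""
    by_produce = target_msgs_per_sec / per_partition_produce_rate
    by_consume = target_msgs_per_sec / per_consumer_consume_rate
    return math.ceil(max(by_produce, by_consume) * headroom)

# Hypothetical numbers: a 1000 msg/s target, each partition absorbing
# 500 msg/s of writes, but one consumer handling only 50 msg/s.
print(estimate_partitions(1000, 500, 50))  # consumer-bound -> 30 partitions
```

The consumer rate usually dominates when processing per record is expensive, which also matches the advice in the thread that the partition count caps the Flink job's useful parallelism.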
>>>
>>> On Sun, Sep 17, 2023 at 4:04 AM Wei Chen <sagitc...@qq.com> wrote:
>>>
>>>> Hi Karthick,
>>>> We've experienced a similar issue before. What we did at the time was
>>>> to define multiple topics, each with a different # of partitions, which
>>>> means the topics with more partitions get higher parallelism for
>>>> processing.
>>>> You can further divide the topics into several groups, where each group
>>>> has a similar # of partitions. You can then define each group as a
>>>> source of a Flink data stream and run them in parallel with different
>>>> parallelisms.
>>>>
>>>> ------------------------------
>>>>
>>>> ------------------ Original ------------------
>>>> *From:* Giannis Polyzos <ipolyzos...@gmail.com>
>>>> *Date:* Sat, Sep 16, 2023 11:52 PM
>>>> *To:* Karthick <ibmkarthickma...@gmail.com>
>>>> *Cc:* Gowtham S <gowtham.co....@gmail.com>, user <user@flink.apache.org>
>>>> *Subject:* Re: Urgent: Mitigating Slow Consumer Impact and Seeking
>>>> Open-Source Solutions in Apache Kafka Consumers
>>>>
>>>> Can you provide some more context on what your Flink job will be doing?
>>>> There might be some things you can do to fix the data skew on the Flink
>>>> side, but first, you want to start with Kafka.
>>>> For starters, you need to better size and estimate the required number
>>>> of partitions you will need on the Kafka side in order to process 1000+
>>>> messages/second.
>>>> The number of partitions should also define the maximum parallelism for
>>>> the Flink job reading from Kafka.
>>>> If you know your "hot devices" in advance, you might want to use a
>>>> custom partitioner that spreads those devices to somewhat separate
>>>> partitions.
>>>> Overall this is somewhat of a trial-and-error process. You might also
>>>> want to check that these partitions are evenly balanced among your
>>>> brokers and don't cause too much stress on particular brokers.
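The idea of spreading known hot devices over somewhat separate partitions could look roughly like the sketch below. This is a hedged illustration, not a drop-in Kafka `Partitioner` implementation: the class name, the hot-device set, and the md5 stand-in hash are all made up for the example (Kafka's default partitioner actually uses murmur2 on the key bytes).

```python
import hashlib
import itertools

def stable_hash(key: str) -> int:
    # Kafka's default partitioner hashes key bytes with murmur2; md5 is
    # used here only to keep the sketch dependency-free and deterministic.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:4], "big")

class HotAwarePartitioner:
    """Sticky hashing for normal devices (preserves per-device ordering);
    known hot devices are fanned out round-robin over a dedicated slice of
    partitions so one hot/slow device cannot pin down a shared partition.
    Caveat: fanning out a hot device gives up strict ordering for it."""

    def __init__(self, num_partitions: int, hot_devices: set, hot_slice: int = 4):
        assert 0 < hot_slice < num_partitions
        self.cold = num_partitions - hot_slice      # partitions 0..cold-1
        hot_range = range(self.cold, num_partitions)
        self.hot_rr = {d: itertools.cycle(hot_range) for d in hot_devices}

    def partition(self, device_id: str) -> int:
        rr = self.hot_rr.get(device_id)
        if rr is not None:
            return next(rr)                         # spread the hot device
        return stable_hash(device_id) % self.cold   # sticky: order preserved

p = HotAwarePartitioner(10, hot_devices={"dev-hot"}, hot_slice=4)
print([p.partition("dev-hot") for _ in range(5)])  # [6, 7, 8, 9, 6]
```

Note the trade-off stated in the docstring: records of a fanned-out hot device lose strict ordering, which may or may not be acceptable for a given device's data.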
>>>>
>>>> Best
>>>>
>>>> On Sat, Sep 16, 2023 at 6:03 PM Karthick <ibmkarthickma...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Gowtham, I agree with you.
>>>>>
>>>>> I'm eager to resolve the issue or gain a better understanding. Your
>>>>> assistance would be greatly appreciated.
>>>>>
>>>>> If there are any additional details or context needed to address my
>>>>> query effectively, please let me know, and I'll be happy to provide
>>>>> them.
>>>>>
>>>>> Thank you in advance for your time and consideration. I look forward
>>>>> to hearing from you and benefiting from your expertise.
>>>>>
>>>>> Thanks and regards,
>>>>> Karthick.
>>>>>
>>>>> On Sat, Sep 16, 2023 at 11:04 AM Gowtham S <gowtham.co....@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Karthick,
>>>>>>
>>>>>> This appears to be a common challenge related to a slow-consuming
>>>>>> situation. Those with relevant experience in addressing such matters
>>>>>> should be capable of providing assistance.
>>>>>>
>>>>>> Thanks and regards,
>>>>>> Gowtham S
>>>>>>
>>>>>> On Fri, 15 Sept 2023 at 23:06, Giannis Polyzos
>>>>>> <ipolyzos...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi Karthick,
>>>>>>>
>>>>>>> On a high level this seems like a data-skew issue: do some
>>>>>>> partitions have way more data than others?
>>>>>>> What is the number of your devices? How many messages are you
>>>>>>> processing?
>>>>>>> Most of the things you share above sound like you are looking for
>>>>>>> suggestions around load distribution for Kafka, i.e. number of
>>>>>>> partitions, how to distribute your device data, etc.
>>>>>>> It would be good to also share what your Flink job is doing, as I
>>>>>>> don't see anything mentioned around that. Are you observing back
>>>>>>> pressure in the Flink UI?
>>>>>>>
>>>>>>> Best
>>>>>>>
>>>>>>> On Fri, Sep 15, 2023 at 3:46 PM Karthick
>>>>>>> <ibmkarthickma...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Dear Apache Flink Community,
>>>>>>>>
>>>>>>>> I am writing to urgently address a critical challenge we've
>>>>>>>> encountered in our IoT platform, which relies on Apache Kafka and
>>>>>>>> real-time data processing. We believe this issue is of paramount
>>>>>>>> importance and may have broad implications for the community.
>>>>>>>>
>>>>>>>> In our IoT ecosystem, we receive data streams from numerous
>>>>>>>> devices, each uniquely identified. To maintain data integrity and
>>>>>>>> ordering, we've configured a Kafka topic with ten partitions,
>>>>>>>> ensuring that each device's data is directed to its respective
>>>>>>>> partition based on its unique identifier. This architectural choice
>>>>>>>> has proven effective in maintaining data order, but it has also
>>>>>>>> unveiled a significant problem:
>>>>>>>>
>>>>>>>> *One device's slow data processing interferes with other devices'
>>>>>>>> data, causing a detrimental ripple effect throughout our system.*
>>>>>>>>
>>>>>>>> To put it simply, when a single device experiences processing
>>>>>>>> delays, it acts as a bottleneck within the Kafka partition, leading
>>>>>>>> to delays in processing data from other devices sharing the same
>>>>>>>> partition. This issue undermines the efficiency and scalability of
>>>>>>>> our entire data processing pipeline.
>>>>>>>>
>>>>>>>> Additionally, we are currently using the default partitioner to
>>>>>>>> choose the partition for each device's data. If there are
>>>>>>>> alternative partitioning strategies that can help alleviate this
>>>>>>>> problem, we are eager to explore them.
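The sharing problem described here is structural: with far more devices than partitions, a hash-based default assignment must co-locate many devices per partition, so one slow device stalls every neighbor. The small self-contained simulation below makes the pigeonhole argument concrete; the device ids and the md5 stand-in hash are illustrative (Kafka's default partitioner uses murmur2 on the key bytes).

```python
import hashlib
from collections import defaultdict

NUM_PARTITIONS = 10

def partition_for(device_id: str) -> int:
    # Stand-in for the default hash(key) % num_partitions assignment.
    h = int.from_bytes(hashlib.md5(device_id.encode()).digest()[:4], "big")
    return h % NUM_PARTITIONS

# 500 hypothetical devices over 10 partitions: by the pigeonhole principle
# at least one partition hosts >= 50 devices, so a single slow device
# delays every device co-located on its partition.
assignment = defaultdict(list)
for i in range(500):
    device = f"device-{i}"
    assignment[partition_for(device)].append(device)

print(max(len(devices) for devices in assignment.values()))  # >= 50
```

More partitions shrink the blast radius of a slow device but can never eliminate sharing while devices outnumber partitions, which is why the replies in this thread also suggest hot-device-aware partitioning.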
>>>>>>>>
>>>>>>>> We are in dire need of a highly scalable solution that not only
>>>>>>>> ensures each device's data processing is independent but also
>>>>>>>> prevents any interference or collisions between devices' data
>>>>>>>> streams. Our primary objectives are:
>>>>>>>>
>>>>>>>> 1. *Isolation and Independence:* We require a strategy that
>>>>>>>> guarantees one device's processing speed does not affect other
>>>>>>>> devices in the same Kafka partition. In other words, we need a
>>>>>>>> solution that ensures the independent processing of each device's
>>>>>>>> data.
>>>>>>>>
>>>>>>>> 2. *Open-Source Implementation:* We are actively seeking pointers
>>>>>>>> to open-source implementations or references to working solutions
>>>>>>>> that address this specific challenge within the Apache ecosystem.
>>>>>>>> Any existing projects, libraries, or community-contributed
>>>>>>>> solutions that align with our requirements would be immensely
>>>>>>>> valuable.
>>>>>>>>
>>>>>>>> We recognize that many Apache Flink users face similar issues and
>>>>>>>> may have already found innovative ways to tackle them. We implore
>>>>>>>> you to share your knowledge and experiences on this matter.
>>>>>>>> Specifically, we are interested in:
>>>>>>>>
>>>>>>>> *- Strategies or architectural patterns that ensure independent
>>>>>>>> processing of device data.*
>>>>>>>> *- Insights into load balancing, scalability, and efficient data
>>>>>>>> processing across Kafka partitions.*
>>>>>>>> *- Any existing open-source projects or implementations that
>>>>>>>> address similar challenges.*
>>>>>>>>
>>>>>>>> We are confident that your contributions will not only help us
>>>>>>>> resolve this critical issue but also assist the broader Apache
>>>>>>>> Flink community facing similar obstacles.
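Objective 1 (isolation and independence) can be modeled in miniature: give every device its own queue downstream of the Kafka source and serve the queues fairly, so a backlogged device delays only itself. The sketch below is a toy scheduling model, not Flink code; in a Flink job a related idea is keying the stream by device id so each key's state and ordering are isolated, though keys that land on the same subtask still share its processing time.

```python
from collections import defaultdict, deque

def process_round_robin(events):
    """Toy model of per-device isolation: every device gets its own FIFO
    queue, and one record per device is served each round. A device with
    a huge backlog only delays itself, and per-device order is preserved.
    Returns a list of (round_finished, device, record) tuples."""
    queues = defaultdict(deque)
    for device, record in events:
        queues[device].append(record)
    finished, rnd = [], 0
    while any(queues.values()):
        rnd += 1
        for device, q in queues.items():
            if q:
                finished.append((rnd, device, q.popleft()))
    return finished

# "slow-dev" has 100 records queued before "fast-dev" even arrives, yet
# fast-dev's 3 records complete in rounds 1-3 instead of waiting behind it.
events = [("slow-dev", i) for i in range(100)] + [("fast-dev", i) for i in range(3)]
result = process_round_robin(events)
print([rnd for rnd, dev, _ in result if dev == "fast-dev"])  # [1, 2, 3]
```

Contrast this with a single shared FIFO per partition, where fast-dev's records would sit behind all 100 of slow-dev's: that is exactly the ripple effect described above.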
>>>>>>>>
>>>>>>>> Please respond to this thread with your expertise, solutions, or
>>>>>>>> any relevant resources. Your support will be invaluable to our team
>>>>>>>> and the entire Apache Flink community.
>>>>>>>>
>>>>>>>> Thank you for your prompt attention to this matter.
>>>>>>>>
>>>>>>>> Thanks & Regards,
>>>>>>>> Karthick.
>>>>>>>