Joe,

- Correction: it's 1,00,000 (100,000) partitions, not 1,000,000.
- We can have at most one consumer per partition, not 50 consumers on a single partition.
- Yes, we have a hashing mechanism to support a future increase in partitions as well; we override the default partitioner (a rough sketch of the idea is below).
- We use both the Simple and High-Level consumers, depending on the consumption use case.
- I clearly mentioned 200 TB/week, not per day.
- We have separate producers and consumers, each running as separate processes on different machines.
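Roughly, the partitioner idea looks like this (a simplified sketch, not our exact code, assuming the newer Java producer's org.apache.kafka.clients.producer.Partitioner interface rather than the 0.8 producer API; the fixed bucket count and murmur2 hash are only illustrative):

import java.util.Map;

import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;
import org.apache.kafka.common.utils.Utils;

public class BucketedKeyPartitioner implements Partitioner {

    // A fixed bucket space larger than any partition count we expect keeps the
    // key -> bucket mapping stable; only the bucket -> partition step changes
    // when partitions are added later.
    private static final int NUM_BUCKETS = 4096;

    @Override
    public void configure(Map<String, ?> configs) {
        // no configuration needed in this sketch
    }

    @Override
    public int partition(String topic, Object key, byte[] keyBytes,
                         Object value, byte[] valueBytes, Cluster cluster) {
        int numPartitions = cluster.partitionsForTopic(topic).size();
        if (keyBytes == null) {
            // Unkeyed messages: a real partitioner would round-robin; this sketch
            // just pins them to partition 0.
            return 0;
        }
        int bucket = Utils.toPositive(Utils.murmur2(keyBytes)) % NUM_BUCKETS;
        return bucket % numPartitions;
    }

    @Override
    public void close() {
        // nothing to clean up
    }
}

It gets wired in through the producer's partitioner.class config. The fixed bucket layer is what lets us change the bucket-to-partition mapping later without changing how keys hash.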
I was explaining why we may end up with so many partitions. I think the discussion got side-tracked onto 200 TB/day. Any suggestions regarding the performance impact of the 200 TB/week?

On Fri, Dec 19, 2014 at 10:53 PM, Joe Stein <joe.st...@stealth.ly> wrote:
>
> Wait, how do you get 2,000 topics each with 50 partitions == 1,000,000 partitions? I think you can take what I said below and change my 250 to 25 as I went with your result (1,000,000) and not your arguments (2,000 x 50).
>
> And you should think on the processing as a separate step from fetch and commit your offset in batch post processing. Then you only need more partitions to fetch batches to process in parallel.
>
> Regards, Joestein
>
> On Fri, Dec 19, 2014 at 12:01 PM, Joe Stein <joe.st...@stealth.ly> wrote:
> >
> > see some comments inline
> >
> > On Fri, Dec 19, 2014 at 11:30 AM, Achanta Vamsi Subhash <achanta.va...@flipkart.com> wrote:
> >>
> >> We require:
> >> - many topics
> >> - ordering of messages for every topic
> >
> > Ordering is only on a per-partition basis, so you might have to pick a partition key that makes sense for what you are doing.
> >
> >> - Consumers hit different Http EndPoints which may be slow (in a push model). In case of a Pull model, consumers may pull at the rate at which they can process.
> >> - We need parallelism to hit with as many consumers. Hence, we currently have around 50 consumers/topic => 50 partitions.
> >
> > I think you might be mixing up the fetch with the processing. You can have 1 partition and still have 50 messages being processed in parallel (so a batch of messages).
> >
> > What language are you working in? How are you doing this processing exactly?
> >
> >> Currently we have:
> >> 2000 topics x 50 => 1,00,000 partitions.
> >
> > If this is really the case then you are going to need at least 250 brokers (~4,000 partitions per broker).
> >
> > If you do that then you are in the 200 TB per day world, which doesn't sound to be the case.
> >
> > I really think you need to strategize some more on your processing model.
> >
> >> The incoming rate of ingestion at max is 100 MB/sec. We are planning for a big cluster with many brokers.
> >
> > It is possible to handle this on just 3 brokers depending on message size; ability to batch and durability are also factors you really need to be thinking about.
> >
> >> We have exactly the same use cases as mentioned in this video (usage at LinkedIn):
> >> https://www.youtube.com/watch?v=19DvtEC0EbQ
> >>
> >> To handle the zookeeper scenario, as mentioned in the above video, we are planning to use SSDs and would upgrade to the new consumer (0.9+) once it's available as per the below video.
> >> https://www.youtube.com/watch?v=7TZiN521FQA
> >>
> >> On Fri, Dec 19, 2014 at 9:06 PM, Jayesh Thakrar <j_thak...@yahoo.com.invalid> wrote:
> >> >
> >> > Technically/conceptually it is possible to have 200,000 topics, but do you really need it like that? What do you intend to do with those messages - i.e. how do you foresee them being processed downstream? And are those topics really there to segregate different kinds of processing or different ids? E.g. if you were LinkedIn, Facebook or Google, would you have one topic per user or one topic per kind of event (e.g. login, pageview, adview, etc.)?
> >> > Remember there is significant book-keeping done within Zookeeper - and these many topics will make that book-keeping significant.
> >> > As for storage, I don't think it should be an issue with sufficient spindles, servers and higher than default memory configuration.
> >> > Jayesh
> >> >
> >> > From: Achanta Vamsi Subhash <achanta.va...@flipkart.com>
> >> > To: "users@kafka.apache.org" <users@kafka.apache.org>
> >> > Sent: Friday, December 19, 2014 9:00 AM
> >> > Subject: Re: Max. storage for Kafka and impact
> >> >
> >> > Yes. We need those many max partitions as we have a central messaging service and thousands of topics.
> >> >
> >> > On Friday, December 19, 2014, nitin sharma <kumarsharma.ni...@gmail.com> wrote:
> >> > >
> >> > > hi,
> >> > >
> >> > > Few things you have to plan for:
> >> > > a. Ensure that, from a resilience point of view, you have sufficient follower brokers for your partitions.
> >> > > b. In my testing of kafka (50TB/week) so far, I haven't seen much issue with CPU utilization or memory. I had 24 CPU and 32GB RAM.
> >> > > c. 200,000 partitions means around 1MB/week/partition. Are you sure you need so many partitions?
> >> > >
> >> > > Regards,
> >> > > Nitin Kumar Sharma.
> >> > >
> >> > > On Fri, Dec 19, 2014 at 9:12 AM, Achanta Vamsi Subhash <achanta.va...@flipkart.com> wrote:
> >> > > >
> >> > > > We definitely need a retention policy of a week. Hence.
> >> > > >
> >> > > > On Fri, Dec 19, 2014 at 7:40 PM, Achanta Vamsi Subhash <achanta.va...@flipkart.com> wrote:
> >> > > > >
> >> > > > > Hi,
> >> > > > >
> >> > > > > We are using Kafka for our messaging system and we have an estimate for 200 TB/week in the coming months. Will it impact any performance for Kafka?
> >> > > > >
> >> > > > > PS: We will be having greater than 2 lakh partitions.
> >> > > > >
> >> > > > > --
> >> > > > > Regards
> >> > > > > Vamsi Subhash
> >> > > >
> >> > > > --
> >> > > > Regards
> >> > > > Vamsi Subhash
> >> >
> >> > --
> >> > Regards
> >> > Vamsi Subhash
> >>
> >> --
> >> Regards
> >> Vamsi Subhash

--
Regards
Vamsi Subhash
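PS: On Joe's point about treating processing as a separate step from the fetch and committing offsets in batch after processing, this is roughly how I read it (a simplified sketch, assuming the newer Java KafkaConsumer rather than the 0.8 Simple/High-Level consumers we use today; the broker address, group id, topic name and callHttpEndpoint() are placeholders):

import java.time.Duration;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Properties;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class BatchProcessingConsumer {

    public static void main(String[] args) throws InterruptedException {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder
        props.put("group.id", "http-pusher");                // placeholder
        props.put("enable.auto.commit", "false");            // commit manually, after processing
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        // 50 workers processing in parallel, even if we fetch from a single partition.
        ExecutorService pool = Executors.newFixedThreadPool(50);

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("some-topic"));   // placeholder topic
            while (true) {
                // Step 1: fetch a batch (single-threaded).
                ConsumerRecords<String, String> batch = consumer.poll(Duration.ofMillis(500));
                if (batch.isEmpty()) {
                    continue;
                }

                // Step 2: process the batch in parallel against the slow HTTP endpoints.
                List<Callable<Void>> tasks = new ArrayList<>();
                for (ConsumerRecord<String, String> rec : batch) {
                    tasks.add(() -> {
                        callHttpEndpoint(rec.value());
                        return null;
                    });
                }
                pool.invokeAll(tasks);   // blocks until every record in the batch is done

                // Step 3: commit offsets only after the whole batch has been processed.
                // A real implementation would inspect the returned Futures and retry or
                // park failed records before committing.
                consumer.commitSync();
            }
        }
    }

    private static void callHttpEndpoint(String payload) {
        // Placeholder for the slow downstream HTTP call mentioned in the thread.
    }
}

The trade-off for us is that processing a partition's batch in parallel gives up strict ordering within that partition, which is why we still rely on the partition key and many partitions for the ordering requirement.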