Joe,

- Correction: it's 1,00,000 (100,000) partitions, not 1,000,000.
- We can have at most one consumer per partition, not 50 consumers on a single partition.
- Yes, we have a hashing mechanism to support a future increase in partitions as well; we override the default partitioner (a rough sketch of the idea is below).
- We use both the Simple and High-Level consumers, depending on the consumption use case.
- I clearly mentioned 200 TB/week, not per day.
- We have separate producers and consumers, each running as separate processes on different machines.
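Roughly, the partitioner idea looks like this (a simplified sketch, not our exact code, assuming the newer Java producer's org.apache.kafka.clients.producer.Partitioner interface rather than the 0.8 producer API; the fixed bucket count and murmur2 hash are only illustrative):

import java.util.Map;

import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;
import org.apache.kafka.common.utils.Utils;

public class BucketedKeyPartitioner implements Partitioner {

    // A fixed bucket space larger than any partition count we expect keeps the
    // key -> bucket mapping stable; only the bucket -> partition step changes
    // when partitions are added later.
    private static final int NUM_BUCKETS = 4096;

    @Override
    public void configure(Map<String, ?> configs) {
        // no configuration needed in this sketch
    }

    @Override
    public int partition(String topic, Object key, byte[] keyBytes,
                         Object value, byte[] valueBytes, Cluster cluster) {
        int numPartitions = cluster.partitionsForTopic(topic).size();
        if (keyBytes == null) {
            // Unkeyed messages: a real partitioner would round-robin; this sketch
            // just pins them to partition 0.
            return 0;
        }
        int bucket = Utils.toPositive(Utils.murmur2(keyBytes)) % NUM_BUCKETS;
        return bucket % numPartitions;
    }

    @Override
    public void close() {
        // nothing to clean up
    }
}

It gets wired in through the producer's partitioner.class config. The fixed bucket layer is what lets us change the bucket-to-partition mapping later without changing how keys hash.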
I was explaining why we may end up with so many partitions. I think the discussion got side-tracked onto 200 TB/day. Any suggestions regarding the performance impact of the 200 TB/week?

On Fri, Dec 19, 2014 at 10:53 PM, Joe Stein <joe.st...@stealth.ly> wrote:
>
> Wait, how do you get 2,000 topics each with 50 partitions == 1,000,000 partitions? I think you can take what I said below and change my 250 to 25 as I went with your result (1,000,000) and not your arguments (2,000 x 50).
>
> And you should think on the processing as a separate step from fetch and commit your offset in batch post processing. Then you only need more partitions to fetch batches to process in parallel.
>
> Regards, Joestein
>
> On Fri, Dec 19, 2014 at 12:01 PM, Joe Stein <joe.st...@stealth.ly> wrote:
> >
> > see some comments inline
> >
> > On Fri, Dec 19, 2014 at 11:30 AM, Achanta Vamsi Subhash <achanta.va...@flipkart.com> wrote:
> >>
> >> We require:
> >> - many topics
> >> - ordering of messages for every topic
> >
> > Ordering is only on a per-partition basis, so you might have to pick a partition key that makes sense for what you are doing.
> >
> >> - Consumers hit different Http EndPoints which may be slow (in a push model). In case of a Pull model, consumers may pull at the rate at which they can process.
> >> - We need parallelism to hit with as many consumers. Hence, we currently have around 50 consumers/topic => 50 partitions.
> >
> > I think you might be mixing up the fetch with the processing. You can have 1 partition and still have 50 messages being processed in parallel (so a batch of messages).
> >
> > What language are you working in? How are you doing this processing exactly?
> >
> >> Currently we have:
> >> 2000 topics x 50 => 1,00,000 partitions.
> >
> > If this is really the case then you are going to need at least 250 brokers (~4,000 partitions per broker).
> >
> > If you do that then you are in the 200 TB per day world, which doesn't sound to be the case.
> >
> > I really think you need to strategize some more on your processing model.
> >
> >> The incoming rate of ingestion at max is 100 MB/sec. We are planning for a big cluster with many brokers.
> >
> > It is possible to handle this on just 3 brokers depending on message size; ability to batch and durability are also factors you really need to be thinking about.
> >
> >> We have exactly the same use cases as mentioned in this video (usage at LinkedIn):
> >> https://www.youtube.com/watch?v=19DvtEC0EbQ
> >>
> >> To handle the zookeeper scenario, as mentioned in the above video, we are planning to use SSDs and would upgrade to the new consumer (0.9+) once it's available as per the below video.
> >> https://www.youtube.com/watch?v=7TZiN521FQA
> >>
> >> On Fri, Dec 19, 2014 at 9:06 PM, Jayesh Thakrar <j_thak...@yahoo.com.invalid> wrote:
> >> >
> >> > Technically/conceptually it is possible to have 200,000 topics, but do you really need it like that? What do you intend to do with those messages - i.e. how do you foresee them being processed downstream? And are those topics really there to segregate different kinds of processing or different ids? E.g. if you were LinkedIn, Facebook or Google, would you have one topic per user or one topic per kind of event (e.g. login, pageview, adview, etc.)?
> >> > Remember there is significant book-keeping done within Zookeeper - and these many topics will make that book-keeping significant.
> >> > As for storage, I don't think it should be an issue with sufficient spindles, servers and higher than default memory configuration.
> >> > Jayesh
> >> >
> >> > From: Achanta Vamsi Subhash <achanta.va...@flipkart.com>
> >> > To: "users@kafka.apache.org" <users@kafka.apache.org>
> >> > Sent: Friday, December 19, 2014 9:00 AM
> >> > Subject: Re: Max. storage for Kafka and impact
> >> >
> >> > Yes. We need those many max partitions as we have a central messaging service and thousands of topics.
> >> >
> >> > On Friday, December 19, 2014, nitin sharma <kumarsharma.ni...@gmail.com> wrote:
> >> > >
> >> > > hi,
> >> > >
> >> > > Few things you have to plan for:
> >> > > a. Ensure that, from a resilience point of view, you have sufficient follower brokers for your partitions.
> >> > > b. In my testing of kafka (50TB/week) so far, I haven't seen much issue with CPU utilization or memory. I had 24 CPU and 32GB RAM.
> >> > > c. 200,000 partitions means around 1MB/week/partition. Are you sure you need so many partitions?
> >> > >
> >> > > Regards,
> >> > > Nitin Kumar Sharma.
> >> > >
> >> > > On Fri, Dec 19, 2014 at 9:12 AM, Achanta Vamsi Subhash <achanta.va...@flipkart.com> wrote:
> >> > > >
> >> > > > We definitely need a retention policy of a week. Hence.
> >> > > >
> >> > > > On Fri, Dec 19, 2014 at 7:40 PM, Achanta Vamsi Subhash <achanta.va...@flipkart.com> wrote:
> >> > > > >
> >> > > > > Hi,
> >> > > > >
> >> > > > > We are using Kafka for our messaging system and we have an estimate for 200 TB/week in the coming months. Will it impact any performance for Kafka?
> >> > > > >
> >> > > > > PS: We will be having greater than 2 lakh partitions.
> >> > > > >
> >> > > > > --
> >> > > > > Regards
> >> > > > > Vamsi Subhash
> >> > > >
> >> > > > --
> >> > > > Regards
> >> > > > Vamsi Subhash
> >> >
> >> > --
> >> > Regards
> >> > Vamsi Subhash
> >>
> >> --
> >> Regards
> >> Vamsi Subhash

--
Regards
Vamsi Subhash
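PS: On Joe's point about treating processing as a separate step from the fetch and committing offsets in batch after processing, this is roughly how I read it (a simplified sketch, assuming the newer Java KafkaConsumer rather than the 0.8 Simple/High-Level consumers we use today; the broker address, group id, topic name and callHttpEndpoint() are placeholders):

import java.time.Duration;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Properties;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class BatchProcessingConsumer {

    public static void main(String[] args) throws InterruptedException {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder
        props.put("group.id", "http-pusher");                // placeholder
        props.put("enable.auto.commit", "false");            // commit manually, after processing
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        // 50 workers processing in parallel, even if we fetch from a single partition.
        ExecutorService pool = Executors.newFixedThreadPool(50);

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("some-topic"));   // placeholder topic
            while (true) {
                // Step 1: fetch a batch (single-threaded).
                ConsumerRecords<String, String> batch = consumer.poll(Duration.ofMillis(500));
                if (batch.isEmpty()) {
                    continue;
                }

                // Step 2: process the batch in parallel against the slow HTTP endpoints.
                List<Callable<Void>> tasks = new ArrayList<>();
                for (ConsumerRecord<String, String> rec : batch) {
                    tasks.add(() -> {
                        callHttpEndpoint(rec.value());
                        return null;
                    });
                }
                pool.invokeAll(tasks);   // blocks until every record in the batch is done

                // Step 3: commit offsets only after the whole batch has been processed.
                // A real implementation would inspect the returned Futures and retry or
                // park failed records before committing.
                consumer.commitSync();
            }
        }
    }

    private static void callHttpEndpoint(String payload) {
        // Placeholder for the slow downstream HTTP call mentioned in the thread.
    }
}

The trade-off for us is that processing a partition's batch in parallel gives up strict ordering within that partition, which is why we still rely on the partition key and many partitions for the ordering requirement.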