See some comments inline.

On Fri, Dec 19, 2014 at 11:30 AM, Achanta Vamsi Subhash <
achanta.va...@flipkart.com> wrote:
>
> We require:
> - many topics
> - ordering of messages for every topic
>

Ordering is only guaranteed on a per-partition basis, so you might have to
pick a partition key that makes sense for what you are doing.
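
For example, keying by an entity id keeps all of that entity's messages in
one partition and therefore in order (a minimal sketch with the Java
producer client; the broker address, topic, and key are made up):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class KeyedSend {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092"); // hypothetical broker
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Same key => same partition => "created" is consumed before "shipped".
            producer.send(new ProducerRecord<>("orders", "order-42", "created"));
            producer.send(new ProducerRecord<>("orders", "order-42", "shipped"));
        }
    }
}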


> - Consumers hit different HTTP endpoints, which may be slow (in a push
> model). In a pull model, consumers may pull at the rate at which they can
> process.
> - We need parallelism to hit the endpoints with as many consumers as
> possible. Hence, we currently have around 50 consumers/topic => 50
> partitions.
>

I think you might be mixing up the fetch with the processing. You can have
1 partition and still have 50 messages being processed in parallel (as a
batch of messages).

What language are you working in? How are you doing this processing
exactly?
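
A rough sketch of what I mean (in Java, with made-up names; the HTTP call
is a stub, and with async processing like this you would also want to
manage offset commits yourself rather than rely on auto-commit):

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ParallelProcessor {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092"); // hypothetical broker
        props.put("group.id", "http-pushers");
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        // 50 workers process messages concurrently even though a single
        // consumer (and potentially a single partition) does the fetching.
        ExecutorService pool = Executors.newFixedThreadPool(50);
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("orders")); // hypothetical topic
            while (true) {
                ConsumerRecords<String, String> batch =
                        consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : batch) {
                    pool.submit(() -> postToEndpoint(record.value()));
                }
            }
        }
    }

    // Stub for the slow downstream HTTP push.
    private static void postToEndpoint(String payload) { }
}

Note the trade-off: once records fan out to the pool, ordering across them
is gone, so if you need per-key order you would shard the pool by message
key instead of submitting to it blindly.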


>
> Currently we have:
> 2000 topics x 50 partitions => 100,000 partitions.
>

If this is really the case then you are going to need at least 25 brokers
(~4,000 partitions per broker).
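(Back of the envelope: 100,000 partitions / ~4,000 partitions per broker =
25 brokers, and that is before the replication factor; at a replication
factor of 3 you are placing 300,000 partition replicas.)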

If you do that, then you are in the 200 TB per day world, which doesn't
seem to be the case here.

I really think you need to strategize on your processing model some more.


>
> The incoming rate of ingestion is at most 100 MB/sec. We are planning a
> big cluster with many brokers.


It is possible to handle this on just 3 brokers, depending on message size.
Ability to batch and durability are also factors you really need to be
thinking about.
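
For instance (illustrative only: these property names come from the Java
producer client, and the values are arbitrary starting points, not
recommendations):

import java.util.Properties;

public class ProducerTuning {
    public static Properties tunedProps() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092"); // hypothetical broker
        props.put("batch.size", "65536");        // batch up to 64 KB per partition
        props.put("linger.ms", "50");            // wait up to 50 ms to fill a batch
        props.put("compression.type", "snappy"); // fewer bytes on the wire and disk
        props.put("acks", "all");                // durability: wait for in-sync replicas
        return props;
    }
}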


>
> We have exactly the same use cases as mentioned in this video (usage at
> LinkedIn):
> https://www.youtube.com/watch?v=19DvtEC0EbQ
>
> To handle the ZooKeeper scenario, as mentioned in the above video, we are
> planning to use SSDs and would upgrade to the new consumer (0.9+) once it's
> available, as per the video below.
> https://www.youtube.com/watch?v=7TZiN521FQA
>
> On Fri, Dec 19, 2014 at 9:06 PM, Jayesh Thakrar
> <j_thak...@yahoo.com.invalid> wrote:
>
> > Technically/conceptually it is possible to have 200,000 topics, but do
> > you really need it like that? What do you intend to do with those
> > messages - i.e., how do you foresee them being processed downstream?
> > And are those topics really there to segregate different kinds of
> > processing or different ids? E.g., if you were LinkedIn, Facebook or
> > Google, would you have one topic per user or one topic per kind of
> > event (e.g. login, pageview, adview, etc.)? Remember there is
> > significant book-keeping done within Zookeeper - and this many topics
> > will make that book-keeping significant.
> > As for storage, I don't think it should be an issue with sufficient
> > spindles, servers and higher-than-default memory configuration.
> > Jayesh
> > From: Achanta Vamsi Subhash <achanta.va...@flipkart.com>
> > To: "users@kafka.apache.org" <users@kafka.apache.org>
> > Sent: Friday, December 19, 2014 9:00 AM
> > Subject: Re: Max. storage for Kafka and impact
> >
> > Yes. We need that many partitions at the maximum, as we have a central
> > messaging service and thousands of topics.
> >
> > On Friday, December 19, 2014, nitin sharma <kumarsharma.ni...@gmail.com>
> > wrote:
> >
> > > Hi,
> > >
> > > A few things you have to plan for:
> > > a. Ensure that, from a resilience point of view, you have sufficient
> > > follower replicas for your partitions.
> > > b. In my testing of Kafka (50 TB/week) so far, I haven't seen much
> > > issue with CPU utilization or memory. I had 24 CPUs and 32 GB RAM.
> > > c. 200,000 partitions means around 1 GB/week/partition. Are you sure
> > > you need so many partitions?
> > >
> > > Regards,
> > > Nitin Kumar Sharma.
> > >
> > >
> > > On Fri, Dec 19, 2014 at 9:12 AM, Achanta Vamsi Subhash
> > > <achanta.va...@flipkart.com> wrote:
> > > >
> > > > We definitely need a retention policy of a week; hence the storage
> > > > estimate.
> > > >
> > > > On Fri, Dec 19, 2014 at 7:40 PM, Achanta Vamsi Subhash
> > > > <achanta.va...@flipkart.com> wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > We are using Kafka for our messaging system and we have an
> > > > > estimate of 200 TB/week in the coming months. Will it impact
> > > > > Kafka's performance in any way?
> > > > >
> > > > > PS: We will be having more than 2 lakh (200,000) partitions.
> > > > >
> > > > > --
> > > > > Regards
> > > > > Vamsi Subhash
> > > > >
> > > >
> > > >
> > > > --
> > > > Regards
> > > > Vamsi Subhash
> > > >
> > >
> >
> >
> > --
> > Regards
> > Vamsi Subhash
> >
>
> --
> Regards
> Vamsi Subhash
>
