Wait, how do you get 2,000 topics each with 50 partitions == 1,000,000
partitions? 2,000 x 50 is 100,000. So take what I said below and change my
250 to 25 (100,000 partitions / ~4,000 partitions per broker = ~25 brokers),
as I went with your written result (1,00,000, which I read as 1,000,000) and
not your arithmetic (2,000 x 50).

And you should think of the processing as a step separate from the fetch:
fetch a batch, process it, and commit your offsets in batch after
processing. Then you only need more partitions when you have to fetch
batches in parallel, not just to process them in parallel.
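
Something like this minimal sketch is what I mean (assuming a recent Java
KafkaConsumer client; the broker address, topic, group id, pool size and
handler below are made up for illustration): fetch a batch, process the
records in parallel, and commit the offsets only once the whole batch is
done.

import java.time.Duration;
import java.util.*;
import java.util.concurrent.*;
import org.apache.kafka.clients.consumer.*;

public class BatchProcessor {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");   // hypothetical broker address
        props.put("group.id", "endpoint-pushers");         // hypothetical consumer group
        props.put("enable.auto.commit", "false");          // commit manually, after processing
        props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList("orders"));  // hypothetical topic

        ExecutorService pool = Executors.newFixedThreadPool(50);  // 50 workers, even with 1 partition
        while (true) {
            ConsumerRecords<String, String> batch = consumer.poll(Duration.ofSeconds(1));
            List<Future<?>> inFlight = new ArrayList<>();
            for (ConsumerRecord<String, String> record : batch) {
                inFlight.add(pool.submit(() -> process(record)));  // e.g. call the slow HTTP endpoint
            }
            for (Future<?> f : inFlight) {
                f.get();                                           // wait for the whole batch
            }
            consumer.commitSync();                                 // commit only after processing is done
        }
    }

    static void process(ConsumerRecord<String, String> record) {
        // push record.value() to the downstream HTTP endpoint (left out here)
    }
}

If a single consumer cannot fetch fast enough on its own, that is the point
where extra partitions (and extra consumers) actually buy you something.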

Regards, Joestein

On Fri, Dec 19, 2014 at 12:01 PM, Joe Stein <joe.st...@stealth.ly> wrote:
>
> see some comments inline
>
> On Fri, Dec 19, 2014 at 11:30 AM, Achanta Vamsi Subhash <
> achanta.va...@flipkart.com> wrote:
>>
>> We require:
>> - many topics
>> - ordering of messages for every topic
>>
>
> Ordering is only guaranteed on a per-partition basis, so you might have to
> pick a partition key that makes sense for what you are doing.
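>
> For example, a minimal producer sketch (Java client assumed; the topic name
> "orders" and the keys are made up for illustration) that keys every message
> so that all messages for one key land on the same partition and stay in
> order:
>
> import java.util.Properties;
> import org.apache.kafka.clients.producer.KafkaProducer;
> import org.apache.kafka.clients.producer.Producer;
> import org.apache.kafka.clients.producer.ProducerRecord;
>
> public class KeyedProducer {
>     public static void main(String[] args) {
>         Properties props = new Properties();
>         props.put("bootstrap.servers", "broker1:9092"); // hypothetical broker address
>         props.put("key.serializer",
>                   "org.apache.kafka.common.serialization.StringSerializer");
>         props.put("value.serializer",
>                   "org.apache.kafka.common.serialization.StringSerializer");
>
>         try (Producer<String, String> producer = new KafkaProducer<>(props)) {
>             // Same key => same partition => per-key ordering is preserved.
>             producer.send(new ProducerRecord<>("orders", "order-42", "CREATED"));
>             producer.send(new ProducerRecord<>("orders", "order-42", "SHIPPED"));
>         }
>     }
> }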
>
>
>> - Consumers hit different HTTP endpoints, which may be slow (in a push
>> model). In a pull model, consumers may pull at the rate at which they
>> can process.
>> - We need parallelism, hitting the endpoints with as many consumers as
>> possible. Hence, we currently have around 50 consumers/topic => 50
>> partitions.
>>
>
> I think you might be mixing up the fetch with the processing. You can have
> 1 partition and still have 50 messages being processed in parallel (i.e.,
> as a batch of messages).
>
> What language are you working in? How are you doing this processing
> exactly?
>
>
>>
>> Currently we have:
>> 2,000 topics x 50 partitions => 1,00,000 (100,000) partitions.
>>
>
> If this is really the case then you are going to need at least 250 brokers
> (~ 4,000 partitions per broker).
>
> If you do that, then you are in the 200 TB per day world, which doesn't
> sound like your case.
>
> I really think you need to strategize some more on your processing model.
>
>
>>
>> The incoming ingestion rate is at most 100 MB/sec. We are planning for a
>> big cluster with many brokers.
>
>
> It is possible to handle this on just 3 brokers, depending on message size;
> ability to batch and durability are also factors you really need to be
> thinking about.
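>
> As a rough sketch of the knobs involved (Java producer client assumed; the
> broker list and the numbers are illustrative, not a recommendation):
>
> import java.util.Properties;
> import org.apache.kafka.clients.producer.KafkaProducer;
> import org.apache.kafka.clients.producer.Producer;
>
> public class TunedProducer {
>     public static void main(String[] args) {
>         Properties props = new Properties();
>         props.put("bootstrap.servers",
>                   "broker1:9092,broker2:9092,broker3:9092"); // hypothetical 3-broker cluster
>         props.put("key.serializer",
>                   "org.apache.kafka.common.serialization.ByteArraySerializer");
>         props.put("value.serializer",
>                   "org.apache.kafka.common.serialization.ByteArraySerializer");
>
>         // Batching: bigger batches plus a small linger trade a little
>         // latency for much better throughput per broker.
>         props.put("batch.size", "65536");        // bytes per partition batch
>         props.put("linger.ms", "20");            // wait up to 20 ms to fill a batch
>         props.put("compression.type", "snappy"); // shrink what goes over the wire and to disk
>
>         // Durability: acks=all waits for the in-sync replicas, acks=1 only
>         // for the leader; pick based on how much loss you can tolerate.
>         props.put("acks", "all");
>
>         Producer<byte[], byte[]> producer = new KafkaProducer<>(props);
>         // ... send records, then close when done
>         producer.close();
>     }
> }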
>
>
>>
>> We have exactly the same use cases as mentioned in this video (usage at
>> LinkedIn):
>> https://www.youtube.com/watch?v=19DvtEC0EbQ
>>
>> To handle the ZooKeeper scenario, as mentioned in the above video, we are
>> planning to use SSDs and would upgrade to the new consumer (0.9+) once
>> it's available, as per the below video:
>> https://www.youtube.com/watch?v=7TZiN521FQA
>>
>> On Fri, Dec 19, 2014 at 9:06 PM, Jayesh Thakrar
>> <j_thak...@yahoo.com.invalid
>> > wrote:
>>
>> > Technically/conceptually it is possible to have 200,000 topics, but do
>> > you really need it like that? What do you intend to do with those
>> > messages - i.e. how do you foresee them being processed downstream? And
>> > are those topics really there to segregate different kinds of
>> > processing, or different ids? E.g. if you were LinkedIn, Facebook or
>> > Google, would you have one topic per user or one topic per kind of
>> > event (e.g. login, pageview, adview, etc.)? Remember there is
>> > significant book-keeping done within Zookeeper - and this many topics
>> > will make that book-keeping significant. As for storage, I don't think
>> > it should be an issue with sufficient spindles, servers and
>> > higher-than-default memory configuration.
>> > Jayesh
>> >       From: Achanta Vamsi Subhash <achanta.va...@flipkart.com>
>> >  To: "users@kafka.apache.org" <users@kafka.apache.org>
>> >  Sent: Friday, December 19, 2014 9:00 AM
>> >  Subject: Re: Max. storage for Kafka and impact
>> >
>> > Yes. We need that many partitions at the maximum, as we have a central
>> > messaging service and thousands of topics.
>> >
>> > On Friday, December 19, 2014, nitin sharma <kumarsharma.ni...@gmail.com
>> >
>> > wrote:
>> >
>> > > Hi,
>> > >
>> > > A few things you have to plan for:
>> > > a. From a resilience point of view, ensure that you have sufficient
>> > > follower replicas for your partitions (see the sketch after this
>> > > list).
>> > > b. In my testing of Kafka (50 TB/week) so far, I haven't seen much of
>> > > an issue with CPU utilization or memory. I had 24 CPUs and 32 GB RAM.
>> > > c. 200,000 partitions at 200 TB/week means around 1 GB/week/partition.
>> > > Are you sure you need so many partitions?
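>> > >
>> > > For (a), a minimal sketch of creating a topic with follower replicas
>> > > (Java AdminClient assumed; the topic name, partition count and
>> > > replication factor are only illustrative):
>> > >
>> > > import java.util.Collections;
>> > > import java.util.Properties;
>> > > import org.apache.kafka.clients.admin.AdminClient;
>> > > import org.apache.kafka.clients.admin.AdminClientConfig;
>> > > import org.apache.kafka.clients.admin.NewTopic;
>> > >
>> > > public class CreateReplicatedTopic {
>> > >     public static void main(String[] args) throws Exception {
>> > >         Properties props = new Properties();
>> > >         props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG,
>> > >                   "broker1:9092");  // hypothetical broker address
>> > >         try (AdminClient admin = AdminClient.create(props)) {
>> > >             // 50 partitions, replication factor 3 =>
>> > >             // 1 leader + 2 follower replicas per partition
>> > >             NewTopic topic = new NewTopic("orders", 50, (short) 3);
>> > >             admin.createTopics(Collections.singleton(topic)).all().get();
>> > >         }
>> > >     }
>> > > }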
>> > >
>> > > Regards,
>> > > Nitin Kumar Sharma.
>> > >
>> > >
>> > > On Fri, Dec 19, 2014 at 9:12 AM, Achanta Vamsi Subhash <
>> > > achanta.va...@flipkart.com> wrote:
>> > > >
>> > > > We definitely need a retention policy of a week; hence the storage
>> > > > estimate.
>> > > >
>> > > > On Fri, Dec 19, 2014 at 7:40 PM, Achanta Vamsi Subhash <
>> > > > achanta.va...@flipkart.com> wrote:
>> > > > >
>> > > > > Hi,
>> > > > >
>> > > > > We are using Kafka for our messaging system and we have an
>> > > > > estimate of 200 TB/week in the coming months. Will it have any
>> > > > > performance impact on Kafka?
>> > > > >
>> > > > > PS: We will have more than 2 lakh (200,000) partitions.
>> >
>> >
>> > > > >
>> > > > > --
>> > > > > Regards
>> > > > > Vamsi Subhash
>> > > > >
>> > > >
>> > > >
>> > > > --
>> > > > Regards
>> > > > Vamsi Subhash
>> > > >
>> > >
>> >
>> >
>> > --
>> > Regards
>> > Vamsi Subhash
>> >
>> >
>> >
>> >
>>
>>
>>
>> --
>> Regards
>> Vamsi Subhash
>>
>
