It's still not clear to me why you need to create so many topics. 

Write the data to a single topic and consume it when it arrives. It doesn't 
matter if it arrives in bursts, as long as you can process it all within 6 
minutes, right?

And if you can't consume it all within 6 minutes, partition the topic until you 
can run enough consumers to keep up. The fact that you are thinking about so 
many topics is a sign that your design is wrong, or that Kafka is the wrong 
solution. 
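
Concretely, here is the shape of what I mean. This is just a minimal sketch 
against the 0.8 high-level consumer; the topic name, group id, thread count, 
and connection settings are placeholders, not anything from your setup:

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.Properties;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    import kafka.consumer.Consumer;
    import kafka.consumer.ConsumerConfig;
    import kafka.consumer.ConsumerIterator;
    import kafka.consumer.KafkaStream;
    import kafka.javaapi.consumer.ConsumerConnector;

    public class BurstConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("zookeeper.connect", "localhost:2181"); // placeholder
            props.put("group.id", "burst-consumers");         // placeholder
            ConsumerConnector connector =
                Consumer.createJavaConsumerConnector(new ConsumerConfig(props));

            // One stream (and one thread) per partition. If the consumers
            // fall behind, add partitions and threads until they keep up.
            int numThreads = 3; // match the topic's partition count
            Map<String, Integer> topicCountMap = new HashMap<String, Integer>();
            topicCountMap.put("events", numThreads); // one topic, not 240 a day

            Map<String, List<KafkaStream<byte[], byte[]>>> streams =
                connector.createMessageStreams(topicCountMap);

            ExecutorService pool = Executors.newFixedThreadPool(numThreads);
            for (final KafkaStream<byte[], byte[]> stream : streams.get("events")) {
                pool.submit(new Runnable() {
                    public void run() {
                        ConsumerIterator<byte[], byte[]> it = stream.iterator();
                        while (it.hasNext()) {
                            byte[] message = it.next().message();
                            // process(message) goes here
                        }
                    }
                });
            }
        }
    }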

Philip

> On Aug 11, 2014, at 5:18 PM, Chen Wang <chen.apache.s...@gmail.com> wrote:
> 
> Philip,
> That is right. There is a huge amount of data flushed into the topic within 
> each 6 minutes. Then at the end of each 6 minutes, I only want to read from 
> that specific topic, and the data within that topic has to be processed as 
> fast as possible. I was originally using a Redis queue for this purpose, but 
> it takes much longer to process a Redis queue than a Kafka queue (test data 
> is 2M messages). Since we already have Kafka infrastructure set up, instead 
> of evaluating other tools (ActiveMQ, RabbitMQ, etc.), I would rather make 
> use of Kafka, although this does not seem like a common Kafka use case.
> 
> Chen
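
For what it's worth, here is a sketch of how those 6-minute batches can live 
in one topic: stamp each message with its window on the producer side, rather 
than encoding the window in the topic name. This assumes the 0.8 Java 
producer, and the topic name and broker address are placeholders:

    import java.util.Properties;

    import kafka.javaapi.producer.Producer;
    import kafka.producer.KeyedMessage;
    import kafka.producer.ProducerConfig;

    public class WindowedProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("metadata.broker.list", "localhost:9092"); // placeholder
            props.put("serializer.class", "kafka.serializer.StringEncoder");
            Producer<String, String> producer =
                new Producer<String, String>(new ProducerConfig(props));

            // Key each message by its 6-minute window. Every window lands in
            // the same topic, and consumers bucket messages by the key, so no
            // per-window topics are needed.
            long windowMillis = 6 * 60 * 1000L;
            String windowKey = Long.toString(System.currentTimeMillis() / windowMillis);

            producer.send(new KeyedMessage<String, String>("events", windowKey, "payload"));
            producer.close();
        }
    }

One caveat: with key-based partitioning a whole window hashes to a single 
partition, so if you want one window spread across all partitions for 
parallel consumption, carry the window stamp in the payload instead of the 
key.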
> 
> 
>> On Mon, Aug 11, 2014 at 5:01 PM, Philip O'Toole 
>> <philip.oto...@yahoo.com.invalid> wrote:
>> I'd love to know more about what you're trying to do here. It sounds like 
>> you're creating topics on a schedule to make it easy to locate data for a 
>> given time range? I'm not sure it makes sense to use Kafka in this manner.
>> 
>> Can you provide more detail?
>> 
>> 
>> Philip
>> 
>>  
>> -----------------------------------------
>> http://www.philipotoole.com
>> 
>> 
>> On Monday, August 11, 2014 4:45 PM, Chen Wang <chen.apache.s...@gmail.com> 
>> wrote:
>> 
>> 
>> 
>> Todd,
>> I actually only intend to keep each topic around for 3 days at most. Each of
>> our topics has 3 partitions, so it's around 3*240*3 = 2160 partitions. Since
>> there is no API for deleting topics, I guess I could set up a cron job
>> deleting the outdated topics (folders) from ZooKeeper.
>> Do you know when the delete-topic API will be available in Kafka?
>> Chen
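
(If you do go the cron-job route, the sketch below is roughly its shape, using 
the ZkClient library the broker already depends on. Be warned that deleting 
the znodes does not remove the log segments on the brokers, so the open file 
handles and disk usage stay behind. The paths, naming scheme, and age check 
are all hypothetical.)

    import java.util.List;

    import org.I0Itec.zkclient.ZkClient;

    public class TopicZnodeCleaner {
        public static void main(String[] args) {
            ZkClient zk = new ZkClient("localhost:2181", 10000, 10000); // placeholder
            try {
                List<String> topics = zk.getChildren("/brokers/topics");
                for (String topic : topics) {
                    // Hypothetical scheme: the topic name carries its date,
                    // e.g. "events-20140811", so age can be parsed from it.
                    if (isOlderThanThreeDays(topic)) {
                        // Removes only the ZooKeeper state for the topic;
                        // broker log directories are untouched.
                        zk.deleteRecursive("/brokers/topics/" + topic);
                    }
                }
            } finally {
                zk.close();
            }
        }

        private static boolean isOlderThanThreeDays(String topic) {
            // Placeholder: parse the date suffix and compare against now.
            return false;
        }
    }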
>> 
>> 
>> 
>> On Mon, Aug 11, 2014 at 3:47 PM, Todd Palino <tpal...@linkedin.com.invalid>
>> wrote:
>> 
>> > You need to consider your total partition count as you do this. After 30
>> > days, assuming 1 partition per topic, you have 7200 partitions. Depending
>> > on how many brokers you have, this can start to be a problem. We just
>> > found an issue on one of our clusters that has over 70k partitions:
>> > actions like a preferred replica election for all topics now fail because
>> > the JSON object that gets written to the ZooKeeper node to trigger them
>> > is too large for ZooKeeper's default 1 MB znode data size.
>> >
>> > You also need to think about the number of open file handles. Even with no
>> > data, there will be open files for each topic.
>> >
>> > -Todd
>> >
>> >
>> > On 8/11/14, 2:19 PM, "Chen Wang" <chen.apache.s...@gmail.com> wrote:
>> >
>> > >Folks,
>> > >Is there any potential issue with creating 240 topics every day? Although
>> > >the retention of each topic is set to 2 days, I am a little concerned
>> > >that, since right now there is no delete-topic API, the ZooKeeper
>> > >ensemble might be overloaded.
>> > >Thanks,
>> > >Chen
>> >
>> >
> 
