I would like to second that. It would be really useful.

Philip
On Oct 8, 2013, at 9:31 AM, Jason Rosenberg <j...@squareup.com> wrote:

> What I would like to see is a way for inactive topics to automatically get
> removed after they are inactive for a period of time. That might help in
> this case.
>
> I added a comment to this larger jira:
> https://issues.apache.org/jira/browse/KAFKA-330
>
> Perhaps it should really be its own jira entry.
>
> Jason
>
>
> On Tue, Oct 8, 2013 at 10:29 AM, Aniket Bhatnagar <
> aniket.bhatna...@gmail.com> wrote:
>
>> Thanks Neha. Is it worthwhile to investigate an option to store topic
>> metadata (partitions, etc.) in another consistent data store (MySQL,
>> HBase, etc.)? Should we make this feature pluggable?
>>
>> The reason I think we may need to surpass the 2000 total partition
>> limit is that there may be genuine use cases for a high number of
>> topics. For example, in my particular case, I am using Kafka as a
>> buffer to store data arriving from various sensors deployed in the
>> physical world. These sensors may be short-lived or long-lived. I was
>> thinking of having an individual topic for each sensor. This way, if a
>> badly behaving sensor attempts to push data at a much faster rate than
>> we can process as a Kafka consumer, we will eventually overflow and
>> start losing data for that particular sensor. However, we can still
>> potentially continue to process data from other sensors that are
>> pushing data at a manageable rate. If I go with 1 topic for all the
>> sensors, 1 misbehaving sensor can potentially keep us from catching up
>> with the topic within the retention period, thus making us lose data
>> from all sensors.
>>
>> The other issue is that if we go with a topic per sensor and the
>> sensors are short-lived and we have reached a threshold of 2000 sensors
>> already deployed, Kafka will stop working (because of the Zookeeper
>> limitation) even though the previously deployed sensors may not be
>> active at all.
>>
>> I am sure there may be other genuine use cases for having far more
>> than 2000 topics.
>>
>>
>> On 4 October 2013 19:04, Neha Narkhede <neha.narkh...@gmail.com> wrote:
>>
>>> You probably want to think of this in terms of the number of
>>> partitions on a single broker, rather than per topic, since I/O is
>>> the limiting factor in this case. Another factor to consider is the
>>> total number of partitions in the cluster, as Zookeeper becomes a
>>> limiting factor there. 30 partitions is not too large provided the
>>> total number of partitions doesn't exceed roughly a couple thousand.
>>> To give you an example, some of our clusters are 16 nodes big and
>>> some of the topics on those clusters have 30 partitions.
>>>
>>> Thanks,
>>> Neha
>>> On Oct 4, 2013 4:15 AM, "Aniket Bhatnagar" <aniket.bhatna...@gmail.com>
>>> wrote:
>>>
>>>> I am using Kafka as a buffer for data streaming in from various
>>>> sources. Since it is time-series data, I generate the key for each
>>>> message by combining the source ID and the minute in the timestamp.
>>>> This means I can have at most 60 partitions per topic (as each
>>>> source has its own topic). I have set num.partitions to 30 (60/2)
>>>> for each topic in the broker config. I don't have a very good reason
>>>> to pick 30 as the default number of partitions per topic, but I
>>>> wanted it to be a high number so that I can achieve high parallelism
>>>> during in-stream processing. I am worried that a high number like 30
>>>> (the default configuration had it as 2) can negatively impact Kafka
>>>> performance in terms of message throughput or memory consumption. I
>>>> understand that this can lead to many files per partition, but I am
>>>> thinking of dealing with it by having multiple directories on the
>>>> same disk if I run into issues at all.
>>>>
>>>> My question to the community: am I prematurely attempting to
>>>> optimize the partition number, given that right now even a partition
>>>> number of 5 seems sufficient, and will I therefore run into unwanted
>>>> issues? Or is 30 an OK number to use for the number of partitions?
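
For anyone following the numbers in this thread, here is a rough Python sketch of the keying scheme described in the original question (source ID plus minute of the timestamp) and of the cluster-wide partition arithmetic. Everything here is an assumption made for illustration: the function names, the "sensor-42" ID, the key format, and the CRC32 hash, which is only a stand-in and is not claimed to be what Kafka's partitioner actually uses.

```python
import zlib
from datetime import datetime, timezone


def message_key(source_id: str, ts: datetime) -> str:
    """Combine the source ID with the minute-of-hour (0-59) of the timestamp."""
    return f"{source_id}-{ts.minute:02d}"


def partition_for(key: str, num_partitions: int) -> int:
    """Stand-in partitioner: stable hash of the key modulo the partition count.

    Assumption: CRC32 is used purely for illustration here.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def total_partitions(num_topics: int, partitions_per_topic: int) -> int:
    """Cluster-wide partition count when every topic has the same partition count."""
    return num_topics * partitions_per_topic


if __name__ == "__main__":
    ts = datetime(2013, 10, 4, 4, 15, tzinfo=timezone.utc)
    key = message_key("sensor-42", ts)  # hypothetical source ID
    print(key, partition_for(key, 30))

    # With one topic per source, only 60 distinct keys can exist per topic,
    # so more than 60 partitions per topic could never all receive data.
    keys = {message_key("sensor-42", ts.replace(minute=m)) for m in range(60)}
    print(len(keys))

    # One topic per sensor at 30 partitions each, for 2000 sensors.
    print(total_partitions(2000, 30))
```

With one topic per sensor and 30 partitions per topic, 2000 sensors would mean 60000 partitions in the cluster, far beyond the "roughly a couple thousand" total mentioned above. Under this scheme the total partition count, not the per-topic count of 30, is the number to watch.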