Hi Marc,

Having a different set of metadata stored on the brokers to feed metadata requests from producers and consumers would be very tricky, I think. For your use case, one thing you could try is to use a customized partitioning function in the producer so that it only produces to a subset of the partitions, depending on the traffic. For partitions that no longer have any data coming in, the broker will eventually delete all of that partition's logs, which effectively reduces the number of open file handles for both the logs and the sockets.
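A minimal sketch of such a partitioning function, written as plain Python rather than against any particular Kafka client API. The class name `SubsetPartitioner`, the `active` parameter, and the choice of CRC32 for hashing keys are all assumptions made for illustration; the idea is only that the producer maps every message onto the first `active` partitions and leaves the rest idle.

```python
import itertools
import zlib
from typing import Optional


class SubsetPartitioner:
    """Hypothetical helper: route messages to only the first `active` partitions."""

    def __init__(self, active: int) -> None:
        if active < 1:
            raise ValueError("active must be at least 1")
        self.active = active
        self._counter = itertools.count()

    def partition(self, key: Optional[bytes], num_partitions: int) -> int:
        if num_partitions < 1:
            raise ValueError("num_partitions must be at least 1")
        # Never target more partitions than the topic actually has.
        limit = min(self.active, num_partitions)
        if key is None:
            # No key: spread messages round-robin over the active subset.
            return next(self._counter) % limit
        # Keyed: CRC32 is stable across processes, so the same key keeps
        # landing on the same partition while `active` stays unchanged.
        return zlib.crc32(key) % limit


# Example: a topic with 8 partitions, of which only the first 2 receive data.
partitioner = SubsetPartitioner(active=2)
targets = {partitioner.partition(k, 8) for k in (b"a", b"b", b"c", None, None)}
```

Note that changing `active` also changes which partition a given key maps to, so per-key ordering only holds between such changes.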
Guozhang

On Tue, Jan 28, 2014 at 12:53 PM, Marc Labbe <mrla...@gmail.com> wrote:

> Hi Guozhang,
>
> thinking out loud... delete then recreate works if it is acceptable to have
> a topic specific downtime during which Kafka can't accept requests for that
> topic. This downtime would last for the duration while the topic gets
> deleted and then recreated. I am assuming here that a producer sending data
> for a topic, while it is being deleted and before it is recreated, will
> receive an error. The error will be pushed to clients if this process lasts
> longer than the time allowed for retries and number of retries configured
> on the producer. In our case, the producer is a web service.
>
> It may be acceptable if we do this maintenance during low use periods and
> the process is rapid enough (guessing within 30s). Our clients have means
> to resend messages when an error occurs but we may still lose messages if
> it lasts too long. E.g. the client may be shutdown with pending messages. I
> would like to avoid buffering in the web service as much as possible.
>
> What if you don't merge partitions and simply keep shrunk partitions until
> log segments are rolled out and deleted? The only thing you have to worry
> about is to prevent producers from sending data to those partitions by
> having a producer specific metadata which doesn't contain the partitions to
> be deleted? This has the impact of having a different set of metadata for
> topics depending on if you are producer or consumer, which isn't so nice
> though.
>
> I admit this is probably way more simplistic than it really is...
>
> marc
>
> On Mon, Jan 27, 2014 at 7:24 PM, Guozhang Wang <wangg...@gmail.com> wrote:
>
> > Siyuan, Marc:
> >
> > We are currently working on topic-deletion supports
> > (KAFKA-330<https://issues.apache.org/jira/browse/KAFKA-330>),
> > would first-delete-then-recreate-with-fewer-partitions work for your
> > cases?
> >
> > The reason why we are trying to avoid shrinking partition is that it
> > would make the logic very complicated. For example, we need to think
> > about within-partition ordering guarantee with partition merging and
> > producing-in-progress simultaneously.
> >
> > Guozhang
> >
> > On Mon, Jan 27, 2014 at 12:35 PM, Marc Labbe <mrla...@gmail.com> wrote:
> >
> > > I have the same need, and I've just created a Jira:
> > > https://issues.apache.org/jira/browse/KAFKA-1231
> > >
> > > The reasoning behind it is because our topics are created on a per
> > > product basis and each of them usually starts big during the initial
> > > weeks and gradually reduces in time (1-2 years).
> > >
> > > thanks
> > > marc
> > >
> > > On Thu, Dec 5, 2013 at 7:45 PM, Guozhang Wang <wangg...@gmail.com> wrote:
> > >
> > > > Hi Siyuan,
> > > >
> > > > We do not have a tool to shrink the number of partitions (if that is
> > > > what you want) for a topic at runtime yet. Could you file a JIRA for
> > > > this?
> > > >
> > > > Guozhang
> > > >
> > > > On Thu, Dec 5, 2013 at 2:16 PM, hsy...@gmail.com <hsy...@gmail.com> wrote:
> > > >
> > > > > Hi guys,
> > > > >
> > > > > I found there is a tool to add partition on the fly. My question
> > > > > is, is there a way to delete a partition at runtime? Thanks!
> > > > >
> > > > > Best,
> > > > > Siyuan
> > > >
> > > > --
> > > > -- Guozhang
> >
> > --
> > -- Guozhang

--
-- Guozhang